|
Post by steadyeddie on Dec 18, 2016 6:25:32 GMT -8
I use reinforcement learning in my rollout policy in both single-player and multi-player games, but for puzzles it turns out to be especially important.
Like everyone else, I found Sudoku hard. Even once I employed the missing-signal hack, I found that I'd play a poor move and be lost. So I figured that what I wanted to do was encourage my player to play the moves it thought were good during a rollout, in order to influence which moves got better scores in the tree.
So during a rollout I select the move with the best score 16 times more often than the move with the weakest score.
Importantly, as the results for those rollouts come in, I apply them back to the policy. For many puzzles this accentuates the path to the correct solution.
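For anyone curious, here is a minimal Python sketch of that kind of biased rollout selection. It is an illustration rather than steadyeddie's actual code: it assumes per-move running averages of rollout results, and a linear scaling of sampling weights so the best-scoring move is chosen 16 times more often than the weakest. The class and method names are made up for the example.

    import random

    class LearnedRolloutPolicy:
        RATIO = 16.0  # best-scoring move sampled ~16x more often than the weakest

        def __init__(self):
            self.avg = {}    # move -> running average of rollout results
            self.count = {}  # move -> number of rollouts that played this move

        def score(self, move):
            # Unseen moves get a neutral prior so they still get explored.
            return self.avg.get(move, 0.5)

        def select(self, legal_moves):
            scores = [self.score(m) for m in legal_moves]
            lo, hi = min(scores), max(scores)
            if hi == lo:
                return random.choice(legal_moves)
            # Map the weakest score to weight 1 and the strongest to weight RATIO,
            # then sample in proportion to those weights.
            weights = [1.0 + (self.RATIO - 1.0) * (s - lo) / (hi - lo) for s in scores]
            return random.choices(legal_moves, weights=weights, k=1)[0]

        def update(self, played_moves, result):
            # Apply the rollout result back to the policy, as described above.
            for m in played_moves:
                prev = self.avg.get(m, 0.0)
                n = self.count.get(m, 0) + 1
                self.count[m] = n
                self.avg[m] = prev + (result - prev) / n

Usage during search would be: call select() for each rollout step, remember the moves played, and call update() with the rollout's final score so later rollouts are steered toward the moves that scored well.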
|
|
|
Post by Andrew Rose on Jan 27, 2017 14:30:36 GMT -8
Interesting. I'm pretty sure Sancho doesn't do anything like that. (It has a Steve special for Sudoku-like games.) Is this only helpful for puzzles with graduated scores? (Oh, I guess that's all puzzles with your depth proxy.)
|
|
|
Post by steadyeddie on Jan 27, 2017 14:46:17 GMT -8
Yes, that's right. It helps most with the depth-proxy games; hidato is the best example. It's not as good with knights tour large, as there's not much learning to be done. It does help with max knights.
It's no good with your evil queens unguided.
|
|