|
Post by steadyeddie on Dec 18, 2016 6:25:32 GMT -8
I use reinforcement learning in my rollout policy in both single-player and multi-player games, but for puzzles it turns out to be especially important.
Like everyone else, I found Sudoku hard. Even once I employed the missing-signal hack, I found that I'd play a poor move and be lost. So I figured that what I wanted to do was encourage my player to play the moves it thought were good during a rollout, in order to influence which moves got better scores in the tree.
So during a rollout I select the move with the best score 16 times more often than the move with the weakest score.
Importantly, as the results for those rollouts come in, I apply them back to the policy. For many puzzles this accentuates the path to the correct solution.
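For anyone curious, here is a minimal Python sketch of that kind of biased rollout selection. It is an illustration rather than steadyeddie's actual code: it assumes per-move running averages of rollout results, and a linear scaling of sampling weights so the best-scoring move is chosen 16 times more often than the weakest. The class and method names are made up for the example.

    import random

    class LearnedRolloutPolicy:
        RATIO = 16.0  # best-scoring move sampled ~16x more often than the weakest

        def __init__(self):
            self.avg = {}    # move -> running average of rollout results
            self.count = {}  # move -> number of rollouts that played this move

        def score(self, move):
            # Unseen moves get a neutral prior so they still get explored.
            return self.avg.get(move, 0.5)

        def select(self, legal_moves):
            scores = [self.score(m) for m in legal_moves]
            lo, hi = min(scores), max(scores)
            if hi == lo:
                return random.choice(legal_moves)
            # Map the weakest score to weight 1 and the strongest to weight RATIO,
            # then sample in proportion to those weights.
            weights = [1.0 + (self.RATIO - 1.0) * (s - lo) / (hi - lo) for s in scores]
            return random.choices(legal_moves, weights=weights, k=1)[0]

        def update(self, played_moves, result):
            # Apply the rollout result back to the policy, as described above.
            for m in played_moves:
                prev = self.avg.get(m, 0.0)
                n = self.count.get(m, 0) + 1
                self.count[m] = n
                self.avg[m] = prev + (result - prev) / n

Usage during search would be: call select() for each rollout step, remember the moves played, and call update() with the rollout's final score so later rollouts are steered toward the moves that scored well.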
|
|
|
Post by Andrew Rose on Jan 27, 2017 14:30:36 GMT -8
Interesting. I'm pretty sure Sancho doesn't do anything like that. (It has a Steve special for Sudoku-like games.) Is this only helpful for puzzles with graduated scores? (Oh, I guess that's all puzzles with your depth proxy.)
|
|
|
Post by steadyeddie on Jan 27, 2017 14:46:17 GMT -8
Yes, that's right. It helps most with the depth-proxy games; hidato is the best example. It's not as good with knights tour large, as there's not much learning to be done. It does help with max knights.
It's no good with your evil queens unguided.
|
|