|
Post by steadyeddie on Dec 18, 2016 6:23:57 GMT -8
So, first up: if I see a score of 100 from a rollout, I have special code to remember the path and play it out as a plan. Easy.
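A minimal sketch of the "remember the winning path" idea, not the actual SE code. The names `rollout`, `find_plan` and `replay`, and the toy puzzle at the bottom, are all made up for illustration, and the rollouts here are plain uniform-random playouts from the start state.

```python
import random

def rollout(state, legal_moves, apply_move, is_terminal, score, rng):
    """Play random moves to a terminal state; return (score, moves played)."""
    path = []
    while not is_terminal(state):
        move = rng.choice(legal_moves(state))
        path.append(move)
        state = apply_move(state, move)
    return score(state), path

def find_plan(start, legal_moves, apply_move, is_terminal, score,
              max_rollouts=1000, seed=0):
    """Run rollouts until one scores 100, then return its move list as a plan."""
    rng = random.Random(seed)
    for _ in range(max_rollouts):
        value, path = rollout(start, legal_moves, apply_move,
                              is_terminal, score, rng)
        if value == 100:
            return path  # solved: remember the path and stop searching
    return None

def replay(state, plan, apply_move):
    """Play a stored plan move by move and return the final state."""
    for move in plan:
        state = apply_move(state, move)
    return state

# Hypothetical toy puzzle: add 1 or 2 each turn, score 100 for landing exactly on 5.
def toy_moves(s):
    return [1, 2]

def toy_apply(s, m):
    return s + m

def toy_terminal(s):
    return s >= 5

def toy_score(s):
    return 100 if s == 5 else 0
```

In a real tree search the rollout starts from some node rather than the root, so the stored plan would presumably be the moves leading to that node followed by the rollout's own moves.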
Now, I've found that it's better to be much flatter with the rollouts than in two-player games. Hence for puzzles I use higher values of S, and I vary K less (see previously).
I also don't allow a node to be expanded until it has had at least one rollout. I've no idea why this is better; it's something I coded at the start, and when I switch it off SE gets worse at puzzles.
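A rough sketch of that expansion rule, assuming a conventional tree node with a visit count. `Node`, `may_expand`, `visit_leaf` and the `min_visits` parameter are invented names, and backpropagation and selection are left out.

```python
class Node:
    """Hypothetical search-tree node; field names are assumptions."""
    def __init__(self, state):
        self.state = state
        self.visits = 0      # number of rollouts started from this node
        self.total = 0.0     # sum of rollout scores
        self.children = []

def may_expand(node, min_visits=1):
    """A leaf may only be expanded once it has had at least min_visits rollouts."""
    return not node.children and node.visits >= min_visits

def visit_leaf(node, legal_moves, apply_move, rollout_score):
    """First arrival at a leaf does a rollout; a later arrival expands it."""
    if may_expand(node):
        node.children = [Node(apply_move(node.state, m))
                         for m in legal_moves(node.state)]
        return "expand"
    value = rollout_score(node.state)
    node.visits += 1
    node.total += value
    return "rollout"
```

With `min_visits=0` the node would be expanded on first arrival, which corresponds to switching the rule off.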
When all my rollouts say "0", I use the depth as a proxy for the rollout score (with some sort of hack to get the value between 0 and 100). It's dirty, but you could see it as a way of maximizing the amount of time I stay alive.
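One way the depth hack could look, purely as a guess at the shape of it. The linear scaling, the `max_depth` argument and the cap of 99 (so a deep dead end never looks like a solved state) are assumptions, not the real formula.

```python
def depth_proxy(depth, max_depth, cap=99.0):
    """Map rollout depth to a pseudo-score in [0, cap]; deeper means better.

    Assumed scaling: linear in depth, clamped at cap so the proxy
    always stays below a genuine score of 100.
    """
    if max_depth <= 0:
        return 0.0
    return min(cap, cap * depth / max_depth)
```

Under these assumptions, a rollout that survives 5 moves when the deepest seen so far is 10 gets 49.5, and anything at or beyond `max_depth` gets 99.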
|
|
|
Post by Andrew Rose on Jan 27, 2017 14:17:23 GMT -8
Hmm, so with your "depth as a proxy for score", my comments on post 10/16 obviously don't apply. What portion of puzzles is the depth proxy actually helpful for? Are there any where it's obviously harmful?
For something like Queens or Max Knights, it's clearly good. But for Rubik's cube it presumably doesn't help at all?
|
|
|
Post by steadyeddie on Jan 30, 2017 14:46:49 GMT -8
Yes, that's right, it's no use for Rubik's. It only engages when all the rollouts (I think 10000 in a row?) are 0. If there is any signal at all, that is probably better.
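A small sketch of the "only engages after a run of zeros" gate. The class name `ZeroStreak` is made up, the 10000 default just echoes the uncertain figure above, and resetting the count on any non-zero score is one reading of "in a row", not a confirmed detail.

```python
class ZeroStreak:
    """Tracks consecutive zero-score rollouts (hypothetical helper)."""
    def __init__(self, threshold=10000):
        self.threshold = threshold
        self.streak = 0

    def record(self, score):
        """Count zeros in a row; any non-zero score resets the count."""
        if score == 0:
            self.streak += 1
        else:
            self.streak = 0

    def use_depth_proxy(self):
        """True only once threshold rollouts in a row have scored 0."""
        return self.streak >= self.threshold
```

For example, with `threshold=3`, three zero rollouts switch the proxy on, and a single rollout scoring 25 switches it back off until three more zeros arrive.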
|
|