|
Post by steadyeddie on Dec 18, 2016 6:20:19 GMT -8
Adding the results of a rollout back up the selection path is pretty simple, there is no secret there.
What is more surprising, is that propagating the information to parent nodes which are not the selection path seems to be actively unhelpful. Why is that? Well, if you try and back propagate immediately then because you are expanding down better lines, you'll be pushing "good" rollouts up to poorer first moves. This results in the tree getting excited about these poor moves.
So instead, what about storing the rollout, and then returning it if a selection comes down to that node via a different parent? I couldn't make this work, but I don't know why. There's an article on the forum somewhere with Steve and I discussing this- he's probably got better ideas than me, I'd look that up.
I don't think expansion is that complex. On Steve's advice (there's a forum post somewhere), I fake a rollout result if I hit a terminal node that the parent likes. So if I can choose a 100 score, then I back propagate it. Or if all the children are now confident (terminal). But not otherwise.
What is slightly more interesting is back propagation of confidence. If I have a non-confident node, and I discover, following a back expansion notification from a child, that all my children are now confident, then I change my confidence to 100, and nothing all my parents for them to do the same. Note that is different to back propagation rollouts.
|
|
qfwfq
New Member
Posts: 29
|
Post by qfwfq on Jan 3, 2017 23:55:16 GMT -8
Thanks for the description of your player! Good observation about the drawback of backpropagating along multiple paths. You said that you use a 'cache' to find nodes with the same state that you already visited. When do you clear it or remove entries from it?
|
|
|
Post by steadyeddie on Jan 4, 2017 16:30:51 GMT -8
I assume that question is from the post on MCTS basics ggp.boards.net/thread/29829/steadyeddie-4-16-mcts-basics. So I have function to hash a position. [ In the underlying (Sancho!) propnet this is implemented as some maths on the bitset of the states (stored as longs). I'm pleased to say that this is the only time that I've actually managed to give anything back to Andrew/Steve as they were based on apache's OpenBitSet which it seems has a poor implementation of hash resulting in a depressing number of clashes for GGP positions, so I implemented something better I think. Another reason to run the profiler- it pointed at the HashTable I had added as a bottleneck, and from then it was fairly easy to find the bug. ] And now I just have a huge table of hash of position(% size)->node. Its like 100M entries large (so few clashes) . Now when I have a new position I look up to see if there is an entry with the hash. If so I look to see if that has exactly the same state as the new position in my hand. If so, great, I now have two parents pointing at the same child. If not, it's a clash, I just create a new node as if there was no clash. When I clean up between moves, if a node becomes recycled, I remove it from the cache, if it is there. It's critical that nodes that are on the recycle pile are removed, so I need to do the recycling on the control thread.
|
|