MCTS and DAG back propagation


Post by steadyeddie on Aug 10, 2015 14:00:42 GMT -8

In the beginning I had a MC_Tree_S and the world was simple and good.

OK, not that great, since I had a memory issue that I (incorrectly) blamed on my tree. So I cached the nodes based on states, and, well, nothing really happened other than I felt virtuous. Turns out what I had built was a DAG.
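For anyone wondering how a node cache turns a tree into a DAG, here is a minimal Python sketch. It assumes states are hashable, and the names `Node`, `NodeCache` and `add_child` are made up for illustration, not taken from any real player. Two different move orders that reach the same state get the same node object back, so that node ends up with two parents.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: compare nodes by identity, not by field values
class Node:
    state: tuple
    visits: int = 0
    total_score: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node
    parents: list = field(default_factory=list)   # every node that links here


class NodeCache:
    """Hypothetical state-keyed cache (a transposition table)."""

    def __init__(self):
        self._nodes = {}

    def __len__(self):
        return len(self._nodes)

    def get_or_create(self, state):
        # Reuse the existing node if this state has been seen before.
        node = self._nodes.get(state)
        if node is None:
            node = Node(state)
            self._nodes[state] = node
        return node


def add_child(cache, parent, move, child_state):
    # Look the child up by state, so transpositions share one node.
    child = cache.get_or_create(child_state)
    parent.children[move] = child
    if parent not in child.parents:
        child.parents.append(parent)
    return child
```

Once any node has more than one parent, "back propagate to the parent" stops being a single well-defined step, which is where the trouble starts.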

Since then it has bothered me that back propagation isn't simple. I can back propagate confidence as that is pretty easy. But what should I do with rollout back propagation?

What I do is back propagate only along the selection line. That annoys me enormously, as I feel I am throwing away information, and it's not like I have a great deal of information in the first place. So at some point I decided to back propagate up the other parent lines too. That led to pretty immediate disaster. The reason was that my player loved some variations, and that love crept out into other top-layer moves, which suddenly looked great, _if_ the oppo were to follow the variation into the "good" line. At the very least this wastes rollouts down a duff first move; at worst it makes a poor first move look attractive.
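The two update rules can be sketched in Python as below. This is only an illustration under simplifying assumptions: scores are from one fixed player's point of view (no sign flipping between plies), and `Node`, `link`, `backpropagate_path` and `backpropagate_all_parents` are hypothetical names. The second function shows the failure mode: a sibling move that was never selected still gets credited with the rollout.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison; nodes form a graph with cycles of references
class Node:
    name: str
    visits: int = 0
    total: float = 0.0
    parents: list = field(default_factory=list)

    @property
    def mean(self):
        return self.total / self.visits if self.visits else 0.0


def link(parent, child):
    # Record that child is reachable from parent (a DAG edge).
    child.parents.append(parent)


def backpropagate_path(path, score):
    # Update only the nodes that selection actually walked through.
    for node in path:
        node.visits += 1
        node.total += score


def backpropagate_all_parents(leaf, score):
    # Update every ancestor of the leaf exactly once, via any parent line.
    seen = set()
    stack = [leaf]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        node.visits += 1
        node.total += score
        stack.extend(node.parents)
```

With root -> A -> C and root -> B -> C, a winning rollout selected through A leaves B untouched under the first rule, but under the second rule B gets a visit and a perfect mean score without ever having been chosen.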

So, what do folks do to avoid this? It seems to me I need to abandon the whole MCTS 1.01 mechanism of selection and move choice. I tried something minimax-like to avoid what I term the "Patzer sees a check, patzer gives a check" problem as my (weak) player is obsessed with giving check in Speed chess. Sadly I had to back that out as it didn't work at all well (*). I think I need to revisit that, but before I do, am I missing some genius bit of design?

(*) The issue was I needed to define where MCTS stopped and when minimax started. And it turned out that it was very important whether it stopped on odd or even ply, which distorted the scores at the top level. [ Now I've typed that, why didn't I always stop on, say, even ply moves? Duh. ]


Post by Steve Draper on Aug 10, 2015 15:07:44 GMT -8

Aug 10, 2015 14:00:42 GMT -8 steadyeddie said:

Annoys the hell out of me too! I have experimented with variations, but found that just propagating along the selected line seems to work best. HOWEVER, we don't throw away the extra information entirely, since we hold an edge traversal count as well as a node visit count. When we back-propagate through a node where those counts indicate that some of its value comes from other paths, we modify the back-propagated score (as seen further up the 'tree') to be a weighted blend of the rollout and the node's own estimate, which partially came from the other branches.
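A rough Python sketch of that idea follows. The exact weighting is not spelled out above, so this assumes one plausible choice: the rollout score is weighted by the fraction of the node's visits that arrived along the selected edge, and the node's running mean gets the rest. `Node`, `Edge` and `backpropagate_blended` are hypothetical names, scores are from a single fixed player's viewpoint, and none of this is Sancho's actual code.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    name: str
    visits: int = 0
    total: float = 0.0

    @property
    def mean(self):
        return self.total / self.visits if self.visits else 0.0


@dataclass(eq=False)
class Edge:
    parent: Node
    child: Node
    traversals: int = 0  # how often selection went parent -> child


def backpropagate_blended(root, edges, rollout_score):
    """edges is the selected path, root-first; only that line is updated."""
    value = rollout_score
    for edge in reversed(edges):
        child = edge.child
        child.visits += 1
        child.total += value
        edge.traversals += 1
        if child.visits > edge.traversals:
            # Some of this node's visits came in through other parents.
            # Assumed weighting: trust the rollout in proportion to how much
            # of the node's experience came along this edge.
            w = edge.traversals / child.visits
            value = w * value + (1.0 - w) * child.mean
    root.visits += 1
    root.total += value
```

For example, if C has had three losing rollouts through B and then one winning rollout arrives through A, the score passed up to A is 0.25 * 1.0 + 0.75 * 0.25 = 0.4375 rather than a flat 1.0, so A inherits some of what is already known about C.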

I've spent quite a long time experimenting with variants, and have not found an entirely satisfactory mechanism. I'd really like to throw out the MCTS update mechanism entirely, and calculate the selection score of each child in a more Bayesian manner from the known information about each immediate child at selection time. However, haven't got anything compelling to show for it yet - it's something I tend to come back to and tinker with from time to time! (usually in a much simpler reference player rather than in Sancho itself)
