Post by steadyeddie on Aug 10, 2015 14:00:42 GMT -8
In the beginning I had a MC_Tree_S and the world was simple and good.
OK, not that great, since I had a memory issue that I (incorrectly) blamed on my tree. So I cached the nodes based on states, and, well, nothing really happened other than I felt virtuous. Turns out what I had built was a DAG.
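For anyone who hasn't hit this: caching nodes by state is a transposition table. Once two move orders reach the same position, that one node gets two parents, so the tree becomes a DAG. Here is a minimal Python sketch, assuming states can be used as dictionary keys. `Node`, `NodeCache` and `get_child` are hypothetical names, not taken from any engine:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """One node per distinct game state (hypothetical structure)."""
    state: str
    parents: list = field(default_factory=list)
    children: dict = field(default_factory=dict)  # move -> Node
    visits: int = 0
    wins: float = 0.0


class NodeCache:
    """Transposition table: reuse the existing node when a state recurs."""

    def __init__(self):
        self.table = {}  # state -> Node

    def get_child(self, parent, move, next_state):
        node = self.table.get(next_state)
        if node is None:
            node = Node(next_state)
            self.table[next_state] = node
        if move not in parent.children:
            parent.children[move] = node
            node.parents.append(parent)  # a second parent turns the tree into a DAG
        return node


# Toy "game": a state is the sorted set of moves played, so a-then-b and
# b-then-a transpose into the same position.
def key(moves):
    return ",".join(sorted(moves))


cache = NodeCache()
root = Node(key([]))
cache.table[root.state] = root
n_a = cache.get_child(root, "a", key(["a"]))
n_b = cache.get_child(root, "b", key(["b"]))
n_ab = cache.get_child(n_a, "b", key(["a", "b"]))
n_ba = cache.get_child(n_b, "a", key(["b", "a"]))
print(n_ab is n_ba, len(n_ab.parents))  # True 2
```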
Since then it has bothered me that back propagation isn't simple. I can back propagate confidence as that is pretty easy. But what should I do with rollout back propagation?
What I do is back propagate only along the selection line. But that annoys me enormously, as I feel I am throwing away information, and it's not as if I have a great deal of information in the first place. So at some point I decided to back propagate up the other parent lines too. That led to pretty immediate disaster. The reason was that my player loved some variations, and that crept out into other top-layer moves, which suddenly looked great _if_ the oppo were to follow the variation into the "good" line. At the very least this wastes rollouts down a duff first move; at worst it makes a poor first move look attractive.
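To make the two schemes concrete, here is a rough Python sketch of both on a DAG. It assumes each node stores `visits`, `wins` and a list of `parents`, and that a reward in [0, 1] is scored for the side that moved into the node. All names are hypothetical. In the toy graph, top-level moves `a` and `b` both transpose into `x`, but only `a` was actually selected:

```python
class Node:
    """Minimal DAG node (hypothetical): stats plus parent links."""

    def __init__(self, name):
        self.name = name
        self.parents = []
        self.visits = 0
        self.wins = 0.0


def link(parent, child):
    child.parents.append(parent)


def backprop_path(path, reward):
    """Update only the nodes on the selection path (root ... leaf).
    reward: result in [0, 1] for the side that moved into path[-1]."""
    for node in reversed(path):
        node.visits += 1
        node.wins += reward
        reward = 1.0 - reward  # switch perspective each ply


def backprop_all_parents(leaf, reward):
    """Update every ancestor reachable through any parent link, once each.
    This credits top-level moves that were never selected for this rollout."""
    frontier = [(leaf, reward)]
    seen = set()
    while frontier:
        node, r = frontier.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        node.visits += 1
        node.wins += r
        for p in node.parents:
            frontier.append((p, 1.0 - r))


def toy_dag():
    root, a, b, x = Node("root"), Node("a"), Node("b"), Node("x")
    link(root, a); link(root, b); link(a, x); link(b, x)
    return root, a, b, x


root, a, b, x = toy_dag()
backprop_path([root, a, x], 1.0)
print(b.visits)  # 0: b never saw this rollout

root2, a2, b2, x2 = toy_dag()
backprop_all_parents(x2, 1.0)
print(b2.visits)  # 1: b is credited with a rollout selected via a
```

With path-only updates `b` is untouched; with the all-parents walk `b` picks up a rollout that was only ever reached through `a`, which is the leak described above.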
So, what do folks do to avoid this? It seems to me I need to abandon the whole MCTS 1.01 mechanism of selection and move choice. I tried something minimax-like to avoid what I term the "Patzer sees a check, patzer gives a check" problem, as my (weak) player is obsessed with giving check in speed chess. Sadly I had to back that out, as it didn't work at all well (*). I think I need to revisit it, but before I do, am I missing some genius bit of design?
(*) The issue was that I needed to define where MCTS stopped and where minimax started. And it turned out to matter a great deal whether it stopped on an odd or even ply, which distorted the scores at the top level. [ Now I've typed that, why didn't I always stop on, say, even ply moves? Duh. ]
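The even-ply fix from the footnote is cheap to express. Here is a sketch in Python, assuming the MCTS leaf hands over to a plain negamax search and that the evaluation scores a position for the side to move. `handover_depth`, `negamax`, `children` and `evaluate` are all hypothetical:

```python
def handover_depth(leaf_ply, nominal_depth):
    """Pick a minimax depth so the search always ends on an even ply
    counted from the root (hypothetical helper)."""
    depth = nominal_depth
    if (leaf_ply + depth) % 2 != 0:
        depth += 1  # extend by one ply rather than stop on an odd ply
    return depth


def negamax(state, depth, children, evaluate):
    """Plain negamax: evaluate(state) scores a state for the side to move."""
    moves = children(state)
    if depth == 0 or not moves:
        return evaluate(state)
    return max(-negamax(s, depth - 1, children, evaluate) for s in moves)


# Toy tree and leaf scores, purely illustrative.
tree = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
scores = {"a1": 3, "a2": -1, "b1": 2}
value = negamax("r", 2, lambda s: tree.get(s, []), lambda s: scores.get(s, 0))
print(handover_depth(3, 2), handover_depth(4, 2), value)  # 3 2 2
```

Extending to the next even ply, rather than cutting back, means the minimax part never gets shallower than intended; the cost is one extra ply on odd-depth leaves.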