Post by steadyeddie on Dec 18, 2016 6:21:57 GMT -8
As Bobby Fischer almost said "GGP-player sees a check, GGP-player plays a check".
I've really struggled with this more than any other aspect of MCTS. I believe the problem stems from the way exploration and exploitation are intertwined in the algorithm.
The problem is most obvious in speed chess because we understand that. A good number of players like the opening c3, Qb3. And for a long while SE usually followed that up with something like Qxb7?? A look into the tree shows that the problem stems from the fact that it's taken quite a few rollouts forBxb7 to emerge as a move that the tree wants to play down. This means that for a long while Qxb7 looks like a good move because all the other responses allow the queen to go an eating binge starting with Qxa8.
Given an infinite number of iterations this problem goes away. Min max would drop out of MCTS eventually. But instead you usually have a number of iterations which is insufficient to do both exploration and exploitation. So what to do?
My first effort at min-max was what you might expect; you look down the branches and pick the best. Except clearly you need to stop somewhere, since a sample of 1 is too few. So say you stop at 1000. Now the issue I had was that I'd find that one branch would end on an oppo move (which tended to score lower), and compare it to a branch ending on my move. So eventually I settled for just looking down 2 plies. Really, that few.
I also have a couple of self-invented anti patzer-sees-a-check features that I use in conjunction with min-max.
First is a variable I call "S". Bascially that is the ratio between the value and the visits part in the MCTS selection. For me, high S means a flatter distribution of rollouts. Lower S means more concentrated. Part of my candidate move algorithm is to start high at the start of the move, and then to gradually decrease it through the move. This means I consider all moves, but target patzer-sees-a-check situations towards the end because there will be a stronger emphasis on the rebutal move as S decreases.
The other is "K". That is the amount I multiply rollouts by before I add them, so a value of 20 means I count the rollout 20 times. Again, as an anti-patzer-sees-a-check I start K small, and increase it during the move, emphasising the exploitation towards the end of the move.
For a while I thought I could do better than the standard MCTS formula at the root. I thought I could calculate the probability that a move with a smaller score might be better than my current best move. I'd say that I should always have the same number of selections between "first" and "second" moves. Then I'd say that I'd try and make the probability of one of the other moves being better than the first, no better than the second by adding extra rollouts if necessary. It's just some stats maths. But it didn't work particularly well.
In the end I gave in and added support for actual candidate moves at the root. I start with all moves in, then reduce the number to 2 in the last second of the move. Like S and K this focusses the player on a couple of lines, just in case the #1 move is a patzer-sees-a-check.
I've really struggled with this more than any other aspect of MCTS. I believe the problem stems from the way exploration and exploitation are intertwined in the algorithm.
The problem is most obvious in speed chess because we understand that. A good number of players like the opening c3, Qb3. And for a long while SE usually followed that up with something like Qxb7?? A look into the tree shows that the problem stems from the fact that it's taken quite a few rollouts forBxb7 to emerge as a move that the tree wants to play down. This means that for a long while Qxb7 looks like a good move because all the other responses allow the queen to go an eating binge starting with Qxa8.
Given an infinite number of iterations this problem goes away. Min max would drop out of MCTS eventually. But instead you usually have a number of iterations which is insufficient to do both exploration and exploitation. So what to do?
My first effort at min-max was what you might expect; you look down the branches and pick the best. Except clearly you need to stop somewhere, since a sample of 1 is too few. So say you stop at 1000. Now the issue I had was that I'd find that one branch would end on an oppo move (which tended to score lower), and compare it to a branch ending on my move. So eventually I settled for just looking down 2 plies. Really, that few.
I also have a couple of self-invented anti patzer-sees-a-check features that I use in conjunction with min-max.
First is a variable I call "S". Bascially that is the ratio between the value and the visits part in the MCTS selection. For me, high S means a flatter distribution of rollouts. Lower S means more concentrated. Part of my candidate move algorithm is to start high at the start of the move, and then to gradually decrease it through the move. This means I consider all moves, but target patzer-sees-a-check situations towards the end because there will be a stronger emphasis on the rebutal move as S decreases.
The other is "K". That is the amount I multiply rollouts by before I add them, so a value of 20 means I count the rollout 20 times. Again, as an anti-patzer-sees-a-check I start K small, and increase it during the move, emphasising the exploitation towards the end of the move.
For a while I thought I could do better than the standard MCTS formula at the root. I thought I could calculate the probability that a move with a smaller score might be better than my current best move. I'd say that I should always have the same number of selections between "first" and "second" moves. Then I'd say that I'd try and make the probability of one of the other moves being better than the first, no better than the second by adding extra rollouts if necessary. It's just some stats maths. But it didn't work particularly well.
In the end I gave in and added support for actual candidate moves at the root. I start with all moves in, then reduce the number to 2 in the last second of the move. Like S and K this focusses the player on a couple of lines, just in case the #1 move is a patzer-sees-a-check.