Post by Steve Draper on Dec 30, 2013 14:26:21 GMT -8
For the past week or so, I've been working on blending heuristic signals into MCTS evaluation (to speed up convergence). For my test game I chose Breakthrough (small) for two reasons:
- It was noticeable, watching play in this game, that my player (and indeed all those I have observed on Tiltyard) has a tendency to occasionally make give-away-a-piece-for-no-good-reason moves. Eliminating these would significantly strengthen play, so anything that achieved that should show up in test results reasonably quickly
- It's very easy to construct a (bespoke, so only immediately useful for testing purposes) heuristic for piece count in Breakthrough
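To make the second point concrete, here is a minimal sketch of what such a bespoke piece-count heuristic might look like. The board representation (a list of strings using 'W', 'B' and '.'), the function name, and the choice to scale the result into [0, 1] are all assumptions made for illustration; this is not the code from my player.

```python
EMPTY = "."  # hypothetical marker for an empty square


def piece_count_heuristic(board, player):
    """Return the share of all remaining pieces owned by `player`.

    `board` is assumed to be a list of equal-length strings whose
    characters are 'W', 'B' or EMPTY. The result lies in [0, 1], and
    0.5 means material is level.
    """
    own = sum(row.count(player) for row in board)
    opponent = sum(
        1 for row in board for cell in row if cell not in (player, EMPTY)
    )
    total = own + opponent
    if total == 0:
        return 0.5  # no pieces left at all: treat as level
    return own / total


# Usage: a small position in which Black has lost one piece.
example_board = [
    "BBB.",
    "....",
    ".W..",
    "WWW.",
]
white_share = piece_count_heuristic(example_board, "W")  # 4 of 7 pieces
```

Scaling into [0, 1] just makes the value easy to mix with a normalised playout score; any monotonic function of the material difference would serve the same purpose.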
I've tried a number of approaches to blending in the heuristic signals, which I won't go into here, but frustratingly I was not able to achieve a positive effect whatever I tried (I WAS able to remove the tendency to throw pieces away, but at the cost of not spotting soon enough some winning lines that are material-neutral or material-losing). For some time I wasn't sure if my heuristic blending mechanism was at fault, or its tuning parameters, or just that piece count isn't as useful a heuristic as I thought it would be for Breakthrough.
Finally it occurred to me that the less time the MCTS had to converge, the more beneficial any positively correlated heuristic signal should be. I then tried reducing the play time (per move) in my test matches from 15 seconds to 5 seconds, and lo - the heuristic blending started to show a big advantage.
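One way to see why thinking time matters is to look at a generic blending rule in which the heuristic's weight shrinks as a node collects visits. The sketch below is only an illustration of that idea, not the mechanism I actually used; the function name, the k / (k + visits) weighting and the default value of k are all assumptions.

```python
def blended_value(mc_mean, visits, heuristic, k=50.0):
    """Mix a node's Monte Carlo mean with a static heuristic value.

    The heuristic's weight is k / (k + visits): it is 1.0 for an
    unvisited node and falls towards 0.0 as the node is sampled more.
    Both `mc_mean` and `heuristic` are assumed to lie in [0, 1], and
    `k` is a hypothetical tuning constant.
    """
    weight = k / (k + visits)
    return weight * heuristic + (1.0 - weight) * mc_mean


# Usage: the same node statistics after a short and a long search.
short_search = blended_value(mc_mean=0.4, visits=10, heuristic=0.8)
long_search = blended_value(mc_mean=0.4, visits=10000, heuristic=0.8)
```

With few visits the blended value sits close to the heuristic; with many visits it is almost entirely the Monte Carlo mean. Under a rule like this, cutting the time per move leaves more nodes in the low-visit regime where the heuristic dominates, and a value of k tuned at one time limit will not necessarily suit another.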
Unfortunately I'm not entirely sure what the lesson to take away from this is! It's one or more of:
- Any parameter tuning has to be in light of the play time allocated, so any tuning you've done at play times other than those you actually get exposed to might be questionable (which is a variable I could do without!)
- MCTS is a remarkably effective technique that converges pretty fast to the point where it is better than anything coming from such simplistic heuristics, so given a reasonable thinking time the benefits of any heuristic will not be significant anyway (however, 'reasonable thinking time' will be HIGHLY dependent on the specific game and on the state machine being used)
- Breakthrough (small) is a bad choice, because on that board size any action is necessarily close to the end of the board, and so fairly easily 'in sight' of terminal states for MCTS
I'm going to try full size breakthrough with the original time limit (15 secs) next, where I suspect I'll see a positive signal from the heuristic without super-fast play (but I'm far from certain!)...