Paper on handling batches of rollouts in UCT

Paper on handling batches of rollouts in UCT Aug 11, 2015 15:39:09 GMT -8

Quote

Post by rxe on Aug 11, 2015 15:39:09 GMT -8

I seem to have a perpetual battle in trying to maintain a balanced tree - ever since the days of coursera and an 80 core machine where the more I tried to scale up, the worse it played. Been feeling a little more on top of this recently, but then again probably not. With transpositions and a solver, step wise changes to score are extremely detrimental to the tree - but I found adding a simple global tree version to galvanise which is incremented each back-propagation, and then checking each node (and edge) during node selection - and if triggered, will recursively recalculate the scores of each child. It is very slow, but has produced nice results and countered the effects of the step wise changes (and replaced the EMA code I had mentioned before). galvanise has a traversal count on each edge - which is different from the node visits (apparently similar to Sancho as per Steve in another thread). This finally has made sense to me (trust me it has hurt my head forever) that it is better to use traversals, for the simple reason that you have to look at it from the sole point of view of each node, and it is that's nodes exploitation/exploration balance that is important. Relevant to Alex's post here though, was just a few weeks ago, I found (purely by accident) that buffering the back propagation of results from worker threads so that there is only ever one back-propagation allowed and then start a new selection/worker run - and this has kept the traversals far more balanced (whereas before it would do all backprogations at once). This was a bit of 'aha' moment (maybe one year in, I finally understood UCB1), and I apologize for going on about what galvanise does and if this sounds like noise/nonsense - but honestly, I have spent 90% of time on this in the last 12 months. Thanks for the paper, hopefully I can wrap my head around this and finally gain some theoretical understanding.

Finally, for what its worth, it does seem that RAVE is far more immune to these imbalances - which I guess might be because it is not so reliant on the UCB component for exploration. With galvanise the UCB constant starts very low when RAVE is enabled (which is always enabled now), and there is much more chance of blindly charging down some arbitrary path. I do set the UCB constant much higher at the root node, which prevents a subset of these bad 'runaways'.

General Game Playing

Paper on handling batches of rollouts in UCT

Post by alandau on Aug 10, 2015 20:49:51 GMT -8

Post by rxe on Aug 11, 2015 15:39:09 GMT -8

Quick Reply