|
Post by alandau on Aug 10, 2015 20:49:51 GMT -8
Most of you have probably seen (or at least heard of) the GIGA-15 papers, but there was another group at IJCAI doing related work, the Computer Games Workshop: www.lamsade.dauphine.fr/~cazenave/cgw2015/cgw2015.htmlThere are a few relevant papers there. The one I'm most interested in is Marc Chee's paper (with Abdallah Saffidine and Michael Thielscher) on how to deal with "chunking" in UCT: www.lamsade.dauphine.fr/~cazenave/cgw2015/Chee.pdfI've mentioned before that Alloy has a propnet implementation that computes depth charges in batches, and Sam has done similar work using a GPU. We've both found that the technique appears to severely under-perform in actual game play compared to the raw numbers. I was intending to abandon the approach, but now I'm wondering if the techniques in the paper could salvage this approach. It apparently also applies to DAGs and transposition tables.
|
|
rxe
Junior Member
Posts: 61
|
Post by rxe on Aug 11, 2015 15:39:09 GMT -8
I seem to have a perpetual battle in trying to maintain a balanced tree - ever since the days of coursera and an 80 core machine where the more I tried to scale up, the worse it played. Been feeling a little more on top of this recently, but then again probably not. With transpositions and a solver, step wise changes to score are extremely detrimental to the tree - but I found adding a simple global tree version to galvanise which is incremented each back-propagation, and then checking each node (and edge) during node selection - and if triggered, will recursively recalculate the scores of each child. It is very slow, but has produced nice results and countered the effects of the step wise changes (and replaced the EMA code I had mentioned before). galvanise has a traversal count on each edge - which is different from the node visits (apparently similar to Sancho as per Steve in another thread). This finally has made sense to me (trust me it has hurt my head forever) that it is better to use traversals, for the simple reason that you have to look at it from the sole point of view of each node, and it is that's nodes exploitation/exploration balance that is important. Relevant to Alex's post here though, was just a few weeks ago, I found (purely by accident) that buffering the back propagation of results from worker threads so that there is only ever one back-propagation allowed and then start a new selection/worker run - and this has kept the traversals far more balanced (whereas before it would do all backprogations at once). This was a bit of 'aha' moment (maybe one year in, I finally understood UCB1), and I apologize for going on about what galvanise does and if this sounds like noise/nonsense - but honestly, I have spent 90% of time on this in the last 12 months. Thanks for the paper, hopefully I can wrap my head around this and finally gain some theoretical understanding.
Finally, for what its worth, it does seem that RAVE is far more immune to these imbalances - which I guess might be because it is not so reliant on the UCB component for exploration. With galvanise the UCB constant starts very low when RAVE is enabled (which is always enabled now), and there is much more chance of blindly charging down some arbitrary path. I do set the UCB constant much higher at the root node, which prevents a subset of these bad 'runaways'.
|
|