Post 2014 championships discussions and info

Steve Draper
Global Moderator

Posts: 143

Post by Steve Draper on Aug 20, 2014 12:06:20 GMT -8

During the chat sessions in qualifiers and the final event, several people indicated they would be publishing some information on interesting work in their players (that I can recall, other than my own promised spilling of the beans on Sudoku, Alex was going to talk some about his new propnets, and General's Bayesian techniques were going to be revealed).

I'm really looking forward to reading stuff from as many people as possible, and to sharing techniques to drive things forward generally. You guys still planning to publish...?

Last Edit: Aug 20, 2014 12:06:52 GMT -8 by Steve Draper

rxe
Junior Member

Posts: 61

Post by rxe on Aug 28, 2014 8:21:23 GMT -8

I can't say I was quoted as one of the people who were going to publish anything. But here you go anyways (it is admittedly not very interesting).

Basically, the version of galvanise that ran in the competition was mostly a basic implementation of MCTS that used 'killer moves' during rollouts (I think it was mostly the same idea as MAST from CadiaPlayer), with a fast bucket-based roulette wheel to choose from legal propositions so as not to slow down rollouts.

Here is a breakdown of stuff I worked on.

1. I spent about 20 times the effort I did on my initial write of galvanise for Coursera trying to get things to scale up, and failed miserably. Not because of synchronization issues. What I learned (after painfully discovering that 2 threads were performing better than 60, about 10 days before the competition) is that MCTS is really not as embarrassingly parallel as it may first seem. It isn't obvious, but the long story short is that if you have 60 threads all doing the 'select' phase at the same time, they are all going to select the same thing. They then update 60 times as if you had run it once, and the tree statistics get really messed up. In the end I updated the tree in each thread with a virtual count (ugh), and had to add explicit locking per thread in the slow path when the number of threads became larger than the number of legal moves available. The upshot is that it is great for complex games and bad for simple games like C4, where most threads are spinning on a lock most of the time.

2. Tried a bunch of ad hoc variations of my own creation to replace UCB in the selection phase, including some randomization (for 1), using variance and moving averages. None really seemed to do better across all games, so for the competition I reverted to plain UCB.

3. I toyed with using an EMA (exponential moving average) for back-propagation, which I actually think works reasonably well in some circumstances, but I disabled it for the competition.

4. One thing that I found really helped was the integration of forced wins into the selection and back-propagation phases. This was also corrupting the statistics (the fix was easy, but it was only made post-competition). Any node that had a 'finalised' move scoring 100 for the player (where the other players' moves were noops) could then always choose this move, so that node can itself be finalised. This can very quickly propagate back to the root a result that might take forever (or never arrive) with normal convergence.
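To make point 4 concrete, here is a minimal Java sketch of pushing a proven result up the tree. It is not galvanise's code: it assumes a two-player, turn-taking game with scores of 0..100 per role, and the names `ProofNode`, `finalScores`, `finalise` and `tryFinalise` are invented for illustration. It also leaves out the correction of the visit statistics mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical search node for a two-player, turn-taking game (scores 0..100 per role). */
class ProofNode {
    final int mover;                 // index of the role that picks the move at this node
    final ProofNode parent;
    final List<ProofNode> children = new ArrayList<>();
    int[] finalScores;               // null until the value of this node is proven

    ProofNode(ProofNode parent, int mover) {
        this.parent = parent;
        this.mover = mover;
        if (parent != null) parent.children.add(this);
    }

    boolean isFinalised() { return finalScores != null; }

    /** Mark this node as proven and push the proof towards the root as far as it goes. */
    void finalise(int[] scores) {
        finalScores = scores.clone();
        if (parent != null) parent.tryFinalise();
    }

    /**
     * Finalise this node if the mover has a proven 100 available, or if every child
     * is already proven. The second rule assumes children holds every legal move.
     */
    void tryFinalise() {
        if (isFinalised() || children.isEmpty()) return;
        ProofNode best = null;
        boolean allProven = true;
        for (ProofNode child : children) {
            if (!child.isFinalised()) { allProven = false; continue; }
            if (best == null || child.finalScores[mover] > best.finalScores[mover]) best = child;
        }
        boolean forcedWin = best != null && best.finalScores[mover] == 100;
        if (forcedWin || allProven) finalise(best.finalScores);
    }
}
```

Calling `finalise` on a terminal node is enough to start the chain: each ancestor is re-checked in turn, and the walk stops at the first one that cannot yet be proven.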

I am currently looking at evolving neural networks for node evaluation (so I am firmly in the learning camp: learning between games is a good thing [as long as learning is only done during play/meta time], although it is still ambiguous what the actual rules are here), in order to try something different from MCTS. It is a long, long way off, but since my first 3-4 months of GGP were a whirlwind with two competitions, I feel I have a long time to play.

Steve Draper
Global Moderator

Posts: 143

Post by Steve Draper on Aug 28, 2014 10:52:39 GMT -8

Richard,

Interesting stuff - I'd love to hear more about your approach to identifying killer moves for the playouts (MAST is just a global weighting learnt for all moves, independent of state they are encountered in I believe - was yours the same?).

On your numbered points:

1) Yes indeed. In Sancho we have a similar issue, because we perform rollouts asynchronously from selection (selection passes queue rollout requests, which don't complete and update the tree until later). The result is the same issue you were encountering, that multiple selections occur before the results of the previous are known. We handle this by replacing the numVisits in each node with two quantities - numVisits (used by selection and updated also by selection, so that it DOES change synchronously as it is used to calculate UCT exploration values on the selected path), and numUpdates, which is an independent count used and updated on the back-propagation pass. In the simplistic scheme numUpdates simply lags numVisits slightly in effect. More recently (post 2014 competition) I've done some work to weight different playouts differently, and as a consequence numUpdates has now become a double rather than an int, and may diverge from numVisits somewhat. I'll probably write a blog post on this at some point, but it's not a radical gain (it improves a few games significantly, but overall it's not a dramatic win, and probably wasn't worth the 2-3 weeks it cost me to get it all working right)

(2)/(3) I have plans to experiment with moving away from the normal back-propagation scheme of MCTS in an attempt to remove distortions that occur in a few cases (below), which I'm hoping to do by considering a node to have a score and a confidence. Selection then occurs based on exploitation+exploration from the children's score+confidence in the selection phase, foregoing back-propagation as a means to calculate, and instead always calculating from the values in the immediate children (and caching on calculation, so back-prop would just propagate some dirty flags). The motivation for this (if I can make it work, which remains to be seen) is that MCTS back-prop copes badly in two important cases:
i) On transpositions, where a just-expanded node might have some children (transposed into) that are already well explored
ii) On completion - i.e. when a child's true value becomes known with complete confidence. If you propagate terminality up the tree where possible (wins that can always be chosen make the parent a win too, all children being losses makes the parent a loss), then node scores can suddenly make a step change as they become known to be absolute wins/losses. MCTS back-prop doesn't handle step changes like this well, and the parents will take some time to converge in light of the new information

4) Yes. Sancho has done this since its origins in Quixote during the Coursera course. It does indeed help a lot
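The numVisits/numUpdates split from point 1 can be sketched roughly as follows. This is not Sancho's code: the field names numVisits and numUpdates come from the description above, but `SplitCountNode`, `scoreSum`, the `+ 1` inside the logarithm and treating unvisited children as infinitely attractive are assumptions made for illustration. The sketch is single-threaded and assumes every node passed to `select` has at least one child.

```java
/** Hypothetical node with separate selection-side and update-side counters. */
class SplitCountNode {
    int numVisits;      // bumped when this node is selected; read by the exploration term
    int numUpdates;     // bumped only when a playout result is back-propagated
    double scoreSum;    // sum of the playout scores received so far
    SplitCountNode[] children = new SplitCountNode[0];

    double averageScore() {
        return numUpdates == 0 ? 0.0 : scoreSum / numUpdates;
    }

    /**
     * UCT choice among the children. The exploration term uses numVisits, so playouts
     * that are queued but not yet finished already make a child look less unexplored.
     * This node's own numVisits is expected to have been bumped by whoever selected it.
     */
    SplitCountNode select(double explorationConstant) {
        SplitCountNode best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (SplitCountNode child : children) {
            double value = child.numVisits == 0
                    ? Double.POSITIVE_INFINITY
                    : child.averageScore()
                      + explorationConstant * Math.sqrt(Math.log(numVisits + 1) / child.numVisits);
            if (value > bestValue) {
                bestValue = value;
                best = child;
            }
        }
        best.numVisits++;   // synchronous: the very next selection sees this visit
        return best;
    }

    /** Called later, when the asynchronous playout for this node completes. */
    void update(double score) {
        numUpdates++;
        scoreSum += score;
    }
}
```

With two children and no results back yet, four consecutive selections split two and two instead of landing on the same child four times, which is the effect being described.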

I'll be very interested in how you get on with the neural networks. Since the end of the first run of the Coursera course I've been wanting to apply genetic algorithms to heuristic pattern selection (which will require offline learning). I probably still won't get round to looking into that for at least another 6 months or so though (too much lower-hanging fruit still first). I definitely think that learning techniques generally are the way forward.

rxe
Junior Member

Posts: 61

Post by rxe on Aug 28, 2014 15:33:48 GMT -8

Thanks Steve for the feedback! Let me briefly reply to some of your points/queries.

I think it is pretty much the same as MAST. I take the global weighting during the first n seconds of meta time (from both tree playouts and rollouts). As well as the average score, I also calculate the variance, and any moves that score 100 with no variance are flagged as winnable. Those get treated specially during rollouts: such a move is played with 99% probability, before doing the roulette wheel choosing. This is a night-and-day difference in games like breakthrough, as I have no other greedy mechanism.
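A rough Java sketch of that bookkeeping, for illustration only. It is not galvanise's code: the class names are invented, moves are keyed by plain strings, the variance is tracked with Welford's running update, unseen moves get an arbitrary prior weight of 50, and a simple linear-scan roulette wheel stands in for the faster bucket-based one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Running mean and variance of the scores seen for one move (Welford's method). */
class MoveStats {
    long count;
    double mean;
    double m2;   // sum of squared deviations from the running mean

    void add(double score) {
        count++;
        double delta = score - mean;
        mean += delta / count;
        m2 += delta * (score - mean);
    }

    double variance() { return count == 0 ? 0.0 : m2 / count; }

    /** Every recorded score was exactly 100, so the move has never failed to win. */
    boolean isWinnable() { return count > 0 && mean == 100.0 && m2 == 0.0; }
}

/** Hypothetical rollout policy: prefer flagged moves, otherwise spin a roulette wheel. */
class RolloutPolicy {
    static final double WIN_MOVE_PROBABILITY = 0.99;
    static final double UNSEEN_PRIOR = 50.0;   // assumed weight for moves with no statistics

    final Map<String, MoveStats> stats = new HashMap<>();
    final Random random;

    RolloutPolicy(Random random) { this.random = random; }

    void record(String move, double score) {
        stats.computeIfAbsent(move, m -> new MoveStats()).add(score);
    }

    /** Picks one of the legal moves; the list is assumed to be non-empty. */
    String choose(List<String> legalMoves) {
        for (String move : legalMoves) {
            MoveStats s = stats.get(move);
            if (s != null && s.isWinnable() && random.nextDouble() < WIN_MOVE_PROBABILITY) {
                return move;
            }
        }
        // Roulette wheel: the chance of a move is proportional to 1 + its average score.
        double[] weights = new double[legalMoves.size()];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            MoveStats s = stats.get(legalMoves.get(i));
            weights[i] = 1.0 + (s == null ? UNSEEN_PRIOR : s.mean);
            total += weights[i];
        }
        double ticket = random.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            ticket -= weights[i];
            if (ticket < 0) return legalMoves.get(i);
        }
        return legalMoves.get(legalMoves.size() - 1);   // guard against rounding error
    }
}
```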

Glad to hear I am not alone with the statistics being messed up.

I think we solved it similarly (I called it virtual visits, so it is indeed an extra field that is independent of normal visits). It was annoying because it is the only point where the worker threads write to memory, and it broke the whole model where all communication between the workers and the main thread was via a channel. I thought about sending the current value of the visits in the channel and letting the back-propagation code handle it intelligently, but for some reason I can't recall I decided against it. Note that all MCTS phases except propagation are in the worker threads, and everything uses lock-free data structures. Although I now think that the lock-free data structures are overkill, and using a 64-bit integer to encode score/visits would be sufficient (provided Java writes this atomically to memory, and doesn't do two 32-bit writes).
Look forward to reading the blog post on using a double.
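As a sketch of the 64-bit encoding idea, the following packs the visit count into the high 32 bits and the score sum into the low 32 bits of a `java.util.concurrent.atomic.AtomicLong`, which guarantees atomic reads and writes regardless of how the JVM treats plain longs. The class `PackedStats` and the 32/32 split are assumptions for illustration, not anything from galvanise. With scores of at most 100, the low field overflows after roughly 42 million visits, so a real player would need a different split or a check.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-node statistics packed into a single atomically updated long. */
class PackedStats {
    private static final int VISIT_SHIFT = 32;
    private static final long SCORE_MASK = 0xFFFFFFFFL;

    // High 32 bits: number of visits. Low 32 bits: sum of scores.
    private final AtomicLong packed = new AtomicLong();

    /** Adds one visit and its score (0..100) in a single atomic step. */
    void record(int score) {
        packed.addAndGet((1L << VISIT_SHIFT) + score);
    }

    /** A consistent view of both fields, taken with one atomic read. */
    long snapshot() { return packed.get(); }

    static long visits(long snapshot) { return snapshot >>> VISIT_SHIFT; }

    static long scoreSum(long snapshot) { return snapshot & SCORE_MASK; }

    static double average(long snapshot) {
        long v = visits(snapshot);
        return v == 0 ? 0.0 : (double) scoreSum(snapshot) / v;
    }
}
```

Because both fields change in one `addAndGet`, a reader can never see a visit counted without its score, which is what makes separate locking unnecessary for this pair of values.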

As I suggested in section 4, the statistics were being messed up via the finalisation of the nodes. The strongly winnable - forced win - thing is just a special case in the finalisation of the nodes. Post competition I implemented, funny enough, what you said in ii), the messing up is exactly the step change you were talking about. I need to do something with i) also, although conceptually thought of as merging a node into the tree (which is a DAG) - is that what you mean by transpositions?

Last Edit: Aug 28, 2014 15:42:27 GMT -8 by rxe

Steve Draper
Global Moderator

Posts: 143

Post by Steve Draper on Aug 29, 2014 4:28:12 GMT -8

Aug 28, 2014 15:33:48 GMT -8 rxe said:

As I suggested in section 4, the statistics were being messed up via the finalisation of the nodes. The strongly winnable - forced win - thing is just a special case in the finalisation of the nodes. Post competition I implemented, funny enough, what you said in ii), the messing up is exactly the step change you were talking about. I need to do something with i) also, although conceptually thought of as merging a node into the tree (which is a DAG) - is that what you mean by transpositions?

Yup you have understood what I meant perfectly I think. The transpositions just create links to nodes holding the same state that already exist, thus turning the tree into a DAG (acyclic if the game is well-formed or else there would be paths of unbounded length, which is not supposed to be allowed in a well formed game)

Andrew Rose
Global Moderator

Posts: 100

Post by Andrew Rose on Aug 29, 2014 13:11:03 GMT -8

Richard,

Just to answer your question about Java and 64-bit writes: the relevant bit of the Java spec (http://docs.oracle.com/javase/specs/jls/se5.0/html/memory.html#17.7) allows them to be performed as 2x 32-bit writes. This affects both longs and doubles (but not object references, which are always guaranteed to be updated atomically, irrespective of whether they're implemented as 32- or 64-bit pointers under the covers). However, the 64-bit HotSpot JVM certainly writes them atomically and I would have thought that most other 64-bit JVMs do that too.

Out of interest, are you using HotSpot or have you got your hands on a high-performance JVM?

Andrew

rxe
Junior Member

Posts: 61

Post by rxe on Aug 29, 2014 15:06:23 GMT -8

Thanks Andrew - that is great to know. Haven't tried anything other than 64-bit HotSpot. I tried both 1.7 and 1.8 and didn't notice any real runtime differences between them.


Steve Draper
Global Moderator

Posts: 143

Post by Steve Draper on Aug 31, 2014 6:26:09 GMT -8

Aug 28, 2014 8:21:23 GMT -8 rxe said:

1. I spent about 20 times the effort I did on my initial write of galvanise for Coursera trying to get things to scale up, and failed miserably. Not because of synchronization issues. What I learned (after painfully discovering that 2 threads were performing better than 60, about 10 days before the competition) is that MCTS is really not as embarrassingly parallel as it may first seem. It isn't obvious, but the long story short is that if you have 60 threads all doing the 'select' phase at the same time, they are all going to select the same thing. They then update 60 times as if you had run it once, and the tree statistics get really messed up. In the end I updated the tree in each thread with a virtual count (ugh), and had to add explicit locking per thread in the slow path when the number of threads became larger than the number of legal moves available. The upshot is that it is great for complex games and bad for simple games like C4, where most threads are spinning on a lock most of the time.

2. Tried a bunch of ad hoc variations of my own creation to replace UCB in the selection phase, including some randomization (for 1), using variance and moving averages. None really seemed to do better across all games, so for the competition I reverted to plain UCB.

Richard, thinking about this a bit more, I am surprised that the randomization did not work. Did you try just treating the exploration part of the selection formula stochastically? That is to say instead of:

UCB = averageScore + alpha*sqrt(ln(numParentVisits)/numChildVisits);

use:

UCB = averageScore + alpha*Rand.nextDouble()*sqrt(ln(numParentVisits)/numChildVisits);

(maybe adjusting the value of alpha up also). That should remove sensitivity to selecting multiple times before you are able to update the (virtual) visit count, and removes the need to perform updates on selection...
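A small Java sketch of the two variants, for comparison. It uses the usual UCB1 exploration term, sqrt(ln(parentVisits)/childVisits), and assumes every child has been visited at least once. The class and method names are invented for illustration, and passing a null `Random` to mean "plain UCB" is just a convenience of the sketch.

```java
import java.util.Random;

/** Hypothetical helper comparing plain UCB with a randomised exploration term. */
class StochasticUcb {

    /** Plain UCB value; childVisits and parentVisits must both be at least 1. */
    static double ucb(double averageScore, int childVisits, int parentVisits, double alpha) {
        return averageScore + alpha * Math.sqrt(Math.log(parentVisits) / childVisits);
    }

    /** Same, with the exploration term scaled by a uniform random factor in [0, 1). */
    static double randomisedUcb(double averageScore, int childVisits, int parentVisits,
                                double alpha, Random rand) {
        return averageScore
                + alpha * rand.nextDouble() * Math.sqrt(Math.log(parentVisits) / childVisits);
    }

    /** Index of the child with the highest value; a null rand selects plain UCB. */
    static int selectChild(double[] averageScores, int[] childVisits, int parentVisits,
                           double alpha, Random rand) {
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < averageScores.length; i++) {
            double value = rand == null
                    ? ucb(averageScores[i], childVisits[i], parentVisits, alpha)
                    : randomisedUcb(averageScores[i], childVisits[i], parentVisits, alpha, rand);
            if (value > bestValue) {
                bestValue = value;
                best = i;
            }
        }
        return best;
    }
}
```

With identical statistics on two children, the plain version returns the same index on every call until an update arrives, while the randomised version spreads repeated selections across both, which is the property that would help when many threads select before any of them has updated.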

rxe
Junior Member

Posts: 61

Post by rxe on Sept 1, 2014 3:10:25 GMT -8

Aug 31, 2014 6:26:09 GMT -8 Steve Draper said:

UCB = averageScore + alpha*sqrt(ln(numParentVisits)/numChildVisits);

use:

UCB = averageScore + alpha*Rand.nextDouble()*sqrt(ln(numParentVisits)/numChildVisits);

(maybe adjusting the value of alpha up also). That should remove sensitivity to selecting multiple times before you are able to update the (virtual) visit count, and removes the need to perform updates on selection...

I tried running this on my workstation, since I don't currently have access to the big machine. Running with 10 cores on C4, and using the random in the UCB formula, the logs after meta time are:

Galvanise 04:13:59.640 [L3]: Done iterations 374502, depth 15.44, nodes 574359, remapped 99674, dupes 25, update 4050119
Galvanise 04:14:04.663 [L3]: Done iterations 743479, depth 11.77, nodes 826123, remapped 111230, dupes 15, update 3854056
Galvanise 04:14:09.582 [L3]: Done iterations 1142322, depth 13.34, nodes 1102407, remapped 116987, dupes 9, update 3529666
Galvanise 04:14:12.643 [L3]: Done iterations 1306466, depth 14.54, nodes 1216540, remapped 47574, dupes 5, update 2489018

And this is without the random in the UCB formula. We see similar results, but the dupes are much higher (a dupe meaning that the node already exists in the tree).

Galvanise 04:23:08.778 [L3]: Done iterations 391082, depth 12.12, nodes 460432, remapped 100944, dupes 20719, update 3633172
Galvanise 04:23:13.854 [L3]: Done iterations 781077, depth 13.13, nodes 721023, remapped 99552, dupes 23648, update 3853796
Galvanise 04:23:18.880 [L3]: Done iterations 1184199, depth 14.66, nodes 992194, remapped 98998, dupes 26083, update 4087046
Galvanise 04:23:21.859 [L3]: Done iterations 1375963, depth 15.06, nodes 1123671, remapped 41253, dupes 14124, update 2562534

With my current solution, we see similar results to adding in the random, but slightly more iterations:

Galvanise 04:31:18.519 [L3]: Done iterations 459026, depth 13.82, nodes 605013, remapped 120717, dupes 404, update 3533040
Galvanise 04:31:23.573 [L3]: Done iterations 800785, depth 14.81, nodes 852022, remapped 86817, dupes 6, update 3851208
Galvanise 04:31:28.587 [L3]: Done iterations 1226236, depth 14.05, nodes 1162166, remapped 106543, dupes 10, update 4309097
Galvanise 04:31:31.588 [L3]: Done iterations 1464891, depth 14.83, nodes 1333543, remapped 62265, dupes 5, update 2626010

If I recall correctly, with my current solution the results of running 60 cores on C4 were similar to the 10 cores above (i.e. it doesn't scale), so it would be interesting to run with your solution. Honestly, I am not sure the logs here mean that much - but dupes were a pretty good indicator that things were going bad.
I'll need to play around with this some more and run some tests. Thanks Steve for the suggestion, I'll report more later.

(For reference: iterations is the number of full playouts since the last report; depth is the average depth of tree playouts; nodes is the number of nodes in the tree; remapped is how many merges happened that weren't dupes; update is the number of microseconds spent actually doing back-propagation in the main thread. Also, I run this machine in a different timezone, and didn't do this at 4am!)
