UCB = averageScore + alpha*sqrt(ln(numParentVisits)/numChildVisits);
use:
UCB = averageScore + alpha*Rand.nextDouble()*sqrt(ln(numParentVisits)/numChildVisits);
(possibly also adjusting the value of alpha upward, since the random factor averages 0.5 and so halves the expected exploration term). That should remove the sensitivity to selecting multiple times before you are able to update the (virtual) visit count, and it removes the need to perform updates on selection...
I tried running this on my workstation, since I don't currently have access to the big machine. Running with 10 cores on C4 and using the random factor in the UCB formula, the logs after meta time are:
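As a rough illustration of the suggestion above, here is a minimal Java sketch of child selection with the randomized exploration term. It is not Galvanise code: the class name `RandomizedUcb`, the method names, the array-based node layout, and the "pick an unvisited child first" rule are all assumptions made for the example. It uses the standard UCB1 exploration term, sqrt(ln(parent visits) / child visits).

```java
import java.util.Random;

/** Hypothetical sketch; names and the unvisited-child policy are assumptions. */
class RandomizedUcb {

    /** Average score plus an exploration bonus scaled by a uniform draw in [0, 1). */
    static double score(double averageScore, int numChildVisits, int numParentVisits,
                        double alpha, Random rand) {
        double exploration = Math.sqrt(Math.log(numParentVisits) / numChildVisits);
        return averageScore + alpha * rand.nextDouble() * exploration;
    }

    /** Returns the index of the child with the highest randomized score. */
    static int selectChild(double[] averageScores, int[] childVisits, int numParentVisits,
                           double alpha, Random rand) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < averageScores.length; i++) {
            if (childVisits[i] == 0) {
                return i; // assumption: unvisited children are tried first
            }
            double s = score(averageScores[i], childVisits[i], numParentVisits, alpha, rand);
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }
}
```

Because each call draws a fresh random factor per child, two threads that read the same (not yet updated) visit counts will not necessarily pick the same child. In a multi-threaded search each worker would probably want its own generator, for example `java.util.concurrent.ThreadLocalRandom.current()`, rather than a shared `Random`.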
Galvanise 04:13:59.640 [L3]: Done iterations 374502, depth 15.44, nodes 574359, remapped 99674, dupes 25, update 4050119
Galvanise 04:14:04.663 [L3]: Done iterations 743479, depth 11.77, nodes 826123, remapped 111230, dupes 15, update 3854056
Galvanise 04:14:09.582 [L3]: Done iterations 1142322, depth 13.34, nodes 1102407, remapped 116987, dupes 9, update 3529666
Galvanise 04:14:12.643 [L3]: Done iterations 1306466, depth 14.54, nodes 1216540, remapped 47574, dupes 5, update 2489018
And this is without the random factor in the UCB formula. We see similar results, but the dupes are much higher (a dupe meaning the node already exists in the tree):
Galvanise 04:23:08.778 [L3]: Done iterations 391082, depth 12.12, nodes 460432, remapped 100944, dupes 20719, update 3633172
Galvanise 04:23:13.854 [L3]: Done iterations 781077, depth 13.13, nodes 721023, remapped 99552, dupes 23648, update 3853796
Galvanise 04:23:18.880 [L3]: Done iterations 1184199, depth 14.66, nodes 992194, remapped 98998, dupes 26083, update 4087046
Galvanise 04:23:21.859 [L3]: Done iterations 1375963, depth 15.06, nodes 1123671, remapped 41253, dupes 14124, update 2562534
With my current solution, we see similar results to adding in the random factor, but slightly more iterations:
Galvanise 04:31:18.519 [L3]: Done iterations 459026, depth 13.82, nodes 605013, remapped 120717, dupes 404, update 3533040
Galvanise 04:31:23.573 [L3]: Done iterations 800785, depth 14.81, nodes 852022, remapped 86817, dupes 6, update 3851208
Galvanise 04:31:28.587 [L3]: Done iterations 1226236, depth 14.05, nodes 1162166, remapped 106543, dupes 10, update 4309097
Galvanise 04:31:31.588 [L3]: Done iterations 1464891, depth 14.83, nodes 1333543, remapped 62265, dupes 5, update 2626010
If I recall correctly, with my current solution the results from running 60 cores on C4 were similar to the 10 cores above (i.e. it doesn't scale), so it would be interesting to run with your solution. Honestly, I am not sure the logs here mean that much, but dupes were a pretty good indicator that things were going badly.
I'll need to play around with this some more and run some tests. Thanks, Steve, for the suggestion; I'll report more later.
(For reference: iterations is the number of full playouts since last reported; depth is the average depth of tree playouts; nodes is the number of nodes in the tree; remapped is how many merges happened that weren't dupes; update is the number of microseconds spent actually doing back propagation in the main thread. Also, I run this machine in a different timezone, and didn't do this at 4am!)
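For comparing runs like the ones above, a small Java sketch that pulls the counters out of a log line may help. The class `LogStats` and the "dupe share" metric (dupes divided by dupes plus remapped) are assumptions for illustration only; the regular expression simply follows the field order shown in the pasted log lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the "Done iterations ..." log lines. */
class LogStats {
    private static final Pattern LINE = Pattern.compile(
        "Done iterations (\\d+), depth ([0-9.]+), nodes (\\d+), "
        + "remapped (\\d+), dupes (\\d+), update (\\d+)");

    final long iterations;
    final double depth;
    final long nodes;
    final long remapped;
    final long dupes;
    final long updateMicros;

    private LogStats(Matcher m) {
        iterations = Long.parseLong(m.group(1));
        depth = Double.parseDouble(m.group(2));
        nodes = Long.parseLong(m.group(3));
        remapped = Long.parseLong(m.group(4));
        dupes = Long.parseLong(m.group(5));
        updateMicros = Long.parseLong(m.group(6));
    }

    /** Parses one log line; returns null if the line has a different shape. */
    static LogStats parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? new LogStats(m) : null;
    }

    /** Assumed metric: fraction of merge events that were dupes. */
    double dupeShare() {
        long merges = dupes + remapped;
        return merges == 0 ? 0.0 : (double) dupes / merges;
    }
}
```

Calling `LogStats.parse(line).dupeShare()` on a line from each of the three runs gives one number per reporting interval, which is easier to compare across configurations than the raw dupes counts.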