qfwfq
New Member
Posts: 29
Post by qfwfq on Apr 12, 2016 11:26:30 GMT -8
I have a basic NN that tries to predict the MC scores. I plan to experiment with improving the network next, then implement TD learning. (Saying this just to confirm my interest in the competition.)
Post by Andrew Rose on Apr 12, 2016 13:53:37 GMT -8
I'm struggling to make much progress, but definitely still interested and actively working on it.
I have a plain MLP network which I can train on TTT - but I'm surprised by how long it takes to learn, especially considering that it's being given the full set of correct data (pre-calculated).
I've discovered that prototyping is much quicker in Keras/Tensorflow than with Neuroph (which is much more limited in terms of its feature set). Also, the Tensorflow backend is an order of magnitude quicker in training, and that's without engaging a GPU (which I can't do because (a) my Tensorflow setup is in a VM which doesn't give access to the GPU and (b) I suspect my GPU is too old).
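For reference, a minimal sketch of the kind of MLP value network I mean (the 9-cell +1/0/-1 encoding, layer sizes and optimizer here are illustrative assumptions, not my exact setup):

```python
# Minimal Keras MLP value network for tic-tac-toe (illustrative sketch).
# Assumes each board is flattened to 9 cells encoded +1 (X), -1 (O), 0 (empty),
# and the target is the pre-calculated position value scaled to [0, 1].
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_dim=9),  # single hidden layer
    Dense(1, activation='sigmoid'),             # predicted position value
])
model.compile(optimizer='sgd', loss='mse')

# states: (N, 9) float array; values: (N, 1) array of targets in [0, 1]
# model.fit(states, values, epochs=50, batch_size=32)  # older Keras spells this nb_epoch=
```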
I'm probably near the point of switching back to Breakthrough and then trying convolutional networks.
Is anybody willing to share their corpus of MCTS scores for various Breakthrough positions? (Don't worry if not, I can always generate some of my own.)
rxe
Junior Member
Posts: 61
Post by rxe on Apr 25, 2016 14:26:27 GMT -8
Andrew: I can share 187k board states, if you are still interested. The data was generated with approximately 10k MCTS iterations. The trained network plays reasonably with a 3-ply or 5-ply search: www.ggp.org/view/tiltyard/matches/84fe6bbfa9d88984470fa7a07e1e50d4c57eea30
I am still stuck in the age of ftp and don't use any new-fangled cloud services... but I guess I could upload it to a github repo (it is about 4 MB compressed).
Post by Andrew Rose on Apr 25, 2016 23:52:29 GMT -8
Don't worry - I've generated about 200K of my own states. They aren't very high quality, but now that I've got the infrastructure, it'll be trivial to get a better database.
Post by Andrew Rose on Apr 25, 2016 23:57:12 GMT -8
And that game looks pretty reasonable. At least up until move 37 where Galvanise obviously decided that actually winning the game was a secondary objective to toying with the opponent!
rxe
Junior Member
Posts: 61
Post by rxe on Apr 28, 2016 17:01:21 GMT -8
And that game looks pretty reasonable. At least up until move 37 where Galvanise obviously decided that actually winning the game was a secondary objective to toying with the opponent!
haha - toying with the opponent has been the goal all along! I am currently finding it a bit over my head to understand DQN in a board-game context, or even DQN at all. I need someone to explain it to me like a five year old, with crayons.
My current approach is to retrain the network with the same data samples, but obtaining the scores by running the network with some minimax (well, iterative deepening, but similar to minimax). But the results are less than promising thus far.
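Roughly speaking, the re-scoring step looks like this (just a sketch; the game-interface callables and the search depth are placeholders, not my actual code):

```python
# Sketch: re-score training positions with a shallow negamax that uses the
# current network as the leaf evaluator, then retrain on the new targets.
# legal_moves/apply_move/is_terminal/terminal_value are placeholder callables
# standing in for whatever game interface you have; values are in [0, 1].
def negamax(state, depth, evaluate, legal_moves, apply_move, is_terminal, terminal_value):
    """Value of `state` from the player-to-move's perspective, in [0, 1]."""
    if is_terminal(state):
        return terminal_value(state)
    if depth == 0:
        return evaluate(state)  # network prediction for the side to move
    best = 0.0
    for move in legal_moves(state):
        child = apply_move(state, move)
        value = 1.0 - negamax(child, depth - 1, evaluate, legal_moves,
                              apply_move, is_terminal, terminal_value)
        best = max(best, value)
    return best

# New targets for the existing samples, then refit the network on them:
# new_targets = [negamax(s, 3, net_eval, moves, apply, is_term, term_val) for s in states]
```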
Post by Andrew Rose on Apr 29, 2016 11:24:29 GMT -8
Richard - I've finally got as far as playing with some different models. I started with a simple MLP (single hidden layer). It seems to learn a pretty good approximation to my dataset in a few minutes (mean position evaluation error ~2/100).
Then I switched to something closely approximating your model. Some small differences...
- 1 channel with inputs -1.0 (black) / 0.0 (empty) / +1.0 (white)
- Single sigmoid output neuron (because I'm predicting state value rather than directly choosing the best move)
- SGD optimizer (because it worked better than Adam when using the MLP - and also with this convolutional model)
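Something along these lines (a sketch only - the board size, filter counts and number of conv layers are guesses standing in for "closely approximating your model"):

```python
# Rough sketch of the single-channel value-network variant described above.
# Board size (8x8), filter counts and the number of conv layers are assumptions.
# (The Keras of that era spelled the layer Convolution2D; Conv2D is the newer name.)
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.optimizers import SGD

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(8, 8, 1)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    Flatten(),
    Dense(1, activation='sigmoid'),  # single state-value output
])
model.compile(optimizer=SGD(), loss='mse')
```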
It takes minutes for a single epoch. And even leaving it running for a while, it doesn't learn anything like as well as the vanilla MLP version.
Assuming you've been playing a bit more, have you had similar experiences? Also, are you training on a CPU or a GPU? I've got an nvidia GPU but (a) it's only compute capability 2.1, whereas I think 3.0 is required for cuDNN and (b) my Ubuntu system (required for Tensorflow) is virtualised inside Oracle's VirtualBox and guests don't have access to the host's GPU.
rxe
Junior Member
Posts: 61
Post by rxe on Apr 29, 2016 12:56:31 GMT -8
For breakthrough I currently use 4 channels: one each for white/black pieces, and one each for whose turn it is (so all 1s in the black channel if it's black's turn). There are 3 convolution layers of 3x3, starting with 1 layer of padding... the initial layers are therefore 10x10 and the board ends up reduced to 4x4. I did try some variations, but didn't see much difference... so left it alone. There is a 1x1 convolution at the end, and a small fully connected layer (size 8 or so). The convolution layers use relu, and the fully connected layers use tanh. I believe I went with tanh because I added dropout to the last 2 layers, and I was worried about the dying-ReLU problem (which I had just read about! no good reason). Again trial and error... not really sure what I was doing.

What I found was that it was overfitting after 2 epochs, and adding those dropout layers allowed it to run many more epochs. Currently I only run 8 epochs with about 20-50k samples, though I got some minor improvement running up to 64... but it seems to get unstable after a while, and I think it would have to automatically reduce the learning rate, and I haven't got that sophisticated yet. I switched from Adam to RMSProp - I don't have any good reason to give other than I probably googled "what should I use in keras for regression problems". I generalized the outputs so there is a score for every player, so it can also train non-fixed-sum games and n-player games.

This was pretty much my starting configuration, albeit for a policy network, and there hasn't been much evolution - so I have never tried an MLP or sigmoids. It was based largely off this: github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py - without the max pooling layers.

It also takes several minutes per epoch for 20k samples. That is with an old gpu card (compute 2.0 I think) which does not support cuDNN; I imagine it would be 10 times faster with a Titan. I also use theano, as tensorflow requires more recent cards. Did you look at deeplearning4j? It is Java and has native support on Windows (AFAICT).
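In Keras terms, the architecture above is roughly the following (a rough reconstruction - treat the filter counts, dropout rates and output activation as guesses):

```python
# Approximate Keras sketch of the Breakthrough network described above.
# Input: 4 channels (white pieces, black pieces, white-to-move, black-to-move) on an 8x8 board.
# Filter counts, dropout rates and the output activation are assumptions, not confirmed values.
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Conv2D, Flatten, Dense, Dropout
from keras.optimizers import RMSprop

model = Sequential([
    ZeroPadding2D(padding=1, input_shape=(8, 8, 4)),  # 8x8 -> 10x10
    Conv2D(32, (3, 3), activation='relu'),            # 10x10 -> 8x8
    Conv2D(32, (3, 3), activation='relu'),             # 8x8  -> 6x6
    Conv2D(32, (3, 3), activation='relu'),              # 6x6  -> 4x4
    Conv2D(8, (1, 1), activation='relu'),                # 1x1 convolution at the end
    Flatten(),
    Dropout(0.5),
    Dense(8, activation='tanh'),                          # small fully connected layer
    Dropout(0.5),
    Dense(2, activation='sigmoid'),                        # one score output per player
])
model.compile(optimizer=RMSprop(), loss='mse')
```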
Post by hedron on Oct 30, 2017 10:06:20 GMT -8
Hello! So now that Go has been solved with unsupervised learning by AlphaGo Zero, has GGP also been solved (what I mean is that it seems like DeepMind's approach is applicable to any GGP game)? Thanks.
Post by alandau on Oct 31, 2017 23:20:02 GMT -8
We can't say that until they actually evaluate the approach on other games and find that it works without manual modifications or the introduction of new techniques.
My suspicion (though I am neither an ML expert nor a Go expert) is that it's possible there are features of Go that make it more amenable to their approach than other games -- my understanding being that they have a deeply powerful reinforcement learning element that is responsible for its strength, paired with a search element that is not especially novel or interesting.
So for one example, if it were given a formulation of the Rubik's cube puzzle where the reward function was "1" for a solved state and "0" for unsolved, I find it unlikely that it would be able to solve it -- you can't learn anything if all your rewards are the same. You'd have to either find a way to introduce partial rewards, or seed it with an initial set of solutions.
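For instance, a partial reward could be something as crude as the fraction of stickers already matching their face's centre (purely illustrative; I'm not claiming this particular shaping would work):

```python
# Illustrative only: one crude way to turn the all-or-nothing Rubik's cube reward
# into a partial reward, by counting stickers that already match their face's centre.
# `cube` is assumed to be a dict mapping each face name to a list of 9 sticker colours.
def partial_reward(cube):
    matching = sum(
        sticker == stickers[4]       # the centre sticker defines the face colour
        for stickers in cube.values()
        for sticker in stickers
    )
    return matching / 54.0           # reaches 1.0 only for the solved cube
```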
More subtly (and further from my zone of credibility), I wonder if there are properties of high-level Go play that make it easier for this kind of given-X-do-Y reinforcement learning to be successful relative to other, ostensibly simpler games like chess. (Mainly I'm thinking about the relative rarity of the capture of a large number of pieces that causes a significant shift in a section of the board, if both players play well. This may cause the value of a move to be a more "sensible" function of the positions of other pieces across the board.) If that's true, then in some of these cases, the existing computer players may yet outplay AlphaGo Zero.
qfwfq
New Member
Posts: 29
Post by qfwfq on Nov 2, 2017 11:08:23 GMT -8
Yes, Zero shows that ML is indeed able to surpass human play through unsupervised learning, at least (for now) for zero-sum turn-based games where there is a relatively strong signal (initially the network plays essentially at random, so if there are goal values or important situations that are extremely unlikely to be encountered during random play, training could take an unreasonably long time, or converge to a local optimum that is radically worse than the global one).

But I think there is lots of room for improvement in applying it efficiently to GGP, as GGP currently has to work with slow state machines, potentially short learning/metagaming time, weak signal in some games, no prior game knowledge like board representation and symmetries, blown-up sets of states or actions, etc. So while it may be solved "in principle", Zero can't be applied as-is to all GGP zero-sum turn-based games and converge in a reasonable amount of time; additional techniques will be needed for that (game state analysis, tweaks for weak signals, etc.).

Then there are different types of games, like for example cooperative and game-theory games, where ML could do well, but in the end there is no particularly brilliant strategy if the other players don't cooperate.