Post by steadyeddie on Dec 18, 2016 6:31:58 GMT -8
My dream is to create a player with as few as possible, or better still zero, bits of code specific to particular kinds of games. I will not, for example, add piece heuristics to my player.
I mentioned above that I didn't have great results when I tried to use states to calculate RAVE scores. And that fact put me off doing what I intended to do with my GGP player from the start, which is to get it to play in the way I play chess as best I can. Clearly that involves a lot of decision making that is based on state, rather than moves.
Sadly events overtook me somewhat, and the AlphaGo people showed us what is possible with a non-general game player. Their Nature paper is actually fairly intelligible, even to me. Whereas, sorry academic chaps, the normal GGP ones I find unnecessarily obfuscated. (If someone could translate the Woodstock paper into something I can understand, that'd be grand.)
First, here are some ideas that didn't work.
I tried using "neuroph", a popular Java-based neural network library. It took too long to train: a whole night for something a bit iffy. A bit iffy might be OK in conjunction with MCTS, but a whole night, not so much. I intended to train it slowly and store the results, but that felt like cheating. So with reluctance I figured I'd have to try something a little more tractable, and having a maths background I know the way you usually do this is by assuming linearity, because linear systems are usually soluble. And maybe the outrageous simplification is not so important, as I trust MCTS to sort out the bumps.
So, I imagined a "neural network" with linear functions between the nodes. Of course a composition of linear maps is itself linear, so any intermediate layers collapse away, and we are really just talking about a dot product between the input vector and a weight vector. And there are standard ways of finding the "least squares" fit for such a system.
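To make this concrete, here's a minimal Java sketch of what the collapsed "network" amounts to. Because everything is linear, all you are left with is one weight vector, and since a GGP state is a set of true/false propositions the dot product degenerates to summing the weights of the propositions that are on. (LinearEval and its names are illustrative, not my actual code.)

```java
import java.util.BitSet;

// One weight per base proposition; with linear "activations" the whole
// network collapses to this single vector plus a bias term.
final class LinearEval {
    private final double[] weights;
    private final double bias;

    LinearEval(double[] weights, double bias) {
        this.weights = weights;
        this.bias = bias;
    }

    // State encoded as one bit per proposition: the dot product is just
    // the sum of the weights of the propositions that are currently true.
    double evaluate(BitSet state) {
        double score = bias;
        for (int i = state.nextSetBit(0); i >= 0; i = state.nextSetBit(i + 1)) {
            score += weights[i];
        }
        return score;
    }
}
```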
Least squares of what? Well, I've tried a few options. One option is to use the positions from the MCTS tree. These are more accurate, as they are averaged over multiple rollouts. Unfortunately the tree is never really that deep, so between those positions most of the state doesn't change. Instead I tried positions from rollouts. But which ones? Taking them all leads to too many positions that are nearly the same, so currently I sample at a rate of 1 in 100. And I'm confused about whether to include terminal positions or not. I currently don't include them, but I'm pretty sure that's not right.
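The sampling itself is nothing clever; a sketch along these lines, with made-up names, where each sampled state gets the rollout's final score attached as its regression target once the rollout finishes:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Collects roughly 1 in 100 non-terminal rollout positions as training data.
final class RolloutSampler {
    private static final int SAMPLE_RATE = 100;
    private final Random rng = new Random();
    private final List<BitSet> pending = new ArrayList<>();
    final List<BitSet> states = new ArrayList<>();
    final List<Double> targets = new ArrayList<>();

    // Call once per non-terminal state visited during a rollout.
    void maybeSample(BitSet state) {
        if (rng.nextInt(SAMPLE_RATE) == 0) {
            pending.add((BitSet) state.clone());
        }
    }

    // Call when the rollout terminates: every state sampled from this
    // rollout gets the rollout's final score as its least-squares target.
    void finishRollout(double finalScore) {
        for (BitSet s : pending) {
            states.add(s);
            targets.add(finalScore);
        }
        pending.clear();
    }
}
```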
And how to do least squares? QR decomposition is one such way. Unfortunately it does not cope at all well with columns that never change, and there are often many of those. You can preprocess the data to remove the constant columns, but it's a bit annoying and prone to error. Instead I use SVD, and that worked out a lot better: the constant columns just show up as zero singular values on the diagonal, the pseudo-inverse ignores them, and the weights drop out. I used Apache's QR and SVD routines, but I found that the expensive decomposition work they do inside their constructors did bad things to rollouts running at the same time. So I hacked the time-intensive initialization out into a separate routine, and the JVM issues went away.
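For reference, the SVD fit with Apache Commons Math comes out roughly like this. Note that the expensive decomposition runs inside the SingularValueDecomposition constructor, which is exactly the work I had to hive off; this sketch just shows the fit itself.

```java
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.ArrayRealVector;
import org.apache.commons.math3.linear.RealVector;
import org.apache.commons.math3.linear.SingularValueDecomposition;

final class SvdFit {
    // rows: one 0/1 feature per proposition for each sampled state;
    // targets: the rollout scores. Returns the least-squares weights.
    // Constant columns produce zero singular values, which the
    // pseudo-inverse based solver simply ignores.
    static double[] fit(double[][] rows, double[] targets) {
        SingularValueDecomposition svd =
                new SingularValueDecomposition(new Array2DRowRealMatrix(rows, false));
        RealVector weights = svd.getSolver().solve(new ArrayRealVector(targets, false));
        return weights.toArray();
    }
}
```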
But though I got decent enough results with that, I wasn't convinced. I think the issue is that the computation takes so long that too few positions can be processed to make a decent eval function. It's pretty good, but not brilliant.
So instead I go for a "nudge" approach, similar to (I think) the way a neural network is trained. Basically you feed in each position, and for each one you nudge the weight of every active state element up or down so that the resulting output moves closer to the actual rollout result. Then you hope that, with enough data, the genuinely important elements get "+1"'d statistically more often, until they settle at the values they should have. And ta-da, all your latches and piece heuristics drop out?
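In code the nudge is, I believe, just the classic delta rule; a sketch, where the learning rate and the names are made up:

```java
import java.util.BitSet;

// Incremental "nudge" training: one small weight update per sampled position.
final class NudgeTrainer {
    private final double[] weights;
    private final double learningRate; // e.g. 0.01, needs tuning

    NudgeTrainer(int numPropositions, double learningRate) {
        this.weights = new double[numPropositions];
        this.learningRate = learningRate;
    }

    // error = rollout result minus current prediction; every proposition
    // that is on in this state gets nudged by learningRate * error.
    void nudge(BitSet state, double target) {
        double prediction = 0.0;
        for (int i = state.nextSetBit(0); i >= 0; i = state.nextSetBit(i + 1)) {
            prediction += weights[i];
        }
        double error = target - prediction;
        for (int i = state.nextSetBit(0); i >= 0; i = state.nextSetBit(i + 1)) {
            weights[i] += learningRate * error;
        }
    }
}
```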
Well, in theory. But I still have a problem with the RAVE eval: the move version still works better. I don't know why; perhaps it's bugged. But what about the rollouts? In theory that's the most exciting bit, because as you use a better policy for the rollout, surely the rollouts get better, and hence the data you get out gets better, in a wonderful reinforcement-learning loop?
Well, one problem is the huge drop in rollout count. I'm afraid I'm rather brute force about this: at each step of the rollout I look at all the legal moves and work out, for each one, what changes it would make to the state. This is heavy on computation, roughly proportional to the number of moves per ply, though in some games (C4, for example) a lot is clawed back because the rollout is shorter when better moves are played.
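The rollout step then looks something like the sketch below, reusing the LinearEval idea from earlier. GameState and Move are stand-ins for whatever the underlying state machine provides; the point is simply that every legal move costs one state update plus one evaluation.

```java
import java.util.BitSet;
import java.util.List;

// Stand-in interfaces for the GGP state machine.
interface Move {}
interface GameState {
    List<Move> legalMoves();
    GameState apply(Move m);
    BitSet propositions();
}

final class GreedyRolloutPolicy {
    private final LinearEval eval;

    GreedyRolloutPolicy(LinearEval eval) {
        this.eval = eval;
    }

    // Score every successor state and play the best: cost per step is
    // roughly (moves per ply) * (one state update + one evaluation),
    // which is where the rollout count collapses.
    Move choose(GameState state) {
        Move best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Move m : state.legalMoves()) {
            double score = eval.evaluate(state.apply(m).propositions());
            if (score > bestScore) {
                bestScore = score;
                best = m;
            }
        }
        return best;
    }
}
```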
Also, my rollouts now suffer from patzer-sees-a-check in spades. I think I need to look 2 ply down the tree, but that's going to be even worse computationally. Oh, and it's not clear, now I've got a "nudge" approach, whether I need things to be linear at all.
So, a work in progress.
There you go, 16 posts containing all the SteadyEddie design. Hope you enjoyed them.