|
Post by Steve Draper on Dec 26, 2013 13:06:16 GMT -8
I'm starting this thread to begin an exchange of experiences/recommendations in regard to good choices of games to use as regression (or other more specific) test cases.
Here's my list and observations on each to kick things off:
Basic have-I-broken-something regression test: Connect 4 (complex enough to be sensitive to reasonably subtle breakages, but plays fast enough to give a decent statistical indication in 30-60 minutes - say 12 games or so)
Basic multi-player test - 3PFFA (not fixed sum, and doesn't suffer too badly from relative player order bias, which 3PC4 does)
Truely simultaneous play - blocker
Pseudo-simultaneous play (aka simultaneity is factored into independent sub-games) - Chinook (or C4 simul)
Complex game with large search space, and low signal-to-noise ratio (by which I mean the MCTS signal from the state resulting from good move takes some time to converge and give a statistically significant bias top that move) - Breakthrough
Highly transpositional game - Cephalopod micro - this one is a really good test case for some edge cases since it's game tree is very highly transpositional, and it also exhibits the unusual characteristic that several moves from a given state can all result in the SAME next state
Games where early decisions significantly constrain what can ultimately be achieved - Max Knights - initial good looking early moves (e.g. - knights in adjacent corners) can turn out to be bad, which only becomes apparent really rather deep in the game tree
Factorizable games - Chinook, C4 simul, Breakthrough w/walls (the last one is an especially good example I think since it's a bit more subtle)
Games with significant latches - escort latch breach (I haven't worked on this yet, and I notice my player has a nasty tendency to lose its King early, which should be fairly readily avoidable by latch analysis)
Anyone else have good choices they have found useful to analyse/work on?
|
|
|
Post by alandau on Dec 27, 2013 14:34:55 GMT -8
Bidding tic-tac-toe is another interesting test case for simultaneous play (in this case, intermixed with non-simultaneous play). Unlike with Blocker, perfect play by a role can ensure that they won't lose. It also has the interesting property you mentioned where multiple (combinations of) moves can result in the same game state.
If you want to try working on heuristics, Breakthrough Suicide is an excellent choice, as a good heuristic approach will beat the pants off UCT in that game.
|
|
|
Post by Steve Draper on Dec 28, 2013 13:56:53 GMT -8
Bidding tic-tac-toe is another interesting test case for simultaneous play (in this case, intermixed with non-simultaneous play). Unlike with Blocker, perfect play by a role can ensure that they won't lose. It also has the interesting property you mentioned where multiple (combinations of) moves can result in the same game state. If you want to try working on heuristics, Breakthrough Suicide is an excellent choice, as a good heuristic approach will beat the pants off UCT in that game. So what's the right heuristic to apply for Breakthrough Suicide? My initial thought was just to minimize the sum of the ranks all your pieces are on, but actually that won't work because ultimately (assuming your opponent won't make the final losing move, given any choice) your opponent will still have as many moves that do no lose as you will. So right now I'm thinking just minimize opponent mobility (subject to not accidentally getting to a terminal state by capturing his last piece) - that should cause a maximization of captures, which leaves the opponent with a small number of pieces to move, each of which moves necessarily puts him closer to losing (since the pieces can only move forward). Is there a better heuristic than that readily available? Interestingly, since minimizing mobility ion any breakthrough-type game more or less amounts to maximizing captures, I suspect it's a pretty good heuristic signal for normal Breakthrough too...?
|
|
|
Post by alandau on Dec 29, 2013 10:37:54 GMT -8
Any heuristic that encourages captures will be good for Breakthrough Suicide. UCT tends to avoid captures until late in the game -- either it's a capture near their side of the board, in which case your piece is getting closer to a losing state, or it's a capture on your side of the board, in which case you're removing one of their pieces that is close to a losing state for them. UCT will care too much about those pieces sitting close to the opposite end of the board, since they'll affect the outcome of depth charges on a regular basis.
|
|
|
Post by Andrew Rose on Jan 9, 2014 2:42:28 GMT -8
Steve,
You say that Blocker is a good test case for genuinely simultaneous play games. I've just got simultaneous play coded up (although I've thrown away a lot of good stuff in the process, which I'll need to add back, adapting it for simultaneous play along the way). I've been using Blocker to test it out, but it seems to have a very considerable bias for the "blocker". Playing against SampleMonteCarloGamer, 16/20 went to "blocker" (and in the other 4, SgianDubh managed to win as "crosser" - usually in just 4 moves each time).
I take it you have a different experience? Any insights?
Andrew
|
|
|
Post by Steve Draper on Jan 9, 2014 5:56:53 GMT -8
Steve, You say that Blocker is a good test case for genuinely simultaneous play games. I've just got simultaneous play coded up (although I've thrown away a lot of good stuff in the process, which I'll need to add back, adapting it for simultaneous play along the way). I've been using Blocker to test it out, but it seems to have a very considerable bias for the "blocker". Playing against SampleMonteCarloGamer, 16/20 went to "blocker" (and in the other 4, SgianDubh managed to win as "crosser" - usually in just 4 moves each time). I take it you have a different experience? Any insights? Andrew Oh, it definitely has that bias, but if you play a version that handles simultaneous play against your previous version that does not you should see a STRONG win rate increase over the space of a few games. The benefit of Blocker as an early test game is that it's quick and easy to understand visually. Bidding Tic Tac Toe is another one you could use, but it takes a bit more interpretation to spot systemic mistakes
|
|
|
Post by Steve Draper on Jan 12, 2014 8:15:08 GMT -8
I'm looking to add a learning capability, whereby various tuning parameters are learnt in simulated play over a long period between instances whose parameter settings are varied by a learning algorithm that isn't pertinent to my immediate question. In order to work on, and determine the viability of this, I need this I need an initial test game to attempt to employ it on with (ideally) the following characteristics: - Games can be played reasonably quickly. This might mean a high state simulation rate (so that low per-turn play times can be employed), or it might mean a game that is over in a small number of turns (or some combination)
- There isn't a pronounced role bias, so the game should not be a (obvious/practically detectable) strong win for any role
- The game is discriminatory as to player skill - ideally having a high skill correlation for all roles
- The game should be naturally understandable by a human, since this makes diagnosis of systemic blunders much easier (indeed makes the blunder apparent at all)
I'm not sure what the best candidates are. Things like TicTacToe or TicTacChess satisfy the first requirement well, but are hopeless for one of both of the other two. Currently th best I can think of is probably Breakthrough(small), but it's not great on the first criteria. Anyone have any suggestions?
|
|
|
Post by Andrew Rose on Jan 17, 2014 1:26:26 GMT -8
I'm starting this thread to begin an exchange of experiences/recommendations in regard to good choices of games to use as regression (or other more specific) test cases. Here's my list and observations on each to kick things off: ... Games with significant latches - escort latch breach (I haven't worked on this yet, and I notice my player has a nasty tendency to lose its King early, which should be fairly readily avoidable by latch analysis) ... Anyone else have good choices they have found useful to analyse/work on? For games with significant latches, I suggest the "base.firefighter" puzzle. (Also stanford.untwistycorridor, but it's too small, so you'll manage to solve it in a few seconds even without latch code.)
|
|
|
Post by Steve Draper on Feb 13, 2014 14:28:34 GMT -8
I want to second the recommendation of Bidding TTT as an interesting game to analyse behavior on (and once you get that sorted out fully move on to the larger variant with 10 tokens).
Not only does this have interesting simultaneous turn behavior but it also exhibits (in a small game, which helps the analysis of what is going on) several other interesting properties, any or all of which might lead to insights into flaws or potential optimizations (all in my case as it turned out, in either or both of flaw or optimization categories):
1) It's simultaneous turn, but only every other move, which leads to some interesting dynamics 2) In most states several moves transition to the SAME next state - this leads to some interesting representation and optimization possibilities in the MCTS graph 3) It's highly transitional (many paths lead to the same state) 4) It has a very large state space for such an apparently small game (especially the 10 token variant) 5) Basic MCTS converges very slowly in my experience on this game (the larger one anyway), and in an interesting way - high bid moves (in the bidding phase) initially look good to it, and then gradually fall away (and are eventually found to be a really bad idea). This non-monotonic trend in branch scores results in very slow convergence, and seems to be a fertile ground to play with variations in MCTS selection and some post-processing to try to speed the convergence process
|
|
levb
New Member
Posts: 18
|
Post by levb on Mar 12, 2014 12:11:05 GMT -8
I use the puzzle with 3 switches for regression testing. It's enough for catching the really stupid bugs.
|
|
wat
New Member
Posts: 32
|
Post by wat on Jun 8, 2014 20:53:52 GMT -8
Tic Tac Toe.
- Play strength should be already high with small play time. - You immediately catch obvious bugs. - You immediately catch tactical blunders. - Good for testing what happens when MCTS reaches terminal nodes. - Good for spotting overflows when playout count goes really high.
If it passes the test then I go for more complex games, like Knight Tour (single player), Free For All (non-zero sum), Connect Four (complex zero-sum) and/or Blocker (simultaneous). They are complex enough to stress your player, and simple enough for a human to understand what is going on.
|
|