|
Post by alandau on Aug 3, 2014 18:03:36 GMT -8
The fact that games on Tiltyard are chosen uniformly at random from the available set seems far from ideal. If there's a large number of variations of a particular type of game, it will be played an undue percentage of the time (and have a proportional influence on gamer rankings). Since games are chosen before players are selected to put in the game, games with many players (e.g. Chinese Checkers) are also over-represented in each player's history.
It doesn't seem likely that this will be fixed by the improvements currently slated in Scheduling.java. In particular, the problem of automatically identifying games that would seem very similar to a human seems difficult, and this is important to the entertainment value of GGP. It seems worth considering a manual weighting system instead.
At the risk of opening a can of worms, I've put together a first pass at a weighting below, with the following loose guidelines:
- Favor iconic games
- Favor games that are relatively different or unique, as opposed to games with lots of variations in the rotation
- Favor games that are at least somewhat balanced (in some broad sense of "the outcome of the game is unpredictable but reflects players' skill")
- Favor two-player games (as opposed to games with a higher number of players, which get played disproportionately as described above, or single-player games, which are interesting the first time played but less interesting to play repeatedly)
If this is adopted, an ongoing policy could be to also favor games that are new to the rotation.
It's worth noting that this could affect Agon rankings, since those depend on the set of games used and how often they're played.
//Higher number means more frequent appearance in rotation, linear with the number
"3pConnectFour", 1
"englishDraughts", 3
"dotsAndBoxes", 3
"knightThrough", 2
"breakthroughWalls", 2
"reversi", 3
"cephalopodMicro", 3
"breakthrough", 2
"nineBoardTicTacToe", 3
"pentagoSuicide", 3
"checkersSmall", 2
"checkersTiny", 1
"dotsAndBoxesSuicide", 3
"maze", 1
"ticTacToe", 3 //short enough to be inoffensive, IMO
"ttcc4_2player", 3
"connectFourLarge", 2
"pegEuro", 1
"eightPuzzle", 1
"knightsTour", 1
"chinook", 3
"connectFourLarger", 2
"connectFour", 3
"breakthroughSmall", 2
"peg", 1
"connectFourSimultaneous", 2
"escortLatch", 2
"qyshinsu", 3
"connectFourSuicide", 1
"pentago", 3
"blocker", 3
"checkers", 3
"2pffa_zerosum", 1
"2pffa", 3
"3pffa", 2
"4pffa", 1
"2pttc", 1
"3pttc", 2
"4pttc", 1
"cittaceot", 2
"sheepAndWolf", 2
"ticTacToeLarge", 1
"ticTacToeLargeSuicide", 1
"connect5", 3
"max_knights", 1
"knightsTourLarge", 1
"quarto", 3
"quartoSuicide", 3
"biddingTicTacToe", 3
"biddingTicTacToe_10coins", 3
"chineseCheckers1", 1
"chineseCheckers2", 1
"chineseCheckers3", 2
"chineseCheckers4", 1
"chineseCheckers6", 1
"gt_attrition", 1
"gt_centipede", 1
"gt_chicken", 1
"gt_dollar", 1
"gt_prisoner", 2
"gt_ultimatum", 1
"gt_staghunt", 2
"gt_coordination", 1
"speedChess", 3
If this seems unsustainable, it may make more sense to put games in categories that themselves are assigned weights; so, new checkers variants go in the "checkers" category and get some of the fixed amount of weight given to that category, for example.
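A minimal Python sketch of both ideas, for illustration only: picking a game with probability linear in its weight, and the category variant where a fixed category weight is split among its members. The three per-game weights are taken from the list above; the category table, the even split within a category, and all function names are assumptions, not anything Tiltyard implements.

```python
import random

# Excerpt of the per-game weights proposed above.
GAME_WEIGHTS = {"ticTacToe": 3, "checkersSmall": 2, "checkersTiny": 1}

# Hypothetical category table: each category has one fixed weight,
# no matter how many variants it contains.
CATEGORY_WEIGHTS = {"checkers": 3, "ticTacToe": 3}
CATEGORY_GAMES = {
    "checkers": ["checkers", "checkersSmall", "checkersTiny", "englishDraughts"],
    "ticTacToe": ["ticTacToe"],
}


def selection_probabilities(weights):
    """Return each game's chance of being picked, linear in its weight."""
    total = sum(weights.values())
    return {game: weight / total for game, weight in weights.items()}


def pick_game(weights, rng=random):
    """Draw one game at random, proportionally to its weight."""
    games = list(weights)
    return rng.choices(games, weights=[weights[g] for g in games], k=1)[0]


def per_game_weights(category_weights, category_games):
    """Split each category's fixed weight evenly among its games (an assumption)."""
    weights = {}
    for category, games in category_games.items():
        share = category_weights[category] / len(games)
        for game in games:
            weights[game] = share
    return weights
```

With this table each of the four checkers games gets 0.75. Adding a fifth checkers variant would lower each to 0.6, while the category as a whole would still be chosen as often as before.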
|
|
|
Post by Sam Schreiber on Aug 3, 2014 19:39:45 GMT -8
I'm definitely open to having a better approach to game selection in Tiltyard.
It shouldn't affect Agon ratings in a dramatic way, since they're already corrected for the games that are involved.
|
|
|
Post by Sam Schreiber on Aug 3, 2014 20:37:46 GMT -8
Also, is this related to the sudden appearance of a half dozen Sudoku variants? :-)
|
|
|
Post by Steve Draper on Aug 4, 2014 4:37:12 GMT -8
I would be happy with this, but with a (future?) direction of treating the manual weights as seeds or center-point weightings that can be adjusted by results (so things with strong role bias might be downgraded somewhat, as would things with low score variance between players [puzzles everyone solves reliably for example], or low skill-correlation).
In regard to the specific weights suggested, I have just a few that I think are possibly wrong:
1. Breakthrough/walls - this is unique on Tiltyard - it is the only factorizable game in which you actually have to choose which sub-game to play in rather than it being forced. It should therefore be a 3. I would also make base Breakthrough a 3, and BreakthroughSmall a 2, but I feel less strongly about those.
2. Escort Latch Breakthrough is again unique, being the only non-puzzle with strong goal latches, so should be a 3.
3. IMO 9 board TTT should be downgraded to 2 for the reasons discussed extensively in the competition chat (i.e. it's very role biased). It is somewhat canonical however, which is why a 2 not a 1. TicTacChess on the other hand (which I don't think is included in Alex's list) I'd be happy to see disabled entirely! (it's just ridiculously role biased)
|
|
|
Post by qfwfq on Aug 4, 2014 13:32:33 GMT -8
Good idea, using some manual weights looks like a good and simple first step to address the problem.
Automating could be tricky, for example we know that how much a game is biased depends on the players' levels, and the average player level on Tiltyard will likely see a gradual increase for a few 'regulars' and seasonal fluctuations as waves of new players join (so it would be good to keep some easy games around for new players, and at the same time gradually introduce more difficult games, e.g. Hex and others that were used in the competition).
cittaceot seems easy and very biased. Any reason to keep it when we already have TTT (which at least results in a 50-50 score)?
Not sure what the difference is between TTT Large and Connect 5... do we need both?
I would also remove Pentago Suicide: there is already Pentago, and they both seem similarly biased according to the statistics.
For the weights, Alex's recommendations look reasonable, or maybe we can have a poll or shared document to inform Sam's decisions.
Perhaps a maximum score of 5 rather than 3 may be necessary, otherwise there are some families of games (like the game theory ones) which include many variants and would be over-represented. Categorizing by family would also solve that, but I don't think it's the right solution: for example, Checkers and English Draughts are in the same category for a human but the different capture rule makes them quite different games, and Breakthrough and Breakthrough Small are variants, but only if one ignores the different difficulty level.
|
|
|
Post by maciej on Aug 4, 2014 14:40:10 GMT -8
Let's write some new games! (I know I drifted into a separate topic now). I could spend an evening or two on encoding a nice game into GDL. Maybe a really simplified version of Heroes of Might and Magic battle? It was turn-based from what I recall.
What I found missing during the last competition were unknown games; other than that, I think it would be fresh for GGP.
|
|
|
Post by Sam Schreiber on Aug 5, 2014 9:14:37 GMT -8
I'm certainly open to new games.
It's not always bad to play games that heavily favor one role over another; they can be useful for verifying that your player is behaving reasonably, and they shouldn't hurt the Agon ratings for the players involved because those take role difficulties into account. (That was, in fact, the motivation for the development of Agon ratings.) And, while all of the players in the second day of this year's competition did really well on 9xTTT, that's not necessarily going to be true for newer players that are still in development, and I want Tiltyard to be a useful resource for developing new players that aren't necessarily very strong yet. That's why I think it's still useful to have Tic-Tac-Toe in the rotation, for example, even though it's not very exciting when played by experts.
I do like the idea of identifying "genres" of games that are similar, and choosing first by genre. Genres would consist of categories like "chess-like", "checkers-like", "breakthrough-like", "sudoku variants", "game theory games", etc. Individual games could belong to multiple genres: for example "breakthrough/walls" could belong to both the "breakthrough-like" and "factorable" genres. I remember that when I introduced the game theory games on Tiltyard, they suddenly started occupying a large percentage of matches simply due to the number of games; this would be a way to avoid that problem.
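A rough Python sketch of genre-first selection, assuming a genre is picked uniformly and then a game is picked uniformly inside it. The genre table and function names are hypothetical. It also shows what multi-genre membership implies: a game listed under two genres collects probability from both.

```python
import random

# Hypothetical genre assignments; a game may appear under several genres.
GENRES = {
    "breakthrough-like": ["breakthrough", "breakthroughSmall", "breakthroughWalls"],
    "factorable": ["breakthroughWalls", "chinook"],
    "game theory games": ["gt_chicken", "gt_prisoner", "gt_staghunt", "gt_dollar"],
}


def pick_game_by_genre(genres, rng=random):
    """Pick a genre uniformly, then a game uniformly within that genre."""
    genre = rng.choice(sorted(genres))
    return rng.choice(genres[genre])


def game_probabilities(genres):
    """Exact per-game selection probability under the two-stage draw."""
    probs = {}
    for games in genres.values():
        for game in games:
            probs[game] = probs.get(game, 0.0) + 1.0 / (len(genres) * len(games))
    return probs
```

Under these assumptions the four game theory games together are played a third of the time however many of them there are, and `breakthroughWalls` ends up at 1/9 + 1/6 = 5/18 because it belongs to two genres.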
I'm reluctant to spend a lot of the community's time hand-tuning the weights for individual games.
One other idea that I've been toying with for a while now is to schedule matches in a way that maximizes the expected information produced by the match's outcome. For example, it's more informative to know the result of a match between two strong, similarly-rated players, than it is to know the results of a match between a strong player and random. And it's more informative to know the results of a match between two strong, similarly-rated players on a game with equally-difficult roles; but it's more informative to know the results of a match between a strong player and a weak player if they're playing a game where the strong player is in a role that has a disadvantage. Essentially, this algorithm would take the Agon skills for players and the Agon difficulties for games into account when scheduling new matches, to maximize the expected information that's produced from the outcome of the match.
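To make the intuition concrete, here is a toy Python sketch that uses the entropy of the predicted outcome as a stand-in for "expected information". Everything in it is an assumption: the logistic win model, the single `role_handicap_a` number, and the reduction of a match to win/loss. It is not how Agon computes skills or difficulties.

```python
import math


def win_probability(skill_a, skill_b, role_handicap_a=0.0):
    """Hypothetical logistic model: chance that player A beats player B.

    role_handicap_a > 0 means A plays the harder role.
    """
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b - role_handicap_a)))


def outcome_entropy(p):
    """Binary entropy in bits; highest when the outcome is a coin flip."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Two equally skilled players on equally difficult roles: least predictable.
even_match = outcome_entropy(win_probability(2.0, 2.0))

# Strong vs. weak on equal roles: predictable, so less informative.
lopsided = outcome_entropy(win_probability(3.0, 0.0))

# Strong vs. weak, with the strong player in a sufficiently harder role.
handicapped = outcome_entropy(win_probability(3.0, 0.0, role_handicap_a=3.0))
```

In this toy model `even_match` and `handicapped` both reach the maximum of 1 bit, while `lopsided` is well below it, which matches the ordering described in the post.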
|
|
|
Post by Sam Schreiber on Aug 5, 2014 9:42:55 GMT -8
To elaborate on that last thought: essentially it would treat Tiltyard scheduling as an optimization problem, choosing each match to help minimize the uncertainty associated with the full Agon rating of player skills and game difficulties. This would also have the nice property that it would prioritize matches for new players and new games, which have less of a known track record. So when a new game is added to Tiltyard, the scheduler would automatically "explore" it until it has a good idea about how difficult each role is, what the expected scores are, etc.
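One way to picture the "explore new games first" behaviour is the sketch below. It assumes, purely for illustration, that each game's first-role win rate has a Beta posterior with a uniform prior, and that the scheduler picks the game whose posterior variance is largest. The record format and function names are hypothetical, and Agon's actual uncertainty estimates may look nothing like this.

```python
def beta_variance(wins, losses):
    """Variance of a Beta(wins + 1, losses + 1) posterior on the win rate."""
    a, b = wins + 1, losses + 1
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


def most_uncertain_game(records):
    """Return the game whose first-role win rate is least well known.

    records maps a game name to (first_role_wins, first_role_losses).
    """
    return max(records, key=lambda game: beta_variance(*records[game]))


# Hypothetical track records: one established game, one just added.
RECORDS = {"checkers": (50, 50), "newGame": (0, 0)}
next_game = most_uncertain_game(RECORDS)
```

A game with no matches has the widest posterior, so it is scheduled first; as results accumulate its variance shrinks and it falls back into the normal rotation.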
|
|
|
Post by Steve Draper on Aug 5, 2014 10:49:55 GMT -8
Sounds like a good idea. I like the idea of categorization also. Perhaps a category name could be an optional extra field in the METADATA file for each game in the repo?
|
|
|
Post by Sam Schreiber on Aug 5, 2014 16:32:28 GMT -8
Yeah, I like that idea.
Okay, here's what I'm going to do. Short term, I'm going to manually categorize the various games, and then schedule matches by choosing a category at random, and then choosing a game within the category at random. (As a side benefit of doing this, I'll be able to include all of the Sudoku variants in the game rotation, rather than just two.)
Longer term I'm going to investigate the "information maximizing" approach. I may roll that out initially by scheduling 75% of matches by randomly choosing a category, and 25% of matches by "information maximizing", or something like that, and see how each approach behaves.
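The 75%/25% rollout could be sketched roughly as follows in Python. The two picker callbacks are placeholders for whatever the category-based and information-maximizing schedulers end up being, and the function names are assumptions.

```python
import random


def schedule_match(pick_by_category, pick_by_information,
                   category_share=0.75, rng=random):
    """Return (strategy, game), using the category picker category_share of the time."""
    if rng.random() < category_share:
        return "category", pick_by_category()
    return "information", pick_by_information()


def strategy_counts(n, rng):
    """Simulate n scheduling decisions and count how often each strategy ran."""
    counts = {"category": 0, "information": 0}
    for _ in range(n):
        strategy, _game = schedule_match(lambda: "gameA", lambda: "gameB", rng=rng)
        counts[strategy] += 1
    return counts
```

Returning the strategy name alongside the game means each scheduled match can be tagged with how it was chosen, which makes it easier to compare how the two approaches behave afterwards.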
|
|
|
Post by Sam Schreiber on Aug 5, 2014 17:40:25 GMT -8
|
|
|
Post by Steve Draper on Aug 6, 2014 4:21:07 GMT -8
Looks good Sam. My only comment would be in relation to the observation you made that (in principle) games could belong to multiple categories. In particular I think you should add some categories for GGP-analysis concepts - specifically (at least for now, though there may well be others) 'latching' and 'factoring' games.
Latching: Escort Latch Breakthrough, <some latch test puzzle which will need adding - firefighter say>. Games with peripheral latches which **could** be included (but weaker case by far): Reversi (corners are interesting latches), Dots&Boxes variants (13 boxes latches score), Cephalopod micro (6s are interesting latches)
Factoring: Connect4 simul, Chinook, Breakthrough/walls, <be good to add a pseudo-puzzle from the Stanford repo here such as multiple Hamilton, or [even better] multiknightstour>
|
|
|
Post by Sam Schreiber on Aug 8, 2014 10:04:29 GMT -8
Yeah, I need to sit down and get more of the Stanford repo games into Base and onto Tiltyard.
|
|
|
Post by Andrew Rose on Aug 13, 2014 11:38:35 GMT -8
Excellent. (And nice to be able to get all the grades of Sudoku that I produced on the rotation without causing everybody to be playing Sudoku all the time.)
|
|