|
Post by maciej on Apr 3, 2014 17:13:34 GMT -8
Hello everybody (as I am new to the forum). Greetings from Poland, well, currently from Australia.

First, a quick background: I am the main author of MiNI-Player, a GGP agent which competed twice in the competition. It made the second day in 2012 (reasonably good) and had an awful performance in 2013. Here is the paper in IEEE TCIAIG (Early Access; it should be in a printed volume soon). It was submitted to the journal in late 2012 and accepted for publication in June 2013, so a few things have changed since then; adaptation may no longer be that novel.

I want to ask about a particular feature of SgianDubh: "Learning capabilities, using a GDL isomorphism test to recognise previously played games and reload things previously learned about the game."

Can this even be done? Games are obfuscated, the order of rules can be changed, and even definitions of the same game can be different but equivalent. So many things can theoretically change: syntax, semantics, the order of rules, superfluous rules, permutations of arguments (like (cell ?y ?x ?piece) instead of (cell ?x ?y ?piece) throughout the description), length... Note that no polynomial-time algorithm is known for graph isomorphism, and the more general subgraph isomorphism problem is NP-complete. And also isn't it against the GGP idea not to have any knowledge at the beginning?

Best regards,
Maciej
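To make the problem concrete, here is a toy sketch (the GDL fragments and the helper below are made up for illustration): two logically equivalent rules that differ only by variable renaming and an argument permutation defeat any purely textual comparison.

```python
# Two logically equivalent GDL fragments (made up for illustration) that
# differ only by variable renaming and a permutation of the arguments of
# the `cell` relation.
rules_a = "(<= (legal ?p (mark ?x ?y)) (true (cell ?x ?y b)))"
rules_b = "(<= (legal ?r (mark ?n ?m)) (true (cell ?m ?n b)))"

# Byte-for-byte comparison fails even though the games are identical.
print(rules_a == rules_b)  # False

# Even a crude normalisation (sorting the tokens) still fails, because the
# variable names differ; a real test has to abstract over naming entirely.
def tokens(s):
    return sorted(s.replace("(", " ").replace(")", " ").split())

print(tokens(rules_a) == tokens(rules_b))  # False
```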
|
|
|
Post by alandau on Apr 3, 2014 17:33:13 GMT -8
Welcome to the forum!
I believe it's talking about defeating the particular obfuscation done on Tiltyard, which is a much simpler task. In particular, it doesn't even reorder the rules (as this could have random negative impacts on rules engines, such as GGP-Base's default ProverStateMachine).
As you say, if someone were to rewrite a game with a substantially different set of relations, it would be impractical to prove the equivalence of the games.
And then there is the question about the philosophy of GDL. I suspect that it's in the realm of GGP (broadly speaking) to consider what can be learned about games over time by a program, so long as that knowledge doesn't originate from a person. For example, Prof. Genesereth was considering an addition to the GDL standard to let the game server expose a new game to the gamers before playing several rounds of the game in a tournament, in an attempt to give the gamer time to develop deeper strategies.
|
|
|
Post by Steve Draper on Apr 4, 2014 11:28:48 GMT -8
Andrew can better respond about how Sgian Dubh did it, but I was looking at testing isomorphisms of the propnet based on dependencies. Firstly, a canonically labelled propnet is independent of rule obfuscation and rule ordering. You can analyse the propnet for dependencies between propositions (traversing does->legal jumps) to get a dependency matrix (with a 'component distance' in each cell of the matrix). Then you can generate a (hashed or otherwise) signature for each prop from its column and row in the matrix. You can then attempt to match the propositions of the two games by essentially treating it as a constraint problem (any valid match must pair propositions with matching signatures, and this is typically VERY constraining).
I haven't attempted to do all of this for game matching, but I have done much of the dependency analysis part in order to identify factors in factorizable games. I believe that generating proposition signatures like this and then combining them in an order-independent hash to generate a game hash should work very well to quickly eliminate (simple hash comparison) most non-matching games from a game database. Any remaining candidates (and given a database of the sort of size we have for real game repositories I'd expect this in practice to always be 0 or 1) can then be matched by the constraint search approach.
This approach can also be used to find symmetries within a single game.
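A minimal sketch of the signature idea on a toy dependency graph (the graph encoding and hashing choices here are my own illustration, not Sancho's actual code): each proposition gets a signature from the distance multisets of its row and column in the dependency matrix, and the signatures are combined order-independently into a game hash.

```python
from hashlib import sha256

def bfs_distances(adj, src, n):
    """'Component distance' from src to every proposition (None = unreachable)."""
    dist = {src: 0}
    frontier = [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return [dist.get(v) for v in range(n)]

def prop_signatures(adj, n):
    """Signature of each proposition from its row and column of the matrix."""
    rows = [bfs_distances(adj, v, n) for v in range(n)]
    sigs = []
    for v in range(n):
        # Sort the row/column distance multisets so the signature does not
        # depend on the (arbitrary) numbering of the other propositions.
        row = sorted(d for d in rows[v] if d is not None)
        col = sorted(rows[u][v] for u in range(n) if rows[u][v] is not None)
        sigs.append(sha256(repr((row, col)).encode()).hexdigest())
    return sigs

def game_hash(adj, n):
    """Order-independent combination of the proposition signatures."""
    return sha256("".join(sorted(prop_signatures(adj, n))).encode()).hexdigest()

# Two relabellings of the same 3-proposition dependency chain hash identically.
g1 = {0: [1], 1: [2]}
g2 = {2: [0], 0: [1]}
print(game_hash(g1, 3) == game_hash(g2, 3))  # True
```

Equal hashes only nominate candidates; the constraint-search match is still needed to confirm a genuine isomorphism.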
As an aside, I got my development version of the factor analysis fully operational for the first time today (automatically identifying factorizable games and performing the factorization, then playing the factors independently and recombining the moves). It has some rough edges, and is certainly not theoretically 'correct' for all possible cases (but it gets the right factors for all the 'real' cases I've exposed it to so far). Anyway, literally 15 minutes before starting this post I set it against the current release version of Sancho (no factorization capability) in a 10-game match of simultaneous connect 4. It's currently 5:0 up with about half the rollout rate of the release version (the exponential saving from the reduced branching factor majorly trumps linear performance advantages).
|
|
|
Post by Claus on Jun 12, 2014 17:27:13 GMT -8
Maciej asked "And also isn't it against the GGP idea not to have any knowledge at the beginning?" I asked a similar question in the discussion forum of the recent GGP MOOC on Coursera. The answer there was inconclusive.

Setting aside the pragmatics of whether it is feasible to enforce this with obfuscation technology, I am going to propose that ideally a GGP player should play as if each game were being encountered for the first time. This is consistent with the machine learning principle that the training set used to learn the algorithm parameters needs to be distinct from the testing set used to validate that the learning algorithm is effective.

The naive reason for not recognizing the game is to prevent players from selecting an algorithm specifically designed for that game. The counterargument in favor of recognizing the game is that it enables machine learning techniques for tuning the GGP algorithm (e.g. weighting different heuristics); the analogy is that humans learn to play well through repetition. However, that is what the metagaming phase is intended to allow, where the GGP player can play thousands of (random) games. Recognizing the game from prior sessions and applying that experience is basically an arbitrary extension of the time allowed for metagaming to play more systematic games. To the extent that the machine learning algorithms can benefit from the additional time (as opposed to manual tweaking), this seems more of an argument for increasing the metagaming time than an argument against the ideal of playing each GGP session as if the game were presented for the first time.

I will concede that there are some practical challenges in running competitions and game masters like Tiltyard if we are to allow metagaming to run for hours or days, but that would be the level playing field. Otherwise we start getting into an arms race optimizing the players for all known games.
Machine learning that is not specific to recognizing a particular game presents an even greater challenge to the ideal that the GGP player should play as if it had never encountered the game previously. Machine learning in this sense would recognize features of the game from the GDL, from another representation like the PropNet, or even from playing games, and would use these features to select "optimal" parameters for the GGP algorithm(s). (This is a generalization of measuring the correlation of heuristics with score and weighting the heuristics accordingly.) Experience playing many different games many times (possibly against different opponents) is used to compute the optimal parameters for different combinations of features.

While the GGP field is probably some years away from this level of player, it does present the classic machine learning problem of overfitting to the training set. In this case, the games used to learn the feature weights associated with different parameters are the training set. The problem is that the training set may not be fully representative of the universe of data (games), and so the tuned parameters may not work as well in general as they do on the training set. The way to test for this is to use data (games) that were not in the training set, so competitions would need to use different games (the testing set) than were used to train the players.

In a nutshell, if the goal is to win a competition then anything within the rules is valid, but if the goal is to find the best algorithm then we need to be careful to distinguish between training games (data) and testing games (data), and the competitions need to take this into account.
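The train/test discipline described above can be made concrete in a few lines (the game names and split ratio below are arbitrary examples): the games used to tune heuristic weights are kept disjoint from the games used to evaluate the player.

```python
import random

def split_games(game_names, test_fraction=0.25, seed=0):
    """Hold out a testing set of games that tuning never sees."""
    rng = random.Random(seed)           # fixed seed: reproducible split
    games = sorted(game_names)          # canonical order before shuffling
    rng.shuffle(games)
    cut = int(len(games) * (1 - test_fraction))
    return games[:cut], games[cut:]

train, test = split_games(
    ["ticTacToe", "connectFour", "checkers", "breakthrough",
     "pentago", "hex", "reversi", "nineBoardTicTacToe"])

# Heuristic weights are tuned on `train` only; the tuned player is then
# evaluated on `test`, games it has never been optimized for.
print(set(train) & set(test))  # set() -- the two sets are disjoint
```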
|
|
|
Post by wat on Jun 15, 2014 18:05:29 GMT -8
Apart from going against the generalist player ideal, there is another issue related to storing state between matches.
Player strength keeps oscillating up and down due to changes in the player's offline database. It goes up when the player plays a game for the second time, and goes down when it sees a new game. It is a lot harder to measure your player's strength with this happening, as it adds a lot of noise to the Agon skill rating. Did its rating go up because of the last improvement you made? Or did the last improvement backfire, but the rating still went up only because the offline database is bigger?
In the long run, it hurts the evolution of your player.
|
|
|
Post by Claus on Jun 16, 2014 8:30:37 GMT -8
The straightforward, albeit painful, solution to confounding the effects of the offline database with the algorithm changes is to reset the offline database after each algorithm change. The painful part is when the offline database represents hundreds or thousands of past games, which makes it tempting to make a risk assessment of whether the change is significant enough to make the database reset worthwhile in terms of the design knowledge gained.
The other interesting point here is that the training depends not only on the games used, which are static or change slowly even when using a facility like Tiltyard, but also on the players, which are subject to greater change depending on who else is testing players. Training is difficult to control unless you have access to many strong players and the resources to host the players and a game master of your own.
This relates to the general challenge in machine learning that the correct answers need to be known for the training and testing sets used with the algorithm before applying it to the larger universe of data where the answer is not known (consider speech recognition). At first blush, we seem to have it easy in GGP because we always get to terminal scores, which seem like an answer. In fact, this is an approximation because it depends on the "strength" of the opponent, and is reflected in the statistics collected by Tiltyard to assess how much weight to assign to the game outcomes in rating the players. The only games with known answers are those, like tic tac toe or connect four, that have been completely solved to show that the game is always a draw, a win for a specific player, or, more generally, a specific score for each player in a perfectly played game.
This puts us into an advanced area of machine learning where algorithms "learn" on data sets where the correct answers are not known. These algorithms may use a small "bootstrap" training set with known answers, followed by a larger training set with unknown answers to refine the parameters. A separate testing set is then used for evaluating the algorithm, but the evaluation is based on statistical measurements of quality. Recommendation algorithms fall in this area.
|
|
|
Post by cat on Jul 22, 2014 23:48:38 GMT -8
That's a very interesting question. Recently I was thinking about evading the Tiltyard obfuscation, but without further changes to our player it would just save us a few seconds of compilation time, so there is not much to gain.
On the one hand, everyone who said that every game should be treated as if it were seen for the first time is right. It doesn't distort the outcomes or skill ratings, and it can be seen as more 'fair' play. It is also closer to the idea of GGP, and the main research goal should be to construct the best algorithm for that scenario.
However, in that case we omit a quite interesting question: what could one actually gain from using knowledge of past games, and how difficult is it in practice to recognize the same game? The answer to the second question can involve tricks as sophisticated as defining some 'normal form' of the GDL specification which gives the best chance of recognizing that two different descriptions encode the same set of rules. What can be gained from memorizing games is, I suppose, also a non-trivial question. We can remember some nodes from the MC tree, but which nodes? They may be of no use if the game goes in a different direction. And if we want to store as many as possible, is that not too much information? Can such knowledge be used only for this particular game, or also for games with a similar structure?
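One possible answer to "which nodes?" is to persist only heavily visited MC-tree nodes, keyed by an order-independent hash of the state's fact set. This is only a sketch under my own assumptions (the facts, threshold, and class below are hypothetical, not how any existing player does it):

```python
import hashlib

def state_key(state_facts):
    """Canonical key for a state: order-independent hash of its fact set."""
    return hashlib.sha256("|".join(sorted(state_facts)).encode()).hexdigest()

class NodeStore:
    """Persist statistics of selected MC-tree nodes between matches.

    Only nodes with many visits are kept: they carry the most reliable
    statistics and are the most likely to be reached again, which keeps
    the offline database small.
    """
    def __init__(self, min_visits=100):
        self.min_visits = min_visits
        self.table = {}

    def record(self, state_facts, visits, avg_score):
        if visits >= self.min_visits:
            self.table[state_key(state_facts)] = (visits, avg_score)

    def lookup(self, state_facts):
        return self.table.get(state_key(state_facts))

store = NodeStore(min_visits=100)
store.record({"(cell 1 1 x)", "(control o)"}, visits=500, avg_score=62.0)
store.record({"(step 1)"}, visits=5, avg_score=50.0)  # too few visits: dropped

# The key ignores fact order, so the same state is found again next match.
print(store.lookup({"(control o)", "(cell 1 1 x)"}))  # (500, 62.0)
print(store.lookup({"(step 1)"}))                     # None
```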
To summarize, I want to say that while it is fine that the main trend is focused on developing the best algorithms for new games, research and experiments on learning from previously seen games should not be completely abandoned.
|
|
|
Post by Andrew Rose on Aug 13, 2014 10:41:47 GMT -8
Sorry - it's a long time since I've been on these boards. I'm the author of SgianDubh.
The GDL isomorphism test I implemented was very simple indeed - because the only problem it had to overcome was alpha-renaming as done by the Tiltyard. It couldn't even cope with re-ordering, but it didn't need to. If there were servers which arbitrarily re-ordered the rules, I could easily extend it to deal with that. And yes, whilst sub-graph isomorphism is NP-hard in general, it's often straightforward to determine that two games aren't the same in practice. (Even a very simple test on the number of symbols in the GDL description is enough to split all the games on the Tiltyard rotations into very nearly as many classes as there are games.)
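A symbol-count pre-filter of this kind might look like the sketch below (the statistics chosen here are my guess at a reasonable test, not SgianDubh's actual implementation): any statistic invariant under consistent renaming can rule out non-matches cheaply.

```python
import re
from collections import Counter

def gdl_fingerprint(gdl_text):
    """Statistics that survive consistent renaming of symbols.

    Alpha-renaming preserves the total symbol count, the number of
    distinct symbols, and the multiset of symbol frequencies, so two
    descriptions with different fingerprints cannot be renamings of each
    other.  Equal fingerprints still require a full equivalence check.
    """
    symbols = re.findall(r"[^\s()]+", gdl_text)
    freqs = tuple(sorted(Counter(symbols).values(), reverse=True))
    return (len(symbols), len(set(symbols)), freqs)

a = "(<= (legal ?p (mark ?x)) (true (cell ?x b)))"
b = "(<= (legal ?q (move ?z)) (true (sq ?z e)))"   # alpha-renamed variant of a
c = "(<= (legal ?p noop) (true (cell 1 b)))"       # a different rule

print(gdl_fingerprint(a) == gdl_fingerprint(b))  # True
print(gdl_fingerprint(a) == gdl_fingerprint(c))  # False
```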
I reckon that learning is a key part of artificial intelligence, which is why I added the capability to SgianDubh (now ported to Sancho). That said, it is clearly controversial enough that we disabled it for the recent competition (IGGP14).
|
|