Post by Steve Draper on Jan 9, 2014 18:52:17 GMT -8
A couple of days ago my attention was drawn to Alex's excellent blog, and in particular this post about performance in Chinese Checkers: alloyggp.blogspot.com/2013/12/improving-chinese-checkers-performance.html. Coincidentally, about two days earlier I had noticed that performance in 6-player Chinese Checkers was very bad (Tiltyard having drawn me into 6-player games four times in one night). Alex's analysis made the nature of the problem fairly obvious, but I wanted an automated way to address it, without changing the game rules as published on Tiltyard.
I spent some time looking at the propnets the current rules generate, and found that the main feature was a bunch of large ORs whose inputs were small ANDs, typically featuring a fixed base prop and different move props (in the logic for determining whether a cell stays the same, which, as Alex pointed out, ends up being most of the network). In the general form, EVERY AND feeding a particular large OR shares at least one (and in some games several) common input(s). In such cases the ANDs can be distributed over the OR, moving the common factor to a single AND after the OR (provided the ANDs ONLY feed that OR, and their outputs are not used in any other way).
This is not hard to detect by post-generation analysis of the propnet, so I wrote some code to detect it and apply the transformation automatically. The results vary from nothing (games whose descriptions avoid nasty disjuncts, such as Breakthrough), through significant (e.g. Connect5, where performance roughly doubles), to downright amazing (6-player Chinese Checkers, where performance improves by a factor of 35 and the component count is reduced by 80%). I have since refined the code slightly to apply the change only in definite win cases (at least by component count), and that version is now running in the latest rev of Sancho (aka Quixote) on Tiltyard.
It has been regression-tested in the state machine validator across the entire repository, so I don't expect functional issues, though it may turn out to hurt performance in some edge cases. The next step is to see whether slightly more general analogous refactorings of the propnet (distributing ORs over large ANDs, for instance) provide any significant further benefit in extant games.
One final observation, which should yield further improvements: in the context of a well-formed game, you can be certain that a player always makes exactly ONE move each turn, and with that observation any OR whose inputs consist only of move props can be replaced by an AND of the NOTs of all the OTHER move props. So if an OR takes inputs from more than half of the move props in the game, the network can always be reduced by this DeMorgan-style transformation to an AND (it might be worth adding an inverted-inputs AND as a component explicitly, to remove the overhead of separate components for the NOTs in that case too).
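The move-prop observation can be sanity-checked in a few lines (names hypothetical; this relies on exactly one of a role's does props being true once a joint move is chosen):

```python
def complement_or_inputs(or_moves, all_moves):
    """Replace OR(subset of a role's move props) by an AND of the
    negations of the remaining move props. Valid because exactly one
    of a role's move props is true in any state with a move applied."""
    return [m for m in all_moves if m not in or_moves]

# hypothetical role with five moves; an OR over three of them
all_moves = ['m1', 'm2', 'm3', 'm4', 'm5']
or_moves = ['m1', 'm2', 'm3']
others = complement_or_inputs(or_moves, all_moves)
```

Checking every possible chosen move confirms `chosen in or_moves` always equals `all(chosen != m for m in others)`, which is why the OR can be swapped for the (smaller) inverted-inputs AND whenever the OR covers more than half of the role's moves.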
Post by Sam Schreiber on Jan 9, 2014 23:16:06 GMT -8
Great job! I highly recommend this kind of optimization.
Post by qfwfq on Jan 10, 2014 11:28:41 GMT -8
Good idea about the legal moves!
Another case where a big OR (related to a 'distinct' in the GDL) could be replaced by a much smaller NOT is when the OR inputs are most of the values of a counter. I found this some time ago when looking at EightPuzzle, but have not had time to check whether there are any other games with this structure, yet.
Post by Steve Draper on Jan 10, 2014 16:05:15 GMT -8
As an FYI: I have now implemented a generic duplicate-logic remover, which replaces any duplicated collections of components that are computing the same thing (rewiring anything they fed to the single remaining instance). It has only a small effect on most games (most don't generate much, if any, duplication), but a few showed significant benefit (Checkers about 10%, Qyshinsu a 100% speedup, Connect5 50%-ish).
Post by alandau on Jan 11, 2014 14:13:01 GMT -8
Nice work! The OPNF (OptimizingPropNetFactory) actually tries to do something similar at the GDL level, before we even start building out the graph. (Obviously, it's not perfect.) The gist is that it transforms rules that look like this:
(<= (baz ?a ?b) (foo ?a ?b) (bar ?b ?c))
into rules that look like this:
(<= (baz ?a ?b) (foo ?a ?b) (bar_tmp0 ?b)) (<= (bar_tmp0 ?b) (bar ?b ?c))
This is handled by the (extremely crufty) CondensationIsolator class.
Post by Steve Draper on Jan 11, 2014 16:35:12 GMT -8
In practice the distribution of ANDs over large ORs was by far the most effective. The transforms I've added (with their effectiveness described non-numerically, though if anyone wants to suggest a few games they'd like figures for, I can easily provide them) are:
1) Distribute common-factor ANDs through large ORs, as mentioned in previous posts. Significantly effective in several games.
2) Distribute common-factor ORs through large ANDs. Appears never to occur in any generated propnet I have seen so far, so not at all effective!
3) Replace ANDs and ORs with large numbers of inputs from NOTs (currently only the all-NOTs case, but I'll generalize a bit sometime) with the DeMorgan converse (AND(NOT(InputSet)) == NOT(OR(InputSet))). Somewhat effective (up to about a 10% component reduction) in a relatively small number of games.
4) For ANDs or ORs whose inputs include more than half of any role's DOES props, replace them with the complementary set of DOES props, using DeMorgan's again. I expected this one to be quite effective, but it turns out it really isn't in almost any game in the repository.
5) Generate a signature for the output of every gate based on the signatures of its inputs (with Transitions not counting as inputs, and input-less components getting a randomized signature). This hash represents what the component calculates, so all duplicates can be removed (with some checking for hash collisions). Modestly effective (though not hugely so) in a fair number of games.
Right now that's all the obvious ones I can think of.
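A minimal sketch of point 5, with blake2b standing in for whatever 64-bit hash is actually used, and a toy dict representation of the network (all names are mine, for illustration):

```python
import hashlib
import random

MASK = (1 << 64) - 1

def build_signatures(components, seed=0):
    """components: dict name -> (gate_type, [input names]), acyclic.
    Input-less components get a random signature; a gate's signature
    hashes its type salt together with the SUM of its inputs'
    signatures (sum rather than xor, so related inputs can't cancel)."""
    rng = random.Random(seed)
    type_salt = {}
    sigs = {}

    def sig(name):
        if name not in sigs:
            gate_type, inputs = components[name]
            if not inputs:
                sigs[name] = rng.getrandbits(64)
            else:
                salt = type_salt.setdefault(gate_type, rng.getrandbits(64))
                total = (salt + sum(sig(i) for i in inputs)) & MASK
                digest = hashlib.blake2b(total.to_bytes(8, "little"),
                                         digest_size=8).digest()
                sigs[name] = int.from_bytes(digest, "little")
        return sigs[name]

    for name in components:
        sig(name)
    return sigs

def find_duplicates(components):
    """Group components by signature; every group member beyond the
    first can be removed and its outputs rewired (collisions aside)."""
    groups = {}
    for name, s in build_signatures(components).items():
        groups.setdefault(s, []).append(name)
    return [g for g in groups.values() if len(g) > 1]
```

Because the input combination is a commutative sum, AND(a, b) and AND(b, a) get identical signatures without any sorting step.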
Post by Steve Draper on Jan 15, 2014 9:03:29 GMT -8
I've found a couple more useful tweaks along these lines:
- Factor component logic where large fanouts occur. If your propnet does forward propagation (as mine does), then a large fanout from a component is a performance issue, because even when the outputs of most successor components will not change (due to their other inputs), having to loop over and process them takes time proportional to the fanout. If large subsets of the components fed by the large fanout themselves have other common factors, it is often beneficial to move the factor up into a single separate component, which then becomes the one with the large fanout but changes state less often.
- More substantially, but only for a minority of games: goal evaluation is only performed in terminal states (typically at the end of playouts), so the goal logic is not needed while playing the game through its non-terminal states. This allows us to remove the goal-calculation logic entirely from the propnet used for playouts and use a separate cloned net (which can itself have everything BUT the goals trimmed out) for terminal goal evaluation. The result is a slimmed-down network with which most of the game is played, and a completely independent propnet for goal evaluation (used once per playout instead of once per move). Most games don't benefit hugely from this (goal logic tends to be largely common with terminality logic, either inherently or due to the way the GDL is written), but some games benefit dramatically. In Reversi the goal-less network optimizes down to under 4,000 components, whereas the goal-only one is over 15,000! Overall Reversi performance goes up about 300% with this optimization.
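The playout flow can be reduced to a sketch, with both "nets" modelled as plain callables rather than real propnets (all names are stand-ins, not Sancho's API):

```python
import random

def playout(state, step, is_terminal, evaluate_goals, rng):
    """Drive a playout with the slimmed-down net (step / is_terminal);
    hand only the terminal state to the separate goal-only net."""
    while not is_terminal(state):
        state = step(state, rng)
    return evaluate_goals(state)

# toy 'game': a counter that terminates at 5; the goal net scores 100
goal_calls = []

def goal_net(state):
    goal_calls.append(state)   # track how often the goal net runs
    return 100

score = playout(
    state=0,
    step=lambda s, rng: s + rng.choice([1, 2]),
    is_terminal=lambda s: s >= 5,
    evaluate_goals=goal_net,
    rng=random.Random(1),
)
```

The point of the split is visible in the counter: `step` runs once per move of the playout, while `goal_net` runs exactly once.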
Post by qfwfq on Jan 15, 2014 11:50:08 GMT -8
I did the same (goal splitting) to benefit Reversi, but there are bigger optimizations I still need to do; it's hard to gauge their impact in advance. For example, splitting the net based on whose turn it is would potentially be a big saving, but it is also substantial implementation work, and there may be an extra cost due to component duplication.
In my propnet, I remove the outputs from the main net to the goals subnet, but keep the inputs from the goals to the main net, so it's easy to restart the goal computation from the terminal state.
In the past I've also tried splitting AND/ORs with too many inputs, but it didn't work out well for most games, and in any case fan-out may be more important than fan-in.
Another possible optimization I was thinking about lies between the legals and the inputs: some games have rules in which a move ("does") only has an effect if, for example, a board square is empty; but in reality it will always be empty, because the rule for legal moves checks exactly the same condition.
Post by Steve Draper on Jan 15, 2014 13:41:02 GMT -8
The whose-turn-is-it split is something I've always had, since before the Coursera finale. I actually split my net on the base prop that changes most often in a brief simulation (which in fact is almost always a 'control' proposition). One split has it hard-wired to TRUE, the other to FALSE (and both are then reduced by any redundant logic that reveals). It then chooses which machine to use based on the state of the splitting proposition in each state it is asked about.
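The hard-wiring step can be illustrated with a toy constant-folder over expression trees (the representation is mine, for illustration; a real implementation works directly on the component graph):

```python
def hardwire(expr, prop, value):
    """Substitute a truth value for one proposition throughout a small
    expression tree (('and'|'or'|'not', ...) tuples, with prop-name
    strings as leaves) and constant-fold the result."""
    if isinstance(expr, str):
        return value if expr == prop else expr
    if isinstance(expr, bool):
        return expr
    op, *args = expr
    args = [hardwire(a, prop, value) for a in args]
    if op == 'not':
        a = args[0]
        return (not a) if isinstance(a, bool) else ('not', a)
    # and/or: short-circuit on the absorbing constant, drop the identity
    absorbing = (op == 'or')
    kept = []
    for a in args:
        if isinstance(a, bool):
            if a == absorbing:
                return absorbing
        else:
            kept.append(a)
    if not kept:
        return not absorbing
    if len(kept) == 1:
        return kept[0]
    return (op, *kept)
```

A state machine built this way evaluates the splitting proposition in each queried state and dispatches to the matching specialized net (control hard-wired TRUE in one, FALSE in the other).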
Post by Andrew Rose on Jan 16, 2014 1:25:04 GMT -8
I've had a generalized redundant-logic remover since before the Coursera final (based on computing a "fingerprint" of each component from its type plus its inputs, identifying components with the same fingerprint, and then removing the duplicate and rewiring its outputs to the other). I understand your DeMorgan stuff (although I haven't implemented it myself), but I don't get the "Distribute common factor ANDs through large ORs" thing. I've read your description several times, but still don't get it.
Any chance you could try explaining it in different words and/or with a picture?
Post by Steve Draper on Jan 16, 2014 6:36:05 GMT -8
What I mean is this: suppose you have an OR gate with 100 inputs. Now suppose (simplest case) that all of those inputs are AND gates, and furthermore that every such AND has one common input. You can remove the common input from every AND, and instead AND the output of the OR with the common input (so moving the AND of that common factor 'through' the OR). In some (many, actually) cases you find that, having taken the common factor out of the input ANDs, you're left with 0 or 1 other inputs, and in either case you can remove the input AND component entirely, though the transformation is worthwhile in most cases regardless.
Interesting that your fingerprinting technique sounds like essentially the same thing I wound up with in my signature mechanism (which I now use to remove redundant logic). What checksums did you use, and what mechanism for combining the fingerprint values of inputs? Do you check for hash clashes, or just trust the uniqueness of your fingerprints? In my case I use a 64-bit checksum (xxhash - see code.google.com/p/xxhash/) of the gate type (each type of component has a randomized 64-bit value) and the input signature, where the input signature is the sum (I started with xor, but that failed miserably, for on-reflection-obvious reasons) of the signatures of all the input components. Components with no inputs, and transitions, are given randomized values.
At first I checked for hash clashes, with some sanity-check logic verifying that the gate pattern really did match (i.e. that it was not a hash collision), but since moving from 32-bit to 64-bit hashes I have never again seen a collision, and have now disabled the checking code, since it adds quite a lot of overhead (assuming a good hash, the probability of a collision in a 64-bit space, given the size of propnets we have, is ignorable).
Post by Andrew Rose on Jan 16, 2014 11:13:32 GMT -8
Thanks Steve, that's crystal clear now. I see how that could have a good effect for some games.
My fingerprinting was very simple (and moderately inefficient, but not to the extent that it caused a problem). A fingerprint was a string, along the lines of...
OR: AND@67192, AND@81327, Prop@23410, ...
I produced the fingerprint for each component and stored it in a hashmap from fingerprint -> component. When I found that a fingerprint was already in the hashmap, I merged the components.
Now that I write it out, I'm concerned that there might be a bug whereby I don't sort the inputs to a component (and identical components could therefore end up with different fingerprints). I'll look into that, but the basic idea is sound.
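For what it's worth, the sorting fix can be folded straight into the fingerprint, so that input order can never split identical components (a sketch using hypothetical string ids in place of real components):

```python
def fingerprint(component_type, input_ids):
    """Build an order-independent fingerprint string for a gate by
    sorting the inputs, so AND(a, b) and AND(b, a) collapse together."""
    return component_type + ": " + ", ".join(sorted(input_ids))

def merge_duplicates(components):
    """components: dict id -> (type, [input ids]). Returns a map from
    each id to the canonical id sharing its fingerprint (itself if it
    is the first of its kind seen)."""
    seen = {}
    canon = {}
    for cid, (ctype, inputs) in components.items():
        fp = fingerprint(ctype, inputs)
        canon[cid] = seen.setdefault(fp, cid)
    return canon
```

Anything mapped to an id other than its own is a duplicate whose outputs can be rewired to the canonical instance.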