Proposal for a small tweak to Tiltyard rating system

Steve Draper
Global Moderator

Posts: 143


Post by Steve Draper on Jul 23, 2015 12:15:38 GMT -8

The current rating system on Tiltyard (both Agon and ELO) suffers severely from relativism - that is, the rating at which any given player eventually reaches equilibrium depends on the quality of the general population of players. Consider a couple of thought experiments:

1) Large population of perfect players - no statistical bias exists in any match so all ratings asymptotically converge on 0 (i.e. - God has an ELO of 0 in this environment!)

2) Large population of always-equal-quality players, where that quality increases over time (i.e. - a perfect model of a bunch of equal research groups continually improving their players). Same result - although the absolute quality of the players is steadily improving, the ratings stay at 0.

In both environments Random has a rating that is either very low (case 1) [but not -infinity, due to small games that are strong wins or strong draws, the latter actually being more significant, since they won't discount as 'easy' games], or steadily decreasing towards the score for the first case (case 2).

Since Random is a known-quality fixed point, why not give it an (arbitrary) fixed rating (both ELO and Agon) and simply display (this is purely a rendering difference) the other players' ratings as their difference from Random? Doing this would give us a meaningful measurement of absolute improvement over time, independent of what the population as a whole is doing.

I would propose fixing Random at 0, and initializing new Players at some higher fixed score (perhaps even at some percentage of the recent population moving-average or something to speed likely convergence).
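As a rough sketch of the rendering-only change proposed above, assuming ratings are held in a plain name-to-rating mapping (the function name, player names and numbers below are made up for illustration, not Tiltyard data), the displayed value is just each player's stored rating minus Random's:

```python
def display_relative_to(ratings, anchor="Random"):
    """Return ratings shifted so that the anchor player shows as 0.

    Only the displayed numbers change; the stored ratings and all
    pairwise differences are untouched.
    """
    offset = ratings[anchor]
    return {name: rating - offset for name, rating in ratings.items()}


# Hypothetical stored ratings, for illustration only.
stored = {"Random": -250.0, "PlayerA": 40.0, "PlayerB": 110.0}
shown = display_relative_to(stored)
print(shown)
```

Because every player is shifted by the same offset, the gaps between players are exactly what they were before; only the zero point moves.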

Would this work?

qfwfq
New Member

Posts: 29


Post by qfwfq on Aug 3, 2015 21:16:10 GMT -8

If Random and Sancho played alone on Tiltyard indefinitely, in games that depend on the player's ability (not random, game-theoretic or some simultaneous games), my understanding (at least using the simplest ELO formula from Wikipedia) is that their ELO delta would increase continually without converging (or maybe converge to a huge number, as even Random can win, just with extremely small probability).

In this sense, whether the score for Random is fixed or not, Sancho would see its ELO continually increase even if its ability remains constant.
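To put numbers on this, here is a minimal sketch using the simple logistic ELO formula from Wikipedia (whether Tiltyard's ELO matches it exactly is an assumption, and the function names are mine). The gap stops moving when the stronger player's expected score equals its actual average score p, which gives a gap of 400 * log10(p / (1 - p)):

```python
import math


def expected_score(delta):
    """Simple ELO expected score for a player rated `delta` points above its opponent."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def equilibrium_delta(p):
    """Rating gap at which the expected score equals the actual average score p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


# The gap grows without bound as p approaches 1.
for p in (0.75, 0.99, 0.9999, 0.999999):
    print(p, round(equilibrium_delta(p)))
```

Each extra factor of ten in the odds p / (1 - p) adds 400 points to the gap, so if Random essentially never scores, the equilibrium is either enormous or, at p = 1 exactly, does not exist.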

Not sure how this would translate to the many players and the actual mix of games, but I don't think this proposal would make it possible to compare ratings from different eras.

On the other hand, if it were possible to run the current version of a good player like Sancho for a long time, on the same hardware and without improving it, I think it would be a much better reference level (with the same logic of adjusting everybody's ratings so that the reference player's rating stays constant). If another player's ability stays constant, I think its ELO delta compared to the reference player should eventually reach equilibrium.

This could work ok, until most players become so much better that they almost always win against the reference player. But at that time a new and better reference player can be selected and its score fixed.

While not perfect, fixing the rating of a reference player seems to me an interesting way to better compare scores over time. But the reference player should not be Random: its ability is so poor that it would slowly but continually push the other players up, or reach an ELO delta equilibrium that depends on the percentage of game-theoretic games played on Tiltyard (in many of these, there is no strategy to consistently beat Random) rather than on the players' abilities.

Also no idea if for Agon it would be different than ELO. Other complicating factors are the low average volume of traffic on Tiltyard and the fact that the mix of games varies over time.

steadyeddie
Junior Member

Posts: 58


Post by steadyeddie on Aug 13, 2015 15:17:59 GMT -8

As an aside, Groumf stole our rating points. Assuming he came in at 0, he stole 100 points from a closed system which totalled less than 1000. It shouldn't matter as it's all about the relative ratings. But tell that to the player who just went -ve. If I could see a reference player it'd make me happier. We could rotate the CPU cost if need be.
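The "closed system" point can be illustrated with a small sketch. It assumes the textbook ELO update with one shared K-factor (K = 32 is an arbitrary choice here, and Tiltyard's actual update may differ): whatever one player gains, the other loses, so the total never changes.

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """Apply one textbook ELO update; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    change = k * (score_a - expected_a)
    # The same amount is added to one player and removed from the other.
    return r_a + change, r_b - change


# Hypothetical newcomer at 0 beating an established player at 150.
newcomer, veteran = elo_update(0.0, 150.0, score_a=1.0)
print(newcomer, veteran, newcomer + veteran)
```

Under that assumption, a newcomer who enters at 0 and ends up 100 points higher has necessarily taken those 100 points from everyone else.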

Another little tweak I'd like to see is the ELO/AGON delta on a game. When I play internet chess, ICC tells me the delta for win/lose. If I could see, when I'm playing Sancho, that it's +0.3 for a win and -0.01 for a loss (or whatever), that'd make me feel a whole lot happier about my -ve rating. But I expect there's a whole queue of enhancements.
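For the ELO half, such a preview is cheap to compute before the match starts. A minimal sketch, assuming the textbook ELO formula and an arbitrary K-factor of 32 (the function name and ratings are hypothetical, and Agon would need its own calculation):

```python
def elo_preview(my_rating, opp_rating, k=32.0):
    """Return the rating change for each possible result under the textbook ELO update."""
    expected = 1.0 / (1.0 + 10.0 ** ((opp_rating - my_rating) / 400.0))
    return {
        "win": k * (1.0 - expected),
        "draw": k * (0.5 - expected),
        "loss": k * (0.0 - expected),
    }


# Hypothetical underdog at -50 facing a strong opponent at 400.
preview = elo_preview(-50.0, 400.0)
for result, delta in preview.items():
    print(f"{result}: {delta:+.2f}")
```

For a heavy underdog the win delta is large and the loss delta is small, which is exactly the reassurance being asked for.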
