PSA - Understanding Glicko-derived Rating System

Discussion in 'Planetary Annihilation General Discussion' started by lokiCML, December 16, 2014.

  1. lokiCML

    lokiCML Post Master General

    Please note: I cannot say with 100% accuracy what the system does, because of the changes made by Uber Ent. This is only meant to be informative, so that people have a better understanding of what's going on with the ranking system.

    But before I start talking about the Glicko rating system: what is the purpose of a rating system? For me, it is about finding the closest, most interesting matches possible among a given set of players. It is not about giving players an incentive by showing progression or otherwise. (I do believe there should be a way to show progress, but that's another topic.) Again, it's about getting the closest matches reasonably possible. Glicko also belongs to a family of rating systems that are probabilistic ("best guess") in nature. This family includes Elo, Glicko, and TrueSkill™; of these, Glicko and TrueSkill are the Bayesian ones.

    Glicko Rating Systems

    Glicko was developed by Professor Mark Glickman as an extension of Elo. The problem he was trying to solve was the reliability of a rating, for which Elo provides no metric at all. His answer was the Rating Deviation (RD), a standard deviation in statistical terms, which measures the uncertainty in a given rating. A high RD shows that the player's rating is unreliable, either because they don't play often or because they are a beginner. A low RD shows that the rating is reliable, typically because the player competes regularly.[1]

    A player's rating changes only from game outcomes, but the RD is updated both by game outcomes and by the time that passes while the player is not playing. As a result, a player's RD always decreases with completed games and always increases while the player sits out rated games. Glicko learns more about a player the more ranked matches they play, which lowers the RD because the player's ability is better known. As time passes without ranked games, the system becomes less certain about the player's strength, and the RD increases.[1]
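A minimal Python sketch of those two RD movements, using the Glicko-1 formulas from Glickman's paper [1]. It is not Uber's implementation: the inflation constant `c` is a tuning choice for whoever runs the ladder (the value below is hypothetical), 350 is the RD of an unrated player, and `d2` (the variance implied by a period's games) is passed in rather than computed here.

```python
import math

RD_MAX = 350.0  # RD of an unrated player in Glicko-1

def rd_after_inactivity(rd: float, c: float, periods: int) -> float:
    """RD grows with every rating period spent not playing, capped at RD_MAX."""
    return min(math.sqrt(rd ** 2 + c ** 2 * periods), RD_MAX)

def rd_after_games(rd: float, d2: float) -> float:
    """RD after a rating period with games; d2 > 0 summarises those games."""
    return math.sqrt(1.0 / (1.0 / rd ** 2 + 1.0 / d2))

# Hypothetical c chosen so an RD of 50 drifts back to 350 after 100 idle periods.
c = math.sqrt((RD_MAX ** 2 - 50 ** 2) / 100)
print(round(c, 1))                               # 34.6
print(round(rd_after_inactivity(50, c, 10), 1))  # 120.4
# d2 of about 53,671 is roughly the value in Glickman's worked example [1].
print(round(rd_after_games(200, 53670.85), 1))   # 151.4
```

Because `1/rd**2 + 1/d2` is always larger than `1/rd**2`, playing can only shrink the RD, while each idle period can only grow it, up to the cap.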

    Rating changes are not zero-sum in the Glicko system. How much each player's rating goes up or down depends on both players' RDs. This is completely different from Elo, where (with the same K-factor for both players) the winner gains exactly as many points as the loser loses.[1] Glickman states:

    "The system does not conserve rating points - and with good reason! Suppose two players both have ratings of 1700, except one has not played in awhile and the other playing constantly. In the former case, the player's rating is not a reliable measure while in the latter case the rating is a fairly reliable measure. Let's say the player with the uncertain rating defeats the player with the precisely measured rating. Then I would claim that the player with the imprecisely measured rating should have his rating increase a fair amount (because we have learned something informative from defeating a player with a precisely measured ability) and the player with the precise rating should have his rating decrease by a very small amount (because losing to a player with an imprecise rating contains little information). That's the intuitive gist of my extension to the Elo system.

    On average, the system will stay roughly constant (by the law of large numbers). In other words, the above scenario in the long run should occur just as often with the imprecisely rated player losing."
    [2]
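Glickman's 1700-vs-1700 scenario can be checked numerically. Below is a minimal Python sketch of a one-game Glicko-1 update using the formulas in [1]. The RDs of 300 and 50 are assumptions standing in for "has not played in a while" and "plays constantly", and the RD-inflation step at the start of a rating period is skipped.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant q

def g(rd):
    """Discounts an opponent's result by how uncertain their rating is."""
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)

def expected(r, r_j, rd_j):
    """Expected score of a player rated r against opponent (r_j, rd_j)."""
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))

def glicko1_update(r, rd, games):
    """games: list of (r_j, rd_j, s_j); s_j is 1 for a win, 0.5 draw, 0 loss."""
    inv_d2 = Q ** 2 * sum(g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
                          for r_j, rd_j, _ in games)
    denom = 1 / rd ** 2 + inv_d2
    r_new = r + (Q / denom) * sum(g(rd_j) * (s_j - expected(r, r_j, rd_j))
                                  for r_j, rd_j, s_j in games)
    return r_new, math.sqrt(1 / denom)

# Both rated 1700; the uncertain player (RD 300) beats the precise one (RD 50).
winner_r, _ = glicko1_update(1700, 300, [(1700, 50, 1)])
loser_r, _ = glicko1_update(1700, 50, [(1700, 300, 0)])
print(f"uncertain winner: {winner_r - 1700:+.0f}")  # about +148
print(f"precise loser:    {loser_r - 1700:+.0f}")   # about -5
```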

    In the Glicko system, a player's strength is meant to be read as a confidence interval rather than a single number. The lower end of the interval is the player's rating minus twice the RD, and the upper end is the rating plus twice the RD.[1] Example from Glickman:

    "for example, if a player's rating is 1850 and the RD is 50, the interval would go from 1750 to 1950. We would then say that we're 95% confident that the player's actual strength is between 1750 and 1950. When a player has a low RD, the interval would be narrow, so that we would be 95% confident about a player's strength being in a small interval of values."
    [1]
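Glickman's example is plain arithmetic; a tiny Python helper makes it explicit and shows the interval widening as RD grows. The function name and the second call with an RD of 200 are illustrative only.

```python
def rating_interval(rating, rd, z=2.0):
    """Approximate 95% interval for a player's true strength: rating +/- z * RD.

    z = 2 follows Glickman's rule of thumb; 1.96 is the exact 95% normal quantile.
    """
    return rating - z * rd, rating + z * rd

print(rating_interval(1850, 50))   # (1750.0, 1950.0)
print(rating_interval(1850, 200))  # (1450.0, 2250.0)
```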

    Glickman explains it in more detail:

    "Each player can be characterized as having a true (but unknown) rating that may be thought of as the player's average ability. We never get to know that value, partly because we only observe a finite number of games, but also because that true rating changes over time as a player's ability changes. But we can *estimate* the unknown rating. Rather than restrict oneself to a single estimate of the true rating, we can describe our estimate as an *interval* of plausible values. The interval is wider if we are less sure about the player's unknown true rating, and the interval is narrower if we are more sure about the unknown rating. The RD quantifies the uncertainty in terms of probability..."
    [2]

    - Mark Glickman

    A rating period is a collection of games played within a span of time. Ratings can be calculated at the end of each period or on a game-by-game basis. A rating period could last anywhere from several minutes to several months and contain any number of games, at the discretion of the administrators. The ratings from the prior period are the starting point for the calculation in the next rating period. The Glicko system works best when players average 5-10 games per rating period.[1]
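A minimal Python sketch of a Glicko-1 update over one whole rating period, following [1]. The key point is that every game in the period is scored against the ratings as they stood at the start of the period, so the order of the games does not matter. The inputs are the worked example from Glickman's paper, which reports a new rating of about 1464 and a new RD of about 151.4; the RD-inflation step before the period is assumed to have been applied already.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant q

def g(rd):
    """Weight that shrinks an opponent's result when their RD is high."""
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)

def expected(r, r_j, rd_j):
    """Expected score of a player rated r against opponent (r_j, rd_j)."""
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))

def rating_period_update(r, rd, games):
    """One update for the whole period; games: list of (r_j, rd_j, s_j).

    Every game uses r and rd from the start of the period.
    """
    inv_d2 = Q ** 2 * sum(g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
                          for r_j, rd_j, _ in games)
    denom = 1 / rd ** 2 + inv_d2
    r_new = r + (Q / denom) * sum(g(rd_j) * (s_j - expected(r, r_j, rd_j))
                                  for r_j, rd_j, s_j in games)
    return r_new, math.sqrt(1 / denom)

# 1500 (RD 200) beats 1400 (RD 30), loses to 1550 (RD 100), loses to 1700 (RD 300).
games = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
r_new, rd_new = rating_period_update(1500, 200, games)
print(round(r_new), round(rd_new, 1))  # 1464 151.4
```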

    Glicko-2 was later developed as an extension of the Glicko system. Glickman explains:

    "Every player in the Glicko-2 system has a rating, a rating deviation, RD, and a rating volatility. The volatility measure indicates the degree of expected fluctuation in a player's rating. The volatility measure is high when a player has erratic performances (e.g., when the player has had exceptionally strong results after a period of stability), and the volatility measure is low when the player performs at a consistent level."
    [3]

    The volatility is recalculated at the end of each rating period, and the new volatility is then used to widen the RD before that period's games are applied, taking the place of the fixed inflation constant that Glicko-1 uses. Glickman adds: "Glicko-2 system works best when the number of games in a rating period is moderate to large, say an average of at least 10-15 games per player in a rating period. The rating scale for Glicko-2 is different from that of the original Glicko system. However, it is easy to go back and forth between the two scales." [3]
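A minimal Python sketch of one Glicko-2 rating-period update, following the steps in [3], including the iterative solve for the new volatility. It converts to the Glicko-2 scale with the constant 173.7178 and back again, which is the "back and forth" Glickman mentions. The defaults tau = 0.5 and eps = 1e-6 are assumptions (the paper suggests tau roughly between 0.3 and 1.2); Uber's actual parameters are unknown. With Glickman's example inputs it should land close to the values he reports: rating about 1464.06, RD about 151.52, volatility about 0.05999.

```python
import math

SCALE = 173.7178  # converts between the Glicko and Glicko-2 scales

def g(phi):
    return 1 / math.sqrt(1 + 3 * phi ** 2 / math.pi ** 2)

def expected(mu, mu_j, phi_j):
    return 1 / (1 + math.exp(-g(phi_j) * (mu - mu_j)))

def glicko2_update(r, rd, sigma, games, tau=0.5, eps=1e-6):
    """One rating period. games: list of (r_j, rd_j, s_j) on the Glicko scale."""
    # Convert to the Glicko-2 scale.
    mu, phi = (r - 1500) / SCALE, rd / SCALE
    opps = [((r_j - 1500) / SCALE, rd_j / SCALE, s_j) for r_j, rd_j, s_j in games]

    # Estimated variance v and estimated improvement delta.
    v = 1 / sum(g(p) ** 2 * expected(mu, m, p) * (1 - expected(mu, m, p)) for m, p, _ in opps)
    score_sum = sum(g(p) * (s - expected(mu, m, p)) for m, p, s in opps)
    delta = v * score_sum

    # Solve for the new volatility (Illinois variant of regula falsi).
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex) / (2 * (phi ** 2 + v + ex) ** 2)
                - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2
        B, f_b = C, f_c
    new_sigma = math.exp(A / 2)

    # The new volatility widens phi, then the period's games narrow it.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1 / math.sqrt(1 / phi_star ** 2 + 1 / v)
    new_mu = mu + new_phi ** 2 * score_sum

    # Convert back to the Glicko scale.
    return 1500 + SCALE * new_mu, SCALE * new_phi, new_sigma

r, rd, vol = glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(f"{r:.2f} {rd:.2f} {vol:.5f}")
```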

    Caution: Ranking systems do not care about your feelings, pride, ego, or progression. You might get butthurt.

    Pros and Cons


    Work-In-Progress

    TrueSkill: a brief

    Work-In-Progress

    A Note on Elo


    Work-In-Progress

    Summary


    Work-In-Progress

    Further Reading

    Work-In-Progress
    1. Team Liquid's version: http://www.teamliquid.net/forum/starcraft-2/142211-sc2-ladder-analysis-part-2
    2. Shannong's (Shadowera) version: http://www.shadowera.com/showthread...stem-works-and-why-high-ratings-act-strangely
    3. Antar's (Pokemon Showdown) version: http://www.smogon.com/forums/threads/everything-you-ever-wanted-to-know-about-ratings.3487422/
    4. Kurtgodden's (chess.com) version: http://www.chess.com/blog/kurtgodden/elo-to-glicko-your-rating-explained
    5. Erik's (chess.com) version: http://www.chess.com/article/view/chess-ratings---how-they-work

    Reference

    1. Glickman, M. (n.d.). Glicko Ratings. [online] Mark Glickman's World. Available at: http://www.glicko.net/glicko/glicko.pdf [Accessed 16 Dec. 2014].

    2. Vek/Glickman. (2008). FICS Help: glicko. [online] Freechess.org. Available at: http://www.freechess.org/Help/HelpFiles/glicko.html [Accessed 16 Dec. 2014].

    3. Glickman, M. (2013). Example of the Glicko-2 system. [online] Mark Glickman's World. Available at: http://www.glicko.net/glicko/glicko2.pdf [Accessed 16 Dec. 2014].
    Last edited: January 3, 2015
  2. cptconundrum

    cptconundrum Post Master General

    Very nice post. I have to wonder if it would be possible to cheat the system by practicing a lot and only rarely playing ranked games. At the moment it seems hard to gain many points because I play too often.
  3. Dementiurge

    Dementiurge Post Master General

    Coincidentally (and courtesy of the other thread), I made an .html page with some JavaScript that lets you test the Glicko 2 system.

    I think what Conundrum might be experiencing is that the matchmaking locks you into a set skill level of players and won't let you go. With Glicko 2, your rating can change very quickly if you play against opponents with a significantly different rating, [strike]but against similar opponents it's almost immovable[/strike]. Edit: Actually that's not true, multiple consecutive wins/losses should propel you in a direction quite quickly.
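Dementiurge's edit can be checked with a small sketch using Glickman's Glicko-1 formulas (Glicko-1 rather than Glicko-2 only to keep it short). All ratings and RDs below are hypothetical, and each game is treated as its own rating period with no RD inflation in between.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant q

def g(rd):
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)

def expected(r, r_j, rd_j):
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))

def glicko1_update(r, rd, games):
    """One Glicko-1 update; games is a list of (r_j, rd_j, s_j)."""
    inv_d2 = Q ** 2 * sum(g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
                          for r_j, rd_j, _ in games)
    denom = 1 / rd ** 2 + inv_d2
    r_new = r + (Q / denom) * sum(g(rd_j) * (s_j - expected(r, r_j, rd_j))
                                  for r_j, rd_j, s_j in games)
    return r_new, math.sqrt(1 / denom)

# Five straight wins against (hypothetical) 1500 / RD 100 opponents.
r, rd = 1500.0, 100.0
history = [r]
for _ in range(5):
    r, rd = glicko1_update(r, rd, [(1500, 100, 1)])
    history.append(r)
gains = [after - before for before, after in zip(history, history[1:])]
print([round(x) for x in history])

# One win from 1500 / RD 100 against a much stronger vs. a much weaker opponent.
upset_gain = glicko1_update(1500, 100, [(1900, 100, 1)])[0] - 1500
easy_gain = glicko1_update(1500, 100, [(1100, 100, 1)])[0] - 1500
print(round(upset_gain, 1), round(easy_gain, 1))
```

The streak keeps pushing the rating up, though each win is worth a bit less as the RD shrinks and the player pulls ahead of the opponents; beating a much stronger opponent is worth far more than beating a much weaker one.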

    I'll see if I can expand it to do Glicko 1 as well.

  4. lokiCML

    lokiCML Post Master General

    Thank you. It's not really cheating, though, because the system was designed to work that way. :confused: That is one of the complaints about Glicko :( , most likely because people just don't know how it works. If you switched to TrueSkill, it would likely do the same thing; Microsoft Research used concepts from Glicko when they developed TrueSkill. Also, RD effectively stops players from reaching a rank and then no longer playing in order to keep it.

    Misunderstanding a system causes difficulties for both developers and players.


    tl;dr: It's the price you pay for an accurate player rating. It's not really cheating; it's what the system is designed to do. :eek:
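This is the effect cptconundrum noticed. Below is a hedged Glicko-1 sketch: two hypothetical 1600-rated players beat the same 1600 opponent, but one has sat out 10 rating periods first, so their RD has grown. The inflation constant is an illustrative choice (an RD of 50 drifts back to 350 after 100 idle periods); whether PA's ladder inflates RD this way, and how fast, is not known.

```python
import math

Q = math.log(10) / 400                      # Glicko-1 scaling constant q
C = math.sqrt((350 ** 2 - 50 ** 2) / 100)   # hypothetical RD inflation constant

def g(rd):
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)

def expected(r, r_j, rd_j):
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))

def glicko1_update(r, rd, games):
    """One Glicko-1 update; games is a list of (r_j, rd_j, s_j)."""
    inv_d2 = Q ** 2 * sum(g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
                          for r_j, rd_j, _ in games)
    denom = 1 / rd ** 2 + inv_d2
    r_new = r + (Q / denom) * sum(g(rd_j) * (s_j - expected(r, r_j, rd_j))
                                  for r_j, rd_j, s_j in games)
    return r_new, math.sqrt(1 / denom)

def rd_after_inactivity(rd, periods):
    """RD after sitting out some rating periods, capped at 350."""
    return min(math.sqrt(rd ** 2 + C ** 2 * periods), 350.0)

active_rd = 60.0
rusty_rd = rd_after_inactivity(60.0, 10)  # skipped 10 rating periods
active_gain = glicko1_update(1600, active_rd, [(1600, 60, 1)])[0] - 1600
rusty_gain = glicko1_update(1600, rusty_rd, [(1600, 60, 1)])[0] - 1600
print(round(rusty_rd), round(active_gain, 1), round(rusty_gain, 1))
```

The same win is worth more to the player who has been away, because a higher RD tells the system their rating should move further on new evidence.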
    Last edited: December 17, 2014
  5. Dementiurge

    Dementiurge Post Master General

    Made a tester that has Glicko 1 and 2, and lets you count unplayed rating periods. (I'm pretty sure the alg is correct. Glicko 1's example wasn't as comprehensive as Glicko 2's.)

    The 'cheating' is just using unranked play to artificially inflate your win:loss ratio, essentially playing with loaded dice. It seems like Glicko and Glicko 2 are very sensitive to it because they assume fair matchmaking.


  6. lokiCML

    lokiCML Post Master General

    What happens if a rating period lasts two or four months and contains roughly 30-50 or 50-100 games (more or less)? And what if the uncertainty then rises slowly during inactivity, so that, say, 12 months without playing (or some number of consecutive rating periods without playing) brings you back to full RD? At the end of each rating period the ladder would be reset.

    Do you know of any program that can process a data set of matches, perform attacks against it, and show the results compared across different rating systems such as Elo, Glicko-1/2, TrueSkill, etc.? As for 'cheating', to me it sounds more like hurt feelings or a misunderstanding than actual cheating, but who knows.
  7. Dementiurge

    Dementiurge Post Master General

    For the most part, these are questions that the tester is meant to answer.
    The real-time length of a rating period is mostly irrelevant to Glicko, though it's worth noting that with Glicko 1 it may take over a hundred rating periods before your RD is "full".
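In Glicko-1, how many idle rating periods it takes to reach "full" RD depends entirely on the inflation constant c that the ladder operator picks, so lokiCML's 12-month idea comes down to choosing c. A minimal sketch, assuming monthly rating periods, a typical active RD of 50 and the usual cap of 350 (all assumptions, not PA's settings); the function names are hypothetical.

```python
import math

RD_MAX = 350.0  # RD of a brand-new player

def c_for_horizon(typical_rd, idle_periods):
    """Inflation constant c so that typical_rd grows back to RD_MAX after idle_periods."""
    return math.sqrt((RD_MAX ** 2 - typical_rd ** 2) / idle_periods)

def periods_until_full(rd, c):
    """Idle periods before rd reaches RD_MAX, solving RD_MAX**2 = rd**2 + c**2 * t."""
    return (RD_MAX ** 2 - rd ** 2) / c ** 2

monthly_c = c_for_horizon(50, 12)  # full RD after 12 idle monthly periods
print(monthly_c)                          # 100.0
print(periods_until_full(200, monthly_c)) # 8.25
```

With a c tuned for a 100-period horizon instead, a well-established player would need about 100 idle periods to reach full RD, which matches the "over a hundred rating periods" figure above.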

    ELO, Glicko and TrueSkill use such different distributions for scoring that they're probably incomparable.
  8. lokiCML

    lokiCML Post Master General

    Fair enough, but for what I have in mind I need a real data set in order to run the attacks.
    For Elo, the distribution used is logistic (USCF) or normal (FIDE). Glicko-1/2 use a logistic distribution and TrueSkill uses a normal distribution, though TrueSkill could use a logistic one and the same goes for Glicko using a normal one. These are implementation details, so it depends. A rating is only meaningful relative to a given set of players. It's not the ratings themselves I am interested in. What can a person or a group do to the overall ratings, and how does that affect these ranking systems?

    - http://research.microsoft.com/en-us/projects/trueskill/faq.aspx

    Edit: MS Research went with a normal distribution for TrueSkill because it is less computationally intensive than a logistic one, and both give roughly the same distribution of rankings.
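To see how close the two curves are, here is a small sketch comparing the expected score for a given rating gap under a logistic curve (the base-10, 400-point form used by USCF-style Elo and by Glicko) and under a normal curve. The per-player standard deviation of 200 in the normal version is an assumption taken from Elo's original normal model; with it, the two curves differ by less than 0.02 in expected score for gaps up to 800 points.

```python
import math

def logistic_expected(diff):
    """Expected score for a rating advantage of diff, logistic 400-point form."""
    return 1 / (1 + 10 ** (-diff / 400))

def normal_expected(diff, sigma=200.0):
    """Expected score if each player's performance ~ Normal(rating, sigma**2).

    The performance difference then has standard deviation sigma * sqrt(2).
    """
    z = diff / (sigma * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for diff in (0, 100, 200, 400, 800):
    print(diff, round(logistic_expected(diff), 3), round(normal_expected(diff), 3))
```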
    Last edited: December 31, 2014
  9. g0hstreaper

    g0hstreaper Well-Known Member

    I'm sorry if this sounds rude, but may I have a TL;DR version? (I see a lot of writing and go glassy-eyed.)
  10. lokiCML

    lokiCML Post Master General

    TLDR version is To Be Continued...;)
  11. g0hstreaper

    g0hstreaper Well-Known Member

    awwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
  12. lokiCML

    lokiCML Post Master General

    Added a summary to OP. (To Be Continued...):)
  13. g0hstreaper

    g0hstreaper Well-Known Member

    It's okay, I actually went back and read it, and overall it's a very informative post about the ranking systems and how the original Elo system was formed, which is very well developed, and now

    The more you know :D
  14. aapl2

    aapl2 Active Member

    It's not a book, just a few paragraphs. Just read it.
  15. g0hstreaper

    g0hstreaper Well-Known Member

    I did say I went back and read it didn't I? :l
  16. aapl2

    aapl2 Active Member

    I was on an old version of the thread from before you posted that; it only showed your post after I replied.
    ¯\_(ツ)_/¯
