Wednesday, May 28, 2014

Predicting the World Cup

Using ELO ratings, we can predict how far each team will make it in the 2014 World Cup.

Probability of each team reaching the Round of 16 (Ro16), Quarterfinals (QtrF), Semifinals (SemiF), and Final, as well as the probability of winning the World Cup (Champ). Each value is the probability of reaching that stage, so a team's values do not sum to 100% (e.g. Brazil has probabilities of 94%, 65%, 52%, 38%, and 26% of reaching Ro16, QtrF, SemiF, Final, and Champ).

ELO is a ratings system originally devised for chess players. Each player (or in this case, each soccer team) has a numerical rating. When two teams play, the winner takes some points from the loser. The number of points exchanged depends on the relative ratings of the two teams: beating a much stronger team earns more points than beating a much weaker one. I've used the ratings from www.eloratings.net, which uses data from www.world-results.net to track the results of every international soccer match and calculate a current ELO rating for each national team.
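
To make the point exchange concrete, here is a minimal sketch of the textbook ELO formulas. The ratings and the K-factor of 40 are hypothetical values chosen for illustration; www.eloratings.net applies its own weighting, which is not reproduced here.

```python
def expected_score(rating_a, rating_b):
    """Expected score of team A against team B (standard ELO formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=40.0):
    """Return both ratings after one match.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k (the K-factor) is an assumed value, not the one used by eloratings.net.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total number of points is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical ratings, for illustration only.
favorite, underdog = 2000.0, 1800.0
print(expected_score(favorite, underdog))  # favorite's expected score
print(update(favorite, underdog, 1.0))     # favorite wins: few points change hands
print(update(favorite, underdog, 0.0))     # upset: many points change hands
```

With these numbers the favorite is expected to score about 0.76, so an expected win moves far fewer points than an upset does.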

Based on the ELO rating of two teams, I calculate the probability of either team winning.  ELO is somewhat limited, in that it can only predict a binary outcome (win or loss), whereas in the World Cup group stages, there are 3 outcomes (win, loss, draw).  Luckily, Lars Schiefler at www.clubelo.com has come up with a model that uses ELO ratings to predict the number of goals scored by each team.  I've used that model to predict the outcomes of the group stages, and I use the regular ELO system to calculate the win probabilities of the knockout rounds.
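
To show how a goals-based model produces a draw probability at all, here is a minimal sketch. It is not the clubelo.com model: it assumes each team's goals follow an independent Poisson distribution, and the expected-goal values passed in are hypothetical. How ratings map to expected goals is left out, since that is specific to the model I used.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_a, lam_b, max_goals=15):
    """Win/draw/loss probabilities for team A, assuming independent Poisson goals.

    Scorelines above max_goals per team are ignored; their probability is negligible
    for realistic expected-goal values.
    """
    win = draw = loss = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, lam_a) * poisson_pmf(goals_b, lam_b)
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical expected goals: 1.8 for the stronger team, 1.0 for the weaker one.
print(outcome_probs(1.8, 1.0))
```

Summing over every scoreline gives all three outcomes at once, which is what the group stage needs.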

Using these probabilities, I performed 100,000 random simulations of the entire tournament.  The full summary is in the figure above.  Here's the breakdown by group of the probabilities (in %) for each team.
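
For readers curious about the mechanics, here is a minimal sketch of the simulation idea, restricted to a knockout bracket. The team names and ratings are hypothetical, the bracket size is assumed to be a power of two, and the group stage is left out entirely, so this is an illustration rather than the code behind the figures.

```python
import random
from collections import Counter


def win_prob(rating_a, rating_b):
    """Probability that A beats B under the standard ELO formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play(team_a, team_b, ratings, rng):
    """Simulate one knockout match and return the winner."""
    p = win_prob(ratings[team_a], ratings[team_b])
    return team_a if rng.random() < p else team_b


def simulate_bracket(bracket, ratings, n_sims=100_000, seed=0):
    """Estimate how often each team wins at least `wins` knockout matches.

    bracket lists the teams in draw order (neighbors meet first).
    Returns a dict mapping (team, wins) to an estimated probability.
    """
    rng = random.Random(seed)
    reached = Counter()
    for _ in range(n_sims):
        alive = list(bracket)
        wins = 0
        while len(alive) > 1:
            # Pair off neighbors; the winners form the next round.
            alive = [play(alive[i], alive[i + 1], ratings, rng)
                     for i in range(0, len(alive), 2)]
            wins += 1
            for team in alive:
                reached[(team, wins)] += 1
    return {key: count / n_sims for key, count in reached.items()}


# Hypothetical four-team bracket: A plays B, C plays D, winners meet in the final.
ratings = {"A": 2000.0, "B": 1700.0, "C": 1800.0, "D": 1750.0}
probs = simulate_bracket(["A", "B", "C", "D"], ratings)
for team in ratings:
    print(team, probs.get((team, 1), 0.0), probs.get((team, 2), 0.0))
```

Each printed row gives a team's estimated chance of winning its first match and of winning the whole bracket.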

"Grp1" is the winner of the group and "Grp2" is the runner-up. RO16, QtrF, SemiF, and Final are the knockout stages, and Win is the winner of the World Cup.

Finally, I wanted to look at how group selection affected each team's chances. I did a regression between the ELO rating of each team and its probability of advancing to the knockout round. I've highlighted teams that stray from the regression line: teams above the line have a higher chance of advancing than their rating would suggest, and teams below the line have a lower chance.
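
The idea can be sketched as follows, assuming a straight-line least-squares fit (the simplest choice, and an assumption on my part here). The team labels, ratings, and advancement probabilities below are made up for illustration and are not the values from the simulation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(teams, ratings, advance_probs):
    """Distance of each team from the fitted line.

    Positive: better chance of advancing than the rating alone suggests.
    Negative: worse chance, e.g. because of a tough group.
    """
    slope, intercept = fit_line(ratings, advance_probs)
    return {team: prob - (slope * rating + intercept)
            for team, rating, prob in zip(teams, ratings, advance_probs)}


# Made-up example data (rating, probability of advancing in %).
teams = ["T1", "T2", "T3", "T4", "T5"]
ratings = [2100.0, 1950.0, 1850.0, 1800.0, 1650.0]
advance = [92.0, 70.0, 75.0, 35.0, 20.0]

for team, res in sorted(residuals(teams, ratings, advance).items(),
                        key=lambda item: item[1]):
    print(team, round(res, 1))
```

Teams at the top of the printed list sit furthest below the line, which is where an unlucky draw would show up.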



You can see that the USA and Ghana will have a hard time due to their match-up with Germany and Portugal.  Chile is also in a tough group, having to face Spain and the Netherlands.  Belgium and Russia have an easy go of it, being matched in Group H with Algeria and South Korea.  I was surprised that France didn't show up in this analysis, but then again, they are only ranked 12th in the world according to ELO, and Ecuador is a better team than a lot of people realize.

3 comments:

  1. Did you apply this model to the last championship? Any agreement between reality and the model?

  2. Good question. I used the ELO rankings from May 2010 to run the same simulation of the 2010 World Cup.

    http://i.imgur.com/7RDY4QA.png

    The blue bars show how far each team made it in the tournament, and the numbers again show the probability of reaching each stage. Qualitatively there is pretty good agreement, although it's tough to say with such a small number of games. I've found ELO data for 2006 and 2002, so I might do something more in-depth with that.

  3. The 2010 groups also seemed to be quite a bit more defined in terms of potential progression to Ro16. The Ro16 results also seem reasonably in line with predictions. You might want to add a modifier for England always underperforming. :)
