By Keith Lyons, University of Canberra
It doesn’t matter if you’re a hard-core football nut, a once-every-four-years fan or even a psychic animal – most of us speculate on the winner of the World Cup.
The 2014 competition is held in Brazil (which, incidentally, has had the most national team success in football World Cups, winning five of seven finals appearances). Will the home team have an advantage too large to overcome? Will a neighbouring nation steal Brazil’s thunder? Or will a non-South American country take the coveted cup overseas?
Let’s take a look at two types of rankings and four predictions.
First up, we have the FIFA World Ranking tables which calculate points over a four-year period by adding:
- the average number of points gained from matches during the past 12 months
- the average number of points gained from matches older than 12 months (and this depreciates yearly).
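To make that arithmetic concrete, here is a minimal Python sketch of a time-weighted total of this kind. The yearly weights (100%, 50%, 30% and 20%), the function names and the match points are all assumptions chosen for illustration; the description above only says that older results depreciate yearly.

```python
# Assumed depreciation schedule, newest 12-month window first.
# These weights are illustrative and are not taken from the article.
ASSUMED_YEAR_WEIGHTS = (1.0, 0.5, 0.3, 0.2)


def average(points):
    """Mean of a list of match points, or 0.0 if no matches were played."""
    return sum(points) / len(points) if points else 0.0


def ranking_total(points_by_year, weights=ASSUMED_YEAR_WEIGHTS):
    """Add the weighted yearly averages over the four-year period.

    points_by_year[0] holds match points from the past 12 months,
    points_by_year[1] those from 12 to 24 months ago, and so on.
    """
    return sum(w * average(pts) for w, pts in zip(weights, points_by_year))


# Hypothetical match points for one team, newest window first.
example_points = [[600.0, 400.0], [500.0], [300.0, 500.0], [200.0]]
example_total = ranking_total(example_points)
```

The point of the sketch is only that recent matches count in full while older ones fade; how the points for a single match are awarded is a separate calculation not covered here.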
[Image: Spain’s Andres Iniesta celebrates at the final whistle of the World Cup 2010 final match. EPA/Gerry Penn]
The FIFA top ten (as of May 8, 2014) comprises:
England is just outside the top ten at 11, and Australia sits at 59.
Then we have the Elo rating system. Elo ratings were developed by Hungarian-American physics professor Arpad Elo and were originally used to rank chess players (Elo was a chess master).
Ratings are determined by calculating the relative skill levels of players (or teams, in the case of football).
The system was adapted for football in 1997 to take football-specific variables into account, such as the competition in which the game was played – World Cup matches are weighted more heavily than friendlies – and home ground advantage.
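For readers who like to see the mechanics, the following Python sketch shows the standard Elo update with two football-style adjustments bolted on. The match weights (60 for a World Cup game, 20 for a friendly) and the 100-point home bonus are assumed values for illustration, and the function names are hypothetical.

```python
def expected_result(rating_diff):
    """Standard Elo win expectancy for a rating difference (team minus opponent)."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


def updated_rating(rating, opponent_rating, result, match_weight=20.0, home_bonus=0.0):
    """Return the team's new rating after one match.

    result is 1.0 for a win, 0.5 for a draw and 0.0 for a loss.
    match_weight and home_bonus are illustrative assumptions.
    """
    diff = rating + home_bonus - opponent_rating
    return rating + match_weight * (result - expected_result(diff))


# Two evenly rated teams on neutral ground: the winner gains half the weight.
after_friendly = updated_rating(1800.0, 1800.0, 1.0, match_weight=20.0)
after_world_cup = updated_rating(1800.0, 1800.0, 1.0, match_weight=60.0)

# A home team is expected to do better, so it gains less for the same win.
after_home_win = updated_rating(1800.0, 1800.0, 1.0, match_weight=60.0, home_bonus=100.0)
```

Because the gain depends on the gap between the actual and the expected result, beating a stronger opponent moves a rating further than beating a weaker one.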
[Image: A Brazil fan at the 2010 World Cup. EPA/Daniel Dal Zennaro]
The Elo top ten (as of June 2, 2014) is:
Australia comes in last at 33 (even though there are 32 teams in the World Cup, Serbia – ranked 26 – isn’t competing).
1. Just the stats
Goldman Sachs last month published their thoughts on the winning team – their fifth such book of World Cup predictions. Their approach includes:
- a stochastic model of the outcomes for each of the 64 World Cup games
- a regression analysis of all full international games from 1960 (using goals scored)
- difference in Elo rankings between both teams (a figure they consider “the most powerful variable in the model”)
- a country-specific dummy variable relating to World Cup play
- home advantage (country and continent)
- a Monte Carlo simulation with 100,000 draws.
Note that the Goldman Sachs model “does not use any information on the quality of teams or individual players that is not reflected in a team’s track record” and the approach is “purely statistical” – in other words, injuries have no bearing on their predicted outcomes.
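The Monte Carlo step can be illustrated with a small Python sketch. It is not Goldman Sachs’ model: it simply assumes each team’s goals follow a Poisson distribution around an expected number of goals (in their approach that figure would come from the regression), and the expected-goal values and function names below are hypothetical.

```python
import math
import random


def poisson_draw(rng, mean):
    """Sample a goal count from a Poisson distribution (Knuth's method)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_match(mean_goals_a, mean_goals_b, draws=100_000, seed=0):
    """Estimate win/draw/loss probabilities by repeated random draws."""
    rng = random.Random(seed)
    tally = {"a_wins": 0, "draw": 0, "b_wins": 0}
    for _ in range(draws):
        goals_a = poisson_draw(rng, mean_goals_a)
        goals_b = poisson_draw(rng, mean_goals_b)
        if goals_a > goals_b:
            tally["a_wins"] += 1
        elif goals_a < goals_b:
            tally["b_wins"] += 1
        else:
            tally["draw"] += 1
    return {outcome: n / draws for outcome, n in tally.items()}


# Hypothetical expected goals: 2.0 for team A and 1.0 for team B.
probabilities = simulate_match(2.0, 1.0, draws=20_000)
```

Repeating the draw many times turns a single expected scoreline into a spread of outcomes, which is why a “purely statistical” model can still put a probability on an upset.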
They predict that Brazil will be victorious over Argentina in the final, 3-1.
Goldman Sachs’ predictions have, though, been subject to criticism. James Grayson, for example, argues in a blog post that Goldman Sachs’ assumptions are too biased towards Brazil.
2. Based on strength
Sports media company Infostrada, which is “developing various methodologies to forecast major sporting tournaments by implementing various techniques”, used the Elo rating system to forecast the results of the World Cup. This approach:
- is based on all historical match results from all teams
- updates the rating after each match to show current strength
- has teams gain points when winning and lose points when losing
- has teams gain more points for beating stronger opponents.
They predict Brazil, Germany, Spain and Argentina will reach the semi-finals, with Brazil beating Argentina in the final.
3. A visualisation project
Brazilian software engineer Andrew Yuan shared his World Cup predictions last week. He investigated factors “that are measurable, available and can be good indicators of a match outcome” and provided a detailed account of his methodology on GitHub.
Andrew looked at the outcomes of 13,337 FIFA official matches since 1994 involving the 2014 World Cup teams. He looked at each team’s relative position in the FIFA ranking tables, the location where the match took place (home, away or neutral venue) and the proportion of matches won.
He used logistic regression, which models the relationship between a dependent variable and one or more independent variables, and allows him to look at the fit of the model and the significance of relationships.
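As a rough illustration of the technique (not Andrew’s code), the Python sketch below fits a logistic regression by plain gradient descent. The two input variables, a ranking-points gap and a home flag, loosely echo the factors he describes, but the toy rows are invented purely for demonstration and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, steps=2000, rate=0.1):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            x = [1.0] + list(features)
            error = sigmoid(sum(w * v for w, v in zip(weights, x))) - label
            for i, v in enumerate(x):
                grads[i] += error * v
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grads)]
    return weights


def win_probability(weights, features):
    """Predicted probability that the team wins, given its features."""
    x = [1.0] + list(features)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))


# Invented toy data: (ranking-points gap / 100, 1 if playing at home else 0).
toy_rows = [(2.0, 1), (1.5, 0), (0.5, 1), (1.0, 1),
            (-0.5, 0), (-1.0, 1), (-2.0, 0), (-1.5, 0)]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = win, 0 = no win

weights = fit_logistic(toy_rows, toy_labels)
favourite = win_probability(weights, (2.0, 1))
underdog = win_probability(weights, (-2.0, 0))
```

A real analysis would use a statistics package, which also reports the goodness of fit and the significance of each variable mentioned above; this sketch only shows how the inputs become a win probability.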
In an interactive, you can see he has Brazil as his probable winner.
4. Top four prediction
David Dormagen from Freie Universität Berlin presented a very clear account last month of a simulation model he developed to predict the outcome of the 2014 World Cup.
His approach allows for the “integration of rating systems and rules where either no clear formula for a probability other than a win or loss exists or where the historical data is not enough to derive such a formula”.
In addition, he was “also able to combine the results from different rating methods with user-given weights without influencing other calculations, such as the calculation of the draw-probability, the adjustment of the win expectancy for home teams, or the calculation of the expected goals”.
After 100,000 iterations of his simulator, David identified four favourites for 2014 World Cup champions: Brazil, Spain, Argentina and Germany.
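One simple way to picture combining rating methods with user-given weights is a weighted average of each method’s win probability. The Python sketch below does just that; it is an assumption about how such a blend might look, not David’s implementation, and the method names, probabilities and weights are hypothetical.

```python
def combine_win_probabilities(estimates, weights):
    """Blend per-method win probabilities using user-given weights.

    estimates maps a rating method's name to its win probability;
    weights maps the same names to non-negative importance values.
    """
    total_weight = sum(weights[name] for name in estimates)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    blended = sum(estimates[name] * weights[name] for name in estimates)
    return blended / total_weight


# Hypothetical inputs: two rating methods that disagree slightly.
method_estimates = {"elo": 0.60, "fifa": 0.50}
user_weights = {"elo": 3.0, "fifa": 1.0}
blended_probability = combine_win_probabilities(method_estimates, user_weights)
```

Keeping the blend separate from the rest of the simulation means the weights can be changed without touching how draws, home advantage or expected goals are handled.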
Of course, there are loads of predictions out there – these are just a few. But the four different models described here have all identified Brazil as probable winners of the 2014 World Cup, and three agree on the four semi-finalists – Andrew Yuan has Portugal in his four ahead of Argentina.
What will be fascinating is whether any team can outperform their “destiny”.
My own analysis of the World Cup will be triggered by a very basic question: did the higher Elo ranked team score the first goal in the game?
The answer to that question will enable a closer look at what happened if the higher ranked team loses. My expectation is that a higher ranked team that scores first will not lose (it will win or draw). I hope to explore any negative evidence and critical incidents that lead to a counter-predictive outcome.
In 2010 I used FIFA Rankings to ascribe status to teams. Some 51 of the 64 games in the 2010 World Cup were won by the higher FIFA ranked team. The exceptions were:
- Serbia (v Ghana and v Australia)
- Cameroon (v Japan and v Denmark)
- France (v Mexico and v South Africa)
- Greece (v Korea)
- Spain (v Switzerland)
- Germany (v Serbia)
- Italy (v Slovakia)
- Denmark (v Japan)
- USA (v Ghana)
- Brazil (v Netherlands).
Eleven of these 13 defeats were in group games. USA lost to Ghana in the Round of 16 and Brazil lost to the Netherlands in the Quarter Final.
For many years, performance analysts have engaged with a process that makes a permanent record of performance that enables description, analysis, modelling and prediction. The final part of this process for me is the transformation of performance by coaches and athletes.
We have an increasing amount of insight into performance. The 2014 World Cup is a great focus for analytical effort and elegance.
A version of this article was originally published on Clyde Street.
Keith Lyons does not work for, consult to, own shares in or receive funding from any company or organisation that would benefit from this article, and has no relevant affiliations.