REGRESSION MODELING
PREDICTING THE MEDAL TABLE OF THE SUMMER GAMES
Next year, on July 24th, 2020, an expected 11,091 athletes from 206 nations will gather in Tokyo, Japan for the opening ceremony of the Games of the XXXII Olympiad. They will compete for gold, silver and bronze medals in 339 events across 33 sports, honoring the long-standing tradition of the modern Olympic Games, which began in Athens, Greece in 1896.
As with many international sporting mega-events, professional forecasters and enthusiastic fans enjoy predicting the outcome of the Olympic Games. The national medal table is a common metric for quantifying the overall performance of each country, aggregating the number of gold, silver, bronze and total medals collected by the individual athletes of each national team.
Daniel Johnson, an economics professor from Colorado College, used socio-economic data to predict national Olympic performance from 2000 to 2008. His model predicted the total medal count of each country at the Beijing 2008 Olympics with 94% accuracy, relying on per-capita income, population, political structure, climate, home-field advantage and geographic proximity.
A model of the Sochi 2014 Olympics employed economic trade information, namely the total value of national exports, as well as geographic data, such as land area and latitude. Subsequently, the Rio 2016 Olympics was modeled with similar national information, including comparative levels of national wealth along with historic performance in previous Olympic Games.
Randi Griffin posted a complete Kaggle dataset containing the records of each athlete and event from the Athens 1896 Olympic Games through the Rio 2016 Olympic Games. The dataset has 271,116 records and 15 columns. Let’s use this historic Olympic record to train our own machine learning regression model and predict the medal table of the Tokyo 2020 Olympic Games!
Continue reading the full story curated by Towards Data Science, a Medium publication...
SPORTS | MACHINE LEARNING
PANDAS
Two CSV files are available: one with athlete results and one with Olympic country identifiers (NOCs).
The 271,116 athlete records contain the following columns:
- ID - Unique number for each athlete
- Name - Athlete's name
- Sex - M or F
- Age - Integer
- Height - In centimeters
- Weight - In kilograms
- Team - Team name
- NOC - National Olympic Committee 3-letter code
- Games - Year and season
- Year - Integer
- Season - Summer or Winter
- City - Host city
- Sport - Name of the sport
- Event - Name of the event
- Medal - Gold, Silver, Bronze, or NA
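As a minimal sketch of how these columns could be turned into a national medal table with pandas, the example below aggregates medals per Games and NOC. The `medal_table` helper and the tiny in-memory sample are hypothetical and only for illustration; the real data would be loaded with `pd.read_csv`, which reads `NA` as missing by default. Because the records are per athlete, the sketch assumes that a team medal should be counted once per country, so it drops duplicate rows for the same Year, Event, NOC and Medal. This is a simplification that would also merge a rare tie between two athletes of the same country.

```python
import pandas as pd


def medal_table(athletes: pd.DataFrame, season: str = "Summer") -> pd.DataFrame:
    """Aggregate athlete records into a per-Games, per-NOC medal table (hypothetical helper)."""
    # Keep only medal-winning rows for the requested season.
    won = athletes[(athletes["Season"] == season) & athletes["Medal"].notna()]
    # Assumption: a team-event medal counts once per country, not once per athlete.
    won = won.drop_duplicates(subset=["Year", "Event", "NOC", "Medal"])
    # Count medals by type, one column per medal, zero where a country won none.
    table = (
        won.groupby(["Year", "NOC", "Medal"])
        .size()
        .unstack("Medal", fill_value=0)
        .reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
    )
    table["Total"] = table.sum(axis=1)
    return table


# Made-up sample rows that use a subset of the columns listed above.
sample = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6],
        "NOC": ["USA", "USA", "USA", "JPN", "JPN", "NOR"],
        "Year": [2016, 2016, 2016, 2016, 2016, 2014],
        "Season": ["Summer"] * 5 + ["Winter"],
        "Event": ["Team Event A", "Team Event A", "Event B", "Event C", "Event D", "Event E"],
        "Medal": ["Gold", "Gold", "Silver", "Bronze", None, "Gold"],
    }
)

table = medal_table(sample)
print(table)
```

In this sample, the two gold rows for the same team event collapse into a single gold medal for USA, and the Winter row is excluded from the Summer table.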
Contact
Acknowledgements
- Logo by The Tokyo Organising Committee of the Olympic and Paralympic Games
- Dataset by Randi H Griffin
License