The goal of the Big Data FC project is to predict how many points a football team from one of the main European leagues will have at the end of the season, based on the characteristics of its players.
To reach this goal, player data will be loaded first and used to compose the football teams. A second dataset will then be used to gather the seasonal rankings of every team.
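The two steps above (composing teams from player data, then attaching seasonal rankings) can be sketched in plain Python. This is only an illustrative sketch, not the notebook's actual pipeline: the field names (`overall`, `age`), the sample values, and the helpers `compose_teams` and `build_dataset` are all hypothetical, and averaging is just one possible way to summarize a squad.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical player records: field names and values are illustrative only.
players = [
    {"team": "A", "season": 2019, "overall": 80, "age": 25},
    {"team": "A", "season": 2019, "overall": 70, "age": 29},
    {"team": "B", "season": 2019, "overall": 60, "age": 22},
    {"team": "B", "season": 2019, "overall": 64, "age": 30},
]

# Hypothetical seasonal rankings: final points keyed by (team, season).
rankings = {("A", 2019): 75, ("B", 2019): 40}


def compose_teams(players):
    """Group players by (team, season) and summarize their attributes."""
    squads = defaultdict(list)
    for player in players:
        squads[(player["team"], player["season"])].append(player)
    return {
        key: {
            "mean_overall": mean(p["overall"] for p in squad),
            "mean_age": mean(p["age"] for p in squad),
            "squad_size": len(squad),
        }
        for key, squad in squads.items()
    }


def build_dataset(players, rankings):
    """Pair each team's features with its end-of-season points (the target)."""
    teams = compose_teams(players)
    return [
        {"team": team, "season": season, **features,
         "points": rankings[(team, season)]}
        for (team, season), features in teams.items()
        if (team, season) in rankings  # keep only teams with a known ranking
    ]


dataset = build_dataset(players, rankings)
```

Each row of `dataset` then holds one team-season, with player-derived features as inputs and the final points as the value to predict.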
The project as a whole is composed of:
- This notebook, containing all steps of:
  - Data loading.
  - Data cleaning and pre-processing.
  - Data visualization.
  - Data analysis.
  - Learning and evaluation.
- A custom scraper, to gather further player data.
- A set of REST APIs to query the loaded data and the prediction model.
- The collection of scraped datasets.
During the project, multiple approaches and techniques were explored; they are described in this notebook.
The notebook follows the order in which the work was carried out during development:
- Notebook set-up and configuration
- Data loading and pre-processing
- Preliminary data exploration
- Multiple learning attempts:
  - Naive
  - Dimensionality reduction
  - Learning-produced features (via clustering)
  - Prior-based approach (RP coefficient)
- Final observations and conclusion
By Daniele Solombrino and Davide Quaranta.