This repository contains data on Test cricket matches alongside data considered relevant to the match result, e.g. the ratings of the teams when each match was played. This data is intended to provide insight into which factors are most statistically significant to the outcome of a match, and hopefully to build predictive models.
The project is currently in the very early stages of development. Processed data is only available for matches from 2003 to 2013, and data analysis is limited to a few exploratory notebooks.
First, clone this repository and cd into the main directory.
Recommended: using pipenv
This route is recommended because pipenv automatically creates a virtual environment and manages dependencies.
- First, install pipenv if it is not already installed, for example via pip install pipenv (or see the pipenv documentation for installation instructions)
- Install the package and its dependencies with pipenv install
- Use pipenv shell to activate the virtual environment
- To set up a Jupyter kernel for the environment, run python -m ipykernel install --user --name=test_cricket_stats
A requirements.txt file is also provided (generated automatically from the Pipfile) for alternative installation methods.
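Whichever installation route is used, it can help to confirm that Python is really running inside a virtual environment before registering the Jupyter kernel. The following is a minimal sketch using only the standard library; the helper name in_virtualenv is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv/virtualenv (which is what pipenv creates), sys.prefix
    # points at the environment and sys.base_prefix at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this reports that no virtual environment is active, the most likely cause is that the environment was not activated in the current shell.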
The structure of this project is based on the Cookiecutter Data Science project template.
To make the dataset, run make data from the main directory of the repository.
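If it is more convenient to trigger the build from Python (for example from a notebook), the same make target can be invoked through the standard library. This is only a sketch: the helper name make_data and its runner parameter are hypothetical, and it assumes the Makefile sits in the main directory of the repository.

```python
import subprocess
from pathlib import Path


def make_data(repo_root=".", runner=subprocess.run):
    """Run `make data` in repo_root and return the runner's result."""
    root = Path(repo_root).resolve()
    # Assumption: the Makefile lives in the main directory of the repository.
    if not (root / "Makefile").is_file():
        raise FileNotFoundError(f"No Makefile found in {root}")
    # check=True raises CalledProcessError if make exits with a non-zero status.
    return runner(["make", "data"], cwd=root, check=True)
```

Calling make_data() from the main directory should then behave like running make data in a shell.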
That's it so far! Feel free to have a look at the exploratory notebooks for some ideas of what can be done with the data; nothing else is implemented yet.
Contributions are highly welcome. Please adhere to the following simple guidelines:
- Contributors should develop on branches based off of main, and merge requests should be to main
- Please choose a descriptive branch name
- Python code should be formatted using black style
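For contributors unfamiliar with black, the snippet below illustrates roughly what black-formatted code looks like: double-quoted strings, two blank lines after top-level definitions, and collections that exceed the line length (88 characters by default) split one item per line with a trailing comma. The function and the values are made up purely for illustration and do not come from this repository or its data.

```python
def rating_difference(home_rating: float, away_rating: float) -> float:
    """Return the home team's rating minus the away team's rating."""
    return home_rating - away_rating


# Placeholder values only. A collection too long for one line is split
# one item per line, with a trailing comma after the last item.
EXAMPLE_MATCH = {
    "home_team": "Team A",
    "away_team": "Team B",
    "home_rating": 110.0,
    "away_rating": 102.0,
}
```

In practice there is no need to format by hand: with black installed, running black . from the main directory reformats the Python files in place.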
Data is drawn from the following sources:
- ICC historical rankings - rankings data up to March 2013
- Cricsheet - match data from 2004 - present
- howstat - series data
We are grateful for these data sources!