This is a cookiecutter template for bootstrapping Python data projects. It contains:
- a directory structure for sorting your notebooks, data, models, figures, tasks and source code to reuse in notebooks
- a conda environment file with the basic Python libraries and some extras:
  - numpy / pandas / scikit-learn / seaborn / statsmodels / plotly / jupyterlab: the classic data science stack
  - lightgbm for prediction
  - missingno for missing data analysis
  - invoke as a replacement for Makefile for managing project tasks
  - nbdime for diffing and merging notebooks
  - path.py for browsing files in Python
  - kaggle-api, a CLI for interacting with the Kaggle API (optional)
- pytest and coverage (with badges)
The template post-hook will:
- install itself as a package
- add an ipykernel so that the environment is properly referenced by Jupyter
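The two post-hook steps above can be pictured with a small sketch. This is not the template's actual hook: the function names, the editable install (`pip install -e .`), and the kernel name are assumptions made for illustration. Cookiecutter does run a `hooks/post_gen_project.py` script after generation when a template provides one.

```python
import shlex
import subprocess
import sys


def build_post_hook_commands(kernel_name, python=sys.executable):
    """Return the commands a post-generation hook might run (hypothetical helper)."""
    return [
        # Install the generated project as a package (editable mode is an assumption).
        [python, "-m", "pip", "install", "-e", "."],
        # Register a Jupyter kernel that points at the current environment.
        [python, "-m", "ipykernel", "install", "--user", "--name", kernel_name],
    ]


def run_post_hook(kernel_name, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    commands = build_post_hook_commands(kernel_name)
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)
    return commands
```

Calling `run_post_hook("my_project")` only prints the two commands. Passing `dry_run=False` would actually run them in the active environment.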
- Cookiecutter >= 1.4.0: this can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter
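To confirm that the installed Cookiecutter meets the 1.4.0 minimum, a check along these lines may help. It is a rough sketch using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simple: it keeps the leading digits of the first three components and ignores suffixes such as `rc1`.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (1, 4, 0)  # minimum Cookiecutter version stated above


def parse_version(text):
    """Turn a string like '1.7.0rc1' into a tuple like (1, 7, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def cookiecutter_is_recent_enough(installed=None):
    """Return True if the given (or installed) version is at least MINIMUM."""
    if installed is None:
        try:
            installed = version("cookiecutter")
        except PackageNotFoundError:
            return False  # Cookiecutter is not installed in this environment
    return parse_version(installed) >= MINIMUM
```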
In the folder where you want your project generated:
cookiecutter git@github.com:rcammisola/python-data-project.git
You can also clone the project into <path/to/template> and, from the folder where you want to generate your project, launch cookiecutter <path/to/template>
It will ask for the following values:
full_name
email
project_name
project_short_description
python_version
version
for_kaggle
Complete the values for your project and voilà! Then follow the README inside your new project for further installation.
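If you prefer to skip the interactive prompts, the values can be supplied up front. The sketch below builds a `cookiecutter` command with `--no-input` and `key=value` extra-context arguments. The helper names and sample answers are hypothetical, and passing `key=value` pairs on the command line may require a Cookiecutter release newer than the 1.4.0 minimum. Any value you do not supply falls back to the template's default.

```python
import shlex
import subprocess

TEMPLATE = "git@github.com:rcammisola/python-data-project.git"


def build_cookiecutter_command(template, answers=None):
    """Build the cookiecutter CLI call; answers maps prompt names to values."""
    command = ["cookiecutter"]
    if answers:
        # Skip the prompts and take unspecified values from the template defaults.
        command.append("--no-input")
    command.append(template)
    if answers:
        command.extend(f"{key}={value}" for key, value in answers.items())
    return command


def generate_project(template, answers=None, dry_run=True):
    """Print the command, or run it when dry_run is False."""
    command = build_cookiecutter_command(template, answers)
    if dry_run:
        print(shlex.join(command))
    else:
        subprocess.run(command, check=True)
    return command
```

For example, `generate_project(TEMPLATE, {"project_name": "demo", "full_name": "Jane Doe"})` prints the command without running it. The sample answers are placeholders.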
All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
This project is heavily influenced by drivendata's Data Science cookiecutter and cookiecutter Kaggle template project.
Other links that helped shape this cookiecutter:
- Write less terrible code with Jupyter Notebook
- Cookiecutter DataScience Opinions
- Working efficiently with JupyterLab
- Python Packaging - Ionel's Codelog
- Git init
- Add nb-clean to dev requirements
- Add bumpversion