This repo is my personal Python project quick-starter. It contains my favourite tools and options for creating Python projects for data science, web development, and ad hoc projects. While it is intended as a personal resource, it is open to public use.
The quick-start includes the following features:
- Full set-up guide and checklist - so you can quickly set up tooling and get into coding
- Choice of dependency and virtual environment management with full-featured poetry workflow, partial conda workflow, and docker with poetry - so you can ensure code runs on different environments
- Pre-populated Git and Github assets including gitignore, codeowners, templates, issue labels - to make your Github experience more enjoyable
- Full-featured documentation, including a README with prompts, an opinionated CONTRIBUTORS guide, and a LICENSE; Sphinx documentation generation with markdown and API reference support, plus the ability to convert schema tables into data documentation; and Github actions to automatically generate Github pages - to easily expose project documentation
- CI/CD framework including pre-commit for quick formatting, task automation with Nox, and Github Actions - to ensure high quality releases
- Linting, type checking, and tests with a minimum of tool config files and close nox and pyproject.toml integration - to standardize code and minimize clutter
- Tools for release management, including a tagging and versioning process, Github actions for release notes, and test-pypi and release actions - to simplify the code release process
- `my_project` - dummy project example with logging, imports, pytest, argparse CLI, poetry scripts, and docstrings - for a minimal non-tool codeset with warmed-up examples (a rough sketch follows this list)
- `ndj_pipeline` - machine learning pipeline and framework - to separate data engineering concerns and build repeatable experiments
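The dummy package is there to copy from, not to use as a tool. As a rough illustration only (module and argument names here are hypothetical, not the actual `my_project` code), its CLI style is along these lines:

```python
# Hypothetical sketch of an argparse CLI with logging, similar in spirit to the
# dummy package; names and flags are illustrative only.
import argparse
import logging

logger = logging.getLogger(__name__)


def main() -> None:
    parser = argparse.ArgumentParser(description="Minimal example CLI.")
    parser.add_argument("-i", "--input", required=True, help="Path to an input file.")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable debug logging.")
    args = parser.parse_args()

    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    logger.info("Processing %s", args.input)


if __name__ == "__main__":
    main()
```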
This guide contains three major sections:
- About this repo. General info about this repo.
- Setup new repo. Instructions for copying this repo to create a new project.
- README Template. Warmed-up README template for new projects, with writing prompts and instructions for usage and development.
The following sources have been the inspiration for creating my own project quick-starter.
Clone the repo locally and delete the `.git` directory to start fresh.
- Edit `my_project/__init__.py` to revert the version to `0.1.0` (see the snippet after this list).
- Rename the `my_project` package name and imports.
- Edit the `README.md` to keep only the My Project template section onwards.
- Check python and library versions in the noxfile.
- Check python and library versions, and the branch name, in Github actions.
- Change the LICENSE as required. Use this guide to redo licenses across the project.
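Assuming the usual single-source version convention (an assumption - check the actual file), the version line in `my_project/__init__.py` looks like:

```python
# my_project/__init__.py -- assumed convention; verify against the actual file.
__version__ = "0.1.0"
```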
Change the name from `my_project` to the new name in the following files (a helper sketch for catching leftovers follows this list):

- `README.md`
- `CONTRIBUTING.md`
- `.flake8`
- `pyproject.toml`
- `docs/conf.py`
- `setup.py`
- `tests/test_utils.py`
- `Dockerfile`
- `docker-compose.yml`
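As a convenience (not part of the repo), a throwaway Python snippet like the following can list files that still mention the old name; run it from the repo root and adjust the skip list as needed:

```python
# Hypothetical helper: print files that still reference the old package name.
from pathlib import Path

OLD_NAME = "my_project"
SKIP_PARTS = {".git", ".venv", ".nox", "__pycache__"}

for path in Path(".").rglob("*"):
    if path.is_file() and not SKIP_PARTS.intersection(path.parts):
        try:
            if OLD_NAME in path.read_text(encoding="utf-8"):
                print(path)
        except (UnicodeDecodeError, PermissionError):
            continue  # skip binary or unreadable files
```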
See the Environment 1: Poetry section in the Developer Guide to set up your own environment first.
Remove tools that are required for conda but not by poetry:

- Delete `setup.py`
- Delete `docs/requirements.txt`
- Edit the conda references in `README.md`, `Dockerfile`, and `docker-compose.yml`
See the Environment 2: Conda section in the Developer Guide to set up your own environment first.
Note: using conda will mean incompatibility with some Nox, Github actions, and library publish functionality. Only the default Nox sessions are included (with light flake8 checks), plus black and docs.
Additional conda-related setup:

- Set up project details in `setup.py`.
- Remove or update the following Github Actions:
  - coverage
  - release
  - test-pypi
  - tests
- Update the project README to specify conda instructions.
`Dockerfile` and `docker-compose` are supported, using poetry for dependencies. See the instructions above for conda cleanup.
See the Environment 3: Docker section in the Developer Guide to set up your own environment first.
pre-commit install
git init
git add .
git add logs/.gitkeep --force
git commit -m "initial commit"
git tag 0.1.0
Create a repo in Github and follow the instructions to push (including tags).

Check whether the branch name is `main` or `master` - Github Actions are set to use `main`.

- Set up the Codecov connection
- Set up pypi and test-pypi secrets, and uncomment the test-pypi Github action
- In the Github repo, set up dependabot and Github pages
What is it, at a high level? Who is the audience or end users? Any requirements? What are the features and benefits?
The following are the quick start instructions for using the project as an end-user.
Follow the Instructions for developers to set up the virtual environment and dependency management.
We recommend `poetry` for full functionality. An alternative `conda` environment has been prepared but will not work with `nox`.
Note: Instructions marked with %% are not functioning and are for demo purposes only.
Install the project using pip %%:
pip install my_project
Include an example of running the program with expected outputs.
To replicate the data transformations and model results, run the following commands from the project root.
These should be run from the `poetry shell`, the `conda` environment, or with the `poetry run` prefix.
python -m ndj_pipeline.transform
python -m ndj_pipeline.model -p data/doordash_pred.yaml
python -m ndj_pipeline.final_prediction_clean
This will produce a feature-rich dataset in `data/processed`, model results and metrics under `data/doordash_pred`, and the formatted predictions file under `data_to_predict.csv`.
Example of using poetry to create scripts.
my_project -i1 environment.yml -i2 environment.yml -v
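For context, a poetry script simply maps a command name to a Python function via `[tool.poetry.scripts]` in `pyproject.toml`. A hypothetical entry point is sketched below; the real `my_project` CLI defines its own flags.

```python
# Hypothetical target of a [tool.poetry.scripts] entry such as:
#   my_project = "my_project.main:main"
# The flags shown are illustrative only.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(prog="my_project")
    parser.add_argument("-i1", help="First input file.")
    parser.add_argument("-i2", help="Second input file.")
    parser.add_argument("-v", "--verbose", action="store_true")
    print(parser.parse_args())
```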
Alternatively, run from the Dockerfile or docker-compose. See the Docker environment instructions for more details.
The user guides can be found on Github pages. This includes an overview of features, a discussion of the `ndj_pipeline` framework, and an API reference.
Please raise an issue with the `bug` label and I will look into it!
The following are the setup instructions for developers looking to improve this project. For information on current contributors and guidelines see the contributors section. Follow each step here and ensure tests are working.
Poetry handles virtual environment management, dev and optional extra libraries, library development, builds and publishing.
Check the poetry website for the latest instructions on how to install poetry. You can use the following command on macOS/Linux to install poetry 1.1.9, the version used in this project.
curl -sSL https://install.python-poetry.org | python - --version 1.1.9
It is recommended to set virtual environment creation to within the project using the following command. This adds a `.venv` directory to the project to handle the cache and virtual environment.
poetry config virtualenvs.in-project true
You can set up the virtual environment in the repo using the following command. Make sure that any other virtual environments are deactivated before running (e.g. `conda deactivate`).
poetry install
Troubleshooting: You may need to point poetry to the correct python interpreter using the following command.
In another terminal, with the conda environment active, run `which python` to find the interpreter path.
poetry env use /path/to/python3
When the environment is correctly installed, you can enter the virtual environment using `poetry shell`. The library can be built using `poetry build`.
Conda is a lightweight solution for Anaconda python users to handle virtual environment management and basic library specification.
The following commands will create the conda environment and set up the library in editable (development) mode using `setup.py`.
conda env create -f environment.yml
conda activate my_project
pip install -e .
The library can be built using the following command.
python setup.py bdist_wheel
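For orientation only, a minimal `setup.py` for this editable-install workflow might look like the sketch below; the repo's own `setup.py` is authoritative for metadata and dependencies.

```python
# Illustrative minimal setup.py; the real metadata and dependencies live in the repo's file.
from setuptools import find_packages, setup

setup(
    name="my_project",
    version="0.1.0",
    packages=find_packages(exclude=["tests"]),
    python_requires=">=3.8",
)
```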
Docker goes beyond virtual environment management to virtualize the operating system itself. The docker container is specified through a Dockerfile and can be run with docker commands or docker-compose. Dependency management is handled through poetry.

Use either of the following commands to set up and run the docker environment.
docker build -t ndj_cookie/my_project .
docker run --rm ndj_cookie/my_project
docker stop $(docker ps -a -q)
An example docker-compose setup is also included.
docker-compose build
docker-compose up
docker-compose down
Nox is a command-line tool that automates testing in multiple Python environments, similar to tox, Makefiles or scripts. Unlike tox, Nox uses a standard Python file for configuration.
Here it is used for code quality, testing, and generating documentation.
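As a general illustration of what that configuration file looks like (a sketch, not this repo's actual `noxfile.py`):

```python
# Illustrative noxfile.py session; the session name and pinned tools are examples only.
import nox


@nox.session(python="3.9")
def lint(session):
    """Run flake8 over the package and tests."""
    session.install("flake8")
    session.run("flake8", "my_project", "tests")
```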
The following command can be used to run mypy, lint, and tests. It is recommended to run these before pushing code, as the same checks are run with Github Actions. Some checks, such as black, are run more frequently with pre-commit.
poetry run nox
Local Sphinx documentation can be generated with the following command. Documentation publishing using Github Actions to Github pages is enabled by default.
poetry run nox -s docs
Other available commands include:
poetry run nox -rs coverage
Pre-commit is a framework for managing and maintaining multi-language pre-commit hooks.
It intercepts the `git commit` command to run checks on staged code before the commit is finalized. The checks are specified in `.pre-commit-config.yaml`. The checks in use are quick, pragmatic, and apply automatic formatting. If checks fail, it is usually only a matter of re-staging the files (`git add`) and attempting the commit again. The aim is to provide a lightweight way to keep code automatically in line with some standards. This does not replace the need to run nox tests, although pre-commits will satisfy some of the nox checks.
On first use of the repository, pre-commit will need to be installed locally. You will need to be in the `poetry shell` or `conda` environment. Run the following command to perform a first-time install.
pre-commit install
This will cache several code assets used in the checks.
When you have new code to commit, pre-commit will kick in and check the code. Alternatively, you can run the following command to run it against all files in the repo.
pre-commit run --all-files
- Nick Jenkins - Data Scientist, API & Web dev, Team lead, Writer
See CONTRIBUTING.md in the Github repo for specific instructions on contributing to the project.
Usage rights are governed by the LICENSE in the Github repo or the page footer.