An ML project template with sensible defaults:
- Dockerised dev setup
- Unit test setup
- Automated tests for model metrics
- CI pipeline as code
For infrastructure concerns (e.g. provisioning the CI server, deployments), please refer to https://github.com/ThoughtWorksInc/ml-cd-starter-kit.
- Fork repository: https://github.com/ThoughtWorksInc/ml-app-template
- Clone repository:
git clone https://github.com/YOUR_USERNAME/ml-app-template
- To develop in a local environment with the Python packages installed, run:
pipenv install
then activate the environment with pipenv shell
- To run a command without activating the virtual environment (for example, nosetests), use pipenv run nosetests
- Install Docker (Mac, Linux)
- Start Docker on your desktop
- Build image and start container:
# build docker image [Mac/Linux users]
docker build . -t ml-app-template
# build docker image [Windows users]
MSYS_NO_PATHCONV=1 docker build . -t ml-app-template
# start docker container [Mac/Linux users]
docker run -it -v "$(pwd)":/home/ml-app-template \
-p 8080:8080 \
-p 8888:8888 \
ml-app-template bash
# start docker container [Windows users]
winpty docker run -it -v C:\\Users\\path\\to\\your\\ml-app-template:/home/ml-app-template -p 8080:8080 -p 8888:8888 ml-app-template bash
# Note: to find the path, you can run `pwd` in git bash, and manually replace forward slashes (/) with double backslashes (\\)
You're ready to roll! Here are some common commands that you can run in your dev workflow. Run these in the container.
# add some color to your terminal
source bin/color_my_terminal.sh
# activate virtual environment for python
pipenv shell
# run unit tests
nosetests
# run unit tests in watch mode and color output
nosetests --with-watch --rednose --nologcapture
# train model
SHOULD_USE_MLFLOW=false python src/train.py
# start flask app in development mode
python src/app.py
# make requests to your app
# 1. In your browser, visit http://localhost:8080
# 2. Open another terminal in the running container (detailed instructions below) and run:
bin/predict.sh http://localhost:8080
# You can also use this script to test your deployed application later:
bin/predict.sh http://my-app.com
Here are some other commands that you may find useful:
# see list of running containers
docker ps
# start a bash shell in a running container
docker exec -it <container-id> bash
# start jupyter notebook server on http://localhost:8888
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
We've created a project template to help you with the boilerplate code that we usually have to write in any typical project.
To reduce incidental complexity, we used a simple dataset (Boston housing prices) to train a simple linear regression model. Replace (i) the data, (ii) the data preprocessing code and (iii) the model specification with those for your use case.
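The idea behind an automated model-metric test is to train the model, compute a metric on held-out data, and fail the build if the metric drops below an agreed threshold. The block below is a minimal sketch of that idea, not the contents of src/tests/test_model_metrics.py. To stay dependency-free it uses synthetic data and a hand-written one-feature least-squares fit in place of the real dataset and model. The names fit_simple_linear_regression, r2_score, make_synthetic_data and R2_THRESHOLD, and the threshold value itself, are assumptions made for illustration.

```python
import random
import unittest

# Hypothetical threshold: pick a value that reflects your own baseline.
R2_THRESHOLD = 0.9


def make_synthetic_data(n=200, seed=0):
    """Generate y = 3x + 5 plus Gaussian noise (stand-in for a real dataset)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 5.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class TestModelMetrics(unittest.TestCase):
    def test_r2_is_above_threshold(self):
        xs, ys = make_synthetic_data()
        split = int(0.8 * len(xs))  # simple 80/20 train/test split
        slope, intercept = fit_simple_linear_regression(xs[:split], ys[:split])
        predictions = [slope * x + intercept for x in xs[split:]]
        self.assertGreaterEqual(r2_score(ys[split:], predictions), R2_THRESHOLD)


if __name__ == "__main__":
    unittest.main()
```

Because the test is an ordinary unittest.TestCase, a test runner can pick it up alongside the unit tests, and a CI stage can treat a failing metric like any other failing test.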
This is the project structure:
.
├── Dockerfile
├── README.md
├── requirements-dev.txt          # specify dev dependencies (e.g. jupyter) here
├── requirements.txt              # specify app dependencies here
├── ci.gocd.yaml                  # specify your CI pipeline here
├── src                           # place your code here
│   ├── app.py
│   ├── app_with_logging.py
│   ├── tests                     # place your tests here
│   │   ├── test.py
│   │   └── test_model_metrics.py
│   ├── settings.py               # define environment variables here
│   └── train.py
├── bin                           # store shell scripts here
│   ├── color_my_terminal.sh
│   ├── configure_venv_locally.sh
│   ├── predict.sh
│   ├── start_server.sh
│   ├── test.sh
│   ├── test_model_metrics.sh
│   └── train_model.sh
├── docs
│   ├── FAQs.md
│   └── mlflow.md
└── models                        # serialize stuff here
    ├── _keep
    ├── column_order.joblib
    └── model.joblib
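The models directory holds serialized artifacts. The file name column_order.joblib suggests that the training column order is saved next to the model so that incoming requests can be arranged the same way before prediction; that reading is an assumption, not something stated here. The sketch below illustrates the pattern. It swaps in the standard-library pickle module for joblib so that it runs without extra dependencies, and the helper names (save_artifact, load_artifact, to_feature_row), the .pkl file name and the example feature names are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path):
    """Serialize any picklable object to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_artifact(path):
    """Load an object previously written by save_artifact."""
    with open(path, "rb") as f:
        return pickle.load(f)


def to_feature_row(payload, column_order):
    """Arrange a dict of feature values into the order used at training time."""
    missing = [c for c in column_order if c not in payload]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return [payload[c] for c in column_order]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "column_order.pkl"  # hypothetical file name
        save_artifact(["RM", "LSTAT"], path)   # illustrative feature names
        columns = load_artifact(path)
        # The request supplies features in a different order than training did.
        print(to_feature_row({"LSTAT": 4.98, "RM": 6.575}, columns))
```

Only load serialized artifacts that you produced yourself: unpickling data from an untrusted source can execute arbitrary code.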
app_with_logging.py contains the code for logging (i) inputs to the model, (ii) model outputs and (iii) LIME metrics. Refer to this file if you want to send logs to Elasticsearch using Fluentd. To keep the main app simple and accessible to people who may not be familiar with these technologies, we've kept the logging code in this separate file for reference.
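As a rough illustration of logging model inputs and outputs, the sketch below writes one JSON object per prediction to a stream, a format that log collectors can usually parse. It is not the code in app_with_logging.py: it uses only the Python standard library, leaves out LIME metrics and any Fluentd or Elasticsearch wiring, and the function names, logger name and record fields are assumptions.

```python
import json
import logging
import sys
from datetime import datetime, timezone


def build_prediction_log(inputs, prediction):
    """Assemble a JSON-serializable record for one prediction."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,
        "prediction": prediction,
    }


def get_json_logger(name="prediction_log", stream=None):
    """Return a logger that emits the raw message, one line per record."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def log_prediction(logger, inputs, prediction):
    """Log one prediction as a JSON line and return the record."""
    record = build_prediction_log(inputs, prediction)
    logger.info(json.dumps(record))
    return record


if __name__ == "__main__":
    logger = get_json_logger()
    # Illustrative feature names and values only.
    log_prediction(logger, {"RM": 6.575, "LSTAT": 4.98}, 24.0)
```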
Please refer to FAQs for instructions on how to configure VS Code or PyCharm to give you IntelliSense and auto-complete suggestions as you code.
To provision the infrastructure used in this repo (e.g. GoCD, MLflow, EFK), please check out the ml-cd-starter-kit repo and follow the instructions in the README.
When you're done setting up the infrastructure, do the following:
- In src/settings.py, update the IP addresses with those of your own infrastructure.
- In ci.gocd.yaml, replace davified/ml-app-template with YOUR_USERNAME/YOUR_IMAGE_NAME.
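One way to avoid editing source files for each environment is to have a settings module read its values from environment variables and fall back to defaults. The sketch below shows that pattern under stated assumptions. SHOULD_USE_MLFLOW is the variable used when training the model, but its default here is a guess, and MLFLOW_IP, env_flag and load_settings are hypothetical names that may not match src/settings.py.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable such as SHOULD_USE_MLFLOW=false as a bool."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ=None):
    """Collect settings in one place, reading overrides from the environment."""
    environ = os.environ if environ is None else environ
    return {
        # Assumed default; the real default in src/settings.py may differ.
        "SHOULD_USE_MLFLOW": env_flag("SHOULD_USE_MLFLOW", default=False, environ=environ),
        # Hypothetical variable name and placeholder address.
        "MLFLOW_IP": environ.get("MLFLOW_IP", "127.0.0.1"),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict as environ keeps the functions easy to unit test without touching the real process environment.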
If you encounter any errors, please refer to FAQs for a list of common errors and how to fix them.