EdwardCuiPeacock/nbcu-metadata-enhancement

NBCU Metadata Enhancement

SETUP NEW MODEL:

STEP 0 - CHANGE DIRECTORY:

Set the working directory to training:

cd training

STEP 1 - CHANGE CONFIGS:

STEP 2 - SOURCE COMMANDS:

source pipeline_commands.sh

STEP 3 - SETUP PIPELINE:

You can now use the commands

build_pipeline
update_pipeline
run_pipeline

to build, update, and launch your Kubeflow pipeline.
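The contents of pipeline_commands.sh are not shown in this README. As a purely hypothetical sketch, assuming the three commands wrap the TFX CLI for the Kubeflow engine, such a script could look roughly like this. The variable names, their defaults, the runner file name, and the DRY_RUN switch are all assumptions made for illustration, not the repository's actual code.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch: the real pipeline_commands.sh may differ.
# All variable names and defaults below are assumptions for illustration.
PIPELINE_NAME="${PIPELINE_NAME:-my-pipeline}"        # assumed pipeline name
PIPELINE_PATH="${PIPELINE_PATH:-kubeflow_runner.py}" # assumed runner file
ENDPOINT="${ENDPOINT:-}"                             # Kubeflow Pipelines endpoint
TARGET_IMAGE="${TARGET_IMAGE:-}"                     # container image to build

# Print the command when DRY_RUN=1, otherwise execute it.
_run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the pipeline and build its container image.
build_pipeline() {
    _run tfx pipeline create --engine=kubeflow \
        --pipeline-path="$PIPELINE_PATH" \
        --endpoint="$ENDPOINT" \
        --build-target-image="$TARGET_IMAGE"
}

# Push changes to an existing pipeline.
update_pipeline() {
    _run tfx pipeline update --engine=kubeflow \
        --pipeline-path="$PIPELINE_PATH" \
        --endpoint="$ENDPOINT"
}

# Start a run of the pipeline.
run_pipeline() {
    _run tfx run create --engine=kubeflow \
        --pipeline-name="$PIPELINE_NAME" \
        --endpoint="$ENDPOINT"
}
```

With a script like this, setting DRY_RUN=1 before calling run_pipeline would print the underlying command instead of executing it, which is a cheap way to check the configuration from STEP 1 before launching anything.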

NOTE: VERSIONS USED

tfx=0.28
skaffold=v1.17.0 (should work with v2, just change the build.yaml)
tensorflow=2.4.1

This codebase is split into two main folders: training and serving.

Each of these folders should have its own README, which explains how to run the pipeline/service locally or in the cloud, along with any setup instructions.

This README covers information that applies to both the training and serving folders. Add anything here that you feel is relevant to both.


Python version management

We use pyenv to manage our Python version, and this is specified in a .python-version file in the serving and training directories.

To get started, cd into the training or serving directory and make sure the correct Python version is installed with pyenv:

pyenv install `cat .python-version`
pyenv local `cat .python-version`  # Activate the correct python version
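To confirm that the pinned version is actually the one in use, a small check along these lines may help. This is a minimal sketch: the function name check_python_version and the PYTHON_BIN override are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): compare the version pinned
# in .python-version with the interpreter that is currently active.
check_python_version() {
    local dir="${1:-.}"
    local pinned actual
    # Read the pinned version; fail with status 2 if the file is missing.
    pinned="$(cat "$dir/.python-version")" || return 2
    # `python --version` prints e.g. "Python X.Y.Z"; keep only the number.
    # PYTHON_BIN is an assumed override, defaulting to `python` on PATH.
    actual="$("${PYTHON_BIN:-python}" --version 2>&1 | awk '{print $2}')"
    if [ "$pinned" = "$actual" ]; then
        echo "OK: Python $actual matches .python-version"
    else
        echo "MISMATCH: .python-version pins $pinned, active interpreter is $actual"
        return 1
    fi
}
```

Running check_python_version from inside the training or serving directory returns a non-zero status on a mismatch, so it can also be used as a guard at the top of other scripts.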

Package management

We currently use Poetry for Python package management. We prefer Poetry to Pipenv and similar tools because it seems simpler and faster.

Poetry

Poetry is used to create a virtual environment and install all Python packages inside it. The training and serving folders each have their own pyproject.toml and poetry.lock files.

To install Poetry, run:

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -

Once installed, you can install all the required packages (including dev packages) with the following:

poetry install

from either the serving dir or the training dir.

To enter the virtualenv in order to run commands with the installed packages, use

poetry shell

which will activate the virtualenv for you.

To add a new package, you can run:

poetry add <package>

See more advanced usage at https://python-poetry.org/docs/cli/#add. Make sure to commit the new poetry.lock file to git if you add any new packages.
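One way to catch a forgotten lock file before pushing is to ask git whether poetry.lock has pending changes. The following is a minimal sketch under the assumption that the directory is inside a git work tree; the function name lock_is_committed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeed only if
# poetry.lock in the given directory exists and has no uncommitted changes.
lock_is_committed() {
    local dir="${1:-.}"
    local status
    if [ ! -f "$dir/poetry.lock" ]; then
        echo "poetry.lock not found in $dir"
        return 2
    fi
    # --porcelain prints one line per modified or untracked path
    # and nothing at all when the path is clean.
    status="$(git -C "$dir" status --porcelain -- poetry.lock)" || return 2
    if [ -n "$status" ]; then
        echo "poetry.lock has uncommitted changes: $status"
        return 1
    fi
    echo "poetry.lock is committed"
}
```

For example, calling lock_is_committed training after a poetry add reports whether the updated lock file still needs to be committed.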

Precommit

This project has an automatic linter setup which runs both Black and Flake8. A good writeup of this solution is here.

To set up pre-commit:

# Install pre-commit
pip install pre-commit

# Set up pre-commit hooks
pre-commit install

To run pre-commit on all files:

make pre-commit

In addition to being run on every commit, linting is enforced by the linter stage in bibcd, which builds a Docker image and runs pre-commit on all files.

Building and Running the Pipeline

You can use the MLCLI tool to build and run your pipelines.

Please look at the README for instructions on how to get started.
