Rename Stack to Stacks and Fix Bugs #107
@@ -11,4 +11,4 @@
 __pycache__/
 .cache
 *.pyc
-mlops-stack.iml
+mlops-stacks.iml
@@ -1,7 +1,6 @@
-# Databricks MLOps Stack
+# Databricks MLOps Stacks

-> **_NOTE:_** This feature is in [private preview](https://docs.databricks.com/release-notes/release-types.html). The interface/APIs may change and no formal support is available during the preview. However, you can still create new production-grade ML projects using the stack.
-If interested in trying it out, please fill out this [form](https://docs.google.com/forms/d/e/1FAIpQLSfHXCmkbsEURjQQvtUGObgh2D5q1eD4YRHnUxZ0M4Hu0W63WA/viewform), and you’ll be contacted by a Databricks representative.
+> **_NOTE:_** This feature is in [public preview](https://docs.databricks.com/release-notes/release-types.html).

 This repo provides a customizable stack for starting new ML projects
 on Databricks that follow production best-practices out of the box.
@@ -19,25 +18,25 @@ Your organization can use the default stack as is or customize it as needed, e.g
 adapt individual components to fit your organization's best practices. See the
 [stack customization guide](stack-customization.md) for more details.

-Using Databricks MLOps stack, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML service state
-management, with an easy transition to production. You can also use MLOps stack as a building block
+Using Databricks MLOps Stacks, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML service state
+management, with an easy transition to production. You can also use MLOps Stacks as a building block
 in automation for creating new data science projects with production-grade CI/CD pre-configured.

-![MLOps Stack diagram](doc-images/mlops-stack.png)
+![MLOps Stacks diagram](doc-images/mlops-stacks.png)

 See the [FAQ](#FAQ) for questions on common use cases.

 ## ML pipeline structure and devloop
 [See this page](Pipeline.md) for detailed description and diagrams of the ML pipeline
 structure defined in the default stack.

-## Using this stack
+## Using MLOps Stacks

 ### Prerequisites
 - Python 3.8+
-- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) >= v0.204.0
+- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) >= v0.208.1

-[Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) v0.204.0 contains [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html) for the purpose of project creation.
+[Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) v0.208.1 contains [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html) for the purpose of project creation.

 Please follow [the instruction](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#install-the-cli) to install and set up databricks CLI. Releases of databricks CLI can be found in the [releases section](https://github.com/databricks/cli/releases) of databricks/cli repository.
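This PR raises the minimum Databricks CLI version from v0.204.0 to v0.208.1. A quick local check can therefore save a confusing failure in `databricks bundle init`. The Python sketch below parses the text printed by `databricks --version` and compares it with the new minimum.

Assumptions to note:
- The output format (e.g. `Databricks CLI v0.208.1`) is assumed, not taken from this PR.
- The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Minimum CLI version from the updated README prerequisites.
MIN_CLI_VERSION: Tuple[int, int, int] = (0, 208, 1)


def parse_cli_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) from the first X.Y.Z token in `output`.

    Assumes `databricks --version` prints something like "Databricks CLI v0.208.1".
    """
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def meets_minimum_cli_version(output: str) -> bool:
    """True if the parsed version is >= MIN_CLI_VERSION (tuples compare element-wise)."""
    version = parse_cli_version(output)
    return version is not None and version >= MIN_CLI_VERSION
```

To use it, pass in the `stdout` of `subprocess.run(["databricks", "--version"], capture_output=True, text=True)`. Comparing integer tuples avoids the string-comparison trap where `"0.99.0"` would sort after `"0.208.1"`. The legacy pip-installed `databricks-cli` (0.17.x/0.18.x) should also come out as too old, since its minor version is far below 208.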
@@ -47,7 +46,7 @@ Please follow [the instruction](https://docs.databricks.com/en/dev-tools/cli/dat

 To create a new project, run:

-databricks bundle init https://github.com/databricks/mlops-stack
+databricks bundle init mlops-stacks

 This will prompt for parameters for project initialization. Some of these parameters are required to get started:
 * ``input_project_name``: name of the current project
@@ -78,42 +77,41 @@ See the generated ``README.md`` for next steps!

 ## FAQ

-### Do I need separate dev/staging/prod workspaces to use this stack?
+### Do I need separate dev/staging/prod workspaces to use MLOps Stacks?
 We recommend using separate dev/staging/prod Databricks workspaces for stronger
 isolation between environments. For example, Databricks REST API rate limits
 are applied per-workspace, so if using [Databricks Model Serving](https://docs.databricks.com/applications/mlflow/model-serving.html),
 using separate workspaces can help prevent high load in staging from DOSing your
 production model serving endpoints.

-However, you can run the stack against just a single workspace, against a dev and
-staging/prod workspace, etc. Just supply the same workspace URL for
+However, you can create a single workspace stack, by supplying the same workspace URL for
 `input_databricks_staging_workspace_host` and `input_databricks_prod_workspace_host`. If you go this route, we
 recommend using different service principals to manage staging vs prod resources,
 to ensure that CI workloads run in staging cannot interfere with production resources.

-### I have an existing ML project. Can I productionize it using this stack?
-Yes. Currently, you can instantiate a new project from the stack and copy relevant components
-into your existing project to productionize it. The stack is modularized, so
+### I have an existing ML project. Can I productionize it using MLOps Stacks?
+Yes. Currently, you can instantiate a new project and copy relevant components
+into your existing project to productionize it. MLOps Stacks is modularized, so
 you can e.g. copy just the GitHub Actions workflows under `.github` or ML resource configs
 under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
-and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml`` into your existing project.
+and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml`` into your existing project.

Review comment: 👍 Thanks

-### Can I adopt individual components of the stack?
-For this use case, we recommend instantiating the full stack via [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
-and copying the relevant stack subdirectories. For example, all ML resource configs
+### Can I adopt individual components of MLOps Stacks?
+For this use case, we recommend instantiating via [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
+and copying the relevant subdirectories. For example, all ML resource configs
 are defined under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
-and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml``, while CI/CD is defined e.g. under `.github`
+and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``, while CI/CD is defined e.g. under `.github`
 if using GitHub Actions, or under `.azure` if using Azure DevOps.

-### Can I customize this stack?
+### Can I customize my MLOps Stack?
 Yes. We provide the default stack in this repo as a production-friendly starting point for MLOps.
 However, in many cases you may need to customize the stack to match your organization's
 best practices. See [the stack customization guide](stack-customization.md)
 for details on how to do this.

-### Does the MLOps stack cover data (ETL) pipelines?
+### Does the MLOps Stacks cover data (ETL) pipelines?

-Since MLOps Stack is based on [databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-commands.html),
+Since MLOps Stacks is based on [databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-commands.html),
 it's not limited only to ML workflows and assets - it works for assets across the Databricks Lakehouse. For instance, while the existing ML
 code samples contain feature engineering, training, model validation, deployment and batch inference workflows,
 you can use it for Delta Live Tables pipelines as well.
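The single-workspace option in the FAQ comes down to giving `input_databricks_staging_workspace_host` and `input_databricks_prod_workspace_host` the same value. `databricks bundle init` accepts a `--config-file`, as the contributor guide's preview commands show. One way to script a single-workspace project is therefore to generate that JSON file.

This is a hypothetical sketch:
- It sets only `input_project_name` and the two host parameters named in the README.
- The template may still require other parameters.
- The files under `tests/example-project-configs` remain the authoritative complete examples.

```python
import json
from typing import Dict


def single_workspace_config(project_name: str, workspace_host: str) -> Dict[str, str]:
    """Hypothetical partial config for `databricks bundle init --config-file`.

    Staging and prod point at the same workspace, which is the single-workspace
    setup described in the FAQ. Other template parameters are not set here.
    """
    return {
        "input_project_name": project_name,
        "input_databricks_staging_workspace_host": workspace_host,
        "input_databricks_prod_workspace_host": workspace_host,
    }


def to_config_json(config: Dict[str, str]) -> str:
    """Serialize the config so it can be written to a JSON file."""
    return json.dumps(config, indent=2, sort_keys=True)
```

You could then write `to_config_json(...)` to a file such as `single-workspace.json` and run `databricks bundle init mlops-stacks --config-file single-workspace.json`. That should supply those values without prompting for them. The FAQ's advice still applies: even when staging and prod share a workspace, manage their resources with different service principals.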
@@ -127,7 +125,7 @@ Please provide feedback (bug reports, feature requests, etc) via GitHub issues.
 We welcome community contributions. For substantial changes, we ask that you first file a GitHub issue to facilitate
 discussion, before opening a pull request.

-This stack is implemented as a [Databricks asset bundle template](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
+MLOps Stacks is implemented as a [Databricks asset bundle template](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
 that generates new projects given user-supplied parameters. Parametrized project code can be found under
 the `{{.input_root_dir}}` directory.
@@ -164,25 +162,25 @@ Run integration tests only:
 pytest tests --large-only
 ```

-### Previewing stack changes
-When making changes to the stack, it can be convenient to see how those changes affect
-an actual new ML project created from the stack. To do this, you can create an example
-project from your local checkout of the stack, and inspect its contents/run tests within
+### Previewing changes
+When making changes to MLOps Stacks, it can be convenient to see how those changes affect
+a generated new ML project. To do this, you can create an example
+project from your local checkout of the repo, and inspect its contents/run tests within
 the project.

 We provide example project configs for Azure (using both GitHub and Azure DevOps) and AWS (using GitHub) under `tests/example-project-configs`.
 To create an example Azure project, using Azure DevOps as the CI/CD platform, run the following from the desired parent directory
 of the example project:

 ```
-# Note: update MLOPS_STACK_PATH to the path to your local checkout of the stack
-MLOPS_STACK_PATH=~/mlops-stack
-databricks bundle init "$MLOPS_STACK_PATH" --config-file "$MLOPS_STACK_PATH/tests/example-project-configs/azure/azure-devops.json"
+# Note: update MLOPS_STACKS_PATH to the path to your local checkout of the MLOps Stacks repo
+MLOPS_STACKS_PATH=~/mlops-stacks
+databricks bundle init "$MLOPS_STACKS_PATH" --config-file "$MLOPS_STACKS_PATH/tests/example-project-configs/azure/azure-devops.json"
 ```

 To create an example AWS project, using GitHub Actions for CI/CD, run:
 ```
-# Note: update MLOPS_STACK_PATH to the path to your local checkout of the stack
-MLOPS_STACK_PATH=~/mlops-stack
-databricks bundle init "$MLOPS_STACK_PATH" --config-file "$MLOPS_STACK_PATH/tests/example-project-configs/aws/aws-github.json"
+# Note: update MLOPS_STACKS_PATH to the path to your local checkout of the MLOps Stacks repo
+MLOPS_STACKS_PATH=~/mlops-stacks
+databricks bundle init "$MLOPS_STACKS_PATH" --config-file "$MLOPS_STACKS_PATH/tests/example-project-configs/aws/aws-github.json"
 ```
Review comment: It's 9x larger. I don't think it's a big deal.

Review comment: Yeah I had to take the original image, overlay the new text, and then take a screenshot since I couldn't find the original file where we created the diagram 🤦♂️
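The two `databricks bundle init` invocations in the README's preview instructions differ only in the example config file. A small Python sketch can build both command lines from one checkout path, for example in a local script that regenerates every example project after a change.

About this sketch:
- The helper names are hypothetical.
- The config paths are the two listed in the README.
- The commands are only constructed here, not run.

```python
from pathlib import Path
from typing import List

# The example configs named in the README's "Previewing changes" section.
EXAMPLE_CONFIGS = [
    "tests/example-project-configs/azure/azure-devops.json",
    "tests/example-project-configs/aws/aws-github.json",
]


def preview_command(mlops_stacks_path: str, config_relpath: str) -> List[str]:
    """Build argv for `databricks bundle init <checkout> --config-file <checkout>/<config>`."""
    root = Path(mlops_stacks_path).expanduser()  # handle a leading "~" explicitly
    return [
        "databricks", "bundle", "init", str(root),
        "--config-file", str(root / config_relpath),
    ]


def all_preview_commands(mlops_stacks_path: str) -> List[List[str]]:
    """One command per example config, in the order listed above."""
    return [preview_command(mlops_stacks_path, rel) for rel in EXAMPLE_CONFIGS]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the desired parent directory. An argument list, rather than a shell string, avoids quoting problems with checkout paths that contain spaces.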
@@ -1,9 +1,9 @@
-# Stack Customization Guide
-We provide the default stack in this repo as a production-friendly starting point for MLOps.
+# MLOps Stacks Customization Guide
+We provide the default MLOps Stack in this repo as a production-friendly starting point for MLOps.

 For generic enhancements not specific to your organization
 (e.g. add support for a new CI/CD provider), we encourage you to consider contributing the
-change back to the default stack, so that the community can help maintain and enhance it.
+change back to the MLOps Stacks repo, so that the community can help maintain and enhance it.

 However, in many cases you may need to customize the stack, for example if:
 * You have different Databricks workspace environments (e.g. a "test" workspace for CI, in addition to dev/staging/prod)
@@ -19,20 +19,20 @@ default stack. Before getting started, we encourage you to read
 the [contributor guide](README.md#contributing) to learn how to
 make, preview, and test changes to your custom stack.

-### Fork the default stack repo
-Fork the default stack repo. You may want to create a private fork if you're tailoring
+### Fork the MLOps Stacks repo
+Fork the MLOps Stacks repo. You may want to create a private fork if you're tailoring
 the stack to the specific needs of your organization, or a public fork if you're creating
 a generic new stack.

-### (optional) Set up CI for your new stack
-Tests for the default stack are defined under the `tests/` directory and are
+### (optional) Set up CI
+Tests for MLOps Stacks are defined under the `tests/` directory and are
 executed in CI by Github Actions workflows defined under `.github/`. We encourage you to configure
-CI in your own stack repo to ensure the stack continues to work as you make changes.
+CI in your own MLOps Stacks repo to ensure it continues to work as you make changes.
 If you use GitHub Actions for CI, the provided workflows should work out of the box.
 Otherwise, you'll need to translate the workflows under `.github/` to the CI provider of your
 choice.

-### Update stack parameters
+### Update MLOps Stacks parameters
 Update parameters in your fork as needed in `databricks_template_schema.json` and update corresponding template variable in `library/template_variables.tmpl`. Pruning the set of
 parameters makes it easier for data scientists to start new projects, at the cost of reduced flexibility.
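Pruning a parameter, as the customization guide describes, starts with removing it from `databricks_template_schema.json`. The Python sketch below does only that step. It rests on these assumptions:
- The schema lists parameters under a top-level `properties` object in JSON Schema style. This is assumed here, not confirmed by this PR.
- `prune_parameter` and `prune_schema_file` are hypothetical helpers.

Every use of the removed parameter in `library/template_variables.tmpl` and in the template files still has to be replaced with a hardcoded value by hand.

```python
import copy
import json
from typing import Any, Dict


def prune_parameter(schema: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return a copy of a template schema without parameter `name`.

    Assumes a JSON Schema-style layout: parameters under "properties" and,
    if present, required names in a "required" list. The input is not modified.
    """
    pruned = copy.deepcopy(schema)
    pruned.get("properties", {}).pop(name, None)
    required = pruned.get("required")
    if isinstance(required, list) and name in required:
        required.remove(name)
    return pruned


def prune_schema_file(path: str, name: str) -> None:
    """Rewrite a schema file in place with `name` removed (hypothetical usage)."""
    with open(path) as f:
        schema = json.load(f)
    with open(path, "w") as f:
        json.dump(prune_parameter(schema, name), f, indent=2)
        f.write("\n")
```

For example, `prune_schema_file("databricks_template_schema.json", "input_cloud")` would drop the cloud prompt for an organization that runs on a single cloud. `json.dump` rewrites the file's formatting, so review the resulting diff before committing.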
@@ -41,16 +41,15 @@ For example, you may have a fixed set of staging & prod Databricks workspaces (o
 also run all of your ML pipelines on a single cloud, in which case the `input_cloud` parameter is unnecessary.

 The easiest way to prune parameters and replace them with hardcoded values is to follow
-the [contributor guide](README.md#previewing-stack-changes) to generate an example project with
-parameters substituted-in, and then copy the generated project contents back into your stack.
+the [contributor guide](README.md#previewing-changes) to generate an example project with
+parameters substituted-in, and then copy the generated project contents back into your MLOps Stacks repo.

 ## Customize individual components

 ### Example ML code
-The default stack provides example ML code using [MLflow recipes](https://mlflow.org/docs/latest/recipes.html#).
+MLOps Stacks provides example ML code.
 You may want to customize the example code, e.g. further prune it down into a skeleton for data scientists
-to fill out, or remove and replace the use of MLflow Recipes if you expect data scientists to work on problem
-types that are currently unsupported by MLflow Recipes.
+to fill out.

 If you customize this component, you can still use the CI/CD and ML resource components to build production ML pipelines, as long as you provide ML
 notebooks with the expected interface. For example, model training under ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/training/notebooks/`` and inference under
@@ -60,14 +59,13 @@ You may also want to update developer-facing docs under `template/{{.input_root_
 or `template/{{.input_root_dir}}/docs/ml-developer-guide-fs.md`, which will be read by users of your stack.

 ### CI/CD workflows
-The default stack currently has the following sub-components for CI/CD:
+MLOps Stacks currently has the following sub-components for CI/CD:
 * CI/CD workflow logic defined under `template/{{.input_root_dir}}/.github/` for testing and deploying ML code and models
 * Automated scripts and docs for setting up CI/CD under `template/{{.input_root_dir}}/.mlops-setup-scripts/`
 * Logic to trigger model deployment through REST API calls to your CD system, when model training completes.
-This logic is currently captured in ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/deployment/model_deployment/notebooks/TriggerModelDeploy.py``
+This logic is currently captured in ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/deployment/model_deployment/notebooks/ModelDeployment.py``

Review comment: This issue must be from a long time ago 🤦.

 ### ML resource configs
-Root ML resource config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml``.
+Root ML resource config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``.
 It defines the ML config resources to be included and workspace host for each deployment target.

 ML resource configs (databricks CLI bundles code definitions of ML jobs, experiments, models etc) can be found under
@@ -80,7 +78,7 @@ When updating this component, you may want to update developer-facing docs in
 ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources/README.md``.

 ### Docs
-After making stack customizations, make any changes needed to
-the stack docs under `template/{{.input_root_dir}}/docs` and in the main README
-(`template/{{.input_root_dir}}/README.md`) to reflect any updates you've made to the stack.
-For example, you may want to include a link to your custom stack in `template/{{.input_root_dir}}/README.md`.
+After making customizations, make any changes needed to
+the docs under `template/{{.input_root_dir}}/docs` and in the main README
+(`template/{{.input_root_dir}}/README.md`) to reflect any updates you've made to the MLOps Stacks repo.
+For example, you may want to include a link to your custom MLOps Stacks repo in `template/{{.input_root_dir}}/README.md`.
Review comment: This is awesome. Shall we cut a release and update the alias pointing to the release version later?

Review comment: Yup! We'll be cutting the MLOps Stacks v0.2 release after this PR merges, and deco team has plans to add this semantic versioning to the bundle aliases.