MS-Build 2020: Building an End-to-End ML Pipeline for Big Data

This repo holds the information and resources you need to recreate the demo from the Microsoft Build 2020 session "Building End-to-End Machine Learning Pipelines for Big Data".

Prerequisites:

Azure account
Azure Event Hubs
Azure Databricks
Azure Machine Learning
Azure Key Vault
Kubernetes environment / Azure Container Instances

Data Flow

Ingest stream data into Azure Blob Storage with Event Hubs and Azure Databricks.
Preprocess the data with Apache Spark to fit our schema.
Save the data in Parquet format in the raw storage directory.
Merge batch (historical) and stream (new) data with Apache Spark, and save the result in the preprocessed storage directory.
Create multiple Azure ML (AML) Datasets from the Azure Databricks environment, and save them in the refined storage directory.
Use an Azure Machine Learning compute cluster to run multiple experiments on the AML Datasets from VS Code.
Log ML models and ML algorithm parameters using MLflow.
Serve the chosen ML model through a Dockerized REST API service on Kubernetes.
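
The merge of batch (historical) and stream (new) data listed above comes down to a union followed by de-duplication. The demo does this with Apache Spark; the following is only a minimal plain-Python sketch of that logic, so it can be read and run without a cluster. The field names `event_id` and `event_time`, the keep-the-latest rule, and the `merge_batch_and_stream` helper are assumptions made for illustration and are not taken from this repo.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def merge_batch_and_stream(
    batch: Iterable[Record],
    stream: Iterable[Record],
    key: str = "event_id",         # hypothetical unique key
    order_by: str = "event_time",  # hypothetical event timestamp
) -> List[Record]:
    """Union both inputs and keep the most recent record per key."""
    latest: Dict[Any, Record] = {}
    for record in list(batch) + list(stream):
        current = latest.get(record[key])
        # Keep the record with the newest timestamp; on a tie the later input wins.
        if current is None or record[order_by] >= current[order_by]:
            latest[record[key]] = record
    # Sort by key so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r[key])


# Made-up sample rows, for illustration only.
batch_rows = [
    {"event_id": 1, "event_time": 10, "value": "a"},
    {"event_id": 2, "event_time": 11, "value": "b"},
]
stream_rows = [
    {"event_id": 2, "event_time": 15, "value": "b-updated"},
    {"event_id": 3, "event_time": 16, "value": "c"},
]

merged = merge_batch_and_stream(batch_rows, stream_rows)
for row in merged:
    print(row)
```

Here the stream record for `event_id` 2 replaces the older batch record, so the result holds three rows. In Spark the same idea is usually expressed as a union of the two DataFrames followed by de-duplication on the key column.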

Tutorials:

Ingest data with Azure Blob Storage and Event Hubs.
Collect, analyze, and process stream data with Azure Databricks and Event Hubs.
Track and log ML metrics with MLflow and AML.
Log and deploy your ML models to a Kubernetes environment.
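
To give a feel for the serving step, here is a small standard-library sketch of a REST scoring endpoint. It is not the demo's service: the `/score` route, the JSON payload shape (`{"features": [...]}`), the port, and the fixed-weight `predict` function standing in for a trained model are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple

# Placeholder "model": fixed weights instead of a trained, logged model (assumption).
WEIGHTS = [0.5, -1.0, 2.0]
BIAS = 0.25


def predict(features: List[float]) -> float:
    """Linear score over the placeholder weights."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def handle_score_request(body: bytes) -> Tuple[int, bytes]:
    """Turn a raw JSON request body into (HTTP status, JSON response body)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        result = {"prediction": predict(features)}
        return 200, json.dumps(result).encode("utf-8")
    except (ValueError, KeyError, TypeError) as err:
        # Malformed JSON, a missing "features" field, or a wrong feature count.
        return 400, json.dumps({"error": str(err)}).encode("utf-8")


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/score":  # hypothetical route
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_score_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8080) -> None:
    """Block and serve requests; a container entrypoint would call this."""
    HTTPServer(("0.0.0.0", port), ScoreHandler).serve_forever()
```

Keeping the request handling in `handle_score_request` separates the scoring logic from the HTTP plumbing, so it can be checked without starting a server. Calling `serve()` starts the endpoint; posting `{"features": [1, 2, 3]}` to `/score` would return a JSON body with a `prediction` field.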

Q&A

If you have questions/concerns or would like to chat, contact us:

adipolak/ms-build-e2e-ml-bigdata
