This repo holds the information and resources you need to recreate the demo from the Microsoft Build 2020 session "Building End-to-End Machine Learning Pipelines for Big Data".
- Azure account
- Azure Event Hubs
- Azure Databricks
- Azure Machine Learning
- Azure Key Vault
- Kubernetes environment / Azure Container Instances
- Ingest stream data into Azure Blob storage with Event Hubs and Azure Databricks.
- Preprocess the data with Apache Spark to fit our schema.
- Save the data in Parquet format in the raw storage directory.
- Merge batch (historical) and stream (new) data with Apache Spark, and save the result in the preprocessed storage directory.
- Create multiple Azure ML (AML) Datasets from the Azure Databricks environment, and save them in the refined storage directory.
- Use Azure Machine Learning compute clusters to run multiple experiments on the AML Datasets from VS Code.
- Log ML models and algorithm parameters using MLflow.
- Serve the chosen ML model through a Dockerized REST API service on Kubernetes.
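
To make the merge step above more concrete, here is a minimal, library-free sketch of its logic. The demo itself performs this step with Apache Spark; the plain-Python version below only illustrates the idea. The field names `id` and `ts`, the function name `merge_batch_and_stream`, and the "keep the newest record per key" rule are assumptions made for illustration and are not taken from the demo code.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def merge_batch_and_stream(
    batch: Iterable[Record],
    stream: Iterable[Record],
    key: str = "id",   # hypothetical record identifier field
    ts: str = "ts",    # hypothetical event timestamp field
) -> List[Record]:
    """Union historical and new records, keeping the newest record per key."""
    latest: Dict[Any, Record] = {}
    # Batch records are visited first, so on equal timestamps the stream wins.
    for record in list(batch) + list(stream):
        current = latest.get(record[key])
        if current is None or record[ts] >= current[ts]:
            latest[record[key]] = record
    # Sort by key so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r[key])


# Illustrative data only; these values do not come from the demo.
batch_rows = [
    {"id": 1, "ts": 10, "value": 0.5},
    {"id": 2, "ts": 11, "value": 0.7},
]
stream_rows = [
    {"id": 2, "ts": 20, "value": 0.9},
    {"id": 3, "ts": 21, "value": 0.1},
]
merged = merge_batch_and_stream(batch_rows, stream_rows)
```

In this sketch, the record with `id` 2 from the stream replaces the older batch record, and the records with `id` 1 and `id` 3 pass through unchanged.
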
- Ingest data with Azure Blob storage and Event Hubs.
- Collect, analyze, and process stream data with Azure Databricks and Event Hubs.
- Track and log ML metrics with MLflow and AML.
- Log and deploy your ML models to a Kubernetes environment.
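
Once a model is deployed behind a REST API, a client calls it by sending JSON over HTTP. The sketch below only builds such a request with the Python standard library and does not send it. The URL, the `{"data": [...]}` payload shape, and the feature names are all hypothetical; the real scoring service defines its own contract.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_scoring_request(
    url: str, rows: List[Dict[str, Any]]
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    # Assumed payload shape: {"data": [...]}; adjust to the deployed service.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and feature names, for illustration only.
request = build_scoring_request(
    "http://localhost:8080/score",
    [{"feature_a": 1.0, "feature_b": 2.5}],
)
# To actually call a running service:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```
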
If you have questions/concerns or would like to chat, contact us: