diff --git a/blog_post.html b/blog_post.html
index 5ed686b..5c34496 100644
--- a/blog_post.html
+++ b/blog_post.html
@@ -3093,74 +3093,74 @@
Amit Arora, Madhur Prashant, Antara Raisa
+Amit Arora, Madhur Prashant, Antara Raisa, Johnny Chivers
This post is co-written with customer_names from Twilio.
-Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Building a robust MLOps pipeline demands cross-functional collaboration. Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance. With the right processes and tools, MLOps enables organizations to reliably and efficiently adopt ML across their teams.
-Amazon SageMaker MLOps is a suite of features that includes Amazon SageMaker Projects (CI/CD), Amazon SageMaker Pipelines and Amazon SageMaker Model Registry.
+Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Building a robust MLOps pipeline demands cross-functional collaboration. Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance. With the right processes and tools, MLOps enables organizations to reliably and efficiently adopt ML across their teams for their specific use cases.
+Amazon SageMaker MLOps is a suite of features that includes Amazon SageMaker Projects (CI/CD), Amazon SageMaker Pipelines and Amazon SageMaker Model Registry. In this blog post, we will discuss SageMaker Pipelines and Model Registry.
SageMaker Pipelines allows for straightforward creation and management of ML workflows, while also offering storage and reuse capabilities for workflow steps. The SageMaker Model Registry centralizes model tracking, simplifying model deployment.
This blog post focuses on enabling AWS customers to have the flexibility to use their data source of choice and integrate it seamlessly with Amazon SageMaker Processing jobs, where you can leverage a simplified, managed experience to run data pre- or post-processing and model evaluation workloads on the Amazon SageMaker platform.
-Twilio needed to implement an MLOPs pipeline with their customer data stored within PrestoDB. PrestoDB is an open-source SQL query engine that is designed for fast analytic queries against data of any size.
+Twilio needed to implement an MLOps pipeline and, as part of this process, query data from PrestoDB. PrestoDB is an open-source SQL query engine that is designed for fast analytic queries against data of any size from multiple sources.
In this post, we show you a step-by-step implementation to achieve the following:
Read raw data available in PrestoDB via SageMaker Processing jobs
Train a binary classification model using SageMaker Training Jobs and tune the model using SageMaker Automatic Model Tuning
Run a batch transform for inference on your raw data fetched from PrestoDB and deploy the model as a real time SageMaker endpoint for inference
Twilio is an american cloud communications company, based in San Francisco, California and provides programmable communication tools for making and receiving phone calls, sending and receiving text messages, and performing other communication functions using its web service APIs. Being one of the largest strategic AWS customers, Twilio engages with Data and AI/ML servives to run their daily workloads. This blog resolves around the steps AWS and Twilio took to migrate Twilio’s MLOps, implementation of training models and running batch inferences (that are able to detect burner accounts based on unusual user activity) to Amazon SageMaker.
-Burners are phone numbers which are available online to everyone and are used to hide identities by creating fake accounts on our customers’ apps/websites. Twilio built an effective data and machine learning operations pipeline to detect these anomaly phone numbers with the help of a binary classification model using the scikit-learn RandomForestClassifier. The training data they used for this pipeline is available via PrestoDB tables and is read into Pandas through the PrestoDB Python client. This data is then read into an Apache Spark dataframe (although the model training happens only using the data in the Pandas dataframe).
-The end goal was to convert all the existing steps into three sub solutions utilizing SageMaker Pipelines: a training pipeline, batch inference pipeline and finally, to deploy the trained model on an Amazon SageMaker Endpoint for real-time inference.
+Twilio is an American cloud communications company based in San Francisco, California, that provides programmable communication tools for making and receiving phone calls, sending and receiving text messages, and performing other communication functions using its web service APIs. Being one of the largest AWS customers, Twilio engages with Data and AI/ML services to run their daily workloads. This blog revolves around the steps AWS and Twilio took to migrate Twilio’s existing MLOps implementation, including training models and running batch inferences (that are able to detect burner accounts based on unusual user activity), to Amazon SageMaker.
+Burners are phone numbers which are available online to everyone and are used to hide identities by creating fake accounts on customers’ apps/websites. Twilio built a data and machine learning operations pipeline to detect these anomaly phone numbers with the help of a binary classification model using the scikit-learn RandomForestClassifier. The training data they used for this pipeline is made available via PrestoDB tables and is read into Pandas through the PrestoDB Python client. This data is then read into an Apache Spark dataframe for further analysis and machine learning operations.
+The end goal was to convert all the existing steps into a three-fold solution utilizing SageMaker Pipelines to enable more frequent model re-training and optimized batch processing, while customers take advantage of flexible data access via an open-source SQL query engine: 1/ implement a training pipeline, 2/ implement a batch inference pipeline (by connecting a SageMaker Processing job with data queried from Presto), and 3/ deploy the trained model on a SageMaker endpoint for real-time inference.
For the proof of concept, we used the TPCH connector as our choice of open source data (this allows users to test Presto’s capabilities and query syntax without needing to configure access to an external data source). Using this solution, Twilio successfully migrated to SageMaker Pipelines with the open source solution that can be viewed here, published on the aws-samples GitHub repository: mlops-pipeline-prestodb.
The solution presented provides an implementation for training a machine learning model and running batch inference on Amazon SageMaker. The goal is to enable more frequent model re-training, optimized batch processing, and even add additional capabilities such as making the model available as a real-time endpoint. This solution also provides a design pattern built on AWS best practices that can be replicated for other ML workloads with minimal overhead. This is divided into three main steps: training pipeline, batch inference pipeline, and an implementation of real time inference support for the choice of maching learning model.
-We have divided the solution into the following main components that are open sourced and can be run through simple config file updates.
-This solution includes the following components:
+The solution presented provides an implementation for training a machine learning model and running batch inference on Amazon SageMaker using data fetched from a PrestoDB table. This solution provides a design pattern built on AWS best practices that can be replicated for other ML workloads with minimal overhead. It is divided into three main steps: a training pipeline, a batch inference pipeline, and an implementation of real time inference support for the machine learning model of choice.
+This solution is now open source and can be run through simple config file updates. For more information on the config.yml file walkthrough, view this link (add a link here pointing to the config file).
+This solution includes the following steps:
The solution design consists of three parts - setting up the data preparation and training pipeline, preparing for the batch transform step, and deploying the approved model of choice as a real time SageMaker endpoint for inference. All of these parts utilize information from a single config.yml file, which includes the AWS and Presto credential information and the individual step pipeline parameters for the data preprocessing, training, tuning, model evaluation, model registration and real time endpoint steps of this solution. This configuration file is highly customizable for the user to run the solution end to end with minimal to no code changes.
-The main components of this solution are as describe below:
-The solution design consists of the following parts - Setting up the data preparation and training pipeline, preparing for the batch transform step, and deploying the approved model of choice as a real time SageMaker endpoint for inference. All of these parts utilize information from a single config.yml file, which includes the necessary AWS and Presto credential information to connect to a presto server on an EC2 instance, Individial step pipeline parameters for the data preprocessing, training, tuning, model evaluation, model registeration and real time endpoint steps of this solution. This configuration file is highly customizable for the user to use and run the solution end to end with minimal-no code changes.
+The main components of this solution are as described in detail below:
+Pipeline.start
which triggers and instantiates all steps above.
ml.c5.xlarge
instance with a minimum instance count of 1 and maximum instance count of 3 (configurable by the user) and an automatic scaling policy configured.
To implement the solution provided in this post, you should have an AWS account and familiarity with SageMaker, Amazon S3, and PrestoDB.
+To implement the solution provided in this post, you should have an AWS account and familiarity with SageMaker, S3, and PrestoDB.
The following prerequisites need to be in place before running this code.
We will use the built-in datasets available in PrestoDB for this repo. Follow the instructions below to set up PrestoDB on an Amazon EC2 instance in your account. If you already have access to a PrestoDB instance then you can skip this section but keep its connection details handy (see the presto
section in the config
file).
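For reference, the notebooks read their settings from config.yml at the start of each step. A minimal sketch of that pattern follows; the presto key names shown here are assumptions for illustration, while the scripts and training_step keys appear in the snippets later in this post.
# minimal sketch: load config.yml and read the values that drive the pipelines
# (the 'presto' key names below are assumptions for illustration; see the repository for the real schema)
import yaml

with open("config.yml", "r") as f:
    config = yaml.safe_load(f)

print(config['scripts']['preprocess_data'])      # data preprocessing script that queries PrestoDB
print(config['training_step']['instance_type'])  # instance type used by the training step
print(config['presto']['host'], config['presto']['port'])  # assumed keys for the Presto server connection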
Create a security group to limit access to Presto. Create a security group called MyPrestoSG with two inbound rules to only allow access to Presto.
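If you are scripting this setup, a rough boto3 equivalent might look like the following; the ports (SSH on 22, Presto on 8080) and the open CIDR range are assumptions, so restrict them to your own IP and VPC in practice.
# sketch: create the MyPrestoSG security group with two inbound rules
# (the ports and CIDR ranges below are assumptions; scope them down for real deployments)
import boto3

ec2 = boto3.client("ec2")

sg = ec2.create_security_group(
    GroupName="MyPrestoSG",
    Description="Limit access to the Presto server",
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "SSH (restrict in practice)"}]},
        {"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080,
         "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Presto HTTP (restrict in practice)"}]},
    ],
)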
Once the prerequisite steps are complete and the config.yml file is set up correctly, we are now ready to run the “mlops-pipeline-prestodb” implementation:
+Once the prerequisites are complete and the config.yml file is set up correctly, we are now ready to run the mlops-pipeline-prestodb
implementation. Follow the steps below or access the GitHub repository to walk through the solution:
On the SageMaker console, or your IDE of choice, choose 0_model_training_pipeline.ipynb in the navigation pane. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook. This notebook demonstrates how SageMaker Pipelines can be used to string together a sequence of data processing, model training, tuning and evaluation steps to train a binary classification machine learning model using scikit-learn. The trained model can then be used for batch inference, or hosted on a SageMaker endpoint for real time inference.
Preprocess data step: In this step of the notebook, we set our pipeline input parameters when triggering our pipeline execution. We use a preprocess script which is read to connect to presto and query data, that is then sent to an Amazon S3 bucket split into train, test and validation datasets. Using these files, this step can then use the data for training the model.
+Preprocess data step: In this step of the notebook, we set our pipeline input parameters when triggering our pipeline execution. We use a preprocessing script to connect to the Presto server on our EC2 instance and query data (using the query specified and configurable in the config file), which is then split into train, test and validation datasets and sent to an S3 bucket. Using the data in these files, we can train our machine learning model.
step_args = sklearn_processor.run(
- code=config['scripts']['preprocess_data'],
- source_dir=config['scripts']['source_dir'],
- outputs=outputs_preprocessor,
- arguments=[
- "--host", host_parameter,
- "--port", port_parameter,
- "--presto_credentials_key", presto_parameter,
- "--region", region_parameter,
- "--presto_catalog", presto_catalog_parameter,
- "--presto_schema", presto_schema_parameter,
- "--train_split", train_split.to_string(),
- "--test_split", test_split.to_string(),
- ],
- )
-
- step_preprocess_data = ProcessingStep(
- name=config['data_processing_step']['step_name'],
- step_args=step_args,
- )
# declare the sklearn processor
+step_args = sklearn_processor.run(
+ ## code refers to the data preprocessing script that is responsible for querying data from the presto server
+ code=config['scripts']['preprocess_data'],
+ source_dir=config['scripts']['source_dir'],
+ outputs=outputs_preprocessor,
+ arguments=[
+ "--host", host_parameter,
+ "--port", port_parameter,
+ "--presto_credentials_key", presto_parameter,
+ "--region", region_parameter,
+ "--presto_catalog", presto_catalog_parameter,
+ "--presto_schema", presto_schema_parameter,
+ "--train_split", train_split.to_string(),
+ "--test_split", test_split.to_string(),
+ ],
+ )
+
+ step_preprocess_data = ProcessingStep(
+ name=config['data_processing_step']['step_name'],
+ step_args=step_args,
+ )
config['scripts']['source_dir']
which refers to our data preprocessing script that connects to the EC2 instance where the presto server runs. This script is responsible for extracting data from the query that you define. You can query the data you want by modifying the query parameter in the config.yml file. We are using the sample query as an example to extract open source TPCH data
on orders, discounts and order priorities.
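The preprocessing script itself is not reproduced in this post. As a rough sketch (the connection values, query, and split logic below are illustrative assumptions, with the real values coming from the config file and the pipeline arguments), it queries PrestoDB into a Pandas dataframe and writes the splits to the processing output paths that SageMaker uploads to S3:
# minimal sketch: query PrestoDB into a Pandas dataframe inside the processing job
# (host, catalog, schema, the query and the split ratios are illustrative assumptions)
import pandas as pd
import prestodb

conn = prestodb.dbapi.connect(
    host="<presto-host>",          # EC2 instance running the Presto server
    port=8080,
    user="ec2-user",
    catalog="tpch",                # TPCH connector used in this post
    schema="tiny",
)

# illustrative TPCH query producing order-level aggregates
query = (
    "SELECT l_orderkey, SUM(l_extendedprice) AS total_extended_price, "
    "AVG(l_discount) AS avg_discount, SUM(l_quantity) AS total_quantity "
    "FROM lineitem GROUP BY l_orderkey LIMIT 1000"
)
cur = conn.cursor()
cur.execute(query)
rows = cur.fetchall()
df = pd.DataFrame(rows, columns=[col[0] for col in cur.description])

# split into train/test/validation and write CSVs to the processing output directories
train_df = df.sample(frac=0.7, random_state=42)
remaining = df.drop(train_df.index)
test_df = remaining.sample(frac=0.5, random_state=42)
val_df = remaining.drop(test_df.index)
train_df.to_csv("/opt/ml/processing/train/train.csv", index=False)
test_df.to_csv("/opt/ml/processing/test/test.csv", index=False)
val_df.to_csv("/opt/ml/processing/validation/validation.csv", index=False)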
Train Model Step: In this step of the notebook, we use the SKLearn estimator from the SageMaker SDK and the RandomForestClassifier from scikit-learn to train the ML model. The HyperparameterTuner class is used for running automatic model tuning to determine the set of hyperparameters that provide the best performance based on a given metric threshold (maximizing the AUC metric).
In the code below, the sklearn_estimator
object is created with parameters that are configured in the config file and uses this training script to train the ML model. The hyperparameters are also configurable by the user via the config file. This step accesses the train, test and validation files that are created as a part of the previous data preprocessing step.
sklearn_estimator = SKLearn(
- entry_point=config['scripts']['training_script'],
- role=role,
- instance_count=config['training_step']['instance_count'],
- instance_type=config['training_step']['instance_type'],
- framework_version=config['training_step']['sklearn_framework_version'],
- base_job_name=config['training_step']['base_job_name'],
- hyperparameters={
- "n_estimators": config['training_step']['n_estimators'],
- "max_depth": config['training_step']['max_depth'],
- "features": config['training_step']['training_features'],
- "target": config['training_step']['training_target'],
- },
- tags=config['training_step']['tags']
-)
-# Create Hyperparameter tuner object. Ranges from https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost-tuning.html
-rf_tuner = HyperparameterTuner(
- estimator=sklearn_estimator,
- objective_metric_name=config['tuning_step']['objective_metric_name'],
- hyperparameter_ranges={
- "n_estimators": IntegerParameter(config['tuning_step']['hyperparam_ranges']['n_estimators'][0], config['tuning_step']['hyperparam_ranges']['n_estimators'][1]),
- "max_depth": IntegerParameter(config['tuning_step']['hyperparam_ranges']['max_depth'][0], config['tuning_step']['hyperparam_ranges']['max_depth'][1]),
- "min_samples_split": IntegerParameter(config['tuning_step']['hyperparam_ranges']['min_samples_split'][0], config['tuning_step']['hyperparam_ranges']['min_samples_split'][1]),
- "max_features": CategoricalParameter(config['tuning_step']['hyperparam_ranges']['max_features'])
- },
- max_jobs=config['tuning_step']['maximum_training_jobs'], ## reducing this for testing purposes
- metric_definitions=config['tuning_step']['metric_definitions'],
- max_parallel_jobs=config['tuning_step']['maximum_parallel_training_jobs'], ## reducing this for testing purposes
-)
-
-step_tuning = TuningStep(
- name=config['tuning_step']['step_name'],
- tuner=rf_tuner,
- inputs={
- "train": TrainingInput(
- s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs[
- "train" ## refer to this
- ].S3Output.S3Uri,
- content_type="text/csv",
- ),
- "test": TrainingInput(
- s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
- content_type="text/csv",
- ),
- },
-)
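The training script passed as the entry_point above is not reproduced in this post. A minimal sketch of what such a script could look like follows; the argument names, file names, and feature/target handling are assumptions that follow the SageMaker SKLearn container conventions:
# minimal sketch of the entry_point training script used by the SKLearn estimator
# (argument names, file names and feature/target handling are illustrative assumptions)
import argparse
import os
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=10)
    parser.add_argument("--features", type=str)   # comma-separated feature column names
    parser.add_argument("--target", type=str)
    args, _ = parser.parse_known_args()

    # SageMaker mounts the 'train' channel under SM_CHANNEL_TRAIN
    train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
    train_df = pd.read_csv(os.path.join(train_dir, "train.csv"))

    features = args.features.split(",")
    model = RandomForestClassifier(n_estimators=args.n_estimators, max_depth=args.max_depth)
    model.fit(train_df[features], train_df[args.target])

    # persist the model where SageMaker expects it so it is packaged into model.tar.gz
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))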
Evaluate model step: The purpose of this step is to check if the trained and tuned model has an accuracy level above a configurable threshold and only then register the model with the model registry (from where it can be subsequently approved and deployed). If the model accuracy does not meet the configured threshold, the pipeline fails and the model is not registered with the model registry. We use the ScriptProcessor
with an evaluation script that a user creates to evaluate the trained model based on a metric of choice.
Evaluation Report
is generated and sent to the S3 bucket for analysis:
evaluation_report = PropertyFile(
name="EvaluationReport", output_name="evaluation", path=config['evaluation_step']['evaluation_filename']
)
model.predict
. The evaluation report sent to S3 contains information on metrics like precision, recall, and accuracy.
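The evaluation script itself is user-defined. A minimal sketch is shown below; the container paths, column names, and the layout of the report JSON are assumptions for illustration:
# minimal sketch of a user-defined evaluation script run by the ScriptProcessor
# (paths, column names and the report layout are illustrative assumptions)
import json
import tarfile
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# unpack the trained model artifact that SageMaker mounts into the processing container
with tarfile.open("/opt/ml/processing/model/model.tar.gz") as tar:
    tar.extractall(path="/opt/ml/processing/model")
model = joblib.load("/opt/ml/processing/model/model.joblib")

# load the held-out test split produced by the preprocessing step
test_df = pd.read_csv("/opt/ml/processing/test/test.csv")
y_true = test_df["target"]          # assumed target column name
X_test = test_df.drop("target", axis=1)
y_pred = model.predict(X_test)

report = {
    "classification_metrics": {
        "accuracy": {"value": accuracy_score(y_true, y_pred)},
        "precision": {"value": precision_score(y_true, y_pred)},
        "recall": {"value": recall_score(y_true, y_pred)},
        "auc": {"value": roc_auc_score(y_true, y_pred)},
    }
}

# write the evaluation report where the ProcessingStep output (and PropertyFile) expects it
with open("/opt/ml/processing/evaluation/evaluation.json", "w") as f:
    json.dump(report, f)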
Register model step: Once the trained model meets the model performance requirements, a new version of the model is registered with the model registry for further analysis and model creation.
# Create a RegisterModel step, which registers the model with the SageMaker Model Registry.
step_register_model = RegisterModel(
@@ -3426,7 +3431,7 @@ Testing the solution
tags=config['register_model_step']['tags']
)
The model is registered with the Model Registry with approval status set to PendingManualApproval. This means the model cannot be deployed on a SageMaker Endpoint unless its status in the registry is changed to Approved manually via the SageMaker console, programmatically, or through a Lambda function.
-Adding conditions to the pipeline is done with a ConditionStep. In this case, we only want to register the new model version with the model registry if the new model meets a specific accuracy condition:
+Adding conditions to the pipeline is done with a ConditionStep. In this case, we only want to register the new model version with the model registry if the new model meets a specific accuracy condition:
step_fail = FailStep(
name=config['fail_step']['step_name'],
@@ -3450,10 +3455,11 @@ Testing the solution
name=config['condition_step']['step_name'],
conditions=[cond_gte],
if_steps=[step_register_model],
- else_steps=[step_fail], ## if this fails - add a step here (from the quip)
+ else_steps=[step_fail], ## if this fails
)
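The cond_gte condition referenced above is derived from the evaluation report registered as a PropertyFile. A sketch of how it might be constructed follows; the evaluation step variable name, the JSON path, and the threshold config key are assumptions:
# sketch: build the accuracy condition from the evaluation report PropertyFile
# (step_evaluate_model, the json_path and the threshold config key are assumptions for illustration)
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

cond_gte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=step_evaluate_model.name,
        property_file=evaluation_report,
        json_path="classification_metrics.accuracy.value",
    ),
    right=config['condition_step']['accuracy_condition_threshold'],  # assumed config key
)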
If the accuracy condition is not met, a step_fail
step is executed that sends an error message to the user and the pipeline fails.
Orchestrate all steps and start the pipeline: Once you have created the pipeline steps as above, you can instantiate and start it with custom parameters making the pipeline agnostic to who is triggering it, but also to the scripts and data used. The pipeline can be started using the CLI, the SageMaker Studio UI or the SDK and below there is a screenshot of what it looks like in the SageMaker Studio UI.
+Orchestrate all steps and start the pipeline: Once you have created the pipeline steps as above, you can instantiate and start the pipeline with custom parameters, making it agnostic not only to who is triggering it but also to the scripts and data used. The pipeline can be started using the CLI, the SageMaker Studio UI or the SDK.
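Before it can be started, the pipeline object is assembled from the parameters and steps defined earlier. A sketch of that assembly is shown below; the pipeline name config key, the exact parameter list, and the variable names for the evaluation and condition steps are assumptions:
from sagemaker.workflow.pipeline import Pipeline

# assemble the training pipeline from the parameters and steps defined above
# (config['pipeline']['name'], step_evaluate_model and step_condition are assumed names)
pipeline = Pipeline(
    name=config['pipeline']['name'],
    parameters=[
        host_parameter,
        port_parameter,
        presto_parameter,
        region_parameter,
        presto_catalog_parameter,
        presto_schema_parameter,
        train_split,
        test_split,
    ],
    steps=[step_preprocess_data, step_tuning, step_evaluate_model, step_condition],
)

# create or update the pipeline definition in SageMaker before starting an execution
pipeline.upsert(role_arn=role)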
# Start pipeline with credit data and preprocessing script
execution = pipeline.start(
execution_display_name=pipeline.name,
@@ -3474,11 +3480,11 @@ Testing the solution
# print the summary of the pipeline run once it is completed
print_pipeline_execution_summary(execution.list_steps(), pipeline.name)
At the end of the training pipeline, your pipeline structure on Amazon SageMaker should look like this:
-Now that the model is registered, get access to the registered model manually on the sagemaker studio model registry console, or programmatically in the next notebook, approve it and run the second portion of this solution: Batch Transform Step
At the end of executing the entire training pipeline, your pipeline structure on Amazon SageMaker Pipelines should look like this:
+Now that the model is registered, you can access the registered model manually on the SageMaker Studio Model Registry console, or programmatically in the next notebook, approve it, and run the second portion of this solution: the Batch Transform step.
Next, choose 1_batch_transform_pipeline.ipynb
. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook. This notebook will run a batch transform step using the model trained in the previous notebook. It does so by running the following steps:
# What instance type to use for processing.
processing_instance_type = ParameterString(
@@ -3523,8 +3529,9 @@ Testing the solution
# approve the latest registered model version programmatically
model_package_update_response = sagemaker_client.update_model_package(
    ModelPackageArn=latest_model_package_arn,
    ModelApprovalStatus="Approved",
)
Now we have extracted the latest model from the SageMaker Model Registry, and programmatically approved it. You can also approve the model manually on the SageMaker Model Registry page on SageMaker Studio as given in the screenshot below.
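For reference, the latest_model_package_arn used above can be looked up from the model registry programmatically. A sketch follows; the model package group name is an assumption, and in the solution it comes from the config file:
# sketch: look up the most recently registered model package ARN from the model registry
# (the model package group name below is an assumption)
list_response = sagemaker_client.list_model_packages(
    ModelPackageGroupName="mlops-pipeline-prestodb-model-group",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
latest_model_package_arn = list_response["ModelPackageSummaryList"][0]["ModelPackageArn"]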
## represents the output processing for the batch pre processing step
batch_output=[
@@ -3546,25 +3553,26 @@ Testing the solution
# Use the sklearn_processor's run method and configure the batch preprocessing step
step_args = sklearn_processor.run(
- code=config['scripts']['batch_transform_get_data'],
- source_dir=config['scripts']['source_dir'],
- outputs=batch_output,
- arguments=[
- "--host", host_parameter,
- "--port", port_parameter,
- "--presto_credentials_key", presto_parameter,
- "--region", region_parameter,
- "--presto_catalog", presto_catalog_parameter,
- "--presto_schema", presto_schema_parameter,
- ],
-)
-
-# declare the batch step that is called later in pipeline execution
-batch_data_prep = ProcessingStep(
- name=config['data_processing_step']['step_name'],
- step_args=step_args,
-)
Now, with the image uri, we refer to the ‘inference.py’ script that grabs information on features to use while making predictions. Using this, we will create the model which automatically trigger the training and the preprocess data step Run the transformer step on the created model.
+    # here, we add in a `code` or an entry point that uses the data preprocess script for collecting data in a batch and storing it in S3
+    code=config['scripts']['batch_transform_get_data'],
+    source_dir=config['scripts']['source_dir'],
+    outputs=batch_output,
+    arguments=[
+        "--host", host_parameter,
+        "--port", port_parameter,
+        "--presto_credentials_key", presto_parameter,
+        "--region", region_parameter,
+        "--presto_catalog", presto_catalog_parameter,
+        "--presto_schema", presto_schema_parameter,
+    ],
+)
+
+# declare the batch step that is called later in pipeline execution
+batch_data_prep = ProcessingStep(
+    name=config['data_processing_step']['step_name'],
+    step_args=step_args,
+)
+Once the batch data preparation step is complete, we declare a model with the image uri and refer to the ‘inference.py’ script, which grabs information on the features to use while making predictions. Using this, we create the model, which automatically triggers the training and data preprocessing steps, and then run the transformer step on the created model.
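The transformer step itself is not shown in this excerpt. Once the model in the next snippet has been created, it could be wired up roughly as follows; the model name, the config keys, the output name of the batch data preparation step, and the S3 output path are assumptions:
from sagemaker.inputs import TransformInput
from sagemaker.transformer import Transformer
from sagemaker.workflow.steps import TransformStep

# sketch: run batch inference with the created model on the data prepared by batch_data_prep
# (the model name, config keys and S3 output path are assumptions for illustration)
transformer = Transformer(
    model_name="name-of-the-created-model",                     # assumption: the model created below
    instance_type=config['transform_step']['instance_type'],    # assumed config key
    instance_count=config['transform_step']['instance_count'],  # assumed config key
    accept="text/csv",
    output_path="s3://<bucket>/batch-transform-output",         # assumed output location
)

step_transform = TransformStep(
    name="BatchTransformStep",
    transformer=transformer,
    inputs=TransformInput(
        data=batch_data_prep.properties.ProcessingOutputConfig.Outputs["batch"].S3Output.S3Uri,  # assumed output name
        content_type="text/csv",
    ),
)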
# create the model image based on the model data and refer to the inference script as an entry point for batch inference
model = Model(
image_uri=image_uri,
@@ -3615,85 +3623,100 @@ Testing the solution
# start the pipeline execution:
response = sagemaker_client.start_pipeline_execution(
PipelineName=batch_transform_pipeline.name
-)
Once the pipeline run has completed, run batch inference and view the output below:
-while True:
- resp = client.describe_pipeline_execution(
- PipelineExecutionArn=response['PipelineExecutionArn']
- )
- status = resp['PipelineExecutionStatus']
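The polling loop above is shown only partially in this excerpt. A completed sketch would break out once the execution leaves the Executing state; the 15 second interval is arbitrary:
import time

# poll the pipeline execution until it reaches a terminal state
while True:
    resp = client.describe_pipeline_execution(
        PipelineExecutionArn=response['PipelineExecutionArn']
    )
    status = resp['PipelineExecutionStatus']
    if status != "Executing":
        break
    time.sleep(15)

print(f"Pipeline execution finished with status: {status}")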
At the end of the batch transform pipeline, your pipeline structure on Amazon SageMaker Pipelines should look like this:
Lastly, choose 2_realtime_inference.ipynb
. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook. This notebook extracts the latest approved model from the model registry and deploys it as a SageMaker endpoint for real time inference. It does so by running the following steps:
image uri
to use and extract the latest approved model the same way we did in the prior batch transform notebook. Once you have extracted the latest approved model, use a container list with the specific inference.py
file to create the model and run inferences against. This model creation and endpoint deployment is specific to the Scikit-learn model configuration and will change based on your use case.
container_list = [{
- 'Image': image_uri,
- 'ModelDataUrl': model_data_url,
- 'Environment': {
- 'SAGEMAKER_PROGRAM': 'inference.py',
- 'SAGEMAKER_SUBMIT_DIRECTORY': compressed_inference_script_uri,
- }
-}]
-
-## create the model object and call deploy on it
-create_model_response = sm.create_model(
- ModelName = model_name,
- ExecutionRoleArn = role,
- Containers=container_list
-)
In this code, we use the inference.py file specific to the Scikit Learn model. We then create our endpoint configuration, setting our ManagedInstanceScaling
to ENABLED
with our desired MaxInstanceCount
and MinInstanceCount
.
create_endpoint_config_response = sm.create_endpoint_config(
-EndpointConfigName = endpoint_config_name,
-ProductionVariants=[{
- 'InstanceType': instance_type,
- ## have max instance count configured here
- 'InitialInstanceCount': min_instances,
- 'InitialVariantWeight': 1,
- 'ModelName': model_name,
- 'VariantName': 'AllTraffic',
- ## change your managed instance configuration here
- "ManagedInstanceScaling":{
- "MaxInstanceCount": max_instances,
- "MinInstanceCount": min_instances,
- "Status": "ENABLED",}
-}])
container_list = [{
+ 'Image': image_uri,
+ 'ModelDataUrl': model_data_url,
+ 'Environment': {
+ 'SAGEMAKER_PROGRAM': 'inference.py',
+ 'SAGEMAKER_SUBMIT_DIRECTORY': compressed_inference_script_uri,
+ }
+}]
+
+## create the model object and call deploy on it
+create_model_response = sm.create_model(
+ ModelName = model_name,
+ ExecutionRoleArn = role,
+ Containers=container_list
+)
In this code, we use the inference.py file specific to the Scikit Learn model. We then create our endpoint configuration, setting our ManagedInstanceScaling
to ENABLED
with our desired MaxInstanceCount
and MinInstanceCount
for automatic scaling.
create_endpoint_config_response = sm.create_endpoint_config(
+EndpointConfigName = endpoint_config_name,
+ProductionVariants=[{
+ 'InstanceType': instance_type,
+ ## have max instance count configured here
+ 'InitialInstanceCount': min_instances,
+ 'InitialVariantWeight': 1,
+ 'ModelName': model_name,
+ 'VariantName': 'AllTraffic',
+ ## change your managed instance configuration here
+ "ManagedInstanceScaling":{
+ "MaxInstanceCount": max_instances,
+ "MinInstanceCount": min_instances,
+ "Status": "ENABLED",}
+}])
create_endpoint_response = sm.create_endpoint(
-EndpointName=endpoint_name,
-EndpointConfigName=endpoint_config_name)
-logger.info(f"Going to deploy the real time endpoint -> {create_endpoint_response['EndpointArn']}")
-
-# wait for endpoint to reach a terminal state (InService) using describe endpoint
-describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)
-
-while describe_endpoint_response["EndpointStatus"] == "Creating":
- describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)
- print(describe_endpoint_response["EndpointStatus"])
- time.sleep(15)
create_endpoint_response = sm.create_endpoint(
+EndpointName=endpoint_name,
+EndpointConfigName=endpoint_config_name)
+logger.info(f"Going to deploy the real time endpoint -> {create_endpoint_response['EndpointArn']}")
+
+# wait for endpoint to reach a terminal state (InService) using describe endpoint
+describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)
+
+while describe_endpoint_response["EndpointStatus"] == "Creating":
+ describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)
+ print(describe_endpoint_response["EndpointStatus"])
+ time.sleep(15)
Now run inference against the data extracted from PrestoDB:
-body_str = "total_extended_price,avg_discount,total_quantity\n1,2,3\n66.77,12,2"
+# define the CSV payload to send to the endpoint
+body_str = "total_extended_price,avg_discount,total_quantity\n1,2,3\n66.77,12,2"
+
+response = smr.invoke_endpoint(
+ EndpointName=endpoint_name,
+ Body=body_str.encode('utf-8') ,
+ ContentType='text/csv',
+)
+
+response_str = response["Body"].read().decode()
+response_str
We have now seen an end to end process of our solution, from fetching data by connecting to a Presto server on an EC2 instance, to training, evaluating and registering the model. We then approved the latest registered model from our training pipeline solution and ran batch inference against batch data stored in S3. We finally deployed the latest approved model as a real time SageMaker endpoint to run inferences against. Take a look at the results below.
Here is a compilation of some queries and responses generated by our implementation from the real time endpoint deployment stage:
-–> need to add results
-–> need to add tips
+| Query | Answer |
+|---|---|
+| total_extended_price,avg_discount,total_quantity,2,3,12,2 | – response – |