From dcc93984e5a4d52f273b0d15803c501e177501af Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:16:01 -0700 Subject: [PATCH 01/25] Update how-to-monitor-datasets.md Update this article with model monitoring migration guidance. --- .../v1/how-to-monitor-datasets.md | 208 +++++++++++++++++- 1 file changed, 200 insertions(+), 8 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 23b607a4c0..a1e70b7d3b 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -3,7 +3,7 @@ title: Detect data drift on datasets (preview) titleSuffix: Azure Machine Learning description: Learn how to set up data drift detection in Azure Learning. Create datasets monitors (preview), monitor for data drift, and set up alerts. services: machine-learning -ms.service: azure-machine-learning +ms.service: machine-learning ms.subservice: mldata ms.reviewer: franksolomon ms.author: xunwan @@ -14,6 +14,11 @@ ms.custom: UpdateFrequency5, data4ml, sdkv1 #Customer intent: As a data scientist, I want to detect data drift in my datasets and set alerts for when drift is large. --- +# Data drift(preview) will be retired, and replaced by Model Monitor + +Data drift(preview) will be retired at 09/01/2025, and you can start to use [Model Monitor](https://learn.microsoft.com/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli) for your data drift tasks. +Please check the content below to understand the replacement, feature gaps and manual change steps. + # Detect data drift (preview) on datasets [!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)] @@ -34,7 +39,7 @@ With Azure Machine Learning dataset monitors (preview), you can: An [Azure Machine Learning dataset](how-to-create-register-datasets.md) is used to create the monitor. 
The dataset must include a timestamp column. -You can view data drift metrics with the Python SDK or in Azure Machine Learning studio. Other metrics and insights are available through the [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview) resource associated with the Azure Machine Learning workspace. +You can view data drift metrics with the Python SDK or in Azure Machine Learning studio. Other metrics and insights are available through the [Azure Application Insights](../../azure-monitor/app/app-insights-overview.md) resource associated with the Azure Machine Learning workspace. > [!IMPORTANT] > Data drift detection for datasets is currently in public preview. @@ -49,6 +54,44 @@ To create and work with dataset monitors, you need: * The [Azure Machine Learning SDK for Python installed](/python/api/overview/azure/ml/install), which includes the azureml-datasets package. * Structured (tabular) data with a timestamp specified in the file path, file name, or column in the data. +When you migrate to Model Monitor, please check the prerequisites as following: + +# [Azure CLI](#tab/azure-cli) + +[!INCLUDE [basic prereqs cli](includes/machine-learning-cli-prereqs.md)] + +# [Python SDK](#tab/python) + +[!INCLUDE [basic prereqs sdk](includes/machine-learning-sdk-v2-prereqs.md)] + +# [Studio](#tab/azure-studio) + +Before following the steps in this article, make sure you have the following prerequisites: + +* An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/). + +* An Azure Machine Learning workspace and a compute instance. If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](quickstart-create-resources.md) article to create them. + +--- + +* Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. 
To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](how-to-assign-roles.md). + +* For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to: + + * Have a model already deployed to an Azure Machine Learning online endpoint. Both managed online endpoint and Kubernetes online endpoint are supported. If you don't have a model deployed to an Azure Machine Learning online endpoint, see [Deploy and score a machine learning model by using an online endpoint](how-to-deploy-online-endpoints.md). + + * Enable data collection for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see [Collect production data from models deployed to a real-time endpoint](how-to-collect-production-data.md). + +* For monitoring a model that is deployed to an Azure Machine Learning batch endpoint or deployed outside of Azure Machine Learning, be sure to: + + * Have a means to collect production data and register it as an Azure Machine Learning data asset. + * Update the registered data asset continuously for model monitoring. + * (Recommended) Register the model in an Azure Machine Learning workspace, for lineage tracking. + +> [!IMPORTANT] +> +> Model monitoring jobs are scheduled to run on serverless Spark compute pools with support for the following VM instance types: `Standard_E4s_v3`, `Standard_E8s_v3`, `Standard_E16s_v3`, `Standard_E32s_v3`, and `Standard_E64s_v3`. 
You can select the VM instance type with the `create_monitor.compute.instance_type` property in your YAML configuration or from the dropdown in the Azure Machine Learning studio. + ## What is data drift? Model accuracy degrades over time, largely because of data drift. For machine learning models, data drift is the change in model input data that leads to model performance degradation. Monitoring data drift helps detect these model performance issues. @@ -76,7 +119,7 @@ With a dataset monitor you can: The data drift algorithm provides an overall measure of change in data and indication of which features are responsible for further investigation. Dataset monitors produce many other metrics by profiling new data in the `timeseries` dataset. -Custom alerting can be set up on all metrics generated by the monitor through [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview). Dataset monitors can be used to quickly catch data issues and reduce the time to debug the issue by identifying likely causes. +Custom alerting can be set up on all metrics generated by the monitor through [Azure Application Insights](../../azure-monitor/app/app-insights-overview.md). Dataset monitors can be used to quickly catch data issues and reduce the time to debug the issue by identifying likely causes. Conceptually, there are three primary scenarios for setting up dataset monitors in Azure Machine Learning. @@ -103,10 +146,25 @@ You monitor [Azure Machine Learning datasets](how-to-create-register-datasets.md The monitor compares the baseline and target datasets. 
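The comparison between baseline and target is distance-based. As a rough, self-contained sketch (not the service's actual implementation), the Euclidean-distance metric this article describes later for categorical columns can be computed from the two datasets' empirical distributions:

```python
from collections import Counter
from math import sqrt

def euclidean_drift(baseline, target):
    """Euclidean distance between the empirical (relative-frequency)
    distributions of a categorical column in two datasets.
    0 means the distributions match; larger values mean more drift."""
    categories = set(baseline) | set(target)
    base_counts = Counter(baseline)
    targ_counts = Counter(target)
    n_base, n_targ = len(baseline), len(target)
    return sqrt(sum(
        (base_counts[c] / n_base - targ_counts[c] / n_targ) ** 2
        for c in categories
    ))

baseline = ["A"] * 50 + ["B"] * 50        # 50/50 split
target_same = ["A"] * 25 + ["B"] * 25     # same 50/50 split -> distance 0
target_shifted = ["A"] * 40 + ["B"] * 10  # 80/20 split -> drift

print(euclidean_drift(baseline, target_same))     # 0.0
print(euclidean_drift(baseline, target_shifted))  # ~0.424
```

A distance of 0 means the target's category frequencies match the baseline; the larger the value, the stronger the drift signal for that column.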
+#### Migrate to Model Monitor
+In Model Monitor, you can find the corresponding concepts as follows. For more details, see [Set up model monitoring by bringing in your production data to Azure Machine Learning](https://learn.microsoft.com/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli#set-up-out-of-box-model-monitoring):
+* Reference dataset: similar to the baseline dataset for data drift detection, the reference dataset is typically set to the recent past production inference dataset.
+* Production inference data: similar to the target dataset for data drift detection, production inference data can be collected automatically from models deployed in production. It can also be inference data that you store.
+
+
 ## Create target dataset

The target dataset needs the `timeseries` trait set on it by specifying the timestamp column either from a column in the data or a virtual column derived from the path pattern of the files. Create the dataset with a timestamp through the [Python SDK](#sdk-dataset) or [Azure Machine Learning studio](#studio-dataset). A column representing a "timestamp" must be specified to add `timeseries` trait to the dataset. If your data is partitioned into folder structure with time info, such as '{yyyy/MM/dd}', create a virtual column through the path pattern setting and set it as the "partition timestamp" to enable time series API functionality.

+### Migrate to Model Monitor
+When you migrate to Model Monitor, if you deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli) at deployment time, Azure Machine Learning collects production inference data and automatically stores it in Microsoft Azure Blob Storage. 
You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create the target dataset (the production inference data in Model Monitor).
+
+When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli), you can also [set up model monitoring with custom signals and metrics](https://learn.microsoft.com/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-studio#set-up-model-monitoring-with-custom-signals-and-metrics).
+
+The following sections contain more details on how to migrate to Model Monitor.
+
+
+
 # [Python SDK](#tab/python)

@@ -162,6 +220,140 @@ If your data is already partitioned by date or time, as is the case here, you ca
 
 ---
 
+
+
+If you deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli) at deployment time, you can set up the out-of-box model monitor as follows:
+
+# [Azure CLI](#tab/azure-cli)
+
+Azure Machine Learning model monitoring uses `az ml schedule` to schedule a monitoring job. You can create the out-of-box model monitor with the following CLI command and YAML definition:
+
+```azurecli
+az ml schedule create -f ./out-of-box-monitoring.yaml
+```
+
+The following YAML contains the definition for the out-of-box model monitoring. 
+ +:::code language="yaml" source="~/azureml-examples-main/cli/monitoring/out-of-box-monitoring.yaml"::: + +# [Python SDK](#tab/python) + +You can use the following code to set up the out-of-box model monitoring: + +```python +from azure.identity import DefaultAzureCredential +from azure.ai.ml import MLClient +from azure.ai.ml.entities import ( + AlertNotification, + MonitoringTarget, + MonitorDefinition, + MonitorSchedule, + RecurrencePattern, + RecurrenceTrigger, + ServerlessSparkCompute +) + +# get a handle to the workspace +ml_client = MLClient( + DefaultAzureCredential(), + subscription_id="subscription_id", + resource_group_name="resource_group_name", + workspace_name="workspace_name", +) + +# create the compute +spark_compute = ServerlessSparkCompute( + instance_type="standard_e4s_v3", + runtime_version="3.3" +) + +# specify your online endpoint deployment +monitoring_target = MonitoringTarget( + ml_task="classification", + endpoint_deployment_id="azureml:credit-default:main" +) + + +# create alert notification object +alert_notification = AlertNotification( + emails=['abc@example.com', 'def@example.com'] +) + +# create the monitor definition +monitor_definition = MonitorDefinition( + compute=spark_compute, + monitoring_target=monitoring_target, + alert_notification=alert_notification +) + +# specify the schedule frequency +recurrence_trigger = RecurrenceTrigger( + frequency="day", + interval=1, + schedule=RecurrencePattern(hours=3, minutes=15) +) + +# create the monitor +model_monitor = MonitorSchedule( + name="credit_default_monitor_basic", + trigger=recurrence_trigger, + create_monitor=monitor_definition +) + +poller = ml_client.schedules.begin_create_or_update(model_monitor) +created_monitor = poller.result() +``` + +# [Studio](#tab/azure-studio) + +1. Navigate to [Azure Machine Learning studio](https://ml.azure.com). +1. Go to your workspace. +1. Select **Monitoring** from the **Manage** section +1. Select **Add**. 
+ + :::image type="content" source="media/how-to-monitor-models/add-model-monitoring.png" alt-text="Screenshot showing how to add model monitoring." lightbox="media/how-to-monitor-models/add-model-monitoring.png"::: + +1. On the **Basic settings** page, use **(Optional) Select model** to choose the model to monitor. +1. The **(Optional) Select deployment with data collection enabled** dropdown list should be automatically populated if the model is deployed to an Azure Machine Learning online endpoint. Select the deployment from the dropdown list. +1. Select the training data to use as the comparison reference in the **(Optional) Select training data** box. +1. Enter a name for the monitoring in **Monitor name** or keep the default name. +1. Notice that the virtual machine size is already selected for you. +1. Select your **Time zone**. +1. Select **Recurrence** or **Cron expression** scheduling. +1. For **Recurrence** scheduling, specify the repeat frequency, day, and time. For **Cron expression** scheduling, enter a cron expression for monitoring run. + + :::image type="content" source="media/how-to-monitor-models/model-monitoring-basic-setup.png" alt-text="Screenshot of basic settings page for model monitoring." lightbox="media/how-to-monitor-models/model-monitoring-basic-setup.png"::: + +1. Select **Next** to go to the **Advanced settings** section. +1. Select **Next** on the **Configure data asset** page to keep the default datasets. +1. Select **Next** to go to the **Select monitoring signals** page. +1. Select **Next** to go to the **Notifications** page. Add your email to receive email notifications. +1. Review your monitoring details and select **Create** to create the monitor. 
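The **Recurrence** schedule chosen in these steps corresponds to the `RecurrenceTrigger` shown in the Python SDK tab (for example, daily at 3:15). As an illustrative sketch only (`next_runs` is a hypothetical helper, not part of the service's scheduler), a daily recurrence resolves to concrete run times like this:

```python
from datetime import datetime, timedelta

def next_runs(start, hour, minute, count=3):
    """Next `count` daily run times at hour:minute, strictly after `start`.
    Mirrors a daily recurrence trigger with fixed schedule hours/minutes."""
    run = start.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if run <= start:
        run += timedelta(days=1)  # today's slot already passed
    runs = []
    for _ in range(count):
        runs.append(run)
        run += timedelta(days=1)
    return runs

# Monitor created at 20:00 on Sept 16; first daily 03:15 run is the next morning.
runs = next_runs(datetime(2024, 9, 16, 20, 0), hour=3, minute=15)
print(runs[0])  # 2024-09-17 03:15:00
```

A **Cron expression** schedule generalizes this: instead of a single daily slot, the expression selects the set of minutes, hours, and days on which monitoring runs fire.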
When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli), you can also [set up model monitoring with custom signals and metrics](https://learn.microsoft.com/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-studio#set-up-model-monitoring-with-custom-signals-and-metrics).

You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to:

* Collect production inference data from models deployed in production.
* Register the production inference data as an Azure Machine Learning data asset, and ensure continuous updates of the data.
* Provide a custom data preprocessing component and register it as an Azure Machine Learning component.

You must provide a custom data preprocessing component if your data isn't collected with the [data collector](how-to-collect-production-data.md). Without this custom data preprocessing component, the Azure Machine Learning model monitoring system won't know how to process your data into tabular form with support for time windowing.

Your custom preprocessing component must have these input and output signatures:

 | Input/Output | Signature name | Type | Description | Example value |
 |---|---|---|---|---|
 | input | `data_window_start` | literal, string | data window start-time in ISO8601 format. | 2023-05-01T04:31:57.012Z |
 | input | `data_window_end` | literal, string | data window end-time in ISO8601 format. 
| 2023-05-01T04:31:57.012Z |
 | input | `input_data` | uri_folder | The collected production inference data, which is registered as an Azure Machine Learning data asset. | azureml:myproduction_inference_data:1 |
 | output | `preprocessed_data` | mltable | A tabular dataset, which matches a subset of the reference data schema. | |

For an example of a custom data preprocessing component, see [custom_preprocessing in the azureml-examples GitHub repo](https://github.com/Azure/azureml-examples/tree/main/cli/monitoring/components/custom_preprocessing).

+
+
 ## Create dataset monitor

Create a dataset monitor to detect and alert to data drift on a new dataset. Use either the [Python SDK](#sdk-monitor) or [Azure Machine Learning studio](#studio-monitor).

@@ -174,6 +366,9 @@ As described later, a dataset monitor runs at a set frequency (daily, weekly, mo
 
 The **backfill** function runs a backfill job, for a specified start and end date range. A backfill job fills in expected missing data points in a data set, as a way to ensure data accuracy and completeness.
 
+> [!NOTE]
+> Azure Machine Learning model monitoring doesn't support the manual **backfill** function. If you want to redo the model monitor for a specific time range, you can create another model monitor for that time range.
+
 
 # [Python SDK](#tab/python)

@@ -318,7 +513,7 @@ Metrics in the chart depend on the type of feature.
 
 | Metric | Description |
 | ------ | ----------- |
- | Euclidian distance     |  Computed for categorical columns. Euclidean distance is computed on two vectors, generated from empirical distribution of the same categorical column from two datasets. 0 indicates no difference in the empirical distributions.  The more it deviates from 0, the more this column has drifted. Trends can be observed from a time series plot of this metric and can be helpful in uncovering a drifting feature.  |
+ | Euclidian distance | Computed for categorical columns. 
Euclidean distance is computed on two vectors, generated from empirical distribution of the same categorical column from two datasets. 0 indicates no difference in the empirical distributions. The more it deviates from 0, the more this column has drifted. Trends can be observed from a time series plot of this metric and can be helpful in uncovering a drifting feature. | | Unique values | Number of unique values (cardinality) of the feature. | On this chart, select a single date to compare the feature distribution between the target and this date for the displayed feature. For numeric features, this shows two probability distributions. If the feature is numeric, a bar chart is shown. @@ -327,7 +522,7 @@ On this chart, select a single date to compare the feature distribution between ## Metrics, alerts, and events -Metrics can be queried in the [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview) resource associated with your machine learning workspace. You have access to all features of Application Insights including set up for custom alert rules and action groups to trigger an action such as, an Email/SMS/Push/Voice or Azure Function. Refer to the complete Application Insights documentation for details. +Metrics can be queried in the [Azure Application Insights](../../azure-monitor/app/app-insights-overview.md) resource associated with your machine learning workspace. You have access to all features of Application Insights including set up for custom alert rules and action groups to trigger an action such as, an Email/SMS/Push/Voice or Azure Function. Refer to the complete Application Insights documentation for details. To get started, navigate to the [Azure portal](https://portal.azure.com) and select your workspace's **Overview** page. 
The associated Application Insights resource is on the far right: @@ -373,9 +568,6 @@ Limitations and known issues for data drift monitors: * If the SDK `backfill()` function doesn't generate the expected output, it may be due to an authentication issue. When you create the compute to pass into this function, don't use `Run.get_context().experiment.workspace.compute_targets`. Instead, use [ServicePrincipalAuthentication](/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication) such as the following to create the compute that you pass into that `backfill()` function: -> [!NOTE] -> Do not hard code the service principal password in your code. Instead, retrieve it from the Python environment, key store, or other secure method of accessing secrets. - ```python auth = ServicePrincipalAuthentication( tenant_id=tenant_id, From 2492d5ff46375cc1d3cccf24d49bf44f7c51e7b0 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:17:41 -0700 Subject: [PATCH 02/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index a1e70b7d3b..9820d07255 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -3,7 +3,7 @@ title: Detect data drift on datasets (preview) titleSuffix: Azure Machine Learning description: Learn how to set up data drift detection in Azure Learning. Create datasets monitors (preview), monitor for data drift, and set up alerts. 
services: machine-learning -ms.service: machine-learning +ms.service: azure-machine-learning ms.subservice: mldata ms.reviewer: franksolomon ms.author: xunwan From 21b743379cc95944e1f8ad07a00754735f1037aa Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:19:02 -0700 Subject: [PATCH 03/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 9820d07255..5fe165eee7 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -39,7 +39,7 @@ With Azure Machine Learning dataset monitors (preview), you can: An [Azure Machine Learning dataset](how-to-create-register-datasets.md) is used to create the monitor. The dataset must include a timestamp column. -You can view data drift metrics with the Python SDK or in Azure Machine Learning studio. Other metrics and insights are available through the [Azure Application Insights](../../azure-monitor/app/app-insights-overview.md) resource associated with the Azure Machine Learning workspace. +You can view data drift metrics with the Python SDK or in Azure Machine Learning studio. Other metrics and insights are available through the [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview) resource associated with the Azure Machine Learning workspace. > [!IMPORTANT] > Data drift detection for datasets is currently in public preview. 
From 74f44af7cb1e7b30c031e58c74ffeb63279e64dd Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:20:34 -0700 Subject: [PATCH 04/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 5fe165eee7..89f4651c19 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -119,7 +119,7 @@ With a dataset monitor you can: The data drift algorithm provides an overall measure of change in data and indication of which features are responsible for further investigation. Dataset monitors produce many other metrics by profiling new data in the `timeseries` dataset. -Custom alerting can be set up on all metrics generated by the monitor through [Azure Application Insights](../../azure-monitor/app/app-insights-overview.md). Dataset monitors can be used to quickly catch data issues and reduce the time to debug the issue by identifying likely causes. +Custom alerting can be set up on all metrics generated by the monitor through [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview). Dataset monitors can be used to quickly catch data issues and reduce the time to debug the issue by identifying likely causes. Conceptually, there are three primary scenarios for setting up dataset monitors in Azure Machine Learning. 
From ce26ea5e632971d663e8cbf8ef96a2da5839c9e6 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:22:13 -0700 Subject: [PATCH 05/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 89f4651c19..0781fff0af 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -513,7 +513,7 @@ Metrics in the chart depend on the type of feature. | Metric | Description | | ------ | ----------- | - | Euclidian distance | Computed for categorical columns. Euclidean distance is computed on two vectors, generated from empirical distribution of the same categorical column from two datasets. 0 indicates no difference in the empirical distributions. The more it deviates from 0, the more this column has drifted. Trends can be observed from a time series plot of this metric and can be helpful in uncovering a drifting feature. | + | Euclidian distance | Computed for categorical columns. Euclidean distance is computed on two vectors, generated from empirical distribution of the same categorical column from two datasets. 0 indicates no difference in the empirical distributions. The more it deviates from 0, the more this column has drifted. Trends can be observed from a time series plot of this metric and can be helpful in uncovering a drifting feature. | | Unique values | Number of unique values (cardinality) of the feature. | On this chart, select a single date to compare the feature distribution between the target and this date for the displayed feature. For numeric features, this shows two probability distributions. If the feature is numeric, a bar chart is shown. 
From 2122bc890b2ffd9e0c8be8592f88b0c37274bcd2 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:26:17 -0700 Subject: [PATCH 06/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 0781fff0af..744a6f9fad 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -513,7 +513,7 @@ Metrics in the chart depend on the type of feature. | Metric | Description | | ------ | ----------- | - | Euclidian distance | Computed for categorical columns. Euclidean distance is computed on two vectors, generated from empirical distribution of the same categorical column from two datasets. 0 indicates no difference in the empirical distributions. The more it deviates from 0, the more this column has drifted. Trends can be observed from a time series plot of this metric and can be helpful in uncovering a drifting feature. | + | Euclidian distance | Computed for categorical columns. Euclidean distance is computed on two vectors, generated from empirical distribution of the same categorical column from two datasets. 0 indicates no difference in the empirical distributions. The more it deviates from 0, the more this column has drifted. Trends can be observed from a time series plot of this metric and can be helpful in uncovering a drifting feature. | | Unique values | Number of unique values (cardinality) of the feature. | On this chart, select a single date to compare the feature distribution between the target and this date for the displayed feature. For numeric features, this shows two probability distributions. If the feature is numeric, a bar chart is shown. 
@@ -522,7 +522,7 @@ On this chart, select a single date to compare the feature distribution between ## Metrics, alerts, and events -Metrics can be queried in the [Azure Application Insights](../../azure-monitor/app/app-insights-overview.md) resource associated with your machine learning workspace. You have access to all features of Application Insights including set up for custom alert rules and action groups to trigger an action such as, an Email/SMS/Push/Voice or Azure Function. Refer to the complete Application Insights documentation for details. +Metrics can be queried in the [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview) resource associated with your machine learning workspace. You have access to all features of Application Insights including set up for custom alert rules and action groups to trigger an action such as, an Email/SMS/Push/Voice or Azure Function. Refer to the complete Application Insights documentation for details. To get started, navigate to the [Azure portal](https://portal.azure.com) and select your workspace's **Overview** page. The associated Application Insights resource is on the far right: @@ -568,6 +568,10 @@ Limitations and known issues for data drift monitors: * If the SDK `backfill()` function doesn't generate the expected output, it may be due to an authentication issue. When you create the compute to pass into this function, don't use `Run.get_context().experiment.workspace.compute_targets`. Instead, use [ServicePrincipalAuthentication](/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication) such as the following to create the compute that you pass into that `backfill()` function: +> [!NOTE] +> Do not hard code the service principal password in your code. Instead, retrieve it from the Python environment, key store, or other secure method of accessing secrets. 
+> + ```python auth = ServicePrincipalAuthentication( tenant_id=tenant_id, From 39f5f18a950c79314fa8dca783318e89e0c64240 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:40:11 -0700 Subject: [PATCH 07/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 744a6f9fad..a197de91b3 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -14,7 +14,7 @@ ms.custom: UpdateFrequency5, data4ml, sdkv1 #Customer intent: As a data scientist, I want to detect data drift in my datasets and set alerts for when drift is large. --- -# Data drift(preview) will be retired, and replaced by Model Monitor +# Data drift (preview) will be retired, and replaced by Model Monitor Data drift(preview) will be retired at 09/01/2025, and you can start to use [Model Monitor](https://learn.microsoft.com/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli) for your data drift tasks. Please check the content below to understand the replacement, feature gaps and manual change steps. 
From 1f58da7d27434ad863ca822b47d267d31b4b3355 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:42:57 -0700 Subject: [PATCH 08/25] Update how-to-monitor-datasets.md --- .../machine-learning/v1/how-to-monitor-datasets.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index a197de91b3..3d1d0f7eb5 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -54,15 +54,16 @@ To create and work with dataset monitors, you need: * The [Azure Machine Learning SDK for Python installed](/python/api/overview/azure/ml/install), which includes the azureml-datasets package. * Structured (tabular) data with a timestamp specified in the file path, file name, or column in the data. +### Migrate to Model Monitor When you migrate to Model Monitor, please check the prerequisites as following: # [Azure CLI](#tab/azure-cli) -[!INCLUDE [basic prereqs cli](includes/machine-learning-cli-prereqs.md)] +[!INCLUDE [basic prereqs cli](./includes/machine-learning-cli-prereqs.md)] # [Python SDK](#tab/python) -[!INCLUDE [basic prereqs sdk](includes/machine-learning-sdk-v2-prereqs.md)] +[!INCLUDE [basic prereqs sdk](./includes/machine-learning-sdk-v2-prereqs.md)] # [Studio](#tab/azure-studio) @@ -70,17 +71,17 @@ Before following the steps in this article, make sure you have the following pre * An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/). -* An Azure Machine Learning workspace and a compute instance. If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](quickstart-create-resources.md) article to create them. 
+* An Azure Machine Learning workspace and a compute instance. If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](./quickstart-create-resources.md) article to create them. --- -* Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](how-to-assign-roles.md). +* Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](./how-to-assign-roles.md). * For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to: - * Have a model already deployed to an Azure Machine Learning online endpoint. Both managed online endpoint and Kubernetes online endpoint are supported. If you don't have a model deployed to an Azure Machine Learning online endpoint, see [Deploy and score a machine learning model by using an online endpoint](how-to-deploy-online-endpoints.md). + * Have a model already deployed to an Azure Machine Learning online endpoint. Both managed online endpoint and Kubernetes online endpoint are supported. 
If you don't have a model deployed to an Azure Machine Learning online endpoint, see [Deploy and score a machine learning model by using an online endpoint](./how-to-deploy-online-endpoints.md). - * Enable data collection for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see [Collect production data from models deployed to a real-time endpoint](how-to-collect-production-data.md). + * Enable data collection for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see [Collect production data from models deployed to a real-time endpoint](./how-to-collect-production-data.md). * For monitoring a model that is deployed to an Azure Machine Learning batch endpoint or deployed outside of Azure Machine Learning, be sure to: From d9f5d3eaacf72441caf550f810896ea293e64244 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 16 Sep 2024 20:50:19 -0700 Subject: [PATCH 09/25] Update how-to-monitor-datasets.md --- .../v1/how-to-monitor-datasets.md | 233 +++++++++--------- 1 file changed, 116 insertions(+), 117 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 3d1d0f7eb5..37c39291e8 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -157,15 +157,6 @@ In Model Monitor, you can find corresponding concepts as following, and you can The target dataset needs the `timeseries` trait set on it by specifying the timestamp column either from a column in the data or a virtual column derived from the path pattern of the files. Create the dataset with a timestamp through the [Python SDK](#sdk-dataset) or [Azure Machine Learning studio](#studio-dataset). 
A column representing a "timestamp" must be specified to add `timeseries` trait to the dataset. If your data is partitioned into folder structure with time info, such as '{yyyy/MM/dd}', create a virtual column through the path pattern setting and set it as the "partition timestamp" to enable time series API functionality. -### Migrate to Model Monitor -When you migrate to Model Monitor, if you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli) at deployment time, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create target dataset (production inference data in Model Monitor). - -When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli), you can also [set up model monitoring with custom signals and metrics](https://learn.microsoft.com/en-us/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-studio#set-up-model-monitoring-with-custom-signals-and-metrics). - -Following sections contain more details on how to migrate to Model Monitor. - - - # [Python SDK](#tab/python) @@ -221,7 +212,119 @@ If your data is already partitioned by date or time, as is the case here, you ca --- +## Create dataset monitor + +Create a dataset monitor to detect and alert to data drift on a new dataset. Use either the [Python SDK](#sdk-monitor) or [Azure Machine Learning studio](#studio-monitor). 
+As described later, a dataset monitor runs at a set frequency (daily, weekly, or monthly). It analyzes new data available in the target dataset since its last run. In some cases, such analysis of the most recent data may not suffice:
+
+- The new data from the upstream source was delayed due to a broken data pipeline, and this new data wasn't available when the dataset monitor ran.
+- A time series dataset had only historical data, and you want to analyze drift patterns in the dataset over time. For example: compare traffic flowing to a website, in both winter and summer seasons, to identify seasonal patterns.
+- You're new to Dataset Monitors. You want to evaluate how the feature works with your existing data before you set it up to monitor future days. In such scenarios, you can submit an on-demand run, with a specific date range for the target dataset, to compare with the baseline dataset.
+
+The **backfill** function runs a backfill job for a specified start and end date range. A backfill job fills in expected missing data points in a data set, as a way to ensure data accuracy and completeness.
+
+> [!NOTE]
+> Azure Machine Learning model monitoring doesn't support a manual **backfill** function. If you want to redo the model monitor for a specific time range, create another model monitor for that time range.
+
+# [Python SDK](#tab/python)
+
+
+[!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)]
+
+See the [Python SDK reference documentation on data drift](/python/api/azureml-datadrift/azureml.datadrift) for full details.
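The frequency and backfill behavior described above can be illustrated with plain `datetime` arithmetic. The following is a sketch, independent of the SDK, of how a `Week`-frequency monitor slices a historical backfill range into the complete Monday - Sunday weeks it compares against the baseline:

```python
from datetime import date, timedelta

def weekly_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end) into the complete Monday-Sunday weeks a weekly
    monitor would compare against the baseline during a backfill."""
    # Advance to the first Monday on or after `start`.
    cur = start + timedelta(days=(7 - start.weekday()) % 7)
    windows = []
    while cur + timedelta(days=7) <= end:
        windows.append((cur, cur + timedelta(days=6)))
        cur += timedelta(days=7)
    return windows

# A backfill for January 2019 yields three complete weeks to analyze.
wins = weekly_windows(date(2019, 1, 1), date(2019, 2, 1))
```

This mirrors why a backfill over a partial week produces no result for that week: only complete Monday - Sunday windows are compared.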
+ +The following example shows how to create a dataset monitor using the Python SDK + +```python +from azureml.core import Workspace, Dataset +from azureml.datadrift import DataDriftDetector +from datetime import datetime + +# get the workspace object +ws = Workspace.from_config() + +# get the target dataset +target = Dataset.get_by_name(ws, 'target') + +# set the baseline dataset +baseline = target.time_before(datetime(2019, 2, 1)) + +# set up feature list +features = ['latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'snowDepth', 'stationName', 'countryOrRegion'] + +# set up data drift detector +monitor = DataDriftDetector.create_from_datasets(ws, 'drift-monitor', baseline, target, + compute_target='cpu-cluster', + frequency='Week', + feature_list=None, + drift_threshold=.6, + latency=24) + +# get data drift detector by name +monitor = DataDriftDetector.get_by_name(ws, 'drift-monitor') + +# update data drift detector +monitor = monitor.update(feature_list=features) + +# run a backfill for January through May +backfill1 = monitor.backfill(datetime(2019, 1, 1), datetime(2019, 5, 1)) + +# run a backfill for May through today +backfill1 = monitor.backfill(datetime(2019, 5, 1), datetime.today()) + +# disable the pipeline schedule for the data drift detector +monitor = monitor.disable_schedule() + +# enable the pipeline schedule for the data drift detector +monitor = monitor.enable_schedule() +``` + +> [!TIP] +> For a full example of setting up a `timeseries` dataset and data drift detector, see our [example notebook](https://aka.ms/datadrift-notebook). + + +# [Studio](#tab/azure-studio) + + +1. Navigate to the [studio's homepage](https://ml.azure.com). +1. Select the **Data** tab. +1. Select **Dataset monitors**. + ![Monitor list](./media/how-to-monitor-datasets/monitor-list.png) + +1. Select the **+Create monitor** button, and select **Next** to continue through the wizard. 
+
+:::image type="content" source="media/how-to-monitor-datasets/wizard.png" alt-text="Create a monitor wizard":::
+
+* **Select target dataset**. The target dataset is a tabular dataset, with a timestamp column specified, that is analyzed for data drift. The target dataset must have features in common with the baseline dataset, and it should be a `timeseries` dataset to which new data is appended. Historical data in the target dataset can be analyzed, or new data can be monitored.
+
+* **Select baseline dataset.** Select the tabular dataset to be used as the baseline for comparison of the target dataset over time. The baseline dataset must have features in common with the target dataset. Select a time range to use a slice of the target dataset, or specify a separate dataset to use as the baseline.
+
+* **Monitor settings**. These settings configure the scheduled dataset monitor pipeline.
+
+  | Setting | Description | Tips | Mutable |
+  | ------- | ----------- | ---- | ------- |
+  | Name | Name of the dataset monitor. | | No |
+  | Features | List of features to analyze for data drift over time. | Set to a model's output feature(s) to measure concept drift. Don't include features that naturally drift over time (month, year, index, etc.). You can backfill an existing data drift monitor after adjusting the list of features. | Yes |
+  | Compute target | Azure Machine Learning compute target to run the dataset monitor jobs. | | Yes |
+  | Enable | Enable or disable the schedule on the dataset monitor pipeline. | Disable the schedule to analyze historical data with the backfill setting. It can be enabled after the dataset monitor is created. | Yes |
+  | Frequency | The frequency to use to schedule the pipeline job and to analyze historical data when running a backfill. Options include daily, weekly, or monthly. | Each job compares data in the target dataset according to the frequency:
  • Daily: Compare most recent complete day in target dataset with baseline
  • Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline
  • Monthly: Compare most recent complete month in target dataset with baseline | No |
+  | Latency | Time, in hours, it takes for data to arrive in the dataset. For instance, if it takes three days for data to arrive in the SQL DB the dataset encapsulates, set the latency to 72. | Can't be changed after the creation of the dataset monitor | No |
+  | Email addresses | Email addresses for alerting based on breach of the data drift percentage threshold. | Emails are sent through Azure Monitor. | Yes |
+  | Threshold | Data drift percentage threshold for email alerting. | Further alerts and events can be set on many other metrics in the workspace's associated Application Insights resource. | Yes |
+
+After completion of the wizard, the resulting dataset monitor appears in the list. Select it to go to that monitor's details page.
+
+---
+
+### Migrate to Model Monitor
+When you migrate to Model Monitor, if you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli) at deployment time, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create target dataset (production inference data in Model Monitor).
+ +When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli), you can also [set up model monitoring with custom signals and metrics](https://learn.microsoft.com/en-us/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-studio#set-up-model-monitoring-with-custom-signals-and-metrics). + +Following sections contain more details on how to migrate to Model Monitor. + +### If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled data collection If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli) at deployment time. @@ -312,7 +415,7 @@ created_monitor = poller.result() 1. Select **Monitoring** from the **Manage** section 1. Select **Add**. - :::image type="content" source="media/how-to-monitor-models/add-model-monitoring.png" alt-text="Screenshot showing how to add model monitoring." lightbox="media/how-to-monitor-models/add-model-monitoring.png"::: + :::image type="content" source="./media/how-to-monitor-models/add-model-monitoring.png" alt-text="Screenshot showing how to add model monitoring." lightbox="./media/how-to-monitor-models/add-model-monitoring.png"::: 1. On the **Basic settings** page, use **(Optional) Select model** to choose the model to monitor. 1. The **(Optional) Select deployment with data collection enabled** dropdown list should be automatically populated if the model is deployed to an Azure Machine Learning online endpoint. Select the deployment from the dropdown list. @@ -323,7 +426,7 @@ created_monitor = poller.result() 1. 
Select **Recurrence** or **Cron expression** scheduling. 1. For **Recurrence** scheduling, specify the repeat frequency, day, and time. For **Cron expression** scheduling, enter a cron expression for monitoring run. - :::image type="content" source="media/how-to-monitor-models/model-monitoring-basic-setup.png" alt-text="Screenshot of basic settings page for model monitoring." lightbox="media/how-to-monitor-models/model-monitoring-basic-setup.png"::: + :::image type="content" source="./media/how-to-monitor-models/model-monitoring-basic-setup.png" alt-text="Screenshot of basic settings page for model monitoring." lightbox="./media/how-to-monitor-models/model-monitoring-basic-setup.png"::: 1. Select **Next** to go to the **Advanced settings** section. 1. Select **Next** on the **Configure data asset** page to keep the default datasets. @@ -332,7 +435,7 @@ created_monitor = poller.result() 1. Review your monitoring details and select **Create** to create the monitor. - +### If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli), you can also [set up model monitoring with custom signals and metrics](https://learn.microsoft.com/en-us/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-studio#set-up-model-monitoring-with-custom-signals-and-metrics). You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. 
To monitor these models, you must be able to: @@ -341,7 +444,7 @@ You can also set up model monitoring for models deployed to Azure Machine Learni * Register the production inference data as an Azure Machine Learning data asset, and ensure continuous updates of the data. * Provide a custom data preprocessing component and register it as an Azure Machine Learning component. -You must provide a custom data preprocessing component if your data isn't collected with the [data collector](how-to-collect-production-data.md). Without this custom data preprocessing component, the Azure Machine Learning model monitoring system won't know how to process your data into tabular form with support for time windowing. +You must provide a custom data preprocessing component if your data isn't collected with the [data collector](./how-to-collect-production-data.md). Without this custom data preprocessing component, the Azure Machine Learning model monitoring system won't know how to process your data into tabular form with support for time windowing. Your custom preprocessing component must have these input and output signatures: @@ -355,110 +458,6 @@ Your custom preprocessing component must have these input and output signatures: For an example of a custom data preprocessing component, see [custom_preprocessing in the azuremml-examples GitHub repo](https://github.com/Azure/azureml-examples/tree/main/cli/monitoring/components/custom_preprocessing). -## Create dataset monitor - -Create a dataset monitor to detect and alert to data drift on a new dataset. Use either the [Python SDK](#sdk-monitor) or [Azure Machine Learning studio](#studio-monitor). - -As described later, a dataset monitor runs at a set frequency (daily, weekly, monthly) intervals. It analyzes new data available in the target dataset since its last run. 
In some cases, such analysis of the most recent data may not suffice: - -- The new data from the upstream source was delayed due to a broken data pipeline, and this new data wasn't available when the dataset monitor ran. -- A time series dataset had only historical data, and you want to analyze drift patterns in the dataset over time. For example: compare traffic flowing to a website, in both winter and summer seasons, to identify seasonal patterns. -- You're new to Dataset Monitors. You want to evaluate how the feature works with your existing data before you set it up to monitor future days. In such scenarios, you can submit an on-demand run, with a specific target dataset set date range, to compare with the baseline dataset. - -The **backfill** function runs a backfill job, for a specified start and end date range. A backfill job fills in expected missing data points in a data set, as a way to ensure data accuracy and completeness. - -> [!NOTE] -> Azure Machine Learning model monitoring doesn't support manual **backfill** function, if you want to redo the model monitor for a specif time range, you can create another model monitor for that specific time range. - -# [Python SDK](#tab/python) - - -[!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)] - -See the [Python SDK reference documentation on data drift](/python/api/azureml-datadrift/azureml.datadrift) for full details. 
- -The following example shows how to create a dataset monitor using the Python SDK - -```python -from azureml.core import Workspace, Dataset -from azureml.datadrift import DataDriftDetector -from datetime import datetime - -# get the workspace object -ws = Workspace.from_config() - -# get the target dataset -target = Dataset.get_by_name(ws, 'target') - -# set the baseline dataset -baseline = target.time_before(datetime(2019, 2, 1)) - -# set up feature list -features = ['latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'snowDepth', 'stationName', 'countryOrRegion'] - -# set up data drift detector -monitor = DataDriftDetector.create_from_datasets(ws, 'drift-monitor', baseline, target, - compute_target='cpu-cluster', - frequency='Week', - feature_list=None, - drift_threshold=.6, - latency=24) - -# get data drift detector by name -monitor = DataDriftDetector.get_by_name(ws, 'drift-monitor') - -# update data drift detector -monitor = monitor.update(feature_list=features) - -# run a backfill for January through May -backfill1 = monitor.backfill(datetime(2019, 1, 1), datetime(2019, 5, 1)) - -# run a backfill for May through today -backfill1 = monitor.backfill(datetime(2019, 5, 1), datetime.today()) - -# disable the pipeline schedule for the data drift detector -monitor = monitor.disable_schedule() - -# enable the pipeline schedule for the data drift detector -monitor = monitor.enable_schedule() -``` - -> [!TIP] -> For a full example of setting up a `timeseries` dataset and data drift detector, see our [example notebook](https://aka.ms/datadrift-notebook). - - -# [Studio](#tab/azure-studio) - - -1. Navigate to the [studio's homepage](https://ml.azure.com). -1. Select the **Data** tab. -1. Select **Dataset monitors**. - ![Monitor list](./media/how-to-monitor-datasets/monitor-list.png) - -1. Select the **+Create monitor** button, and select **Next** to continue through the wizard. 
- -:::image type="content" source="media/how-to-monitor-datasets/wizard.png" alt-text="Create a monitor wizard"::: - -* **Select target dataset**. The target dataset is a tabular dataset with timestamp column specified which to analyze for data drift. The target dataset must have features in common with the baseline dataset, and should be a `timeseries` dataset, which new data is appended to. Historical data in the target dataset can be analyzed, or new data can be monitored. - -* **Select baseline dataset.** Select the tabular dataset to be used as the baseline for comparison of the target dataset over time. The baseline dataset must have features in common with the target dataset. Select a time range to use a slice of the target dataset, or specify a separate dataset to use as the baseline. - -* **Monitor settings**. These settings are for the scheduled dataset monitor pipeline to create. - - | Setting | Description | Tips | Mutable | - | ------- | ----------- | ---- | ------- | - | Name | Name of the dataset monitor. | | No | - | Features | List of features that to analyze for data drift over time. | Set to a model's output feature(s) to measure concept drift. Don't include features that naturally drift over time (month, year, index, etc.). You can backfill and existing data drift monitor after adjusting the list of features. | Yes | - | Compute target | Azure Machine Learning compute target to run the dataset monitor jobs. | | Yes | - | Enable | Enable or disable the schedule on the dataset monitor pipeline | Disable the schedule to analyze historical data with the backfill setting. It can be enabled after the dataset monitor is created. | Yes | - | Frequency | The frequency that to use, to schedule the pipeline job and analyze historical data if running a backfill. Options include daily, weekly, or monthly. | Each job compares data in the target dataset according to the frequency:
  • Daily: Compare most recent complete day in target dataset with baseline
  • Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline
  • Monthly: Compare most recent complete month in target dataset with baseline | No |
-  | Latency | Time, in hours, it takes for data to arrive in the dataset. For instance, if it takes three days for data to arrive in the SQL DB the dataset encapsulates, set the latency to 72. | Can't be changed after the creation of the dataset monitor | No |
-  | Email addresses | Email addresses for alerting based on breach of the data drift percentage threshold. | Emails are sent through Azure Monitor. | Yes |
-  | Threshold | Data drift percentage threshold for email alerting. | Further alerts and events can be set on many other metrics in the workspace's associated Application Insights resource. | Yes |
-
-After completion of the wizard, the resulting dataset monitor will appear in the list. Select it to go to that monitor's details page.
-
----
 
 
 ## Understand data drift results

From 0b23009031ee5678478692fef10884f844926814 Mon Sep 17 00:00:00 2001
From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com>
Date: Tue, 17 Sep 2024 17:01:20 -0700
Subject: [PATCH 10/25] Update how-to-monitor-datasets.md

---
 .../v1/how-to-monitor-datasets.md | 32 +++++++++----------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md
index 37c39291e8..a006180d51 100644
--- a/articles/machine-learning/v1/how-to-monitor-datasets.md
+++ b/articles/machine-learning/v1/how-to-monitor-datasets.md
@@ -16,11 +16,9 @@ ms.custom: UpdateFrequency5, data4ml, sdkv1
 #Customer intent: As a data scientist, I want to detect data drift in my datasets and set alerts for when drift is large.
 ---
 
 # Data drift (preview) will be retired, and replaced by Model Monitor
 
-Data drift(preview) will be retired at 09/01/2025, and you can start to use [Model Monitor](https://learn.microsoft.com/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli) for your data drift tasks.
+Data drift (preview) will be retired on 09/01/2025, and you can start to use [Model Monitor](../how-to-monitor-model-performance.md) for your data drift tasks.
+Data drift(preview) will be retired at 09/01/2025, and you can start to use [Model Monitor](../how-to-monitor-model-performance.md) for your data drift tasks. Please check the content below to understand the replacement, feature gaps and manual change steps. -# Detect data drift (preview) on datasets - [!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)] Learn how to monitor data drift and set alerts when drift is high. @@ -59,11 +57,11 @@ When you migrate to Model Monitor, please check the prerequisites as following: # [Azure CLI](#tab/azure-cli) -[!INCLUDE [basic prereqs cli](./includes/machine-learning-cli-prereqs.md)] +[!INCLUDE [basic prereqs cli](../includes/machine-learning-cli-prereqs.md)] # [Python SDK](#tab/python) -[!INCLUDE [basic prereqs sdk](./includes/machine-learning-sdk-v2-prereqs.md)] +[!INCLUDE [basic prereqs sdk](../includes/machine-learning-sdk-v2-prereqs.md)] # [Studio](#tab/azure-studio) @@ -71,17 +69,17 @@ Before following the steps in this article, make sure you have the following pre * An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/). -* An Azure Machine Learning workspace and a compute instance. If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](./quickstart-create-resources.md) article to create them. +* An Azure Machine Learning workspace and a compute instance. If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](../quickstart-create-resources.md) article to create them. --- -* Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. 
To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](./how-to-assign-roles.md). +* Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](../how-to-assign-roles.md). * For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to: - * Have a model already deployed to an Azure Machine Learning online endpoint. Both managed online endpoint and Kubernetes online endpoint are supported. If you don't have a model deployed to an Azure Machine Learning online endpoint, see [Deploy and score a machine learning model by using an online endpoint](./how-to-deploy-online-endpoints.md). + * Have a model already deployed to an Azure Machine Learning online endpoint. Both managed online endpoint and Kubernetes online endpoint are supported. If you don't have a model deployed to an Azure Machine Learning online endpoint, see [Deploy and score a machine learning model by using an online endpoint](../how-to-deploy-online-endpoints.md). - * Enable data collection for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see [Collect production data from models deployed to a real-time endpoint](./how-to-collect-production-data.md). 
+ * Enable data collection for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see [Collect production data from models deployed to a real-time endpoint](../how-to-collect-production-data.md). * For monitoring a model that is deployed to an Azure Machine Learning batch endpoint or deployed outside of Azure Machine Learning, be sure to: @@ -148,7 +146,7 @@ You monitor [Azure Machine Learning datasets](how-to-create-register-datasets.md The monitor compares the baseline and target datasets. #### Migrate to Model Monitor -In Model Monitor, you can find corresponding concepts as following, and you can find more details in this article [Set up model monitoring by bringing in your production data to Azure Machine Learning](https://learn.microsoft.com/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli#set-up-out-of-box-model-monitoring): +In Model Monitor, you can find corresponding concepts as following, and you can find more details in this article [Set up model monitoring by bringing in your production data to Azure Machine Learning](../how-to-monitor-model-performance.md#set-up-out-of-box-model-monitoring): * Reference dataset: similar to your baseline dataset for data drift detection, it is set as the recent past production inference dataset. * Production inference data: similar to your target dataset in data drift detection, the production inference data can be collected automatically from models deployed in production. It can also be inference data you store. 
@@ -318,15 +316,15 @@ After completion of the wizard, the resulting dataset monitor will appear in the

---

### Migrate to Model Monitor
-When you migrate to Model Monitor, if you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli) at deployment time, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create target dataset (production inference data in Model Monitor).
+When you migrate to Model Monitor, if you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time, Azure Machine Learning collects production inference data and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create the target dataset (the production inference data in Model Monitor).

-When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli), you can also [set up model monitoring with custom signals and metrics](https://learn.microsoft.com/en-us/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-studio#set-up-model-monitoring-with-custom-signals-and-metrics).
+When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics).

The following sections contain more details on how to migrate to Model Monitor.

### If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled data collection

-If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli) at deployment time.
+If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time, use one of the following methods to set up model monitoring.

# [Azure CLI](#tab/azure-cli)

@@ -415,7 +413,7 @@ created_monitor = poller.result()
1. Select **Monitoring** from the **Manage** section.
1. Select **Add**.

-    :::image type="content" source="./media/how-to-monitor-models/add-model-monitoring.png" alt-text="Screenshot showing how to add model monitoring." lightbox="./media/how-to-monitor-models/add-model-monitoring.png":::
+    :::image type="content" source="../media/how-to-monitor-models/add-model-monitoring.png" alt-text="Screenshot showing how to add model monitoring." lightbox="../media/how-to-monitor-models/add-model-monitoring.png":::

1. On the **Basic settings** page, use **(Optional) Select model** to choose the model to monitor.
1. The **(Optional) Select deployment with data collection enabled** dropdown list should be automatically populated if the model is deployed to an Azure Machine Learning online endpoint. Select the deployment from the dropdown list.

@@ -426,7 +424,7 @@ created_monitor = poller.result()

1. Select **Recurrence** or **Cron expression** scheduling.
1. For **Recurrence** scheduling, specify the repeat frequency, day, and time. For **Cron expression** scheduling, enter a cron expression for the monitoring run.

-    :::image type="content" source="./media/how-to-monitor-models/model-monitoring-basic-setup.png" alt-text="Screenshot of basic settings page for model monitoring." lightbox="./media/how-to-monitor-models/model-monitoring-basic-setup.png":::
+    :::image type="content" source="../media/how-to-monitor-models/model-monitoring-basic-setup.png" alt-text="Screenshot of basic settings page for model monitoring." lightbox="../media/how-to-monitor-models/model-monitoring-basic-setup.png":::

1. Select **Next** to go to the **Advanced settings** section.
1. Select **Next** on the **Configure data asset** page to keep the default datasets.

@@ -436,7 +434,7 @@

### If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection

-When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](https://learn.microsoft.com/azure/machine-learning/how-to-collect-production-data?view=azureml-api-2&tabs=azure-cli), you can also [set up model monitoring with custom signals and metrics](https://learn.microsoft.com/en-us/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-studio#set-up-model-monitoring-with-custom-signals-and-metrics).
+When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics).
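Enabling data collection at deployment time is what makes the first migration path available. As a rough sketch — the endpoint and model names here are hypothetical, and the authoritative schema is in the data collection article linked above — a managed online deployment YAML turns collection on with a `data_collector` section:

```yaml
# Illustrative deployment fragment; see the linked data collection article
# for the authoritative schema.
name: blue
endpoint_name: my-endpoint      # hypothetical endpoint name
model: azureml:my-model:1       # hypothetical registered model
instance_type: Standard_DS3_v2
instance_count: 1
data_collector:
  collections:
    model_inputs:
      enabled: 'true'
    model_outputs:
      enabled: 'true'
```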
You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to: @@ -444,7 +442,7 @@ You can also set up model monitoring for models deployed to Azure Machine Learni * Register the production inference data as an Azure Machine Learning data asset, and ensure continuous updates of the data. * Provide a custom data preprocessing component and register it as an Azure Machine Learning component. -You must provide a custom data preprocessing component if your data isn't collected with the [data collector](./how-to-collect-production-data.md). Without this custom data preprocessing component, the Azure Machine Learning model monitoring system won't know how to process your data into tabular form with support for time windowing. +You must provide a custom data preprocessing component if your data isn't collected with the [data collector](../how-to-collect-production-data.md). Without this custom data preprocessing component, the Azure Machine Learning model monitoring system won't know how to process your data into tabular form with support for time windowing. 
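Conceptually, such a preprocessing component takes raw collected records plus a monitoring time window and emits tabular rows. The standard-library sketch below is illustrative only — the record layout is assumed, and the real component contract is described next:

```python
import json
from datetime import datetime, timezone

# Illustrative only: reduce raw JSONL records (an assumed layout) to tabular
# rows that fall inside the monitoring time window.
def preprocess(raw_jsonl_lines, window_start, window_end):
    rows = []
    for line in raw_jsonl_lines:
        record = json.loads(line)
        timestamp = datetime.fromisoformat(record["timestamp"])
        if window_start <= timestamp < window_end:
            rows.append({"timestamp": record["timestamp"], **record["data"]})
    return rows

raw = [
    '{"timestamp": "2024-09-01T00:00:00+00:00", "data": {"feature_a": 1.2}}',
    '{"timestamp": "2024-09-15T00:00:00+00:00", "data": {"feature_a": 3.4}}',
]
window_start = datetime(2024, 9, 10, tzinfo=timezone.utc)
window_end = datetime(2024, 9, 20, tzinfo=timezone.utc)
print(preprocess(raw, window_start, window_end))  # only the in-window record
```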
Your custom preprocessing component must have these input and output signatures: From 3382e5f23931a91a306469379a6a6f9c9aa57d40 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Tue, 17 Sep 2024 22:49:55 -0700 Subject: [PATCH 11/25] Update how-to-monitor-datasets.md --- .../v1/how-to-monitor-datasets.md | 37 +++++++++++-------- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index a006180d51..24f0df175d 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -55,9 +55,6 @@ To create and work with dataset monitors, you need: ### Migrate to Model Monitor When you migrate to Model Monitor, please check the prerequisites as following: -# [Azure CLI](#tab/azure-cli) - -[!INCLUDE [basic prereqs cli](../includes/machine-learning-cli-prereqs.md)] # [Python SDK](#tab/python) @@ -87,6 +84,10 @@ Before following the steps in this article, make sure you have the following pre * Update the registered data asset continuously for model monitoring. * (Recommended) Register the model in an Azure Machine Learning workspace, for lineage tracking. +# [Azure CLI](#tab/azure-cli) + +[!INCLUDE [basic prereqs cli](../includes/machine-learning-cli-prereqs.md)] + > [!IMPORTANT] > > Model monitoring jobs are scheduled to run on serverless Spark compute pools with support for the following VM instance types: `Standard_E4s_v3`, `Standard_E8s_v3`, `Standard_E16s_v3`, `Standard_E32s_v3`, and `Standard_E64s_v3`. You can select the VM instance type with the `create_monitor.compute.instance_type` property in your YAML configuration or from the dropdown in the Azure Machine Learning studio. 
@@ -208,8 +209,13 @@ If your data is already partitioned by date or time, as is the case here, you ca :::image type="content" source="media/how-to-monitor-datasets/timeseries-partitiontimestamp.png" alt-text="Partition timestamp"::: + +# [Azure CLI](#tab/azure-cli) + +Not supported. --- + ## Create dataset monitor Create a dataset monitor to detect and alert to data drift on a new dataset. Use either the [Python SDK](#sdk-monitor) or [Azure Machine Learning studio](#studio-monitor). @@ -313,6 +319,8 @@ monitor = monitor.enable_schedule() After completion of the wizard, the resulting dataset monitor will appear in the list. Select it to go to that monitor's details page. +# [Azure CLI](#tab/azure-cli) +Not supported --- ### Migrate to Model Monitor @@ -326,18 +334,6 @@ Following sections contain more details on how to migrate to Model Monitor. If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time. -# [Azure CLI](#tab/azure-cli) - -Azure Machine Learning model monitoring uses `az ml schedule` to schedule a monitoring job. You can create the out-of-box model monitor with the following CLI command and YAML definition: - -```azurecli -az ml schedule create -f ./out-of-box-monitoring.yaml -``` - -The following YAML contains the definition for the out-of-box model monitoring. - -:::code language="yaml" source="~/azureml-examples-main/cli/monitoring/out-of-box-monitoring.yaml"::: - # [Python SDK](#tab/python) You can use the following code to set up the out-of-box model monitoring: @@ -431,7 +427,18 @@ created_monitor = poller.result() 1. Select **Next** to go to the **Select monitoring signals** page. 1. Select **Next** to go to the **Notifications** page. Add your email to receive email notifications. 1. Review your monitoring details and select **Create** to create the monitor. 
+# [Azure CLI](#tab/azure-cli) + +Azure Machine Learning model monitoring uses `az ml schedule` to schedule a monitoring job. You can create the out-of-box model monitor with the following CLI command and YAML definition: +```azurecli +az ml schedule create -f ./out-of-box-monitoring.yaml +``` + +The following YAML contains the definition for the out-of-box model monitoring. + +:::code language="yaml" source="~/azureml-examples-main/cli/monitoring/out-of-box-monitoring.yaml"::: +--- ### If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics). From aacdd188b20a32e6bad870d5e88c5925339ca8ee Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Tue, 17 Sep 2024 23:05:09 -0700 Subject: [PATCH 12/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 24f0df175d..93eea5016a 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -87,7 +87,7 @@ Before following the steps in this article, make sure you have the following pre # [Azure CLI](#tab/azure-cli) [!INCLUDE [basic prereqs cli](../includes/machine-learning-cli-prereqs.md)] - +--- > [!IMPORTANT] > > Model monitoring jobs are scheduled to run on serverless Spark compute pools with support for the following VM instance types: 
`Standard_E4s_v3`, `Standard_E8s_v3`, `Standard_E16s_v3`, `Standard_E32s_v3`, and `Standard_E64s_v3`. You can select the VM instance type with the `create_monitor.compute.instance_type` property in your YAML configuration or from the dropdown in the Azure Machine Learning studio. From 2556ad63711a8db3ee720a79ecfbd19ea10b91b1 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Tue, 17 Sep 2024 23:18:54 -0700 Subject: [PATCH 13/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 93eea5016a..33fb55cc57 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -68,8 +68,6 @@ Before following the steps in this article, make sure you have the following pre * An Azure Machine Learning workspace and a compute instance. If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](../quickstart-create-resources.md) article to create them. ---- - * Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](../how-to-assign-roles.md). 
* For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to: From b9d89ea867ac2f4e221d31a434c5955cda20335f Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Tue, 17 Sep 2024 23:40:38 -0700 Subject: [PATCH 14/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 33fb55cc57..84d2ca0db3 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -144,7 +144,7 @@ You monitor [Azure Machine Learning datasets](how-to-create-register-datasets.md The monitor compares the baseline and target datasets. -#### Migrate to Model Monitor +### Migrate to Model Monitor In Model Monitor, you can find corresponding concepts as following, and you can find more details in this article [Set up model monitoring by bringing in your production data to Azure Machine Learning](../how-to-monitor-model-performance.md#set-up-out-of-box-model-monitoring): * Reference dataset: similar to your baseline dataset for data drift detection, it is set as the recent past production inference dataset. * Production inference data: similar to your target dataset in data drift detection, the production inference data can be collected automatically from models deployed in production. It can also be inference data you store. @@ -214,6 +214,7 @@ Not supported. --- + ## Create dataset monitor Create a dataset monitor to detect and alert to data drift on a new dataset. Use either the [Python SDK](#sdk-monitor) or [Azure Machine Learning studio](#studio-monitor). 
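Whichever interface you use, the monitor conceptually slices the target dataset into time windows at the chosen frequency and compares each window against the baseline. A rough standard-library sketch of that windowing, illustrative rather than the service's actual implementation:

```python
from datetime import date, timedelta

# Illustrative only: enumerate the (start, end) windows a weekly monitor
# would compare against the baseline dataset.
def weekly_windows(start, end):
    windows = []
    cursor = start
    while cursor + timedelta(days=7) <= end:
        windows.append((cursor, cursor + timedelta(days=7)))
        cursor += timedelta(days=7)
    return windows

for window_start, window_end in weekly_windows(date(2024, 9, 2), date(2024, 9, 30)):
    print(window_start, "->", window_end)
```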
@@ -321,7 +322,8 @@ After completion of the wizard, the resulting dataset monitor will appear in the Not supported --- -### Migrate to Model Monitor + +## Migrate to Model Monitor When you migrate to Model Monitor, if you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create target dataset (production inference data in Model Monitor). When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics). From 2e237906aca7e9d3f8afd7489954aa8545fbff5c Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Wed, 18 Sep 2024 08:22:51 -0700 Subject: [PATCH 15/25] Update how-to-monitor-datasets.md --- .../machine-learning/v1/how-to-monitor-datasets.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 84d2ca0db3..0b7fa4e563 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -68,6 +68,11 @@ Before following the steps in this article, make sure you have the following pre * An Azure Machine Learning workspace and a compute instance. 
If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](../quickstart-create-resources.md) article to create them. +# [Azure CLI](#tab/azure-cli) + +[!INCLUDE [basic prereqs cli](../includes/machine-learning-cli-prereqs.md)] +--- + * Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](../how-to-assign-roles.md). * For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to: @@ -82,10 +87,6 @@ Before following the steps in this article, make sure you have the following pre * Update the registered data asset continuously for model monitoring. * (Recommended) Register the model in an Azure Machine Learning workspace, for lineage tracking. -# [Azure CLI](#tab/azure-cli) - -[!INCLUDE [basic prereqs cli](../includes/machine-learning-cli-prereqs.md)] ---- > [!IMPORTANT] > > Model monitoring jobs are scheduled to run on serverless Spark compute pools with support for the following VM instance types: `Standard_E4s_v3`, `Standard_E8s_v3`, `Standard_E16s_v3`, `Standard_E32s_v3`, and `Standard_E64s_v3`. You can select the VM instance type with the `create_monitor.compute.instance_type` property in your YAML configuration or from the dropdown in the Azure Machine Learning studio. @@ -215,6 +216,7 @@ Not supported. + ## Create dataset monitor Create a dataset monitor to detect and alert to data drift on a new dataset. Use either the [Python SDK](#sdk-monitor) or [Azure Machine Learning studio](#studio-monitor). 
@@ -319,6 +321,7 @@ monitor = monitor.enable_schedule() After completion of the wizard, the resulting dataset monitor will appear in the list. Select it to go to that monitor's details page. # [Azure CLI](#tab/azure-cli) + Not supported --- @@ -427,6 +430,7 @@ created_monitor = poller.result() 1. Select **Next** to go to the **Select monitoring signals** page. 1. Select **Next** to go to the **Notifications** page. Add your email to receive email notifications. 1. Review your monitoring details and select **Create** to create the monitor. + # [Azure CLI](#tab/azure-cli) Azure Machine Learning model monitoring uses `az ml schedule` to schedule a monitoring job. You can create the out-of-box model monitor with the following CLI command and YAML definition: From 22eecfba50f0b2c22649b2be98a513f57df05867 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Wed, 18 Sep 2024 08:35:43 -0700 Subject: [PATCH 16/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 0b7fa4e563..b8261f9254 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -52,7 +52,7 @@ To create and work with dataset monitors, you need: * The [Azure Machine Learning SDK for Python installed](/python/api/overview/azure/ml/install), which includes the azureml-datasets package. * Structured (tabular) data with a timestamp specified in the file path, file name, or column in the data. 
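For the timestamp-in-the-file-path case, the partition timestamp is derived from a date pattern embedded in the path. A small illustrative parser, assuming paths that encode the date as `yyyy/MM/dd`:

```python
import re
from datetime import datetime

# Illustrative only: recover a partition timestamp from a yyyy/MM/dd path,
# mirroring the idea behind a dataset's partition timestamp.
def timestamp_from_path(path):
    match = re.search(r"(\d{4})/(\d{2})/(\d{2})", path)
    if not match:
        return None
    year, month, day = map(int, match.groups())
    return datetime(year, month, day)

print(timestamp_from_path("drift/2024/09/16/data.csv"))  # 2024-09-16 00:00:00
```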
-### Migrate to Model Monitor
+## Migrate to Model Monitor

When you migrate to Model Monitor, check the following prerequisites:

@@ -108,7 +108,7 @@ This top down approach makes it easy to monitor data instead of traditional rule

In Azure Machine Learning, you use dataset monitors to detect and alert for data drift.

-### Dataset monitors
+## Dataset monitors

With a dataset monitor you can:

@@ -333,7 +333,7 @@ When you migrate to Model Monitor, if you didn't deploy your model to production

The following sections contain more details on how to migrate to Model Monitor.

-### If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled data collection
+## If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled data collection

If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time, use one of the following methods to set up model monitoring.

@@ -444,7 +444,7 @@ The following YAML contains the definition for the out-of-box model monitoring.

:::code language="yaml" source="~/azureml-examples-main/cli/monitoring/out-of-box-monitoring.yaml":::
---
-### If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection
+## If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection

When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics).
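As one example of the kind of custom metric you might bring to Model Monitor, the population stability index (PSI) is a common drift measure computed over matching histogram buckets. The sketch below is plain Python and illustrative only — it isn't an Azure Machine Learning API, and the 0.2 "significant drift" reading is a rule of thumb, not a service default:

```python
import math

# Illustrative only: population stability index between two bucketed
# distributions; larger values mean a bigger shift.
def psi(expected_fracs, actual_fracs, eps=1e-6):
    total = 0.0
    for expected, actual in zip(expected_fracs, actual_fracs):
        expected = max(expected, eps)
        actual = max(actual, eps)
        total += (actual - expected) * math.log(actual / expected)
    return total

baseline_hist = [0.25, 0.25, 0.25, 0.25]
production_hist = [0.10, 0.20, 0.30, 0.40]
print(round(psi(baseline_hist, production_hist), 4))  # above the 0.2 rule of thumb
```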
You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to: From 37af843af259e2987ed7634340573a9cc864580e Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Wed, 18 Sep 2024 08:47:01 -0700 Subject: [PATCH 17/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index b8261f9254..e051a56f77 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -52,7 +52,7 @@ To create and work with dataset monitors, you need: * The [Azure Machine Learning SDK for Python installed](/python/api/overview/azure/ml/install), which includes the azureml-datasets package. * Structured (tabular) data with a timestamp specified in the file path, file name, or column in the data. -## Migrate to Model Monitor +## Prerequisites (Migrate to Model Monitor) When you migrate to Model Monitor, please check the prerequisites as following: @@ -326,14 +326,14 @@ Not supported --- -## Migrate to Model Monitor +## Create Model Monitor (Migrate to Model Monitor) When you migrate to Model Monitor, if you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time, Azure Machine Learning collects production inference data, and automatically stores it in Microsoft Azure Blob Storage. 
You can then use Azure Machine Learning model monitoring to continuously monitor this production inference data, and you can directly choose the model to create target dataset (production inference data in Model Monitor). When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics). Following sections contain more details on how to migrate to Model Monitor. -## If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled data collection +## If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled data collection (Migrate to Model Monitor) If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time. @@ -444,7 +444,7 @@ The following YAML contains the definition for the out-of-box model monitoring. 
:::code language="yaml" source="~/azureml-examples-main/cli/monitoring/out-of-box-monitoring.yaml"::: --- -## If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection +## If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection (Migrate to Model Monitor) When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics). You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to: From 406466df9d16c3f07b8f7502f5b54b58c3c7bad3 Mon Sep 17 00:00:00 2001 From: challenp Date: Fri, 20 Sep 2024 09:26:56 -0700 Subject: [PATCH 18/25] Update azure-government.md Add extra note around BCDR due to FedRAMP requirements --- articles/ai-services/openai/azure-government.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/articles/ai-services/openai/azure-government.md b/articles/ai-services/openai/azure-government.md index cf9499db98..da28568d69 100644 --- a/articles/ai-services/openai/azure-government.md +++ b/articles/ai-services/openai/azure-government.md @@ -17,7 +17,9 @@ This article highlights the differences when using Azure OpenAI in Azure Governm ## Azure OpenAI models -Learn more about the different capabilities of each model in [Azure OpenAI Service models](./concepts/models.md). 
The following sections show model availability by region and deployment type. +Learn more about the different capabilities of each model in [Azure OpenAI Service models](./concepts/models.md). For customers with [Business Continuity and Disaster Recovery (BCDR) considerations](./how-to/business-continuity-disaster-recovery.md), please take careful note of the deployment types, regions, and model availability below as not all model/type combinations are available in both regions. + +The following sections show model availability by region and deployment type. ### Standard deployment model availability From 13d2b9a8519e874ed9bdd631c68f435d8ac08456 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Fri, 20 Sep 2024 12:45:21 -0700 Subject: [PATCH 19/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index e051a56f77..d1b3a30b9a 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -233,7 +233,7 @@ The **backfill** function runs a backfill job, for a specified start and end dat > Azure Machine Learning model monitoring doesn't support manual **backfill** function, if you want to redo the model monitor for a specif time range, you can create another model monitor for that specific time range. # [Python SDK](#tab/python) - + [!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)] @@ -290,7 +290,7 @@ monitor = monitor.enable_schedule() # [Studio](#tab/azure-studio) - + 1. Navigate to the [studio's homepage](https://ml.azure.com). 1. Select the **Data** tab. @@ -442,6 +442,7 @@ az ml schedule create -f ./out-of-box-monitoring.yaml The following YAML contains the definition for the out-of-box model monitoring. 
:::code language="yaml" source="~/azureml-examples-main/cli/monitoring/out-of-box-monitoring.yaml":::
+
---

## If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection (Migrate to Model Monitor)

When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics).

From 236842627a022bd6cd2684cd90cd7e570a275728 Mon Sep 17 00:00:00 2001
From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com>
Date: Mon, 23 Sep 2024 10:27:39 -0700
Subject: [PATCH 20/25] Update how-to-monitor-datasets.md

---
 .../v1/how-to-monitor-datasets.md | 41 ++-----------------
 1 file changed, 3 insertions(+), 38 deletions(-)

diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md
index d1b3a30b9a..64b4386a98 100644
--- a/articles/machine-learning/v1/how-to-monitor-datasets.md
+++ b/articles/machine-learning/v1/how-to-monitor-datasets.md
@@ -53,44 +53,9 @@ To create and work with dataset monitors, you need:
 * Structured (tabular) data with a timestamp specified in the file path, file name, or column in the data.

 ## Prerequisites (Migrate to Model Monitor)
-When you migrate to Model Monitor, please check the prerequisites as following:
+When you migrate to Model Monitor, check the prerequisites described in [Prerequisites of Azure Machine Learning model monitoring](../how-to-monitor-model-performance.md#prerequisites).

-
-# [Python SDK](#tab/python)
-
-[!INCLUDE [basic prereqs sdk](../includes/machine-learning-sdk-v2-prereqs.md)]
-
-# [Studio](#tab/azure-studio)
-
-Before following the steps in this article, make sure you have the following prerequisites:
-
-* An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/).
-
-* An Azure Machine Learning workspace and a compute instance. If you don't have these resources, use the steps in the [Quickstart: Create workspace resources](../quickstart-create-resources.md) article to create them.
-
-# [Azure CLI](#tab/azure-cli)
-
-[!INCLUDE [basic prereqs cli](../includes/machine-learning-cli-prereqs.md)]
----
-
-* Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure Machine Learning workspace, or a custom role allowing `Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*`. For more information, see [Manage access to an Azure Machine Learning workspace](../how-to-assign-roles.md).
-
-* For monitoring a model that is deployed to an Azure Machine Learning online endpoint (managed online endpoint or Kubernetes online endpoint), be sure to:
-
-  * Have a model already deployed to an Azure Machine Learning online endpoint. Both managed online endpoint and Kubernetes online endpoint are supported. If you don't have a model deployed to an Azure Machine Learning online endpoint, see [Deploy and score a machine learning model by using an online endpoint](../how-to-deploy-online-endpoints.md).
-
-  * Enable data collection for your model deployment. You can enable data collection during the deployment step for Azure Machine Learning online endpoints. For more information, see [Collect production data from models deployed to a real-time endpoint](../how-to-collect-production-data.md).
-
-* For monitoring a model that is deployed to an Azure Machine Learning batch endpoint or deployed outside of Azure Machine Learning, be sure to:
-
-  * Have a means to collect production data and register it as an Azure Machine Learning data asset.
-  * Update the registered data asset continuously for model monitoring.
-  * (Recommended) Register the model in an Azure Machine Learning workspace, for lineage tracking.
-
-> [!IMPORTANT]
->
-> Model monitoring jobs are scheduled to run on serverless Spark compute pools with support for the following VM instance types: `Standard_E4s_v3`, `Standard_E8s_v3`, `Standard_E16s_v3`, `Standard_E32s_v3`, and `Standard_E64s_v3`. You can select the VM instance type with the `create_monitor.compute.instance_type` property in your YAML configuration or from the dropdown in the Azure Machine Learning studio.
-
## What is data drift?

Model accuracy degrades over time, largely because of data drift. For machine learning models, data drift is the change in model input data that leads to model performance degradation. Monitoring data drift helps detect these model performance issues.

@@ -233,7 +198,7 @@ The **backfill** function runs a backfill job, for a specified start and end dat

> Azure Machine Learning model monitoring doesn't support a manual **backfill** function. If you want to redo model monitoring for a specific time range, you can create another model monitor for that time range.

# [Python SDK](#tab/python)
-  
+

[!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)]


@@ -290,7 +255,7 @@ monitor = monitor.enable_schedule()


# [Studio](#tab/azure-studio)
-  
+

1. Navigate to the [studio's homepage](https://ml.azure.com).
1. Select the **Data** tab.
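The "What is data drift?" passage above describes drift only in prose. As a library-free illustration of the idea (not the metric the service computes; Azure Machine Learning uses its own drift coefficient), here is a population stability index (PSI) sketch. The function, bin count, and thresholds are illustrative assumptions:

```python
import math

def population_stability_index(baseline, target, bins=10):
    """Compare two numeric samples by binning them over a shared range.

    By common convention, PSI below ~0.1 reads as "no significant drift"
    and above ~0.2 as drift worth investigating. Both the metric and the
    thresholds are illustrative, not Azure Machine Learning's own.
    """
    lo = min(min(baseline), min(target))
    hi = max(max(baseline), max(target))
    width = (hi - lo) / bins or 1.0

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Laplace-style smoothing keeps empty bins out of log(0)
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    p = proportions(baseline)
    q = proportions(target)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [x / 100 for x in range(1000)]     # uniform over [0, 10)
shifted = [x / 100 + 3 for x in range(1000)]  # same shape, shifted by 3

print(population_stability_index(baseline, baseline))        # prints 0.0 (no drift)
print(population_stability_index(baseline, shifted) > 0.2)   # prints True (clear drift)
```

Identical samples score exactly zero; the further the target distribution moves from the baseline, the larger the score, which is the intuition behind the service's scheduled drift comparisons.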
From db25aac19d7961c97a507d5102416219d799e077 Mon Sep 17 00:00:00 2001
From: Facundo Santiago
Date: Mon, 23 Sep 2024 15:13:30 -0400
Subject: [PATCH 21/25] Update llama-index.md

---
 .../ai-studio/how-to/develop/llama-index.md | 35 ++++++++++++++++---
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/articles/ai-studio/how-to/develop/llama-index.md b/articles/ai-studio/how-to/develop/llama-index.md
index 9f284d0079..271f8ad417 100644
--- a/articles/ai-studio/how-to/develop/llama-index.md
+++ b/articles/ai-studio/how-to/develop/llama-index.md
@@ -13,7 +13,7 @@ author: eric-urban

# Develop applications with LlamaIndex and Azure AI studio

-In this article, you learn how to use [LlamaIndex](https://github.com/run-llama/llama_index) with models deployed from the Azure AI model catalog deployed to Azure AI studio.
+In this article, you learn how to use [LlamaIndex](https://github.com/run-llama/llama_index) with models deployed from the Azure AI model catalog in Azure AI studio.

Models deployed to Azure AI studio can be used with LlamaIndex in two ways:

@@ -49,7 +49,7 @@ To run this tutorial, you need:

## Configure the environment

-To use LLMs deployed in Azure AI studio, you need the endpoint and credentials to connect to it. The parameter `model_name` is not required for endpoints serving a single model, like Managed Online Endpoints. Follow these steps to get the information you need from the model you want to use:
+To use LLMs deployed in Azure AI studio, you need the endpoint and credentials to connect to it. Follow these steps to get the information you need from the model you want to use:

1. Go to the [Azure AI studio](https://ai.azure.com/).
2. Go to deployments and select the model you deployed as indicated in the prerequisites.
@@ -79,10 +79,15 @@ llm = AzureAICompletionsModel(
)
```

+> [!TIP]
+> The parameter `model_name` in the constructor is not required for endpoints serving a single model, like serverless endpoints.
+
Alternatively, if your endpoint supports Microsoft Entra ID, you can use the following code to create the client:

```python
+import os
from azure.identity import DefaultAzureCredential
+from llama_index.llms.azure_inference import AzureAICompletionsModel

llm = AzureAICompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
@@ -91,7 +96,7 @@ llm = AzureAICompletionsModel(
```

> [!NOTE]
-> > Note: When using Microsoft Entra ID, make sure that the endpoint was deployed with that authentication method and that you have the required permissions to invoke it.
+> When using Microsoft Entra ID, make sure that the endpoint was deployed with that authentication method and that you have the required permissions to invoke it.

If you are planning to use asynchronous calling, it's a best practice to use the asynchronous version for the credentials:

@@ -99,6 +104,7 @@ If you are planning to use asynchronous calling, it's a best practice to use the
from azure.identity.aio import (
    DefaultAzureCredential as DefaultAzureCredentialAsync,
)
+from llama_index.llms.azure_inference import AzureAICompletionsModel

llm = AzureAICompletionsModel(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
@@ -132,7 +138,7 @@ llm = AzureAICompletionsModel(

## Use LLMs models

-Use the `chat` endpoint for chat instruction models. The `complete` method is still available for model of type `chat-completions`. On those cases, your input text is converted to a message with `role="user"`.
+You can use the client directly or [configure the models used by your code](#configure-the-models-used-by-your-code) in LlamaIndex. To use the model directly, use the `chat` method for chat instruction models:

```python
from llama_index.core.llms import ChatMessage
@@ -156,9 +162,11 @@ for r in response:
    print(r.delta, end="")
```

+The `complete` method is still available for models of type `chat-completions`. In those cases, your input text is converted to a message with `role="user"`.
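That conversion can be pictured in a few lines of plain Python. This is a conceptual stand-in for the behavior described above, not LlamaIndex code:

```python
# Conceptual stand-in (not LlamaIndex code): a completion-style prompt
# sent through a chat API is wrapped as a single user-role message.
def as_chat_messages(prompt):
    return [{"role": "user", "content": prompt}]

print(as_chat_messages("The sky is a wonderful blue"))
# prints [{'role': 'user', 'content': 'The sky is a wonderful blue'}]
```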
+ ## Use embeddings models -In the same way you create an LLM client, you can connect to an embedding model. In the following example, we are setting again the environment variable to now point to an embeddings model: +In the same way you create an LLM client, you can connect to an embeddings model. In the following example, we are setting the environment variable to now point to an embeddings model: ```bash export AZURE_INFERENCE_ENDPOINT="" @@ -176,6 +184,21 @@ embed_model = AzureAIEmbeddingsModel( ) ``` +The following example shows a simple test to verify it works: + +```python +from llama_index.core.schema import TextNode + +nodes = [ + TextNode( + text="Before college the two main things I worked on, " + "outside of school, were writing and programming." + ) +] +response = embed_model(nodes=nodes) +print(response[0].embedding) +``` + ## Configure the models used by your code You can use the LLM or embeddings model client individually in the code you develop with LlamaIndex or you can configure the entire session using the `Settings` options. Configuring the session has the advantage of all your code using the same models for all the operations. @@ -200,3 +223,5 @@ In general, you use a combination of both strategies. 
## Related content

* [How to get started with Azure AI SDKs](sdk-overview.md)
+* [Reference for LlamaIndex Embeddings Integration](https://llamahub.ai/l/embeddings/llama-index-embeddings-azure-inference)
+* [Reference for LlamaIndex LLMs Integration](https://llamahub.ai/l/llms/llama-index-llms-azure-inference)

From 81780c6baebc2b33fc2210512911944901800eea Mon Sep 17 00:00:00 2001
From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com>
Date: Mon, 23 Sep 2024 13:07:40 -0700
Subject: [PATCH 22/25] Update how-to-monitor-datasets.md

---
 .../machine-learning/v1/how-to-monitor-datasets.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md
index 64b4386a98..84cd62f090 100644
--- a/articles/machine-learning/v1/how-to-monitor-datasets.md
+++ b/articles/machine-learning/v1/how-to-monitor-datasets.md
@@ -286,8 +286,10 @@ monitor = monitor.enable_schedule()

After completion of the wizard, the resulting dataset monitor will appear in the list. Select it to go to that monitor's details page.

# [Azure CLI](#tab/azure-cli)
+
Not supported
+

---

@@ -298,11 +300,12 @@ When you migrate to Model Monitor, if you didn't deploy your model to production

The following sections contain more details about how to migrate to Model Monitor.

-## If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled data collection (Migrate to Model Monitor)
+## Create Model Monitor via automatically collected production data (Migrate to Model Monitor)

If you have deployed your model to production in an Azure Machine Learning online endpoint and enabled [data collection](../how-to-collect-production-data.md) at deployment time, you can set up model monitoring by using the automatically collected production data.

# [Python SDK](#tab/python)
+

You can use the following code to set up the out-of-box model monitoring:

@@ -371,6 +374,7 @@ created_monitor = poller.result()
```

# [Studio](#tab/azure-studio)
+

1.
Navigate to [Azure Machine Learning studio](https://ml.azure.com). 1. Go to your workspace. @@ -397,6 +401,7 @@ created_monitor = poller.result() 1. Review your monitoring details and select **Create** to create the monitor. # [Azure CLI](#tab/azure-cli) + Azure Machine Learning model monitoring uses `az ml schedule` to schedule a monitoring job. You can create the out-of-box model monitor with the following CLI command and YAML definition: @@ -410,10 +415,10 @@ The following YAML contains the definition for the out-of-box model monitoring. --- -## If you didn't deploy your model to production in an Azure Machine Learning online endpoint or you don't want to use data collection (Migrate to Model Monitor) +## Create Model Monitor via custom data preprocessing component (Migrate to Model Monitor) When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics). -You can also set up model monitoring for models deployed to Azure Machine Learning batch endpoints or deployed outside of Azure Machine Learning. If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to: +If you don't have a deployment, but you have production data, you can use the data to perform continuous model monitoring. To monitor these models, you must be able to: * Collect production inference data from models deployed in production. * Register the production inference data as an Azure Machine Learning data asset, and ensure continuous updates of the data. 
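The registration step in the list above maps to a data asset definition in the v2 CLI. A sketch, with a hypothetical asset name and storage path (adjust the `path` to your own storage):

```yaml
# production-data.yaml -- hypothetical name and path
$schema: https://azuremlschemas.azureml.net/latest/data.schema.json
name: production-inference-data
type: uri_folder
description: Production inference data registered for model monitoring.
path: abfss://<container>@<account>.dfs.core.windows.net/production-data/
```

You would register the asset, and re-register new versions as production data accumulates, with `az ml data create -f production-data.yaml`.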
From c005f753c1abf5a8a2f05618e097682bd8c9d499 Mon Sep 17 00:00:00 2001 From: Xun Wang <54865857+SturgeonMi@users.noreply.github.com> Date: Mon, 23 Sep 2024 13:21:42 -0700 Subject: [PATCH 23/25] Update how-to-monitor-datasets.md --- .../machine-learning/v1/how-to-monitor-datasets.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 84cd62f090..cd78bdca83 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -156,7 +156,6 @@ dset = dset.register(ws, 'target') > For a full example of using the `timeseries` trait of datasets, see the [example notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.ipynb) or the [datasets SDK documentation](/python/api/azureml-core/azureml.data.tabulardataset#with-timestamp-columns-timestamp-none--partition-timestamp-none--validate-false----kwargs-). # [Studio](#tab/azure-studio) - If you create your dataset using Azure Machine Learning studio, ensure the path to your data contains timestamp information, include all subfolders with data, and set the partition format. @@ -175,8 +174,10 @@ If your data is already partitioned by date or time, as is the case here, you ca # [Azure CLI](#tab/azure-cli) + Not supported. + --- @@ -266,11 +267,11 @@ monitor = monitor.enable_schedule() :::image type="content" source="media/how-to-monitor-datasets/wizard.png" alt-text="Create a monitor wizard"::: -* **Select target dataset**. The target dataset is a tabular dataset with timestamp column specified which to analyze for data drift. The target dataset must have features in common with the baseline dataset, and should be a `timeseries` dataset, which new data is appended to. 
Historical data in the target dataset can be analyzed, or new data can be monitored.
+1. **Select target dataset**. The target dataset is a tabular dataset with a specified timestamp column, which is analyzed for data drift. The target dataset must have features in common with the baseline dataset, and should be a `timeseries` dataset to which new data is appended. Historical data in the target dataset can be analyzed, or new data can be monitored.

-* **Select baseline dataset.** Select the tabular dataset to be used as the baseline for comparison of the target dataset over time. The baseline dataset must have features in common with the target dataset. Select a time range to use a slice of the target dataset, or specify a separate dataset to use as the baseline.
+1. **Select baseline dataset.** Select the tabular dataset to be used as the baseline for comparison of the target dataset over time. The baseline dataset must have features in common with the target dataset. Select a time range to use a slice of the target dataset, or specify a separate dataset to use as the baseline.

-* **Monitor settings**. These settings are for the scheduled dataset monitor pipeline to create.
+1. **Monitor settings**. These settings configure the scheduled dataset monitor pipeline that will be created.

| Setting | Description | Tips | Mutable |
| ------- | ----------- | ---- | ------- |
@@ -415,6 +416,7 @@ The following YAML contains the definition for the out-of-box model monitoring.

---

+
## Create Model Monitor via custom data preprocessing component (Migrate to Model Monitor)

When you migrate to Model Monitor, if you didn't deploy your model to production in an Azure Machine Learning online endpoint, or you don't want to use [data collection](../how-to-collect-production-data.md), you can also [set up model monitoring with custom signals and metrics](../how-to-monitor-model-performance.md#set-up-model-monitoring-with-custom-signals-and-metrics).
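For orientation, the schedule YAML that `az ml schedule create -f` consumes has the following overall shape. This is a trimmed sketch modeled on the out-of-box example; the monitor name, timing, and deployment ID are placeholder assumptions, and custom signals are added under `create_monitor` in the same file (see the linked article for the full schema):

```yaml
# model-monitoring.yaml -- trimmed sketch with placeholder names
$schema: http://azureml/sdk-2-0/Schedule.json
name: fraud_detection_model_monitoring
trigger:
  type: recurrence
  frequency: day    # run the monitoring job daily
  interval: 1
create_monitor:
  compute:
    instance_type: standard_e4s_v3   # serverless Spark VM size
    runtime_version: "3.3"
  monitoring_target:
    ml_task: classification
    endpoint_deployment_id: azureml:fraud-detection-endpoint:fraud-detection-deployment
```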
From c359c2ece706623690cdf22a12745400a3d42596 Mon Sep 17 00:00:00 2001 From: Ross McAllister <10053959+rmca14@users.noreply.github.com> Date: Mon, 23 Sep 2024 14:32:30 -0700 Subject: [PATCH 24/25] Update how-to-monitor-datasets.md Acrolinx --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index cd78bdca83..0d60b4642f 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -504,7 +504,7 @@ On this chart, select a single date to compare the feature distribution between ## Metrics, alerts, and events -Metrics can be queried in the [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview) resource associated with your machine learning workspace. You have access to all features of Application Insights including set up for custom alert rules and action groups to trigger an action such as, an Email/SMS/Push/Voice or Azure Function. Refer to the complete Application Insights documentation for details. +Metrics can be queried in the [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview) resource associated with your machine learning workspace. You have access to all features of Application Insights including set up for custom alert rules and action groups to trigger an action such as an Email/SMS/Push/Voice or Azure Function. Refer to the complete Application Insights documentation for details. To get started, navigate to the [Azure portal](https://portal.azure.com) and select your workspace's **Overview** page. 
The associated Application Insights resource is on the far right: From 26ec09c9d69ea45a3673b5708fac47e57f6656bd Mon Sep 17 00:00:00 2001 From: Ross McAllister <10053959+rmca14@users.noreply.github.com> Date: Mon, 23 Sep 2024 14:41:02 -0700 Subject: [PATCH 25/25] Update how-to-monitor-datasets.md --- articles/machine-learning/v1/how-to-monitor-datasets.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/machine-learning/v1/how-to-monitor-datasets.md b/articles/machine-learning/v1/how-to-monitor-datasets.md index 0d60b4642f..1513fa23f4 100644 --- a/articles/machine-learning/v1/how-to-monitor-datasets.md +++ b/articles/machine-learning/v1/how-to-monitor-datasets.md @@ -205,7 +205,7 @@ The **backfill** function runs a backfill job, for a specified start and end dat See the [Python SDK reference documentation on data drift](/python/api/azureml-datadrift/azureml.datadrift) for full details. -The following example shows how to create a dataset monitor using the Python SDK +The following example shows how to create a dataset monitor using the Python SDK: ```python from azureml.core import Workspace, Dataset