Follow the steps below to train an Azure AI Document Intelligence custom extraction machine learning (ML) model.
In the repository folder data/samples/train, there are pre-labeled training forms.
These need to be uploaded to the Azure Storage account.
- Go to the Azure Portal and select the Azure Data Lake Storage account that was created by the deployment script. The name of this account should be storage.
- On the left side of the menu pane, select **Data storage**, then **Containers**, and go to the `samples` container.
- Create a new folder and name it `train`.
- In the `train` folder, create two folders, one named `contoso_set_1` and the other named `contoso_set_2`.
- Upload the sample labeling files in data/samples/train/contoso_set_1 and data/samples/train/contoso_set_2 into the corresponding folders. You now have two full sets of pre-labeled data to create the machine learning models.
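The portal upload above can also be scripted. The following is a minimal sketch, not part of the original solution, assuming the `azure-storage-blob` Python package and a connection string for the storage account. The helper names `plan_uploads` and `upload_training_sets` are hypothetical; the container name `samples` and the `train/...` folder layout come from the steps above.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath


def plan_uploads(local_root: str, blob_prefix: str = "train") -> list[tuple[Path, str]]:
    """Pair every file under local_root with the blob name it should receive.

    A file at <local_root>/contoso_set_1/form.pdf maps to
    train/contoso_set_1/form.pdf, mirroring the folder layout described above.
    """
    root = Path(local_root)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root)
            # Blob names always use forward slashes, regardless of the local OS.
            blob_name = str(PurePosixPath(blob_prefix, *relative.parts))
            pairs.append((path, blob_name))
    return pairs


def upload_training_sets(
    connection_string: str,
    local_root: str = "data/samples/train",
    container_name: str = "samples",
) -> int:
    """Upload all planned files and return the number of blobs written."""
    # Imported lazily so plan_uploads can be used without the SDK installed.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(connection_string, container_name)
    uploads = plan_uploads(local_root)
    for path, blob_name in uploads:
        with path.open("rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)
    return len(uploads)
```

Calling `plan_uploads("data/samples/train")` first is a cheap way to check the target blob names before anything is written to the storage account.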
In this step, you will train Azure AI Document Intelligence custom extraction models and merge them into a composite model. For more information, see the Azure documentation on Compose Custom Models.
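This guide uses Document Intelligence Studio, but the same train-and-compose flow can in principle be scripted. The following is a minimal sketch, not part of the original solution, assuming the `azure-ai-formrecognizer` Python package (version 3.2 or later), an endpoint and key for the Document Intelligence resource, and a SAS URL for the `samples` container. The helper names `build_requests` and `train_and_compose` are hypothetical; the model IDs, folder paths, and Template build mode are the ones used in this section.

```python
from __future__ import annotations

# Component model IDs mapped to their training folders (names from this guide).
COMPONENT_MODELS = {
    "contoso-set-1": "train/contoso_set_1/",
    "contoso-set-2": "train/contoso_set_2/",
}
COMPOSED_MODEL_ID = "contoso-safety-forms"


def build_requests(container_sas_url: str) -> list[dict]:
    """Describe one Template-mode build per pre-labeled training set."""
    return [
        {
            "build_mode": "template",
            "blob_container_url": container_sas_url,
            "prefix": prefix,
            "model_id": model_id,
        }
        for model_id, prefix in COMPONENT_MODELS.items()
    ]


def train_and_compose(endpoint: str, key: str, container_sas_url: str) -> str:
    """Build both component models, compose them, and return the composed model ID."""
    # Imported lazily so build_requests can be inspected without the SDK installed.
    from azure.ai.formrecognizer import DocumentModelAdministrationClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentModelAdministrationClient(endpoint, AzureKeyCredential(key))
    for request in build_requests(container_sas_url):
        build_mode = request.pop("build_mode")
        # Each call starts a long-running operation; result() waits for training to finish.
        client.begin_build_document_model(build_mode, **request).result()

    composed = client.begin_compose_document_model(
        list(COMPONENT_MODELS),
        model_id=COMPOSED_MODEL_ID,
        description="Composite of contoso-set-1 and contoso-set-2",
    ).result()
    return composed.model_id
```

The trailing slash on each prefix is a deliberate choice in this sketch: it keeps a folder such as `train/contoso_set_1` from also matching a sibling folder whose name merely starts with the same characters.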
- Go to Document Intelligence Studio, scroll down to **Custom Extraction Model**, and select **Create new**, as illustrated below.
- Select **+Create a project** to create a project.
- Enter a project name, for example `SafetyFormProject-Set-1` or any other project name of your choice.
- Enter a project description, for example `Custom document intelligence model with samples contoso_set_1`, and click **Continue**.
- Select your Subscription, Resource Group and the Document Intelligence resource.
- Select the latest, non-preview API Version.
- Now you will be prompted to enter the training data source, as illustrated below. Select your subscription, then the Resource Group and the Azure storage account created by the deployment scripts. Enter `samples` in the Blob container field and `train/contoso_set_1` in the Folder path field. Click **Continue**.
- Review the information and click **Create Project**. This step connects Document Intelligence Studio to the Azure Data Lake Storage container in your subscription so that it can access the training data.
- After the project is created, the forms appear with their OCR results and labeled field key-value pairs, as illustrated below. Click **Train** in the upper right corner.
- Fill in the information as illustrated below (the compose step later in this section refers to this first model as `contoso-set-1`), set the "Build Mode" dropdown to **Template**, and then click **Train**.
- Once training for the `contoso_set_1` samples is done, the model is listed in the **Models** section with a confidence score for each field, as illustrated below.
- Train a second model with the files stored in `train/contoso_set_2`, using the steps above to create a new project and model. Name your second model `contoso-set-2` or a name of your choice.
- Click **Models** in your project. You will see a list of the models already created. You can now merge the individual models into a composite model. Select `contoso-set-1` and `contoso-set-2`, then click **Compose**. The system will prompt you for a new model name and description. Name it `contoso-safety-forms`, provide a description, and click **Compose**.
- Your model ID `contoso-safety-forms` will now appear in the Model ID list, as illustrated below.
- If you called your composite model `contoso-safety-forms`, you can go on to Run the Solution.
- If you did NOT call your composite model `contoso-safety-forms`, follow the instructions below:
  - From the Azure Portal, open the resource group you deployed this solution to.
  - Find the Azure Functions App and click the resource to open its overview page.
  - On the left panel, under the Settings section, click Environment variables. Under App settings, locate CUSTOM_BUILT_MODEL_ID, click it, and replace the default value with your composite model ID.
  - Click OK and then Save. After this, your Azure Functions app will work with this Document Intelligence extraction model.
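Azure Functions exposes app settings to the running code as environment variables, which is why changing CUSTOM_BUILT_MODEL_ID is enough to switch models. The following is a minimal sketch of how a function might resolve the model ID. It is an illustration, not the solution's actual code: the helper name `resolve_model_id` is hypothetical, and the fallback to `contoso-safety-forms` is an assumption based on the default name used in this guide.

```python
import os

# Assumed default, matching the composite model name suggested in this guide.
DEFAULT_MODEL_ID = "contoso-safety-forms"


def resolve_model_id(environ=None) -> str:
    """Return the model ID from CUSTOM_BUILT_MODEL_ID, or the default if unset or blank."""
    env = os.environ if environ is None else environ
    value = env.get("CUSTOM_BUILT_MODEL_ID", "").strip()
    return value or DEFAULT_MODEL_ID


# Usage with an explicit mapping, so the behavior is easy to check locally.
print(resolve_model_id({"CUSTOM_BUILT_MODEL_ID": "my-composite-model"}))
```

Passing a mapping instead of reading `os.environ` directly keeps the lookup easy to test without touching the real Function App configuration.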