To provide uniform search capabilities across both structured (Azure SQL) and unstructured (Azure Blob) documents, we need to combine the indexed data in a single searchable index.
- Create Azure resources
- Index Schema
- Scoring Profile
- BlobIndexer
- SQLIndexer
- Custom Skillset
- Web Application DotNet
- WebApp Hosting
- Reporting
- SQL data preparation
- Blob Data Preparation
- References
Create an Azure Search service and a Cognitive Services resource.
Prepare the schema that will be populated by the indexers; we will use Postman to send the REST API calls:
- Import the Postman collection/environment (refer to details in the Search Index Creation folder)
- Create the index by sending the CreateIndex POST API request
- The JSON definition contains fields that will be populated by the indexer from SQL (e.g. Event_Type, Source), from Blob data (content_type, etc.), by Cognitive Services skills (locations, keyphrases, organizations), and by the custom skill (EventType).
**Index Key**: the ID field is populated by both indexers and must be unique across both; SQL populates it with the PRIMARY KEY and Blob with a unique path to the blob file.
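Below is a minimal sketch of how the fields section of that index definition could look; the field names are the ones discussed above, but the type and attribute choices here are assumptions - refer to CreateIndex in the Postman collection for the actual definition:
{
  "name": "lessons-index",
  "fields": [
    { "name": "ID", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "Event_Type", "type": "Edm.String", "searchable": true, "filterable": true },
    { "name": "EventType", "type": "Edm.String", "filterable": true, "facetable": true },
    { "name": "Source", "type": "Edm.String", "filterable": true, "facetable": true },
    { "name": "Project_Name", "type": "Edm.String", "searchable": true },
    { "name": "content_type", "type": "Edm.String", "filterable": true },
    { "name": "metadata_storage_path", "type": "Edm.String" },
    { "name": "LessonDate", "type": "Edm.DateTimeOffset", "sortable": true },
    { "name": "rating", "type": "Edm.Int32", "sortable": true },
    { "name": "locations", "type": "Collection(Edm.String)", "searchable": true },
    { "name": "organizations", "type": "Collection(Edm.String)", "searchable": true },
    { "name": "keyphrases", "type": "Collection(Edm.String)", "searchable": true }
  ]
}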
The index definition also includes Scoring Profiles that help boost keyphrases, the project name, and the most recent highly rated records.
"scoringProfiles": [
{
"name": "boostKeys",
"text": {
"weights": {
"keyphrases": 1.5,
"Project_Name": 5
}
}
},
{
"name": "newAndHighlyRated",
"functions": [
{
"type": "freshness",
"fieldName": "LessonDate",
"boost": 10,
"interpolation": "quadratic",
"freshness": {
"boostingDuration": "P365D"
}
},
{
"type": "magnitude",
"fieldName": "rating",
"boost": 10,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 1,
"boostingRangeEnd": 5,
"constantBoostBeyondRange": false
}
}
]
}
]
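As a quick illustration of how these profiles come into play at query time, a search request can reference one by name through the scoringProfile parameter. The sketch below assumes Postman-style placeholders for the service name and admin key, and the query text is just an example:
POST https://{{searchServiceName}}.search.windows.net/indexes/lessons-index/docs/search?api-version=2020-06-30
api-key: {{apiKey}}
Content-Type: application/json

{
  "search": "crane installation delay",
  "scoringProfile": "newAndHighlyRated",
  "top": 10
}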
Prepare the SQL and Blob data (refer to the sections below) and define the Blob and SQL data sources using the REST API.
For the SQL data source, enable change tracking for the table so that the indexer only re-indexes rows that have changed:
{
"name": "sharepointexport",
"description": "source for excel parsed data",
"type": "azuresql",
"subtype": null,
"credentials": {
"connectionString": "{{sqlConnectionUrl}};"
},
"container": {
"name": "[Lessons$]"
},
"dataChangeDetectionPolicy": {"@odata.type" : "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"},
"dataDeletionDetectionPolicy": null
}
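The Blob data source is defined in the same way. A minimal sketch is shown below; the name matches the dataSourceName referenced by the blob indexer later ("cac-items"), while the description, connection-string placeholder and container name are assumptions:
{
  "name": "cac-items",
  "description": "source for uploaded lessons documents",
  "type": "azureblob",
  "credentials": {
    "connectionString": "{{blobConnectionString}}"
  },
  "container": {
    "name": "lessons"
  }
}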
Now we will define how the indexers process data and populate the index. Create skillsets before creating the indexers that reference them, as in our example.
Refer to the CreateBlobIndexer example in the Postman collection; it maps the fields typically extracted by the Blob indexer to the schema we defined, using fieldMappings:
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "ID",
"mappingFunction": {
"name": "base64Encode",
"parameters": null
}
},
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "metadata_storage_path",
"mappingFunction": null
},
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "Source",
"mappingFunction" : { "name" : "extractTokenAtPosition", "parameters" : { "delimiter" : "/", "position" : 4 } }
},
{
"sourceFieldName": "metadata_last_modified",
"targetFieldName": "LessonDate",
"mappingFunction": null
},
{
"sourceFieldName": "metadata_content_type",
"targetFieldName": "content_type",
"mappingFunction": null
},
{
"sourceFieldName": "Project_Name",
"targetFieldName": "Project_Name",
"mappingFunction": null
}
]
It is important to have ID mapped to a field that uniquely identifies the document, and we also map fields whose meaning matches the SQL data.
In the example above we also extract part of the path in the blob container to serve as the Source field data, and we map the custom Project_Name metadata from the blob to the index as well, so that the example blob below gets all of its fields extracted.
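For instance, a hypothetical blob uploaded under a folder named for its source would be resolved roughly as follows (the account, container and folder names are made up for illustration):
metadata_storage_path : https://<storageaccount>.blob.core.windows.net/lessons/Workshops/lesson01.pdf
ID                    : base64Encode(metadata_storage_path)
Source                : "Workshops"  (zero-based token at position 4 when the path is split on "/")
LessonDate            : metadata_last_modified
content_type          : metadata_content_type (e.g. application/pdf)
Project_Name          : value of the Project_Name metadata set on the blob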
The indexer also defines how to map the fields extracted by the cognitive skillset we use with it:
{
"name": "blobtosqlindexer",
"description": "combined indexer",
"dataSourceName": "cac-items",
"skillsetName": "cogniskillset",
"targetIndexName": "lessons-index",
"outputFieldMappings": [
{
"sourceFieldName": "/document/merged_content/organizations",
"targetFieldName": "organizations",
"mappingFunction": null
},
{
"sourceFieldName": "/document/merged_content/locations",
"targetFieldName": "locations",
"mappingFunction": null
},
{
"sourceFieldName": "/document/merged_content/keyphrases",
"targetFieldName": "keyphrases",
"mappingFunction": null
}
]
}
Refer to the CreateSqlIndexer example in the Postman collection; it maps the fields extracted by the SQL indexer to the schema we defined, using fieldMappings:
{
"name": "sqldata-indexer",
"description": "opt",
"dataSourceName": "sharepointexport",
"skillsetName": "sqlstoreskillset",
"targetIndexName": "lessons-index",
"disabled": null,
"schedule": null,
"parameters": {
"batchSize": null,
"maxFailedItems": null,
"maxFailedItemsPerBatch": null,
"base64EncodeKeys": false,
"configuration": {}
},
"fieldMappings": [
{
"sourceFieldName": "Lesson_Learned_Comment",
"targetFieldName": "content",
"mappingFunction": null
}
],
"outputFieldMappings": [
{
"sourceFieldName": "/document/content/locations",
"targetFieldName": "locations",
"mappingFunction": null
},
{
"sourceFieldName": "/document/content/organizations",
"targetFieldName": "organizations",
"mappingFunction": null
},
{
"sourceFieldName": "/document/content/keyphrases",
"targetFieldName": "keyphrases",
"mappingFunction": null
},
{
"sourceFieldName": "/document/EventType",
"targetFieldName": "EventType",
"mappingFunction": null
}
]
}
Once the indexers are created, they start indexing and populating the search data.
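You can verify that a run has completed (and see any per-document errors) by querying the indexer status endpoint; the sketch below uses the same Postman-style placeholders as the other requests:
GET https://{{searchServiceName}}.search.windows.net/indexers/blobtosqlindexer/status?api-version=2020-06-30
api-key: {{apiKey}}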
In our example CreateSkillset we have created a skillset that uses Cognitive Services to extract data, a skill to store the extracted data in the knowledge store, and a custom skill that maps the various Event_Type entries to only two possible values in the new EventType field.
Our custom skill is hosted in an Azure Function; here we define what data is passed to the skill and what its output is:
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"description": "Our new OPS Mapping Event Type custom skill",
"uri": "{{funcUrl}}",
"batchSize": 100,
"context": "/document",
"inputs": [
{
"name": "Event_Type",
"source": "/document/Event_Type"
}
],
"outputs": [
{
"name": "eventType",
"targetName": "EventType"
}
]
}
Note that when we create the indexer, it defines in outputFieldMappings that the EventType field coming from the custom skill is mapped to the corresponding index field.
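Assuming the function implements the standard custom skill (Web API) request/response interface, the payloads exchanged with it would look roughly like this; the example Event_Type and eventType values below are made up:
Request sent by the skillset to {{funcUrl}}:
{
  "values": [
    { "recordId": "1", "data": { "Event_Type": "Near Miss - dropped object" } }
  ]
}
Expected response from the function:
{
  "values": [
    { "recordId": "1", "data": { "eventType": "Incident" }, "errors": null, "warnings": null }
  ]
}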
Publish the custom skill to the Azure Function and test it using TestFuncSkill.
You can see the data being processed in the Function logs:
To provide a UI app for the search, use the following template, Web UI Template, and customize it per your requirements.
See the folder WebApp-Dotnet for more details
To host the web app, use Azure App Service connected with Application Insights.
Use Visual Studio (or CI/CD) to publish the application - for a tutorial, refer to Publish a Web app to Azure App Service using Visual Studio.
Reporting contains optional pre-built Power BI reports that can be used to monitor your solution and to understand user search behavior. They leverage data captured via Application Insights, extended with the Knowledge Store, and can be modified to meet your particular business objectives.
To import structured data from Excel, create an Azure SQL database and a table with all the fields coming from the Excel columns, and add an IDENTITY column that will serve as the unique key for the index:
CREATE TABLE [dbo].[Lessons$] (
ID int IDENTITY(1, 1) PRIMARY KEY,
[Project_Name] nvarchar(255),
[Line_Of_Business] nvarchar(255),
[PM_Phase] nvarchar(255),
[Event_Type] nvarchar(255),
[Lessons_Learned_Category] nvarchar(255),
[Lesson_Learned_Comment] nvarchar(max),
[Project_Type] nvarchar(255),
[Source] nvarchar(255),
[LessonDate] datetime
)
Then enable change tracking as described in Enable Change Tracking in SQL Server:
ALTER DATABASE datasearch
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
ALTER TABLE [dbo].[Lessons$]
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
Use the Tasks -> Import Data wizard to import the Excel data: choose Microsoft Excel as the input data source and the OLE DB Provider for SQL Server as the destination, then verify all the mappings.
Create an Azure Storage account and a Blob container, and upload the documents into two folders based on their Source.
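For example, a layout like the following (the container and folder names are illustrative, matching the earlier path example) lets the extractTokenAtPosition mapping derive the Source field from each blob's path:
lessons/                  <- blob container
  Workshops/
    lesson01.pdf
    lesson02.docx
  SiteReports/
    report-2021-03.pdf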