| page_type | languages | products | urlFragment | name | description | azureDeploy |
|---|---|---|---|---|---|---|
| sample | | | azure-hocr-generator-sample | hOCR generator sample skill for AI search | This custom skill generates an hOCR document from the output of the OCR skill. | |
This custom skill generates an hOCR document from the output of the OCR skill.
This skill has no requirements beyond those described in the root README.md file.
This function doesn't require any application settings.
{
"values": [
{
"recordId": "r1",
"data": {
"ocrImageMetadataList": [
{
"layoutText": {
"language": "en",
"text": "Hello World. -John",
"lines": [
{
"boundingBox": [
{ "x": 10, "y": 10 },
{ "x": 50, "y": 10 },
{ "x": 50, "y": 30 },
{ "x": 10, "y": 30 }
],
"text": "Hello World."
},
{
"boundingBox": [
{ "x": 110, "y": 10 },
{ "x": 150, "y": 10 },
{ "x": 150, "y": 30 },
{ "x": 110, "y": 30 }
],
"text": "-John"
}
],
"words": [
{
"boundingBox": [
{ "x": 10, "y": 10 },
{ "x": 50, "y": 10 },
{ "x": 50, "y": 14 },
{ "x": 10, "y": 14 }
],
"text": "Hello"
},
{
"boundingBox": [
{ "x": 10, "y": 16 },
{ "x": 50, "y": 16 },
{ "x": 50, "y": 30 },
{ "x": 10, "y": 30 }
],
"text": "World."
},
{
"boundingBox": [
{ "x": 110, "y": 10 },
{ "x": 150, "y": 10 },
{ "x": 150, "y": 30 },
{ "x": 110, "y": 30 }
],
"text": "-John"
}
]
},
"imageStoreUri": "https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff",
"width": 40,
"height": 200
}
],
"wordAnnotations": [
{
"value": "Hello",
"description": "An annotation on 'Hello'"
}
]
}
}
]
}
{
"values": [
{
"recordId": "r1",
"data": {
"hocrDocument": {
"metadata": "\r\n <?xml version='1.0' encoding='UTF-8'?>\r\n <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>\r\n <html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>\r\n <head>\r\n <title></title>\r\n <meta http-equiv='Content-Type' content='text/html;charset=utf-8' />\r\n <meta name='ocr-system' content='Microsoft Cognitive Services' />\r\n <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>\r\n </head>\r\n <body>\r\n<div class='ocr_page' id='page_0' title='image \"https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff\"; bbox 0 0 40 200; ppageno 0'>\r\n<div class='ocr_carea' id='block_0_1'>\r\n<span class='ocr_line' id='line_0_0' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_0_0' title='bbox 10 10 50 14' data-annotation='An annotation on 'Hello''>Hello</span>\r\n<span class='ocrx_word' id='word_0_0_1' title='bbox 10 16 50 30' >World.</span>\r\n</span>\r\n<span class='ocr_line' id='line_0_1' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_1_2' title='bbox 110 10 150 30' >-John</span>\r\n</span>\r\n</div>\r\n</div>\r\n\r\n</body></html>",
"text": "Hello World. -John "
}
},
"errors": [],
"warnings": []
}
]
}
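The relationship between the word bounding boxes in the sample input and the `bbox` values in the sample hOCR output can be illustrated with a short Python sketch. It assumes that each hOCR box is the smallest axis-aligned rectangle containing the four corner points, which matches the sample values above but has not been checked against the skill's source code. The helper names `to_hocr_bbox` and `word_span` are hypothetical, and the span format is only an approximation of the markup shown in the sample output.

```python
from html import escape


def to_hocr_bbox(bounding_box):
    """Collapse four corner points into an hOCR 'bbox x0 y0 x1 y1' string.

    Assumption: the box is the min/max extent of the corner points.
    """
    xs = [point["x"] for point in bounding_box]
    ys = [point["y"] for point in bounding_box]
    return f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"


def word_span(word, page, line, index):
    """Render one OCR word as an ocrx_word span (illustrative format only)."""
    return (
        f"<span class='ocrx_word' id='word_{page}_{line}_{index}' "
        f"title='{to_hocr_bbox(word['boundingBox'])}'>{escape(word['text'])}</span>"
    )


# The first word from the sample input.
hello = {
    "boundingBox": [
        {"x": 10, "y": 10},
        {"x": 50, "y": 10},
        {"x": 50, "y": 14},
        {"x": 10, "y": 14},
    ],
    "text": "Hello",
}

print(word_span(hello, 0, 0, 0))
# <span class='ocrx_word' id='word_0_0_0' title='bbox 10 10 50 14'>Hello</span>
```

The word text is passed through `html.escape` so that characters such as `<` or `&` in recognized text do not break the generated markup.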
To use this skill in an AI search pipeline, add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"description": "Generate HOCR for webpage rendering",
"uri": "[AzureFunctionEndpointUrl]/api/hocr-generator?code=[AzureFunctionDefaultHostKey]",
"batchSize": 1,
"context": "/document",
"inputs": [
{
"name": "ocrImageMetadataList",
"source": "/document/normalized_images/*/ocrImageMetadata"
},
{
"name": "wordAnnotations",
"source": "/document/acronyms"
}
],
"outputs": [
{
"name": "hocrDocument",
"targetName": "hocrDocument"
}
]
}
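Before wiring the skill into a skillset, it can be useful to call the function directly with a payload shaped like the sample input. The following Python sketch builds such a request using only the standard library. The endpoint URL and host key are placeholders that mirror `[AzureFunctionEndpointUrl]` and `[AzureFunctionDefaultHostKey]` from the skill definition, and `build_hocr_request` is a hypothetical helper name, not part of the sample.

```python
import json
import urllib.request


def build_hocr_request(endpoint_url, host_key, records):
    """Build a POST request for the hocr-generator function.

    `records` is a list of {"recordId": ..., "data": {...}} objects,
    wrapped in the "values" envelope that custom Web API skills expect.
    """
    url = f"{endpoint_url}/api/hocr-generator?code={host_key}"
    body = json.dumps({"values": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: replace with your own function app URL and key.
request = build_hocr_request(
    "https://example.invalid",
    "PLACEHOLDER_KEY",
    [{"recordId": "r1", "data": {"ocrImageMetadataList": [], "wordAnnotations": []}}],
)
print(request.full_url)

# To send the request against a real deployment, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["values"][0]["data"]["hocrDocument"]["text"])
```

The commented-out lines read the `hocrDocument.text` field from the first record, following the shape of the sample output above.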