hOCR Generator

This custom skill generates an hOCR document from the output of the OCR skill.

Requirements

This skill has no requirements beyond those described in the root README.md file.

Settings

This function doesn't require any application settings.

Deployment

Sample Input:

{
	"values": [
	    {
	        "recordId": "r1",
	        "data": {
	            "ocrImageMetadataList": [
	                {
	                    "layoutText": {
	                        "language": "en",
	                        "text": "Hello World. -John",
	                        "lines": [
	                            {
	                                "boundingBox": [
	                                    { "x": 10, "y": 10 },
	                                    { "x": 50, "y": 10 },
	                                    { "x": 50, "y": 30 },
	                                    { "x": 10, "y": 30 }
	                                ],
	                                "text": "Hello World."
	                            },
	                            {
	                                "boundingBox": [
	                                    { "x": 110, "y": 10 },
	                                    { "x": 150, "y": 10 },
	                                    { "x": 150, "y": 30 },
	                                    { "x": 110, "y": 30 }
	                                ],
	                                "text": "-John"
	                            }
	                        ],
	                        "words": [
	                            {
	                                "boundingBox": [
	                                    { "x": 10, "y": 10 },
	                                    { "x": 50, "y": 10 },
	                                    { "x": 50, "y": 14 },
	                                    { "x": 10, "y": 14 }
	                                ],
	                                "text": "Hello"
	                            },
	                            {
	                                "boundingBox": [
	                                    { "x": 10, "y": 16 },
	                                    { "x": 50, "y": 16 },
	                                    { "x": 50, "y": 30 },
	                                    { "x": 10, "y": 30 }
	                                ],
	                                "text": "World."
	                            },
	                            {
	                                "boundingBox": [
	                                    { "x": 110, "y": 10 },
	                                    { "x": 150, "y": 10 },
	                                    { "x": 150, "y": 30 },
	                                    { "x": 110, "y": 30 }
	                                ],
	                                "text": "-John"
	                            }
	                        ]
	                    },
	                    "imageStoreUri": "https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff",
	                    "width": 40,
	                    "height": 200
	                }
	            ],
	            "wordAnnotations": [
	                {
	                    "value": "Hello",
	                    "description": "An annotation on 'Hello'"
	                }
	            ]
	        }
	    }
	]
}

Sample Output:

{
    "values": [
        {
            "recordId": "r1",
            "data": {
                "hocrDocument": {
                    "metadata": "\r\n            <?xml version='1.0' encoding='UTF-8'?>\r\n            <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>\r\n            <html xmlns='http://www.w3.org/1999/xhtml' xml:lang='en' lang='en'>\r\n            <head>\r\n                <title></title>\r\n                <meta http-equiv='Content-Type' content='text/html;charset=utf-8' />\r\n                <meta name='ocr-system' content='Microsoft Cognitive Services' />\r\n                <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>\r\n            </head>\r\n            <body>\r\n<div class='ocr_page' id='page_0' title='image \"https://[somestorageaccount].blob.core.windows.net/pics/lipsum.tiff\"; bbox 0 0 40 200; ppageno 0'>\r\n<div class='ocr_carea' id='block_0_1'>\r\n<span class='ocr_line' id='line_0_0' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_0_0' title='bbox 10 10 50 14' data-annotation='An annotation on 'Hello''>Hello</span>\r\n<span class='ocrx_word' id='word_0_0_1' title='bbox 10 16 50 30' >World.</span>\r\n</span>\r\n<span class='ocr_line' id='line_0_1' title='baseline -0.002 -5; x_size 30; x_descenders 6; x_ascenders 6'>\r\n<span class='ocrx_word' id='word_0_1_2' title='bbox 110 10 150 30' >-John</span>\r\n</span>\r\n</div>\r\n</div>\r\n\r\n</body></html>",
                    "text": "Hello World. -John "
                }
            },
            "errors": [],
            "warnings": []
        }
    ]
}
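The samples above suggest how coordinates are carried over: each word's four-corner `boundingBox` in the input appears in the output as a two-corner hOCR `bbox x0 y0 x1 y1` value. The following is a minimal Python sketch of that mapping, not the skill's actual implementation. The helper names `to_hocr_bbox` and `word_span` are hypothetical, the min/max reduction is an assumption that happens to match the sample values, and the `data-annotation` attribute is left out for brevity.

```python
from html import escape


def to_hocr_bbox(points):
    """Collapse a list of {"x", "y"} corner points into 'bbox x0 y0 x1 y1'.

    Assumption: the box is axis-aligned, so the min/max of the corners
    gives the top-left and bottom-right coordinates.
    """
    xs = [p["x"] for p in points]
    ys = [p["y"] for p in points]
    return f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"


def word_span(word, page, line, index):
    """Render one OCR word as an hOCR ocrx_word span (illustrative only)."""
    title = to_hocr_bbox(word["boundingBox"])
    return (
        f"<span class='ocrx_word' id='word_{page}_{line}_{index}' "
        f"title='{title}'>{escape(word['text'])}</span>"
    )


# The first word from the sample input.
hello = {
    "boundingBox": [
        {"x": 10, "y": 10},
        {"x": 50, "y": 10},
        {"x": 50, "y": 14},
        {"x": 10, "y": 14},
    ],
    "text": "Hello",
}

print(word_span(hello, page=0, line=0, index=0))
```

For the word "Hello" this yields the same `bbox 10 10 50 14` value that appears in the sample output.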

Sample Skillset Integration

To use this skill in an AI search pipeline, add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):

{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Generate HOCR for webpage rendering",
    "uri": "[AzureFunctionEndpointUrl]/api/hocr-generator?code=[AzureFunctionDefaultHostKey]",
    "batchSize": 1,
    "context": "/document",
    "inputs": [
        {
            "name": "ocrImageMetadataList",
            "source": "/document/normalized_images/*/ocrImageMetadata"
        },
        {
            "name": "wordAnnotations",
            "source": "/document/acronyms"
        }
    ],
    "outputs": [
        {
            "name": "hocrDocument",
            "targetName": "hocrDocument"
        }
    ]
}
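Before wiring the skill into a skillset, it can help to call the deployed function directly with the same request envelope the indexer would send. The sketch below is one way to do that in Python using only the standard library. The helper names (`build_request`, `extract_hocr`, `call_skill`) are hypothetical, and the URL passed to `call_skill` is assumed to follow the `uri` pattern in the skill definition above, with your own endpoint and host key substituted.

```python
import json
import urllib.request


def build_request(record_id, ocr_image_metadata_list, word_annotations):
    """Wrap the skill inputs in the values/recordId/data envelope."""
    return {
        "values": [
            {
                "recordId": record_id,
                "data": {
                    "ocrImageMetadataList": ocr_image_metadata_list,
                    "wordAnnotations": word_annotations,
                },
            }
        ]
    }


def extract_hocr(response, record_id):
    """Return the hocrDocument for one record, raising if it reported errors."""
    for record in response.get("values", []):
        if record.get("recordId") == record_id:
            if record.get("errors"):
                raise RuntimeError(f"Skill returned errors: {record['errors']}")
            return record["data"]["hocrDocument"]
    raise KeyError(f"No record with recordId {record_id!r} in the response")


def call_skill(url, payload):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical use would be `extract_hocr(call_skill(url, build_request("r1", images, annotations)), "r1")`, where `images` and `annotations` hold the `ocrImageMetadataList` and `wordAnnotations` values from the sample input.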
