From 614accf3667084f3a5167adc791cc634bdee9a15 Mon Sep 17 00:00:00 2001
From: Iain <iain.mcconnell@gmail.com>
Date: Mon, 7 Dec 2020 10:00:40 -0600
Subject: [PATCH] add enrich document paragraph

---
 docsrc/source/ingest.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docsrc/source/ingest.rst b/docsrc/source/ingest.rst
index 27673594..56664489 100644
--- a/docsrc/source/ingest.rst
+++ b/docsrc/source/ingest.rst
@@ -61,6 +61,15 @@ Word Embeddings
 We provide the option to train word embeddings on top of the extracted corpuses. We use _FastText to train over the extracted
 corpus at ingestion time. The resulting embeddings are saved to disk.
 
+Context Enrichment
+------------------
+
+If the enrich option is enabled at ingest time, every output parquet file from the Ingest process is enriched with
+semantic context. For each table or table caption row in those files, every mention of that row's table label is
+detected in the text content of the same document. The resulting context is then appended to the output parquet files
+as a duplicate of the original table or table caption row, with the original content replaced by all available context
+text. Searches over context-enriched data that match this context should therefore also return the tables themselves.
+
 
 .. _preprint: https://arxiv.org/abs/1910.12462
 .. _PDFMiner.six: https://github.com/pdfminer/pdfminer.six