DocVisor is an open-source visualization tool for document layout analysis. With DocVisor, it is possible to visualize data from three prominent document analysis tasks: Full Document Analysis, OCR and Box-Supervised Region Parsing. DocVisor offers various features such as ground-truth and intermediate output visualization, sorting data by key metrics as well as comparison of outputs from various other models simultaneously.
NOTE: Detailed documentation for this repository is available at https://ihdia.iiit.ac.in/docvisor/
DocVisor also supports visualization of some common datasets such as PubTabNet and DocBank.
The OCR Layout provided in the DocVisor tool has two main components:
The user can select a substring of the text, and the corresponding portion in the image gets highlighted.
Note: This component works only if attentions are available for your model.
Gif depicting the Text2Image mapping using the Text Selection feature provided in OCR layout of the DocVisor tool.
The user can select a sub-portion of the image, and the corresponding substring in the predicted text of models having attentions gets highlighted.
Gif depicting the Image2Text mapping using the Image Selection feature provided in OCR layout of the DocVisor tool.
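To make the two mappings concrete, here is a minimal sketch of how attention weights could link a text selection to image columns and back. It is not DocVisor's implementation: the attention layout (one row of weights per predicted character, one weight per image column), the function names, and the threshold are all assumptions made for illustration.

```python
from typing import List


def text_to_image_columns(attention: List[List[float]], start: int, end: int,
                          threshold: float = 0.5) -> List[int]:
    """Return the image columns that characters start..end-1 attend to.

    attention[i][j] is the (assumed) weight of character i on image column j.
    """
    n_cols = len(attention[0])
    # Sum the attention of the selected characters for every image column.
    summed = [sum(attention[i][j] for i in range(start, end)) for j in range(n_cols)]
    peak = max(summed)
    if peak == 0:
        return []
    # Keep columns whose summed weight is at least `threshold` times the peak.
    return [j for j, weight in enumerate(summed) if weight >= threshold * peak]


def image_to_text_indices(attention: List[List[float]], col_start: int,
                          col_end: int) -> List[int]:
    """Return the characters whose strongest attention lies in col_start..col_end-1."""
    selected = []
    for i, row in enumerate(attention):
        best_col = max(range(len(row)), key=row.__getitem__)
        if col_start <= best_col < col_end:
            selected.append(i)
    return selected


# Toy data (hypothetical): 3 predicted characters, 6 image columns.
attention = [
    [0.7, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.6, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1, 0.5, 0.4],
]

print(text_to_image_columns(attention, 0, 1))  # columns for the first character
print(image_to_text_indices(attention, 2, 6))  # characters for columns 2..5
```

The sketch also shows why both components depend on attentions: without per-character weights there is nothing to tie a character to a region of the image.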
The tool also supports data involving LaTeX. It displays both the compiled output and the raw ground truth/predicted string. This feature is presently available only for non-attention models. To use it, set the dtype variable in the metadata to latex.
Gif depicting the LaTeX support provided by the OCR layout of the DocVisor tool.
Gif showing the visualization of ground truth and predictions on the Indiscapes-v2 dataset using the Fully Automatic Layout.
Gif showing the visualization of ground truth and predictions on the MS-COCO dataset using the Fully Automatic Layout.
Our Fully Automatic Layout can also be used to load and visualize the PubTabNet dataset. A gif of the same is added below:
Gif showing the visualization of ground truth and predictions on the PubTabNet dataset using the Fully Automatic Layout.
Our Fully Automatic Layout can also be used to load and visualize the DocBank dataset. A gif of the same is added below:
Gif showing the visualization of ground truth and predictions on the DocBank dataset using the Fully Automatic Layout.
Gif showing the region-wise visualization of ground truth and predictions on the Indiscapes-v2 dataset.
If you have git installed on your local machine, run the following command to clone the docvisor repository.
git clone https://github.com/ihdia/docvisor
If you do not have git, or you prefer a zip archive, download the zip file from here and unzip the tool to any location on your device.
There are three main layouts in the DocVisor tool:
- Fully Automatic
- Box Supervised
- OCR
You can load one or more of these layouts into the DocVisor tool at any time.
- To load the Fully Automatic tool, prepare your datafiles as described here
- To load the Box Supervised tool, prepare your datafiles as described here
- To load the OCR tool, prepare your datafiles as described here
- Create a conda environment and activate it, using the following commands:
conda create --name docvisor python=3.7
conda activate docvisor
- Ensure that pip points to the docvisor environment by running which pip. If it does not, run the following command:
conda install pip
- Install the necessary requirements:
pip install -r requirements.txt
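The pip check above can also be done from Python. The following is a small sketch, not part of the DocVisor repository: the helper name pip_in_environment is made up for illustration, and it assumes that a pip belonging to the active environment lives somewhere under sys.prefix.

```python
import os
import shutil
import sys


def pip_in_environment(pip_path, env_prefix):
    """Return True when pip_path lies inside the directory env_prefix."""
    if pip_path is None:
        return False
    pip_real = os.path.realpath(pip_path)
    env_real = os.path.realpath(env_prefix)
    # commonpath equals the environment root only if pip sits below it.
    return os.path.commonpath([pip_real, env_real]) == env_real


if __name__ == "__main__":
    pip_path = shutil.which("pip")  # same lookup that `which pip` performs
    print("pip found at:", pip_path)
    print("interpreter prefix:", sys.prefix)
    # CONDA_DEFAULT_ENV is set by `conda activate`; it should read "docvisor".
    print("active conda env:", os.environ.get("CONDA_DEFAULT_ENV"))
    print("pip belongs to this environment:",
          pip_in_environment(pip_path, sys.prefix))
```

Run it with the Python interpreter of the activated environment. If the last line prints False, pip is being resolved from outside the environment, which is the case the conda install pip step is meant to fix.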
- Place all the metaData files in one directory
The metaData directory will look like:
metaData/
- ocr_handwritten.json
- ocr_printed.json
- fullyAutomatic.json
- boxSupervised.json
- Change the path of the metaData directory in the docvisor/config.py file.
Launch the tool by running the ./run.sh script.
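Before launching, it can help to confirm that every file in the metaData directory is well-formed JSON. The sketch below is an assumption-laden helper, not something shipped with DocVisor: the function name check_metadata_dir is hypothetical, and it only checks that each file parses, not that it follows the schema a given layout expects.

```python
import json
from pathlib import Path


def check_metadata_dir(directory):
    """Map each *.json file name in `directory` to None (parses) or an error message."""
    results = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
            results[path.name] = None
        except (OSError, ValueError) as error:
            # json.JSONDecodeError is a subclass of ValueError.
            results[path.name] = str(error)
    return results


if __name__ == "__main__":
    # "metaData" is the directory name used in the listing above; adjust as needed.
    for name, error in check_metadata_dir("metaData").items():
        print(name, "ok" if error is None else "invalid: " + error)
```

An empty result means no .json files were found, which usually points to a wrong directory path.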
We have provided an example folder in the repository for all the layouts. To load the example layouts, follow the steps below:
- Ensure that the metaDataDir field in tool/config.py is set to example/metaData
- Run the ./run.sh script to load the app
NOTE: Ensure that the requirements have been installed. To do so, refer to this step.
For detailed documentation of each tool, visit the DocVisor documentation page.