This repository contains code for the paper "Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval" by Siddhant Bansal, Praveen Krishnan, and C.V. Jawahar, published in DAS 2020.
git clone https://github.com/Sid2697/Word-recognition-and-retrieval.git
- Python >= 3.5
- PyTorch
- Scikit-learn
- NumPy
- tqdm
A requirements.txt file is provided for installing the Python dependencies.
pip install -r requirements.txt
The deep embeddings used in this work are generated using the End2End network proposed in:
Krishnan, P., Dutta, K., Jawahar, C.V.: Word spotting and recognition using deep embedding. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). pp. 1–6 (April 2018). https://doi.org/10.1109/DAS.2018.70
Deep embeddings of the word text and word images, for testing this repository, are provided in the embeddings folder.
Text files containing the information about the embeddings are required while running the code. They are in the format
<img1-path><space><text1-string><space><dummyInt><space>1
<img2-path><space><text2-string><space><dummyInt><space>1
...
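The line format above can be read with a few lines of Python. This is a minimal sketch, not the repository's own loader: the names `WordEntry` and `parse_annotation_lines` are hypothetical, and it assumes the text string itself contains no spaces, so the last three space-separated fields are always the text, the dummy integer, and the trailing flag.

```python
from typing import Iterable, List, NamedTuple


class WordEntry(NamedTuple):
    """One line of the text information file (hypothetical container)."""
    image_path: str
    text: str
    dummy: int
    flag: int


def parse_annotation_lines(lines: Iterable[str]) -> List[WordEntry]:
    """Parse '<img-path> <text> <dummyInt> 1' lines, skipping blank ones."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so an image path containing spaces stays intact.
        parts = line.rsplit(" ", 3)
        if len(parts) != 4:
            raise ValueError("Malformed line: {!r}".format(line))
        image_path, text, dummy, flag = parts
        entries.append(WordEntry(image_path, text, int(dummy), int(flag)))
    return entries
```

To read a file, pass the open file object, for example `parse_annotation_lines(open(path, encoding="utf-8"))`, where `path` points to one of the sample text files.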
One can refer to and use https://github.com/kris314/hwnet for generating embeddings.
To make it easier to explore the code in this repository, sample text files and embeddings are provided in gen_files and embeddings, respectively.
The original dataset used in this work will be released by CVIT soon.
For running word recognition use the command:
python word_recognition.py
For running word recognition with confidence score use the command:
python word_recognition.py --use_confidence=True
Other arguments for the word recognition experiment are:
- --image_embeds: path to the image embeddings
- --topk_embeds: path to the TopK predictions' embeddings
- --image_file: path to the image's text information file
- --predictions_file: path to the TopK predictions' text information file
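To illustrate how image embeddings and TopK predictions' embeddings might be combined, the sketch below picks, for one word image, the TopK prediction whose embedding lies closest to the image embedding. This is an assumption about the general idea, not the code in word_recognition.py: the function name `pick_closest_prediction` is hypothetical, and Euclidean distance is used only as an example metric.

```python
import numpy as np


def pick_closest_prediction(image_embed, topk_embeds, topk_texts):
    """Return the candidate text whose embedding is nearest to the image embedding.

    image_embed: shape (D,), embedding of one word image.
    topk_embeds: shape (K, D), embeddings of the K candidate predictions.
    topk_texts:  list of K candidate strings, aligned with topk_embeds.
    """
    image_embed = np.asarray(image_embed, dtype=float)
    topk_embeds = np.asarray(topk_embeds, dtype=float)
    # Euclidean distance from the image embedding to every candidate (assumed metric).
    distances = np.linalg.norm(topk_embeds - image_embed, axis=1)
    best = int(np.argmin(distances))
    return topk_texts[best], float(distances[best])
```

For example, `pick_closest_prediction([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]], ["cat", "cot"])` selects the second candidate, because its embedding is nearer to the image embedding.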
For running word retrieval use the command:
python word_retrieval.py
For running word retrieval's naive merge experiment use the command:
python word_retrieval.py --experiment_label=naive_merge
Other options for experiment_label are ocr_rank and query_expand.
Other major arguments for the word retrieval experiment are:
- text_features: path to the text embeddings
- image_features: path to the image embeddings
- annotations_path: path to the text file containing annotations
- ocr_opt_path: path to the text file containing OCR predictions
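As a rough picture of embedding-based retrieval, the sketch below ranks word-image embeddings by their similarity to a query text embedding. It is illustrative only and makes two assumptions: cosine similarity is the ranking metric, and `rank_images_by_query` is a hypothetical helper name; neither is taken from word_retrieval.py, and the experiment_label variants are not modelled here.

```python
import numpy as np


def rank_images_by_query(query_embed, image_embeds):
    """Rank images from most to least similar to the query embedding.

    query_embed:  shape (D,), embedding of the query text.
    image_embeds: shape (N, D), embeddings of the N word images.
    Returns (indices, similarities), both sorted by descending similarity.
    """
    query = np.asarray(query_embed, dtype=float)
    images = np.asarray(image_embeds, dtype=float)
    # L2-normalise so that the dot product equals cosine similarity (assumed metric).
    query = query / np.linalg.norm(query)
    images = images / np.linalg.norm(images, axis=1, keepdims=True)
    similarities = images @ query
    order = np.argsort(-similarities)
    return order, similarities[order]
```

The returned indices can be mapped back to image paths through the annotations file to inspect the top retrieved words.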
The software is licensed under the MIT License. If you find this work useful, please cite the following paper:
@InProceedings{10.1007/978-3-030-57058-3_22,
author="Bansal, Siddhant and Krishnan, Praveen and Jawahar, C. V.",
title="Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval",
booktitle="Document Analysis Systems",
year="2020",
publisher="Springer International Publishing",
pages="309--323",
isbn="978-3-030-57058-3"
}
For any queries, contact Siddhant Bansal.