Thai-Nutrition-Table-Extraction on GitHub
This project extracts data from nutrition tables using computer vision techniques, to help users collect nutrition data for use in health care applications, fitness apps, etc.
NOTICE: Thanks to text-detection-ctpn and Tesseract. We use their source code to detect and recognize text in nutrition tables.
- Computer running Linux or macOS
- Python 3.7.1 or later
- Pip 20.1 or later
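The requirements above can be checked before installing anything. The following is a minimal sketch, not part of the project: the function name `check_requirements` is hypothetical, and it covers only the operating system and Python version (the pip version can be read separately with `pip --version`).

```python
import platform
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 7, 1)
# platform.system() reports macOS as "Darwin".
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


def check_requirements(version=None, system=None):
    """Return a list of problems; an empty list means the requirements are met.

    Hypothetical helper for illustration. Both arguments default to the
    values of the running interpreter and operating system.
    """
    version = tuple(version) if version is not None else tuple(sys.version_info[:3])
    system = system if system is not None else platform.system()

    problems = []
    if version < MIN_PYTHON:
        found = ".".join(str(part) for part in version)
        problems.append("Python 3.7.1 or later is required, found " + found)
    if system not in SUPPORTED_SYSTEMS:
        problems.append("Linux or macOS is required, found " + system)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Requirement not met:", problem)
```

Running the script prints one line per unmet requirement and nothing if the environment looks fine.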
- Install Python libraries.
pip install -r requirements.txt
- Check for the directory text_detection/checkpoints_mlt. If the directory does not exist, download the file from Google Drive or Baidu Yun, then extract it and put checkpoints_mlt/ in text_detection/.
- Set up nms and bbox. Because these libraries are written in Cython, you have to build them with the following commands.
cd text_detection/utils/bbox
chmod +x make.sh
./make.sh
- Install Tesseract by following this document.
- Install the pretrained Tesseract data for the Thai language: go to this page and download tha.traineddata. Then set the TESSDATA_PREFIX environment variable and put the file at TESSDATA_PREFIX/tessdata/tha.traineddata.
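Two of the setup steps above leave files in fixed places, so they can be verified with a short script. This is a sketch under stated assumptions: `check_setup` is a hypothetical helper that is not part of the project, it assumes it is run from the repository root, and it does not verify that the Cython libraries were built or that Tesseract itself is installed.

```python
import os
from pathlib import Path


def check_setup(repo_root=".", environ=None):
    """Return a list of setup problems; an empty list means both checks passed.

    Hypothetical helper for illustration. `environ` defaults to os.environ
    and can be replaced by a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    root = Path(repo_root)
    problems = []

    # The CTPN checkpoint directory named in the setup steps.
    if not (root / "text_detection" / "checkpoints_mlt").is_dir():
        problems.append("missing directory text_detection/checkpoints_mlt")

    # The Thai language data, located relative to TESSDATA_PREFIX as described above.
    prefix = environ.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not (Path(prefix) / "tessdata" / "tha.traineddata").is_file():
        problems.append("tha.traineddata not found in TESSDATA_PREFIX/tessdata/")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup problem:", problem)
```

An empty output means the checkpoint directory and the Thai language file are where the steps above expect them.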
- The Thai nutrition table images are in the images/ directory.
- Run main.py to see the result.
python main.py
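If main.py produces no text, it can help to confirm that Tesseract reads Thai at all, independently of the project code. The sketch below is an assumption-laden check, not a description of what main.py does: it skips the text-detection step entirely, the helper names `list_images` and `build_tesseract_command` are hypothetical, and the set of image suffixes is a guess. It uses the `tesseract` command-line form `tesseract <image> stdout -l <lang>`, where `stdout` as the output base prints the recognized text instead of writing a file.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed image file types; adjust to whatever images/ actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(directory="images"):
    """Return the image files in `directory`, sorted by path (empty if it is missing)."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def build_tesseract_command(image_path, lang="tha"):
    """Build the argument list for running the tesseract CLI on one image."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


if __name__ == "__main__":
    images = list_images()
    # Only run when there is an image to read and tesseract is on the PATH.
    if images and shutil.which("tesseract"):
        result = subprocess.run(
            build_tesseract_command(images[0]),
            capture_output=True,
            text=True,
        )
        print(result.stdout)
```

Because this runs Tesseract on the whole image rather than on detected text regions, the output is likely to be noisier than the project's own result. It only shows whether the Thai language data is being picked up.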