The code and dataset for the IJCAI 2022 paper "Plane Geometry Diagram Parsing".
We propose PGDPNet, the first end-to-end deep learning model for explicit geometry diagram parsing. We also construct PGDP5K, a large-scale dataset with dense, fine-grained annotations of primitives and relations. Our method substantially outperforms previous methods on diagram parsing.
- Complete submission of the initial model (21/4/2022)
You can download the dataset from the Dataset Homepage.
"name": {
"file_name": ...,
"width": ...,
"height": ...,
"geos": {
"points": [id, loc(x, y)],
"lines": [id, loc(x1, y1, x2, y2)],
"circles": [id, loc(x, y, r, quadrant)]
},
"symbols": [id, sym_class, text_class, text_content, bbox(x, y, w, h)],
"relations": {
"geo2geo": [point2line(online, endpoint), point2circle(oncircle, center)],
"sym2sym": [...],
"sym2geo": [...]
}
}
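As a rough illustration of how an annotation laid out this way might be traversed, the following Python sketch counts primitives and symbols per diagram. The sample record, its values, and the helper name summarize_annotation are made up for illustration. It also assumes that points, lines, circles and symbols are each stored as a list of entries, which the schema above only implies.

```python
from typing import Any, Dict


def summarize_annotation(annotations: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Count geometric primitives and symbols for every diagram record."""
    summary = {}
    for name, record in annotations.items():
        geos = record.get("geos", {})
        summary[name] = {
            "points": len(geos.get("points", [])),
            "lines": len(geos.get("lines", [])),
            "circles": len(geos.get("circles", [])),
            "symbols": len(record.get("symbols", [])),
        }
    return summary


# Hypothetical record: ids, coordinates and file name are invented for this sketch.
sample = {
    "0": {
        "file_name": "img_0.png",
        "width": 200,
        "height": 150,
        "geos": {
            "points": [["p0", [10.0, 20.0]], ["p1", [110.0, 20.0]]],
            "lines": [["l0", [10.0, 20.0, 110.0, 20.0]]],
            "circles": [],
        },
        "symbols": [],
        "relations": {"geo2geo": [], "sym2sym": [], "sym2geo": []},
    }
}

print(summarize_annotation(sample))
```

To apply the same function to the real annotations, load the dataset's JSON file with json.load and pass the resulting dictionary in place of sample.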
"name": {
"point_instances": [...],
"line_instances": [...],
"circle_instances": [...],
"diagram_logic_forms": [
PointLiesOnLine, PointLiesOnCircle, Equals, MeasureOf, Perpendicular,
Parallel, LengthOf, ...
],
"point_positions": {...}
}
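To get a feel for the logic form annotation, the sketch below tallies how often each outermost predicate occurs. It assumes that every entry of diagram_logic_forms is a string in the Inter-GPS style, such as PointLiesOnLine(B, Line(A, C)). The sample strings and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def predicate_name(logic_form: str) -> str:
    """Return the outermost predicate, i.e. the text before the first '('."""
    return logic_form.split("(", 1)[0].strip()


def count_predicates(logic_forms: Iterable[str]) -> Counter:
    """Count logic forms by their outermost predicate."""
    return Counter(predicate_name(form) for form in logic_forms)


# Hypothetical logic forms, written in the assumed string format.
sample_forms = [
    "PointLiesOnLine(B, Line(A, C))",
    "PointLiesOnCircle(A, Circle(O, radius_0))",
    "Equals(LengthOf(Line(A, B)), 5)",
    "PointLiesOnLine(D, Line(A, C))",
]

counts = count_predicates(sample_forms)
print(counts.most_common())
```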
- Python version: 3.8
- CUDA version: 10.1
- GCC version: 5.4.0
- For other settings, refer to requirements.txt
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
conda install -c dglteam dgl-cuda10.1==0.6.1
pip install -r requirements.txt
We use 4 NVIDIA TITAN Xp GPUs for training. Using more GPUs with a larger batch size brings some performance improvement.
The following will install the lib with symbolic links, so that you can modify the files if you want and won't need to re-build it.
python setup.py build develop --no-deps
First, set the dataset paths in ./geo_parse/config/paths_catalog.py. Change the variables DATA_DIR, PGDP5K_train, PGDP5K_val and PGDP_test according to the location of the PGDP5K dataset. The default parameter configurations are set in the config files ./configs/PGDP5K/geo_MNV2_FPN.yaml and ./geo_parse/config/defaults.py, and you can adjust them to suit your setup.
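Before launching training, it can help to confirm that the configured locations exist. The sketch below is a standalone check and does not reflect the actual layout of paths_catalog.py. The dictionary, the directory values, and the helper name missing_paths are assumptions for illustration; only the four variable names come from the text above.

```python
import os
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names whose configured path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Hypothetical locations: replace them with the values you set in the config.
dataset_paths = {
    "DATA_DIR": "./datasets",
    "PGDP5K_train": "./datasets/PGDP5K/train",
    "PGDP5K_val": "./datasets/PGDP5K/val",
    "PGDP_test": "./datasets/PGDP5K/test",
}

missing = missing_paths(dataset_paths)
if missing:
    print("Missing dataset paths:", ", ".join(missing))
else:
    print("All dataset paths exist.")
```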
python -m torch.distributed.launch \
--nproc_per_node=4 \
--master_port=$((RANDOM + 10000)) \
tools/train_net.py \
--config-file configs/PGDP5K/geo_MNV2_FPN.yaml \
SOLVER.IMS_PER_BATCH 12 \
TEST.IMS_PER_BATCH 4 \
OUTPUT_DIR training_dir/PGDP5K_geo_MNV2_FPN
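When changing the number of GPUs, the batch size options usually need to change with it. The sketch below assumes that SOLVER.IMS_PER_BATCH and TEST.IMS_PER_BATCH are totals across all GPUs and must be divisible by the GPU count, as is common in FCOS-style code bases. This has not been verified against this repository, and the helper name images_per_gpu is hypothetical.

```python
def images_per_gpu(ims_per_batch: int, num_gpus: int) -> int:
    """Split a total batch size evenly across GPUs (assumed convention)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        raise ValueError(
            f"IMS_PER_BATCH={ims_per_batch} is not divisible by {num_gpus} GPUs"
        )
    return ims_per_batch // num_gpus


# Values taken from the training command above: 4 processes, 12 and 4 images.
print("train images per GPU:", images_per_gpu(12, 4))
print("test images per GPU:", images_per_gpu(4, 4))
```

Under this assumption, the command above trains with 3 images per GPU and tests with 1 image per GPU.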
The training records of PGDPNet are saved in the folder OUTPUT_DIR, including models, logs, the last checkpoint and inference results.
Set the path of the model weights and the corresponding config file to get inference results. The parsing results are saved in a new folder ./inference by default.
python tools/test_net.py \
--config-file configs/PGDP5K/geo_MNV2_FPN.yaml \
MODEL.WEIGHT training_dir/PGDP5K_geo_MNV2_FPN/model_final.pth \
TEST.IMS_PER_BATCH 1
By default, inference uses one GPU with batch size 1. Because image resolution affects preprocessing, experimental results differ somewhat across batch sizes. You can reduce the image resolution appropriately to accelerate inference while maintaining comparable performance.
Considering the diversity and equivalence of logic forms, we improved the evaluation method based on Inter-GPS. You can evaluate the generated logic forms against the ground truth by setting the paths of the test set (test_set_path), the ground-truth logic forms (diagram_gt) and the predicted logic forms (diagram_pred):
cd ./InterGPS/diagram_parser/evaluation_new
python calc_diagram_accuracy.py \
--test_set_path ./PGDP5K/test \
--diagram_gt ./PGDP5K/our_diagram_logic_forms_annot.json \
--diagram_pred ./inference/PGDP5K_test/logic_forms_pred.json
| | | InterGPS | PGDPNet w/o GNN | PGDPNet |
|---|---|---|---|---|
| All | Likely Same | 65.7 | 98.4 | 99.0 |
| | Almost Same | 44.4 | 93.1 | 96.6 |
| | Perfect Recall | 40.0 | 79.7 | 86.2 |
| | Totally Same | 27.3 | 78.2 (+50.9) | 84.7 (+6.5) |
| Geo2Geo | Likely Same | 63.9 | 99.1 | 99.0 |
| | Almost Same | 49.4 | 97.3 | 97.1 |
| | Perfect Recall | 78.7 | 96.9 | 97.4 |
| | Totally Same | 40.8 | 93.6 | 94.5 |
| Non-Geo2Geo | Likely Same | 67.3 | 95.8 | 98.0 |
| | Almost Same | 49.8 | 88.2 | 94.9 |
| | Perfect Recall | 45.7 | 81.3 | 87.0 |
| | Totally Same | 40.5 | 80.6 | 86.4 |
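The four metric names in the results can be read as per-diagram criteria on the overlap between predicted and ground-truth logic forms. The sketch below assumes the thresholds commonly associated with the Inter-GPS evaluation: Likely Same means F1 >= 0.5, Almost Same means F1 >= 0.75, Perfect Recall means recall = 1, and Totally Same means F1 = 1. These thresholds are an assumption, not taken from this repository. The sketch also compares logic forms by exact string match, whereas the evaluation script additionally handles equivalent forms, so calc_diagram_accuracy.py remains the authoritative definition. The function name and the sample logic forms are hypothetical.

```python
from typing import Dict, Set


def score_diagram(pred: Set[str], gt: Set[str]) -> Dict[str, bool]:
    """Evaluate one diagram with exact-match logic forms (assumed thresholds)."""
    tp = len(pred & gt)  # logic forms present in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # also covers empty prediction or empty ground truth
    return {
        "Likely Same": f1 >= 0.5,
        "Almost Same": f1 >= 0.75,
        "Perfect Recall": bool(gt) and tp == len(gt),
        "Totally Same": f1 == 1.0,
    }


# Hypothetical ground truth and a prediction that misses one logic form.
gt_forms = {
    "PointLiesOnLine(B, Line(A, C))",
    "Perpendicular(Line(A, B), Line(B, D))",
    "Equals(LengthOf(Line(A, B)), 5)",
    "Parallel(Line(A, C), Line(D, E))",
}
pred_forms = gt_forms - {"Parallel(Line(A, C), Line(D, E))"}

print(score_diagram(pred_forms, gt_forms))
```

A dataset-level number for each row would then be the percentage of diagrams for which the corresponding criterion holds.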
We also provide a demo script in demo/PGDP_Demo.ipynb. Because this project does not yet implement a text recognizer, only samples from PGDP5K can be tested at this time, with their text contents set to the ground truth. You can adjust the corresponding variables in the demo script, such as config-file, weights, MODEL.DEVICE and img_path.
If the paper, the dataset, or the code helps you, please cite the papers in the following format:
@inproceedings{Zhang2023PGPS,
title = {A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram},
author = {Zhang, Ming-Liang and Yin, Fei and Liu, Cheng-Lin},
booktitle = {IJCAI},
year = {2023},
}
@inproceedings{Zhang2022,
title = {Plane Geometry Diagram Parsing},
author = {Zhang, Ming-Liang and Yin, Fei and Hao, Yi-Han and Liu, Cheng-Lin},
booktitle = {Proceedings of the Thirty-First International Joint Conference on
Artificial Intelligence, {IJCAI-22}},
pages = {1636--1643},
year = {2022},
month = {7},
doi = {10.24963/ijcai.2022/228},
}
@article{Hao2022PGDP5KAD,
title={PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems},
author={Yihan Hao and Mingliang Zhang and Fei Yin and Linlin Huang},
journal={2022 26th International Conference on Pattern Recognition (ICPR)},
year={2022},
pages={1763-1769}
}
The code of this project is based on FCOS and Inter-GPS. Please let us know if you encounter any issues. You can contact the first author (zhangmingliang2018@ia.ac.cn) or open an issue in the GitHub repo.