layout-yolo-models contains two YOLOv5m checkpoints trained for page layout analysis on classical commentaries. Detailed information can be found in the paper cited below.
This notebook offers a quick demo of how the models can be used to segment other commentaries available on the Internet Archive.
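As a rough sketch of what such a demo involves, a checkpoint could be loaded through YOLOv5's PyTorch Hub entry point and run on a single page image. The checkpoint path, image path and both function names are placeholders for illustration, not files or functions from this repository. The inference size of 1280 simply mirrors the training image size reported for the models.

```python
from typing import List, Sequence


def rows_to_regions(rows: Sequence[Sequence[float]], names) -> List[dict]:
    """Convert YOLOv5 rows (x1, y1, x2, y2, confidence, class_id) to dicts.

    `names` maps class ids to labels (a list or a dict, depending on the
    YOLOv5 version that saved the checkpoint).
    """
    regions = []
    for x1, y1, x2, y2, conf, cls_id in rows:
        regions.append({
            "label": names[int(cls_id)],
            "confidence": float(conf),
            "bbox": (float(x1), float(y1), float(x2), float(y2)),
        })
    return regions


def predict_regions(weights_path: str, image_path: str, size: int = 1280) -> List[dict]:
    """Run one page image through a YOLOv5 checkpoint.

    Needs torch installed and network access the first time, because
    torch.hub fetches the ultralytics/yolov5 code.
    """
    import torch  # imported lazily so rows_to_regions works without torch

    model = torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)
    results = model(image_path, size=size)
    return rows_to_regions(results.xyxy[0].tolist(), model.names)


# Hypothetical usage (both paths are placeholders):
# regions = predict_regions("path/to/checkpoint.pt", "path/to/page.png")
```

Keeping the conversion step separate from the model call makes it possible to inspect or post-process detections without loading the model again.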
The models are trained on the datasets used in the paper, including commentaries that are still under copyright. The public domain data has been released as the GT4HistCommentLayout dataset. Please refer to stats.txt for detailed statistics about the data.
Region labels follow the Segmonto-compatible annotation scheme proposed in the paper:
Region | Coarse | Fine |
---|---|---|
commentary | MainZone | MainZone:commentary |
critical apparatus | MarginTextZone | MarginTextZone:criticalApparatus |
footnotes | MarginTextZone | MarginTextZone:footnotes |
page number | NumberingZone | NumberingZone:pageNumber |
text number | NumberingZone | NumberingZone:textNumber |
bibliography | MainZone | MainZone:bibliography |
handwritten marginalia | MarginTextZone | MarginTextZone:handwrittenNote |
index | MainZone | MainZone:index |
others | CustomZone | CustomZone |
printed marginalia | MarginTextZone | MarginTextZone:printedNote |
table of contents | MainZone | MainZone:ToC |
title | TitlePageZone | TitlePageZone |
translation | MainZone | MainZone:translation |
appendix | MainZone | MainZone:appendix |
introduction | MainZone | MainZone:introduction |
preface | MainZone | MainZone:preface |
primary text | MainZone | MainZone:primaryText |
running header | RunningTitleZone | RunningTitleZone |
Each model is trained with its respective label set (coarse or fine).
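Fine labels in the table take the form `Zone:subtype`, so the part before the colon names a Segmonto zone. A minimal sketch of collapsing fine predictions to zone level under that assumption follows; the function names are illustrative and not part of this repository.

```python
from collections import Counter
from typing import Iterable


def to_coarse(fine_label: str) -> str:
    """Return the zone part of a fine label such as 'MainZone:commentary'.

    Labels without a subtype (e.g. 'RunningTitleZone') are returned unchanged.
    """
    return fine_label.split(":", 1)[0]


def coarse_counts(fine_labels: Iterable[str]) -> Counter:
    """Count how many regions fall into each zone."""
    return Counter(to_coarse(label) for label in fine_labels)


page_labels = [
    "MainZone:commentary",
    "MainZone:primaryText",
    "NumberingZone:pageNumber",
    "RunningTitleZone",
]
print(coarse_counts(page_labels))
```

This can be handy for comparing the output of the fine model against the coarse model on the same page.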
name | value |
---|---|
Epochs | 200 |
Hyperparameters | YOLOv5 defaults |
Image size | 1280 |
YOLO version | YOLOv5, v6.1-356-g4d8d84b |
Model size | YOLOv5m |
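For orientation, the settings above roughly correspond to a YOLOv5 `train.py` invocation like the one assembled below. This is a sketch, not the authors' actual command: the dataset YAML path is a placeholder, and starting from the pretrained `yolov5m.pt` weights is an assumption that the table does not confirm.

```python
import shlex

# Values taken from the table above.
EPOCHS = 200
IMAGE_SIZE = 1280

# Assumption: training starts from the standard pretrained YOLOv5m weights.
INITIAL_WEIGHTS = "yolov5m.pt"

# Hypothetical dataset config; not a file from this repository.
DATA_YAML = "path/to/layout_dataset.yaml"


def build_train_command(data_yaml: str = DATA_YAML) -> list:
    """Assemble arguments for YOLOv5's train.py, leaving hyperparameters at defaults."""
    return [
        "python", "train.py",
        "--data", data_yaml,
        "--weights", INITIAL_WEIGHTS,
        "--img", str(IMAGE_SIZE),
        "--epochs", str(EPOCHS),
    ]


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(build_train_command()))
```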
Results are computed on the test set (see paper) using mean average precision (mAP). This evaluation yields lower scores than YOLOv5's native evaluation tool.
mAP | MainZone | MarginTextZone | NumberingZone | RunningTitleZone | TitlePageZone |
---|---|---|---|---|---|
0.662 | 0.862 | 0.750 | 0.892 | 0.950 | 0.133 |
mAP | CustomZone:other | MainZone:ToC | MainZone:appendix | MainZone:bibliography | MainZone:commentary | MainZone:introduction | MainZone:preface | MainZone:primaryText | MainZone:translation | MarginTextZone:criticalApparatus | MarginTextZone:footnote | MarginTextZone:printedNote | NumberingZone:pageNumber | NumberingZone:textNumber | RunningTitleZone | TitlePageZone |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.690006852 | 0.483957231 | 0 | 0.83333331 | 0.75 | 0.93403023 | 0.78166848 | 0.69999999 | 0.64651763 | 0.85653406 | 0.8403641 | 0.71988797 | 0.66250002 | 0.96583301 | 0.88592309 | 0.93956083 | 0.04 |
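To make the metric concrete, here is a generic sketch of average precision for a single class, using all-point interpolation of the precision-recall curve. It is not the evaluation code behind the numbers above: the exact protocol (IoU thresholds, interpolation scheme) is described in the paper, and the function names here are illustrative. Detections are assumed to have already been matched against ground-truth boxes.

```python
from typing import Dict, List, Tuple


def average_precision(detections: List[Tuple[float, bool]], n_ground_truth: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, is_true_positive) pairs.
    n_ground_truth: number of ground-truth regions of this class.
    """
    if n_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / n_ground_truth)

    # Interpolate: each precision becomes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the interpolated precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

Differences in how detections are matched and how the curve is interpolated are one common reason why two evaluation tools report different scores for the same predictions.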
Here's a quick example of a good prediction...
...and of a bad prediction (in this case, two regions overlapping).
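Overlaps like the one in the bad prediction can be flagged automatically by computing the intersection over union (IoU) of every pair of predicted boxes. The sketch below assumes boxes in `(x1, y1, x2, y2)` pixel coordinates; the 0.1 threshold is an arbitrary illustrative value, not one taken from the paper.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def overlapping_pairs(boxes: Sequence[Box], threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return index pairs of boxes whose IoU exceeds the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if iou(a, b) > threshold
    ]


boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
print(overlapping_pairs(boxes))
```

Pages with flagged pairs can then be reviewed manually or post-processed, for example by keeping only the higher-confidence region.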
If you use these models in your research, please cite the following publication:
@inproceedings{najem-meyer_page-layout-analysis_2022,
title = {Page {{Layout Analysis}} of {{Text-heavy Historical Documents}}: A {{Comparison}} of {{Textual}} and {{Visual Approaches}}},
booktitle = {Proceedings of the {{Conference}} on {{Computational Humanities Research}} 2022},
author = {{Najem-Meyer}, Sven and Romanello, Matteo},
year = {2022},
pages = {36--54},
publisher = {{CEUR-WS}},
address = {{Antwerp}},
url = {https://ceur-ws.org/Vol-3290/long_paper8670.pdf}
}
The models in this repository were produced in the context of the Ajax Multi-Commentary project, funded by the Swiss National Science Foundation under Ambizione grant PZ00P1_186033.