Create dataset loader for CORD #180

SamuelCahyawijaya · 2022-08-02T01:35:59Z

Dataset	cord_v2
Description	In this paper, we introduce a novel dataset called CORD, which stands for a Consolidated Receipt Dataset for post-OCR parsing. To the best of our knowledge, this is the first publicly available dataset that includes both box-level text and parsing class annotations. The parsing class labels are provided at two levels. The eight superclasses include store, payment, menu, subtotal, and total. These superclasses are subdivided into 54 subclasses; for example, store has nine subclasses, including name, address, telephone, and fax.
It also provides line annotations for the serialization task, a newly emerging problem that combines the two tasks.
License	CC-BY 4.0
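
For whoever picks up the loader, the two-level labels described above could be handled roughly as follows. This is a minimal sketch, not the dataset's actual schema: the dotted `superclass.subclass` label format and the helper names `split_label` and `group_by_superclass` are assumptions made for illustration. Only the superclass `store` and its subclasses `name`, `address`, `telephone`, and `fax` come from the description.

```python
from typing import Dict, Iterable, List, NamedTuple


class ParsedLabel(NamedTuple):
    superclass: str
    subclass: str


def split_label(label: str, sep: str = ".") -> ParsedLabel:
    """Split a two-level label such as 'store.name' into its parts.

    The dotted format is an assumption for this sketch; adjust `sep`
    to whatever the raw annotations actually use.
    """
    superclass, _, subclass = label.partition(sep)
    if not superclass or not subclass:
        raise ValueError(f"expected '<superclass>{sep}<subclass>', got {label!r}")
    return ParsedLabel(superclass, subclass)


def group_by_superclass(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Collect subclasses under their superclass, preserving input order."""
    grouped: Dict[str, List[str]] = {}
    for label in labels:
        parsed = split_label(label)
        grouped.setdefault(parsed.superclass, []).append(parsed.subclass)
    return grouped


if __name__ == "__main__":
    # Example labels built from the classes named in the description.
    example = ["store.name", "store.address", "store.telephone", "store.fax"]
    print(group_by_superclass(example))
```

Keeping the superclass and subclass separate makes it easy to expose either level as the label set, depending on which task the loader schema targets.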

muhsatrio added the hacktoberfest label Oct 2, 2022
