Create dataset loader for CORD #180

Open
SamuelCahyawijaya opened this issue Aug 2, 2022 · 0 comments

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?cord_v2

Dataset: cord_v2
Description: In this paper, we introduce a novel dataset called CORD, which stands for a Consolidated Receipt Dataset for post-OCR parsing. To the best of our knowledge, this is the first publicly available dataset that includes both box-level text and parsing class annotations. The parsing class labels are provided at two levels. The eight superclasses include store, payment, menu, subtotal, and total. The eight superclasses are subdivided into 54 subclasses; for example, store has nine subclasses, including name, address, telephone, and fax. It also provides line annotations for the serialization task, which is a newly emerging problem as a combination of the two tasks.
License: CC-BY 4.0
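
As a starting point for the loader, the annotation structure described above (box-level text, two-level parsing labels, and line grouping) could be modeled roughly as follows. This is only a sketch under stated assumptions: the `BoxAnnotation` fields, the dotted `superclass.subclass` label string, and the `line_id` grouping key are hypothetical names chosen for illustration and are not taken from the CORD files, so they would need to be checked against the actual data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BoxAnnotation:
    """One OCR text box with a two-level parsing label (hypothetical schema)."""
    text: str
    box: Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)
    superclass: str                 # e.g. "store", "menu", "total"
    subclass: str                   # e.g. "name", "address"
    line_id: int                    # assumed line-grouping index for serialization


def split_label(label: str) -> Tuple[str, str]:
    """Split an assumed 'superclass.subclass' string into its two levels."""
    superclass, sep, subclass = label.partition(".")
    if not sep or not superclass or not subclass:
        raise ValueError(f"expected 'superclass.subclass', got {label!r}")
    return superclass, subclass


def group_by_line(boxes: List[BoxAnnotation]) -> List[str]:
    """Join the texts of boxes that share a line_id, in order of first appearance."""
    lines: Dict[int, List[str]] = {}
    for b in boxes:
        lines.setdefault(b.line_id, []).append(b.text)
    return [" ".join(words) for words in lines.values()]


# Made-up sample data, for illustration only.
sample = [
    BoxAnnotation("Example", (0, 0, 50, 10), *split_label("store.name"), line_id=0),
    BoxAnnotation("Cafe", (55, 0, 90, 10), *split_label("store.name"), line_id=0),
    BoxAnnotation("12000", (0, 20, 40, 30), *split_label("total.total_price"), line_id=1),
]
serialized = group_by_line(sample)
```

Keeping the superclass and subclass as separate fields would let the loader expose either label granularity without re-parsing strings.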