Create dataset loader for ICON Indonesian Constituency Treebank #368

Open
SamuelCahyawijaya opened this issue Nov 28, 2023 · 0 comments
@SamuelCahyawijaya
Member

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?icon

Dataset: icon
Description: In this work, we publish ICON (Indonesian CONstituency treebank), a manually annotated benchmark Indonesian constituency treebank with a size of 10,000 sentences and approximately 124,000 constituents and 182,000 tokens, which can support the training of state-of-the-art transformer-based models. We use 15 phrase-level tags and 24 POS tags. The sentences were taken from Wikipedia (3,000) and news articles (7,000).
License: CC-BY-SA 4.0
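
As a starting point for the loader, here is a minimal sketch of reading one constituency tree into tokens and phrase counts. It assumes the trees are stored in Penn Treebank-style bracketed notation, which this issue does not confirm; please check the actual ICON file format first. The names `Node`, `parse_tree`, `leaves`, and `count_phrases`, the example sentence, and the tags in it are all hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A tree node: a label plus child nodes and/or token strings."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def parse_tree(text: str) -> Node:
    """Parse one bracketed tree, e.g. "(S (NP (NN buku)))".

    ASSUMPTION: Penn Treebank-style brackets; the real ICON format may differ.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("missing label at token %d" % pos)
        node = Node(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.children.append(tokens[pos])  # a surface token
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume ')'
        return node

    root = read()
    if pos != len(tokens):
        raise ValueError("trailing text after the tree")
    return root


def leaves(node: Node) -> List[str]:
    """Return the surface tokens in left-to-right order."""
    out: List[str] = []
    for child in node.children:
        if isinstance(child, Node):
            out.extend(leaves(child))
        else:
            out.append(child)
    return out


def count_phrases(node: Node) -> int:
    """Count phrase-level nodes, i.e. nodes with at least one Node child.

    Nodes whose children are all strings are treated as POS preterminals.
    """
    sub = [c for c in node.children if isinstance(c, Node)]
    return (1 if sub else 0) + sum(count_phrases(c) for c in sub)


if __name__ == "__main__":
    # Hypothetical example sentence and tags, for illustration only.
    tree = parse_tree("(S (NP (NNP Budi)) (VP (VB membaca) (NP (NN buku))))")
    print(leaves(tree))
    print(count_phrases(tree))
```

Separating phrase-level nodes from POS preterminals mirrors the description's distinction between phrase-level tags and POS tags, so the same helpers could be used to sanity-check the reported constituent and token counts once the real files are loaded.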