titaenstad/norwegian_rhyme_scheme_corpus

NoRSC: Norwegian Rhyme Scheme Corpus

This is a corpus of poetry annotated with rhyme schemes, extracted from the Public Domain Texts from NBdigital corpus (Norwegian: Fritt tilgjengelege tekster frå NBdigital) from Språkbanken.

Each poem is split into stanzas, and each stanza is annotated with its rhyme scheme.

Data set

The data set consists of 5158 stanzas (26 198 lines) annotated with rhyme scheme codes.
The poetry comes from 11 books in the source corpus. We hope to expand it in the future.

File structure

The directory poems contains the unannotated poetry files.
The file tsvs/tita_rhymes_poems.tsv contains version 1 (v1) of the complete rhyme-annotated poetry set.
The file tsvs/norwegian_rhyme_scheme_corpus_v11.tsv contains version 1.1 of the complete rhyme-annotated poetry set (89 stanzas were re-annotated).
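
The column layout of the TSV files is not documented here, so the following is only a minimal sketch of how one might load them for inspection. It assumes the files are UTF-8, tab-separated, and start with a header row; the helper names `parse_tsv` and `load_tsv` are hypothetical and not part of this repository.

```python
import csv


def parse_tsv(lines, has_header=True):
    """Parse an iterable of tab-separated text lines.

    Returns a list of dicts keyed by the header row when has_header is True
    (an assumption about the files), otherwise a list of lists.
    """
    # QUOTE_NONE keeps quotation marks inside poetry lines as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if has_header and rows:
        header, body = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in body]
    return rows


def load_tsv(path, has_header=True):
    """Open a TSV file as UTF-8 (assumed encoding) and parse it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_tsv(handle, has_header=has_header)


# Example (run from the repository root):
# rows = load_tsv("tsvs/norwegian_rhyme_scheme_corpus_v11.tsv")
# print(len(rows), rows[0])
```

Printing the first row is a quick way to check whether the header assumption holds; if it does not, pass `has_header=False` to get plain lists instead.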

Rhyme annotated word pairs and sentence pairs

The file tsvs/positive_pairs.tsv contains 7238 positive rhyme pairs.
The file tsvs/negative_pairs.tsv contains 22 447 negative rhyme pairs.
The file tsvs/rhyme_sentence_pairs.tsv contains 35 409 sentence pairs annotated with rhyme (1=rhyme, 0=not rhyme).
(These pair files are derived from v1 of the corpus.)
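
The word-pair files are imbalanced (7238 positive against 22 447 negative pairs), which is worth checking before training a classifier on them. The sketch below shows one way to count labels and compute the positive share. It assumes the label is the last tab-separated field of each line in tsvs/rhyme_sentence_pairs.tsv, which is not confirmed here; `count_labels` and `positive_fraction` are hypothetical helper names.

```python
from collections import Counter


def count_labels(lines, label_index=-1, skip_header=False):
    """Count label values in tab-separated lines.

    label_index=-1 assumes the label is the last field on each line.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue
        fields = line.rstrip("\r\n").split("\t")
        if fields == [""]:
            continue  # ignore blank lines
        counts[fields[label_index]] += 1
    return counts


def positive_fraction(n_positive, n_negative):
    """Share of positive examples among all labelled pairs."""
    return n_positive / (n_positive + n_negative)


# Word-pair counts stated above: roughly a quarter of the pairs are positive.
print(round(positive_fraction(7238, 22447), 3))

# Example (run from the repository root):
# with open("tsvs/rhyme_sentence_pairs.tsv", encoding="utf-8") as handle:
#     print(count_labels(handle))
```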

How to help annotate:

Read the annotation tool README
