
GT Corpora

Niko Partanen edited this page Mar 8, 2017 · 2 revisions

This page documents conventions, standards and relevant workflows used for storing the Freiburg-Tromsø Corpora at the Giellatekno corpus repositories.

Intro

The [Freiburg-based documentation projects|freiburg.html] archive their video and audio resources, along with catalogue and content metadata, at TLA. The Giellatekno corpus repositories host copies of our ELAN annotation files.

Documentation of the Giellatekno corpus repositories

Workflows

Linking corpus data to multimedia stored at TLA

Access

The Giellatekno corpus repositories are divided into a free part (for texts in the public domain that we can redistribute freely) and a bound part (for texts that can be shared only upon agreement).

Some corpora are also available in this repository. Note that these are very large XML files (more than a gigabyte for Northern Saami), and at least to me it is a bit unclear what the most efficient and intended way to use them is.
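
One possible way to work with XML files of this size is to stream through them rather than load the whole tree into memory. The following is a minimal sketch using Python's standard library, not the intended Giellatekno workflow; the tag names in the sample (`corpus`, `s`, `w`) are made up for illustration and do not reflect the actual corpus schema.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Stream through an XML file (path or file object) and count elements by tag."""
    counts = Counter()
    # iterparse yields each element once its closing tag has been read
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Discard the element's text and children so that memory use stays
        # far below what loading the complete tree would need
        elem.clear()
    return counts


# Tiny made-up document standing in for a real corpus file (hypothetical tags)
sample = io.BytesIO(b"<corpus><s><w>a</w><w>b</w></s><s><w>c</w></s></corpus>")
tag_counts = count_tags(sample)
print(tag_counts)
```

For a real file, a path can be passed instead of the in-memory sample. A first pass like this shows which elements a file actually contains before any further processing is written.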
