Spoken Komi Corpus: Erik Vászolyi

This is a subset of the Spoken Komi Corpus. It contains recordings made by Erik Vászolyi in the Komi Republic in the 1950s and 1960s. The recordings have been archived in several institutions, and the goal of this work is to connect the different transcription versions and digitized files to one another in an openly accessible database.

Our reference system uses the recording identifiers (signums) of the Tape Archive of the Finnish Language, located at the Institute for the Languages of Finland.

The dataset accompanies and forms part of the Spoken Komi Corpus, which is being archived at the Language Archive under the node Permic Varieties. Our team is also working continuously to make the materials available through the Language Bank of Finland. Additionally, those materials that can be openly licensed are stored on GitHub.

The work is done within the project "Language Documentation meets Language Technology: The Next Step in the Description of Komi", funded by the Kone Foundation.

Authors: Rogier Blokland, Niko Partanen and Michael Rießler.

Structure

The data directory contains ELAN files. They are associated with audio recordings that are not distributed in this data package but can be obtained from the archives where they are stored. Exact information about their locations in the archives can be found in audio/README.md.
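ELAN files are XML documents, so the annotations can be read without the audio. The following is a minimal sketch, not part of this repository. It assumes the files use the usual .eaf extension and sit somewhere under data. The function name read_tiers is hypothetical, and no particular tier names are assumed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_tiers(eaf_source):
    """Return {tier_id: [(start_ms, end_ms, text), ...]} for one EAF document.

    eaf_source may be a file path or a binary file object.
    """
    root = ET.parse(eaf_source).getroot()
    # Map time slot IDs to millisecond offsets; unaligned slots have no TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        # Time-aligned annotations point at two time slots.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            rows.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
        # Reference annotations (e.g. translations) carry no times of their own.
        for ann in tier.iter("REF_ANNOTATION"):
            rows.append((None, None, ann.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    data_dir = Path("data")  # assumed location, relative to the repository root
    if data_dir.is_dir():
        for path in sorted(data_dir.rglob("*.eaf")):
            for tier_id, rows in read_tiers(path).items():
                print(path.name, tier_id, len(rows))
```

Run from the repository root, this prints one line per tier with the file name, the tier ID, and the number of annotations, which gives a quick overview of which tiers are still empty.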

Unfinished tasks

Work like this is always in progress. Several transcription and translation tiers have not yet been checked or added. The corpus is updated continuously, and we aim to improve the documentation. If any aspect of this work is unclear, please open an issue in this GitHub repository.

License

The transcriptions published in the Specimina Sibirica series have been made available here with permission. These and our own transcriptions are available under a CC-BY license.