otmm_tonic_dataset

The tonic test datasets for classical Ottoman-Turkish makam music

Introduction

This repository contains datasets of annotated tonic frequencies of the audio recordings of Ottoman-Turkish makam music.

If you use the dataset in your work, please cite:

Şentürk, S. (2016). Computational Analysis of Audio Recordings and Music Scores for the Description and Discovery of Ottoman-Turkish Makam Music. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.

The annotations are compiled from several research papers published under the CompMusic project. For more information about the original datasets, please refer to the relevant paper.

There are approximately 2000 recordings annotated in the latest version. Each recording is annotated by at least one expert and half of the recordings are annotated by at least two annotators. When the score is available, score-informed tonic identification (Şentürk, S., 2016) is applied to the recording. The result is included in the dataset after it is verified by a human.

For detailed statistics, please refer to the Jupyter notebook, extras/statistics.ipynb.

Erratum

In November 2016, we discovered several errors in the tonic annotations. Since then, approximately 45 percent of the recordings have been verified by a human annotator and/or by the score-informed tonic identification method proposed in (Şentürk, S., Gulati, S., and Serra, X., 2013). This method is reported to provide near-perfect results (>99% on the paper's dataset). In addition, the annotations of each recording are automatically cross-validated against each other using continuous integration (see the section Automatic validation for details).

So far, we have validated 2000 annotations and changed around 100 of them. This corresponds to a human error rate of about 5%, which is acceptable given the rigor of the task. Note that most of the fixes simply adjusted the annotated tonic frequency to a finer precision (<20 cents).

Annotation structure

The data is stored in the JSON file annotations.json and is organized as a dictionary of recordings. Each annotated recording is uniquely identified by a MusicBrainz MBID. The annotations of a recording are stored as a list of dictionaries. Each annotation in the list includes the annotated frequency, source dataset, relevant publication, time interval, tonic symbol, observations of the annotator, and whether the octave of the annotated value is considered (for example, the octave is ambiguous in orchestral instrumental recordings).

An example recording is displayed below:

"ed189797-5c50-4fde-abfa-cb1c8a2a2571": {
  "mbid": "http://musicbrainz.org/recording/ed189797-5c50-4fde-abfa-cb1c8a2a2571", 
  "verified": true, 
  "annotations": [
    {
      "time_interval": [
        1, 
        244
      ], 
      "citation": "\u015eent\u00fcrk, S., Gulati, S., and Serra, X. (2013). Score Informed Tonic Identification for Makam Music of Turkey. In Proceedings of 14th International Society for Music Information Retrieval Conference (ISMIR 2013), pages 175\u2013180, Curitiba, Brazil.", 
      "tonic_symbol": "A4", 
      "source": "https://github.com/MTG/otmm_tonic_dataset/blob/7f28c1a3261b9146042155ee5e0f9e644d9ebcfa/senturk2013karar_ismir/tonic_annotations.csv", 
      "value": 175.7, 
      "octave_wrapped": true, 
      "observations": "The musicians start playing (in Isfahan Pe\u015frev) the tonic approximately at 175Hz."
    }, 
    {
      "time_interval": [
        1, 
        244
      ], 
      "citation": "Atl\u0131, H. S., Bozkurt, B., \u015eent\u00fcrk, S. (2015). A Method for Tonic Frequency Identification of Turkish Makam Music Recordings. In Proceedings of 5th International Workshop on Folk Music Analysis (FMA 2015), pages 119\u2013122, Paris, France.", 
      "tonic_symbol": "A4", 
      "source": "https://github.com/MTG/otmm_tonic_dataset/blob/7f28c1a3261b9146042155ee5e0f9e644d9ebcfa/atli2015tonic_fma/TD2.csv", 
      "value": 175.0, 
      "octave_wrapped": true, 
      "observations": "The musicians start playing (in Isfahan Pe\u015frev) the tonic approximately at 175Hz."
    }, 
    {
      "time_interval": [
        245, 
        324
      ], 
      "citation": "\u015eent\u00fcrk, S. (2016). Computational Analysis of Audio Recordings and Music Scores for the Description and Discovery of Ottoman-Turkish Makam Music. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.", 
      "tonic_symbol": "A4", 
      "source": "https://github.com/MTG/otmm_tonic_dataset/tree/senturk2016thesis", 
      "value": 185.0, 
      "octave_wrapped": true, 
      "observations": "At the 245th second mark, the virtuosos somehow lose their coordination and the melodic intervals are mixed. The tonic played at the conclusion (e.g. the karar note) of the first performance (Isfahan Pe\u015frev) is around 185 Hz."
    }, 
    {
      "time_interval": [
        326, 
        866
      ], 
      "citation": "\u015eent\u00fcrk, S. (2016). Computational Analysis of Audio Recordings and Music Scores for the Description and Discovery of Ottoman-Turkish Makam Music. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.", 
      "tonic_symbol": "A4", 
      "source": "https://github.com/MTG/otmm_tonic_dataset/tree/senturk2016thesis", 
      "value": 169.0, 
      "octave_wrapped": true, 
      "observations": "Isfahan Sazsemaisi has a relatively stable tonic frequency at around 169Hz. Note that the historical recordings tend to have local pitch shifts which makes it hard to identify a precise or correct tonic frequency."
    }, 
    {
      "time_interval": [], 
      "music_score": "https://github.com/MTG/SymbTr/tree/v2.4.3/txt/isfahan--pesrev--devrikebir----tanburi_cemil_bey", 
      "citation": "\u015eent\u00fcrk, S. (2016). Computational Analysis of Audio Recordings and Music Scores for the Description and Discovery of Ottoman-Turkish Makam Music. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.", 
      "value": 88.0, 
      "source": "https://github.com/sertansenturk/tomato/blob/v0.9.1/tomato/joint/jointanalyzer.py#L90", 
      "observations": "Tonic identified from the note models obtained by joint audio-score analysis", 
      "tonic_symbol": "A4", 
      "octave_wrapped": true
    }, 
    {
      "time_interval": [], 
      "music_score": "https://github.com/MTG/SymbTr/tree/v2.4.3/txt/isfahan--sazsemaisi--aksaksemai----tanburi_cemil_bey", 
      "citation": "\u015eent\u00fcrk, S. (2016). Computational Analysis of Audio Recordings and Music Scores for the Description and Discovery of Ottoman-Turkish Makam Music. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.", 
      "value": 167.3, 
      "source": "https://github.com/sertansenturk/tomato/blob/v0.9.1/tomato/joint/jointanalyzer.py#L90", 
      "observations": "Tonic identified from the note models obtained by joint audio-score analysis", 
      "tonic_symbol": "A4", 
      "octave_wrapped": true
    }
  ]
}
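As a minimal sketch of how a record like the one above could be read, the snippet below parses a shortened inline copy of the example and selects the annotations that apply at a given time. The helper `annotations_at` is hypothetical and not part of this repository, and treating an empty `time_interval` as "applies to the whole recording" is an assumption made for illustration. With the real dataset you would load annotations.json from disk instead of the inline string.

```python
import json

# Shortened inline copy of the example record (assumption: same layout as
# annotations.json). For the real file: json.load(open("annotations.json")).
sample = json.loads("""
{
  "ed189797-5c50-4fde-abfa-cb1c8a2a2571": {
    "mbid": "http://musicbrainz.org/recording/ed189797-5c50-4fde-abfa-cb1c8a2a2571",
    "verified": true,
    "annotations": [
      {"time_interval": [1, 244], "value": 175.7,
       "tonic_symbol": "A4", "octave_wrapped": true},
      {"time_interval": [245, 324], "value": 185.0,
       "tonic_symbol": "A4", "octave_wrapped": true},
      {"time_interval": [], "value": 88.0,
       "tonic_symbol": "A4", "octave_wrapped": true}
    ]
  }
}
""")


def annotations_at(recording, time_sec):
    """Return the annotations whose time_interval covers time_sec.

    Hypothetical helper: an empty time_interval is assumed to mean that
    the annotation is not restricted to a part of the recording.
    """
    selected = []
    for annotation in recording["annotations"]:
        interval = annotation.get("time_interval", [])
        if not interval or interval[0] <= time_sec <= interval[1]:
            selected.append(annotation)
    return selected


recording = sample["ed189797-5c50-4fde-abfa-cb1c8a2a2571"]
values_at_100s = [a["value"] for a in annotations_at(recording, 100)]
print(values_at_100s)
```

The top-level dictionary key is the bare MBID, while the `mbid` field holds the full MusicBrainz URL, so lookups by identifier use the key rather than the field.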

Below, each dictionary key is explained in detail:

mbid: String. The URL of the recording MBID in MusicBrainz.
verified: Boolean. True means all annotations in the recording have been verified by another person within a window of 20 cents to the actual tonic frequency. See Seeger, C. (1958) for the musicological justification of the cent precision.
annotations: List. Holds the list of annotation dictionaries.
time_interval: 2 x 1 List of Floats. The start and end time stamp of the tonic annotation in the recording. It is used when the tonic frequency (or symbol) changes within the performance. If there is no change, its value is empty.
citation: String. Relevant research paper the annotation is taken from.
value: Float. The annotation frequency in Hz.
source: String. The URL where the annotation is originally taken from. It points to the relevant commit/tag and file, where applicable. Note that the value might have been changed from the original by the final verifier.
tonic_symbol: String. Symbol of the tonic note according to the AEU theory. It is given in the SymbTr format, i.e. [letter][octave][accidental][comma]. Example: B4b1.
octave_wrapped: Boolean. False, if the annotator did not (or is not able to) consider the octave of the tonic.
observations: String. The comments provided by the annotator.
music_score: String (joint audio-score analysis only). The name of the SymbTr-score used in the joint analysis
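To illustrate the `tonic_symbol` layout described above ([letter][octave][accidental][comma]), here is a small parsing sketch. The function `parse_tonic_symbol` and its regular expression are hypothetical. The sketch assumes that the accidental is written as `b` or `#`, that the accidental and comma parts appear together or not at all (as in `A4` versus `B4b1`), and that the octave is a single digit. None of these details are confirmed by this README, so check them against the SymbTr documentation before relying on them.

```python
import re

# Assumed pattern for symbols such as "A4" or "B4b1":
# letter, single-digit octave, then an optional accidental + comma count.
SYMBOL_PATTERN = re.compile(
    r"^(?P<letter>[A-G])(?P<octave>\d)"
    r"(?:(?P<accidental>[b#])(?P<comma>\d+))?$"
)


def parse_tonic_symbol(symbol):
    """Split a tonic symbol into its parts (hypothetical helper)."""
    match = SYMBOL_PATTERN.match(symbol)
    if match is None:
        raise ValueError("Unrecognized tonic symbol: %r" % symbol)
    parts = match.groupdict()
    return {
        "letter": parts["letter"],
        "octave": int(parts["octave"]),
        "accidental": parts["accidental"],  # None when absent
        "comma": int(parts["comma"]) if parts["comma"] is not None else None,
    }


parsed_flat = parse_tonic_symbol("B4b1")
parsed_plain = parse_tonic_symbol("A4")
print(parsed_flat)
print(parsed_plain)
```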

Additional resources

Most of the recordings in this dataset cannot be shared due to copyright. However, the relevant features are already computed and can be downloaded from Dunya-makam after registration. Please refer to the API documentation (http://dunya.compmusic.upf.edu/docs/makam.html) for how to access the data.

During verification, annotations are occasionally removed for practical reasons. The affected recordings are listed in removed.json. You can inspect the JSON file to see why each particular recording was removed.

Automatic validation

After each commit, the annotations in the dataset are validated automatically by running several tests on Travis CI. Currently the tests are:

  1. Cross-checking that all annotations of a recording are at most 20 cents apart from each other. Recordings whose tonic varies over time are omitted.
  2. Checking that the annotations listed in removed.json are not re-introduced by mistake.
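The 20-cent cross-check in the first test can be sketched as follows. The distance between two frequencies in cents is 1200 times the base-2 logarithm of their ratio. The functions `cent_distance` and `annotations_agree` are hypothetical and are not taken from the repository's unit tests. Folding the distance into a single octave before comparing is also an assumption, motivated by the example record in which 88.0 Hz and 175.7 Hz annotate the same tonic an octave apart.

```python
import math
from itertools import combinations


def cent_distance(freq_a, freq_b, octave_wrap=True):
    """Absolute distance between two frequencies in cents.

    With octave_wrap=True the distance is folded into [0, 600] cents,
    so frequencies that are an octave apart count as (nearly) equal.
    """
    cents = 1200.0 * math.log(freq_a / freq_b, 2)
    if octave_wrap:
        cents = (cents + 600.0) % 1200.0 - 600.0
    return abs(cents)


def annotations_agree(frequencies, max_cents=20.0):
    """True if every pair of frequencies is at most max_cents apart."""
    return all(
        cent_distance(a, b) <= max_cents
        for a, b in combinations(frequencies, 2)
    )


agree_close = annotations_agree([175.7, 175.0, 88.0])
agree_far = annotations_agree([175.0, 185.0])
print(agree_close, agree_far)
```

In this sketch 175.0 Hz and 185.0 Hz are roughly 96 cents apart, which is why the second call fails the check, whereas the three values in the first call stay within 20 cents of each other once octaves are folded.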

The tests also report several warnings:

  1. The recordings with annotations that lack a tonic symbol.
  2. The number of recordings that have only a single annotation and hence are not cross-checked.
  3. The number of recordings that have not received a final human verification.
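The three warnings above could be tallied along the following lines. This is only an illustrative sketch under the annotation structure described in this README. The function `collect_warnings` and the toy dataset are hypothetical, and the actual logic in unittests/validate_annotations.py may differ.

```python
def collect_warnings(dataset):
    """Tally the three warning categories for a dict of recordings."""
    missing_symbol = []      # recordings with an annotation lacking a symbol
    single_annotation = 0    # recordings that cannot be cross-checked
    unverified = 0           # recordings without a final human verification
    for key, recording in dataset.items():
        annotations = recording.get("annotations", [])
        if any(not a.get("tonic_symbol") for a in annotations):
            missing_symbol.append(key)
        if len(annotations) == 1:
            single_annotation += 1
        if not recording.get("verified", False):
            unverified += 1
    return {
        "missing_symbol": sorted(missing_symbol),
        "single_annotation": single_annotation,
        "unverified": unverified,
    }


# Toy data with made-up keys, for illustration only.
toy_dataset = {
    "rec-a": {"verified": True,
              "annotations": [{"value": 175.0, "tonic_symbol": "A4"},
                              {"value": 175.7, "tonic_symbol": "A4"}]},
    "rec-b": {"verified": False,
              "annotations": [{"value": 220.0, "tonic_symbol": ""}]},
}
warnings_report = collect_warnings(toy_dataset)
print(warnings_report)
```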

These warnings are only shown in Travis CI when there is also a validation error. To see the warnings otherwise, run the test manually in Python 2.7:

  • Open a terminal
  • Clone the GitHub repository into the current directory (or wherever you want)
git clone https://github.com/MTG/otmm_tonic_dataset.git
  • Enter the repository folder
cd otmm_tonic_dataset
  • Start the Python interpreter
python
  • Then, in the Python shell, run:
from unittests.validate_annotations import test_annotations
test_annotations()

References

Seeger, C. (1958). Prescriptive and descriptive music-writing. The Musical Quarterly, 44(2):184–195.

Annotation Sources

Şentürk, S. (2016). Computational Analysis of Audio Recordings and Music Scores for the Description and Discovery of Ottoman-Turkish Makam Music. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.

Şentürk, S., and Serra, X. (2016). Composition Identification in Ottoman-Turkish Makam Music Using Transposition-Invariant Partial Audio-Score Alignment. In Proceedings of 13th Sound and Music Computing Conference (SMC 2016), pages 434–441, Hamburg, Germany.

Karakurt, A., Şentürk, S., and Serra, X. (2016). MORTY: A Toolbox for Mode Recognition and Tonic Identification. In Proceedings of 3rd International Digital Libraries for Musicology Workshop (DLfM 2016), pages 9–16, New York, NY, USA.

Atlı, H. S., Bozkurt, B., Şentürk, S. (2015). A Method for Tonic Frequency Identification of Turkish Makam Music Recordings. In Proceedings of 5th International Workshop on Folk Music Analysis (FMA 2015), pages 119–122, Paris, France.

Şentürk, S., Gulati, S., and Serra, X. (2013). Score informed tonic identification for makam music of Turkey. In Proceedings of 14th International Society for Music Information Retrieval Conference (ISMIR 2013), pages 175–180, Curitiba, Brazil.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.