mirdata

Common loaders for Music Information Retrieval (MIR) datasets. See the API documentation for details.

This library provides tools for working with common MIR datasets, including tools for:

downloading datasets to a common location and format
validating that the files for a dataset are all present
loading annotation files to a common format, consistent with the format required by mir_eval
parsing track level metadata for detailed evaluations

Installation

To install, simply run:

pip install mirdata

Quick example

import mirdata

orchset = mirdata.initialize('orchset')
orchset.download()  # download the dataset
orchset.validate()  # validate that all the expected files are there

example_track = orchset.choice_track()  # choose a random example track
print(example_track)  # see the available data

See the documentation for more examples and the API reference.
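To give a sense of what a validation step like the one above involves, here is a minimal, standard-library-only sketch. It is not mirdata's implementation: the index layout (relative path mapped to an MD5 checksum) and the helper names `md5_of` and `validate_files` are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_files(data_home, index):
    """Compare files under data_home against an index of expected checksums.

    `index` maps relative paths to expected MD5 digests (hypothetical layout).
    Returns two lists: paths that are missing, and paths whose checksum differs.
    """
    missing, invalid = [], []
    for relative_path, expected in index.items():
        path = Path(data_home) / relative_path
        if not path.is_file():
            missing.append(relative_path)
        elif md5_of(path) != expected:
            invalid.append(relative_path)
    return missing, invalid


# Tiny self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as data_home:
    (Path(data_home) / "a.txt").write_bytes(b"hello")
    (Path(data_home) / "b.txt").write_bytes(b"changed")
    index = {
        "a.txt": hashlib.md5(b"hello").hexdigest(),     # present and intact
        "b.txt": hashlib.md5(b"original").hexdigest(),  # present but altered
        "c.txt": hashlib.md5(b"absent").hexdigest(),    # never written
    }
    missing, invalid = validate_files(data_home, index)

print("missing:", missing)
print("invalid:", invalid)
```

Reporting missing and altered files separately makes it easier to tell an incomplete download from a corrupted one.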

Currently supported datasets

Supported datasets include AcousticBrainz, DALI, Guitarset, MAESTRO, and TinySOL, among many others.

For the complete list of supported datasets, see the documentation.

Citing

There are two ways of citing mirdata:

If you are using the library for your work, please cite the version you used, as indexed on Zenodo.

If you refer to mirdata's design principles, motivation, etc., please cite the following paper:

"mirdata: Software for Reproducible Usage of Datasets"
Rachel M. Bittner, Magdalena Fuentes, David Rubinstein, Andreas Jansson, Keunwoo Choi, and Thor Kell
in International Society for Music Information Retrieval (ISMIR) Conference, 2019

@inproceedings{
  bittner_fuentes_2019,
  title={mirdata: Software for Reproducible Usage of Datasets},
  author={Bittner, Rachel M and Fuentes, Magdalena and Rubinstein, David and Jansson, Andreas and Choi, Keunwoo and Kell, Thor},
  booktitle={International Society for Music Information Retrieval (ISMIR) Conference},
  year={2019}
}

When working with datasets, please cite the version of mirdata that you are using (given by its Zenodo DOI) and also include the reference for each dataset, which is available from the respective dataset loader via the cite() method.

Contributing a new dataset loader

We welcome contributions to this library, especially new dataset loaders. Please see the contributing guidelines.
