A collection of tools and 1,000,000+ unified annotations for bioacoustics datasets.
Dataset | Species | # of annotated calls | Dataset size (GB) | Duration (hh:mm:ss) | License |
---|---|---|---|---|---|
Animal Sounds | Birds, cats, chickens, cows, dogs, donkeys, frogs, lions, monkeys, sheep | 809 | 0.13 | 0:57:47 | - |
AnuraSet | Anurans | 16089 | 18.6 | 27:00:00 | cc-by-4.0 |
BIRDeep | 38 avian species | 3749 | 13.41 | 8:50:00 | MIT |
BirdVox | 25 avian species | 35402 | 0.79 | 7:23:00 | cc-by-4.0 |
Domestic Canary | Canary | 23308 | 0.856 | 3:00:00 | cc-by-4.0 |
Colombia/Costa Rica Coffee Farms | 89 avian species | 6952 | 3.8 | 34:00:00 | cc-by-4.0 |
DARPA | Humans | 1718 | 0.67 | 4:00:00 | No license specified, the work may be protected by copyright |
Avian Dawn | 58 avian species, 1 amphibian species | 41183 | 20.3 | 131:15:00 | cc-by-4.0 |
DCASE | Birds | 7206 | 32.00 | 17:25:00 | cc-by-4.0 |
Egyptian fruit bat | Egyptian fruit bat | 90000 | 91.00 | 37:45:00 | cc-by-4.0 |
ENABirds | Birds | 16052 | 1.40 | 6:20:00 | cc-by-1.0 |
Female Rook | Rook birds | 3417 | 54.37 | 10:45:36 | cc-by-nc-nd-4.0 |
The Vocal Repertoire of Adult and Neonate Otters | Otter | 441 | 0.57 | 0:06:23 | cc |
Hainan Gibbons | Hainan Gibbons | 1233 | 13.39 | 104:00:00 | cc-by-4.0 |
Hawaii Birds | 27 avian species | 59583 | 5.8 | 51:00:00 | cc-by-4.0 |
HICEAS | Whales, Dolphins | 796 | 3.10 | 12:40:00 | "Public dataset hosted in Google Cloud Storage" |
Distributed acoustic cues for caller identity in macaque vocalization | Macaque monkeys | 7285 | 0.15 | 0:45:00 | cc-by-1.0 |
InfantMarmosetVox | Marmoset monkeys | 169318 | 21.2 | 58:20:00 | cc-by-4.0 |
Northeast US Sounds | 81 avian species | 50760 | 27.8 | 285:00:00 | cc-by-4.0 |
Orcas Classifications | Orca whales | 398 | 0.26 | 0:26:30 | - |
Pigs | Pig | 6887 | 0.2 | 0:40:26 | cc-by-4.0 |
Rainforest | Birds, frogs | 1216 | 13.05 | 20:16:00 | "Free for personal or academic purposes" |
Rodents | Rodents: mouse, gerbil | 4576 | 1.36 | 0:48:34 | cc-by-4.0 |
Rook | Rook birds | 17662 | 23.49 | 17:21:17 | cc-by-4.0 |
Sierra Nevada | 21 avian species | 10976 | 3.57 | 16:40:00 | cc-by-4.0 |
Southwest Amazon | 132 avian species | 16482 | 4.51 | 21:00:00 | cc-by-4.0 |
Watkins Marine Animal Sounds | 21 dolphin, 13 seal, 32 whale species | 15152 | 9.61 | 29:10:15 | "Sound files are free to download for personal or academic use" |
Western US | 56 avian species | 20147 | 7.08 | 33:00:00 | cc-by-4.0 |
Downloading all of the data requires about TOTAL_GB of free space, but you can also pick and choose which datasets to download.
- Run `./scripts/download_data.sh`
After running the download script, your datasets folder should look like this:
```
└── datasets/
    ├── annotations.pkl
    ├── dataset1/
    │   ├── audio/
    │   │   ├── audio1.wav
    │   │   └── audio2.wav
    │   ├── annotations.pkl
    │   └── stats.txt
    └── dataset2/
        ├── audio/
        ├── annotations.pkl
        └── stats.txt
```
There is an individual annotation file for each dataset, as well as one master annotations file located directly in the `datasets` folder. Each `annotations.pkl` is a pickled Python dictionary with the following structure:
```python
{
    wav_file_path: [
        {'start_time': 0, 'end_time': 1.7, 'species': 'bird', 'sub-species': 'serinus canaria'},
        {'start_time': 2.3, 'end_time': 2.48, ...},
        ...
    ],
    ...
}
```
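For illustration, here is a minimal sketch of working with a dictionary of this shape. The file path and annotation entries below are made up; real keys and values come from the downloaded data.

```python
import pickle
from collections import Counter

# Made-up annotations mirroring the structure of annotations.pkl.
annotations = {
    "datasets/dataset1/audio/audio1.wav": [
        {"start_time": 0, "end_time": 1.7,
         "species": "bird", "sub-species": "serinus canaria"},
        {"start_time": 2.3, "end_time": 2.48,
         "species": "bird", "sub-species": "serinus canaria"},
    ],
}

# Round-trip through pickle, the same way the shipped file is read.
loaded = pickle.loads(pickle.dumps(annotations))

# Count annotated calls per species across all files.
species_counts = Counter(
    ann["species"] for anns in loaded.values() for ann in anns
)
print(species_counts["bird"])  # 2
```

In the real files you would replace the inline dictionary with `pickle.load(open("datasets/annotations.pkl", "rb"))`.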
- `./scripts/generate_spectrograms.sh` - Generates 100 high-quality mel spectrograms for each dataset and places them in the `visualizations` folder.
- `scripts/moving_spectrogram.py example_audio.wav output.wav` - Takes a wav file (here `example_audio.wav`), generates a moving spectrogram with audio, and saves the result as `visualizations/output.wav`.
- `training/data_engine.py` - A helper for loading datasets and easily producing a PyTorch Dataset. The `__getitem__()` method returns `[audio, is_vocalization, species, speaker]`: `audio` is a tensor, `is_vocalization` is a boolean, `species` is the species of the vocalization, and `speaker` is its speaker. Both `species` and `speaker` are `"Noise"` for non-vocalization events, and `speaker` is `"no-speaker"` when there is no speaker data. The engine takes three required parameters: `datasets_path` (the `datasets` folder), `save_path` (where train and val splits will be stored), and `datasets` (a list of the datasets you would like to use). After the data is loaded into the engine, call `data_engine.get_annotated_dataset(dataset_names=[])` to obtain the PyTorch dataset described above.
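The `__getitem__()` contract above can be illustrated with a plain-Python stand-in. This is not the repository's implementation; the class and sample values here are invented purely to show the item format.

```python
# Minimal stand-in for the items produced by data_engine.py.
class AnnotatedDatasetSketch:
    def __init__(self, items):
        # items: list of (audio, is_vocalization, species, speaker) tuples.
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        audio, is_vocalization, species, speaker = self.items[idx]
        # Non-vocalization events carry "Noise" for both species and speaker;
        # "no-speaker" marks vocalizations that lack speaker metadata.
        return [audio, is_vocalization, species, speaker]

ds = AnnotatedDatasetSketch([
    ([0.1, -0.2, 0.05], True, "bird", "no-speaker"),  # vocalization, no speaker info
    ([0.0, 0.01, -0.01], False, "Noise", "Noise"),    # non-vocalization event
])
print(ds[1][2])  # Noise
```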
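As a rough illustration of what `generate_spectrograms.sh` computes, the sketch below builds a plain magnitude spectrogram with NumPy. It is a simplified stand-in: the actual script produces mel spectrograms, which additionally pool these FFT bins through a mel filterbank, and the function and parameter values here are assumptions, not the script's.

```python
import numpy as np

def spectrogram_sketch(audio, n_fft=1024, hop=256):
    """Magnitude STFT via a sliding Hann window. A mel spectrogram
    would further project each column onto mel-scale frequency bands."""
    window = np.hanning(n_fft)
    frames = [
        np.abs(np.fft.rfft(audio[i:i + n_fft] * window))
        for i in range(0, len(audio) - n_fft + 1, hop)
    ]
    return np.array(frames).T  # shape: (n_fft // 2 + 1, n_frames)

# One second of a 440 Hz tone at 22.05 kHz as stand-in audio.
t = np.linspace(0, 1, 22050, endpoint=False)
spec = spectrogram_sketch(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (513, 83)
```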