This repository is for generating sound event localization and detection (SELD) samples, including microphone-array-format signals and first-order-ambisonics-format signals.
Download FSD50K (sound event samples)): https://zenodo.org/records/4060432
Download TAU-SRIR DB (spatial room impulse responses, optional)): https://zenodo.org/records/6408611
The Python version is 3.11.8.
If you need to generate first-order-ambisonic format signals, please download matlab R2023b and Matlab Engine for Python 3.11.8. The corresponding scripts are implemented by Matlab.
Run the following command to install necessary Python packages.
pip install -r requirements.txt
The database
directory should contain the FSD50K and TAU-SRIR DB datasets:
./database ├────FSD50K │ ├── FSD50K.dev_audio │ ├── FSD50K.doc │ ├── FSD50K.eval_audio │ └── ... ├────TAU-SRIR_DB │ ├── TAU-SNoise_DB │ └── TAU-SRIR_DB └─...
Then run:
python make_dataset.py 1 # synthesize SELD samples using simulated SRIRs
or run:
python make_dataset.py 2 # synthesize SELD samples using collected SRIRs from TAU-SRIR DB
This repository contains several files, which in total create a complete data generation framework.
- The
get_parameters.py
is a separate script used for setting the parameters for the data generation process. - The
make_dataset.py
is the main script in which the whole framework is used to perform the full data generation process. - The
utils.py
is an additional file containing complementary functions for other scripts. - The
data_generator/db_config.py
is a class for containing audio filelists and data parameters from audio datasets used for the mixture generation. - The
data_generator/data_synthesis.py
is a class for data synthesis using simulated SRIRs. - The
data_generator/data_synthesis_test.py
is a class for data synthesis using collected SRIRs from TAU-SRIR DB. - The
srir
contains the scripts of simulating RIRs and first-order-ambisonics converting. - The
source_datasets/material_absorption
contains material absorption coefficients, and more details can be found in [2]. - The
source_datasets/single_source_samples
contains selected single-source sound event clips from FSD50K, and more details can be found in [2].
Moreover, one object file is included in case that the database configuration via db_config.py
takes too much time:
- The
db_config_fsd50k.obj
is a DBConfig class containing information about the database and files for the FSD50K dataset.
Please consider citing our paper if you find this code useful for your research.
[1] Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, and Jun Yang, "Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection," IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2024. URL
[2] Jinbo Hu, Yin Cao, Ming Wu, Fang Kang, Feiran Yang, Wenwu Wang, Mark D. Plumbley, Jun Yang, "PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection" arXiv:2411.06399, 2024. URL