This is our contribution to the DCASE2020 Task 3 Challenge.
(C) 2020 Andrés Pérez-López and Rafael Ibañez-Usach.
This repository holds the implementation of the Parametric Particle Filter (PAPAFIL) method, as described in [1].
PAPAFIL is based on four main building blocks:
1. Estimation of single-source TF bins and computation of their instantaneous narrowband DOAs.
2. A particle tracking system that turns the DOAs into consistent event trajectories and activations.
3. Spatial and temporal filtering of the B-Format input signal with those annotations, producing monophonic event estimates.
4. Assignment of a class label to each event estimate by a single-class classifier based on GBM.
The architecture is depicted in the following Figure, where Omega and Upsilon represent the localization and temporal activity, respectively, and Kappa is the sound event class.
An example of the output for steps 1 and 2 is plotted in the following Figure (which can be obtained in the code through the internal debug parameter).
More information on the method, including the evaluation metric results on the cross-validation development set, is provided in the related article [1].
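For orientation, the following minimal sketch shows how the four blocks chain together. The helper functions are hypothetical stand-ins, not the actual functions of this repository (whose real counterparts live, roughly, in `localization_detection.py`, `func_tracking.m` and the APRI pipelines):

```python
import soundfile as sf

# Hypothetical stand-ins for the four PAPAFIL building blocks (illustration only)
def estimate_doas(audio, fs):
    """Block 1: single-source TF bins -> instantaneous narrowband DOAs."""
    raise NotImplementedError

def track_particles(doas):
    """Block 2: particle tracking -> event trajectories and activations."""
    raise NotImplementedError

def beamform_event(audio, fs, event):
    """Block 3: spatio-temporal filtering -> monophonic event estimate."""
    raise NotImplementedError

def classify_event(mono, fs):
    """Block 4: single-class GBM classifier -> sound event class."""
    raise NotImplementedError

def papafil_sketch(foa_path):
    audio, fs = sf.read(foa_path)              # (n_samples, 4) FOA recording
    doas = estimate_doas(audio, fs)            # Omega: localization
    events = track_particles(doas)             # Upsilon: temporal activity
    monos = [beamform_event(audio, fs, ev) for ev in events]
    labels = [classify_event(m, fs) for m in monos]   # Kappa: event class
    return events, labels
```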
- `APRI` contains the files created for our contribution.
- `seld-dcase2020` contains a fork of the baseline system [2,3], with minor adaptations for our usage.
- `multiple-target-tracking-master` contains a fork of the Matlab code that implements the particle filter [4].
The following list enumerates the dependencies of the proposed method. Please check the baseline method repository for their specific requirements.
- python 3.6.9
- numpy
- scipy
- sklearn v0.23.0
- matplotlib
- soundfile (pysoundfile)
- librosa 0.7.2
- keras (v2.3.1 - just for baseline)
- tensorflow (v2.0.0 - just for baseline)
- essentia 2.1b6.dev234 (check https://essentia.upf.edu/installing.html)
- pandas
- matlab(*)
(*) The particle filter engine is coded in Matlab. Therefore, a valid installation is required in order to run the localization system. However, the Matlab-Python wrapper handles the whole process, so you do not need to leave your Python IDE. A Python port of the library is not planned for the near future, although it would be a fantastic tool for the SELD community...
- Download the dataset and place it in a suitable path on your computer.
- Go to `baseline/parameter.py` and set your user name (l. 11), the dataset paths (l. 104-118), and the matplotlib backend (l. 121-124).
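Purely as an illustration, the user-specific block could look roughly like the snippet below. All variable names and values here are assumptions, not the real content of the file; check the actual lines indicated above:

```python
# baseline/parameter.py -- illustrative excerpt only; names and values are assumptions
user = 'myuser'                                        # l. 11: your user name

# l. 104-118: dataset paths for this user (placeholder values)
dataset_dir = '/data/DCASE2020_TASK3/foa_dev'          # FOA recordings
metadata_dir = '/data/DCASE2020_TASK3/metadata_dev'    # groundtruth annotations

# l. 121-124: matplotlib backend ('Agg' is convenient on headless machines)
matplotlib_backend = 'Agg'
```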
There are two different ways to generate the monophonic event dataset used to train the models, corresponding to the PAPAFIL1 and PAPAFIL2 methods [1]:
- The PAPAFIL1 dataset is obtained by simply parsing the annotation files and spatially filtering the events according to the groundtruth labels (illustrated in the sketch below). This is the purpose of the `generate_audio_from_annotations.py` script.
- Conversely, the PAPAFIL2 dataset is created by actually running the parametric particle filter system, and it is implemented in `generate_audio_from_analysis.py`.

In both cases, the resulting datasets will be created in the same folder where the FOA dataset lies.
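As a rough illustration of the spatial filtering idea (not the exact beamformer used by these scripts), a virtual cardioid can be steered towards an event's DOA from the four FOA channels, assuming the ACN/SN3D convention of the dataset; temporal filtering then amounts to keeping only the samples where the event is active:

```python
import numpy as np

def steer_cardioid(foa, azi, ele):
    """Mono estimate from a FOA signal via a first-order virtual cardioid
    steered towards (azi, ele) in radians.
    Assumes ACN channel order (W, Y, Z, X) and SN3D normalization."""
    w, y, z, x = foa.T                      # foa: array of shape (n_samples, 4)
    # Unit vector pointing at the steering direction
    nx = np.cos(azi) * np.cos(ele)
    ny = np.sin(azi) * np.cos(ele)
    nz = np.sin(ele)
    # Cardioid = 0.5 * (omnidirectional + dipole towards the source)
    return 0.5 * (w + nx * x + ny * y + nz * z)
```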
Acoustic feature extraction can be carried out by executing `pipeline_feature_engineering.py`. Several parameters can be tuned to set up the pipeline (for example, whether to use augmented data or not).
As a result, up to three dataframes are created (depending on the input parameters):
- `df_real`: acoustic features for events obtained using the metadata (event generation with `generate_audio_from_annotations.py`).
- `df_augmented`: acoustic features for events obtained from augmented data (event generation with `generate_audio_from_annotations.py` + `training_batch_data_augmentation.py`).
- `df_extra`: acoustic features for events obtained with the particle filtering framework (event generation with `generate_audio_from_analysis.py`). This dataframe is only used in PAPAFIL2.
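The file names below are assumptions (the pipeline handles storage internally), but conceptually the resulting dataframes can be combined with pandas into a single training table:

```python
import pandas as pd

# Illustrative only: the actual file names/locations are handled by
# pipeline_feature_engineering.py and may differ.
df_real = pd.read_pickle('df_real.pkl')            # events from the metadata
df_augmented = pd.read_pickle('df_augmented.pkl')  # events from augmented data
df_extra = pd.read_pickle('df_extra.pkl')          # events from the particle filter (PAPAFIL2 only)

df_train = pd.concat([df_real, df_augmented, df_extra], ignore_index=True)
print(df_train.shape)
```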
The event class prediction model can be trained by executing `pipeline_modelling.py`. Several ML algorithm implementations are available (see `get_model_utils.py`). The pipeline allows for multiple user choices: feature selection, algorithm, grid search, etc.
As a result, the trained model is stored in a specific folder. The execution settings are also stored in a text file in order to improve traceability.
Since the developer is rather scatterbrained, some parameters could be hardcoded...
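Since the event classifier is a gradient boosting model, a stripped-down training step could look like the sketch below, here using scikit-learn's `GradientBoostingClassifier`. The actual feature selection, hyperparameter grid and persistence logic live in `pipeline_modelling.py` and `get_model_utils.py`; the column and file names below are hypothetical, and `df_train` is the illustrative table from the previous sketch:

```python
import joblib
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# 'label' and the remaining feature columns are hypothetical names
X = df_train.drop(columns=['label'])
y = df_train['label']

param_grid = {'n_estimators': [100, 300],
              'learning_rate': [0.05, 0.1],
              'max_depth': [3, 5]}
search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)

# Store the best model, mirroring the "trained model is stored in a specific folder" step
joblib.dump(search.best_estimator_, 'models/gbm_event_classifier.joblib')
```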
As mentioned above, the particle filtering is implemented in Matlab.
The code, contained in `multiple-target-tracking-master/func_tracking.m`, is called from Python.
The DOA estimates are passed as a temporary csv file, and the resulting events are retrieved as a .mat file with the same name in the same folder.
The paths in the Matlab file are hardcoded; sorry for that. You will have to set them to point to your actual paths.
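A minimal sketch of this Python/Matlab round trip, assuming MathWorks' MATLAB Engine API for Python is installed (the actual wrapper may invoke Matlab differently, and the csv layout and paths below are assumptions):

```python
import numpy as np
import matlab.engine                    # MathWorks' MATLAB Engine API for Python
from scipy.io import loadmat

# Hypothetical DOA estimates: rows of [frame, azimuth, elevation]
doa_estimates = np.array([[0, 0.1, 0.0], [1, 0.2, 0.05]])

# 1. Write the estimates to the csv file expected by the Matlab code
#    (the real path is hardcoded inside func_tracking.m, as noted above)
np.savetxt('/tmp/doa_estimates.csv', doa_estimates, delimiter=',')

# 2. Run the particle filter in Matlab
eng = matlab.engine.start_matlab()
eng.addpath('multiple-target-tracking-master')
eng.func_tracking(nargout=0)            # signature assumed; paths are hardcoded
eng.quit()

# 3. Retrieve the resulting events: a .mat file with the same name, same folder
events = loadmat('/tmp/doa_estimates.mat')
```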
- Different parameter configurations, organized as presets, can be described in `baseline/parameter.py`. Each preset specifies the sub-systems to be used in the analysis, along with the selected parameter values: localization (`ld_method` and `ld_method_args`), beamforming mode (`beamforming_mode`), classification architecture and model (`class_method` and `class_method_args`), and optional postprocessing (`event_filter_activation`).
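As an illustration, a preset can be thought of as a dictionary of these fields. The keys follow the parameters listed above, while the values are placeholders rather than an actual preset shipped with the repository:

```python
# Illustrative preset layout; values are placeholders, not a real preset.
presets = {
    'my_preset': {
        'ld_method': 'particle_filter',            # localization/detection method
        'ld_method_args': {'num_particles': 30},   # its arguments (hypothetical)
        'beamforming_mode': 'beam',                # spatial filtering mode
        'class_method': 'gbm',                     # classification architecture/model
        'class_method_args': {'model_name': 'gbm_event_classifier'},
        'event_filter_activation': True,           # optional postprocessing filter
    },
}
```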
- Furthermore, the script `run.py` contains the main loop used to perform the SELD analysis in evaluation mode, according to the specified preset. The output of the script is a set of annotation files, which will be created in a folder named after the used preset. There are three main boolean options available (sketched below):
  - `write`: enables output file writing; usually you want this feature activated.
  - `plot`: plots the groundtruth and estimated events of all output files for the given preset. Not recommended for full passes over the dataset.
  - `quick`: restricts the script execution to the files specified in the list `quick_audio_files`; useful for quick tests.
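Conceptually, the three flags gate the main loop roughly as in the following skeleton; the paths and file names are placeholders, and the real loop lives in `run.py`:

```python
from pathlib import Path

# Illustrative skeleton only; run.py implements the real loop and flag handling.
write = True                                         # enable output file writing
plot = False                                         # plot groundtruth vs. estimates
quick = True                                         # restrict the run to a few files
quick_audio_files = ['fold1_room1_mix001_ov1.wav']   # hypothetical file list

for wav in sorted(Path('/data/DCASE2020_TASK3/foa_dev').glob('*.wav')):
    if quick and wav.name not in quick_audio_files:
        continue
    # ... run the PAPAFIL analysis on `wav` according to the selected preset ...
    if write:
        pass    # write the estimated events to results/<preset>/<wav_name>.csv
    if plot:
        pass    # plot groundtruth and estimated events for this file
```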
- Two complementary scripts provide helpful information from already computed output files: `eval.py` computes the evaluation metrics of an existing set of output annotations, specified by preset name, and `plot.py` plots the groundtruth and estimated events for the list of selected files.
- `compute_metrics.py`: mainly adapted from `baseline/metrics`, provides the evaluation metrics implementation.
- `eval.py`: quick evaluation of an output annotation set.
- `event_class_prediction.py`: applies the trained model for event classification.
- `generate_audio_from_analysis.py`: creates the PAPAFIL2 training set.
- `generate_audio_from_annotations.py`: creates the PAPAFIL1 training set.
- `generate_irs.py`: generates a set of IRs for data augmentation purposes.
- `get_audio_features.py`: computes the acoustic feature extraction using the essentia library.
- `get_data_augmentation.py`: generates augmented data by applying different transformations to the original audio files.
- `get_dataframes.py`: utils related to Pandas dataframes.
- `get_model_utils.py`: ML algorithm implementations; also contains the feature selection methods.
- `localization_detection.py`: implements the methods used for DOA estimation and particle filtering.
- `pipeline_feature_engineering.py`: pipeline aimed at extracting features and configuring the training dataframes.
- `pipeline_modelling.py`: pipeline aimed at training the classifier for event classification.
- `plot.py`: convenience script for result visualization.
- `postprocessing.py`: utils for the postprocessing filters used in some presets.
- `run.py`: main SELD script.
- `training_batch_data_augmentation.py`: gets augmented data from multiple input audios.
- `training_batch_generate_audio_features.py`: gets acoustic features from multiple input audios.
- `utils.py`: contains many useful methods for data handling, parametric analysis, plotting, etc.
- `IR/`: folder containing the IRs used for reverberant data augmentation, generated by [5].
- `models/`: holds the different machine learning model parameters and architectures.
- `plot_paper/`: some scripts for generating plots for the article.
- `results/`: system output, with a folder for each preset.
- `parameter.py`: main configuration script.
- `func_tracking.m`: implements the particle filter.
Do What the Fuck You Want To Public License, except for the contents of the `metrics` folder, which are under the MIT License. The rest of the repository is licensed under the TAU License.
(Note: it is actually possible that the whole project should be converted to GPL because of this one, but I'm not really sure how that affects other third-party libraries, i.e. the baseline. Anyway, please contact me if you want to use anything from here, and we will see how to proceed.)
[1] "PAPAFIL: a low complexity sound event localization and detection method with parametric particle filtering and gradient boosting". Andrés Pérez-López and Rafael Ibañez-Usach. Submitted to the Worshop on Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020).
[2] "Sound event localization and detection of overlapping sources using convolutional recurrent neural network". Sharath Adavanne, Archontis Politis, Joonas Nikunen and Tuomas Virtanen. IEEE Journal of Selected Topics in Signal Processing (JSTSP 2018).
[3] "Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network". Sharath Adavanne, Archontis Politis and Tuomas Virtanen. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2019)
[4] "Rao-Blackwellized Monte Carlo data association for multiple target tracking". Simo Särkkä, Aki Vehtari and Jouko Lampinen. Proceedings of the seventh international conference on information fusion. Vol. 1. I, 2004.
[5] "A Python library for Multichannel Acoustic Signal Processing." Andrés Pérez-López and Archontis Politis. Audio Engineering Society Convention 148. Audio Engineering Society, 2020.