A Python implementation of “SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
- Contributions
  - Learning competing and time-varying direct-path inter-channel phase differences (IPD sequences) for multiple moving sources
    - avoids the assignment ambiguity and the uncertain output dimension encountered when predicting multiple targets simultaneously
    - yields a constructed spatial spectrum with reliable peaks around the actual source directions
  - Iterative source detection and localization
    - separates merged peaks in the spatial spectrum caused by the interaction between sources
    - achieves superior performance in azimuth and elevation estimation of multiple moving sound sources
- Suited cases
  - mild or adverse noisy and reverberant scenarios
  - single or multiple sound sources
  - static or moving sound sources
  - known or unknown number of sound sources
  - different microphone array topologies
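The two ideas listed under Contributions (building a spatial spectrum from direct-path IPDs, then detecting sources iteratively) can be illustrated with a small NumPy sketch. It is not the code in this repository: it assumes a far-field, azimuth-only model, a hypothetical 4-microphone circular array, and a speed of sound of 343 m/s. The names `dp_ipd_templates`, `spatial_spectrum`, and `iterative_localize` are illustrative, the network output is replaced by a synthetic weighted sum of two direct-path IPDs, and the peak-removal rule is a simplified stand-in for the iterative procedure described in the paper.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s (assumed)


def dp_ipd_templates(mic_pos, pairs, freqs, azimuths):
    """Far-field direct-path IPD for each candidate azimuth.

    Returns a complex array of shape (n_dir, n_pair, n_freq).
    """
    dirs = np.stack([np.cos(azimuths), np.sin(azimuths)], axis=1)      # (n_dir, 2)
    baselines = np.array([mic_pos[i] - mic_pos[j] for i, j in pairs])  # (n_pair, 2)
    # With d pointing from the array toward the source, the arrival-time
    # difference tau_i - tau_j equals -(p_i - p_j) . d / c.
    tdoa = -(dirs @ baselines.T) / SOUND_SPEED                         # (n_dir, n_pair)
    return np.exp(-2j * np.pi * freqs[None, None, :] * tdoa[:, :, None])


def spatial_spectrum(ipd, templates):
    """Normalized real inner product between an IPD estimate and every template."""
    n_pair, n_freq = ipd.shape
    return np.real(np.sum(ipd[None] * templates.conj(), axis=(1, 2))) / (n_pair * n_freq)


def iterative_localize(ipd, templates, azimuths, max_sources=2, threshold=0.2):
    """Pick the strongest peak, remove its contribution, and repeat."""
    residual = ipd.copy()
    found = []
    for _ in range(max_sources):
        spec = spatial_spectrum(residual, templates)
        k = int(np.argmax(spec))
        if spec[k] < threshold:  # no remaining source is strong enough
            break
        found.append(azimuths[k])
        # Subtract the detected source's template, weighted by its peak value.
        residual = residual - spec[k] * templates[k]
    return found


# Hypothetical array: 4 microphones on a circle of radius 0.1 m.
mic_angles = np.deg2rad([0, 90, 180, 270])
mic_pos = 0.1 * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
freqs = np.linspace(100.0, 8000.0, 64)
azimuths = np.deg2rad(np.arange(0, 360, 5))  # 5-degree candidate grid
templates = dp_ipd_templates(mic_pos, pairs, freqs, azimuths)

# Stand-in for the network output: competing sources at 40 and 120 degrees
# (grid indices 8 and 24) with weights 0.6 and 0.4.
ipd_hat = 0.6 * templates[8] + 0.4 * templates[24]

estimates_deg = np.rad2deg(
    iterative_localize(ipd_hat, templates, azimuths, max_sources=3, threshold=0.2)
)
print(estimates_deg)
```

Because the loop stops once the strongest remaining peak falls below `threshold`, the same routine covers a known number of sources (set `max_sources`) and an unknown number (rely on the threshold). Both values here are arbitrary choices for the synthetic example.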
- Source signals: from the LibriSpeech database
- Real-world multi-channel microphone signals: from the LOCATA database
- Preparation
- Training
python RunSRPDNN.py --train --gen-on-the-fly --gpu-id [*] (--use-amp)
- Evaluation
- use GPU
python RunSRPDNN.py --test --gpu-id [*] --time 00000001 --eval-mode locata pred eval (--use-amp)
- use CPU
python RunSRPDNN.py --test --no-cuda --time 00000001 --eval-mode locata pred eval (--use-amp)
- Pretrained models
- exp/00000002/best_model.tar
If you find our work useful in your research, please consider citing:
@InProceedings{yang2022srpdnn,
author = "Bing Yang and Hong Liu and Xiaofei Li",
title = "SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization",
booktitle = "Proceedings of {IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
year = "2022",
pages = "721-725"}
License: MIT