SRP-DNN

A Python implementation of “SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

Contributions
- Learning competing and time-varying direct-path inter-channel phase differences (IPD sequences) for multiple moving sources
  - avoids the assignment ambiguity and the uncertain output dimension encountered when simultaneously predicting multiple targets
  - yields a spatial spectrum with reliable peaks around the actual source directions
- Iterative source detection and localization
  - separates merged peaks of the spatial spectrum caused by the interaction between sources
  - achieves superior performance for azimuth and elevation estimation of multiple moving sound sources
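
The two ideas above can be illustrated with a small NumPy sketch. It is not the code in this repository: the function names, the far-field azimuth-only model, the sign convention of the phase, the 8-microphone circular array and the detection threshold are all assumptions made for illustration, and the "predicted" IPD is synthesized as the sum of two direct-path templates instead of coming from a trained network.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def ipd_templates(mic_pos, pairs, freqs, azimuths):
    """Far-field direct-path IPD templates, shape (n_dirs, n_freqs, n_pairs)."""
    # Unit vectors towards each candidate azimuth (horizontal plane only).
    dirs = np.stack([np.cos(azimuths), np.sin(azimuths)], axis=1)
    # Position difference of every microphone pair.
    diffs = np.array([mic_pos[i] - mic_pos[j] for i, j in pairs])
    # Time difference of arrival per direction and pair (sign convention assumed).
    tdoa = dirs @ diffs.T / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freqs[None, :, None] * tdoa[:, None, :])


def spatial_spectrum(ipd, templates):
    """Correlate an IPD estimate (n_freqs, n_pairs) with every direction template."""
    corr = np.sum(ipd[None] * templates.conj(), axis=(1, 2))
    return np.real(corr) / ipd.size


def iterative_localization(ipd, templates, max_sources, threshold=0.5):
    """Pick the strongest peak, remove its contribution, and repeat."""
    residual = ipd.astype(complex)
    found = []
    for _ in range(max_sources):
        spectrum = spatial_spectrum(residual, templates)
        peak = int(np.argmax(spectrum))
        if spectrum[peak] < threshold:  # assumed stopping rule
            break
        found.append(peak)
        # Subtract the detected source so weaker or merged peaks become visible.
        residual = residual - spectrum[peak] * templates[peak]
    return found


def toy_example():
    # Hypothetical 8-microphone circular array with a 10 cm radius.
    mic_angles = 2 * np.pi * np.arange(8) / 8
    mic_pos = 0.1 * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)
    pairs = [(i, j) for i in range(8) for j in range(i + 1, 8)]
    freqs = np.linspace(0.0, 8000.0, 129)
    azimuths = np.deg2rad(np.arange(360))  # 1-degree candidate grid
    templates = ipd_templates(mic_pos, pairs, freqs, azimuths)
    # Stand-in for the network output: two active sources at 40 and 207 degrees.
    ipd = templates[40] + templates[207]
    return iterative_localization(ipd, templates, max_sources=2)


if __name__ == "__main__":
    print("estimated azimuths (degrees):", toy_example())
```

Because the target is a single summed IPD per time-frequency bin and microphone pair, the output size does not depend on the number of sources. The source count is only decided later, by how many peaks the iterative step accepts.
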
Suited cases
- favorable or adverse noisy and reverberant conditions
- single or multiple sound sources
- static or moving sound sources
- known or unknown number of sound sources
- different topologies of microphone arrays

Datasets

Source signals: from the LibriSpeech database
Real-world multi-channel microphone signals: from the LOCATA database

Quick start

Preparation
- copy the train-clean-100, dev-clean and test-clean folders of the LibriSpeech database to SRP-DNN/data/SrcSig/LibriSpeech
- install: numpy, scipy, soundfile, tqdm, matplotlib, gpuRIR, webrtcvad, etc.
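
Before training, it can help to confirm that both preparation steps are complete. The following is a minimal sketch and not part of the repository: the helper names are hypothetical, it assumes the script is run from the directory that contains SRP-DNN, and it assumes each listed package is imported under the same name as it is listed above.

```python
import importlib.util
from pathlib import Path

# Folder names and package names are taken from the preparation steps above.
REQUIRED_SUBSETS = ("train-clean-100", "dev-clean", "test-clean")
REQUIRED_PACKAGES = ("numpy", "scipy", "soundfile", "tqdm",
                     "matplotlib", "gpuRIR", "webrtcvad")


def missing_subsets(root, subsets=REQUIRED_SUBSETS):
    """Return the LibriSpeech subset folders that are absent under root."""
    root = Path(root)
    return [name for name in subsets if not (root / name).is_dir()]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed location relative to the current working directory.
    data_root = Path("SRP-DNN/data/SrcSig/LibriSpeech")
    print("missing subsets:", missing_subsets(data_root))
    print("missing packages:", missing_packages())
```

Two empty lists indicate that the data folders and the listed dependencies are in place. The "etc." in the install step means further packages may still be required.
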

Training

python RunSRPDNN.py --train --gen-on-the-fly --gpu-id [*] (--use-amp)

Evaluation

use GPU

python RunSRPDNN.py --test --gpu-id [*] --time 00000001 --eval-mode locata pred eval (--use-amp)

use CPU

python RunSRPDNN.py --test --no-cuda --time 00000001 --eval-mode locata pred eval (--use-amp)

Pretrained models
- exp/00000002/best_model.tar

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{yang2022srpdnn,
    author = "Bing Yang and Hong Liu and Xiaofei Li",
    title = "SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization",
    booktitle = "Proceedings of {IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
    year = "2022",
    pages = "721-725"}

Reference code

Cross3D

Licence

MIT
