PyTorch implementation of RPNSD (Speaker Diarization with Region Proposal Network). Our code is largely based on faster-rcnn.pytorch, a Faster R-CNN implementation by jwyang.
- Clone this project
git clone https://github.com/HuangZiliAndy/RPNSD.git
cd RPNSD
- Add your Python path to the PATH variable in path.sh (the current default is ~/anaconda3/bin).
- Install PyTorch (0.4.0) and torchvision according to your CUDA version
conda install pytorch==0.4.0 cuda91 torchvision pillow"<7" -c pytorch
- Install the packages in requirements.txt
pip install -r requirements.txt
- Prepare Kaldi and the Faster R-CNN library (you can specify a Kaldi root if you already have a compiled Kaldi)
cd tools
make KALDI=<path/to/a/compiled/kaldi/directory>
- Set your backend computing environment in cmd.sh
# Select the backend used by run.sh from "local", "sge", "slurm", or "ssh"
cmd_backend='local'
Data preparation. This step will:
- Prepare a large diarization dataset from Mixer6, SRE, and SWBD. Most of this data consists of two-channel telephone conversations between two people; we sum the two channels to create diarization-style training data.
- Prepare the test set from the CALLHOME dataset. Since CALLHOME does not define a train/dev/test split, we use 5-fold cross-validation.
./run_prepare_shared.sh
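The channel summation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code that run_prepare_shared.sh actually runs; it assumes a conversation is already loaded as an array of shape (num_samples, 2), and the helper name sum_channels is hypothetical.

```python
import numpy as np


def sum_channels(stereo: np.ndarray) -> np.ndarray:
    """Mix a (num_samples, 2) two-channel recording into one mono track.

    Hypothetical helper for illustration; not part of the RPNSD scripts.
    """
    if stereo.ndim != 2 or stereo.shape[1] != 2:
        raise ValueError("expected an array of shape (num_samples, 2)")
    # Promote to float64 first so that summing int16 samples cannot overflow.
    return stereo.astype(np.float64).sum(axis=1)


# Usage: three samples of a two-speaker conversation, one speaker per channel.
pcm = np.array([[100, 0], [0, 200], [300, 400]], dtype=np.int16)
mono = sum_channels(pcm)  # -> array([100., 200., 700.])
```

After summation both speakers share one stream, so the recording looks like a regular single-channel diarization input.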
Training on the Mixer6 + SRE + SWBD dataset. The default setting uses a single GPU and takes about 4 days.
./train.sh
A pretrained model is available at pretrain-model.
Adapt the model on in-domain data. Since we use 5-fold cross-validation, each fold trains on 400 utterances from the CALLHOME dataset and tests on the remaining 100.
./adapt.sh
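A sketch of how 500 CALLHOME utterances can be divided into 5 folds, giving 400 for adaptation and 100 for testing in each round. The round-robin fold assignment, the recording IDs, and the function name k_fold_splits are assumptions for illustration; the actual folds are defined by the repository's data preparation scripts and may differ.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(
    recording_ids: Sequence[str], k: int = 5
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) lists; every recording is tested exactly once.

    Hypothetical helper: assigns recordings to folds round-robin after sorting.
    """
    ids = sorted(recording_ids)  # deterministic order
    folds = [ids[i::k] for i in range(k)]  # round-robin fold assignment
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


# Usage with 500 hypothetical CALLHOME recording IDs.
ids = [f"rec{n:03d}" for n in range(500)]
for train, test in k_fold_splits(ids):
    print(len(train), len(test))  # prints "400 100" for each of the 5 folds
```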
Inference stage.
- Forward the network to get speech region proposals, speaker embeddings, and background probabilities.
- Post-process with clustering and non-maximum suppression (NMS).
- Compute Diarization Error Rate (DER).
./inference.sh
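Non-maximum suppression on speech segments is the 1-D analogue of box NMS in Faster R-CNN. The sketch below is a minimal greedy version; the function names, the IoU threshold of 0.5, and ranking by a foreground score (for example, 1 minus the background probability) are assumptions, and the post-processing in inference.sh may order clustering and NMS differently or use other thresholds.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(
    segments: List[Segment], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Greedy NMS: visit segments by descending score and keep a segment
    only if it does not overlap any already-kept segment above the threshold."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(temporal_iou(segments[i], segments[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Usage: the first two proposals cover almost the same region (IoU about 0.86).
segs = [(0.0, 2.0), (0.2, 2.1), (3.0, 4.0)]
scores = [0.9, 0.8, 0.7]
kept = nms_1d(segs, scores)  # -> [0, 2]
```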
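For intuition about the DER reported in the last step, here is a frame-level sketch. It is not the scoring tool used by inference.sh: it assumes binary frame-by-speaker activity matrices of equal shape, applies no forgiveness collar, and finds the best speaker mapping by brute force over permutations (fine for a handful of speakers; the Hungarian algorithm scales better). The function name frame_level_der is hypothetical.

```python
from itertools import permutations
from typing import Optional

import numpy as np


def frame_level_der(ref: np.ndarray, hyp: np.ndarray) -> float:
    """DER = (missed + false alarm + speaker confusion) / total reference speech.

    ref, hyp: binary (num_frames, num_speakers) matrices; pad with zero columns
    so both have the same number of speakers.
    """
    if ref.shape != hyp.shape:
        raise ValueError("ref and hyp must have the same shape")
    ref = ref.astype(bool)
    hyp = hyp.astype(bool)
    n_ref = ref.sum(axis=1)  # reference speakers active per frame
    n_hyp = hyp.sum(axis=1)  # hypothesis speakers active per frame
    total = n_ref.sum()
    if total == 0:
        raise ValueError("reference contains no speech")
    miss = np.maximum(n_ref - n_hyp, 0).sum()
    false_alarm = np.maximum(n_hyp - n_ref, 0).sum()
    best_conf: Optional[int] = None
    # Speaker labels are arbitrary, so take the mapping with the least confusion.
    for perm in permutations(range(hyp.shape[1])):
        n_correct = (ref & hyp[:, list(perm)]).sum(axis=1)
        conf = int((np.minimum(n_ref, n_hyp) - n_correct).sum())
        best_conf = conf if best_conf is None else min(best_conf, conf)
    return float((miss + false_alarm + best_conf) / total)


# Usage: the hypothesis swaps the speaker labels, which the mapping undoes.
ref = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
hyp = np.array([[0, 1], [0, 1], [1, 0], [1, 0]])
der = frame_level_der(ref, hyp)  # -> 0.0
```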
One example from the CALLHOME dataset. The first stream is the ground-truth labels, the second stream is the output of the x-vector system, and the third stream is the output of RPNSD.
@inproceedings{huang2020speaker,
Title = {Speaker Diarization with Region Proposal Network},
Author = {Huang, Zili and Watanabe, Shinji and Fujita, Yusuke and Garcia, Paola and Shao, Yiwen and Povey, Daniel and Khudanpur, Sanjeev},
Booktitle = {Accepted to ICASSP 2020},
Year = {2020}
}
@article{jjfaster2rcnn,
Author = {Jianwei Yang and Jiasen Lu and Dhruv Batra and Devi Parikh},
Title = {A Faster Pytorch Implementation of Faster R-CNN},
Journal = {https://github.com/jwyang/faster-rcnn.pytorch},
Year = {2017}
}