WeSpeaker mainly focuses on speaker embedding learning, with application to the speaker verification task. It supports both online feature extraction and loading pre-extracted features in Kaldi format.
pip install git+https://github.com/wenet-e2e/wespeaker.git
Command-line usage (use -h for parameters):
$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt
$ wespeaker --task embedding_kaldi --wav_scp wav.scp --output_file /path/to/embedding
$ wespeaker --task similarity --audio_file audio.wav --audio_file2 audio2.wav
$ wespeaker --task diarization --audio_file audio.wav
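The `embedding_kaldi` task above reads a `wav.scp` file. In Kaldi's convention, each line of such a file maps an utterance ID to an audio path, separated by whitespace. A minimal sketch for building one from a list of files follows; the helper name `write_wav_scp` and the choice of the file stem as the utterance ID are assumptions made for illustration and are not part of WeSpeaker.

```python
from pathlib import Path


def write_wav_scp(wav_paths, scp_path):
    """Write a Kaldi-style wav.scp: one '<utt_id> <wav_path>' pair per line.

    Hypothetical helper: the utterance ID is taken from the file stem,
    so stems must be unique across the list.
    """
    lines = []
    seen = set()
    # Sort by utterance ID, as Kaldi tools generally expect sorted keys.
    for wav in sorted((Path(p) for p in wav_paths), key=lambda p: p.stem):
        utt_id = wav.stem
        if utt_id in seen:
            raise ValueError(f"duplicate utterance ID: {utt_id}")
        seen.add(utt_id)
        lines.append(f"{utt_id} {wav.as_posix()}")
    # One entry per line, each terminated by a newline.
    Path(scp_path).write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return lines
```

The resulting file can then be passed to `--wav_scp` as in the command above.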
Python programming usage:
import wespeaker
model = wespeaker.load_model('chinese')
embedding = model.extract_embedding('audio.wav')
utt_names, embeddings = model.extract_embedding_list('wav.scp')
similarity = model.compute_similarity('audio1.wav', 'audio2.wav')
diar_result = model.diarize('audio.wav')
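Speaker verification commonly scores a trial by the cosine similarity between two embeddings. The sketch below shows that computation on plain Python sequences. Whether `compute_similarity` applies exactly this formula or rescales the result is not stated here, so treat this as an illustration rather than the library's implementation; the function name `cosine_similarity` is hypothetical.

```python
import math


def cosine_similarity(emb_a, emb_b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The value lies in [-1, 1]; higher values suggest the two utterances are more likely to come from the same speaker.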
Please refer to python usage for more details on command-line and Python programming usage.
- Clone this repo
git clone https://github.com/wenet-e2e/wespeaker.git
- Create a conda environment (PyTorch version >= 1.10.0 is required)
conda create -n wespeaker python=3.9
conda activate wespeaker
conda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
pre-commit install # for clean and tidy code
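To confirm that an existing environment satisfies the PyTorch >= 1.10.0 requirement, the version has to be compared numerically: a plain string comparison would wrongly rank "1.9.0" above "1.10.0". A minimal sketch, where the helper names `parse_version` and `meets_minimum` are hypothetical:

```python
import re


def parse_version(version):
    """Extract leading numeric components, e.g. '1.12.1+cu113' -> (1, 12, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def meets_minimum(version, minimum="1.10.0"):
    """Return True if `version` is at least `minimum`, compared numerically."""
    return parse_version(version) >= parse_version(minimum)
```

With PyTorch installed, this could be called as `meets_minimum(str(torch.__version__))` after `import torch`.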
- 2023.11.13: Support CLI usage of wespeaker, see python usage for details.
- 2023.07.18: Support Kaldi-compatible PLDA and unsupervised adaptation, see #186.
- 2023.07.14: Support the NIST SRE16 recipe, see #177.
- 2023.07.10: Support the Self-Supervised Learning recipe on VoxCeleb, including DINO, MoCo and SimCLR, see #180.
- 2023.06.30: Support the SphereFace2 loss function, which offers better performance and noise robustness than ArcMargin Softmax, see #173.
- 2023.04.27: Support the CAM++ model, which offers better performance and a better single-thread inference real-time factor (RTF) than the ResNet34 model, see #153.
- VoxCeleb: Speaker Verification recipe on the VoxCeleb dataset
  - 🔥 UPDATE 2023.07.10: We support a self-supervised learning recipe on VoxCeleb, achieving 2.627% EER (ECAPA_TDNN_GLOB_c1024) on the vox1-O-clean test set without any labels.
  - 🔥 UPDATE 2022.10.31: We support deep r-vector up to the 293-layer version, achieving 0.447%/0.043 EER/minDCF on the vox1-O-clean test set.
  - 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe and obtain SOTA performance among open-source systems.
    - EER/minDCF on the vox1-O-clean test set are 0.723%/0.069 (ResNet34) and 0.728%/0.099 (ECAPA_TDNN_GLOB_c1024), after large-margin (LM) fine-tuning and AS-Norm.
- CNCeleb: Speaker Verification recipe on the CNCeleb dataset
- NIST SRE16: Speaker Verification recipe for the 2016 NIST Speaker Recognition Evaluation Plan. A similar recipe can be found in Kaldi.
  - 🔥 UPDATE 2023.07.14: We support the NIST SRE16 recipe. After PLDA adaptation, we achieved 6.608%, 10.01%, and 2.974% EER on the Pooled, Tagalog, and Cantonese trials, respectively.
- VoxConverse: Diarization recipe on the VoxConverse dataset
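The recipes above report the equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate are equal. A minimal sketch of how EER can be estimated from trial scores and labels is given below. The function name `compute_eer` and the simple threshold sweep without interpolation are assumptions made for illustration; the toolkit's own scoring scripts may compute it differently.

```python
def compute_eer(scores, labels):
    """Estimate the equal error rate from trial scores.

    scores: higher means "more likely the same speaker".
    labels: 1 for target trials, 0 for non-target trials.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Rank trials by descending score; each prefix is one accepted set.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)

    # Start with a threshold above every score: FAR = 0, FRR = 1.
    best_gap, eer = 1.0, 0.5
    accepted_targets = 0
    accepted_nontargets = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            accepted_targets += 1
        else:
            accepted_nontargets += 1
        # Tied scores share a threshold, so evaluate only after the last one.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        far = accepted_nontargets / n_nontarget
        frr = (n_target - accepted_targets) / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

The result is a fraction; multiply by 100 to compare with the percentages quoted in the recipes.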
For Chinese users, you can scan the QR code on the left to follow the official account of the WeNet Community.
We also created a WeChat group for better discussion and quicker response. Please scan the QR code on the right to join the chat group.
If you find wespeaker useful, please cite it as:
@inproceedings{wang2023wespeaker,
title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2023},
organization={IEEE}
}
If you are interested in contributing, feel free to contact @wsstriving or @robin1001.