VERSA

VERSA (Versatile Evaluation of Speech and Audio) is a toolkit for collecting evaluation metrics of speech and audio quality. Our goal is to provide a comprehensive interface to cutting-edge evaluation techniques. The toolkit is also tightly integrated with ESPnet.

Colab Demonstration

Colab demonstration at the Interspeech 2024 tutorial

Install

The base installation is as simple as:

git clone https://github.com/shinjiwlab/versa.git
cd versa
pip install .

For metric collection, VERSA does not redistribute models; instead, we align as closely as possible with the original API provided by each algorithm's developers. As a result, the toolkit has many dependencies. We include as many as possible by default, but some metrics require extra installation steps. Please refer to the List of Metrics section to check whether a metric is installed automatically; if not, we provide an installation guide or installers in tools.
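Whether an optional metric is usable therefore depends on its backing package being importable. A minimal, generic sketch of probing for such a dependency (the module names below are only placeholders, not actual VERSA dependencies):

```python
import importlib.util


def metric_available(module_name: str) -> bool:
    """Return True if the optional dependency backing a metric can be imported."""
    return importlib.util.find_spec(module_name) is not None


# Probe two illustrative module names: a stdlib module that exists,
# and a placeholder that does not.
print(metric_available("json"))
print(metric_available("definitely_not_a_real_pkg"))
```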

Quick test

python versa/test/test_general.py

# test metrics with additional installation
python versa/test/test_{metric}.py

Usage

Simple usage case for a few samples.

# direct usage
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1 \
    --pred test/test_samples/test2 \
    --output_file test_result

# with scp-style input
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result

# with kaldi-ark style
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result \
    --io kaldi
  
# For text information
python versa/bin/scorer.py \
    --score_config egs/separate_metrics/wer.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result \
    --text test/test_samples/text
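The scp-style inputs above follow the Kaldi convention: one utterance per line, mapping an utterance ID to its audio path. A minimal sketch of writing and reading such a file (IDs and paths are illustrative):

```python
import os
import tempfile

# Illustrative utterance-ID -> wav-path mapping (Kaldi-style scp).
entries = {
    "utt1": "test/test_samples/test1/utt1.wav",
    "utt2": "test/test_samples/test1/utt2.wav",
}

scp_path = os.path.join(tempfile.gettempdir(), "demo.scp")

# Write one "<utt_id> <wav_path>" pair per line.
with open(scp_path, "w") as f:
    for utt_id, wav_path in entries.items():
        f.write(f"{utt_id} {wav_path}\n")

# Parse the file back into a dict.
parsed = {}
with open(scp_path) as f:
    for line in f:
        utt_id, wav_path = line.strip().split(maxsplit=1)
        parsed[utt_id] = wav_path

print(parsed["utt1"])
```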

Use the launcher with Slurm job submissions

# use the launcher
# Option1: with gt speech
./launch.sh \
  <pred_speech_scp> \
  <gt_speech_scp> \
  <score_dir> \
  <split_job_num> 

# Option2: without gt speech
./launch.sh \
  <pred_speech_scp> \
  None \
  <score_dir> \
  <split_job_num>

# aggregate the results
cat <score_dir>/result/*.result.cpu.txt > <score_dir>/utt_result.cpu.txt
cat <score_dir>/result/*.result.gpu.txt > <score_dir>/utt_result.gpu.txt

# show result
python scripts/show_result.py <score_dir>/utt_result.cpu.txt
python scripts/show_result.py <score_dir>/utt_result.gpu.txt 
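The aggregated utt_result files hold per-utterance scores, which scripts/show_result.py summarizes. Assuming each line carries one utterance's metrics as a Python-style dict (an assumption for illustration; the exact line format is defined by the scorer), the averaging step looks roughly like:

```python
import ast
from collections import defaultdict

# Illustrative per-utterance result lines (format assumed, not authoritative).
lines = [
    "{'key': 'utt1', 'pesq': 3.2, 'stoi': 0.91}",
    "{'key': 'utt2', 'pesq': 2.8, 'stoi': 0.87}",
]

# Accumulate every numeric metric across utterances, then average.
sums, counts = defaultdict(float), defaultdict(int)
for line in lines:
    record = ast.literal_eval(line)
    for metric, value in record.items():
        if isinstance(value, (int, float)):
            sums[metric] += value
            counts[metric] += 1

averages = {m: sums[m] / counts[m] for m in sums}
print(averages)
```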

See egs/*.yaml for configs covering different setups.
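A score config is a YAML list of metric entries, where each name matches a "Key in config" value from the tables below and any extra fields set per-metric parameters. A minimal sketch (the f0min/f0max parameter names are illustrative, not guaranteed):

```yaml
# Minimal illustrative score config (see egs/*.yaml for real examples).
- name: pesq
- name: stoi
- name: mcd_f0
  f0min: 40    # parameter names are illustrative
  f0max: 800
```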

List of Metrics

Independent Metrics

An "x" in the Auto-Install column indicates that the metric is installed automatically with VERSA.

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | x | Deep Noise Suppression MOS Score of P.835 (DNSMOS) | pseudo_mos | dnsmos_overall | speechmos (MS) | paper |
| 2 | x | Deep Noise Suppression MOS Score of P.808 (DNSMOS) | pseudo_mos | dnsmos_p808 | speechmos (MS) | paper |
| 3 | | Non-intrusive Speech Quality and Naturalness Assessment (NISQA) | | | NISQA | paper |
| 4 | x | UTokyo-SaruLab System for VoiceMOS Challenge 2022 (UTMOS) | pseudo_mos | utmos | speechmos | paper |
| 5 | x | Packet Loss Concealment-related MOS Score (PLCMOS) | pseudo_mos | plcmos | speechmos (MS) | paper |
| 6 | x | PESQ in TorchAudio-Squim | squim_no_ref | torch_squim_pesq | torch_squim | paper |
| 7 | x | STOI in TorchAudio-Squim | squim_no_ref | torch_squim_stoi | torch_squim | paper |
| 8 | x | SI-SDR in TorchAudio-Squim | squim_no_ref | torch_squim_si_sdr | torch_squim | paper |
| 9 | x | Singing voice MOS | singmos | singmos | singmos | paper |
| 10 | x | Sheet SSQA MOS Models | sheet_ssqa | sheet_ssqa | Sheet | paper |
| 11 | | UTMOSv2: UTokyo-SaruLab MOS Prediction System | utmosv2 | utmosv2 | UTMOSv2 | paper |
| 12 | | Speech Contrastive Regression for Quality Assessment without reference (ScoreQ) | scoreq_nr | scoreq_nr | ScoreQ | paper |
| 13 | x | Speech enhancement-based SI-SNR | se_snr | se_si_snr | ESPnet | |
| 14 | x | Speech enhancement-based CI-SDR | se_snr | se_ci_sdr | ESPnet | |
| 15 | x | Speech enhancement-based SAR | se_snr | se_sar | ESPnet | |
| 16 | x | Speech enhancement-based SDR | se_snr | se_sdr | ESPnet | |
| 17 | x | PAM: Prompting Audio-Language Models for Audio Quality Assessment | pam | pam | PAM | paper |
| 18 | | Speech-to-Reverberation Modulation energy Ratio (SRMR) | srmr | srmr | SRMRpy | paper |
| 19 | x | Voice Activity Detection (VAD) | vad | vad_info | SileroVAD | |
| 20 | | Speaker Turn Taking (SPK-TT) | | | | |
| 21 | x | Speaker Word Rate (SWR) | | | | |
| 22 | x | Anti-spoofing Score (SpoofS) with AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks | asvspoof_score | asvspoof_score | AASIST | paper |

Dependent Metrics

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | x | Mel Cepstral Distortion (MCD) | mcd_f0 | mcd | espnet and s3prl-vc | paper |
| 2 | x | F0 Correlation | mcd_f0 | f0_corr | espnet and s3prl-vc | paper |
| 3 | x | F0 Root Mean Square Error | mcd_f0 | f0_rmse | espnet and s3prl-vc | paper |
| 4 | x | Signal-to-interference Ratio (SIR) | signal_metric | sir | espnet | - |
| 5 | x | Signal-to-artifact Ratio (SAR) | signal_metric | sar | espnet | - |
| 6 | x | Signal-to-distortion Ratio (SDR) | signal_metric | sdr | espnet | - |
| 7 | x | Convolutional scale-invariant signal-to-distortion ratio (CI-SDR) | signal_metric | ci-sdr | ci_sdr | paper |
| 8 | x | Scale-invariant signal-to-noise ratio (SI-SNR) | signal_metric | si-snr | espnet | paper |
| 9 | x | Perceptual Evaluation of Speech Quality (PESQ) | pesq | pesq | pesq | paper |
| 10 | x | Short-Time Objective Intelligibility (STOI) | stoi | stoi | pystoi | paper |
| 11 | x | Speech BERT Score | discrete_speech | speech_bert | discrete speech metric | paper |
| 12 | x | Discrete Speech BLEU Score | discrete_speech | speech_belu | discrete speech metric | paper |
| 13 | x | Discrete Speech Token Edit Distance | discrete_speech | speech_token_distance | discrete speech metric | paper |
| 14 | | Dynamic Time Warping Cost Metric | warpq | warpq | WARP-Q | paper |
| 15 | | Speech Contrastive Regression for Quality Assessment with reference (ScoreQ) | scoreq_ref | scoreq_ref | ScoreQ | paper |
| 16 | | 2f-Model | | | | |
| 17 | x | Log-Weighted Mean Square Error | log_wmse | log_wmse | log_wmse | |
| 18 | x | ASR-oriented Mismatch Error Rate (ASR-Mismatch) | | | | |
| 19 | | Virtual Speech Quality Objective Listener (VISQOL) | visqol | visqol | google-visqol | paper |
| 20 | | Frequency-Weighted SEGmental SNR (FWSEGSNR) | pysepm | pysepm_fwsegsnr | pysepm | paper |
| 21 | | Weighted Spectral Slope (WSS) | pysepm | pysepm_wss | pysepm | paper |
| 22 | | Cepstrum Distance Objective Speech Quality Measure (CD) | pysepm | pysepm_cd | pysepm | paper |
| 23 | | Composite Objective Speech Quality (composite) | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | pysepm | paper |
| 24 | | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | pysepm | paper |
| 25 | | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | pysepm | paper |

Non-match Metrics

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | | NORESQA: A Framework for Speech Quality Assessment using Non-Matching References | noresqa | noresqa | Noresqa | paper |
| 2 | x | MOS in TorchAudio-Squim | squim_ref | torch_squim_mos | torch_squim | paper |
| 3 | x | ESPnet Speech Recognition-based Error Rate | espnet_wer | espnet_wer | ESPnet | paper |
| 4 | x | ESPnet-OWSM Speech Recognition-based Error Rate | owsm_wer | owsm_wer | ESPnet | paper |
| 5 | x | OpenAI-Whisper Speech Recognition-based Error Rate | whisper_wer | whisper_wer | Whisper | paper |
| 6 | | Emotion2vec similarity (emo2vec) | emo2vec_similarity | emotion_similarity | emo2vec | paper |
| 7 | x | Speaker Embedding Similarity | speaker | spk_similarity | espnet | paper |
| 8 | | NOMAD: Unsupervised Learning of Perceptual Embeddings For Speech Enhancement and Non-Matching Reference Audio Quality Assessment | nomad | nomad | Nomad | paper |
| 9 | | Contrastive Language-Audio Pretraining Score (CLAP Score) | clap_score | clap_score | fadtk | paper |
| 10 | | Accompaniment Prompt Adherence (APA) | apa | apa | Sony-audio-metrics | paper |
| 11 | | Log Likelihood Ratio (LLR) | pysepm | pysepm_llr | pysepm | paper |

Distributional Metrics (under verification)

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | | Frechet Audio Distance (FAD) | fad | fad | fadtk | paper |
| 2 | | Kullback-Leibler Divergence on Embedding Distribution | kl_embedding | kl_embedding | Stability-AI | |
| 3 | | Audio Density Score | audio_density_coverage | audio_density | Sony-audio-metrics | paper |
| 4 | | Audio Coverage Score | audio_density_coverage | audio_coverage | Sony-audio-metrics | paper |
| 5 | | KID: Kernel Distance Metric for Audio/Music Quality | | | KID | paper |

Citation

If you find this repo useful, please cite the following papers:

@misc{shi2024versaversatileevaluationtoolkit,
      title={VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music}, 
      author={Jiatong Shi and Hye-jin Shim and Jinchuan Tian and Siddhant Arora and Haibin Wu and Darius Petermann and Jia Qi Yip and You Zhang and Yuxun Tang and Wangyou Zhang and Dareen Safar Alharthi and Yichen Huang and Koichi Saito and Jionghao Han and Yiwen Zhao and Chris Donahue and Shinji Watanabe},
      year={2024},
      eprint={2412.17667},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2412.17667}, 
}

@misc{shi2024espnetcodeccomprehensivetrainingevaluation,
      title={ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech}, 
      author={Jiatong Shi and Jinchuan Tian and Yihan Wu and Jee-weon Jung and Jia Qi Yip and Yoshiki Masuyama and William Chen and Yuning Wu and Yuxun Tang and Massa Baali and Dareen Alharhi and Dong Zhang and Ruifan Deng and Tejes Srivastava and Haibin Wu and Alexander H. Liu and Bhiksha Raj and Qin Jin and Ruihua Song and Shinji Watanabe},
      year={2024},
      eprint={2409.15897},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2409.15897}, 
}

Acknowledgement

We sincerely thank the authors of all the open-source implementations listed in https://github.com/shinjiwlab/versa/tree/main#list-of-metrics.