VERSA

VERSA (Versatile Evaluation of Speech and Audio) is a toolkit for collecting evaluation metrics of speech and audio quality. Our goal is to provide a comprehensive interface to cutting-edge evaluation techniques. The toolkit is also tightly integrated with ESPnet.

Colab Demonstration

Colab demonstration at the Interspeech 2024 tutorial

Install

The base installation is as simple as:

git clone https://github.com/shinjiwlab/versa.git
cd versa
pip install .

For metric collection, VERSA does not redistribute models; instead, we align as closely as possible with the original API provided by each algorithm's developers. As a result, the toolkit has many dependencies. We include as many as possible by default, but some metrics require extra installation steps. Please refer to the List of Metrics section to check whether a metric is installed automatically; if not, we provide an installation guide or installers in tools.
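Whether an optional metric is usable therefore depends on its backing package being importable. A minimal, generic sketch of probing for such a dependency (the module names below are only placeholders, not actual VERSA dependencies):

```python
import importlib.util


def metric_available(module_name: str) -> bool:
    """Return True if the optional dependency backing a metric can be imported."""
    return importlib.util.find_spec(module_name) is not None


# Probe two illustrative module names: a stdlib module that exists,
# and a placeholder that does not.
print(metric_available("json"))
print(metric_available("definitely_not_a_real_pkg"))
```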

Quick test

python versa/test/test_general.py

# test metrics with additional installation
python versa/test/test_{metric}.py

Usage

Simple usage case for a few samples.

# direct usage
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1 \
    --pred test/test_samples/test2 \
    --output_file test_result

# with scp-style input
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result

# with kaldi-ark style
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result \
    --io kaldi
  
# For text information
python versa/bin/scorer.py \
    --score_config egs/separate_metrics/wer.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result \
    --text test/test_samples/text
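The scp-style inputs above follow the Kaldi convention: one utterance per line, mapping an utterance ID to its audio path. A minimal sketch of writing and reading such a file (IDs and paths are illustrative):

```python
import os
import tempfile

# Illustrative utterance-ID -> wav-path mapping (Kaldi-style scp).
entries = {
    "utt1": "test/test_samples/test1/utt1.wav",
    "utt2": "test/test_samples/test1/utt2.wav",
}

scp_path = os.path.join(tempfile.gettempdir(), "demo.scp")

# Write one "<utt_id> <wav_path>" pair per line.
with open(scp_path, "w") as f:
    for utt_id, wav_path in entries.items():
        f.write(f"{utt_id} {wav_path}\n")

# Parse the file back into a dict.
parsed = {}
with open(scp_path) as f:
    for line in f:
        utt_id, wav_path = line.strip().split(maxsplit=1)
        parsed[utt_id] = wav_path

print(parsed["utt1"])
```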

Use the launcher with Slurm job submissions

# use the launcher
# Option1: with gt speech
./launch.sh \
  <pred_speech_scp> \
  <gt_speech_scp> \
  <score_dir> \
  <split_job_num> 

# Option2: without gt speech
./launch.sh \
  <pred_speech_scp> \
  None \
  <score_dir> \
  <split_job_num>

# aggregate the results
cat <score_dir>/result/*.result.cpu.txt > <score_dir>/utt_result.cpu.txt
cat <score_dir>/result/*.result.gpu.txt > <score_dir>/utt_result.gpu.txt

# show result
python scripts/show_result.py <score_dir>/utt_result.cpu.txt
python scripts/show_result.py <score_dir>/utt_result.gpu.txt 
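The aggregated utt_result files hold per-utterance scores, which scripts/show_result.py summarizes. Assuming each line carries one utterance's metrics as a Python-style dict (an assumption for illustration; the exact line format is defined by the scorer), the averaging step looks roughly like:

```python
import ast
from collections import defaultdict

# Illustrative per-utterance result lines (format assumed, not authoritative).
lines = [
    "{'key': 'utt1', 'pesq': 3.2, 'stoi': 0.91}",
    "{'key': 'utt2', 'pesq': 2.8, 'stoi': 0.87}",
]

# Accumulate every numeric metric across utterances, then average.
sums, counts = defaultdict(float), defaultdict(int)
for line in lines:
    record = ast.literal_eval(line)
    for metric, value in record.items():
        if isinstance(value, (int, float)):
            sums[metric] += value
            counts[metric] += 1

averages = {m: sums[m] / counts[m] for m in sums}
print(averages)
```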

See egs/*.yaml for configs covering different setups.
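A score config is a YAML list of metric entries, where each name matches a "Key in config" value from the tables below and any extra fields set per-metric parameters. A minimal sketch (the f0min/f0max parameter names are illustrative, not guaranteed):

```yaml
# Minimal illustrative score config (see egs/*.yaml for real examples).
- name: pesq
- name: stoi
- name: mcd_f0
  f0min: 40    # parameter names are illustrative
  f0max: 800
```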

List of Metrics

Independent Metrics

An "x" in the Auto-Install column indicates that the metric is installed automatically with VERSA.

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | x | Deep Noise Suppression MOS Score of P.835 (DNSMOS) | pseudo_mos | dnsmos_overall | speechmos (MS) | paper |
| 2 | x | Deep Noise Suppression MOS Score of P.808 (DNSMOS) | pseudo_mos | dnsmos_p808 | speechmos (MS) | paper |
| 3 | | Non-intrusive Speech Quality and Naturalness Assessment (NISQA) | | | NISQA | paper |
| 4 | x | UTokyo-SaruLab System for VoiceMOS Challenge 2022 (UTMOS) | pseudo_mos | utmos | speechmos | paper |
| 5 | x | Packet Loss Concealment-related MOS Score (PLCMOS) | pseudo_mos | plcmos | speechmos (MS) | paper |
| 6 | x | PESQ in TorchAudio-Squim | squim_no_ref | torch_squim_pesq | torch_squim | paper |
| 7 | x | STOI in TorchAudio-Squim | squim_no_ref | torch_squim_stoi | torch_squim | paper |
| 8 | x | SI-SDR in TorchAudio-Squim | squim_no_ref | torch_squim_si_sdr | torch_squim | paper |
| 9 | x | Singing voice MOS | singmos | singmos | singmos | paper |
| 10 | x | Sheet SSQA MOS Models | sheet_ssqa | sheet_ssqa | Sheet | paper |
| 11 | | UTMOSv2: UTokyo-SaruLab MOS Prediction System | utmosv2 | utmosv2 | UTMOSv2 | paper |
| 12 | | Speech Contrastive Regression for Quality Assessment without reference (ScoreQ) | scoreq_nr | scoreq_nr | ScoreQ | paper |
| 13 | x | Speech enhancement-based SI-SNR | se_snr | se_si_snr | ESPnet | |
| 14 | x | Speech enhancement-based CI-SDR | se_snr | se_ci_sdr | ESPnet | |
| 15 | x | Speech enhancement-based SAR | se_snr | se_sar | ESPnet | |
| 16 | x | Speech enhancement-based SDR | se_snr | se_sdr | ESPnet | |
| 17 | x | PAM: Prompting Audio-Language Models for Audio Quality Assessment | pam | pam | PAM | paper |
| 18 | | Speech-to-Reverberation Modulation energy Ratio (SRMR) | srmr | srmr | SRMRpy | paper |
| 19 | x | Voice Activity Detection (VAD) | vad | vad_info | SileroVAD | |
| 20 | | Speaker Turn Taking (SPK-TT) | | | | |
| 21 | x | Speaker Word Rate (SWR) | | | | |
| 22 | x | Anti-spoofing Score (SpoofS) with AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks | asvspoof_score | asvspoof_score | AASIST | paper |

Dependent Metrics

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | x | Mel Cepstral Distortion (MCD) | mcd_f0 | mcd | espnet and s3prl-vc | paper |
| 2 | x | F0 Correlation | mcd_f0 | f0_corr | espnet and s3prl-vc | paper |
| 3 | x | F0 Root Mean Square Error | mcd_f0 | f0_rmse | espnet and s3prl-vc | paper |
| 4 | x | Signal-to-interference Ratio (SIR) | signal_metric | sir | espnet | - |
| 5 | x | Signal-to-artifact Ratio (SAR) | signal_metric | sar | espnet | - |
| 6 | x | Signal-to-distortion Ratio (SDR) | signal_metric | sdr | espnet | - |
| 7 | x | Convolutional scale-invariant signal-to-distortion ratio (CI-SDR) | signal_metric | ci-sdr | ci_sdr | paper |
| 8 | x | Scale-invariant signal-to-noise ratio (SI-SNR) | signal_metric | si-snr | espnet | paper |
| 9 | x | Perceptual Evaluation of Speech Quality (PESQ) | pesq | pesq | pesq | paper |
| 10 | x | Short-Time Objective Intelligibility (STOI) | stoi | stoi | pystoi | paper |
| 11 | x | Speech BERT Score | discrete_speech | speech_bert | discrete speech metric | paper |
| 12 | x | Discrete Speech BLEU Score | discrete_speech | speech_belu | discrete speech metric | paper |
| 13 | x | Discrete Speech Token Edit Distance | discrete_speech | speech_token_distance | discrete speech metric | paper |
| 14 | | Dynamic Time Warping Cost Metric | warpq | warpq | WARP-Q | paper |
| 15 | | Speech Contrastive Regression for Quality Assessment with reference (ScoreQ) | scoreq_ref | scoreq_ref | ScoreQ | paper |
| 16 | | 2f-Model | | | | |
| 17 | x | Log-Weighted Mean Square Error | log_wmse | log_wmse | log_wmse | |
| 18 | x | ASR-oriented Mismatch Error Rate (ASR-Mismatch) | | | | |
| 19 | | Virtual Speech Quality Objective Listener (VISQOL) | visqol | visqol | google-visqol | paper |
| 20 | | Frequency-Weighted SEGmental SNR (FWSEGSNR) | pysepm | pysepm_fwsegsnr | pysepm | paper |
| 21 | | Weighted Spectral Slope (WSS) | pysepm | pysepm_wss | pysepm | paper |
| 22 | | Cepstrum Distance Objective Speech Quality Measure (CD) | pysepm | pysepm_cd | pysepm | paper |
| 23 | | Composite Objective Speech Quality (composite) | pysepm | pysepm_Csig, pysepm_Cbak, pysepm_Covl | pysepm | paper |
| 24 | | Coherence and speech intelligibility index (CSII) | pysepm | pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low | pysepm | paper |
| 25 | | Normalized-covariance measure (NCM) | pysepm | pysepm_ncm | pysepm | paper |

Non-match Metrics

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | | NORESQA: A Framework for Speech Quality Assessment using Non-Matching References | noresqa | noresqa | Noresqa | paper |
| 2 | x | MOS in TorchAudio-Squim | squim_ref | torch_squim_mos | torch_squim | paper |
| 3 | x | ESPnet Speech Recognition-based Error Rate | espnet_wer | espnet_wer | ESPnet | paper |
| 4 | x | ESPnet-OWSM Speech Recognition-based Error Rate | owsm_wer | owsm_wer | ESPnet | paper |
| 5 | x | OpenAI-Whisper Speech Recognition-based Error Rate | whisper_wer | whisper_wer | Whisper | paper |
| 6 | | Emotion2vec similarity (emo2vec) | emo2vec_similarity | emotion_similarity | emo2vec | paper |
| 7 | x | Speaker Embedding Similarity | speaker | spk_similarity | espnet | paper |
| 8 | | NOMAD: Unsupervised Learning of Perceptual Embeddings For Speech Enhancement and Non-Matching Reference Audio Quality Assessment | nomad | nomad | Nomad | paper |
| 9 | | Contrastive Language-Audio Pretraining Score (CLAP Score) | clap_score | clap_score | fadtk | paper |
| 10 | | Accompaniment Prompt Adherence (APA) | apa | apa | Sony-audio-metrics | paper |
| 11 | | Log Likelihood Ratio (LLR) | pysepm | pysepm_llr | pysepm | paper |

Distributional Metrics (under verification)

| Number | Auto-Install | Metric Name | Key in config | Key in report | Code Source | References |
|---|---|---|---|---|---|---|
| 1 | | Frechet Audio Distance (FAD) | fad | fad | fadtk | paper |
| 2 | | Kullback-Leibler Divergence on Embedding Distribution | kl_embedding | kl_embedding | Stability-AI | |
| 3 | | Audio Density Score | audio_density_coverage | audio_density | Sony-audio-metrics | paper |
| 4 | | Audio Coverage Score | audio_density_coverage | audio_coverage | Sony-audio-metrics | paper |
| 5 | | KID: Kernel Distance Metric for Audio/Music Quality | | | KID | paper |

Citation

If you find this repo useful, please cite the following papers:

@misc{shi2024versaversatileevaluationtoolkit,
      title={VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music}, 
      author={Jiatong Shi and Hye-jin Shim and Jinchuan Tian and Siddhant Arora and Haibin Wu and Darius Petermann and Jia Qi Yip and You Zhang and Yuxun Tang and Wangyou Zhang and Dareen Safar Alharthi and Yichen Huang and Koichi Saito and Jionghao Han and Yiwen Zhao and Chris Donahue and Shinji Watanabe},
      year={2024},
      eprint={2412.17667},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2412.17667}, 
}

@misc{shi2024espnetcodeccomprehensivetrainingevaluation,
      title={ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech}, 
      author={Jiatong Shi and Jinchuan Tian and Yihan Wu and Jee-weon Jung and Jia Qi Yip and Yoshiki Masuyama and William Chen and Yuning Wu and Yuxun Tang and Massa Baali and Dareen Alharhi and Dong Zhang and Ruifan Deng and Tejes Srivastava and Haibin Wu and Alexander H. Liu and Bhiksha Raj and Qin Jin and Ruihua Song and Shinji Watanabe},
      year={2024},
      eprint={2409.15897},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2409.15897}, 
}

Acknowledgement

We sincerely thank the authors of all the open-source implementations listed in https://github.com/shinjiwlab/versa/tree/main#list-of-metrics.