
🔥 AudioBench 🔥


⚡ A repository for evaluating AudioLLMs in various tasks 🚀 ⚡
⚡ AudioBench: A Universal Benchmark for Audio Large Language Models 🚀 ⚡

Change log

  • July 2024: We are working hard on the leaderboard and speech translation dataset. Stay tuned!
  • July 2024: Supported all 26 datasets listed in the AudioBench manuscript.

🔧 Installation

Installation with pip:

pip install -r requirements.txt
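
A minimal from-scratch setup might look like the following sketch (the virtual environment is optional and its name is just an example):

# Clone the repository and install dependencies in a fresh environment
git clone https://github.com/AudioLLMs/AudioBench.git
cd AudioBench
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt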

For model-as-judge evaluation, the judge model is served as a service via vLLM on port 5000.
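
The host script used in the Quick Start below (Step 1) essentially starts such a service. As a rough sketch, assuming vLLM's OpenAI-compatible server entrypoint and flags (the actual host_model_judge_llama_3_70b_instruct.sh may differ), it could look like:

# Sketch only: serve Llama-3-70B-Instruct as the judge on port 5000 with vLLM
# --tensor-parallel-size 2 matches the 2x H100 80GB used in the demo below
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --port 5000 \
    --tensor-parallel-size 2

# Quick sanity check that the judge service is up (OpenAI-compatible endpoint)
curl http://localhost:5000/v1/models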

⏩ Quick Start

This example hosts Llama-3-70B-Instruct as the judge model and evaluates the cascade Whisper + Llama-3 model.

# Step 1:
# Serve the judge model.
# The script auto-downloads the model and may require Hugging Face authentication.
# In this demo, we use 2x H100 80GB GPUs to host the model.
# With smaller VRAM, you may need to use a smaller judge model.
bash host_model_judge_llama_3_70b_instruct.sh

# Step 2:
# This example uses 3x H100 80GB GPUs.
# AudioLLM inference runs on GPU 2, since GPUs 0 and 1 host the model-as-judge service.
# This setting evaluates on only 50 samples.
MODEL_NAME=whisper_large_v3_with_llama_3_8b_instruct
GPU=2
BATCH_SIZE=1
METRICS=llama3_70b_judge
OVERWRITE=True
NUMBER_OF_SAMPLES=50

DATASET=cn_college_listen_test

bash eval.sh $DATASET $MODEL_NAME $GPU $BATCH_SIZE $OVERWRITE $METRICS $NUMBER_OF_SAMPLES

# Step 3:
# The results should look like:
#    "llama3_70b_judge": {
#        "judge_score": 3.12,
#        "success_rate": 1.0
#    }

The example above shows how to get started. To evaluate on the full datasets, please refer to Examples.

# After the model weights are downloaded, run the evaluation script on all datasets
bash examples/eval_SALMONN_7B.sh
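
Alternatively, a single model can be evaluated over several datasets with a simple loop over eval.sh; the dataset identifiers below are illustrative guesses based on the table in the next section, so check the dataset configurations in the repository for the exact names:

# Sketch: loop one model over several datasets (dataset names are examples)
MODEL_NAME=whisper_large_v3_with_llama_3_8b_instruct
GPU=2
BATCH_SIZE=1
METRICS=llama3_70b_judge
OVERWRITE=True
NUMBER_OF_SAMPLES=50   # increase or set to the full split size for a complete run

for DATASET in cn_college_listen_test slue_p2_sqa5_test public_sg_speechqa_test; do
    bash eval.sh $DATASET $MODEL_NAME $GPU $BATCH_SIZE $OVERWRITE $METRICS $NUMBER_OF_SAMPLES
done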

📚 Supported Models and Datasets

Datasets

SU=Speech Understanding
  ASR=Automatic Speech Recognition
  SQA=Speech Question Answering
  SI=Speech Instruction

ASU=Audio Scene Understanding
  AC=Audio Captioning
  ASQA=Audio Scene Question Answering

VU=Voice Understanding
  AR=Accent Recognition
  GR=Gender Recognition
  ER=Emotion Recognition
| Dataset | Category | Task | Metrics | Status |
|---|---|---|---|---|
| LibriSpeech-Clean | SU | ASR | WER | |
| LibriSpeech-Other | SU | ASR | WER | |
| CommonVoice-15-EN | SU | ASR | WER | |
| Peoples-Speech | SU | ASR | WER | |
| GigaSpeech | SU | ASR | WER | |
| Earning21 | SU | ASR | WER | |
| Earning22 | SU | ASR | WER | |
| Tedlium3 | SU | ASR | WER | |
| Tedlium3-Longform | SU | ASR | WER | |
| CN-College-Listen | SU | SQA | Model-as-Judge | |
| SLUE-P2-SQA5 | SU | SQA | Model-as-Judge | |
| Public-SG-SpeechQA | SU | SQA | Model-as-Judge | |
| DREAM-TTS | SU | SQA | Model-as-Judge | |
| OpenHermes-Audio | SU | SI | Model-as-Judge | |
| ALPACA-Audio | SU | SI | Model-as-Judge | |
| AudioCaps | ASU | AC | Model-as-Judge / METEOR | |
| WavCaps | ASU | AC | Model-as-Judge / METEOR | |
| Clotho-AQA | ASU | ASQA | Model-as-Judge | |
| AudioCaps-QA | ASU | ASQA | Model-as-Judge | |
| WavCaps-QA | ASU | ASQA | Model-as-Judge | |
| VoxCeleb-Accent | VU | AR | Model-as-Judge | |
| VoxCeleb-Gender | VU | GR | Model-as-Judge | |
| IEMOCAP-Gender | VU | GR | Model-as-Judge | |
| IEMOCAP-Emotion | VU | ER | Model-as-Judge | |
| MELD-Sentiment | VU | ER | Model-as-Judge | |
| MELD-Emotion | VU | ER | Model-as-Judge | |

Models

| Model | Size | Notes | Status |
|---|---|---|---|
| Whisper-Large + Llama-3-8B-Instruct | ~8B | Cascade Models | |
| SALMONN-7B | ~7B | AudioLLM - Fusion Model | |
| Qwen-Audio | ~8B | AudioLLM - Fusion Model | TODO |
| Qwen2-Audio | ~8B | AudioLLM - Fusion Model | TODO |

📖 Citation

If you find our work useful, please consider citing our paper!

@article{wang2024audiobench,
  title={AudioBench: A Universal Benchmark for Audio Large Language Models},
  author={Wang, Bin and Zou, Xunlong and Lin, Geyu and Sun, Shuo and Liu, Zhuohan and Zhang, Wenyu and Liu, Zhengyuan and Aw, AiTi and Chen, Nancy F},
  journal={arXiv preprint arXiv:2406.16020},
  year={2024}
}