
Multimodal Pathological Voice Classification

This repository contains the code for the AI CUP 2023 Spring Multimodal Pathological Voice Classification Competition. We achieved a public ranking of 8th and a private ranking of 1st, corresponding to scores of 0.657057 and 0.641098, respectively.

Getting the code

You can download all the files by cloning this repository:

git clone https://github.com/jwliao1209/Audio-Classification.git

Proposed pipeline

The feature extraction process consists of two parts. In the first part, we apply the Fast Fourier Transform (FFT) to extract frequency features and compute statistical indicators, constructing a global feature vector. The second part uses a deep learning-based pretrained model to extract local features, followed by dimensionality reduction with Principal Component Analysis (PCA) to retain the most relevant feature combinations. For model training, we use machine learning-based tree models, namely Random Forest and LightGBM, as well as TabPFN, a deep learning-based transformer model, for prediction. The predicted probabilities of these models are ensembled to obtain the final output.
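The global-feature step above can be illustrated with a minimal NumPy sketch: take the FFT magnitude spectrum of a waveform and summarize it with a few statistical indicators. The function name and the particular statistics are illustrative choices, not the repository's actual implementation.

```python
import numpy as np

def fft_global_features(wav: np.ndarray) -> np.ndarray:
    """Summarize an audio signal as a global feature vector:
    FFT magnitude spectrum reduced to statistical indicators."""
    spectrum = np.abs(np.fft.rfft(wav))
    return np.array([
        spectrum.mean(),      # average spectral energy
        spectrum.std(),       # spread of energy across frequencies
        spectrum.max(),       # strength of the dominant peak
        np.median(spectrum),  # robust central tendency
        spectrum.argmax(),    # bin index of the dominant frequency
    ], dtype=float)

# Toy example: 1 second of a 440 Hz tone sampled at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
features = fft_global_features(np.sin(2 * np.pi * 440 * t))
```

With a 1-second signal at 16 kHz, each rfft bin spans 1 Hz, so the dominant bin index directly reads off the tone's frequency.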

Requirements

To set up the environment, run the following commands:

conda create --name audio python=3.10
conda activate audio
pip install -r requirements.txt

Data Preprocessing

python WavEncoder.py --train_wav <training_audio_directory> \
                     --public_wav <public_audio_directory> \
                     --private_wav <private_audio_directory> \
                     --train_csv <training_csv_directory> \
                     --public_csv <public_csv_directory> \
                     --private_csv <private_csv_directory> \
                     --output_path <output_path>
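The preprocessing stage reduces the pretrained model's local features with PCA, as described in the pipeline above. As a hedged illustration (the actual script may use scikit-learn), PCA can be written in a few lines of NumPy via the SVD of the centered feature matrix:

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project a (samples x dims) feature matrix onto its
    top-k principal components using the SVD."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # coordinates in PC space

# Hypothetical local features, e.g. 64-dim pretrained-model embeddings.
rng = np.random.default_rng(0)
local = rng.normal(size=(100, 64))
reduced = pca_reduce(local, 8)                     # keep 8 components
```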

Reproducing training results

bash ./run_reproduce.sh
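The reproduction script trains the individual models and ensembles their predicted probabilities. A minimal sketch of this probability-level soft voting, with made-up numbers standing in for the Random Forest, LightGBM, and TabPFN outputs:

```python
import numpy as np

# Hypothetical predicted class probabilities from three models;
# rows are samples, columns are classes.
p_rf   = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
p_lgbm = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_tab  = np.array([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]])

ensemble = (p_rf + p_lgbm + p_tab) / 3  # average the probabilities
labels = ensemble.argmax(axis=1)        # final predicted class per sample
```

Averaging probabilities rather than hard labels lets a confident model outvote two uncertain ones, which is why the pipeline ensembles at the probability level.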

Operating system and device

We developed the code on Ubuntu 22.04.1 LTS with Python 3.10. All training was performed on a server with an Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz and an NVIDIA GeForce GTX 1080 Ti GPU.

Citation

@misc{liao2023multimodal,
    title  = {Multimodal Pathological Voice Classification},
    author = {Chun-Hsien Chen and Shu-Cheng Zheng and Jia-Wei Liao and Yi-Cheng Hung},
    url    = {https://github.com/jwliao1209/Audio-Classification},
    year   = {2023}
}
