
[ICTC-2024] - "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture" by Nhut Minh Nguyen, Thanh Trung Nguyen, Hua Hiep Nguyen, Phuong-Nam Tran, Duc Ngoc Minh Dang


Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture

Official implementation of the paper: Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture. The paper has been accepted to the 15th International Conference on ICT Convergence (ICTC 2024).

Please star ⭐ this repository and/or cite the paper if you find it helpful.

Abstract

In this study, we compared three architectures for the task of age and gender recognition from voice data: Long Short-Term Memory networks (LSTM), a hybrid of Convolutional Neural Networks and Bidirectional Long Short-Term Memory (CNNs-BiLSTM), and the recently released RezoNet architecture. The dataset used in the study is the Japanese subset of Mozilla Common Voice. Features such as pitch, magnitude, Mel-frequency cepstral coefficients (MFCCs), and filter-bank energies were extracted from the voice data, and the three architectures were evaluated on them. For gender recognition, LSTM achieved the best accuracy (93.5%), closely followed by hybrid CNNs-BiLSTM (93.1%), while RezoNet reached 83.1%. For age recognition, the hybrid CNNs-BiLSTM architecture outperformed the other models with an accuracy of 69.75%, compared to 64.25% and 44.88% for LSTM and RezoNet, respectively. On Japanese language data with the extracted features, the hybrid CNNs-BiLSTM architecture delivered the strongest overall performance across the two tasks, highlighting its efficacy in voice-based age and gender detection. These results suggest promising avenues for future research and practical applications in this field.

Index Terms: Voice-Based Age and Gender Recognition, RezoNet, Convolutional Neural Network, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Deep Learning.


Usage

Dataset

This study uses the Japanese voice dataset from Mozilla Common Voice.

Download the dataset from the Mozilla Common Voice website.
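
As a rough illustration of how the age and gender labels might be read once the dataset is downloaded, the sketch below keeps only the clips that carry both labels. It assumes the tab-separated `validated.tsv` metadata layout with `path`, `age`, and `gender` columns found in Common Voice releases; the function name `load_labeled_clips` and the sample rows are hypothetical and are not part of this repository.

```python
import csv
import io
from collections import Counter


def load_labeled_clips(tsv_file):
    """Return (path, age, gender) tuples for rows that have both labels.

    Assumes a tab-separated file with a header row containing at least
    the columns `path`, `age`, and `gender` (an assumption about the
    Common Voice metadata layout, not this repository's own loader).
    """
    # QUOTE_NONE keeps quotation marks inside transcripts from being
    # interpreted as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    clips = []
    for row in reader:
        age = (row.get("age") or "").strip()
        gender = (row.get("gender") or "").strip()
        if age and gender:
            clips.append((row["path"], age, gender))
    return clips


# Hypothetical sample rows standing in for a real validated.tsv file.
SAMPLE_TSV = (
    "path\tage\tgender\n"
    "clip_0001.mp3\ttwenties\tmale\n"
    "clip_0002.mp3\t\tfemale\n"
    "clip_0003.mp3\tthirties\tfemale\n"
    "clip_0004.mp3\t\t\n"
)

clips = load_labeled_clips(io.StringIO(SAMPLE_TSV))
age_counts = Counter(age for _, age, _ in clips)
gender_counts = Counter(gender for _, _, gender in clips)
print(len(clips), dict(age_counts), dict(gender_counts))
```

With a real release, open the metadata file with `open(path, encoding="utf-8", newline="")` and pass the file handle to the function in place of the in-memory sample.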

Clone this repository

git clone "https://github.com/nhut-ngnn/Voice-Based-Age-and-Gender-Recogniton.git"

Create a Conda environment and install the requirements

conda create -n Voice-Based-Age-and-Gender-Recogniton python=3.10 -y
conda activate Voice-Based-Age-and-Gender-Recogniton
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
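
After the commands above finish, a quick check like the following may help confirm that the environment is usable. This is a minimal sketch using only the Python standard library; the helper name `check_environment` is hypothetical and is not shipped with this repository. It only reports whether each package can be located and does not verify versions or CUDA support.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "torchvision", "torchaudio")):
    """Report the Python version and which packages can be found.

    Uses importlib.util.find_spec, so top-level packages are located
    without being imported.
    """
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated Conda environment; a `False` entry suggests the corresponding package was not installed into that environment.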

Contact

For further information, please contact the main author:

Nhut Minh Nguyen at FPT University, Vietnam

Email: minhnhut.ngnn@gmail.com

GitHub: https://github.com/nhut-ngnn
