Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture

Official implementation for the paper: Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture. The paper has been accepted to The 15th International Conference on ICT Convergence (ICTC2024).

Please press ⭐ button and/or cite papers if you feel helpful.

Abstract

In this study, we compared three architectures for the task of age and gender recognition from voice data: Long Short-Term Memory networks (LSTM), Hybrid of Convolutional Neural Networks Bidirectional Long Short-Term Memory (CNNs-BiLSTM), and the recently released RezoNet architecture. The dataset used in the study is sourced from Mozilla Common Voice in Japanese. Features such as pitch, magnitude, Mel-frequency cepstral coefficients (MFCCs), and filter-bank energies were extracted from the voice data for signal processing, and three architectures were evaluated. Our evaluation revealed that LSTM was slightly less accurate than RezoNet (83.1%), with hybrid CNNs-BiLSTM (93.1%) and LSTM achieving the best accuracy for gender recognition (93.5%). However, hybrid CNNs-BiLSTM architecture outperformed the other models in age recognition, with an accuracy of 69.75%, compared to 64.25% and 44.88% for LSTM and RezoNet, respectively. Using Japanese language data and the extracted characteristics, the hybrid CNNs-BiLSTM architecture model demonstrated the highest accuracy in both tests, highlighting its efficacy in voice-based age and gender detection. These results suggest promising avenues for future research and practical applications in this field.

Index Terms: Voice-Based Age and Gender Recognition, RezoNet, Convolutional Neural Network, Long Short-Term Memory, Bidirectional Long-Term Memory, Deep Learning.

Usage

Dataset

In this study, we use voice dataset from Mozilla Comman Voice.

Download in here

Clone this repository

git clone "https://github.com/nhut-ngnn/Voice-Based-Age-and-Gender-Recogniton.git"

Create Conda Enviroment and Install Requirement

conda create -n Voice-Based-Age-and-Gender-Recogniton python=3.10 -y
conda activate Voice-Based-Age-and-Gender-Recogniton
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt

Contact

For any information, please contact the main author:

Nhut Minh Nguyen at FPT University, Vietnam

Email: minhnhut.ngnn@gmail.com

GitHub: https://github.com/nhut-ngnn

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
Code		Code
images		images
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture

Abstract

Table of Contents

Usage

Dataset

Clone this repository

Create Conda Enviroment and Install Requirement

Contact

About

Languages

License

nhut-ngnn/Voice-Based-Age-and-Gender-Recogniton

Folders and files

Latest commit

History

Repository files navigation

Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet, and Hybrid CNNs-BiLSTM Architecture

Abstract

Table of Contents

Usage

Dataset

Clone this repository

Create Conda Enviroment and Install Requirement

Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Languages