Voice Gender Recognition

Overview

This project trains a machine learning model to classify voices as male or female based on their acoustic properties. The training dataset consists of 36,168 voice samples collected from a Yandex contest. The best-performing model reaches 98% accuracy in cross-validation.

Dataset Information

You can download the raw dataset, which includes .wav files and a corresponding .csv file with labels, from the following link: Raw Dataset.

Alternatively, a pre-processed version of the dataset, reduced to 3.5k rows because of hardware limitations, is available here: Pre-processed Dataset.

Acoustic Properties

The following acoustic properties are measured for each voice sample:

meanfreq: Mean frequency (in kHz)
sd: Standard deviation of frequency
median: Median frequency (in kHz)
Q25: First quartile (in kHz)
Q75: Third quartile (in kHz)
IQR: Interquartile range (in kHz)
skew: Skewness
kurt: Kurtosis
sp.ent: Spectral entropy
sfm: Spectral flatness
mode: Mode frequency
centroid: Frequency centroid
peakf: Peak frequency (frequency with highest energy)
meanfun: Average fundamental frequency measured across the acoustic signal
minfun: Minimum fundamental frequency measured across the acoustic signal
maxfun: Maximum fundamental frequency measured across the acoustic signal
meandom: Average dominant frequency measured across the acoustic signal
mindom: Minimum dominant frequency measured across the acoustic signal
maxdom: Maximum dominant frequency measured across the acoustic signal
dfrange: Range of dominant frequency measured across the acoustic signal
modindx: Modulation index, calculated as the accumulated absolute difference between adjacent measurements of fundamental frequencies divided by the frequency range
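
As a rough illustration of how a few of these properties can be derived, the sketch below computes meanfreq, sd, median, Q25, Q75 and IQR from a signal's magnitude spectrum, and modindx from a sequence of fundamental-frequency measurements. It is not the project's extractor: the function names `spectral_stats` and `modulation_index` are hypothetical, treating the normalized magnitude spectrum as a weight distribution is an assumption, and so is returning 0 for a flat pitch contour.

```python
import numpy as np


def spectral_stats(signal, sample_rate):
    """Frequency statistics (in kHz) of a mono signal's magnitude spectrum."""
    # Magnitude spectrum of the real-valued signal and its frequency axis in kHz.
    magnitude = np.abs(np.fft.rfft(signal))
    freqs_khz = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate) / 1000.0

    # Assumption: treat the normalized spectrum as a distribution over frequency.
    weights = magnitude / magnitude.sum()
    cumulative = np.cumsum(weights)

    def quantile(q):
        # First frequency bin at which the cumulative weight reaches q.
        return float(freqs_khz[np.searchsorted(cumulative, q)])

    meanfreq = float(np.sum(freqs_khz * weights))
    sd = float(np.sqrt(np.sum(weights * (freqs_khz - meanfreq) ** 2)))
    q25, median, q75 = quantile(0.25), quantile(0.50), quantile(0.75)
    return {
        "meanfreq": meanfreq,
        "sd": sd,
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
    }


def modulation_index(fundamental_freqs):
    """Accumulated absolute difference between adjacent values over the range."""
    f0 = np.asarray(fundamental_freqs, dtype=float)
    freq_range = f0.max() - f0.min()
    if freq_range == 0:
        # Assumption: a constant contour has no modulation.
        return 0.0
    return float(np.sum(np.abs(np.diff(f0))) / freq_range)


if __name__ == "__main__":
    # One second of a pure 200 Hz tone sampled at 8 kHz.
    sr = 8000
    t = np.arange(sr) / sr
    print(spectral_stats(np.sin(2 * np.pi * 200 * t), sr))
    print(modulation_index([0.10, 0.12, 0.11, 0.15]))
```

For a pure tone almost all spectral weight sits in a single bin, so the mean, median and both quartiles coincide at the tone's frequency and the IQR is zero. Real speech spreads energy over many bins, which is what makes these statistics informative.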

Technology Stack

The project utilizes the following libraries and tools:

Pandas: For data manipulation and analysis
NumPy: For numerical operations
Seaborn and Matplotlib: For data visualization
Scikit-learn: For machine learning algorithms and model evaluation
XGBoost: For gradient boosting algorithms
Librosa: For audio processing and feature extraction
concurrent.futures (Python standard library): For parallel processing
logging (Python standard library): For logging errors and warnings
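
One way the parallel-processing and logging pieces might fit together is sketched below: files are processed concurrently, and any file that fails is logged and skipped instead of aborting the run. The names `extract_features` and `extract_all` are hypothetical, and the worker here is a trivial stand-in, not the project's real extractor.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("feature_extraction")


def extract_features(path):
    """Hypothetical stand-in for a per-file feature extractor."""
    if not path.endswith(".wav"):
        raise ValueError(f"unsupported file type: {path}")
    return {"path": path}


def extract_all(paths, worker=extract_features, max_workers=4):
    """Run `worker` over `paths` concurrently; log and collect failures."""
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input so failures can be reported by path.
        futures = {pool.submit(worker, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:
                logger.warning("Skipping %s: %s", path, exc)
                failed.append(path)
    return results, failed


if __name__ == "__main__":
    ok, bad = extract_all(["a.wav", "notes.txt", "b.wav"])
    print(len(ok), "processed;", len(bad), "failed")
```

A thread pool is used here to keep the sketch simple. `ProcessPoolExecutor` offers the same `submit` interface and may suit CPU-bound extraction better. Results arrive in completion order, so sort them if a stable order matters.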

Environment Setup

Create a virtual environment:
```
python -m venv env
```
Activate it (Linux or macOS):
```
source env/bin/activate
```
Install the dependencies:
```
pip install -r requirements.txt
```

