Feature extraction of speech signal is the initial stage of any speech recognition system.

aishoot/Speech_Feature_Extraction

Speech Feature Extraction

The repository describes the feature extraction methods for speech signals.

Free speech datasets

  • OpenSLR: a site devoted to hosting speech and language resources, such as training corpora for speech recognition, and software related to speech recognition.
  • VoxForge: VoxForge mirrors the Open Speech Data Corpus for German from the LT and Telecooperation groups, with 35 hours of speech from about 180 speakers.
  • TIMIT: The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
  • Mozilla Speech: Mozilla released the world's second-largest public voice data set on Nov 29th, 2017.
  • Open Data for Deep Learning

File description

  • feature_extraction_functions.py: a set of feature extraction functions from RDShi-SpeakerCount.
  • MFCC: Mel-frequency cepstral coefficients calculation.
    • MFCC.py, MFCCTest.py: Compute the MFCC feature.
    • FeatureExtraction.ipynb: Speech preprocessing, including loading data, pre-emphasis, framing, windowing, Fourier transform, power spectrum, filter banks, MFCCs and mean normalization.
  • Volume: volume calculation.
  • ZeroCR: Zero-Crossing Rate calculation.
  • Pitch: Pitch calculation and pitch tracking.
  • Timbre: spectrogram drawing.
  • VAD: Voice Activity Detection (VAD), also known as End-Point Detection (EPD) or speech detection.
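
To make the frame-level features above concrete, here is a minimal pure-Python sketch of framing, volume and zero-crossing rate. It is an illustration only, not the code in the Volume or ZeroCR folders: the function names, the frame and hop sizes, and the choice of volume as the sum of absolute amplitudes after mean removal are all assumptions made for this example.

```python
import math


def frame_signal(samples, frame_size, hop_size):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_size
    return [samples[start:start + frame_size]
            for start in range(0, last_start + 1, hop_size)]


def volume(frame):
    """Sum of absolute amplitudes after removing the frame mean (DC offset)."""
    mean = sum(frame) / len(frame)
    return sum(abs(x - mean) for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Hypothetical input: 0.1 s of a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

frames = frame_signal(signal, frame_size=256, hop_size=128)
for index, frame in enumerate(frames):
    print(index, round(volume(frame), 2), round(zero_crossing_rate(frame), 4))
```

Volume and zero-crossing rate computed per frame like this are commonly combined with thresholds for simple end-point detection; the thresholds themselves depend on the recording conditions and are not shown here.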

Requirements

Anaconda3 (Python3.x)
