Skip to content

Aurora11111/speaker-recognition-pytorch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

speaker recognition

PyTorch implementation of speech embedding net and loss described here: https://arxiv.org/pdf/1710.10467.pdf.

Also contains code to create embeddings compatible as input for the speaker diarization model found at https://github.com/google/uis-rnn

training loss

The TIMIT speech corpus was used to train the model, found here: https://catalog.ldc.upenn.edu/LDC93S1, or here, https://github.com/philipperemy/timit

Dependencies

  • PyTorch 0.4.1
  • python 3.5+
  • numpy 1.15.4
  • librosa 0.6.1

The python WebRTC VAD found at https://github.com/wiseman/py-webrtcvad is required to create run dvector_create.py, but not to train the neural network.

Preprocessing

Change the following config.yaml key to a regex containing all .WAV files in your downloaded TIMIT dataset. The TIMIT .WAV files must be converted to the standard format (RIFF) for the dvector_create.py script, but not for training the neural network.

unprocessed_data: './TIMIT/*/*/*/*.wav'

Run the preprocessing script:

./data_preprocess.py 

Two folders will be created, train_tisv and test_tisv, containing .npy files containing numpy ndarrays of speaker utterances with a 90%/10% training/testing split.

GE2E-loss model training

To train the speaker verification model, run:

./train_speech_embedder.py 

with the following config.yaml key set to true:

training: !!bool "true"

for testing, set the key value to:

training: !!bool "false"

The log file and checkpoint save locations are controlled by the following values:

log_file: './speech_id_checkpoint/Stats'
checkpoint_dir: './speech_id_checkpoint'

Only TI-SV is implemented.

Performance

EER across 10 epochs: 0.0377

D vector embedding creation

After training and testing the model, run dvector.py to create the data.pkl

The file can be loaded and used to train the triple-loss model.

triplet-loss model training

After create dvector,we use the triplet loss to train a model which are discribed here: https://arxiv.org/pdf/1705.02304.pdf run train.py

Reference

When reference speakers,run cli.py

https://github.com/HarryVolek/PyTorch_Speaker_Verification

https://github.com/philipperemy/deep-speaker

About

Speaker recognition ,Voiceprint recognition

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages