speaker recognition

PyTorch implementation of the speech embedding network and the generalized end-to-end (GE2E) loss described in https://arxiv.org/pdf/1710.10467.pdf.
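The core of the GE2E softmax loss can be sketched in NumPy. This is an illustrative forward computation only. It has no gradients and is not this repository's PyTorch code. It assumes a batch of N speakers × M utterances of D-dimensional embeddings. In the paper the scale w and bias b are learned. Here they are fixed to the initial values (10, -5) the paper reports.

```python
import numpy as np


def _cosine(a, b):
    # cosine similarity along the last axis (broadcasts over leading axes)
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))


def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """Sketch of the GE2E softmax loss for embeddings shaped (N speakers, M >= 2 utterances, D)."""
    emb = np.asarray(emb, dtype=float)
    n, m, _ = emb.shape
    emb = emb / np.linalg.norm(emb, axis=2, keepdims=True)  # L2-normalize each embedding
    centroids = emb.mean(axis=1)                              # (N, D) speaker centroids
    # exclusive centroids: utterance i is left out of its own speaker's centroid
    excl = (emb.sum(axis=1, keepdims=True) - emb) / (m - 1)  # (N, M, D)
    # cosine similarity of every utterance to every centroid -> (N, M, N)
    sim = _cosine(emb[:, :, None, :], centroids[None, None, :, :])
    idx = np.arange(n)
    sim[idx, :, idx] = _cosine(emb, excl)  # own-speaker entries use the exclusive centroid
    s = w * sim + b                         # scaled similarity matrix S_ji,k
    pos = s[idx, :, idx]                    # S_ji,j for every utterance, shape (N, M)
    # numerically stable log-sum-exp over the centroid axis k
    peak = s.max(axis=2, keepdims=True)
    lse = peak[..., 0] + np.log(np.exp(s - peak).sum(axis=2))
    # L(e_ji) = -S_ji,j + log sum_k exp(S_ji,k), summed over all utterances
    return float((lse - pos).sum())
```

Well-separated speakers drive each term toward zero, because the own-speaker similarity dominates the log-sum-exp.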

It also contains code to create embeddings that can be used as input to the speaker diarization model at https://github.com/google/uis-rnn.
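As we understand the uis-rnn README, its training input is a 2-D array of shape (N, D) with one row per observation, plus a length-N sequence of string cluster (speaker) IDs. The sketch below packs time-ordered segment d-vectors into that shape. The `(speaker_label, d_vector)` segment layout and the `to_uisrnn_inputs` name are hypothetical. dvector_create.py may produce its output differently.

```python
import numpy as np


def to_uisrnn_inputs(segments):
    """Hypothetical helper: segments is a time-ordered list of (speaker_label, d_vector) pairs."""
    # one row per segment, stacked into an (N, D) float array
    sequence = np.stack([np.asarray(vec, dtype=float) for _, vec in segments])
    # uis-rnn expects string cluster IDs, one per row of the sequence
    cluster_ids = np.array([str(label) for label, _ in segments])
    return sequence, cluster_ids
```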

The model was trained on the TIMIT speech corpus, available at https://catalog.ldc.upenn.edu/LDC93S1 or https://github.com/philipperemy/timit.

Dependencies

PyTorch 0.4.1
Python 3.5+
numpy 1.15.4
librosa 0.6.1

The Python WebRTC VAD at https://github.com/wiseman/py-webrtcvad is required to run dvector_create.py, but not to train the neural network.
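The WebRTC VAD classifies short frames of 16-bit mono PCM audio. py-webrtcvad accepts sample rates of 8, 16, 32 or 48 kHz and frame lengths of 10, 20 or 30 ms. A minimal frame splitter might look like this. The `pcm_frames` helper is hypothetical, not code from this repository.

```python
def pcm_frames(pcm, sample_rate=16000, frame_ms=30):
    """Hypothetical helper: split 16-bit mono PCM bytes into VAD-sized frames."""
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError("unsupported sample rate for WebRTC VAD")
    if frame_ms not in (10, 20, 30):
        raise ValueError("frame length must be 10, 20 or 30 ms")
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per 16-bit sample
    # a trailing partial frame is dropped because the VAD rejects odd-sized frames
    return [pcm[i:i + frame_bytes]
            for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)]
```

Each frame can then be classified with `webrtcvad.Vad(mode).is_speech(frame, sample_rate)`, where `mode` is an aggressiveness level from 0 to 3.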

Preprocessing

Set the following config.yaml key to a glob pattern that matches all .WAV files in your downloaded TIMIT dataset. The original TIMIT .WAV files use the NIST SPHERE format. They must be converted to standard RIFF WAV files before running dvector_create.py, but this conversion is not needed to train the neural network.

unprocessed_data: './TIMIT/*/*/*/*.wav'
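The `unprocessed_data` value is a shell-style glob, not a regular expression. The sketch below shows how such a pattern expands with Python's `glob` module. It also shows how to tell converted RIFF files from untouched NIST SPHERE files by their first bytes. It assumes the usual TIMIT layout `<TRAIN|TEST>/<dialect region>/<speaker>/<utterance>.wav`. Both helper names are hypothetical.

```python
import glob
import os


def find_timit_wavs(root="./TIMIT"):
    """Hypothetical helper: list utterance files matching root/*/*/*/*.wav."""
    # set / dialect region / speaker / utterance file
    return sorted(glob.glob(os.path.join(root, "*", "*", "*", "*.wav")))


def is_riff(path):
    """True if the file starts with a RIFF header (NIST SPHERE files start with b'NIST_1A')."""
    with open(path, "rb") as f:
        return f.read(4) == b"RIFF"
```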

Run the preprocessing script:

./data_preprocess.py

Two folders, train_tisv and test_tisv, will be created. They contain .npy files holding NumPy arrays of speaker utterances, split 90%/10% between training and testing.
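A 90%/10% split into the two folders could look like the sketch below. It assumes the split is made by speaker, so test speakers are unseen during training. It also assumes one array of utterance features per speaker. The `split_and_save` helper and the `speaker<i>.npy` naming are hypothetical and may not match data_preprocess.py.

```python
import os

import numpy as np


def split_and_save(speakers, out_dir="."):
    """Hypothetical helper: speakers is a list of per-speaker feature arrays."""
    n_train = len(speakers) * 9 // 10  # integer arithmetic avoids float rounding
    subsets = (("train_tisv", speakers[:n_train]), ("test_tisv", speakers[n_train:]))
    for folder, subset in subsets:
        path = os.path.join(out_dir, folder)
        if not os.path.isdir(path):
            os.makedirs(path)
        for i, utterances in enumerate(subset):
            # one .npy file per speaker holding all of that speaker's utterances
            np.save(os.path.join(path, "speaker%d.npy" % i), utterances)
    return n_train
```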

GE2E-loss model training

To train the speaker verification model, run:

./train_speech_embedder.py

with the following config.yaml key set to true:

training: !!bool "true"

For testing, set the key to:

training: !!bool "false"

The log file and checkpoint save locations are controlled by the following values:

log_file: './speech_id_checkpoint/Stats'
checkpoint_dir: './speech_id_checkpoint'

Only text-independent speaker verification (TI-SV) is implemented.

Performance

Equal error rate (EER) across 10 epochs: 0.0377
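The EER is the operating point where the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). A simple way to estimate it is to sweep a threshold over the observed scores, as sketched below. This is an illustration, not necessarily how train_speech_embedder.py computes its figure.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """scores: similarity per trial; labels: 1 for same-speaker trials, 0 otherwise."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(scores[~labels] >= t)  # impostor trials accepted at threshold t
        frr = np.mean(scores[labels] < t)    # genuine trials rejected at threshold t
        if abs(far - frr) < best_gap:
            # keep the threshold where FAR and FRR are closest; report their mean
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```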

D-vector embedding creation

After training and testing the model, run dvector.py to create data.pkl.

This file can be loaded and used to train the triplet-loss model.
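The GE2E paper forms an utterance-level d-vector by embedding overlapping windows, L2-normalizing each window embedding, and averaging them. The sketch below shows that aggregation plus a pickle round trip. The final re-normalization is an extra choice that makes cosine scoring a plain dot product. The `{speaker_id: [d-vector, ...]}` layout and the helper names are hypothetical. The actual contents of data.pkl may differ.

```python
import pickle

import numpy as np


def utterance_dvector(window_embeddings):
    """Average L2-normalized window embeddings of shape (W, D) into one d-vector."""
    w = np.asarray(window_embeddings, dtype=float)
    w = w / np.linalg.norm(w, axis=1, keepdims=True)  # normalize each window embedding
    d = w.mean(axis=0)
    return d / np.linalg.norm(d)  # optional re-normalization of the averaged vector


def save_dvectors(dvectors, path="data.pkl"):
    # hypothetical layout: {speaker_id: [d-vector, ...]}
    with open(path, "wb") as f:
        pickle.dump(dvectors, f)


def load_dvectors(path="data.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)
```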

triplet-loss model training

After creating the d-vectors, run train.py to train a model with the triplet loss described in https://arxiv.org/pdf/1705.02304.pdf.
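The Deep Speaker paper uses a triplet loss on cosine similarities. It penalizes an anchor whose similarity to a different-speaker negative is not at least a margin below its similarity to a same-speaker positive. The sketch below assumes batched (B, D) arrays and averages over the batch. The paper sums instead. The `margin=0.1` default is a hypothetical choice, not a value taken from train.py.

```python
import numpy as np


def cosine(a, b):
    # row-wise cosine similarity for arrays shaped (B, D)
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Mean of max(s_an - s_ap + margin, 0) over the batch (margin is a hypothetical default)."""
    anchor, positive, negative = (np.asarray(x, dtype=float) for x in (anchor, positive, negative))
    s_ap = cosine(anchor, positive)  # similarity to the same-speaker utterance
    s_an = cosine(anchor, negative)  # similarity to the different-speaker utterance
    return float(np.maximum(s_an - s_ap + margin, 0.0).mean())
```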

Reference

To reference speakers, run cli.py.
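A common way to match a new utterance against enrolled reference speakers is to compare d-vectors by cosine similarity. You pick the closest speaker and reject the match below a threshold. The sketch below illustrates that idea only. The `identify` helper and its `threshold=0.7` default are hypothetical and do not document what cli.py actually does.

```python
import numpy as np


def identify(query, enrolled, threshold=0.7):
    """Hypothetical helper: return (best speaker or None, best cosine score)."""
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        r = np.asarray(ref, dtype=float)
        score = float(q @ (r / np.linalg.norm(r)))  # cosine similarity of unit vectors
        if score > best_score:
            best_name, best_score = name, score
    # reject the match when even the closest reference is not similar enough
    return (best_name if best_score >= threshold else None), best_score
```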

https://github.com/HarryVolek/PyTorch_Speaker_Verification

https://github.com/philipperemy/deep-speaker

Aurora11111/speaker-recognition-pytorch

Folders and files

Latest commit

History

Repository files navigation

speaker recognition

Dependencies

Preprocessing

GE2E-loss model training

Performance

D vector embedding creation

triplet-loss model training

Reference

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages