TensorFlow and Kaldi implementation of our Interspeech 2019 paper, "VAE-based regularization for deep speaker embedding".
Note: this repo is not the final release. I will clean up our experimental code and update it soon.
- computer
- Linux (CentOS 7)
- conda (Python 3.6)
- tensorflow-gpu 1.8
- Kaldi toolkit
- VoxCeleb
- SITW
- CSLT_SITW
- use Kaldi to extract x-vectors from utterances and get `xvector.ark` files
- convert the Kaldi `xvector.ark` files to NumPy binary data format (`xvector.ark` -> `xvector.npz`)
- use TensorFlow to train a VAE model, and get the V-vectors
- use Kaldi recipes to calculate EER (equal error rate)
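
The `xvector.ark` -> `xvector.npz` conversion step can be sketched roughly as follows. This is a minimal illustration, not the repo's own converter: it assumes the binary archive has first been dumped to Kaldi's text format (for example with `copy-vector ark:xvector.ark ark,t:xvector.txt`), where each line looks like `utt_id  [ v1 v2 ... ]`. The function names and the `utt_ids` / `vectors` array keys are hypothetical, and the layout this repo actually stores in its `.npz` files may differ.

```python
def parse_text_ark(lines):
    """Parse Kaldi text-format vectors, one 'utt_id  [ v1 v2 ... ]' per line."""
    utt_ids, vectors = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, _, rest = line.partition("[")
        values = rest.rstrip("]").split()
        utt_ids.append(utt_id.strip())
        vectors.append([float(v) for v in values])
    return utt_ids, vectors


def save_npz(path, utt_ids, vectors):
    """Store utterance IDs and vectors in a single .npz file.

    The key names 'utt_ids' and 'vectors' are an assumption for this sketch.
    """
    # Imported here so the parser above has no third-party dependency.
    import numpy as np

    np.savez(
        path,
        utt_ids=np.array(utt_ids),
        vectors=np.array(vectors, dtype=np.float32),
    )
```

With a text dump in hand, `save_npz("xvector.npz", *parse_text_ark(open("xvector.txt")))` would write the file, and `numpy.load("xvector.npz")["vectors"]` would return the vectors as one array with a row per utterance.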
- install Kaldi (note: if you are a CSLT member, you can refer to Dr. tzy's Kaldi or CSLT Kaldi)
- create a conda environment and install the necessary Python packages
# for example
conda create -n tf python=3.6
conda activate tf
pip install -r requirements.txt
- git clone the code and modify `path.sh`; make sure that `path.sh` contains your Kaldi path
git clone https://github.com/zyzisyz/v-vector-tf.git
cd v-vector-tf
# edit path.sh
vim path.sh
# export KALDI_ROOT=${replace it by your kaldi root path}
- calculate baseline EER
bash baseline.sh
- Train a model
# first of all, activate the conda Python environment
conda activate tf
# you can edit train.sh to change VAE model's config
bash train.sh
- Use the Kaldi toolkit to train the back-end scoring model and calculate EER
bash eval.sh
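
For readers unfamiliar with what the VAE training step optimizes, the textbook VAE objective is sketched below in plain Python. This is not taken from this repo's TensorFlow code: it assumes a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term, and all function names are hypothetical. The model configured in `train.sh` may use different terms or weights.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one sample."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def vae_loss(x, x_recon, mu, log_var):
    """Squared-error reconstruction term plus the KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)
```

In this formulation the KL term pulls the latent codes toward a standard normal distribution, which is the regularizing effect; the encoder mean `mu` is what would typically be kept as the embedding after training.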
SITW Dev. Core, EER (%)

| | Cosine | PCA | PLDA | L-PLDA | P-PLDA |
|---|---|---|---|---|---|
| x-vector | 15.67 | 16.17 | 9.09 | 3.12 | 4.16 |
| a-vector | 16.10 | 16.48 | 11.21 | 4.24 | 5.01 |
| v-vector | 10.32 | 9.94 | 3.62 | 3.54 | 4.31 |
| c-vector | 9.05 | 8.55 | 3.50 | 3.31 | 3.85 |
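
As a rough illustration of how a number in the Cosine column comes about, the sketch below scores trial pairs by cosine similarity and then estimates the equal error rate by sweeping a threshold over the observed scores. It is a simplified stand-in for the Kaldi scoring recipes used by `eval.sh`, not a reimplementation of them; the function names are hypothetical, and the result is a fraction rather than a percentage.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compute_eer(target_scores, nontarget_scores):
    """Approximate EER: the point where false accepts and false rejects balance.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

Target trials compare two utterances from the same speaker and non-target trials compare different speakers; a lower EER means the two score distributions overlap less.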
Read the paper for more details.
Licensed under the Apache License, Version 2.0, Copyright zyzisyz
Yang Zhang (zyziszy@foxmail.com)