Tensorflow and kaldi implementation of our paper "VAE-based regularization for deep speaker embedding"

CSLT-THU/IS2019-VAE

v-vector-tf

TensorFlow and Kaldi implementation of our Interspeech 2019 paper "VAE-based regularization for deep speaker embedding"

Note: this repository is not the final release. I will clean up our experimental code and update it soon.

Dependency

  1. computer
  2. Linux (CentOS 7)
  3. conda (Python 3.6)
  4. tensorflow-gpu 1.8
  5. Kaldi toolkit

Datasets and X-vector

  1. VoxCeleb
  2. SITW
  3. CSLT_SITW

Steps

  1. Use Kaldi to extract an x-vector from each utterance and write the xvector.ark files.
  2. Convert the Kaldi xvector.ark files to NumPy binary format (xvector.ark -> xvector.npz).
  3. Use TensorFlow to train a VAE model and obtain the v-vectors.
  4. Use Kaldi recipes to calculate the EER (equal error rate).
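
The conversion in step 2 could look roughly like the sketch below. It is an illustration, not the repository's converter. It assumes the x-vectors were first dumped to Kaldi's text format (for example with `copy-vector ark:xvector.ark ark,t:xvector.txt`), where each line reads `utt_id  [ v1 v2 ... ]`. The function names and the `.npz` key names (`utt_ids`, `vectors`) are hypothetical.

```python
def parse_kaldi_text_vectors(lines):
    """Parse Kaldi text-format vectors: one 'utt_id  [ v1 v2 ... ]' per line."""
    utt_ids, vectors = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Everything before '[' is the utterance key, the rest holds the values.
        key, _, rest = line.partition("[")
        values = rest.rstrip("]").split()
        utt_ids.append(key.strip())
        vectors.append([float(v) for v in values])
    return utt_ids, vectors


def save_npz(utt_ids, vectors, out_path):
    """Write parsed vectors to a NumPy .npz archive (key names are illustrative)."""
    import numpy as np  # imported here so the parser itself needs only the stdlib
    np.savez(out_path,
             utt_ids=np.array(utt_ids),
             vectors=np.array(vectors, dtype=np.float32))


# Two made-up lines in Kaldi text format, for demonstration only.
sample = [
    "spk1-utt1  [ 0.5 -1.25 2.0 ]",
    "spk2-utt7  [ 1.0 0.0 -3.5 ]",
]
ids, vecs = parse_kaldi_text_vectors(sample)
print(ids)
print(vecs)
```

With a real dump, one would pass an open file handle to `parse_kaldi_text_vectors` and then call `save_npz(ids, vecs, "xvector.npz")`.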

Usage

  1. Install Kaldi (note: if you are a CSLT member, you can refer to Dr. tzy's Kaldi or CSLT Kaldi).

  2. Create a conda environment and install the necessary Python packages.

     # for example
     conda create -n tf python=3.6
     conda activate tf
     pip install -r requirements.txt

  3. Clone the code and edit path.sh so that it contains your Kaldi path.

     git clone https://github.com/zyzisyz/v-vector-tf.git

     # edit path.sh
     vim path.sh
     # export KALDI_ROOT=${replace it by your kaldi root path}

  4. Calculate the baseline EER.

     bash baseline.sh

  5. Train a model.

     # first of all, activate the conda Python environment
     conda activate tf
     # you can edit train.sh to change the VAE model's config
     bash train.sh

  6. Use the Kaldi toolkit to train the back-end scoring model and calculate the EER.

     bash eval.sh
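
As a rough illustration of what the training step optimizes, the sketch below writes out a generic VAE objective in plain Python: a squared-error reconstruction term plus the KL divergence between a diagonal Gaussian posterior and the standard normal prior. This is the textbook formulation, not the repository's TensorFlow model. The function names and the `kl_weight` parameter are assumptions made for this example, and the actual configuration lives in train.sh.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def vae_loss(x, x_hat, mu, log_var, kl_weight=1.0):
    """Squared-error reconstruction term plus a weighted KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_weight * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
z = reparameterize([0.0, 0.0], [0.0, 0.0], rng)  # a latent sample of length 2

# The KL term vanishes when the posterior already equals N(0, I).
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Perfect reconstruction, but a posterior mean shifted away from zero.
loss = vae_loss([1.0, 2.0], [1.0, 2.0], [1.0, 0.0], [0.0, 0.0])
print(kl_zero, loss)
```

The KL term is what pulls the latent codes toward a Gaussian shape, which is the regularization the paper title refers to. In a real model, `mu`, `log_var`, and `x_hat` would come from encoder and decoder networks applied to each x-vector.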

Our result

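SITW Dev. Core, EER (%)

| Embedding | Cosine | PCA | PLDA | L-PLDA | P-PLDA |
|---|---|---|---|---|---|
| x-vector | 15.67 | 16.17 | 9.09 | 3.12 | 4.16 |
| a-vector | 16.10 | 16.48 | 11.21 | 4.24 | 5.01 |
| v-vector | 10.32 | 9.94 | 3.62 | 3.54 | 4.31 |
| c-vector | 9.05 | 8.55 | 3.50 | 3.31 | 3.85 |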

See the paper for more details.
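
The results above are equal error rates. For readers unfamiliar with the metric, here is a minimal sketch of how an EER can be computed from trial scores: sweep a decision threshold and pick the point where the false-accept rate and false-reject rate are closest. The function name and the toy scores are made up for illustration. The repository itself relies on Kaldi recipes for scoring, and Kaldi provides a `compute-eer` tool for this purpose.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores: higher means "more likely the same speaker".
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(targets, nontargets)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

For these toy scores, one of four target trials is rejected and one of four non-target trials is accepted at the chosen threshold, so the EER is 25%. The quadratic sweep keeps the sketch short and is only suitable for small trial lists.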

About

Licensed under the Apache License, Version 2.0, Copyright zyzisyz

Repo Author

Yang Zhang (zyziszy@foxmail.com)
