Pretrained models
The model was trained on the VoxCeleb1 dataset.
Model details:
- 40-dim log mel spectrogram as input
- 3-layer LSTM with a hidden dimension of 256
- 256-dim speaker embedding produced by attentive pooling
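The architecture above can be sketched as follows. This is a minimal illustration assuming PyTorch; the class and layer names (`SpeakerEncoder`, `attention`, `proj`) are hypothetical and not taken from the repository's actual code.

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Sketch: 40-dim log mel frames -> 3-layer LSTM -> attentive pooling -> 256-dim embedding."""

    def __init__(self, n_mels=40, hidden=256, emb_dim=256, num_layers=3):
        super().__init__()
        # 3-layer LSTM over the log mel frame sequence
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=num_layers, batch_first=True)
        # attentive pooling: score each frame, then take a softmax-weighted mean over time
        self.attention = nn.Linear(hidden, 1)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, mels):
        # mels: (batch, frames, n_mels)
        h, _ = self.lstm(mels)                       # (batch, frames, hidden)
        w = torch.softmax(self.attention(h), dim=1)  # (batch, frames, 1) attention weights
        pooled = (w * h).sum(dim=1)                  # (batch, hidden) weighted mean
        emb = self.proj(pooled)                      # (batch, emb_dim)
        # length-normalize the speaker embedding
        return emb / emb.norm(dim=1, keepdim=True)

model = SpeakerEncoder()
emb = model(torch.randn(2, 100, 40))  # 2 utterances, 100 frames each
print(emb.shape)  # torch.Size([2, 256])
```

Whether the real model normalizes embeddings or uses a projection after pooling may differ; the sketch only fixes the dimensions listed above.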
Training details:
- 64 speakers, 10 utterances per speaker in a batch
- Trained for 250K steps
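The batch composition above (64 speakers × 10 utterances = 640 utterances per batch) can be sketched with a simple sampler. This is an assumption-laden illustration; `utts_by_speaker` and `sample_batch` are hypothetical names, not the repository's data-loading code.

```python
import random

def sample_batch(utts_by_speaker, n_speakers=64, n_utts=10):
    """Pick n_speakers speakers, then n_utts utterance ids from each.

    utts_by_speaker: dict mapping speaker id -> list of utterance ids.
    Returns a flat list of (speaker, utterance) pairs.
    """
    speakers = random.sample(sorted(utts_by_speaker), n_speakers)
    return [(spk, utt)
            for spk in speakers
            for utt in random.sample(utts_by_speaker[spk], n_utts)]

# Toy example: 100 speakers with 20 utterances each
data = {f"spk{i:03d}": list(range(20)) for i in range(100)}
batch = sample_batch(data)
print(len(batch))  # 640 utterances per batch
```

Sampling a fixed number of utterances per speaker like this is what makes speaker-discriminative objectives (e.g. GE2E-style losses) computable within a single batch.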