This repository contains the data preparation and evaluation code for the TS-VAD and TS-SEP experiments in our 2024 IEEE/ACM TASLP article, "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings" by Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, and Jonathan Le Roux (IEEE Xplore, arXiv).
The core and training code is available at https://github.com/merlresearch/tssep .
In an existing environment, you can install the core code and the data preparation code with:
git clone https://github.com/merlresearch/tssep.git
cd tssep
pip install -e .
cd ..
git clone https://github.com/fgnt/tssep_data.git
cd tssep_data
pip install -e .
If you want to set up a fresh environment, see tools/README.md.
Once you have installed a fresh environment, you can activate it with . tools/path.sh
(this also sets some environment variables).
Note: Kaldi and MPI are required for the recipes.
For ASR, you can use openai-whisper, espnet, or nemo_toolkit as alternatives.
ToDo: Limit this to whisper, since it has fewer dependencies.
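To see which of the ASR packages named above are present in the active environment, a small check along the following lines may help. This is only a sketch and not part of the repository: the function name `available_asr_backends` is hypothetical, and the mapping from pip package name to import name (`whisper`, `espnet`, `nemo`) is an assumption about how those packages are usually imported.

```python
import importlib.util

# Assumed mapping from pip package name to top-level import name.
ASR_BACKENDS = {
    "openai-whisper": "whisper",
    "espnet": "espnet",
    "nemo_toolkit": "nemo",
}


def available_asr_backends():
    """Return {pip_name: bool}, True if the module can be located.

    Hypothetical helper: it only looks the module up and does not import it.
    """
    found = {}
    for pip_name, module_name in ASR_BACKENDS.items():
        try:
            found[pip_name] = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Treat a broken or half-installed package as unavailable.
            found[pip_name] = False
    return found


if __name__ == "__main__":
    for pip_name, ok in available_asr_backends().items():
        print(f"{pip_name}: {'installed' if ok else 'missing'}")
```

The check locates modules without importing them, so it stays fast even when a heavy toolkit is installed. A package reported as missing can be added with pip under the package name shown.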
egs/libri_css/README.md#steps-to-run-the-recipe contains the instructions for the LibriCSS data preparation, training and evaluation.
egs/libri_css/README.md#steps-to-evaluate-a-pretrained-model contains the instructions for the LibriCSS evaluation with a pretrained model.
If you are using this code, please cite our paper:
@article{Boeddeker2024feb,
author = {Boeddeker, Christoph and Subramanian, Aswin Shanmugam and Wichern, Gordon and Haeb-Umbach, Reinhold and Le Roux, Jonathan},
title = {{TS-SEP}: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
year = 2024,
volume = 32,
pages = {1185--1197},
month = feb,
}