This project presents a joint training procedure for TTS and ASR models suitable for a low-resource setup. During this joint training, we run supervised and unsupervised training in sequence; the models produce new unpaired data and 'learn from each other'.
This repository reuses code from ForwardTacotron (the majority of the structure), Tacotron2, WaveGlow, and a Hugging Face tutorial.
The code runs with Python 3.8.8. We are working on compatibility with other Python versions (currently incompatible).
Install the packages from `requirements.txt`.
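For example, in a fresh virtual environment (a minimal sketch; the `env` name matches what the cluster instructions below use):

```bash
# Assumes Python 3.8.8 is the interpreter on PATH
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```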
- Clone the data repo:

  ```
  cd data/
  git clone git@github.com:giellalt/speech-sme.git speech-sme-tts
  cp -r speech-sme-tts speech-sme-asr
  ```

  We need two copies of the data! The data repo is private, so you need access. The first data folder (the clone) should be named `data/speech-sme-tts`, and the second one (the copy) `data/speech-sme-asr`. You also need `sme-freecorpus.txt` in your home directory.
- You should still be in `data/`. Run `python preprocess_asr_tts.py`. This will take some time: it writes the training files, splits them, and resamples the data for the TTS and ASR tasks.
- `cd ..` and run `python preprocess.py`, then `python train_tacotron.py --force_align` and `python process_for_asr.py` (requires a lot of RAM); these finish the data preparation for TTS and ASR. If you cannot run `python process_for_asr.py`, you can download the pickled dataset from here.
- Pretrained models are here. Place the `checkpoint-27363` folder that you downloaded (don't rename it) in `asr_output/` AND in `checkpoints/sme_speech_tts.asr_forward/` (make a new dir in `checkpoints/`). Place the files from `tacotron` in `checkpoints/sme_speech_tts.tacotron/`. If you want to run inference, you need to put the files from `forward_tacotron` in `checkpoints/sme_speech_tts.forward/`. Put `waveglow_14000_st` in the `waveglow/` folder. `sme-freecorpus.txt` should be in your home directory. A sketch of these placement steps is shown after this list.
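For reference, here is a minimal sketch of the placement steps, assuming the downloaded folders (`checkpoint-27363`, `tacotron`, `forward_tacotron`, `waveglow_14000_st`) sit in the repo root; adjust the source paths to wherever you actually unpacked them:

```bash
# Hypothetical placement script -- the source paths are assumptions.
mkdir -p checkpoints/sme_speech_tts.asr_forward
mkdir -p checkpoints/sme_speech_tts.tacotron
mkdir -p checkpoints/sme_speech_tts.forward

cp -r checkpoint-27363 asr_output/                          # keep the folder name
cp -r checkpoint-27363 checkpoints/sme_speech_tts.asr_forward/
cp tacotron/* checkpoints/sme_speech_tts.tacotron/
cp forward_tacotron/* checkpoints/sme_speech_tts.forward/   # only needed for inference
cp -r waveglow_14000_st waveglow/
cp sme-freecorpus.txt ~/                                    # free corpus in the home dir
```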
If everything worked out fine in the previous steps, you can now start the joint training of TTS and ASR with `python train_forward.py`. This repo is set up for inference, so if you want to train the models you need a bit of extra work: you need `forward_tacotron/forward_step430K_weights.pyt` and `forward_tacotron/forward_step_430K_optim.pyt`, and you must change the paths in `utils/paths.py` accordingly.
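Before launching, a quick sanity check may help (the file names are taken from the description above; `utils/paths.py` must point at them):

```bash
# Verify the ForwardTacotron weights and optimizer state are in place,
# then launch the joint training.
ls forward_tacotron/forward_step430K_weights.pyt \
   forward_tacotron/forward_step_430K_optim.pyt
python train_forward.py
```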
Alternatively, with your own data, you need to repeat the data preparation steps with the Tacotron model that you trained; for ASR you would need to run `python process_for_asr.py --from_scratch`. This will create and save a new processor and vocab. You would then need to train the ASR and TTS models without `dual_transformation` for around 500 steps for ASR and at least 300K steps for TTS.
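Roughly, the from-scratch sequence would look like this (a sketch assembled from the steps above; how exactly to disable `dual_transformation` depends on your config and is not shown here):

```bash
# Data preparation, as in the sections above
cd data/ && python preprocess_asr_tts.py && cd ..
python preprocess.py
python train_tacotron.py --force_align

# New ASR processor and vocab for your own data
python process_for_asr.py --from_scratch

# Pre-train both models without dual transformation:
# ~500 steps for ASR, at least 300K steps for TTS
python train_forward.py
```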
Running `python gen_forward --alpha .95 waveglow` or `python gen_forward --alpha .95 griffinlim` will generate audio in the `audio` folder from `sentences.txt`; the vocoder will be WaveGlow (recommended) or Griffin-Lim, respectively. The `--alpha` value (float) controls the speed of the audio.
Run `predict.py` to run inference with the ASR model. This will both compute WER over the whole test set and print the predictions for the first 10 sentences in the dataset.
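For example (assuming `predict.py` needs no extra arguments, as described above):

```bash
# TTS: synthesize sentences.txt into the audio/ folder
python gen_forward --alpha .95 waveglow     # recommended vocoder
python gen_forward --alpha .95 griffinlim   # alternative vocoder

# ASR: WER over the test set + predictions for the first 10 sentences
python predict.py
```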
- Log in as instructed here.
- Go to `~/cluster/projects/nn9866k/`
- `mkdir [your project folder]`
- Run `module load PyTorch/1.4.0-fosscuda-2019b-Python-3.7.4`
- Run `module unload PyTorch/1.4.0-fosscuda-2019b-Python-3.7.4`
- You can now put your code and data in [your project folder], e.g. via `git clone`, an upload client (like Transmit), or scp (`scp -r [your things] user@saga.sigma2.no:/the/path/to/the/shared/place`).
- Make a virtual env with `python3 -m venv env` and ACTIVATE it! Then run `pip install -r [your requirements.txt]`.
- Do some edits if you need to (e.g. to test your code). Nothing that requires CUDA will work here; only text cleaning and similar tasks.
- Deactivate env.
- Create a file like `run_training.sh` (more here; a minimal sketch is shown below).
- `sbatch [your shell script]` will queue your task and run it. You will see the running output in a file `{job_id}.out`, but note that it will take a while before you see the first print statement. The prints arrive in batches (e.g. only after an epoch is finished will you see them). To check that the training is progressing, you can monitor the .csv file with GPU usage stats.
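For reference, here is a minimal sketch of such a script for Saga; the SLURM directives (partition, time, memory) are assumptions and should be adjusted to your job:

```bash
#!/bin/bash
# run_training.sh -- hypothetical example; tune account, time and memory.
#SBATCH --account=nn9866k
#SBATCH --job-name=tts-asr-joint-training
#SBATCH --partition=accel
#SBATCH --gpus=1
#SBATCH --time=24:00:00
#SBATCH --mem-per-cpu=16G

set -o errexit                # fail fast on any error
module purge
module load PyTorch/1.4.0-fosscuda-2019b-Python-3.7.4

source env/bin/activate       # the venv created during setup
python train_forward.py
```

Submit it with `sbatch run_training.sh` from your project folder.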