Based on the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing and the original repository on GitHub, this is a simple Python script that uses Microsoft's SpeechT5 model to generate speech from text.
Example code and the model card can be found on the Hugging Face model page.
The script uses the Hugging Face Transformers library to load the model and tokenizer, then uses the model to generate speech from text. It is simple and easy to modify to suit your needs. It loads the file prompt.txt and processes it in batches of two lines of input. For each batch, it writes a WAV file to disk as speech_#.wav, where # is the number of the batch. The script also prints the results to the console as it generates the audio files.
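The batching step described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the helper names read_batches and output_name are hypothetical, and skipping blank lines and joining the two lines with a space are assumptions about how a batch is built.

```python
from pathlib import Path


def read_batches(prompt_path, batch_size=2):
    """Yield (batch_index, text) pairs, one per group of batch_size lines.

    Assumption: blank lines are skipped and the lines of a batch are
    joined with a single space before being passed to the model.
    """
    raw = Path(prompt_path).read_text(encoding="utf-8").splitlines()
    lines = [line.strip() for line in raw if line.strip()]
    for start in range(0, len(lines), batch_size):
        yield start // batch_size, " ".join(lines[start:start + batch_size])


def output_name(batch_index):
    """Build the WAV file name for a batch, matching the speech_#.wav pattern."""
    return f"speech_{batch_index}.wav"
```

Each yielded text would then be passed to the model, and the resulting audio written to the file name returned by output_name.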
After editing prompt.txt to contain the text you want to convert to speech, run the script like this:
python3 app.py
You'll end up with files such as:
speech_0.wav
speech_1.wav
speech_2.wav
If you need a combined MP3 file of all the generated audio, you can use the mp3.py script to combine the files output by the model into a single MP3 file (output.mp3). Run it like this:
python3 mp3.py
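As a rough sketch of what the combining step might look like with pydub (listed in the requirements), the example below concatenates the WAV files in batch order. It is not the code in mp3.py: ordered_wavs and combine_to_mp3 are hypothetical names, and sorting by the numeric part of the file name is an assumption, made so that speech_10.wav follows speech_2.wav rather than speech_1.wav. Exporting MP3 with pydub also requires ffmpeg to be installed.

```python
import re
from pathlib import Path


def ordered_wavs(directory="."):
    """Return speech_#.wav paths sorted by batch number, not alphabetically."""
    pattern = re.compile(r"speech_(\d+)\.wav$")
    found = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found, key=lambda item: item[0])]


def combine_to_mp3(directory=".", out_name="output.mp3"):
    """Concatenate the WAV files in order and export a single MP3."""
    # Imported here so the ordering helper works without pydub installed.
    from pydub import AudioSegment

    combined = AudioSegment.empty()
    for wav in ordered_wavs(directory):
        combined += AudioSegment.from_wav(str(wav))
    combined.export(str(Path(directory) / out_name), format="mp3")
```

Calling combine_to_mp3() in the directory that holds the generated files would write output.mp3 next to them.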
Files are not archived; each run overwrites them to keep the script simple. If you want to keep the files, move them to another directory before running the script again.
- Write your script in prompt.txt
- Run python3 app.py
- Optionally preview the speech_#.wav files
- Run python3 mp3.py to combine all the speech_#.wav files into a single output.mp3 file
- Optionally preview the output.mp3 file
- Archive the prompt.txt, speech_#.wav and output.mp3 files into their own directory.
A future version may simply create a directory for each run and archive the files there by UUID.
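Until then, the manual archive step could be scripted along these lines. This is only a sketch under stated assumptions: archive_run and the runs directory name are hypothetical, the generated audio is moved while prompt.txt is copied so it stays in place for the next run, and the run directory is named with a random UUID.

```python
import shutil
import uuid
from pathlib import Path


def archive_run(source_dir=".", archive_root="runs"):
    """Archive one run's files into a new directory named by a random UUID."""
    src = Path(source_dir)
    run_dir = Path(archive_root) / uuid.uuid4().hex
    run_dir.mkdir(parents=True)

    candidates = [src / "prompt.txt", src / "output.mp3"]
    candidates.extend(sorted(src.glob("speech_*.wav")))

    for path in candidates:
        if not path.is_file():
            continue  # e.g. output.mp3 is absent if mp3.py was not run
        if path.name == "prompt.txt":
            # Copy the prompt so it remains available for editing.
            shutil.copy2(path, run_dir / path.name)
        else:
            # Move generated audio so the next run starts clean.
            shutil.move(str(path), str(run_dir / path.name))
    return run_dir
```

Running archive_run() after mp3.py would leave only prompt.txt in the working directory, with the full set of files preserved under runs/.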
You'll need to install the Python libraries:
pip install -r requirements.txt
This installs the following libraries. Use a virtual environment if you want to keep your system clean:
transformers
numpy
torch
datasets
accelerate
soundfile
pathlib
pydub
sentencepiece
This code is released fully under a GNU GPL v3 license.
See the Free Software Foundation's page for more information.