Given a URL, this service returns an audio file / stream (in WAV format) that reads out the main content of the webpage.
- Ensure that you have `conda` ready. If not, may I suggest Mambaforge?
- Run `synesthesiam/docker-mozillatts`: `docker run -it -p 5002:5002 synesthesiam/mozillatts`.
- Create a conda env: `conda env create -n tts -f conda-requirements.txt -y`.
- Activate the env: `conda activate tts`.
- Start the server with Gunicorn: `gunicorn main:app --bind 0.0.0.0:80 --timeout 3600 --worker-class sanic.worker.GunicornWorker`.
  - I don't use the HTTP server that comes with Sanic, because macOS complains "The process has forked and you cannot use this CoreFoundation functionality safely".
  - As seen in the `--timeout` option, a request is allowed to run for at most 1 hour. Very large texts, therefore, may fail.
- I migrated from Flask to Sanic because Sanic natively supports async view functions, which saves me from interacting with `asyncio.get_event_loop()`.
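The Gunicorn flags above can also be kept in a configuration file, which shortens the start command. The following is a minimal sketch, assuming a file named `gunicorn.conf.py` next to `main.py` (this file is hypothetical and not part of the repo); the values simply mirror the flags in the command above.

```python
# gunicorn.conf.py (hypothetical file; values copied from the command above)

# Address and port to listen on, same as --bind 0.0.0.0:80.
bind = "0.0.0.0:80"

# Seconds a worker may stay silent before it is killed and restarted,
# same as --timeout 3600 (1 hour).
timeout = 3600

# Worker class, same as --worker-class sanic.worker.GunicornWorker.
worker_class = "sanic.worker.GunicornWorker"
```

With such a file in place, the server could be started with `gunicorn -c gunicorn.conf.py main:app`.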
Simply do:
docker-compose up --build
Something to note:
- The Dockerfile in this repo is for the URL-to-audio web server only. It still requires the `synesthesiam/docker-mozillatts` image to be running in a container. Therefore, although you can manually set up the 2 containers, the Docker Compose way is always going to be easier.
- It uses Gunicorn instead of the vanilla Sanic HTTP server.
The containers work together like this:
Sidenote: The diagram above is generated with this command:
docker run --rm -it --name dcv -v $(pwd):/input pmsipilot/docker-compose-viz render -m image docker-compose.yml
To hear the playback,
- If you are on macOS, ensure that `sox` is installed: `brew install sox`. This provides the playback command `play`.
- If you are on Linux, `aplay` should do.
Now, you can convert a webpage (using https://sjmulder.nl/en/ as an example) into audio using:
curl -G --output - \
--data-urlencode 'url=https://sjmulder.nl/en/' \
'http://localhost:80/' | \
play -
Of course, you can always save the returned audio as a file and work from there.
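As a rough illustration of the save-to-file route, here is a minimal Python sketch that sends the same request as the `curl` command above and writes the response to disk. The function names, the default output path `page.wav`, and the `server` default are assumptions made for this example; only the `url` query parameter and the `http://localhost:80/` address come from the command above.

```python
import shutil
import urllib.parse
import urllib.request


def build_request_url(page_url, server="http://localhost:80/"):
    """Return the service URL with page_url encoded as the `url` query parameter."""
    return server + "?" + urllib.parse.urlencode({"url": page_url})


def save_audio(page_url, out_path="page.wav", server="http://localhost:80/"):
    """Request the audio for page_url and write the response body to out_path."""
    # This blocks until the server responds, which can take a while for long pages.
    with urllib.request.urlopen(build_request_url(page_url, server)) as resp, \
            open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return out_path
```

Calling `save_audio("https://sjmulder.nl/en/")` with the server running should leave a `page.wav` file that can then be played with `play page.wav` or `aplay page.wav`.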
Here's a list of future features and tasks:
- Investigate why `aiohttp` calls to the `synesthesiam/docker-mozillatts` server would fail (see the Notebook).
- Make `.lrc` lyrics or subtitles to go with the audio file.
- Make the speech read out different formats with different audio cues. For example:
  - bolded text can be read with emphasis or notification sounds,
  - read out the numbering of list items, no matter whether the list is ordered or unordered,
  - different levels of headings should be announced ("## lorem ipsum" should sound like "Section 1: lorem ipsum."), and
  - images should be announced with their `alt` titles. If that's unavailable, call an image labeling service to generate a caption on-the-fly.
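To make the heading idea more concrete, here is one possible sketch of how "## lorem ipsum" could become "Section 1: lorem ipsum." before the text is sent to the TTS engine. It is not implemented in this repo. The function name `announce_headings` is hypothetical, the input is assumed to be Markdown-style text, and as a simplification the sketch numbers headings in order of appearance without distinguishing heading levels.

```python
import re


def announce_headings(markdown_text):
    """Rewrite Markdown-style headings as spoken announcements, numbered in order."""
    counter = 0
    spoken_lines = []
    for line in markdown_text.splitlines():
        # One to six '#' characters, whitespace, then the heading title.
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            counter += 1
            spoken_lines.append(f"Section {counter}: {match.group(2)}.")
        else:
            # Non-heading lines pass through unchanged.
            spoken_lines.append(line)
    return "\n".join(spoken_lines)
```

A fuller version would probably track a separate counter per heading level, so that nested headings could be announced as subsections.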
GPL v3. See LICENSE.