Webpage-to-Speech

Given a URL, this service returns an audio file or stream (in WAV format) that reads out the main content of the webpage.

Deployment

To run locally

Ensure that you have conda ready. If not, may I suggest Mambaforge?
Run synesthesiam/docker-mozillatts: docker run -it -p 5002:5002 synesthesiam/mozillatts.
Create a conda env: conda env create -n tts -f conda-requirements.txt -y.
Activate the env: conda activate tts.
Start the server with Gunicorn: gunicorn main:app --bind 0.0.0.0:80 --timeout 3600 --worker-class sanic.worker.GunicornWorker.
- I don't use the HTTP server that comes with Sanic, because macOS complains "The process has forked and you cannot use this CoreFoundation functionality safely".
- As the --timeout option shows, a request is allowed to run for at most 1 hour (3600 seconds). Very large texts may therefore fail.
- I migrated from Flask to Sanic because Sanic natively supports async view functions, which saves me from interacting with asyncio.get_event_loop().
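
The last note can be illustrated with a minimal, framework-free sketch. It is not taken from main.py: `synthesize`, `sync_view` and `async_view` are hypothetical names, and `synthesize` merely stands in for an async call to the TTS backend. The point is only that a synchronous view has to drive the coroutine through an event loop by hand, while an async view can simply await it.

```python
import asyncio


async def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for an async request to the TTS backend."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return text.encode("utf-8")


def sync_view(text: str) -> bytes:
    """Synchronous view: it must create and drive an event loop itself."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(synthesize(text))
    finally:
        loop.close()


async def async_view(text: str) -> bytes:
    """Async view: the framework owns the loop, so the view just awaits."""
    return await synthesize(text)
```

With an async-native framework such as Sanic, only the second shape is needed.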

To run with Docker

Simply do:

docker-compose up --build

Some things to note:

The Dockerfile in this repo is for the URL-to-audio web server only. It still requires the synesthesiam/docker-mozillatts image to be running in a container. Therefore, although you can manually set up the 2 containers, the Docker Compose way is always going to be easier.
It uses Gunicorn instead of the vanilla Sanic HTTP server.

The containers work together as shown in the diagram docker-compose.png.

Sidenote: That diagram is generated with this command:

docker run --rm -it --name dcv -v $(pwd):/input pmsipilot/docker-compose-viz render -m image docker-compose.yml

Usage

To hear the playback, you need a command-line audio player:

If you are on macOS, ensure that sox is installed: brew install sox. This provides the playback command play.
If you are on Linux, aplay should do. Use it in place of play in the command below.

Now, you can convert a webpage (using https://sjmulder.nl/en/ as an example) into audio using:

curl -G --output - \
    --data-urlencode 'url=https://sjmulder.nl/en/' \
    'http://localhost:80/' | \
    play -

Of course, you can always save the returned audio as a file and work from there.
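
For saving to a file from a script instead of piping into a player, here is a rough Python equivalent of the curl call, using only the standard library. It assumes the server listens on http://localhost:80/ and takes the page address in the `url` query parameter, exactly as in the command above. The function names and the output file name are made up for illustration.

```python
import io
import urllib.parse
import urllib.request
import wave


def build_request_url(page_url: str, server: str = "http://localhost:80/") -> str:
    """Append the page address as a URL-encoded `url` query parameter."""
    return server + "?" + urllib.parse.urlencode({"url": page_url})


def fetch_audio(page_url: str, timeout: float = 3600.0) -> bytes:
    """Request the audio and return the raw response bytes.

    The default timeout mirrors the 1-hour Gunicorn limit (an assumption
    about what is sensible on the client side).
    """
    with urllib.request.urlopen(build_request_url(page_url), timeout=timeout) as resp:
        return resp.read()


def wav_duration_seconds(data: bytes) -> float:
    """Parse the WAV header and return the audio length in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Example usage (needs the server to be running):
#   audio = fetch_audio("https://sjmulder.nl/en/")
#   with open("page.wav", "wb") as f:   # hypothetical file name
#       f.write(audio)
#   print(wav_duration_seconds(audio))
```

`wav_duration_seconds` is a quick sanity check that the response is a complete WAV file. It may not work if the server streams audio without a final frame count in the header.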

To-dos

Here's a list of future features and tasks:

Investigate why aiohttp calls to the synesthesiam/docker-mozillatts container fail (see the notebook, main.ipynb).
Make .lrc lyrics or subtitles to go with the audio file.
Make the speech render different kinds of formatting with different audio cues. For example:
- bolded text can be read with emphasis or notification sounds,
- read out the numbering of list items, no matter whether the list is ordered or unordered,
- different levels of headings should be announced ("## lorem ipsum" should sound like "Section 1: lorem ipsum."), and
- images should be announced with their alt text. If that's unavailable, call an image labeling service to generate a caption on the fly.
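
As a possible starting point for the announcement ideas above, here is a rough sketch built on Python's standard html.parser. Nothing in it comes from main.py: the class name, the function name and the exact wording of the announcements ("Heading level 2", "Item 1", "Image") are assumptions. It covers only headings, list items and image alt text, and leaves emphasis and caption generation out.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class AnnouncingParser(HTMLParser):
    """Collects text phrases, prefixing structure with spoken announcements."""

    def __init__(self):
        super().__init__()
        self.phrases = []        # phrases in reading order
        self.list_counters = []  # one running counter per open list
        self.prefix = ""         # announcement waiting for the next text

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_counters.append(0)
        elif tag == "li" and self.list_counters:
            # Number items whether the list is ordered or unordered.
            self.list_counters[-1] += 1
            self.prefix = f"Item {self.list_counters[-1]}: "
        elif tag in HEADING_TAGS:
            self.prefix = f"Heading level {tag[1]}: "
        elif tag == "img":
            alt = dict(attrs).get("alt")
            # Without alt text, a captioning service would be called here.
            self.phrases.append(f"Image: {alt}" if alt else "Image without description")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.list_counters:
            self.list_counters.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.phrases.append(self.prefix + text)
            self.prefix = ""


def to_spoken_phrases(html: str) -> list:
    """Return the phrases that a TTS engine would be asked to read."""
    parser = AnnouncingParser()
    parser.feed(html)
    parser.close()
    return parser.phrases
```

The resulting phrases could then be joined and sent to the TTS backend in place of the plain extracted text.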

License

GPL v3. See LICENSE.
