Ability to Read Longer Audio (ie Audiobooks) #83

fakerybakery · 2023-11-22T01:56:34Z

fakerybakery
Nov 22, 2023

Hi,
Might it be possible to implement a tqdm progress bar for longer text? This would make it possible to easily narrate entire audiobooks!
Thanks!
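
A minimal sketch of what this request could look like, assuming the text is split naively on sentence-ending punctuation and each sentence is synthesized separately. The `synthesize` callable is a hypothetical stand-in for the model's inference call, not part of the repository's API, and `tqdm` is an optional third-party package with a no-op fallback here.

```python
import re
from typing import Callable, Iterable, List

try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    # Fall back to a no-op wrapper so the sketch still runs without tqdm.
    def tqdm(iterable: Iterable, **kwargs):
        return iterable


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def narrate(text: str, synthesize: Callable[[str], list]) -> list:
    """Synthesize each sentence in turn, showing progress, and join the audio.

    `synthesize` is a hypothetical placeholder; it is assumed to return a
    list of audio samples for one sentence.
    """
    audio: list = []
    for sentence in tqdm(split_sentences(text), desc="Narrating", unit="sentence"):
        audio.extend(synthesize(sentence))
    return audio
```

A real splitter would need to handle abbreviations, quotations, and very short sentences; this one only illustrates the loop that the progress bar would wrap.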

yl4579 · 2023-11-22T03:38:46Z

yl4579
Nov 22, 2023
Maintainer

I think so far it’s not very good at narrating an entire audiobook because it wasn’t trained on entire audiobooks. The training data consists purely of independent clips taken from amateur audiobook readings. It won’t be like ElevenLabs, which is trained on professional audiobook datasets, as such data are usually not in the public domain. However, if we do have the data, the training can easily be changed to use it, by conditioning on the previous style to sample the current style. This would probably reproduce the effect of ElevenLabs, especially for dialogue.

The closest dataset to an entire audiobook is LJSpeech, but again it’s completely non-fiction, so it won’t be good for any fiction reading (no dialogue), and it might produce unnatural intonation because each clip was treated independently during training.

fakerybakery · 2023-11-22T03:46:35Z

fakerybakery
Nov 22, 2023
Author

Hmm. Thanks. LibriVox seems like a good place to get public domain audiobooks. Are there any plans to add this capability in the future?

yl4579 · 2023-11-22T03:51:19Z

yl4579
Nov 22, 2023
Maintainer

LibriTTS is already taken from LibriVox, but for some reason it isn’t complete audiobook narration, only very fragmented clips taken from complete audiobook narrations. I don’t know why they removed a lot of clips.

fakerybakery · 2023-11-22T16:40:28Z

fakerybakery
Nov 22, 2023
Author

I feel like the quality would be lower if you trained it on an entire audiobook, right? I don't know, I guess it just feels like the longer the samples are the worse it will be (I might be wrong). Maybe we can use Tortoise TTS's splitting script with this?

However, if it's possible to train a TTS model on long text without degrading quality, it shouldn't be too hard to write a script to scrape LibriVox based on readers (they have an API). I was able to make this dataset a while back using their API, but I didn't include readers at that time.

yl4579 · 2023-11-22T17:16:10Z

yl4579
Nov 22, 2023
Maintainer

No, we do have to train on audio clips, but the idea is that we condition the current style sampling on the previous text and style, so it will be more continuous and possibly also handle dialogue better (if the audio clips are split according to dialogue). It won’t work if we train on entire audiobooks because we don’t have enough RAM.
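
To make the idea of carrying style from one clip to the next more concrete, here is a rough sketch of a much simpler inference-time approximation: linearly blending each clip's sampled style vector with the style used for the previous clip. This is not the training change described above, and `sample_style` and `alpha` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def blend(prev: Sequence[float], cur: Sequence[float], alpha: float) -> Vector:
    """Return alpha * cur + (1 - alpha) * prev, element-wise."""
    if len(prev) != len(cur):
        raise ValueError("style vectors must have the same length")
    return [alpha * c + (1.0 - alpha) * p for p, c in zip(prev, cur)]


def styles_for_clips(
    clips: Sequence[str],
    sample_style: Callable[[str], Vector],
    alpha: float = 0.7,
) -> List[Vector]:
    """Sample a style per clip, pulling each one toward the previous style.

    `sample_style` is a hypothetical stand-in for whatever produces a style
    vector from a clip's text. alpha=1.0 treats every clip independently;
    smaller values make consecutive clips sound more alike.
    """
    styles: List[Vector] = []
    prev: Optional[Vector] = None
    for clip in clips:
        style = sample_style(clip)
        if prev is not None:
            style = blend(prev, style, alpha)
        styles.append(style)
        prev = style  # the blended style is what the next clip sees
    return styles
```

Blending only smooths the styles after they are sampled; a model trained to condition on the previous text and style could in principle learn when the style should change, for example between speakers in a dialogue.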

fakerybakery · 2023-11-22T17:21:16Z

fakerybakery
Nov 22, 2023
Author

Hmm interesting! Are you planning to implement something like this in the future?

yl4579 · 2023-11-22T20:21:48Z

yl4579
Nov 22, 2023
Maintainer

Yeah, probably, but I don't think it'll be that simple. If the effort is more than trivial concatenation, it could be a different project or paper. For now, though, the difference probably won't be big enough on the LibriTTS dataset because there is no dialogue. It would be more useful if we could get some fiction audiobook datasets that are separated by character.

fakerybakery · 2023-11-22T23:18:47Z

fakerybakery
Nov 22, 2023
Author

Hmm. Hypothetically, if a long audiobook dataset were available, how difficult do you think it would be to implement?

fakerybakery · 2023-11-23T00:10:11Z

fakerybakery
Nov 23, 2023
Author

I implemented a basic long-text reader on the online demo by splitting text, but it isn't perfect yet. (update: I removed it because someone said it made it harder to clone with Docker)

MariasStory · 2023-11-23T13:34:03Z

MariasStory
Nov 23, 2023

> I implemented a basic long-text reader on the online demo by splitting text, but it isn't perfect yet. (update: I removed it because someone said it made it harder to clone with Docker)

I am fine with removing the long-text option, because I think it should be the default behavior for every task.
In my view, long text can and should be split and processed automatically.

fivestones · 2023-11-27T08:23:55Z

fivestones
Nov 27, 2023

The problem I had with long text and splitting by sentences is that occasionally, and only with short sentences (fewer than 40 characters), I got very loud white noise, sometimes minutes long for a single sentence. I dealt with this by combining short sentences with the sentences around them so that none of the text blocks given to styletts2 was short (see #46). If that were taken care of, it would be much easier to deal with long-form text.
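
The workaround described here can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the comment or from #46. The function name and the greedy merging strategy are assumptions, and the 40-character threshold is simply the figure mentioned above.

```python
from typing import List

MIN_CHARS = 40  # threshold taken from the comment above; tune as needed


def merge_short_sentences(sentences: List[str], min_chars: int = MIN_CHARS) -> List[str]:
    """Greedily join neighbouring sentences so each block has >= min_chars.

    A block is closed as soon as it reaches min_chars. A short leftover at
    the end is appended to the previous block instead of standing alone.
    """
    blocks: List[str] = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        current = f"{current} {sentence}" if current else sentence
        if len(current) >= min_chars:
            blocks.append(current)
            current = ""
    if current:
        if blocks:
            blocks[-1] = f"{blocks[-1]} {current}"
        else:
            blocks.append(current)  # whole input is shorter than min_chars
    return blocks
```

If the entire input is shorter than the threshold, the single block that comes back is still short, so a caller may want to handle that case separately.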
