Introduction

LM-TTS is a single-stage transformer language model that generates speech samples conditioned on text prompts. The text prompts are passed through a learned embedding layer to obtain a sequence of hidden-state representations. LM-TTS is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden states. The predicted audio tokens are decoded with an audio compression model, such as EnCodec, to recover the audio waveform.
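
To make the audio-token representation concrete, the sketch below round-trips a waveform through EnCodec: the encoder produces discrete codes of the same form LM-TTS is trained to predict, and the decoder turns such codes back into audio. This is an illustrative sketch, not code from this repository; it assumes the facebookresearch `encodec` package and a placeholder input file `prompt.wav`.

```python
# Round-trip through EnCodec: waveform -> discrete audio codes -> waveform.
# Assumes the `encodec` package (facebookresearch/encodec) and a placeholder
# audio file `prompt.wav`; illustrative only, not code from this repository.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # 6 kbps -> 8 codebooks for the 24 kHz model

wav, sr = torchaudio.load("prompt.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    encoded_frames = model.encode(wav)  # list of (codes, scale) frames
codes = torch.cat([frame_codes for frame_codes, _ in encoded_frames], dim=-1)
print(codes.shape)  # (batch, num_codebooks, frames) of integer audio tokens

# The audio-decoder step: discrete codes back to a waveform.
with torch.no_grad():
    reconstructed = model.decode(encoded_frames)
print(reconstructed.shape)  # (batch, channels, samples)
```

At 6 kbps the 24 kHz EnCodec model emits 8 parallel codebooks at 75 frames per second, so a generated utterance is a matrix of integer codes rather than raw samples.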

Model structure

The LM-TTS model can be decomposed into three distinct stages:

  1. Learned embedding layer: maps the text inputs to a sequence of hidden-state representations.
  2. LM-TTS decoder: a language model (LM) that autoregressively generates audio tokens (or codes) conditioned on the hidden-state representations produced by the embedding layer.
  3. Audio decoder: recovers the audio waveform from the audio tokens predicted by the LM-TTS decoder (a minimal sketch of all three stages follows this list).
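
The sketch below wires these three stages together with plain PyTorch primitives. Every name and hyperparameter (vocabulary sizes, model width, the BOS token, the use of nn.TransformerDecoder as the LM) is an illustrative assumption rather than the repository's actual architecture; the audio-decoder stage is left as a comment where a codec such as EnCodec would plug in.

```python
# Illustrative three-stage sketch: embedding layer -> autoregressive LM decoder
# -> (placeholder) audio decoder. All sizes and module choices are assumptions.
import torch
import torch.nn as nn

TEXT_VOCAB = 256     # assumed text-token vocabulary size
AUDIO_VOCAB = 1024   # assumed audio-code vocabulary size
D_MODEL = 512        # assumed model width

# Stage 1: learned embedding layer mapping text ids to hidden-state representations.
text_embedding = nn.Embedding(TEXT_VOCAB, D_MODEL)

# Stage 2: LM-TTS decoder, stood in here by a small transformer decoder stack
# with a linear head over the audio-code vocabulary.
decoder_layer = nn.TransformerDecoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
lm_tts_decoder = nn.TransformerDecoder(decoder_layer, num_layers=4)
audio_embedding = nn.Embedding(AUDIO_VOCAB, D_MODEL)
audio_head = nn.Linear(D_MODEL, AUDIO_VOCAB)

text_ids = torch.randint(0, TEXT_VOCAB, (1, 32))   # dummy text prompt
text_hidden = text_embedding(text_ids)             # (1, 32, D_MODEL)

# Greedy autoregressive generation of audio tokens conditioned on the text hidden states.
generated = torch.zeros((1, 1), dtype=torch.long)  # assumed BOS audio token
for _ in range(50):
    tgt = audio_embedding(generated)
    causal_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
    out = lm_tts_decoder(tgt, memory=text_hidden, tgt_mask=causal_mask)
    next_token = audio_head(out[:, -1]).argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=1)

# Stage 3: audio decoder. The generated codes would be handed to a neural codec
# such as EnCodec to reconstruct the waveform (omitted here).
print(generated.shape)  # (1, 51): a sequence of discrete audio codes
```

In a full implementation, generation would stop at an end-of-audio token and the sampled codes would be reshaped into the codec's (batch, num_codebooks, frames) layout before decoding.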
