Introduction

LM-TTS is a single-stage transformer language model that generates speech samples conditioned on text prompts. The text prompts are passed through a learned embedding layer to obtain a sequence of hidden-state representations. LM-TTS is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden states. The predicted audio tokens are decoded with an audio compression model, such as EnCodec, to recover the audio waveform.
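
To make the audio-token representation concrete, the sketch below round-trips a waveform through EnCodec: the encoder produces discrete codes of the same form LM-TTS is trained to predict, and the decoder turns such codes back into audio. This is an illustrative sketch, not code from this repository; it assumes the facebookresearch `encodec` package and a placeholder input file `prompt.wav`.

```python
# Round-trip through EnCodec: waveform -> discrete audio codes -> waveform.
# Assumes the `encodec` package (facebookresearch/encodec) and a placeholder
# audio file `prompt.wav`; illustrative only, not code from this repository.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # 6 kbps -> 8 codebooks for the 24 kHz model

wav, sr = torchaudio.load("prompt.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    encoded_frames = model.encode(wav)  # list of (codes, scale) frames
codes = torch.cat([frame_codes for frame_codes, _ in encoded_frames], dim=-1)
print(codes.shape)  # (batch, num_codebooks, frames) of integer audio tokens

# The audio-decoder step: discrete codes back to a waveform.
with torch.no_grad():
    reconstructed = model.decode(encoded_frames)
print(reconstructed.shape)  # (batch, channels, samples)
```

At 6 kbps the 24 kHz EnCodec model emits 8 parallel codebooks at 75 frames per second, so a generated utterance is a matrix of integer codes rather than raw samples.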

Model structure

The LM-TTS model can be decomposed into three distinct stages:

  1. Learned embedding layer: maps the text inputs to a sequence of hidden-state representations.
  2. LM-TTS decoder: a language model (LM) that autoregressively generates audio tokens (or codes) conditioned on the hidden-state representations produced by the embedding layer.
  3. Audio decoder: recovers the audio waveform from the audio tokens predicted by the LM-TTS decoder (a minimal sketch of all three stages follows this list).
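
The sketch below wires these three stages together with plain PyTorch primitives. Every name and hyperparameter (vocabulary sizes, model width, the BOS token, the use of nn.TransformerDecoder as the LM) is an illustrative assumption rather than the repository's actual architecture; the audio-decoder stage is left as a comment where a codec such as EnCodec would plug in.

```python
# Illustrative three-stage sketch: embedding layer -> autoregressive LM decoder
# -> (placeholder) audio decoder. All sizes and module choices are assumptions.
import torch
import torch.nn as nn

TEXT_VOCAB = 256     # assumed text-token vocabulary size
AUDIO_VOCAB = 1024   # assumed audio-code vocabulary size
D_MODEL = 512        # assumed model width

# Stage 1: learned embedding layer mapping text ids to hidden-state representations.
text_embedding = nn.Embedding(TEXT_VOCAB, D_MODEL)

# Stage 2: LM-TTS decoder, stood in here by a small transformer decoder stack
# with a linear head over the audio-code vocabulary.
decoder_layer = nn.TransformerDecoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
lm_tts_decoder = nn.TransformerDecoder(decoder_layer, num_layers=4)
audio_embedding = nn.Embedding(AUDIO_VOCAB, D_MODEL)
audio_head = nn.Linear(D_MODEL, AUDIO_VOCAB)

text_ids = torch.randint(0, TEXT_VOCAB, (1, 32))   # dummy text prompt
text_hidden = text_embedding(text_ids)             # (1, 32, D_MODEL)

# Greedy autoregressive generation of audio tokens conditioned on the text hidden states.
generated = torch.zeros((1, 1), dtype=torch.long)  # assumed BOS audio token
for _ in range(50):
    tgt = audio_embedding(generated)
    causal_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
    out = lm_tts_decoder(tgt, memory=text_hidden, tgt_mask=causal_mask)
    next_token = audio_head(out[:, -1]).argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=1)

# Stage 3: audio decoder. The generated codes would be handed to a neural codec
# such as EnCodec to reconstruct the waveform (omitted here).
print(generated.shape)  # (1, 51): a sequence of discrete audio codes
```

In a full implementation, generation would stop at an end-of-audio token and the sampled codes would be reshaped into the codec's (batch, num_codebooks, frames) layout before decoding.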
