Multimodal audio LMs for TTS, ASR, and voice cloning
- Python 3.10+
- CUDA 12.1+
Install dependencies:
pip install -r requirements.txt
Install ffmpeg:
For Linux (Debian/Ubuntu, using apt):
sudo apt update -y
sudo apt upgrade -y
sudo apt install ffmpeg -y
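Before starting the service, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch and not part of the repository: the helper name `missing_prerequisites` and its parameters are hypothetical. It only checks the Python version and whether an `ffmpeg` executable is on PATH; it does not check CUDA.

```python
import shutil
import sys


def missing_prerequisites(version=None, which=shutil.which):
    """Return a list of unmet prerequisites (hypothetical helper).

    version: a (major, minor) tuple; defaults to the running interpreter.
    which:   a lookup function with the same contract as shutil.which.
    """
    if version is None:
        version = sys.version_info[:2]
    missing = []
    # Python 3.10+ is listed as a requirement.
    if tuple(version[:2]) < (3, 10):
        missing.append("python>=3.10")
    # ffmpeg must be discoverable on PATH.
    if which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All checks passed" if not problems else f"Missing: {problems}")
```

The `version` and `which` parameters exist only so the check can be exercised without changing the local environment.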
Start the inference service:
python -m inference --model_path 11mlabs/indri-0.1-124m-tts --device cuda:0 --port 8000
Defaults:
- device: cuda:0
- port: 8000
Choices:
- model_path: HuggingFace collection
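To make the defaults concrete, here is a small sketch that assembles the launch command from the three flags described above. The helper `build_command` is hypothetical and not part of the repository; it assumes only the `--model_path`, `--device`, and `--port` flags shown in this README.

```python
import shlex


def build_command(model_path, device="cuda:0", port=8000):
    """Build the argument list for starting the service (hypothetical helper).

    device and port default to the values documented in this README.
    """
    return [
        "python", "-m", "inference",
        "--model_path", model_path,
        "--device", device,
        "--port", str(port),
    ]


if __name__ == "__main__":
    cmd = build_command("11mlabs/indri-0.1-124m-tts")
    # Print the command as a single shell-safe string.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch the service from a script.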
Open http://localhost:8000/docs in a browser to see the API documentation and test the service.
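If you want to confirm from a script that the service is up, a rough sketch like the following may help. The helpers `docs_url` and `docs_reachable` are hypothetical names; the only thing assumed about the service is that the `/docs` page mentioned above answers a plain HTTP GET.

```python
import urllib.error
import urllib.request


def docs_url(host="localhost", port=8000):
    """Return the URL of the API documentation page."""
    return f"http://{host}:{port}/docs"


def docs_reachable(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = docs_url()
    state = "reachable" if docs_reachable(url) else "not reachable"
    print(f"{url} is {state}")
```

If you started the service with a different `--port`, pass the same value to `docs_url`.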