MinMochi serves the Genmo Mochi text-to-video model as a production-ready API. Generate high-quality videos from text prompts with minimal setup.
- 🐍 Python 3.10+
- 🎮 GPU Requirements:
  - Recommended: NVIDIA A100 or H100
  - Suitable: NVIDIA A6000 or A40
- ☁️ Active AWS account
- 🐳 Docker
# Get the code
git clone https://github.com/VikramxD/Minimochi
cd Minimochi
# Set up environment
pip install uv
uv venv .venv
uv pip install -r requirements.txt
uv pip install -e . --no-build-isolation
MinMochi uses Pydantic settings for configuration management. The configuration is split into three main modules:
# Default settings, can be overridden with MOCHI_ prefixed env variables
model_name = "Genmo-Mochi"
transformer_path = "imnotednamode/mochi-1-preview-mix-nf4"
pipeline_path = "VikramxD/mochi-diffuser-bf16"
dtype = torch.bfloat16
device = "cuda"
# Optimization Settings
enable_vae_tiling = True
enable_model_cpu_offload = True
enable_attention_slicing = False
# Video Generation Settings
num_inference_steps = 20
guidance_scale = 7.5
height = 480
width = 848
num_frames = 150
fps = 10
# Override with environment variables
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
AWS_REGION = "ap-south-1"
AWS_BUCKET_NAME = "diffusion-model-bucket"
output_dir = Path("weights")
repo_id = "genmo/mochi-1-preview"
model_file = "dit.safetensors"
decoder_file = "decoder.safetensors"
encoder_file = "encoder.safetensors"
dtype = "bf16" # Options: "fp16", "bf16"
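The override rule described above (defaults that can be replaced by `MOCHI_`-prefixed environment variables) can be illustrated with a small standard-library sketch. This is not the project's code and does not use Pydantic; the class `MochiDefaults` and the function `load_settings` are hypothetical names, and the assumption is that a field such as `num_inference_steps` maps to the variable `MOCHI_NUM_INFERENCE_STEPS`.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class MochiDefaults:
    # Hypothetical container; values copied from the video generation defaults above.
    num_inference_steps: int = 20
    guidance_scale: float = 7.5
    height: int = 480
    width: int = 848
    num_frames: int = 150
    fps: int = 10


def load_settings(env=None, prefix="MOCHI_"):
    """Return the defaults, replaced by any PREFIX + FIELD_NAME variable found in env."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(MochiDefaults):
        raw = env.get(prefix + field.name.upper())
        if raw is not None:
            # Convert the string from the environment to the type of the default.
            overrides[field.name] = type(field.default)(raw)
    return MochiDefaults(**overrides)


# Example: only the overridden field changes, the rest keep their defaults.
settings = load_settings({"MOCHI_NUM_INFERENCE_STEPS": "30"})
print(settings.num_inference_steps, settings.fps)
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the sketch easy to try without touching the real environment.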
python src/api/mochi_serve.py
import requests
url = "http://localhost:8000/api/v1/video/mochi"
payload = {
"prompt": "A beautiful sunset over the mountains",
"negative_prompt": "",
"num_inference_steps": 100,
"guidance_scale": 7.5,
"height": 480,
"width": 848,
"num_frames": 150,
"fps": 10
}
response = requests.post(url, json=[payload])
response.raise_for_status()  # raise on HTTP errors instead of printing an error body
print(response.json())
Prometheus metrics are available at /metrics:
- Request processing time
- GPU memory usage
- Inference time
- Structured logging with loguru
- Log rotation at 100MB
- 1-week retention period
- Logs stored in logs/api.log
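To inspect the Prometheus metrics mentioned above without a full Prometheus setup, the text returned by /metrics can be read directly. The following is a minimal sketch using only the standard library. It assumes the endpoint is served on the same host and port as the API example (`http://localhost:8000`), that samples carry no trailing timestamp, and the metric names in `SAMPLE` are hypothetical placeholders, not names the service is known to export.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition lines into a {sample_name: value} dict."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value follows the last space; the key keeps any {label="..."} part.
        key, _, value = line.rpartition(" ")
        try:
            samples[key.strip()] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def fetch_metrics(url="http://localhost:8000/metrics", timeout=10):
    """Fetch and parse the metrics page (URL is an assumption, see above)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical sample, for illustration only.
SAMPLE = """\
# HELP example_inference_seconds Hypothetical metric name.
# TYPE example_inference_seconds gauge
example_inference_seconds 42.5
example_gpu_memory_bytes{device="0"} 1.6e+10
"""

for name, value in parse_metrics(SAMPLE).items():
    print(name, value)
```

With the server running, `fetch_metrics()` would return the same kind of dictionary for the live endpoint.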
| Resolution | Frames | Min GPU Memory |
|---|---|---|
| 480x480 | 60 | 16GB |
| 576x576 | 60 | 20GB |
| 768x768 | 60 | 24GB |
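The table above can be encoded as a small lookup, for example to pick a resolution before sending a request. This is a sketch only: the names `MIN_GPU_MEMORY_GB` and `largest_resolution_for` are hypothetical, the figures are copied from the table and apply to 60 frames, and other frame counts or resolutions are not covered.

```python
# Rows from the table above: (width, height) -> minimum GPU memory in GB at 60 frames.
MIN_GPU_MEMORY_GB = {
    (480, 480): 16,
    (576, 576): 20,
    (768, 768): 24,
}


def largest_resolution_for(available_gb):
    """Return the largest listed resolution that fits in available_gb, or None."""
    fitting = [res for res, need in MIN_GPU_MEMORY_GB.items() if need <= available_gb]
    if not fitting:
        return None
    # "Largest" is measured by pixel count.
    return max(fitting, key=lambda res: res[0] * res[1])


print(largest_resolution_for(20))  # (576, 576)
```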
This project is licensed under the MIT License - see the LICENSE file for details.
- Genmo.ai for the original Mochi model
- Hugging Face Diffusers
- LitServe - API framework