
MinMochi

Minimalist API Server for Mochi Text-to-Video Generation

Python 3.10+ · License: MIT · PRs Welcome · Torch 2.0+

🚀 Overview

MinMochi serves the Genmo Mochi text-to-video model as a production-ready API. Generate high-quality videos from text prompts with minimal setup.

🛠️ System Requirements

  • 🐍 Python 3.10+
  • 🎮 GPU Requirements:
    • Recommended: NVIDIA A100 or H100
    • Suitable: NVIDIA A6000 or A40
  • ☁️ Active AWS account
  • 🐳 Docker

📦 Installation

# Get the code
git clone https://github.com/VikramxD/Minimochi
cd Minimochi

# Set up environment
pip install uv
uv venv .venv
uv pip install -r requirements.txt
uv pip install -e . --no-build-isolation

⚙️ Configuration

MinMochi uses Pydantic settings for configuration management. The configuration is split into three main modules:

1. Mochi Settings (mochi_settings.py)

# Default settings; override with MOCHI_-prefixed environment variables
model_name = "Genmo-Mochi"
transformer_path = "imnotednamode/mochi-1-preview-mix-nf4"
pipeline_path = "VikramxD/mochi-diffuser-bf16"
dtype = torch.bfloat16
device = "cuda"

# Optimization Settings
enable_vae_tiling = True
enable_model_cpu_offload = True
enable_attention_slicing = False

# Video Generation Settings
num_inference_steps = 20
guidance_scale = 7.5
height = 480
width = 848
num_frames = 150
fps = 10
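The MOCHI_ prefix convention can be illustrated with a small standard-library sketch. MinMochi itself reads these values through Pydantic settings; load_settings below is a hypothetical stand-in showing only the override mechanism:

```python
import os

# Illustration only: MinMochi uses Pydantic settings; this sketch just shows
# how MOCHI_-prefixed environment variables override the defaults.
DEFAULTS = {
    "num_inference_steps": 20,
    "guidance_scale": 7.5,
    "height": 480,
    "width": 848,
}

def load_settings(prefix: str = "MOCHI_") -> dict:
    settings = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = os.environ.get(prefix + key.upper())
        if raw is not None:
            settings[key] = type(default)(raw)  # cast to the default's type
    return settings

os.environ["MOCHI_NUM_INFERENCE_STEPS"] = "30"
print(load_settings()["num_inference_steps"])  # → 30
```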

2. AWS Settings (aws_settings.py)

# Override with environment variables
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
AWS_REGION = "ap-south-1"
AWS_BUCKET_NAME = "diffusion-model-bucket"
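Credentials should be supplied via environment variables rather than edited into the settings file. A sketch with placeholder values:

```shell
# Placeholder values -- substitute your own credentials before launching
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_REGION="ap-south-1"
export AWS_BUCKET_NAME="diffusion-model-bucket"
```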

3. Model Weights Settings (mochi_weights.py)

output_dir = Path("weights")
repo_id = "genmo/mochi-1-preview"
model_file = "dit.safetensors"
decoder_file = "decoder.safetensors"
encoder_file = "encoder.safetensors"
dtype = "bf16"  # Options: "fp16", "bf16"

🎬 Usage

Launch Server

python src/api/mochi_serve.py

Generate Videos

import requests
import json

url = "http://localhost:8000/api/v1/video/mochi"
payload = {
    "prompt": "A beautiful sunset over the mountains",
    "negative_prompt": "",
    "num_inference_steps": 100,
    "guidance_scale": 7.5,
    "height": 480,
    "width": 848,
    "num_frames": 150,
    "fps": 10
}

response = requests.post(url, json=[payload])
print(response.json())
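In the payload above, num_frames and fps jointly set clip length: 150 frames at 10 fps yields a 15-second video. A trivial helper makes the relationship explicit:

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    # duration = total frames / frames per second
    return num_frames / fps

print(clip_duration_seconds(150, 10))  # → 15.0
```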

📊 Monitoring

Metrics

Prometheus metrics are available at /metrics:

  • Request processing time
  • GPU memory usage
  • Inference time
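The /metrics endpoint serves the standard Prometheus plain-text format. A minimal sketch of parsing one sample line (the metric name here is illustrative, not necessarily one MinMochi exposes):

```python
def parse_prometheus_line(line: str) -> tuple[str, float]:
    # 'metric_name{label="x"} 1.23' -> ("metric_name", 1.23)
    name_part, value = line.rsplit(" ", 1)
    name = name_part.split("{", 1)[0]
    return name, float(value)

print(parse_prometheus_line("request_processing_seconds_sum 4.2"))
```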

Logging

  • Structured logging with loguru
  • Log rotation at 100MB
  • 1-week retention period
  • Logs stored in logs/api.log
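The behaviour above corresponds to a loguru sink configured roughly like this (a sketch; the actual call lives in the server code):

```python
from loguru import logger

# Rotate at 100 MB, keep one week of history, write to logs/api.log
logger.add("logs/api.log", rotation="100 MB", retention="1 week")
```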

🎛️ GPU Memory Requirements

Resolution   Frames   Min GPU Memory
480x480      60       16 GB
576x576      60       20 GB
768x768      60       24 GB

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments