Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥

Yuan-ManX/ai-game-devtools

AI Game DevTools (AI-GDT) 🎮

Project List

Tool (AI LLM)

Source Description Paper Game Engine Type
AgentGPT 🤖 Assemble, configure, and deploy autonomous AI Agents in your browser. Tool
AICommand ChatGPT integration with Unity Editor. Unity Tool
AIOS LLM Agent Operating System. Tool
AI Scientist The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv Tool
Assistant CLI A comfortable CLI tool for using the ChatGPT service 🔥 Tool
Auto-GPT An experimental open-source attempt to make GPT-4 fully autonomous. Tool
BabyAGI This Python script is an example of an AI-powered task management system. Tool
👶🤖🖥️ BabyAGI UI BabyAGI UI is designed to make it easier to run and develop with BabyAGI in a web app, like ChatGPT. Tool
baichuan-7B A large-scale 7B pretraining language model developed by Baichuan. Tool
Baichuan-13B A 13B large language model developed by Baichuan Intelligent Technology. Tool
Baichuan 2 A series of large language models developed by Baichuan Intelligent Technology. Tool
Bisheng Bisheng is an open LLM devops platform for next generation AI applications. Tool
Character-LLM A Trainable Agent for Role-Playing. arXiv Tool
ChatDev Communicative Agents for Software Development. arXiv Tool
ChatGPT-API-unity Binds ChatGPT chat completion API to pure C# on Unity. Unity Tool
ChatGPTForUnity ChatGPT for unity. Unity Tool
ChatRWKV ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source. Tool
ChatYuan Large Language Model for Dialogue in Chinese and English. Tool
Chinese-LLaMA-Alpaca-3 Chinese Llama-3 LLMs developed from Meta Llama 3. Tool
Chrome-GPT An AutoGPT agent that controls Chrome on your desktop. Tool
CogVLM CogVLM, a powerful open-source visual language foundation model. arXiv Tool
CoreNet A library for training deep neural networks. Tool
DBRX DBRX is a large language model trained by Databricks. Tool
DCLM DataComp for Language Models. arXiv Tool
DemoGPT Auto Gen-AI App Generator with the Power of Llama 2 Tool
Design2Code Automating Front-End Engineering Tool
Devika Devika is an Agentic AI Software Engineer. Tool
Devon An open-source pair programmer. Tool
Dora Generating powerful websites, one prompt at a time. Tool
Flowise Drag & drop UI to build your customized LLM flow using LangchainJS. Tool
Gemini Gemini is built from the ground up for multimodality — reasoning seamlessly across text, images, video, audio, and code. Tool
Gemma Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create Google Gemini models. Tool
gemma.cpp A lightweight, standalone C++ inference engine for Google's Gemma models. Tool
GLM-4 GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. Tool
GPT4All A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue. Tool
GPT-4o GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. Tool
GPTScript Develop LLM Apps in Natural Language. Tool
Grok-1 The weights and architecture of xAI's 314-billion-parameter Mixture-of-Experts model, Grok-1. Tool
HuggingChat Making the community's best AI chat models available to everyone. Tool
Hugging Face API Unity Integration This Unity package provides an easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects. Unity Tool
ImageBind ImageBind: One Embedding Space To Bind Them All. arXiv Tool
Index-1.9B A SOTA lightweight multilingual LLM. Tool
InteractML-Unity InteractML, an Interactive Machine Learning Visual Scripting framework for Unity3D. Unity Tool
InteractML-Unreal Engine Bringing Machine Learning to Unreal Engine. Unreal Engine Tool
InternLM InternLM has open-sourced a 7-billion-parameter base model, a chat model tailored for practical scenarios, and the training system. arXiv Tool
InternLM-XComposer InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. arXiv Tool
Jan Bring AI to your Desktop. Tool
Lamini Lamini allows any engineering team to outperform general-purpose LLMs through RLHF and fine-tuning on their own data. Tool
LaMini-LM LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions. Tool
LangChain LangChain is a framework for developing applications powered by language models. Tool
LangFlow ⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. Tool
LaVague Automate automation with Large Action Model framework. Tool
Lemur Open Foundation Models for Language Agents. Tool
Lepton AI A Pythonic framework to simplify AI service building. Tool
Lit-LLaMA Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Tool
llama2-webui Run Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Tool
Llama 3 The official Meta Llama 3 GitHub site. Tool
Llama 3.1 Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Tool
LLaSM Large Language and Speech Model. Tool
LLM Answer Engine Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. Tool
llm.c LLM training in simple, raw C/CUDA. Tool
LLMUnity Create characters in Unity with LLMs! Unity Tool
LLocalSearch LLocalSearch is a completely locally running search engine using LLM Agents. Tool
LogicGamesSolver A Python tool to solve logic games with AI, Deep Learning and Computer Vision. Tool
LongWriter LongWriter: Unleashing 10,000+ Word Generation From Long Context LLMs. arXiv Tool
Large World Model (LWM) Large World Model (LWM) is a general-purpose large-context multimodal autoregressive model. arXiv Tool
Lumina-T2X Lumina-T2X is a unified framework for Text to Any Modality Generation. arXiv Tool
MetaGPT The Multi-Agent Framework Tool
MiniCPM-2B An end-side (on-device) LLM reported to outperform Llama2-13B. Tool
MiniGPT-4 Enhancing Vision-language Understanding with Advanced Large Language Models. arXiv Tool
MiniGPT-5 Interleaved Vision-and-Language Generation via Generative Vokens. arXiv Tool
Mixtral 8x7B A high-quality sparse Mixture-of-Experts model. arXiv Tool
Mistral 7B A 7B model released under Apache 2.0, described by Mistral AI as the best 7B model to date. Tool
Mistral Large Mistral Large is a new cutting-edge text generation model. It reaches top-tier reasoning capabilities. Tool
MLC LLM Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. Tool
MobiLlama Towards Accurate and Lightweight Fully Transparent GPT. arXiv Tool
MoE-LLaVA Mixture of Experts for Large Vision-Language Models. arXiv Tool
Moshi Moshi is an experimental conversational AI. Tool
Moshi Moshi: a speech-text foundation model for real-time dialogue. Tool
MOSS An open-source tool-augmented conversational language model from Fudan University. Tool
mPLUG-Owl🦉 Modularization Empowers Large Language Models with Multimodality. arXiv Tool
Nemotron-4 A 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. arXiv Tool
NExT-GPT Any-to-Any Multimodal Large Language Model. Tool
OLMo Open Language Model. arXiv Tool
OmniLMM Large multi-modal models for strong performance and efficient deployment. Tool
OneLLM One Framework to Align All Modalities with Language. arXiv Tool
Open-Assistant OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. Tool
OpenDevin An autonomous AI software engineer. Tool
Orion-14B Orion-14B is a family of models that includes a 14B foundation LLM and a series of related models. arXiv Tool
Panda An overseas Chinese open-source large language model, continually pre-trained in the Chinese domain from Llama-7B, -13B, -33B, and -65B. Tool
Perplexica An AI-powered search engine. Tool
Pi AI chatbot designed for personal assistance and emotional support. Tool
Qwen1.5 Qwen1.5 is the improved version of Qwen. Tool
Qwen2 Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud. Tool
Qwen-7B The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud. Tool
RepoAgent RepoAgent is an open-source project driven by Large Language Models (LLMs) that aims to provide an intelligent way to document projects. arXiv Tool
Sanity AI Engine An AI engine for Unity game development. Unity Tool
SearchGPT 🌳 Connecting ChatGPT with the Internet. Tool
ShareGPT4V Improving Large Multi-Modal Models with Better Captions. Tool
Skywork Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. Tool
StableLM Stability AI Language Models. arXiv Tool
Stanford Alpaca An Instruction-following LLaMA Model. Tool
Text generation web UI A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA. Tool
TinyChatEngine On-Device LLM Inference Library. Tool
ToolBench An open platform for training, serving, and evaluating large language model for tool learning. Tool
Unity ChatGPT Unity ChatGPT Experiments. Unity Tool
Unity OpenAI-API Integration Integrate the OpenAI GPT-3 language model and ChatGPT API into a Unity project. Unity Tool
Unreal Engine 5 Llama LoRA A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools. Unreal Engine Tool
UnrealGPT A collection of Unreal Engine 5 Editor Utility widgets powered by GPT3/4. Unreal Engine Tool
Video-LLaVA Learning United Visual Representation by Alignment Before Projection. arXiv Tool
WebGPT Run GPT models in the browser with WebGPU. Tool
Web3-GPT Deploy smart contracts with AI. Tool
WordGPT 🤖 Bring the power of ChatGPT to Microsoft Word. Tool
XAgent An Autonomous LLM Agent for Complex Task Solving. Tool
Yi A series of large language models trained from scratch by developers at 01.AI. Tool
01 Project The open-source language model computer. Tool


Game (Agent)

Source Description Paper Game Engine Type
AgentBench A Comprehensive Benchmark to Evaluate LLMs as Agents. arXiv Agent
Agent Group Chat An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior. arXiv Agent
Agent K An autoagentic AGI that is self-evolving and modular. Agent
AgentScope Start building LLM-empowered multi-agent applications in an easier way. arXiv Agent
AgentSims An Open-Source Sandbox for Large Language Model Evaluation. Agent
AI Town AI Town is a virtual town where AI characters live, chat and socialize. Agent
anime.gf Local & Open Source Alternative to CharacterAI. Game
Astrocade Create games with AI. Game
Atomic Agents The Atomic Agents framework is designed to be modular, extensible, and easy to use. Agent
AutoAgents A Framework for Automatic Agent Generation. Agent
AutoGen Enable Next-Gen Large Language Model Applications. arXiv Agent
behaviac Behaviac is a framework of the game AI development. Framework
Biomes Biomes is an open-source sandbox MMORPG built for the web using web technologies such as Next.js, TypeScript, React, and WebAssembly. Game
Buffer of Thoughts Thought-Augmented Reasoning with Large Language Models. arXiv Agent
Byzer-Agent Easy, fast, and distributed agent framework for everyone. Agent
Cat Town A C(h)atGPT-powered simulation with cats. Agent
CharacterGLM Customizing Chinese Conversational AI Characters with Large Language Models. arXiv Agent
ChatDev Communicative Agents for Software Development. arXiv Agent
CogAgent CogAgent is an open-source visual language model improved based on CogVLM. arXiv Agent
Cradle Towards General Computer Control. Agent
crewAI Framework for orchestrating role-playing, autonomous AI agents. Agent
Dify Dify is an open-source LLM app building platform. Agent
Digital Life Project Autonomous 3D Characters with Social Intelligence. arXiv Agent
everything-ai Your fully proficient, AI-powered and local chatbot assistant 🤖. Agent
fabric fabric is an open-source framework for augmenting humans using AI. Agent
FastGPT FastGPT is a knowledge-based platform built on LLMs. Agent
fastRAG Efficient Retrieval Augmentation and Generation Framework. Agent
GameAISDK Image-based game AI automation framework. Framework
GameNGen Diffusion Models Are Real-Time Game Engines. arXiv Game
GameGen-O GameGen-O: Open-world Video Game Generation. Game
GenAgent GenAgent: Build Collaborative AI Systems with Automated Workflow Generation - Case Studies on ComfyUI. arXiv Agent
Generative Agents Interactive Simulacra of Human Behavior. arXiv Agent
Genie Generative Interactive Environments. Game
gigax Runtime, LLM-powered NPCs. Game
HippoRAG Neurobiologically Inspired Long-Term Memory for Large Language Models. arXiv Agent
Interactive LLM Powered NPCs Interactive LLM Powered NPCs is an open-source project that aims to transform how you interact with non-player characters (NPCs) in any game. Game
IoA An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. Agent
KwaiAgents A generalized information-seeking agent system with Large Language Models (LLMs). arXiv Agent
LangChain Get your LLM application from prototype to production. Agent
Langflow Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. Agent
LangGraph Studio LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications. Agent
LARP Language-Agent Role Play for open-world games. arXiv Agent
LLama Agentic System Agentic components of the Llama Stack APIs. Agent
LlamaIndex LlamaIndex is a data framework for your LLM application. Agent
MindSearch 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT). Agent
Mixture of Agents (MoA) Mixture-of-Agents Enhances Large Language Model Capabilities. arXiv Agent
MMRole MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents. arXiv Agent
Moonlander.ai Start building 3D games without any coding using generative AI. Framework
MuG Diffusion MuG Diffusion is a charting AI for rhythm games, based on Stable Diffusion and heavily modified to incorporate audio waveforms. Game
Oasis Oasis is an interactive world model developed by Decart and Etched. Based on diffusion transformers, Oasis takes in user keyboard input and generates gameplay in an autoregressive manner. Game
OmAgent A multimodal agent framework for solving complex tasks. Agent
OpenAgents An Open Platform for Language Agents in the Wild. Agent
Opus An AI app that turns text into a video game. Game
Pipecat Open Source framework for voice and multimodal conversational AI. Agent
Qwen-Agent Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. Agent
Ragas Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. Agent
RPBench-Auto An automated pipeline for evaluating LLMs for role-playing. Game
SIMA A generalist AI agent for 3D virtual environments. Agent
StoryGames.ai AI for dreamers to make games. Game
SWE-agent Agent Computer Interfaces Enable Software Engineering Language Models. arXiv Agent
TaskGen A Task-based agentic framework building on StrictJSON outputs by LLM agents. Agent
TEN Agent TEN Agent is described by its authors as the world's first real-time multimodal agent integrated with the OpenAI Realtime API and RTC; it features weather checks, web search, vision, and RAG capabilities. Agent
Translation Agent Agentic translation using reflection workflow. Agent
Twitter Twitter Personality is a web application that analyzes your Twitter handle to create a personalized personality profile using a Wordware AI agent. Agent
Unbounded Unbounded: A Generative Infinite Game of Character Life Simulation. arXiv Game
Video2Game Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. arXiv Game
V-IRL Grounding Virtual Intelligence in Real Life. arXiv Agent
WebDesignAgent An agent for web design. Agent
XAgent An Autonomous LLM Agent for Complex Task Solving. Agent


Code

Source Description Paper Game Engine Type
AI Code Translator Use AI to translate code from one language to another. Code
aiXcoder-7B aiXcoder-7B Code Large Language Model. Code
bloop bloop is a fast code search engine written in Rust. Code
Chapyter ChatGPT Code Interpreter in Jupyter Notebooks. Code
CodeGeeX An Open Multilingual Code Generation Model. arXiv Code
CodeGeeX2 A More Powerful Multilingual Code Generation Model. Code
CodeGeeX4 CodeGeeX4: Open Multilingual Code Generation Model. Code
CodeGen CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. arXiv Code
CodeGen2 CodeGen2 models for program synthesis. arXiv Code
Code Llama Code Llama is a family of large language models for code based on Llama 2. Code
CodeTF One-stop Transformer Library for State-of-the-art Code LLM. Code
CodeT5 Open Code LLMs for Code Understanding and Generation. Code
Cursor Write, edit, and chat about your code with GPT-4 in a new type of editor. Code
DeepSeek Coder DeepSeek Coder: Let the Code Write Itself. arXiv Code
OpenAI Codex OpenAI Codex is a descendant of GPT-3. Code
PandasAI Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational. Code
RobloxScripterAI RobloxScripterAI is an AI-powered code generation tool for Roblox. Roblox Code
Scikit-LLM Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. Code
SoTaNa The Open-Source Software Development Assistant. arXiv Code
Stable Code 3B Coding on the Edge. Code
StarCoder 💫 StarCoder is a language model (LM) trained on source code and natural language text. arXiv Code
StarCoder 2 StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 and some natural language text such as Wikipedia, Arxiv, and GitHub issues. arXiv Code
UnityGen AI UnityGen AI is an AI-powered code generation plugin for Unity. Unity Code
Void Void is an open source Cursor alternative. Write code with the best AI tools, retain full control over your data, and access powerful AI features. Code


Writer

Source Description Paper Game Engine Type
AI-Writer A Chinese pre-trained generative model that writes novels and generates fantasy and romance web novels. Writer
Notebook.ai Notebook.ai is a set of tools for writers, game designers, and roleplayers to create magnificent universes – and everything within them. Writer
Novel Notion-style WYSIWYG editor with AI-powered autocompletions. Writer
NovelAI Driven by AI, painlessly construct unique stories, thrilling tales, seductive romances, or just fool around. Writer


Image

Source Description Paper Game Engine Type
AnyDoor Zero-shot Object-level Image Customization. arXiv Image
AnyText Multilingual Visual Text Generation And Editing. arXiv Image
AutoStudio Crafting Consistent Subjects in Multi-turn Interactive Image Generation. arXiv Image
Blender-ControlNet Using ControlNet right in Blender. Blender Image
BriVL Bridging Vision and Language Model. arXiv Image
CatVTON CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models. arXiv Image
CLIPasso A method for converting an image of an object to a sketch, allowing for varying levels of abstraction. arXiv Image
ClipDrop Create stunning visuals in seconds. Image
ComfyUI A powerful and modular stable diffusion GUI with a graph/nodes interface. Image
ConceptLab Creative Generation using Diffusion Prior Constraints. arXiv Image
ControlNet ControlNet is a neural network structure to control diffusion models by adding extra conditions. arXiv Image
CSGO CSGO: Content-Style Composition in Text-to-Image Generation. arXiv Image
DALL·E 2 DALL·E 2 is an AI system that can create realistic images and art from a description in natural language. Image
Dashtoon Studio Dashtoon Studio is an AI powered comic creation platform. Comic
DeepAI DeepAI offers a suite of tools that use AI to enhance your creativity. Image
DeepFloyd IF IF by DeepFloyd Lab at StabilityAI. Image
Depth Anything V2 Depth Anything V2, a foundation model for monocular depth estimation. arXiv Image
Depth map library and poser Depth map library for use with the ControlNet extension for AUTOMATIC1111/stable-diffusion-webui. Image
Diffuse to Choose Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All. arXiv Image
Disco Diffusion A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations. Image
DragGAN Interactive Point-based Manipulation on the Generative Image Manifold. arXiv Image
Draw Things AI-assisted image generation in your pocket. Image
DWPose Effective Whole-body Pose Estimation with Two-stages Distillation. arXiv Image
EasyPhoto Your Smart AI Photo Generator. Image
Flux This repo contains minimal inference code to run text-to-image and image-to-image with our Flux latent rectified flow transformers. Image
Follow-Your-Click Open-domain Regional Image Animation via Short Prompts. arXiv Image
Fooocus Focus on prompting and generating. Image
GIFfusion Create GIFs and Videos using Stable Diffusion. Image
Grounded-Segment-Anything Automatically detect, segment, and generate anything with image, text, and audio inputs. arXiv Image
HivisionIDPhotos HivisionIDPhotos: a lightweight and efficient AI ID photo tool. Image
Hua Hua is an AI image editor with Stable Diffusion (and more). Image
Hunyuan-DiT A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. arXiv Image
IC-Light IC-Light is a project to manipulate the illumination of images. Image
Ideogram Helping people become more creative. Image
Imagen Imagen is an AI system that creates photorealistic images from input text. Image
img2img-turbo One-Step Image-to-Image with SD-Turbo. Image
Img2Prompt Get prompts from stable diffusion generated images. Image
InstantID Zero-shot Identity-Preserving Generation in Seconds. arXiv Image
InternLM-XComposer2 InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. arXiv Image
KOALA Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis. Image
Kolors Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. Image
KREA Generate images and videos with a delightful AI-powered design tool. Image
LaVi-Bridge Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation. arXiv Image
LayerDiffusion Transparent Image Layer Diffusion using Latent Transparency. arXiv Image
Lexica A Stable Diffusion prompts search engine. Image
LlamaGen Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation. arXiv Image
Lumina-mGPT Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. arXiv Image
MetaShoot MetaShoot is a digital twin of a photo studio, developed as a plugin for Unreal Engine that gives any creator the ability to produce highly realistic renders in the easiest and quickest way. Unreal Engine Image
Midjourney Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. Image
MIGC MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis. arXiv Image
MimicBrush Zero-shot Image Editing with Reference Imitation. arXiv Image
OmniGen OmniGen: Unified Image Generation. arXiv Image
Omost Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability. Image
Openpose Editor Openpose Editor for AUTOMATIC1111's stable-diffusion-webui. Image
Outfit Anyone Ultra-high quality virtual try-on for Any Clothing and Any Person. Image
PaintsUndo PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings. Image
PhotoMaker Customizing Realistic Human Photos via Stacked ID Embedding. arXiv Image
Photoroom AI Background Generator. Image
Plask AI image generation in the cloud. Image
Prompt.Art The Generators Hub. Image
PuLID Pure and Lightning ID Customization via Contrastive Alignment. arXiv Image
Rich-Text-to-Image Expressive Text-to-Image Generation with Rich Text. arXiv Image
RPG-DiffusionMaster Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG). Image
SEED-Story SEED-Story: Multimodal Long Story Generation with Large Language Model. arXiv Image
Segment Anything Segment Anything Model (SAM): a new AI model from Meta AI that can "cut out" any object, in any image, with a single click. arXiv Image
Segment Anything Model 2 (SAM 2) SAM 2: Segment Anything in Images and Videos. arXiv Image
sd-webui-controlnet WebUI extension for ControlNet. Image
SDXL-Lightning Progressive Adversarial Diffusion Distillation. arXiv Image
SDXS Real-Time One-Step Latent Diffusion Models with Image Conditions. Image
Stable.art Photoshop plugin for Stable Diffusion with Automatic1111 as backend (locally or with Google Colab). Image
Stable Cascade Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade for generating images, hence the name "Stable Cascade". Image
Stable Diffusion A latent text-to-image diffusion model. Image
stable-diffusion.cpp Stable Diffusion in pure C/C++. Image
Stable Diffusion web UI A browser interface based on Gradio library for Stable Diffusion. Image
Stable Diffusion web UI Web-based UI for Stable Diffusion. Image
Stable Diffusion WebUI Chinese Chinese version of stable-diffusion-webui. Image
Stable Diffusion XL Generate images from text. arXiv Image
Stable Diffusion XL Turbo Real-Time Text-to-Image Generation. Image
Stable Diffusion 3.5 Stable Diffusion 3.5 open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. Image
Stable Doodle Stable Doodle is a sketch-to-image tool that converts a simple drawing into a dynamic image. Image
StableStudio StableStudio by Stability AI. Image
StoryMaker StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation. arXiv Image
StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation. Image
StyleDrop Text-To-Image Generation in Any Style. arXiv Image
SyncDreamer Generating Multiview-consistent Images from a Single-view Image. arXiv Image
UltraEdit UltraEdit: Instruction-based Fine-Grained Image Editing at Scale. arXiv Image
UltraPixel UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks. arXiv Image
Unity ML Stable Diffusion Core ML Stable Diffusion on Unity. Unity Image
Vispunk Visions Text-to-Image generation platform. Image

^ Back to Contents ^

Texture

Source Description Paper Game Engine Type
CRM Single Image to 3D Textured Mesh with Convolutional Reconstruction Model. arXiv Texture
DreamMat High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. arXiv Texture
DreamSpace Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation. Texture
Dream Textures Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. Blender Texture
InstructHumans Editing Animated 3D Human Textures with Instructions. arXiv Texture
InteX Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. arXiv Texture
MaterialSeg3D MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. arXiv Texture
MeshAnything MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers. arXiv Mesh
Neuralangelo High-Fidelity Neural Surface Reconstruction. arXiv Texture
Paint-it Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. Texture
Polycam Create your own 3D textures just by typing. Texture
TexFusion Synthesizing 3D Textures with Text-Guided Image Diffusion Models. arXiv Texture
Text2Tex Text-driven texture Synthesis via Diffusion Models. arXiv Texture
Texture Lab AI-generated textures. You can generate your own with a text prompt. Texture
With Poly Create Textures With Poly. Generate 3D materials with AI in a free online editor, or search our growing community library. Texture
X-Mesh X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. arXiv Texture

^ Back to Contents ^

Shader

Source Description Paper Game Engine Type
AI Shader ChatGPT-powered shader generator for Unity. Unity Shader

^ Back to Contents ^

3D Model

Source Description Paper Game Engine Type
Animate3D Animate3D: Animating Any 3D Model with Multi-view Video Diffusion. arXiv 3D
Anything-3D Segment-Anything + 3D. Lift anything to 3D. arXiv Model
Any2Point Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding. arXiv 3D
BlenderGPT Use commands in English to control Blender with OpenAI's GPT-4. Blender Model
Blender-GPT An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. Blender Model
Blockade Labs Skybox Lab, an AI-powered tool for generating 360° skybox experiences from text prompts. Model
CF-3DGS COLMAP-Free 3D Gaussian Splatting. arXiv 3D
CharacterGen CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. arXiv 3D
chatGPT-maya A simple Maya tool that uses OpenAI to perform basic tasks based on descriptive instructions. Maya Model
CityDreamer Compositional Generative Model of Unbounded 3D Cities. arXiv 3D
CSM Generate 3D worlds from images and videos. 3D
Dash Your Copilot for World Building in Unreal Engine. Unreal Engine 3D
DreamCatalyst DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. arXiv 3D
DreamGaussian4D Generative 4D Gaussian Splatting. arXiv 4D
DUSt3R Geometric 3D Vision Made Easy. arXiv 3D
GALA3D GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. arXiv 3D
GaussCtrl GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. arXiv 3D
GaussianCube A Structured and Explicit Radiance Representation for 3D Generative Modeling. arXiv 3D
GaussianDreamer Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors. arXiv 3D
GenieLabs Empower your game with AI-UGC. 3D
HiFA High-fidelity Text-to-3D with advanced Diffusion guidance. Model
HoloDreamer HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions. arXiv 3D
Hunyuan3D-1.0 Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation. arXiv 3D
Infinigen Infinite Photorealistic Worlds using Procedural Generation. arXiv 3D
Instruct-NeRF2NeRF Editing 3D Scenes with Instructions. arXiv Model
Interactive3D Create What You Want by Interactive 3D Generation. arXiv 3D
Isotropic3D Image-to-3D Generation Based on a Single CLIP Embedding. 3D
LATTE3D Large-scale Amortized Text-To-Enhanced3D Synthesis. arXiv 3D
LION Latent Point Diffusion Models for 3D Shape Generation. arXiv Model
Luma AI Capture in lifelike 3D. Unmatched photorealism, reflections, and details. The future of VFX is now, for everyone! Model
lumine AI AI-Powered Creativity. 3D
Make-It-3D High-Fidelity 3D Creation from A Single Image with Diffusion Prior. arXiv Model
Meshy Create Stunning 3D Game Assets with AI. 3D
Mootion Magical 3D AI Animation Maker. 3D
MVDream Multi-view Diffusion for 3D Generation. arXiv 3D
NVIDIA Instant NeRF Instant neural graphics primitives: lightning fast NeRF and more. Model
One-2-3-45 Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization. arXiv Model
Paint3D Paint Anything 3D with Lighting-Less Texture Diffusion Models. arXiv 3D
PAniC-3D Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. arXiv Model
Point·E Point cloud diffusion for 3D model synthesis. Model
ProlificDreamer High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. arXiv Model
SF3D SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. arXiv 3D
Shap-E Generate 3D objects conditioned on text or images. arXiv Model
Sloyd 3D modelling has never been easier. Model
Spline AI The power of AI is coming to the 3rd dimension. Generate objects, animations, and textures using prompts. Model
Stable Dreamfusion A PyTorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. Model
SV3D Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion. arXiv 3D
Tafi AI text to 3D character engine. Model
3D-GPT Procedural 3D Modeling with Large Language Models. arXiv 3D
3D-LLM Injecting the 3D World into Large Language Models. arXiv 3D
3Dpresso Extract a 3D model of an object captured in a video. Model
3DTopia Text-to-3D Generation within 5 Minutes. arXiv 3D
3DTopia-XL 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion. arXiv 3D
threestudio A unified framework for 3D content generation. Model
TripoSR A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. arXiv Model
Unique3D High-Quality and Efficient 3D Mesh Generation from a Single Image. arXiv 3D
UnityGaussianSplatting Toy Gaussian Splatting visualization in Unity. Unity 3D
ViVid-1-to-3 Novel View Synthesis with Video Diffusion Models. arXiv 3D
Voxcraft Crafting Ready-to-Use 3D Models with AI. 3D
Wonder3D Single Image to 3D using Cross-Domain Diffusion. arXiv 3D
Zero-1-to-3 Zero-shot One Image to 3D Object. arXiv Model

^ Back to Contents ^

Avatar

Source Description Paper Game Engine Type
AniPortrait Audio-Driven Synthesis of Photorealistic Portrait Animations. arXiv Avatar
CALM Conditional Adversarial Latent Models for Directable Virtual Characters. arXiv Avatar
ChatAvatar Progressive generation Of Animatable 3D Faces Under Text guidance. Avatar
ChatdollKit ChatdollKit enables you to make your 3D model into a chatbot. Unity Avatar
DreamTalk When Expressive Talking Head Generation Meets Diffusion Probabilistic Models. arXiv Avatar
Duix Duix - Silicon-Based Digital Human SDK 🌐🤖 Avatar
EchoMimic EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. arXiv Avatar
EMOPortraits Emotion-enhanced Multimodal One-shot Head Avatars. Avatar
E3 Gen Efficient, Expressive and Editable Avatars Generation. arXiv Avatar
ExAvatar ExAvatar - Expressive Whole-Body 3D Gaussian Avatar. arXiv Avatar
GeneAvatar Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image. arXiv Avatar
GeneFace++ Generalized and Stable Real-Time 3D Talking Face Generation. Avatar
Hallo Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. arXiv Avatar
Hallo2 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation. arXiv Avatar
HeadSculpt Crafting 3D Head Avatars with Text. arXiv Avatar
IntrinsicAvatar IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing. arXiv Avatar
Linly-Talker Digital Avatar Conversational System. Avatar
LivePortrait LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control. arXiv Avatar
MotionGPT Human Motion as a Foreign Language, a unified motion-language generation model using LLMs. arXiv Avatar
MusePose MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation. Avatar
MuseTalk Real-Time High Quality Lip Synchronization with Latent Space Inpainting. Avatar
MuseV Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. Avatar
Portrait4D Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. arXiv Avatar
Ready Player Me Integrate customizable avatars into your game or app in days. Avatar
RodinHD RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models. arXiv Avatar
StyleAvatar3D Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. arXiv Avatar
Text2Control3D Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model. arXiv Avatar
Topo4D Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. arXiv Avatar
UnityAIWithChatGPT A Unity-based voice-interactive display that combines ChatGPT with UnityChan. Unity Avatar
Vid2Avatar 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. arXiv Avatar
VLOGGER Multimodal Diffusion for Embodied Avatar Synthesis. Avatar
Wild2Avatar Rendering Humans Behind Occlusions. arXiv Avatar

^ Back to Contents ^

Animation

Source Description Paper Game Engine Type
Animate Anyone Consistent and Controllable Image-to-Video Synthesis for Character Animation. arXiv Animation
AnimateAnything Fine-Grained Open Domain Image Animation with Motion Guidance. arXiv Animation
AnimateDiff Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. arXiv Animation
AnimateLCM Let's Accelerate the Video Generation within 4 Steps! arXiv Animation
Animate-X Animate-X: Universal Character Image Animation with Enhanced Motion Representation. arXiv Animation
AnimateZero Video Diffusion Models are Zero-Shot Image Animators. arXiv Animation
AnimationGPT An AIGC tool for generating game combat motion assets. Animation
Deforum Deforum leverages Stable Diffusion to generate evolving AI visuals. Animation
DrawingSpinUp DrawingSpinUp: 3D Animation from Single Character Drawings. arXiv Animation
DreaMoving A Human Video Generation Framework based on Diffusion Models. arXiv Animation
FaceFusion Next generation face swapper and enhancer. Animation
FreeInit Bridging Initialization Gap in Video Diffusion Models. arXiv Animation
GeneFace Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis. arXiv Animation
ID-Animator Zero-Shot Identity-Preserving Human Video Generation. arXiv Animation
MagicAnimate Temporally Consistent Human Image Animation using Diffusion Model. arXiv Animation
NUWA DragNUWA is an open-domain diffusion-based video generation model that takes text, image, and trajectory controls as inputs to achieve controllable video generation. arXiv Animation
NUWA-Infinity NUWA-Infinity is a multimodal generative model that is designed to generate high-quality images and videos from given text, image or video input. Animation
NUWA-XL A novel Diffusion over Diffusion architecture for eXtremely Long video generation. Animation
Omni Animation AI Generated High Fidelity Animations. Animation
PIA Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models. arXiv Animation
SadTalker Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. arXiv Animation
SadTalker-Video-Lip-Sync This project builds on SadTalker to implement Wav2Lip-style video lip synthesis. Animation
Stable Animation A powerful text-to-animation tool for developers. Animation
TaleCrafter An interactive story visualization tool that supports multiple characters. arXiv Animation
ToonCrafter ToonCrafter: Generative Cartoon Interpolation. arXiv Animation
Wav2Lip Accurately Lip-syncing Videos In The Wild. arXiv Animation
Wonder Studio An AI tool that automatically animates, lights and composes CG characters into a live-action scene. Animation

^ Back to Contents ^

Visual

Source Description Paper Game Engine Type
Cambrian-1 Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. arXiv Multimodal LLMs
CogVLM2 GPT4V-level open-source multi-modal model based on Llama3-8B. Visual
CoTracker It is Better to Track Together. arXiv Visual
EVF-SAM EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model. arXiv Visual
FaceHi It is Better to Track Together. Visual
InternLM-XComposer2 InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. arXiv Visual
Kangaroo Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input. Visual
LGVI Towards Language-Driven Video Inpainting via Multimodal Large Language Models. Visual
LLaVA++ Extending Visual Capabilities with LLaMA-3 and Phi-3. Visual
LLaVA-OneVision LLaVA-OneVision: Easy Visual Task Transfer. arXiv Visual
LongVA Long Context Transfer from Language to Vision. arXiv Visual
MaskViT Masked Visual Pre-Training for Video Prediction. arXiv Visual
MiniCPM-Llama3-V 2.5 A GPT-4V Level MLLM on Your Phone. Visual
MoE-LLaVA Mixture of Experts for Large Vision-Language Models. arXiv Visual
MotionLLM Understanding Human Behaviors from Human Motions and Videos. arXiv Visual
PLLaVA Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. arXiv Visual
Qwen-VL A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. arXiv Visual
Sapiens Sapiens: Foundation for Human Vision Models. arXiv Visual
ShareGPT4V Improving Large Multi-modal Models with Better Captions. arXiv Visual
SOLO SOLO: A Single Transformer for Scalable Vision-Language Modeling. arXiv Visual
Video-CCAM Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks. Visual
Video-LLaVA Learning United Visual Representation by Alignment Before Projection. arXiv Visual
VideoLLaMA 2 Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. arXiv Visual
Video-MME The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. arXiv Visual
Vitron A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. Visual
VILA VILA: On Pre-training for Visual Language Models. arXiv Visual

^ Back to Contents ^

Video

Source Description Paper Game Engine Type
360DVD Controllable Panorama Video Generation with 360-Degree Video Diffusion Model. arXiv Video
Animate-A-Story Retrieval-Augmented Video Generation for Telling a Story. arXiv Video
Anything in Any Scene Photorealistic Video Object Insertion. Video
ART•V Auto-Regressive Text-to-Video Generation with Diffusion Models. arXiv Video
Assistive Meet the generative video platform that brings your ideas to life. Video
AtomoVideo High Fidelity Image-to-Video Generation. arXiv Video
BackgroundRemover Background Remover lets you remove backgrounds from images and videos using AI through a simple, free, and open-source command-line interface. Video
Boximator Generating Rich and Controllable Motions for Video Synthesis. arXiv Video
CoDeF Content Deformation Fields for Temporally Consistent Video Processing. arXiv Video
CogVideo Generate Videos from Text Descriptions. Video
CogVideoX CogVideoX is an open-source video generation model that shares its origins with Qingying (清影). Video
CogVLM CogVLM is a powerful open-source visual language model (VLM). Visual
CoNR Generate vivid dancing videos from hand-drawn anime character sheets (ACS). arXiv Video
Decohere Create what can't be filmed. Video
Descript Descript is the simple, powerful, and fun way to edit. Video
Diffutoon High-Resolution Editable Toon Shading via Diffusion Models. arXiv Video
dolphin General video interaction platform based on LLMs. Video
DomoAI Amplify Your Creativity with DomoAI. Video
DreamCinema DreamCinema: Cinematic Transfer with Free Camera and 3D Character. arXiv Video
DynamiCrafter Animating Open-domain Images with Video Diffusion Priors. arXiv Video
EDGE We introduce EDGE, a powerful method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to arbitrary input music. arXiv Video
EMO Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions. arXiv Video
Emu Video Factorizing Text-to-Video Generation by Explicit Image Conditioning. Video
Etna Etna can generate corresponding video content based on short text descriptions. Video
Fairy Fast Parallelized Instruction-Guided Video-to-Video Synthesis. Video
Follow-Your-Canvas Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation. arXiv Video
Follow Your Pose Pose-Guided Text-to-Video Generation using Pose-Free Videos. arXiv Video
FullJourney Your complete suite of AI Creation tools at your fingertips. Video
Gen-2 A multi-modal AI system that can generate novel videos with text, images, or video clips. Video
Generative Dynamics Generative Image Dynamics. Video
Genie Generative Interactive Environments. arXiv Video
Genmo Magically make videos with AI. Video
GenTron Diffusion Transformers for Image and Video Generation. Video
HiGen Hierarchical Spatio-temporal Decoupling for Text-to-Video generation. Video
Hotshot-XL Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL. Video
Imagen Video Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. Video
InstructVideo Instructing Video Diffusion Models with Human Feedback. arXiv Video
I2VGen-XL High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. arXiv Video
LaVie High-Quality Video Generation with Cascaded Latent Diffusion Models. arXiv Video
LTX Studio LTX Studio is a holistic, AI-driven filmmaking platform for creators, marketers, filmmakers and studios. Video
Lumiere A Space-Time Diffusion Model for Video Generation. arXiv Video
LVDM Latent Video Diffusion Models for High-Fidelity Long Video Generation. arXiv Video
MagicVideo Efficient Video Generation With Latent Diffusion Models. arXiv Video
MagicVideo-V2 Multi-Stage High-Aesthetic Video Generation. arXiv Video
Magic Hour AI Video for Creators made simple. Video
MAGVIT-v2 Tokenizer is key to visual generation. Video
MAGVIT Masked Generative Video Transformer. Video
Make-A-Video Make-A-Video is a state-of-the-art AI system that generates videos from text. arXiv Video
Make Pixels Dance High-Dynamic Video Generation. arXiv Video
Make-Your-Video Customized Video Generation Using Textual and Structural Guidance. arXiv Video
MicroCinema A Divide-and-Conquer Approach for Text-to-Video Generation. arXiv Video
MIMO MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling. arXiv Video
Mini-Gemini Mining the Potential of Multi-modality Vision Language Models. Vision
MobileVidFactory Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. Video
Mochi 1 Mochi 1 is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. Video
MOFA-Video Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model. arXiv Video
MoneyPrinterTurbo Use large models to generate short videos with one click. Video
Moonvalley Moonvalley is a groundbreaking new text-to-video generative AI model. Video
Mora More like Sora for Generalist Video Generation. arXiv Video
Morph Studio With our Text-to-Video AI Magic, manifest your creativity through your prompt. Video
MotionClone MotionClone: Training-Free Motion Cloning for Controllable Video Generation. arXiv Video
MotionCtrl A Unified and Flexible Motion Controller for Video Generation. arXiv Video
MotionDirector Motion Customization of Text-to-Video Diffusion Models. arXiv Video
Motionshop An application that replaces the characters in a video with 3D avatars. Video
Mov2mov Mov2mov plugin for Automatic1111/stable-diffusion-webui. Video
MovieFactory Automatic Movie Creation from Text using Large Generative Models for Language and Images. arXiv Video
Neural Frames Discover the synthesizer for the visual world. Video
NeverEnds Create your world. Video
Open-Sora Democratizing Efficient Video Production for All. Video
Open-Sora Open-Sora Plan. Video
Phenaki A model for generating videos from text, with prompts that can change over time, and videos that can be as long as multiple minutes. arXiv Video
Pika Labs Pika Labs is revolutionizing video-making experience with AI. Video
Pixeling Pixeling empowers our customers to create highly precise, ultra-realistic, and extremely controllable visual content including images, videos and 3D models. Video
PixVerse Create breath-taking videos with AI. Video
Pollinations Creating gets easy, fast, and fun. Video
Reuse and Diffuse Iterative Denoising for Text-to-Video Generation. arXiv Video
ShortGPT An experimental AI framework for automated short/video content creation. Video
Show-1 Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. arXiv Video
Snap Video Scaled Spatiotemporal Transformers for Text-to-Video Synthesis. arXiv Video
Sora Creating video from text. Video
SoraWebui SoraWebui is an open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. Video
StableVideo Text-driven Consistency-aware Diffusion Video Editing. Video
Stable Video Diffusion Stable Video Diffusion (SVD) Image-to-Video. Video
StoryDiffusion Consistent Self-Attention for Long-Range Image and Video Generation. arXiv Video
StreamingT2V Consistent, Dynamic, and Extendable Long Video Generation from Text. arXiv Video
StyleCrafter Enhancing Stylized Text-to-Video Generation with Style Adapter. arXiv Video
TATS Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. Video
Text2Video-Zero Text-to-Image Diffusion Models are Zero-Shot Video Generators. arXiv Video
TF-T2V A Recipe for Scaling up Text-to-Video Generation with Text-free Videos. arXiv Video
Tora Tora: Trajectory-oriented Diffusion Transformer for Video Generation. arXiv Video
Track-Anything Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem. arXiv Video
Tune-A-Video One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. arXiv Video
TwelveLabs Multimodal AI that understands videos like humans. Video
UniVG Towards UNIfied-modal Video Generation. Video
Vchitect-2.0 Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. Video
VGen A holistic video generation ecosystem built on diffusion models. arXiv Video
ViewCrafter ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis. arXiv Video
Video-ChatGPT Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. arXiv Video
VideoComposer Compositional Video Synthesis with Motion Controllability. arXiv Video
VideoCrafter1 Open Diffusion Models for High-Quality Video Generation. arXiv Video
VideoCrafter2 Overcoming Data Limitations for High-Quality Video Diffusion Models. arXiv Video
VideoDrafter Content-Consistent Multi-Scene Video Generation with LLM. arXiv Video
VideoElevator Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models. arXiv Video
VideoFactory Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation. Video
VideoGen A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation. arXiv Video
VideoLCM Video Latent Consistency Model. arXiv Video
Video LDMs Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. arXiv Video
Video-LLaVA Learning United Visual Representation by Alignment Before Projection. arXiv Video
VideoMamba State Space Model for Efficient Video Understanding. arXiv Video
Video-of-Thought Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. Video
VideoPoet A large language model for zero-shot video generation. arXiv Video
Vispunk Motion Create realistic videos using just text. Video
VisualRWKV VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. Visual
V-JEPA Video Joint Embedding Predictive Architecture. arXiv Video
W.A.L.T Photorealistic Video Generation with Diffusion Models. arXiv Video
Zeroscope Zeroscope Text-to-Video. Video

^ Back to Contents ^

Audio

Source Description Paper Game Engine Type
AcademiCodec An Open Source Audio Codec Model for Academic Research. Audio
Amphion An Open-Source Audio, Music, and Speech Generation Toolkit. arXiv Audio
ArchiSound Audio generation using diffusion models, in PyTorch. Audio
Audiobox Unified Audio Generation with Natural Language Prompts. Audio
AudioEditing Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. arXiv Audio
Audiogen Codec A low-compression 48 kHz stereo neural audio codec for general audio, optimized for audio fidelity 🎵. Audio
AudioGPT Understanding and Generating Speech, Music, Sound, and Talking Head. arXiv Audio
AudioLCM Text-to-Audio Generation with Latent Consistency Models. arXiv Audio
AudioLDM Text-to-Audio Generation with Latent Diffusion Models. arXiv Audio
AudioLDM 2 Learning Holistic Audio Generation with Self-supervised Pretraining. arXiv Audio
Auffusion Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. arXiv Audio
CTAG Creative Text-to-Audio Generation via Synthesizer Programming. Audio
FoleyCrafter FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. arXiv Audio
MAGNeT Masked Audio Generation using a Single Non-Autoregressive Transformer. Audio
Make-An-Audio Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. arXiv Audio
Make-An-Audio 3 Transforming Text into Audio via Flow-based Large Diffusion Transformers. arXiv Audio
NeuralSound Learning-based Modal Sound Synthesis with Acoustic Transfer. arXiv Audio
OptimizerAI Sounds for creators, game makers, artists, and video makers. Audio
Qwen2-Audio Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. arXiv Audio
SEE-2-SOUND Zero-Shot Spatial Environment-to-Spatial Sound. arXiv Audio
SoundStorm Efficient Parallel Audio Generation. arXiv Audio
Stable Audio Fast Timing-Conditioned Latent Audio Diffusion. Audio
Stable Audio Open Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. Audio
SyncFusion SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. arXiv Audio
TANGO Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. Audio
VTA-LDM Video-to-Audio Generation with Hidden Alignment. arXiv Audio
WavJourney Compositional Audio Creation with Large Language Models. arXiv Audio

^ Back to Contents ^

Music

Source Description Paper Game Engine Type
AIVA The Artificial Intelligence composing emotional soundtrack music. Music
Amper Music Custom music generation technology powered by Amper. Music
Boomy Create generative music. Share it with the world. Music
ChatMusician Fostering Intrinsic Musical Abilities Into LLM. Music
Chord2Melody Automatic Music Generation AI. Music
Diff-BGM A Diffusion Model for Video Background Music Generation. arXiv Music
FluxMusic FluxMusic: Text-to-Music Generation with Rectified Flow Transformer. arXiv Music
GPTAbleton Draft script for processing GPT responses and sending the MIDI notes into Ableton clips with AbletonOSC and python-osc. Music
HeyMusic.AI AI Music Generator. Music
Image to Music AI Image to Music Generator is a tool that uses artificial intelligence to convert images into music. Music
JEN-1 Text-Guided Universal Music Generation with Omnidirectional Diffusion Models. Music
Jukebox A Generative Model for Music. arXiv Music
Magenta Magenta is a research project exploring the role of machine learning in the process of creating art and music. Music
MeLoDy Efficient Neural Music Generation. Music
Mubert AI Generative Music. Music
MuseNet A deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. Music
MusicGen Simple and Controllable Music Generation. arXiv Music
MusicLDM Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies. arXiv Music
MusicLM Generating Music From Text. arXiv Music
Riffusion App Riffusion is an app for real-time music generation with stable diffusion. Music
Sonauto Sonauto is an AI music editor that turns prompts, lyrics, or melodies into full songs in any style. Music
SoundRaw AI music generator for creators. Music
Soundry AI Generative AI tools including text-to-sound and infinite sample packs. Music

^ Back to Contents ^

Singing Voice

Source Description Paper Game Engine Type
DiffSinger Singing Voice Synthesis via Shallow Diffusion Mechanism. arXiv Singing Voice
Retrieval-based-Voice-Conversion-WebUI An easy-to-use SVC framework based on VITS. Singing Voice
so-vits-svc SoftVC VITS Singing Voice Conversion. Singing Voice
VI-SVS Singing voice synthesis built with VITS and Opencpop; different from VISinger. Singing Voice

^ Back to Contents ^

Speech

Source Description Paper Game Engine Type
Applio Ultimate voice cloning tool, meticulously optimized for unrivaled power, modularity, and user-friendly experience. Speech
Audyo Text in. Audio out. Speech
Bark Text-Prompted Generative Audio Model. Speech
Bert-VITS2 VITS2 backbone with multilingual BERT. Speech
ChatTTS ChatTTS is a generative speech model for daily dialogue. Speech
CLAPSpeech Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training. arXiv Speech
CosyVoice Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. Speech
DEX-TTS Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. arXiv Speech
EmotiVoice A Multi-Voice and Prompt-Controlled TTS Engine. Speech
Fliki Turn text into videos with AI voices. Speech
GLM-4-Voice GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. Speech
Glow-TTS A Generative Flow for Text-to-Speech via Monotonic Alignment Search. arXiv Speech
GPT-SoVITS A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI. Speech
LOVO LOVO is the go-to AI Voice Generator & Text to Speech platform for thousands of creators. Speech
MahaTTS An Open-Source Large Speech Generation Model. Speech
Matcha-TTS A fast TTS architecture with conditional flow matching. arXiv Speech
MeloTTS High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. Speech
MetaVoice-1B AI for human-level speech intelligence. Speech
Narakeet Easily Create Voiceovers Using Realistic Text to Speech. Speech
Mini-Omni Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation. arXiv Speech
One-Shot-Voice-Cloning One-shot voice cloning based on Unet-TTS. Speech
OpenVoice Instant voice cloning by MyShell. Speech
OverFlow Putting flows on top of neural transducers for better TTS. Speech
RealtimeTTS RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. Speech
SenseVoice SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). Speech
SpeechGPT Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. arXiv Speech
speech-to-text-gpt3-unity A repo that uses OpenAI's Whisper and ChatGPT APIs in Unity. Unity Speech
Stable Speech Stability AI's Text-to-Speech model. Speech
StableTTS Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3. Speech
StyleTTS 2 Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. arXiv Speech
tortoise.cpp tortoise.cpp: GGML implementation of tortoise-tts. Speech
TorToiSe-TTS A multi-voice TTS system trained with an emphasis on quality. Speech
TTS Generation WebUI TTS Generation WebUI (Bark, MusicGen, Tortoise, RVC, Vocos, Demucs). Speech
VALL-E Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. arXiv Speech
VALL-E X Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling. arXiv Speech
Vocode Vocode is an open-source library for building voice-based LLM applications. Speech
Voicebox Text-Guided Multilingual Universal Speech Generation at Scale. arXiv Speech
VoiceCraft Zero-Shot Speech Editing and Text-to-Speech in the Wild. Speech
Whisper Whisper is a general-purpose speech recognition model. Speech
WhisperSpeech An open-source text-to-speech system built by inverting Whisper. Speech
X-E-Speech Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion. Speech
XTTS XTTS is a library for advanced Text-to-Speech generation. Speech
YourTTS Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. arXiv Speech
ZMM-TTS Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations. arXiv Speech

^ Back to Contents ^

Analytics

Source Description Game Engine Type
Ludo.ai Assistant for game research and design. Analytics

^ Back to Contents ^