A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning, self-improvement, and skill curation, in a standardized, general environment with minimal requirements.
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
A Comparative Framework for Multimodal Recommender Systems
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
Automated modeling and machine learning framework FEDOT
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
A knowledge base construction engine for richly formatted data
LLM2CLIP pushes state-of-the-art pretrained CLIP models even further.
Sequence-to-Sequence Framework in PyTorch
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
DANCE: a deep learning library and benchmark platform for single-cell analysis
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
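Several of the projects listed above (the CLIP-guided generation tools, CLIP4Clip, LLM2CLIP) build on CLIP's joint image-text embedding space. The snippet below is a minimal sketch of how that space is queried, assuming the openai/CLIP package and a hypothetical local image file `example.jpg`; it scores one image against candidate captions, which is the same similarity signal those tools optimize or retrieve with.

```python
# Minimal sketch: CLIP image-text similarity scoring.
# Assumes the openai/CLIP package (pip install git+https://github.com/openai/CLIP.git)
# and a hypothetical local image "example.jpg".
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one image and a few candidate captions into the shared embedding space.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # logits_per_image holds the (temperature-scaled) similarity of the image to each caption.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Caption probabilities:", probs)  # the highest probability marks the best-matching caption
```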