vlm
Here are 148 public repositories matching this topic...
Nexa SDK is a comprehensive toolkit for running ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS).
Updated Nov 22, 2024 - Python
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Updated Nov 22, 2024 - Python
This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"
Updated Nov 22, 2024 - Python
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Updated Nov 22, 2024 - Python
A reading list on the safety, security, and privacy of large models (including Awesome LLM Security, Safety, etc.).
Updated Nov 22, 2024
LLM agent framework in ComfyUI. It includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, provides access to Feishu and Discord, and adapts to all LLMs with OpenAI- or Gemini-like interfaces, such as o1, Ollama, Grok, Qwen, GLM, DeepSeek, Moonshot, and Doubao. It is also adapted to local LLMs, VLMs, and GGUF models such as Llama-3.2, and supports Neo4j knowledge-graph linkage, GraphRAG / RAG, and HTML-to-image conversion.
Updated Nov 21, 2024 - Python
Fluid-Structure Interaction Analysis Using FEM and UVLM
Updated Nov 21, 2024 - MATLAB
Yet another set of LLM nodes for ComfyUI (for local/remote OpenAI-like APIs, multi-modal models supported)
Updated Nov 21, 2024 - Python
Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset.
Updated Nov 21, 2024 - Python
A hands-on repository dedicated to building mainstream deep learning models from scratch using PyTorch
Updated Nov 21, 2024 - Jupyter Notebook
Multi-modal Chatbot based on OpenAI
Updated Nov 20, 2024 - Python
RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.
Updated Nov 22, 2024 - Python
Official code for the paper "Mantis: Multi-Image Instruction Tuning" (TMLR 2024)
Updated Nov 20, 2024 - Python