vlm

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for running ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS).

sdk transformers tts language-model whisper asr vlm sdk-python edge-computing on-device-ml on-device-ai llm stable-diffusion

Updated Nov 22, 2024
Python

xlang-ai / OSWorld

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

agent cli benchmark natural-language-processing gui reinforcement-learning artificial-intelligence code-generation language-model vlm rpa multimodal llm large-action-model

Updated Nov 22, 2024
Python

XiaomingX / awesome-gaussian-splatting

A curated list of Gaussian Splatting resources, inspired by awesome-computer-vision.

awesome ai cv vlm llm gaussian-splatting

Updated Nov 22, 2024

TIGER-AI-Lab / VLM2Vec

This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"

retrieval language-model vlm rag

Updated Nov 22, 2024
Python

modelscope / evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

performance evaluation vlm rag llm

Updated Nov 22, 2024
Python

ThuCCSLab / Awesome-LM-SSP

A reading list on the safety, security, and privacy of large models (including Awesome LLM Security, Safety, etc.).

nlp security privacy jailbreak safety awesome-list language-model vlm adversarial-attacks diffusion-models llm

Updated Nov 22, 2024

yueliu1999 / Awesome-Jailbreak-on-LLMs

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, code, datasets, evaluations, and analyses.

security privacy ai jailbreak safety vlm llm llms vlms

Updated Nov 21, 2024

heshengtao / comfyui_LLM_party

An LLM agent framework for ComfyUI. It includes Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, provides access to Feishu and Discord, and adapts to all LLMs with OpenAI- or Gemini-like interfaces, such as o1, ollama, grok, qwen, GLM, deepseek, moonshot, and doubao. It also supports local LLMs, VLMs, and GGUF models such as llama-3.2, as well as Neo4j knowledge-graph linkage, GraphRAG / RAG, and html 2 img.

Updated Nov 21, 2024
Python

KRproject-tech / FSI_by_FEM_and_UVLM

Fluid-Structure Interaction Analysis Using FEM and UVLM

matlab fem fluid-simulation aerodynamics vlm fluid-structure-interaction finite-element-method vortex-lattice ancf absolute-nodal-coordinate-formulation

Updated Nov 21, 2024
MATLAB

asaddi / ComfyUI-YALLM-node

Yet another set of LLM nodes for ComfyUI (for local/remote OpenAI-like APIs, multi-modal models supported)

extension vlm openai-api llm comfyui-nodes

Updated Nov 21, 2024
Python

NVIDIA-Omniverse-blueprints / 3d-conditioning

Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset.

vlm digital-twin omniverse

Updated Nov 21, 2024
Python

awmthink / pytorch-in-action

A hands-on repository dedicated to building mainstream deep learning models from scratch using PyTorch

cnn pytorch gan classification deeplearning object-detection multi-modal from-scratch vlm diffusion-models aigc llm

Updated Nov 21, 2024
Jupyter Notebook

ndurner / oai_chat

Multi-modal Chatbot based on OpenAI

chat chatbot openai gradio vlm gradio-interface gpt-4 llm vision-language-model llm-inference gpt-4v

Updated Nov 20, 2024
Python

RobotecAI / rai

RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.

ai robotics ros2 vlm multimodal embodied-artificial-intelligence embodied-agent embodied-ai o3de llm generative-ai ai-agents-framework embodied-agents robotec

Updated Nov 22, 2024
Python

SeungjaeLim / Efficient-Road-Repairs-System.VLM

vlm vllm microsoft-phi-3

Updated Nov 20, 2024
Python

TIGER-AI-Lab / Mantis

Official code for the paper "Mantis: Multi-Image Instruction Tuning" (TMLR 2024)

language video vision mantis vlm multimodal lmm fuyu mllm llava-llama3 multi-image-understanding

Updated Nov 20, 2024
Python


Here are 148 public repositories matching this topic...

sgl-project / sglang

arc53 / doc2md

balrog-ai / BALROG

mgonzs13 / llama_ros

NexaAI / nexa-sdk
