This repository provides a simple Gradio UI for running Qwen2-VL-72B-AWQ, supporting both image and video inference.
Tested only in a Python venv on Pop!_OS/Ubuntu 22.04, and only with Qwen/Qwen2-VL-72B-Instruct-AWQ downloaded from https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ/tree/main
If you run into any installation issues, check the official Qwen2-VL repo (https://github.com/QwenLM/Qwen2-VL) and the model's Hugging Face page (https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ) for more info. I am not affiliated with Qwen in any way, shape, or form, and I don't have the time to help people install this; chances are you will find the answer(s) you are seeking there.
Note: On their Hugging Face page, the Qwen2-VL team advises the following:

> "We advise build from source with command `pip install git+https://github.com/huggingface/transformers`, or you might encounter the following error: `KeyError: 'qwen2_vl'`"

See https://github.com/QwenLM/Qwen2-VL for details.
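A quick way to sanity-check that your transformers build actually includes Qwen2-VL support (the class name below comes from the Qwen2-VL model card):

```python
# If this import fails, your transformers build predates Qwen2-VL support;
# reinstall from source: pip install git+https://github.com/huggingface/transformers
import transformers
from transformers import Qwen2VLForConditionalGeneration

print(transformers.__version__)
```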
Note: There is further room for inference-speed optimization, e.g. via DeepSpeed (see the sketch below). This repo is just a quick way to test out Qwen2-VL-72B.
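For the curious, a hypothetical sketch of what a DeepSpeed inference wrapper might look like. This is untested with Qwen2-VL or AWQ weights, and kernel injection may not support this architecture; treat it as a pointer to the DeepSpeed docs (https://www.deepspeed.ai/tutorials/inference-tutorial/), not a working recipe:

```python
# Hypothetical sketch only -- NOT tested with Qwen2-VL or AWQ weights.
import os
import torch
import deepspeed
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    os.environ["QWEN_MODEL_PATH"], torch_dtype=torch.float16
)
# Wrap the model with DeepSpeed's inference engine.
model = deepspeed.init_inference(model, dtype=torch.float16,
                                 replace_with_kernel_inject=True)
```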
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Clone this repository:

  ```bash
  git clone https://github.com/Kaszebe/Large-Vision-Language-Model-UI.git
  cd Large-Vision-Language-Model-UI
  ```
- Create a virtual environment:

  ```bash
  python3 -m venv qwen_venv
  ```
- Activate the virtual environment:

  ```bash
  source qwen_venv/bin/activate
  ```
- Install the required packages:

  ```bash
  pip install transformers accelerate qwen-vl-utils gradio
  pip install flash-attn --no-build-isolation
  ```
- Download the Qwen2-VL-72B-AWQ model:
  - Visit the Hugging Face model page: https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ/tree/main
  - Download the model files
  - Place the files in a directory on your local machine
  - Set the `QWEN_MODEL_PATH` environment variable:

    ```bash
    export QWEN_MODEL_PATH="/path/to/your/model"
    ```
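For context, `run_qwen_model.py` presumably reads this variable when loading the model. A minimal sketch of that pattern, based on the standard Qwen2-VL usage from the model card (the repo's actual script may differ in detail):

```python
# Minimal loading sketch -- the repo's run_qwen_model.py may differ.
import os
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_path = os.environ["QWEN_MODEL_PATH"]
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype="auto",   # the AWQ checkpoint carries its own quantization config
    device_map="auto",    # spread layers across available GPU(s)
    attn_implementation="flash_attention_2",  # presumably what --flash-attn2 toggles
)
processor = AutoProcessor.from_pretrained(model_path)
```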
- Activate the virtual environment (if not already activated):

  ```bash
  source qwen_venv/bin/activate
  ```
- Run the script:

  ```bash
  python run_qwen_model.py --flash-attn2
  ```
This will launch the Gradio interface, allowing you to interact with the Qwen2-VL-72B-AWQ model through a web browser. You can upload images or videos and input prompts to generate descriptions.
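For the curious, the UI boils down to something like the following sketch (simplified; the actual script also handles video uploads and exposes parameter sliders, and the `describe` function here is illustrative, not the repo's exact code):

```python
# Simplified sketch of the Gradio wiring; the actual script is more featureful.
import os
import gradio as gr
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_path = os.environ["QWEN_MODEL_PATH"]
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)

def describe(image, prompt):
    # Wrap the uploaded image and text prompt in Qwen2-VL's chat format.
    messages = [{"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": prompt},
    ]}]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                       padding=True, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens before decoding the generated continuation.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]

demo = gr.Interface(
    fn=describe,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Prompt")],
    outputs=gr.Textbox(label="Description"),
)
demo.launch()
```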
- Open the provided URL in your web browser.
- Upload an image or video.
- Enter a text prompt.
- Adjust generation parameters if needed (see the sketch after this list for how they typically map onto `generate()`).
- Click "Submit" to generate a description.
- The model performs best with a CUDA-compatible GPU.
- Processing time may vary depending on your hardware and the complexity of the input.