
how to run it locally #2

Open
mryt66 opened this issue Jul 25, 2024 · 4 comments

Comments


mryt66 commented Jul 25, 2024

No description provided.

@bhupesh-sf

I have the same question. I have an Apple Mac with an M2 Pro.

@TerminatedProcess

I have an NVIDIA RTX 4070 Super with 16 GB of RAM. It's not enough.

@KingNish24 (Owner)

You can click on the three dots, then select "Run with Docker" to run it locally.

[screenshot: the Space's menu showing the "Run with Docker" option]
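For anyone who can't view the screenshot: on a Hugging Face Space, that menu opens a dialog with a `docker run` command you can copy. A sketch of what it typically looks like is below; the image name follows the `registry.hf.space/<owner>-<space>` pattern, and `kingnish-opengpt-4o` is an assumption here, so copy the exact command from the dialog:

```
docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all \
  registry.hf.space/kingnish-opengpt-4o:latest
```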


greyrabbit2003 commented Nov 9, 2024

Issue: Out of Memory Error with Qwen2VL Model on RTX 3060 (12GB VRAM)

Environment:

  • GPU: NVIDIA RTX 3060 (12GB VRAM)
  • PyTorch Version: 2.0 (CUDA-compatible)
  • Transformers Version: as pinned in requirements.txt (installed in a virtualenv)
  • Operating System: Ubuntu 20.04

Problem Description:

I am encountering a torch.cuda.OutOfMemoryError when attempting to load the Qwen2VLForConditionalGeneration model on my RTX 3060, which has a 12GB VRAM capacity. From my understanding, the model requires approximately 16GB of VRAM, resulting in the error below:

```
$ python3 app.py
/home/tokeniser/anaconda3/lib/python3.11/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/tokeniser/anaconda3/lib/python3.11/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: [same Beta-namespaces warning as above]
  warnings.warn(_BETA_TRANSFORMS_WARNING)
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████| 5/5 [00:35<00:00,  7.11s/it]
Traceback (most recent call last):
  File "/home/tokeniser/Videos/OpenGPT-4o/app.py", line 3, in <module>
    from chatbot import model_inference, EXAMPLES, chatbot
  File "/home/tokeniser/Videos/OpenGPT-4o/chatbot.py", line 30, in <module>
    model = Qwen2VLForConditionalGeneration.from_pretrained(MODEL_ID, trust_remote_code=True, torch_dtype=torch.float16).to("cuda").eval()
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3167, in to
    return super().to(*args, **kwargs)
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB (GPU 0; 11.75 GiB total capacity; 10.86 GiB already allocated; 97.94 MiB free; 11.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
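The allocator tweak the error message suggests can be applied by setting `PYTORCH_CUDA_ALLOC_CONF` before CUDA initializes; note that it only reduces fragmentation and cannot add capacity beyond the card's 12 GB. A minimal sketch (the 128 MiB split size is an illustrative value, not a project recommendation):

```python
import os

# Must be set before torch initializes CUDA, i.e. before `import torch`
# (or export the variable in the shell before running app.py).
# Caps the allocator's block size to reduce fragmentation; it does not
# lower the model's actual memory requirement.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
```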

Questions:

    1. Are there recommended methods to manage memory for larger models on GPUs with 12GB of VRAM?
    2. Is there an equivalent model to Qwen2VL that could work within 12GB of VRAM?
    3. Any pointers to documentation or example code that demonstrates loading Qwen2VL (or similar models) with a reduced memory footprint?
    4. Would running this setup in Docker provide any solutions, or will VRAM limitations remain an issue even in a containerized environment?

Additional Info:

The VRAM on my GPU is nearly filled by the initial model load, leaving limited memory for inference. Even after switching to torch.float16 and applying PyTorch's memory management configurations (e.g., PYTORCH_CUDA_ALLOC_CONF), the VRAM requirements still exceed the available capacity.
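For reference, the usual way to fit a model of this size into 12 GB is to quantize the weights on load instead of moving a full fp16 model with `.to("cuda")`. A minimal sketch, assuming the `bitsandbytes` and `accelerate` packages are installed and that `MODEL_ID` names the same checkpoint as `chatbot.py` (the ID below is an assumption):

```python
import torch
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"  # assumption: the checkpoint used in chatbot.py

# Quantize weights to 4-bit NF4 on load; activations are still computed in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# device_map="auto" lets accelerate place the already-quantized weights,
# so there is no whole-model .to("cuda") call like the one that OOMs above.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
).eval()
```

At 4 bits, the weights of a ~7B-parameter model occupy roughly 4-5 GB, which leaves headroom for the vision tower and inference activations on a 12 GB card.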

Thank you for any guidance or tips on managing this effectively!
