
how to run it locally #2

Open
mryt66 opened this issue Jul 25, 2024 · 4 comments

Comments


mryt66 commented Jul 25, 2024

No description provided.

@bhupesh-sf

I have the same question. I have an Apple Mac with an M2 Pro.

@TerminatedProcess

I have an NVIDIA RTX 4070 Super with 16 GB of RAM. It's not enough.

@KingNish24 (Owner)

You can click on the three dots, then select "Run with Docker" to run it locally.

[screenshot: the Space's menu showing the "Run with Docker" option]
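For anyone who can't view the screenshot: on a Hugging Face Space, that menu opens a dialog with a `docker run` command you can copy. A sketch of what it typically looks like is below; the image name follows the `registry.hf.space/<owner>-<space>` pattern, and `kingnish-opengpt-4o` is an assumption here, so copy the exact command from the dialog:

```
docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all \
  registry.hf.space/kingnish-opengpt-4o:latest
```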


greyrabbit2003 commented Nov 9, 2024

Issue: Out of Memory Error with Qwen2VL Model on RTX 3060 (12GB VRAM)

Environment:

  • GPU: NVIDIA RTX 3060 (12GB VRAM)
  • PyTorch Version: 2.0 (CUDA-compatible)
  • Transformers Version: as pinned in requirements.txt (installed in a virtualenv)
  • Operating System: Ubuntu 20.04

Problem Description:

I am encountering a torch.cuda.OutOfMemoryError when attempting to load the Qwen2VLForConditionalGeneration model on my RTX 3060, which has a 12GB VRAM capacity. From my understanding, the model requires approximately 16GB of VRAM, resulting in the error below:

```
$ python3 app.py
/home/tokeniser/anaconda3/lib/python3.11/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/tokeniser/anaconda3/lib/python3.11/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: [same Beta-namespaces warning as above]
  warnings.warn(_BETA_TRANSFORMS_WARNING)
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████| 5/5 [00:35<00:00,  7.11s/it]
Traceback (most recent call last):
  File "/home/tokeniser/Videos/OpenGPT-4o/app.py", line 3, in <module>
    from chatbot import model_inference, EXAMPLES, chatbot
  File "/home/tokeniser/Videos/OpenGPT-4o/chatbot.py", line 30, in <module>
    model = Qwen2VLForConditionalGeneration.from_pretrained(MODEL_ID, trust_remote_code=True, torch_dtype=torch.float16).to("cuda").eval()
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3167, in to
    return super().to(*args, **kwargs)
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/tokeniser/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB (GPU 0; 11.75 GiB total capacity; 10.86 GiB already allocated; 97.94 MiB free; 11.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
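The allocator tweak the error message suggests can be applied by setting `PYTORCH_CUDA_ALLOC_CONF` before CUDA initializes; note that it only reduces fragmentation and cannot add capacity beyond the card's 12 GB. A minimal sketch (the 128 MiB split size is an illustrative value, not a project recommendation):

```python
import os

# Must be set before torch initializes CUDA, i.e. before `import torch`
# (or export the variable in the shell before running app.py).
# Caps the allocator's block size to reduce fragmentation; it does not
# lower the model's actual memory requirement.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
```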

Questions:

    1. Are there recommended methods to manage memory for larger models on GPUs with 12GB of VRAM?
    2. Is there an equivalent model to Qwen2VL that could work within 12GB of VRAM?
    3. Any pointers to documentation or example code that demonstrates loading Qwen2VL (or similar models) with a reduced memory footprint?
    4. Would running this setup in Docker provide any solutions, or will VRAM limitations remain an issue even in a containerized environment?

Additional Info:

The VRAM on my GPU is nearly filled by the initial model load, leaving limited memory for inference. Even after switching to torch.float16 and applying PyTorch's memory management configurations (e.g., PYTORCH_CUDA_ALLOC_CONF), the VRAM requirements still exceed the available capacity.
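For reference, the usual way to fit a model of this size into 12 GB is to quantize the weights on load instead of moving a full fp16 model with `.to("cuda")`. A minimal sketch, assuming the `bitsandbytes` and `accelerate` packages are installed and that `MODEL_ID` names the same checkpoint as `chatbot.py` (the ID below is an assumption):

```python
import torch
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"  # assumption: the checkpoint used in chatbot.py

# Quantize weights to 4-bit NF4 on load; activations are still computed in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# device_map="auto" lets accelerate place the already-quantized weights,
# so there is no whole-model .to("cuda") call like the one that OOMs above.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
).eval()
```

At 4 bits, the weights of a ~7B-parameter model occupy roughly 4-5 GB, which leaves headroom for the vision tower and inference activations on a 12 GB card.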

Thank you for any guidance or tips on managing this effectively!
