
CUDA out of memory #97

Open
karthik101200 opened this issue Jun 26, 2024 · 6 comments

Comments

@karthik101200

num_rendered, color, depth, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 35.79 GiB (GPU 0; 23.49 GiB total capacity; 990.78 MiB already allocated; 21.66 GiB free; 1.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

35.79 GiB seems like a lot?

When I run on a more powerful mobile GPU (Ada A2000) with less VRAM (8 GB), it starts training but goes out of memory after 4.5k iterations. Is there any way to solve either of these issues?
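The error message itself suggests one thing worth trying before anything else: tuning the CUDA caching allocator. A minimal sketch (the 128 MB split size is an assumption to experiment with, not a recommended value):

```python
import os

# Must be set before torch initializes CUDA, e.g. at the very top of the
# training script or exported in the shell. Per the error message's own hint,
# capping max_split_size_mb limits block splitting in PyTorch's caching
# allocator, which can reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Note that this only mitigates fragmentation; a single 35.79 GiB allocation request will still fail outright on a 24 GiB card.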

@hbb1
Owner

hbb1 commented Jun 26, 2024

Can you print the full error message and clarify:
(1) How many images did you use, and at what resolution?
(2) How many points were there at OOM?

@karthik101200
Author

karthik101200 commented Jun 26, 2024

109 images. The resolution is the default (I think you kept it at -1), and there were around 15 points at OOM.

@karthik101200
Author

The number of images is large because it is quite a big unbounded scene, a simulated hospital environment captured from a robot's POV. The mesh produced after 4.5k iterations, although incomplete, was quite promising, so I wanted to train it further.

@hbb1
Owner

hbb1 commented Jun 26, 2024

You can store the images on the CPU to reduce GPU memory consumption. See PR #45.
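For scale, here is a rough back-of-envelope estimate of the VRAM consumed just by caching every ground-truth image on the GPU (assuming 1080p float32 RGB, since the thread doesn't state the post-downscale resolution):

```python
# Hypothetical numbers: 109 training images (from the thread), assumed
# decoded to 1920x1080 float32 RGB tensors resident on the GPU.
num_images = 109
h, w, channels, bytes_per_float = 1080, 1920, 3, 4

total_gib = num_images * h * w * channels * bytes_per_float / 2**30
print(f"{total_gib:.2f} GiB")  # ≈ 2.53 GiB
```

That cached-image cost is recoverable by moving the data to the CPU; the huge one-shot allocations in the rasterizer are a separate problem.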

@karthik101200
Author

karthik101200 commented Jun 26, 2024

So just to confirm, I need to pass --data_device cpu, right? When I do that I still get an OOM, but with a larger allocation reported:
CUDA out of memory. Tried to allocate 72.07 GiB (GPU 0; 23.49 GiB total capacity; 159.48 MiB already allocated; 22.58 GiB free; 210.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@hbb1
Owner

hbb1 commented Jun 26, 2024

You need to follow the PR and make the necessary changes. They are very minor:

906bf01
99bd153
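An illustrative sketch of the kind of minor change involved (the `Camera` / `original_image` names follow the 3DGS codebase convention but are assumptions here, not the exact diff): data loaded with `--data_device cpu` stays in host RAM, and the training loop must move each image to the GPU at its point of use.

```python
import torch

class Camera:
    """Simplified stand-in for the codebase's camera class (hypothetical)."""
    def __init__(self, image, data_device="cpu"):
        # With --data_device cpu the ground-truth image is stored in host RAM
        # instead of VRAM, so only one image at a time occupies GPU memory.
        self.original_image = image.to(data_device)

cam = Camera(torch.rand(3, 64, 64), data_device="cpu")

# Inside the training loop: move the current view to the GPU on demand.
device = "cuda" if torch.cuda.is_available() else "cpu"
gt_image = cam.original_image.to(device)
```

Once `gt_image` goes out of scope after the iteration, its GPU memory can be reused, which is what keeps the per-step footprint bounded by one image rather than the whole dataset.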
