Replies: 2 comments 4 replies
-
Okay, I've given the library a good look. It seems what I'm interested in doing is adding an option to the StableDiffusionGGML class to swap the CLIP encoder, the VAE, and the diffusion model itself between CPU and VRAM as they are needed, rather than loading them all into VRAM at once in the constructor. This option only makes sense with a GPU backend, so it will only take effect when one is in use; otherwise, it will log a message like "No GPU backend, ignoring --medvram." I will go ahead and start working on it, but if there are any suggestions or corrections, please voice them. This is my first time working on any sort of ML stuff since college, so I may have misunderstood things!
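To make the idea concrete, here is a rough sketch of the shape I have in mind. Everything below is hypothetical: the member names, the `Module` type, and the helper do not exist in the current StableDiffusionGGML class; this is just the structure I'd aim for.

```cpp
#include <functional>

// Hypothetical sketch only -- placeholder names for the proposed option,
// not the current stable-diffusion.cpp API.
struct Module {
    // In the real code this would wrap the ggml tensors for CLIP, the
    // diffusion model, or the VAE.
    void move_weights_to_gpu() { /* copy weight tensors into a VRAM buffer */ }
    void free_gpu_weights()    { /* release that buffer, keep the host copy */ }
};

struct StableDiffusionGGML {
    bool offload_to_cpu  = false;  // set by a --medvram-style flag
    bool has_gpu_backend = false;  // true only for CUDA/Metal/etc. backends

    // Run one stage (encode / denoise / decode) with its module resident in
    // VRAM, then evict it so the next stage has the whole GPU to itself.
    void with_module_on_gpu(Module & m, const std::function<void()> & run) {
        if (offload_to_cpu && has_gpu_backend) m.move_weights_to_gpu();
        run();
        if (offload_to_cpu && has_gpu_backend) m.free_gpu_weights();
    }
};
```

The key point is that each of the three big modules keeps a host copy of its weights and only occupies VRAM while its stage is actually running.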
-
It seems like the primary bottleneck for supporting lower VRAM is the need for flash attention on the GPU. GPU programming is well beyond my expertise, but I'm still going to try to implement what I described in this discussion, in the hope that it can speed up generation on machines that can't fit everything into GPU memory at once.
-
With quantization, it is possible to fit Stable Diffusion 1.5 models on my GPU with 2 GB of VRAM. It even generates images at about 6 sec/it, which is very fast for this hardware. However, we lose a number of features when the weights are quantized, notably LoRA. 1.5 is, of course, a bit dated by now, but it's good enough (especially with fine-tuned checkpoints) to be used on low-end devices.
In stable-diffusion-webui, the --medvram option is very useful: it lets me use full models, LoRA, the VAE, and even the CLIP model within VRAM on a machine with tight memory constraints. I understand that this project is not stable-diffusion-webui, but I still think there is a use case for it here.
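To put rough numbers on why quantization makes this fit (ballpark figures, not measurements from this repo): the SD 1.5 UNet has on the order of 860M parameters, so at fp16 the weights alone are about 860M × 2 bytes ≈ 1.7 GB, while a 4-bit ggml quantization such as q4_0 (roughly 4.5 bits per weight once block scales are included) comes to roughly 0.5 GB, which is what leaves headroom for activations on a 2 GB card.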
From here, I learned that this optimization works by splitting the model up into its components and only keeping what is actively in use on the GPU.
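Concretely, as I understand the webui behaviour, only one of the big modules is resident in VRAM at any point during a generation: the text encoder is moved to the GPU just long enough to encode the prompt, the UNet then takes its place for the sampling steps, and finally the VAE is moved over to decode the latents into the image.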
I believe that getting Stable Diffusion running on more devices, at faster speeds, should be central to the mission of this project.
I would be more than willing to learn more about this codebase, implement the feature on my own, and maintain it if necessary. However, I will definitely need help! If anyone contributing to this project could point me in the right direction on how to implement something like this, I would be very grateful. I'm reviewing how it's done in webui, but as you can imagine with Python, it's buried under a lot of abstraction.