
Vulkan Stable Diffusion Operators #904

Merged: 21 commits into ggerganov:master on Aug 4, 2024

Conversation

0cc4m (Collaborator) commented Jul 30, 2024

I implemented the operators necessary for stable-diffusion.cpp to run using Vulkan. The corresponding PR is leejet/stable-diffusion.cpp#291.

Image generation works now, but I want to add some minor things for LORA/TAESD (leejet/stable-diffusion.cpp#291 (comment)), run further tests to make sure everything works, and maybe do some performance checks and optimizations before marking this ready.

0cc4m (Collaborator, Author) commented Jul 30, 2024

@ggerganov I fixed two bugs while implementing this (fd01e5d and ecc1f51). Can I cherry-pick those into a llama.cpp PR, or would that cause issues with the repo synchronization?

Edit: Also 577b132

ggerganov (Owner) commented:

It's easier to merge in one repo and sync to the others. But if it's high priority, you can cherry-pick in llama.cpp and I'll resolve it later.

0cc4m (Collaborator, Author) commented Jul 30, 2024

It's easier to merge in one repo and sync to the others. But if it's high priority, you can cherry-pick in llama.cpp and I'll resolve it later.

It doesn't seem to cause any significant issue on llama.cpp, so I'll wait for a sync unless someone opens an issue that would be fixed by this.

ggerganov (Owner) commented:

Btw, does this fix the following tests:

ggerganov/llama.cpp#8613 (comment)

0cc4m (Collaborator, Author) commented Jul 30, 2024

Btw, does this fix the following tests:

ggerganov/llama.cpp#8613 (comment)

It should, yes. When refactoring the shader code into separate files, I set a preprocessor value incorrectly, which caused matmuls to fail when k is not divisible by 8.

0cc4m marked this pull request as ready for review July 31, 2024 07:49
0cc4m (Collaborator, Author) commented Jul 31, 2024

I think I caught all of the major issues now; stable-diffusion.cpp works with Vulkan on AMD and Nvidia with these changes.

SkutteOleg commented Jul 31, 2024

It doesn't look ready yet; the latest commit crashes every time for me with settings that worked before:

ggml_extend.hpp:939  - clip compute buffer size: 1.40 MB(VRAM)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed

0cc4m (Collaborator, Author) commented Jul 31, 2024

It doesn't look ready yet; the latest commit crashes every time for me with settings that worked before:

ggml_extend.hpp:939  - clip compute buffer size: 1.40 MB(VRAM)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed

Please always include which model you are running and what command you called it with.

SkutteOleg commented:

My bad. Didn't have time to test thoroughly at the time.

After some further testing, I've determined the source of the problem to be quantization. Here is an example command:

sd.exe -p "A lovely cat" -m "v1-5-pruned-emaonly.ckpt" --type q8_0

Log:

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1660 SUPER (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
[INFO ] stable-diffusion.cpp:176  - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt'
[INFO ] model.cpp:744  - load D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt using checkpoint format
[INFO ] stable-diffusion.cpp:199  - Stable Diffusion 1.x
[INFO ] stable-diffusion.cpp:205  - Stable Diffusion weight type: q8_0
[INFO ] stable-diffusion.cpp:427  - total params memory size = 1618.48MB (VRAM 1618.48MB, RAM 0.00MB): clip 125.20MB(VRAM), unet 1398.81MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:431  - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt' completed, taking 19.80s
[INFO ] stable-diffusion.cpp:451  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:569  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1028 - apply_loras completed, taking 0.00s
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed

0cc4m (Collaborator, Author) commented Aug 1, 2024

@SkutteOleg Thank you for the report; I messed up one of the conditions for selecting a quantized matmul shader. That's fixed now, can you try again?

0cc4m (Collaborator, Author) commented Aug 1, 2024

I forgot to check img2img; GGML_OP_PAD was missing for that. I've added it now.
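For context, GGML_OP_PAD zero-pads a tensor out to a larger shape. A simplified 1-D sketch of the semantics (ggml operates on n-dimensional tensors, and this is only an illustration, not its implementation):

```cpp
#include <cstddef>
#include <vector>

// Conceptual 1-D version of a zero-padding op: copy the source values
// and fill the remainder of the destination with zeros.
std::vector<float> pad_1d(const std::vector<float>& src, size_t new_len) {
    std::vector<float> dst(new_len, 0.0f);      // zero-filled destination
    for (size_t i = 0; i < src.size() && i < new_len; ++i) {
        dst[i] = src[i];                        // copy original values
    }
    return dst;
}
```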

SkutteOleg commented:

@SkutteOleg Thank you for the report; I messed up one of the conditions for selecting a quantized matmul shader. That's fixed now, can you try again?

Works great, thank you! The issue I was having where 1024x1024 would produce broken outputs is also gone. I was also seeing Vulkan output that looked blotchy and noisy compared to CUDA12; that is fixed as well, to the point where the CUDA12 images now look noisier to me.

All my use cases are covered, great job!

ggerganov (Owner) left a comment

Nice, should we proceed with merge?

0cc4m (Collaborator, Author) commented Aug 4, 2024

Nice, should we proceed with merge?

I will add LEAKY_RELU (leejet/stable-diffusion.cpp#291 (comment)) in the next few hours, then we can merge.

LostRuins (Contributor) left a comment

Can confirm that txt2img and img2img are working fine on Vulkan.

ggerganov merged commit 18703ad into ggerganov:master on Aug 4, 2024
4 checks passed