
Vulkan Stable Diffusion Operators #904

Merged: 21 commits into ggerganov:master on Aug 4, 2024

Conversation

0cc4m (Collaborator) commented Jul 30, 2024

I implemented the operators necessary for stable-diffusion.cpp to run using Vulkan. The corresponding PR is leejet/stable-diffusion.cpp#291.

Image generation works now, but I want to add some minor things for LORA/TAESD (leejet/stable-diffusion.cpp#291 (comment)), run further tests to make sure everything works, and maybe do some performance checks and optimizations before marking this ready.

0cc4m (Collaborator, Author) commented Jul 30, 2024

@ggerganov I fixed two bugs while implementing this (fd01e5d and ecc1f51). Can I cherry-pick those into a llama.cpp PR, or would that cause issues with the repo synchronization?

Edit: Also 577b132

ggerganov (Owner) commented:

It's easier to merge in one repo and sync to the others. But if it's high priority, you can cherry-pick in llama.cpp and I'll resolve it later.

0cc4m (Collaborator, Author) commented Jul 30, 2024

It's easier to merge in one repo and sync to the others. But if it's high priority, you can cherry-pick in llama.cpp and I'll resolve it later.

It doesn't seem to cause any significant issue on llama.cpp, so I'll wait for a sync unless someone opens an issue that would be fixed by this.

ggerganov (Owner) commented:

Btw, does this fix the following tests:

ggerganov/llama.cpp#8613 (comment)

0cc4m (Collaborator, Author) commented Jul 30, 2024

Btw, does this fix the following tests:

ggerganov/llama.cpp#8613 (comment)

It should, yes. When refactoring the shader code into separate files, I set a preprocessor value incorrectly, which caused matmuls to fail when k is not divisible by 8.

0cc4m marked this pull request as ready for review July 31, 2024 07:49
0cc4m (Collaborator, Author) commented Jul 31, 2024

I think I caught all of the major issues now; stable-diffusion.cpp works with Vulkan on AMD and Nvidia with these changes.

SkutteOleg commented Jul 31, 2024

It doesn't look ready yet; the latest commit crashes every time for me with settings that worked before:

ggml_extend.hpp:939  - clip compute buffer size: 1.40 MB(VRAM)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed

0cc4m (Collaborator, Author) commented Jul 31, 2024

It doesn't look ready yet; the latest commit crashes every time for me with settings that worked before:

ggml_extend.hpp:939  - clip compute buffer size: 1.40 MB(VRAM)
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed

Please always include which model you are running and what command you called it with.

SkutteOleg commented:

My bad. Didn't have time to test thoroughly at the time.

After some further testing, I've determined the source of the problem to be quantization. Here is an example command:

sd.exe -p "A lovely cat" -m "v1-5-pruned-emaonly.ckpt" --type q8_0

Log:

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1660 SUPER (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
[INFO ] stable-diffusion.cpp:176  - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt'
[INFO ] model.cpp:744  - load D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt using checkpoint format
[INFO ] stable-diffusion.cpp:199  - Stable Diffusion 1.x
[INFO ] stable-diffusion.cpp:205  - Stable Diffusion weight type: q8_0
[INFO ] stable-diffusion.cpp:427  - total params memory size = 1618.48MB (VRAM 1618.48MB, RAM 0.00MB): clip 125.20MB(VRAM), unet 1398.81MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:431  - loading model from 'D:\Program Files\ComfyUI_windows_portable\ComfyUI\models\checkpoints\v1-5-pruned-emaonly.ckpt' completed, taking 19.80s
[INFO ] stable-diffusion.cpp:451  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:569  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1028 - apply_loras completed, taking 0.00s
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-vulkan.cpp:3073: GGML_ASSERT(d_X->size >= x_sz * ne02 * ne03) failed

0cc4m (Collaborator, Author) commented Aug 1, 2024

@SkutteOleg Thank you for the report; I messed up one of the conditions for selecting a quantized matmul shader. That's fixed now, can you try again?

0cc4m (Collaborator, Author) commented Aug 1, 2024

I forgot to check img2img; GGML_OP_PAD was missing for that. I've added it now.
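For context, GGML_OP_PAD zero-pads a tensor out to a larger shape. A simplified 1-D sketch of the semantics (ggml operates on n-dimensional tensors, and this is only an illustration, not its implementation):

```cpp
#include <cstddef>
#include <vector>

// Conceptual 1-D version of a zero-padding op: copy the source values
// and fill the remainder of the destination with zeros.
std::vector<float> pad_1d(const std::vector<float>& src, size_t new_len) {
    std::vector<float> dst(new_len, 0.0f);      // zero-filled destination
    for (size_t i = 0; i < src.size() && i < new_len; ++i) {
        dst[i] = src[i];                        // copy original values
    }
    return dst;
}
```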

SkutteOleg commented:

@SkutteOleg Thank you for the report; I messed up one of the conditions for selecting a quantized matmul shader. That's fixed now, can you try again?

Works great, thank you! The issue I was having where 1024x1024 would produce broken outputs is also gone. I was also seeing Vulkan output that looked blotchy and noisy compared to CUDA12; that is fixed as well, to the point where the CUDA12 images now look noisier to me.

All my use cases are covered, great job!

ggerganov (Owner) left a comment

Nice, should we proceed with merge?

0cc4m (Collaborator, Author) commented Aug 4, 2024

Nice, should we proceed with merge?

I will add LEAKY_RELU (leejet/stable-diffusion.cpp#291 (comment)) in the next few hours, then we can merge.

LostRuins (Contributor) left a comment

Can confirm that txt2img and img2img are working fine on Vulkan.

ggerganov merged commit 18703ad into ggerganov:master on Aug 4, 2024
4 checks passed