Thankfully, this is very simple! They are defined to run serially in WebGPU. Each dispatch is its own usage scope, so dispatches run as-if serial. We communicate this to Metal by creating a "serial" compute pass, which produces this behavior. See #2659 for discussion of parallel compute passes - something we want but which hasn't been implemented yet.
-
Hi,
I've got a couple of compute shaders that work on two different buffers, and later another shader merges those two buffers together. I have a strong suspicion that most of these shaders (being mixed ALU/memory-bound) should be scheduled in parallel, but it looks like Metal only "parallelizes" the first ones:
For completeness, this is how it looks in Xcode's dependencies tab:
Inspecting those shaders tells me that they spend a lot of time waiting for memory (and don't actually use that many registers), making them perfect candidates for parallelization:
... but that's not happening. Is there any way for me to debug why the scheduler makes one choice or the other?
(Maybe there's some low-hanging fruit to check, e.g. an accidentally created synchronization edge that I could get rid of.)
Thanks!
Edit: I'm aware, of course, that passes being executed in parallel doesn't imply any performance gain (and that some magical code in the driver ultimately decides what happens) - I'd just like to make sure there's no low-hanging optimization fruit here.