I've been thinking a lot about using tile shaders for tessellation for another (Metal) project, but I'm not convinced it'll work. In the best-case scenario it would create an overly complex mapping from control points/patches to pixels (since you can't explicitly control the grid size of a tile shader). How does Metal handle side effects of vertex/fragment shaders? As I understand it, draw calls can generally execute in any order, but does Metal's tracking include watching for buffer/texture writes and postponing draw calls as appropriate?
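To illustrate why the patch-to-thread mapping gets awkward, here is a minimal arithmetic sketch. It assumes a tile dispatch launches one threadgroup per tile covering the render target, with one thread per tile pixel (threads per tile equal to the tile size). The helper names and the linear patch indexing are hypothetical, chosen only for illustration. The point is that thread capacity is set by the render target and tile dimensions, not by the patch count.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b

def tile_grid(rt_width, rt_height, tile_w, tile_h):
    """Number of tiles (one threadgroup each) covering the render target."""
    return ceil_div(rt_width, tile_w), ceil_div(rt_height, tile_h)

def tile_capacity(rt_width, rt_height, tile_w, tile_h):
    """Total tile-shader threads, assuming one thread per tile pixel."""
    tiles_x, tiles_y = tile_grid(rt_width, rt_height, tile_w, tile_h)
    return tiles_x * tiles_y * tile_w * tile_h

def patch_for_thread(tile_x, tile_y, tid_x, tid_y, rt_width, tile_w, tile_h):
    """Flatten (tile position, thread position in tile) into a hypothetical linear patch index."""
    tiles_x = ceil_div(rt_width, tile_w)
    tile_index = tile_y * tiles_x + tile_x
    return tile_index * (tile_w * tile_h) + tid_y * tile_w + tid_x

patch_count = 1000
capacity = tile_capacity(1920, 1080, 32, 32)
print(f"{capacity} threads for {patch_count} patches; {capacity - patch_count} must exit early")
```

With a 1920x1080 target and 32x32 tiles, the dispatch covers 60x34 tiles regardless of how many patches exist. Surplus threads have to bail out, and a patch count above capacity would need multiple dispatches.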
I spent a few hours yesterday evening (finally) trying to figure out a way to nudge Metal into recognizing shader side effects within a render encoder. This is probably obvious given what Apple has said about how vertex, fragment, and tile shaders get scheduled on Apple Silicon GPUs, but I can't find any way of inserting barriers between commands. I tested both void vertex shaders and tile shaders writing to a buffer, then used the value of that buffer during a subsequent draw call in three scenarios: reading in vertex, reading in fragment, and reading in tile.
It should be noted that Metal on Apple Silicon offers almost no tooling to suggest barriers between commands within a render encoder.
memoryBarrier isn't available for render encoders on Apple Silicon, only for compute encoders (it's in the macOS SDK, but calling it on an M1 triggers an abort).
MTLFences cannot be used mid-encoder.
useResource is available but had no effect on the results; its functionality is almost certainly limited to making argument buffer contents resident, as intended.
The results were essentially what one would expect if all vertex execution within a render encoder happens first, followed serially by all fragment and tile execution.
Writing in vertex, reading in vertex renders correctly using newly written data.
Writing in vertex, reading in fragment renders correctly using newly written data.
Writing in vertex, reading in tile renders correctly using newly written data.
Writing in tile, reading in vertex renders incorrectly using initialized zero data.
Writing in tile, reading in fragment renders frequently flashing junk.
Writing in tile, reading in tile renders infrequently flashing junk.
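The scheduling assumption above can be written down as a small predictive model and checked against the six results. This is a sketch of a hypothesis, not documented Metal behavior. It assumes two phases per render encoder (all vertex work, then all fragment and tile work), and additionally assumes that vertex work of successive draws stays in draw order while fragment/tile work from different draws may overlap. "race" stands in for the flashing junk.

```python
from enum import Enum

class Stage(Enum):
    VERTEX = "vertex"
    FRAGMENT = "fragment"
    TILE = "tile"

# Assumed phase per stage within one render encoder (a hypothesis, not an API guarantee):
# phase 0 = all vertex work, phase 1 = all fragment + tile work.
PHASE = {Stage.VERTEX: 0, Stage.FRAGMENT: 1, Stage.TILE: 1}

def predict(writer, reader):
    """Predict what a later draw's `reader` stage sees after an earlier draw's `writer` stage wrote a buffer."""
    w, r = PHASE[writer], PHASE[reader]
    if w < r:
        return "new data"     # the writer's phase completes before the reader's phase starts
    if w > r:
        return "stale zeros"  # the reader's phase ran before the write happened
    # Same phase: assume vertex work stays in draw order, fragment/tile work may overlap.
    return "new data" if w == 0 else "race"

for writer in (Stage.VERTEX, Stage.TILE):
    for reader in (Stage.VERTEX, Stage.FRAGMENT, Stage.TILE):
        print(f"write in {writer.value:8} read in {reader.value:8} -> {predict(writer, reader)}")
```

The model reproduces all six observations, which is consistent with the hypothesis but does not prove it; it only describes what current hardware happened to do.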
A big asterisk on these tests is that I only tested buffer visibility between shader stages. Buffer visibility to the tessellator wasn't tested, and that is of course what matters for MoltenVK. Regardless, I don't believe the success cases are reliable. There's no guarantee that a future Apple Silicon GPU won't overlap these stages, and without any barrier API there's no future-proof way to prevent that. For this reason, I don't believe tile dispatch as it exists today can be used for tessellation control in any scenario.
As a bonus test, I also ran the vertex-to-vertex test on an AMD Mac with an appropriate memoryBarrier in between, and it rendered correctly. Using a void vertex function instead of a kernel for tessellation control might be a performant alternative on legacy machines (assuming, of course, that the tessellation buffer fetch happens in the vertex stage so the barrier is applied correctly).
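The difference between "rendered correctly" and "guaranteed correct" can be shown with a toy two-event model. It assumes a hypothetical future scheduler that may run the two draws' vertex work in either order unless an explicit barrier orders them; the event names are illustrative only. A correct render observes one allowed order, while only a barrier removes the bad order from the set of possibilities.

```python
from itertools import permutations

WRITE = "draw 1: vertex writes buffer"
READ = "draw 2: vertex reads buffer"

def allowed_orders(barrier_between_draws):
    """Execution orders a hypothetical overlapping scheduler could choose for the two events."""
    for order in permutations((WRITE, READ)):
        if barrier_between_draws and order.index(WRITE) > order.index(READ):
            continue  # a barrier forbids the read from running ahead of the write
        yield order

def read_always_sees_write(barrier_between_draws):
    """True only if every allowed order puts the write before the read."""
    return all(o.index(WRITE) < o.index(READ) for o in allowed_orders(barrier_between_draws))

print("no barrier:", read_always_sees_write(False))
print("with barrier:", read_always_sees_write(True))
```

Without a barrier, the read-before-write order stays possible, so a passing test on today's hardware says nothing about future GPUs; with a barrier (as on the AMD path), it is ruled out.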
This is a followup to my comment some time ago in #1192.