Add cudnn_fusion decorator lowering computations to XLA cuDNN fusions. #22699
Conversation
Force-pushed from 917e5da to 65f6e35.
The required change in XLA is done; this one is ready.
Sorry for the slow review. Given this API isn't yet public, I'm comfortable merging it more or less as is.
Is there a minimum cudnn version for this test to pass?
9.0.
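(For anyone reading along: a test could make that requirement explicit with a guard along these lines. This is a sketch, not code from the PR; the `cuda_versions.cudnn_get_version` helper and its cudnnGetVersion-style integer encoding (9.0.0 → 90000) are assumptions about jaxlib's version introspection.)

```python
# Sketch of a cuDNN-version guard for a test; not part of this PR.
# Assumes jaxlib exposes cuda_versions.cudnn_get_version() returning the
# cudnnGetVersion()-style integer (e.g. 90000 for cuDNN 9.0.0).
from jax._src.lib import cuda_versions

def skip_unless_cudnn_9(test_case):
  if cuda_versions is None or cuda_versions.cudnn_get_version() < 90000:
    test_case.skipTest("cudnn_fusion requires cuDNN >= 9.0")
```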
The change looks fine to me, but it crashes in CI.
Am I right that both failing checks are in non-GPU configurations?
Yeah, you're right. You probably need to raise an error if lowering on a non-CUDA platform, and you should also skip the test if not on CUDA.
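A minimal sketch of the test-side skip being suggested here (helper names such as `jtu.test_device_matches` and `jtu.JaxTestCase` come from JAX's test utilities and are assumptions about what the PR ends up using):

```python
# Sketch only: skip the test unless we are on a CUDA GPU, so non-GPU CI
# configurations never try to lower the cuDNN fusion at all.
from jax._src import test_util as jtu

class CudnnFusionTest(jtu.JaxTestCase):

  def setUp(self):
    super().setUp()
    if not jtu.test_device_matches(["cuda"]):
      self.skipTest("cudnn_fusion requires a CUDA GPU")
```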
Force-pushed from 65f6e35 to 87f7704.
Done.
Sorry, it took me a long time to look at this. This test fails in our internal CI: it seems that on V100 (which we run in CI) the rewrite to a cuDNN fusion does not happen; instead, the post-optimization HLO ends up with a cuBLAS GEMM. Is that intended? Should the test be gated on particular GPU generations?
It should run on H100. Is this https://github.com/google/jax/pull/22699/files#diff-77b54950a53c3196a56e8f570cb6dcd4eca602b5a8b4220f5cd2acb86f060e7fR1548 not sufficient to filter by GPU type?
Force-pushed from 87f7704 to 85d792a.
Anyway, I looked at other tests and added a check with skipTest(). It actually works on Ampere+.
It appears not. However, in general BUILD rules aren't enough, because we support running the tests via other means such as pytest. So a BUILD rule filter is helpful (it stops us from running pointless tests), but the test should also skip itself if the hardware it needs isn't present.
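For the generation gating mentioned above, the in-test guard might look roughly like this, continuing the setUp() sketch from earlier (again an illustration; `jtu.is_cuda_compute_capability_at_least` is assumed to be available, and "8.0" corresponds to Ampere):

```python
    # Sketch: in addition to the CUDA check, skip on pre-Ampere GPUs
    # (e.g. V100), where the cuDNN fusion rewrite does not kick in.
    if not jtu.is_cuda_compute_capability_at_least("8.0"):
      self.skipTest("cudnn_fusion requires compute capability >= 8.0 (Ampere)")
```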
The rewrite also seems to fail on A100?
I tested it on A100.
I'm still finding this to fail in CI. It looks like the cuDNN fusion is present in the HLO fed to the compiler, but for some reason it gets rewritten away. Are we guaranteed that the fusion will be emitted, or can it sometimes be autotuned away or something? Are there any other circumstances under which the fusion will fall back?
Indeed. I examined the tests we have (https://github.com/openxla/xla/blob/main/xla/service/gpu/transforms/cudnn_custom_call_converter_test.cc#L27, https://github.com/openxla/xla/blob/main/xla/service/gpu/autotuning/gemm_fusion_autotuner_test.cc#L709) and realised that the latter one relies on xla_gpu_cublas_fallback(false). Fix: #23505
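In other words, whether the fusion survives can depend on GEMM-fusion autotuning falling back to cuBLAS. A test that wants the cuDNN fusion emitted deterministically can turn that fallback off, for example via XLA_FLAGS before the backend is initialized (illustrative only; the flag name comes from the linked XLA test):

```python
import os

# Illustrative: disable the cuBLAS fallback so the GEMM-fusion autotuner
# cannot rewrite the cuDNN fusion into a plain cuBLAS GEMM. Must be set
# before JAX initializes its backends.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "") + " --xla_gpu_cublas_fallback=false")
```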
This will require openxla/xla#15399 to work.
Code for jax/_src/cudnn/fusion.py provided by @hawkinsp.
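For reference, using the new decorator might look roughly like this. It is a sketch based on the PR title and file path, not code copied from the PR; the exact signature may differ, and per the discussion above it needs cuDNN >= 9.0 on an Ampere-or-newer CUDA GPU.

```python
# Sketch of how the decorator might be used; not copied from the PR.
import jax
import jax.numpy as jnp
from jax._src.cudnn.fusion import cudnn_fusion  # module added by this PR

@cudnn_fusion
def dot_and_add(x, y, z):
  # The whole body is meant to be lowered as a single XLA cuDNN fusion,
  # rather than as separate dot and add ops.
  return jnp.dot(x, y).astype(jnp.float32) + z

x = jnp.ones((16, 32), dtype=jnp.bfloat16)
y = jnp.ones((32, 8), dtype=jnp.bfloat16)
z = jnp.ones((16, 8), dtype=jnp.float32)
out = jax.jit(dot_and_add)(x, y, z)
```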