Reducing compile time of JAX HEALPix (I)FFT implementations #171

Merged: 1 commit, Dec 4, 2023

matt-graham
Collaborator

Related to #140. This doesn't completely remove the loops in the JAX HEALPix FFT and IFFT implementations, but it does reduce the number of unrolled operations and therefore the compile time. Unfortunately, the optimizations make the code a bit less readable and less directly tied to the NumPy implementations.

I've tested this locally against the tests added in #170, which pass. We would probably want to merge #170 first, so I'm marking this as a draft until it is merged and we can rebase on top of it.

Compared to previous implementations, this tries to vectorize operations as much as possible by processing data connected to $\theta$ rings of the same size (all equatorial rings and the pairs of polar rings of equal sizes) together.

The big gain is in vectorizing the operations on all the equally sized equatorial bands together, as this removes around 2 * nside unrolled loop iterations in favour of one set of vectorized operations. Processing the pairs of polar rings together gives a smaller but still helpful reduction in compile time.
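
A rough sketch of the grouping idea, not the code in this PR: a plain `jnp.fft.fft` stands in for the actual per-ring HEALPix transform, and the function names `ring_ffts_looped` and `ring_ffts_grouped` are made up for illustration. It assumes a map in HEALPix ring ordering with `12 * nside**2` pixels, where polar ring `t` (counting from 1 at the pole) has `4 * t` pixels and each of the `2 * nside + 1` equatorial rings has `4 * nside` pixels.

```python
import jax
import jax.numpy as jnp
import numpy as np


def ring_ffts_looped(f, nside):
    """Reference: one FFT per ring; under jit every iteration is unrolled."""
    out, start = [], 0
    for t in range(4 * nside - 1):
        # Ring length grows as 4(t+1) in the north cap, stays at 4*nside in
        # the equatorial band, and shrinks symmetrically in the south cap.
        nphi = 4 * min(t + 1, nside, 4 * nside - 1 - t)
        out.append(jnp.fft.fft(f[start:start + nphi]))
        start += nphi
    return jnp.concatenate(out)


def ring_ffts_grouped(f, nside):
    """Same result, but rings of equal length are transformed together."""
    npix = f.shape[0]
    n_cap = 2 * nside * (nside - 1)  # number of pixels in one polar cap
    # All 2*nside + 1 equatorial rings have length 4*nside: one batched FFT.
    eq = f[n_cap:npix - n_cap].reshape(2 * nside + 1, 4 * nside)
    eq_ft = jnp.fft.fft(eq, axis=-1).reshape(-1)
    north, south, start = [], [], 0
    for t in range(nside - 1):
        nphi = 4 * (t + 1)
        # A north polar ring and its mirror-image south ring share a length,
        # so they are stacked and transformed as a pair.
        pair = jnp.stack([
            f[start:start + nphi],
            f[npix - start - nphi:npix - start],
        ])
        pair_ft = jnp.fft.fft(pair, axis=-1)
        north.append(pair_ft[0])
        south.append(pair_ft[1])
        start += nphi
    # South rings were collected from the pole inwards; reverse to ring order.
    return jnp.concatenate(north + [eq_ft] + south[::-1])
```

With `nside` marked static, e.g. `jax.jit(ring_ffts_grouped, static_argnums=1)(f, nside)`, the traced program in this sketch contains `nside - 1` paired FFTs plus one batched FFT, instead of `4 * nside - 1` separate ones. The remaining Python loop over polar ring sizes is still unrolled at trace time.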

codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4ef9c67) 91.63% compared to head (84fbf0b) 91.64%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #171   +/-   ##
=======================================
  Coverage   91.63%   91.64%           
=======================================
  Files          22       22           
  Lines        2510     2512    +2     
=======================================
+ Hits         2300     2302    +2     
  Misses        210      210           


@matt-graham matt-graham force-pushed the mmg/healpix-fft-compile-optimizations branch from 9f43f3b to 84fbf0b Compare December 4, 2023 14:45
@matt-graham matt-graham marked this pull request as ready for review December 4, 2023 14:45
Collaborator

@CosmoMatt CosmoMatt left a comment

Great, @matt-graham, this should help for now. As you say, this approach still has the same scaling due to the outer loops, but it should cut compile time by at least a factor of 2, likely a little more.

@matt-graham matt-graham merged commit aff7f27 into main Dec 4, 2023
3 checks passed
@matt-graham matt-graham deleted the mmg/healpix-fft-compile-optimizations branch December 4, 2023 16:56