Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for parallel FFT algorithm #103

Merged
merged 3 commits into from
Nov 29, 2020

Conversation

ValarDragon
Copy link
Member

@ValarDragon ValarDragon commented Nov 28, 2020

Description

Renames variables and documents the parallel FFT. I was going to start work on #83 , but then realized I didn't actually understand our paralell FFT algorithm, so didn't know how to best adapt it to this setting. The implemented algorithm seems like it was a direct translation of the C++ parallel FFT algorithm in libfqfft, which is also completely undocumented.

I'm still not sure what the underlying algorithm is. (Its not quite Cooley Tukey, since both the FFT's being done are of the same size) Its definitely slower than it ought to be at the moment.

I left comments in the code pointing out how to reduce its arithmetic complexity at least from 2N + |coset size| FFT to 2 * |coset size| FFT per thread. However that is still higher than Cooley Tukey (but perhaps Cooley Tukey is less clear to parallelize or will have less cache locality).


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work. - N/A, functionality unchanged at the moment
  • Wrote unit tests - N/A
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md - N/A
  • Re-reviewed Files changed in the Github PR explorer

@Pratyush
Copy link
Member

Amazing, thanks for this! this might explain the slowdown here: arkworks-rs/groth16#11

@ValarDragon
Copy link
Member Author

Oh cool! Glad theres something to validate that this is bad (I was a bit worried that noone had flagged this, since it seemed like a parallelism blocker). I think this does explain that slowdown. I can work on trying to go from 2N -> 2 coset size FFT work per thread after this.

At some later time, we probably need to have a broader consideration of the parallel FFT algorithm. Its possible to parallelize evenly with no arithmetic overhead, but this is bad for data locality and one would expect would slow things down. But theres a ton of different trade-offs to go for in the "increase arithmetic complexity, improve data locality / parallel speed" that we need to be explicit about. (I'm pretty confident that the current method is what we will end with!)

@Pratyush
Copy link
Member

So far this LGTM! I can't say that I fully understand the details of the algorithm; I mostly ported it over from the original Bellman implementation 😅

@weikengchen
Copy link
Member

(I can't wait to update my benchmark results)

@ValarDragon ValarDragon merged commit 9bc5417 into master Nov 29, 2020
@ValarDragon ValarDragon deleted the document_parallel_fft branch November 29, 2020 03:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants