
feat: use LoopVectorization for faster operations #111

Merged: 25 commits merged into main from ap/perf-lv on Aug 1, 2024

Conversation

@avik-pal (Member) commented on Jul 30, 2024

I'm getting significant speedups from this. Zygote works out of the box because we already have the rrules defined. SciML/NeuralOperators.jl#17 should be mostly handled once this is merged.

fixes #94

TODOs

  • Replace all loops with @tturbo (a sketch follows this list).
    • activation gradient loop not working; see Current Shortcomings below
    • replace some of the reduction operations with VectorizedReductions.jl
  • Introduce a fast_activation that is completely out-of-place but still allows LoopVectorization optimizations
  • matmul and matmuladd
    • polyalgorithm that uses LV for smaller matrices (see the matmul sketch after this list)
    • handle internal_operation_mode correctly
    • add CRC.rrules
  • Enzyme Support
    • matmul
    • activation
    • affine_normalize
      • groupnorm
      • batchnorm
    • bias_activation
    • dropout
    • normalization -- updating the running statistics. Can we mark it as non-differentiable for Enzyme?
    • fast_mean_var
  • Downstream Errors
    • Dimension checking in matmul and matmuladd
    • Enzyme tests run locally
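
As a rough illustration of the loop-replacement and fast_activation items above, here is a minimal sketch of an LV-accelerated element-wise activation kernel. The names `fast_activation!`/`fast_activation` and the argument order are illustrative, not necessarily the final API, and `σ` is assumed to be a function LoopVectorization can vectorize (e.g. tanh):

```julia
using LoopVectorization

# Sketch: apply an activation element-wise with threaded SIMD via @tturbo.
# σ must be a function LoopVectorization can handle (e.g. tanh).
function fast_activation!(y::AbstractArray, σ::F, x::AbstractArray) where {F}
    @tturbo for i in eachindex(y, x)
        y[i] = σ(x[i])
    end
    return y
end

# Out-of-place variant: allocate the output, then reuse the in-place kernel.
fast_activation(σ::F, x::AbstractArray) where {F} = fast_activation!(similar(x), σ, x)
```

And a hedged sketch of the matmul polyalgorithm idea: run an LV kernel below some size threshold and fall back to BLAS above it. The threshold constant is hypothetical and would need to be tuned empirically:

```julia
using LoopVectorization, LinearAlgebra

const LV_MATMUL_THRESHOLD = 256  # hypothetical cutoff; the real value would be tuned

function matmul!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
    M, N = size(C)
    K = size(A, 2)
    if max(M, K, N) > LV_MATMUL_THRESHOLD
        return mul!(C, A, B)  # large problems: defer to BLAS
    end
    # Small problems: a straightforward tiled/threaded LV kernel.
    @tturbo for n in indices((C, B), 2), m in indices((C, A), 1)
        Cmn = zero(eltype(C))
        for k in indices((A, B), (2, 1))
            Cmn += A[m, k] * B[k, n]
        end
        C[m, n] = Cmn
    end
    return C
end
```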

Current Shortcomings

  • We can't @turbo the activation gradient function. We might want to hardcode some of the common cases for gradient computation -- tanh, gelu, sigmoid, and relu would be my top contenders here (a sketch follows).
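
A hedged sketch of what that hardcoding could look like for two of the cases, writing the derivative inline in terms of the already-computed output y = σ(x) so the loop body contains only operations @tturbo can handle. The method name `activation_gradient!` and its signature are illustrative, not the package's actual internals; `sigmoid` is assumed to come from NNlib:

```julia
using LoopVectorization
using NNlib: sigmoid

# Hypothetical specialized methods: no opaque function call appears in the
# loop, only the closed-form derivative in terms of the forward output y.
function activation_gradient!(∂x, ∂y, ::typeof(tanh), y)
    @tturbo for i in eachindex(∂x, ∂y, y)
        ∂x[i] = ∂y[i] * (1 - y[i]^2)        # d/dx tanh(x) = 1 - tanh(x)^2
    end
    return ∂x
end

function activation_gradient!(∂x, ∂y, ::typeof(sigmoid), y)
    @tturbo for i in eachindex(∂x, ∂y, y)
        ∂x[i] = ∂y[i] * y[i] * (1 - y[i])   # d/dx σ(x) = σ(x) * (1 - σ(x))
    end
    return ∂x
end
```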

codecov bot commented Jul 30, 2024

Codecov Report

Attention: Patch coverage is 50.36855% with 202 lines in your changes missing coverage. Please review.

Project coverage is 73.86%. Comparing base (ffffa89) to head (eef1dc0).

Files                          Patch %   Lines
src/impl/affine_normalize.jl   40.12%    94 Missing ⚠️
src/impl/dropout.jl            35.55%    29 Missing ⚠️
src/impl/bias_activation.jl    42.55%    27 Missing ⚠️
src/impl/matmul.jl             73.68%    25 Missing ⚠️
src/impl/activation.jl         32.00%    17 Missing ⚠️
src/utils.jl                   33.33%     6 Missing ⚠️
src/impl/fused_dense.jl        86.95%     3 Missing ⚠️
src/impl/normalization.jl      75.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #111      +/-   ##
==========================================
- Coverage   80.58%   73.86%   -6.73%     
==========================================
  Files          29       30       +1     
  Lines        1329     1580     +251     
==========================================
+ Hits         1071     1167      +96     
- Misses        258      413     +155     


@avik-pal force-pushed the ap/perf-lv branch 9 times, most recently from 54904a3 to 54cd354 on July 31, 2024 01:39
@avik-pal force-pushed the ap/perf-lv branch 3 times, most recently from 7a13bac to 87a09fb on July 31, 2024 04:14
Review comment on src/utils.jl (outdated, resolved)
@avik-pal merged commit 854ba3f into main on Aug 1, 2024 (2 of 4 checks passed)
@avik-pal deleted the ap/perf-lv branch on August 1, 2024 04:42
Successfully merging this pull request may close these issues.

Enzyme Rules for operations