feat: use LoopVectorization for faster operations #111
I'm getting significant speedups from this. Zygote pretty much works OOTB because we already have the rrules defined. SciML/NeuralOperators.jl#17 should be pretty much handled once this is merged.
fixes #94
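As a rough illustration of the kind of kernel rewrite behind the speedups above, here is a hypothetical sketch (not this PR's actual code) of a fused `matmuladd` using LoopVectorization's threaded `@tturbo` macro; the function name `turbo_matmuladd!` is made up for this example:

```julia
using LoopVectorization

# Sketch: compute C = A * B .+ bias in one fused, SIMD-vectorized,
# multithreaded pass. `@tturbo` is the threaded variant of `@turbo`.
function turbo_matmuladd!(C, A, B, bias)
    @tturbo for n in axes(C, 2), m in axes(C, 1)
        Cmn = zero(eltype(C))
        for k in axes(A, 2)
            Cmn += A[m, k] * B[k, n]
        end
        C[m, n] = Cmn + bias[m]
    end
    return C
end

A, B, bias = rand(Float32, 8, 8), rand(Float32, 8, 8), rand(Float32, 8)
C = similar(A)
turbo_matmuladd!(C, A, B, bias)
```

The fused loop avoids materializing the intermediate `A * B` before adding the bias, which is where much of the win over the naive composition comes from.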
TODOs

- `@tturbo`. Replace some of the reduction operations with `VectorizedReductions.jl`
- `fast_activation` that is completely OOP but still allows for LoopVectorization optimizations
- `matmul` and `matmuladd`
- Set `internal_operation_mode` correctly
- `matmul`
- `activation`
- `affine_normalize`
- `groupnorm`
- `batchnorm`
- `bias_activation`
- `dropout`
- `normalization` -- update statistics. Can we mark it non-diff for Enzyme?
- `fast_mean_var`
- `matmul` and `matmuladd`
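To make the `fast_activation` item above concrete, a minimal sketch (my own, assuming LoopVectorization is available and the activation is one it can vectorize, e.g. `tanh`) of an out-of-place implementation whose kernel still benefits from `@turbo`:

```julia
using LoopVectorization

# Sketch: the user-facing API is completely OOP (allocates a fresh output,
# no visible mutation), while the fill itself is a vectorized @turbo loop.
function fast_activation(σ::F, x::AbstractArray) where {F}
    y = similar(x)
    @turbo for i in eachindex(y, x)
        y[i] = σ(x[i])
    end
    return y
end

x = randn(Float32, 32)
y = fast_activation(tanh, x)
```

Keeping the mutation internal is what lets the same entry point stay friendly to AD tooling while still hitting the fast loop.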
Current Shortcomings

- We can't `@turbo` the activation gradient function. We might want to hardcode some of the common cases for gradient computation -- `tanh`, `gelu`, `sigmoid` and `relu` would be my top contenders here.
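For reference, hardcoding those gradients is cheap because the common cases have closed forms; a sketch (`gelu` omitted for brevity, function names hypothetical) where `tanh` and `sigmoid` derivatives are written in terms of the saved forward output `y = σ(x)`, avoiding a recompute of `σ`:

```julia
# d/dx tanh(x) = 1 - tanh(x)^2, expressed via the saved output y = tanh(x)
∂tanh_from_y(y) = one(y) - y * y
# d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), via y = sigmoid(x)
∂sigmoid_from_y(y) = y * (one(y) - y)
# relu gradient only needs the input's sign; branch-free for vectorization
∂relu(x) = ifelse(x > zero(x), one(x), zero(x))
```

Being simple scalar expressions with no branches (for `tanh`/`sigmoid`) or only `ifelse` (for `relu`), these are exactly the shapes a `@turbo` loop can consume, unlike the generic gradient path.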