feat: use LoopVectorization for faster operations #111
I'm getting significant speedups from this. Zygote pretty much works OOTB because we already have the rrules defined. SciML/NeuralOperators.jl#17 should be pretty much handled once this is merged.
fixes #94
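As a rough illustration of the kind of kernel rewrite behind the speedups above, here is a hypothetical sketch (not this PR's actual code) of a fused `matmuladd` using LoopVectorization's threaded `@tturbo` macro; the function name `turbo_matmuladd!` is made up for this example:

```julia
using LoopVectorization

# Sketch: compute C = A * B .+ bias in one fused, SIMD-vectorized,
# multithreaded pass. `@tturbo` is the threaded variant of `@turbo`.
function turbo_matmuladd!(C, A, B, bias)
    @tturbo for n in axes(C, 2), m in axes(C, 1)
        Cmn = zero(eltype(C))
        for k in axes(A, 2)
            Cmn += A[m, k] * B[k, n]
        end
        C[m, n] = Cmn + bias[m]
    end
    return C
end

A, B, bias = rand(Float32, 8, 8), rand(Float32, 8, 8), rand(Float32, 8)
C = similar(A)
turbo_matmuladd!(C, A, B, bias)
```

The fused loop avoids materializing the intermediate `A * B` before adding the bias, which is where much of the win over the naive composition comes from.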
TODOs

- `@tturbo`. Replace some of the reduction operations with `VectorizedReductions.jl`
- `fast_activation` that is completely OOP but still allows for LoopVectorization optimizations
- `matmul` and `matmuladd`
- Set `internal_operation_mode` correctly
- `matmul`
- `activation`
- `affine_normalize`
- `groupnorm`
- `batchnorm`
- `bias_activation`
- `dropout`
- `normalization` -- update statistics. Can we mark it non-diff for Enzyme?
- `fast_mean_var`
- `matmul` and `matmuladd`
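To make the `fast_activation` item above concrete, a minimal sketch (my own, assuming LoopVectorization is available and the activation is one it can vectorize, e.g. `tanh`) of an out-of-place implementation whose kernel still benefits from `@turbo`:

```julia
using LoopVectorization

# Sketch: the user-facing API is completely OOP (allocates a fresh output,
# no visible mutation), while the fill itself is a vectorized @turbo loop.
function fast_activation(σ::F, x::AbstractArray) where {F}
    y = similar(x)
    @turbo for i in eachindex(y, x)
        y[i] = σ(x[i])
    end
    return y
end

x = randn(Float32, 32)
y = fast_activation(tanh, x)
```

Keeping the mutation internal is what lets the same entry point stay friendly to AD tooling while still hitting the fast loop.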
Current Shortcomings

- We can't `@turbo` the activation gradient function. We might want to hardcode some of the common cases for gradient computation -- `tanh`, `gelu`, `sigmoid` and `relu` would be my top contenders here.
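For reference, hardcoding those gradients is cheap because the common cases have closed forms; a sketch (`gelu` omitted for brevity, function names hypothetical) where `tanh` and `sigmoid` derivatives are written in terms of the saved forward output `y = σ(x)`, avoiding a recompute of `σ`:

```julia
# d/dx tanh(x) = 1 - tanh(x)^2, expressed via the saved output y = tanh(x)
∂tanh_from_y(y) = one(y) - y * y
# d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), via y = sigmoid(x)
∂sigmoid_from_y(y) = y * (one(y) - y)
# relu gradient only needs the input's sign; branch-free for vectorization
∂relu(x) = ifelse(x > zero(x), one(x), zero(x))
```

Being simple scalar expressions with no branches (for `tanh`/`sigmoid`) or only `ifelse` (for `relu`), these are exactly the shapes a `@turbo` loop can consume, unlike the generic gradient path.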