
Benchmark Tests for FNO and DeepONets #17

Open
wants to merge 9 commits into main

Conversation

@ayushinav (Contributor) commented Jul 26, 2024

To fix #13

I had some issues with CUDA when installing the torch CUDA toolkit.

@ayushinav (Contributor, Author) commented Jul 26, 2024

I guess a more appropriate place for this would be SciMLBenchmarks? @avik-pal

@avik-pal (Member) commented

Yes, the CPU ones can go to SciMLBenchmarks. How are the Julia-native ones so slow? Did you run the profiler to see where the bottlenecks are?

@ayushinav (Contributor, Author) commented

For DeepONet, most of the time goes to the layers, if I understand correctly:
[profile screenshot: DeepONet forward pass, time dominated by the layer evaluations]

For FNOs, I'd guess the FFTs are somewhat expensive:
[profile screenshot: FNO forward pass, FFT calls prominent]

@avik-pal (Member) commented

A couple of things to check (a quick sketch follows below):

  1. Load MKL.jl before calling the neural networks.
  2. Is the number of threads consistent between Julia and Python?
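
A minimal sketch of both checks, assuming the benchmark runs on CPU; the model setup itself is whatever the PR's scripts already do:

```julia
using MKL              # load before the benchmark runs so BLAS calls route to MKL
using LinearAlgebra

# Confirm the BLAS backend actually switched away from OpenBLAS.
@show BLAS.get_config()

# Thread counts should match what PyTorch reports via torch.get_num_threads().
@show Threads.nthreads()       # Julia threads, set with `julia -t N` or JULIA_NUM_THREADS
@show BLAS.get_num_threads()   # threads used by dense / batched matmuls
```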

@avik-pal (Member) commented

Also, that compile function is probably compiling the torch model with dynamo. It is pretty much impossible to beat that with Lux running in eager mode. You could try compiling the DeepONet model with EnzymeAD/Reactant.jl#55 and see how the performance compares (rough sketch below).
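
A rough sketch of what that could look like. The `Reactant.to_rarray` / `@compile` names follow later Reactant usage and may not match the linked PR branch, and the `Chain` here is only a stand-in for the actual DeepONet:

```julia
using Lux, Reactant, Random

# Stand-in model; swap in the DeepONet being benchmarked.
model = Chain(Dense(64 => 128, gelu), Dense(128 => 64))
ps, st = Lux.setup(Random.default_rng(), model)
x = rand(Float32, 64, 32)

# Move data and parameters onto Reactant's array type, then trace and compile the call.
x_ra  = Reactant.to_rarray(x)
ps_ra = Reactant.to_rarray(ps)
st_ra = Reactant.to_rarray(st)

compiled = @compile model(x_ra, ps_ra, st_ra)   # returns a compiled callable
y_ra, _ = compiled(x_ra, ps_ra, st_ra)
```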

@avik-pal (Member) commented

Also, looking at the plots, you are on quite an old version of LuxLib; updating it should address some of the performance gap.

@avik-pal (Member) commented

@ayushinav can you install LuxDL/LuxLib.jl#111 and let me know how the performance is?

@avik-pal (Member) commented Aug 3, 2024

I checked the recent Lux releases. The current problems are:

  1. `__project` isn't fast enough. This needs to be rewritten to use LoopVectorization on CPU (an illustrative sketch follows below).
  2. `batched_mul` -- I am looking into this in LuxLib; this is quite easy to fix.
  3. Allocations are expensive. Nothing much we can do there, honestly.

The current numbers for Lux on this PR are single-threaded; PyTorch uses all cores by default.
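
For point 1, an illustrative sketch of the kind of rewrite meant here: a batched dot-product projection written as an explicit `@turbo` loop instead of slicing plus matmul. The shapes and the function name are assumptions, not the actual `__project` signature. Note that the gradient would also need to be written by hand, since Zygote cannot differentiate through `@turbo` loops:

```julia
using LoopVectorization

# branch: (p, batch), trunk: (p, n_eval, batch)  ->  out: (n_eval, batch)
function project_turbo(branch::AbstractMatrix{T}, trunk::AbstractArray{T,3}) where {T}
    out = similar(branch, size(trunk, 2), size(branch, 2))
    @turbo for j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        for k in axes(branch, 1)
            acc += branch[k, j] * trunk[k, i, j]
        end
        out[i, j] = acc
    end
    return out
end
```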

@ChrisRackauckas (Member) commented

Can we make this into a SciMLBenchmarks script? That will be easier to maintain in the long run.

We can make that support GPUs.

@avik-pal (Member) commented Aug 4, 2024

Haven't looked into the FNO version much, but that will most likely need LuxDL/LuxLib.jl#118 for performance. To summarize the issue there:

  1. gelu and friends are surprisingly expensive operations, and fusing them into the fused_dense operations slows down the entire loop. But that is not true in general: tanh/relu/abs, etc. can and should be fused into the main loop for performance. So we need to extend the current implementation with additional traits that mark which activations can be fused for performance (a hypothetical sketch follows after this list).
  2. But the main perf gain will come from better gradient ops -- even for gelu. Currently we fail to use LoopVectorization in that case, but if we hardcode some of the common cases the performance will improve significantly.
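
A hypothetical sketch of such a trait (not LuxLib's actual internals): mark which activations are cheap enough to fuse into the dense kernel and fall back to a separate elementwise pass otherwise.

```julia
using NNlib: relu, gelu

# Hypothetical trait: is this activation cheap enough to fuse into the dense kernel?
fuse_activation(::typeof(tanh)) = Val(true)
fuse_activation(::typeof(relu)) = Val(true)
fuse_activation(::typeof(abs))  = Val(true)
fuse_activation(::typeof(gelu)) = Val(false)   # expensive; keep it out of the inner loop
fuse_activation(::Any)          = Val(false)   # conservative default

dense_act(act::F, W, x, b) where {F} = dense_act(fuse_activation(act), act, W, x, b)

# Fused path: activation applied while the output is materialized.
dense_act(::Val{true}, act::F, W, x, b) where {F} = act.(W * x .+ b)

# Unfused path: affine part first, activation as a second in-place sweep.
function dense_act(::Val{false}, act::F, W, x, b) where {F}
    y = W * x .+ b
    @. y = act(y)
    return y
end
```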

avik-pal force-pushed the bm_docs branch 2 times, most recently from dd5b486 to df49692 on August 4, 2024
@avik-pal (Member) commented Aug 7, 2024

@ayushinav can we get this finished?

@avik-pal (Member) commented Aug 8, 2024

@ayushinav bump

@ayushinav (Contributor, Author) commented

@avik-pal Yes, I'm on this. I need to write the gradients for the new `__project` using LoopVectorization. The projection itself becomes faster, but there isn't a significant speed-up for the overall network.

@avik-pal (Member) commented Aug 8, 2024

Do that in a separate PR; let's get the benchmarks aligned first. The PyTorch ones seem to use a different size.

@ayushinav (Contributor, Author) commented Aug 9, 2024

The sizes are now aligned. The Python variant of DeepONet only supports one evaluation point (in the unaligned case), and the Flux variant doesn't support batching. To keep the input sizes the same, I made the batch size equal to the number of evaluation points so we can compare against both variants.

The Flux variant of FNO only supported a fixed kernel length, which is fixed here.

The remaining difference in size is because Python uses (batch_size, N) whereas Julia uses (N, batch_size).

@avik-pal (Member) commented Aug 9, 2024

I am guessing the overhead in FNO currently comes from the FFT?

@ayushinav (Contributor, Author) commented

A fair share of both FFT and matmul/add, though the share of the latter has increased:

[profile screenshot: FNO forward pass, FFT and matmul/add dominate]

@avik-pal (Member) commented Aug 9, 2024

Can you also profile the backward pass for the FNO? I am surprised it is that bad.

@ayushinav (Contributor, Author) commented

For MSE loss, profiling the backward pass as `gradient(ps -> loss(first(model(x, ps, st)), y), ps)`:

[profile screenshot: FNO backward pass]
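
For reference, a small sketch of how that profile can be reproduced, assuming `model`, `ps`, `st`, `x`, and `y` come from the existing benchmark script:

```julia
using Zygote, Profile, Statistics

mse(ŷ, y) = mean(abs2, ŷ .- y)

# Warm up once so compilation does not show up in the profile.
Zygote.gradient(ps -> mse(first(model(x, ps, st)), y), ps)

Profile.clear()
@profile for _ in 1:10
    Zygote.gradient(ps -> mse(first(model(x, ps, st)), y), ps)
end
Profile.print(; mincount = 50)   # or use ProfileView.jl / PProf.jl for a flame graph
```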

@avik-pal (Member) commented

Using the permuted formulation, it is now all just FFT time in both the forward and backward passes. It is quite surprising that our FFT is so much slower than PyTorch's.

@ayushinav It might be worth giving https://github.com/Taaitaaiger/RustFFT.jl a shot and checking the performance on CPUs (a benchmark sketch follows).
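
A minimal timing sketch for the FFT itself; the array layout and sizes are assumptions about the FNO benchmark, and RustFFT.jl's exact API should be checked against its README before swapping it in:

```julia
using BenchmarkTools, FFTW

# Assumed layout: (spatial, channels, batch); adjust to the sizes used in the benchmark.
x = rand(Float32, 64, 128, 256)

p = plan_rfft(x, 1)          # FFTW plan along the spatial dimension
@btime $p * $x               # planned real FFT
@btime rfft($x, 1)           # unplanned, for reference

# If RustFFT.jl exposes an AbstractFFTs-style plan, it can be timed the same way.
```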

avik-pal force-pushed the bm_docs branch 2 times, most recently from b613796 to c2439c6 on August 26, 2024
avik-pal force-pushed the bm_docs branch 2 times, most recently from 5450203 to f10c0fb on September 28, 2024
dependabot[bot] and others added 8 commits on September 27, 2024
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.24.5 to 1.24.6.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.24.5...v1.24.6)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[skip ci] [skip docs]
Successfully merging this pull request may close these issues: Performance Benchmarks.

3 participants