Skip to content

Commit

Permalink
Added some "lessons learned" to the README
Browse files Browse the repository at this point in the history
  • Loading branch information
hcschuetz committed Oct 11, 2023
1 parent 3696d4e commit 54ef1ef
Showing 1 changed file with 59 additions and 1 deletion.
60 changes: 59 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,4 +24,62 @@ This project only deals with complex-to-complex FFTs.

## Lessons Learned

TODO
- The implementations differ in several ways.
Starting from a very straight-forward algorithm implemented in TypeScript,
various optimizations have been performed, ranging from technical details
to algorithmic changes, for example:
- usage of typed arrays instead of plain JavaScript arrays,
- pre-computation of cosines and sines;
- pre-allocation of memory,
- different ways of pre-computing of indexes,
- usage of lower-level programming languages (C++, Rust) and
Runtimes (WebAssembly),
- "radix-4" instead of "radix-2" FFT algorithm,

- Using additional technologies has some costs (ability to program in
other languages, setting up a more complex build chain, interfacing, ...)
and it might not be needed if other optimizations lead to sufficiently
efficient code.

On the other hand, in addition to some extra speedup by using
different technologies there is also a chance that
existing optimized library code can be used
(in our case, existing FFT libraries for C/C++ and Rust).

This project can be seen as an example of how
an application primarily implemented in JavaScript/TypeScript
can delegate some performance-critical functions to WebAssembly.

- Benchmark results appear to be inconsistent at times.
For example:
- The performance ranking of certain (optimized) implementations
depends on the browser used.
Some implementations may be faster than others on Firefox
while a different implementation may be the faster one on
V8-based browsers.
Occasionally even different browser versions produce different rankings.
- Minor code changes sometimes have surprising performance impacts.
This can, e.g., be caused by a function size crossing the limit
up to which the optimizer inlines the function.
- Just-in-time compilers perform optimizations on a function only
when that function has been run a number of times.
- Implementations scale differently.
For example the Rust-library implementation is already very good at
a data size of 2048,
but it significantly outperforms other implementations
for larger data sizes.
- Sometimes some other processes using the CPU or
thermal CPU throttling impact the results.

(Some of these aspects are more of an issue for JS runtimes.
WebAssembly code does not only tend to be faster,
but its speed is also more consistent.)

To get some understanding of these inconsistencies
(in particular to distinguish random and systematic inconsistencies),
the benchmark tool runs and displays multiple measurements for each
implementation.
To reduce the impact of thermal throttling, the benchmark tool can pause
for a while before each measurement.

- ...

0 comments on commit 54ef1ef

Please sign in to comment.