Releases: morousg/cvGPUSpeedup
Alpha-0.0.16 (Parallel Christmas)
This release brings significant syntactic-sugar enhancements:
- Operation Builders:
- All Operations now have a static build() function that returns an instance of the proper InstantiableOperation type.
- Default build() functions are provided, and overloads can be added to ease the usage of each Operation.
- All Read and ReadBack Operations now have a default build_batch() function to help with the construction of batch InstantiableOperations.
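The builder pattern described above can be sketched as follows. This is an illustrative, simplified sketch: the type names, `Params` member, and `exec()` signature are assumptions for the example, not the library's actual definitions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the library's InstantiableOperation wrapper:
// it captures the parameters chosen when build() is called.
template <typename Op>
struct InstantiableOperation {
    typename Op::Params params;
};

// Example Operation struct with a static build() factory, in the spirit
// of the release notes (names are illustrative).
struct MulFloat {
    using Params = float;
    // Default builder: returns an instantiable instance holding the factor.
    static InstantiableOperation<MulFloat> build(float factor) {
        return {factor};
    }
    static float exec(float input, float factor) { return input * factor; }
};

// A build_batch-style helper: replicate one configured operation per batch plane.
template <typename Op, std::size_t BATCH>
std::array<InstantiableOperation<Op>, BATCH> build_batch(typename Op::Params p) {
    std::array<InstantiableOperation<Op>, BATCH> ops{};
    for (auto& op : ops) op.params = p;
    return ops;
}
```

The point of the pattern is that user code only names Operation types and calls `build()`; the wrapper type is deduced, so overloads of `build()` can offer progressively simpler ways to configure the same Operation.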
- DeviceFunctions renamed to InstantiableOperations
- Since we wanted to shift the user's attention towards the Operation structs, we think the InstantiableOperation name is more adequate and avoids introducing more conceptual overhead. In the end, DF's are just the way to make Operations instantiable, using static polymorphism, hence the name InstantiableOperation.
- InstantiableOperation then() function:
- Continuing with the syntactic-sugar features, you can now create a transform kernel by just using Operation types and their new build() and then() functions. Here is an example:
```cpp
const auto myOperationChain = PerThreadRead<_2D, uchar3>::build(params...)
                                  .then(Cast<uchar3, float3>::build())
                                  .then(Mul<float>::build(3.3f));
```
- All Operations are now compatible with both CPU and GPU:
- Saturate has been refactored to use GPU or CPU code, using some macros to give the compiler the proper code.
- Many of the implementations are now also constexpr.
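The macro technique mentioned above can be sketched like this. The macro name and the example operation are assumptions for illustration, not the library's actual identifiers: the idea is that one macro expands to `__host__ __device__` under nvcc and to nothing in a plain CPU build, so a single constexpr implementation serves both targets.

```cpp
// Hypothetical macro (illustrative name): qualifies functions for both the
// host and the device when compiled with nvcc, and compiles as plain C++
// otherwise.
#ifdef __CUDACC__
#define FK_HOST_DEVICE __host__ __device__
#else
#define FK_HOST_DEVICE
#endif

// Example saturating cast in the style of the refactored Saturate operation
// (simplified): constexpr, so it is usable at compile time on the host and
// at run time on either CPU or GPU.
struct SaturateCastUcharExample {
    FK_HOST_DEVICE static constexpr unsigned char exec(int input) {
        return static_cast<unsigned char>(input < 0 ? 0 : (input > 255 ? 255 : input));
    }
};

// Because exec() is constexpr, saturation can even be checked at compile time.
static_assert(SaturateCastUcharExample::exec(300) == 255, "saturates above 255");
static_assert(SaturateCastUcharExample::exec(-5) == 0, "saturates below 0");
```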
In future releases we plan to work on:
- Adding a CPU Transform Data Parallel Pattern that uses both CPU threads and AVX instructions.
- Making all operations constexpr, using constexpr open-source math libraries. We think this can be much better for performance than having assembler code snippets, which might be the fastest for a single operation but break constexpr, which in some cases can be far worse for performance.
NOTES:
Starting with this release, some tests can only be compiled with CUDA 12.1 to 12.3. This is due to the 4KB limit that earlier CUDA versions place on the number of bytes passed as kernel parameters; CUDA 12 increased that limit to 32KB. CUDA versions 12.4 to 12.6 have an nvcc compiler bug that affects some unit tests. This bug will be solved in a future nvcc release.
Alpha-0.0.15
Micro release to include make_tuple functionality
Alpha-0.0.14
The main contribution of this release is the formal definition of an API to perform Backwards Generic Vertical Fusion.
The result is the possibility to create more complex sequences of DeviceFunctions that start to look like compile-time graphs, though not quite yet. There is more to come in this regard.
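The general idea behind vertical fusion can be sketched as below. This is a minimal, hypothetical illustration of the concept only (the function names are invented, and the real API fuses operations at compile time into a single CUDA kernel): instead of running each operation as a separate pass with intermediate buffers, a fused chain applies every operation to an element while it is still in registers.

```cpp
#include <vector>

// Base case: an empty chain returns the value unchanged.
template <typename T>
T apply_chain(T v) { return v; }

// Recursive case: apply the first operation, then feed its result to the rest.
template <typename T, typename Op, typename... Ops>
T apply_chain(T v, Op op, Ops... rest) {
    return apply_chain(op(v), rest...);
}

// One pass over the data executes the whole chain per element, avoiding
// one intermediate buffer (and one memory round trip) per operation.
template <typename T, typename... Ops>
void fused_transform(std::vector<T>& data, Ops... ops) {
    for (auto& v : data) v = apply_chain(v, ops...);
}
```

For example, `fused_transform(data, mul2, add1)` multiplies and adds in a single sweep; a compile-time graph generalizes this so chains can also branch and read backwards through previously defined operations.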
Alpha-0.0.13
Merge pull request #99 from morousg/95-change-sum-operation-name-to-a…
Alpha-0.0.12
Added support for CUDA 12.3
Tested with OpenCV 4.9
OpenCV is not mandatory anymore when using cmake files
Alpha-0.0.11
Merge pull request #94 from morousg/93-adapt-code-for-easier-integrat…
Alpha-0.0.10
Fix: Fixed a bug in resize and a compilation issue affecting only VS2017
Alpha-0.0.9
Fixing a bug
Alpha-0.0.8
Added configurable CircularTensorOrder
Alpha-0.0.7
New TensorT (Transposed) available also in CircularTensor