[enh] Refactor finiteness_checker #2669

icfaust · 2024-02-20T06:29:02Z

Description

Introduce AVX2 ISA intrinsics for finiteness_checker. The AVX2 functions duplicate the AVX512 ones, halving the relevant constants and changing the instructions from 512-bit to 256-bit width. Because the functions are hardcoded, they cannot easily be templated without a performance loss. This implementation should improve sklearnex performance on standard benchmarks.
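As a rough illustration of the "half size" constants, the number of elements handled per instruction follows from the register width. This is a minimal sketch, assuming register lengths of 512 and 256 bits; `elementsPerInstruction`, `vectorIterations` and `tailElements` are hypothetical helper names, not functions from this PR.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t numberOfBitsInByte = 8;

// Hypothetical helper: how many elements of type T fit in one SIMD register
// of the given width (in bits).
template <typename T>
constexpr std::size_t elementsPerInstruction(std::size_t registerLengthInBits)
{
    return registerLengthInBits / (numberOfBitsInByte * sizeof(T));
}

// Going from 512-bit to 256-bit registers halves the element count.
static_assert(elementsPerInstruction<float>(512) == 16, "16 floats per 512-bit register");
static_assert(elementsPerInstruction<float>(256) == 8, "8 floats per 256-bit register");
static_assert(elementsPerInstruction<double>(256) == 4, "4 doubles per 256-bit register");

// Hypothetical helper: number of full-width iterations over n elements.
inline std::size_t vectorIterations(std::size_t n, std::size_t perInstr)
{
    assert(perInstr > 0);
    return n / perInstr;
}

// Hypothetical helper: elements left over for a scalar tail loop.
inline std::size_t tailElements(std::size_t n, std::size_t perInstr)
{
    assert(perInstr > 0);
    return n % perInstr;
}
```

For example, 20 floats with 256-bit registers would take two full-width iterations and leave four elements for a scalar tail.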

Changes proposed in this pull request:

Add AVX2 finite sum check
Add AVX2 per-element finiteness check
Add AVX2 SOA support
Move final comparison of finalMask out of for loop to reduce branching in AVX512 inf/nan check.
Move AVX2 code to finiteness_checker_avx2_impl.i
Move AVX512 code to finiteness_checker_avx512_impl.i
Move common functions to finiteness_checker_cpu.cpp, swap to templating
Expand definitions in finiteness_checker.h
Move __DAAL_KERNEL_MIN to daal_kernel_defines.h
Fix bug in DAAL_DISPATCH_FUNCTION_BY_CPU and DAAL_DISPATCH_FUNCTION_BY_CPU_SAFE to properly select other ISAs
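For readers unfamiliar with the two checks listed above, the idea can be sketched in portable scalar C++. This is an illustrative sketch under stated assumptions, not the code from this PR: the function names are hypothetical, no intrinsics are used, and the exponent-mask test is one common way to detect inf/NaN. A finite sum proves that every element is finite; a non-finite sum may only be an overflow, so a per-element check follows. The per-element loop accumulates its result and compares once after the loop, in the spirit of the finalMask change.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical fast path: NaN and +/-inf propagate through addition,
// so a finite sum implies that every element is finite.
inline bool sumIsFinite(const float * data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return std::isfinite(sum);
}

// Hypothetical slow path: a float is inf or NaN exactly when all of its
// exponent bits are set. The result is OR-accumulated and compared once
// after the loop, so the loop body has no data-dependent branch.
inline bool anyNotFinite(const float * data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    constexpr std::uint32_t expMask = 0x7f800000u;
    std::uint32_t found = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &data[i], sizeof(bits));
        found |= static_cast<std::uint32_t>((bits & expMask) == expMask);
    }
    return found != 0;
}

// Hypothetical driver combining both checks.
inline bool allValuesAreFinite(const float * data, std::size_t n)
{
    if (sumIsFinite(data, n)) return true; // cheap check succeeded
    return !anyNotFinite(data, n);         // the sum may merely have overflowed
}
```

The second pass matters for inputs such as two copies of the largest finite float: their sum overflows to infinity even though both values are finite.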

Tasks

Implement AVX2
Get it to compile
Green CI
Run sklearnex Benchmarks

cpp/daal/src/data_management/finiteness_checker.cpp

icfaust · 2024-02-20T07:12:08Z

/intelci: run

icfaust · 2024-02-20T09:49:02Z

/intelci: run

icfaust · 2024-02-20T12:01:26Z

/intelci: run

icfaust · 2024-02-21T08:36:54Z

The test failure is related to the RBF kernel, which doesn't use this code.

Alexandr-Solovev

In general it looks good to me, but it would be better to change the order of the new functions: AVX2 first, then AVX512. Also, please use onedal-benchmarks with the kernel profiler, as well as the intelex benchmarks, to check which kernels have improved. I would expect similar performance benefits in both.

cpp/daal/src/data_management/finiteness_checker.cpp

icfaust · 2024-04-19T08:06:56Z

This will likely pass CI, but performance benchmarks are necessary due to the underlying changes in the CPU function dispatching.

icfaust · 2024-04-19T08:07:08Z

/intelci: run

icfaust · 2024-04-22T07:14:22Z

/intelci: run

icfaust · 2024-04-22T07:15:10Z

Things required before re-review: a private CI run to check AVX512, and oneDAL performance benchmarks of the changes to function dispatching.

icfaust · 2024-04-22T08:52:41Z

Run with an AVX512 build: http://intel-ci.intel.com/ef007c41-cb1f-f115-9514-a4bf010d0e2e Failures are due to unrelated GPU issues.

icfaust · 2024-04-23T04:46:52Z

Private CI failures are due to unrelated GPU/dpc timeouts.

icfaust · 2024-04-23T04:55:18Z

Private CI run with the latest sklearnex master (includes _assert_all_finite tests coming from intel/scikit-learn-intelex#1759): http://intel-ci.intel.com/ef012d7e-a408-f166-adc8-a4bf010d0e2e

Vika-F

Thank you for the restructuring effort.
The code became much more understandable.

My final comment is that 'AVX' needs to be replaced with something else like 'SIMD'. Otherwise the code is good to go.

Vika-F · 2024-04-23T09:19:55Z

cpp/daal/src/data_management/finiteness_checker_avx2_impl.i

+    constexpr size_t numberOfBitsInByte = 8;
+    constexpr size_t nPerInstr          = avx2RegisterLength / (numberOfBitsInByte * sizeof(float));
+    services::internal::TArray<bool, avx2> notFiniteArr(nTotalBlocks);
+    bool * notFinitePtr = notFiniteArr.get();


This is just an FYI. No action is needed.

It might be better (in terms of performance) to keep these kinds of arrays in thread-local storage (see daal::tls).

Keeping them as regular arrays might lead to cache coherency issues (false sharing) when the i-th thread updates the i-th element of the notFinitePtr array and the (i+1)-th thread updates the (i+1)-th element of the same array. Both elements may reside in the same cache line, which can be held in the caches of both threads, so each update forces the threads to resynchronize their caches.
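To make the layout issue concrete, here is a minimal sketch that does not use daal::tls. It assumes a 64-byte cache line (typical on x86, but an assumption), and `PaddedFlag`, `sameCacheLine` and `anySet` are hypothetical names. Padding each per-thread slot to a full cache line is one common alternative to thread-local storage for avoiding this kind of sharing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread slot padded to a whole cache line, so that
// neighbouring slots never share a line.
struct alignas(kCacheLine) PaddedFlag
{
    bool value = false;
};
static_assert(sizeof(PaddedFlag) == kCacheLine, "one slot per cache line");

// Hypothetical helper: true if both addresses fall into the same cache line.
inline bool sameCacheLine(const void * a, const void * b)
{
    return reinterpret_cast<std::uintptr_t>(a) / kCacheLine == reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}

// Hypothetical reduction run after the parallel region: combine the flags
// that the individual threads wrote.
inline bool anySet(const PaddedFlag * flags, std::size_t n)
{
    assert(flags != nullptr || n == 0);
    bool result = false;
    for (std::size_t i = 0; i < n; ++i) result = result || flags[i].value;
    return result;
}
```

With a plain `bool` array, adjacent elements are one byte apart and almost always land in the same line; with `PaddedFlag`, adjacent slots are guaranteed to sit in different lines, at the cost of 64 bytes per slot.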

Vika-F · 2024-04-23T09:25:05Z

cpp/daal/src/data_management/finiteness_checker_cpu.cpp

+        bool * localNotFinite = tlsNotFinite.local();
+        DAAL_CHECK_MALLOC_THR(localNotFinite);
+
+        switch ((*tableFeaturesDict)[i].getIndexType())


Don't we need something that checks for integer overflow for integer features here? How is it done in array API?

Vika-F · 2024-04-23T09:34:50Z

cpp/daal/src/data_management/finiteness_checker_cpu.cpp

+        ReadRows<DataType, cpu> dataBlock(table, 0, nRows);
+        DAAL_CHECK_BLOCK_STATUS(dataBlock);
+        const DataType * dataPtr = dataBlock.get();
+
+        sum = computeSum<DataType, cpu>(1, nElements, &dataPtr);


Another FYI. No action is required.

In terms of both performance and memory consumption, it is better to move ReadRows inside the parallel for that happens in computeSum, because in some cases ReadRows might copy and convert the whole table, which adds one more pass through memory.
When ReadRows is done block by block and combined with the computational part, it is usually more efficient.
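The block-by-block pattern can be sketched without the oneDAL API. In this hedged example, `blockwiseSum` and the `readBlock` callback are hypothetical stand-ins for computeSum and ReadRows: only one block-sized buffer is ever allocated, and conversion and computation happen in the same pass. The loop is sequential here; in real code each block would be one iteration of a parallel for with its own buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: readBlock(startRow, rowCount, out) is assumed to copy
// or convert rows [startRow, startRow + rowCount) into out, row-major.
template <typename ReadBlock>
double blockwiseSum(std::size_t nRows, std::size_t nCols, std::size_t blockRows, ReadBlock readBlock)
{
    assert(blockRows > 0);
    std::vector<float> buffer(blockRows * nCols); // block-sized, not table-sized
    double total = 0.0;
    for (std::size_t start = 0; start < nRows; start += blockRows)
    {
        const std::size_t count = std::min(blockRows, nRows - start);
        readBlock(start, count, buffer.data()); // convert only this block
        for (std::size_t i = 0; i < count * nCols; ++i) total += buffer[i];
    }
    return total;
}
```

Compared with reading all rows up front, the converted data is consumed while it is still hot in cache, and no second table-sized copy is needed.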

cpp/daal/src/data_management/finiteness_checker_cpu.cpp

icfaust · 2024-04-23T14:14:33Z

/intelci: run

icfaust · 2024-04-24T04:44:32Z

Rerun due to CI timeouts: http://intel-ci.intel.com/ef01f546-5586-f1d1-863c-a4bf010d0e2e

Vika-F

LGTM

icfaust added 2 commits February 20, 2024 07:17

Add avx2 support

bfbd5f7

clang-formatting

161523d

icfaust added the enhancement label Feb 20, 2024

icfaust commented Feb 20, 2024

cpp/daal/src/data_management/finiteness_checker.cpp (outdated, resolved)

__mm256_set1_epi64 -> __mm256_set1_epi64x

94f7242

icfaust added 2 commits February 20, 2024 12:55

Update finiteness_checker.cpp

2fb5721

clang-formatting

f27bce1

icfaust marked this pull request as ready for review February 20, 2024 15:56

icfaust requested review from Alexsandruss, samir-nasibli and Alexandr-Solovev as code owners February 20, 2024 15:56

Alexandr-Solovev reviewed Feb 22, 2024

icfaust added 12 commits February 25, 2024 21:50

switch avx512 avx2 ordering

d40bab9

clang-formatting

98c058b

remove print statements

d1f0af4

isolate in macro ifs

874fcba

clang-formatting

e645cdc

add cpu and header

9a23ec0

centralize to header file

3708fb6

readd base case

9929072

removed valuesarenotfinite?

8fd2688

hanging chad

f2957b0

attempts at separating avx512

ef4ad09

forgotten saves

97bf714

samir-nasibli reviewed Feb 27, 2024

cpp/daal/src/data_management/finiteness_checker.cpp (outdated, resolved)

current status moving things to header|

12b8a57

icfaust added 9 commits April 18, 2024 12:04

add copyrights

b0cc68a

end space

57412d5

changes to match review

9438df8

last reviewer changes

b90a6cd

Update finiteness_checker_cpu.cpp

a5c6c89

another try at fixing bazel build issues

d478b8b

Merge branch 'oneapi-src:main' into dev/avx2_finite

59a046e

Update BUILD

066195b

formatting

5eff198

icfaust added 2 commits April 21, 2024 23:22

forgotten AVX naming leading to infinte loop

02e6056

add comments

0499a51

icfaust changed the title ~~[enh] Add avx2 support in finiteness_checker~~ [enh] Refactor finiteness_checker Apr 22, 2024

icfaust requested a review from Vika-F April 22, 2024 08:52

Vika-F reviewed Apr 23, 2024

icfaust added 3 commits April 23, 2024 06:35

small docstring updates

22a8c6c

more documentation

5012df5

oops

7ab6199

icfaust requested a review from Vika-F April 23, 2024 14:14

Vika-F approved these changes Apr 24, 2024

icfaust merged commit 34c4b78 into oneapi-src:main Apr 24, 2024
15 of 16 checks passed

icfaust deleted the dev/avx2_finite branch April 24, 2024 12:26
