Add chapter about CPU features dispatching into docs #2945

Vika-F · 2024-10-15T09:01:54Z

This PR adds a chapter to the documentation that describes how CPU features dispatching is implemented in oneDAL.

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added the respective label(s) to the PR if I have permission to do so.
I have resolved any merge conflicts that might occur with the base branch.

CONTRIBUTING.md

docs/source/contribution/cpu_features.rst

david-cortes-intel · 2024-10-18T11:27:36Z

Thanks for writing up this doc, it's very helpful.

A couple of questions after a quick look:

It suggests putting files under cpp/daal, but isn't that meant to be deprecated in favor of cpp/oneapi/dal?
Why are AVX-512 intrinsics conditional on DAAL_INTEL_CPP_COMPILER? Aren't they also supported by GCC and Clang when compiling with AVX-512 enabled (e.g. -mavx512f)?

Vika-F · 2024-10-18T12:05:49Z

@david-cortes-intel

Thanks for the prompt review!

It suggests putting files under cpp/daal, but isn't that meant to be deprecated in favor of cpp/oneapi/dal?

No, DAAL is deprecated only as an API. All the computational kernels for CPUs are still implemented in cpp/daal.
cpp/oneapi/dal only provides the new APIs and doesn't contain the actual CPU implementations.

I hope that a chapter describing the high-level oneDAL folder structure, what is located where, and how all the parts connect will also be added someday.

Why are AVX-512 intrinsics conditional on DAAL_INTEL_CPP_COMPILER? Aren't they also supported by GCC and Clang when compiling with AVX-512 enabled (e.g. -mavx512f)?

This is probably because most CPU-specific functionality, such as intrinsics, is compiler-specific.
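For comparison, here is a minimal sketch (not oneDAL code) of gating AVX-512 intrinsics on the compiler's own predefined macro instead of a compiler-identity macro such as `DAAL_INTEL_CPP_COMPILER`. GCC and Clang define `__AVX512F__` when AVX-512F code generation is enabled (for example with `-mavx512f`). The function name `sum_floats` is hypothetical.

```cpp
#include <cstddef>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: sums n floats. The AVX-512 branch is compiled only when
// the compiler targets AVX-512F; otherwise only the scalar loop is built.
float sum_floats(const float* x, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__AVX512F__)
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16) {
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(x + i)); // 16 floats per step
    }
    total = _mm512_reduce_add_ps(acc); // horizontal sum of the 16 lanes
#endif
    for (; i < n; ++i) {
        total += x[i]; // scalar tail, or the whole array without AVX-512
    }
    return total;
}
```

This only selects code at compile time. A binary that carries several ISA-extension variants, as oneDAL's do, still needs each variant compiled separately plus a runtime choice between them.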

Vika-F · 2024-10-21T09:42:12Z

@keeranroth and @rakshithgb-fujitsu, can you please take a look at this chapter?
It considers only x86 for now, but a similar code structure might be implemented for RISC-V and ARM to support various instruction set architectures.

rakshithgb-fujitsu · 2024-10-21T11:16:01Z

This is more of a question than a suggestion. Regarding the compiler macros defined in service_defines.h (https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/services/service_defines.h), specifically the ones defined for GNU and other compilers: they don't really translate into any compiler hints. Does this mean that only the icx compiler can leverage those hints in its current state?

Going forward, since multiple architectures are supported, the compiler hints might be architecture-specific. How would this be handled?

Vika-F · 2024-10-21T12:19:43Z

@rakshithgb-fujitsu
Thanks for the prompt response!

Yes, the sections for the GNU and VS compilers do not define the SIMD-related pragmas.
That is because the GNU and VS compilers have no such pragmas or analogues.
So yes, for now only the Intel compilers use those hints.

However, we try to guide other compilers as well where possible. See, for example, the DAAL_PREFETCH and DAAL_FORCEINLINE definitions later in that file.
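To illustrate what guiding other compilers can look like, here is a minimal sketch of per-compiler inlining and prefetch hints. The macro names `EX_FORCEINLINE` and `EX_PREFETCH` are hypothetical and are not oneDAL's `DAAL_FORCEINLINE` / `DAAL_PREFETCH` definitions. The GCC/Clang branch uses the real `always_inline` attribute and `__builtin_prefetch`; MSVC uses `__forceinline`.

```cpp
#include <cstddef>

// Hypothetical per-compiler hint macros.
#if defined(_MSC_VER)
    #define EX_FORCEINLINE __forceinline
    #define EX_PREFETCH(addr) ((void)0) // no-op in this sketch
#elif defined(__GNUC__) || defined(__clang__)
    #define EX_FORCEINLINE inline __attribute__((always_inline))
    #define EX_PREFETCH(addr) __builtin_prefetch(addr)
#else
    #define EX_FORCEINLINE inline
    #define EX_PREFETCH(addr) ((void)0)
#endif

EX_FORCEINLINE double mul(double a, double b) { return a * b; }

// Dot product that requests data a few elements ahead of the current index.
double dot(const double* a, const double* b, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 8 < n) {
            EX_PREFETCH(a + i + 8);
            EX_PREFETCH(b + i + 8);
        }
        s += mul(a[i], b[i]);
    }
    return s;
}
```

Both hints are advisory: results are identical whether or not a given compiler honors them.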

Regarding instruction set architecture (ISA) specific definitions, there is no problem with defining those, because all ISA-specific definitions must be placed under the respective defines. For example:

```cpp
#if (__CPUID__(DAAL_CPU) == __avx512__)
// AVX-512 specific code goes here
#endif
```

So, all the ISA-specific definitions would also go under the respective defines.
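The pattern above can be sketched in a self-contained way with hypothetical stand-ins: `EX_CPU` plays the role of `DAAL_CPU` and `EX_CPUID` the role of `__CPUID__`. This is an assumption about how such macros can be built, not a copy of oneDAL's definitions.

```cpp
#include <string>

// Hypothetical ISA-extension ids; each compiled copy sets EX_CPU to one tag.
#define EX_ID_sse2   0
#define EX_ID_sse42  1
#define EX_ID_avx2   2
#define EX_ID_avx512 3

// Two-level macro so that EX_CPU is expanded *before* token pasting;
// pasting directly would produce EX_ID_EX_CPU instead of EX_ID_avx2.
#define EX_CPUID_IMPL(cpu) EX_ID_##cpu
#define EX_CPUID(cpu) EX_CPUID_IMPL(cpu)

#ifndef EX_CPU
#define EX_CPU avx2 // a real build would set this per copy, e.g. -DEX_CPU=avx512
#endif

std::string isa_specific_path() {
#if (EX_CPUID(EX_CPU) == EX_ID_avx512)
    return "avx512 path";  // AVX-512 specific code goes here
#elif (EX_CPUID(EX_CPU) == EX_ID_avx2)
    return "avx2 path";    // AVX2 specific code goes here
#else
    return "generic path"; // baseline code for other extensions
#endif
}
```

The indirection through `EX_CPUID_IMPL` matters: without it, an identifier that is not a macro evaluates to `0` inside `#if`, and the comparison would silently pick the wrong branch.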

I've tried to describe that in the chapter, but it seems I need to improve that part to make it clearer.

david-cortes-intel · 2024-10-21T12:35:11Z

Some pragmas, like "ivdep" and "novector", do have equivalents in other compilers nowadays, though. For example, there's #pragma GCC ivdep, which is also recognized by Clang, and #pragma loop(ivdep) for MSVC.

Vika-F · 2024-10-23T09:12:38Z

@david-cortes-intel

Some pragmas, like "ivdep" and "novector", do have equivalents in other compilers nowadays, though. For example, there's #pragma GCC ivdep, which is also recognized by Clang, and #pragma loop(ivdep) for MSVC.

Good catch. It would be good to improve the definitions for GCC and MSVC in this case. I've created a task for this.
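A possible shape for such an improvement, as a minimal sketch: a hypothetical `EX_PRAGMA_IVDEP` macro (not oneDAL's `PRAGMA_IVDEP`) that maps to `#pragma GCC ivdep` and MSVC's `#pragma loop(ivdep)`, as mentioned above. For Clang the sketch assumes `#pragma clang loop vectorize(assume_safety)` as the closest analogue. Its semantics are similar but not identical.

```cpp
#include <cstddef>

// Hypothetical portable "ivdep" hint. Clang is checked first because it also
// defines __GNUC__.
#if defined(__clang__)
    #define EX_PRAGMA_IVDEP _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
    #define EX_PRAGMA_IVDEP _Pragma("GCC ivdep")
#elif defined(_MSC_VER)
    #define EX_PRAGMA_IVDEP __pragma(loop(ivdep))
#else
    #define EX_PRAGMA_IVDEP
#endif

// The caller promises that out does not alias a or b, so there are no
// loop-carried dependencies for the hint to violate.
void add_arrays(float* out, const float* a, const float* b, std::size_t n) {
    EX_PRAGMA_IVDEP
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

`_Pragma` (and MSVC's `__pragma`) is what makes it possible to emit a pragma from inside a macro, which a plain `#pragma` line cannot do.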

keeranroth · 2024-10-23T10:02:21Z

CONTRIBUTING.md

@@ -85,6 +85,11 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

 ## Custom Components

+### CPU Features Dispatching
+
+oneDAL provides multiarchitecture binaries that contain codes for multiple variants of CPU instruction set architectures. When run on a certain hardware type, oneDAL chooses the code path which is most suitable for this particular hardware to achieve better performance.


nit: The term architecture is overloaded. Can we find more precise language here? Different ISA extensions (e.g. avx2, avx512) can be supported in the same binary, but it should be made clear that it's only variations on the same base ISA that are allowed. That is to cover adding documentation for Arm and RISC-V support in the future.

What do you think for the following phrasing?

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for SSE2, AVX2, AVX512, etc., on top of the x86-64 base architecture. Specialisations can exist for specific implementations (e.g. skylake-x, nehalem, etc). When run on a specific hardware implementation, oneDAL chooses the code path which is most suitable for that implementation.

I still don't think that is ideal, but I hope it illustrates the differentiation between ISA extension and ISA that I want to make clearer.

This is a good observation. Currently in the chapter I do not make the distinction between the ISA in broader meaning (like x86, RISC-V, ARM, ...) and ISA extensions.
I will update the docs in accordance with your suggestion. It is hard for me to come up with a better wording for ISA and ISA extensions as well.

keeranroth · 2024-10-23T10:15:09Z

docs/source/contribution/cpu_features.rst

+- Intel\ |reg|\  Advanced Vector Extensions 2 (Intel\ |reg|\  AVX2)
+- Intel\ |reg|\  Advanced Vector Extensions 512 (Intel\ |reg|\  AVX-512)
+
+The particular code path is chosen at runtime based on the underlying hardware characteristics.


Suggested change

The particular code path is chosen at runtime based on the underlying hardware characteristics.

The particular code path is chosen at runtime based on underlying hardware properties.
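A minimal sketch of choosing a code path at runtime, under stated assumptions: the `IsaExt` enum, `kernel_variant`, and `select_kernel` are hypothetical, and the detected extension is passed in as a parameter so the sketch stays portable (a real library would query the CPU, e.g. via CPUID on x86-64).

```cpp
#include <string>

// Hypothetical ISA-extension tags, ordered from oldest to newest.
enum class IsaExt { sse2 = 0, sse42 = 1, avx2 = 2, avx512 = 3 };

// One kernel variant per extension. In a real multi-ISA binary each variant
// would come from a copy of the code compiled with its own compiler options.
template <IsaExt isa>
std::string kernel_variant() { return "generic"; }
template <> std::string kernel_variant<IsaExt::avx2>() { return "avx2"; }
template <> std::string kernel_variant<IsaExt::avx512>() { return "avx512"; }

using KernelFn = std::string (*)();

// Pick the most capable variant that the detected hardware supports.
KernelFn select_kernel(IsaExt detected) {
    switch (detected) {
        case IsaExt::avx512: return &kernel_variant<IsaExt::avx512>;
        case IsaExt::avx2:   return &kernel_variant<IsaExt::avx2>;
        case IsaExt::sse42:  return &kernel_variant<IsaExt::sse42>;
        default:             return &kernel_variant<IsaExt::sse2>;
    }
}
```

Extensions without a dedicated specialization fall back to the primary template, which mirrors the idea that every extension must at least have a baseline path.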

keeranroth · 2024-10-23T10:19:49Z

docs/source/contribution/cpu_features.rst

+\*_kernel.h
+-----------
+
+Those files contain the definitions of one or several template classes that define member functions that


nit: Don't start the section with a pronoun. Put a full description of what you are describing. Maybe:

In the directory structure introduced in the last section, there are files with a `_kernel.h` suffix. These contain the definitions of ...

keeranroth · 2024-10-23T10:24:20Z

docs/source/contribution/cpu_features.rst

+- ``algorithmFPType``  Data type to use in intermediate computations for the algorithm,
+                       ``float`` or ``double``.


Suggested change

- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,

``float`` or ``double``.

- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm.

Must be one of ``float`` or ``double``.
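A minimal sketch of the kind of template class a `*_kernel.h` file describes, with `algorithmFPType` restricted to `float` or `double`. The names `ExampleKernel`, `Method`, and `CpuType` are hypothetical stand-ins, and the member is defined inline here only to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical enums standing in for the method and CPU template parameters.
enum class Method { defaultDense, method1 };
enum class CpuType { sse2, sse42, avx2, avx512 };

// One template, later instantiated for every (algorithmFPType, cpu) pair.
template <typename algorithmFPType, Method method, CpuType cpu>
class ExampleKernel {
    static_assert(std::is_same<algorithmFPType, float>::value ||
                      std::is_same<algorithmFPType, double>::value,
                  "algorithmFPType must be float or double");

public:
    // Mean of n values; intermediate sums use algorithmFPType.
    algorithmFPType compute(const algorithmFPType* x, std::size_t n) const {
        algorithmFPType sum = 0;
        for (std::size_t i = 0; i < n; ++i) sum += x[i];
        return n ? sum / static_cast<algorithmFPType>(n) : algorithmFPType(0);
    }
};
```

The `static_assert` turns the "must be one of `float` or `double`" rule into a compile-time check.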

keeranroth · 2024-10-23T10:25:37Z

docs/source/contribution/cpu_features.rst

+\*_impl.i
+---------
+
+Those files contain the implementations of the computational functions defined in `*_kernel.h` files.


nit: Don't start the section with a pronoun. See the similar comment at the start of the \*_kernel.h section

keeranroth · 2024-10-23T10:26:16Z

docs/source/contribution/cpu_features.rst

+
+Although the implementation of the ``method1`` does not contain any instruction set specific code, it is
+expected that the developers leverage SIMD related macros available in |short_name|.
+For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in


Suggested change

For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in

For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and other pragmas defined in
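A minimal sketch of the `*_kernel.h` / `*_impl.i` split: the class is declared first, and its member is defined out of class, as an `_impl.i` file would do. `EX_PRAGMA_IVDEP` is a hypothetical stand-in for `PRAGMA_IVDEP` and expands to nothing here so that the sketch compiles with any compiler.

```cpp
#include <cstddef>

// Hypothetical SIMD hint; a real build would map it to a vectorization pragma.
#define EX_PRAGMA_IVDEP

enum class Method { defaultDense, method1 };
enum class CpuType { sse2, avx2, avx512 };

// Declaration, as it might appear in a *_kernel.h file.
template <typename algorithmFPType, Method method, CpuType cpu>
class ScaleKernel {
public:
    void compute(algorithmFPType* x, std::size_t n, algorithmFPType factor) const;
};

// Definition, as it might appear in the matching *_impl.i file. The loop has
// no ISA-specific code; each per-CPU compilation can vectorize it on its own.
template <typename algorithmFPType, Method method, CpuType cpu>
void ScaleKernel<algorithmFPType, method, cpu>::compute(algorithmFPType* x, std::size_t n,
                                                        algorithmFPType factor) const {
    EX_PRAGMA_IVDEP
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```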

keeranroth · 2024-10-23T10:29:14Z

docs/source/contribution/cpu_features.rst

+be placed under compiler-specific defines. For example, the Intel\ |reg|\  oneAPI DPC++/C++ Compiler specific code
+should be placed under ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be placed under
+CPU-specific defines. For example, the AVX-512 specific code should be placed under
+``__CPUID__(DAAL_CPU) == __avx512__``.


Suggested change

be placed under compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code

should be placed under ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be placed under

CPU-specific defines. For example, the AVX-512 specific code should be placed under

``__CPUID__(DAAL_CPU) == __avx512__``.

be gated by values of compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code

should be gated by the existence of the ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be gated on the value of

CPU-specific defines. For example, the AVX-512 specific code should be gated on the value

``__CPUID__(DAAL_CPU) == __avx512__``.
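A minimal sketch of gating code on both a compiler define and a CPU define. All names are hypothetical stand-ins: `EX_INTEL_CPP_COMPILER` for `DAAL_INTEL_CPP_COMPILER`, and `EX_CPU`/`EX_CPUID` for `DAAL_CPU`/`__CPUID__`. Deriving `EX_INTEL_CPP_COMPILER` from `__INTEL_LLVM_COMPILER` (the macro predefined by icx) is an assumption of the sketch.

```cpp
#include <string>

#if defined(__INTEL_LLVM_COMPILER)
#define EX_INTEL_CPP_COMPILER
#endif

#define EX_ID_sse2   0
#define EX_ID_avx2   1
#define EX_ID_avx512 2
#define EX_CPUID_IMPL(cpu) EX_ID_##cpu
#define EX_CPUID(cpu) EX_CPUID_IMPL(cpu)

#ifndef EX_CPU
#define EX_CPU sse2 // set per compiled copy in a real build
#endif

std::string selected_branch() {
#if defined(EX_INTEL_CPP_COMPILER) && (EX_CPUID(EX_CPU) == EX_ID_avx512)
    return "intel-compiler avx512"; // gated on both compiler and CPU
#elif (EX_CPUID(EX_CPU) == EX_ID_avx512)
    return "avx512";                // gated on CPU only
#else
    return "portable";              // valid for every compiler and CPU
#endif
}
```

Keeping the existence check (`defined(...)`) separate from the value check (`EX_CPUID(EX_CPU) == ...`) makes it explicit which condition each branch depends on.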

keeranroth · 2024-10-23T10:30:12Z

docs/source/contribution/cpu_features.rst

+\*_fpt_cpu.cpp
+--------------
+
+Those files contain the instantiations of the template classes defined in `*_kernel.h` files.


nit: Don't start a section with a pronoun. See the similar comment at the start of the \*_kernel.h section
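A minimal sketch of what a single generated copy of a `*_fpt_cpu.cpp` file might reduce to once the type and CPU have been substituted: an explicit instantiation of the kernel template. `SumKernel`, `CpuType`, `EX_FPTYPE`, and `EX_CPU` are hypothetical stand-ins for the oneDAL names.

```cpp
#include <cstddef>

enum class CpuType { sse2, avx2, avx512 };

// Minimal kernel template (normally declared in *_kernel.h, defined in *_impl.i).
template <typename algorithmFPType, CpuType cpu>
struct SumKernel {
    algorithmFPType compute(const algorithmFPType* x, std::size_t n) const {
        algorithmFPType s = 0;
        for (std::size_t i = 0; i < n; ++i) s += x[i];
        return s;
    }
};

// Values the build would substitute for this particular copy.
#define EX_FPTYPE float
#define EX_CPU CpuType::avx2

// Explicit instantiation: emits code for exactly this (type, CPU) pair.
template struct SumKernel<EX_FPTYPE, EX_CPU>;
```

Because each copy is compiled with its own CPU-specific options, the same template body yields differently optimized object code per copy.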

keeranroth · 2024-10-23T10:31:15Z

docs/source/contribution/cpu_features.rst

+`_fpt_cpu.cpp` files are not compiled directly into object files. First, multiple copies of those files
+are made replacing the ``fpt``, which stands for 'floating point type', and ``cpu`` parts of the file name
+as well as the corresponding ``DAAL_FPTYPE`` and ``DAAL_CPU`` macros with the actual data type and CPU type values.
+Then the resulting files are compiled with appropriate CPU-specific optimization compiler options.


Suggested change

Then the resulting files are compiled with appropriate CPU-specific optimization compiler options.

Then the resulting files are compiled with appropriate CPU-specific compiler optimization options.
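The copy step can be modeled with a small sketch: one generated file per (type, CPU) pair, with `fpt` and `cpu` in the file name replaced. The tags `flt`/`dbl` and the resulting file names are assumptions for illustration only, not oneDAL's actual naming.

```cpp
#include <string>
#include <utility>
#include <vector>

struct GeneratedCopy {
    std::string file_name; // e.g. "kernel_flt_avx2.cpp"
    std::string fptype;    // value substituted for the floating-point type macro
    std::string cpu;       // value substituted for the CPU macro
};

static std::string replace_first(std::string s, const std::string& from, const std::string& to) {
    const auto pos = s.find(from);
    if (pos != std::string::npos) s.replace(pos, from.size(), to);
    return s;
}

// Expands a "*_fpt_cpu.cpp" name into one entry per floating-point type and CPU.
std::vector<GeneratedCopy> expand(const std::string& templ, const std::vector<std::string>& cpus) {
    const std::vector<std::pair<std::string, std::string>> types = {{"flt", "float"}, {"dbl", "double"}};
    std::vector<GeneratedCopy> out;
    for (const auto& t : types) {
        for (const auto& cpu : cpus) {
            std::string name = replace_first(templ, "_fpt_", "_" + t.first + "_");
            name = replace_first(name, "_cpu.", "_" + cpu + ".");
            out.push_back({name, t.second, cpu});
        }
    }
    return out;
}
```

For `"kernel_fpt_cpu.cpp"` and CPUs `{"sse2", "avx2"}`, this yields four entries, from `kernel_flt_sse2.cpp` to `kernel_dbl_avx2.cpp`.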

keeranroth · 2024-10-23T10:34:09Z

docs/source/contribution/threading.rst

+|short_name| uses Intel\ |reg|\  oneAPI Threading Building Blocks (Intel\ |reg|\  oneTBB) to do parallel
 computations on CPU.

-But oneTBB is not used in the code of oneDAL algorithms directly. The algorithms rather
+But oneTBB is not used in the code of |short_name| algorithms directly. The algorithms rather


Don't start a paragraph with a conjunction ('but'). Combine the paragraphs:

... computations on CPU. oneTBB is not used in the code ...
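The point that algorithms do not call the threading library directly can be sketched with a hypothetical wrapper, `ex_parallel_for`. It uses `std::thread` and is neither oneDAL's threading layer nor oneTBB. The idea is that algorithm code depends only on the wrapper, so the backend can change without touching the algorithms.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical threading-layer wrapper: splits [0, n) into contiguous chunks
// and runs body(i) for every index, one chunk per worker thread.
void ex_parallel_for(std::size_t n, const std::function<void(std::size_t)>& body) {
    const unsigned hw = std::thread::hardware_concurrency();
    std::size_t workers = hw ? hw : 1;
    if (workers > n) workers = n;
    if (workers <= 1) {
        for (std::size_t i = 0; i < n; ++i) body(i); // serial fallback
        return;
    }
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = begin + chunk < n ? begin + chunk : n;
        if (begin >= end) break;
        pool.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join(); // body must outlive the workers
}
```

Usage: `ex_parallel_for(v.size(), [&v](std::size_t i) { v[i] = f(i); });`, where each index is written by exactly one thread.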

david-cortes-intel · 2024-10-25T10:25:38Z

CONTRIBUTING.md

@@ -85,6 +85,12 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

 ## Custom Components

+### CPU Features Dispatching
+
+oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.


Suggested change

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc. extensions, on top of the x86-64 base architecture.

david-cortes-intel · 2024-10-25T10:25:50Z

CONTRIBUTING.md

+### CPU Features Dispatching
+
+oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
+When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.


Suggested change

When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.

When run on a specific hardware implementation like Haswell, Skylake-X, etc., oneDAL chooses the code path which is most suitable for that implementation.

david-cortes-intel · 2024-10-25T10:26:13Z

CONTRIBUTING.md

+
+oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
+When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.
+Contributors should leverage [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform well on various hardware implementations.


Suggested change

Contributors should leverage [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform well on various hardware implementations.

Contributors should leverage the [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform most optimally on various hardware implementations.

david-cortes-intel · 2024-10-25T10:27:41Z

docs/source/contribution/cpu_features.rst

+The most important definitions and functions for CPU features dispatching are located in the files
+|32e_make|_ for x86-64 architecture, |riscv_make|_ for RISC-V 64-bit architecture, and |arm_make|_
+for ARM architecture.
+Those files are included into operating system related files.


Suggested change

Those files are included into operating system related files.

Those files are included into operating system related makefiles.

david-cortes-intel · 2024-10-25T10:28:12Z

docs/source/contribution/cpu_features.rst

+To add a new architectural extension into |32e_make| file, ``CPUs`` and ``CPUs.files`` lists need to be updated.
+The functions like ``set_uarch_options_for_compiler`` and others should also be updated accordingly.
+
+The compiler options for the new architectural extension should be added to the respective file in


Suggested change

The compiler options for the new architectural extension should be added to the respective file in

The compiler options for the new architectural extension should be added to the respective file in the
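A minimal model of the makefile data discussed here: a list of architectural extensions (like the `CPUs` list) and per-extension compiler options. The flags are real GCC options chosen for illustration. They are not claimed to be the options oneDAL passes, and the file naming is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

const std::vector<std::string> kCpus = {"sse2", "sse42", "avx2", "avx512"};

const std::map<std::string, std::string> kGccOptions = {
    {"sse2", "-msse2"},
    {"sse42", "-msse4.2"},
    {"avx2", "-march=haswell"},
    {"avx512", "-march=skylake-avx512"},
};

// Builds one compile command per extension for a generated source file.
// Adding a new extension means extending both kCpus and kGccOptions.
std::vector<std::string> compile_commands(const std::string& stem) {
    std::vector<std::string> cmds;
    for (const auto& cpu : kCpus) {
        const auto it = kGccOptions.find(cpu);
        const std::string opts = it != kGccOptions.end() ? it->second : "";
        cmds.push_back("g++ -c " + opts + " " + stem + "_" + cpu + ".cpp");
    }
    return cmds;
}
```

Keeping the extension list and the option table separate mirrors the two places that need updating when a new extension is added.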

Vika-F · 2024-10-25T11:52:50Z

@keeranroth , @david-cortes-intel
I think I've addressed all the comments. Can you please take a look one more time?

keeranroth

Looks good to me. Thanks @Vika-F

david-cortes-intel · 2024-10-25T13:15:15Z

@keeranroth , @david-cortes-intel I think I've addressed all the comments. Can you please take a look one more time?

Please don't forget about this point: #2945 (comment)

Vika-F · 2024-10-28T10:10:12Z

@david-cortes-intel

Please don't forget about this point: #2945 (comment)

Sorry, I forgot about that. Thanks for pointing it out.
I've added a note about that:
https://github.com/oneapi-src/oneDAL/pull/2945/files#diff-d3dd36089bea7ea0a85941dd0e1d91c4456b5e23bcc29fd23bdce9afbd130ed1R144

Please take a look.

Add initial info about CPU features dispatching

100c3fa

Vika-F added the docs label (Issue/PR related to oneDAL docs) Oct 15, 2024

Vika-F added 2 commits October 15, 2024 11:24

Fix a typo

2a5eade

Add code samples

cdfd793

Vika-F changed the title ~~[WIP] Add chapter about CPU features dispatching into docs~~ Add chapter about CPU features dispatching into docs Oct 18, 2024

Vika-F marked this pull request as ready for review October 18, 2024 10:38

Vika-F requested review from a team, maria-Petrova, Alexsandruss and emmwalsh as code owners October 18, 2024 10:38

Vika-F requested review from david-cortes-intel and removed request for emmwalsh, maria-Petrova and a team October 18, 2024 10:39

david-cortes-intel reviewed Oct 18, 2024

Fix a typo in CONTRIBUTING.md

80f1cc9

Co-authored-by: david-cortes-intel <david.cortes@intel.com>

david-cortes-intel reviewed Oct 18, 2024

david-cortes-intel reviewed Oct 18, 2024

Vika-F added 3 commits October 21, 2024 01:48

Fix typos

7142175

Merge branch 'dev/cpu_features_docs' of https://github.com/Vika-F/daal …

c66f531

…into dev/cpu_features_docs

Add clarification about 'fpt' abbreviation meaning

4e93be0

david-cortes-intel reviewed Oct 21, 2024

Add information about compiler-specific and CPU-specific macros

9b31d10

Vika-F added 2 commits October 23, 2024 02:35

HTML rendering fixes

af7669a

Replace oneDAL with |short_name| to align with other .rst files

1ac2d3b

keeranroth reviewed Oct 23, 2024

1. Add the distinction between ISA and architecture extension; 2. Add…

8be8290

… chapter about build systems.

david-cortes-intel reviewed Oct 25, 2024

Vika-F added 2 commits October 25, 2024 04:23

Apply comments from review

c741226

Apply comments from review

e47546b

keeranroth approved these changes Oct 25, 2024

Vika-F added 2 commits October 28, 2024 10:46

Merge pull request #30 from oneapi-src/main

8c05b23

Pull changes from oneDAL main branch

Add a note about the files related to DAAL interface

6b3543c

david-cortes-intel approved these changes Oct 28, 2024

Vika-F merged commit e2be5f6 into oneapi-src:main Oct 29, 2024
15 of 17 checks passed
