Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chapter about CPU features dispatching into docs #2945

Merged
merged 15 commits into from
Oct 29, 2024

Conversation

Vika-F
Copy link
Contributor

@Vika-F Vika-F commented Oct 15, 2024

This PR adds the chapter that describes how CPU features dispatching is implemented in oneDAL into the documentation.


Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

@Vika-F Vika-F added the docs Issue/PR related to oneDAL docs label Oct 15, 2024
@Vika-F Vika-F changed the title [WIP] Add chapter about CPU features dispatching into docs Add chapter about CPU features dispatching into docs Oct 18, 2024
@Vika-F Vika-F marked this pull request as ready for review October 18, 2024 10:38
@Vika-F Vika-F requested review from david-cortes-intel and removed request for emmwalsh, maria-Petrova and a team October 18, 2024 10:39
CONTRIBUTING.md Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
@david-cortes-intel
Copy link
Contributor

Thanks for writing up this doc, it's very helpful.

A couple question after a quick look:

  • It suggests to put files under cpp/daal, but isn't that meant to be deprecated in favor of cpp/oneapi/dal?
  • Why are AVX512 intrinsics conditional on DAAL_INTEL_CPP_COMPILER? Aren't those also supported by GCC and CLANG when compiled with -march=avx512?

Co-authored-by: david-cortes-intel <david.cortes@intel.com>
@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 18, 2024

@david-cortes-intel

Thanks for the prompt review!

  • It suggests to put files under cpp/daal, but isn't that meant to be deprecated in favor of cpp/oneapi/dal?

No, daal only deprecated as the API. but all the computational kernels for CPUs, otherwise, are implemented in cpp/daal.
And cpp/oneapi/dal only provides the new APIs and doesn't contain actual implementations for CPUs.

Someday the chapter about high-level oneDAL folders structure and what is located where and how all the parts connect will also be added I hope.

  • Why are AVX512 intrinsics conditional on DAAL_INTEL_CPP_COMPILER? Aren't those also supported by GCC and CLANG when compiled with -march=avx512?

It is probably because most of the CPU-specific functionality like intrinsics are compiler-specific.

docs/source/contribution/cpu_features.rst Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
docs/source/contribution/cpu_features.rst Outdated Show resolved Hide resolved
@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 21, 2024

@keeranroth and @rakshithgb-fujitsu, can you please take a look at this chapter?
It considers only x86 for now, but the similar code structure might be implemented for RISC-V and ARM to support various instruction set architectures.

@rakshithgb-fujitsu
Copy link
Contributor

more of a question rather than a suggestion, the service defines for compiler macros defined here - https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/services/service_defines.h specifically regarding the ones that are mentioned for GNU and others, they don't really translate to any compiler hints. Does this mean that only icx compiler can leverage those hints in its current state?

Going forward since multiple architectures are supported, the compiler hints might be architecture specific, how would this be handled?

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 21, 2024

@rakshithgb-fujitsu
Thanks for the prompt response!

Yes, the sections for GNU and VS compilers do not have definitions for SIMD-related pragmas.
That's why there is no such pragmas or analogues in GNU and VS compilers.
Yes, for now only intel compilers will use those guidances.

But we are trying to guide other compilers as well where possible. You can see DAAL_PREFETCH and DAAL_FORCEINLINE definitions later in that file, for example.

Regarding the instruction set architecture (ISA) specific definitions, there is no problems with defining those. As all the ISA-specific definitions must be put under the respective defines. For example:

#if (__CPUID__(DAAL_CPU) == __avx512__)

// AVX-512 specific code goes here

#endif

So, all the ISA-specific definitions would also go under the respective defines.

I've tried to describe that in the chapter, but it seems I need to improve that part to make it more clear.

@david-cortes-intel
Copy link
Contributor

Some like "ivdep" and "novector" do have equivalents in other compilers nowadays though - for example, there's #pragma GCC ivdep which is also recognized by clang; and #pragma loop(ivdep) for MSVC.

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 23, 2024

@david-cortes-intel

Some like "ivdep" and "novector" do have equivalents in other compilers nowadays though - for example, there's #pragma GCC ivdep which is also recognized by clang; and #pragma loop(ivdep) for MSVC.

Good catch. It would be good to improve the definitions from GCC and MSVC in this case. I've created a task for this.

CONTRIBUTING.md Outdated
@@ -85,6 +85,11 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

## Custom Components

### CPU Features Dispatching

oneDAL provides multiarchitecture binaries that contain codes for multiple variants of CPU instruction set architectures. When run on a certain hardware type, oneDAL chooses the code path which is most suitable for this particular hardware to achieve better performance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The term architecture is overloaded. Can we find more precise language here? Different ISA extensions (e.g. avx2, avx512) can be supported in the same binary, but it should be made clear that it's only variations on the same base ISA that are allowed. That is to cover adding documentation for Arm and RISC-V support in the future.

What do you think for the following phrasing?

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for SSE2, AVX2, AVX512, etc, on top of the x86-64 base architecture. Specialisations can exist for specific implementations (e.g. skylake-x, nehalem, etc). When run on a specific hardware implementation, oneDAL chooses the code path which is most suitable for that implementation.

I still don't think that is ideal, but I hope it illustrates the differentiation between ISA extension and ISA that I want to make clearer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good observation. Currently in the chapter I do not make the distinction between the ISA in broader meaning (like x86, RISC-V, ARM, ...) and ISA extensions.
I will update the docs in accordance with your suggestion. It is hard for me to come up with a better wording for ISA and ISA extensions as well.

- Intel\ |reg|\ Advanced Vector Extensions 2 (Intel\ |reg|\ AVX2)
- Intel\ |reg|\ Advanced Vector Extensions 512 (Intel\ |reg|\ AVX-512)

The particular code path is chosen at runtime based on the underlying hardware characteristics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The particular code path is chosen at runtime based on the underlying hardware characteristics.
The particular code path is chosen at runtime based on underlying hardware properties.

\*_kernel.h
-----------

Those files contain the definitions of one or several template classes that define member functions that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Don't start the section with a pronoun. Put a full description of what you are describing. Maybe:

In the directory structure introduced in the last section, there are files with a `_kernel.h` suffix. These contain the definitions of ...

Comment on lines +152 to +153
- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,
``float`` or ``double``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,
``float`` or ``double``.
- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm.
Must be one of ``float`` or ``double``.

\*_impl.i
---------

Those files contain the implementations of the computational functions defined in `*_kernel.h` files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Don't start the section with a pronoun. See the similar comment at the start of the \*_kernel.h section


Although the implementation of the ``method1`` does not contain any instruction set specific code, it is
expected that the developers leverage SIMD related macros available in |short_name|.
For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in
For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and other pragmas defined in

Comment on lines 176 to 179
be placed under compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code
should be placed under ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be placed under
CPU-specific defines. For example, the AVX-512 specific code should be placed under
``__CPUID__(DAAL_CPU) == __avx512__``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
be placed under compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code
should be placed under ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be placed under
CPU-specific defines. For example, the AVX-512 specific code should be placed under
``__CPUID__(DAAL_CPU) == __avx512__``.
be gated by values of compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code
should be gated by the existence of the ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be gated on the value of
CPU-specific defines. For example, the AVX-512 specific code should be gated on the value
``__CPUID__(DAAL_CPU) == __avx512__``.

\*_fpt_cpu.cpp
--------------

Those files contain the instantiations of the template classes defined in `*_kernel.h` files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Don't start a section with a pronoun. See the similar comment at the start of the \*_kernel.h section

`_fpt_cpu.cpp` files are not compiled directly into object files. First, multiple copies of those files
are made replacing the ``fpt``, which stands for 'floating point type', and ``cpu`` parts of the file name
as well as the corresponding ``DAAL_FPTYPE`` and ``DAAL_CPU`` macros with the actual data type and CPU type values.
Then the resulting files are compiled with appropriate CPU-specific optimization compiler options.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Then the resulting files are compiled with appropriate CPU-specific optimization compiler options.
Then the resulting files are compiled with appropriate CPU-specific compiler optimization options.

Comment on lines 22 to 25
|short_name| uses Intel\ |reg|\ oneAPI Threading Building Blocks (Intel\ |reg|\ oneTBB) to do parallel
computations on CPU.

But oneTBB is not used in the code of oneDAL algorithms directly. The algorithms rather
But oneTBB is not used in the code of |short_name| algorithms directly. The algorithms rather
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't start a paragraph with a conjunction ('but'). Combine the paragraphs:

... computations on CPU. oneTBB is not used in the code ...

CONTRIBUTING.md Outdated
@@ -85,6 +85,12 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

## Custom Components

### CPU Features Dispatching

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc. extensions, on top of the x86-64 base architecture.

CONTRIBUTING.md Outdated
### CPU Features Dispatching

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.
When run on a specific hardware implementation like Haswell, Skylake-X, etc., oneDAL chooses the code path which is most suitable for that implementation.

CONTRIBUTING.md Outdated

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.
Contributors should leverage [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform well on various hardware implementations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Contributors should leverage [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform well on various hardware implementations.
Contributors should leverage the [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform most optimally on various hardware implementations.

The most important definitions and functions for CPU features dispatching are located in the files
|32e_make|_ for x86-64 architecture, |riscv_make|_ for RISC-V 64-bit architecture, and |arm_make|_
for ARM architecture.
Those files are included into operating system related files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Those files are included into operating system related files.
Those files are included into operating system related makefiles.

To add a new architectural extension into |32e_make| file, ``CPUs`` and ``CPUs.files`` lists need to be updated.
The functions like ``set_uarch_options_for_compiler`` and others should also be updated accordingly.

The compiler options for the new architectural extension should be added to the respective file in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The compiler options for the new architectural extension should be added to the respective file in
The compiler options for the new architectural extension should be added to the respective file in the

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 25, 2024

@keeranroth , @david-cortes-intel
I think I've addressed all the comments. Can you please take a look one more time?

Copy link
Contributor

@keeranroth keeranroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @Vika-F

@david-cortes-intel
Copy link
Contributor

@keeranroth , @david-cortes-intel I think I've addressed all the comments. Can you please take a look one more time?

Please remember about this point: #2945 (comment)

@Vika-F
Copy link
Contributor Author

Vika-F commented Oct 28, 2024

@david-cortes-intel

Please remember about this point: #2945 (comment)

Sorry, I've forgot about that. Thanks for pointing it to me.
I've added a note about that:
https://github.com/oneapi-src/oneDAL/pull/2945/files#diff-d3dd36089bea7ea0a85941dd0e1d91c4456b5e23bcc29fd23bdce9afbd130ed1R144

Please take a look.

@Vika-F Vika-F merged commit e2be5f6 into oneapi-src:main Oct 29, 2024
15 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Issue/PR related to oneDAL docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants