Add chapter about CPU features dispatching into docs #2945
Thanks for writing up this doc, it's very helpful. A couple of questions after a quick look:
Co-authored-by: david-cortes-intel <david.cortes@intel.com>
Thanks for the prompt review!
No, daal is deprecated only as an API. Otherwise, all the computational kernels for CPUs are implemented in
Someday, I hope, a chapter about the high-level oneDAL folder structure (what is located where and how all the parts connect) will also be added.
It is probably because most of the CPU-specific functionality, like intrinsics, is compiler-specific.
@keeranroth and @rakshithgb-fujitsu, can you please take a look at this chapter?
More of a question than a suggestion: the compiler macros defined in the service defines here, https://github.com/oneapi-src/oneDAL/blob/main/cpp/daal/src/services/service_defines.h, specifically the ones for GNU and other compilers, don't really translate to any compiler hints. Does this mean that only
Going forward, since multiple architectures are supported, the compiler hints might be architecture-specific. How would this be handled?
@rakshithgb-fujitsu Yes, the sections for the GNU and VS compilers do not have definitions for SIMD-related pragmas, but we are trying to guide other compilers as well where possible. You can see
Regarding the instruction set architecture (ISA) specific definitions, there are no problems with defining those, as all the ISA-specific definitions must be put under the respective defines. For example:
So, all the ISA-specific definitions would also go under the respective defines. I've tried to describe that in the chapter, but it seems I need to improve that part to make it clearer.
Some, like "ivdep" and "novector", do have equivalents in other compilers nowadays though. For example, there's
Good catch. It would be good to improve the definitions for GCC and MSVC in this case. I've created a task for this.
CONTRIBUTING.md
Outdated
@@ -85,6 +85,11 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

## Custom Components

### CPU Features Dispatching

oneDAL provides multiarchitecture binaries that contain codes for multiple variants of CPU instruction set architectures. When run on a certain hardware type, oneDAL chooses the code path which is most suitable for this particular hardware to achieve better performance.
nit: The term architecture is overloaded. Can we find more precise language here? Different ISA extensions (e.g. avx2, avx512) can be supported in the same binary, but it should be made clear that it's only variations on the same base ISA that are allowed. That is to cover adding documentation for Arm and RISC-V support in the future.
What do you think for the following phrasing?
oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for SSE2, AVX2, AVX512, etc, on top of the x86-64 base architecture. Specialisations can exist for specific implementations (e.g. skylake-x, nehalem, etc). When run on a specific hardware implementation, oneDAL chooses the code path which is most suitable for that implementation.
I still don't think that is ideal, but I hope it illustrates the differentiation between ISA extension and ISA that I want to make clearer
This is a good observation. Currently in the chapter I do not make the distinction between the ISA in broader meaning (like x86, RISC-V, ARM, ...) and ISA extensions.
I will update the docs in accordance with your suggestion. It is hard for me to come up with a better wording for ISA and ISA extensions as well.
- Intel\ |reg|\ Advanced Vector Extensions 2 (Intel\ |reg|\ AVX2)
- Intel\ |reg|\ Advanced Vector Extensions 512 (Intel\ |reg|\ AVX-512)

The particular code path is chosen at runtime based on the underlying hardware characteristics.
- The particular code path is chosen at runtime based on the underlying hardware characteristics.
+ The particular code path is chosen at runtime based on underlying hardware properties.
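For readers new to the idea, runtime selection of a code path can be sketched roughly as below. This is an illustration only, not oneDAL's actual dispatcher: `CpuType`, `detect_cpu`, `sum_kernel`, and `dispatch_sum` are hypothetical names, and the detection relies on the GCC/Clang `__builtin_cpu_supports` builtin for x86, falling back to the baseline path on other compilers and targets.

```cpp
#include <cstddef>

// Hypothetical ISA-extension tags; oneDAL's real enumeration may differ.
enum class CpuType { baseline, avx2, avx512 };

// Query the hardware. __builtin_cpu_supports is a GCC/Clang builtin for x86;
// any other compiler or target falls back to the baseline path in this sketch.
inline CpuType detect_cpu() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return CpuType::avx512;
    if (__builtin_cpu_supports("avx2")) return CpuType::avx2;
#endif
    return CpuType::baseline;
}

// One kernel template; in a real build each instantiation would be compiled
// with the optimization flags that match its ISA extension.
template <CpuType cpu>
float sum_kernel(const float* x, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Detect once, then forward every call to the matching instantiation.
inline float dispatch_sum(const float* x, std::size_t n) {
    static const CpuType cpu = detect_cpu();
    switch (cpu) {
        case CpuType::avx512: return sum_kernel<CpuType::avx512>(x, n);
        case CpuType::avx2: return sum_kernel<CpuType::avx2>(x, n);
        default: return sum_kernel<CpuType::baseline>(x, n);
    }
}
```

Callers only ever see `dispatch_sum`; which instantiation runs is decided on the machine where the binary executes.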
\*_kernel.h
-----------

Those files contain the definitions of one or several template classes that define member functions that
nit: Don't start the section with a pronoun. Put a full description of what you are describing. Maybe:
In the directory structure introduced in the last section, there are files with a `_kernel.h` suffix. These contain the definitions of ...
- ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,
  ``float`` or ``double``.
- - ``algorithmFPType`` Data type to use in intermediate computations for the algorithm,
-   ``float`` or ``double``.
+ - ``algorithmFPType`` Data type to use in intermediate computations for the algorithm.
+   Must be one of ``float`` or ``double``.
\*_impl.i
---------

Those files contain the implementations of the computational functions defined in `*_kernel.h` files.
nit: Don't start the section with a pronoun. See the similar comment at the start of the \*_kernel.h section.
Although the implementation of the ``method1`` does not contain any instruction set specific code, it is
expected that the developers leverage SIMD related macros available in |short_name|.
For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in
- For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and others pragmas defined in
+ For example, ``PRAGMA_IVDEP``, ``PRAGMA_VECTOR_ALWAYS``, ``PRAGMA_VECTOR_ALIGNED`` and other pragmas defined in
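To make the idea of a portable SIMD hint macro concrete, here is a minimal sketch of how an "ignore vector dependencies" hint could be mapped onto several compilers. `DEMO_PRAGMA_IVDEP` and `axpy` are hypothetical names, and this is not how oneDAL defines ``PRAGMA_IVDEP``. The mapping assumes `#pragma ivdep` for Intel compilers, `#pragma clang loop vectorize(assume_safety)` for Clang, `#pragma GCC ivdep` for GCC, and `#pragma loop(ivdep)` for MSVC; the Clang form is the closest analogue rather than an exact equivalent.

```cpp
#include <cstddef>

// Hypothetical macro for illustration; oneDAL's own definitions may differ.
// Intel compilers are checked first because icx also defines __clang__.
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
    #define DEMO_PRAGMA_IVDEP _Pragma("ivdep")
#elif defined(__clang__)
    #define DEMO_PRAGMA_IVDEP _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
    #define DEMO_PRAGMA_IVDEP _Pragma("GCC ivdep")
#elif defined(_MSC_VER)
    #define DEMO_PRAGMA_IVDEP __pragma(loop(ivdep))
#else
    #define DEMO_PRAGMA_IVDEP  // unknown compiler: no hint, code still correct
#endif

// y[i] += a * x[i]; the caller promises that x and y do not overlap,
// which is what makes the dependence hint safe here.
inline void axpy(float a, const float* x, float* y, std::size_t n) {
    DEMO_PRAGMA_IVDEP
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

The empty fallback keeps the kernel source free of compiler checks: an unsupported compiler simply loses the hint.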
be placed under compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code
should be placed under ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be placed under
CPU-specific defines. For example, the AVX-512 specific code should be placed under
``__CPUID__(DAAL_CPU) == __avx512__``.
- be placed under compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code
- should be placed under ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be placed under
- CPU-specific defines. For example, the AVX-512 specific code should be placed under
- ``__CPUID__(DAAL_CPU) == __avx512__``.
+ be gated by values of compiler-specific defines. For example, the Intel\ |reg|\ oneAPI DPC++/C++ Compiler specific code
+ should be gated by the existence of the ``DAAL_INTEL_CPP_COMPILER`` define. All the CPU-specific code should be gated on the value of
+ CPU-specific defines. For example, the AVX-512 specific code should be gated on the value
+ ``__CPUID__(DAAL_CPU) == __avx512__``.
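The gating described here can be pictured with a small sketch. The `DEMO_*` macros and both functions are hypothetical stand-ins that only mirror the pattern of ``DAAL_INTEL_CPP_COMPILER`` and ``__CPUID__(DAAL_CPU) == __avx512__``; the real oneDAL macros come from the library's headers and build system and may be defined differently. The compiler checks use the predefined macros `__INTEL_LLVM_COMPILER`, `__clang__`, `__GNUC__`, and `_MSC_VER`.

```cpp
#include <cstddef>

// Hypothetical numeric tags for ISA extensions (illustration only).
#define DEMO_ISA_SSE2 0
#define DEMO_ISA_AVX2 1
#define DEMO_ISA_AVX512 2

// Assumption: a build system would pass the target, e.g. -DDEMO_CPU=2.
#ifndef DEMO_CPU
    #define DEMO_CPU DEMO_ISA_SSE2
#endif

// ISA-specific code is gated on the *value* of the CPU define.
inline std::size_t simd_width_floats() {
#if DEMO_CPU == DEMO_ISA_AVX512
    return 16;  // 512-bit registers hold 16 floats
#elif DEMO_CPU == DEMO_ISA_AVX2
    return 8;   // 256-bit registers hold 8 floats
#else
    return 4;   // 128-bit registers hold 4 floats
#endif
}

// Compiler-specific code is gated on the *existence* of a compiler define.
inline const char* compiler_family() {
#if defined(__INTEL_LLVM_COMPILER)
    return "intel";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gnu";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "unknown";
#endif
}
```

Keeping every ISA-specific or compiler-specific fragment behind such a guard means the same source file still compiles for every other target.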
\*_fpt_cpu.cpp
--------------

Those files contain the instantiations of the template classes defined in `*_kernel.h` files.
nit: Don't start a section with a pronoun. See the similar comment at the start of the \*_kernel.h section.
`_fpt_cpu.cpp` files are not compiled directly into object files. First, multiple copies of those files
are made replacing the ``fpt``, which stands for 'floating point type', and ``cpu`` parts of the file name
as well as the corresponding ``DAAL_FPTYPE`` and ``DAAL_CPU`` macros with the actual data type and CPU type values.
Then the resulting files are compiled with appropriate CPU-specific optimization compiler options.
- Then the resulting files are compiled with appropriate CPU-specific optimization compiler options.
+ Then the resulting files are compiled with appropriate CPU-specific compiler optimization options.
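One way to picture a single generated copy is the sketch below, which folds the header, the implementation, and the instantiation file into one listing. `DemoKernel`, `CpuType`, `DEMO_FPTYPE`, and `DEMO_CPU` are hypothetical stand-ins for the real kernel classes and the ``DAAL_FPTYPE`` and ``DAAL_CPU`` macros; the defaults chosen here (`float`, `avx2`) only simulate what a build system might substitute for one copy.

```cpp
#include <cstddef>

// Hypothetical ISA-extension tags (illustration only).
enum CpuType { sse2, avx2, avx512 };

// Role of a *_kernel.h file: declare the kernel template.
template <typename algorithmFPType, CpuType cpu>
struct DemoKernel {
    algorithmFPType compute(const algorithmFPType* x, std::size_t n) const;
};

// Role of a *_impl.i file: implement the member functions generically.
template <typename algorithmFPType, CpuType cpu>
algorithmFPType DemoKernel<algorithmFPType, cpu>::compute(const algorithmFPType* x,
                                                          std::size_t n) const {
    algorithmFPType s = 0;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Assumption: the build system defines these for each generated copy.
#ifndef DEMO_FPTYPE
    #define DEMO_FPTYPE float
#endif
#ifndef DEMO_CPU
    #define DEMO_CPU avx2
#endif

// Role of a *_fpt_cpu.cpp copy: one explicit instantiation per
// (floating point type, CPU type) pair.
template struct DemoKernel<DEMO_FPTYPE, DEMO_CPU>;
```

Because each copy holds exactly one instantiation, it can be compiled with flags for that ISA extension without affecting the code generated for the others.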
|short_name| uses Intel\ |reg|\ oneAPI Threading Building Blocks (Intel\ |reg|\ oneTBB) to do parallel
computations on CPU.

- But oneTBB is not used in the code of oneDAL algorithms directly. The algorithms rather
+ But oneTBB is not used in the code of |short_name| algorithms directly. The algorithms rather
Don't start a paragraph with a conjunction ('but'). Combine the paragraphs:
... computations on CPU. oneTBB is not used in the code ...
… chapter about build systems.
CONTRIBUTING.md
Outdated
@@ -85,6 +85,12 @@ For your convenience we also added [coding guidelines](http://oneapi-src.github.

## Custom Components

### CPU Features Dispatching

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
- oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
+ oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc. extensions, on top of the x86-64 base architecture.
CONTRIBUTING.md
Outdated
### CPU Features Dispatching

oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.
- When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.
+ When run on a specific hardware implementation like Haswell, Skylake-X, etc., oneDAL chooses the code path which is most suitable for that implementation.
CONTRIBUTING.md
Outdated
oneDAL provides binaries that can contain code targeting different architectural extensions of a base instruction set architecture (ISA). For example, code paths can exist for Intel(R) SSE2, Intel(R) AVX2, Intel(R) AVX-512, etc.extensions, on top of the x86-64 base architecture.
When run on a specific hardware implementation like Haswell, Skylake-X, etc. , oneDAL chooses the code path which is most suitable for that implementation.
Contributors should leverage [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform well on various hardware implementations.
- Contributors should leverage [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform well on various hardware implementations.
+ Contributors should leverage the [CPU Features Dispatching](http://oneapi-src.github.io/oneDAL/contribution/cpu_features.html) mechanism to implement the code of the algorithms that can perform most optimally on various hardware implementations.
The most important definitions and functions for CPU features dispatching are located in the files
|32e_make|_ for x86-64 architecture, |riscv_make|_ for RISC-V 64-bit architecture, and |arm_make|_
for ARM architecture.
Those files are included into operating system related files.
- Those files are included into operating system related files.
+ Those files are included into operating system related makefiles.
To add a new architectural extension into |32e_make| file, ``CPUs`` and ``CPUs.files`` lists need to be updated.
The functions like ``set_uarch_options_for_compiler`` and others should also be updated accordingly.

The compiler options for the new architectural extension should be added to the respective file in
- The compiler options for the new architectural extension should be added to the respective file in
+ The compiler options for the new architectural extension should be added to the respective file in the
@keeranroth, @david-cortes-intel
Looks good to me. Thanks @Vika-F
Please remember about this point: #2945 (comment)
Pull changes from oneDAL main branch
Sorry, I forgot about that. Thanks for pointing it out. Please take a look.
This PR adds a documentation chapter that describes how CPU features dispatching is implemented in oneDAL.
Checklist to comply with before moving PR from draft:
PR completeness and readability