Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add support for JIT-ing in AMD and NVIDIA backends #14280

Merged
merged 31 commits into from
Jul 25, 2024

Conversation

jchlanda
Copy link
Contributor

@jchlanda jchlanda commented Jun 25, 2024

This patch provides the following:

  • support for JIT compilation of Nvidia and AMD kernels
    This is guarded by SYCL_JIT_KERNELS environment variable. Target CPU and features can be provided through environment variables (SYCL_JIT_TARGET_CPU and SYCL_JIT_TARGET_FEATURES respectively), otherwise default, per-backend, values will be chosen.

  • caching of JIT-compiled kernels
    The runtime maintains a map of available JIT-ed kernels, accessible through a key, which is constructed from kernel's name and values of specialization constant (if provided).

  • materialization of specialization
    Materialization is done through a SYCLSpecConstMaterializer pass that receives the values of all specialization constants used by a kernel (from SYCLSpecConstDataInserter) and then walks all the uses of implicit kernel argument (_arg__specialization_constants_buffer), representing emulated specialization constants, with concrete values, turning them to de-facto compile time constants.

This PR extends the work done for kernel fusion and in a similar fashion it requires embedding of IR (-fsycl-embed-ir) during the initial compilation.

@jchlanda jchlanda requested a review from Naghasan June 25, 2024 10:00
@jchlanda jchlanda force-pushed the jakub/jit_spec_const branch 3 times, most recently from 1408f59 to ef2a74a Compare June 26, 2024 06:07
@jchlanda jchlanda changed the title [WIP] Add support for jitting in AMD and NVIDIA backends [WIP] Add support for JIT-ing in AMD and NVIDIA backends Jun 27, 2024
Also allow resetting of SYCL_CACHE_IN_MEM.
@steffenlarsen
Copy link
Contributor

Would it be possible to write some tests for this? Even maybe unittests?

I just remembered that I asked about tests. What is the plan for E2E or SYCL-level unittests?

@jchlanda
Copy link
Contributor Author

Would it be possible to write some tests for this? Even maybe unittests?

I just remembered that I asked about tests. What is the plan for E2E or SYCL-level unittests?

Testing of this is a bit tricky, but I though we could use the debug output of the specialization constant materializer pass as a proof of the JIT taking place (as the pass is only pass of the JIT pipeline). I have it almost ready, exercising both a regular kernel and kernel bundle. Going to add it to e2e.

Will ping you once it's push to this PR.

@jchlanda
Copy link
Contributor Author

Would it be possible to write some tests for this? Even maybe unittests?

I just remembered that I asked about tests. What is the plan for E2E or SYCL-level unittests?

Testing of this is a bit tricky, but I though we could use the debug output of the specialization constant materializer pass as a proof of the JIT taking place (as the pass is only pass of the JIT pipeline). I have it almost ready, exercising both a regular kernel and kernel bundle. Going to add it to e2e.

Will ping you once it's push to this PR.

OK, @steffenlarsen let me know what you thing about this test, thank you!

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test! Looks like a reasonable way to cover it.

// RUN: %{build} -fsycl-embed-ir -o %t.out
// RUN: env SYCL_JIT_AMDGCN_PTX_KERNELS=1 env SYCL_JIT_COMPILER_DEBUG="sycl-spec-const-materializer" %{run} %t.out &> %t.txt ; FileCheck %s --input-file %t.txt

#include <sycl/sycl.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl/sycl.hpp is not permitted in E2E tests. You want sycl/detail/core.hpp and then probably some other headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jchlanda
Copy link
Contributor Author

@intel/llvm-gatekeepers I think we might be good to go on this one.

@martygrant martygrant merged commit 63c61d8 into intel:sycl Jul 25, 2024
16 checks passed
@steffenlarsen
Copy link
Contributor

@jchlanda - There are some post-commit failures caused by this PR. I am hoping #14770 will address them all.

@jchlanda
Copy link
Contributor Author

@jchlanda - There are some post-commit failures caused by this PR. I am hoping #14770 will address them all.

Yeap, just saw the PR, thank you for taking the time to address it.

@steffenlarsen
Copy link
Contributor

@jchlanda - Sadly there seems to be more failures related to it. Could you please take a look?

@sarnex
Copy link
Contributor

sarnex commented Jul 25, 2024

@jchlanda Do you mind confirming if you're working on the problem? If it's outside of working hours for you we might want to revert. Thanks

@Naghasan
Copy link
Contributor

@jchlanda Do you mind confirming if you're working on the problem? If it's outside of working hours for you we might want to revert. Thanks

I'm looking into this, I think Jakub is out of office now.

@sarnex
Copy link
Contributor

sarnex commented Jul 25, 2024

@Naghasan Thanks!

@Naghasan
Copy link
Contributor

Seems it was the NDEBUG macro causing issue, this should fix #14777

@sarnex
Copy link
Contributor

sarnex commented Jul 25, 2024

@Naghasan There are also unused param warnings:

/Users/runner/work/llvm/llvm/src/sycl-fusion/jit-compiler/lib/translation/KernelTranslation.cpp:232:24: error: unused parameter 'TargetCPU' [-Werror,-Wunused-parameter]
    const std::string &TargetCPU, const std::string &TargetFeatures) {
                       ^
/Users/runner/work/llvm/llvm/src/sycl-fusion/jit-compiler/lib/translation/KernelTranslation.cpp:232:54: error: unused parameter 'TargetFeatures' [-Werror,-Wunused-parameter]
    const std::string &TargetCPU, const std::string &TargetFeatures) {
                                                     ^
/Users/runner/work/llvm/llvm/src/sycl-fusion/jit-compiler/lib/translation/KernelTranslation.cpp:313:24: error: unused parameter 'TargetCPU' [-Werror,-Wunused-parameter]
    const std::string &TargetCPU, const std::string &TargetFeatures) {
                       ^
/Users/runner/work/llvm/llvm/src/sycl-fusion/jit-compiler/lib/translation/KernelTranslation.cpp:313:54: error: unused parameter 'TargetFeatures' [-Werror,-Wunused-parameter]
    const std::string &TargetCPU, const std::string &TargetFeatures) {
                                                     ^

https://github.com/intel/llvm/actions/runs/10095439923/job/27915595323

hdelan pushed a commit to hdelan/llvm that referenced this pull request Jul 26, 2024
This patch provides the following:
* support for JIT compilation of Nvidia and AMD kernels
This is guarded by `SYCL_JIT_KERNELS` environment variable. Target CPU
and features can be provided through environment variables
(`SYCL_JIT_TARGET_CPU` and `SYCL_JIT_TARGET_FEATURES` respectively),
otherwise default, per-backend, values will be chosen.

* caching of JIT-compiled kernels
The runtime maintains a map of available JIT-ed kernels, accessible
through a key, which is constructed from kernel's name and values of
specialization constant (if provided).

* materialization of specialization
Materialization is done through a `SYCLSpecConstMaterializer` pass that
receives the values of all specialization constants used by a kernel
(from `SYCLSpecConstDataInserter`) and then walks all the uses of
implicit kernel argument (`_arg__specialization_constants_buffer`),
representing emulated specialization constants, with concrete values,
turning them to de-facto compile time constants.

This PR extends the work done for kernel fusion and in a similar fashion
it requires embedding of IR (`-fsycl-embed-ir`) during the initial
compilation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants