[SYCL][Graph] Implementation of whole graph update #365

Closed
wants to merge 29 commits into from

Conversation

fabiomestre
Collaborator

No description provided.

uditagarwal97 and others added 29 commits March 27, 2024 16:12
…pp (intel#13171)

All it's doing is setting doubleGRF, just do that using the first-class
API.

Manually tested this on Win.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Tested manually on Win/Lin with many runs, doesn't hang anymore.

Closes: intel#8815

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Per OSSF
(https://securityscorecards.dev/viewer/?uri=github.com/intel/llvm), all
workflows should have a default top-level permission set. We set it as
shown below, per the recommendation:

permissions:
  contents: read

Then, within individual jobs, we added additional privileges where needed.

These changes were generated by the recommended OSSF tool.

This PR changes the workflows created/owned by the intel/llvm repo. A
separate PR will address the issues found in the workflows inherited
from llvm/llvm-project.
…ported on Native CPU (intel#13109)

Similarly to what is done for `nvptx` in
intel#13015, Native CPU maps `private` and
`generic` to the same address spaces, so we need to avoid getting
multiple definitions for the libclc builtins that use `generic`.
Previously we were hard-coding an -O2 optimization level for the
'signbit' builtin for all versions of GCC.

Despite this workaround, I found locally that I was unable to build with
GCC versions 12.2, 12.3, and 13.2. Reducing the optimization level to
-O1 allowed me to progress. This seems to follow the bug report already
linked, which had test cases at -O2 which were also failing.

With this in mind, we can also restrict the GCC versions we apply the
workaround to, so that more modern compilers should "just work" without
us having to do anything. That should save someone having to investigate
a performance report a year or so down the line...
This commit fixes the problem of missing build dependencies between
libclc source files and their various includes.

We would like to do this with compiler-generated dependency files
because then the dependencies are accurate and there are no false
positives, leading to unnecessary rebuilds. This is how regular C/C++
dependencies are usually tracked by CMake.

Note that this variable is an internal API so is not guaranteed to work,
but then again *all* of CMake's support for new languages (which we use
for CLC/LL languages) is an internal API. On balance this change is
probably worth it due to how minimally invasive it is.

The alternative would be to either:

1. list/glob all possible files in the directory as dependencies, which
would lead to false positives.
2. rewrite the library generation as a loop over all files and calling
`add_custom_command`, which can produce a dependency file (by tweaking
our clang command line) that can also be fed back to the same command
via the `DEPFILE` argument. This would be a much larger change and is
not as "neat".
When a non-blocking pipe operation fails,
CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST is expected. The runtime
needs to handle that case instead of throwing the exception.
The OSSF tool's recommended default settings do not work for us, so
don't use them. It suggested `contents: read` as the default permission,
but that broke most of our workflows. Instead, use the GitHub-recommended
setting:

permissions: read-all
…13045)

XPTI has unit tests that time the cost of each individual framework
action, but an E2E timing test isn't available. This PR adds a new
sample collector that shows how data can be pulled from the SYCL runtime
using XPTI and provides timing information for the callback handler
costs/event.

Allows:
1. Zero cost application with XPTI_TRACE_ENABLE=0
2. Zero cost callback handlers when run in calibration mode
3. Full E2E test when run with "--format none" which gives the average
cost of callback handlers/event

---------

Signed-off-by: Vasanth Tovinkere <vasanth.tovinkere@intel.com>
…ements (intel#13019)

We have a report of persistent cache failures, which were traced to
directory creation, so I switched `OSUtil::makeDir` to use C++17
std::filesystem routines. Also improved trace reporting.
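
A minimal sketch of the directory-creation approach described above, assuming a hypothetical helper `make_dir` (this is not the actual `OSUtil::makeDir` implementation, whose signature is not shown here). It uses the non-throwing `std::error_code` overload of `std::filesystem::create_directories` and treats an already-existing directory as success, which matters when several processes share one cache directory.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper, for illustration only; not the runtime's
// OSUtil::makeDir. Returns true if `path` is a directory when the call
// returns, whether this call created it or it already existed.
inline bool make_dir(const std::string &path) {
  if (path.empty())
    return false;

  const std::filesystem::path dir(path);
  std::error_code ec;

  // Creates every missing parent directory as well. If the directory
  // already exists, this returns false but leaves `ec` clear, so the
  // error code, not the return value, signals failure.
  std::filesystem::create_directories(dir, ec);
  if (ec)
    return false;

  // Confirm the final state rather than trusting the return value above.
  return std::filesystem::is_directory(dir, ec) && !ec;
}
```

Reporting failure through a return value rather than an exception keeps the caller free to log a trace message and fall back to running without the cache.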
…ntel#13196)

Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
…oading (intel#13083)

Based on discussions with various stakeholders, we concluded that
spirv32/spirv64 are the best-suited strings for target architectures
when a user wants to generate JIT code for Intel backends using the
DPCPP compiler.
This PR adds changes to allow the DPCPP compiler to accept
spirv32/spirv64 as valid target architecture strings. spir/spir64 are
also valid target architecture strings, but will be deprecated in a
future commit.
This change will help us to align with the SPIR-V backend behavior and
ensure smoother SYCL upstreaming.
Currently, only JIT triples using spirv32/spirv64 are supported. AOT
triples using spirv32/spirv64 will be added soon.

Thanks

---------

Signed-off-by: Sudarsanam, Arvind <arvind.sudarsanam@intel.com>
Updates the git tag for the oneAPI Construction Kit.
Replace check for cv-unqualified object types with a check for
cv-unqualified trivial types to be in line with the
`sycl_ext_oneapi_private_alloca` extension specification:

> `ElementType` must be a cv-unqualified trivial type
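
The difference between the two checks can be sketched with standard type traits. The trait name `is_cv_unqualified_trivial_v` below is hypothetical and is not the compiler's actual diagnostic logic; it only illustrates which types the quoted requirement accepts and rejects.

```cpp
#include <type_traits>

// Hypothetical trait for illustration: true only for trivial types that
// carry no top-level const or volatile qualifier.
template <typename T>
inline constexpr bool is_cv_unqualified_trivial_v =
    std::is_trivial_v<T> && !std::is_const_v<T> && !std::is_volatile_v<T>;

struct Pod {
  int x;
  float y;
}; // trivial: no user-provided special member functions

struct NonTrivial {
  NonTrivial() {} // user-provided constructor makes the type non-trivial
};

static_assert(is_cv_unqualified_trivial_v<int>);
static_assert(is_cv_unqualified_trivial_v<Pod>);
static_assert(!is_cv_unqualified_trivial_v<const int>);
static_assert(!is_cv_unqualified_trivial_v<volatile Pod>);
// NonTrivial is an object type, so an object-type check would accept it,
// but it is not a trivial type.
static_assert(!is_cv_unqualified_trivial_v<NonTrivial>);
```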

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
…ntel#13202)

Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
Implements the get_backend_info() functions for our SYCL
implementation based on the SYCL 2020 spec
(https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html;
search for "get_backend_info()" there for the spec for these
functions).
There are six groups of variations for this function, namely
`sycl::platform::get_backend_info()`,
`sycl::context::get_backend_info()`, `sycl::device::get_backend_info()`,
`sycl::queue::get_backend_info()`, `sycl::event::get_backend_info()`,
and `sycl::kernel::get_backend_info()`.

One known concern: it seems that sycl::platform, sycl::context and
sycl::kernel may have multiple associated devices, but according to the
spec the return type for
`sycl::xxx::get_backend_info<info::device::version>()` should be
std::string (i.e. a single device version) so I'm just returning the
version of the first associated device in the list. Is this OK?

---------

Signed-off-by: Hu, Peisen <peisen.hu@intel.com>
* Update the test to initialize the input vectors with 0s to match the
`bindless_helpers::fill_rand` requirement of a non-empty vector.
* Change the name of function `initVector` to `init_vector`.
* Move `init_vector`, `equal_vec` and `operator<<` into header
`bindless_helpers.hpp`.
…able to return result into a different matrix (intel#13151)

Currently, CUDA code that uses this pattern:
  for (int i = 0; i < c_frag.num_elements; i++) {
    c_frag.x[i] = alpha * acc_frag.x[i] + beta * c_frag.x[i];
  }
cannot be migrated to SYCL joint matrix.
This added overload addresses this.
The spec API is added in intel#13153.
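
The pattern combines two fragments element by element and writes the result back into one of them. The following is a host-side sketch in plain C++ only: `Fragment`, `apply_two`, and `scale_and_accumulate` are hypothetical names standing in for the idea, and none of them is the SYCL joint matrix API, which is specified in intel#13153 and not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-in for a matrix fragment; not the SYCL
// joint_matrix type.
template <typename T, std::size_t N>
struct Fragment {
  std::array<T, N> x{};
  static constexpr std::size_t num_elements = N;
};

// Hypothetical helper: calls f(in_elem, out_elem) for each pair of
// corresponding elements. f may modify out_elem in place.
template <typename T, std::size_t N, typename F>
void apply_two(const Fragment<T, N> &in, Fragment<T, N> &out, F &&f) {
  for (std::size_t i = 0; i < N; ++i)
    f(in.x[i], out.x[i]);
}

// Computes c = alpha * acc + beta * c, element by element.
template <typename T, std::size_t N>
void scale_and_accumulate(const Fragment<T, N> &acc, Fragment<T, N> &c,
                          T alpha, T beta) {
  apply_two(acc, c,
            [=](const T &a, T &out) { out = alpha * a + beta * out; });
}
```

Passing the input and output as separate arguments to the callable is what allows the result to land in a matrix other than the one being read.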
After
intel@370aa2a,
the grf_size control values changed to 128 and 256, replacing values
like "small" and "large".


> 2) Adds two new kernel properties
> `sycl::ext::intel::experimental::grf_size` and
> `sycl::ext::intel::experimental::grf_size_automatic`, as per the spec.
> `grf_size` adds the `sycl-grf-size` metadata with a value of the
> template parameter **(`128` or `256`)**. `grf_size_automatic` adds the
> `sycl-grf-size` metadata with a value of `0`.

The user is expected to specify the value like this:
syclex::properties kernel_properties{intelex::grf_size<128>};
syclex::properties kernel_properties{intelex::grf_size<256>};
Apply clang-format to llvm.bitreverse lowering testcase

---------

Signed-off-by: Lu, John <john.lu@intel.com>
This is the first PR in preparation for enabling dev IGC testing for
some of the SYCL tests.

Ref: intel#11552

Tested
https://github.com/intel/llvm/actions/runs/8461815185/job/23182202059