Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Implementation of explicit update with indices #12840

Merged
merged 32 commits into from
Mar 22, 2024
Merged
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
b195409
[SYCL][Graph] Prototype of explicit update with indices
Bensuo Feb 12, 2024
384b9aa
Spell updatable consistently, add missing comments
Bensuo Mar 6, 2024
3eebeb2
Merge remote-tracking branch 'origin/sycl' into ben/explicit-update-impl
Bensuo Mar 6, 2024
dfef9b3
Re-add update tests
Bensuo Mar 6, 2024
149611d
Update PI native_cpu dump file
Bensuo Mar 6, 2024
5b89527
Fix inconsistencies between spec and implementation
Bensuo Mar 6, 2024
f5a5518
Fix update tests not handling unsupported backends correctly
Bensuo Mar 7, 2024
7cb9d3f
[SYCL][Graph] Add limited_graph aspect
Bensuo Mar 11, 2024
561fd44
Merge remote-tracking branch 'origin/sycl' into ben/explicit-update-impl
Bensuo Mar 11, 2024
b59ccc3
Update UR tag temporarily
Bensuo Mar 11, 2024
0fd968b
Fix aspect and tests
Bensuo Mar 11, 2024
e5a8297
[SYCL][Graph] set_arg for dynamic parameters
Bensuo Mar 12, 2024
b5011e3
Merge remote-tracking branch 'origin/sycl' into ben/explicit-update-impl
Bensuo Mar 13, 2024
c51dbbf
Add comments for includes in graph.hpp
Bensuo Mar 13, 2024
2568f4a
Fix linux symbols and aspect test
Bensuo Mar 13, 2024
17a6687
Update windows symbol dumps
Bensuo Mar 13, 2024
6c405d0
Add test for updating multiple nodes with different arg indices
Bensuo Mar 13, 2024
16f2eb6
Update hip symbol dumps
Bensuo Mar 13, 2024
382f016
Fix host tasks moving associated accessors
Bensuo Mar 14, 2024
c17e6c6
Make graph aspects not mutually exclusive
Bensuo Mar 14, 2024
2612638
Simplify graph aspect checks
Bensuo Mar 14, 2024
57db3f6
Addressing MR comments
Bensuo Mar 18, 2024
d62c02c
Fix comment
Bensuo Mar 18, 2024
3cd2ddc
Add more complex accessor tests
Bensuo Mar 19, 2024
21c021f
Merge remote-tracking branch 'origin/sycl' into ben/explicit-update-impl
Bensuo Mar 20, 2024
5e10190
Fix test
Bensuo Mar 20, 2024
a9c3bcd
Merge remote-tracking branch 'origin/sycl' into ben/explicit-update-impl
Bensuo Mar 20, 2024
77165fd
Add missing pi functions, fix leaking of command buffer commands
Bensuo Mar 20, 2024
76fbc04
Update pi symbol dumps
Bensuo Mar 20, 2024
e4957e3
Add `REQUIRES: cuda_be` back to PI CUDA symbol test
EwanC Mar 22, 2024
8ca0c00
Merge remote-tracking branch 'origin/sycl' into ben/explicit-update-impl
EwanC Mar 22, 2024
5e8c703
clang-format updates
EwanC Mar 22, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion llvm/include/llvm/SYCLLowerIR/DeviceConfigFile.td
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ def AspectExt_oneapi_is_composite : Aspect<"ext_oneapi_is_composite">;
def AspectExt_oneapi_is_component : Aspect<"ext_oneapi_is_component">;
def AspectExt_oneapi_graph : Aspect<"ext_oneapi_graph">;
def AspectExt_intel_fpga_task_sequence : Aspect<"ext_intel_fpga_task_sequence">;
def AspectExt_oneapi_limited_graph : Aspect<"ext_oneapi_limited_graph">;
// Deprecated aspects
def AspectInt64_base_atomics : Aspect<"int64_base_atomics">;
def AspectInt64_extended_atomics : Aspect<"int64_extended_atomics">;
Expand Down Expand Up @@ -122,7 +123,7 @@ def : TargetInfo<"__TestAspectList",
AspectExt_oneapi_mipmap, AspectExt_oneapi_mipmap_anisotropy, AspectExt_oneapi_mipmap_level_reference, AspectExt_intel_esimd,
AspectExt_oneapi_ballot_group, AspectExt_oneapi_fixed_size_group, AspectExt_oneapi_opportunistic_group,
AspectExt_oneapi_tangle_group, AspectExt_intel_matrix, AspectExt_oneapi_is_composite, AspectExt_oneapi_is_component,
AspectExt_oneapi_graph, AspectExt_intel_fpga_task_sequence],
AspectExt_oneapi_graph, AspectExt_intel_fpga_task_sequence, AspectExt_oneapi_limited_graph],
[]>;
// This definition serves the only purpose of testing whether the deprecated aspect list defined in here and in SYCL RT
// match.
Expand Down
42 changes: 42 additions & 0 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ with the following entry-points:
| `urCommandBufferAppendMemBufferReadRectExp` | Append a rectangular memory read command to a command-buffer object. |
| `urCommandBufferAppendMemBufferFillExp` | Append a memory fill command to a command-buffer object. |
| `urCommandBufferEnqueueExp` | Submit command-buffer to a command-queue for execution. |
| `urCommandBufferUpdateKernelLaunchExp` | Updates the parameters of a previous kernel launch command. |

See the [UR EXP-COMMAND-BUFFER](https://oneapi-src.github.io/unified-runtime/core/EXP-COMMAND-BUFFER.html)
specification for more details.
Expand Down Expand Up @@ -230,6 +231,47 @@ on buffer usage in a graph so that their lifetime semantics are compatible with
a lazy work execution model. However these changes to storage lifetimes have not
yet been implemented.

## Graph Update

### Design Challenges

Graph update faces significant design challenges in SYCL:

* Lambda capture order is explicitly undefined in C++, so the user cannot reason
about the indices of arguments captured by kernel lambdas.
* Once arguments have been captured the actual type information is lost in the
transition through the integration header and extracting arguments in the SYCL
runtime, therefore we cannot automatically match new argument values by
querying the captured arguments without significant possibility for
collisions. For example, if a kernel captures two USM pointers and the user
wishes to update one, we cannot reason about which pointer they actually want
to update when we only know that: they are pointer args of a certain size.

The current approach is to limit graph update to the explicit APIs and where the
user is using `handler::set_arg()` or some equivalent to manually set kernel
arguments using indices. Therefore when updating we can use indices to avoid
collisions. In practice there are only a few current scenarios where `set_arg()`
can be used:

* The proposed ["Free Function Kernel"
extension](../extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc)
* OpenCL interop kernels created from SPIR-V source at runtime.

A possible future workaround lambda capture issues could be "Whole-Graph Update"
where if we can guarantee that lambda capture order is the same across two
different recordings we can then match parameter order when updating.

### Scheduler Integration

Graph updates in the runtime are synchronous calls however they can optionally
be done through the scheduler using a new command,
`sycl::detail::UpdateCommandBufferCommand`. This is needed when dealing with
accessor updates. Since a new buffer which the user creates for updating may not
yet have been lazily initialized on device we schedule a new command which has
requirements for these new accessors to correctly trigger allocations before
updating. This is similar to how individual graph commands are enqueued when
accessors are used in a graph node.

## Backend Implementation

Implementation of UR command-buffers
Expand Down
3 changes: 3 additions & 0 deletions sycl/include/sycl/detail/pi.def
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,9 @@ _PI_API(piextCommandBufferFillUSM)
_PI_API(piextCommandBufferPrefetchUSM)
_PI_API(piextCommandBufferAdviseUSM)
_PI_API(piextEnqueueCommandBuffer)
_PI_API(piextCommandBufferUpdateKernelLaunch)
_PI_API(piextCommandBufferRetainCommand)
_PI_API(piextCommandBufferReleaseCommand)

_PI_API(piextUSMPitchedAlloc)

Expand Down
68 changes: 65 additions & 3 deletions sycl/include/sycl/detail/pi.h
Original file line number Diff line number Diff line change
Expand Up @@ -156,9 +156,10 @@
// piextEnqueueCooperativeKernelLaunch.
// 15.46 Add piextGetGlobalVariablePointer
// 15.47 Added PI_ERROR_FEATURE_UNSUPPORTED.
// 15.48 Add CommandBuffer update definitions

#define _PI_H_VERSION_MAJOR 15
#define _PI_H_VERSION_MINOR 47
#define _PI_H_VERSION_MINOR 48

#define _PI_STRING_HELPER(a) #a
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
Expand Down Expand Up @@ -445,6 +446,10 @@ typedef enum {
// Composite device
PI_EXT_ONEAPI_DEVICE_INFO_COMPONENT_DEVICES = 0x20111,
PI_EXT_ONEAPI_DEVICE_INFO_COMPOSITE_DEVICE = 0x20112,

// Command Buffers
PI_EXT_ONEAPI_DEVICE_INFO_COMMAND_BUFFER_SUPPORT = 0x20113,
PI_EXT_ONEAPI_DEVICE_INFO_COMMAND_BUFFER_UPDATE_SUPPORT = 0x20114,
} _pi_device_info;

typedef enum {
Expand Down Expand Up @@ -2320,7 +2325,10 @@ __SYCL_EXPORT pi_result piGetDeviceAndHostTimer(pi_device Device,
/// Command buffer extension
struct _pi_ext_command_buffer;
struct _pi_ext_sync_point;
struct _pi_ext_command_buffer_command;

using pi_ext_command_buffer = _pi_ext_command_buffer *;
using pi_ext_command_buffer_command = _pi_ext_command_buffer_command *;
using pi_ext_sync_point = pi_uint32;

typedef enum {
Expand All @@ -2330,7 +2338,40 @@ typedef enum {
struct pi_ext_command_buffer_desc final {
pi_ext_structure_type stype;
const void *pNext;
pi_queue_properties *properties;
pi_bool is_updatable;
};

// Command Buffer Update types
struct pi_ext_command_buffer_update_memobj_arg_desc_t final {
uint32_t arg_index;
const pi_mem_obj_property *properties;
pi_mem new_mem_obj;
};

struct pi_ext_command_buffer_update_pointer_arg_desc_t final {
uint32_t arg_index;
void *new_ptr;
};

struct pi_ext_command_buffer_update_value_arg_desc_t final {
uint32_t arg_index;
uint32_t arg_size;
void *new_value;
};

struct pi_ext_command_buffer_update_kernel_launch_desc final {
uint32_t num_mem_obj_args;
uint32_t num_ptr_args;
uint32_t num_value_args;
uint32_t num_work_dim;

pi_ext_command_buffer_update_memobj_arg_desc_t *mem_obj_arg_list;
pi_ext_command_buffer_update_pointer_arg_desc_t *ptr_arg_list;
pi_ext_command_buffer_update_value_arg_desc_t *value_arg_list;

size_t *global_work_offset;
size_t *global_work_size;
size_t *local_work_size;
};

/// API to create a command-buffer.
Expand Down Expand Up @@ -2374,12 +2415,14 @@ piextCommandBufferFinalize(pi_ext_command_buffer command_buffer);
/// \param sync_point_wait_list A list of sync points that this command must
/// wait on.
/// \param sync_point The sync_point associated with this kernel execution.
/// \param command Return pointer to the command representing this kernel
/// execution.
__SYCL_EXPORT pi_result piextCommandBufferNDRangeKernel(
pi_ext_command_buffer command_buffer, pi_kernel kernel, pi_uint32 work_dim,
const size_t *global_work_offset, const size_t *global_work_size,
const size_t *local_work_size, pi_uint32 num_sync_points_in_wait_list,
const pi_ext_sync_point *sync_point_wait_list,
pi_ext_sync_point *sync_point);
pi_ext_sync_point *sync_point, pi_ext_command_buffer_command *command);

/// API to append a USM memcpy command to the command-buffer.
/// \param command_buffer The command-buffer to append onto.
Expand Down Expand Up @@ -2607,6 +2650,25 @@ piextEnqueueCommandBuffer(pi_ext_command_buffer command_buffer, pi_queue queue,
pi_uint32 num_events_in_wait_list,
const pi_event *event_wait_list, pi_event *event);

/// API to update a kernel launch command inside of a command-buffer.
/// @param command The command to be updated.
/// @param desc Descriptor which describes the updated parameters of the kernel
/// launch.
__SYCL_EXPORT pi_result piextCommandBufferUpdateKernelLaunch(
pi_ext_command_buffer_command command,
pi_ext_command_buffer_update_kernel_launch_desc *desc);

/// API to increment the reference count of a command-buffer command.
/// \param command The command to release.
__SYCL_EXPORT pi_result
piextCommandBufferRetainCommand(pi_ext_command_buffer_command command);

/// API to decrement the reference count of a command-buffer command. After the
/// command reference count becomes zero, the command is deleted.
/// \param command The command to release.
__SYCL_EXPORT pi_result
piextCommandBufferReleaseCommand(pi_ext_command_buffer_command command);

/// API to destroy bindless unsampled image handles.
///
/// \param context is the pi_context
Expand Down
1 change: 1 addition & 0 deletions sycl/include/sycl/detail/pi.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@ using PiKernelCacheConfig = ::pi_kernel_cache_config;
using PiExtSyncPoint = ::pi_ext_sync_point;
using PiExtCommandBuffer = ::pi_ext_command_buffer;
using PiExtCommandBufferDesc = ::pi_ext_command_buffer_desc;
using PiExtCommandBufferCommand = ::pi_ext_command_buffer_command;
using PiPeerAttr = ::pi_peer_attr;
using PiImageHandle = ::pi_image_handle;
using PiImageMemHandle = ::pi_image_mem_handle;
Expand Down
3 changes: 2 additions & 1 deletion sycl/include/sycl/detail/property_helper.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,9 @@ enum DataLessPropKind {
GraphAssumeDataOutlivesBuffer = 22,
GraphAssumeBufferOutlivesGraph = 23,
GraphDependOnAllLeaves = 24,
GraphUpdatable = 25,
// Indicates the last known dataless property.
LastKnownDataLessPropKind = 24,
LastKnownDataLessPropKind = 25,
// Exceeding 32 may cause ABI breaking change on some of OSes.
DataLessPropKindSize = 32
};
Expand Down
10 changes: 10 additions & 0 deletions sycl/include/sycl/device_aspect_macros.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,11 @@
#define __SYCL_ALL_DEVICES_HAVE_ext_intel_fpga_task_sequence__ 0
#endif

#ifndef __SYCL_ALL_DEVICES_HAVE_ext_oneapi_limited_graph__
// __SYCL_ASPECT(ext_oneapi_limited_graph, 63)
#define __SYCL_ALL_DEVICES_HAVE_ext_oneapi_limited_graph__ 0
#endif

#ifndef __SYCL_ANY_DEVICE_HAS_host__
// __SYCL_ASPECT(host, 0)
#define __SYCL_ANY_DEVICE_HAS_host__ 0
Expand Down Expand Up @@ -637,3 +642,8 @@
// __SYCL_ASPECT(ext_intel_fpga_task_sequence__, 62)
#define __SYCL_ANY_DEVICE_HAS_ext_intel_fpga_task_sequence__ 0
#endif

#ifndef __SYCL_ANY_DEVICE_HAS_ext_oneapi_limited_graph__
// __SYCL_ASPECT(ext_oneapi_limited_graph, 63)
#define __SYCL_ANY_DEVICE_HAS_ext_oneapi_limited_graph__ 0
#endif
Loading
Loading