Skip to content

Commit

Permalink
[SYCL][Graph] Prototype of explicit update with indices
Browse files Browse the repository at this point in the history
- Experimental implementation of explicit update with indices
- New scheduler command for updating a command buffer command
- PI equivalents for new UR APIs
- E2E and Unit Tests
  • Loading branch information
Bensuo committed Feb 27, 2024
1 parent e72b85c commit b195409
Show file tree
Hide file tree
Showing 43 changed files with 2,345 additions and 61 deletions.
42 changes: 42 additions & 0 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ with the following entry-points:
| `urCommandBufferAppendMemBufferReadRectExp` | Append a rectangular memory read command to a command-buffer object. |
| `urCommandBufferAppendMemBufferFillExp` | Append a memory fill command to a command-buffer object. |
| `urCommandBufferEnqueueExp` | Submit command-buffer to a command-queue for execution. |
| `urCommandBufferUpdateKernelLaunchExp` | Updates the parameters of a previous kernel launch command. |

See the [UR EXP-COMMAND-BUFFER](https://oneapi-src.github.io/unified-runtime/core/EXP-COMMAND-BUFFER.html)
specification for more details.
Expand Down Expand Up @@ -230,6 +231,47 @@ on buffer usage in a graph so that their lifetime semantics are compatible with
a lazy work execution model. However these changes to storage lifetimes have not
yet been implemented.

## Graph Update

### Design Challenges

Graph update faces significant design challenges in SYCL:

* Lambda capture order is explicitly undefined in C++, so the user cannot reason
about the indices of arguments captured by kernel lambdas.
* Once arguments have been captured the actual type information is lost in the
transition through the integration header and extracting arguments in the SYCL
runtime, therefore we cannot automatically match new argument values by
querying the captured arguments without significant possibility for
collisions. For example, if a kernel captures two USM pointers and the user
wishes to update one, we cannot reason about which pointer they actually want
to update when we only know that: they are pointer args of a certain size.

The current approach is to limit graph update to the explicit APIs and where the
user is using `handler::set_arg()` or some equivalent to manually set kernel
arguments using indices. Therefore when updating we can use indices to avoid
collisions. In practice there are only a few current scenarios where `set_arg()`
can be used:

* The proposed ["Free Function Kernel"
extension](../extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc)
* OpenCL interop kernels created from SPIR-V source at runtime.

A possible future workaround lambda capture issues could be "Whole-Graph Update"
where if we can guarantee that lambda capture order is the same across two
different recordings we can then match parameter order when updating.

### Scheduler Integration

Graph updates in the runtime are sychronous calls however they can optionally be
done through the scheduler using a new command,
`sycl::detail::UpdateCommandBufferCommand`. This is needed when dealing with
accessor updates. Since a new buffer which the user creates for updating may not
yet have been lazily initialized on device we schedule a new command which has
requirements for these new accessors to correctly trigger allocations before
updating. This is similar to how individual graph commands are enqueued when
accessors are used in a graph node.

## Backend Implementation

Implementation of UR command-buffers
Expand Down
1 change: 1 addition & 0 deletions sycl/include/sycl/detail/pi.def
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,7 @@ _PI_API(piextCommandBufferFillUSM)
_PI_API(piextCommandBufferPrefetchUSM)
_PI_API(piextCommandBufferAdviseUSM)
_PI_API(piextEnqueueCommandBuffer)
_PI_API(piextCommandBufferUpdateKernelLaunch)

_PI_API(piextUSMPitchedAlloc)

Expand Down
56 changes: 53 additions & 3 deletions sycl/include/sycl/detail/pi.h
Original file line number Diff line number Diff line change
Expand Up @@ -154,9 +154,10 @@
// 15.44 Add coarse-grain memory advice flag for HIP.
// 15.45 Added piextKernelSuggestMaxCooperativeGroupCount and
// piextEnqueueCooperativeKernelLaunch.
// 15.46 Add CommandBuffer update definitions

#define _PI_H_VERSION_MAJOR 15
#define _PI_H_VERSION_MINOR 45
#define _PI_H_VERSION_MINOR 46

#define _PI_STRING_HELPER(a) #a
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
Expand Down Expand Up @@ -443,6 +444,9 @@ typedef enum {
// Composite device
PI_EXT_ONEAPI_DEVICE_INFO_COMPONENT_DEVICES = 0x20111,
PI_EXT_ONEAPI_DEVICE_INFO_COMPOSITE_DEVICE = 0x20112,

// Command Buffers
PI_EXT_ONEAPI_DEVICE_INFO_COMMAND_BUFFER_UPDATE_SUPPORT = 0x20113,
} _pi_device_info;

typedef enum {
Expand Down Expand Up @@ -2314,7 +2318,10 @@ __SYCL_EXPORT pi_result piGetDeviceAndHostTimer(pi_device Device,
/// Command buffer extension
struct _pi_ext_command_buffer;
struct _pi_ext_sync_point;
struct _pi_ext_command_buffer_command;

using pi_ext_command_buffer = _pi_ext_command_buffer *;
using pi_ext_command_buffer_command = _pi_ext_command_buffer_command *;
using pi_ext_sync_point = pi_uint32;

typedef enum {
Expand All @@ -2324,7 +2331,40 @@ typedef enum {
struct pi_ext_command_buffer_desc final {
pi_ext_structure_type stype;
const void *pNext;
pi_queue_properties *properties;
pi_bool is_updateable;
};

// Command Buffer Update types
struct pi_ext_command_buffer_update_memobj_arg_desc_t final {
uint32_t arg_index;
const pi_mem_obj_property *properties;
pi_mem new_mem_obj;
};

struct pi_ext_command_buffer_update_pointer_arg_desc_t final {
uint32_t arg_index;
void *new_ptr;
};

struct pi_ext_command_buffer_update_value_arg_desc_t final {
uint32_t arg_index;
uint32_t arg_size;
void *new_value;
};

struct pi_ext_command_buffer_update_kernel_launch_desc final {
uint32_t num_mem_obj_args;
uint32_t num_ptr_args;
uint32_t num_value_args;
uint32_t num_work_dim;

pi_ext_command_buffer_update_memobj_arg_desc_t *mem_obj_arg_list;
pi_ext_command_buffer_update_pointer_arg_desc_t *ptr_arg_list;
pi_ext_command_buffer_update_value_arg_desc_t *value_arg_list;

size_t *global_work_offset;
size_t *global_work_size;
size_t *local_work_size;
};

/// API to create a command-buffer.
Expand Down Expand Up @@ -2368,12 +2408,14 @@ piextCommandBufferFinalize(pi_ext_command_buffer command_buffer);
/// \param sync_point_wait_list A list of sync points that this command must
/// wait on.
/// \param sync_point The sync_point associated with this kernel execution.
/// \param command Return pointer to the command representing this kernel
/// execution.
__SYCL_EXPORT pi_result piextCommandBufferNDRangeKernel(
pi_ext_command_buffer command_buffer, pi_kernel kernel, pi_uint32 work_dim,
const size_t *global_work_offset, const size_t *global_work_size,
const size_t *local_work_size, pi_uint32 num_sync_points_in_wait_list,
const pi_ext_sync_point *sync_point_wait_list,
pi_ext_sync_point *sync_point);
pi_ext_sync_point *sync_point, pi_ext_command_buffer_command *command);

/// API to append a USM memcpy command to the command-buffer.
/// \param command_buffer The command-buffer to append onto.
Expand Down Expand Up @@ -2601,6 +2643,14 @@ piextEnqueueCommandBuffer(pi_ext_command_buffer command_buffer, pi_queue queue,
pi_uint32 num_events_in_wait_list,
const pi_event *event_wait_list, pi_event *event);

/// API to update a kernel launch command inside of a command-buffer.
/// @param command The command to be updated.
/// @param desc Descriptor which describes the updated parameters of the kernel
/// launch.
__SYCL_EXPORT pi_result piextCommandBufferUpdateKernelLaunch(
pi_ext_command_buffer_command command,
pi_ext_command_buffer_update_kernel_launch_desc *desc);

/// API to destroy bindless unsampled image handles.
///
/// \param context is the pi_context
Expand Down
1 change: 1 addition & 0 deletions sycl/include/sycl/detail/pi.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@ using PiKernelCacheConfig = ::pi_kernel_cache_config;
using PiExtSyncPoint = ::pi_ext_sync_point;
using PiExtCommandBuffer = ::pi_ext_command_buffer;
using PiExtCommandBufferDesc = ::pi_ext_command_buffer_desc;
using PiExtCommandBufferCommand = ::pi_ext_command_buffer_command;
using PiPeerAttr = ::pi_peer_attr;
using PiImageHandle = ::pi_image_handle;
using PiImageMemHandle = ::pi_image_mem_handle;
Expand Down
3 changes: 2 additions & 1 deletion sycl/include/sycl/detail/property_helper.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,9 @@ enum DataLessPropKind {
GraphAssumeDataOutlivesBuffer = 22,
GraphAssumeBufferOutlivesGraph = 23,
GraphDependOnAllLeaves = 24,
GraphUpdateable = 25,
// Indicates the last known dataless property.
LastKnownDataLessPropKind = 24,
LastKnownDataLessPropKind = 25,
// Exceeding 32 may cause ABI breaking change on some of OSes.
DataLessPropKindSize = 32
};
Expand Down
102 changes: 91 additions & 11 deletions sycl/include/sycl/ext/oneapi/experimental/graph.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,13 @@

#pragma once

#include <sycl/context.hpp> // for context
#include <sycl/detail/export.hpp> // for __SYCL_EXPORT
#include <sycl/accessor.hpp>
#include <sycl/context.hpp> // for context
#include <sycl/detail/export.hpp> // for __SYCL_EXPORT
#include <sycl/detail/kernel_desc.hpp>
#include <sycl/detail/property_helper.hpp> // for DataLessPropKind, PropWith...
#include <sycl/device.hpp> // for device
#include <sycl/nd_range.hpp>
#include <sycl/properties/property_traits.hpp> // for is_property, is_property_of
#include <sycl/property_list.hpp> // for property_list

Expand All @@ -30,6 +33,15 @@ namespace ext {
namespace oneapi {
namespace experimental {

/// State to template the command_graph class on.
enum class graph_state {
modifiable, ///< In modifiable state, commands can be added to graph.
executable, ///< In executable state, the graph is ready to execute.
};

// Forward declare Graph class
template <graph_state State> class command_graph;

namespace detail {
// List of sycl features and extensions which are not supported by graphs. Used
// for throwing errors when these features are used with graphs.
Expand Down Expand Up @@ -73,15 +85,9 @@ UnsupportedFeatureToString(UnsupportedGraphFeatures Feature) {
class node_impl;
class graph_impl;
class exec_graph_impl;

class dynamic_parameter_impl;
} // namespace detail

/// State to template the command_graph class on.
enum class graph_state {
modifiable, ///< In modifiable state, commands can be added to graph.
executable, ///< In executable state, the graph is ready to execute.
};

enum class node_type {
empty = 0,
subgraph = 1,
Expand Down Expand Up @@ -113,6 +119,13 @@ class __SYCL_EXPORT node {
/// submission.
static node get_node_from_event(event nodeEvent);

/// Update the ND-Range of this node if it is a kernel execution node
template <int Dimensions>
void update_nd_range(nd_range<Dimensions> executionRange);

/// Update the Range of this node if it is a kernel execution node
template <int Dimensions> void update_range(range<Dimensions> executionRange);

private:
node(const std::shared_ptr<detail::node_impl> &Impl) : impl(Impl) {}

Expand Down Expand Up @@ -146,6 +159,14 @@ class assume_buffer_outlives_graph
public:
assume_buffer_outlives_graph() = default;
};

/// Property passed to command_graph<graph_state::modifiable>::finalize() to
/// mark the resulting executable command_graph as able to be updated.
class updateable
: public ::sycl::detail::DataLessProperty<::sycl::detail::GraphUpdateable> {
public:
updateable() = default;
};
} // namespace graph

namespace node {
Expand Down Expand Up @@ -336,12 +357,24 @@ class __SYCL_EXPORT executable_command_graph {
/// @param Graph Graph to use the inputs and outputs of.
void update(const command_graph<graph_state::modifiable> &Graph);

/// Updates a single node in this graph based on the contents of the provided
/// node.
/// @param Node The node to use for updating the graph.
void update(const node &Node);

/// Updates a number of nodes in this graph based on the contents of the
/// provided nodes.
/// @param Nodes The nodes to use for updating the graph.
void update(const std::vector<node> &Nodes);

protected:
/// Constructor used by internal runtime.
/// @param Graph Detail implementation class to construct with.
/// @param Ctx Context to use for graph.
/// @param PropList Optional list of properties to pass.
executable_command_graph(const std::shared_ptr<detail::graph_impl> &Graph,
const sycl::context &Ctx);
const sycl::context &Ctx,
const property_list &PropList = {});

template <class Obj>
friend decltype(Obj::impl)
Expand Down Expand Up @@ -385,13 +418,60 @@ class command_graph : public detail::modifiable_command_graph {
template <>
class command_graph<graph_state::executable>
: public detail::executable_command_graph {

protected:
friend command_graph<graph_state::executable>
detail::modifiable_command_graph::finalize(const sycl::property_list &) const;
using detail::executable_command_graph::executable_command_graph;
};

namespace detail {
class __SYCL_EXPORT dynamic_parameter_base {
public:
dynamic_parameter_base(
sycl::ext::oneapi::experimental::command_graph<graph_state::modifiable>
Graph);

void register_with_node(handler &CGH, int ArgIndex);

protected:
void updateValue(void *NewValue, size_t Size);

void updateAccessor(sycl::detail::AccessorBaseHost *Acc);
std::shared_ptr<dynamic_parameter_impl> impl;
};
} // namespace detail

template <typename ValueT>
class dynamic_parameter : public detail::dynamic_parameter_base {
static constexpr bool IsAccessor =
std::is_base_of_v<sycl::detail::AccessorBaseHost, ValueT>;
static constexpr sycl::detail::kernel_param_kind_t ParamType =
IsAccessor ? sycl::detail::kernel_param_kind_t::kind_accessor
: std::is_pointer_v<ValueT>
? sycl::detail::kernel_param_kind_t::kind_pointer
: sycl::detail::kernel_param_kind_t::kind_std_layout;

public:
dynamic_parameter(experimental::command_graph<graph_state::modifiable> Graph)
: detail::dynamic_parameter_base(Graph), MValue() {}

dynamic_parameter(ValueT InitialValue,
experimental::command_graph<graph_state::modifiable> Graph)
: detail::dynamic_parameter_base(Graph), MValue(InitialValue) {}

void update(const ValueT &NewValue) {
MValue = NewValue;
if constexpr (IsAccessor) {
detail::dynamic_parameter_base::updateAccessor(&MValue);
} else {
detail::dynamic_parameter_base::updateValue(&MValue, sizeof(ValueT));
}
}

private:
ValueT MValue;
};

/// Additional CTAD deduction guide.
template <graph_state State = graph_state::modifiable>
command_graph(const context &SyclContext, const device &SyclDevice,
Expand Down
Loading

0 comments on commit b195409

Please sign in to comment.