Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Enable in-order cmd-list #13088

Merged
merged 14 commits into from
May 2, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 36 additions & 4 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -443,6 +443,31 @@ The API for explicitly adding nodes to a `command_graph` includes a
property to be passed here. `depends_on_all_leaves` provides a shortcut for
adding all the current leaves of a graph as dependencies.

==== Enable-Profiling Property [[enable-profiling]]
EwanC marked this conversation as resolved.
Show resolved Hide resolved
[source,c++]
----
namespace sycl::ext::oneapi::experimental::property::node {
class enable_profiling {
public:
enable_profiling() = default;
};
}
----

The `property::graph::enable_profiling` property can be passed to a
Bensuo marked this conversation as resolved.
Show resolved Hide resolved
`command_graph::add()` function and enables profiling support
for the node in the `command_graph<graph_state::executable>`.
Passing this property implies disabling certain optimizations.
EwanC marked this conversation as resolved.
Show resolved Hide resolved
This is why profiling is by default disabled on graphs, unless users
explicitly require it using either the `property::graph::enable_profiling`
Bensuo marked this conversation as resolved.
Show resolved Hide resolved
property in building mode or the `property::queue::enable_profiling` on
Bensuo marked this conversation as resolved.
Show resolved Hide resolved
the recorded queue (Record&Replay API).
Bensuo marked this conversation as resolved.
Show resolved Hide resolved
As a result, the execution time of a graph finalized with profiling enabled
is longer than that of a graph without profiling capability.
An error will be thrown when attempting to profile an event
from a graph submission that was created without this property.


=== Graph

[source, c++]
Expand Down Expand Up @@ -681,6 +706,8 @@ Parameters:
* `propList` - Zero or more properties can be provided to the constructed node
via an instance of `property_list`. The `property::node::depends_on` property
can be passed here with a list of nodes to create dependency edges on.
The `enable_profiling` property enables the profiling of this node.
See <<enable-profiling, Enable-Profiling>> for more details.


Returns: The empty node which has been added to the graph.
Expand Down Expand Up @@ -718,6 +745,9 @@ Parameters:
* `propList` - Zero or more properties can be provided to the constructed node
via an instance of `property_list`. The `property::node::depends_on` property
can be passed here with a list of nodes to create dependency edges on.
The `enable_profiling` property enables the profiling of this node.
See <<enable-profiling, Enable-Profiling>> for more details.


Returns: The command-group function object node which has been added to the graph.

Expand Down Expand Up @@ -1043,10 +1073,12 @@ ways:
an implicit dependency before and after the graph execution, as if the graph
execution is one command-group submitted to the in-order queue.

2. `property::queue::enable_profiling` - This property has no effect on graph
recording. When set on the queue a graph is submitted to however, it allows
profiling information to be obtained from the event returned by a graph
submission. As it is not defined how a submitted graph will be split up for
2. `property::queue::enable_profiling` - This property must be set on the queue
in recording mode if users want to profile the commands recorded to
the graph.This property must also be set on the queue the queue a graph is
Bensuo marked this conversation as resolved.
Show resolved Hide resolved
submitted to. It allows profiling information to be obtained from the event
returned by a graph submission.
As it is not defined how a submitted graph will be split up for
scheduling at runtime, the `uint64_t` timestamp reported from a profiling
query on a graph execution event has the following semantics, which may be
pessimistic about execution time on device.
Expand Down
3 changes: 2 additions & 1 deletion sycl/include/sycl/detail/pi.h
Original file line number Diff line number Diff line change
Expand Up @@ -2330,7 +2330,8 @@ typedef enum {
struct pi_ext_command_buffer_desc final {
pi_ext_structure_type stype;
const void *pNext;
pi_queue_properties *properties;
pi_bool is_in_order;
pi_bool enable_profiling;
};

/// API to create a command-buffer.
Expand Down
3 changes: 2 additions & 1 deletion sycl/include/sycl/detail/property_helper.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,9 @@ enum DataLessPropKind {
GraphAssumeDataOutlivesBuffer = 22,
GraphAssumeBufferOutlivesGraph = 23,
GraphDependOnAllLeaves = 24,
GraphEnableProfiling = 25,
// Indicates the last known dataless property.
LastKnownDataLessPropKind = 24,
LastKnownDataLessPropKind = 25,
// Exceeding 32 may cause ABI breaking change on some of OSes.
DataLessPropKindSize = 32
};
Expand Down
33 changes: 25 additions & 8 deletions sycl/include/sycl/ext/oneapi/experimental/graph.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,15 @@ class depends_on_all_leaves : public ::sycl::detail::DataLessProperty<
depends_on_all_leaves() = default;
};

/// Property used to enable node profiling.
/// Passing this property to the `command_graph::add()` function
/// ensures that profiling can be queried on this node.
class enable_profiling : public ::sycl::detail::DataLessProperty<
::sycl::detail::GraphEnableProfiling> {
public:
enable_profiling() = default;
};

} // namespace node
} // namespace property

Expand All @@ -200,15 +209,17 @@ class __SYCL_EXPORT modifiable_command_graph {
/// @param PropList Property list used to pass [0..n] predecessor nodes.
/// @return Constructed empty node which has been added to the graph.
node add(const property_list &PropList = {}) {
bool EnableProfiling =
PropList.has_property<property::node::enable_profiling>();
if (PropList.has_property<property::node::depends_on>()) {
auto Deps = PropList.get_property<property::node::depends_on>();
node Node = addImpl(Deps.get_dependencies());
node Node = addImpl(Deps.get_dependencies(), EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
return Node;
}
node Node = addImpl({});
node Node = addImpl({}, EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
Expand All @@ -220,15 +231,17 @@ class __SYCL_EXPORT modifiable_command_graph {
/// @param PropList Property list used to pass [0..n] predecessor nodes.
/// @return Constructed node which has been added to the graph.
template <typename T> node add(T CGF, const property_list &PropList = {}) {
bool EnableProfiling =
PropList.has_property<property::node::enable_profiling>();
if (PropList.has_property<property::node::depends_on>()) {
auto Deps = PropList.get_property<property::node::depends_on>();
node Node = addImpl(CGF, Deps.get_dependencies());
node Node = addImpl(CGF, Deps.get_dependencies(), EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
return Node;
}
node Node = addImpl(CGF, {});
node Node = addImpl(CGF, {}, EnableProfiling);
if (PropList.has_property<property::node::depends_on_all_leaves>()) {
addGraphLeafDependencies(Node);
}
Expand Down Expand Up @@ -303,14 +316,16 @@ class __SYCL_EXPORT modifiable_command_graph {
/// Template-less implementation of add() for CGF nodes.
/// @param CGF Command-group function to add.
/// @param Dep List of predecessor nodes.
/// @param EnableProfiling Enable node profiling.
/// @return Node added to the graph.
node addImpl(std::function<void(handler &)> CGF,
const std::vector<node> &Dep);
node addImpl(std::function<void(handler &)> CGF, const std::vector<node> &Dep,
const bool EnableProfiling);

/// Template-less implementation of add() for empty nodes.
/// @param Dep List of predecessor nodes.
/// @param EnableProfiling Enable node profiling.
/// @return Node added to the graph.
node addImpl(const std::vector<node> &Dep);
node addImpl(const std::vector<node> &Dep, const bool EnableProfiling);

/// Adds all graph leaves as dependencies
/// @param Node Destination node to which the leaves of the graph will be
Expand Down Expand Up @@ -340,8 +355,10 @@ class __SYCL_EXPORT executable_command_graph {
/// Constructor used by internal runtime.
/// @param Graph Detail implementation class to construct with.
/// @param Ctx Context to use for graph.
/// @param PropList Optional list of properties to pass.
executable_command_graph(const std::shared_ptr<detail::graph_impl> &Graph,
const sycl::context &Ctx);
const sycl::context &Ctx,
const property_list &PropList = {});

template <class Obj>
friend decltype(Obj::impl)
Expand Down
10 changes: 2 additions & 8 deletions sycl/plugins/unified_runtime/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -81,14 +81,8 @@ if(SYCL_PI_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
# commit c98fdbcf1f43ce132fbae75336bda984e4ce2e78
# Merge: 5f4dd113 9b3cf9d3
# Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
# Date: Thu Mar 21 10:51:45 2024 +0000
# Merge pull request #1439 from nrspruit/fix_device_native_proxy_buffer
# [L0] Fix Native Host memory usage on device with copy back sync
set(UNIFIED_RUNTIME_TAG c98fdbcf1f43ce132fbae75336bda984e4ce2e78)
set(UNIFIED_RUNTIME_REPO "https://github.com/bensuo/unified-runtime.git")
set(UNIFIED_RUNTIME_TAG 8718fe1e553ea7cda05a0df2f4ef72acc5ce7397)

if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")
Expand Down
9 changes: 6 additions & 3 deletions sycl/plugins/unified_runtime/pi2ur.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4479,13 +4479,16 @@ piextCommandBufferCreate(pi_context Context, pi_device Device,
ur_context_handle_t UrContext =
reinterpret_cast<ur_context_handle_t>(Context);
ur_device_handle_t UrDevice = reinterpret_cast<ur_device_handle_t>(Device);
const ur_exp_command_buffer_desc_t *UrDesc =
reinterpret_cast<const ur_exp_command_buffer_desc_t *>(Desc);
ur_exp_command_buffer_desc_t UrDesc;
UrDesc.stype = UR_STRUCTURE_TYPE_EXP_COMMAND_BUFFER_DESC;
UrDesc.isInOrder = Desc->is_in_order;
UrDesc.enableProfiling = Desc->enable_profiling;
EwanC marked this conversation as resolved.
Show resolved Hide resolved
UrDesc.isUpdatable = false;
ur_exp_command_buffer_handle_t *UrCommandBuffer =
reinterpret_cast<ur_exp_command_buffer_handle_t *>(RetCommandBuffer);

HANDLE_ERRORS(
urCommandBufferCreateExp(UrContext, UrDevice, UrDesc, UrCommandBuffer));
urCommandBufferCreateExp(UrContext, UrDevice, &UrDesc, UrCommandBuffer));

return PI_SUCCESS;
}
Expand Down
2 changes: 2 additions & 0 deletions sycl/source/detail/event_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -300,6 +300,8 @@ class event_impl {
return MEventFromSubmittedExecCommandBuffer;
}

void setProfilingEnabled(bool Value) { MIsProfilingEnabled = Value; }

const std::vector<EventImplPtr> &getPostCompleteEvents() const {
return MPostCompleteEvents;
}
Expand Down
33 changes: 25 additions & 8 deletions sycl/source/detail/graph_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -188,6 +188,13 @@ void exec_graph_impl::makePartitions() {
if (Node->MCGType == sycl::detail::CG::CodeplayHostTask) {
HostTaskList.push_back(Node);
}
// Next line is supposed to be temporary.
// Nodes are not profiled individually, but the profiling of the whole graph
// is enabled if at least one node has profiling enabled. This should be
// changed once the PR https://github.com/intel/llvm/pull/12592 on node
// profiling is merged. This also will involve updating all the UR enqueue
// cmd functions to add a new parameter containing the profiling status.
MEnableProfiling |= Node->MProfilingEnabled;
}

// Annotate nodes
Expand Down Expand Up @@ -261,6 +268,7 @@ void exec_graph_impl::makePartitions() {
}
if (Partition->MRoots.size() > 0) {
Partition->schedule();
Partition->checkIfGraphIsSinglePath();
MPartitions.push_back(Partition);
PartitionFinalNum++;
}
Expand Down Expand Up @@ -676,7 +684,10 @@ sycl::detail::pi::PiExtSyncPoint exec_graph_impl::enqueueNode(
void exec_graph_impl::createCommandBuffers(
sycl::device Device, std::shared_ptr<partition> &Partition) {
sycl::detail::pi::PiExtCommandBuffer OutCommandBuffer;
sycl::detail::pi::PiExtCommandBufferDesc Desc{};
sycl::detail::pi::PiExtCommandBufferDesc Desc{
pi_ext_structure_type::PI_EXT_STRUCTURE_TYPE_COMMAND_BUFFER_DESC, nullptr,
pi_bool(Partition->MIsInOrderGraph && !MEnableProfiling),
pi_bool(MEnableProfiling)};
auto ContextImpl = sycl::detail::getSyclObjImpl(MContext);
const sycl::detail::PluginPtr &Plugin = ContextImpl->getPlugin();
auto DeviceImpl = sycl::detail::getSyclObjImpl(Device);
Expand Down Expand Up @@ -946,6 +957,7 @@ exec_graph_impl::enqueue(const std::shared_ptr<sycl::detail::queue_impl> &Queue,
NewEvent->attachEventToComplete(Elem.second);
}
}
NewEvent->setProfilingEnabled(MEnableProfiling);
sycl::event QueueEvent =
sycl::detail::createSyclObjFromImpl<sycl::event>(NewEvent);
return QueueEvent;
Expand Down Expand Up @@ -1106,7 +1118,8 @@ modifiable_command_graph::modifiable_command_graph(
: impl(std::make_shared<detail::graph_impl>(
SyclQueue.get_context(), SyclQueue.get_device(), PropList)) {}

node modifiable_command_graph::addImpl(const std::vector<node> &Deps) {
node modifiable_command_graph::addImpl(const std::vector<node> &Deps,
const bool EnableProfiling) {
impl->throwIfGraphRecordingQueue("Explicit API \"Add()\" function");
std::vector<std::shared_ptr<detail::node_impl>> DepImpls;
for (auto &D : Deps) {
Expand All @@ -1115,11 +1128,13 @@ node modifiable_command_graph::addImpl(const std::vector<node> &Deps) {

graph_impl::WriteLock Lock(impl->MMutex);
std::shared_ptr<detail::node_impl> NodeImpl = impl->add(impl, DepImpls);
NodeImpl->MProfilingEnabled = EnableProfiling;
return sycl::detail::createSyclObjFromImpl<node>(NodeImpl);
}

node modifiable_command_graph::addImpl(std::function<void(handler &)> CGF,
const std::vector<node> &Deps) {
const std::vector<node> &Deps,
const bool EnableProfiling) {
impl->throwIfGraphRecordingQueue("Explicit API \"Add()\" function");
std::vector<std::shared_ptr<detail::node_impl>> DepImpls;
for (auto &D : Deps) {
Expand All @@ -1129,6 +1144,7 @@ node modifiable_command_graph::addImpl(std::function<void(handler &)> CGF,
graph_impl::WriteLock Lock(impl->MMutex);
std::shared_ptr<detail::node_impl> NodeImpl =
impl->add(impl, CGF, {}, DepImpls);
NodeImpl->MProfilingEnabled = EnableProfiling;
return sycl::detail::createSyclObjFromImpl<node>(NodeImpl);
}

Expand Down Expand Up @@ -1156,12 +1172,12 @@ void modifiable_command_graph::make_edge(node &Src, node &Dest) {
}

command_graph<graph_state::executable>
modifiable_command_graph::finalize(const sycl::property_list &) const {
modifiable_command_graph::finalize(const sycl::property_list &PropList) const {
// Graph is read and written in this scope so we lock
// this graph with full priviledges.
graph_impl::WriteLock Lock(impl->MMutex);
return command_graph<graph_state::executable>{this->impl,
this->impl->getContext()};
return command_graph<graph_state::executable>{
this->impl, this->impl->getContext(), PropList};
}

bool modifiable_command_graph::begin_recording(queue &RecordingQueue) {
Expand Down Expand Up @@ -1275,8 +1291,9 @@ std::vector<node> modifiable_command_graph::get_root_nodes() const {
}

executable_command_graph::executable_command_graph(
const std::shared_ptr<detail::graph_impl> &Graph, const sycl::context &Ctx)
: impl(std::make_shared<detail::exec_graph_impl>(Ctx, Graph)) {
const std::shared_ptr<detail::graph_impl> &Graph, const sycl::context &Ctx,
const property_list &PropList)
: impl(std::make_shared<detail::exec_graph_impl>(Ctx, Graph, PropList)) {
finalizeImpl(); // Create backend representation for executable graph
}

Expand Down
Loading
Loading