Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Implement Graph and node queries #348

Closed
wants to merge 18 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
196 changes: 98 additions & 98 deletions clang/test/CodeGen/fp-accuracy.c

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions libdevice/cmath_wrapper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@ long int labs(long int x) { return __devicelib_labs(x); }
DEVICE_EXTERN_C_INLINE
long long int llabs(long long int x) { return __devicelib_llabs(x); }

DEVICE_EXTERN_C_INLINE
float fabsf(float x) { return __devicelib_fabsf(x); }

DEVICE_EXTERN_C_INLINE
div_t div(int x, int y) { return __devicelib_div(x, y); }

Expand Down
4 changes: 4 additions & 0 deletions libdevice/cmath_wrapper_fp64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,10 @@
// reference. If users provide their own math or complex functions(with
// the prototype), functions in device libraries will be ignored and
// overrided by users' version.

DEVICE_EXTERN_C_INLINE
double fabs(double x) { return __devicelib_fabs(x); }

DEVICE_EXTERN_C_INLINE
double log(double x) { return __devicelib_log(x); }

Expand Down
6 changes: 6 additions & 0 deletions libdevice/device_math.h
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,12 @@ long int __devicelib_labs(long int x);
DEVICE_EXTERN_C
long long int __devicelib_llabs(long long int x);

DEVICE_EXTERN_C
float __devicelib_fabsf(float x);

DEVICE_EXTERN_C
double __devicelib_fabs(double x);

DEVICE_EXTERN_C
div_t __devicelib_div(int x, int y);

Expand Down
4 changes: 4 additions & 0 deletions libdevice/fallback-cmath-fp64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@
// To support fallback device libraries on-demand loading, please update the
// DeviceLibFuncMap in llvm/tools/sycl-post-link/sycl-post-link.cpp if you add
// or remove any item in this file.

DEVICE_EXTERN_C_INLINE
double __devicelib_fabs(double x) { return x < 0 ? -x : x; }

DEVICE_EXTERN_C_INLINE
double __devicelib_log(double x) { return __spirv_ocl_log(x); }

Expand Down
3 changes: 3 additions & 0 deletions libdevice/fallback-cmath.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@ long int __devicelib_labs(long int x) { return x < 0 ? -x : x; }
DEVICE_EXTERN_C_INLINE
long long int __devicelib_llabs(long long int x) { return x < 0 ? -x : x; }

DEVICE_EXTERN_C_INLINE
float __devicelib_fabsf(float x) { return x < 0 ? -x : x; }

DEVICE_EXTERN_C_INLINE
div_t __devicelib_div(int x, int y) { return {x / y, x % y}; }

Expand Down
11 changes: 11 additions & 0 deletions sycl/doc/GetStartedGuide.md
Original file line number Diff line number Diff line change
Expand Up @@ -261,6 +261,17 @@ variable `SYCL_BUILD_PI_HIP_ROCM_DIR` which can be passed using the
python $DPCPP_HOME/llvm/buildbot/configure.py --hip \
--cmake-opt=-DSYCL_BUILD_PI_HIP_ROCM_DIR=/usr/local/rocm
```
If further customization is required — for instance when the layout of
individual directories can not be inferred from `SYCL_BUILD_PI_HIP_ROCM_DIR` —
it is possible to specify the location of HIP include, HSA include and HIP
library directories, using the following CMake variables:
* `SYCL_BUILD_PI_HIP_INCLUDE_DIR`,
* `SYCL_BUILD_PI_HIP_HSA_INCLUDE_DIR`,
* `SYCL_BUILD_PI_HIP_LIB_DIR`.
Please note that a similar customization would also be required for Unified
Runtime, see [the list of options provided by its
CMake](https://github.com/oneapi-src/unified-runtime#cmake-standard-options)
for details.

[LLD](https://llvm.org/docs/AMDGPUUsage.html) is necessary for the AMDGPU
compilation chain. The AMDGPU backend generates a standard ELF relocatable code
Expand Down
4 changes: 4 additions & 0 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,12 +37,14 @@ with the following entry-points:
| `urCommandBufferFinalizeExp` | No more commands can be appended, makes command-buffer ready to enqueue on a command-queue. |
| `urCommandBufferAppendKernelLaunchExp` | Append a kernel execution command to command-buffer. |
| `urCommandBufferAppendUSMMemcpyExp` | Append a USM memcpy command to the command-buffer. |
| `urCommandBufferAppendUSMFillExp` | Append a USM fill command to the command-buffer. |
| `urCommandBufferAppendMemBufferCopyExp` | Append a mem buffer copy command to the command-buffer. |
| `urCommandBufferAppendMemBufferWriteExp` | Append a memory write command to a command-buffer object. |
| `urCommandBufferAppendMemBufferReadExp` | Append a memory read command to a command-buffer object. |
| `urCommandBufferAppendMemBufferCopyRectExp` | Append a rectangular memory copy command to a command-buffer object. |
| `urCommandBufferAppendMemBufferWriteRectExp` | Append a rectangular memory write command to a command-buffer object. |
| `urCommandBufferAppendMemBufferReadRectExp` | Append a rectangular memory read command to a command-buffer object. |
| `urCommandBufferAppendMemBufferFillExp` | Append a memory fill command to a command-buffer object. |
| `urCommandBufferEnqueueExp` | Submit command-buffer to a command-queue for execution. |

See the [UR EXP-COMMAND-BUFFER](https://oneapi-src.github.io/unified-runtime/core/EXP-COMMAND-BUFFER.html)
Expand Down Expand Up @@ -347,6 +349,8 @@ The types of commands which are unsupported, and lead to this exception are:
This corresponds to a memory buffer write command.
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
`dest` are USM pointers. This corresponds to a USM copy command.
* `handler::memset(ptr, value, numBytes)` - This corresponds to a USM memory
fill command.

Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
is supported, as a memory buffer copy command exists in the OpenCL extension.
Expand Down
2 changes: 2 additions & 0 deletions sycl/doc/developer/ContributeToDPCPP.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ All changes made to the DPC++ compiler and runtime library should generally
preserve existing ABI/API and contributors should avoid making incompatible
changes. One of the exceptions is experimental APIs, clearly marked so by
namespace or related specification.
If you wish to propose a new experimental DPC++ extension then read
[README-process.md](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/README-process.md).

Another exceptional case is the transition from SYCL 1.2.1 to SYCL 2020
standard.
Expand Down
94 changes: 90 additions & 4 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -314,6 +314,19 @@ enum class graph_support_level {
emulated
};

enum class node_type {
empty,
subgraph,
kernel,
memcpy,
memset,
memfill,
prefetch,
memadvise,
ext_oneapi_barrier,
host_task,
Bensuo marked this conversation as resolved.
Show resolved Hide resolved
};

namespace property {

namespace graph {
Expand Down Expand Up @@ -354,7 +367,18 @@ struct graphs_support;
} // namespace device
} // namespace info

class node {};
class node {
public:
node() = delete;

node_type get_type() const;

std::vector<node> get_predecessors() const;

std::vector<node> get_successors() const;

static node get_node_from_event(event nodeEvent);
};

// State of a graph
enum class graph_state {
Expand Down Expand Up @@ -390,6 +414,9 @@ public:
void make_edge(node& src, node& dest);

void print_graph(std::string path, bool verbose = false) const;

std::vector<node> get_nodes() const;
std::vector<node> get_root_nodes() const;
};

template<>
Expand Down Expand Up @@ -460,12 +487,56 @@ edges.

The `node` class provides the {crs}[common reference semantics].

==== Node Member Functions

Table {counter: tableNumber}. Member functions of the `node` class.
[cols="2a,a"]
|===
|Member Function|Description

|
[source,c++]
----
namespace sycl::ext::oneapi::experimental {
class node {};
}
node_type get_type() const;
----
|Returns a value representing the type of command this node represents.

|
[source,c++]
----
std::vector<node> get_predecessors() const;
----
|Returns a list of the predecessor nodes which this node directly depends on.

|
[source,c++]
----
std::vector<node> get_successors() const;
Bensuo marked this conversation as resolved.
Show resolved Hide resolved
----
|Returns a list of the successor nodes which directly depend on this node.

|
[source,c++]
----
static node get_node_from_event(event nodeEvent);
----
|Finds the node associated with an event created from a submission to a queue
in the recording state.

Parameters:

* `nodeEvent` - Event returned from a submission to a queue in the recording
state.

Returns: Graph node that was created when the command that returned
`nodeEvent` was submitted.

Exceptions:

* Throws with error code `invalid` if `nodeEvent` is not associated with a
graph node.

|===

==== Depends-On Property

Expand Down Expand Up @@ -776,6 +847,21 @@ Exceptions:
* Throws synchronously with error code `invalid` if the path is invalid or
the file extension is not supported or if the write operation failed.

|
[source,c++]
----
std::vector<node> get_nodes() const;
----
|Returns a list of all the nodes present in the graph in the order that they
were added.

|
[source,c++]
----
std::vector<node> get_root_nodes() const;
----
|Returns a list of all nodes in the graph which have no dependencies.

|===

Table {counter: tableNumber}. Member functions of the `command_graph` class for queue recording.
Expand Down
10 changes: 5 additions & 5 deletions sycl/doc/extensions/template.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -135,16 +135,16 @@ _It is also appropriate to give an indication of who the target audience is for
the extension. For example, if the extension is intended only for ninja
programmers, we might say something like:_

> The properties described in this extension are advanced features that most
> applications should not need to use. In most cases, applications get the
> best performance without using these properties.
The properties described in this extension are advanced features that most
applications should not need to use. In most cases, applications get the best
performance without using these properties.

_Occasionally, we might add an extension as a stopgap measure for a limited
audience. When this happens, it's best to discourage general usage with a
statement like:_

> This extension exists to solve a specific problem, and a general solution is
> still being evaluated. It is not recommended for general usage.
This extension exists to solve a specific problem, and a general solution is
still being evaluated. It is not recommended for general usage.

_Note that text should be wrapped at 80 columns as shown in this template.
Extensions use AsciiDoc markup language (like this template). If you need help
Expand Down
2 changes: 2 additions & 0 deletions sycl/include/sycl/detail/pi.def
Original file line number Diff line number Diff line change
Expand Up @@ -176,6 +176,8 @@ _PI_API(piextCommandBufferMemBufferWrite)
_PI_API(piextCommandBufferMemBufferWriteRect)
_PI_API(piextCommandBufferMemBufferRead)
_PI_API(piextCommandBufferMemBufferReadRect)
_PI_API(piextCommandBufferMemBufferFill)
_PI_API(piextCommandBufferFillUSM)
_PI_API(piextEnqueueCommandBuffer)

_PI_API(piextUSMPitchedAlloc)
Expand Down
45 changes: 41 additions & 4 deletions sycl/include/sycl/detail/pi.h
Original file line number Diff line number Diff line change
Expand Up @@ -147,9 +147,10 @@
// 14.38 Change PI_MEM_ADVICE_* values to flags for use in bitwise operations.
// 14.39 Added PI_EXT_INTEL_DEVICE_INFO_ESIMD_SUPPORT device info query.
// 14.40 Add HIP _pi_mem_advice alises to match the PI_MEM_ADVICE_CUDA* ones.
// 14.41 Added piextCommandBufferMemBufferFill & piextCommandBufferFillUSM

#define _PI_H_VERSION_MAJOR 14
#define _PI_H_VERSION_MINOR 40
#define _PI_H_VERSION_MINOR 41

#define _PI_STRING_HELPER(a) #a
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
Expand Down Expand Up @@ -2441,7 +2442,7 @@ __SYCL_EXPORT pi_result piextCommandBufferMemBufferReadRect(
pi_buff_rect_offset buffer_offset, pi_buff_rect_offset host_offset,
pi_buff_rect_region region, size_t buffer_row_pitch,
size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch,
void *ptr, pi_uint32 num_events_in_wait_list,
void *ptr, pi_uint32 num_sync_points_in_wait_list,
const pi_ext_sync_point *sync_point_wait_list,
pi_ext_sync_point *sync_point);

Expand All @@ -2458,7 +2459,7 @@ __SYCL_EXPORT pi_result piextCommandBufferMemBufferReadRect(
/// \param sync_point The sync_point associated with this memory operation.
__SYCL_EXPORT pi_result piextCommandBufferMemBufferWrite(
pi_ext_command_buffer command_buffer, pi_mem buffer, size_t offset,
size_t size, const void *ptr, pi_uint32 num_events_in_wait_list,
size_t size, const void *ptr, pi_uint32 num_sync_points_in_wait_list,
const pi_ext_sync_point *sync_point_wait_list,
pi_ext_sync_point *sync_point);

Expand All @@ -2483,7 +2484,43 @@ __SYCL_EXPORT pi_result piextCommandBufferMemBufferWriteRect(
pi_buff_rect_offset buffer_offset, pi_buff_rect_offset host_offset,
pi_buff_rect_region region, size_t buffer_row_pitch,
size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch,
const void *ptr, pi_uint32 num_events_in_wait_list,
const void *ptr, pi_uint32 num_sync_points_in_wait_list,
const pi_ext_sync_point *sync_point_wait_list,
pi_ext_sync_point *sync_point);

/// API to append a mem buffer fill command to the command-buffer.
/// \param command_buffer The command-buffer to append onto.
/// \param buffer is the location to fill the data.
/// \param pattern pointer to the pattern to fill the buffer with.
/// \param pattern_size size of the pattern in bytes.
/// \param offset Offset into the buffer to fill from.
/// \param size fill size in bytes.
/// \param num_sync_points_in_wait_list The number of sync points in the
/// provided wait list.
/// \param sync_point_wait_list A list of sync points that this command must
/// wait on.
/// \param sync_point The sync_point associated with this memory operation.
__SYCL_EXPORT pi_result piextCommandBufferMemBufferFill(
pi_ext_command_buffer command_buffer, pi_mem buffer, const void *pattern,
size_t pattern_size, size_t offset, size_t size,
pi_uint32 num_sync_points_in_wait_list,
const pi_ext_sync_point *sync_point_wait_list,
pi_ext_sync_point *sync_point);

/// API to append a USM fill command to the command-buffer.
/// \param command_buffer The command-buffer to append onto.
/// \param ptr pointer to the USM allocation to fill.
/// \param pattern pointer to the pattern to fill ptr with.
/// \param pattern_size size of the pattern in bytes.
/// \param size fill size in bytes.
/// \param num_sync_points_in_wait_list The number of sync points in the
/// provided wait list.
/// \param sync_point_wait_list A list of sync points that this command must
/// wait on.
/// \param sync_point The sync_point associated with this memory operation.
__SYCL_EXPORT pi_result piextCommandBufferFillUSM(
pi_ext_command_buffer command_buffer, void *ptr, const void *pattern,
size_t pattern_size, size_t size, pi_uint32 num_sync_points_in_wait_list,
const pi_ext_sync_point *sync_point_wait_list,
pi_ext_sync_point *sync_point);

Expand Down
35 changes: 35 additions & 0 deletions sycl/include/sycl/ext/oneapi/experimental/graph.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -82,8 +82,37 @@ enum class graph_state {
executable, ///< In executable state, the graph is ready to execute.
};

enum class node_type {
empty = 0,
subgraph,
kernel,
memcpy,
memset,
memfill,
prefetch,
memadvise,
ext_oneapi_barrier,
host_task
};

/// Class representing a node in the graph, returned by command_graph::add().
class __SYCL_EXPORT node {
public:
node() = delete;

/// Get the type of command associated with this node.
node_type get_type() const;

/// Get a list of all the node dependencies of this node.
std::vector<node> get_predecessors() const;

/// Get a list of all nodes which depend on this node.
std::vector<node> get_successors() const;

/// Get the node associated with a SYCL event returned from a queue recording
/// submission.
static node get_node_from_event(event nodeEvent);

private:
node(const std::shared_ptr<detail::node_impl> &Impl) : impl(Impl) {}

Expand Down Expand Up @@ -253,6 +282,12 @@ class __SYCL_EXPORT modifiable_command_graph {
/// as kernel args or memory access where applicable.
void print_graph(const std::string path, bool verbose = false) const;

/// Get a list of all nodes contained in this graph.
std::vector<node> get_nodes() const;

/// Get a list of all root nodes (nodes without dependencies) in this graph.
std::vector<node> get_root_nodes() const;

protected:
/// Constructor used internally by the runtime.
/// @param Impl Detail implementation class to construct object with.
Expand Down
Loading
Loading