From 75df78c0195a3237da96d576c3c72f94b1b6582c Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 2 Apr 2024 13:03:48 -0700 Subject: [PATCH] spec source for cl_khr_kernel_clock (#1103) * spec source for cl_khr_kernel_clock * updated after March 26th teleconference Clarified that this is a provisional extension Removed ext from feature names and feature test macros Added undefined behavior description to the SPIR-V environment spec * fix a few more places where the extension should be marked provisional * clarify in a few more places that this extension is provisional * remove provisional_notice.asciidoc, since it should not be used anymore --- OpenCL_API.txt | 2 +- OpenCL_C.txt | 82 +++++++++++++++++++++++++++++- api/appendix_e.asciidoc | 5 ++ api/cl_khr_kernel_clock.asciidoc | 62 ++++++++++++++++++++++ api/opencl_platform_layer.asciidoc | 37 ++++++++++++++ c/feature-dictionary.asciidoc | 24 +++++++++ env/extensions.asciidoc | 16 ++++++ ext/provisional_notice.asciidoc | 12 ----- ext/quick_reference.asciidoc | 4 ++ xml/cl.xml | 27 +++++++++- 10 files changed, 255 insertions(+), 16 deletions(-) create mode 100644 api/cl_khr_kernel_clock.asciidoc delete mode 100644 ext/provisional_notice.asciidoc diff --git a/OpenCL_API.txt b/OpenCL_API.txt index 2be2268c..e7e67a57 100644 --- a/OpenCL_API.txt +++ b/OpenCL_API.txt @@ -39,7 +39,7 @@ include::config/version-local-links.asciidoc[] // Formatting and links for API functions and enums. include::api/dictionary.asciidoc[] -// Feature Dictionary - used by some extensions. +// Feature Dictionary. include::c/feature-dictionary.asciidoc[] // External Footnotes diff --git a/OpenCL_C.txt b/OpenCL_C.txt index dd372a8b..0935f4fa 100644 --- a/OpenCL_C.txt +++ b/OpenCL_C.txt @@ -224,14 +224,28 @@ ifdef::cl_khr_integer_dot_product[] (when the `<<cl_khr_integer_dot_product>>` extension macro is defined) | The OpenCL C compiler supports built-in functions that perform dot -products on 4x8 bit packed integer vectors +products on 4x8 bit packed integer vectors. 
| {opencl_c_integer_dot_product_input_4x8bit} + (when the `<<cl_khr_integer_dot_product>>` extension macro is defined) | The OpenCL C compiler supports built-in functions that perform dot -products on 4x8 bit integer vectors +products on 4x8 bit integer vectors. endif::cl_khr_integer_dot_product[] +ifdef::cl_khr_kernel_clock[] +| {opencl_c_kernel_clock_scope_device} +| The OpenCL C compiler supports built-in functions that sample the value from a +clock shared by all work-items executing on the device. + +| {opencl_c_kernel_clock_scope_work_group} +| The OpenCL C compiler supports built-in functions that sample the value from a +clock shared by all work-items executing in the same work-group. + +| {opencl_c_kernel_clock_scope_sub_group} +| The OpenCL C compiler supports built-in functions that sample the value from a +clock shared by all work-items executing in the same sub-group. +endif::cl_khr_kernel_clock[] + |==== In OpenCL C 3.0 or newer, feature macros must expand to the value `1` if the @@ -462,6 +476,16 @@ The extension provides new <> operating on these types. endif::cl_khr_integer_dot_product[] +ifdef::cl_khr_kernel_clock[] +[[cl_khr_kernel_clock,cl_khr_kernel_clock]] +==== Kernel Clock + +The `cl_khr_kernel_clock` extension adds support for SPIR-V instructions and +OpenCL C built-in functions to sample the value from one of three clocks +provided by compute units. The extension provides the following functions: + +* <<kernel-clock-functions>> +endif::cl_khr_kernel_clock[] ifdef::cl_khr_local_int32_base_atomics[] [[cl_khr_local_int32_base_atomics,cl_khr_local_int32_base_atomics]] @@ -15306,6 +15330,60 @@ endif::cl_khr_subgroup_shuffle_relative[] |==== +ifdef::cl_khr_kernel_clock[] +[[kernel-clock-functions]] +=== Kernel Clock Functions + +NOTE: The functionality described in this section <<unified-spec, requires>> +support for the `<<cl_khr_kernel_clock>>` extension. + +The `clock_read_device` and `clock_read_hilo_device` functions require support +for the {opencl_c_kernel_clock_scope_device} feature. 
+The `clock_read_work_group` and `clock_read_hilo_work_group` functions require +support for the {opencl_c_kernel_clock_scope_work_group} feature. +The `clock_read_sub_group` and `clock_read_hilo_sub_group` functions require +support for the {opencl_c_kernel_clock_scope_sub_group} feature. + +This section describes OpenCL C built-in functions that sample the value from +one of three clocks provided by compute units. + +[[table-kernel-clock-functions]] +.Built-in Kernel Clock Functions +[cols="1a,1",options="header",] +|==== +| Function | Description + +|[source,opencl_c] +---- +ulong clock_read_device(); +ulong clock_read_work_group(); +ulong clock_read_sub_group(); +---- + | Returns a sampled value of a clock as seen by the compute unit. + + An idealized clock is an unbounded unsigned scalar integer tick count + increasing monotonically over time. A clock’s rate of progress may vary + within the lifetime of a work-item, may vary across different + executions of the program, and may be affected by conditions beyond the + control of the programmer. The sampled value read by this function consists of + the least significant bits of the idealized clock’s tick count at the time the + instruction was executed. In particular, an observer may see sampled values wrap + around zero. + +|[source,opencl_c] +---- +uint2 clock_read_hilo_device(); +uint2 clock_read_hilo_work_group(); +uint2 clock_read_hilo_sub_group(); +---- + | Performs the same operation as `clock_read`, but returns the value as a + `uint2` whose `.lo` component contains the 32 least significant bits of the + result and `.hi` component contains the 32 most significant bits of the + result. 
+ +|==== + +endif::cl_khr_kernel_clock[] + [[opencl-numerical-compliance]] = OpenCL Numerical Compliance diff --git a/api/appendix_e.asciidoc b/api/appendix_e.asciidoc index c88b8093..ec6626c2 100644 --- a/api/appendix_e.asciidoc +++ b/api/appendix_e.asciidoc @@ -598,3 +598,8 @@ Changes from *v3.0.14*: ** Restricted semaphores to a single associated device, see {khronos-opencl-pr}/996[#996]. * `<>`: ** Clarified that only rotating within a subgroup is supported, see {khronos-opencl-pr}/967[#967]. + +Changes from *v3.0.15*: + + * Added new extensions: + ** `<<cl_khr_kernel_clock>>` (provisional) diff --git a/api/cl_khr_kernel_clock.asciidoc b/api/cl_khr_kernel_clock.asciidoc new file mode 100644 index 00000000..7f4c4a0d --- /dev/null +++ b/api/cl_khr_kernel_clock.asciidoc @@ -0,0 +1,62 @@ +// Copyright 2024 The Khronos Group Inc. +// SPDX-License-Identifier: CC-BY-4.0 + +include::{generated}/meta/{refprefix}cl_khr_kernel_clock.txt[] + +=== Other Extension Metadata + +*Last Modified Date*:: + 2024-03-25 +*IP Status*:: + No known IP claims. +*Contributors*:: + - Kevin Petit, Arm Ltd. + + - Paul Fradgley, Imagination Technologies + + - Jeremy Kemp, Imagination Technologies + + - Ben Ashbaugh, Intel + + - Balaji Calidas, Qualcomm Technologies, Inc. + + - Ruihao Zhang, Qualcomm Technologies, Inc. + +=== Description + +`cl_khr_kernel_clock` adds the ability for a kernel to sample the value from one +of three clocks provided by compute units. + +OpenCL C compilers supporting this extension will define the extension macro +`cl_khr_kernel_clock`, and may define corresponding feature macros +{opencl_c_kernel_clock_scope_device}, +{opencl_c_kernel_clock_scope_work_group}, and +{opencl_c_kernel_clock_scope_sub_group} depending on the reported +capabilities. + +See the link:{OpenCLCSpecURL}#cl_khr_kernel_clock[Kernel Clock] section of the +OpenCL C specification for more information. 
+ +=== Interactions With Other Extensions + +On devices that implement the `EMBEDDED` profile, the `cles_khr_int64` extension +is required for the `clock_read_device`, `clock_read_work_group` and +`clock_read_sub_group` functions to be present. + +Support for sub-groups is required for the `clock_read_sub_group` and +`clock_read_hilo_sub_group` functions to be present. + +// The 'New ...' section can be auto-generated + +=== New Types + + * {cl_device_kernel_clock_capabilities_khr_TYPE} + +=== New Enums + + * {cl_device_info_TYPE} + ** {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR} + * {cl_device_kernel_clock_capabilities_khr_TYPE} + ** {CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR} + ** {CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR} + ** {CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR} + +=== Version History + + * Revision 0.9.0, 2024-03-25 + ** First assigned version (provisional). diff --git a/api/opencl_platform_layer.asciidoc b/api/opencl_platform_layer.asciidoc index 6211b138..7c39cb51 100644 --- a/api/opencl_platform_layer.asciidoc +++ b/api/opencl_platform_layer.asciidoc @@ -1985,6 +1985,26 @@ include::{generated}/api/version-notes/CL_DEVICE_INTEGER_DOT_PRODUCT_ACCELERATIO is missing before version 2.0 of the extension. endif::cl_khr_integer_dot_product[] +ifdef::cl_khr_kernel_clock[] +| {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR_anchor} + +include::{generated}/api/version-notes/CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR.asciidoc[] + | {cl_device_kernel_clock_capabilities_khr_TYPE} + | Returns the kernel clock capabilities of the device. + + + {CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR_anchor} is set when kernels are + allowed to call the `clock_read_device` and `clock_read_hilo_device` + OpenCL-C functions. + + {CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR_anchor} is set when kernels + are allowed to call the `clock_read_work_group` and + `clock_read_hilo_work_group` OpenCL-C functions. 
+ + {CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR_anchor} is set when kernels + are allowed to call the `clock_read_sub_group` and + `clock_read_hilo_sub_group` OpenCL-C functions. +endif::cl_khr_kernel_clock[] + ifdef::cl_khr_pci_bus_info[] | {CL_DEVICE_PCI_BUS_INFO_KHR_anchor} @@ -2080,6 +2100,23 @@ returned for {CL_DEVICE_INTEGER_DOT_PRODUCT_CAPABILITIES_KHR}: |==== endif::cl_khr_integer_dot_product[] +ifdef::cl_khr_kernel_clock[] +OpenCL 3 devices must report the following feature macros via +{CL_DEVICE_OPENCL_C_FEATURES} when the corresponding bit is set in the bitfield +returned for {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR}: + +[cols="1,1",options="header"] +|==== +| Feature Bit | Feature Macro +| {CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR} + | {opencl_c_kernel_clock_scope_device} +| {CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR} + | {opencl_c_kernel_clock_scope_work_group} +| {CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR} + | {opencl_c_kernel_clock_scope_sub_group} +|==== +endif::cl_khr_kernel_clock[] + ifdef::cl_khr_external_semaphore[] One of the two queries {CL_DEVICE_SEMAPHORE_IMPORT_HANDLE_TYPES_KHR} and {CL_DEVICE_SEMAPHORE_EXPORT_HANDLE_TYPES_KHR} must return a non-empty list diff --git a/c/feature-dictionary.asciidoc b/c/feature-dictionary.asciidoc index 4943b36b..e8375eb5 100644 --- a/c/feature-dictionary.asciidoc +++ b/c/feature-dictionary.asciidoc @@ -145,3 +145,27 @@ endif::[] ifndef::backend-html5[] :opencl_c_integer_dot_product_input_4x8bit_packed: pass:q[`\__opencl_c_​integer_​dot_​product_​input_​4x8bit_​packed`] endif::[] + +// opencl_c_kernel_clock_scope_device +ifdef::backend-html5[] +:opencl_c_kernel_clock_scope_device: pass:q[`\__opencl_c_kernel_clock_scope_device`] +endif::[] +ifndef::backend-html5[] +:opencl_c_kernel_clock_scope_device: pass:q[`\__opencl_c_​kernel_​clock_​scope_​device`] +endif::[] + +// opencl_c_kernel_clock_scope_work_group +ifdef::backend-html5[] +:opencl_c_kernel_clock_scope_work_group: 
pass:q[`\__opencl_c_kernel_clock_scope_work_group`] +endif::[] +ifndef::backend-html5[] +:opencl_c_kernel_clock_scope_work_group: pass:q[`\__opencl_c_​kernel_​clock_​scope_​work_​group`] +endif::[] + +// opencl_c_kernel_clock_scope_sub_group +ifdef::backend-html5[] +:opencl_c_kernel_clock_scope_sub_group: pass:q[`\__opencl_c_kernel_clock_scope_sub_group`] +endif::[] +ifndef::backend-html5[] +:opencl_c_kernel_clock_scope_sub_group: pass:q[`\__opencl_c_​kernel_​clock_​scope_​sub_​group`] +endif::[] diff --git a/env/extensions.asciidoc b/env/extensions.asciidoc index 4ef4fd7a..df025955 100644 --- a/env/extensions.asciidoc +++ b/env/extensions.asciidoc @@ -379,6 +379,22 @@ Otherwise, for the *GroupUniformArithmeticKHR* scan and reduction instructions, ** *OpTypeInt* with _Width_ equal to `32` or `64` (equivalent to `int`, `uint`, `long`, and `ulong`) ** *OpTypeFloat* (equivalent to `half`, `float`, and `double`) +==== `cl_khr_kernel_clock` + +If the OpenCL environment supports the extension `cl_khr_kernel_clock`, then the environment must accept modules that declare use of the extension `SPV_KHR_shader_clock` via *OpExtension*. + +If the OpenCL environment supports the extension `cl_khr_kernel_clock` and use of the SPIR-V extension `SPV_KHR_shader_clock` is declared in the module via *OpExtension*, then the environment must accept modules that declare the following SPIR-V capability: + +* *ShaderClockKHR* + +For the *OpReadClockKHR* instruction requiring this capability, supported values for _Scope_ are: + +* *Device*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR` is supported +* *Workgroup*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR` is supported +* *Subgroup*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR` is supported + +For unsupported _Scope_ values, the behavior of *OpReadClockKHR* is undefined. 
+ === Embedded Profile Extensions ==== `cles_khr_int64` diff --git a/ext/provisional_notice.asciidoc b/ext/provisional_notice.asciidoc deleted file mode 100644 index 0cc0eb0d..00000000 --- a/ext/provisional_notice.asciidoc +++ /dev/null @@ -1,12 +0,0 @@ -// Copyright 2023-2024 The Khronos Group. This work is licensed under a -// Creative Commons Attribution 4.0 International License; see -// http://creativecommons.org/licenses/by/4.0/ - -[NOTE] -==== -This is a provisional OpenCL extension specification that has been Ratified under the Khronos Intellectual Property Framework. -It is being made publicly available as a provisional extension to enable review and feedback from the community. -While it is a provisional extension features may be added, removed, or changed in non-backward compatible ways. - -If you have feedback please create an issue on: https://github.com/KhronosGroup/OpenCL-Docs/ -==== \ No newline at end of file diff --git a/ext/quick_reference.asciidoc b/ext/quick_reference.asciidoc index 6fddf712..194c6df9 100644 --- a/ext/quick_reference.asciidoc +++ b/ext/quick_reference.asciidoc @@ -208,6 +208,10 @@ Language Specifications. | Integer dot product operations | Extension +| [[cl_khr_kernel_clock]] link:{APISpecURL}#cl_khr_kernel_clock[`cl_khr_kernel_clock`] +| Sample Clock Values Within a Kernel +| Extension + | [[cl_khr_mipmap_image]] link:{APISpecURL}#cl_khr_mipmap_image[`cl_khr_mipmap_image`] | Create and Use Images with Mipmaps | Extension diff --git a/xml/cl.xml b/xml/cl.xml index 63f3145c..6f1ae87b 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -254,6 +254,7 @@ server's OpenCL/api-docs repository. typedef cl_uint cl_image_requirements_info_ext; typedef cl_bitfield cl_platform_command_buffer_capabilities_khr; typedef cl_bitfield cl_mutable_dispatch_asserts_khr + typedef cl_bitfield cl_device_kernel_clock_capabilities_khr; Structure types @@ -1386,6 +1387,13 @@ server's OpenCL/api-docs repository. 
+ + + + + + + In order to synchronize vendor IDs across Khronos APIs, Vulkan's vk.xml @@ -1545,7 +1553,8 @@ server's OpenCL/api-docs repository. - + + @@ -7477,5 +7486,21 @@ server's OpenCL/api-docs repository. + + + + + + + + + + + + + + + +
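
Illustrative note, not part of the patch: the OpenCL_C.txt hunk above says that the `clock_read_hilo_*` functions return the same sample as a `uint2` whose `.lo` and `.hi` components hold the 32 least and most significant bits, and that sampled values may wrap around zero. The host-side C sketch below shows one way such samples might be recombined and differenced. The names `hilo_sample`, `combine_hilo`, and `elapsed_ticks` are hypothetical stand-ins introduced here for illustration; they are not defined by the extension, and the sketch assumes fewer than 2^64 ticks elapse between the two samples.

```c
#include <stdint.h>

/* Hypothetical host-side stand-in for the OpenCL C uint2 value returned by
 * the clock_read_hilo_* built-ins: .lo holds the 32 least significant bits,
 * .hi holds the 32 most significant bits. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} hilo_sample;

/* Recombine a hi/lo pair into the 64-bit value that the corresponding
 * clock_read_* built-in would return for the same sample. */
static uint64_t combine_hilo(hilo_sample s)
{
    return ((uint64_t)s.hi << 32) | (uint64_t)s.lo;
}

/* Tick difference between two samples of the same clock. Unsigned
 * subtraction is performed modulo 2^64, so the result stays correct when
 * the sampled value wraps around zero between start and end, provided
 * fewer than 2^64 ticks elapsed (an assumption of this sketch). */
static uint64_t elapsed_ticks(uint64_t start, uint64_t end)
{
    return end - start;
}
```

Because the spec text allows a clock's rate of progress to vary, a tick difference computed this way is probably best treated as a relative measure rather than converted to wall-clock time.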