spec source for cl_khr_kernel_clock (#1103)
* spec source for cl_khr_kernel_clock

* updated after March 26th teleconference

Clarified that this is a provisional extension
Removed ext from feature names and feature test macros
Added undefined behavior description to the SPIR-V environment spec

* fix a few more places where the extension should be marked provisional

* clarify in a few more places that this extension is provisional

* remove provisional_notice.asciidoc, since it should not be used anymore
bashbaug committed Apr 2, 2024
1 parent 2515b1d commit 75df78c
Showing 10 changed files with 255 additions and 16 deletions.
2 changes: 1 addition & 1 deletion OpenCL_API.txt
@@ -39,7 +39,7 @@ include::config/version-local-links.asciidoc[]
// Formatting and links for API functions and enums.
include::api/dictionary.asciidoc[]

// Feature Dictionary - used by some extensions.
// Feature Dictionary.
include::c/feature-dictionary.asciidoc[]

// External Footnotes
82 changes: 80 additions & 2 deletions OpenCL_C.txt
@@ -224,14 +224,28 @@ ifdef::cl_khr_integer_dot_product[]
(when the `<<cl_khr_integer_dot_product>>` extension macro is defined)

| The OpenCL C compiler supports built-in functions that perform dot
products on 4x8 bit packed integer vectors
products on 4x8 bit packed integer vectors.

| {opencl_c_integer_dot_product_input_4x8bit} +
(when the `<<cl_khr_integer_dot_product>>` extension macro is defined)
| The OpenCL C compiler supports built-in functions that perform dot
products on 4x8 bit integer vectors
products on 4x8 bit integer vectors.
endif::cl_khr_integer_dot_product[]

ifdef::cl_khr_kernel_clock[]
| {opencl_c_kernel_clock_scope_device}
| The OpenCL C compiler supports built-in functions that sample the value from a
clock shared by all work-items executing on the device.

| {opencl_c_kernel_clock_scope_work_group}
| The OpenCL C compiler supports built-in functions that sample the value from a
clock shared by all work-items executing in the same work-group.

| {opencl_c_kernel_clock_scope_sub_group}
| The OpenCL C compiler supports built-in functions that sample the value from a
clock shared by all work-items executing in the same sub-group.
endif::cl_khr_kernel_clock[]

|====

In OpenCL C 3.0 or newer, feature macros must expand to the value `1` if the
@@ -462,6 +476,16 @@ The extension provides new <<table-builtin-functions, built-in vector
integer argument functions>> operating on these types.
endif::cl_khr_integer_dot_product[]

ifdef::cl_khr_kernel_clock[]
[[cl_khr_kernel_clock,cl_khr_kernel_clock]]
==== Kernel Clock

The `cl_khr_kernel_clock` extension adds support for SPIR-V instructions and
OpenCL C built-in functions to sample the value from one of three clocks
provided by compute units. The extension provides the following functions:

* <<table-kernel-clock-functions,Built-in Kernel Clock Functions>>
endif::cl_khr_kernel_clock[]

ifdef::cl_khr_local_int32_base_atomics[]
[[cl_khr_local_int32_base_atomics,cl_khr_local_int32_base_atomics]]
@@ -15306,6 +15330,60 @@ endif::cl_khr_subgroup_shuffle_relative[]

|====

ifdef::cl_khr_kernel_clock[]
[[kernel-clock-functions]]
=== Kernel Clock Functions

NOTE: The functionality described in this section <<unified-spec, requires>>
support for the `<<cl_khr_kernel_clock>>` extension. +
The `clock_read_device` and `clock_read_hilo_device` functions require support
for the {opencl_c_kernel_clock_scope_device} feature.
The `clock_read_work_group` and `clock_read_hilo_work_group` functions require
support for the {opencl_c_kernel_clock_scope_work_group} feature.
The `clock_read_sub_group` and `clock_read_hilo_sub_group` functions require
support for the {opencl_c_kernel_clock_scope_sub_group} feature.

This section describes OpenCL C built-in functions that sample the value from
one of three clocks provided by compute units.

[[table-kernel-clock-functions]]
.Built-in Kernel Clock Functions
[cols="1a,1",options="header",]
|====
| Function | Description

|[source,opencl_c]
----
ulong clock_read_device();
ulong clock_read_work_group();
ulong clock_read_sub_group();
----
| Returns a sampled value of a clock as seen by the compute unit.

An idealized clock is an unbounded unsigned scalar integer tick count
increasing monotonically over time. A clock’s rate of progress may vary
within the lifetime of a work-item, may vary across different
executions of the program, and may be affected by conditions beyond the
control of the programmer. The sampled value read by this function consists of
the least significant bits of the idealized clock’s tick count at the time the
instruction was executed. In particular, an observer may see sampled values wrap
around zero.

|[source,opencl_c]
----
uint2 clock_read_hilo_device();
uint2 clock_read_hilo_work_group();
uint2 clock_read_hilo_sub_group();
----
| Performs the same operation as the corresponding `clock_read_device`,
`clock_read_work_group`, or `clock_read_sub_group` function, but returns the
value as a `uint2` whose `.lo` component contains the 32 least significant bits
of the result and whose `.hi` component contains the 32 most significant bits
of the result.

|====

endif::cl_khr_kernel_clock[]
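
The table above says that a sample holds only the least significant bits of an idealized tick count and may wrap around zero, and that the `hilo` variants split the same 64-bit value across a `uint2`. A minimal host-side sketch in plain C of what that implies for consumers of the samples follows. The type `clock_hilo` and both helper names are hypothetical stand-ins, not part of the extension, and the sketch assumes at most one wrap occurs between the two samples.

```c
#include <stdint.h>

/* Hypothetical stand-in for the OpenCL C uint2 returned by the
 * clock_read_hilo_* functions (not a type defined by the extension). */
typedef struct {
    uint32_t lo; /* 32 least significant bits of the sample */
    uint32_t hi; /* 32 most significant bits of the sample */
} clock_hilo;

/* Reassemble the 64-bit sample from its two 32-bit halves. */
static uint64_t clock_hilo_to_u64(clock_hilo v)
{
    return ((uint64_t)v.hi << 32) | (uint64_t)v.lo;
}

/* Ticks elapsed between two samples of the same clock. Unsigned
 * subtraction is performed modulo 2^64, so the result stays correct
 * when the sampled value wraps around zero once between the samples. */
static uint64_t clock_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

Because a clock's rate of progress may vary, a difference computed this way is a tick count only; converting it to wall-clock time would need information the extension does not provide.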


[[opencl-numerical-compliance]]
= OpenCL Numerical Compliance
5 changes: 5 additions & 0 deletions api/appendix_e.asciidoc
@@ -598,3 +598,8 @@ Changes from *v3.0.14*:
** Restricted semaphores to a single associated device, see {khronos-opencl-pr}/996[#996].
* `<<cl_khr_subgroup_rotate>>`:
** Clarified that only rotating within a subgroup is supported, see {khronos-opencl-pr}/967[#967].

Changes from *v3.0.15*:

* Added new extensions:
** `<<cl_khr_kernel_clock>>` (provisional)
62 changes: 62 additions & 0 deletions api/cl_khr_kernel_clock.asciidoc
@@ -0,0 +1,62 @@
// Copyright 2024 The Khronos Group Inc.
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}cl_khr_kernel_clock.txt[]

=== Other Extension Metadata

*Last Modified Date*::
2024-03-25
*IP Status*::
No known IP claims.
*Contributors*::
- Kevin Petit, Arm Ltd. +
- Paul Fradgley, Imagination Technologies +
- Jeremy Kemp, Imagination Technologies +
- Ben Ashbaugh, Intel +
- Balaji Calidas, Qualcomm Technologies, Inc. +
- Ruihao Zhang, Qualcomm Technologies, Inc.

=== Description

`cl_khr_kernel_clock` adds the ability for a kernel to sample the value from one
of three clocks provided by compute units.

OpenCL C compilers supporting this extension will define the extension macro
`cl_khr_kernel_clock`, and may define corresponding feature macros
{opencl_c_kernel_clock_scope_device},
{opencl_c_kernel_clock_scope_work_group}, and
{opencl_c_kernel_clock_scope_sub_group} depending on the reported
capabilities.

See the link:{OpenCLCSpecURL}#cl_khr_kernel_clock[Kernel Clock] section of the
OpenCL C specification for more information.

=== Interactions With Other Extensions

On devices that implement the `EMBEDDED` profile, the `cles_khr_int64` extension
is required for the `clock_read_device`, `clock_read_work_group` and
`clock_read_sub_group` functions to be present.

Support for sub-groups is required for the `clock_read_sub_group` and
`clock_read_hilo_sub_group` functions to be present.

// The 'New ...' section can be auto-generated

=== New Types

* {cl_device_kernel_clock_capabilities_khr_TYPE}

=== New Enums

* {cl_device_info_TYPE}
** {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR}
* {cl_device_kernel_clock_capabilities_khr_TYPE}
** {CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR}
** {CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR}
** {CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR}

=== Version History

* Revision 0.9.0, 2024-03-25
** First assigned version (provisional).
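
The interaction with `cles_khr_int64` described above means that an `EMBEDDED` profile device without 64-bit integers only has the `clock_read_hilo_*` functions available. As a rough illustration in plain C, the difference between two such samples can still be computed with 32-bit arithmetic and a manual borrow. The `clock_hilo32` type and the helper name are hypothetical, introduced only for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the OpenCL C uint2 returned by the
 * clock_read_hilo_* functions. */
typedef struct {
    uint32_t lo; /* 32 least significant bits */
    uint32_t hi; /* 32 most significant bits */
} clock_hilo32;

/* Compute end - start using only 32-bit operations. The low halves are
 * subtracted modulo 2^32; if that subtraction wrapped, one is borrowed
 * from the high half. The overall result is modulo 2^64. */
static clock_hilo32 clock_hilo_elapsed(clock_hilo32 start, clock_hilo32 end)
{
    clock_hilo32 diff;
    uint32_t borrow = (end.lo < start.lo) ? 1u : 0u;

    diff.lo = end.lo - start.lo;
    diff.hi = end.hi - start.hi - borrow;
    return diff;
}
```

The same arithmetic could be written in a kernel with `uint2` components, assuming the device reports the relevant clock scope.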
37 changes: 37 additions & 0 deletions api/opencl_platform_layer.asciidoc
@@ -1985,6 +1985,26 @@ include::{generated}/api/version-notes/CL_DEVICE_INTEGER_DOT_PRODUCT_ACCELERATIO
is missing before version 2.0 of the extension.
endif::cl_khr_integer_dot_product[]

ifdef::cl_khr_kernel_clock[]
| {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR_anchor}

include::{generated}/api/version-notes/CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR.asciidoc[]
| {cl_device_kernel_clock_capabilities_khr_TYPE}
| Returns the kernel clock capabilities of the device. +

{CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR_anchor} is set when kernels are
allowed to call the `clock_read_device` and `clock_read_hilo_device`
OpenCL C functions.

{CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR_anchor} is set when kernels
are allowed to call the `clock_read_work_group` and
`clock_read_hilo_work_group` OpenCL C functions.

{CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR_anchor} is set when kernels
are allowed to call the `clock_read_sub_group` and
`clock_read_hilo_sub_group` OpenCL C functions.
endif::cl_khr_kernel_clock[]

ifdef::cl_khr_pci_bus_info[]
| {CL_DEVICE_PCI_BUS_INFO_KHR_anchor}

@@ -2080,6 +2100,23 @@ returned for {CL_DEVICE_INTEGER_DOT_PRODUCT_CAPABILITIES_KHR}:
|====
endif::cl_khr_integer_dot_product[]

ifdef::cl_khr_kernel_clock[]
OpenCL 3 devices must report the following feature macros via
{CL_DEVICE_OPENCL_C_FEATURES} when the corresponding bit is set in the bitfield
returned for {CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR}:

[cols="1,1",options="header"]
|====
| Feature Bit | Feature Macro
| {CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR}
| {opencl_c_kernel_clock_scope_device}
| {CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR}
| {opencl_c_kernel_clock_scope_work_group}
| {CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR}
| {opencl_c_kernel_clock_scope_sub_group}
|====
endif::cl_khr_kernel_clock[]
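
The bit-to-macro table above can be sketched as a small lookup in plain C. This is an illustration only: the type and constant names below are local stand-ins rather than the definitions from the OpenCL headers, with bit positions taken from the `bitpos` values in the `xml/cl.xml` part of this commit, and `kernel_clock_expected_features` is a hypothetical helper. A real host program would obtain the bitfield from `clGetDeviceInfo` with `CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR`.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the header definitions (assumption: cl_bitfield is
 * a 64-bit unsigned integer; bit positions as listed in xml/cl.xml). */
typedef uint64_t kernel_clock_caps_t;
#define KERNEL_CLOCK_SCOPE_DEVICE     ((kernel_clock_caps_t)1 << 0)
#define KERNEL_CLOCK_SCOPE_WORK_GROUP ((kernel_clock_caps_t)1 << 1)
#define KERNEL_CLOCK_SCOPE_SUB_GROUP  ((kernel_clock_caps_t)1 << 2)

/* Feature macro names, indexed by capability bit position. */
static const char *const kernel_clock_feature_names[3] = {
    "__opencl_c_kernel_clock_scope_device",
    "__opencl_c_kernel_clock_scope_work_group",
    "__opencl_c_kernel_clock_scope_sub_group",
};

/* Hypothetical helper: stores in out[] the feature macros implied by the
 * capability bits in caps and returns how many were stored (at most 3). */
static size_t kernel_clock_expected_features(kernel_clock_caps_t caps,
                                             const char *out[3])
{
    size_t count = 0;
    for (size_t bit = 0; bit < 3; ++bit) {
        if (caps & ((kernel_clock_caps_t)1 << bit)) {
            out[count++] = kernel_clock_feature_names[bit];
        }
    }
    return count;
}
```

A conformance-style check could compare the names produced here against the list a device reports via `CL_DEVICE_OPENCL_C_FEATURES`.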

ifdef::cl_khr_external_semaphore[]
One of the two queries {CL_DEVICE_SEMAPHORE_IMPORT_HANDLE_TYPES_KHR} and
{CL_DEVICE_SEMAPHORE_EXPORT_HANDLE_TYPES_KHR} must return a non-empty list
24 changes: 24 additions & 0 deletions c/feature-dictionary.asciidoc
@@ -145,3 +145,27 @@ endif::[]
ifndef::backend-html5[]
:opencl_c_integer_dot_product_input_4x8bit_packed: pass:q[`\__opencl_c_&#8203;integer_&#8203;dot_&#8203;product_&#8203;input_&#8203;4x8bit_&#8203;packed`]
endif::[]

// opencl_c_kernel_clock_scope_device
ifdef::backend-html5[]
:opencl_c_kernel_clock_scope_device: pass:q[`\__opencl_c_<wbr>kernel_<wbr>clock_<wbr>scope_<wbr>device`]
endif::[]
ifndef::backend-html5[]
:opencl_c_kernel_clock_scope_device: pass:q[`\__opencl_c_&#8203;kernel_&#8203;clock_&#8203;scope_&#8203;device`]
endif::[]

// opencl_c_kernel_clock_scope_work_group
ifdef::backend-html5[]
:opencl_c_kernel_clock_scope_work_group: pass:q[`\__opencl_c_<wbr>kernel_<wbr>clock_<wbr>scope_<wbr>work_<wbr>group`]
endif::[]
ifndef::backend-html5[]
:opencl_c_kernel_clock_scope_work_group: pass:q[`\__opencl_c_&#8203;kernel_&#8203;clock_&#8203;scope_&#8203;work_&#8203;group`]
endif::[]

// opencl_c_kernel_clock_scope_sub_group
ifdef::backend-html5[]
:opencl_c_kernel_clock_scope_sub_group: pass:q[`\__opencl_c_<wbr>kernel_<wbr>clock_<wbr>scope_<wbr>sub_<wbr>group`]
endif::[]
ifndef::backend-html5[]
:opencl_c_kernel_clock_scope_sub_group: pass:q[`\__opencl_c_&#8203;kernel_&#8203;clock_&#8203;scope_&#8203;sub_&#8203;group`]
endif::[]
16 changes: 16 additions & 0 deletions env/extensions.asciidoc
@@ -379,6 +379,22 @@ Otherwise, for the *GroupUniformArithmeticKHR* scan and reduction instructions,
** *OpTypeInt* with _Width_ equal to `32` or `64` (equivalent to `int`, `uint`, `long`, and `ulong`)
** *OpTypeFloat* (equivalent to `half`, `float`, and `double`)

==== `cl_khr_kernel_clock`

If the OpenCL environment supports the extension `cl_khr_kernel_clock`, then the environment must accept modules that declare use of the extension `SPV_KHR_shader_clock` via *OpExtension*.

If the OpenCL environment supports the extension `cl_khr_kernel_clock` and use of the SPIR-V extension `SPV_KHR_shader_clock` is declared in the module via *OpExtension*, then the environment must accept modules that declare the following SPIR-V capability:

* *ShaderClockKHR*

For the *OpReadClockKHR* instruction requiring this capability, supported values for _Scope_ are:

* *Device*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR` is supported
* *Workgroup*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR` is supported
* *Subgroup*, if `CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR` is supported

For unsupported _Scope_ values, the behavior of *OpReadClockKHR* is undefined.
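
The scope rules above amount to a mapping from the SPIR-V _Scope_ operand of *OpReadClockKHR* to a device capability bit. A minimal sketch of such a check in plain C follows, for example as something a SPIR-V consumer or validation tool might do. The scope values match the SPIR-V specification's Scope enumeration (*Device* = 1, *Workgroup* = 2, *Subgroup* = 3); the constant names, capability stand-ins, and the helper itself are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* SPIR-V Scope values relevant to OpReadClockKHR (local names). */
enum {
    SPV_SCOPE_DEVICE    = 1,
    SPV_SCOPE_WORKGROUP = 2,
    SPV_SCOPE_SUBGROUP  = 3
};

/* Local stand-ins for the capability bits (bit positions as listed in
 * the xml/cl.xml part of this commit). */
#define CLOCK_CAP_DEVICE     (UINT64_C(1) << 0)
#define CLOCK_CAP_WORK_GROUP (UINT64_C(1) << 1)
#define CLOCK_CAP_SUB_GROUP  (UINT64_C(1) << 2)

/* Returns true if a device reporting caps supports OpReadClockKHR with
 * the given Scope. Any other Scope is unsupported, and executing the
 * instruction with it would be undefined behavior. */
static bool read_clock_scope_supported(uint64_t caps, unsigned scope)
{
    switch (scope) {
    case SPV_SCOPE_DEVICE:
        return (caps & CLOCK_CAP_DEVICE) != 0;
    case SPV_SCOPE_WORKGROUP:
        return (caps & CLOCK_CAP_WORK_GROUP) != 0;
    case SPV_SCOPE_SUBGROUP:
        return (caps & CLOCK_CAP_SUB_GROUP) != 0;
    default:
        return false;
    }
}
```

Rejecting a module up front is one possible design; the specification text itself only states that the behavior is undefined.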

=== Embedded Profile Extensions

==== `cles_khr_int64`
12 changes: 0 additions & 12 deletions ext/provisional_notice.asciidoc

This file was deleted.

4 changes: 4 additions & 0 deletions ext/quick_reference.asciidoc
@@ -208,6 +208,10 @@ Language Specifications.
| Integer dot product operations
| Extension

| [[cl_khr_kernel_clock]] link:{APISpecURL}#cl_khr_kernel_clock[`cl_khr_kernel_clock`]
| Sample Clock Values Within a Kernel
| Extension

| [[cl_khr_mipmap_image]] link:{APISpecURL}#cl_khr_mipmap_image[`cl_khr_mipmap_image`]
| Create and Use Images with Mipmaps
| Extension
27 changes: 26 additions & 1 deletion xml/cl.xml
@@ -254,6 +254,7 @@ server's OpenCL/api-docs repository.
<type category="define">typedef <type>cl_uint</type> <name>cl_image_requirements_info_ext</name>;</type>
<type category="define">typedef <type>cl_bitfield</type> <name>cl_platform_command_buffer_capabilities_khr</name>;</type>
<type category="define">typedef <type>cl_bitfield</type> <name>cl_mutable_dispatch_asserts_khr</name></type>
<type category="define">typedef <type>cl_bitfield</type> <name>cl_device_kernel_clock_capabilities_khr</name>;</type>

<comment>Structure types</comment>
<type category="struct" name="cl_dx9_surface_info_khr">
@@ -1386,6 +1387,13 @@ server's OpenCL/api-docs repository.
<unused start="19" end="63"/>
</enums>

<enums name="cl_device_kernel_clock_capabilities_khr" vendor="Khronos" type="bitmask">
<enum bitpos="0" name="CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR"/>
<enum bitpos="1" name="CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR"/>
<enum bitpos="2" name="CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR"/>
<unused start="3" end="63"/>
</enums>

<enums start="0x10000" end="0x1FFFF" name="cl_khronos_vendor_id" vendor="Khronos">
<comment>
In order to synchronize vendor IDs across Khronos APIs, Vulkan's vk.xml
@@ -1545,7 +1553,8 @@ server's OpenCL/api-docs repository.
<enum value="0x1073" name="CL_DEVICE_INTEGER_DOT_PRODUCT_CAPABILITIES_KHR"/>
<enum value="0x1074" name="CL_DEVICE_INTEGER_DOT_PRODUCT_ACCELERATION_PROPERTIES_8BIT_KHR"/>
<enum value="0x1075" name="CL_DEVICE_INTEGER_DOT_PRODUCT_ACCELERATION_PROPERTIES_4x8BIT_PACKED_KHR"/>
<unused start="0x1076" end="0x107F" comment="Reserved for cl_device_info"/>
<enum value="0x1076" name="CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR"/>
<unused start="0x1077" end="0x107F" comment="Reserved for cl_device_info"/>
<enum value="0x1080" name="CL_CONTEXT_REFERENCE_COUNT"/>
<enum value="0x1081" name="CL_CONTEXT_DEVICES"/>
<enum value="0x1082" name="CL_CONTEXT_PROPERTIES"/>
@@ -7477,5 +7486,21 @@ server's OpenCL/api-docs repository.
<command name="clCancelCommandsIMG"/>
</require>
</extension>
<extension name="cl_khr_kernel_clock" supported="opencl" ratified="opencl" provisional="true">
<require>
<type name="CL/cl.h"/>
</require>
<require comment="cl_device_info">
<enum name="CL_DEVICE_KERNEL_CLOCK_CAPABILITIES_KHR"/>
</require>
<require>
<type name="cl_device_kernel_clock_capabilities_khr"/>
</require>
<require comment="cl_device_kernel_clock_capabilities_khr">
<enum name="CL_DEVICE_KERNEL_CLOCK_SCOPE_DEVICE_KHR"/>
<enum name="CL_DEVICE_KERNEL_CLOCK_SCOPE_WORK_GROUP_KHR"/>
<enum name="CL_DEVICE_KERNEL_CLOCK_SCOPE_SUB_GROUP_KHR"/>
</require>
</extension>
</extensions>
</registry>
