OpenCL(TM) device activity tracing is the standard feature of OpenCL(TM) runtime that allows to measure execution time for the kernels and memory transfers running on the device.
All the device activities, which should be placed into the queue with clEnqueue*
call, have an event that allows to track their execution status (usually it's the last argument of clEnqueue*
function).
The same event could be used to get additional profiling information for the device activity, such as:
- time counter in nanoseconds when the command identified by event is enqueued in a command-queue by the host (
CL_PROFILING_COMMAND_QUEUED
); - time counter in nanoseconds when the command identified by event that has been enqueued is submitted by the host to the device associated with the command queue (
CL_PROFILING_COMMAND_SUBMIT
); - time counter in nanoseconds when the command identified by event starts execution on the device (
CL_PROFILING_COMMAND_START
); - time counter in nanoseconds when the command identified by event has finished execution on the device (
CL_PROFILING_COMMAND_END
).
Intel(R) Xeon(R) Processor / Intel(R) Core(TM) Processor (CPU) Runtimes use QueryPerformanceCounter
on Windows and CLOCK_MONOTONIC
on Linux as time sources for the counters described above. Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver also uses QueryPerformanceCounter
on Windows but CLOCK_MONOTONIC_RAW
on Linux.
Supported Runtimes:
- any OpenCL(TM) 1.0 and above
Supported OS:
- Linux
- Windows
Supported HW:
- Any
Needed Headers:
- OpenCL(TM) headers
Needed Libraries:
- OpenCL(TM) libraries
To retrieve the timestamps described above, one may:
- Enable profiling mode for the command queue of the target device:
cl_queue_properties props[3] = { CL_QUEUE_PROPERTIES,
CL_QUEUE_PROFILING_ENABLE, 0 };
cl_int status = CL_SUCCESS;
cl_command_queue queue =
clCreateCommandQueueWithProperties(context, device, props, &status);
assert(status == CL_SUCESS);
- Introduce the event for the target device activity, e.g. for the kernel:
cl_event event = nullptr;
status = clEnqueueNDRangeKernel(queue, kernel, dim, nullptr,
global_size, local_size, 0, nullptr, &event);
assert(status == CL_SUCESS);
- Set the callback to be notified when the activity is completed:
status = clSetEventCallback(event, CL_COMPLETE, EventNotify, nullptr);
assert(status == CL_SUCESS);
- Read the profiling data inside the callback:
void CL_CALLBACK EventNotify(cl_event event,
cl_int event_status,
void* user_data) {
cl_int status = CL_SUCCESS;
cl_ulong start = 0, end = 0;
assert(event_status == CL_COMPLETE);
status = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
sizeof(cl_ulong), &start, nullptr);
assert(status == CL_SUCCESS);
status = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
sizeof(cl_ulong), &end, nullptr);
assert(status == CL_SUCCESS);
}
It's commonly needed to map OpenCL kernel timestamps to general CPU timeline. To solve this problem one should use clGetDeviceAndHostTimer
function to get time sync point between host and device:
cl_ulong device_timestamp = 0, host_timestamp = 0;
cl_int status = clGetDeviceAndHostTimer(
device, device_timestamp, host_timestamp);
assert(status == CL_SUCCESS)
Note, that host timestamp in Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver is based on CLOCK_MONOTONIC_RAW
on Linux and QueryPerformanceCounter
on Windows (implementation specific, may be changed in future). Both timers are in nanoseconds.
- refer to the documentation for the function clGetEventProfilingInfo to learn more on OpenCL(TM) profiling