Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cl_khr_icd 2.0.0 Discussion #1003

Open
Kerilk opened this issue Nov 20, 2023 · 3 comments
Open

cl_khr_icd 2.0.0 Discussion #1003

Kerilk opened this issue Nov 20, 2023 · 3 comments

Comments

@Kerilk
Copy link
Contributor

Kerilk commented Nov 20, 2023

This is a discussion thread, the intent is to gather feedback on the new dispatching strategy proposed at the Montreal F2F, slides are below.

The gist of the idea is:

Fix Dispatching Issues

the current dispatch strategy leaves us open to segfaults, as the dispatch tables are not managed by the loader, and thus the loader has minimal information about them. The idea is to switch to loader managed dispatch as Vulkan does.

Change OpenCL objects layout

From:

struct _cl_platform_id { cl_icd_dispatch *dispatch; };

to:

struct _cl_platform_id {
  cl_icd_dispatch *dispatch;
  void            *dispatch_data;
};

The loader will be able to distinguish objects supporting the new dispatching strategy by using the first entry in the dispatch table, clGetPlatformIDs, that is not currently used by the loader since it uses clIcdGetPlatformIDsKHR to query platforms:

typedef struct _cl_icd_dispatch {
  /* OpenCL 1.0 */
  clGetPlatformIDs_t *clGetPlatformIDs;
  clGetPlatformInfo_t *clGetPlatformInfo;
  clGetDeviceIDs_t *clGetDeviceIDs;
  /* ... */
}

Dispatch through disp_data

The check and dispatch would look like this

  /* Dispatch test on 64 bit architecture */
  if ((uintptr_t)platform->dispatch->clGetPlatformIDs == 0x4F50454E434C3331) { // (ASCII “OPENCL31”)
    /* New dispatch through `disp_data` */
  } else {
    /* Old dispatch through `dispatch` */
  }

The disp_data field would be set by the loader, using the new clIcdSetPlatformDispatchDataKHR extension function, on platforms returned by clIcdGetPlatformIDsKHR, and populated by the new clIcdGetFunctionAddressForPlatformKHR extension function. This new function allows querying regular entry points for a platform.

Vendor Responsible to Propagate disp_data

This disp_data has to be propagated to child object, and this is left to the implementation to do. For instance, when calling:

cl_int clGetDeviceIDs(
  cl_platform_id platform,
  cl_device_type device_type,
  cl_uint num_entries,
  cl_device_id* devices,
  cl_uint* num_devices);

devices returned by the driver in devices should have their disp_data pointer set to the same value as the one platform currently has.

This cannot reliably done by the loader since regular objects can be created by vendor extensions:

cl_int clEnqueueSVMUnmapARM(
  cl_command_queue command_queue,
  void* svm_ptr,
  cl_uint num_events_in_wait_list,
  const cl_event* event_wait_list,
  cl_event* event);

This extension could create a cl_event object in the event, and the loader would have no opportunity to set the disp_data pointer. So the implementation is responsible for copying the disp_data pointer from command_queue inside event.

Fix Library Behavior

The loader is leaking memory as well as vendor library handles.

Destructors

Leverage library destructors to benefit from the system management of library lifetime (like Vulkan).

Library Unloading

Only unload vendor libraries that advertise this capability (through cl_khr_icd2 or another dedicated extension e.g. cl_khr_icd_unloadable).

Layer unloading

Add Necessary new layer destruction API and deprecate old version of the API.

Conclusion

This should address the main concern about the loader. Instance and application managed dispatch can be tackled once we decide on this part of the proposal. Loader managed dispatch is a prerequisite for instances and instance layers.

Please let me know your feedback, and don't hesitate to poke holes in the proposal.

OpenCL ICD Loader - Status and Perspective - 2023-10-27.pdf

@Kerilk
Copy link
Contributor Author

Kerilk commented Nov 20, 2023

The first part of the proposal, the loader managed dispatch has been implemented in the OpenCL ICD Loader from Khronos, in ocl-icd, but also in pocl.

Corresponding pull requests can be found here:

The implementations are both forward and backward compatible, as old implementations can be used with new loaders, and new implementations can be used with old loaders as well.

A formal definition of the extension can be found here: #1005
An extension enabling layer de-initialization can be found here: #962

@s-barannikov
Copy link

s-barannikov commented Feb 28, 2024

  1. Since clGetFunctionAddressForPlatformKHR returns addresses of core functions only, may it make sense to introduce an enum with all these functions? This would avoid the runtime cost of string comparisons when filling the dispatch table for each platform.
  2. It might be more convinient for implementations if CL objects (except for _cl_platform_id) stored a reference to the owning _cl_platform_id rather than propagating _cl_platform_id's disp_data. This way the implementations can easily check whether the object is valid (as defined in the glossary, "An object is only valid in the platform where it was created."). It would add an extra level of indirection when accessing the dispatch table though, and also make _cl_platform_id stand out in terms of the object layout.
  3. Reinterpreting clGetPlatformIDs field of the dispatch table as a tag looks a bit awkward. Maybe just check that cl_object::dispatch is NULL meaning ICD v2 is supported? It couldn't be NULL in ICD v1.

@Kerilk
Copy link
Contributor Author

Kerilk commented Feb 29, 2024

Thanks a lot for the review!

  1. That is an interesting idea, but I am unsure diverging from Vulkan and our current way to handle these kind of things is a good idea. If this becomes a bottleneck simple solutions like a hash table can be used to reign runtime down. Adding a set of constants to maintain seems like an additional burden on the spec for a small gain in practice.
  2. Implementations I know compare the value of the dispatch table pointer to know if an object is valid. So I am unsure this one buy us anything in practice, and adds a special case for platforms.
  3. As a matter of fact it could have been NULL in ICD1 since the loader is never using it (that's why the strategy is backward compatible) and using clIcdGetPlatformIDsKHR instead so if an implementation was shipping with the clGetPlatformIDs entry set to null we wouldn't have realized it. The other advantage of using a tag is that we can change it if we decide (hopefully never) to change the strategy down the road. Also, a NULL pointer could be an uninitialized value, this tag is unlikely to appear randomly in memory but is also very unlikely to be a valid function pointer on most architectures since it is not aligned (ends in 0x1).

edit: expanded on rationale for tag
edit2: expanded rationale on entry point query

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants