-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Apple's OpenCL.framework? #31
Comments
Thanks for getting in touch. It most probably used to work at some point since I see it has been packaged for brew: |
By your command 8)
|
Yes, that is also referenced in the |
After trying
Is my OpenCL implementation too old?? Also: why does the loader complain about a missing |
What appears to be happening is that the OpenCL.framework library is not an installable client driver. It could be a loader in and of itself, in which case you would need to point the icd file to whatever this loader is loading. But most probably it means your OpenCL implementation is not supporting the What you see about the driver is us trying to identify an installable driver that would incorrectly advertise itself. So we're trying to see if the objects can be cast to a pointer to dispatch table and assert functions inside could be used. |
Hrmf, and on top of that Apple dropped OpenCL support quite a few releases ago so there probably isn't much point pursuing this further. I looked a bit more at HB and saw they package PoCL. Following their recipe I managed to build that myself but that doesn't seem to get me anything usable either:
Seems |
Bummer, I thought I had found the wrapper library I needed, but it seems not to work anymore (probably because it's even older than my OS, haha): |
I think I'm onto something. From an lldb session:
and a bit later, after returning from that query:
So we're dealing with two different definitions of |
There are many thing going on here, and I am going to try and answer some of it, but no in the order you presented them:
Indeed, but your last statement is not correct, Maybe we rejected it because the platform does not advertise the On a side note, the wrapper is a proof of concept, and it should not be used in any serious setup. It is flawed in at least two ways which I can think of:
For the POCL log you posted above:
This indicates POCL working fine. |
pocl works, and provides a more recent implementation than Apple's framework. My main interest here is to be able to compare the two implementations. I checked Now, what's going on here (Pt. 1!) is indeed that
And with that I'm at Pt 2: |
And a number of the examples shipped with |
First, great progress. I don't think you can reliably fix the memory leaks without careful reference counting of the wrapper objects. If you look at the For callbacks, like the one used in For Pt1 and 2, not sure I am following you here. I would have assumed patching
You can also bypass looking through the dispatch table (and the CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformInfo(cl_platform_id platform,
cl_platform_info param_name,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
return _clGetPlatformInfo_(platform, param_name, param_value_size, param_value, param_value_size_ret);
} the I hope this helps. |
Apple implementation most probably leverages the GPU of your computer, while pocl can only use the CPU. There are talks of implementing a pocl backend over Metal (see here: CHIP-SPV/chipStar#602 (comment)), but I wouldn't hold my breath. |
Apple implementation most probably leverages the GPU of your computer, while pocl can only use the CPU.
It *claims* to be CPU-only, and that's also what I read in every discussion of it. But this software was developed when Apple still had a vested interest in scientific and computing applications and were (from what I understood) pretty good at it. Plus they can probably be much closer to the hardware than a portable implementation can be. I also wouldn't be surprised if CPU-only implementations didn't exactly get faster while they evolved to newer OCL versions (if you only look at the performance of older functionality on the same hardware, that is).
|
I don't think you can reliably fix the memory leaks without careful reference counting of the wrapper objects. If you look at the `clReleaseXXXXX` wrapper functions, none of them actually free the wrapper struct, they just forward the wrapped object to the underlying implementation.
N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is `CL_CONTEXT_REFERENCE_COUNT` sufficiently reliable to determine when the wrapper object has to be freed?
Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe.
There are enough of those to make that a "nice" additional bit of work :)
For callbacks, like the one used in `clSetEventCallback`, the callabck is expected to be called with the `cl_event` handle the user used to register the callback. Given how the wrapper library works, the user will be provided the handle from the underlying implementation, and not the handle the `ocl_icd_wrapper` returned, since the underlying implementation has no knowledge of this handle. Fixing this could be achieved with closures, but this is __hard__.
A closure, like a lambda function? I suppose you'd have to write a wrapper callback function (or callback wrapper function?) that figures out what the user-provided handle is and then calls the user-provided callback with that handle? Could that be done without resorting to lookup tables?
That does sound like a harder problem than fixing the retain/release mechanism so the temporary alternative could be just to raise an error when an attempt is made to use event callbacks. But then maybe that would render the entire endeavour pointless (= I am on too unfamiliar grounds to assess how necessary and common use of such callbacks is).
For Pt1 and 2, not sure I am following you here.
I realise I wasn't very clear - part of the work I did was done instead of sleeping and now I don't even remember exactly why I added a wrapper for `clGetPlatformIDs` for instance ;)
Remember that CL_INVALID_PLATFORM error in the logs above? That came from `clGetPlatformInfo` because your code retrieves the pointer to that function dynamically and `ocd_icd_wrapper` is not a traditional wrapper library that overloads system functions. So you were calling into Apple's function which of course can't make any sense of the platform pointer (aka the dispatch table) it gets from you. Hence my patch to `ocl-icd`:
```
diff --git a/ocl_icd_loader.c b/ocl_icd_loader.c
index 12e0182..0f84d03 100644
--- a/ocl_icd_loader.c
+++ b/ocl_icd_loader.c
@@ -474,6 +474,16 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
break;
}
}
+#ifdef __APPLE__
+ int isAppleCL = false;
+ {
+ Dl_info info;
+ // check what library provides clGetPlatformInfo():
+ if (dladdr(plt_info_ptr, &info) && info.dli_fname) {
+ isAppleCL = strstr(info.dli_fname, "/System/Library/Frameworks/OpenCL.framework") != NULL;
+ }
+ }
+#endif
cl_uint num_platforms=0;
cl_int error;
error = (*plt_fn_ptr)(0, NULL, &num_platforms);
@@ -504,12 +514,22 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
p->vicd=&_icds[i];
p->pid=platforms[j];
+#ifdef __APPLE__
+ // If we're trying to work with Apple's OpenCL framework we'll need
+ // to get a wrapped clGetPlatformInfo() function because Apple's version
+ // will expect a pointer to the actual platform instead of to the dispatch
+ // table (on my system that pointer is always 0x7fff0000).
+ if (isAppleCL)
+#else
/* If clGetPlatformInfo is not exported and we are here, it
* means that OCL_ICD_ASSUME_ICD_EXTENSION. Si we try to take it
* from the dispatch * table. If that fails too, we have to
* bail.
*/
- if (plt_info_ptr == NULL) {
+ if (plt_info_ptr == NULL)
+#endif
+ {
+ debug(D_LOG, "Getting clGetPlatformInfo from dispatch table!");
plt_info_ptr = p->pid->dispatch->clGetPlatformInfo;
if (plt_info_ptr == NULL) {
debug(D_WARN, "Missing clGetPlatformInfo even in ICD dispatch table, skipping it");
```
(I'm quite content I remembered that Apple has `dl_addr` and that you already checked for `strstr` which kept this change simple!)
It would be nicer too have a more elegant check that also works when the framework is not in its official location. Checking just for `OpenCL.framework` is too ambiguous: I've already added some logic to my `ocl-icd` install recipe to create an additional installation as a framework, called ... OpenCL.framework . That in itself gives a usable way to chose between say Apple's and the POCL implementation by linking against the one OpenCL framework and then using `DYLD_FRAMEWORK_PATH` to point dyld to the other. But only in applications that don't hardcode the OpenCL framework location, like e.g. opencv does ...
I also patch the wrapped implementation: https://github.com/RJVB/ocl_icd_wrapper/blob/5d0da6cc2d5cb52cb9b80e0efb1891da67f366f2/ocl_icd_wrapper.c#L118
|
This is not what ocl_icd is doing here. It is calling the wrapper function #define DISPATCH_TABLE_ENTRY(fn) table->fn = _##fn##_; notice how the macro appends the underscores to the function name. The wrapper is linked against the apple OpenCL library and these functions are "embedded" inside the wrappers, and not contained into a table. Let me know how it goes. If something is not working after that, it must be because we have a bug in ocl-icd that needs to be fixed.
Yes and yes :) . I would use atomics for such a task, they usually prove better at these kind of jobs.
Indeed. Thinking more about it a simpler approach could work: I would allocated a small structure that would contain the user data, pointer, and wrapper struct handle, and that should be enough. this data can be freed once the object is released. |
On Tuesday September 19 2023 08:05:46 Brice Videau wrote:
This is not what ocl_icd is doing here. It is calling the wrapper function `ocl_icd_wrapper` put in it's dispatch table:
```C
#define DISPATCH_TABLE_ENTRY(fn) table->fn = _##fn##_;
Where exactly?
For `clGetPlatformInfo` it first tries to load it dynamically from the library: https://github.com/OCL-dev/ocl-icd/blob/fdde6677b21329432db8b481e2637cd10f7d3cb2/ocl_icd_loader.c#L457
That call will *not* fail with OpenCL.framework, and will give you a pointer to Apple's version that will not know what to do with a pointer to the dispatch table instead of whatever 0x7fff0000 points to.
I submitted a PR to your repo that should avoid the whole issue and allow not modifying ocl-icd at all:
https://github.com/RJVB/ocl_icd_wrapper/pull/1/files
I'll have a look later today and let you know, but I don't see how it would get around the fact that ocl-icd will only use the dispatch table for `clGetPlatformInfo` if that dynamic lookup failed.
I assume it's written that way because not intended to be used with implementations that aren't ICD-compatible, IOW the assumption is probably that the function found via dlsym will accept the platform_id handle it will be called with.
I had a quick look at Mesa's OpenCL implementation, which can be built as "standalone" and as ICD-compatible, to see if they handle the 2 situations differently. Of course they don't (AFAICT), but then again there probably is no reason. I guess dependent code is not supposed to know anything about the platform_id handle so it can represent anything the implementation wants.
> N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is `CL_CONTEXT_REFERENCE_COUNT` sufficiently reliable to determine when the wrapper object has to be freed? Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe. There are enough of those to make that a "nice" additional bit of work :)
Yes and yes :)
There are 5 statements in my text above (incl. 2 explicit questions) to which you could be responding with "yes", so I have to ask you indeed meant the 2 explicit questions ;)
. I would use atomics for such a task, they usually prove better at these kind of jobs.
Faster, undoubtedly. Use them as a semaphore you mean, something like `if (--atomicBarrier == 0) { "do our thing"; atomicBarrier += 1; }` ?
If you did mean that "yes, it's possible to retain/release the wrapper objects" (using the underlying clRetain/Release functions!), are those locks are needed only to ensure that the 2 refcounts remain in sync, right?
Indeed. Thinking more about it a simpler approach could work: I would allocated a small structure that would contain the user data, pointer, and wrapper struct handle, and that should be enough. this data can be freed once the object is released.
Isn't that the same thing I thought up - you'd still need to install an intermediate callback that calls the intended callback with the data from that small structure, no?
I thought of putting that small structure in the redefined cl_event structure in such a way a pointer to it can be returned as a cl_event*, something like
```
struct cl_event_wrapper {
struct cl_event theEvent;
struct wrapData {
// whatever you need
}
};
```
But that's dangerous . So then what, an internal lookup table that stores all those "small structures" as with the cl_event wrapper object as lookup key, queried when releasing that wrapper object?
|
dlsym should be querying pointers from the wrapper library, not the OpenCL framework. For now this symbol cannot be found in the wrapper library, so the loader will try querying it through In practice, I'll try to answer the ref counting questions in more details later, when I have more time. Edit: I see the wrapper library Edit 2: see here for the specification about the ICD library (the wrapper in this case) having to provide the 3 entry points:
This is what my patch to the wrapper library does, implement the third missing symbol. Edit 3: OK I get why you're getting the Apple symbol from the dlsym, it is getting grabbed because the Apple library is linked into the wrapper. It took me some time to get there. Of course my current PR is broken, since it will call itself rather than the underlying Apple implementation. I will have the get extension function to return the clGetPlatformInfo pointer which should override the one obtained through dlsym, as can be seen here: Lines 290 to 311 in fdde667
Edit 4: The real clean solution to the above issue would be having the Apple library be loaded at initialization via dlopen in the wrapper library, and it's symbol queried by dlsym. Pointers would be used inside the functions in the wrapper. This would ensure symbols from the Appple library cannot bleed into the loader. |
dlsym should be querying pointers from the wrapper library, not the OpenCL framework.
AFAIK OS X works as a general Unix in this aspect. If you query a shared library for a symbol that it uses but obtains from another library, you will get the address for that symbol (barring visibility tricks) but `dl_addr` will tell you that it comes from that other library.
You can check with my wrapper around dlsym: https://github.com/RJVB/legacy-tools/blob/master/dlsym.c .
Avoiding this is possible, but you'd have to overload the function. That is, ocd_icd_wrapper would have to have a function called `clGetPlatformInfo`, and use an init function to obtain the pointer to Apple's implementation.
R.
|
Indeed, this is my understanding as well. It just took me way too long to make the connection.
This is indeed the "correct" solution I was proposing above (edit 4). I think the PR for the wrapper should be good now, and work as expected without ocl-icd modifications. |
Indeed, this is my understanding as well. It just took me way too long to make the connection.
Lol, I also spent way too long trying to figure out another kind of sudden, "inexplicable" behaviour the other day. Until I realised I had linked the wrapper library to my framework version of ocd-icl ;)
This is indeed the "correct" solution I was proposing above (edit 4).
I think the PR for the wrapper should be good now, and work as expected without ocl-icd modifications.
I just found the time for a quick look, saw it in one of the commits but somehow not showing up in what I take to be the combined patch; I hope to get around to testing it today.
|
Would it be possible to make
ocl-icd
support Apple's OpenCL.framework somehow?Or maybe it is supported already and I just do something wrong? With
clinfo
linked to ocl-icd's libOpenCL.dylib :For comparison, here's the output when
clinfo
is linked to OpenCL.framework :The text was updated successfully, but these errors were encountered: