clarify behavior of clLinkProgram when linking fails #1075

Open
bashbaug opened this issue Mar 3, 2024 · 4 comments
Labels
OpenCL API Spec Issues related to the OpenCL API specification.

Comments

@bashbaug
Contributor

bashbaug commented Mar 3, 2024

Creating an issue based on discussion in PR #798.

The behavior of clLinkProgram does not seem to be precisely described and as a result implementations are behaving differently. We need to determine what we can fix now, and if we cannot fix everything, what we would like to fix in a future spec version.

Notes:

  • clLinkProgram creates a new program object, unlike clCompileProgram and clBuildProgram, which operate on program objects that have already been created.
  • clLinkProgram may (or may not!) link asynchronously if a callback function pfn_notify is passed to the function.
  • The spec defines conditions when "the linking operation can begin": if the context, list of devices, input programs and linker options specified are all valid and appropriate host and device resources needed to perform the link are available.

Some things we need to decide, where implementations currently behave differently:

  1. What are the situations when clLinkProgram must return a NULL program object and an error code in errcode_ret? Are these all of the cases where "the linking operation cannot begin", or are there other cases that must return a NULL program object and an error code also?
  2. Are there scenarios when clLinkProgram may return both a new non-NULL program object and an error code in errcode_ret? Or, if an error code is generated, will clLinkProgram also return a NULL program object?
  3. If a callback function is provided, will it always be called, even if an error occurs? If an error occurs, what program object is passed to the callback function?

(If you're curious to see how your implementation behaves, I put my tester here: https://github.com/bashbaug/SimpleOpenCLSamples/tree/link-program-error-behavior/samples/99_linkprogramerror.)
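Until these questions are settled, an application can avoid depending on either behavior by checking the returned program object and the error code independently. The following is a minimal sketch of that decision logic, not taken from the tester above. The names `link_followup` and `plan_link_followup` are hypothetical, the two constants are copied from `CL/cl.h` so the sketch compiles without OpenCL headers, and it assumes a synchronous link (no `pfn_notify`).

```c
#include <stdbool.h>
#include <stddef.h>

/* Values copied from CL/cl.h so this sketch builds without OpenCL installed. */
#define CL_SUCCESS 0
#define CL_LINK_PROGRAM_FAILURE (-17)

/* Hypothetical helper type: what the caller should do after clLinkProgram. */
typedef struct {
    bool query_build_log; /* a program exists, so CL_PROGRAM_BUILD_LOG can be queried */
    bool release_now;     /* the program is not usable and should be released */
    bool usable;          /* kernels may be created from the program */
} link_followup;

/* 'program' stands in for the cl_program returned by clLinkProgram,
 * 'errcode' for the value written to errcode_ret.
 * Assumes a synchronous link, i.e. pfn_notify was NULL. */
link_followup plan_link_followup(const void *program, int errcode)
{
    link_followup f = { false, false, false };

    if (program == NULL) {
        /* Nothing to query or release, whatever the error code is. */
        return f;
    }

    /* A non-NULL program may carry a log even when an error was reported. */
    f.query_build_log = true;

    if (errcode == CL_SUCCESS) {
        f.usable = true;
    } else {
        /* Covers CL_LINK_PROGRAM_FAILURE and any other error that still
         * produced an object: fetch the log, then release it. */
        f.release_now = true;
    }
    return f;
}
```

With this shape, an implementation that returns a non-NULL program together with `CL_LINK_PROGRAM_FAILURE` leads to "fetch the log, then release", while one that returns NULL leads to no follow-up calls at all, so neither behavior leaks an object or dereferences NULL.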

@karolherbst
Contributor

karolherbst commented Mar 6, 2024

rusticl:

Running on platform: rusticl
Running on device: Mesa Intel(R) UHD Graphics (CML GT2)


Compiling program object 0x379c3d8...
In program_callback: program = 0x379c3d8, user_data = (nil)
Program build status: 0
Program build log:

End of program callback.

clCompileProgram() returned 0
Program compile log for device Mesa Intel(R) UHD Graphics (CML GT2):



Linking program...
In program_callback: program = 0x379c818, user_data = (nil)
Program build status: -2
Program build log:
(file=input,line=0,column=0,index=0): Unresolved external reference to "func".

End of program callback.

(file=input,line=0,column=0,index=0): Unresolved external reference to "func".

clLinkProgram() returned -17
clLinkProgram() created program object 0x379c818.
Program link log for device Mesa Intel(R) UHD Graphics (CML GT2):
(file=input,line=0,column=0,index=0): Unresolved external reference to "func".

All done.

but asynchronous compilation/linking hasn't been implemented yet.
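For readers decoding the numbers in the log above: the build status -2 corresponds to `CL_BUILD_ERROR` and the return value -17 to `CL_LINK_PROGRAM_FAILURE`. A small lookup sketch follows. The function names are hypothetical, the numeric values are those defined in `CL/cl.h`, and only a subset of the error codes relevant to linking is covered.

```c
#include <stddef.h>

/* Name of a cl_build_status value (as reported via CL_PROGRAM_BUILD_STATUS). */
const char *build_status_name(int status)
{
    switch (status) {
    case 0:  return "CL_BUILD_SUCCESS";
    case -1: return "CL_BUILD_NONE";
    case -2: return "CL_BUILD_ERROR";
    case -3: return "CL_BUILD_IN_PROGRESS";
    default: return "unknown build status";
    }
}

/* Name of an error code returned by clLinkProgram (subset only). */
const char *link_error_name(int errcode)
{
    switch (errcode) {
    case 0:   return "CL_SUCCESS";
    case -16: return "CL_LINKER_NOT_AVAILABLE";
    case -17: return "CL_LINK_PROGRAM_FAILURE";
    case -34: return "CL_INVALID_CONTEXT";
    case -44: return "CL_INVALID_PROGRAM";
    case -67: return "CL_INVALID_LINKER_OPTIONS";
    default:  return "other error code";
    }
}
```

Read this way, the rusticl run reports `CL_LINK_PROGRAM_FAILURE` while still creating a program object whose build status is `CL_BUILD_ERROR`.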

@karolherbst
Contributor

karolherbst commented Mar 6, 2024

some of my thoughts:

  1. a cl_program object is the only reliable way to fetch program logs, so I'd say it should depend on that. Usually the CL API returns NULL on errors, but cl_program is special for this reason, and I'd prefer it stays the only reason.
  2. same as 1. I guess
  3. I think the spec is clear enough on that: "pfn_notify is a function pointer to a notification routine. The notification routine is a callback function that an application can register and which will be called when the program executable has been built (successfully or unsuccessfully)." So given the reason stated in 1., there will always be a cl_program object; by deduction, an attempted build also guarantees that a valid cl_program object exists, as you have no way to retrieve logs otherwise. The question remains whether the callback should also be called in cases other than attempted compilations/linkings, but that would be a breaking change, as applications might crash if they now receive a NULL cl_program object and don't handle CL_INVALID_PROGRAM being returned, or lack other error handling they deemed unnecessary.

@SunSerega
Contributor

Are you saying clLinkProgram should return a non-NULL program object even if the linking operation cannot begin, for instance when CL_INVALID_CONTEXT is returned in errcode_ret?

In #798 I made it so a non-NULL program object is only returned (and passed to the callback) if errcode_ret is either CL_SUCCESS or CL_LINK_PROGRAM_FAILURE.

That would be enough to always fetch logs, but always returning a non-NULL program object also sounds good. I guess this way is more consistent.
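The rule described for #798 can be written as a simple predicate over the pair (returned program, error code). This is only a sketch of the rule as stated in this comment, not code from the PR. The name `matches_pr798_rule` is hypothetical and the constants are copied from `CL/cl.h`.

```c
#include <stdbool.h>

/* Values copied from CL/cl.h so this sketch builds without OpenCL installed. */
#define CL_SUCCESS 0
#define CL_LINK_PROGRAM_FAILURE (-17)

/* Returns true when the result is consistent with the rule
 * "a non-NULL program is only returned for CL_SUCCESS or
 * CL_LINK_PROGRAM_FAILURE". A NULL program never violates it. */
bool matches_pr798_rule(bool program_is_non_null, int errcode)
{
    bool may_carry_program =
        (errcode == CL_SUCCESS || errcode == CL_LINK_PROGRAM_FAILURE);

    /* Logical implication: non-NULL program => allowed error code. */
    return !program_is_non_null || may_carry_program;
}
```

The rusticl result reported earlier in this thread (a non-NULL program with -17) satisfies this predicate, whereas a non-NULL program returned together with CL_INVALID_CONTEXT (-34) would not.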

@karolherbst
Contributor

no, I didn't. I only meant that in the callback it won't be NULL.

@bashbaug bashbaug added the OpenCL API Spec Issues related to the OpenCL API specification. label Mar 12, 2024
Status: Needs WG discussion