-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[uma] Improve reporting of pool/provider specific errors #640
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -67,23 +67,25 @@ enum uma_result_t umaMemoryProviderFree(uma_memory_provider_handle_t hProvider, | |
/// independent way to return a provider specific result. | ||
/// | ||
/// \details | ||
/// - The string returned via the ppMessage is a NULL terminated C style | ||
/// string. | ||
/// - The string returned via the ppMessage is thread local. | ||
/// - The memory in the string returned via the ppMessage is owned by the | ||
/// adapter. | ||
/// - The application may call this function from simultaneous threads. | ||
/// - The implementation of this function should be lock-free. | ||
/// * Implementations *must* store the message and error code in thread-local | ||
/// storage prior to returning UMA_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC. | ||
/// | ||
/// * The message and error code will only be valid if a previously | ||
/// called entry-point returned UMA_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC. | ||
igchor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// | ||
/// * The memory pointed to by the C string returned in `ppMessage` is owned by | ||
/// the adapter and *must* be null terminated. | ||
/// | ||
/// * The application *may* call this function from simultaneous threads. | ||
/// | ||
/// * The implementation of this function *should* be lock-free. | ||
/// \param hProvider handle to the memory provider | ||
/// \param ppMessage [out] pointer to a string containing provider specific | ||
/// result in string representation | ||
/// \return UMA_RESULT_SUCCESS if the result being reported is to be considered | ||
/// a warning. Any other result code returned indicates that the | ||
/// adapter specific result is an error. | ||
enum uma_result_t | ||
umaMemoryProviderGetLastResult(uma_memory_provider_handle_t hProvider, | ||
const char **ppMessage); | ||
/// \param pError [out] pointer to an integer where the adapter specific error code will be stored | ||
void umaMemoryProviderGetLastNativeError(uma_memory_provider_handle_t hProvider, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what if some pool uses several memory providers, e.g. one for data and another one for metadata? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good point. User might need to know which (metadata or data) provider failed. I think we need to extend error codes to specify DATA_PROVIDER_ERROR and METADATA_PROVIDER_ERROR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not like the idea of introducing provider type-specific errors, especially if we consider changing the way how we specify provider type in the pool. We can think of storing the handle to the provider that failed and remove In general, I would like to see some code-snippet that demonstrates how these API suppose to be used by client (e.g. UR). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've extended the error codes in #641 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we assume that Still I think The code would look similar to what I proposed in the comment above (just with extra switch for metadata/data provider): void* ptr = umaPoolMalloc(hPool, size);
if (!ptr) {
uma_result_t ret = umaPoolGetLastAllocationError(hPool);
uma_memory_provider_handle_t provider;
if (ret == UMA_RESULT_ERROR_DATA_MEMORY_PROVIDER_SPECIFIC) {
provider = umaPoolGetDataMemoryProvider(hPool):
} else if (ret == UMA_RESULT_ERROR_METADATA_MEMORY_PROVIDER_SPECIFIC {
provider = umaPoolGetMetadataMemoryProvider(hPool):
} else {
....
}
const char *msg;
int32_t native_error;
umaMemoryProviderGetLastNativeError(provider, &msg, &native_error);
// cast native_error to ze_result_t and handle it
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But then how would you know which provider failed? (i.e. which provider to pass to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we decide not to merge #641 we need some other way of specyfing which memory provider is the one that failed. The simplest approach would be just to have only one provider per pool (as we had originally). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @vinser52 Anyway, this PR does not introduce UMA_RESULT_ERROR_DATA_MEMORY_PROVIDER_SPECIFIC nor UMA_RESULT_ERROR_METADATA_MEMORY_PROVIDER_SPECIFIC so I think this discussion can continue on this thread: https://github.com/oneapi-src/unified-runtime/pull/641/files#r1244132297 Can this PR be merged or do you have any other comments? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In general, to interpret the native error code you should know the concrete implementation referred by the memory provider handle.
And it does not matter whether it DATA or METADATA memory provider. You just need to know which memory provider failed. For example, imagine the following situation:
But an even better idea is to look how similar problem solved in STL:
It would be useful to understand how clients are going to use this native error? Do we have a code snippet from UR perspective? Let's discuss it by phone first. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. I've implemented |
||
const char **ppMessage, | ||
int32_t *pError); | ||
|
||
/// | ||
/// \brief Retrieve recommended page size for a given allocation size. | ||
|
@@ -136,8 +138,17 @@ umaMemoryProviderPurgeForce(uma_memory_provider_handle_t hProvider, void *ptr, | |
/// \brief Retrive name of a given memory provider. | ||
/// \param hProvider handle to the memory provider | ||
/// \param ppName [out] pointer to a string containing name of the provider | ||
void umaMemoryProviderGetName(uma_memory_provider_handle_t hProvider, | ||
const char **ppName); | ||
const char *umaMemoryProviderGetName(uma_memory_provider_handle_t hProvider); | ||
|
||
/// \brief Retrieve handle to the last memory provider that returned status other | ||
/// than UMA_RESULT_SUCCESS on the calling thread. | ||
/// | ||
/// \details Handle to the memory provider is stored in the thread local | ||
/// storage. The handle is updated every time a memory provider | ||
/// returns status other than UMA_RESULT_SUCCESS. | ||
/// | ||
/// \return Handle to the memory provider | ||
uma_memory_provider_handle_t umaGetLastFailedMemoryProvider(); | ||
|
||
#ifdef __cplusplus | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
nit: perhaps a better name for this variable would be retCode? I think it may be more descriptive.