Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[L0] Fix DeviceInfo global mem free to report unsupported given MemCount==0 #1486

Merged
merged 1 commit into from
Apr 5, 2024

Conversation

nrspruit
Copy link
Contributor

No description provided.

@ph0b
Copy link
Contributor

ph0b commented Mar 29, 2024

This change assumes FreeMemory <= 0 means the FreeMemory api doesn't work, so I have a concern here: Wouldn't there be cases for which the call works and returns <=0 because the device is really out of memory ? In that case we really don't want to report global_memory.

@nrspruit
Copy link
Contributor Author

This change assumes FreeMemory <= 0 means the FreeMemory api doesn't work, so I have a concern here: Wouldn't there be cases for which the call works and returns <=0 because the device is really out of memory ? In that case we really don't want to report global_memory.

Thank you for the comment @ph0b , I realized I should be checking the MemCount. If there are no memory modules reported by the API, then free memory is invalid. If there are MemCount modules, then free memory is the correct value.

@nrspruit
Copy link
Contributor Author

This change assumes FreeMemory <= 0 means the FreeMemory api doesn't work, so I have a concern here: Wouldn't there be cases for which the call works and returns <=0 because the device is really out of memory ? In that case we really don't want to report global_memory.

Thank you for the comment @ph0b , I realized I should be checking the MemCount. If there are no memory modules reported by the API, then free memory is invalid. If there are MemCount modules, then free memory is the correct value.

If the API "fails", then the call will report a UR api failure. In the case of MemCount==0, it means that the API did not find any memory modules, which is not an error, but it does mean that the free memory is not set.

@nrspruit nrspruit changed the title [L0] Fix DeviceInfo global mem free to report global if free memory==0 [L0] Fix DeviceInfo global mem free to report global if MemCount==0 Mar 29, 2024
@nrspruit nrspruit added level-zero L0 adapter specific issues v0.9.x Include in the v0.9.x release labels Mar 29, 2024
@nrspruit nrspruit added the ready to merge Added to PR's which are ready to merge label Apr 3, 2024
@nsirgien
Copy link

nsirgien commented Apr 4, 2024

Hi @nrspruit, can you please explain more about a reason behind this changes?
I mean, as far as I know, the intended behavior is that if you have free memory aspect, then you should get real free memory numbers from driver. And if you do not have free memory aspect, then attempt to access free memory information would lead to a runtime error.
So, I do not understand, when the code suppose to enter the else branch of this change?
I mean, before it was returning zero, which was wrong, as it was described in https://jira.devtools.intel.com/browse/URT-626.
But with change, it will now return global memory (which is a constant), which is IMHO also wrong.

@nrspruit nrspruit changed the title [L0] Fix DeviceInfo global mem free to report global if MemCount==0 [L0] Fix DeviceInfo global mem free to report unsupported given MemCount==0 Apr 4, 2024
…unt==0

- If there are no memory modules from zesDeviceEnumMemoryModules, then
  we cannot query the free memory from level zero and we do not support
  reporting free memory thru this api on this platform/os.

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
@nrspruit
Copy link
Contributor Author

nrspruit commented Apr 4, 2024

Hi @nrspruit, can you please explain more about a reason behind this changes? I mean, as far as I know, the intended behavior is that if you have free memory aspect, then you should get real free memory numbers from driver. And if you do not have free memory aspect, then attempt to access free memory information would lead to a runtime error. So, I do not understand, when the code suppose to enter the else branch of this change? I mean, before it was returning zero, which was wrong, as it was described in https://jira.devtools.intel.com/browse/URT-626. But with change, it will now return global memory (which is a constant), which is IMHO also wrong.

Based on offline discussion, this has been changed to return unsupported if the number of memory modules returned is 0. That way you either get the free memory or unsupported, if the L0 driver cannot query the free memory in this device.

@aarongreig aarongreig merged commit 065bf2d into oneapi-src:main Apr 5, 2024
52 checks passed
nrspruit added a commit to nrspruit/llvm that referenced this pull request Apr 5, 2024
…emCount==0

pre-commit PR for oneapi-src/unified-runtime#1486

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
martygrant pushed a commit to intel/llvm that referenced this pull request Apr 8, 2024
…emCount==0 (#13209)

pre-commit PR for
oneapi-src/unified-runtime#1486

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
kbenzie pushed a commit to kbenzie/unified-runtime that referenced this pull request Apr 8, 2024
[L0] Fix DeviceInfo global mem free to report unsupported given MemCount==0
@kbenzie kbenzie mentioned this pull request Apr 16, 2024
19 tasks
kbenzie pushed a commit to kbenzie/llvm that referenced this pull request Apr 17, 2024
…emCount==0 (intel#13209)

pre-commit PR for
oneapi-src/unified-runtime#1486

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
level-zero L0 adapter specific issues ready to merge Added to PR's which are ready to merge v0.9.x Include in the v0.9.x release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants