
Ownership of gpu tasks #686

Open
devreal opened this issue Oct 29, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@devreal
Contributor

devreal commented Oct 29, 2024

Description

The GPU task structure is allocated by the DSL but freed by the device management thread. This breaks ownership semantics and prevents certain optimizations, such as eliding the allocation by embedding the GPU task structure into a higher-level task object.

Describe the solution you'd like

The memory holding the GPU task should be released by the DSL that allocated it. Remove the free of that structure from the device task handling and move it into the release callback provided by the DSL (i.e., DTD and PTG).
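A minimal sketch of the intended ownership model (all type and function names below are hypothetical and only illustrate the idea, they are not the actual PaRSEC API): the DSL owns the storage, so it can embed the GPU task inside its own task object and free everything in its release callback, while the device management thread never calls free itself.

```c
#include <stdlib.h>

/* Placeholder for the device-side bookkeeping structure. */
typedef struct gpu_task_s {
    int stage;                 /* stand-in for the real fields */
} gpu_task_t;

/* A DSL-level task that embeds the GPU task instead of allocating it
 * separately -- the optimization that becomes possible once allocation
 * and release live on the same side. */
typedef struct dsl_task_s {
    /* ... DSL-specific task state ... */
    gpu_task_t gpu_task;       /* elided allocation: lives inside the task */
} dsl_task_t;

/* Release callback provided by the DSL (DTD/PTG). Only the DSL knows
 * whether gpu_task was embedded or separately allocated, so the free
 * happens here, not in the device management thread. */
static void dsl_release_task(dsl_task_t *task)
{
    /* nothing extra to free for the embedded gpu_task */
    free(task);
}

/* The device management thread merely hands the task back to its owner. */
static void device_thread_complete(dsl_task_t *task,
                                   void (*release)(dsl_task_t *))
{
    release(task);
}
```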

Describe alternatives you've considered

Adding another callback dedicated to freeing the structure, but that adds no value.

Additional context

@devreal devreal added the enhancement New feature or request label Oct 29, 2024
@bosilca
Contributor

bosilca commented Oct 30, 2024

I agree, this has bugged me for a while, but I was not able to find a clean solution. The problem is that we cannot free the gpu_task in task_release because we don't have access to the gpu_task there, and the task itself does not have a pointer back to the gpu_task.

We could make the gpu_task derive from parsec_object_t (which would give us access to a constructor/destructor), but the assumption there is that the object is automatically freed (after the destructors are called) when the refcount reaches zero. Not a great solution either.
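For reference, a sketch of what that alternative could look like with the PARSEC_OBJ-style refcounted object system (the macro and header names are assumed from parsec/class/parsec_object.h; the surrounding code is illustrative only):

```c
#include "parsec/class/parsec_object.h"

typedef struct my_gpu_task_s {
    parsec_object_t super;          /* refcount + class information */
    /* ... device task fields ... */
} my_gpu_task_t;

static void my_gpu_task_construct(my_gpu_task_t *gtask) { (void)gtask; /* init fields */ }
static void my_gpu_task_destruct(my_gpu_task_t *gtask)  { (void)gtask; /* tear down fields */ }

PARSEC_OBJ_CLASS_INSTANCE(my_gpu_task_t, parsec_object_t,
                          my_gpu_task_construct, my_gpu_task_destruct);

/* The catch: releasing the last reference frees the storage itself, which
 * assumes the object was allocated on its own (PARSEC_OBJ_NEW) and rules
 * out embedding it inside a larger, DSL-owned task structure. */
void device_thread_done(my_gpu_task_t *gtask)
{
    PARSEC_OBJ_RELEASE(gtask);      /* destructor runs, then the memory is freed */
}
```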

bosilca added a commit to bosilca/parsec that referenced this issue Oct 30, 2024
This makes the lifecycle of the device tasks symmetric: they are
allocated by the DSL and therefore shall be freed by the DSL. This
addresses the request from ICLDisco#686.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
@devreal
Contributor Author

devreal commented Oct 30, 2024

Ahh, I forgot that the task doesn't point back to the GPU task. Maybe a callback on the chore would do? The GPU task is created in a chore callback, so we might as well have a callback to destroy it. There is no use for it for host-side tasks, but I don't see that as an issue; it would just be ignored.
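A hypothetical sketch of such a create/destroy pair at the chore level; the existing fields loosely mirror a chore entry, but the added release callback and all names here are invented for illustration, not a proposal of the exact API:

```c
typedef struct illustrative_chore_s {
    int    type;                                /* device type mask                         */
    int  (*evaluate)(const void *task);         /* chore selection                          */
    int  (*hook)(void *es, void *task);         /* for device chores: creates the GPU task  */
    void (*release_gpu_task)(void *gpu_task);   /* NEW: destroys what the hook created;
                                                   NULL (ignored) for host-side chores      */
} illustrative_chore_t;
```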

@bosilca
Contributor

bosilca commented Oct 30, 2024

Look at #688. I'm trying to test it, but the new MPI detection sucks on all our machines.
