Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Dynamic Symmetric Memory #909

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

wfaderhold21
Copy link
Collaborator

What

Currently, UCC has support for symmetric memory segments that are added during context creation for use with one-sided collectives. This PR adds support for memory segments to be added after a context and team have been created to better support programming models such as MPI. In addition, this adds support for attaching a key with the memory region for use with DPUs and offloading of collectives.

Why ?

With the current approach, a user will need to perform a memory copy of source and destination buffers to a known and already mapped memory region before a collective can begin using one-sided or offloaded collectives. This removes the need for the memory copy at the expense of memory exchanges on the first collective. This is useful if multiple collectives occur with the same buffers.

@swx-jenkins3
Copy link

Can one of the admins verify this patch?

@manjugv
Copy link
Contributor

manjugv commented Mar 6, 2024

@wfaderhold21 Is this ready for review? @janjust Does this work for you?

@janjust
Copy link
Collaborator

janjust commented Mar 6, 2024

@manjugv In general this works, but I'm still a little concerned if keeping mapped data until tl destruction is the right approach, or if we should enable symmetric buffer unmapping.

@wfaderhold21
Copy link
Collaborator Author

@manjugv yes. It’s ready for review.

@janjust
Copy link
Collaborator

janjust commented May 29, 2024

@wfaderhold21 Did we address the possibility of adding an imported key? What about the ability to important keys with an API? I think we did a follow up study and concluded that parallelization does not improve perf much, and requires a unique memory region to handle key imports otherwise we run into segfaults.

@janjust
Copy link
Collaborator

janjust commented May 29, 2024

OK - after WG meeting I think I realize that my point is moot. I think this is good as is, we just need to put a protection for when keys are imported by multiple threads.

@wfaderhold21
Copy link
Collaborator Author

@Sergei-Lebedev @janjust was there any more feedback on this PR?

@janjust
Copy link
Collaborator

janjust commented Jun 12, 2024

@wfaderhold21 I don't think so - I think we're just really backlogged with other PRs and reviews - sorry for the slow progress but it's on my radar

Copy link
Collaborator

@janjust janjust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

src/components/tl/ucp/tl_ucp_context.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@samnordmann samnordmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks

src/components/tl/ucp/tl_ucp_coll.c Outdated Show resolved Hide resolved
src/components/tl/ucp/tl_ucp_coll.c Outdated Show resolved Hide resolved
src/components/tl/ucp/tl_ucp_context.c Show resolved Hide resolved
src/components/tl/ucp/tl_ucp_coll.c Outdated Show resolved Hide resolved
@wfaderhold21
Copy link
Collaborator Author

@samnordmann @nsarka I think I made the corrections you all asked, but I also updated some code to bugs I found. Could you all take another look to make sure everything is OK there?

Copy link
Collaborator

@samnordmann samnordmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

src/components/tl/ucp/tl_ucp_coll.c Outdated Show resolved Hide resolved
src/components/tl/ucp/alltoall/alltoall_onesided.c Outdated Show resolved Hide resolved
src/components/tl/ucp/tl_ucp_coll.c Outdated Show resolved Hide resolved
@janjust
Copy link
Collaborator

janjust commented Aug 27, 2024

@Sergei-Lebedev @wfaderhold21 Hey guys, sorry to harp on this, but are we done with this PR? Is it ready to be merged, @Sergei-Lebedev any other input needed?

@wfaderhold21
Copy link
Collaborator Author

@janjust AFIAK nothing to be added on my end.

@wfaderhold21
Copy link
Collaborator Author

@Sergei-Lebedev @manjugv are there any outstanding comments? Can we push this forward?

@manjugv
Copy link
Contributor

manjugv commented Sep 30, 2024

@Sergei-Lebedev @manjugv are there any outstanding comments? Can we push this forward?

+1 from my side.

@wfaderhold21
Copy link
Collaborator Author

@Sergei-Lebedev @janjust @manjugv The requested changes were made to remove all API changes. I believe this PR now conforms to what we have discussed.

@janjust
Copy link
Collaborator

janjust commented Dec 5, 2024

@wfaderhold21 is this safe to "update with rebase"?

@janjust
Copy link
Collaborator

janjust commented Dec 5, 2024

Dependent on #1037

@wfaderhold21
Copy link
Collaborator Author

Dependent on #1037

This is different from #1037 and can be considered separately I believe

@janjust
Copy link
Collaborator

janjust commented Dec 19, 2024

This is different from #1037 and can be considered separately I believe

@wfaderhold21 with the changes in #1037 is this true? I think we decided to let user do the allgather?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants