ucc_mem_map limitations #883

Open
nirandaperera opened this issue Nov 22, 2023 · 4 comments
Labels
API API change question Further information is requested

Comments

@nirandaperera

Hi all,
I am trying to map memory regions through the ucc_context_params.mem_params config at the ucc_context_create stage.
I ran into two issues here.

  1. There seems to be a hard limit on the number of memory regions we can map: it is capped by #define MAX_NR_SEGMENTS 32. That seems really small, given that ucc_mem_map_params.n_segments is a uint64.
  2. In UCX we can map memory regions dynamically, but in UCC this seems to be restricted to context creation time. Is there a particular reason for this?
@Sergei-Lebedev
Contributor

ping @manjugv @wfaderhold21

@wfaderhold21
Collaborator

The memory regions mapped by UCC are used by collectives that are implemented with RDMA operations such as PUT and GET. To ensure proper execution and completion of the collective, the buffers are checked to confirm they have been mapped before an operation is issued. This check can be expensive when there are many memory regions; in prior work, we found that it increases the latency of PUT/GET operations when more than 32 regions are used. Thus, we limited the number of regions to 32. This kind of limitation fits PGAS memory usage such as OpenSHMEM’s symmetric heap better than models such as MPI’s message passing, which would likely benefit from the two-sided algorithms rather than the one-sided RDMA algorithms.
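
To make the cost of that check concrete, here is a minimal sketch of a per-operation lookup over a table of mapped regions. It is not UCC code: `region_t` and `find_region` are hypothetical names, and the linear scan is an assumption about how such a check could be written, chosen only to show why the cost grows with the number of regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NR_SEGMENTS 32 /* the cap quoted in this issue */

/* Hypothetical stand-in for one mapped region; not a UCC type. */
typedef struct {
    uintptr_t base; /* start address of the mapped region */
    size_t    len;  /* length of the mapped region in bytes */
} region_t;

/*
 * Return the index of the region that fully contains [addr, addr + len),
 * or -1 if no mapped region covers the buffer.
 * Every call walks the table, so the cost is O(n_regions) per PUT/GET.
 */
static int find_region(const region_t *regions, size_t n_regions,
                       uintptr_t addr, size_t len)
{
    assert(n_regions <= MAX_NR_SEGMENTS);
    for (size_t i = 0; i < n_regions; i++) {
        uintptr_t start = regions[i].base;
        /* Written to avoid overflow in addr + len. */
        if (addr >= start && len <= regions[i].len &&
            addr - start <= regions[i].len - len) {
            return (int)i;
        }
    }
    return -1;
}
```

With a handful of regions the scan is negligible, but since it runs before every one-sided operation, each additional region adds to the latency of every PUT/GET.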

As for question 2, because we are using RDMA operations, the mkeys for the memory regions must be exchanged via an allgather operation before they can be used. Doing this dynamically could be possible with API changes, but it would be expensive. By associating the memory regions with a context, we can combine the necessary allgather operation with the context creation allgather and hide some of the overhead.
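
The idea of folding the key exchange into the context creation allgather can be sketched as follows. This is an illustration only, under several assumptions: `exchange_rec_t`, `pack_record`, and `fake_allgather` are hypothetical names, keys are modelled as plain 64-bit values (real memory keys are opaque, variable-size blobs), and the allgather is simulated in-process with `memcpy`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NR_SEGMENTS 32 /* the cap quoted in this issue */

/* Hypothetical per-rank record sent in a single allgather. */
typedef struct {
    uint64_t ctx_addr;              /* stand-in for context bootstrap data */
    uint64_t n_keys;                /* number of valid entries in keys[]   */
    uint64_t keys[MAX_NR_SEGMENTS]; /* stand-in for packed memory keys     */
} exchange_rec_t;

/* Pack context data and keys together; returns -1 if there are too many keys. */
static int pack_record(exchange_rec_t *rec, uint64_t ctx_addr,
                       const uint64_t *keys, uint64_t n_keys)
{
    assert(rec != NULL);
    if (n_keys > MAX_NR_SEGMENTS) {
        return -1;
    }
    memset(rec, 0, sizeof(*rec));
    rec->ctx_addr = ctx_addr;
    rec->n_keys   = n_keys;
    if (n_keys > 0) {
        memcpy(rec->keys, keys, n_keys * sizeof(keys[0]));
    }
    return 0;
}

/*
 * In-process stand-in for an allgather: every rank ends up with a copy of
 * every rank's record. recv holds n_ranks * n_ranks records, laid out as
 * recv[dst * n_ranks + src].
 */
static void fake_allgather(const exchange_rec_t *send, exchange_rec_t *recv,
                           size_t n_ranks)
{
    for (size_t dst = 0; dst < n_ranks; dst++) {
        memcpy(&recv[dst * n_ranks], send, n_ranks * sizeof(*send));
    }
}
```

Because the keys travel in the same record as the context data, only one collective is needed at startup; a dynamic registration path would have to pay for an additional exchange each time a region is added.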

@janjust
Collaborator

janjust commented Nov 29, 2023

@wfaderhold21 @manjugv I was thinking about 2 for some time.
Yes, it's expensive initially, but it's perhaps the only way to allow 1-sided collectives in 2-sided programming models. That's something we're trying to do as part of our allreduce (as you know).
I was thinking of proposing an API extension in UCC to allow for dynamic memory region attach/update/detach.
Essentially a collective operation required to be called prior to collective_init, which would allow for 1-sided collectives.
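
To give a rough idea of the shape such an extension could take, here is a purely hypothetical sketch of a local attach/detach registry. None of these names (`sketch_registry_t`, `sketch_mem_attach`, `sketch_mem_detach`) exist in UCC, and the sketch only tracks regions locally; in an actual design, attach would presumably be a collective that also exchanges keys across ranks.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_REGIONS 32 /* mirrors the current cap; an assumption */

/* Hypothetical bookkeeping for one dynamically attached region. */
typedef struct {
    void  *addr;
    size_t len;
    int    in_use;
} sketch_region_t;

typedef struct {
    sketch_region_t slots[SKETCH_MAX_REGIONS];
} sketch_registry_t;

/*
 * Attach a region and return a handle (slot index), or -1 if the arguments
 * are invalid or the registry is full. A collective version would perform
 * the key exchange here, before any collective_init that uses the region.
 */
static int sketch_mem_attach(sketch_registry_t *reg, void *addr, size_t len)
{
    assert(reg != NULL);
    if (addr == NULL || len == 0) {
        return -1;
    }
    for (int i = 0; i < SKETCH_MAX_REGIONS; i++) {
        if (!reg->slots[i].in_use) {
            reg->slots[i].addr   = addr;
            reg->slots[i].len    = len;
            reg->slots[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Detach a region by handle; returns 0 on success, -1 for a stale handle. */
static int sketch_mem_detach(sketch_registry_t *reg, int handle)
{
    assert(reg != NULL);
    if (handle < 0 || handle >= SKETCH_MAX_REGIONS ||
        !reg->slots[handle].in_use) {
        return -1;
    }
    reg->slots[handle].in_use = 0;
    return 0;
}
```

An update operation could be expressed as a detach followed by an attach on the same slot; whether it deserves its own entry point is an open design question.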

@janjust
Collaborator

janjust commented Nov 29, 2023

And to add onto this - this also makes sense when integrating into NCCL, because NCCL has the notion of preregistering buffers.
Moreover, XGVMI integration into UCC is another use case. We need to register XGVMI buffers dynamically to support XGVMI operations in UCC, and such an API would significantly simplify XGVMI usage in UCC.

@janjust janjust added API API change question Further information is requested labels Nov 29, 2023