Scalability of MPI_Win_allocate_shared #5931
ashwinraghu asked this question in Q&A
-
Symmetric allocation enables some optimizations, for example, skipping the base address lookup and translation in RMA. I believe it is also needed to support OpenSHMEM's symmetric heap (https://pmodels.github.io/oshmpi-www/). In either case, it is an optional optimization. The current code appears to always attempt symmetric heap allocation before falling back. Since this can be costly, as your report shows, we should add a configure option or CVAR that lets users disable it.
-
A field report noted that the execution time of MPI_Win_allocate_shared() grows with the amount of memory being allocated. For example, on a single node with 4 ranks, it takes 3 s for 8 GB, 6 s for 16 GB, and 12 s for 32 GB, i.e., the time doubles each time the size doubles.
I instrumented the 3.4.x implementation using callgrind. The resulting report shows a large share of the executed instructions being spent in the msync() system call, which is reached through this call chain:
MPIDIU_get_shm_symheap() -> generate_random_addr() -> check_maprange_ok()
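(Column 1 is the number of instructions and column 2 its share of the total. In callgrind_annotate's tree view, < marks a caller, * the function itself, and > a callee.)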
520,123,460 (96.79%) < ???:MPIDIU_get_shm_symheap (1x)
251,658,345 (46.83%) * ???:generate_random_addr
218,103,795 (40.59%) > ???:msync (16,777,215x)
50,331,648 ( 9.37%) > ???:__errno_location (16,777,216x)
Based on my reading of check_maprange_ok(), it checks whether a candidate virtual address range of the requested size is free (unmapped).
The base address of this range is then broadcast and used to map the shared memory at the same address in all processes.
My question: what is the motivation for ensuring that the base address is identical across all processes, given that load/store access to the shared memory must in any case use a base address obtained from MPI_Win_shared_query()?
Moreover, the fallback path simply uses whatever address mmap() returns.