Explore the potential of RMA-based APIs using `libfabric`.
Please cite our paper (see below).
PMI (client) and hydra (server) provide process management abilities.
Download them from the mpich website.
To build `rmem`, only the `PMI` library is needed, although `hydra` (or any other `PMI` server) is needed to run `rmem`.
Here are some examples of useful commands with `hydra`:
# run 2 processes, 1 process-per-node, label the output
mpiexec -n 2 -ppn 1 -l
# run 2 processes, bind each of them to a core
mpiexec -n 2 -bind-to core
To build `libfabric`, there are different options (e.g. `brew install libfabric` on macOS). Here is how to do it from source:
./autogen.sh
CC=${CC} CXX=${CXX} ./configure --prefix=${YOUR_PREFIX} --enable-psm3 --enable-sockets
# for CUDA support, add
--with-cuda=${CUDA_HOME} --with-gdrcopy
# for AMD support, add
--with-rocr=${ROCM_PATH}
make install -j 8
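The optional flags above can be assembled from the environment instead of being typed by hand. The following is a minimal sketch under stated assumptions: `ofi_configure_cmd` is a hypothetical helper (it is not part of `libfabric` or `rmem`), and the fallback prefix `${HOME}/libfabric` is only a placeholder. It prints the `./configure` line rather than running it, so the result can be checked first.

```shell
# Hypothetical helper: build the libfabric configure line from the environment.
# Only the flags listed in this README are used.
ofi_configure_cmd() {
    local prefix cmd
    prefix="$1"
    cmd="./configure --prefix=${prefix} --enable-psm3 --enable-sockets"
    # CUDA support: added only when CUDA_HOME is set and non-empty
    if [ -n "${CUDA_HOME:-}" ]; then
        cmd="${cmd} --with-cuda=${CUDA_HOME} --with-gdrcopy"
    fi
    # AMD support: added only when ROCM_PATH is set and non-empty
    if [ -n "${ROCM_PATH:-}" ]; then
        cmd="${cmd} --with-rocr=${ROCM_PATH}"
    fi
    printf '%s\n' "${cmd}"
}

# Print the command for inspection (the default prefix is a placeholder).
ofi_configure_cmd "${YOUR_PREFIX:-${HOME}/libfabric}"
```

Once the printed line looks right, it can be run from the `libfabric` source directory after `./autogen.sh`.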
To build CXI (Slingshot-11 provider), you will need some workarounds:
- the main branch doesn't build on most supercomputers (lib-cxi is too old, see here); use this branch instead
- install json-c from source with
cmake . -DCMAKE_INSTALL_PREFIX=${HOME}/json-c
and add the option `--with-json=${HOME}/json-c`
You can check that the build is working using `fi_pingpong`:
# with mpiexec
mpiexec -n 1 ./fi_pingpong : -n 1 ./fi_pingpong localhost
# or without mpiexec
fi_pingpong & fi_pingpong localhost
We use a `Makefile` to compile.
To handle the different systems, the file `make_arch/default.mak` contains the variable definitions needed to find the dependencies etc.
Specifically, we rely on the following variables:
- `CC`: the compiler to use
- `PMI_DIR`: the root directory of `pmi`
- `OFI_DIR`: the root directory of `ofi`
- `OPTS`: (optional) flags to be passed to the compiler for more flexibility, e.g. `-fsanitize=address`, `-flto`, etc.
The `Makefile` offers various targets by default:
- `rmem`: builds `rmem`
- `info`: displays info about the build
- `clean`/`reallyclean`: cleans the build
- `default`: displays the info and builds `rmem`
- `fast`: compiles for fast execution (equivalent to `OPTS="-O3 -DNDEBUG"`)
- `debug`: compiles with debug symbols (equivalent to `OPTS="-O0 -g"`)
- `verbose`: compiles for debug with added verbosity (equivalent to `OPTS=-DVERBOSE make debug`)
- `asan`: compiles for verbose debug with the address and undefined-behavior sanitizers (equivalent to `OPTS="-fsanitize=address -fsanitize=undefined" make verbose`)
Note: if you prefer to add another make_arch file, you can also invoke it using `ARCH_FILE=make_arch/myfile make`.
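As an illustration of the variables and the `ARCH_FILE` mechanism described above, here is a hedged sketch that writes a custom arch file and then queries the build. Every value in the file is a placeholder (compiler name, install paths, flags), and the `VAR = value` layout is an assumption about what `make_arch/default.mak` looks like; adapt it to the actual file in the repository.

```shell
# Write a hypothetical arch file; all values below are placeholders.
# The quoted 'EOF' keeps ${HOME} literal so that make expands it, not the shell.
mkdir -p make_arch
cat > make_arch/myfile <<'EOF'
CC = gcc
PMI_DIR = ${HOME}/pmi
OFI_DIR = ${HOME}/libfabric
OPTS = -flto
EOF

# Show what the build would use, only when run from the rmem source tree.
if [ -f Makefile ]; then
    ARCH_FILE=make_arch/myfile make info
fi
```

Replacing `info` with `fast` or `debug` selects one of the other targets with the same arch file.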
The ready-to-receive protocol is used by the target to expose its readiness for reception to the origin of the RMA call.
- `am`: will use active messaging (`fi_send`) and pre-posted buffers at the sender
- `tag`: will use tagged messaging (`fi_tsend` and `fi_trecv`). The main performance bottleneck is unexpected messages
- `atomic`: uses an atomic operation (`fi_atomic`)

The down-to-close acknowledgment is selected with `-d`:
- `am`: will use `fi_send` and pre-posted buffers at the sender
- `tag`: will use `fi_tsend` and `fi_trecv`. The main performance bottleneck is unexpected messages
- `cq_data`: uses `fi_cq_data` to close the epoch, to be used with `-c order`
- `delivery`: uses delivery complete (`FI_DELIVERY_COMPLETE`) on the payload operation
- `fence`: uses a fence to issue the down-to-close acknowledgment

Remote completion tracking is selected with `-c`:
- `cq_data`: uses `FI_CQ_DATA` to track remote completion
- `counter`: uses `FI_REMOTE_COUNTER` to track remote completion using remote counters
- `order`: uses network ordering, must be used with `-d cq_data`
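
Two of the options above are only meaningful as a pair: `cq_data` for `-d` is to be used with `-c order`, and `order` for `-c` must be used with `-d cq_data`. A small sketch of a pre-flight check follows. `check_rmem_opts` is a hypothetical helper, not something shipped with `rmem`, and it encodes only these two pairing rules.

```shell
# Hypothetical helper: validate a (-d, -c) pair against the two pairing rules.
# Prints a message and returns 0 when the pair is consistent, 1 otherwise.
check_rmem_opts() {
    local d_opt c_opt
    d_opt="$1"   # value intended for -d
    c_opt="$2"   # value intended for -c
    if [ "${d_opt}" = "cq_data" ] && [ "${c_opt}" != "order" ]; then
        echo "invalid: -d cq_data requires -c order"
        return 1
    fi
    if [ "${c_opt}" = "order" ] && [ "${d_opt}" != "cq_data" ]; then
        echo "invalid: -c order requires -d cq_data"
        return 1
    fi
    echo "ok: -d ${d_opt} -c ${c_opt}"
}

check_rmem_opts cq_data order          # consistent pair
check_rmem_opts fence order || true    # rejected: order needs -d cq_data
```

The check is deliberately narrow: it does not verify that the values themselves are valid option names, nor that the chosen provider supports them.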
Different networks have different capabilities and limitations. Here is a list of the restrictions we have encountered:
- `psm3`: does not support RMA natively, emulated in software over tag messaging, see here
- `verbs;ofi_rxm`: poor native support of `FI_ATOMIC`
- `cxi`: doesn't support `FI_CQ_DATA` for the moment
- `sockets`: supports everything except `FI_REMOTE_COUNTER`
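
The restrictions listed above can be kept at hand as a small lookup when scripting runs across several machines. This is only a sketch: `provider_caveat` is a hypothetical helper, its messages restate the list and nothing more, and reading the provider name from `FI_PROVIDER` (the `libfabric` environment variable that restricts provider selection) is an assumption about how the run is configured.

```shell
# Hypothetical helper: print the restriction recorded in this README for a provider.
# Returns 1 when the provider is not in the list.
provider_caveat() {
    case "$1" in
        psm3)            echo "RMA emulated in software over tag messaging" ;;
        "verbs;ofi_rxm") echo "poor native support of FI_ATOMIC" ;;
        cxi)             echo "no FI_CQ_DATA for the moment" ;;
        sockets)         echo "everything except FI_REMOTE_COUNTER" ;;
        *)               echo "no restriction recorded"; return 1 ;;
    esac
}

# Falls back to sockets when FI_PROVIDER is unset.
provider_caveat "${FI_PROVIDER:-sockets}" || true
```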
To be announced
/*
* Copyright (c) 2024, UChicago Argonne, LLC
* See COPYRIGHT in top-level directory
*/