Singularity container with OpenMPI and InfiniBand (UCX) #2983
Unanswered
VincentDonney asked this question in Q&A
Replies: 1 comment · 2 replies
-
Update: I have been working on other things and tried to create a "hybrid" container by bind-mounting various directories from my machine, such as the library directories, into the container and adding them to $LD_LIBRARY_PATH or $PATH. But I just get a glibc error: the container seems to be trying to use the glibc from the bind-mounted directory instead of its own.
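Binding whole system library directories pulls the host's glibc (and its dependents) into the container; binding only the specific UCX/OpenMPI trees and injecting just their lib paths avoids that. A minimal sketch, assuming host installs under /opt (paths and the image name `mpi.sif` are assumptions):

```shell
# Bind only the UCX and OpenMPI trees, not /lib or /usr/lib, so the
# container keeps its own glibc; then inject their lib dirs into the
# container's LD_LIBRARY_PATH via the SINGULARITYENV_ prefix.
export SINGULARITY_BIND="/opt/ucx-1.16.0:/opt/ucx,/opt/openmpi-5.0.3:/opt/ompi"
export SINGULARITYENV_LD_LIBRARY_PATH="/opt/ucx/lib:/opt/ompi/lib"

# Check which glibc the bound binaries actually resolve to:
singularity exec mpi.sif ldd /opt/ompi/bin/mpirun | grep libc
```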
-
Hello, I'm currently working as an intern, and I was asked to build a Singularity container for OpenMPI so that we can run distributed programs across multiple machines of our HPC cluster using containers.
Basically, I'm wondering what the correct approach is for something like this: should my container take OpenMPI, UCX, and Mellanox OFED (which we use together for InfiniBand) from the host, using something like environment-modules, or should I compile OFED, UCX, and OpenMPI inside the container? We could also consider hybrid solutions, such as compiling OpenMPI and UCX inside the container while taking our OFED libraries from the host.
I would like to know if anyone has worked on a similar case before and can help me. What is the best practice, and what limitations will I face with each of the approaches above?
So far I've built three containers: one with OFED, UCX, and OpenMPI compiled inside; one with nothing inside except a minimal setup for environment-modules; and a sort of hybrid, where I compile OpenMPI and UCX inside the container, install libraries with dnf, and take the few remaining pieces of OFED needed to run InfiniBand from the host.
One concern is that I built my container on a RHEL 8.2 OS, the same as my host machine, and I don't know what issues I would face if I built a container based on RHEL 7 or RHEL 9, for example.
To give more details about our stack: we use LSF as our job scheduler, though I haven't worked with it yet. Our machines mostly run RHEL 8.2 and 8.6. Some older machines have mlx4_0 InfiniBand devices, while the newer ones have mlx5_0.
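Since the mlx4/mlx5 mix means the right UCX device name differs per node, one option is to detect it at job time rather than hard-coding it. A sketch using the standard sysfs path (the UCX_NET_DEVICES value format is device:port):

```shell
# Pick the first InfiniBand device exposed by the kernel (mlx4_0 on the
# older nodes, mlx5_0 on the newer ones) and hand it to UCX.
dev=$(ls /sys/class/infiniband 2>/dev/null | head -n1)
if [ -n "$dev" ]; then
    export UCX_NET_DEVICES="${dev}:1"   # e.g. mlx5_0:1
else
    echo "no InfiniBand device found; UCX will fall back to TCP" >&2
fi
```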
On my host machine I have installed OpenMPI 5.0.3 and SingularityCE 4.1.2, along with UCX 1.16.0 so that OpenMPI can use our InfiniBand network.
Inside my container, I compile MLNX_OFED_LINUX-4.9-7.1.0.0-rhel8.2-x86_64, UCX 1.16.0, and OpenMPI 5.0.3 from tarballs.
I've also created environment-modules modulefiles for OpenMPI 5.0.3 and UCX 1.16.0 that my container uses.
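With the host-launched model, the host mpirun starts one container instance per rank, so the host and container OpenMPI versions should match (5.0.3 on both sides here). A sketch with a quick version guard before launching (`mpi.sif` and the application path are assumptions):

```shell
# Extract x.y.z from "mpirun (Open MPI) 5.0.3" on both sides.
host_v=$(mpirun --version | head -n1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')
cont_v=$(singularity exec mpi.sif mpirun --version | head -n1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')
[ "$host_v" = "$cont_v" ] || { echo "OpenMPI mismatch: host $host_v vs container $cont_v" >&2; exit 1; }

# Host mpirun starts the containers; MPI traffic then goes over UCX/InfiniBand.
mpirun -np 4 singularity exec mpi.sif /opt/app/hello_mpi
```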
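A minimal UCX modulefile along these lines might look like the following (the install prefix is an assumption):

```
#%Module1.0
## ucx/1.16.0 -- the prefix path below is illustrative
set prefix /opt/ucx-1.16.0
prepend-path PATH            $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
prepend-path PKG_CONFIG_PATH $prefix/lib/pkgconfig
```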
Here is my definition file for a hybrid container that uses Mellanox OFED and UCX from the host machine and compiles OpenMPI against that UCX (I bind-mount the directory containing my UCX installation). At first I tried to compile both UCX and OpenMPI inside the container, but it was too complicated to bind every Mellanox OFED directory, since its files are spread across multiple locations.
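A generic sketch of a hybrid definition of this shape, assuming UCX is bind-mounted at /host/ucx during the build (base image, download URL, and paths are illustrative):

```
Bootstrap: docker
From: redhat/ubi8:8.2

%post
    dnf install -y gcc gcc-c++ make wget bzip2
    wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3.tar.bz2
    tar xjf openmpi-5.0.3.tar.bz2 && cd openmpi-5.0.3
    # /host/ucx must be bind-mounted while building, e.g.:
    #   singularity build --bind /opt/ucx-1.16.0:/host/ucx mpi.sif hybrid.def
    ./configure --prefix=/usr/local --with-ucx=/host/ucx
    make -j"$(nproc)" install

%environment
    export LD_LIBRARY_PATH=/usr/local/lib:/host/ucx/lib:$LD_LIBRARY_PATH
```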
Any help is appreciated.