Releases · eth-cscs/COSMA

This release enables COSMA to take advantage of fast GPU-to-GPU interconnects such as NVLink, so that it can efficiently utilize modern multi-GPU systems. This is achieved in two ways:

- Using NCCL/RCCL libraries: specify the -DCOSMA_WITH_NCCL=ON CMake option.
- Using GPU-aware MPI: specify the -DCOSMA_WITH_GPU_AWARE_MPI=ON CMake option, as proposed here.
See README and INSTALL for more info on how to build.

In addition, the following performance improvements have been made:

Improved Caching:
- all NCCL buffers, MPI communicators and NCCL communicators are cached and reused when appropriate.
- all device memory is cached and reused.
Reduced Data Transfers: Tiled-MM, the GPU backend of COSMA, now lets the user leave the resulting matrix C on the GPU. In that case, matrix C does not need to be transferred from device to host, which not only reduces communication but also speeds up the whole CPU->GPU pipeline, as no additional synchronizations are needed. Furthermore, the reduce_scatter operation does not have to wait for C to be transferred back to the host; it is invoked immediately with GPU pointers, thus utilizing fast inter-GPU links. This avoids unnecessary CPU<->GPU data transfers.
All collectives updated: both all-gather and reduce-scatter collectives are improved.
Reduced Data Reshuffling: avoids double reshuffling of data, i.e. the data from NCCL/RCCL GPU buffers is copied directly into the right layout, without additional reshuffling.
Works for variable blocks: the NCCL/RCCL reduce_scatter operation assumes that all blocks are of the same size and is hence not completely equivalent to MPI_Reduce_scatterv, which we previously used. We pad all the blocks to overcome this issue.
Portability: Supports both NVIDIA and AMD GPUs.
Tiled-MM: Updated submodule
COSTA: Updated submodule
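
The padding idea from the "Works for variable blocks" item can be illustrated with a small, self-contained sketch. It is plain Python that simulates the ranks in a single process and does not call NCCL, RCCL or MPI; the function names (`pad_blocks`, `reduce_scatter_equal`, `reduce_scatter_variable`) are hypothetical and chosen for illustration only, and padding every block with zeros up to the largest block size is an assumption, so COSMA's actual implementation may differ.

```python
def pad_blocks(blocks, fill=0):
    """Pad variable-length blocks with `fill` so that all have the same length."""
    max_len = max(len(b) for b in blocks)
    return [list(b) + [fill] * (max_len - len(b)) for b in blocks]


def reduce_scatter_equal(send_buffers):
    """Simulate a sum reduce-scatter that requires equal-size blocks.

    send_buffers[r][j] is rank r's contribution to the block owned by rank j.
    Returns a list whose entry j is the element-wise sum over all ranks of block j.
    """
    n_ranks = len(send_buffers)
    block_len = len(send_buffers[0][0])
    return [
        [sum(send_buffers[r][j][k] for r in range(n_ranks)) for k in range(block_len)]
        for j in range(n_ranks)
    ]


def reduce_scatter_variable(contributions):
    """Emulate a variable-size reduce-scatter on top of the equal-size one.

    Assumption: every rank knows the true size of every block, so the zero
    padding can be trimmed off again after the reduction.
    """
    sizes = [len(b) for b in contributions[0]]
    padded = [pad_blocks(blocks) for blocks in contributions]
    reduced = reduce_scatter_equal(padded)
    return [reduced[j][:sizes[j]] for j in range(len(sizes))]


# Two simulated ranks; rank 0 owns a block of size 3, rank 1 a block of size 1.
contributions = [
    [[1, 2, 3], [10]],  # contributions sent by rank 0
    [[4, 5, 6], [20]],  # contributions sent by rank 1
]
result = reduce_scatter_variable(contributions)
print(result)  # [[5, 7, 9], [30]]
```

Zero padding is harmless for a sum reduction because the padded entries add nothing to the result; the cost is the extra data communicated for the smaller blocks.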

Releases: eth-cscs/COSMA

COSMA-v2.6.6

COSMA-v2.6.5

COSMA-v2.6.4

COSMA-v2.6.3

COSMA-v2.6.2

COSMA-v2.6.1

2.6.0-fixed

COSMA-v2.6.0

COSMA-v2.5.1

COSMA-v2.5.0