RCCL-2.8.4 for ROCm 4.3.0

@saadrahim saadrahim released this 30 Jul 22:51

Added

  • Ability to select the number of channels used for clique-based all reduce (RCCL_CLIQUE_ALLREDUCE_NCHANNELS). Adjust this value to tune performance when computation kernels are executing in parallel.

Optimizations

  • Additional tuning for clique-based kernel AllReduce performance (still requires opt in with RCCL_ENABLE_CLIQUE=1)
  • Modification of default values for number of channels / byte limits for clique-based all reduce based on device architecture
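
The opt-in and channel-count settings named in these notes are plain environment variables, so one way to apply them is to build the environment for the process that will use RCCL. The following is a minimal sketch, not part of RCCL: `clique_env` is a hypothetical helper, the check that the channel count is at least 1 is an assumption, and the value `4` in the usage comment is only an illustration (these notes do not state a valid range or the per-architecture defaults).

```python
import os


def clique_env(nchannels=None, base=None):
    """Return a copy of an environment with clique-based kernels opted in.

    nchannels: optional channel count for clique-based all reduce. When it is
               None the variable is left unset, so RCCL's own default applies.
    base:      environment mapping to start from (defaults to os.environ).
    """
    env = dict(os.environ if base is None else base)

    # Clique-based kernels still require an explicit opt in.
    env["RCCL_ENABLE_CLIQUE"] = "1"

    if nchannels is not None:
        # Assumption: a channel count below 1 is not meaningful.
        if nchannels < 1:
            raise ValueError("nchannels must be a positive integer")
        env["RCCL_CLIQUE_ALLREDUCE_NCHANNELS"] = str(nchannels)

    return env


# Hypothetical usage; "./my_app" stands in for any RCCL-based program:
#   import subprocess
#   subprocess.run(["./my_app"], env=clique_env(nchannels=4), check=True)
```

Leaving `nchannels` unset is usually the right starting point; overriding it is mainly worth trying when computation kernels run alongside the all reduce.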

Changed

  • Replaced RCCL_FORCE_ENABLE_CLIQUE with RCCL_CLIQUE_IGNORE_TOPO
  • Clique-based kernels can now be enabled on topologies where all active GPUs are XGMI-connected
  • Topologies not normally supported by clique-based kernels require RCCL_CLIQUE_IGNORE_TOPO=1
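
Launch scripts that still set the old variable name need updating. The sketch below shows one way to migrate an environment mapping before starting an RCCL-based process. `migrate_clique_env` is a hypothetical helper, and it assumes the old value can be carried over unchanged to the new name, which these notes do not state explicitly.

```python
def migrate_clique_env(env):
    """Return a copy of env that uses RCCL_CLIQUE_IGNORE_TOPO.

    The old RCCL_FORCE_ENABLE_CLIQUE entry is always dropped. Its value is
    copied to the new name only when the new name is not already set, so an
    explicit new-style setting wins.
    """
    migrated = dict(env)
    old_value = migrated.pop("RCCL_FORCE_ENABLE_CLIQUE", None)
    if old_value is not None and "RCCL_CLIQUE_IGNORE_TOPO" not in migrated:
        # Assumption: the value semantics are unchanged by the rename.
        migrated["RCCL_CLIQUE_IGNORE_TOPO"] = old_value
    return migrated


# Hypothetical usage; "./my_app" stands in for any RCCL-based program:
#   import os, subprocess
#   subprocess.run(["./my_app"], env=migrate_clique_env(os.environ), check=True)
```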

Fixed

  • Invoking the install script with only the '-r' flag no longer incorrectly deletes existing builds.

Known issues

  • Managed memory is not currently supported for clique-based kernels