Ability to select the number of channels to use for clique-based AllReduce (RCCL_CLIQUE_ALLREDUCE_NCHANNELS). Adjust this value to tune performance when computation kernels are executing in parallel with the AllReduce.
Optimizations
Additional tuning for clique-based kernel AllReduce performance (still requires opt in with RCCL_ENABLE_CLIQUE=1)
Modified the default number of channels and byte limits for clique-based AllReduce based on device architecture
Changed
Replaced RCCL_FORCE_ENABLE_CLIQUE with RCCL_CLIQUE_IGNORE_TOPO
Clique-based kernels can now be enabled on topologies where all active GPUs are XGMI-connected
Topologies not normally supported by clique-based kernels require RCCL_CLIQUE_IGNORE_TOPO=1
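The environment variables named in this release can be combined when launching an application. The following is a minimal sketch, not part of RCCL: `clique_env` is a hypothetical helper, the channel count of 4 is an arbitrary illustrative value (the accepted range is not stated in these notes), and `./my_rccl_app` in the usage note stands in for whatever binary you actually run.

```python
import os


def clique_env(nchannels=None, ignore_topo=False, base=None):
    """Return a copy of an environment with clique-related RCCL variables set.

    Hypothetical helper for illustration; only the variable names come
    from the release notes.
    """
    env = dict(os.environ if base is None else base)

    # Clique-based kernels are opt-in.
    env["RCCL_ENABLE_CLIQUE"] = "1"

    # Optional: number of channels for clique-based AllReduce.
    # Assumed here to be a positive integer.
    if nchannels is not None:
        if nchannels <= 0:
            raise ValueError("nchannels must be a positive integer")
        env["RCCL_CLIQUE_ALLREDUCE_NCHANNELS"] = str(nchannels)

    # Optional: required on topologies that clique-based kernels
    # do not normally support.
    if ignore_topo:
        env["RCCL_CLIQUE_IGNORE_TOPO"] = "1"

    return env
```

A launch could then look like `subprocess.run(["./my_rccl_app"], env=clique_env(nchannels=4))`, trying a few channel counts to see which performs best alongside your compute kernels.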
Fixed
Invoking the install script with only the '-r' flag no longer incorrectly deletes existing builds.
Known issues
Managed memory is not currently supported for clique-based kernels