RCCL 2.20.5 for ROCm 6.2.0

rocm-ci released this 02 Aug 16:15
45b618a

Changed

  • Compatibility with NCCL 2.20.5
  • Compatibility with NCCL 2.19.4
  • Performance tuning for some collective operations on MI300
  • Enabled NVTX code in RCCL
  • Replaced rccl_bfloat16 with hip_bfloat16
  • NPKit updates:
    • Warm-up iterations are no longer removed by default; removal is now opt-in
    • Doubled the size of buffers to accommodate more channels
  • Modified rings to be rail-optimized topology friendly
  • Replaced ROCmSoftwarePlatform links with ROCm links

Added

  • Support for fp8 and rccl_bfloat8
  • Support for using HIP contiguous memory
  • Implemented ROC-TX for host-side profiling
  • Enabled static build
  • New Rome model
  • Added fp16 and fp8 cases to unit tests
  • New unit test for main kernel stack size
  • New -n option for topo_expl to override the number of nodes
  • Improved debug messages for memory allocations
  • Channel shuffling for IB systems

Fixed

  • Bug when configuring RCCL for only LL128 protocol
  • Scratch memory allocation after API change for MSCCL
  • Incorrect minNchannels in multi-node configurations