/**
* @copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
*
* See file LICENSE for terms.
*/
## Current
## 1.1.0 (TBD)
### Features
#### API
- Added float128 and complex float32/64/128 data types
- Added Active Set based collectives to support dynamic groups as well as
  point-to-point messaging
- Added ucc_team_get_attr interface
#### Core
- Added config file support
- Fixed component search
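The config file support added above lets UCC tuning knobs be set from a file instead of the environment. A hypothetical fragment is sketched below; the variable names mirror UCC's environment variables, but the file location and the specific values shown are illustrative assumptions, not documented defaults:

```
# ucc.conf (illustrative) - entries mirror UCC_* environment variables
UCC_TLS=ucp,cuda
UCC_LOG_LEVEL=info
```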
#### CL
- Added split-rail allreduce collective implementation
- Enabled hierarchical alltoallv and barrier
- Fixed cleanup bugs
#### TL
- Added SELF TL supporting teams of size one
##### UCP
- Added service broadcast
- Added reduce_scatterv ring algorithm
- Added k-nomial based gather collective implementation
- Added one-sided get based algorithms
##### SHARP
- Fixed SHARP OOB
- Added SHARP broadcast
##### GPU Collectives (CUDA, NCCL, and RCCL TLs)
- Added support for CUDA TL (intranode collectives for NVIDIA GPUs)
- Added multiring allgatherv, alltoall, reduce-scatter, and reduce-scatterv
  algorithms in CUDA TL
- Added topology-based ring construction in CUDA TL to maximize bandwidth
- Added NCCL gather and scatter collectives and their vector variants
- Enabled using multiple streams for collectives
- Added support for RCCL gather(v), scatter(v), broadcast, allgather(v),
  barrier, alltoall(v), and allreduce collectives
- Added ROCm memory component
- Adapted all GPU collectives to executor design
#### Tests
- Added tests for triggered collectives in perftests
- Fixed bugs in multi-threading tests
#### Utils
- Added CPU model and vendor detection
- Several bug fixes in all components
## 1.0.0 (April 19th, 2022)
### Features
#### API
- Added Avg reduce operation
- Added nonblocking team destroy option
- Added user-defined datatype definitions
- Added Bfloat16 type
- Clarified semantics of core abstractions, including teams and contexts
- Added timeout option
#### Core
- Added coll scoring and selection support
- Added support for Triggered collectives
- Added support for timeouts in collectives
- Added support for team create without ep in post
- Added support for multithreaded context progress
- Added support for nonblocking team destroy
#### CL
- Added support for hierarchical collectives
- Added support for hierarchical allreduce collective operation
- Added support for collectives based on one-sided communication routines
#### TL
- Added SHARP TL
##### UCP
- Added Bcast SAG algorithm for large messages
- Added Knomial based reduce algorithm
- Made allgather and alltoall agree with the API
- Added SRA knomial allreduce algorithm
- Added pairwise alltoall and alltoallv algorithms
- Added allgather and allgatherv ring algorithms
- Added support for collective operations based on one-sided semantics
- Added support for alltoall with one-sided transfer semantics
- Bug fixes
##### SHARP
- Added support for switch based hardware collectives (SHARP)
##### NCCL
- Added support for NCCL allreduce, alltoall, alltoallv, barrier, reduce,
  reduce-scatter, bcast, allgather, and allgatherv
#### Tests
- Updated tests to test the newly added algorithms and operations
## 0.1.0 (TBD)
### Features
#### API
- UCC API to support library, contexts, teams, collective operations, execution
engine, memory types, and triggered operations
#### Core
- Added implementation for UCC abstractions - library, context, team,
collective operations, execution engine, memory types, and triggered
operations
- Added support for memory types: CUDA and CPU
- Added support for configuring UCC library and contexts
#### CL
- Added support for collectives with source and destination buffers in either
  CPU or device (GPU) memory
- Added support for UCC_THREAD_MULTIPLE
- Added support for CUDA stream-based collectives
#### TL
- Added support for send/receive-based collectives using UCX/UCP as a transport
  layer
- Added support in the UCP TL for basic collective types, including barrier,
  alltoall, alltoallv, broadcast, allgather, allgatherv, and allreduce
- Added support for using NCCL as a transport layer
- Added support in the NCCL TL for collective types including alltoall,
  alltoallv, allgather, allgatherv, allreduce, and broadcast
#### Tests
- Added support for unit testing (gtest) infrastructure
- Added support for MPI tests