feat: Update NCCL, CUDA, cuDNN, and HPC-X #31

Merged
wbrown merged 1 commit into master from eta/nccl-cuda-cudnn-updates on Mar 5, 2024

Conversation

Eta0 (Collaborator) commented Mar 5, 2024

Many Updates!

This change updates the following components:

NCCL

NCCL is updated to version 2.20.3-1 for supported CUDA & OS versions, which are:

  • CUDA 12.2 & 12.3 with Ubuntu 20.04
  • CUDA 12.3 only with Ubuntu 22.04
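The support list above can be sketched as a small lookup table. This is only an illustration of the combinations named in this description; the names `NCCL_2_20_3_TARGETS` and `gets_nccl_2_20_3` are hypothetical and do not come from the repository's build configuration.

```python
# Hypothetical lookup table; the names here are illustrative only.
# (CUDA major.minor, Ubuntu release) pairs that receive NCCL 2.20.3-1,
# transcribed from the list in this PR description.
NCCL_2_20_3_TARGETS = {
    ("12.2", "20.04"),
    ("12.3", "20.04"),
    ("12.3", "22.04"),
}


def gets_nccl_2_20_3(cuda_version: str, ubuntu_version: str) -> bool:
    """Return True if this CUDA/Ubuntu pair is updated to NCCL 2.20.3-1."""
    # Compare on major.minor only, so "12.3.2" matches the "12.3" entry.
    cuda_minor = ".".join(cuda_version.split(".")[:2])
    return (cuda_minor, ubuntu_version) in NCCL_2_20_3_TARGETS


print(gets_nccl_2_20_3("12.3.2", "22.04"))  # True
print(gets_nccl_2_20_3("12.2.2", "22.04"))  # False
```

Note that CUDA 12.2 on Ubuntu 22.04 is deliberately absent, matching the "CUDA 12.3 only with Ubuntu 22.04" bullet.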

CUDA

The CUDA 12.3 releases are bumped to the 12.3.2 patch release. The CUDA 12.3 × Ubuntu 20.04 build, which had been commented out because of issues that should now be fixed, is re-enabled.

cuDNN

The CUDA 12.3 releases now use the newest version of cuDNN: cuDNN 9. Among the nvidia/cuda base images, cuDNN 9 is available only for CUDA 12.3, and it is the only cuDNN version available for CUDA 12.3.

As noted in the cuDNN 9 release notes:

This is the first major version bump of cuDNN in almost 4 years. There are some exciting new features and also some changes that may be disruptive to current applications built against prior versions of cuDNN.

Downstream compatibility issues may therefore need to be worked out (for example, with the PyTorch builds in coreweave/ml-containers), but this should be no more disruptive than the previous state of these images, which included no cuDNN distribution at all.
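As a rough sketch of the cuDNN selection described in this section, the rule "cuDNN 9 for CUDA 12.3" can be written as a small function. Two things here are assumptions rather than statements from this description: that every other CUDA release uses cuDNN 8 (inferred from the image tags the build bot reports in this conversation), and that the base image tag follows the `<cuda>-cudnn<N>-devel-ubuntu<release>` layout seen in those tags. The function names are hypothetical.

```python
def cudnn_major_for(cuda_version: str) -> int:
    """Pick the cuDNN major version for a CUDA release such as "12.3.2"."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    # cuDNN 9 applies to CUDA 12.3 only; cuDNN 8 for everything else is an
    # assumption inferred from the image tags reported in this PR.
    return 9 if (major, minor) == (12, 3) else 8


def base_image_tag(cuda_version: str, ubuntu_version: str) -> str:
    """Assemble a tag in the assumed <cuda>-cudnn<N>-devel-ubuntu<release> layout."""
    cudnn = cudnn_major_for(cuda_version)
    return f"{cuda_version}-cudnn{cudnn}-devel-ubuntu{ubuntu_version}"


print(base_image_tag("12.3.2", "22.04"))  # 12.3.2-cudnn9-devel-ubuntu22.04
print(base_image_tag("12.2.2", "20.04"))  # 12.2.2-cudnn8-devel-ubuntu20.04
```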

HPC-X

The HPC-X distribution is updated from 2.16 & 2.17 to 2.18 on all CUDA 12 releases.

Eta0 added the enhancement label (New feature or request) Mar 5, 2024
Eta0 requested review from wbrown and salanki March 5, 2024 19:25
Eta0 self-assigned this Mar 5, 2024

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762301
Image: ghcr.io/coreweave/nccl-tests:12.0.1-cudnn8-devel-ubuntu22.04-nccl2.18.5-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762301
Image: ghcr.io/coreweave/nccl-tests:12.2.2-cudnn8-devel-ubuntu22.04-nccl2.19.3-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762301
Image: ghcr.io/coreweave/nccl-tests:12.1.1-cudnn8-devel-ubuntu22.04-nccl2.18.3-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762301
Image: ghcr.io/coreweave/nccl-tests:12.3.2-cudnn9-devel-ubuntu22.04-nccl2.20.3-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762300
Image: ghcr.io/coreweave/nccl-tests:12.0.1-cudnn8-devel-ubuntu20.04-nccl2.19.3-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762300
Image: ghcr.io/coreweave/nccl-tests:11.8.0-cudnn8-devel-ubuntu20.04-nccl2.16.5-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762300
Image: ghcr.io/coreweave/nccl-tests:12.1.1-cudnn8-devel-ubuntu20.04-nccl2.18.3-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762300
Image: ghcr.io/coreweave/nccl-tests:12.2.2-cudnn8-devel-ubuntu20.04-nccl2.20.3-1-8e2e3f3

github-actions bot commented Mar 5, 2024

@Eta0 Build complete, success: https://github.com/coreweave/nccl-tests/actions/runs/8161762300
Image: ghcr.io/coreweave/nccl-tests:12.3.2-cudnn9-devel-ubuntu20.04-nccl2.20.3-1-8e2e3f3
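The image tags reported above pack several versions into one string. A minimal sketch for splitting such a tag into its fields follows, assuming the layout `<cuda>-cudnn<N>-devel-ubuntu<release>-nccl<version>-<suffix>` that the nine tags in this conversation share. The pattern and the function name `parse_image` are hypothetical, and treating the trailing field as a short commit hash is an assumption.

```python
import re

# Assumed tag layout, inferred from the tags posted in this conversation:
#   <cuda>-cudnn<N>-devel-ubuntu<release>-nccl<version>-<suffix>
TAG_PATTERN = re.compile(
    r"^(?P<cuda>\d+\.\d+\.\d+)"
    r"-cudnn(?P<cudnn>\d+)"
    r"-devel"
    r"-ubuntu(?P<ubuntu>\d+\.\d+)"
    r"-nccl(?P<nccl>\d+\.\d+\.\d+-\d+)"
    r"-(?P<commit>[0-9a-f]+)$"  # assumed to be a short commit hash
)


def parse_image(image: str) -> dict:
    """Split an image reference into its version fields."""
    # Everything after the last colon is the tag.
    _repo, _, tag = image.rpartition(":")
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag layout: {tag!r}")
    return match.groupdict()


fields = parse_image(
    "ghcr.io/coreweave/nccl-tests:"
    "12.3.2-cudnn9-devel-ubuntu20.04-nccl2.20.3-1-8e2e3f3"
)
print(fields["cuda"], fields["cudnn"], fields["ubuntu"], fields["nccl"])
# 12.3.2 9 20.04 2.20.3-1
```

Parsing all nine tags this way makes it easy to check them against the PR description, for instance that only the CUDA 12.3.2 images carry `cudnn9`.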

wbrown merged commit 868dc3d into master Mar 5, 2024
9 checks passed
Eta0 deleted the eta/nccl-cuda-cudnn-updates branch March 5, 2024 22:48
Labels
enhancement New feature or request
3 participants