Update cudagraphs.md
Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
JimmyZhang12 authored Sep 17, 2024
1 parent 2daa49f commit ae98997
Showing 1 changed file with 5 additions and 6 deletions.
11 changes: 5 additions & 6 deletions docs/source/performance/cudagraphs.md
@@ -8,20 +8,19 @@ Currently, cudagraphs increases the memory usage of activations by ~20%. End to

### When to use
Cudagraphs are recommended when host overheads are expected to be significant, for instance on systems with weak single-threaded CPU performance.
As a demonstration, we show that with a GPT3 20B model, cudagraphs can improve performance by 11.4%. The increase in memory usage due to cudagraphs was observed to be 15GB per GPU.
| Setting | TFLOP/s |
| ----- | ------ |
| No Cudagraphs | 750 |
| Cudagraphs | 836 |
- Setup: 16x GH200, micro batch size (MBS) 2, tensor parallel size (TP) 4, FP8 precision

### Usage
As of the NeMo 24.09 release, cudagraphs are supported only for pretraining dense models.
To enable them, add the following configs:
`model.enable_cuda_graph=True`

`model.use_te_rng_tracker=True`

`model.get_attention_mask_from_fusion=True`
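The three settings are dotted config keys under the `model` section; NeMo's Hydra-based training scripts typically accept them as `key=value` command-line overrides. The sketch below is a stdlib-only illustration, not NeMo code: `apply_overrides` and `cuda_graph_flags_enabled` are hypothetical helpers that only mimic where each dotted override lands in a nested config, plus a check that all three flags are enabled together.

```python
# Hypothetical helpers (not part of NeMo): illustrate how Hydra-style
# "a.b=value" overrides map onto a nested config dictionary.
from typing import Any, Dict, List

# Flags listed in the Usage section; all three are set together.
CUDA_GRAPH_FLAGS = (
    "enable_cuda_graph",
    "use_te_rng_tracker",
    "get_attention_mask_from_fusion",
)
CUDA_GRAPH_OVERRIDES = [f"model.{flag}=True" for flag in CUDA_GRAPH_FLAGS]


def parse_value(raw: str) -> Any:
    """Turn 'True'/'False' into booleans; keep anything else as a string."""
    text = raw.strip()
    if text.lower() == "true":
        return True
    if text.lower() == "false":
        return False
    return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Set each dotted key in `config`, creating missing sections as needed."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.strip().split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config


def cuda_graph_flags_enabled(config: Dict[str, Any]) -> bool:
    """Return True only if every cudagraph-related model flag is True."""
    model = config.get("model", {})
    return all(model.get(flag) is True for flag in CUDA_GRAPH_FLAGS)


if __name__ == "__main__":
    cfg = apply_overrides({"model": {"micro_batch_size": 2}}, CUDA_GRAPH_OVERRIDES)
    print(cfg)
    print(cuda_graph_flags_enabled(cfg))
```

Checking all three flags at once reflects the Usage section, which lists them as a set; enabling only `model.enable_cuda_graph` would fail this hypothetical check.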
