forked from NVIDIA/NeMo
Commit
[SD] CUDA Graphs update (NVIDIA#8613)
* [SD] remove synchronizations
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Typo in logging
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [SD] Remove the sync invoked by tensor allocation.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Make the model sync-free again.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Support PyTorch Lightning 2 for full iteration CUDA graph callback.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Add documentation about CUDAGraphCallback.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Support synthetic dataset.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Fix typo.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Fix the bug of wrong GN groups.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* remove circular dependency
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Change naming for offline clip
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Add exception when no gradient allreduce is called.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* rename enable_amp_o2_fp16 -> unet_precision
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Adjustments to PyTorch 2.3
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* fix CUDA Graphs support in SD
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Document incompatibility between pipe parallelism and full iteration CUDA Graph callback for SD.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* update CUDA Graphs callback to PTL 2.1
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [SD] Full-fp16: push normalization layers in FP16.
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [SD] enable CUDA Graphs in examples
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [SD] add model warmup
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* fix sanity-check for CUDA Graphs
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [SD] CUDA Graphs test
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Update cuda graph jenkins test
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* fix typo
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* fix path in test
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* handle unexpected precision value for PipelineMixedPrecisionPlugin
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* remove unused import
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* replace unsupported syntax
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* typo
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* Add a guard for megatron fused adam
  Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix bugs for FSDP in clip_grads
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
* [SD] skip model warmup when CUDA Graph not captured
  Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>

---------

Signed-off-by: Marek Wawrzos <mwawrzos@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Marek Wawrzos <marek.28.93@gmail.com>
Co-authored-by: Szymon Mikler <smikler@nvidia.com>
Co-authored-by: Wil Kong <alpha0422@gmail.com>
Co-authored-by: Mengdi Wang <didow@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Mingyuan Ma <mingyuanm@nvidia.com>
1 parent 23baa48, commit c573826
Showing 15 changed files with 349 additions and 102 deletions.