Merge branch 'main' into export_wordlist_fix
JimmyZhang12 committed Apr 20, 2024
2 parents 9c54ba6 + 6533e48 commit 494e4a1
Showing 37 changed files with 3,052 additions and 270 deletions.
7 changes: 6 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -15,7 +15,12 @@ Add a one line overview of what this PR aims to accomplish.
```

# Jenkins CI
To run Jenkins, a NeMo User with write access must comment `jenkins` on the PR.

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

There's no need to comment `jenkins` on the PR to trigger Jenkins CI.
The GitHub Actions CI will run automatically when the PR is opened.
To run CI on an untrusted fork, a NeMo user with write access must click "Approve and run".

# Before your PR is "Ready for review"
**Pre checks**:
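The template change above documents the new flow: GitHub Actions CI starts automatically when a PR is opened, and a maintainer only needs to click "Approve and run" for untrusted forks. As a side note (not part of the template), contributors can follow the resulting checks from a terminal with the GitHub CLI; a small sketch, assuming `gh` is installed and authenticated, and using a hypothetical PR number:

```bash
# Hypothetical example, not from the repository: inspect the GitHub Actions
# checks attached to an already-opened pull request.
gh pr checks 1234                                # status of every check on PR #1234
gh run list --workflow=cicd-main.yml --limit 5   # recent runs of the main CI workflow
```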
198 changes: 194 additions & 4 deletions .github/workflows/cicd-main.yml
@@ -3690,6 +3690,75 @@ jobs:
uses: actions/checkout@v2
- run: |
python examples/nlp/language_modeling/megatron_retro_pretraining.py \
trainer.num_nodes=1 \
trainer.devices=2 \
trainer.precision=bf16 \
trainer.accelerator=gpu \
model.data.data_prefix=['none'] \
exp_manager.exp_dir=examples/nlp/language_modeling/mcore_retro_results \
model.mcore_gpt=True \
model.tensor_model_parallel_size=1 \
model.pipeline_model_parallel_size=1 \
model.optim.name=distributed_fused_adam \
model.retro.retro_project_dir=/home/TestData/nlp/megatron_retro/mcore_retro/micro-wiki-core \
model.data.num_workers=4 \
model.micro_batch_size=1 \
model.data.shuffle_documents=False \
trainer.val_check_interval=30 \
+trainer.num_sanity_val_steps=0 \
model.init_method_std=0.023 \
model.optim.lr=6.0e-4 \
model.megatron_amp_O2=True \
model.data.splits_string=\'\"98,2,0\"\' \
model.data.dataloader_type=cyclic \
trainer.max_steps=10
python examples/nlp/language_modeling/megatron_retro_pretraining.py \
trainer.num_nodes=1 \
trainer.devices=2 \
trainer.precision=bf16 \
trainer.accelerator=gpu \
model.data.data_prefix=['none'] \
exp_manager.exp_dir=examples/nlp/language_modeling/mcore_retro_results \
model.mcore_gpt=True \
model.tensor_model_parallel_size=1 \
model.pipeline_model_parallel_size=1 \
model.optim.name=distributed_fused_adam \
model.retro.retro_project_dir=/home/TestData/nlp/megatron_retro/mcore_retro/micro-wiki-core \
model.data.num_workers=4 \
model.micro_batch_size=1 \
model.data.shuffle_documents=False \
trainer.val_check_interval=30 \
+trainer.num_sanity_val_steps=0 \
model.init_method_std=0.023 \
model.optim.lr=6.0e-4 \
model.megatron_amp_O2=True \
model.data.splits_string=\'\"98,2,0\"\' \
model.data.dataloader_type=cyclic \
trainer.max_steps=20
rm -rf examples/nlp/language_modeling/mcore_retro_results
- uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
if: "failure()"

L2_Legacy_Megatron_RETRO_Pretraining_and_Resume_Training:
needs: [cicd-test-container-setup]
runs-on: self-hosted-azure
container:
image: nemoci.azurecr.io/nemo_container_${{ github.run_id }}
options:
# --user 0:128
--device=/dev/nvidia0
--gpus all
--shm-size=8g
--env TRANSFORMERS_OFFLINE=0
--env HYDRA_FULL_ERROR=1
--volume /mnt/datadrive/TestData:/home/TestData
steps:
- name: Checkout repository
uses: actions/checkout@v2
- run: |
python examples/nlp/language_modeling/megatron_retro_pretraining_legacy.py \
trainer.devices=2 \
trainer.num_nodes=1 \
trainer.accelerator=gpu \
@@ -3700,7 +3769,7 @@
trainer.precision=16 \
trainer.gradient_clip_val=1.0 \
trainer.val_check_interval=10 \
exp_manager.exp_dir=examples/nlp/language_modeling/retro_results \
exp_manager.exp_dir=examples/nlp/language_modeling/retro_legacy_results \
model.data.data_prefix= \
model.data.knn_index= \
model.data.retrieval_prefix= \
@@ -3720,7 +3789,7 @@
model.dec_cross_attention=[1] \
+model.data.mock=True
python examples/nlp/language_modeling/megatron_retro_pretraining.py \
python examples/nlp/language_modeling/megatron_retro_pretraining_legacy.py \
trainer.devices=2 \
trainer.num_nodes=1 \
trainer.accelerator=gpu \
@@ -3731,7 +3800,7 @@
trainer.precision=16 \
trainer.gradient_clip_val=1.0 \
trainer.val_check_interval=10 \
exp_manager.exp_dir=examples/nlp/language_modeling/retro_results \
exp_manager.exp_dir=examples/nlp/language_modeling/retro_legacy_results \
model.data.data_prefix= \
model.data.knn_index= \
model.data.retrieval_prefix= \
@@ -3751,7 +3820,7 @@
model.dec_cross_attention=[1] \
+model.data.mock=True
rm -rf examples/nlp/language_modeling/retro_results
rm -rf examples/nlp/language_modeling/retro_legacy_results
- uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
if: "failure()"

@@ -6096,3 +6165,124 @@ jobs:
- uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
if: "failure()"


Nemo_CICD_Test:
needs:
- L0_Unit_Tests_GPU
- L0_Unit_Tests_CPU
- L2_Community_LLM_Checkpoints_tests_Llama
- L2_Community_LLM_Checkpoints_tests_StarCoder
- L2_Community_LLM_Checkpoints_tests_Falcon
- L2_Community_LLM_Checkpoints_tests_Baichuan2
- ASR_dev_run_Speech_to_Text
- ASR_dev_run_Speech_to_Text_WPE_-_CitriNet
- ASR_dev_run_Speech_Pre-training_-_CitriNet
- ASR_dev_run_Speech_To_Text_Finetuning
- ASR_dev_run_Speech_To_Text_HF_Finetuning
- ASR_dev_run_Speech_to_Text_WPE_-_Conformer
- ASR_dev_run-part_two_Speech_to_Text_WPE_-_Squeezeformer
- L2_Speech_to_Text_EMA
- L2_Speaker_dev_run_Speaker_Recognition
- L2_Speaker_dev_run_Speaker_Diarization
- L2_Speaker_dev_run_Speech_to_Label
- L2_Speaker_dev_run_Speaker_Diarization_with_ASR_Inference
- L2_Speaker_dev_run_Clustering_Diarizer_Inference
- L2_Speaker_dev_run_Neural_Diarizer_Inference
- L2_Speaker_dev_run_Multispeaker_ASR_Data_Simulation
- L2_ASR_Multi-dataloader_dev_run_Speech_to_Text_multi-dataloader
- L2_ASR_Multi-dataloader_dev_run_Speech_to_Label_multi-dataloader
- L2_ASR_Adapters_Linear_Adapters
- L2_ASR_Adapters_RelPos_MHA_Adapters
- L2_Speech_Transcription_Speech_to_Text_Transcribe
- L2_Transducer_alignment_Running_pytest
- L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav
- L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Ru_QN_with_mp3
- L2_G2P_Models_G2P_Conformer_training_evaluation_and_inference
- L2_G2P_Models_HeteronymClassificationModel_training_evaluation_and_inference
- L2_Dialogue_Classification_Intent_and_slot_classification_using_SGDQA
- L2_Dialogue_Classification_Intent_and_slot_classification_using_IntentSlotClassificationModel
- L2_Dialogue_Classification_Intent_classification_using_ZeroShotIntentModel
- L2_Dialogue_Classification_Design_Intent_classification_using_ZeroShotIntentModel
- L2_Dialogue_Classification_Design_Intent_classification_using_ZeroShotIntentModel_BART_Classifier
- L2_Dialogue_Classification_Design_Intent_classification_using_DialogueNearestNeighbourModel
- L2_Dialogue_Generation_Dialogue_Answer_Extender_using_DialogueS2SGenerationModel
- L2_Dialogue_Generation_Dialogue_SGD_Based_Answer_Extender_using_DialogueS2SGenerationModel
- L2_COPY_Dialogue_Answer_Extender_using_DialogueGPTGenerationModel
- L2_Duplex_Text_Normalization_with_Tarred_dataset
- L2_BERT_Text_Classification_with_BERT_Test
- L2_Parallel_BERT_Question-Answering_SQUAD_v1_1
- L2_Parallel_BERT_Question-Answering_SQUAD_v2_0
- L2_Parallel_BART_Question-Answering_SQUAD_v1_1
- L2_Parallel_BART_Question-Answering_SQUAD_v2_0
- L2_Parallel_GPT2_Question-Answering_SQUAD_v1_1
- L2_Parallel_GPT2_Question-Answering_SQUAD_v2_0
- L2_Intent_and_Slot_Classification_Tasks_Intent_and_Slot_Classification
- L2_Intent_and_Slot_Classification_Tasks_Multi-Label_Intent_and_Slot_Classification
- L2_Parallel_NLP_Examples2_NER_finetuning_from_pretrained_Test
- L2_Parallel_NLP_Examples2_Punctuation_and_capitalization_finetuning_from_pretrained_test
- L2_Parallel_NLP_Examples2_NER_with_TurkuNLP__bert-base-finnish-cased-v1
- L2_Parallel_NLP_Examples2_Evaluation_script_for_Token_Classification
- L2_Parallel_NLP_Examples2_Evaluation_script_for_Punctuation
- L2_Parallel_NLP_Examples2_Punctuation_Capitalization_2GPUs_with_DistilBERT_Finetuning_on_other_data
- Punctuation_Capitalization_tarred_dataset_create_and_use_tarred_dataset
- Punctuation_Capitalization_Using_model-common_datasets_parameters-label_vocab_dir
- Punctuation_Capitalization_inference_Restore_punctuation_and_capitalization_in_long_text
- L2_Pretraining_BERT_pretraining_from_Text
- L2_Pretraining_BERT_from_Preprocessed
- L2_Entity_Linking_Self_Alignment_Pretraining_BERT
- L2_NMT_Attention_is_All_You_Need_Training_NMT_Training_Post-LN
- L2_NMT_Attention_is_All_You_Need_Training_NMT_Training_Pre-LN
- L2_NMT_Attention_is_All_You_Need_Training_NMT_Multi-Validation
- L2_NMT_Attention_is_All_You_Need_Inference
- L2_NMT_Attention_is_All_You_Need_Finetuning
- L2_NMT_Tarred_Dataset_Creation_Auto_Tarred_Dataset_Creation
- L2_NMT_Tarred_Dataset_Creation_Script_Tarred_Dataset_Creation
- L2_Megatron_NMT_Training_TP2
- L2_Megatron_BART_Perceiver_MIM_Training_TP2
- L2_Megatron_Bert_Pretraining_and_Resume_Training_with_Pipeline_Parallelism
- L2_Megatron_Bert_Pretraining_and_Resume_Training
- L2_Megatron_Core_Bert_Pretraining_and_Resume_Training
- L2_Legacy_Megatron_RETRO_Pretraining_and_Resume_Training
- L2_Megatron_RETRO_Pretraining_and_Resume_Training
- L2_BioMegatron_Bert_NER_Task
- L2_Megatron_GPT_Pretraining_and_Resume_Training_TP2
- L2_Megatron_GPT_with_Rope_Pretraining_and_Resume_Training_TP2
- L2_Megatron_GPT_with_ALiBi_Pretraining_and_Resume_Training_TP2
- L2_Megatron_GPT_with_KERPLE_Pretraining_and_Resume_Training_TP2
- L2_Megatron_GPT_Pretraining_and_Resume_Training_PP2
- L2_Megatron_GPT_Finetuning_PP2
- L2_Megatron_GPT_Finetuning_StarCoder_PP1
- L2_Megatron_GPT_PEFT_Lora_PP2
- L2_Megatron_GPT_PEFT_Lora_TP2
- L2_Megatron_GPT_Eval
- L2_Megatron_GPT_Eval_PP2
- L2_Megatron_GPT_SFT_Eval_inference_seq_len_greaterThan_training_seq_len
- L2_Megatron_Change_Partitions_Reduce_TP_Num_Partitions_-2_to_1-_and_PP_Num_Partitions_-1_to_2
- L2_Megatron_Change_Partitions_Increase_TP_Num_Partitions_-2_to_4-_and_PP_Num_Partitions_-1_to_2
- L2_Megatron_T5_Pretraining_and_Resume_Training_TP2
- L2_Megatron_T5_with_ALiBi_Pretraining_and_Resume_Training_TP2
- L2_Megatron_T5_with_KERPLE_Pretraining_and_Resume_Training_TP2
- L2_Megatron_T5_Pretraining_and_Resume_Training_PP2
- L2_Megatron_T5_w_Mixture_of_Expert_Pretraining
- L2_Megatron_UL2_Pretraining_and_Resume_Training_TP2
- L2_Megatron_T5_Eval
- L2_Megatron_BART_Pretraining_and_Resume_Training_TP2
- L2_Megatron_BART_Pretraining_and_Resume_Training_PP2
- L2_Megatron_T5_GLUE_RTE
- L2_Megatron_T5_GLUE_XNLI
- L2_Megatron_T5_PEFT_Lora_TP2
- L2_Megatron_Mock_Data_Generation_MockGPTDataset
- L2_Megatron_Mock_Data_Generation_MockT5Dataset
- L2_TTS_Fast_dev_runs_1_Tacotron_2
- L2_TTS_Fast_dev_runs_1_WaveGlow
- L2_TTS_Fast_dev_runs_1_FastPitch
- L2_TTS_Fast_dev_runs_1_RADTTS
- L2_TTS_Fast_dev_runs_1_Mixer-TTS
- L2_TTS_Fast_dev_runs_1_Hifigan
- Speech_Checkpoints_tests

runs-on: ubuntu-latest
steps:
# This should depend on all the tests so we block/unblock based on all tests passing
- run: exit 0
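Nemo_CICD_Test is a gate job: it needs every test job listed above and then simply exits 0, so branch protection can require this single check instead of tracking each job individually. As an aside (not part of the workflow), the gate's outcome for a given run can be queried with the GitHub CLI; a sketch, assuming `gh` is installed and authenticated:

```bash
# Hypothetical query, not from the repository: report the conclusion of the
# Nemo_CICD_Test gate job for the most recent cicd-main.yml run on main.
RUN_ID=$(gh run list --workflow=cicd-main.yml --branch main --limit 1 \
           --json databaseId --jq '.[0].databaseId')
gh run view "$RUN_ID" --json jobs \
  --jq '.jobs[] | select(.name == "Nemo_CICD_Test") | .conclusion'
```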

9 changes: 5 additions & 4 deletions Jenkinsfile
@@ -9,6 +9,7 @@ pipeline {
environment {
NVTE_FUSED_ATTN = 0
NVTE_FLASH_ATTN = 0
PYTHONPATH = "/mnt/D3/JenkinsWorkDir/workspace/NeMo-multibranch_${GIT_BRANCH}/Megatron-LM"
}

options {
@@ -70,7 +71,7 @@ pipeline {
git fetch origin bfe21c3d68b0a9951e5716fb520045db53419c5e && \
git checkout FETCH_HEAD && \
git submodule init && git submodule update && \
NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .'
NVTE_FRAMEWORK=pytorch pip install .'
}
}

@@ -91,7 +92,6 @@ pipeline {
pip install . && \
cd megatron/core/datasets && \
make'
sh 'export PYTHONPATH="${PYTHONPATH}:/mnt/D3/JenkinsWorkDir/workspace/NeMo-multibranch_${GIT_BRANCH}/Megatron-LM"'
}
}
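This hunk removes the per-stage `sh 'export PYTHONPATH=...'`; the same variable is now set once in the pipeline-level environment block added in the first hunk of this file. A likely reason, though the diff itself does not say so, is that every `sh` step runs in its own shell, so an export made in one step never reaches later steps. The snippet below illustrates the same scoping with plain subshells (paths illustrative):

```bash
# Illustration only: a variable exported inside one subshell (analogous to a
# single Jenkins `sh` step) is not visible in the next subshell.
bash -c 'export PYTHONPATH=/workspace/Megatron-LM; echo "inside:    $PYTHONPATH"'
bash -c 'echo "next step: ${PYTHONPATH:-<unset>}"'   # prints <unset>
```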

@@ -183,13 +183,14 @@ pipeline {
steps {
sh "rm -rf /home/TestData/multimodal/stable_diffusion_train"
sh "python examples/multimodal/text_to_image/stable_diffusion/sd_train.py \
trainer.precision=16 \
trainer.precision=bf16 \
trainer.num_nodes=1 \
trainer.devices=1 \
++exp_manager.max_time_per_run=00:00:03:00 \
trainer.max_steps=20 \
model.micro_batch_size=1 \
model.global_batch_size=1 \
model.optim.name=megatron_fused_adam \
model.data.synthetic_data=True \
exp_manager.exp_dir=/home/TestData/multimodal/stable_diffusion_train \
model.inductor=False \
@@ -220,7 +221,7 @@ pipeline {
steps {
sh "rm -rf /home/TestData/multimodal/stable_diffusion_train_with_cuda_graphs"
sh "python examples/multimodal/text_to_image/stable_diffusion/sd_train.py \
trainer.precision=16 \
trainer.precision=bf16 \
trainer.num_nodes=1 \
trainer.devices=1 \
++exp_manager.max_time_per_run=00:00:03:00 \
42 changes: 34 additions & 8 deletions README.rst
@@ -41,17 +41,43 @@
Latest News
-----------

- 2023/12/06 `New NVIDIA NeMo Framework Features and NVIDIA H200 <https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility/>`_

.. image:: https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png
   :target: https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility
   :alt: H200-NeMo-performance
   :width: 600

NeMo Framework has been updated with state-of-the-art features,
such as FSDP, Mixture-of-Experts, and RLHF with TensorRT-LLM to provide speedups up to 4.2x for Llama-2 pre-training on H200.
**All of these features will be available in an upcoming release.**

.. raw:: html

<details open>
<summary><b>Large Language Models and Multimodal</b></summary>
<details>
<summary><a href="https://cloud.google.com/blog/products/compute/gke-and-nvidia-nemo-framework-to-train-generative-ai-models">Accelerate your generative AI journey with NVIDIA NeMo framework on GKE</a> (2024/03/16) </summary>

An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.
<br><br>
</details>

<details>
<summary><a href="https://blogs.nvidia.com/blog/bria-builds-responsible-generative-ai-using-nemo-picasso/">Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso</a> (2024/03/06) </summary>

Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference.
<br><br>
</details>

<details>
<summary><a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility/">New NVIDIA NeMo Framework Features and NVIDIA H200</a> (2023/12/06) </summary>

NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.
<br><br>
<a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility"><img src="https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png" alt="H200-NeMo-performance" style="width: 600px;"></a>
<br><br>
</details>

<details>
<summary><a href="https://blogs.nvidia.com/blog/nemo-amazon-titan/">NVIDIA now powers training for Amazon Titan Foundation models</a> (2023/11/28) </summary>

NVIDIA NeMo framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs.
<br><br>
</details>

</details>




Introduction
