Best practice for training LLaMA models in Megatron-LM
Annotations of interesting ML papers I've read.
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Large-scale 4D parallelism pre-training for 🤗 Transformers Mixture of Experts models *(still a work in progress)*
Odysseus: Playground of LLM Sequence Parallelism
A LLaMA1/LLaMA2 Megatron implementation.
Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine
Megatron-LM/GPT-NeoX-compatible text encoder with 🤗 Transformers AutoTokenizer.
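For reference, loading a GPT-NeoX-style tokenizer through the 🤗 Transformers `AutoTokenizer` API looks roughly like the sketch below; the `EleutherAI/gpt-neox-20b` checkpoint is only an illustrative assumption, not necessarily the one this project targets.

```python
# Minimal sketch: round-tripping text through a GPT-NeoX tokenizer via AutoTokenizer.
# The checkpoint name is an illustrative assumption, not this repo's configuration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

ids = tokenizer("Megatron-LM compatible text encoding")["input_ids"]
print(ids)                    # token IDs suitable for a GPT-NeoX-style model
print(tokenizer.decode(ids))  # recovers the original string
```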
Run Large Language Models easily.
Minimal yet highly performant code for pretraining LLMs. Attempts to implement some SOTA features. Implements training via DeepSpeed, Megatron-LM, and FSDP. WIP
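As a rough illustration of the FSDP path mentioned above (not this repo's actual code), the sketch below wraps a toy model with PyTorch's `FullyShardedDataParallel`; the model, sizes, and optimizer settings are placeholder assumptions, and the script is meant to be launched with `torchrun` on GPU machines.

```python
# Minimal FSDP sketch, assuming a torchrun launch on one or more GPUs.
# The toy model and hyperparameters are placeholders; a real LLM would be
# sharded layer by layer with an auto_wrap_policy.
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK for us.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Toy stand-in for a transformer block stack.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    fsdp_model = FSDP(model)  # parameters are sharded across ranks
    optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)

    # One illustrative training step on random data.
    x = torch.randn(8, 1024, device="cuda")
    loss = fsdp_model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```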