Releases · hpcaitech/ColossalAI
Version v0.3.6 Release Today!
What's Changed
Release
- [release] update version (#5411) by Hongxin Liu
Colossal-llama2
- [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong
Hotfix
- [hotfix] fix stable diffusion inference bug. (#5289) by Youngon
- [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) by digger yu
- [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) by digger yu
- [hotfix] fix typo change _descrption to _description (#5331) by digger yu
- [hotfix] fix typo of openmoe model source (#5403) by Luo Yihang
- [hotfix] fix sd vit import error (#5420) by MickeyCHAN
- [hotfix] Fix wrong import in meta_registry (#5392) by Stephan Kölker
- [hotfix] fix variable type for top_p (#5313) by CZYCW
Doc
- [doc] Fix typo s/infered/inferred/ (#5288) by hugo-syn
- [doc] update some translations with README-zh-Hans.md (#5382) by digger yu
- [doc] sora release (#5425) by binmakeswell
- [doc] fix blog link by binmakeswell
- [doc] fix blog link by binmakeswell
- [doc] updated installation command (#5389) by Frank Lee
- [doc] Fix typo (#5361) by yixiaoer
Eval-hotfix
- [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) by Dongruixuan Li
Devops
- [devops] fix extension building (#5427) by Hongxin Liu
Example
- [example]add gpt2 benchmark example script. (#5295) by flybird11111
- [example] reuse flash attn patch (#5400) by Hongxin Liu
Workflow
Shardformer
- [shardformer]gather llama logits (#5398) by flybird11111
Setup
Fsdp
Extension
- [extension] hotfix jit extension setup (#5402) by Hongxin Liu
Llama
- [llama] fix training and inference scripts (#5384) by Hongxin Liu
Full Changelog: v0.3.6...v0.3.5
Version v0.3.5 Release Today!
What's Changed
Release
- [release] update version (#5380) by Hongxin Liu
Llama
- Merge pull request #5377 from hpcaitech/example/llama-npu by Frank Lee
- [llama] fix memory issue (#5371) by Hongxin Liu
- [llama] polish training script and fix optim ckpt (#5368) by Hongxin Liu
- [llama] fix neftune & pbar with start_step (#5364) by Camille Zhong
- [llama] add flash attn patch for npu (#5362) by Hongxin Liu
- [llama] update training script (#5360) by Hongxin Liu
- [llama] fix dataloader for hybrid parallel (#5358) by Hongxin Liu
Moe
- [moe] fix tests by ver217
- [moe] fix mixtral optim checkpoint (#5344) by Hongxin Liu
- [moe] fix mixtral forward default value (#5329) by Hongxin Liu
- [moe] fix mixtral checkpoint io (#5314) by Hongxin Liu
- [moe] support mixtral (#5309) by Hongxin Liu
- [moe] update capacity computing (#5253) by Hongxin Liu
- [moe] init mixtral impl by Xuanlei Zhao
- [moe]: fix ep/tp tests, add hierarchical all2all (#4982) by Wenhao Chen
- [moe] support optimizer checkpoint (#5015) by Xuanlei Zhao
- [moe] merge moe into main (#4978) by Xuanlei Zhao
Lr-scheduler
- [lr-scheduler] fix load state dict and add test (#5369) by Hongxin Liu
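The entry above fixes restoring an LR scheduler from a checkpoint. As a plain-PyTorch reminder of what that state round trip involves (no ColossalAI-specific names assumed, values are placeholders):

```python
import torch

# Toy model/optimizer/scheduler purely for illustration.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

# Save scheduler state alongside the optimizer state.
ckpt = {"optimizer": optimizer.state_dict(), "scheduler": scheduler.state_dict()}
torch.save(ckpt, "ckpt.pt")

# Restore: the scheduler resumes from the saved step count instead of restarting.
state = torch.load("ckpt.pt")
optimizer.load_state_dict(state["optimizer"])
scheduler.load_state_dict(state["scheduler"])
```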
Eval
- [eval] update llama npu eval (#5366) by Camille Zhong
Gemini
- [gemini] fix param op hook when output is tuple (#5355) by Hongxin Liu
- [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) by flybird11111
- [gemini]fix gemini optimizer, saving Shardformer in Gemini got list assignment index out of range (#5085) by flybird11111
- [gemini] gemini support extra-dp (#5043) by flybird11111
- [gemini] gemini support tensor parallelism. (#4942) by flybird11111
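Several Gemini entries above (tensor parallelism, extra DP, optimizer fixes) land in the GeminiPlugin used through the booster API. A minimal setup sketch, assuming the booster/plugin interface documented for the 0.3.x line; only default arguments are shown to avoid guessing new parameter names:

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch(config={})  # torchrun-style launch; empty config per the 0.3.x docs

model = torch.nn.Linear(1024, 1024)
optimizer = HybridAdam(model.parameters(), lr=1e-3)  # Gemini is normally paired with HybridAdam
criterion = torch.nn.MSELoss()

# Defaults only: chunk-based parameter management with ZeRO-style optimizer sharding.
booster = Booster(plugin=GeminiPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
# Training then follows the usual booster pattern: forward, booster.backward(loss, optimizer),
# optimizer.step(), optimizer.zero_grad().
```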
Fix
- [fix] remove unnecessary dp_size assert (#5351) by Wenhao Chen
Checkpointio
- [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
Chat
Extension
Doc
- [doc] added docs for extensions (#5324) by Frank Lee
- [doc] add llama2-13B display (#5285) by Desperado-Jia
- [doc] fix doc typo (#5256) by binmakeswell
- [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) by digger yu
- [doc] SwiftInfer release (#5236) by binmakeswell
- [doc] add Colossal-LLaMA-2-13B (#5234) by binmakeswell
- [doc] Make leaderboard format more uniform and good-looking (#5231) by JIMMY ZHAO
- [doc] Update README.md of Colossal-LLAMA2 (#5233) by Camille Zhong
- [doc] Update required third-party library list for testing and torch compatibility checking (#5207) by Zhongkai Zhao
- [doc] update pytorch version in documents. (#5177) by flybird11111
- [doc] fix colossalqa document (#5146) by Michelle
- [doc] updated paper citation (#5131) by Frank Lee
- [doc] add moe news (#5128) by binmakeswell
Tests
- [tests] fix t5 test. (#5322) by flybird11111
Accelerator
- Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api by Frank Lee
- [accelerator] fixed npu api by FrankLeeeee
- [accelerator] init the accelerator module (#5129) by Frank Lee
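The accelerator entries above add a backend-agnostic device abstraction (also used by the `[npu]` changes in this release). A small sketch of what device-agnostic user code might look like; the `get_accelerator` import path and method names are assumptions based on the new module and should be checked against this release:

```python
import torch
from colossalai.accelerator import get_accelerator  # assumed import path for the new module

accelerator = get_accelerator()            # picks the CUDA or NPU backend at runtime
device = accelerator.get_current_device()  # device handle without hard-coding "cuda"

x = torch.randn(16, 16, device=device)
y = x @ x
accelerator.synchronize()                  # backend-agnostic sync instead of torch.cuda.synchronize()
```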
Workflow
- [workflow] updated CI image (#5318) by Frank Lee
- [workflow] fixed oom tests (#5275) by Frank Lee
- [workflow] fixed incomplete bash command (#5272) by Frank Lee
- [workflow] fixed build CI (#5240) by Frank Lee
Feat
Nfc
- [NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) by 李文军
- [nfc] fix typo colossalai/shardformer/ (#5133) by digger yu
- [nfc] fix typo change directoty to directory (#5111) by digger yu
- [nfc] fix typo and author name (#5089) by digger yu
- [nfc] fix typo in docs/ (#4972) by digger yu
Hotfix
- [hotfix] fix 3d plugin test (#5292) by Hongxin Liu
- [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) by Zhongkai Zhao
- [hotfix]: add pp sanity check and fix mbs arg (#5268) by Wenhao Chen
- [hotfix] removed unused flag (#5242) by Frank Lee
- [hotfix] fixed memory usage of shardformer module replacement (#5122) by アマデウス
- [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) by Zhongkai Zhao
- [hotfix]: modify create_ep_hierarchical_group and add test (#5032) by Wenhao Chen
- [hotfix] Suport extra_kwargs in ShardConfig (#5031) by Zhongkai Zhao
- [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) by littsk
- [hotfix] fix grad accumulation plus clipping for gemini (#5002) by Baizhou Zhang
Sync
Shardformer
- [shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
- [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
- [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
- [shardformer]fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
- [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
- [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao
Ci
- [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) by flybird11111
- [ci] fix shardformer tests. (#5255) by flybird11111
- [ci] fixed ddp test (#5254) by Frank Lee
- [ci] fixed booster test (#5251) by Frank Lee
Npu
- [npu] change device to accelerator api (#5239) by Hongxin Liu
- [npu] use extension for op builder (#5172) by Xuanlei Zhao
- [npu] support triangle attention for llama (#5130) by Xuanlei Zhao
- [npu] add npu support for hybrid plugin and llama (#5090) by Xuanlei Zhao
- [npu] add npu support for gemini and zero (#5067) by Hongxin Liu
Pipeline
- [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
- [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
- [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
- [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
Format
- [format] applied code formatting on changed files in pull request 5234 (#5235) by github-actions[bot]
Version v0.3.4 Release Today!
What's Changed
Release
- [release] update version (#4995) by Hongxin Liu
Pipeline inference
- [Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
- [Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
- [Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia
Doc
- [doc] add supported feature diagram for hybrid parallel plugin (#4996) by ppt0011
- [doc]Update doc for colossal-inference (#4989) by Cuiqing Li (李崔卿)
- Merge pull request #4889 from ppt0011/main by ppt0011
- [doc] add reminder for issue encountered with hybrid adam by ppt0011
- [doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) by flybird11111
- Merge pull request #4858 from Shawlleyw/main by ppt0011
- [doc] update slack link (#4823) by binmakeswell
- [doc] add lazy init docs (#4808) by Hongxin Liu
- Merge pull request #4805 from TongLi3701/docs/fix by Desperado-Jia
- [doc] polish shardformer doc (#4779) by Baizhou Zhang
- [doc] add llama2 domain-specific solution news (#4789) by binmakeswell
Hotfix
- [hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
- [hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
- [hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
- [hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
- [hotfix] fix bug in sequence parallel test (#4887) by littsk
- [hotfix] Correct several erroneous code comments (#4794) by littsk
- [hotfix] fix norm type error in zero optimizer (#4795) by littsk
- [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing
Kernels
- [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) by Cuiqing Li
Inference
- [Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai
- [Inference]ADD Bench Chatglm2 script (#4963) by Jianghai
- [inference] add reference and fix some bugs (#4937) by Xu Kai
- [inference] Add smmoothquant for llama (#4904) by Xu Kai
- [inference] add llama2 support (#4898) by Xu Kai
- [inference]fix import bug and delete down useless init (#4830) by Jianghai
Test
- [test] merge old components to test to model zoo (#4945) by Hongxin Liu
- [test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
- Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero by ppt0011
- [test] modify model supporting part of low_level_zero plugin (including corresponding docs) by Zhongkai Zhao
Refactor
- [Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li
Nfc
- [nfc] fix some typo with colossalai/ docs/ etc. (#4920) by digger yu
- [nfc] fix minor typo in README (#4846) by Blagoy Simandoff
- [NFC] polish code style (#4799) by Camille Zhong
- [NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style (#4792) by Michelle
Format
- [format] applied code formatting on changed files in pull request 4820 (#4886) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4908 (#4918) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4595 (#4602) by github-actions[bot]
Gemini
- [gemini] support gradient accumulation (#4869) by Baizhou Zhang
- [gemini] support amp o3 for gemini (#4872) by Hongxin Liu
Kernel
- [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu
Feature
- [feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
- [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk
- [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen
Checkpointio
- [checkpointio] hotfix torch 2.0 compatibility (#4824) by Hongxin Liu
- [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) by Baizhou Zhang
Infer
- [infer] fix test bug (#4838) by Xu Kai
- [Infer] Serving example w/ ray-serve (multiple GPU case) (#4841) by Yuanheng Zhao
- [Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) (#4771) by Yuanheng Zhao
Chat
- [chat] fix gemini strategy (#4698) by flybird11111
Misc
Lazy
- [lazy] support from_pretrained (#4801) by Hongxin Liu
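The lazy-init entry above extends lazy initialization to `from_pretrained`. A minimal sketch, assuming the documented `LazyInitContext` context manager; the Hugging Face model name is only an example, and materialization of the lazy parameters is normally left to a plugin via `booster.boost`:

```python
from colossalai.lazy import LazyInitContext  # documented lazy-init context; treat the path as an assumption here
from transformers import AutoModelForCausalLM

# Build the model without allocating real weights: parameters stay as lazy tensors
# until a plugin/booster materializes (and possibly shards) them.
with LazyInitContext():
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example checkpoint
```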
Fix
- [fix] fix weekly runing example (#4787) by flybird11111
Full Changelog: v0.3.4...v0.3.3
Version v0.3.3 Release Today!
What's Changed
Release
- [release] update version (#4775) by Hongxin Liu
Inference
Feature
- [feature] add gptq for inference (#4754) by Xu Kai
- [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li
Bug
- [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
- [bug] fix get_default_parser in examples (#4764) by Baizhou Zhang
Lazy
- [lazy] support torch 2.0 (#4763) by Hongxin Liu
Chat
- [chat]: add lora merge weights config (#4766) by Wenhao Chen
- [chat]: update rm, add wandb and fix bugs (#4471) by Wenhao Chen
Doc
- [doc] add shardformer doc to sidebar (#4768) by Baizhou Zhang
- [doc] clean up outdated docs (#4765) by Hongxin Liu
- Merge pull request #4757 from ppt0011/main by ppt0011
- [doc] put native colossalai plugins first in description section by Pengtai Xu
- [doc] add model examples for each plugin by Pengtai Xu
- [doc] put individual plugin explanation in front by Pengtai Xu
- [doc] explain suitable use case for each plugin by Pengtai Xu
- [doc] explanation of loading large pretrained models (#4741) by Baizhou Zhang
- [doc] polish shardformer doc (#4735) by Baizhou Zhang
- [doc] add shardformer support matrix/update tensor parallel documents (#4728) by Baizhou Zhang
- [doc] Add user document for Shardformer (#4702) by Baizhou Zhang
- [doc] fix llama2 code link (#4726) by binmakeswell
- [doc] add potential solution for OOM in llama2 example (#4699) by Baizhou Zhang
- [doc] Update booster user documents. (#4669) by Baizhou Zhang
Shardformer
- [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) by Baizhou Zhang
- [shardformer] add custom policy in hybrid parallel plugin (#4718) by Xuanlei Zhao
- [shardformer] update seq parallel document (#4730) by Bin Jia
- [shardformer] update pipeline parallel document (#4725) by flybird11111
- [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710) by flybird11111
- [shardformer] fix GPT2DoubleHeadsModel (#4703) by flybird11111
- [shardformer] update shardformer readme (#4689) by flybird11111
- [shardformer]fix gpt2 double head (#4663) by flybird11111
- [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) by flybird11111
- [shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624) by eric8607242
Misc
- [misc] update pre-commit and run all files (#4752) by Hongxin Liu
Format
- [format] applied code formatting on changed files in pull request 4743 (#4750) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4726 (#4727) by github-actions[bot]
Legacy
- [legacy] clean up legacy code (#4743) by Hongxin Liu
- Merge pull request #4738 from ppt0011/main by ppt0011
- [legacy] remove deterministic data loader test by Pengtai Xu
- [legacy] move communication and nn to legacy and refactor logger (#4671) by Hongxin Liu
Kernel
- [kernel] update triton init #4740 (#4740) by Xuanlei Zhao
Example
- [example] llama2 add fine-tune example (#4673) by flybird11111
- [example] add gpt2 HybridParallelPlugin example (#4653) by Bin Jia
- [example] update vit example for hybrid parallel plugin (#4641) by Baizhou Zhang
Hotfix
- [hotfix] Fix import error: colossal.kernel without triton installed (#4722) by Yuanheng Zhao
- [hotfix] fix typo in hybrid parallel io (#4697) by Baizhou Zhang
Devops
- [devops] fix concurrency group (#4667) by Hongxin Liu
- [devops] fix concurrency group and compatibility test (#4665) by Hongxin Liu
Pipeline
- [pipeline] set optimizer to optional in execute_pipeline (#4630) by Baizhou Zhang
Full Changelog: v0.3.3...v0.3.2
Version v0.3.2 Release Today!
What's Changed
Release
- [release] update version (#4623) by Hongxin Liu
Shardformer
- Merge pull request #4612 from hpcaitech/feature/shardformer by Hongxin Liu
- [shardformer] update shardformer readme (#4617) by flybird11111
- [shardformer] Add overlap optional for HybridParallelPlugin (#4615) by Bin Jia
- [shardformer] update bert finetune example with HybridParallelPlugin (#4584) by flybird11111
- [shardformer] Pytree fix (#4533) by Jianghai
- [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) by Baizhou Zhang
- [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) by Baizhou Zhang
- [shardformer] fix submodule replacement bug when enabling pp (#4544) by Baizhou Zhang
- [shardformer] support pp+tp+zero1 tests (#4531) by flybird11111
- [shardformer] fix opt test hanging (#4521) by flybird11111
- [shardformer] Add overlap support for gpt2 (#4535) by Bin Jia
- [shardformer] fix emerged bugs after updating transformers (#4526) by Baizhou Zhang
- [shardformer] zero1+pp and the corresponding tests (#4517) by Jianghai
- [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) by Baizhou Zhang
- [shardformer] opt fix. (#4514) by flybird11111
- [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) by flybird11111
- [shardformer] tests for 3d parallel (#4493) by Jianghai
- [shardformer] chatglm support sequence parallel (#4482) by flybird11111
- [shardformer] support tp+zero for shardformer (#4472) by Baizhou Zhang
- [shardformer] Pipeline/whisper (#4456) by Jianghai
- [shardformer] bert support sequence parallel. (#4455) by flybird11111
- [shardformer] bloom support sequence parallel (#4465) by flybird11111
- [shardformer] support interleaved pipeline (#4448) by LuGY
- [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) by Baizhou Zhang
- [shardformer] fix import by ver217
- [shardformer] fix embedding by ver217
- [shardformer] update bloom/llama/vit/chatglm tests (#4420) by flybird11111
- [shardformer]update t5 tests for using all optimizations. (#4407) by flybird11111
- [shardformer] update tests for all optimization (#4413) by flybird11111
- [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) by Baizhou Zhang
- [shardformer]fix, test gpt2 for AMP+TP (#4403) by flybird11111
- [shardformer] test all optimizations (#4399) by flybird1111
- [shardformer] update shardformer to use flash attention 2 (#4392) by flybird1111
- [Shardformer] Merge flash attention branch to pipeline branch (#4362) by flybird1111
- [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) by Baizhou Zhang
- [shardformer] support Blip2 (#4243) by FoolPlayer
- [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit by klhhhhh
- [shardformer] pre-commit check files by klhhhhh
- [shardformer] register without auto policy by klhhhhh
- [shardformer] ChatGLM support layernorm sharding by klhhhhh
- [shardformer] delete some file by klhhhhh
- [shardformer] support chatglm without layernorm by klhhhhh
- [shardformer] polish code by klhhhhh
- [shardformer] polish chatglm code by klhhhhh
- [shardformer] add test kit in model zoo for chatglm by klhhhhh
- [shardformer] vit test finish and support by klhhhhh
- [shardformer] added tests by klhhhhh
- Feature/chatglm (#4240) by Kun Lin
- [shardformer] support whisper (#4212) by FoolPlayer
- [shardformer] support SAM (#4231) by FoolPlayer
- Feature/vit support (#4182) by Kun Lin
- [shardformer] support pipeline base vit model (#4284) by FoolPlayer
- [shardformer] support inplace sharding (#4251) by Hongxin Liu
- [shardformer] fix base policy (#4229) by Hongxin Liu
- [shardformer] support lazy init (#4202) by Hongxin Liu
- [shardformer] fix type hint by ver217
- [shardformer] rename policy file name by ver217
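Most of the Shardformer work above is consumed through the HybridParallelPlugin. A minimal configuration sketch, assuming the plugin constructor arguments documented for this release (`tp_size`, `pp_size`, `num_microbatches`); everything else is left at defaults to avoid guessing:

```python
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch(config={})  # assumes a torchrun launch across 4 ranks

# 2-way tensor parallelism x 2-way pipeline parallelism (world size 4).
# num_microbatches is needed whenever pp_size > 1.
plugin = HybridParallelPlugin(tp_size=2, pp_size=2, num_microbatches=4)
booster = Booster(plugin=plugin)

# model/optimizer/criterion/dataloader come from user code, then:
# model, optimizer, criterion, dataloader, lr_scheduler = booster.boost(
#     model, optimizer, criterion, dataloader, lr_scheduler)
# With pp_size > 1, training steps go through booster.execute_pipeline(...).
```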
Legacy
- [legacy] move builder and registry to legacy (#4603) by Hongxin Liu
- [legacy] move engine to legacy (#4560) by Hongxin Liu
- [legacy] move trainer to legacy (#4545) by Hongxin Liu
Test
- [test] fix gemini checkpoint and gpt test (#4620) by Hongxin Liu
- [test] ignore gpt2 shardformer test (#4619) by Hongxin Liu
- [test] Hotfix/fix some model test and refactor check util api (#4369) by Bin Jia
- [test] skip some not compatible models by FoolPlayer
- [test] add shard util tests by ver217
- [test] update shardformer tests by ver217
- [test] remove useless tests (#4359) by Hongxin Liu
Zero
- [zero] hotfix master param sync (#4618) by Hongxin Liu
- [zero]fix zero ckptIO with offload (#4529) by LuGY
- [zero]support zero2 with gradient accumulation (#4511) by LuGY
Checkpointio
- [checkpointio] support huggingface from_pretrained for all plugins (#4606) by Baizhou Zhang
- [checkpointio] optimize zero optim checkpoint io (#4591) by Hongxin Liu
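The checkpoint entries above route sharded, `from_pretrained`-compatible checkpoints through the booster. A small sketch of the save/load round trip, assuming the documented booster checkpoint methods; paths and the `shard` flag are illustrative:

```python
from colossalai.booster import Booster


def save_and_reload(booster: Booster, model, optimizer, ckpt_dir: str = "checkpoint"):
    """Round-trip a sharded checkpoint through the booster checkpoint IO (illustrative paths)."""
    booster.save_model(model, f"{ckpt_dir}/model", shard=True)          # HF-style sharded weights
    booster.save_optimizer(optimizer, f"{ckpt_dir}/optim", shard=True)  # sharded optimizer state

    booster.load_model(model, f"{ckpt_dir}/model")
    booster.load_optimizer(optimizer, f"{ckpt_dir}/optim")
```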
Coati
- Merge pull request #4542 from hpcaitech/chatglm by yingliu-hpc
- Merge pull request #4541 from ver217/coati/chatglm by yingliu-hpc
- [coati] update ci by ver217
- [coati] add chatglm model (#4539) by yingliu-hpc
Doc
- [doc] add llama2 benchmark (#4604) by binmakeswell
- [DOC] hotfix/llama2news (#4595) by binmakeswell
- [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430) by Tian Siyuan
- [doc] update Coati README (#4405) by Wenhao Chen
- [doc] add Series A Funding and NeurIPS news (#4377) by binmakeswell
- [doc] Fix gradient accumulation doc. (#4349) by flybird1111
Pipeline
- [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
- [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
- [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
- [pipeline] add chatglm (#4363) by Jianghai
- [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
- [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
- [pipeline] add unit test for 1f1b (#4303) by LuGY
- [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang
Version v0.3.1 Release Today!
What's Changed
Release
- [release] update version (#4332) by Hongxin Liu
Chat
- [chat] fix compute_approx_kl (#4338) by Wenhao Chen
- [chat] removed cache file (#4155) by Frank Lee
- [chat] use official transformers and fix some issues (#4117) by Wenhao Chen
- [chat] remove naive strategy and split colossalai strategy (#4094) by Wenhao Chen
- [chat] refactor trainer class (#4080) by Wenhao Chen
- [chat]: fix chat evaluation possible bug (#4064) by Michelle
- [chat] refactor strategy class with booster api (#3987) by Wenhao Chen
- [chat] refactor actor class (#3968) by Wenhao Chen
- [chat] add distributed PPO trainer (#3740) by Hongxin Liu
Zero
- [zero] optimize the optimizer step time (#4221) by LuGY
- [zero] support shard optimizer state dict of zero (#4194) by LuGY
- [zero] add state dict for low level zero (#4179) by LuGY
- [zero] allow passing process group to zero12 (#4153) by LuGY
- [zero]support no_sync method for zero1 plugin (#4138) by LuGY
- [zero] refactor low level zero for shard evenly (#4030) by LuGY
Nfc
- [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
- [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
- [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
- [NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
- [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
- [NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) by Xu Kai
- [NFC] fix: format (#4270) by dayellow
- [NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
- [NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
- [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
- [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
- [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
- [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
- [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocd_with_naming
- [NFC] polish applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
- [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
- [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
- [NFC] Fix format for mixed precision (#4253) by Jianghai
- [nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
- [nfc] fix dim not defined and fix typo (#3991) by digger yu
- [nfc] fix typo colossalai/zero (#3923) by digger yu
- [nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
- [nfc] fix typo colossalai/nn (#3887) by digger yu
- [nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu
Example
- Fix/format (#4261) by Michelle
- [example] add llama pretraining (#4257) by binmakeswell
- [example] fix bucket size in example of gpt gemini (#4028) by LuGY
- [example] update ViT example using booster api (#3940) by Baizhou Zhang
- Merge pull request #3905 from MaruyamaAya/dreambooth by Liu Ziming
- [example] update opt example using booster api (#3918) by Baizhou Zhang
- [example] Modify palm example with the new booster API (#3913) by Liu Ziming
- [example] update gemini examples (#3868) by jiangmingyan
Ci
- [ci] support testmon core pkg change detection (#4305) by Hongxin Liu
Checkpointio
- [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) by Baizhou Zhang
- Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) by Baizhou Zhang
- [checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) by Baizhou Zhang
- [checkpointio] General Checkpointing of Sharded Optimizers (#3984) by Baizhou Zhang
Lazy
- [lazy] support init on cuda (#4269) by Hongxin Liu
- [lazy] fix compatibility problem on torch 1.13 (#3911) by Hongxin Liu
- [lazy] refactor lazy init (#3891) by Hongxin Liu
Kernels
- [Kernels] added triton-implemented of self attention for colossal-ai (#4241) by Cuiqing Li
Docker
- [docker] fixed ninja build command (#4203) by Frank Lee
- [docker] added ssh and rdma support for docker (#4192) by Frank Lee
Dtensor
- [dtensor] fixed readme file name and removed deprecated file (#4162) by Frank Lee
- [dtensor] updated api and doc (#3845) by Frank Lee
Workflow
- [workflow] show test duration (#4159) by Frank Lee
- [workflow] added status check for test coverage workflow (#4106) by Frank Lee
- [workflow] cover all public repositories in weekly report (#4069) by Frank Lee
- [workflow] fixed the directory check in build (#3980) by Frank Lee
- [workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
- [workflow] added docker latest tag for release (#3920) by Frank Lee
- [workflow] fixed workflow check for docker build (#3849) by Frank Lee
Cli
- [cli] hotfix launch command for multi-nodes (#4165) by Hongxin Liu
Format
- [format] applied code formatting on changed files in pull request 4152 (#4157) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 4021 (#4022) by github-actions[bot]
Shardformer
- [shardformer] added development protocol for standardization (#4149) by Frank Lee
- [shardformer] made tensor parallelism configurable (#4144) by Frank Lee
- [shardformer] refactored some doc and api (#4137) by Frank Lee
- [shardformer] write an shardformer example with bert finetuning (#4126) by jiangmingyan
- [shardformer] added embedding gradient check (#4124) by Frank Lee
- [shardformer] import huggingface implicitly (#4101) by Frank Lee
- [shardformer] integrate with data parallelism (#4103) by Frank Lee
- [shardformer] supported fused normalization (#4112) by Frank Lee
- [shardformer] supported bloom model (#4098) by Frank Lee
- [shardformer] support vision transformer (#4096) by Kun Lin
- [shardformer] shardformer support opt models (#4091) by jiangmingyan
- [shardformer] refactored layernorm (#4086) by Frank Lee
- [shardformer] Add layernorm (#4072) by FoolPlayer
Version v0.3.0 Release Today!
What's Changed
Release
Nfc
- [nfc] fix typo colossalai/ applications/ (#3831) by digger yu
- [NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
- [NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
- [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742) by digger yu
- [NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
- [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) by Ofey Chan
- [NFC] polish colossalai/context/random/__init__.py code style (#3327) by yuxuan-lou
- [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
- [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
- [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
- [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
- [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
- [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
- [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
- [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
- [NFC] polish colossalai/amp/__init__.py code style (#3272) by lucasliunju
- [NFC] polish code style (#3273) by Xuanlei Zhao
- [NFC] polish colossalai/fx/proxy.py code style (#3269) by CZYCW
- [NFC] polish code style (#3268) by Yuanchen
- [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
- [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
- [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
- [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
- [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow
Doc
- [doc] update document of gemini instruction. (#3842) by jiangmingyan
- Merge pull request #3810 from jiangmingyan/amp by jiangmingyan
- [doc]fix by jiangmingyan
- [doc]fix by jiangmingyan
- [doc] add warning about fsdp plugin (#3813) by Hongxin Liu
- [doc] add removed change of config.py by jiangmingyan
- [doc] add removed warning by jiangmingyan
- [doc] update amp document by Mingyan Jiang
- [doc] update amp document by Mingyan Jiang
- [doc] update amp document by Mingyan Jiang
- [doc] update gradient accumulation (#3771) by jiangmingyan
- [doc] update gradient cliping document (#3778) by jiangmingyan
- [doc] add deprecated warning on doc Basics section (#3754) by Yanjia0
- [doc] add booster docstring and fix autodoc (#3789) by Hongxin Liu
- [doc] add tutorial for booster checkpoint (#3785) by Hongxin Liu
- [doc] add tutorial for booster plugins (#3758) by Hongxin Liu
- [doc] add tutorial for cluster utils (#3763) by Hongxin Liu
- [doc] update hybrid parallelism doc (#3770) by jiangmingyan
- [doc] update booster tutorials (#3718) by jiangmingyan
- [doc] fix chat spelling error (#3671) by digger-yu
- [Doc] enhancement on README.md for chat examples (#3646) by Camille Zhong
- [doc] Fix typo under colossalai and doc(#3618) by digger-yu
- [doc] .github/workflows/README.md (#3605) by digger-yu
- [doc] fix setup.py typo (#3603) by digger-yu
- [doc] fix op_builder/README.md (#3597) by digger-yu
- [doc] Update .github/workflows/README.md (#3577) by digger-yu
- [doc] Update 1D_tensor_parallel.md (#3573) by digger-yu
- [doc] Update 1D_tensor_parallel.md (#3563) by digger-yu
- [doc] Update README.md (#3549) by digger-yu
- [doc] Update README-zh-Hans.md (#3541) by digger-yu
- [doc] hide diffusion in application path (#3519) by binmakeswell
- [doc] add requirement and highlight application (#3516) by binmakeswell
- [doc] Add docs for clip args in zero optim (#3504) by YH
- [doc] updated contributor list (#3474) by Frank Lee
- [doc] polish diffusion example (#3386) by Jan Roudaut
- [doc] add Intel cooperation news (#3333) by binmakeswell
- [doc] added authors to the chat application (#3307) by Fazzie-Maqianli
Workflow
- [workflow] supported test on CUDA 10.2 (#3841) by Frank Lee
- [workflow] fixed testmon cache in build CI (#3806) by Frank Lee
- [workflow] changed to doc build to be on schedule and release (#3825) by Frank Lee
- [workflow] enabled doc build from a forked repo (#3815) by Frank Lee
- [workflow] enable testing for develop & feature branch (#3801) by Frank Lee
- [workflow] fixed the docker build workflow (#3794) by Frank Lee
Booster
- [booster] add warning for torch fsdp plugin doc (#3833) by wukong1992
- [booster] torch fsdp fix ckpt (#3788) by wukong1992
- [booster] removed models that don't support fsdp (#3744) by wukong1992
- [booster] support torch fsdp plugin in booster (#3697) by wukong1992
- [booster] add tests for ddp and low level zero's checkpointio (#3715) by jiangmingyan
- [booster] fix no_sync method (#3709) by Hongxin Liu
- [booster] update prepare dataloader method for plugin (#3706) by Hongxin Liu
- [booster] refactor all dp fashion plugins (#3684) by Hongxin Liu
- [booster] gemini plugin support shard checkpoint (#3610) by jiangmingyan
- [booster] add low level zero plugin (#3594) by Hongxin Liu
- [booster] fixed the torch ddp plugin with the new checkpoint api (#3442) by Frank Lee
- [booster] implement Gemini plugin (#3352) by ver217
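The Booster entries above are the user-facing training entry point introduced in this release (plugins for Torch DDP, Torch FSDP, low-level ZeRO and Gemini). A minimal end-to-end sketch with the Torch DDP plugin, assuming the booster API as documented; shapes, hyperparameters and the single-node torchrun launch are placeholders:

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})  # launched with torchrun; empty config per the docs of this line

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

booster = Booster(plugin=TorchDDPPlugin())  # plain data parallelism via torch DDP
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x = torch.randn(32, 512).cuda()
y = torch.randint(0, 10, (32,)).cuda()
loss = criterion(model(x), y)
booster.backward(loss, optimizer)  # plugin-aware backward
optimizer.step()
optimizer.zero_grad()
```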
Docs
Evaluation
Docker
Api
- [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan
Test
- [test] fixed lazy init test import error (#3799) by Frank Lee
- Update test_ci.sh by Camille Zhong
- [test] refactor tests with spawn (#3452) by Frank Lee
- [test] reorganize zero/gem...
Version v0.2.8 Release Today!
What's Changed
Release
Format
- [format] applied code formatting on changed files in pull request 3300 (#3302) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 3296 (#3298) by github-actions[bot]
Doc
- [doc] add ColossalChat news (#3304) by binmakeswell
- [doc] add ColossalChat (#3297) by binmakeswell
- [doc] fix typo (#3222) by binmakeswell
- [doc] update chatgpt doc paper link (#3229) by Camille Zhong
- [doc] add community contribution guide (#3153) by binmakeswell
- [doc] add Intel cooperation for biomedicine (#3108) by binmakeswell
Application
Chat
Coati
- [coati] fix inference profanity check (#3299) by ver217
- [coati] inference supports profanity check (#3295) by ver217
- [coati] add repetition_penalty for inference (#3294) by ver217
- [coati] fix inference output (#3285) by ver217
- [Coati] first commit (#3283) by Fazzie-Maqianli
Colossalchat
- [ColossalChat]add cite for datasets (#3292) by Fazzie-Maqianli
Examples
- [examples] polish AutoParallel readme (#3270) by YuliangLiu0306
- [examples] Solving the diffusion issue of incompatibility issue#3169 (#3170) by NatalieC323
Fx
- [fx] meta registration compatibility (#3253) by HELSON
- [FX] refactor experimental tracer and adapt it with hf models (#3157) by YuliangLiu0306
Booster
- [booster] implemented the torch ddd + resnet example (#3232) by Frank Lee
- [booster] implemented the cluster module (#3191) by Frank Lee
- [booster] added the plugin base and torch ddp plugin (#3180) by Frank Lee
- [booster] added the accelerator implementation (#3159) by Frank Lee
- [booster] implemented mixed precision class (#3151) by Frank Lee
Ci
- [CI] Fix pre-commit workflow (#3238) by Hakjin Lee
Api
- [API] implement device mesh manager (#3221) by YuliangLiu0306
- [api] implemented the checkpoint io module (#3205) by Frank Lee
Hotfix
- [hotfix] skip torchaudio tracing test (#3211) by YuliangLiu0306
- [hotfix] layout converting issue (#3188) by YuliangLiu0306
Chatgpt
- [chatgpt] add precision option for colossalai (#3233) by ver217
- [chatgpt] unify datasets (#3218) by Fazzie-Maqianli
- [chatgpt] support instruct training (#3216) by Fazzie-Maqianli
- [chatgpt]add reward model code for deberta (#3199) by Yuanchen
- [chatgpt]support llama (#3070) by Fazzie-Maqianli
- [chatgpt] add supervised learning fine-tune code (#3183) by pgzhang
- [chatgpt]Reward Model Training Process update (#3133) by BlueRum
- [chatgpt] fix trainer generate kwargs (#3166) by ver217
- [chatgpt] fix ppo training hanging problem with gemini (#3162) by ver217
- [chatgpt]update ci (#3087) by BlueRum
- [chatgpt]Fix examples (#3116) by BlueRum
- [chatgpt] fix lora support for gpt (#3113) by BlueRum
- [chatgpt] type miss of kwargs (#3107) by hiko2MSP
- [chatgpt] fix lora save bug (#3099) by BlueRum
Lazyinit
- [lazyinit] combine lazy tensor with dtensor (#3204) by ver217
- [lazyinit] add correctness verification (#3147) by ver217
- [lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217
Auto
Analyzer
- [Analyzer] fix analyzer tests (#3197) by YuliangLiu0306
Dreambooth
- [dreambooth] fixing the incompatibility in requirements.txt (#3190) by NatalieC323
Auto-parallel
Zero
Test
- [test] fixed torchrec registration in model zoo (#3177) by Frank Lee
- [test] fixed torchrec model test (#3167) by Frank Lee
- [test] add torchrec models to test model zoo (#3139) by YuliangLiu0306
- [test] added transformers models to test model zoo (#3135) by Frank Lee
- [test] added torchvision models to test model zoo (#3132) by Frank Lee
- [test] added timm models to test model zoo (#3129) by Frank Lee
Refactor
- [refactor] update docs (#3174) by Saurav Maheshkar
Tests
- [tests] model zoo add torchaudio models (#3138) by ver217
- [tests] diffuser models in model zoo (#3136) by HELSON
Docker
- [docker] Add opencontainers image-spec to Dockerfile (#3006) by Saurav Maheshkar
Dtensor
- [DTensor] refactor dtensor with new components (#3089) by YuliangLiu0306
Workflow
Autochunk
- [autochunk] support complete benchmark (#3121) by Xuanlei Zhao
Tutorial
- [tutorial] update notes for TransformerEngine (#3098) by binmakeswell
Nvidia
- [NVIDIA] Add FP8 example using TE (#3080) by Kirthi Shankar Sivamani
Full Changelog: v0.2.8...v0.2.7
Version v0.2.7 Release Today!
What's Changed
Release
Chatgpt
- [chatgpt]add flag of action mask in critic(#3086) by Fazzie-Maqianli
- [chatgpt] change critic input as state (#3042) by wenjunyang
- [chatgpt] fix readme (#3025) by BlueRum
- [chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
- [chatgpt]fix inference model load (#2988) by BlueRum
- [chatgpt] allow shard init and display warning (#2986) by ver217
- [chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
- [chatgpt] making experience support dp (#2971) by ver217
- [chatgpt]fix lora bug (#2974) by BlueRum
- [chatgpt] fix inference demo loading bug (#2969) by BlueRum
- [ChatGPT] fix README (#2966) by Fazzie-Maqianli
- [chatgpt]add inference example (#2944) by BlueRum
- [chatgpt]support opt & gpt for rm training (#2876) by BlueRum
- [chatgpt] Support saving ckpt in examples (#2846) by BlueRum
- [chatgpt] fix rm eval (#2829) by BlueRum
- [chatgpt] add test checkpoint (#2797) by ver217
- [chatgpt] update readme about checkpoint (#2792) by ver217
- [chatgpt] strategy add prepare method (#2766) by ver217
- [chatgpt] disable shard init for colossalai (#2767) by ver217
- [chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
- [chatgpt]fix train_rm bug with lora (#2741) by BlueRum
Kernel
- [kernel] added kernel loader to softmax autograd function (#3093) by Frank Lee
- [kernel] cached the op kernel and fixed version check (#2886) by Frank Lee
Analyzer
- [analyzer] a minimal implementation of static graph analyzer (#2852) by Super Daniel
Diffusers
- [diffusers] fix ci and docker (#3085) by Fazzie-Maqianli
Doc
- [doc] fixed typos in docs/README.md (#3082) by Frank Lee
- [doc] moved doc test command to bottom (#3075) by Frank Lee
- [doc] specified operating system requirement (#3019) by Frank Lee
- [doc] update nvme offload doc (#3014) by ver217
- [doc] add ISC tutorial (#2997) by binmakeswell
- [doc] add deepspeed citation and copyright (#2996) by ver217
- [doc] added reference to related works (#2994) by Frank Lee
- [doc] update news (#2983) by binmakeswell
- [doc] fix chatgpt inference typo (#2964) by binmakeswell
- [doc] add env scope (#2933) by binmakeswell
- [doc] added readme for documentation (#2935) by Frank Lee
- [doc] removed read-the-docs (#2932) by Frank Lee
- [doc] update installation for GPT (#2922) by binmakeswell
- [doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
- [doc] fix GPT tutorial (#2860) by dawei-wang
- [doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
- [doc] update OPT serving (#2804) by binmakeswell
- [doc] update example and OPT serving link (#2769) by binmakeswell
- [doc] add opt service doc (#2747) by Frank Lee
- [doc] fixed a typo in GPT readme (#2736) by cloudhuang
- [doc] updated documentation version list (#2730) by Frank Lee
Autochunk
- [autochunk] support vit (#3084) by Xuanlei Zhao
- [autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao
Dtensor
- [DTensor] implement layout converter (#3055) by YuliangLiu0306
- [DTensor] refactor CommSpec (#3034) by YuliangLiu0306
- [DTensor] refactor sharding spec (#2987) by YuliangLiu0306
- [DTensor] implementation of dtensor (#2946) by YuliangLiu0306
Workflow
- [workflow] fixed doc build trigger condition (#3072) by Frank Lee
- [workflow] supported conda package installation in doc test (#3028) by Frank Lee
- [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
- [workflow] added auto doc test on PR (#2929) by Frank Lee
- [workflow] moved pre-commit to post-commit (#2895) by Frank Lee
Booster
Example
- [example] fix redundant note (#3065) by binmakeswell
- [example] fixed opt model downloading from huggingface by Tomek
- [example] add LoRA support (#2821) by Haofan Wang
Hotfix
- [hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
- [hotfix] add shard dim to avoid backward communication error (#2954) by YuliangLiu0306
- [hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
- [hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
- [hotfix] fix chunk size can not be divided (#2867) by HELSON
- Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
- [hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
- [hotfix] add correct device for fake_param (#2796) by HELSON
Revert
Format
- [format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]
Pipeline
- [pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang
Fx
- [fx] remove deprecated algorithms. (#2312) (#2313) by Super Daniel
Refactor
- [refactor] restructure configuration files (#2977) by Saurav Maheshkar
Misc
Autoparallel
- [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
- [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
- [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
- [autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
- [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
- [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
- [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
- [autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
Version v0.2.6 Release Today!
What's Changed
Release
Doc
- [doc] moved doc test command to bottom (#3075) by Frank Lee
- [doc] specified operating system requirement (#3019) by Frank Lee
- [doc] update nvme offload doc (#3014) by ver217
- [doc] add ISC tutorial (#2997) by binmakeswell
- [doc] add deepspeed citation and copyright (#2996) by ver217
- [doc] added reference to related works (#2994) by Frank Lee
- [doc] update news (#2983) by binmakeswell
- [doc] fix chatgpt inference typo (#2964) by binmakeswell
- [doc] add env scope (#2933) by binmakeswell
- [doc] added readme for documentation (#2935) by Frank Lee
- [doc] removed read-the-docs (#2932) by Frank Lee
- [doc] update installation for GPT (#2922) by binmakeswell
- [doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
- [doc] fix GPT tutorial (#2860) by dawei-wang
- [doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
- [doc] update OPT serving (#2804) by binmakeswell
- [doc] update example and OPT serving link (#2769) by binmakeswell
- [doc] add opt service doc (#2747) by Frank Lee
- [doc] fixed a typo in GPT readme (#2736) by cloudhuang
- [doc] updated documentation version list (#2730) by Frank Lee
Workflow
- [workflow] fixed doc build trigger condition (#3072) by Frank Lee
- [workflow] supported conda package installation in doc test (#3028) by Frank Lee
- [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
- [workflow] added auto doc test on PR (#2929) by Frank Lee
- [workflow] moved pre-commit to post-commit (#2895) by Frank Lee
Booster
Example
- [example] fix redundant note (#3065) by binmakeswell
- [example] fixed opt model downloading from huggingface by Tomek
- [example] add LoRA support (#2821) by Haofan Wang
Autochunk
- [autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao
Chatgpt
- [chatgpt] change critic input as state (#3042) by wenjunyang
- [chatgpt] fix readme (#3025) by BlueRum
- [chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
- [chatgpt]fix inference model load (#2988) by BlueRum
- [chatgpt] allow shard init and display warning (#2986) by ver217
- [chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
- [chatgpt] making experience support dp (#2971) by ver217
- [chatgpt]fix lora bug (#2974) by BlueRum
- [chatgpt] fix inference demo loading bug (#2969) by BlueRum
- [ChatGPT] fix README (#2966) by Fazzie-Maqianli
- [chatgpt]add inference example (#2944) by BlueRum
- [chatgpt]support opt & gpt for rm training (#2876) by BlueRum
- [chatgpt] Support saving ckpt in examples (#2846) by BlueRum
- [chatgpt] fix rm eval (#2829) by BlueRum
- [chatgpt] add test checkpoint (#2797) by ver217
- [chatgpt] update readme about checkpoint (#2792) by ver217
- [chatgpt] strategy add prepare method (#2766) by ver217
- [chatgpt] disable shard init for colossalai (#2767) by ver217
- [chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
- [chatgpt]fix train_rm bug with lora (#2741) by BlueRum
Dtensor
- [DTensor] refactor CommSpec (#3034) by YuliangLiu0306
- [DTensor] refactor sharding spec (#2987) by YuliangLiu0306
- [DTensor] implementation of dtensor (#2946) by YuliangLiu0306
Hotfix
- [hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
- [hotfix] add shard dim to avoid backward communication error (#2954) by YuliangLiu0306
- [hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
- [hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
- [hotfix] fix chunk size can not be divided (#2867) by HELSON
- Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
- [hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
- [hotfix] add correct device for fake_param (#2796) by HELSON
Revert
Format
- [format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]
Pipeline
- [pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang
Fx
- [fx] remove deprecated algorithms. (#2312) (#2313) by Super Daniel
Refactor
- [refactor] restructure configuration files (#2977) by Saurav Maheshkar
Kernel
Misc
Autoparallel
- [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
- [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
- [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
- [autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
- [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
- [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
- [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
- [autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
- [autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306
Zero
- [zero] trivial zero optimizer refactoring (#2869) by YH
- [zero] fix wrong import (#2777) by Boyuan Yao
Cli
Triton
Nfc
- [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) by Michelle
- [NFC] polish code format by binmakeswell
- [NFC] polish colossala...