Releases · hpcaitech/ColossalAI

07 Mar 15:38

github-actions

v0.3.6

8020f42

Version v0.3.6 Release Today!

What's Changed

Release

[release] update version (#5411) by Hongxin Liu

Colossal-llama2

[colossal-llama2] add stream chat examlple for chat version model (#5428) by Camille Zhong

Hotfix

[hotfix] fix stable diffusion inference bug. (#5289) by Youngon
[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) by digger yu
[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) by digger yu
[hotfix] fix typo change _descrption to _description (#5331) by digger yu
[hotfix] fix typo of openmoe model source (#5403) by Luo Yihang
[hotfix] fix sd vit import error (#5420) by MickeyCHAN
[hotfix] Fix wrong import in meta_registry (#5392) by Stephan Kölker
[hotfix] fix variable type for top_p (#5313) by CZYCW

Doc

[doc] Fix typo s/infered/inferred/ (#5288) by hugo-syn
[doc] update some translations with README-zh-Hans.md (#5382) by digger yu
[doc] sora release (#5425) by binmakeswell
[doc] fix blog link by binmakeswell
[doc] fix blog link by binmakeswell
[doc] updated installation command (#5389) by Frank Lee
[doc] Fix typo (#5361) by yixiaoer

Eval-hotfix

[eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) by Dongruixuan Li

Devops

[devops] fix extention building (#5427) by Hongxin Liu

Example

[example]add gpt2 benchmark example script. (#5295) by flybird11111
[example] reuse flash attn patch (#5400) by Hongxin Liu

Workflow

[workflow] added pypi channel (#5412) by Frank Lee

Shardformer

[shardformer]gather llama logits (#5398) by flybird11111

Setup

[setup] fixed nightly release (#5388) by Frank Lee

Fsdp

[fsdp] impl save/load shard model/optimizer (#5357) by QinLuo

Extension

[extension] hotfix jit extension setup (#5402) by Hongxin Liu

Llama

[llama] fix training and inference scripts (#5384) by Hongxin Liu

Full Changelog: v0.3.6...v0.3.5

Assets 2

23 Feb 08:46

github-actions

v0.3.5

adae123

Version v0.3.5 Release Today!

What's Changed

Release

[release] update version (#5380) by Hongxin Liu

Llama

Merge pull request #5377 from hpcaitech/example/llama-npu by Frank Lee
[llama] fix memory issue (#5371) by Hongxin Liu
[llama] polish training script and fix optim ckpt (#5368) by Hongxin Liu
[llama] fix neftune & pbar with start_step (#5364) by Camille Zhong
[llama] add flash attn patch for npu (#5362) by Hongxin Liu
[llama] update training script (#5360) by Hongxin Liu
[llama] fix dataloader for hybrid parallel (#5358) by Hongxin Liu

Moe

[moe] fix tests by ver217
[moe] fix mixtral optim checkpoint (#5344) by Hongxin Liu
[moe] fix mixtral forward default value (#5329) by Hongxin Liu
[moe] fix mixtral checkpoint io (#5314) by Hongxin Liu
[moe] support mixtral (#5309) by Hongxin Liu
[moe] update capacity computing (#5253) by Hongxin Liu
[moe] init mixtral impl by Xuanlei Zhao
[moe]: fix ep/tp tests, add hierarchical all2all (#4982) by Wenhao Chen
[moe] support optimizer checkpoint (#5015) by Xuanlei Zhao
[moe] merge moe into main (#4978) by Xuanlei Zhao

Lr-scheduler

[lr-scheduler] fix load state dict and add test (#5369) by Hongxin Liu

Eval

[eval] update llama npu eval (#5366) by Camille Zhong

Gemini

[gemini] fix param op hook when output is tuple (#5355) by Hongxin Liu
[gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) by flybird11111
[gemini]fix gemini optimzer, saving Shardformer in Gemini got list assignment index out of range (#5085) by flybird11111
[gemini] gemini support extra-dp (#5043) by flybird11111
[gemini] gemini support tensor parallelism. (#4942) by flybird11111

Fix

[fix] remove unnecessary dp_size assert (#5351) by Wenhao Chen

Checkpointio

[checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu

Chat

[Chat] fix sft loss nan (#5345) by YeAnbang

Extension

[extension] fixed exception catch (#5342) by Frank Lee

Doc

[doc] added docs for extensions (#5324) by Frank Lee
[doc] add llama2-13B disyplay (#5285) by Desperado-Jia
[doc] fix doc typo (#5256) by binmakeswell
[doc] fix typo in Colossal-LLaMA-2/README.md (#5247) by digger yu
[doc] SwiftInfer release (#5236) by binmakeswell
[doc] add Colossal-LLaMA-2-13B (#5234) by binmakeswell
[doc] Make leaderboard format more uniform and good-looking (#5231) by JIMMY ZHAO
[doc] Update README.md of Colossal-LLAMA2 (#5233) by Camille Zhong
[doc] Update required third-party library list for testing and torch comptibility checking (#5207) by Zhongkai Zhao
[doc] update pytorch version in documents. (#5177) by flybird11111
[doc] fix colossalqa document (#5146) by Michelle
[doc] updated paper citation (#5131) by Frank Lee
[doc] add moe news (#5128) by binmakeswell

Tests

[tests] fix t5 test. (#5322) by flybird11111

Accelerator

Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api by Frank Lee
[accelerator] fixed npu api by FrankLeeeee
[accelerator] init the accelerator module (#5129) by Frank Lee

Workflow

[workflow] updated CI image (#5318) by Frank Lee
[workflow] fixed oom tests (#5275) by Frank Lee
[workflow] fixed incomplete bash command (#5272) by Frank Lee
[workflow] fixed build CI (#5240) by Frank Lee

Feat

[feat] refactored extension module (#5298) by Frank Lee

Nfc

[NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) by 李文军
[nfc] fix typo colossalai/shardformer/ (#5133) by digger yu
[nfc] fix typo change directoty to directory (#5111) by digger yu
[nfc] fix typo and author name (#5089) by digger yu
[nfc] fix typo in docs/ (#4972) by digger yu

Hotfix

[hotfix] fix 3d plugin test (#5292) by Hongxin Liu
[hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) by Zhongkai Zhao
[hotfix]: add pp sanity check and fix mbs arg (#5268) by Wenhao Chen
[hotfix] removed unused flag (#5242) by Frank Lee
[hotfix] fixed memory usage of shardformer module replacement (#5122) by アマデウス
[Hotfix] Fix model policy matching strategy in ShardFormer (#5064) by Zhongkai Zhao
[hotfix]: modify create_ep_hierarchical_group and add test (#5032) by Wenhao Chen
[hotfix] Suport extra_kwargs in ShardConfig (#5031) by Zhongkai Zhao
[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) by littsk
[hotfix] fix grad accumulation plus clipping for gemini (#5002) by Baizhou Zhang

Sync

Merge pull request #5278 from ver217/sync/npu by Frank Lee

Shardformer

[shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
[shardformer] llama support DistCrossEntropy (#5176) by flybird11111
[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
[shardformer]fix flash attention, when mask is casual, just don't unpad it (#5084) by flybird11111
[shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
[shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Ci

[ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) by flybird11111
[ci] fix shardformer tests. (#5255) by flybird11111
[ci] fixed ddp test (#5254) by Frank Lee
[ci] fixed booster test (#5251) by Frank Lee

Npu

[npu] change device to accelerator api (#5239) by Hongxin Liu
[npu] use extension for op builder (#5172) by Xuanlei Zhao
[npu] support triangle attention for llama (#5130) by Xuanlei Zhao
[npu] add npu support for hybrid plugin and llama (#5090) by Xuanlei Zhao
[npu] add npu support for gemini and zero (#5067) by Hongxin Liu

Pipeline

[pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
[pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
[pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen

Format

[format] applied code formatting on changed files in pull request 5234 (#5235) by [github-actions[bot]](https://api.github.com/users/githu...

Assets 2

01 Nov 05:57

github-actions

v0.3.4

8993c8a

Version v0.3.4 Release Today!

What's Changed

Release

[release] update version (#4995) by Hongxin Liu

Pipeline inference

[Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
[Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
[Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia

Doc

[doc] add supported feature diagram for hybrid parallel plugin (#4996) by ppt0011
[doc]Update doc for colossal-inference (#4989) by Cuiqing Li (李崔卿)
Merge pull request #4889 from ppt0011/main by ppt0011
[doc] add reminder for issue encountered with hybrid adam by ppt0011
[doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) by flybird11111
Merge pull request #4858 from Shawlleyw/main by ppt0011
[doc] update slack link (#4823) by binmakeswell
[doc] add lazy init docs (#4808) by Hongxin Liu
Merge pull request #4805 from TongLi3701/docs/fix by Desperado-Jia
[doc] polish shardformer doc (#4779) by Baizhou Zhang
[doc] add llama2 domain-specific solution news (#4789) by binmakeswell

Hotfix

[hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
[hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
[hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
[hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
[hotfix] fix bug in sequence parallel test (#4887) by littsk
[hotfix] Correct several erroneous code comments (#4794) by littsk
[hotfix] fix norm type error in zero optimizer (#4795) by littsk
[hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing

Kernels

[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) by Cuiqing Li

Inference

[Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai
[Inference]ADD Bench Chatglm2 script (#4963) by Jianghai
[inference] add reference and fix some bugs (#4937) by Xu Kai
[inference] Add smmoothquant for llama (#4904) by Xu Kai
[inference] add llama2 support (#4898) by Xu Kai
[inference]fix import bug and delete down useless init (#4830) by Jianghai

Test

[test] merge old components to test to model zoo (#4945) by Hongxin Liu
[test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero by ppt0011
[test] modify model supporting part of low_level_zero plugin (including correspoding docs) by Zhongkai Zhao

Refactor

[Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li

Nfc

[nfc] fix some typo with colossalai/ docs/ etc. (#4920) by digger yu
[nfc] fix minor typo in README (#4846) by Blagoy Simandoff
[NFC] polish code style (#4799) by Camille Zhong
[NFC] polish colossalai/inference/quant/gptq/cai_gptq/init.py code style (#4792) by Michelle

Format

[format] applied code formatting on changed files in pull request 4820 (#4886) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4908 (#4918) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4595 (#4602) by github-actions[bot]

Gemini

[gemini] support gradient accumulation (#4869) by Baizhou Zhang
[gemini] support amp o3 for gemini (#4872) by Hongxin Liu

Kernel

[kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu

Feature

[feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
[feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk
[feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen

Checkpointio

[checkpointio] hotfix torch 2.0 compatibility (#4824) by Hongxin Liu
[checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) by Baizhou Zhang

Infer

[infer] fix test bug (#4838) by Xu Kai
[Infer] Serving example w/ ray-serve (multiple GPU case) (#4841) by Yuanheng Zhao
[Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) (#4771) by Yuanheng Zhao

Chat

[chat] fix gemini strategy (#4698) by flybird11111

Misc

[misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu

Lazy

[lazy] support from_pretrained (#4801) by Hongxin Liu

Fix

[fix] fix weekly runing example (#4787) by flybird11111

Full Changelog: v0.3.4...v0.3.3

Assets 2

22 Sep 10:30

github-actions

v0.3.3

4146f1c

Version v0.3.3 Release Today!

What's Changed

Release

[release] update version (#4775) by Hongxin Liu

Inference

[inference] chatglm2 infer demo (#4724) by Jianghai

Feature

[feature] add gptq for inference (#4754) by Xu Kai
[Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li

Bug

[bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
[bug] fix get_default_parser in examples (#4764) by Baizhou Zhang

Lazy

[lazy] support torch 2.0 (#4763) by Hongxin Liu

Chat

[chat]: add lora merge weights config (#4766) by Wenhao Chen
[chat]: update rm, add wandb and fix bugs (#4471) by Wenhao Chen

Doc

[doc] add shardformer doc to sidebar (#4768) by Baizhou Zhang
[doc] clean up outdated docs (#4765) by Hongxin Liu
Merge pull request #4757 from ppt0011/main by ppt0011
[doc] put native colossalai plugins first in description section by Pengtai Xu
[doc] add model examples for each plugin by Pengtai Xu
[doc] put individual plugin explanation in front by Pengtai Xu
[doc] explain suitable use case for each plugin by Pengtai Xu
[doc] explaination of loading large pretrained models (#4741) by Baizhou Zhang
[doc] polish shardformer doc (#4735) by Baizhou Zhang
[doc] add shardformer support matrix/update tensor parallel documents (#4728) by Baizhou Zhang
[doc] Add user document for Shardformer (#4702) by Baizhou Zhang
[doc] fix llama2 code link (#4726) by binmakeswell
[doc] add potential solution for OOM in llama2 example (#4699) by Baizhou Zhang
[doc] Update booster user documents. (#4669) by Baizhou Zhang

Shardformer

[shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) by Baizhou Zhang
[shardformer] add custom policy in hybrid parallel plugin (#4718) by Xuanlei Zhao
[shardformer] update seq parallel document (#4730) by Bin Jia
[shardformer] update pipeline parallel document (#4725) by flybird11111
[shardformer] to fix whisper test failed due to significant accuracy differences. (#4710) by flybird11111
[shardformer] fix GPT2DoubleHeadsModel (#4703) by flybird11111
[shardformer] update shardformer readme (#4689) by flybird11111
[shardformer]fix gpt2 double head (#4663) by flybird11111
[shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) by flybird11111
[shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624) by eric8607242

Misc

[misc] update pre-commit and run all files (#4752) by Hongxin Liu

Format

[format] applied code formatting on changed files in pull request 4743 (#4750) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4726 (#4727) by github-actions[bot]

Legacy

[legacy] clean up legacy code (#4743) by Hongxin Liu
Merge pull request #4738 from ppt0011/main by ppt0011
[legacy] remove deterministic data loader test by Pengtai Xu
[legacy] move communication and nn to legacy and refactor logger (#4671) by Hongxin Liu

Kernel

[kernel] update triton init #4740 (#4740) by Xuanlei Zhao

Example

[example] llama2 add fine-tune example (#4673) by flybird11111
[example] add gpt2 HybridParallelPlugin example (#4653) by Bin Jia
[example] update vit example for hybrid parallel plugin (#4641) by Baizhou Zhang

Hotfix

[hotfix] Fix import error: colossal.kernel without triton installed (#4722) by Yuanheng Zhao
[hotfix] fix typo in hybrid parallel io (#4697) by Baizhou Zhang

Devops

[devops] fix concurrency group (#4667) by Hongxin Liu
[devops] fix concurrency group and compatibility test (#4665) by Hongxin Liu

Pipeline

[pipeline] set optimizer to optional in execute_pipeline (#4630) by Baizhou Zhang

Full Changelog: v0.3.3...v0.3.2

Assets 2

06 Sep 15:42

github-actions

v0.3.2

9709b8f

Version v0.3.2 Release Today!

What's Changed

Release

[release] update version (#4623) by Hongxin Liu

Shardformer

Merge pull request #4612 from hpcaitech/feature/shardformer by Hongxin Liu
[shardformer] update shardformer readme (#4617) by flybird11111
[shardformer] Add overlap optional for HybridParallelPlugin (#4615) by Bin Jia
[shardformer] update bert finetune example with HybridParallelPlugin (#4584) by flybird11111
[shardformer] Pytree fix (#4533) by Jianghai
[shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) by Baizhou Zhang
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) by Baizhou Zhang
[shardformer] fix submodule replacement bug when enabling pp (#4544) by Baizhou Zhang
[shardformer] support pp+tp+zero1 tests (#4531) by flybird11111
[shardformer] fix opt test hanging (#4521) by flybird11111
[shardformer] Add overlap support for gpt2 (#4535) by Bin Jia
[shardformer] fix emerged bugs after updating transformers (#4526) by Baizhou Zhang
[shardformer] zero1+pp and the corresponding tests (#4517) by Jianghai
[shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) by Baizhou Zhang
[shardformer] opt fix. (#4514) by flybird11111
[shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) by flybird11111
[shardformer] tests for 3d parallel (#4493) by Jianghai
[shardformer] chatglm support sequence parallel (#4482) by flybird11111
[shardformer] support tp+zero for shardformer (#4472) by Baizhou Zhang
[shardformer] Pipeline/whisper (#4456) by Jianghai
[shardformer] bert support sequence parallel. (#4455) by flybird11111
[shardformer] bloom support sequence parallel (#4465) by flybird11111
[shardformer] support interleaved pipeline (#4448) by LuGY
[shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) by Baizhou Zhang
[shardformer] fix import by ver217
[shardformer] fix embedding by ver217
[shardformer] update bloom/llama/vit/chatglm tests (#4420) by flybird11111
[shardformer]update t5 tests for using all optimizations. (#4407) by flybird11111
[shardformer] update tests for all optimization (#4413) by flybird11111
[shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) by Baizhou Zhang
[shardformer]fix, test gpt2 for AMP+TP (#4403) by flybird11111
[shardformer] test all optimizations (#4399) by flybird1111
[shardformer] update shardformer to use flash attention 2 (#4392) by flybird1111
[Shardformer] Merge flash attention branch to pipeline branch (#4362) by flybird1111
[shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) by Baizhou Zhang
[shardformer] support Blip2 (#4243) by FoolPlayer
[shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit by klhhhhh
[shardformer] pre-commit check files by klhhhhh
[shardformer] register without auto policy by klhhhhh
[shardformer] ChatGLM support layernorm sharding by klhhhhh
[shardformer] delete some file by klhhhhh
[shardformer] support chatglm without layernorm by klhhhhh
[shardformer] polish code by klhhhhh
[shardformer] polish chatglm code by klhhhhh
[shardformer] add test kit in model zoo for chatglm by klhhhhh
[shardformer] vit test finish and support by klhhhhh
[shardformer] added tests by klhhhhh
Feature/chatglm (#4240) by Kun Lin
[shardformer] support whisper (#4212) by FoolPlayer
[shardformer] support SAM (#4231) by FoolPlayer
Feature/vit support (#4182) by Kun Lin
[shardformer] support pipeline base vit model (#4284) by FoolPlayer
[shardformer] support inplace sharding (#4251) by Hongxin Liu
[shardformer] fix base policy (#4229) by Hongxin Liu
[shardformer] support lazy init (#4202) by Hongxin Liu
[shardformer] fix type hint by ver217
[shardformer] rename policy file name by ver217

Legacy

[legacy] move builder and registry to legacy (#4603) by Hongxin Liu
[legacy] move engine to legacy (#4560) by Hongxin Liu
[legacy] move trainer to legacy (#4545) by Hongxin Liu

Test

[test] fix gemini checkpoint and gpt test (#4620) by Hongxin Liu
[test] ignore gpt2 shardformer test (#4619) by Hongxin Liu
[test] Hotfix/fix some model test and refactor check util api (#4369) by Bin Jia
[test] skip some not compatible models by FoolPlayer
[test] add shard util tests by ver217
[test] update shardformer tests by ver217
[test] remove useless tests (#4359) by Hongxin Liu

Zero

[zero] hotfix master param sync (#4618) by Hongxin Liu
[zero]fix zero ckptIO with offload (#4529) by LuGY
[zero]support zero2 with gradient accumulation (#4511) by LuGY

Checkpointio

[checkpointio] support huggingface from_pretrained for all plugins (#4606) by Baizhou Zhang
[checkpointio] optimize zero optim checkpoint io (#4591) by Hongxin Liu

Coati

Merge pull request #4542 from hpcaitech/chatglm by yingliu-hpc
Merge pull request #4541 from ver217/coati/chatglm by yingliu-hpc
[coati] update ci by ver217
[coati] add chatglm model (#4539) by yingliu-hpc

Doc

[doc] add llama2 benchmark (#4604) by binmakeswell
[DOC] hotfix/llama2news (#4595) by binmakeswell
[doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430) by Tian Siyuan
[doc] update Coati README (#4405) by Wenhao Chen
[doc] add Series A Funding and NeurIPS news (#4377) by binmakeswell
[doc] Fix gradient accumulation doc. (#4349) by flybird1111

Pipeline

[pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
[pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
[pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
[pipeline] add chatglm (#4363) by Jianghai
[pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
[pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
[pipeline] add unit test for 1f1b (#4303) by LuGY
[pipeline] fix return_dict/fix pure_pipeline_test (#4331) by [Baizhou Zhang](htt...

Assets 2

01 Aug 07:02

github-actions

v0.3.1

8064771

Version v0.3.1 Release Today!

What's Changed

Release

[release] update version (#4332) by Hongxin Liu

Chat

[chat] fix compute_approx_kl (#4338) by Wenhao Chen
[chat] removed cache file (#4155) by Frank Lee
[chat] use official transformers and fix some issues (#4117) by Wenhao Chen
[chat] remove naive strategy and split colossalai strategy (#4094) by Wenhao Chen
[chat] refactor trainer class (#4080) by Wenhao Chen
[chat]: fix chat evaluation possible bug (#4064) by Michelle
[chat] refactor strategy class with booster api (#3987) by Wenhao Chen
[chat] refactor actor class (#3968) by Wenhao Chen
[chat] add distributed PPO trainer (#3740) by Hongxin Liu

Zero

[zero] optimize the optimizer step time (#4221) by LuGY
[zero] support shard optimizer state dict of zero (#4194) by LuGY
[zero] add state dict for low level zero (#4179) by LuGY
[zero] allow passing process group to zero12 (#4153) by LuGY
[zero]support no_sync method for zero1 plugin (#4138) by LuGY
[zero] refactor low level zero for shard evenly (#4030) by LuGY

Nfc

[NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
[NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
[NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
[NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
[NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
[NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) by Xu Kai
[NFC] fix: format (#4270) by dayellow
[NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
[NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
[NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
[NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
[NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
[NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
[NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocd_with_naming
[NFC] policy applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
[NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
[NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
[NFC] Fix format for mixed precision (#4253) by Jianghai
[nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
[nfc] fix dim not defined and fix typo (#3991) by digger yu
[nfc] fix typo colossalai/zero (#3923) by digger yu
[nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
[nfc] fix typo colossalai/nn (#3887) by digger yu
[nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu

Example

Fix/format (#4261) by Michelle
[example] add llama pretraining (#4257) by binmakeswell
[example] fix bucket size in example of gpt gemini (#4028) by LuGY
[example] update ViT example using booster api (#3940) by Baizhou Zhang
Merge pull request #3905 from MaruyamaAya/dreambooth by Liu Ziming
[example] update opt example using booster api (#3918) by Baizhou Zhang
[example] Modify palm example with the new booster API (#3913) by Liu Ziming
[example] update gemini examples (#3868) by jiangmingyan

Ci

[ci] support testmon core pkg change detection (#4305) by Hongxin Liu

Checkpointio

[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) by Baizhou Zhang
Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) by Baizhou Zhang
[checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) by Baizhou Zhang
[checkpointio] General Checkpointing of Sharded Optimizers (#3984) by Baizhou Zhang

Lazy

[lazy] support init on cuda (#4269) by Hongxin Liu
[lazy] fix compatibility problem on torch 1.13 (#3911) by Hongxin Liu
[lazy] refactor lazy init (#3891) by Hongxin Liu

Kernels

[Kernels] added triton-implemented of self attention for colossal-ai (#4241) by Cuiqing Li

Docker

[docker] fixed ninja build command (#4203) by Frank Lee
[docker] added ssh and rdma support for docker (#4192) by Frank Lee

Dtensor

[dtensor] fixed readme file name and removed deprecated file (#4162) by Frank Lee
[dtensor] updated api and doc (#3845) by Frank Lee

Workflow

[workflow] show test duration (#4159) by Frank Lee
[workflow] added status check for test coverage workflow (#4106) by Frank Lee
[workflow] cover all public repositories in weekly report (#4069) by Frank Lee
[workflow] fixed the directory check in build (#3980) by Frank Lee
[workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
[workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
[workflow] added docker latest tag for release (#3920) by Frank Lee
[workflow] fixed workflow check for docker build (#3849) by Frank Lee

Cli

[cli] hotfix launch command for multi-nodes (#4165) by Hongxin Liu

Format

[format] applied code formatting on changed files in pull request 4152 (#4157) by github-actions[bot]
[format] applied code formatting on changed files in pull request 4021 (#4022) by github-actions[bot]

Shardformer

[shardformer] added development protocol for standardization (#4149) by Frank Lee
[shardformer] made tensor parallelism configurable (#4144) by Frank Lee
[shardformer] refactored some doc and api (#4137) by Frank Lee
[shardformer] write an shardformer example with bert finetuning (#4126) by jiangmingyan
[shardformer] added embedding gradient check (#4124) by Frank Lee
[shardformer] import huggingface implicitly (#4101) by Frank Lee
[shardformer] integrate with data parallelism (#4103) by Frank Lee
[shardformer] supported fused normalization (#4112) by Frank Lee
[shardformer] supported bloom model (#4098) by Frank Lee
[shardformer] support vision transformer (#4096) by Kun Lin
[shardformer] shardformer support opt models (#4091) by jiangmingyan
[shardformer] refactored layernorm (#4086) by Frank Lee
[shardformer] Add layernorm (#4072) by [FoolPlayer](https://api.github.co...

Assets 2

25 May 08:26

github-actions

v0.3.0

d42b1be

Version v0.3.0 Release Today!

What's Changed

Release

[release] bump to v0.3.0 (#3830) by Frank Lee

Nfc

[nfc] fix typo colossalai/ applications/ (#3831) by digger yu
[NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
[NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
[NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742) by digger yu
[NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
[NFC] polish colossalai/engine/gradient_handler/init.py code style (#3329) by Ofey Chan
[NFC] polish colossalai/context/random/init.py code style (#3327) by yuxuan-lou
[NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
[NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
[NFC] polish initializer_data.py code style (#3287) by RichardoLuo
[NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
[NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
[NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
[NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
[NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
[NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
[NFC] polish colossalai/amp/init.py code style (#3272) by lucasliunju
[NFC] polish code style (#3273) by Xuanlei Zhao
[NFC] policy colossalai/fx/proxy.py code style (#3269) by CZYCW
[NFC] polish code style (#3268) by Yuanchen
[NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
[NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
[NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
[NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
[NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow

Doc

[doc] update document of gemini instruction. (#3842) by jiangmingyan
Merge pull request #3810 from jiangmingyan/amp by jiangmingyan
[doc]fix by jiangmingyan
[doc]fix by jiangmingyan
[doc] add warning about fsdp plugin (#3813) by Hongxin Liu
[doc] add removed change of config.py by jiangmingyan
[doc] add removed warning by jiangmingyan
[doc] update amp document by Mingyan Jiang
[doc] update amp document by Mingyan Jiang
[doc] update amp document by Mingyan Jiang
[doc] update gradient accumulation (#3771) by jiangmingyan
[doc] update gradient cliping document (#3778) by jiangmingyan
[doc] add deprecated warning on doc Basics section (#3754) by Yanjia0
[doc] add booster docstring and fix autodoc (#3789) by Hongxin Liu
[doc] add tutorial for booster checkpoint (#3785) by Hongxin Liu
[doc] add tutorial for booster plugins (#3758) by Hongxin Liu
[doc] add tutorial for cluster utils (#3763) by Hongxin Liu
[doc] update hybrid parallelism doc (#3770) by jiangmingyan
[doc] update booster tutorials (#3718) by jiangmingyan
[doc] fix chat spelling error (#3671) by digger-yu
[Doc] enhancement on README.md for chat examples (#3646) by Camille Zhong
[doc] Fix typo under colossalai and doc(#3618) by digger-yu
[doc] .github/workflows/README.md (#3605) by digger-yu
[doc] fix setup.py typo (#3603) by digger-yu
[doc] fix op_builder/README.md (#3597) by digger-yu
[doc] Update .github/workflows/README.md (#3577) by digger-yu
[doc] Update 1D_tensor_parallel.md (#3573) by digger-yu
[doc] Update 1D_tensor_parallel.md (#3563) by digger-yu
[doc] Update README.md (#3549) by digger-yu
[doc] Update README-zh-Hans.md (#3541) by digger-yu
[doc] hide diffusion in application path (#3519) by binmakeswell
[doc] add requirement and highlight application (#3516) by binmakeswell
[doc] Add docs for clip args in zero optim (#3504) by YH
[doc] updated contributor list (#3474) by Frank Lee
[doc] polish diffusion example (#3386) by Jan Roudaut
[doc] add Intel cooperation news (#3333) by binmakeswell
[doc] added authors to the chat application (#3307) by Fazzie-Maqianli

Workflow

[workflow] supported test on CUDA 10.2 (#3841) by Frank Lee
[workflow] fixed testmon cache in build CI (#3806) by Frank Lee
[workflow] changed to doc build to be on schedule and release (#3825) by Frank Lee
[workflow] enblaed doc build from a forked repo (#3815) by Frank Lee
[workflow] enable testing for develop & feature branch (#3801) by Frank Lee
[workflow] fixed the docker build workflow (#3794) by Frank Lee

Booster

[booster] add warning for torch fsdp plugin doc (#3833) by wukong1992
[booster] torch fsdp fix ckpt (#3788) by wukong1992
[booster] removed models that don't support fsdp (#3744) by wukong1992
[booster] support torch fsdp plugin in booster (#3697) by wukong1992
[booster] add tests for ddp and low level zero's checkpointio (#3715) by jiangmingyan
[booster] fix no_sync method (#3709) by Hongxin Liu
[booster] update prepare dataloader method for plugin (#3706) by Hongxin Liu
[booster] refactor all dp fashion plugins (#3684) by Hongxin Liu
[booster] gemini plugin support shard checkpoint (#3610) by jiangmingyan
[booster] add low level zero plugin (#3594) by Hongxin Liu
[booster] fixed the torch ddp plugin with the new checkpoint api (#3442) by Frank Lee
[booster] implement Gemini plugin (#3352) by ver217

Docs

[docs] change placememt_policy to placement_policy (#3829) by digger yu

Evaluation

[evaluation] add automatic evaluation pipeline (#3821) by Yuanchen

Docker

[Docker] Fix a couple of build issues (#3691) by Yanming W
Fix/docker action (#3266) by liuzeming

Api

[API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan

Test

[test] fixed lazy init test import error (#3799) by Frank Lee
Update test_ci.sh by Camille Zhong
[test] refactor tests with spawn (#3452) by Frank Lee
[test] reorganize zero/gem...

Assets 2

29 Mar 02:26

github-actions

v0.2.8

a0b3749

Version v0.2.8 Release Today!

What's Changed

Release

[release] v0.2.8 (#3305) by Frank Lee

Format

[format] applied code formatting on changed files in pull request 3300 (#3302) by github-actions[bot]
[format] applied code formatting on changed files in pull request 3296 (#3298) by github-actions[bot]

Doc

[doc] add ColossalChat news (#3304) by binmakeswell
[doc] add ColossalChat (#3297) by binmakeswell
[doc] fix typo (#3222) by binmakeswell
[doc] update chatgpt doc paper link (#3229) by Camille Zhong
[doc] add community contribution guide (#3153) by binmakeswell
[doc] add Intel cooperation for biomedicine (#3108) by binmakeswell

Application

[application] updated the README (#3301) by Frank Lee

Chat

[chat]polish prompts training (#3300) by BlueRum
[chat]Update Readme (#3296) by BlueRum

Coati

[coati] fix inference profanity check (#3299) by ver217
[coati] inference supports profanity check (#3295) by ver217
[coati] add repetition_penalty for inference (#3294) by ver217
[coati] fix inference output (#3285) by ver217
[Coati] first commit (#3283) by Fazzie-Maqianli

Colossalchat

[ColossalChat]add cite for datasets (#3292) by Fazzie-Maqianli

Examples

[examples] polish AutoParallel readme (#3270) by YuliangLiu0306
[examples] Solving the diffusion issue of incompatibility issue#3169 (#3170) by NatalieC323

Fx

[fx] meta registration compatibility (#3253) by HELSON
[FX] refactor experimental tracer and adapt it with hf models (#3157) by YuliangLiu0306

Booster

[booster] implemented the torch ddd + resnet example (#3232) by Frank Lee
[booster] implemented the cluster module (#3191) by Frank Lee
[booster] added the plugin base and torch ddp plugin (#3180) by Frank Lee
[booster] added the accelerator implementation (#3159) by Frank Lee
[booster] implemented mixed precision class (#3151) by Frank Lee

Ci

[CI] Fix pre-commit workflow (#3238) by Hakjin Lee

Api

[API] implement device mesh manager (#3221) by YuliangLiu0306
[api] implemented the checkpoint io module (#3205) by Frank Lee

Hotfix

[hotfix] skip torchaudio tracing test (#3211) by YuliangLiu0306
[hotfix] layout converting issue (#3188) by YuliangLiu0306

Chatgpt

[chatgpt] add precision option for colossalai (#3233) by ver217
[chatgpt] unnify datasets (#3218) by Fazzie-Maqianli
[chatgpt] support instuct training (#3216) by Fazzie-Maqianli
[chatgpt]add reward model code for deberta (#3199) by Yuanchen
[chatgpt]support llama (#3070) by Fazzie-Maqianli
[chatgpt] add supervised learning fine-tune code (#3183) by pgzhang
[chatgpt]Reward Model Training Process update (#3133) by BlueRum
[chatgpt] fix trainer generate kwargs (#3166) by ver217
[chatgpt] fix ppo training hanging problem with gemini (#3162) by ver217
[chatgpt]update ci (#3087) by BlueRum
[chatgpt]Fix examples (#3116) by BlueRum
[chatgpt] fix lora support for gpt (#3113) by BlueRum
[chatgpt] type miss of kwargs (#3107) by hiko2MSP
[chatgpt] fix lora save bug (#3099) by BlueRum

Lazyinit

[lazyinit] combine lazy tensor with dtensor (#3204) by ver217
[lazyinit] add correctness verification (#3147) by ver217
[lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217

Auto

[auto] fix requirements typo for issue #3125 (#3209) by Yan Fang

Analyzer

[Analyzer] fix analyzer tests (#3197) by YuliangLiu0306

Dreambooth

[dreambooth] fixing the incompatibity in requirements.txt (#3190) by NatalieC323

Auto-parallel

[auto-parallel] add auto-offload feature (#3154) by Zihao

Zero

[zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH

Test

[test] fixed torchrec registration in model zoo (#3177) by Frank Lee
[test] fixed torchrec model test (#3167) by Frank Lee
[test] add torchrec models to test model zoo (#3139) by YuliangLiu0306
[test] added transformers models to test model zoo (#3135) by Frank Lee
[test] added torchvision models to test model zoo (#3132) by Frank Lee
[test] added timm models to test model zoo (#3129) by Frank Lee

Refactor

[refactor] update docs (#3174) by Saurav Maheshkar

Tests

[tests] model zoo add torchaudio models (#3138) by ver217
[tests] diffuser models in model zoo (#3136) by HELSON

Docker

[docker] Add opencontainers image-spec to Dockerfile (#3006) by Saurav Maheshkar

Dtensor

[DTensor] refactor dtensor with new components (#3089) by YuliangLiu0306

Workflow

[workflow] purged extension cache before GPT test (#3128) by Frank Lee

Autochunk

[autochunk] support complete benchmark (#3121) by Xuanlei Zhao

Tutorial

[tutorial] update notes for TransformerEngine (#3098) by binmakeswell

Nvidia

[NVIDIA] Add FP8 example using TE (#3080) by Kirthi Shankar Sivamani

Full Changelog: v0.2.8...v0.2.7

Assets 2

10 Mar 06:56

github-actions

v0.2.7

26db1cb

Version v0.2.7 Release Today!

What's Changed

Release

[release] v0.2.7 (#3094) by Frank Lee
[release] v0.2.6 (#3057) by Frank Lee

Chatgpt

[chatgpt]add flag of action mask in critic(#3086) by Fazzie-Maqianli
[chatgpt] change critic input as state (#3042) by wenjunyang
[chatgpt] fix readme (#3025) by BlueRum
[chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
[chatgpt]fix inference model load (#2988) by BlueRum
[chatgpt] allow shard init and display warning (#2986) by ver217
[chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
[chatgpt] making experience support dp (#2971) by ver217
[chatgpt]fix lora bug (#2974) by BlueRum
[chatgpt] fix inference demo loading bug (#2969) by BlueRum
[ChatGPT] fix README (#2966) by Fazzie-Maqianli
[chatgpt]add inference example (#2944) by BlueRum
[chatgpt]support opt & gpt for rm training (#2876) by BlueRum
[chatgpt] Support saving ckpt in examples (#2846) by BlueRum
[chatgpt] fix rm eval (#2829) by BlueRum
[chatgpt] add test checkpoint (#2797) by ver217
[chatgpt] update readme about checkpoint (#2792) by ver217
[chatgpt] startegy add prepare method (#2766) by ver217
[chatgpt] disable shard init for colossalai (#2767) by ver217
[chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
[chatgpt]fix train_rm bug with lora (#2741) by BlueRum

Kernel

[kernel] added kernel loader to softmax autograd function (#3093) by Frank Lee
[kernel] cached the op kernel and fixed version check (#2886) by Frank Lee

Analyzer

[analyzer] a minimal implementation of static graph analyzer (#2852) by Super Daniel

Diffusers

[diffusers] fix ci and docker (#3085) by Fazzie-Maqianli

Doc

[doc] fixed typos in docs/README.md (#3082) by Frank Lee
[doc] moved doc test command to bottom (#3075) by Frank Lee
[doc] specified operating system requirement (#3019) by Frank Lee
[doc] update nvme offload doc (#3014) by ver217
[doc] add ISC tutorial (#2997) by binmakeswell
[doc] add deepspeed citation and copyright (#2996) by ver217
[doc] added reference to related works (#2994) by Frank Lee
[doc] update news (#2983) by binmakeswell
[doc] fix chatgpt inference typo (#2964) by binmakeswell
[doc] add env scope (#2933) by binmakeswell
[doc] added readme for documentation (#2935) by Frank Lee
[doc] removed read-the-docs (#2932) by Frank Lee
[doc] update installation for GPT (#2922) by binmakeswell
[doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
[doc] fix GPT tutorial (#2860) by dawei-wang
[doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
[doc] update OPT serving (#2804) by binmakeswell
[doc] update example and OPT serving link (#2769) by binmakeswell
[doc] add opt service doc (#2747) by Frank Lee
[doc] fixed a typo in GPT readme (#2736) by cloudhuang
[doc] updated documentation version list (#2730) by Frank Lee

Autochunk

[autochunk] support vit (#3084) by Xuanlei Zhao
[autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao

Dtensor

[DTensor] implement layout converter (#3055) by YuliangLiu0306
[DTensor] refactor CommSpec (#3034) by YuliangLiu0306
[DTensor] refactor sharding spec (#2987) by YuliangLiu0306
[DTensor] implementation of dtensor (#2946) by YuliangLiu0306

Workflow

[workflow] fixed doc build trigger condition (#3072) by Frank Lee
[workflow] supported conda package installation in doc test (#3028) by Frank Lee
[workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
[workflow] added auto doc test on PR (#2929) by Frank Lee
[workflow] moved pre-commit to post-commit (#2895) by Frank Lee

Booster

[booster] init module structure and definition (#3056) by Frank Lee

Example

[example] fix redundant note (#3065) by binmakeswell
[example] fixed opt model downloading from huggingface by Tomek
[example] add LoRA support (#2821) by Haofan Wang

Hotfix

[hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
[hotfix] add shard dim to aviod backward communication error (#2954) by YuliangLiu0306
[hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
[hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
[hotfix] fix chunk size can not be divided (#2867) by HELSON
Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
[hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
[hotfix] add correct device for fake_param (#2796) by HELSON

Revert] recover "[refactor

[revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee

Format

[format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]

Pipeline

[pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang

Fx

[fx] remove depreciated algorithms. (#2312) (#2313) by Super Daniel

Refactor

[refactor] restructure configuration files (#2977) by Saurav Maheshkar

Misc

[misc] add reference (#2930) by ver217

Autoparallel

[autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
[autoparallel] find repeat blocks (#2854) by YuliangLiu0306
[autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
[autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
[autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
[autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
[autoparallel] rotor solver refactor (#2813) by Boyuan Yao
[autoparallel] Patch meta information of torch.nn.Embedding (#2760) by [Boyuan Ya...

Assets 2

10 Mar 06:57

github-actions

v0.2.6

89aa792

Version v0.2.6 Release Today!

What's Changed

Release

[release] v0.2.6 (#3057) by Frank Lee

Doc

[doc] moved doc test command to bottom (#3075) by Frank Lee
[doc] specified operating system requirement (#3019) by Frank Lee
[doc] update nvme offload doc (#3014) by ver217
[doc] add ISC tutorial (#2997) by binmakeswell
[doc] add deepspeed citation and copyright (#2996) by ver217
[doc] added reference to related works (#2994) by Frank Lee
[doc] update news (#2983) by binmakeswell
[doc] fix chatgpt inference typo (#2964) by binmakeswell
[doc] add env scope (#2933) by binmakeswell
[doc] added readme for documentation (#2935) by Frank Lee
[doc] removed read-the-docs (#2932) by Frank Lee
[doc] update installation for GPT (#2922) by binmakeswell
[doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
[doc] fix GPT tutorial (#2860) by dawei-wang
[doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
[doc] update OPT serving (#2804) by binmakeswell
[doc] update example and OPT serving link (#2769) by binmakeswell
[doc] add opt service doc (#2747) by Frank Lee
[doc] fixed a typo in GPT readme (#2736) by cloudhuang
[doc] updated documentation version list (#2730) by Frank Lee

Workflow

[workflow] fixed doc build trigger condition (#3072) by Frank Lee
[workflow] supported conda package installation in doc test (#3028) by Frank Lee
[workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
[workflow] added auto doc test on PR (#2929) by Frank Lee
[workflow] moved pre-commit to post-commit (#2895) by Frank Lee

Booster

[booster] init module structure and definition (#3056) by Frank Lee

Example

[example] fix redundant note (#3065) by binmakeswell
[example] fixed opt model downloading from huggingface by Tomek
[example] add LoRA support (#2821) by Haofan Wang

Autochunk

[autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao

Chatgpt

[chatgpt] change critic input as state (#3042) by wenjunyang
[chatgpt] fix readme (#3025) by BlueRum
[chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
[chatgpt]fix inference model load (#2988) by BlueRum
[chatgpt] allow shard init and display warning (#2986) by ver217
[chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
[chatgpt] making experience support dp (#2971) by ver217
[chatgpt]fix lora bug (#2974) by BlueRum
[chatgpt] fix inference demo loading bug (#2969) by BlueRum
[ChatGPT] fix README (#2966) by Fazzie-Maqianli
[chatgpt]add inference example (#2944) by BlueRum
[chatgpt]support opt & gpt for rm training (#2876) by BlueRum
[chatgpt] Support saving ckpt in examples (#2846) by BlueRum
[chatgpt] fix rm eval (#2829) by BlueRum
[chatgpt] add test checkpoint (#2797) by ver217
[chatgpt] update readme about checkpoint (#2792) by ver217
[chatgpt] startegy add prepare method (#2766) by ver217
[chatgpt] disable shard init for colossalai (#2767) by ver217
[chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
[chatgpt]fix train_rm bug with lora (#2741) by BlueRum

Dtensor

[DTensor] refactor CommSpec (#3034) by YuliangLiu0306
[DTensor] refactor sharding spec (#2987) by YuliangLiu0306
[DTensor] implementation of dtensor (#2946) by YuliangLiu0306

Hotfix

[hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
[hotfix] add shard dim to aviod backward communication error (#2954) by YuliangLiu0306
[hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
[hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
[hotfix] fix chunk size can not be divided (#2867) by HELSON
Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
[hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
[hotfix] add correct device for fake_param (#2796) by HELSON

Revert] recover "[refactor

[revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee

Format

[format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
[format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]

Pipeline

[pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang

Fx

[fx] remove depreciated algorithms. (#2312) (#2313) by Super Daniel

Refactor

[refactor] restructure configuration files (#2977) by Saurav Maheshkar

Kernel

[kernel] cached the op kernel and fixed version check (#2886) by Frank Lee

Misc

[misc] add reference (#2930) by ver217

Autoparallel

[autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
[autoparallel] find repeat blocks (#2854) by YuliangLiu0306
[autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
[autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
[autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
[autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
[autoparallel] rotor solver refactor (#2813) by Boyuan Yao
[autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
[autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306

Zero

[zero] trivial zero optimizer refactoring (#2869) by YH
[zero] fix wrong import (#2777) by Boyuan Yao

Cli

[cli] handled version check exceptions (#2848) by Frank Lee

Triton

[triton] added copyright information for flash attention (#2835) by Frank Lee

Nfc

[NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) by Michelle
[NFC] polish code format by binmakeswell
[NFC] polish colossala...

Assets 2

Releases: hpcaitech/ColossalAI

Version v0.3.6 Release Today!

What's Changed

Release

Colossal-llama2

Hotfix

Doc

Eval-hotfix

Devops

Example

Workflow

Shardformer

Setup

Fsdp

Extension

Llama

Version v0.3.5 Release Today!

What's Changed

Release

Llama

Moe

Lr-scheduler

Eval

Gemini

Fix

Checkpointio

Chat

Extension

Doc

Tests

Accelerator

Workflow

Feat

Nfc

Hotfix

Sync

Shardformer

Ci

Npu

Pipeline

Format

Version v0.3.4 Release Today!

What's Changed

Release

Pipeline inference

Doc

Hotfix

Kernels

Inference

Test

Refactor

Nfc

Format

Gemini

Kernel

Feature

Checkpointio

Infer

Chat

Misc

Lazy

Fix

Version v0.3.3 Release Today!

What's Changed

Release

Inference

Feature

Bug

Lazy

Chat

Doc

Shardformer

Misc

Format

Legacy

Kernel

Example

Hotfix

Devops

Pipeline