Version v0.3.0 Released Today!
What's Changed
NFC
- [NFC] fix typos in colossalai/ and applications/ (#3831) by digger yu
- [NFC] fix typos in colossalai/auto_parallel, nn, utils, etc. (#3779) by digger yu
- [NFC] fix typos in colossalai/amp, auto_parallel, and autochunk (#3756) by digger yu
- [NFC] fix typos in colossalai/auto_parallel/tensor_shard (#3742) by digger yu
- [NFC] fix typos in applications/ and colossalai/ (#3735) by digger-yu
- [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) by Ofey Chan
- [NFC] polish colossalai/context/random/__init__.py code style (#3327) by yuxuan-lou
- [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
- [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
- [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
- [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
- [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
- [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
- [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
- [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
- [NFC] polish colossalai/amp/__init__.py code style (#3272) by lucasliunju
- [NFC] polish code style (#3273) by Xuanlei Zhao
- [NFC] polish colossalai/fx/proxy.py code style (#3269) by CZYCW
- [NFC] polish code style (#3268) by Yuanchen
- [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
- [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
- [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
- [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
- [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow
Doc
- [doc] update the Gemini instruction document (#3842) by jiangmingyan
- Merge pull request #3810 from jiangmingyan/amp by jiangmingyan
- [doc] fix by jiangmingyan
- [doc] fix by jiangmingyan
- [doc] add warning about fsdp plugin (#3813) by Hongxin Liu
- [doc] add removed change of config.py by jiangmingyan
- [doc] add removed warning by jiangmingyan
- [doc] update amp document by Mingyan Jiang
- [doc] update amp document by Mingyan Jiang
- [doc] update amp document by Mingyan Jiang
- [doc] update gradient accumulation (#3771) by jiangmingyan
- [doc] update gradient clipping document (#3778) by jiangmingyan
- [doc] add deprecation warning to the Basics section of the docs (#3754) by Yanjia0
- [doc] add booster docstring and fix autodoc (#3789) by Hongxin Liu
- [doc] add tutorial for booster checkpoint (#3785) by Hongxin Liu
- [doc] add tutorial for booster plugins (#3758) by Hongxin Liu
- [doc] add tutorial for cluster utils (#3763) by Hongxin Liu
- [doc] update hybrid parallelism doc (#3770) by jiangmingyan
- [doc] update booster tutorials (#3718) by jiangmingyan
- [doc] fix chat spelling error (#3671) by digger-yu
- [Doc] enhancement on README.md for chat examples (#3646) by Camille Zhong
- [doc] Fix typos under colossalai and doc (#3618) by digger-yu
- [doc] .github/workflows/README.md (#3605) by digger-yu
- [doc] fix setup.py typo (#3603) by digger-yu
- [doc] fix op_builder/README.md (#3597) by digger-yu
- [doc] Update .github/workflows/README.md (#3577) by digger-yu
- [doc] Update 1D_tensor_parallel.md (#3573) by digger-yu
- [doc] Update 1D_tensor_parallel.md (#3563) by digger-yu
- [doc] Update README.md (#3549) by digger-yu
- [doc] Update README-zh-Hans.md (#3541) by digger-yu
- [doc] hide diffusion in application path (#3519) by binmakeswell
- [doc] add requirement and highlight application (#3516) by binmakeswell
- [doc] Add docs for clip args in zero optim (#3504) by YH
- [doc] updated contributor list (#3474) by Frank Lee
- [doc] polish diffusion example (#3386) by Jan Roudaut
- [doc] add Intel cooperation news (#3333) by binmakeswell
- [doc] added authors to the chat application (#3307) by Fazzie-Maqianli
Workflow
- [workflow] supported test on CUDA 10.2 (#3841) by Frank Lee
- [workflow] fixed testmon cache in build CI (#3806) by Frank Lee
- [workflow] changed doc build to run on schedule and release (#3825) by Frank Lee
- [workflow] enabled doc build from a forked repo (#3815) by Frank Lee
- [workflow] enable testing for develop & feature branch (#3801) by Frank Lee
- [workflow] fixed the docker build workflow (#3794) by Frank Lee
Booster
- [booster] add warning for torch fsdp plugin doc (#3833) by wukong1992
- [booster] fix torch fsdp checkpointing (#3788) by wukong1992
- [booster] removed models that don't support fsdp (#3744) by wukong1992
- [booster] support torch fsdp plugin in booster (#3697) by wukong1992
- [booster] add tests for ddp and low level zero's checkpointio (#3715) by jiangmingyan
- [booster] fix no_sync method (#3709) by Hongxin Liu
- [booster] update prepare dataloader method for plugin (#3706) by Hongxin Liu
- [booster] refactor all dp fashion plugins (#3684) by Hongxin Liu
- [booster] gemini plugin support shard checkpoint (#3610) by jiangmingyan
- [booster] add low level zero plugin (#3594) by Hongxin Liu
- [booster] fixed the torch ddp plugin with the new checkpoint api (#3442) by Frank Lee
- [booster] implement Gemini plugin (#3352) by ver217
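
Taken together, the entries above converge on a single `Booster` entry point. Below is a minimal sketch of how the pieces fit, assuming a `torchrun` launch and an available GPU; the toy model, optimizer, and plugin choice are illustrative only, not the library's canonical example:

```python
# Minimal booster sketch; the toy model/optimizer and the plugin choice
# are illustrative assumptions, not the only supported configuration.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})  # expects torchrun-style env vars

model = torch.nn.Linear(16, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

# boost() wraps the raw objects with the plugin's distributed logic
booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x, y = torch.randn(8, 16).cuda(), torch.randn(8, 4).cuda()
loss = criterion(model(x), y)
booster.backward(loss, optimizer)  # used in place of loss.backward()
optimizer.step()
```

Because #3684 refactors all data-parallel plugins behind the same interface, swapping in `GeminiPlugin` (#3352), `LowLevelZeroPlugin` (#3594), or `TorchFSDPPlugin` (#3697) should leave the training loop itself unchanged.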
API
- [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan
Test
- [test] fixed lazy init test import error (#3799) by Frank Lee
- Update test_ci.sh by Camille Zhong
- [test] refactor tests with spawn (#3452) by Frank Lee
- [test] reorganize zero/gemini tests (#3445) by ver217
- [test] fixed gemini plugin test (#3411) by Frank Lee
Format
- [format] applied code formatting on changed files in pull request 3786 (#3787) by github-actions[bot]
- [format] Run lint on colossalai.engine (#3367) by Hakjin Lee
Plugin
- [plugin] a workaround for zero plugins' optimizer checkpoint (#3780) by Hongxin Liu
- [plugin] torch ddp plugin supports sharded model checkpoint (#3775) by Hongxin Liu
Chat
- [chat] add performance and tutorial (#3786) by binmakeswell
- [chat] fix bugs in stage 3 training (#3759) by Yuanchen
- [chat] fix community example ray (#3719) by MisterLin1995
- [chat] fix train_prompts.py gemini strategy bug (#3666) by zhang-yi-chi
- [chat] PPO stage3 doc enhancement (#3679) by Camille Zhong
- [chat] add opt attn kernel (#3655) by Hongxin Liu
- [chat] typo accimulation_steps -> accumulation_steps (#3662) by tanitna
- Merge pull request #3656 from TongLi3701/chat/update_eval by Tong Li
- [chat] set default zero2 strategy (#3667) by binmakeswell
- [chat] refactor model save/load logic (#3654) by Hongxin Liu
- [chat] remove lm model class (#3653) by Hongxin Liu
- [chat] refactor trainer (#3648) by Hongxin Liu
- [chat] polish performance evaluator (#3647) by Hongxin Liu
- Merge pull request #3621 from zhang-yi-chi/fix/chat-train-prompts-single-gpu by Tong Li
- [Chat] Remove duplicate functions (#3625) by ddobokki
- [chat] fix bug when enabling single-GPU training by zhang-yi-chi
- [chat] polish code note typo (#3612) by digger-yu
- [chat] update reward model sh (#3578) by binmakeswell
- [chat] ChatGPT train prompts on ray example (#3309) by MisterLin1995
- [chat] polish tutorial doc (#3551) by binmakeswell
- [chat] add examples of training with limited resources in chat readme (#3536) by Yuanchen
- [chat] add vf_coef argument for PPOTrainer (#3318) by zhang-yi-chi
- [chat] add zero2 cpu strategy for sft training (#3520) by ver217
- [chat] fix stage3 PPO sample sh command (#3477) by binmakeswell
- [Chat] Add Peft support & fix the ptx bug (#3433) by YY Lin
- [chat] fix save_model (#3377) by Dr-Corgi
- [chat] fix readme (#3429) by kingkingofall
- [Chat] fix the tokenizer "int too big to convert" error in SFT training (#3453) by Camille Zhong
- [chat] fix sft training for bloom, gpt and opt (#3418) by Yuanchen
- [chat] correcting a few obvious typos and grammar errors (#3338) by Andrew
DevOps
- [devops] fix doc test on pr (#3782) by Hongxin Liu
- [devops] fix ci for document check (#3751) by Hongxin Liu
- [devops] make build on PR run automatically (#3748) by Hongxin Liu
- [devops] update torch version of CI (#3725) by Hongxin Liu
- [devops] fix chat ci (#3628) by Hongxin Liu
AMP
- [amp] Add naive amp demo (#3774) by jiangmingyan
Auto
- [auto] fix install cmd (#3772) by binmakeswell
Fix
- [fix] Add init to fix import error when importing _analyzer (#3668) by Ziyue Jiang
CI
- [CI] fix typo with tests/ etc. (#3727) by digger-yu
- [CI] fix typo with tests components (#3695) by digger-yu
- [CI] fix some spelling errors (#3707) by digger-yu
- [CI] Update test_sharded_optim_with_sync_bn.py (#3688) by digger-yu
Example
- [example] add train resnet/vit with booster example (#3694) by Hongxin Liu
- [example] add finetune bert with booster example (#3693) by Hongxin Liu
- [example] fix community doc (#3586) by digger-yu
- [example] reorganize for community examples (#3557) by binmakeswell
- [example] remove redundant texts & update roberta (#3493) by mandoxzhang
- [example] update roberta with newer ColossalAI (#3472) by mandoxzhang
- [example] update examples related to zero/gemini (#3431) by ver217
Tensor
- [tensor] Refactor handle_trans_spec in DistSpecManager by YH
Zero
- [zero] rename confusing variable names in the ZeRO optimizer (#3173) by YH
- [zero] reorganize zero/gemini folder structure (#3424) by ver217
Gemini
- [gemini] accelerate inference (#3641) by Hongxin Liu
- [gemini] state dict supports fp16 (#3590) by Hongxin Liu
- [gemini] support save state dict in shards (#3581) by Hongxin Liu
- [gemini] gemini supports lazy init (#3379) by Hongxin Liu
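
Pairing #3379 with the Gemini plugin defers parameter materialization until `boost()` shards the model. A speculative sketch, assuming `LazyInitContext` lives at `colossalai.lazy` and that `HybridAdam` is the companion optimizer; both import paths may differ in this release:

```python
# Speculative sketch of Gemini + lazy init; import paths are assumptions.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.lazy import LazyInitContext     # assumed location
from colossalai.nn.optimizer import HybridAdam  # assumed companion optimizer

colossalai.launch_from_torch(config={})

# parameters stay unmaterialized until boost() shards them, so the
# full model never needs to fit on a single device
with LazyInitContext():
    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.GELU())

optimizer = HybridAdam(model.parameters(), lr=1e-3)
booster = Booster(plugin=GeminiPlugin())
model, optimizer, _, _, _ = booster.boost(model, optimizer)
```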
Bot
- [bot] Automated submodule synchronization (#3596) by github-actions[bot]
Misc
- [misc] op_builder/builder.py (#3593) by digger-yu
- [misc] add verbose arg for zero and op builder (#3552) by Hongxin Liu
Coati
- [coati] fix install cmd (#3592) by binmakeswell
- [coati] add custom model support guide (#3579) by Fazzie-Maqianli
- [coati] Fix LlamaCritic (#3475) by gongenlei
FX
- [fx] fix meta tensor registration (#3589) by Hongxin Liu
ChatGPT
- [chatgpt] Detached PPO Training (#3195) by csric
- [chatgpt] add pre-trained model RoBERTa for RLHF stage 2 & 3 (#3223) by Camille Zhong
Lazyinit
- [lazyinit] fix clone and deepcopy (#3553) by Hongxin Liu
Checkpoint
- [checkpoint] make sharded checkpoints compatible with the naming format of HF checkpoint files (#3479) by jiangmingyan
- [checkpoint] support huggingface style sharded checkpoint (#3461) by jiangmingyan
- [checkpoint] refactored the API and added safetensors support (#3427) by Frank Lee
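
These entries route Hugging Face-style sharded checkpoints through the booster's checkpoint I/O. A hedged sketch, assuming the `shard` and `size_per_shard` keyword names match the PRs above:

```python
# Hedged sketch of sharded checkpointing; keyword names may vary.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})
booster = Booster(plugin=TorchDDPPlugin())
model, _, _, _, _ = booster.boost(torch.nn.Linear(16, 4))

# shard=True writes HF-style weight shards plus an index file;
# size_per_shard caps the size of each shard file (in MB)
booster.save_model(model, "ckpt_dir", shard=True, size_per_shard=1024)
booster.load_model(model, "ckpt_dir")
```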
Chat Community
- [Chat Community] Update README.md (fixed #3487) (#3506) by NatalieC323
Dreambooth
- Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378)" (#3481) by NatalieC323
- [dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378) by NatalieC323
Autoparallel
- [autoparallel] integrate auto parallel feature with new tracer (#3408) by YuliangLiu0306
- [autoparallel] adapt autoparallel with new analyzer (#3261) by YuliangLiu0306
Hotfix
- [hotfix] meta tensor compatibility with torch2 by YuliangLiu0306
Full Changelog: v0.2.8...v0.3.0