Releases · foundation-model-stack/fms-hf-tuning
v2.1.2
v2.1.2-rc.1
What's Changed
- build(deps): set lower limit for transformers to 4.45 for granite 3.0 by @willmj in #387
- docs: Update supported models by @aluu317 in #389
Full Changelog: v2.1.1...v2.1.2-rc.1
v2.1.1
What's Changed
Dependency changes
- Pull in new versions of fms-acceleration-peft and fms-acceleration-foak, with fixes for AutoGPTQ and gradient accumulation hooks, and add the granite GPTQ model
- build(deps): set transformers below 4.46, waiting on fixes by @anhuong in #384
Additional changes
- docs: Update Supported Models List in README by @tharapalanivel in #382
Full Changelog: v2.1.0...v2.1.1
v2.1.1-rc.2
deps: set transformers below 4.46, waiting on fixes (#384) Signed-off-by: Anh Uong <anh.uong@ibm.com>
v2.1.1-rc.1
docs: Update supported models list in README (#382) Signed-off-by: Thara Palanivel <130496890+tharapalanivel@users.noreply.github.com>
v2.1.0
New Major Feature
- Support for GraniteForCausalLM model architecture
Dependency upgrades
- Upgraded `transformers` to version 4.45.2, which now supports GraniteForCausalLM models. Note that if a model is trained with transformers v4.45, you need `transformers>=4.45` to load the trained model; prior versions of `transformers` will not be compatible (see the sketch after this list).
- Upgraded `accelerate` to version 1.0.1.
- Limited the upper bound of `torch` to below v2.5.0 (not including v2.5.0) so it's compatible with flash-attention-2.
- Upgraded `fms_acceleration_peft` to v0.3.1, which disables offloading the state dict (this caused ephemeral storage issues when loading large models with QLoRA) and sets defaults when `target_modules=None`.
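A minimal sketch of the loading constraint above, assuming a local checkpoint path; only the "transformers>=4.45 is required to load checkpoints trained with v4.45" rule comes from these release notes.

```python
# Sketch: guard against loading a GraniteForCausalLM checkpoint trained with
# transformers v4.45 using an older transformers install.
# The checkpoint path is a placeholder.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.45.0"), (
    "Models trained with transformers v4.45 require transformers>=4.45 to load"
)

model = transformers.AutoModelForCausalLM.from_pretrained("/output/tuned-granite-checkpoint")
```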
Additional bug fix
- Fix for a crash when running multi-GPU training with a non-existent output directory.
Full list of Changes
- ci: run unit tests, fmt, image build on release branch by @anhuong in #361
- chore: update code owners by @anhuong in #363
- fix: crash when output directory doesn't exist by @HarikrishnanBalagopal in #364
- refactor: move tokenizer_data_utils with the rest of utils, add further unit testing. by @willmj in #348
- build(deps): update transformers and accelerate deps by @anhuong in #355
- build(deps): Update peft requirement from <0.13,>=0.8.0 to >=0.8.0,<0.14 by @dependabot in #354
- build(deps): Upgrade accelerate requirement to allow version 1.0.0 by @willmj in #371
- build: Set triton environment variables by @willmj in #370
- build(deps): torch<2.5 due to FA2 error with new version by @anhuong in #375
- chore: merge set of changes for v2.1.0 by @aluu317 in #376
Full Changelog: v2.0.1...v2.1.0
v2.1.0-rc.1
What's Changed
- ci: run unit tests, fmt, image build on release branch by @anhuong in #361
- chore: update code owners by @anhuong in #363
- fix: crash when output directory doesn't exist by @HarikrishnanBalagopal in #364
- refactor: move tokenizer_data_utils with the rest of utils, add further unit testing. by @willmj in #348
- build(deps): update transformers and accelerate deps by @anhuong in #355
- build(deps): Update peft requirement from <0.13,>=0.8.0 to >=0.8.0,<0.14 by @dependabot in #354
- build(deps): Upgrade accelerate requirement to allow version 1.0.0 by @willmj in #371
- build: Set triton environment variables by @willmj in #370
Full Changelog: v2.0.1...v2.0.2-rc.1
v2.0.1
New major features:
- Support for LoRA for the following model architectures: llama3, llama3.1, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral, and allam
- Support for QLoRA for the following model architectures: llama3, granite (GPTBigCode and LlamaForCausalLM), mistral, and mixtral
- Addition of a post-processing function to format tuned adapters as required by vLLM for inference. Refer to the README for how to run it as a script. When tuning on the image, post-processing can be enabled using the flag `lora_post_process_for_vllm`. See the build README for details on how to set this flag.
- Enablement of new flags for throughput improvements: `padding_free` to process multiple examples without adding padding tokens, `multipack` for multi-GPU training to balance the number of tokens processed on each device, and `fast_kernels` for optimized tuning with fused operations and triton kernels. See the README for details on how to set these flags and their use cases (a hedged config sketch follows this list).
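As a rough illustration of the new flags, the sketch below writes a JSON config of the kind consumed when tuning on the image. Only the flag names (`lora_post_process_for_vllm`, `padding_free`, `multipack`, `fast_kernels`) come from these notes; the surrounding keys, value formats, and file handling are assumptions to be checked against the README and build README.

```python
# Hypothetical config sketch -- surrounding keys and value formats are assumptions.
import json

config = {
    "model_name_or_path": "ibm-granite/granite-3b-code-base",  # placeholder model
    "training_data_path": "/data/train.jsonl",                 # placeholder dataset
    "output_dir": "/output",                                   # placeholder output dir
    "peft_method": "lora",
    "lora_post_process_for_vllm": True,   # format tuned adapters for vLLM
    "padding_free": ["huggingface"],      # value format is an assumption
    "multipack": [16],                    # value format is an assumption
    "fast_kernels": [True, True, True],   # value format is an assumption
}

with open("tune_config.json", "w") as f:
    json.dump(config, f, indent=2)
```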
Dependency upgrades:
- Upgraded `transformers` to version 4.44.2, needed for tuning of all models
- Upgraded `accelerate` to version 0.33, needed for tuning of all models. Version 0.34.0 has a bug for FSDP.
API / interface changes:
- The `train()` API now returns a tuple of the trainer instance and additional metadata as a dict (sketched below)
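A minimal sketch of unpacking the new return value; the import path and argument dataclasses are assumptions about the package layout, while the tuple shape is what the note above describes.

```python
# Sketch: train() now returns (trainer, metadata_dict) rather than just a trainer.
# Import path and dataclass names are assumptions about the repository layout.
from tuning import sft_trainer
from tuning.config import configs  # assumed location of the argument dataclasses

model_args = configs.ModelArguments(model_name_or_path="ibm-granite/granite-3b-code-base")
data_args = configs.DataArguments(training_data_path="/data/train.jsonl")
train_args = configs.TrainingArguments(output_dir="/output", num_train_epochs=1)

trainer, additional_metadata = sft_trainer.train(model_args, data_args, train_args)
print(additional_metadata)  # extra run information returned alongside the trainer instance
```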
Additional features and fixes
- Support for resuming tuning from an existing checkpoint. Refer to the README for how to use it as a flag. The flag `resume_training` defaults to `True`.
- Addition of a default pad token in the tokenizer when the `EOS` and `PAD` tokens are equal, to improve training quality.
- JSON compatibility for input datasets. See the docs for details on data formats (a hedged sketch of such a file follows this list).
- Fix to not resize the embedding layer by default; the embedding layer can still be resized as needed using the flag `embedding_size_multiple_of`.
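For the JSON dataset support, a hedged sketch of a JSON-lines training file is below; the field names (`input`/`output`) are an assumption, and the supported schemas (including a single pre-formatted text field) are described in the data-format docs.

```python
# Hypothetical JSONL training file for the JSON dataset support; field names
# are assumptions -- consult the data-format docs for the supported schemas.
import json

records = [
    {"input": "What is the boiling point of water?", "output": "100 degrees Celsius at sea level."},
    {"input": "Name a prime number greater than 10.", "output": "11 is a prime number."},
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```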
Full List of what's Changed
- fix: do not resize embedding layer by default by @kmehant in #310
- fix: logger is unbound error by @HarikrishnanBalagopal in #308
- feat: Enable JSON dataset compatibility by @willmj in #297
- doc: How to tune LoRA lm_head by @aluu317 in #305
- docs: Add findings from exploration into model tuning performance degradation by @willmj in #315
- fix: warnings about casing when building the Docker image by @HarikrishnanBalagopal in #318
- fix: need to pass skip_prepare_dataset for pretokenized dataset due to breaking change in HF SFTTrainer by @HarikrishnanBalagopal in #326
- feat: install fms-acceleration to enable qlora by @anhuong in #284
- feat: Migrating the trainer controller to python logger by @seshapad in #309
- fix: remove fire ported from Hari's PR #303 by @HarikrishnanBalagopal in #324
- dep: cap transformers version due to FSDP bug by @anhuong in #335
- deps: Add protobuf to support aLLaM models by @willmj in #336
- fix: add enable_aim build args in all stages needed by @anhuong in #337
- fix: remove lm_head post processing by @Abhishek-TAMU in #333
- doc: Add qLoRA README by @aluu317 in #322
- feat: Add deps to evaluate qLora tuned model by @aluu317 in #312
- feat: Add support for smoothly resuming training from a saved checkpoint by @Abhishek-TAMU in #300
- ci: add a github workflow to label pull requests based on their title by @HarikrishnanBalagopal in #298
- fix: Addition of default pad token in tokenizer when EOS and PAD token are equal by @Abhishek-TAMU in #343
- feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels by @achew010 in #280
- fix: cap transformers at v4.44 by @anhuong in #349
- fix: utilities to post process checkpoint for LoRA by @Ssukriti in #338
- feat: Add post processing logic to accelerate launch by @willmj in #351
- build: install additional fms-acceleration plugins by @anhuong in #350
- fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check by @Abhishek-TAMU in #352
- fix: check for wte.weight along with embed_tokens.weight by @willmj in #356
- release: merge set of changes for v2.0.0 by @Abhishek-TAMU in #357
- build(deps): unset hardcoded trl version to get latest updates by @anhuong in #358
New Contributors
Full Changelog: v1.2.2...v2.0.0
v2.0.0
v2.0.0-rc.2
What's Changed
Full Changelog: v2.0.0-rc.1...v2.0.0-rc.2