
Releases: facebookresearch/fairscale

v0.3.3: [chore] 0.3.3 release (#568)

02 Apr 16:25
60694da
- releasing 0.3.3
- needed in vissl for the auto_wrap_bn change

v0.3.2

02 Apr 16:24
9a37498
[chore] 0.3.2 release (#535)

v0.3.1: [chore] 0.3.1 release (#504)

02 Apr 16:24
84cec20
* [chore] 0.3.1 release

- mainly because vissl needs the new version
- added a doc on release steps

v0.3.0

23 Feb 06:57
d64ff25

[0.3.0] - 2021-02-22

Added

  • FullyShardedDataParallel (FSDP) (#413); see the sketch after this list
  • ShardedDDP fp16 grad reduction option (#402)
  • Expose experimental algorithms within the pip package (#410)
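
A minimal sketch of wrapping a model with the new FullyShardedDataParallel class; it assumes torch.distributed has already been initialized and a GPU is available, and the toy module and training step are illustrative only:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes torch.distributed.init_process_group() was already called.
model = FSDP(torch.nn.Linear(10, 10).cuda())            # parameters are sharded across ranks
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10).cuda()).sum()
loss.backward()                                          # FSDP reduces and re-shards gradients here
optimizer.step()
```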

Fixed

  • Catch corner case when the model is too small with respect to the world size, and shards are empty (#406)
  • Memory leak in checkpoint_wrapper (#412)

v0.1.7

19 Feb 22:08
a606e84

Fixed

  • ShardedDDP and OSS handle model trainability changes during training (#369)
  • ShardedDDP state dict load/save bug (#386)
  • ShardedDDP handle train/eval modes (#393)
  • AdaScale handling custom scaling factors (#401)

Added

  • ShardedDDP manual reduce option for checkpointing (#389); see the sketch below
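
A minimal sketch of ShardedDDP on top of OSS with an explicit reduce() call, in the spirit of the manual reduction referenced in #389; a distributed process group is assumed to be initialized, and the exact keyword and method names may differ across versions:

```python
import torch
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel

base = torch.nn.Linear(10, 10).cuda()
optimizer = OSS(base.parameters(), optim=torch.optim.SGD, lr=0.1)
model = ShardedDataParallel(base, optimizer)

loss = model(torch.randn(4, 10).cuda()).sum()
loss.backward()
model.reduce()       # manual gradient reduction, e.g. when checkpointing re-runs backward hooks
optimizer.step()
```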

v0.1.6

11 Feb 01:31
ce9e7e4

Added

  • Checkpointing model wrapper (#376); see the sketch after this list
  • Faster OSS, flatbuffers (#371)
  • Small speedup in OSS clip_grad_norm (#363)
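
A minimal sketch of the activation-checkpointing wrapper added in #376; the import path shown is the one exposed by later fairscale releases and is an assumption for this version:

```python
import torch
from fairscale.nn import checkpoint_wrapper

# Activations of the wrapped block are dropped in forward and recomputed in backward.
block = checkpoint_wrapper(
    torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU())
)

out = block(torch.randn(2, 128, requires_grad=True))
out.sum().backward()
```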

Fixed

  • Bug in ShardedDDP with 0.1.5, depending on the init (KeyError / OSS)
  • Much refactoring in Pipe (#357, #358, #360, #362, #370, #373)
  • Better pip integration / resident PyTorch (#375)

v0.1.5

03 Feb 22:20
4401ced

Added

  • PyTorch compatibility for OSS checkpoints (#310)
  • Elastic checkpoints for OSS, the world size can vary between saves and loads (#310); see the sketch after this list
  • Tensor views for OSS bucketing, reduced CPU use (#300)
  • Bucket calls in ShardedDDP, for faster inter-node communication (#327)
  • FlattenParamWrapper, which flattens module parameters into a single tensor seamlessly (#317)
  • AMPnet experimental support (#304)
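
A minimal sketch of saving an OSS optimizer state in the consolidated, rank-agnostic form related to #310; a distributed process group is assumed to be initialized, and state_dict() is read on the recipient rank:

```python
import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS

model = torch.nn.Linear(10, 10).cuda()
optimizer = OSS(model.parameters(), optim=torch.optim.Adam, lr=1e-3)

# ... train for a while ...

optimizer.consolidate_state_dict(recipient_rank=0)       # gather all shards on rank 0
if dist.get_rank() == 0:
    torch.save(optimizer.state_dict(), "oss_optim.pt")   # full optimizer state, loadable at a different world size
```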

Fixed

  • ShardedDDP properly handles device changes via .to() (#353)
  • Added a new interface for AdaScale, AdaScaleWrapper, which makes it compatible with OSS (#347)

v0.1.4

07 Jan 18:59
53a912c

Fixed

  • Missing .cu files in the pip package

v0.1.3

05 Jan 05:54
7cc8b34

Same as 0.1.2, but with the correct version number in the source code (see __init__.py)

v0.1.2

04 Jan 20:30
84a3bdb

Added

  • AdaScale: Added gradient accumulation feature (#202); see the sketch after this list
  • AdaScale: Added support for torch.lr_scheduler (#229)
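
A minimal sketch of AdaScale with gradient accumulation in the spirit of #202; the num_gradients_to_accumulate keyword is an assumption based on later documentation, and an effective scale greater than one (multiple ranks and/or accumulation) is expected:

```python
import torch
from fairscale.optim import AdaScale

model = torch.nn.Linear(10, 10)
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = AdaScale(base_opt, num_gradients_to_accumulate=2)  # assumed keyword for the #202 feature

for step in range(4):
    loss = model(torch.randn(8, 10)).sum()
    loss.backward()                 # AdaScale accumulates gradient statistics via backward hooks
    if (step + 1) % 2 == 0:         # only step once per accumulation window
        optimizer.step()
        base_opt.zero_grad()
```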

Fixed

  • AdaScale: smoothing factor value fixed when using gradient accumulation (#235)
  • Pipe: documentation on balancing functions (#243)
  • ShardedDDP: handles typical NLP models
  • ShardedDDP: better partitioning when fine-tuning