v0.0.3
Released on September 30, 2019.
Featured
torchgpipe now overlaps copy and computation using separate CUDA streams. Previously, a GPU could not compute a partition while micro-batches were being copied across GPUs, because both operations ran on the same default CUDA stream.
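The idea can be sketched with plain PyTorch stream primitives. This is a minimal illustration, not torchgpipe's actual internals: `copy_on_stream` is a hypothetical helper that issues a device copy on a dedicated stream so the default stream stays free for computation, falling back to a plain copy when CUDA is unavailable.

```python
import torch

def copy_on_stream(tensor, device):
    """Copy `tensor` to `device` on a dedicated CUDA stream so the copy
    can overlap with computation running on the default stream.

    Hypothetical helper for illustration; not torchgpipe's API.
    """
    device = torch.device(device)
    if device.type != "cuda" or not torch.cuda.is_available():
        # Nothing to overlap without CUDA; do an ordinary copy.
        return tensor.to(device)

    copy_stream = torch.cuda.Stream(device)
    with torch.cuda.stream(copy_stream):
        # non_blocking lets the copy proceed asynchronously on copy_stream.
        moved = tensor.to(device, non_blocking=True)

    # The consumer (default) stream must wait for the copy to finish
    # before it may safely read `moved`.
    torch.cuda.current_stream(device).wait_stream(copy_stream)
    moved.record_stream(torch.cuda.current_stream(device))
    return moved
```

With one such stream per copy, a partition's forward pass on the default stream can run concurrently with the next micro-batch's transfer, which is the overlap this release introduces.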
Other Improvements
- Added support for PyTorch 1.2.
- Redesigned the internal pipeline parallelism to represent dependencies transparently.
- Fixed the hanging issue when an exception is raised in a partition.
- Fixed the unintended size accumulation (#3 by @842974287) of `balance_by_size()`.
Breaking Changes
- Dropped support for PyTorch 1.0.
- Changed the type of `GPipe.devices` from `tuple` to `list`.
- Removed `current_microbatch()`. This approach turned out to be incompatible with checkpointing.