- bypass DNS check for studio local exec
- add PyTorch version environment variable, to facilitate SMTT
- pin coverage version to fix pipeline issue
- Add torch_distributed support for Trainium
- update README and contributing guidelines
- provide option to use native process launcher
- add support for native PyTorch DDP distribution
- derive master node from training environment
- Add Heterogeneous Cluster support
- CI
- add data parallelism support (#11) (#12)
- use ubuntu 18.04 base image in dlc gpu image
- remove TODOs in 1.6.0 dlc gpu dockerfile and reduce parameters for data parallel integ test
- use base cuda 11 image for test dlc gpu image
- use 1.6.0 for gpu tests and disable horovod tests
- remove local data parallel integ test
- use sagemaker-training 3.7.0 and enable data parallel integ tests
- patch socket call and update flake8 violations
- Use MPIRunnerType
- Update main buildspec to only perform CPU integration tests
- Add GPU and unit test buildspecs
- Pin SageMaker version to less than v2
- improve training.py doc style
- add issue templates
- remove confusing information from the Readme.
- do not duplicate test dependencies in tox.ini
- Rename buildspec files.
- Make docker folder read only, remove unused tests, rename test-toolkit/ -> test/.
- Bump version of sagemaker-training for typing fix
- add Python 3.7 support
- Pin Smdebug to the latest version (0.7.2)
- add Dockerfiles for PyTorch 1.5.0
- Replace sagemaker-containers with sagemaker-training
- change miniconda installation in 1.4.0 Dockerfiles
- parallelize SageMaker integ test runs
- remove (unused) model_fn from training scripts
- bump smdebug version
- add requirements.txt integ test
- upgrade pillow etc. to fix safety issues
- Upgrade sagemaker-containers and test with more than 1 epoch
- Install toolkit from PyPI.
- upgrade sagemaker-containers to 2.8.2
- Install jupyter_client 5.3.4 in advance for py2 gpu image
- update smdebug
- run test-toolkit unit tests for release
- run build steps only when necessary.
- refactor toolkit tests.
- install sm experiments always when python 3.6 or greater
- Update smdebug to 0.7.0
- install sagemaker-experiments package only for 3.6
- upgrade to latest sagemaker-experiments
- install SageMaker Python SDK into Python 3 images
- Install awscli from pypi instead of conda for PyTorch containers
- Remove unnecessary dependencies.
- Fix python 2 tox dependencies.
- copy all tests to test-toolkit folder.
- Update license URL
- Adding changes for PyTorch 1.4.0 DLC
- Add release to PyPI. Change package name to sagemaker-pytorch-training.
- Fix flake8 errors. Add flake configuration to run during PR.
- Add twine section to tox.
- Update build artifacts
- update: Bump awscli version and constrain spyder on conda
- update: bump smdebug version to 0.5.0.post0
- Create init.py
- run local GPU tests for Python 3
- update: Update buildspec for PyTorch 1.3.1
- update copyright year in license header
- Added changes for DLC 2.1 with PyTorch v1.3.1
- Remove stale-bot config
- upgrade sagemaker-containers to 2.5.11
- upgrade pillow to 6.2.0
- use SageMaker Containers' ProcessRunner for executing the entry point
- use regional endpoint for STS in builds
- update instance type region availability.
- Update Dockerfile.gpu
- Removing extra packages to optimize space
- Adding function to skip test for py2 version
- Installing torchvision from official pip wheel
- Add /bin/bash as default CMD
- PyTorch 1.2 py2 py3 dockerfiles added
- Add wait on entrypoint
- Add entrypoint script
- split training and serving logic
- properly fail build if has-matching-changes fails
- fix placeholder name cpu-instance-type in buildspec-release.yml
- Update no-p2 and no-p3 regions.
- upgrade sagemaker-container version
- unmark 2 deploy tests
- update p2 restricted regions
- skip tests in gpu instance restricted regions
- modify buildspecs and tox files
- freeze dependency versions
- add buildspec-release file and upgrade cuda version
- upgrade PyTorch to 1.1
- disable test_mnist_gpu for py2 for now
- fix broken line of buildspec
- prevent hidden errors in buildspec
- Add AWS CodeBuild buildspec for pull request
- Bump minimum SageMaker Containers version to 2.4.6 and pin SageMaker Python SDK to 1.18.16
- fix broken link in README
- Add timeout to test_mnist_gpu test
- Use dummy role in tests and update local failure integ test
- Use the SageMaker Python SDK for local serving integ tests
- Use the SageMaker Python SDK for local integ distributed training tests
- Use the SageMaker Python SDK for local integ single-machine training tests
- Pin fastai version to 1.0.39 in CPU dockerfile
- Use the SageMaker Python SDK for SageMaker integration tests
- Add missing rendering dependencies for opencv and a simple test.
- Add opencv support.
- Freeze PyYAML version to avoid conflict with Docker Compose
- Unfreeze numpy version.
- Freeze TorchVision to 0.2.1
- Specify region when creating S3 resource in integ tests
- Read framework version from Python SDK for integ test default
- Fix unicode display problem in py2 container
- freeze pip <=18.1, fastai == 1.0.39, numpy <= 1.15.4
- Add support for fastai (https://github.com/fastai/fastai) library.
- Remove "requsests" from tests dependencies to avoid regular conflicts with "requests" package from "sagemaker" dependencies.
- Add support for PyTorch-1.0.