Releases: apache/mxnet
Apache MXNet (incubating) 1.9.1 patch release
Apache MXNet (incubating) 1.9.1 is a maintenance release incorporating important bug fixes and performance improvements. All users of Apache MXNet (incubating) 1.9.0 are advised to upgrade. You can install Apache MXNet (incubating) 1.9.1 through the usual distribution channels. Please review these release notes to learn about the bug fixes.
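Since all 1.9.0 users are advised to upgrade, a deployment script may want to detect an outdated install before pulling the new wheel. A minimal sketch; the `needs_upgrade` helper is illustrative, not part of MXNet:

```python
def _as_tuple(version: str):
    """Parse a plain dotted version string like "1.9.1" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def needs_upgrade(installed: str, minimum: str = "1.9.1") -> bool:
    """True if `installed` predates `minimum` (both plain dotted versions)."""
    return _as_tuple(installed) < _as_tuple(minimum)

print(needs_upgrade("1.9.0"))  # True: 1.9.0 users should move to 1.9.1
print(needs_upgrade("1.9.1"))  # False: already current
```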
Bug-fixes
- Upgrade numpy to <1.20.0 to avoid security vulnerabilities affecting numpy<1.19.1 (#20940)
- quantized elemwise mul changed out type to float (#20926)
- Avoid modifying loaded library map while iterating in lib_close() (#20941) (#20944)
- Fixed issue with batchnorm on even number of channels (#20927)
- Assign attributes of transformer operators (#20902)
- Fix reuse of primitives for MKLDNN-AArch64. Fixes #20265. (#20482) (#20921)
- identity fuse (#20884)
- Port changes from master to make CPP package properly build when large tensor support is enabled. (#20768) (#20841)
- Port BRGEMM (#20910)
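The numpy pin in #20940 (stay below 1.20.0 while avoiding versions affected by the advisories, i.e. below 1.19.1) corresponds to a requirements constraint of roughly this shape; the exact bounds shipped in the PR may differ:

```
numpy>=1.19.1,<1.20.0
```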
Submodule
- Upgrade oneDNN to the top of rls-v2.4 branch (#20994)
CI/CD
- Fix aarch64 cd pipeline (#20783)
- Fix CD for pypi wheel version (#20782)
- Port #20903 from master. (#20918) (#20920)
- Fix pip installation in containers (#20864)
- Update libcudnn and libnccl to the same versions used in NVIDIA's Docker containers for CUDA 10.2 and 11.2, and update the repo we pull the packages from. (#20808)
Website
- Fix css for Apache links, add to Python docs. (#20995)
- Update website footer to include required Apache links (#20993)
- Move trusted-by section from main page to a new page (#20788) (#20798)
- Fix broken download link, reformat download page to make links more clear. (#20794)
- Fix static website build (#19906) (#20791)
- Fix broken website for master version (#19945) (#20789)
- Update website for v1.9.x branch. (#20786)
Perl
- Updates mapping between PDL and MX types (#20852)
Apache MXNet (incubating) 1.9.1 Release Candidate 0
Release notes for this candidate are identical to the final Apache MXNet (incubating) 1.9.1 release above.
Apache MXNet (incubating) 2.0.0.beta1 Release
Features
Implementations and Improvements
Array-API Standardization
- [API] Extend NumPy Array dtypes with int16, uint16, uint32, uint64 (#20478)
- [API Standardization] Add Linalg kernels: (diagonal, outer, tensordot, cross, trace, matrix_transpose) (#20638)
- [API Standardization] Standardize MXNet NumPy Statistical & Linalg Functions (#20592)
- [2.0] Bump Python to >= 3.8 (#20593)
- [API] Add positive (#20667)
- [API] Add logaddexp (#20673)
- [API] Add linalg.svdvals (#20696)
- [API] Add floor_divide (#20620)
- [API STD][SEARCH FUNC] Add keepdims=False to argmax/argmin (#20692)
- [API NEW][METHOD] Add mT, permute_dims (#20688)
- [API] Add bitwise_left/right_shift (#20587)
- [API NEW][ARRAY METHOD] Add Index() and array_namespace() (#20689)
- [API STD][LINALG] Standardize sort & linalg operators (#20694)
- [API NEW][SET FUNC] Add set functions (#20693)
- [API] Standardize MXNet NumPy creation functions (#20572)
- [API NEW][LINALG] Add vector_norm, matrix_norm (#20703)
- [API TESTS] Standardization and add more array api tests (#20725)
- [API] Add new dlpack API (#20546)
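Several of the additions above, such as `logaddexp` (#20673), mirror NumPy semantics from the array-API standard. As an illustration of the semantics being standardized (shown with plain NumPy rather than MXNet's `mx.np` namespace), `logaddexp(a, b)` computes `log(exp(a) + exp(b))` without overflowing for large inputs:

```python
import numpy as np

a, b = 1000.0, 1000.0
with np.errstate(over="ignore"):
    naive = np.log(np.exp(a) + np.exp(b))  # exp(1000) overflows to inf
stable = np.logaddexp(a, b)                # 1000 + log(2), computed stably
print(naive, stable)
```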
FFI Improvements
- [FFI] Add new containers and Implementations (#19685)
- [FFI] Randint (#20083)
- [FFI] npx.softmax, npx.activation, npx.batch_norm, npx.fully_connected (#20087)
- [FFI] expand_dims (#20073)
- [FFI] npx.pick, npx.convolution, npx.deconvolution (#20101)
- [FFI] npx.pooling, npx.dropout, npx.one_hot, npx.rnn (#20102)
- [FFI] fix masked_softmax (#20114)
- [FFI] part5: npx.batch_dot, npx.arange_like, npx.broadcast_like (#20110)
- [FFI] part4: npx.embedding, npx.topk, npx.layer_norm, npx.leaky_relu (#20105)
- make stack use faster API (#20059)
- Add interleaved_matmul_* to npx namespace (#20375)
Operators
- [FEATURE] AdaBelief operator (#20065)
- [Op] Fix reshape and mean (#20058)
- Fusing gelu post operator in Fully Connected symbol (#20228)
- [operator] Add logsigmoid activation function (#20268)
- [operator] Add Mish Activation Function (#20320)
- [operator] add threshold for mish (#20339)
- [NumPy] Wrap unravel_index backend implementation instead of fallback (#20730)
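The activation functions added above have simple closed forms: `logsigmoid(x) = log(sigmoid(x))` (#20268) and `mish(x) = x * tanh(softplus(x))` (#20320). A reference sketch in plain NumPy; MXNet's actual kernels live in the backend, and these helpers are only illustrative:

```python
import numpy as np

def logsigmoid(x):
    # log(sigmoid(x)), written stably as -softplus(-x) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)

def mish(x):
    # x * tanh(softplus(x)), where softplus(x) = log(1 + exp(x))
    return x * np.tanh(np.logaddexp(0.0, x))

print(logsigmoid(0.0))  # -log(2) ≈ -0.6931
print(mish(0.0))        # 0.0
```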
cuDNN & CUDA & RTC & GPU Engine
- [FEATURE] Use RTC for reduction ops (#19426)
- Improve add_bias_kernel for small bias length (#19744)
- [PERF] Moving GPU softmax to RTC and optimizations (#19905)
- [FEATURE] Load libcuda with dlopen instead of dynamic linking (#20484)
- [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility (#20443)
- Expand NVTX usage (#18683)
- Fast cuDNN BatchNorm NHWC kernels support (#20615)
- Add async GPU dependency Engine (#20331)
- Port convolutions to cuDNN v8 API (#20635)
- Automatic Layout Management (#20718)
- Use cuDNN for conv bias and bias grad (#20771)
- Fix the regular expression in RTC code (#20810)
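Loading a shared library at runtime with `dlopen` (as #20484 does for `libcuda`) avoids a hard link-time dependency, so CPU-only builds can start on machines without CUDA installed. A minimal illustration of the pattern using Python's `ctypes` against the C math library; `libcuda` itself is not loaded here, and the fallback name assumes a glibc system:

```python
import ctypes
import ctypes.util

# Resolve the C math library by name at runtime (the dlopen pattern),
# instead of requiring it at link time.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0
```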
Miscs
- 1bit gradient compression implementation (#17952)
- add inline for __half2float_warp (#20152)
- [FEATURE] Add interleaved batch_dot oneDNN fuses for new GluonNLP models (#20312)
- [ONNX] Forward port new mx2onnx into master (#20355)
- Add new benchmark function for single operator comparison (#20388)
- [BACKPORT] [FEATURE] Add API to control denormalized computations (#20387)
- [v1.9.x] modify erfinv implementation based on scipy (#20517) (#20550)
- [REFACTOR] Refactor test_quantize.py to use Gluon API (#20227)
- Switch all HybridBlocks to use forward interface (#20262)
- [FEATURE] MXIndexedRecordIO: avoid re-build index (#20549)
- Split np_elemwise_broadcast_logic_op.cc (#20580)
- [FEATURE] Add feature of retain_grad (#20500)
- [v2.0] Split Large Source Files (#20604)
- [submodule] Remove soon to be obsolete dnnl nomenclature from mxnet (#20606)
- Added ::GCD and ::LCM: [c++17] contains gcd and lcm implementation (#20583)
- [v2.0] RNN: use rnn_params (#20384)
- Add quantized batch_dot (#20680)
- [master] Add aliases for subgraph operators to be compatible with old models (#20679)
- Optimize preparation of selfattn operators (#20682)
- Fix scale bug in quantized batch_dot (#20735)
- [master] Merge DNNL adaptive pooling with standard pooling (#20741)
- Avoid redundant memcpy when reorder not in-place (#20746)
- Add microbenchmark for FC + add fusion (#20780)
- Optimize 'take' operator for CPU (#20745)
- [FEATURE] Add g5 instance to CI (#20876)
- Avoid modifying loaded library map while iterating in lib_close() (#20941)
- quantized transpose operator (#20817)
- Remove first_quantization_pass FC property (#20908)
- Reduce after quantization memory usage (#20894)
- [FEATURE] Add quantized version of reshape with DNNL reorder primitive. (#20835)
- [FEATURE] Fuse dequantize with convolution (#20816)
- [FEATURE] Add binomial sampling and fix multinomial sampling (#20734)
- Refactor src/operator/subgraph/dnnl/dnnl_conv.cc file (#20849)
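The `::GCD`/`::LCM` change (#20583) leans on C++17's `std::gcd` and `std::lcm`. The underlying identity, `lcm(a, b) * gcd(a, b) == |a * b|`, sketched in Python for illustration:

```python
import math

def lcm(a: int, b: int) -> int:
    """lcm via the classic identity lcm(a, b) * gcd(a, b) == |a * b|."""
    if a == 0 or b == 0:
        return 0
    return abs(a * b) // math.gcd(a, b)

print(math.gcd(12, 18))  # 6
print(lcm(12, 18))       # 36
```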
Language Bindings
MKL & OneDNN
- [operator] Integrate oneDNN layer normalization implementation (#19562)
- Change inner mxnet flags nomenclature for oneDNN library (#19944)
- Change MXNET_MKLDNN_DEBUG define name to MXNET_ONEDNN_DEBUG (#20031)
- Change mx_mkldnn_lib to mx_onednn_lib in Jenkins_steps.groovy file (#20035)
- Fix oneDNN feature name in MXNet (#20070)
- Change MXNET_MKLDNN* flag names to MXNET_ONEDNN* (#20071)
- Change _mkldnn test and build scenarios names to _onednn (#20034)
- [submodule] Upgrade oneDNN to v2.2.1 (#20080)
- [submodule] Upgrade oneDNN to v2.2.2 (#20267)
- [operator] Integrate matmul primitive from oneDNN in batch dot (#20340)
- [submodule] Upgrade oneDNN to v2.2.3 (#20345)
- [submodule] Upgrade oneDNN to v2.2.4 (#20360)
- [submodule] Upgrade oneDNN to v2.3 (#20418)
- Fix backport of SoftmaxOutput implementation using onednn kernels (#20459)
- [submodule] Upgrade oneDNN to v2.3.2 (#20502)
- [FEATURE] Add oneDNN support for npx.reshape and np.reshape (#20563)
- [Backport] Enabling BRGEMM FullyConnected based on shapes (#20568)
- [BACKPORT][BUGFIX][FEATURE] Add oneDNN 1D and 3D deconvolution support and fix bias (#20292)
- [FEATURE] Enable dynamic linking with MKL and compiler based OpenMP (#20474)
- [Performance] Add oneDNN support for temperature parameter in Softmax (#20567)
- [FEATURE] Add oneDNN support for numpy concatenate operator (#20652)
- [master] Make warning message when oneDNN is turned off less confusing (#20700)
- [FEATURE] add oneDNN support for numpy transpose (#20419)
- Reintroduce next_impl in onednn deconvolution (#20663)
- Unify all names used to refer to oneDNN library in logs and docs to oneDNN (#20719)
- Improve stack operator performance by oneDNN (#20621)
- [submodule] Upgrade oneDNN to v2.3.3 (#20752)
- Unifying oneDNN post-quantization properties (#20724)
- Add oneDNN support for reduce operators (#20669)
- Remove identity operators from oneDNN optimized graph (#20712)
- Fix oneDNN fallback for concat with scalar (#20772)
- Fix identity fuse for oneDNN (#20767)
- Improve split operator by oneDNN reorder primitive (#20757)
- Remove doubled oneDNN memory descriptor creation (#20822)
- [FEATURE] Integrate oneDNN support for add, subtract, multiply, divide. (#20713)
- [master] Update MKL version to 2022.00 (#20865)
- Add oneDNN support for "where" operator (#20862)
- [master] Implemented oneDNN Backward Adaptive Pooling kernel (#20825)
- Improve MaskedSoftmax by oneDNN (#20853)
- [Feature] Add bfloat to oneDNN version of binary broadcast operators. (#20846)
- [submodule] Upgrade oneDNN to v2.5.2 (#20843)
- Make convolution operator fully work with oneDNN v2.4+ (#20847)
- [FEATURE] Fuse FC + elemwise_add operators for oneDNN (#20821)
- [master][submodule] Upgrade oneDNN to v2.5.1 (#20662)
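Several entries above rename the user-facing environment flags from `MXNET_MKLDNN*` to `MXNET_ONEDNN*` (#20031, #20071). Assuming the flags' semantics are unchanged by the rename (only the spelling is documented above), enabling the oneDNN debug checks now looks like:

```python
import os

# Pre-rename this switch was spelled MXNET_MKLDNN_DEBUG; after #20031/#20071
# the same switch is MXNET_ONEDNN_DEBUG. Set it before importing mxnet.
os.environ["MXNET_ONEDNN_DEBUG"] = "1"
print(os.environ["MXNET_ONEDNN_DEBUG"])
```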
CI-CD
- CI Infra updates (#19903)
- Fix cd by adding to $PATH (#19939)
- Fix nightly CD for python docker image releases (#19772)
- pass version param (#19984)
- Update ci/dev_menu.py file (#20053)
- add gomp and quadmath (#20121)
- [CD] Fix the name of the pip wheels in CD (#20115)
- Attempt to fix nightly docker for master cu112 (#20126)
- Disable codecov (#20173)
- [BUGFIX] Fix CI slowdown issue after removing 3rdparty/openmp (#20367)
- cudnn8 for cu101 in cd (#20408)
- [wip] Re-enable code cov (#20427)
- [CI] Fix centos CI & website build (#20512)
- [CI] Move link check from jenkins to github action (#20526)
- Pin jupyter-client (#20545)
- [CI] Add node for website full build and nightly build (#20543)
- use restricted g4 node (#20554)
- [CI] Freeze array-api-test (#20631)
- Fix os_x_mklbuild.yml (#20668)
- [CI] Upgrade windows CI (#20676)
- [master][bugfix] Remove exit 0 to avoid blocking in CI pipeline (#20683)
- [CI] Add timeout and retry to linkcheck (#20708)
- Prospector checker initial commit (#20684)
- [master][ci][feature] Static code checker for CMake files (#20706)
- Fix sanity CI (#20763)
- [CI] Workaround MKL CI timeout issue (#20777)
- [master] CI/CD updates to be more stable (#20740)
Website & Documentation & Style
- Fix static website build (#19906)
- [website] Fix broken website for master version (#19945)
- add djl (#19970)
- [website] Automate website artifacts uploading (#19955)
- Grammar fix (added period to README) (#19998)
- [website] Update for MXNet 1.8.0 website release (#20013)
- fix format issue (#20022)
- [DOC]Disabling hybridization steps added (#19986)
- [DOC] Add Flower to MXNet ecosystem (#20038)
- doc add relu (#20193)
- Avoid UnicodeDecodeError in method doc on Windows (#20215)
- updated news.md and readme.md for 1.8.0 release (#19975)
- [DOC] Update Website to Add Prerequisites for GPU pip install (#20168)
- update short desc for pip (#20236)
- [website] Fix Jinja2 version for python doc (#20263)
- [Master] Auto-formatter to keep the same coding style (#20472)
- [DOC][v2.0] Part1: Link Check (#20487)
- [DOC...
Apache MXNet (incubating) 2.0.0.beta1 Release Candidate 1
Release notes for this candidate are identical to the Apache MXNet (incubating) 2.0.0.beta1 release above.
Apache MXNet (incubating) 2.0.0.beta1 Release Candidate 0
The Features, Operators, oneDNN, and CI-CD notes for this candidate are included in the Apache MXNet (incubating) 2.0.0.beta1 release notes above; the Website & Documentation & Style list below contains additional entries.
Website & Documentation & Style
- Fix static website build (#19906)
- [website] Fix broken website for master version (#19945)
- add djl (#19970)
- [website] Automate website artifacts uploading (#19955)
- Grammar fix (added period to README) (#19998)
- [website] Update for MXNet 1.8.0 website release (#20013)
- fix format issue (#20022)
- [DOC]Disabling hybridization steps added (#19986)
- [DOC] Add Flower to MXNet ecosystem (#20038)
- doc add relu (#20193)
- Avoid UnicodeDecodeError in method doc on Windows (#20215)
- updated news.md and readme.md for 1.8.0 release (#19975)
- [DOC] Update Website to Add Prerequisites for GPU pip install (#20168)
- update short desc for pip (#20236)
- [website] Fix Jinja2 version for python doc (#20263)
- [Master] Auto-formatter to keep the same coding style (#20472)
- [DOC][v2.0] Part1: Link Check (#20487)
- [DOC][v2.0] Part3: Evaluate Notebooks (#20490)
- If variable is not used within the loop body, start the name with an underscore (#20505)
- [v2.0][DOC] Add migration guide (#20473)
- [Master] Clang-formatter: only src/ directory (#20571)
- [Website] Fix website publish (#20573)
- [v2.0] Update Examples (#20602)
- Attempt to fix website build pipeline (#20634)
- [Master] Ignoring mass reformatting commits with git blame (#20578)
- [Feature][Master] Clang-format tool to perform additional formatting and semantic checking of code. (#20433)
- [Master] Clang-format description on a wiki (#20612)
- Add: break line entry before tenary (#20705)
- Fix csr param description (#20698)
- [master] Bring dnnl_readme.md on master up-to-date (#20670)
- Remove extra spaces between 'if' (#20721)
- [DOC] Fix migration guide document (#20716)
- [master][clang-format] Re-format cc. .h. .cu files; cond. (#20704)
- [master][style-fix] Clang-format comment style fix (#20744)
- Port #20786 from v1.9.x (#20787)
- remove broken links (#20793)
- Fix broken download link, reformat download page to make links more clear. (#20794)
Apache MXNet (incubating) 1.9.0
Features
ONNX
- ONNX fix node output sort (#20327)
- fix embedding and output order (#20305)
- Add more ONNX export support to operators (#19625)
- onnx support more ops (#19653)
- ONNX _contrib_interleaved_matmul_selfatt_valatt and LayerNorm (#19661)
- Improve onnx test suite (#19662)
- Make ONNX export operators work properly with the node input shape (#19676)
- Onnx fix slice_axis and embedding and reshape (#19677)
- Add more onnx export unit tests, refactor onnxruntime tests. (#19689)
- Update onnx export support for FullyConnected and add unit tests (#19679)
- Add coverage to onnx test pipeline. (#19682)
- onnx test coverage for leakyrelu elemwise_add concat activation (#19687)
- ONNX fix softmax (#19691)
- More onnx export updates (#19692)
- onnx fix fullyconnected (#19693)
- ONNX fix embedding and slice (#19695)
- Add more CV models to onnxruntime inference test, add bert model test. (#19697)
- Add more ONNX export operator support (#19727)
- ONNX Support for MXNet repeat op (#19732)
- ONNX Support for MXNet _contrib_BilinearResize2D op (#19733)
- ONNX support adaptiveAveragePooling2D and update Softmax to support temperature (#19736)
- ONNX Support for MXNet reverse op (#19737)
- Add onnx export support for where and greater_scalar operators. (#19745)
- ONNX support for box_decode (#19750)
- ONNX contrib_box_nms (#19755)
- Onnx support for reshape_like (#19759)
- ONNX conversion for topk (#19761)
- _maximum_scalar (#19763)
- Onnx export support for gather_nd (#19767)
- ONNX support for broadcast_mod (#19770)
- Onnx export support for batch_dot (#19775)
- ONNX support for slice_like (#19782)
- ONNX export support for SwapAxis (#19789)
- broadcast_like (#19791)
- ONNX support for Softmax -- optimize for axis=-1 case (#19794)
- Onnx support for upsampling (#19795)
- ONNX export support for multiple input data types (#19796)
- Refactor onnx tests for object classification, add object detection tests (#19802)
- ONNX Reshape support for special cases (#19804)
- Onnx export support for ROIAlign (#19814)
- Add image segmentation end-to-end tests and expand object classification tests (#19815)
- Add onnx operator unit tests for sum, broadcast_mul (#19820)
- Add onnx export function for log2 operator, add operator unit test and update tests to allow comparing NaN values. (#19822)
- ONNX 1.6 compatibility fix + fix for when multiple nodes have the same name (#19823)
- Add ONNX export support for equal_scalar operator (#19824)
- ONNX Export Support for Pooling & Convolution (#19831)
- Add onnx end-to-end tests for pose estimation and action recognition models. (#19834)
- new cases (#19835)
- batchnorm tests (#19836)
- Onnx Support for Dropout (#19837)
- Bump Up CI ONNX Tests Thread Count (#19845)
- ONNX export support for slicechannel and box_nms (#19846)
- Move majority of ONNX model tests to nightly, only test a few models in PR pipeline (#19848)
- ONNX export rewrite Take (#19851)
- ONNX export fix slice_axis (#19853)
- ONNX support for argsort (#19854)
- enable 3d convolution (#19855)
- ONNX export rewrite tile (#19868)
- reshape corner cases for mask rcnn (#19875)
- refactor code (#19887)
- Add onnx export operator for minimum_scalar. (#19888)
- ONNX Fixes (#19914)
- Add onnx export support and unit tests for zeros and ones. (#19951)
- Add onnx export support for one_hot and random_uniform_like and unit tests for one_hot. (#19952)
- ONNX support for SequenceReverse (#19954)
- ONNX export support for RNN (#19958)
- ONNX Fixes for some NLP models (#19973)
- ONNX Type inference support (#19990)
- add roberta tests (#19996)
- add ONNX DistilBERT tests (#19999)
- Onnx Dynamic Shapes Support (#20001)
- ONNX Support for pretrained StandardRNN models (#20017)
- Add AWDRNN Pretrained model test (#20018)
- fix squeeze (#20020)
- website update for 1.8.0 (#20021)
- add ernie onnx test (#20030)
- Onnx Support for Transformer (#20048)
- ONNX export support for GRU (#20060)
- ONNX support for gpt models (#20061)
- Rearrange ONNX tests in Nightly CI (#20075)
- ONNX Graduation (#20094)
- fix typo (#20106)
- MXNet export for ONNX 1.8 support (#20113)
- split cv tests (#20117)
- skip one test (#20122)
- fix onnx type inference issue (#20130)
- Add mx2onnx operator support matrix (#20139)
- fix mx2onnx wheel (#20157)
- increase test tolerance (#20161)
- ONNX legacy operator fix and test (#20165)
- Onnx Fix 6 MaskRCNN models (#20178)
- onnx legacy operator unit tests + fixes (#20179)
- add faster_rcnn_fpn models (#20190)
- fix test (#20191)
- Add onnx export operator unit tests. (#20192)
- Add more onnx operator export unit tests (#20194)
- ONNX support rewrite norm (#20195)
- ONNX export support from arg/aux params (#20198)
- bump onnxruntime version (#20199)
- skip cv tests (#20208)
- ONNX fix log_softmax for opset 12 (#20209)
- Add more ONNX model tests (#20210)
- ONNX export support for RNN and sum_axis (#20226)
- Add ONNX model support matrix (#20230)
- ONNX optimize softmax (#20231)
- fix (#20240)
- add example (#20245)
- ONNX add support coverage for Reshape and lstm (#20246)
- ONNX support for _split_v2 (#20250)
- ONNX fix RNN input shape (#20255)
- Update ONNX tutorial and doc (#20253)
- change some shapes from 10d to 8d (#20258)
- ONNX export support broadcast_not_equal (#20259)
- ONNX: fix error handling when op is not registered (#20261)
- ONNX tweak Resize op (#20264)
- ONNX docs and tutorial revision (#20269)
- onnx fix rnn (#20272)
OneDNN
- Implement oneDNN deconvolution primitives to deconvolution 2D (#20107)
- [Feature] Add oneDNN support for interleaved_matmul_selfatt_* operators (fp32/int8) (#20163)
- Bumped oneDNN version to 1.6.5 (#19449)
- [submodule] Upgrade oneDNN to v2.0 (#19670)
- Impose a plain format for concat’s output when oneDNN would use padding (#19735)
- [submodule] Upgrade to oneDNN v1.7 (#19559)
- Add test case for oneDNN RNN (#19464)
- Fusing gelu post operator in Fully Connected symbol (#19971)
- [submodule] Upgrade oneDNN to v1.6.4 (#19276)
- ElementWiseSum fix for oneDNN (#18777) (#19199)
ARM support
CI-CD improvements
- Fix Nightly CI (#20019)
- correcting cuda 11.2 image name in CI and CD (#19960)
- CI fixes to make more stable and upgradable (#19895)
- Address CI failures with docker timeouts (v2) (#19890)
- Attempt to fix v1.x CI issues. (#19872)
- Update CI build scripts to install python 3.6 from deadsnakes repo (#19788)
- Fix R builds on CI (#19656)
- Update CD Jenkins config for include/mkldnn/oneapi/dnnl (#19725)
- Fix CI builds failing due to invalid GPG keys. (#19377)
- Disable unix-gpu-cu110 pipeline for v1.x build since we now build with cuda 11.0 in windows pipelines. (#19828)
- [BACKPORT]Enable CUDA 11.0 on nightly + CUDA 11.2 on pip (#19295)(#19764) (#19930)
- Fix nightly cd cu102 (#19940)
- Drop cu9x in cd (#19902)
- update cudnn from 7 to 8 for cu102 (#19522)
- update cudnn from 7 to 8 for cu102 (#19506)
- [v.1x] Attempt to fix v1.x cd by installing new cuda compt package (#19959)
- [FEATURE]Migrating all CD pipelines to Ninja build + fix cu112 CD pipeline (#19974)
- Fix nightly CD for python docker image releases (#19774)
- [CD] Fix nightly docker missing lib (#20120)
- [CD] Fix CD cu102 110 112 cuda compatibility (#20116)
- Disable codecov. (#20175)
- Static build for mxnet-cu110 (#19272)
- Use centos7 base image for CD pipeline and aarch64 build (#20423)
Subgraph API
- Move block.optimize_for backend_opts to kwargs (#19386)
- Backport Enable Numpy support for Gluon Block optimize_for to v1.x (#19456)
- Save/Load Gluon Blocks & HybridBlocks (#19565)
- Fixed setting attributes in reviewSubgraph (#19274)
- Fix for optimize_for multiple subgraph properties issue (#19263) (#20142)
- Reuse params from cached_op_args (#20221)
MXNet-TensorRT
- Simplify TRT build by adding onnx_tensorrt targets in CMake (#19742)
- Add 1:many conversions in nnvm_to_onnx and non-flatten GEMM (#19652)
- TRT test update (#19296)
- Fix TRT INT8 unsupported hardware error handling (#19349)
- Update MXNet-TRT doc with the new optimize_for API (#19385)
- Fix precision vars initialization in TRT (#20277)
Build system
- Fix gcc 10 build (#20216)
- Change gcc 8 PPA to ppa:jonathonf/gcc (#19638)
- Add option to build with shared c runtime on windows (#19409) (#19932)
- Create tool for building source archives (#19972)
- [PIP] update manifest to include lib_api.cc (#19850) (#19912)
- Fix windows dll loading for compute capabilities >7.5 (#19931)
- [PIP] add build target in cmake for osx compat (#19110) (#19926)
Documentation
- update news.md and readme.md for 1.8.0 release (#19976)
- Fix python doc version dropdown (#20189)
- Fix cu100 pip link (#20084)
License
- adding License in libmxnet make config .sym and .ver files (#19937)
- add missing license fix from master to v1.x (#19916)
- Fix license for blockingconcurrentqueue (#19910)
- update notice year (#19893)
- Backport [LICENSE] Reorganize rat-excludes file to ease license auditing (#19743) (#19799)
- Update LICENSE (#19704)
- [LICENSE] Change intgemm to a submodule instead of fetch. (#19407)
- License updates per ASF feedback (#20377)
- License updates per feedback (#20428)
- Just remove image classification CPP example from source tarball. (#20530)
- [License] Remove mistakenly placed ASF headers (#20520)
- modify erfinv implementation based on scipy (#20517)
- Add copyright detection and removal in license checker (#20498)
- Make sure files with 2 licenses are listed properly in LICENSE. (#20492)
- Remove the "copyright by contributors" line in source files (#20493)
Website improvements
- add djl and autogluon to website (#19981) ...
Apache MXNet (incubating) 2.0.0.beta0 Release Candidate 0
Features
Implementations and Improvements
- Improve add_bias_kernel for small bias length (#19744)
- [FFI] Add new containers and Implementations (#19685)
- 1bit gradient compression implementation (#17952)
- [Op] Fix reshape and mean (#20058)
- [FFI] Randint (#20083)
- [FFI] npx.softmax, npx.activation, npx.batch_norm, npx.fully_connected (#20087)
- [FFI] expand_dims (#20073)
- [FFI] npx.pick, npx.convolution, npx.deconvolution (#20101)
- [FFI] npx.pooling, npx.dropout, npx.one_hot, npx.rnn (#20102)
- [FFI] fix masked_softmax (#20114)
- add inline for __half2float_warp (#20152)
- [FFI] part5: npx.batch_dot, npx.arange_like, npx.broadcast_like (#20110)
- [FFI] part4: npx.embedding, npx.topk, npx.layer_norm, npx.leaky_relu (#20105)
- [PERF] Moving GPU softmax to RTC and optimizations (#19905)
- [FEATURE] AdaBelief operator (#20065)
- Fusing gelu post operator in Fully Connected symbol (#20228)
- [operator] Add logsigmoid activation function (#20268)
- [FEATURE] Use RTC for reduction ops (#19426)
- make stack use faster API (#20059)
- [operator] Add Mish Activation Function (#20320)
- [operator] add threshold for mish (#20339)
- [operator] Integrate matmul primitive from oneDNN in batch dot (#20340)
- [FEATURE] Add interleaved batch_dot oneDNN fuses for new GluonNLP models (#20312)
- Add interleaved_matmul_* to npx namespace (#20375)
- [FEATURE] Add backend MXGetMaxSupportedArch() and frontend get_rtc_compile_opts() for CUDA enhanced compatibility (#20443)
- [ONNX] Foward port new mx2onnx into master (#20355)
- Add new benchmark function for single operator comparison (#20388)
- [BACKPORT] [FEATURE] Add API to control denormalized computations (#20387)
- [FEATURE] Load libcuda with dlopen instead of dynamic linking (#20484)
- [operator] Integrate oneDNN layer normalization implementation (#19562)
- [v1.9.x] modify erfinv implementation based on scipy (#20517) (#20550)
- [REFACTOR] Refactor test_quantize.py to use Gluon API (#20227)
- Switch all HybridBlocks to use forward interface (#20262)
- [API] Extend NumPy Array dtypes with int16, uint16, uint32, uint64 (#20478)
- [FEATURE] MXIndexedRecordIO: avoid re-build index (#20549)
- [FEATURE] Add oneDNN support for npx.reshape and np.reshape (#20563)
- Split np_elemwise_broadcast_logic_op.cc (#20580)
- Expand NVTX usage (#18683)
- [FEATURE] Add feature of retain_grad (#20500)
- [v2.0] Split Large Source Files (#20604)
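One entry above extends the NumPy-compatible array API with the int16, uint16, uint32, and uint64 dtypes (#20478). As a point of reference for what those names imply, this stdlib-only sketch (not MXNet code) shows the byte width of each added dtype using standard-size C format codes:

```python
import struct

# Byte widths of the integer dtypes added in #20478, computed with the
# stdlib struct module. The "<" prefix forces standard C sizes, so the
# values are platform-independent.
widths = {
    "int16":  struct.calcsize("<h"),
    "uint16": struct.calcsize("<H"),
    "uint32": struct.calcsize("<I"),
    "uint64": struct.calcsize("<Q"),
}
print(widths)  # {'int16': 2, 'uint16': 2, 'uint32': 4, 'uint64': 8}
```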
Language Bindings
OneDNN
- Change inner mxnet flags nomenclature for oneDNN library (#19944)
- Change MXNET_MKLDNN_DEBUG define name to MXNET_ONEDNN_DEBUG (#20031)
- Change mx_mkldnn_lib to mx_onednn_lib in Jenkins_steps.groovy file (#20035)
- Fix oneDNN feature name in MXNet (#20070)
- Change MXNET_MKLDNN* flag names to MXNET_ONEDNN* (#20071)
- Change _mkldnn test and build scenarios names to _onednn (#20034)
- [submodule] Upgrade oneDNN to v2.2.1 (#20080)
- [submodule] Upgrade oneDNN to v2.2.2 (#20267)
- [submodule] Upgrade oneDNN to v2.2.3 (#20345)
- [submodule] Upgrade oneDNN to v2.2.4 (#20360)
- [submodule] Upgrade oneDNN to v2.3 (#20418)
- Fix backport of SoftmaxOutput implementation using onednn kernels (#20459)
- [submodule] Upgrade oneDNN to v2.3.2 (#20502)
- [Backport] Enabling BRGEMM FullyConnected based on shapes (#20568)
- [BACKPORT][BUGFIX][FEATURE] Add oneDNN 1D and 3D deconvolution support and fix bias (#20292)
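Several entries above rename the MXNET_MKLDNN* environment flags to MXNET_ONEDNN* (e.g. #20071). Scripts written against the old spelling keep working only if they are updated; a minimal sketch of a fallback reader is below. The helper name is hypothetical and illustrative, not part of the MXNet API:

```python
import os

def onednn_debug_enabled(environ=os.environ):
    """Hypothetical helper: prefer the new MXNET_ONEDNN_DEBUG flag,
    falling back to the legacy MXNET_MKLDNN_DEBUG spelling that older
    scripts may still export."""
    value = environ.get("MXNET_ONEDNN_DEBUG",
                        environ.get("MXNET_MKLDNN_DEBUG", "0"))
    return value not in ("0", "", "false", "False")

print(onednn_debug_enabled({"MXNET_MKLDNN_DEBUG": "1"}))  # True
print(onednn_debug_enabled({}))                           # False
```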
CI-CD
- CI Infra updates (#19903)
- Fix cd by adding to $PATH (#19939)
- Fix nightly CD for python docker image releases (#19772)
- pass version param (#19984)
- Update ci/dev_menu.py file (#20053)
- add gomp and quadmath (#20121)
- [CD] Fix the name of the pip wheels in CD (#20115)
- Attempt to fix nightly docker for master cu112 (#20126)
- Disable codecov (#20173)
- [BUGFIX] Fix CI slowdown issue after removing 3rdparty/openmp (#20367)
- cudnn8 for cu101 in cd (#20408)
- [wip] Re-enable code cov (#20427)
- [CI] Fix centos CI & website build (#20512)
- [CI] Move link check from jenkins to github action (#20526)
- Pin jupyter-client (#20545)
- [CI] Add node for website full build and nightly build (#20543)
- use restricted g4 node (#20554)
Website & Documentation & Style
- Fix static website build (#19906)
- [website] Fix broken website for master version (#19945)
- add djl (#19970)
- [website] Automate website artifacts uploading (#19955)
- Grammar fix (added period to README) (#19998)
- [website] Update for MXNet 1.8.0 website release (#20013)
- fix format issue (#20022)
- [DOC]Disabling hybridization steps added (#19986)
- [DOC] Add Flower to MXNet ecosystem (#20038)
- doc add relu (#20193)
- Avoid UnicodeDecodeError in method doc on Windows (#20215)
- updated news.md and readme.md for 1.8.0 release (#19975)
- [DOC] Update Website to Add Prerequisites for GPU pip install (#20168)
- update short desc for pip (#20236)
- [website] Fix Jinja2 version for python doc (#20263)
- [Master] Auto-formatter to keep the same coding style (#20472)
- [DOC][v2.0] Part1: Link Check (#20487)
- [DOC][v2.0] Part3: Evaluate Notebooks (#20490)
- If variable is not used within the loop body, start the name with an underscore (#20505)
- [v2.0][DOC] Add migration guide (#20473)
- [Master] Clang-formatter: only src/ directory (#20571)
- [Website] Fix website publish (#20573)
- [v2.0] Update Examples (#20602)
Build
- add cmake config for cu112 (#19870)
- Remove USE_MKL_IF_AVAILABLE flag (#20004)
- Define NVML_NO_UNVERSIONED_FUNC_DEFS (#20146)
- Fix ChooseBlas.cmake for CMake build dir name (#20072)
- Update select_compute_arch.cmake from upstream (#20369)
- Remove duplicated project command in CMakeLists.txt (#20481)
- Add check for MKL version selection (#20562)
- fix macos cmake with TVM_OP ON (#20570)
License
- fix license for blockingconcurrentqueue (#19909)
- WAR the dataloader issue with forked processes holding stale references (#19925)
- Forward-port #19972 to master. (#19987)
- switch to DISCLAIMER (#20242)
- [v1.9.x] Make sure files with 2 licenses are listed properly in LICENSE. (#20492) (#20519)
- Port license fixes from v1.x. (#20536)
- Port #20495 (#20607)
- [v2.0][LICENSE] Port #20493 (#20608)
- [v2.0][LICENSE] Port #20496 (#20610)
- Port #20520 (#20609)
- [CI] Add Simple GitHub-Action Based License Checker (#20617)
Bug Fixes and Others
- Mark test_masked_softmax as flaky and skip subgraph tests on windows (#19908)
- Removed 3rdparty/openmp submodule (#19953)
- [BUGFIX] Fix AmpCast for float16 (#19749) (#20003)
- fix bugs for encoding params (#20007)
- Fix for test_lans failure (#20036)
- add flaky to norm (#20091)
- Fix dropout and doc (#20124)
- Revert "add flaky to norm (#20091)" (#20125)
- Fix broadcast_like (#20169)
- [BUGFIX] Add check to make sure num_group is non-zero (#20186)
- Update CONTRIBUTORS.md (#20200)
- Update CONTRIBUTORS.md (#20201)
- [Bugfix] Fix take gradient (#20203)
- Fix workspace of BoxNMS (#20212)
- [BUGFIX][BACKPORT] Impose a plain format on padded concat output (#20129)
- [BUGFIX] Fix Windows GPU VS2019 build (#20206) (#20207)
- [BUGFIX]try avoid the error in operator/tensor/amp_cast.h (#20188)
- [BUGFIX] fix #18936, #18937 (#19878)
- [BUGFIX] fix numpy op fallback bug when ndarray in kwargs (#20233)
- [BUGFIX] Fix test_zero_sized_dim save/restore of np_shape state (#20365)
- [BUGFIX] Fix quantized_op + requantize + dequantize fuse (#20323)
- [BUGFIX] Switch hybrid_forward to forward in test_fc_int8_fp32_outputs (#20398)
- [2.0] fix benchmark and nightly tests (#20370)
- [BUGFIX] fix log_sigmoid bugs (#20372)
- [BUGFIX] fix npi_concatenate quantization dim/axis (#20383)
- [BUGFIX] enable test_fc_subgraph.py::test_fc_eltwise (#20393)
- [2.0] make npx.load support empty .npz files (#20403)
- change argument order (#20413)
- [BUGFIX] Add checks in BatchNorm's infer shape (#20415)
- [BUGFIX] Fix Precision (#20421)
- [v2.0] Add Optim Warning (#20426)
- fix (#20534)
- Test_take, add additional axis (#20532)
- [BUGFIX] Fix (de)conv (#20597)
- [BUGFIX] Fix NightlyTestForBinary in master branch (#20601)
Apache MXNet (incubating) 1.8.0 Release
Features
CUDA Graphs
CUDA 11 Support
- Update CUB and include it only for CUDA < 11 (#18799) (#18975)
- Add new CI pipeline for building and testing with cuda 11.0. (#19149)
- Enable CUDA 11.0 on nightly development builds (#19314)
TensorRT
- TensorRT: add int8 with calibration (#19011)
- Add TRT verbose mode (#19100)
- Backporting TensorRT-Gluon Partition API (and TensorRT 7 support) (#18916)
- Backport TRT test update #19296 (#19298)
OneDNN
- Upgrade to oneDNN v1.6.3 (#19153) (#19161)
- Update oneDNN to official v1.6 release (#18867) (#18867)
- Upgrade to oneDNN v1.6 (#18822)
- bumped version to v1.6.5 (#19437)
- Upgrade to oneDNN v1.7 (#19560)
IntGemm
Subgraph API
Extensions
- Backport #19103 (#19117)
- Backporting #19016 (#19069)
- Backport: Change Partition API's options_map to std::unordered_map #18929 (#18964)
- Backporting #18779 to v1.x (#18894)
- Backport extension bug fixes to v1.8.x (#19469) (#19504)
- fix for MX_ERROR_MSG namespace (#19756)
ONNX
- Update onnx support to work with onnx 1.7.0 with most CV models (#19017)
Large Tensor
- Fix linalg_potri and linalg_potrf operators for large tensor. (#18752)
- Add forward, backward test for linalg.gemm2 (#18784)
- Add large matrix tests for linalg ops: det, inverse, trsm, trmm (#18744)
- Add Large Tensor Test for linalg_syrk (#18782)
- Add Large Dim Checks for linalg Operators (#18816)
- Add forward & backward linalg.gemm test for large size (#18825)
- Adding error message when attempting to use Large tensor with linalg_syevd (#18807)
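The large-tensor fixes above exist because per-element offsets in such tensors exceed the signed 32-bit index range. A plain-Python sketch (not MXNet code) of the arithmetic that motivates 64-bit indexing:

```python
# A square matrix only 50,000 on a side already holds more elements
# than a signed 32-bit index can address, which is why large tensor
# support switches indexing to int64.
INT32_MAX = 2**31 - 1  # 2,147,483,647

side = 50_000
elements = side * side
print(elements, elements > INT32_MAX)  # 2500000000 True
```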
Website Improvements
Documentation
License
- Stop packaging GPL libquadmath.so (#19055)
- Remove mention of nightly in pypi (#18635) (#18884)
- Mkldnn header fix v1x for nightly binaries (#18797)
- Update LICENSE for all submodules. (#19440)
- LICENSE update (#19443)
- Update LICENSE (#19704) (#19707)
CI Improvements
- Upgrade unix gpu toolchain (#18186) (#18785)
- Fix CI in v1.x branch (#18907)
- Remove extra --build-arg causing docker command to fail. (#19412)
- Fix CI builds failing due to invalid GPG keys. (#19377) (#19388)
Bug Fixes
- Backport #19656 - fix R builds (#19658)
- remove cleanup on side threads (#19557)
- Don't use namespace for pow() function, since it is built into cuda math library, and cast the second argument so it will find an acceptable form. (#19533)
- Remove temporary fix for RNN (#19451)
- backport #19393 to v1.8.x (#19398)
- Fix SoftReLU fused operator numerical stability (#17849) (#19390)
- Temporary fix for RNN with oneDNN seg faults/core dumps (#19308)
- Fix MKLDNN BatchNorm with even number of channels (#19150) #19299 #19425 (#19428)
- Relaxing type requirements for broadcast_like (#17977) (#19448)
- Backporting: Fixed setting attributes in reviewSubgraph (#19278)
- Include oneDNN gemm fix (#19251)
- Fix for breaking change introduced in #17123 when batch_axis=0 (#19283)
- Backport PR #19272 to v1.8.x (#19273)
- Backport PRs in v1.7.x missing from v1.x to v1.8.x (#19262)
- Delete executor before reallocating it memory (#19222)
- Nightly Large Tensor test cherrypicks (#19194) (#19215)
- Tweaking syntax to be closer to other tests (#19186) (#19206)
- ElementWiseSum fix for oneDNN (#18777) (#19200)
- Fix flaky intgemm test in v1.8.x too (#19204)
- Revert "Fix memory leaks in Gluon (#18328) (#18359)" (#19181)
- Improve environment variable handling in unittests (#18424) (#19173)
- Backport Unittest tolerance handling improvements (#18694). Also test seeding (#18762). (#19148)
- Fix the error of gradient of np.pad (#19044) (#19167)
- Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (#19123) (#19158)
- SymbolBlock.imports ignore_extra & allow_missing (#19156)
- Fix race condition in NaiveEngine::PushAsync (#19108) (#19122)
- Empty list cannot be cleared issue fixed. (#14882)
- Update base_module.py (#19096)
- Fix block.export (#17970) (#19075)
- Support for fp16 in SpM x DnsM on GPU (#18930) (#19074)
- Backport of Fix LeakyRelu behaviour on empty input (#18934) (#19009)
- Get rid of monkey patching in LossScaler overflow handling (#18959) (#18973)
- Remove upper bound (#18857) (#18910)
- Fix gelu to use erf based algorithm (#18827) (#18946)
- Cherry-pick #18635 to v1.7.x (#18935) (#18945)
- Backporting backward inference from 2.x #18348 and #18378 (#18895)
- Backport Invoke mkldnn and cudnn BatchNorm when axis != 1 to v1.7.x (#18676) (#18890)
- Bump version to 1.8.0 (#18899)
- Fixing ONNX spatial export for batchnorm (#17711) (#18846)
- Fix softmax, logsoftmax failed on empty ndarray (#18602) (#18708)
- Add unit tests for potri and potrf backward and check output shape in unit tests. (#18803)
- Add syrk test shape check (#18812)
- Back port optimization to broadcast_axis to MXNet1.x (#18773)
- Fix crash when accessing already destructed static variables (#18768) (#18778)
- Cherrypick #18677 #18713 (#18742)
v2.0.0.alpha.rc3
v2.0.0 Alpha RC3
v2.0.0.alpha.rc2
v2.0.0 Alpha RC2