Releases: microsoft/tensorflow-directml-plugin

tensorflow-directml-plugin 0.4.0

03 Feb 00:57
5973109
Pre-release

The Python packages are available as a PyPI release. To install the latest Python package, simply run pip install tensorflow-directml-plugin.
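
A quick way to confirm the plugin loaded (a minimal sketch, not part of the release notes): after installing tensorflow-cpu and tensorflow-directml-plugin, each DirectML-capable adapter should appear as a GPU device.

    import tensorflow as tf

    # The plugin registers each DirectML-capable adapter as a GPU device;
    # an empty list means the plugin did not load.
    print(tf.config.list_physical_devices("GPU"))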

Changes in 0.4.0

  • Add DirectML kernels for CudnnRNNCanonicalToParams and CudnnRNNParamsToCanonical
  • Add support for grouped convolution in Conv2DBackpropFilter and Conv3DBackpropFilter (see the sketch after this list)
  • Add float16 support for _FusedConv2D
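
A minimal sketch of the grouped-convolution case added in this release (shapes are illustrative): computing the filter gradient of a Conv2D with groups > 1 runs through Conv2DBackpropFilter.

    import tensorflow as tf

    # Grouped convolution: 8 input channels split across 4 groups.
    x = tf.random.normal([1, 16, 16, 8])
    conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3, groups=4, padding="same")

    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(conv(x))

    # The filter gradient is where the grouped Conv2DBackpropFilter kernel runs.
    grads = tape.gradient(loss, conv.trainable_variables)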

tensorflow-directml-plugin 0.3.0

13 Dec 05:17
e58bf87
Pre-release

The Python packages are available as a PyPI release. To install the latest Python package, simply run pip install tensorflow-directml-plugin.

Changes in 0.3.0

  • Set tensorflow-cpu==2.10.0 as a hard dependency due to incompatibility with Keras 2.11's default optimizers.
  • Fix overflow in BatchNorm ops when float16 or mixed precision is used (see the sketch after this list).
  • Remove unnecessary Cast operation in ReduceMin and ReduceMax ops.
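
A minimal sketch of the configuration covered by the BatchNorm overflow fix (model and shapes are illustrative): under the mixed_float16 policy, computations run in float16 while variables stay in float32.

    import tensorflow as tf

    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(16, 3, padding="same"),
        tf.keras.layers.BatchNormalization(),  # the op that could previously overflow
        tf.keras.layers.ReLU(),
    ])

    y = model(tf.random.normal([4, 32, 32, 3]))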

tensorflow-directml-plugin 0.2.0

21 Oct 01:43
368d8d3
Pre-release

The Python packages are available as a PyPI release. To install the latest Python package, simply run pip install tensorflow-directml-plugin.

Changes in 0.2.0

  • Improve TensorBoard profiling and Chrome trace capture (see the sketch after this list)
  • Add support for exponential_avg_factor != 1.0 in FusedBatchNorm
  • Add an int32 kernel registration for Fill
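
The improved traces can be captured with TensorFlow's programmatic profiler API; a minimal sketch (the log directory name is arbitrary):

    import tensorflow as tf

    tf.profiler.experimental.start("logs/profile")

    x = tf.random.normal([1024, 1024])
    for _ in range(10):
        x = tf.matmul(x, x) / 1024.0  # some device work to show up in the trace

    tf.profiler.experimental.stop()
    # Inspect the trace with: tensorboard --logdir logs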

tensorflow-directml-plugin 0.1.1

05 Oct 18:10
fde3375
Pre-release

The Python packages are available as a PyPI release. To install the latest Python package, simply run pip install tensorflow-directml-plugin.

Changes in 0.1.1

  • Fix a crash in InTopKV2 when k is larger than the size of the axis dimension (see the sketch below).
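
A minimal sketch of the fixed case (values are illustrative): with only 3 classes, k=5 previously crashed the DirectML InTopKV2 kernel; every target is now trivially in the top k.

    import tensorflow as tf

    predictions = tf.random.normal([4, 3])   # 4 samples, 3 classes
    targets = tf.constant([0, 2, 1, 1])

    # k larger than the class axis: returns all True instead of crashing.
    print(tf.math.in_top_k(targets, predictions, k=5))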

tensorflow-directml-plugin 0.1.0

29 Sep 17:08
536ad9a
Pre-release

The Python packages are available as a PyPI release. To install the latest Python package, simply run pip install tensorflow-directml-plugin.

Changes in 0.1.0

  • Upgrade the DirectML version to 1.9.1, which includes minor bug fixes and performance improvements.
  • Add DirectML kernels for the RngSkip and RngReadAndSkip operators.
  • Add DirectML kernels for the StatelessRandomGetKeyCounterAlg, StatelessRandomGetKeyCounter and StatelessRandomGetAlg operators.
  • Add a DirectML kernel for SparseApplyAdagrad.
  • Add a DirectML kernel for StatelessRandomUniformV2.
  • Add a DirectML kernel for InTopKV2.
  • Add DirectML kernels for MatrixDiagV3 and MatrixDiagPartV3.
  • Add emulated support for int64.
  • Add a dependency on tensorflow-cpu>=2.10.0. Users should install the tensorflow-cpu package instead of tensorflow or tensorflow-gpu when using tensorflow-directml-plugin.
  • Add int32 support for StridedSlice.
  • Add CPU emulated versions of UnsortedSegmentSum, UnsortedSegmentMax, UnsortedSegmentMin and UnsortedSegmentProd to get rid of device placement errors in transformer models.
  • Add a C API for Linux. The C API can be downloaded from the releases page in the tensorflow-directml-plugin GitHub repository.
  • Add support for multiple devices (see the sketch after this list).
  • Add integer support for Relu.
  • Add int32 support for Pack.
  • Fix the incomplete adapter description on Linux.
  • Fix a crash in ArgMin and ArgMax when the output type is int16 or uint16.
  • Fix undefined behavior when retrieving a list of strings from an attribute.
  • Fix a memory leak in the BFC allocator.
  • Fix a memory leak in the graph optimizer.
  • Fix a memory leak in SegmentReduction.
  • Fix a memory leak in StridedSlice.
  • Fix a memory leak in the emulated random kernels.
  • Fix the validation of Range to allow values near INT_MAX.
  • Get rid of warnings related to unsupported DataFormatDimMap and DataFormatVecPermute operators.
  • Prevent unbounded growth of command allocator memory.
  • Optimize output allocation for inputs that can be executed in-place and directly forwarded to the output.
  • Increase the available memory by allowing devices to allocate shared (nonlocal) memory.
  • Improve the performance of the unsorted segment operators by batching GPU->CPU copies.
  • Improve the performance of emulated operators by reducing the number of eager contexts and eager ops created.
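
With multi-device support, each DirectML-capable adapter is exposed as a separate GPU device; a minimal sketch of explicit placement (device indices depend on the machine):

    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    print(gpus)  # one entry per DirectML-capable adapter

    if len(gpus) > 1:
        # Pin work to the second adapter using the usual device string.
        with tf.device("/GPU:1"):
            x = tf.random.normal([2, 2])
            print(tf.matmul(x, x))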