In this release:

* Support for networks with attention body and smolgen added to the blas, cuda, metal and onnx backends.
* Persistent L2 cache optimization for the cuda backend. Use the `cache_opt=true` backend option to turn it on (see the usage sketch after this list).
* Some performance improvements for the cuda, onnx and blas backends.
* Added the `threads` backend option to onnx; it defaults to 0 (let onnxruntime decide), except for onnx-cpu, where it defaults to 1.
* The onnx-dml package now includes a directml.dll installation script.
* Some users experienced memory issues with onnx-dml, so the defaults were changed. This may affect performance, in which case you can use the `steps=8` backend option to get the old behavior.
* The Python bindings are now available as a package; see the README for instructions.
* Assorted fixes and code cleanups.
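For reference, backend options like the ones above are passed to lc0 through the `--backend-opts` flag alongside `--backend`. A minimal sketch, assuming the cuda and onnx backends are included in your build (exact backend names may differ):

```
# Enable the persistent L2 cache optimization on the cuda backend
lc0 benchmark --backend=cuda-fp16 --backend-opts=cache_opt=true

# Pin onnxruntime to one thread (0, the default, lets onnxruntime decide)
lc0 benchmark --backend=onnx-cpu --backend-opts=threads=1

# Restore the previous onnx-dml behavior if the new defaults hurt performance
lc0 benchmark --backend=onnx-dml --backend-opts=steps=8
```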
This discussion was created from the release v0.30.0-rc1.