[DOC] Update changelog, prepare v0.0.2 (#68)

* [DOC] Update changelog, prepare `v0.0.2`
* [FMT] Add `.md` extension to changelog, auto-format
* [ADD] Forgot to add `changelog.md`
* [FIX] Balance parentheses
* [DOC] Add link to arXiv submission
f-dangel · Dec 12, 2023 · dcf1cbc
1 parent 0aa9336
commit dcf1cbc

Showing 5 changed files with 68 additions and 29 deletions.
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 This package contains the official PyTorch implementation of our
 **memory-efficient and numerically stable KFAC** variant, termed SINGD
-([paper](TODO Insert arXiv link)).
+([paper](http://arxiv.org/abs/2312.05705)).
 
 The main feature is a `torch.optim.Optimizer` which works like most PyTorch optimizers and is compatible with:
 

diff --git a/changelog b/changelog
diff --git a/changelog.md b/changelog.md
@@ -0,0 +1,62 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic
+Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+
+### Changed
+
+### Deprecated
+
+### Fixed
+
+## [0.0.2] - 2023-12-11
+
+This release adds support for neural networks with in-place activations and also
+comes with performance improvements for convolutions, as well as improvements
+regarding numerical stability in half precision.
+
+### Added
+
+New features:
+
+- Support `Conv2d` layers with `dilation != 1`
+  ([PR](https://github.com/f-dangel/singd/pull/51))
+- Support neural networks with inplace activation functions
+  ([PR](https://github.com/f-dangel/singd/pull/63))
+
+Performance improvements:
+
+- Speed up input processing for `Conv2d` with `groups != 1`
+  ([PR](https://github.com/f-dangel/singd/pull/59))
+- Speed up computation of averaged patches for KFAC-reduce
+  (`kfac_approx='reduce'`) in `Conv2d` using the tensor network approach of
+  Dangel, 2023 ([PR](https://github.com/f-dangel/singd/pull/61))
+
+### Changed
+
+- Move un-scaling of `H_C` into the update step to improve numerical stability
+  when using half precision + gradient scaling
+  ([PR](https://github.com/f-dangel/singd/pull/67))
+
+### Deprecated
+
+No deprecations
+
+### Fixed
+
+No bug fixes
+
+## [0.0.1] - 2023-10-31
+
+Initial release
+
+[unreleased]: https://github.com/f-dangel/singd/compare/v0.0.2...HEAD
+[0.0.2]: https://github.com/f-dangel/singd/releases/tag/v0.0.2
+[0.0.1]: https://github.com/f-dangel/singd/releases/tag/v0.0.1
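
Review note: the new `changelog.md` uses Keep a Changelog headings of the form `## [X.Y.Z] - YYYY-MM-DD`, so release entries can be checked mechanically. The following is a minimal sketch, not part of this commit; `parse_releases`, `HEADING`, and `SAMPLE` are hypothetical names introduced for illustration, and the regular expression assumes plain `MAJOR.MINOR.PATCH` versions without pre-release suffixes.

```python
import re

# Matches release headings such as "## [0.0.2] - 2023-12-11".
# Assumption: versions are plain MAJOR.MINOR.PATCH, dates are ISO formatted.
HEADING = re.compile(r"^## \[(\d+)\.(\d+)\.(\d+)\] - (\d{4}-\d{2}-\d{2})$")


def parse_releases(changelog_text):
    """Return ((major, minor, patch), date) tuples in file order."""
    releases = []
    for line in changelog_text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            major, minor, patch, date = match.groups()
            releases.append(((int(major), int(minor), int(patch)), date))
    return releases


# Headings copied from the changelog added in this commit.
SAMPLE = """\
## [Unreleased]

## [0.0.2] - 2023-12-11

## [0.0.1] - 2023-10-31
"""

releases = parse_releases(SAMPLE)
# Tuples compare element-wise, so max() picks the highest version.
latest_version, latest_date = max(releases)
```

The `## [Unreleased]` heading carries no version or date and is skipped by the pattern. Comparing versions as integer tuples avoids the string-ordering pitfall where `"0.0.10"` sorts before `"0.0.2"`.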
diff --git a/docs/examples/example_05_structures.py b/docs/examples/example_05_structures.py
@@ -26,8 +26,8 @@
 # [`structures`](https://readthedocs.org/projects/singd/api/). The first entry
 # specifies the structure of $\mathbf{K}$ and its momentum
 # $\mathbf{m}_\mathbf{K}$, while the second entry specifies the structure of
-# $\mathbf{C}$ and its momentum $\mathbf{m}_\mathbf{C}$ (see the [paper](TODO
-# Insert link to arXiv submission) for details). It is even possible to specify
+# $\mathbf{C}$ and its momentum $\mathbf{m}_\mathbf{C}$ (see the
+# [paper](http://arxiv.org/abs/2312.05705) for details). It is even possible to specify
 # structures on a per-layer basis (see
 # [this](https://singd.readthedocs.io/en/latest/generated/gallery/example_03_param_groups/)
 # example).

diff --git a/singd/optim/optimizer.py b/singd/optim/optimizer.py
@@ -26,7 +26,7 @@
 class SINGD(Optimizer):
     """Structured inverse-free natural gradient descent.
 
-    The algorithm is introduced in [this paper](TODO Insert arXiv link) and
+    The algorithm is introduced in [this paper](http://arxiv.org/abs/2312.05705) and
     extends the inverse-free KFAC algorithm from [Lin et al. (ICML
     2023)](https://arxiv.org/abs/2302.09738) with structured pre-conditioner
     matrices.
@@ -104,8 +104,8 @@ def __init__(
     ):  # noqa: D301
         """Structured inverse-free natural gradient descent optimizer.
 
-        Uses the empirical Fisher. See the [paper](TODO Insert arXiv link) for the
-        notation.
+        Uses the empirical Fisher. See the [paper](http://arxiv.org/abs/2312.05705) for
+        the notation.
 
         Args:
             model: The neural network whose parameters (or a subset thereof) will be