Merge branch 'dilated-convolutions' into string-padding

f-dangel · Oct 31, 2023 · 51c329f
2 parents a46471f + 2d4e5b0

Showing 40 changed files with 1,149 additions and 705 deletions.
diff --git a/README.md b/README.md
@@ -26,7 +26,9 @@ The main feature is a `torch.optim.Optimizer` which works like most PyTorch opti
   data-parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)
   (DDP) training[^1]
 
-The pre-conditioner matrices support different structures that allow to reduce cost ([overview](TODO Insert link to example)).
+The pre-conditioner matrices support different structures that allow to reduce
+cost
+([overview](https://singd.readthedocs.io/en/latest/generated/gallery/example_05_structures/)).
 
 ## Installation
 
@@ -42,10 +44,21 @@ The pre-conditioner matrices support different structures that allow to reduce c
 
 ## Usage
 
- - [Basic example](TODO Insert link to example)
- - Examples for [supported features](TODO Insert link to gallery)
- - [Advanced example](TODO Insert link to example)
- - [Supported structures](TODO Insert link to example)
+ - [Basic
+   example](https://singd.readthedocs.io/en/latest/generated/gallery/example_01_basic/)
+ - Examples for [supported
+   features](https://singd.readthedocs.io/en/latest/generated/gallery/)
+ - [Advanced
+   example](https://singd.readthedocs.io/en/latest/generated/gallery/example_04_advanced/)
+ - [Supported
+   structures](https://singd.readthedocs.io/en/latest/generated/gallery/example_05_structures/)
+
+## Limitations
+
+- `SINGD` does not support graph neural networks (GNN)
+
+- The code has stabilized only recently. Expect things to break and help us
+  improve by filing issues.
 
 ## Citation
 

diff --git a/changelog b/changelog
@@ -0,0 +1,23 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+
+### Changed
+
+### Deprecated
+
+### Fixed
+
+## [0.0.1] - 2023-10-31
+
+Initial release
+
+[unreleased]: https://github.com/f-dangel/singd/compare/v0.0.1...HEAD
+[0.0.1]: https://github.com/f-dangel/singd/releases/tag/v0.0.1
diff --git a/docs/examples/example_03_param_groups.py b/docs/examples/example_03_param_groups.py
@@ -60,7 +60,7 @@
     "momentum": 0.9,
     "weight_decay": 1e-2,
     "lr_cov": 1e-2,
-    "batch_averaged": True,
+    "loss_average": "batch",
     "T": 1,
     "alpha1": 0.5,
 }

diff --git a/docs/examples/example_04_advanced.py b/docs/examples/example_04_advanced.py
@@ -35,18 +35,21 @@
 MAX_STEPS = 100  # quit training after this many steps
 DEV = device("cuda" if cuda.is_available() else "cpu")
 
-BATCH_SIZE = 32
+MICRO_BATCH_SIZE = 6  # [ACC]
+ITERS_TO_ACCUMULATE = 4  # [ACC]
+NUM_PROCS = 2  # [ACC]
 
-MICRO_BATCH_SIZE = 8  # [ACC]
-assert BATCH_SIZE % MICRO_BATCH_SIZE == 0  # [ACC]
+BATCH_SIZE = MICRO_BATCH_SIZE * ITERS_TO_ACCUMULATE * NUM_PROCS
 
 train_dataset = MNIST(
     "./data",
     train=True,
     download=True,
     transform=Compose([ToTensor(), Normalize(mean=(0.1307,), std=(0.3081,))]),
 )
-train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
+train_loader = DataLoader(
+    dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last=True
+)
 
 model = Sequential(
     Conv2d(1, 3, kernel_size=5, stride=2),
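In this hunk, the effective batch size is built from three factors. The old code instead asserted that it was divisible by the micro-batch size. The sketch below walks through the arithmetic using the constant values from this hunk. `num_full_batches` and `leftover` are hypothetical helpers (not part of the example script) that mimic the effect of `drop_last=True`, which discards the final incomplete batch. The dataset sizes are those of the MNIST train (60,000) and test (10,000) splits.

```python
MICRO_BATCH_SIZE = 6  # samples per forward/backward pass
ITERS_TO_ACCUMULATE = 4  # micro-batches accumulated per process
NUM_PROCS = 2  # data-parallel processes

# Effective batch size, divisible by MICRO_BATCH_SIZE by construction.
BATCH_SIZE = MICRO_BATCH_SIZE * ITERS_TO_ACCUMULATE * NUM_PROCS


def num_full_batches(num_samples: int, batch_size: int) -> int:
    """Hypothetical helper: number of batches when the incomplete last one is dropped."""
    return num_samples // batch_size


def leftover(num_samples: int, batch_size: int) -> int:
    """Hypothetical helper: number of samples that ``drop_last=True`` would skip."""
    return num_samples % batch_size


print(BATCH_SIZE)  # 48
print(num_full_batches(60_000, BATCH_SIZE), leftover(60_000, BATCH_SIZE))  # 1250 0
print(num_full_batches(10_000, BATCH_SIZE), leftover(10_000, BATCH_SIZE))  # 208 16
```

Because `BATCH_SIZE` is a multiple of `MICRO_BATCH_SIZE` by construction, the removed `assert` is no longer needed. Dropping the last incomplete batch presumably ensures that every loaded batch splits into exactly `ITERS_TO_ACCUMULATE * NUM_PROCS` micro-batches. For the MNIST training split nothing is dropped, since 60,000 is a multiple of 48.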
@@ -90,7 +93,7 @@
     "momentum": 0.9,
     "weight_decay": 1e-2,
     "lr_cov": 1e-2,
-    "batch_averaged": True,
+    "loss_average": "batch",
     "T": 1,
     "alpha1": 0.5,
     "structures": ("dense", "dense"),
@@ -155,6 +158,12 @@
         with autocast(device_type=amp_device_type, dtype=amp_dtype):  # [AMP]
             loss = loss_func(model(inputs_micro), target_micro)
 
+            # [ACC] Each per-datum loss must be scaled relative to the total
+            # number of data points accumulated in a gradient, see
+            # https://pytorch.org/docs/stable/notes/amp_examples.html#working-with-scaled-gradients
+            if loss_func.reduction == "mean":
+                loss *= MICRO_BATCH_SIZE / BATCH_SIZE
+
         # [AMP] Backward passes under ``autocast`` are not recommended, see
         # (https://pytorch.org/docs/stable/amp.html#torch.autocast).
         # Therefore, this part happens outside the ``autocast`` context
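With `reduction="mean"`, each micro-batch loss is already averaged over `MICRO_BATCH_SIZE` samples. Summing the gradients of all micro-batches would therefore overcount by the number of micro-batches, which is why the loss is multiplied by `MICRO_BATCH_SIZE / BATCH_SIZE`.

The sketch below illustrates this without a framework, under two assumptions. It uses a hypothetical one-parameter model with squared-error loss. It also assumes that micro-batch gradients are simply summed, as repeated `.backward()` calls accumulate into `.grad`; how DDP combines gradients across processes is not modeled. Since the scale factor is a constant, scaling the loss and scaling its gradient are equivalent.

```python
import math
from typing import List


def mean_sq_grad(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(
    w: float, xs: List[float], ys: List[float], micro_batch_size: int, scale: bool = True
) -> float:
    """Sum mean-reduced micro-batch gradients, optionally rescaled by micro/total."""
    batch_size = len(xs)
    assert batch_size % micro_batch_size == 0, "batch must split evenly"
    total = 0.0
    for start in range(0, batch_size, micro_batch_size):
        xs_micro = xs[start : start + micro_batch_size]
        ys_micro = ys[start : start + micro_batch_size]
        grad = mean_sq_grad(w, xs_micro, ys_micro)
        if scale:
            # Same effect as `loss *= MICRO_BATCH_SIZE / BATCH_SIZE` before backward.
            grad *= micro_batch_size / batch_size
        total += grad
    return total


xs = [float(i) for i in range(48)]  # one batch of BATCH_SIZE = 48 inputs
ys = [0.5 * x + 1.0 for x in xs]  # synthetic targets
w = 0.3

full = mean_sq_grad(w, xs, ys)  # gradient of the full-batch mean loss
print(math.isclose(accumulated_grad(w, xs, ys, 6), full))  # True
print(math.isclose(accumulated_grad(w, xs, ys, 6, scale=False), 8 * full))  # True
```

Without the rescaling, the accumulated gradient is 8 times too large, because 8 micro-batches of 6 samples make up one batch of 48.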

diff --git a/docs/examples/example_05_structures.py b/docs/examples/example_05_structures.py
@@ -28,8 +28,9 @@
 # $\mathbf{m}_\mathbf{K}$, while the second entry specifies the structure of
 # $\mathbf{C}$ and its momentum $\mathbf{m}_\mathbf{C}$ (see the [paper](TODO
 # Insert link to arXiv submission) for details). It is even possible to specify
-# structures on a per-layer basis (see [this](TODO Insert link to param groups
-# example) example).
+# structures on a per-layer basis (see
+# [this](https://singd.readthedocs.io/en/latest/generated/gallery/example_03_param_groups/)
+# example).
 #
 # The following structures are available:
 

diff --git a/docs/interface.md b/docs/interface.md
@@ -0,0 +1,7 @@
+This section lists the interface for structured matrices, that is, the
+operations they need to implement to work in SINGD. It is meant **for internal
+purposes only**: it is useful for developers who wish to add a new structured
+matrix class to the code that cannot be constructed with one of the available
+templates.
+
+::: singd.structures.base.StructuredMatrix
diff --git a/docs/structures.md b/docs/structures.md
@@ -0,0 +1,59 @@
+Here we provide a list of structured matrices. This list is meant **for internal
+purposes only**. It exists because it is more convenient to read the rendered
+LaTeX code rather than the docstring source.
+
+::: singd.structures.dense.DenseMatrix
+    options:
+        members:
+            - __init__
+
+::: singd.structures.hierarchical.Hierarchical15_15Matrix
+    options:
+        members:
+            - __init__
+
+# DIAGONAL
+
+::: singd.structures.diagonal.DiagonalMatrix
+    options:
+        members:
+            - __init__
+
+::: singd.structures.blockdiagonal.Block30DiagonalMatrix
+    options:
+        members:
+            - __init__
+
+# LOWER-TRIANGULAR
+
+::: singd.structures.triltoeplitz.TrilToeplitzMatrix
+    options:
+        members:
+            - __init__
+
+::: singd.structures.trilbottomrightdiag.TrilBottomRightDiagonalMatrix
+    options:
+        members:
+            - __init__
+
+::: singd.structures.triltopleftdiag.TrilTopLeftDiagonalMatrix
+    options:
+        members:
+            - __init__
+
+# UPPER-TRIANGULAR
+
+::: singd.structures.triutoeplitz.TriuToeplitzMatrix
+    options:
+        members:
+            - __init__
+
+::: singd.structures.triubottomrightdiag.TriuBottomRightDiagonalMatrix
+    options:
+        members:
+            - __init__
+
+::: singd.structures.triutopleftdiag.TriuTopLeftDiagonalMatrix
+    options:
+        members:
+            - __init__
diff --git a/docs/templates.md b/docs/templates.md
@@ -0,0 +1,24 @@
+Here we provide a list of templates that can be used to create new structured
+matrices. This list is meant **for internal purposes only**. It exists because
+it is more convenient to read the rendered LaTeX code rather than the docstring
+source.
+
+::: singd.structures.blockdiagonal.BlockDiagonalMatrixTemplate
+    options:
+        members:
+            -
+
+::: singd.structures.hierarchical.HierarchicalMatrixTemplate
+    options:
+        members:
+            -
+
+::: singd.structures.recursive.RecursiveBottomLeftMatrixTemplate
+    options:
+        members:
+            -
+
+::: singd.structures.recursive.RecursiveTopRightMatrixTemplate
+    options:
+        members:
+            -
diff --git a/makefile b/makefile
@@ -58,10 +58,9 @@ install-test:
 .PHONY: test test-light
 
 test:
-	@pytest -vx --run-optional-tests=expensive --cov=singd test
-
+	@pytest -vx --run-optional-tests=expensive --cov=singd --doctest-modules test singd
 test-light:
-	@pytest -vx --cov=singd test
+	@pytest -vx --cov=singd --doctest-modules test singd
 
 .PHONY: install-lint
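With `--doctest-modules`, pytest also collects the `>>>` examples in docstrings of the `singd` package and fails if their printed output differs from the docstring. The function below is a hypothetical illustration of such a docstring and is not part of `singd`. Here it is checked with the standard-library `doctest` module rather than through pytest.

```python
import doctest
from typing import Callable, Tuple


def micro_batch_size(batch_size: int, iters_to_accumulate: int, num_procs: int) -> int:
    """Return the micro-batch size that splits ``batch_size`` evenly.

    >>> micro_batch_size(48, 4, 2)
    6
    >>> micro_batch_size(50, 4, 2)
    Traceback (most recent call last):
    ...
    ValueError: 50 is not divisible by 8
    """
    num_chunks = iters_to_accumulate * num_procs
    if batch_size % num_chunks != 0:
        raise ValueError(f"{batch_size} is not divisible by {num_chunks}")
    return batch_size // num_chunks


def run_doctests(func: Callable) -> Tuple[int, int]:
    """Run the docstring examples of ``func``; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # Expose the function to its own examples via explicit globals.
    for test in finder.find(func, globs={func.__name__: func}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


print(run_doctests(micro_batch_size))  # (0, 2)
```

Under pytest, running `pytest --doctest-modules singd` would pick up an equivalent docstring inside the package automatically.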
 

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -10,6 +10,10 @@ nav:
     - Code Examples: generated/gallery
     - API Documentation: api.md
     - Developer Notes: develop.md
+    - Internal:
+      - Structures: structures.md
+      - Templates: templates.md
+      - Interface: interface.md
 theme:
     name: material
     features:
@@ -34,7 +38,7 @@ plugins:
                 options:
                       show_root_heading: true
                       show_source: true
-                      show_bases: false
+                      show_bases: true
                       show_signature_annotations: true
                       separate_signature: true
                       docstring_section_style: list

diff --git a/setup.cfg b/setup.cfg
@@ -78,6 +78,7 @@ doc =
     mkdocstrings[python]==0.22.0
     mkdocs-gallery==0.7.8
     matplotlib # structure visualizations
+    torchvision # MNIST
 
 # Dependencies needed to run fine-tuning experiments
 fine_tuning =

diff --git a/singd/optim/accumulator.py b/singd/optim/accumulator.py