
MNIST training broken with min/max scale propagation #69

Closed
balancap opened this issue Jan 4, 2024 · 2 comments · Fixed by #76

balancap commented Jan 4, 2024

Fixing min/max ops scale propagation (PR #68) had the side effect of breaking MNIST training. Early investigation shows a divergence of the scale factors after a couple of iterations, similar to an unstable dynamical system.

TODO: additional investigation to understand the dynamics of the issue.
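
For intuition, here is a toy sketch of the suspected dynamic. The propagation rule and the feedback gain below are assumptions for illustration, not the library's actual behaviour: if min/max ops forward the larger input scale unchanged and the training loop feeds scales back with an effective gain above one, the scale iteration diverges geometrically, like an unstable dynamical system.

```python
# Toy model (assumptions, not the actual propagation rule): max-style
# propagation never shrinks a scale, and a hypothetical per-iteration
# feedback gain of 1.5 keeps enlarging it.
scale = 1.0
for step in range(8):
    grad_scale = 1.5 * scale        # feedback gain > 1 => unstable
    scale = max(scale, grad_scale)  # max propagation keeps the larger scale
    print(step, scale)              # 1.5, 2.25, 3.375, ... geometric growth
```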

@balancap added the bug label Jan 4, 2024
@balancap self-assigned this Jan 4, 2024
@balancap added the ops label Jan 4, 2024

balancap commented Jan 8, 2024

#74 implements dynamic rescaling methods. Further investigation into how to use them properly is still needed.
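
As a rough sketch of what a dynamic rescaling method could look like (the `ScaledArray` layout and the `dynamic_rescale_max` name are illustrative assumptions, not the API added in #74): re-balance data and scale from a runtime statistic, so the stored values return to unit amplitude while the represented value is unchanged.

```python
from dataclasses import dataclass

import jax.numpy as jnp


@dataclass
class ScaledArray:
    # Hypothetical representation: true value = data * scale.
    data: jnp.ndarray
    scale: jnp.ndarray


def dynamic_rescale_max(arr: ScaledArray) -> ScaledArray:
    # Fold the observed max-abs statistic of `data` into `scale`,
    # bringing the stored values back to unit amplitude.
    amax = jnp.max(jnp.abs(arr.data))
    amax = jnp.where(amax > 0, amax, 1.0)  # guard against all-zero tensors
    return ScaledArray(arr.data / amax, arr.scale * amax)
```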


balancap commented Jan 9, 2024

Bug fixed in #75 and #76, with dynamic rescaling of the logits gradient.
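
A hedged sketch of the idea (not the exact implementation in #75/#76): an op that is the identity in the forward pass and renormalizes the cotangent in the backward pass, applied as `logits = rescale_grad(logits)` just before the loss so the logits gradient enters backpropagation at a controlled amplitude.

```python
import jax
import jax.numpy as jnp


@jax.custom_vjp
def rescale_grad(logits):
    return logits  # forward pass: identity


def _fwd(logits):
    return logits, None


def _bwd(_, g):
    # Renormalize the incoming logits gradient to unit max-abs amplitude.
    # In the scaled-arithmetic setting, the removed factor would be folded
    # into the gradient's scale rather than dropped as it is here.
    amax = jnp.max(jnp.abs(g))
    return (jnp.where(amax > 0, g / amax, g),)


rescale_grad.defvjp(_fwd, _bwd)
```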

@balancap linked a pull request Jan 9, 2024 that will close this issue