
MNIST training broken with min/max scale propagation #69

Closed
balancap opened this issue Jan 4, 2024 · 2 comments · Fixed by #76

balancap commented Jan 4, 2024

Fixing min/max ops scale propagation (PR #68) had the side effect of breaking MNIST training. Early investigation shows a divergence of the scale factors after a couple of iterations, similar to an unstable dynamical system.

TODO: additional investigation to understand the dynamics of the issue.
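
For intuition, here is a toy sketch of the suspected dynamic. The propagation rule and the feedback gain below are assumptions for illustration, not the library's actual behaviour: if min/max ops forward the larger input scale unchanged and the training loop feeds scales back with an effective gain above one, the scale iteration diverges geometrically, like an unstable dynamical system.

```python
# Toy model (assumptions, not the actual propagation rule): max-style
# propagation never shrinks a scale, and a hypothetical per-iteration
# feedback gain of 1.5 keeps enlarging it.
scale = 1.0
for step in range(8):
    grad_scale = 1.5 * scale        # feedback gain > 1 => unstable
    scale = max(scale, grad_scale)  # max propagation keeps the larger scale
    print(step, scale)              # 1.5, 2.25, 3.375, ... geometric growth
```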

@balancap added the bug label Jan 4, 2024
@balancap self-assigned this Jan 4, 2024
@balancap added the ops label Jan 4, 2024

balancap commented Jan 8, 2024

#74 implements dynamic rescaling methods. Further investigation into how to use them properly is still needed.
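
As a rough sketch of what a dynamic rescaling method could look like (the `ScaledArray` layout and the `dynamic_rescale_max` name are illustrative assumptions, not the API added in #74): re-balance data and scale from a runtime statistic, so the stored values return to unit amplitude while the represented value is unchanged.

```python
from dataclasses import dataclass

import jax.numpy as jnp


@dataclass
class ScaledArray:
    # Hypothetical representation: true value = data * scale.
    data: jnp.ndarray
    scale: jnp.ndarray


def dynamic_rescale_max(arr: ScaledArray) -> ScaledArray:
    # Fold the observed max-abs statistic of `data` into `scale`,
    # bringing the stored values back to unit amplitude.
    amax = jnp.max(jnp.abs(arr.data))
    amax = jnp.where(amax > 0, amax, 1.0)  # guard against all-zero tensors
    return ScaledArray(arr.data / amax, arr.scale * amax)
```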


balancap commented Jan 9, 2024

Bug fixed in #75 and #76, with dynamic rescaling of the logits gradient.
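
A hedged sketch of the idea (not the exact implementation in #75/#76): an op that is the identity in the forward pass and renormalizes the cotangent in the backward pass, applied as `logits = rescale_grad(logits)` just before the loss so the logits gradient enters backpropagation at a controlled amplitude.

```python
import jax
import jax.numpy as jnp


@jax.custom_vjp
def rescale_grad(logits):
    return logits  # forward pass: identity


def _fwd(logits):
    return logits, None


def _bwd(_, g):
    # Renormalize the incoming logits gradient to unit max-abs amplitude.
    # In the scaled-arithmetic setting, the removed factor would be folded
    # into the gradient's scale rather than dropped as it is here.
    amax = jnp.max(jnp.abs(g))
    return (jnp.where(amax > 0, g / amax, g),)


rescale_grad.defvjp(_fwd, _bwd)
```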

@balancap linked a pull request Jan 9, 2024 that will close this issue