Added mathematical description to Noisy SGD #857
Changes from 2 commits
```diff
@@ -992,11 +992,27 @@ def noisy_sgd(
     gamma: float = 0.55,
     seed: int = 0
 ) -> base.GradientTransformation:
-  r"""A variant of SGD with added noise.
+  r"""Noisy SGD is a variant of :func:`optax.sgd` that incorporates Gaussian
+  noise into the updates. It has been found that adding noise to the gradients
+  can improve both the training error and the generalization error in very deep
+  networks.
+
+  The update :math:`u_t` is modified to include this noise as follows:
+
+  .. math::
+    u_t \leftarrow -\alpha_t g_t + N(0, \sigma_t^2),
```
Review thread on the update formula above:

Reviewer: Thanks for doing that!

Author: Yes, you're right. We first add the noise to the gradient and then scale by the learning rate. I will review my changes more carefully in the future.
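A minimal sketch of the distinction raised in the thread, in plain JAX; the gradient, step size, and noise scale are illustrative values, not taken from the PR:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
g = jnp.array([0.5, -1.0])  # illustrative gradient
lr, sigma = 0.1, 0.2        # illustrative step size and noise std. dev.
noise = sigma * jax.random.normal(key, g.shape)

# What the formula as written in this commit describes:
# noise is added after the gradient is scaled by the learning rate.
u_formula = -lr * g + noise

# What the thread says the implementation does:
# noise is added to the gradient first, then the sum is scaled.
u_actual = -lr * (g + noise)
```

The two expressions differ only in whether the noise term itself is scaled by the learning rate.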
The diff continues:

```diff
+
+  where :math:`N(0, \sigma_t^2)` represents Gaussian noise with zero mean and a
+  variance of :math:`\sigma_t^2`.
+
+  The variance of this noise decays over time according to the formula
+
+  .. math::
+    \sigma_t^2 = \frac{\eta}{(1+t)^\gamma},
+
+  where :math:`\gamma` is the decay rate parameter ``gamma`` and :math:`\eta`
+  represents the initial variance ``eta``.
-
-  It has been found that adding noise to the gradients can improve
-  both the training error and the generalization error in very deep networks.
 
   Examples:
     >>> import optax
     >>> import jax
```
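For context, a minimal runnable sketch of the function being documented, using the defaults shown in the diff (``eta=0.01``, ``gamma=0.55``); the quadratic loss and parameter values are illustrative:

```python
import jax
import jax.numpy as jnp
import optax

def loss(params):
  # Simple quadratic objective, purely illustrative.
  return jnp.sum(params ** 2)

params = jnp.array([1.0, 2.0, 3.0])
opt = optax.noisy_sgd(learning_rate=0.1, eta=0.01, gamma=0.55, seed=0)
state = opt.init(params)

for t in range(3):
  grads = jax.grad(loss)(params)
  updates, state = opt.update(grads, state)
  params = optax.apply_updates(params, updates)
  # Noise variance at this step, per the docstring: eta / (1 + t) ** gamma.
  print(t, float(loss(params)), 0.01 / (1 + t) ** 0.55)
```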
Reviewer: First line needs to end with ".", so I would leave the first line as it is and add your description below.
Reviewer: And the line after should be blank, so something like the sketch that follows.
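A hypothetical sketch of the shape being asked for, per PEP 257: a one-line summary ending in a period, a blank line, then the longer description (the signature is simplified):

```python
def noisy_sgd(learning_rate, eta=0.01, gamma=0.55, seed=0):
  r"""A variant of SGD with added noise.

  Noisy SGD is a variant of :func:`optax.sgd` that incorporates Gaussian
  noise into the updates; the longer description continues here, after
  the blank line that follows the one-line summary.
  """
```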
Author: Could you clarify whether the requirement for the first line to end with "." is driven by stylistic guidelines or technical reasons? The list of optimizers at the beginning of the documentation isn't affected and displays the first sentence of the docstring correctly.
Reviewer: It's a requirement of the Python docstring style: https://peps.python.org/pep-0257/#multi-line-docstrings. Internally, we get an error when importing the code if the docstring doesn't follow this convention.
Reviewer: You're right, though, that this would be something worth adding to https://optax.readthedocs.io/en/latest/development.html.