Added mathematical description to Noisy SGD #857

Merged
merged 5 commits into from
Mar 10, 2024
Changes from 2 commits
24 changes: 20 additions & 4 deletions optax/_src/alias.py
@@ -992,11 +992,27 @@ def noisy_sgd(
gamma: float = 0.55,
seed: int = 0
) -> base.GradientTransformation:
r"""A variant of SGD with added noise.

Member
First line needs to end with ".", so I would leave the first line as it is, and add your description below

Member
and the line after should be blank, so something like:

r"""A variant of SGD with added noise.

Noisy SGD is ...
"""

Contributor Author
Could you clarify if the requirement for the first line to end with "." is driven by stylistic guidelines or technical reasons? The list of optimizers at the beginning of the documentation isn't affected and displays the first sentence of the docstring correctly.

Member
It's a requirement of the Python style: https://peps.python.org/pep-0257/#multi-line-docstrings

Internally we get an error when importing the code if the docstring doesn't follow this convention

Member
you're right though that this would be something worth adding to https://optax.readthedocs.io/en/latest/development.html

r"""Noisy SGD is a variant of :func:`optax.sgd` that incorporates Gaussian
noise into the updates. It has been found that adding noise to the gradients
can improve both the training error and the generalization error in very deep
networks.

The update :math:`u_t` is modified to include this noise as follows:

.. math::
u_t \leftarrow -\alpha_t g_t + N(0, \sigma_t^2),

Collaborator
Thanks for doing that!
Shouldn't it rather be the following?
u_t \leftarrow -\alpha_t (g_t + N(0, \sigma_t^2))

Contributor Author
Yes, you're right. We first add the noise to the gradient and then scale by the learning rate. I will review my changes more carefully in the future.
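
For reference, the ordering agreed on in this thread (noise is added to the gradient first, then the sum is scaled by the learning rate) can be sketched in plain Python. This is only an illustration for a scalar gradient, not optax's implementation; the function name `noisy_sgd_update` and its arguments are hypothetical.

```python
import math
import random


def noisy_sgd_update(grad, learning_rate, variance, rng):
    """Hypothetical helper: u = -learning_rate * (grad + noise).

    The noise is drawn from N(0, variance) and added to the gradient
    before scaling, so the learning rate also scales the noise.
    """
    noise = rng.gauss(0.0, math.sqrt(variance))
    return -learning_rate * (grad + noise)


rng = random.Random(0)
# With zero variance the update reduces to the plain SGD step.
print(noisy_sgd_update(2.0, 0.1, 0.0, rng))
# With non-zero variance the step is perturbed by scaled noise.
print(noisy_sgd_update(2.0, 0.1, 0.01, rng))
```

Because the noise sits inside the parentheses, its effective standard deviation in the update is the learning rate times the square root of the variance, which is the difference between the two formulas discussed above.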


where :math:`N(0, \sigma_t^2)` represents Gaussian noise with zero mean and a
variance of :math:`\sigma_t^2`.

The variance of this noise decays over time according to the formula

.. math::
\sigma_t^2 = \frac{\eta}{(1+t)^\gamma},

where :math:`\gamma` is the decay rate parameter ``gamma`` and :math:`\eta`
represents the initial variance ``eta``.
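
As a rough illustration of the decay formula above, the schedule can be written as a small plain-Python function. The name `noise_variance` is hypothetical and this is not the library's code; the default `gamma=0.55` is taken from the signature shown in this diff, while the `eta` value in the usage lines is an arbitrary example.

```python
def noise_variance(step, eta, gamma=0.55):
    """Hypothetical helper: sigma_t^2 = eta / (1 + t)^gamma."""
    return eta / (1 + step) ** gamma


# At step 0 the variance equals eta; it then decays polynomially.
for t in (0, 1, 10, 100):
    print(t, noise_variance(t, eta=0.01))
```

A larger `gamma` makes the noise fade faster, while `eta` only sets the starting scale.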

It has been found that adding noise to the gradients can improve
both the training error and the generalization error in very deep networks.

Examples:
>>> import optax
>>> import jax