
MultiSteps & state.steps & warmup #324

Answered by rosshemsley
agemagician asked this question in Q&A


Hello @agemagician!

Taking a look at:
https://github.com/deepmind/optax/blob/3fb68179604e349c3083ad12cd2e38ff8713f613/optax/_src/wrappers.py#L184

It looks like the inner optimizer is only called each time "final_step" is called. Since optax works by chaining together GradientTransformations, and the step count is usually kept by the individual GradientTransformations themselves (such as a learning rate schedule), the answer to your question depends on how the GradientTransformations are chained together:

e.g. if the schedule is applied before the multi-step transformation, it will be applied once on every step (whether or not accumulation is done), but if it's chained after the multi-step transformation, it will only advance on the steps where the accumulated update is actually applied (i.e. once every k steps).
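
To make the two orderings concrete, here's a minimal sketch (the k=4 window, SGD inner optimizer, and linear schedule are illustrative choices, and the `gradient_step` field name assumes the current `MultiStepsState` definition):

```python
import jax.numpy as jnp
import optax

k = 4  # illustrative: accumulate gradients over 4 micro-batches
schedule = optax.linear_schedule(init_value=1e-3, end_value=1e-4, transition_steps=100)

# Ordering 1: the schedule lives inside the optimizer wrapped by MultiSteps.
# The inner optimizer (and hence the schedule's step count) only advances when
# MultiSteps applies the accumulated update, i.e. once every k calls.
opt_inner = optax.MultiSteps(optax.sgd(learning_rate=schedule), every_k_schedule=k)

# Ordering 2: the schedule is chained before MultiSteps. scale_by_schedule keeps
# its own counter, which increments on every call to update, whether or not
# that call is just an accumulation step.
opt_outer = optax.chain(
    optax.scale_by_schedule(schedule),
    optax.MultiSteps(
        optax.sgd(learning_rate=1.0), every_k_schedule=k
    ).gradient_transformation(),
)

params = {"w": jnp.ones(3)}
grads = {"w": jnp.full(3, 0.1)}

state = opt_inner.init(params)
for _ in range(8):
    updates, state = opt_inner.update(grads, state, params)
    params = optax.apply_updates(params, updates)

# After 8 micro-steps only 2 "real" updates have happened, so the schedule
# inside opt_inner has only been evaluated at counts 0 and 1.
print(state.gradient_step)  # -> 2
```

By contrast, running the same 8-step loop with `opt_outer` would leave `scale_by_schedule`'s internal counter at 8, since it increments on every micro-step.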

Answer selected by agemagician