SVRG #226
-
Hi, I am not very familiar with optax. Can someone please give me a basic template of how to implement Stochastic Variance Reduced Gradient? Thanks
Replies: 1 comment
-
Hi! Thanks for the question!
I'll answer for the equations of SVRG as outlined in this blogpost, i.e. the update rule:
w_t = w_{t−1} − η_t [∇ψ_{i_t}(w_{t−1}) − ∇ψ_{i_t}(w̃) + ∇P(w̃)],
with variable names as in the blog post.
The design of optax is such that users calculate gradients and apply updates themselves (see #155 for reasons for the latter). So the best approach with optax would be to:
1. let the user calculate the three gradient terms in the square brackets,
2. use optax gradient transforms to transform the gradients if necessary, and
3. write a custom apply_updates function that implements the equation above, similar to optax.apply_updates.

This function would be a great contribution to u…

I hope this helps. Let me know if you have more questions!
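To make the steps above concrete, here is a minimal sketch of the SVRG loop in JAX. It is not an optax API; the names (loss_fn, svrg, full_grad) and the least-squares loss are illustrative assumptions, and the update w − η·g plays the role of the custom apply_updates function described above:

```python
# Minimal SVRG sketch in JAX (illustrative, not an optax API).
# loss_fn, full_grad, and svrg are hypothetical names chosen for this example.
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    # Per-example least-squares loss ψ_i(w) (illustrative choice).
    return 0.5 * (jnp.dot(w, x) - y) ** 2

grad_fn = jax.grad(loss_fn)

def full_grad(w, xs, ys):
    # ∇P(w): average gradient over the full dataset.
    grads = jax.vmap(grad_fn, in_axes=(None, 0, 0))(w, xs, ys)
    return jnp.mean(grads, axis=0)

def svrg(w0, xs, ys, eta=0.05, num_epochs=20, seed=0):
    key = jax.random.PRNGKey(seed)
    n = xs.shape[0]
    w = w0
    for _ in range(num_epochs):
        w_snap = w                       # snapshot point w̃
        mu = full_grad(w_snap, xs, ys)   # ∇P(w̃), recomputed each epoch
        for _ in range(n):
            key, sub = jax.random.split(key)
            i = jax.random.randint(sub, (), 0, n)  # sample index i_t
            # Variance-reduced gradient: ∇ψ_{i_t}(w) − ∇ψ_{i_t}(w̃) + ∇P(w̃)
            g = grad_fn(w, xs[i], ys[i]) - grad_fn(w_snap, xs[i], ys[i]) + mu
            # Custom update step: w_t = w_{t−1} − η_t g
            w = w - eta * g
    return w
```

In a real optax pipeline, the gradient combination g would be passed through a chain of gradient transforms before the final update, with only the last subtraction living in the custom apply_updates-style function.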