What is considered a large Gradient Norm? #30
-
When the guide mentions that "Clipping can fix either early training instability (large gradient norm early)", what counts as a large gradient norm? Is it a fixed number, or is it relative? I often see the clipping threshold set to 10 for common classification problems. However, many image regression problems can have very large gradient norms (even though the individual gradients themselves aren't too large). Similarly, when working on problems with high-resolution images, I believe the gradients will be larger as well, since each kernel weight receives a gradient that is the sum of gradients across the entire receptive field (correct me if I'm wrong here). Is this a problem, or is clipping simply relative?
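For concreteness, here is a minimal PyTorch-style sketch of how one could log the global gradient norm per step to see what "large" means for a given model. The model, dummy data, and the clipping threshold of 10.0 are placeholders, not recommendations; `clip_grad_norm_` conveniently returns the total norm computed before clipping, so it doubles as a diagnostic.

```python
import torch
import torch.nn as nn

# Placeholder model/data: a tiny conv regressor on random high-ish resolution inputs.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(8, 3, 64, 64)   # dummy image batch
    y = torch.randn(8, 1)           # dummy regression target
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Returns the total gradient norm *before* clipping, then clips in place.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()
    if step % 10 == 0:
        print(f"step {step}: grad norm = {total_norm:.3f}")
```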
Replies: 1 comment
-
Instead of trying to pin down a number (it will vary from model to model), I would look for signs that you are instability-bound when you do a learning rate sweep. What happens when you take your best learning rate lr* and run at 2·lr* or 4·lr*? Do you see loss instability? If so, that's a sign you should be able to improve performance by dealing with the instability in some way. Warmup and clipping are the easiest ways to tackle this.
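A hedged sketch of that check: after a sweep finds lr*, re-run training at 2x and 4x that rate and look for loss spikes or divergence. The toy model, the value of `best_lr`, and the spike heuristic below are illustrative stand-ins for your real training loop and monitoring, not a prescribed implementation.

```python
import math
import torch
import torch.nn as nn

def train_and_return_losses(lr, steps=200):
    """Stand-in training loop on a toy regression problem; returns per-step losses."""
    torch.manual_seed(0)
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x = torch.randn(256, 10)
    y = x.sum(dim=1, keepdim=True)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

def is_unstable(losses, spike_factor=2.0):
    # Crude heuristic: a non-finite loss, or any loss jumping to more than
    # spike_factor times the best loss seen so far, counts as instability.
    best = float("inf")
    for loss in losses:
        if not math.isfinite(loss) or loss > spike_factor * best:
            return True
        best = min(best, loss)
    return False

best_lr = 0.05  # placeholder for the lr* your sweep found
for mult in (1, 2, 4):
    losses = train_and_return_losses(mult * best_lr)
    print(f"{mult}x lr*: final loss = {losses[-1]:.4f}, unstable = {is_unstable(losses)}")
```

If the 2x or 4x runs show spikes that the 1x run does not, that's the signal to try warmup and/or clipping before concluding the model is already tuned.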