Weight decay cosine schedule #1243

detkov · 2022-04-30T08:34:11Z

detkov
Apr 30, 2022

Hi! I saw that the DINO paper uses a cosine schedule for weight decay (paper: p.5, "Implementation details.", code), and it seems useful. Are there any plans to provide this functionality in timm?
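
To make the request concrete, here is a minimal sketch of what a cosine weight decay schedule could look like. It is not the DINO code and not a timm API: `cosine_value`, `set_weight_decay`, the 0.04 and 0.4 endpoints, and the step count are illustrative assumptions. The `param_groups` list is a plain-Python stand-in for `optimizer.param_groups`, which in PyTorch is a list of dicts with a `weight_decay` key.

```python
import math


def cosine_value(step, total_steps, start, end):
    """Cosine interpolation from `start` at step 0 to `end` at `total_steps`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


def set_weight_decay(param_groups, value):
    """Write `value` into every group that already uses weight decay.

    Groups created with weight_decay == 0 (e.g. biases, norm layers) are left alone.
    """
    for group in param_groups:
        if group.get("weight_decay", 0.0) > 0.0:
            group["weight_decay"] = value


# Stand-in for optimizer.param_groups (hypothetical values).
param_groups = [
    {"name": "weights", "weight_decay": 0.04},
    {"name": "no_decay", "weight_decay": 0.0},
]

total_steps = 100
for step in range(total_steps + 1):
    # In a real loop this would run once per iteration, before optimizer.step().
    set_weight_decay(param_groups, cosine_value(step, total_steps, 0.04, 0.4))
```

With a real optimizer the same two helpers would be called on `optimizer.param_groups` inside the training loop, alongside the LR scheduler.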

Answered by rwightman

rwightman · 2022-05-02T17:56:56Z

rwightman
May 2, 2022
Maintainer

@detkov I'm not sure if the DINO authors are aware, but AdamW in PyTorch is not fully decoupled as in the paper: the decay term is multiplied by the current LR, so the applied WD already decays with the LR schedule (https://github.com/pytorch/pytorch/blob/master/torch/optim/adamw.py#L248).

In general though, if anyone can show that it is beneficial for other pretraining schemes besides DINO I'm open to adding...
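
A small sketch of the coupling mentioned above, in plain Python so it runs without PyTorch. It assumes the weight-decay part of the update has the form `w <- w * (1 - lr * weight_decay)`, which is how the linked `adamw.py` applies it; `cosine_lr`, `decay_only_step`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine LR schedule from `base_lr` down to `min_lr`."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def decay_only_step(weight, lr, weight_decay):
    """Only the weight-decay part of the update: w <- w * (1 - lr * wd)."""
    return weight * (1.0 - lr * weight_decay)


base_lr, weight_decay, total_steps = 1e-3, 0.05, 100

# Effective per-step shrink rate lr * wd: wd is constant, yet the rate follows the LR schedule.
shrink = [cosine_lr(s, total_steps, base_lr) * weight_decay for s in range(total_steps + 1)]

# Apply only the decay part to a single scalar weight over the whole schedule.
w = 1.0
for s in range(total_steps + 1):
    w = decay_only_step(w, cosine_lr(s, total_steps, base_lr), weight_decay)
```

Here `shrink` starts at `base_lr * weight_decay` and falls to zero by the last step, so a constant `weight_decay` already behaves like a decaying one. Scheduling `weight_decay` upward on top of this changes the product `lr * weight_decay`, not the decay strength in isolation.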
