Replies: 7 comments
-
it would need to support stochastic rounding and/or Kahan summation. furthermore, any new optimiser needs to be proven in a toy ViT training session against AdamW in bf16 and Adam in fp32. if you are interested, please open a pull request with this data.
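for reference, stochastic rounding for bf16 parameter updates is commonly implemented by adding uniform noise over the 16 low bits that bf16 drops and then truncating. a minimal PyTorch sketch, not SimpleTuner code, with a hypothetical helper name:

```python
import torch

def add_stochastic_(param_bf16: torch.Tensor, update_fp32: torch.Tensor) -> None:
    """Apply an fp32 update to a bf16 parameter with stochastic rounding.

    Hypothetical helper, not SimpleTuner's implementation: the fp32 result is
    rounded to one of the two neighbouring bf16 values with probability
    proportional to distance, so rounding error averages out over many steps
    instead of being silently truncated every step.
    """
    exact = param_bf16.float() + update_fp32        # do the update in fp32
    bits = exact.view(torch.int32)                  # reinterpret the fp32 bit pattern
    noise = torch.randint_like(bits, 0, 1 << 16)    # uniform noise over the 16 dropped bits
    rounded = (bits + noise) & -65536               # the carry decides round-up vs round-down
    param_bf16.copy_(rounded.view(torch.float32).bfloat16())
```

inside an optimiser step you would keep the Lion update maths in fp32 and finish with something like `add_stochastic_(p.data, -lr * update)` under `torch.no_grad()`.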
-
fwiw, D-Adaptation can only hit 80% accuracy on CIFAR-10 after 20 epochs. this doesn't bode well for the optimiser's performance, especially versus a more robust option like Adam(W).
-
i know that it isn't good at long runs because it over-adapts after maybe 100,000 steps. my use case is to do a couple of short runs with dadapt to find a learning rate, then do a full run with regular lion using that value.
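here's a rough sketch of that two-stage workflow, assuming the `dadaptation` and `lion-pytorch` packages; the `DAdaptLion` import and the `'d'` param-group key are assumptions based on how the other D-Adaptation optimisers report their adapted step size, so verify against the version you use:

```python
import torch
from torch import nn
from dadaptation import DAdaptLion   # assumes a dadaptation release that ships DAdaptLion
from lion_pytorch import Lion

# toy stand-ins so the sketch runs on its own; swap in your real model and data
model = nn.Linear(32, 10)
loss_fn = nn.CrossEntropyLoss()

def batches(n_steps, batch_size=64):
    for _ in range(n_steps):
        yield torch.randn(batch_size, 32), torch.randint(0, 10, (batch_size,))

# stage 1: short probing run. with lr=1.0 the effective step size is just the adapted d
probe_opt = DAdaptLion(model.parameters(), lr=1.0)
for x, y in batches(2_000):          # stop well before dadapt starts to over-adapt
    loss_fn(model(x), y).backward()
    probe_opt.step()
    probe_opt.zero_grad()

# assumption: like the other D-Adaptation optimisers, the adapted step size is
# exposed as 'd' in the param group, and the effective LR is d * lr
group = probe_opt.param_groups[0]
found_lr = group["d"] * group["lr"]
print(f"plateau LR from dadapt-lion: {found_lr:.2e}")

# stage 2: full run with plain lion at the discovered learning rate
full_opt = Lion(model.parameters(), lr=found_lr, weight_decay=1e-2)
```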
-
the above is what the LR curve for dadapt usually looks like; i'm grabbing the value from the first plateau.
-
learning rates don't really transfer across optimisers like that, so i'm still not sure it's a useful approach.
-
Eh... I'm sure there will still be some differences, but at least they are still both lion. quote from a web search: "Yes, you can use 'Dadapt AdamW' to find a good learning rate for AdamW."
-
as mentioned before, if you want to see this in simpletuner you'll have to provide the implementation with those requirements met, as well as data showing its bf16 training effectiveness against Adam(W) in fp32 and in bf16 with stochastic rounding.
-
coming over here from OneTrainer.
It supports LION and DADAPT-LION.
You support LION, and... quantized lion?
What about adaptive lion?