Is LOMO capable of pre-training a LLM from scratch as well? #20

Open
YuxingLu613 opened this issue Jun 26, 2023 · 2 comments

Comments

@YuxingLu613

No description provided.

@QipengGuo
Collaborator

Good question. We don't know how LOMO will perform in the pre-training stage. The major concern is that SGD is sensitive to the optimization settings. My guess is that the optimization process of pre-training from scratch is more difficult than fine-tuning or further pre-training.

In practice, a possible compromise is to use a powerful optimizer (e.g., Adam) for a warm-up phase and then switch to a cheaper optimizer (e.g., LOMO).
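
For concreteness, here is a minimal PyTorch sketch of that warm-up-then-switch idea. It uses `torch.optim.SGD` as a stand-in for the cheaper optimizer (not LOMO's actual fused update), and the model, step counts, and learning rates are placeholders, not values from this repo or the paper:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 128)                # placeholder model
warmup_steps = 1_000                       # hypothetical warm-up budget
total_steps = 10_000                       # hypothetical total steps
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(total_steps):
    x = torch.randn(32, 128)
    loss = model(x).pow(2).mean()          # dummy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # After the warm-up budget, drop Adam's optimizer states and continue
    # with plain SGD; in practice this is where a LOMO-style update would
    # take over to save the memory held by Adam's moment estimates.
    if step + 1 == warmup_steps:
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
```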

@PromptExpert

I am doing further pre-training; I will reply when the results come out.
