Is LOMO capable of pre-training a LLM from scratch as well? #20

Open
YuxingLu613 opened this issue Jun 26, 2023 · 2 comments

Comments

@YuxingLu613

No description provided.

@QipengGuo
Collaborator

Good question. We don't know how LOMO will perform in the pre-training stage. The major concern is that SGD is sensitive to the optimization settings. My guess is that the optimization process of pre-training from scratch is more difficult than fine-tuning or further pre-training.

In practice, a possible compromise is to use a powerful optimizer (e.g., Adam) for a warm-up phase and then switch to a cheaper optimizer (e.g., LOMO).
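
For concreteness, here is a minimal PyTorch sketch of that warm-up-then-switch idea. It uses `torch.optim.SGD` as a stand-in for the cheaper optimizer (not LOMO's actual fused update), and the model, step counts, and learning rates are placeholders, not values from this repo or the paper:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 128)                # placeholder model
warmup_steps = 1_000                       # hypothetical warm-up budget
total_steps = 10_000                       # hypothetical total steps
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(total_steps):
    x = torch.randn(32, 128)
    loss = model(x).pow(2).mean()          # dummy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # After the warm-up budget, drop Adam's optimizer states and continue
    # with plain SGD; in practice this is where a LOMO-style update would
    # take over to save the memory held by Adam's moment estimates.
    if step + 1 == warmup_steps:
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
```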

@PromptExpert

I am doing further pre-training; I will reply when the results come out.
