
Make the learning rate scheduler more flexible #242

Closed
wsnoble opened this issue Sep 19, 2023 · 9 comments · Fixed by #300
Labels
enhancement New feature or request

Comments

@wsnoble
Contributor

wsnoble commented Sep 19, 2023

We should modify the config file so that it contains a group of parameters for controlling the learning rate scheduler. This would give us the flexibility to try other schedulers that PyTorch provides out of the box.

Currently, we have this:

# Number of warmup iterations for learning rate scheduler
warmup_iters: 100_000
# Max number of iterations for learning rate scheduler
max_iters: 600_000
# Learning rate for weight updates during training
learning_rate: 5e-4

The max_iters parameter is poorly named and should be renamed to better reflect what it does. In general, the config file should contain more text describing how the learning rate is controlled.
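
For illustration, a minimal sketch of how such a config group could be mapped onto PyTorch's built-in schedulers; the lr_scheduler and lr_scheduler_kwargs keys are hypothetical, not existing options:

import torch
from torch import optim

# Hypothetical config group; the keys below are illustrative only.
config = {
    "learning_rate": 5e-4,
    "lr_scheduler": "CosineAnnealingLR",       # any class in torch.optim.lr_scheduler
    "lr_scheduler_kwargs": {"T_max": 600_000},
}

model = torch.nn.Linear(10, 1)  # stand-in for the actual model
optimizer = optim.Adam(model.parameters(), lr=config["learning_rate"])

# Look up the scheduler class by name, so trying a new scheduler only
# requires editing the config file, not the code.
scheduler_cls = getattr(optim.lr_scheduler, config["lr_scheduler"])
scheduler = scheduler_cls(optimizer, **config["lr_scheduler_kwargs"])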

wsnoble added the enhancement (New feature or request) label on Sep 19, 2023
@bittremieux
Collaborator

bittremieux commented Dec 26, 2023

@melihyilmaz we could try some fancy learning-rate-free methods (e.g. D-Adaptation, Prodigy) as well.
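
For reference, a minimal sketch of what swapping in such an optimizer might look like, assuming the prodigyopt package; with learning-rate-free methods the base lr is conventionally left at 1.0:

import torch
from prodigyopt import Prodigy  # assumes: pip install prodigyopt

model = torch.nn.Linear(10, 1)  # stand-in for the actual model
# Prodigy adapts the effective step size itself, so lr is typically kept at 1.0.
optimizer = Prodigy(model.parameters(), lr=1.0)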

@bittremieux
Collaborator

Addressed by #294.

@wsnoble Do you still want to rename the max_iters config parameter?

@wsnoble
Contributor Author

wsnoble commented Feb 14, 2024

Yes, I think so. But I'm not actually sure how to interpret the documentation. What does "Max number of iterations for learning rate scheduler" mean? I would think that max_iters should specify how many batches are done before the optimization terminates, but I guess that's not what this means?

@melihyilmaz
Collaborator

@wsnoble You're right that "Max number of iterations for learning rate scheduler" isn't accurate. The 600,000 steps correspond to the half-period of the cosine wave in the learning rate scheduler, i.e. the learning rate reaches its first minimum at 600,000 steps, its second minimum at 1.8M steps, and so on, until the max_epochs defined in config.yaml is reached or training is terminated by the user (see the plot of the learning rate over steps below).

I think we can replace max_iters with something like scheduler_iters and provide a more succinct version of my explanation above in the config file documentation.

[Plot: learning rate over training steps, showing the warmup phase followed by repeating cosine cycles]
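
A rough sketch of the schedule described above, assuming a linear warmup multiplied by a cosine whose half-period is max_iters; this mirrors the explanation rather than the exact implementation:

import math

def lr_factor(step: int, warmup_iters: int = 100_000, max_iters: int = 600_000) -> float:
    """Multiplier applied to the base learning rate at a given training step."""
    # Cosine with half-period max_iters: minima at 600k, 1.8M, 3M, ... steps.
    cosine = 0.5 * (1 + math.cos(math.pi * step / max_iters))
    # Linear warmup ramps the factor up over the first warmup_iters steps.
    warmup = min(1.0, step / warmup_iters)
    return cosine * warmup

Multiplying the configured learning_rate of 5e-4 by this factor gives a curve like the one in the plot above.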

@wsnoble
Contributor Author

wsnoble commented Feb 14, 2024

How about cosine_schedule_period?

@melihyilmaz
Collaborator

Since its sister parameter is warmup_iters, maybe cosine_schedule_period_iters?

@wsnoble
Contributor Author

wsnoble commented Feb 15, 2024

Sure, that would be fine.

@melihyilmaz
Collaborator

I've tinkered with renaming the option, but I couldn't find a way to do so without introducing breaking changes: max_iters will remain in our latest checkpoint file (which makes the optimizer complain when fine-tuning), and if we get rid of it by updating the checkpoint file in the current release, then the current major version will be incompatible with its own checkpoint file. As a result, the cleanest way to rename it would be to do so alongside a new checkpoint file and a new minor release.

@bittremieux Is there something I'm missing?
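
For what it's worth, a possible migration sketch for updating an existing checkpoint, assuming the config is stored under the hyper_parameters key as in PyTorch Lightning checkpoints; the file names and the new key are illustrative:

import torch

# Rename max_iters -> cosine_schedule_period_iters in an existing checkpoint.
ckpt = torch.load("casanovo.ckpt", map_location="cpu")
hparams = ckpt.get("hyper_parameters", {})
if "max_iters" in hparams:
    hparams["cosine_schedule_period_iters"] = hparams.pop("max_iters")
torch.save(ckpt, "casanovo_updated.ckpt")

This still wouldn't help older Casanovo code pointed at the updated checkpoint, which is why a new minor release seems unavoidable.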

@bittremieux
Collaborator

Indeed, that's the downside of including the full model state in the checkpoint instead of only the weights.

Creating a new minor release would indeed be the solution. If a new checkpoint file is provided as part of a new minor release, our automatic version checking procedure should download it without users really noticing. (Except for users who are currently specifying the non-enzymatic checkpoint manually.)
