
Make the learning rate scheduler more flexible #242

Closed
wsnoble opened this issue Sep 19, 2023 · 9 comments · Fixed by #300
Labels
enhancement New feature or request

Comments

@wsnoble
Contributor

wsnoble commented Sep 19, 2023

We should modify the config file so that it contains a group of parameters for controlling the learning rate scheduler. This would give us the flexibility to try other schedulers that PyTorch provides out of the box.

Currently, we have this:

# Number of warmup iterations for learning rate scheduler
warmup_iters: 100_000
# Max number of iterations for learning rate scheduler
max_iters: 600_000
# Learning rate for weight updates during training
learning_rate: 5e-4

The max_iters parameter is poorly named and should be renamed to better reflect what it does. In general, the config file should contain more text describing how the learning rate is controlled.
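
For illustration, a minimal sketch of how such a config group could be mapped onto PyTorch's built-in schedulers; the lr_scheduler and lr_scheduler_kwargs keys are hypothetical, not existing options:

import torch
from torch import optim

# Hypothetical config group; the keys below are illustrative only.
config = {
    "learning_rate": 5e-4,
    "lr_scheduler": "CosineAnnealingLR",       # any class in torch.optim.lr_scheduler
    "lr_scheduler_kwargs": {"T_max": 600_000},
}

model = torch.nn.Linear(10, 1)  # stand-in for the actual model
optimizer = optim.Adam(model.parameters(), lr=config["learning_rate"])

# Look up the scheduler class by name, so trying a new scheduler only
# requires editing the config file, not the code.
scheduler_cls = getattr(optim.lr_scheduler, config["lr_scheduler"])
scheduler = scheduler_cls(optimizer, **config["lr_scheduler_kwargs"])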

wsnoble added the enhancement (New feature or request) label on Sep 19, 2023
@bittremieux
Collaborator

bittremieux commented Dec 26, 2023

@melihyilmaz we could try some fancy learning-rate-free methods (e.g. D-Adaptation, Prodigy) as well.
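
For reference, a minimal sketch of what swapping in such an optimizer might look like, assuming the prodigyopt package; with learning-rate-free methods the base lr is conventionally left at 1.0:

import torch
from prodigyopt import Prodigy  # assumes: pip install prodigyopt

model = torch.nn.Linear(10, 1)  # stand-in for the actual model
# Prodigy adapts the effective step size itself, so lr is typically kept at 1.0.
optimizer = Prodigy(model.parameters(), lr=1.0)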

@bittremieux
Collaborator

Addressed by #294.

@wsnoble Do you still want to rename the max_iters config parameter?

@wsnoble
Contributor Author

wsnoble commented Feb 14, 2024

Yes, I think so. But I'm not actually sure how to interpret the documentation. What does "Max number of iterations for learning rate scheduler" mean? I would think that max_iters should specify how many batches are done before the optimization terminates, but I guess that's not what this means?

@melihyilmaz
Collaborator

@wsnoble You're right that "Max number of iterations for learning rate scheduler" isn't accurate. The 600,000 steps correspond to the half-period of the cosine wave in the learning rate scheduler, i.e. the learning rate reaches its first minimum at 600,000 steps, its second minimum at 1.8M steps, and so on, until the max_epochs defined in config.yaml is reached or training is terminated by the user (see the plot of the learning rate over steps below).

I think we can replace max_iters with something like scheduler_iters and provide a more succinct version of my explanation above in the config file documentation.

[Plot: learning rate over training steps, showing the warmup phase followed by repeating cosine cycles]
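
A rough sketch of the schedule described above, assuming a linear warmup multiplied by a cosine whose half-period is max_iters; this mirrors the explanation rather than the exact implementation:

import math

def lr_factor(step: int, warmup_iters: int = 100_000, max_iters: int = 600_000) -> float:
    """Multiplier applied to the base learning rate at a given training step."""
    # Cosine with half-period max_iters: minima at 600k, 1.8M, 3M, ... steps.
    cosine = 0.5 * (1 + math.cos(math.pi * step / max_iters))
    # Linear warmup ramps the factor up over the first warmup_iters steps.
    warmup = min(1.0, step / warmup_iters)
    return cosine * warmup

Multiplying the configured learning_rate of 5e-4 by this factor gives a curve like the one in the plot above.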

@wsnoble
Contributor Author

wsnoble commented Feb 14, 2024

How about cosine_schedule_period?

@melihyilmaz
Collaborator

Since its sister parameter is warmup_iters, maybe cosine_schedule_period_iters?

@wsnoble
Contributor Author

wsnoble commented Feb 15, 2024

Sure, that would be fine.

@melihyilmaz
Collaborator

I've tinkered with renaming the option, but I couldn't find a way to do so without introducing breaking changes: max_iters will remain in our latest checkpoint file (which makes the optimizer complain when fine-tuning), and if we get rid of it by updating the checkpoint file in the current release, then the current major version will be incompatible with its own checkpoint file. As a result, the cleanest way to rename it would be to do so alongside a new checkpoint file and a new minor release.

@bittremieux Is there something I'm missing?
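
For what it's worth, a possible migration sketch for updating an existing checkpoint, assuming the config is stored under the hyper_parameters key as in PyTorch Lightning checkpoints; the file names and the new key are illustrative:

import torch

# Rename max_iters -> cosine_schedule_period_iters in an existing checkpoint.
ckpt = torch.load("casanovo.ckpt", map_location="cpu")
hparams = ckpt.get("hyper_parameters", {})
if "max_iters" in hparams:
    hparams["cosine_schedule_period_iters"] = hparams.pop("max_iters")
torch.save(ckpt, "casanovo_updated.ckpt")

This still wouldn't help older Casanovo code pointed at the updated checkpoint, which is why a new minor release seems unavoidable.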

@bittremieux
Collaborator

Indeed, that's the downside of including the full model state in the checkpoint instead of only the weights.

Creating a new minor release would indeed be the solution. If a new checkpoint file is provided as part of a new minor release, our automatic version checking procedure should download it without users really noticing. (Except for users who are currently specifying the non-enzymatic checkpoint manually.)
