Make the learning rate scheduler more flexible #242
Comments
@melihyilmaz we could try some fancy learning-rate-free methods (e.g. D-Adaptation, Prodigy) as well.
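As an illustration only, a minimal sketch of how such a learning-rate-free optimizer could be dropped in, assuming the third-party `prodigyopt` package; the exact constructor arguments should be checked against that library's documentation and are not part of this project:

```python
# Sketch only: swapping in a learning-rate-free optimizer such as Prodigy
# (pip install prodigyopt). Arguments follow the library's documented usage.
import torch
from prodigyopt import Prodigy  # third-party package, assumed available

model = torch.nn.Linear(10, 1)  # placeholder model for illustration

# Prodigy estimates the step size itself; lr is typically left at 1.0.
optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=1e-5)

# D-Adaptation (pip install dadaptation) provides drop-in optimizers such as
# DAdaptAdam that are used in essentially the same way.
```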
Yes, I think so. But I'm not actually sure how to interpret the documentation. What does "Max number of iterations for learning rate scheduler" mean? I would think that max_iters should specify how many batches are done before the optimization terminates, but I guess that's not what this means?
@wsnoble You're right that "Max number of iterations for learning rate scheduler" isn't an accurate description. 600,000 steps corresponds to the half period of the cosine wave in the learning rate scheduler, i.e. the learning rate reaches its first minimum at 600,000 steps, its second minimum at 1.8M steps, and so on until training ends. I think we can replace max_iters with a more descriptive name.
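For reference, a minimal sketch of a warmup-plus-cosine schedule with this behavior, written with PyTorch's `LambdaLR`; the parameter names (`warmup_iters`, `max_iters`) and values mirror the discussion above, and the exact scaling is an assumption rather than the project's actual implementation:

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 1)  # placeholder model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

warmup_iters = 100_000  # assumed warmup length
max_iters = 600_000     # half period of the cosine: first LR minimum here

def lr_factor(step: int) -> float:
    # Cosine wave whose first minimum is at max_iters, second at 3 * max_iters, etc.
    factor = 0.5 * (1.0 + math.cos(math.pi * step / max_iters))
    # Linear warmup on top of the cosine during the first warmup_iters steps.
    if step < warmup_iters:
        factor *= step / warmup_iters
    return factor

scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)
# Call scheduler.step() once per training step to follow this schedule.
```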
How about …?
Since its sister parameter is …
Sure, that would be fine.
I've tinkered with renaming the option, but I couldn't come up with a way of doing so without introducing breaking changes, since the old parameter name is baked into the existing model checkpoints. @bittremieux Is there something I'm missing?
Indeed, that's the downside of including the full model state in the checkpoint instead of only the weights. Creating a new minor release would be the solution. If a new checkpoint file is provided as part of a new minor release, at least our automatic version checking procedure should download it without the users really noticing. (Except if they're currently specifying the non-enzymatic version manually.)
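To make the breakage concrete, here is a rough sketch of the kind of one-off checkpoint migration that would otherwise be needed, assuming a Lightning-style checkpoint that stores hyperparameters under a "hyper_parameters" key; the file names and the old/new parameter names are hypothetical:

```python
import torch

# Hypothetical file names; a Lightning-style checkpoint is assumed to keep
# hyperparameters in a "hyper_parameters" dict alongside the weights.
ckpt = torch.load("casanovo_old.ckpt", map_location="cpu")

hparams = ckpt.get("hyper_parameters", {})
if "max_iters" in hparams:
    # Hypothetical new name: copy the value over and drop the old key.
    hparams["cosine_schedule_period_iters"] = hparams.pop("max_iters")

torch.save(ckpt, "casanovo_new.ckpt")
```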
We should modify the config file so that it contains a group of parameters for controlling the learning rate scheduler. This should give us the flexibility to try other schedulers that are provided out of the box by PyTorch.
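A minimal sketch of what such a config group and the code consuming it might look like; the group name, keys, and values below are hypothetical (in practice the group would live in the YAML config), and any scheduler class from `torch.optim.lr_scheduler` could be plugged in:

```python
import torch

# Hypothetical config group; in the real project this would live in config.yaml.
config = {
    "lr_scheduler": {
        "name": "CosineAnnealingLR",  # any class in torch.optim.lr_scheduler
        "kwargs": {"T_max": 600_000, "eta_min": 1e-6},
    }
}

model = torch.nn.Linear(10, 1)  # placeholder model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

# Generic factory: look the scheduler class up by name and pass its kwargs through.
sched_cfg = config["lr_scheduler"]
scheduler_cls = getattr(torch.optim.lr_scheduler, sched_cfg["name"])
scheduler = scheduler_cls(optimizer, **sched_cfg["kwargs"])
```

This keeps the scheduler choice entirely in the config file while reusing PyTorch's built-in schedulers.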
Currently, the config exposes a `max_iters` parameter, which is poorly named, so it should be changed to better reflect what it does. In general, the config file should contain more text describing how the learning rate is controlled.