🐛 [BUG] Tutorial example runs infinitely #260

Open
schiotz opened this issue Nov 1, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@schiotz

schiotz commented Nov 1, 2022

Describe the bug
The tutorial Colab example does not finish. The training continues indefinitely; I stopped it after 2000 epochs.

To Reproduce
Run the colab notebook referenced in https://github.com/mir-group/nequip#tutorial

Expected behavior
The training used to stop after 100 epochs; now it runs forever.

@schiotz schiotz added the bug Something isn't working label Nov 1, 2022
@schiotz
Author

schiotz commented Nov 2, 2022

Note that the only change is that the number of epochs in the example YAML file has been increased from 100 to 100000, on the assumption that the training will converge before then. That does not seem to happen with the toluene test example: either the potential continues to improve, or the convergence detection is not reliable.
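To illustrate why raising the epoch cap alone can make a run far longer, here is a generic patience-based early-stopping sketch. It is not NequIP's implementation; the function `stopping_epoch`, its `patience` and `min_delta` parameters, and the synthetic loss curves are hypothetical. Under this kind of rule, training stops only once the validation loss has failed to improve for `patience` consecutive epochs, so a loss that keeps improving slowly never triggers it and only the epoch cap ends the run.

```python
from typing import Callable


def stopping_epoch(
    val_loss: Callable[[int], float],
    max_epochs: int,
    patience: int,
    min_delta: float = 0.0,
) -> int:
    """Return the epoch (1-based) at which a hypothetical patience rule stops training.

    Training stops when the validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    epochs_since_best = 0
    for epoch in range(1, max_epochs + 1):
        loss = val_loss(epoch)
        if loss < best - min_delta:
            # New best loss: reset the patience counter.
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    # Patience never ran out, so the epoch cap ends training.
    return max_epochs


# A loss that keeps decreasing never exhausts the patience counter.
print(stopping_epoch(lambda e: 1.0 / e, max_epochs=100, patience=50))
print(stopping_epoch(lambda e: 1.0 / e, max_epochs=100_000, patience=50))
# A loss that plateaus at epoch 100 stops 50 epochs later.
print(stopping_epoch(lambda e: max(1.0 / e, 0.01), max_epochs=100_000, patience=50))
```

With a steadily improving loss, the run length is set entirely by `max_epochs`, which matches the behavior reported here if the validation loss on the toluene example keeps improving.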

@Linux-cpp-lisp
Collaborator

Hi @schiotz,

On what timescale does the example training fail to converge? You're right about the change, which we will look into, but I want to be sure that the issue is just that it doesn't finish in a reasonable time for a tutorial, and not that you think you've found a bug in the early-stopping implementation.

@schiotz
Author

schiotz commented Nov 8, 2022

I am not sure; I did not look into it in detail. The tutorial used to stop after 100 epochs, but it ran for almost 3000 epochs before I aborted it. That seems a bit much, so I suspect a bug, but since I do not know how early stopping is supposed to be triggered, I cannot say for sure.

In any case, the tutorial should probably stop after 100 epochs as it used to; as it stands, the tutorial is not really runnable.
