Note that the only change is the number of epochs in the example YAML file, raised from 100 to 100000 on the assumption that training would converge and stop early. That just does not seem to happen with the Toluene test example: either the potential keeps improving, or the convergence detection is not reliable.
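For reference, a sketch of the change in question (the `max_epochs` key is what the nequip example configs use; the exact file contents here are illustrative, not a diff of the actual commit):

```yaml
# configs/example.yaml (illustrative excerpt)
# previously: max_epochs: 100
max_epochs: 100000   # raised on the assumption that early stopping ends training first
```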
At what timescale does the example training not converge? You're right about the change, which we will look into, but I want to be sure that the issue is just that it doesn't finish in a reasonable time for a tutorial, and not that you think you've found a bug in the early stopping implementation.
I am not sure; I did not look into it in detail. The tutorial used to stop after 100 epochs, but it ran for almost 3000 epochs before I aborted it. That seems like a lot, so I suspect a bug, but since I do not know how early stopping is supposed to be triggered, I cannot say for sure.
In any case, the tutorial should probably stop after 100 epochs as it used to; as it stands, it is not really runnable.
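For anyone digging into whether early stopping is misbehaving: nequip exposes early-stopping criteria through config keys along these lines (a hedged sketch; the key names are taken from the full example config to the best of my knowledge, and the values here are hypothetical, not what the tutorial actually uses):

```yaml
# Illustrative early-stopping options (values hypothetical)
early_stopping_patiences:
  validation_loss: 50        # stop if validation loss does not improve for 50 epochs
early_stopping_lower_bounds:
  LR: 1.0e-5                 # stop once the learning rate decays below this bound
```

If neither condition ever fires on the Toluene example, training would only halt at `max_epochs`, which would match the behavior reported here.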
Describe the bug
The tutorial Colab example does not finish. The training continues indefinitely; I stopped it after 2000 epochs.
To Reproduce
Run the Colab notebook referenced in https://github.com/mir-group/nequip#tutorial
Expected behavior
The training used to stop after 100 epochs; now it runs indefinitely.