Energy forces #278
Conversation
@RylieWeaver
This is exactly what we discussed and looks awesome.
Have you tried it with Lennard-Jones to make sure that it is compatible?
Awesome, thanks. I was able to get it to work with Lennard-Jones after some small changes to LJ's dataset creation and inference. For example, we now have … Do not merge it yet, because I am resolving any issues with the GitHub tests, but I will let you know when they pass.
@RylieWeaver
I think it would be good to also include a new test that the CI workflow must execute, where we check this capability to make sure that future changes in the code do not break it.
Ok, that's good.
Ok, I can include this. Theoretically, any model that includes positional information should be able to predict forces with this method. Should we use that list for the testing?
@allaffa I believe I'll need to create a new dataset inside … Is this correct, and is random generation fine for the forces/energy as well?
@RylieWeaver
@allaffa Yeah, I think it will be good to add LJ to the main branch. Would you be free to hammer out the specifics? (Your calendar seems booked.) If not, we can just hash out here what this will look like:
(1) So, what I think you're saying is to transfer the whole LJ example to the …
(2) Then, would the …
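(Not the actual HydraGNN/LJ test code: a minimal sketch of what a randomly generated test sample could look like, assuming a PyTorch Geometric `Data` object carrying the field names this PR expects; whether purely random energy/forces values are acceptable is exactly the question raised above.)

```python
import torch
from torch_geometric.data import Data

# Hypothetical randomly generated sample for a CI test.
# Field names (pos, energy, forces) follow what the PR expects; the shapes and
# the dummy node features are assumptions for illustration only.
num_atoms = 4
data = Data(
    pos=torch.rand(num_atoms, 3),       # random atomic positions
    x=torch.rand(num_atoms, 1),         # dummy node features
    energy=torch.rand(1),               # per-graph energy target
    forces=torch.rand(num_atoms, 3),    # per-atom force targets
)
print(data)
```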
@RylieWeaver
Additional changes that you forgot to address in the previous review of the PR.
@RylieWeaver
I committed just one tiny change to the copyright year.
@allaffa Ok, cool. I just did a couple of cleanup things as well, but it should pass all tests again.
@pzhanggit @jychoi-hpc Because of this, and since I have been heavily involved in the code development of this PR myself, I would appreciate it if you two could review this PR with fresh eyes. You may catch issues or concerns in terms of software design that I have not caught myself.
Double-clicking on this: let me know what you think, @pzhanggit @jychoi-hpc. There are some more PRs waiting that require this one first, so I'd like to get this one approved or revised soon if possible. Thanks.
In the interest of time, please focus on reviewing only the two files that change the core capabilities of HydraGNN, namely:
…
All the other files are in the "examples" folder, therefore they do not affect the core of HydraGNN, and they are the result of extensive work among me, Rylie, and Pranav anyway. The reason I asked you to review the PR is the two files above.
Thank you for the update. I looked over the base and train script. It looks good to me.
```python
forces_pred = torch.autograd.grad(
    graph_energy_pred,
    data.pos,
    grad_outputs=torch.ones_like(graph_energy_pred),
    retain_graph=graph_energy_pred.requires_grad,  # Retain graph only if needed (it will be needed during training, but not during validation/testing)
    create_graph=True,
)[0].float()
```
Why is `grad_outputs` used here? This would lead to a sum of the gradients, right?
I don't think so. Here's my understanding of it:

The `grad_outputs` argument essentially serves as a weight to multiply the gradients. This is necessary since `graph_energy_pred` is higher-dimensional than a scalar (there is one predicted energy per graph in the batch). The `torch.ones_like()` here assigns a weight of 1 to each gradient, which I believe is what we want. Otherwise, the force prediction would be c*grad_E, where c is a constant not equal to 1, which is nonphysical.

Specifically to your question: no, I don't believe that it results in a sum. The output `forces_pred` will have the same shape as `data.pos`. The `grad_outputs` argument specifically weights the gradients before a sum, but that extra step of summing is outside of the autograd calculation, I believe.

Does this answer your question?
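(As a standalone illustration of this point, in plain PyTorch rather than HydraGNN code, with a toy per-atom energy that is purely an assumption: the batched `torch.autograd.grad` call with `grad_outputs=torch.ones_like(...)` returns one gradient per atom, with the same shape as the positions.)

```python
import torch

# Toy setup: 8 atoms split across two graphs, one "predicted energy" per graph.
pos = torch.randn(8, 3, requires_grad=True)
batch = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])

atom_energy = (pos ** 2).sum(dim=-1)                            # shape [8]
graph_energy = torch.zeros(2).index_add(0, batch, atom_energy)  # shape [2]

grads = torch.autograd.grad(
    graph_energy,
    pos,
    grad_outputs=torch.ones_like(graph_energy),  # weight of 1 per graph energy
    create_graph=True,
)[0]

print(grads.shape)  # torch.Size([8, 3]) -- same shape as pos, not summed to a scalar
# (The physical force would be the negative of this gradient; sign handling is
#  left to the surrounding training code.)
```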
Thanks. So just to verify that my understanding is correct: `grad_outputs` is needed because we're calculating gradients of a batch of samples here. And since `grad_outputs` "should be a sequence of length matching output containing the 'vector' in vector-Jacobian product", it provides a way to aggregate/sum the gradients across all the samples in the batch. This is equivalent to calculating the gradients iteratively for each sample, since the cross-gradients between samples would be zero.
> grad_outputs is needed because we're calculating gradients of a batch of samples here

Yep, that's right.

> And since grad_outputs should be a sequence of length matching output containing the "vector" in vector-Jacobian product

I think so. Yes, `grad_outputs` should be a sequence, and it should be the same shape as whatever you're predicting. It then multiplies the matrix of gradients (row-wise, not normal matrix multiplication), which I think is the vector-Jacobian product you're referring to.

> it provides a way to aggregate/sum the gradients across all the samples in the batch.

I think yes. It's scaling those gradients, which would be relevant in an aggregation/sum, although it does not do that aggregation/sum itself.

> This is equivalent to calculating the gradients iteratively for each sample, since the cross-gradients between samples would be zero.

Yep.

Are there any parts still unclear?
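(A small self-contained check of the batched-vs-per-sample point above, using toy energies rather than HydraGNN code: because each graph's energy depends only on its own atoms, the cross-gradients are zero, and one batched call with `grad_outputs=ones` matches summing the per-graph gradients computed one at a time.)

```python
import torch

pos = torch.randn(6, 3, requires_grad=True)
batch = torch.tensor([0, 0, 0, 1, 1, 1])  # two graphs of 3 atoms each
graph_energy = torch.zeros(2).index_add(0, batch, (pos ** 3).sum(dim=-1))

# One batched call, weighting each graph's energy by 1.
batched = torch.autograd.grad(
    graph_energy, pos,
    grad_outputs=torch.ones_like(graph_energy),
    retain_graph=True,
)[0]

# Per-graph gradients, computed one scalar energy at a time.
per_graph = torch.zeros_like(pos)
for g in range(2):
    per_graph += torch.autograd.grad(graph_energy[g], pos, retain_graph=True)[0]

print(torch.allclose(batched, per_graph))  # True -- cross-gradients between graphs are zero
```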
> It's scaling those gradients

What do you mean by "scaling those gradients"?
@pzhanggit
The scaling is done with `torch.ones_like(graph_energy_pred)`. In our case, we use vectors of ones because we do NOT want to scale. However, you could apply any multiplying factor (or even provide a customized vector with different values for each entry).
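(A minimal standalone illustration of that point, with assumed toy tensors unrelated to HydraGNN: ones leave the gradients unscaled, while any other weights scale each output's contribution.)

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x ** 2  # two outputs, one per input

ones = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), retain_graph=True)[0]
twos = torch.autograd.grad(y, x, grad_outputs=2.0 * torch.ones_like(y))[0]

print(ones)  # tensor([2., 4.])  -> unscaled gradients dy_i/dx_i
print(twos)  # tensor([4., 8.])  -> each gradient multiplied by the weight 2
```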
@pzhanggit @allaffa ^ on what Max said. Also, feel free to @ me so I reply faster :)
Yep. Wow, the o1-preview is great
thanks.
@allaffa Let me know what you think of this PR for energy and force computation. It should follow exactly what we discussed before. Some notable things:
(1) It requires data.energy and data.forces to exist and be named this way
(2) It balances force and energy prediction weighting implicitly, since we aren't able to pass multiple weights in single-task learning
(3) Compute forces is an argument to be specified in the json within "Training" (a hedged sketch of what this could look like is below).
- If specified, it throws errors for anything other than nodal single-task learning, as well as for any model that doesn't use positional information for prediction.
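(Hedged sketch of the relevant training-config fragment, written as a Python dict mirroring the JSON; the key name `compute_grad_energy` is an assumption for illustration only and may differ from the actual option name introduced in this PR.)

```python
# Hypothetical fragment of the training configuration.
config = {
    "Training": {
        "compute_grad_energy": True,  # enable force prediction via gradients of the predicted energy
    }
}
```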