Energy forces #278
Conversation
@RylieWeaver
This is exactly what we discussed and looks awesome.
Have you tried it with Lennard-Jones to make sure that it is compatible?
Awesome, thanks. I was able to get it to work with Lennard-Jones after some small changes to LJ's dataset creation and inference. For example, we now have … Do not merge it yet, because I am resolving any issues with the GitHub tests, but I will let you know when they pass.
@RylieWeaver
I think it would be good to also include a new test that the CI workflow must execute, where we check this capability to make sure that future changes in the code do not break it.
Ok, that's good.
Ok, I can include this. Theoretically, any model that includes positional information should be able to predict forces with this method. Should we use that list for the testing?
@allaffa I believe I'll need to create a new dataset inside … Is this correct, and is random generation fine for the forces/energy as well?
@RylieWeaver
@allaffa Yeah, I think it will be good to add LJ to the main branch. Would you be free to hammer out the specifics? (Your calendar seems booked.) If not, we can just hash out here what this will look like:
(1) So, what I think you're saying is to transfer the whole LJ example to the …
(2) Then, would the …
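(Not the actual HydraGNN/LJ test code: a minimal sketch of what a randomly generated test sample could look like, assuming a PyTorch Geometric `Data` object carrying the field names this PR expects; whether purely random energy/forces values are acceptable is exactly the question raised above.)

```python
import torch
from torch_geometric.data import Data

# Hypothetical randomly generated sample for a CI test.
# Field names (pos, energy, forces) follow what the PR expects; the shapes and
# the dummy node features are assumptions for illustration only.
num_atoms = 4
data = Data(
    pos=torch.rand(num_atoms, 3),       # random atomic positions
    x=torch.rand(num_atoms, 1),         # dummy node features
    energy=torch.rand(1),               # per-graph energy target
    forces=torch.rand(num_atoms, 3),    # per-atom force targets
)
print(data)
```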
@RylieWeaver
Additional changes that you forgot to address in the previous review of the PR.
@RylieWeaver
I committed just one tiny change to the copyright year.
@allaffa Ok, cool. I just did a couple of cleanup things as well, but it should pass all tests again.
@pzhanggit @jychoi-hpc Because of this, and since I have been heavily involved in the code development of this PR myself, I would appreciate it if you two could review this PR with fresh eyes. You may catch issues or concerns in terms of software design that I have not caught myself.
Double-clicking on this: let me know what you think, @pzhanggit @jychoi-hpc. There are some more PRs waiting that require this one first, so I'd like to get this one approved or revised soon if possible. Thanks.
In the interest of time, please focus on reviewing only the two files that change the core capabilities of HydraGNN, namely:
…
All the other files are in the "examples" folder, therefore they do not affect the core of HydraGNN, and they are the result of extensive work among me, Rylie, and Pranav anyway. The reason I asked you to review the PR is the two files above.
Thank you for the update. I looked over the base and train script. It looks good to me.
```python
forces_pred = torch.autograd.grad(
    graph_energy_pred,
    data.pos,
    grad_outputs=torch.ones_like(graph_energy_pred),
    retain_graph=graph_energy_pred.requires_grad,  # Retain graph only if needed (it will be needed during training, but not during validation/testing)
    create_graph=True,
)[0].float()
```
Why is `grad_outputs` used here? This would lead to a sum of the gradients, right?
I don't think so. Here's my understanding of it:

The `grad_outputs` argument essentially serves as a weight to multiply the gradients. This is necessary since `graph_energy_pred` is higher-dimensional than a scalar (there is one predicted energy per graph in the batch). The `torch.ones_like()` here assigns a weight of 1 to each gradient, which I believe is what we want. Otherwise, the force prediction would be c*grad_E, where c is a constant not equal to 1, which is nonphysical.

Specifically to your question: no, I don't believe that it results in a sum. The output `forces_pred` will have the same shape as `data.pos`. The `grad_outputs` argument specifically weights the gradients before a sum, but that extra step of summing is outside of the autograd calculation, I believe.

Does this answer your question?
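(As a standalone illustration of this point, in plain PyTorch rather than HydraGNN code, with a toy per-atom energy that is purely an assumption: the batched `torch.autograd.grad` call with `grad_outputs=torch.ones_like(...)` returns one gradient per atom, with the same shape as the positions.)

```python
import torch

# Toy setup: 8 atoms split across two graphs, one "predicted energy" per graph.
pos = torch.randn(8, 3, requires_grad=True)
batch = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])

atom_energy = (pos ** 2).sum(dim=-1)                            # shape [8]
graph_energy = torch.zeros(2).index_add(0, batch, atom_energy)  # shape [2]

grads = torch.autograd.grad(
    graph_energy,
    pos,
    grad_outputs=torch.ones_like(graph_energy),  # weight of 1 per graph energy
    create_graph=True,
)[0]

print(grads.shape)  # torch.Size([8, 3]) -- same shape as pos, not summed to a scalar
# (The physical force would be the negative of this gradient; sign handling is
#  left to the surrounding training code.)
```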
Thanks. So just to verify that my understanding is correct: `grad_outputs` is needed because we're calculating gradients of a batch of samples here. And since `grad_outputs` "should be a sequence of length matching output containing the 'vector' in vector-Jacobian product", it provides a way to aggregate/sum the gradients across all the samples in the batch. This is equivalent to calculating the gradients iteratively for each sample, since the cross-gradients between samples would be zero.
> grad_outputs is needed because we're calculating gradients of a batch of samples here

Yep, that's right.

> And since grad_outputs should be a sequence of length matching output containing the "vector" in vector-Jacobian product

I think so. Yes, `grad_outputs` should be a sequence, and it should be the same shape as whatever you're predicting. It then multiplies the matrix of gradients (row-wise, not normal matrix multiplication), which I think is the vector-Jacobian product you're referring to.

> it provides a way to aggregate/sum the gradients across all the samples in the batch.

I think yes. It's scaling those gradients, which would be relevant in an aggregation/sum, although it does not do that aggregation/sum itself.

> This is equivalent to calculating the gradients iteratively for each sample, since the cross-gradients between samples would be zero.

Yep.

Are there any parts still unclear?
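(A small self-contained check of the batched-vs-per-sample point above, using toy energies rather than HydraGNN code: because each graph's energy depends only on its own atoms, the cross-gradients are zero, and one batched call with `grad_outputs=ones` matches summing the per-graph gradients computed one at a time.)

```python
import torch

pos = torch.randn(6, 3, requires_grad=True)
batch = torch.tensor([0, 0, 0, 1, 1, 1])  # two graphs of 3 atoms each
graph_energy = torch.zeros(2).index_add(0, batch, (pos ** 3).sum(dim=-1))

# One batched call, weighting each graph's energy by 1.
batched = torch.autograd.grad(
    graph_energy, pos,
    grad_outputs=torch.ones_like(graph_energy),
    retain_graph=True,
)[0]

# Per-graph gradients, computed one scalar energy at a time.
per_graph = torch.zeros_like(pos)
for g in range(2):
    per_graph += torch.autograd.grad(graph_energy[g], pos, retain_graph=True)[0]

print(torch.allclose(batched, per_graph))  # True -- cross-gradients between graphs are zero
```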
> It's scaling those gradients

What do you mean by "scaling those gradients"?
@pzhanggit
The scaling is done with `torch.ones_like(graph_energy_pred)`. In our case, we use vectors of ones because we do NOT want to scale. However, you could apply any multiplying factor (or even provide a customized vector with different values for each entry).
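(A minimal standalone illustration of that point, with assumed toy tensors unrelated to HydraGNN: ones leave the gradients unscaled, while any other weights scale each output's contribution.)

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x ** 2  # two outputs, one per input

ones = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), retain_graph=True)[0]
twos = torch.autograd.grad(y, x, grad_outputs=2.0 * torch.ones_like(y))[0]

print(ones)  # tensor([2., 4.])  -> unscaled gradients dy_i/dx_i
print(twos)  # tensor([4., 8.])  -> each gradient multiplied by the weight 2
```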
@pzhanggit @allaffa ^ on what Max said. Also, feel free to @ me so I reply faster :)
Yep. Wow, the o1-preview is great
thanks.
@allaffa Let me know what you think of this PR for energy and force computation. It should follow exactly what we discussed before. Some notable things:
(1) It requires data.energy and data.forces to exist and be named this way
(2) It balances force and energy prediction weighting implicitly, since we aren't able to pass multiple weights in single-task learning
(3) Compute forces is an argument to be specified in the json within "Training" (a hedged sketch of what this could look like is below).
- If specified, it throws errors for anything other than nodal single-task learning, as well as for any model that doesn't use positional information for prediction.
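(Hedged sketch of the relevant training-config fragment, written as a Python dict mirroring the JSON; the key name `compute_grad_energy` is an assumption for illustration only and may differ from the actual option name introduced in this PR.)

```python
# Hypothetical fragment of the training configuration.
config = {
    "Training": {
        "compute_grad_energy": True,  # enable force prediction via gradients of the predicted energy
    }
}
```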