torch.jit.script profile guided optimisations produce errors in aev_computer gradients #628
Thanks for reporting the issue! This is a problem with NVFuser. A bug report has been filed at pytorch/pytorch#84510. The minimal reproducible example I extracted from the angular function is the following:

```python
import torch
from torch import Tensor

# Scripted so the profiling executor can profile and fuse the graph.
@torch.jit.script
def angular_terms(Rca: float, ShfZ: Tensor, EtaA: Tensor, Zeta: Tensor,
                  ShfA: Tensor, vectors12: Tensor) -> Tensor:
    vectors12 = vectors12.view(2, -1, 3, 1, 1, 1, 1)
    cos_angles = vectors12.prod(0).sum(1)
    ret = (cos_angles + ShfZ) * Zeta * ShfA * 2
    return ret.flatten(start_dim=1)
```

Replacing a `**` operation with `torch.float_power` will not fix the root cause of this problem. For now, I would recommend disabling NVFuser by running the following:
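The exact snippet was not preserved in this capture; a plausible reconstruction, assuming the internal switch available in PyTorch of that era (roughly 1.12 through 2.0), is:

```python
import torch

# Internal flag that turns off NVFuser so TorchScript falls back to the
# NNC fuser; this is a private API and may change between releases.
torch._C._jit_set_nvfuser_enabled(False)
```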
This will switch to the NNC fuser (https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md#fusers) instead of NVFuser, which I have tested and confirmed works correctly.
Hi, I have found that with PyTorch 1.13 and 2.0 (but not with PyTorch <= 1.12) the torch.jit.script profile-guided optimisations (which are on by default) cause significant errors in the position gradients calculated via backpropagation through aev_computer when using a CUDA device. This is demonstrated in issue openmm/openmm-ml#50.
An example is shown below; manually turning off the JIT optimisations gives accurate forces:
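The original example script is not preserved in this capture; below is a minimal sketch of the kind of comparison described, assuming TorchANI's ANI2x model and its aev_computer attribute (the geometry, iteration count, and helper names are illustrative, and a CUDA device is required to reproduce the problem):

```python
import torch
import torchani

device = torch.device('cuda')
model = torchani.models.ANI2x(periodic_table_index=True).to(device)
aev_computer = torch.jit.script(model.aev_computer)  # scripted, as in the report

# A water-like geometry; species are atomic numbers (O, H, H).
species = torch.tensor([[8, 1, 1]], device=device)
coordinates = torch.tensor([[[0.00, 0.00, 0.00],
                             [0.00, 0.00, 0.96],
                             [0.93, 0.00, -0.24]]],
                           device=device, requires_grad=True)

def grad_of_aev_sum() -> torch.Tensor:
    aevs = aev_computer((species, coordinates)).aevs
    return torch.autograd.grad(aevs.sum(), coordinates)[0]

# Default path: run a few times so the profiling executor specializes
# the graph and the fuser kicks in on later iterations.
for _ in range(3):
    grad_optimized = grad_of_aev_sum()

# Reference path: profile-guided optimisations disabled.
with torch.jit.optimized_execution(False):
    grad_reference = grad_of_aev_sum()

print((grad_optimized - grad_reference).abs().max())
```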
The output I get on an RTX 3090 shows the optimised run producing forces that differ significantly from the unoptimised reference.
I have found that a workaround which removes the errors is to replace a `**` operation with `torch.float_power`: 172b6fe
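For illustration, a sketch of that substitution, assuming an angular-term expression of the form used in TorchANI's AEV code (the exact expression and constants here are illustrative; note that torch.float_power computes in float64, so the result is cast back to the working dtype):

```python
import torch

angles = torch.rand(8, device='cuda')
ShfZ = 0.19634954  # illustrative shift value
Zeta = 32.0

# Original form, which triggers the faulty fused kernel:
factor1 = ((1 + torch.cos(angles - ShfZ)) / 2) ** Zeta

# Workaround: torch.float_power evaluates in float64 and happens to
# avoid the bad fusion; cast back to the original dtype afterwards.
factor1_wa = torch.float_power((1 + torch.cos(angles - ShfZ)) / 2, Zeta)
factor1_wa = factor1_wa.to(factor1.dtype)
```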