Can't Allocate Memory (Issue #604)
Dear @bartdemooij, thanks for reporting. It seems your system is too large to fit into your GPU memory. At this point we have implemented only direct algorithms, which are memory-bound depending on matrix sizes. Therefore you have two choices: (i) reduce the system size, or (ii) get a GPU with more memory. We are working on simulation plugins for LAMMPS and AMBER; with domain decomposition you would be able to run much larger systems in a distributed fashion.
Dear @isayev, thanks for the swift reply. We ran this system on a CPU with 64 GB of RAM. Would you say it is to be expected that all of it gets used up by 1000 ethanol molecules (9000 atoms)? Perhaps this is a trivial question, but how does memory usage scale with system size?
Since your code raises a CUDA memory error, I would assume you need to check your run script for correctness; it seems to still be running on a GPU. The typical suspects are the CUDA_VISIBLE_DEVICES environment variable and the torch.device definition in your code.
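As an illustration of that check (a minimal sketch, not code from the thread), hiding all GPUs via CUDA_VISIBLE_DEVICES before PyTorch initializes CUDA forces a genuine CPU run:

```python
import os

# An empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA-aware libraries;
# it must be set before torch first initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# With PyTorch/TorchANI installed one would then pin the model to the CPU
# explicitly (sketch only, not executed here):
#   import torch, torchani
#   device = torch.device("cpu")
#   model = torchani.models.ANI1ccx().to(device)
```

Setting the environment variable inside the script only works if it runs before the first CUDA call; setting it in the job submission script is the safer option on a cluster.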
Bart: The current memory scaling is O(N^2), since the TorchANI code calculates an NxN distance matrix to find neighbors. In the case of PBC, the code builds extra images (in your case of a cubic cell, it would be 18 cells) to find all neighbors. This is the stage at which you run out of memory.
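To make the quadratic scaling concrete, here is a hypothetical back-of-the-envelope estimate (my own sketch, assuming the full 3x3x3 block of 27 periodic images — the count the thread later settles on — and one float32 shift vector per candidate pair; the exact bookkeeping inside torchani.aev differs slightly). It lands within about 0.001% of the 26,243,892,000-byte allocation reported in the original traceback:

```python
# Back-of-the-envelope estimate of the failing allocation (assumptions:
# 27 periodic images, float32, one 3-vector per candidate atom pair).
n_atoms = 9000            # 1000 ethanol molecules x 9 atoms each
n_images = 27 * n_atoms   # atoms replicated over a 3x3x3 block of cells
bytes_per_vec = 3 * 4     # one (x, y, z) float32 shift vector

pairwise_bytes = n_atoms * n_images * bytes_per_vec
print(pairwise_bytes)     # 26244000000, i.e. ~26 GB
```

Doubling the atom count roughly quadruples this figure, which matches the observation below that 500 ethanol works while 750 and 1000 do not.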
Thank you, the memory error is clear. @zubatyuk The only thing I don't understand is how you get 18 cells for PBC in three dimensions, as I thought it would be 26. Am I right that you get 18 by 3x3x3 - 1 (the original) - 8 (the corner boxes) = 18? If this is the case, why are you allowed to omit the corner boxes? If not, how do you get 18 images?
Sorry, it was clearly my mistake. 3x3x3 is 27 indeed.
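For completeness, the image count can be checked in a couple of lines (a trivial sketch, not code from the thread):

```python
from itertools import product

# All shift vectors in the 3x3x3 block of periodic images around the cell.
shifts = list(product((-1, 0, 1), repeat=3))
neighbor_images = [s for s in shifts if s != (0, 0, 0)]

print(len(shifts))           # 27 cells in total
print(len(neighbor_images))  # 26 once the original cell is excluded
```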
We are currently preparing to release a new TorchANI version that addresses this issue: it supports a built-in cell list, which makes the scaling O(N). It should be ready in the next month or two, and you will then be able to run much larger systems without issue.
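To illustrate the idea behind a cell list (a minimal pure-Python sketch of the general technique, not TorchANI's implementation; it assumes an open boundary with no periodic wrapping, for clarity): atoms are binned into cubic cells with edge length at least the cutoff, so each atom only needs to be compared against atoms in its own and the 26 adjacent cells, instead of against all N atoms.

```python
from collections import defaultdict
from itertools import product

def cell_list_pairs(coords, cutoff):
    """Return sorted (i, j) pairs with distance <= cutoff, via a cell list."""
    # Bin atom indices into cubic cells of edge length `cutoff`.
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        cells[(int(x // cutoff), int(y // cutoff), int(z // cutoff))].append(i)
    # Only atoms in the same or an adjacent cell can lie within the cutoff.
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j:
                        d2 = sum((coords[i][k] - coords[j][k]) ** 2
                                 for k in range(3))
                        if d2 <= cutoff * cutoff:
                            pairs.add((i, j))
    return sorted(pairs)

# Three atoms on a line: only the first two are within the 1.0 cutoff.
print(cell_list_pairs([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)], 1.0))
# [(0, 1)]
```

At roughly constant density the number of atoms per cell is bounded, so the total work grows linearly with N rather than quadratically.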
Dear all,
What would be the best way to perform high-performance molecular dynamics with ANI on a cluster? We run TorchANI in combination with ASE. Currently, running a box of 1000 ethanol molecules gives the following error during the BFGS optimisation:
warnings.warn(
Traceback (most recent call last):
  File "/home/bmooij/ANI_quality_check/MD_ethanol_quality_check_ANI.py", line 49, in <module>
    opt.run(fmax=1.0)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/optimize/optimize.py", line 269, in run
    return Dynamics.run(self)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/optimize/optimize.py", line 156, in run
    for converged in Dynamics.irun(self):
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/optimize/optimize.py", line 122, in irun
    self.atoms.get_forces()
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/atoms.py", line 788, in get_forces
    forces = self._calc.get_forces(self)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/calculators/abc.py", line 23, in get_forces
    return self.get_property('forces', atoms)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/ase.py", line 82, in calculate
    energy = self.model((species, coordinates), cell=cell, pbc=pbc).energies
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/models.py", line 106, in forward
    species_aevs = self.aev_computer(species_coordinates, cell=cell, pbc=pbc)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/aev.py", line 533, in forward
    aev = compute_aev(species, coordinates, self.triu_index, self.constants(), self.sizes, (cell, shifts))
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/aev.py", line 288, in compute_aev
    atom_index12, shifts = neighbor_pairs(species == -1, coordinates_, cell, shifts, Rcr)
  File "/home/bmooij/.conda/envs/py9/lib/python3.9/site-packages/torchani/aev.py", line 171, in neighbor_pairs
    shifts_all = torch.cat([shifts_center, shifts_outside])
RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 26243892000 bytes. Error code 12 (Cannot allocate memory)
It works fine when the box contains few molecules (e.g. 125 ethanol), but starts to give this error for larger systems (e.g. 750 or 1000 ethanol). A system of 500 ethanol also seems to work, but is terribly slow.
Some reproducible code (where the file 'ethanol_1000.pdb' is a box of 1000 ethanol molecules made with Packmol):
Best regards,
Bart