Out-of-memory issue #25

Open
aka-Marlen opened this issue Nov 25, 2024 · 2 comments

@aka-Marlen

Dear OpenQP team,

During package testing, several of our optimization jobs were killed prematurely by the scheduler (Slurm) for exceeding the RAM limit. One example is attached. For that job, we requested 16 cores and 48 GB of memory. We tried increasing the memory to 4 GB/core for another molecule of similar size, with the same result; the same problem appeared with 8 cores/24 GB submissions. In all cases we have seen, the jobs were killed within 24 hours of starting.

Our configuration: Linux 5.14.0-427.37.1.el9_4.x86_64, gcc 11.3.0, Intel MKL, SLURM 22.05.9
We can provide more info if needed.

The batch log file, input file, starting xyz, and a cropped version of the log file are attached in OOM.zip (the full log is too big to attach but is available from Drive, link below).

https://drive.google.com/file/d/1KUrlGrdIFAUvsHWa4Mshl5WRU97EKrrA/view?usp=sharing

@h-martina

I experience the same on my system, and for different types of calculations. In fact, when monitoring my jobs, I see that memory usage increases linearly with time, which for large calculations ultimately leads to OOM kills.
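
For anyone who wants to quantify this growth, here is a minimal sketch of one way to do it. It assumes a Linux node where `/proc/<pid>/status` is readable, and all function names (`read_rss_kib`, `growth_rate`, `monitor`) are hypothetical helpers written for this illustration, not part of OpenQP or Slurm. It samples the resident set size of a running process and fits a least-squares line to estimate the growth rate.

```python
import time
from pathlib import Path


def read_rss_kib(pid: int) -> int:
    """Return the resident set size (VmRSS) of a process in KiB (Linux /proc only)."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def growth_rate(samples):
    """Least-squares slope of (seconds, KiB) samples, in KiB per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def monitor(pid: int, interval_s: float = 60.0, count: int = 10):
    """Collect `count` (elapsed seconds, RSS in KiB) samples for a process."""
    samples = []
    start = time.monotonic()
    for _ in range(count):
        samples.append((time.monotonic() - start, read_rss_kib(pid)))
        time.sleep(interval_s)
    return samples
```

Calling `growth_rate(monitor(pid))` on the PID of the running job gives a rough KiB/s figure. A steadily positive slope over many samples would be consistent with the linear growth described above, though it does not by itself identify where the memory is being allocated.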

@foxtran added the bug label on Dec 1, 2024

@JornSteen

Has this issue been looked into? I encounter the same out-of-memory error running NAMD simulations with PyRAI2MD and OpenQP.
