Bus error - possible memory leak? #39

Open
farinfa opened this issue May 12, 2021 · 19 comments

@farinfa

farinfa commented May 12, 2021

Hi,

Issue already posted on the JEODPP help page

I'm running LISVAP on a JEODPP terminal.
The installation of the model in JEODPP was completed successfully a few months ago, and the model has been used on this infrastructure several times with no issues.
I have used the same settings in the past to process very large datasets (more than 150 years). Last week I was running a 40-year process (14974 time steps) and it stopped several times between days 6000 and 13000 with a "bus error" (see the attached screenshot).
image
Apparently the issue was related to the shared memory available in the terminal (5GB for all the users): I got access to a terminal with 25GB of memory available and the process ran smoothly.
Now I'm running a longer process (more than 50000 time steps) on this last terminal, and I'm getting the "bus error" again (at around 20k time steps). Apparently this is the only process running on the terminal (no other users at the moment). Could the issue be related to a memory leak? (But it was working in the past...)
Thanks

@gnrgomes
Collaborator

Hi,
Are you using the latest version?

While we investigate the possible memory leak, please try, if you can, to generate the output split by year.
You just need to add the splitOutput flag to your settings file, next to the other setoption flags.

Basically you will get multiple files like: es_1975.nc es_1976.nc ...
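
A quick way to confirm that a settings file actually enables the flag is sketched below. The XML layout (a `setoption` element with `name` and `choice` attributes) and the second option name are assumptions made for illustration, inferred from the "setoption" wording above; check them against your own settings file.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings fragment. The element and attribute names are an
# assumption, and "someOtherOption" is a placeholder, not a real LISVAP option.
SAMPLE_SETTINGS = """
<lfsettings>
  <lfoptions>
    <setoption name="someOtherOption" choice="0"/>
    <setoption name="splitOutput" choice="1"/>
  </lfoptions>
</lfsettings>
"""


def option_enabled(settings_xml: str, option: str) -> bool:
    """Return True if a <setoption> with this name has choice="1"."""
    root = ET.fromstring(settings_xml.strip())
    for elem in root.iter("setoption"):
        if elem.get("name") == option:
            return elem.get("choice") == "1"
    return False  # option not present in the file at all


print(option_enabled(SAMPLE_SETTINGS, "splitOutput"))
```

To check a real file, read it with `open(path).read()` and pass the text to `option_enabled`.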

@farinfa
Author

farinfa commented May 19, 2021

Hi,

this is the version I'm using
image

I'm trying again with the splitOutput option enabled, as you suggested.
In case it stops again, is there a way to restart from the year in which it stopped? (Would it be enough to change StepStart and reduce the number of time steps accordingly?)
Thanks
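
The arithmetic behind such a restart can be sketched as follows. This only computes a new start date and the remaining step count; it assumes daily time steps, a hypothetical first simulation day of 1 January 1981, and that step 1 falls on that first day. Whether LISVAP resumes cleanly from a changed StepStart is not confirmed in this thread.

```python
from datetime import date, timedelta


def restart_plan(first_day: date, total_steps: int, failed_step: int):
    """Restart on 1 January of the year in which the run failed.

    Assumes one time step per day, with step 1 falling on first_day.
    Returns (restart_day, remaining_steps).
    """
    failed_day = first_day + timedelta(days=failed_step - 1)
    # Never restart before the original first day of the simulation.
    restart_day = max(date(failed_day.year, 1, 1), first_day)
    steps_already_done = (restart_day - first_day).days
    return restart_day, total_steps - steps_already_done


# Hypothetical start date; 14974 steps and a failure near step 6000
# are the figures from the first post.
restart_day, remaining = restart_plan(date(1981, 1, 1), 14974, 6000)
print(restart_day.isoformat(), remaining)
```

With yearly output files, the years before `restart_day` are already complete, so only the failing year is recomputed.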

@gnrgomes
Collaborator

Could you please update your version? The latest one is 1.0.0, which has the splitOutput flag.

@farinfa
Author

farinfa commented May 19, 2021

Yes, sorry. In fact now it is not generating the yearly outputs...

@farinfa
Author

farinfa commented May 19, 2021

I tried to install the newer version using the second option in the installation guide (the pip install lisflood-lisvap command inside a conda virtual environment, as described in this post), but the version is still the same.
image

image

@gnrgomes
Collaborator

Have you tried to upgrade?
pip install --upgrade lisflood-lisvap
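
To see which version pip actually left in the active environment, a minimal check with the standard library is sketched below. It assumes the distribution name matches the pip package name used above; it reads the installed package metadata, which may differ from what the program itself prints.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this inside the same conda environment that is used to run LISVAP.
print(installed_version("lisflood-lisvap"))
```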

@farinfa
Author

farinfa commented May 19, 2021

pip install --upgrade lisflood-lisvap
image

valeriolorini added a commit that referenced this issue May 21, 2021
Correct version for pip install. issue #39
@gnrgomes
Collaborator

@farinfa could you please upgrade now?

pip install --upgrade lisflood-lisvap

@farinfa
Author

farinfa commented May 21, 2021

Hi @gnrgomes
The pip install --upgrade lisflood-lisvap works now. The procedure ends with the successfully installed message:
image

To make it work, I had to install pyproj.
The version installed now is 1.0.2:
image

When I try to run the model with my data, however, I get a long list of errors: did the settings file change?
image

@gnrgomes
Collaborator

It changed to have some options that have a default value, meaning it should run regardless of your settings file.
Could you please share your settings file?

@farinfa
Author

farinfa commented May 25, 2021

Sorry for my late reply: I wanted to run some tests before getting back on this.

> It changed to have some options that have a default value, meaning it should run regardless of your settings file.

Sorry, my bad: I misspelled something and that was causing the IndexError you saw in my previous post.

> Could you please share your settings file?

Enclosed
settings_FF_sample.txt

I managed to run the updated version of LISVAP with my data, but this did not solve the original 'Bus error' problem (the process is still killed before 20k time steps).
The new LISVAP does manage, however, to complete the full simulation with the "splitOutput" option enabled.
Could this 'Bus error' be related to the temporary storage and writing of the output data? (Again, this was working with no issues in the past.)

@gnrgomes
Collaborator

A bus error usually means that the program is trying to access memory that does not exist, i.e. it has exceeded the limit of usable memory.
Based on my own output files, I estimate that just one of your output files could be around 28 GB for the full 151 years. All of it is held in memory, which means that swap memory (on disk) will also be used.
The splitOutput option keeps only 1 year in memory at a time, so it saves quite a lot of memory and may also make the program run faster, because it avoids swap and so reduces disk accesses.
If you can, use this option.
NOTE: There is also the splitInput option, which reads the input as split files. The two options work independently of each other.
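
The saving can be put in rough numbers. The sketch below uses the 28 GB and 151 years quoted above; the grid size and the 4-byte value size in the second part are hypothetical, chosen only to show how such an estimate is built.

```python
def stack_size_gb(n_steps: int, n_rows: int, n_cols: int, bytes_per_value: int = 4) -> float:
    """Size of an uncompressed (time, row, col) array in GiB."""
    return n_steps * n_rows * n_cols * bytes_per_value / 1024**3


# Figures from the comment above: one output variable, whole period in memory.
total_gb = 28.0
years = 151
per_year_gb = total_gb / years
print(f"whole run in memory: {total_gb:.1f} GB, one year at a time: {per_year_gb:.2f} GB")

# Hypothetical grid of 1000 x 1000 cells, daily steps, 4-byte floats.
print(f"one year on the hypothetical grid: {stack_size_gb(365, 1000, 1000):.2f} GiB")
```

The per-year figure applies to each output variable, so the real footprint scales with the number of variables being written.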

@farinfa
Author

farinfa commented May 26, 2021

OK, I'll use it this way.

Thank you

farinfa closed this as completed May 26, 2021
farinfa reopened this May 27, 2021
@farinfa
Author

farinfa commented May 27, 2021

Sorry to bother you again @gnrgomes: the 'Bus error' issue is now happening even with the splitOutput option enabled...

@gnrgomes
Collaborator

No problem @farinfa, your input about the project is most welcome.

You are working with a very large dataset compared to our usual dataset of 30 years.
Could you please tell me the sizes of your input files?

@farinfa
Author

farinfa commented May 27, 2021

In the order of about 150GB each.
Again, I want to point out that I have already processed these same data in the past and it was working with no issues (I am now rerunning them due to a change in the base map).

@gnrgomes
Collaborator

You were able to run using these inputs on Lisvap v0.4.4 ? Or what version did you use? It is important to know this so I can check the differences in the code.
Are you using the same server to run with the same setup? Did something change on your side?
Do you have enough disk space?

@farinfa
Author

farinfa commented May 27, 2021

> You were able to run using these inputs on Lisvap v0.4.4 ? Or what version did you use? It is important to know this so I can check the differences in the code.

Yes, I ran these on JEODPP with LISVAP version 0.4.4 (now updated).

> Are you using the same server to run with the same setup? Did something change on your side?

I first asked the JEODPP team about this, and they said nothing has changed on their side. Later they gave me access to a terminal with more memory, but the issue persisted.

> Do you have enough disk space?

More than 1.5TB free space

@gnrgomes
Collaborator

I will investigate a possible memory leak in the LISVAP code or in any of the libraries it uses.
I'll get back to you as soon as I can.
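
One generic way to look for this kind of leak is to measure how much traced memory grows across repeated time steps. The sketch below uses the standard `tracemalloc` module with two hypothetical step functions; it does not use LISVAP code, and `tracemalloc` only sees allocations made through Python's allocator, so a leak inside a C library may not show up.

```python
import tracemalloc


def traced_growth(step_fn, n_steps: int) -> int:
    """Run step_fn n_steps times and return the net growth in traced bytes."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for i in range(n_steps):
        step_fn(i)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


# Hypothetical steps: one keeps every buffer alive, the other releases it.
_kept_buffers = []


def leaky_step(i: int) -> None:
    _kept_buffers.append(bytearray(10_000))  # reference is never dropped


def clean_step(i: int) -> None:
    bytearray(10_000)  # freed as soon as the call returns


print("leaky:", traced_growth(leaky_step, 100), "bytes")
print("clean:", traced_growth(clean_step, 100), "bytes")
```

Growth that scales with the number of steps points to something retained per step; roughly constant usage points elsewhere, for example to the size of the data held at once.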
