
Pandas support #50

Open
UniqASL opened this issue Mar 31, 2020 · 7 comments

Comments

@UniqASL

UniqASL commented Mar 31, 2020

Dear Even Solbraa,

Thank you for putting this very nice library online.
Do you have plans to add support for pandas DataFrames in the future? Running the code to calculate fluid properties for a single point (T, p) takes a few seconds, presumably due to the connection with Java, so I am a bit worried about how long it would take to run the code on a (large) DataFrame.

Best regards,

@EvenSol
Collaborator

EvenSol commented Apr 16, 2020

Yes, there will be more integration with pandas dataframes in future releases.
Support for creating fluids via dataframes was recently added.

See some examples in the Colab sheet:
https://colab.research.google.com/github/EvenSol/NeqSim-Colab/blob/master/notebooks/PVT/PVTreports.ipynb

or:

https://github.com/equinor/neqsimpython/blob/master/examples/createFluid.py
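
As a rough sketch of the idea in the linked createFluid.py example, a fluid composition can be held in a pandas DataFrame and handed to the library. The column names and the `fluid_df` call below are assumptions based on the example, not a confirmed API, so the NeqSim call is left commented out:

```python
import pandas as pd

# Composition table in roughly the shape used by the createFluid.py example;
# the exact column names expected by NeqSim are assumptions here.
composition = pd.DataFrame(
    {
        "ComponentName": ["methane", "ethane", "n-heptane"],
        "MolarComposition[-]": [0.85, 0.10, 0.05],
    }
)

# Hypothetical NeqSim call, per the linked example (not executed here):
# from neqsim.thermo import fluid_df
# fluid = fluid_df(composition)

print(composition)
```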

@EvenSol
Collaborator

EvenSol commented Apr 16, 2020

A benchmark running 5000 multiphase calculations for a simple gas/oil/water fluid is included in the linked Colab notebook. 5000 TPflash calculations take about 5-6 seconds in Colab/Python.

https://colab.research.google.com/drive/1JXaqqj1qkriY_DqT8nCpf0tCjNEcCvEW
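
A benchmark loop of this kind can be timed with `time.perf_counter`. The flash function below is a placeholder stand-in, not NeqSim's API; in the real benchmark each call crosses the Python/Java bridge, which is what dominates the runtime:

```python
import time

def tp_flash(temperature_k, pressure_bara):
    # Stand-in for one TPflash call; the returned value is placeholder
    # arithmetic, not a real flash result.
    return temperature_k / pressure_bara

n = 5000
start = time.perf_counter()
results = [tp_flash(303.15, 10.0) for _ in range(n)]
elapsed = time.perf_counter() - start
print(f"{n} calls in {elapsed:.4f} s ({elapsed / n * 1e6:.2f} us/call)")
```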

@EvenSol
Collaborator

EvenSol commented Apr 16, 2020

See this example filling a dataframe with properties:

https://github.com/equinor/neqsimpython/blob/master/examples/propertiesDataframes.py

@UniqASL
Author

UniqASL commented Apr 17, 2020

Thank you very much for taking the time to do that. I tried your last example. With a list of length 1,000, it runs in approximately 14 s on my laptop. When I increase this value to 10,000 (~one year of data at an hourly frequency), it takes > 2 min.

I guess the problem is that when applying the function calcProperties to the df, Python makes a call to Java for each individual point, which slows the calculation down. One option I can imagine would be to send the entire df to Java and then get the results back as a list or df once Java is done calculating everything.
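
The batched interface suggested here could look something like the sketch below: one round trip for the whole array of (T, p) points instead of one call per point. The function name and the placeholder arithmetic are illustrative assumptions, not NeqSim's actual API:

```python
from typing import Sequence

def tp_flash_batch(temperatures: Sequence[float], pressures: Sequence[float]) -> list:
    """One hypothetical round trip to the solver for the whole batch,
    instead of one bridge crossing per (T, p) point. The division below
    is a placeholder standing in for the actual flash calculation."""
    return [t / p for t, p in zip(temperatures, pressures)]

temps = [280.0, 281.0, 282.0]   # K
pres = [10.0, 20.0, 40.0]       # bara
props = tp_flash_batch(temps, pres)
print(props)
```

The point of the design is that the fixed per-call overhead is paid once per batch rather than once per point.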

@EvenSol
Collaborator

EvenSol commented Apr 17, 2020

Yes, there is some overhead when there are many calls to Java from Python. The benchmark (https://equinor.github.io/neqsimhome/benchmark.html) indicates that the calculation speed is 2-3 times faster directly in Java than via Python. I will look into the reason for this and whether it can be improved. I guess every call to a Java method has some overhead (even just reading a property), and that this can be reduced by returning more information in each call. If the calculation involves a process simulation for each time step (instead of just a flash and returning the properties of a fluid), I guess this overhead will be less significant.

I will look into your suggestion of sending the whole dataframe. Thanks for the suggestion.

@EvenSol
Collaborator

EvenSol commented Apr 21, 2020

A new method has been implemented to fill a dataframe based on lists of temperatures and pressures (method 2 in the example):

https://github.com/equinor/neqsimpython/blob/master/examples/propertiesDataframes.py

The dataframes in the PySpark project will probably be a better solution for future work. This can be looked into when PySpark 3.0 is released (the current version is based on an older version of py4j).
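
One common layout for such a method is a DataFrame with one row per (T, p) state, which a single batched call can then extend with property columns. The column names and the commented-out call below are illustrative assumptions, not NeqSim's exact schema:

```python
import pandas as pd

temperatures = [280.0, 300.0, 320.0]  # K
pressures = [1.0, 10.0, 50.0]         # bara

# One row per (T, p) pair, matching the two lists element-wise.
df = pd.DataFrame({"temperature_K": temperatures, "pressure_bara": pressures})

# In the linked example, a single batched call would then fill the property
# columns for all rows at once, e.g.:
# df = calc_properties_from_lists(temperatures, pressures)  # hypothetical name

print(df)
```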

@UniqASL
Author

UniqASL commented Apr 27, 2020

Thanks for this! The second method is slightly faster even though it returns many more results than method 1 (18 with method 1, 63 with method 2). The overall calculation still remains quite slow, however (~50 s with method 2 for 1,000 values). Maybe PySpark will improve that!

Otherwise, I noticed that you do your imports in the code itself. Normally in Python you should import everything at the beginning of the file. Moreover, it is recommended to import entire modules. You can have a look here, for instance, at the section "import".
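
The recommended layout looks like the small sketch below: all imports at the top of the module, importing whole modules rather than pulling individual names into the local namespace. The helper function and its coefficients are purely illustrative:

```python
# Imports grouped at the top of the module, importing the whole module
# (math) rather than individual names (from math import exp).
import math

def antoine_vapour_pressure(a, b, c, t):
    """Toy Antoine-style correlation (form and coefficients are illustrative)
    using the module-qualified call math.exp instead of a bare exp."""
    return math.exp(a - b / (t + c))

print(antoine_vapour_pressure(10.0, 2000.0, -30.0, 300.0))
```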
