
profiler

kainagel edited this page Mar 1, 2020 · 1 revision

Profiling

Below is, in reverse chronological order, an email exchange about simulations suddenly being much slower than before. The problem was eventually solved by going through the design changes and identifying the problematic one; still, I wanted to keep the descriptions of the different tools. kai, mar'20

by Rodrigo

Dear Prof. Kai,

then I would recommend running with gc logs enabled. There might be clues coming from there.

To understand why the memory consumption is different, I would suggest dumping the heap periodically on both versions (the broken and the non-broken) and then using tools to analyse the object distribution in both dumps.

From there one should be able to tell which types of objects are spiking in each version.

Generating heap dumps is easy. One could trigger a heap dump at the end of each mobsim/qsim iteration, for example (it might take some time though): https://shekhargulati.com/2018/11/01/taking-java-heap-dump-programmatically/
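A minimal sketch of such a programmatic heap dump, assuming a HotSpot-based JVM where `com.sun.management.HotSpotDiagnosticMXBean` is available. The class name `HeapDumper`, the method `dump`, and the file naming scheme are made up for illustration and are not part of the simulation code.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of the simulation code. */
public final class HeapDumper {
    private HeapDumper() {}

    /**
     * Writes a heap dump to dir/heap-it<iteration>.hprof and returns its path.
     * With liveOnly = true, only reachable objects are dumped.
     */
    public static Path dump(Path dir, int iteration, boolean liveOnly) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve("heap-it" + iteration + ".hprof");
        // dumpHeap does not overwrite an existing file, so remove stale dumps first.
        Files.deleteIfExists(file);
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toAbsolutePath().toString(), liveOnly);
        return file;
    }
}
```

Calling `HeapDumper.dump(outputDir, iteration, true)` from whatever hook runs at the end of an iteration, in both the broken and the non-broken version, should give pairs of dumps that can be compared. Dumps are roughly as large as the used heap, so dumping only every few iterations may be more practical.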

Analysing heap dumps can be done using different tools. I normally use JVisualVM: https://visualvm.github.io/

cheers, rodrigo

by Kai

Thank you so much.

We don't think that it is memory in a simple way; the broken code, in the same iteration and at the same qsim-hour, actually uses less mem and has more free mem than the non-broken version (per the qsim memory footprint line).

All the best

Kai

by Rodrigo

Hello all,

yes, I use VTune Amplifier (Intel product, probably does not work for AMDs...). It works with JDK8 I believe. You can also run it remotely as long as you have ssh access to the other machine (you don't need root on the remote machine).

However, from the problem you described, I agree with Marcel: this looks like memory is slowly running out and you start spending too much time in GC. Always running with GC logs enabled (-verbose:gc) helps in understanding what is happening in the JVM (and it does not cost performance, it is very lightweight). There are online tools that help you digest these logs (https://gceasy.io/, for example).
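As a complement to -verbose:gc, the accumulated GC activity can also be read from inside the running program through the standard `java.lang.management` API. The following is only a sketch; the class `GcStats` and its method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that sums GC counters over all collectors of the JVM. */
public final class GcStats {
    private GcStats() {}

    /** Total number of collections so far. */
    public static long totalCollections() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                sum += count;
            }
        }
        return sum;
    }

    /** Approximate accumulated collection time in milliseconds. */
    public static long totalCollectionMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                sum += millis;
            }
        }
        return sum;
    }

    /** One-line summary suitable for a log file. */
    public static String summary() {
        return "GC: collections=" + totalCollections() + ", timeMs=" + totalCollectionMillis();
    }
}
```

Logging `GcStats.summary()` once per simulated hour or per iteration would show whether the share of wall-clock time spent in GC grows as the run slows down.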

If this is not a trivial problem, then VTune might give a more fine grained idea of what is happening.

cheers, rodrigo

by Marcel

Hi Kai

I think Rodrigo is using Intel’s VTune Amplifier (https://software.intel.com/vtune), but I think it only supports Java 9 and newer. I don’t know about CLI usage.

An alternative might be Async Profiler (https://github.com/jvm-profiling-tools/async-profiler). This can attach to a running process and collect information for some amount of time (either pre-specified, like 120 seconds, or interactively with start/stop). Maybe creating a profiling snapshot at the beginning and another after a day could help show a difference and pinpoint where the cycles are lost. The output (I suggest using the flamegraph) might take some time to get used to reading, but it can be helpful.

> the computing time is going up considerably after about a day of computing.

Do you have enough memory? It could also be that you have a memory leak and Garbage Collection runs more and more often, slowing the process down. Printing garbage collector statistics into the output could help pinpoint this issue (use -verbose:gc when starting the Java process, e.g. see here: https://www.baeldung.com/java-verbose-gc).

To figure out if you have a memory leak, the so-called class-histogram is also helpful:

jcmd $PID GC.class_histogram > class_histogram.txt (replace $PID with the actual process id)

This gives a list of every loaded class in the JVM with the number of instances and the total amount of memory occupied by these instances. Take a snapshot at the start and one after a day or so to see if there is a major difference in one of the classes.
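Comparing two such snapshots by eye is tedious, so here is a small sketch that ranks classes by how much their occupied bytes grew between two histogram files. It assumes the usual row layout of the histogram output (`num: #instances #bytes class-name`, optionally followed by a module column); the class `HistogramDiff` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compares two class histograms written by jcmd. */
public final class HistogramDiff {
    // Matches rows such as "   1:   12345   678900  [B (java.base@11)".
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

    private HistogramDiff() {}

    /** Maps class name to occupied bytes; header, separator and total lines are skipped. */
    public static Map<String, Long> parseBytes(List<String> lines) {
        Map<String, Long> bytes = new HashMap<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                bytes.merge(m.group(3), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return bytes;
    }

    /** Returns up to n class names from the late snapshot, largest byte growth first. */
    public static List<String> topGrowth(Map<String, Long> early, Map<String, Long> late, int n) {
        Map<String, Long> growth = new HashMap<>();
        for (Map.Entry<String, Long> e : late.entrySet()) {
            growth.put(e.getKey(), e.getValue() - early.getOrDefault(e.getKey(), 0L));
        }
        List<String> names = new ArrayList<>(growth.keySet());
        names.sort(Comparator.comparingLong((String k) -> growth.get(k)).reversed());
        return new ArrayList<>(names.subList(0, Math.min(n, names.size())));
    }

    /** Usage: java HistogramDiff early.txt late.txt */
    public static void main(String[] args) throws IOException {
        Map<String, Long> early = parseBytes(Files.readAllLines(Paths.get(args[0])));
        Map<String, Long> late = parseBytes(Files.readAllLines(Paths.get(args[1])));
        for (String name : topGrowth(early, late, 20)) {
            long delta = late.get(name) - early.getOrDefault(name, 0L);
            System.out.println(String.format("%s %+d bytes", name, delta));
        }
    }
}
```

A class that keeps climbing to the top of this list across several snapshots is a reasonable first suspect for a leak.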

Best, Marcel

Original Problem by Kai

Dear Rodrigo,

We are having problems here where the computing time is going up considerably after about a day of computing.

Evidently difficult to reconstruct and to debug. So a couple of questions:

  • which profiler are you using?

  • would it be possible to run that from the command line? We are running our jobs on a remote server.

Thanks a lot

Best wishes

Kai

PS: Marcel, I know that I have asked you that question before; I will also dig out the answers from then.
