Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect binaries when the JVM crashes during a Jenkins build #16426

Merged

Conversation

jdmpapin
Copy link
Contributor

@jdmpapin jdmpapin commented Dec 7, 2022

Binaries help in analyzing the core file to debug the crash.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Dec 7, 2022

A few questions:

  • Are any of the other (already) collected files useful only for analyzing cores? e.g. maybe DDR things? This change already makes inclusion of *.dSYM conditional on the existence of a core.
  • Do we also want to be able to run the build as @pshipton suggested? Re-running likely requires more than just the executables. For example, it probably also requires jars (at least). It's unclear to me how we would predict the minimal subset of files that allow running the program, so if we want to be able to run it, I think we'd need to pick up e.g. all of build/*/jdk/* and build/*/images/{jdk,j2sdk-image}/* (in which case we could probably stop worrying about particular file extensions for the binaries and debug info).
  • How do we test this? Perhaps by deliberately failing some PR builds?

buildenv/jenkins/common/build.groovy Outdated Show resolved Hide resolved
buildenv/jenkins/common/build.groovy Outdated Show resolved Hide resolved
@pshipton
Copy link
Member

pshipton commented Dec 7, 2022

Do we also want to be able to run the build

We want to be able to diagnose the problem. Anything we can collect to help with what is relevant. I'm not against collecting the class libraries in the diagnostics. Every successful build archives the libraries and binaries. Doing the wildcards seems fine. It's not very useful when a failure occurs and we can't do anything about it.

How do we test this? Perhaps by deliberately failing some PR builds?

I expect so.

@pshipton
Copy link
Member

pshipton commented Dec 7, 2022

@keithc-ca @AdamBrousseau fyi

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Dec 7, 2022

We want to be able to diagnose the problem. Anything we can collect to help with what is relevant. I'm not against collecting the class libraries in the diagnostics. [...] Doing the wildcards seems fine.

OK, in that case, I'll ignore the file types and go by directory

@pshipton
Copy link
Member

pshipton commented Dec 7, 2022

I'll ignore the file types and go by directory

Chatting with Keith, with the wildcards we should exclude some things we don't need to reduce the size. The most obvious being .o and .obj files.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Dec 7, 2022

Ah, I thought the object files weren't in the directories I was suggesting, at least based on my own build directories on my laptop. It still looks that way for 11 and 17, but I see that 8 does put (some) object files within build/$platform/jdk

buildenv/jenkins/common/build.groovy Outdated Show resolved Hide resolved
buildenv/jenkins/common/build.groovy Outdated Show resolved Hide resolved
@jdmpapin
Copy link
Contributor Author

jdmpapin commented Dec 7, 2022

OK, so to check which directories contain binaries that we actually run while building, I built Java 8, 11, 17, and 19 on my laptop while logging all processes with strace -f -e trace='%process'. In describing the results, I'll call the particular platform-specific output directory $b, so e.g. $b might be build/linux-x86_64-server-release.

For the Java 8 build, the only Java executable within build/ that it ran was $b/images/j2re-image/bin/java -version at the end, presumably to make sure that it succeeds. For everything else it used the boot JDK.

The Java 11, 17, and 19 builds ran Java executables from $b/jdk/, and from nowhere else within build/. In particular, they ran the following executables:

  • $b/jdk/bin/java
  • $b/jdk/bin/jmod
  • $b/jdk/bin/jlink

Apparently none of the builds needed to run their own javac. The versions that used to need to do so can now rely on a new enough boot JDK. Later versions introducing new Java language features and using them in the JCL will need to do so once again, at least initially, and I can't necessarily predict which javac they will run, but my guess is that it will be $b/jdk/bin/javac.

In all of the builds, there were some additional executables within build/ that were run - more specifically they were within $b/vm/runtime:

  • constgen
  • omr_ddrgen
  • tracegen
  • tracemerge

I don't think these run JVMs, but I suppose it's possible regardless that they could crash. Do we want to try to notice crashes from these and/or include these binaries? My vm/runtime directory is huge - I think it contains most of the object files - so we'd have to pick them up in a targeted way. My copies of these executables are linked only against system libraries and (in the case of constgen and omr_ddrgen) against $b/vm/runtime/lib{omrsig,j9thr29}.so. Noticing crashes from these may be harder since they might not produce core files with the predictable naming pattern used by the JVM, or they might by default not produce a core (e.g. on Windows?).

Finally, Java 8 also ran something called $b/jdk/gensrc_x11wrappers/sizer.64.exe.


My suggestion is that for the moment I think it should suffice to pick up $b/jdk/ (excluding object files).

@pshipton
Copy link
Member

pshipton commented Dec 7, 2022

My suggestion is that for the moment I think it should suffice to pick up $b/jdk/ (excluding object files)

That works for me. Having a problem with the other executables isn't common, and likely if they have a problem it's easily repeatable and can be recreated manually.

I suspect javac isn't used because the compiler can be run by running java.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Dec 8, 2022

Updated based on my previous comment and to address review feedback

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Dec 8, 2022

I don't believe this revision should include any object files, even though it doesn't explicitly exclude them. We can confirm in testing

@jdmpapin jdmpapin changed the title Collect binaries when a Jenkins build generates a core file Collect binaries when the JVM crashes during a Jenkins build Dec 8, 2022
@jdmpapin
Copy link
Contributor Author

Removed some clauses that Keith suggested are not useful

@jdmpapin jdmpapin marked this pull request as draft December 12, 2022 15:07
@jdmpapin
Copy link
Contributor Author

As I suggested at the beginning of the comments, I'll try to test these changes by injecting bugs causing build failures into this PR branch, and then observing the behaviour of PR builds. In preparation, I've converted this PR to a draft to ensure that we don't accidentally merge any deliberately injected bug.

@jdmpapin
Copy link
Contributor Author

Any further comments before I start testing?

@pshipton
Copy link
Member

Not from me.

@jdmpapin
Copy link
Contributor Author

Jenkins compile xlinux jdk17

@jdmpapin
Copy link
Contributor Author

Jenkins compile win jdk8

@jdmpapin
Copy link
Contributor Author

From the x86-64 Linux Java 17 crash, I can run jdk/bin/java -Xint -version, and I can see the crash by removing -Xint. The crashed process command line and output are in make-support/failure-logs/, which would be good to help reproduce an actual failure. With set sysroot ... I see symbols and line numbers in GDB, and by checking out the commits from -version and specifying dir ..., GDB also prints the source line. The compressed archive is 483MB. It doesn't include any object files

Testing Java 8 on Windows now

@jdmpapin
Copy link
Contributor Author

No diagnostics were collected for the Windows Java 8 failure. It seems that the crash doesn't prevent make from succeeding:

[2022-12-13T18:24:27.409Z] Assertion failed at F:\Users\jenkins\workspace\Build_JDK8_x86-64_windows_Personal\openj9\runtime\compiler\ilgen\Walker.cpp:189: false
[2022-12-13T18:24:27.409Z] VMState: 0x00050080
[2022-12-13T18:24:27.409Z] 	oh no, the JVM crashed!
[2022-12-13T18:24:27.409Z] compiling java/lang/String.lengthInternal()I at level: warm
[2022-12-13T18:24:27.409Z] 
[2022-12-13T18:24:27.409Z] JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
[2022-12-13T18:24:27.409Z] JVMDUMP010I JIT dump written to F:\Users\jenkins\workspace\Build_JDK8_x86-64_windows_Personal\jitdump.20221213.122422.4716.0004.dmp
[2022-12-13T18:24:27.410Z] JVMDUMP013I Processed dump event "gpf", detail "".
[2022-12-13T18:24:27.972Z] ----- Build times -------
[2022-12-13T18:24:27.972Z] Start 2022-12-13 11:29:58
[2022-12-13T18:24:27.972Z] End   2022-12-13 12:24:28
[2022-12-13T18:24:27.972Z] 00:00:37 corba
[2022-12-13T18:24:27.972Z] 00:00:41 demos
[2022-12-13T18:24:27.972Z] 00:03:15 docs
[2022-12-13T18:24:27.972Z] 00:06:39 images
[2022-12-13T18:24:27.972Z] 00:00:35 jaxp
[2022-12-13T18:24:27.972Z] 00:03:24 jaxws
[2022-12-13T18:24:27.972Z] 00:08:20 jdk
[2022-12-13T18:24:27.972Z] 00:01:07 langtools
[2022-12-13T18:24:27.972Z] 00:00:21 nashorn
[2022-12-13T18:24:27.972Z] 00:54:30 TOTAL
[2022-12-13T18:24:27.972Z] -------------------------
[2022-12-13T18:24:27.972Z] Finished building OpenJDK for target 'all'

Then the build failed when java -version crashed here.

@jdmpapin
Copy link
Contributor Author

I've opened ibmruntimes/openj9-openjdk-jdk8#639 for cases where java -version fails during make for a Java 8 build. Particularly, the failure should now cause make to fail as well, which in the context of this PR will cause us to run archive_diagnostics().

buildenv/jenkins/common/build.groovy Outdated Show resolved Hide resolved
buildenv/jenkins/common/build.groovy Outdated Show resolved Hide resolved
@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

Updated test commit to crash only once. We'll see if std::atomic<bool> is available in our AIX PR build...

I also noticed that because we know the exact path to the images/ directory, we don't need find to detect the presence of images/j2re-image/. It's possible to just test -d instead. So I've updated the script to do that.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

Jenkins compile aix jdk19

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

Making sure that JDK8 is still detected properly after my most recent change.

Jenkins compile xlinux jdk8

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

No luck with std::atomic. I'll go find a VM function to call instead.

The ATM_RMW      operation is not supported by the target hardware.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

OK, updated my test crash-once logic to use VM_AtomicSupport instead of std::atomic so that it should work on AIX. Hopefully this will allow the AIX build crash to produce crash data and actually exercise my changes.

It should still be fine to look at my previous x86-64 Mac JDK11 build. The only changes since then are in the test PR and in distinguishing JDK8 from 11+. The binaries collected in this one are good. I think the symbols (*.dSYM) are good too - LLDB claims to have loaded them - but I'm continuing to have other problems with LLDB so it's hard to be sure. I've asked somebody for some help in checking that the symbols are in fact present.

My previous x86-64 Linux JDK8 build OTOH looks like it mysteriously got stuck. I've cancelled it and I'll launch a new one to make sure that we can still properly recognize JDK8 and collect images/j2re-image instead of jdk/.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

Jenkins compile aix jdk19

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

Jenkins compile xlinux jdk8

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

Jenkins compile aix jdk19

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 1, 2023

Jenkins compile xlinux jdk8

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 2, 2023

@nbhuiyan got the stack trace from the Mac crash (thanks!), and the symbols in the diagnostics tarball are good.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 2, 2023

The AIX build failed to generate any crash data again:

Unhandled exception in signal handler. Protected function: generateDiagnosticFiles (0x0)

It really should generate crash data, but that's a separate issue. I think for now this PR will have to proceed without verifying that the collected binaries and debug info are sufficient on AIX 🤷

I was also going to use the AIX failure as the JDK11+ case to verify that the right branch is taken (to contrast with the JDK8 failure) after the test -d change. Now it seems that won't work, so I'll launch another build just for that purpose. This one should still collect jdk/ and support/modules_libs/ instead of trying to collect images/j2re-image/.

Jenkins compile xlinux jdk11

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 2, 2023

OK, the test -d change is working. The JDK11 crash still picked up jdk/ and support/modules_libs/, and the JDK8 crash picked up images/j2re-image/ instead, as expected.

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 2, 2023

Removed the JIT crash and instead injected an error into Class.java to test specifically that it doesn't cause us to collect binaries.

Jenkins compile xlinux jdk17

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 2, 2023

No binaries collected, as expected

@jdmpapin jdmpapin marked this pull request as ready for review February 2, 2023 16:52
@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 2, 2023

I think this should be good to go now

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 6, 2023

@pshipton could you please review again?

Binaries help in analyzing the core file to debug the crash.
@jdmpapin
Copy link
Contributor Author

jdmpapin commented Feb 6, 2023

Updated based on Keith's latest review comment

@keithc-ca keithc-ca merged commit 9c8aa3c into eclipse-openj9:master Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants