Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jdk19 OpenJDK java/lang/Thread/virtual/HoldsLock.java Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null #16261

Closed
pshipton opened this issue Nov 4, 2022 · 30 comments · Fixed by ibmruntimes/openj9-openjdk-jdk19#71

Comments

@pshipton
Copy link
Member

pshipton commented Nov 4, 2022

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/43
jdk_lang_1
java/lang/Thread/virtual/HoldsLock.java

22:55:20  STDOUT:
22:55:20  VirtualThread[#96]/waiting@Thread-4 is waiting on java.lang.Object@6ad74f51
22:55:20  		java.lang.Object@6ad74f51
22:55:20  "main" prio=5 Id=2 WAITING
22:55:20  
22:55:20  "JIT Compilation Thread-000" prio=10 Id=4 RUNNABLE
22:55:20  
22:55:20  "JIT Compilation Thread-001 Suspended" prio=10 Id=5 RUNNABLE
22:55:20  
22:55:20  "JIT Compilation Thread-002 Suspended" prio=10 Id=6 RUNNABLE
22:55:20  
22:55:20  "JIT Compilation Thread-003 Suspended" prio=10 Id=7 RUNNABLE
22:55:20  
22:55:20  "JIT Compilation Thread-004 Suspended" prio=10 Id=8 RUNNABLE
22:55:20  
22:55:20  "JIT Compilation Thread-005 Suspended" prio=10 Id=9 RUNNABLE
22:55:20  
22:55:20  "JIT Compilation Thread-006 Suspended" prio=10 Id=10 RUNNABLE
22:55:20  
22:55:20  "JIT Diagnostic Compilation Thread-007 Suspended" prio=10 Id=11 RUNNABLE
22:55:20  
22:55:20  "JIT-SamplerThread" prio=10 Id=12 TIMED_WAITING
22:55:20  
22:55:20  "IProfiler" prio=5 Id=13 RUNNABLE
22:55:20  
22:55:20  "Common-Cleaner" prio=8 Id=3 TIMED_WAITING
22:55:20  
22:55:20  "Concurrent Mark Helper" prio=1 Id=14 RUNNABLE
22:55:20  
22:55:20  "Finalizer thread" prio=5 Id=15 RUNNABLE
22:55:20  
22:55:20  "GC Worker" prio=5 Id=16 RUNNABLE
22:55:20  
22:55:20  "GC Worker" prio=5 Id=17 RUNNABLE
22:55:20  
22:55:20  "GC Worker" prio=5 Id=18 RUNNABLE
22:55:20  
22:55:20  "GC Worker" prio=5 Id=19 RUNNABLE
22:55:20  
22:55:20  "GC Worker" prio=5 Id=20 RUNNABLE
22:55:20  
22:55:20  "GC Worker" prio=5 Id=21 RUNNABLE
22:55:20  
22:55:20  "GC Worker" prio=5 Id=22 RUNNABLE
22:55:20  
22:55:20  "Attach API wait loop" prio=10 Id=25 RUNNABLE
22:55:20  
22:55:20  "MainThread" prio=5 Id=27 RUNNABLE
22:55:20  
22:55:20  "SeedGenerator Thread" prio=1 Id=28 RUNNABLE
22:55:20  
22:55:20  null
22:55:20  test HoldsLock.testThreadInfo(): failure
22:55:20  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
22:55:20  	at HoldsLock.testThreadInfo(HoldsLock.java:109)
@pshipton
Copy link
Member Author

pshipton commented Nov 4, 2022

@tajila @fengxue-IS fyi

@tajila
Copy link
Contributor

tajila commented Nov 4, 2022

@tajila
Copy link
Contributor

tajila commented Nov 29, 2022

@mikezhang1234567890 Can you please take a look at this? We haven't seen it in a while, if it cant be reproduced please close.

@mikezhang1234567890
Copy link
Contributor

Will take a look.

@mikezhang1234567890
Copy link
Contributor

@tajila Ran a 200x grinder here and it passed, I think this issue can be closed.

@tajila
Copy link
Contributor

tajila commented Nov 29, 2022

closing based on previous comment

@tajila tajila closed this as completed Nov 29, 2022
@pshipton
Copy link
Member Author

pshipton commented Dec 1, 2022

This still occurs, reopening. It doesn't seem like a blocker for the Java 19 release however, it can move out.

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/62
jdk_lang_0
java/lang/Thread/virtual/HoldsLock.java

23:52:49  test HoldsLock.testThreadInfo(): failure
23:52:49  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
23:52:49  	at HoldsLock.testThreadInfo(HoldsLock.java:109)

@mikezhang1234567890
Copy link
Contributor

Haven't managed to reproduce the error, but the cause is because ThreadInfo info = bean.getThreadInfo(tid); is returning null sometimes for the virtual thread.

I think the createStackTrace here is the likely cause since it looks like the only place that returns null without throwing an exception.

@pshipton
Copy link
Member Author

pshipton commented Dec 8, 2022

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/66
jdk_lang_0 & jdk_lang_1
java/lang/Thread/virtual/HoldsLock.java

00:27:35  test HoldsLock.testThreadInfo(): failure
00:27:35  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
00:27:35  	at HoldsLock.testThreadInfo(HoldsLock.java:109)

@tajila
Copy link
Contributor

tajila commented Dec 8, 2022

@pshipton
Copy link
Member Author

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/71
jdk_lang_0 & jdk_lang_1
java/lang/Thread/virtual/HoldsLock.java

22:33:47  test HoldsLock.testThreadInfo(): failure
22:33:47  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
22:33:47  	at HoldsLock.testThreadInfo(HoldsLock.java:109)

@JasonFengJ9
Copy link
Member

JDK19 internal build(win22x86-rt1-1)

openjdk version "19.0.1" 2022-10-18
IBM Semeru Runtime Open Edition 19.0.1+10 (build 19.0.1+10)
Eclipse OpenJ9 VM 19.0.1+10 (build master-6025e997f, JRE 19 Windows Server 2022 amd64-64-Bit Compressed References 20221018_9 (JIT enabled, AOT enabled)
OpenJ9   - 6025e997f
OMR      - 7d8ea8b6b
JCL      - 5168276e58 based on jdk-19.0.1+10)

[2022-12-17T18:53:10.941Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode650
[2022-12-17T18:53:10.941Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:-UseCompressedOops 

[2022-12-17T19:51:16.410Z] TEST: java/lang/Thread/virtual/HoldsLock.java

[2022-12-17T19:51:16.411Z] test HoldsLock.testThreadInfo(): failure
[2022-12-17T19:51:16.411Z] java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
[2022-12-17T19:51:16.411Z] 	at HoldsLock.testThreadInfo(HoldsLock.java:109)
[2022-12-17T19:51:16.411Z] 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
[2022-12-17T19:51:16.411Z] 	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
[2022-12-17T19:51:16.411Z] 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
[2022-12-17T19:51:16.411Z] 	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[2022-12-17T19:51:16.411Z] 	at org.testng.TestRunner.privateRun(TestRunner.java:764)
[2022-12-17T19:51:16.411Z] 	at org.testng.TestRunner.run(TestRunner.java:585)
[2022-12-17T19:51:16.411Z] 	at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
[2022-12-17T19:51:16.411Z] 	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
[2022-12-17T19:51:16.411Z] 	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
[2022-12-17T19:51:16.411Z] 	at org.testng.SuiteRunner.run(SuiteRunner.java:286)
[2022-12-17T19:51:16.411Z] 	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
[2022-12-17T19:51:16.411Z] 	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
[2022-12-17T19:51:16.411Z] 	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
[2022-12-17T19:51:16.411Z] 	at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
[2022-12-17T19:51:16.411Z] 	at org.testng.TestNG.runSuites(TestNG.java:1069)
[2022-12-17T19:51:16.411Z] 	at org.testng.TestNG.run(TestNG.java:1037)
[2022-12-17T19:51:16.411Z] 	at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94)
[2022-12-17T19:51:16.411Z] 	at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54)
[2022-12-17T19:51:16.411Z] 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
[2022-12-17T19:51:16.411Z] 	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
[2022-12-17T19:51:16.411Z] 	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
[2022-12-17T19:51:16.411Z] 	at java.base/java.lang.Thread.run(Thread.java:1573)
[2022-12-17T19:51:16.411Z] 
[2022-12-17T19:51:16.411Z] ===============================================
[2022-12-17T19:51:16.411Z] java/lang/Thread/virtual/HoldsLock.java
[2022-12-17T19:51:16.411Z] Total tests run: 1, Passes: 0, Failures: 1, Skips: 0
[2022-12-17T19:51:16.411Z] ===============================================

[2022-12-17T19:51:16.412Z] TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.Exception: failures: 1
[2022-12-17T19:51:16.412Z] --------------------------------------------------
[2022-12-17T19:58:29.678Z] Test results: passed: 846; failed: 1
[2022-12-17T19:58:49.767Z] Report written to C:\Users\jenkins\workspace\Test_openjdk19_j9_sanity.openjdk_x86-64_windows_testList_1\aqa-tests\TKG\output_16713031822746\jdk_lang_1\report\html\report.html
[2022-12-17T19:58:49.767Z] Results written to C:\Users\jenkins\workspace\Test_openjdk19_j9_sanity.openjdk_x86-64_windows_testList_1\aqa-tests\TKG\output_16713031822746\jdk_lang_1\work
[2022-12-17T19:58:49.767Z] Error: Some tests failed or other problems occurred.
[2022-12-17T19:58:49.767Z] -----------------------------------
[2022-12-17T19:58:49.767Z] jdk_lang_1_FAILED

100x grinder

@pshipton
Copy link
Member Author

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/74
jdk_lang_0 & 1
java/lang/Thread/virtual/HoldsLock.java

00:46:42  null
00:46:42  test HoldsLock.testThreadInfo(): failure
00:46:42  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
00:46:42  	at HoldsLock.testThreadInfo(HoldsLock.java:109)

@pshipton
Copy link
Member Author

pshipton commented Jan 5, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/86/
jdk_lang_0 & 1
java/lang/Thread/virtual/HoldsLock.java

23:31:33  null
23:31:33  test HoldsLock.testThreadInfo(): failure
23:31:33  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
23:31:33  	at HoldsLock.testThreadInfo(HoldsLock.java:109)

@fengxue-IS
Copy link
Contributor

I was able to reproduce this with a personal build, this issue seem to be machine/OS dependent as I got 150/150 pass on win2016 machine and 24/48 failure on win11.

Failure due to allinfo[0].thread == NULL at https://github.com/eclipse-openj9/openj9/blob/master/runtime/jcl/common/mgmtthread.c#L598
given comments from https://github.com/eclipse-openj9/openj9/blob/master/runtime/jcl/common/mgmtthread.c#L831-L832, issue seem to be that the thread id passed in references a dead thread, which is why the .thread field is null

@pshipton
Copy link
Member Author

pshipton commented Jan 23, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/97/ - win2019-x64-2
jdk_lang_0 & 1
java/lang/Thread/virtual/HoldsLock.java

23:02:45  null
23:02:45  test HoldsLock.testThreadInfo(): failure
23:02:45  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
23:02:45  	at HoldsLock.testThreadInfo(HoldsLock.java:109)

@JasonFengJ9
Copy link
Member

JDK19 internal build(win16x64cudart1)

openjdk version "19.0.2" 2023-01-17
IBM Semeru Runtime Open Edition 19.0.2+7 (build 19.0.2+7)
Eclipse OpenJ9 VM 19.0.2+7 (build master-5395447ef, JRE 19 Windows Server 2016 amd64-64-Bit Compressed References 20230117_32 (JIT enabled, AOT enabled)
OpenJ9   - 5395447ef
OMR      - 4c214a89c
JCL      - be0b953d12 based on jdk-19.0.2+7)

[2023-01-28T19:05:44.150Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
[2023-01-28T19:05:44.150Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops 

[2023-01-28T19:43:17.461Z] TEST: java/lang/Thread/virtual/HoldsLock.java

[2023-01-28T19:43:17.467Z] test HoldsLock.testThreadInfo(): failure
[2023-01-28T19:43:17.467Z] java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
[2023-01-28T19:43:17.467Z] 	at HoldsLock.testThreadInfo(HoldsLock.java:109)
[2023-01-28T19:43:17.467Z] 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
[2023-01-28T19:43:17.467Z] 	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
[2023-01-28T19:43:17.467Z] 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
[2023-01-28T19:43:17.467Z] 	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[2023-01-28T19:43:17.467Z] 	at org.testng.TestRunner.privateRun(TestRunner.java:764)
[2023-01-28T19:43:17.467Z] 	at org.testng.TestRunner.run(TestRunner.java:585)
[2023-01-28T19:43:17.467Z] 	at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
[2023-01-28T19:43:17.467Z] 	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
[2023-01-28T19:43:17.467Z] 	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
[2023-01-28T19:43:17.467Z] 	at org.testng.SuiteRunner.run(SuiteRunner.java:286)
[2023-01-28T19:43:17.467Z] 	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
[2023-01-28T19:43:17.467Z] 	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
[2023-01-28T19:43:17.467Z] 	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
[2023-01-28T19:43:17.467Z] 	at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
[2023-01-28T19:43:17.467Z] 	at org.testng.TestNG.runSuites(TestNG.java:1069)
[2023-01-28T19:43:17.467Z] 	at org.testng.TestNG.run(TestNG.java:1037)
[2023-01-28T19:43:17.467Z] 	at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94)
[2023-01-28T19:43:17.467Z] 	at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54)
[2023-01-28T19:43:17.467Z] 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
[2023-01-28T19:43:17.467Z] 	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
[2023-01-28T19:43:17.467Z] 	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
[2023-01-28T19:43:17.467Z] 	at java.base/java.lang.Thread.run(Thread.java:1573)
[2023-01-28T19:43:17.467Z] 
[2023-01-28T19:43:17.467Z] ===============================================
[2023-01-28T19:43:17.467Z] java/lang/Thread/virtual/HoldsLock.java
[2023-01-28T19:43:17.467Z] Total tests run: 1, Passes: 0, Failures: 1, Skips: 0

@pshipton
Copy link
Member Author

pshipton commented Jan 31, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/103 - win2012x64-openj9-1
jdk_lang_1
java/lang/Thread/virtual/HoldsLock.java

01:11:02  null
01:11:02  test HoldsLock.testThreadInfo(): failure
01:11:02  java.lang.NullPointerException: Cannot invoke "java.lang.management.ThreadInfo.getLockInfo()" because "<local17>" is null
01:11:02  	at HoldsLock.testThreadInfo(HoldsLock.java:109)

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/104
https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/106/

@tajila
Copy link
Contributor

tajila commented Jan 31, 2023

@fengxue-IS Any progress on this?

@fengxue-IS
Copy link
Contributor

Still working on reproducing this locally to get a corefile for investigation

@fengxue-IS
Copy link
Contributor

reproduce with corefile, based on tid from the corefile, the problematic id is a value that is already assigned out and terminated.
After adding verbose output to the thread cleanup code, the issue is that in between the calls getAllThreadIds() and getThreadInfo() in test code, some of the threads is exiting.

        long[] tids = bean.getAllThreadIds();
        boolean foundCarrier = false;
        for (long tid : tids) {
            ThreadInfo info = bean.getThreadInfo(tid);

output from grinder run:

[2023-02-14T15:04:03.944Z] Thread <SeedGenerator Thread> (85) Exited
[2023-02-14T15:04:03.944Z] Thread <SeedGenerator Thread> (86) Exited
[2023-02-14T15:04:03.944Z] Thread <SeedGenerator Thread> (87) Exited
[2023-02-14T15:04:03.944Z] getThread returned NULL for id (87) due to no matching thread
[2023-02-14T15:04:03.944Z] allinfo[0].thread returned NULL for id (0)

@DanHeidinga
Copy link
Member

DanHeidinga commented Feb 15, 2023

How is the test being run? Looking at the source, I don't see where the <SeedGenerator Thread>s are being created.

Both the run commands for this test indicate "othervm" which I think means a new VM should be forked to run the test... See https://openjdk.org/jtreg/tag-spec.html

 * @run testng/othervm --enable-preview HoldsLock
 * @run testng/othervm --enable-preview -XX:+UseHeavyMonitors HoldsLock

Is that not happening?

@pshipton
Copy link
Member Author

Re SeedGenerator on Windows, seems related to another problem we've been seeing. I suspect there is an OpenJDK timing bug in the generic SeedGenerator which is causing it to take an excessive amount of time. Probably related to the test machines not being setup correctly to use the (MS Crypto API) native Windows seed generator. Copying my related note below:

These specific test(s) timed out. There are two tests but they are both running the ThreadedSeedGenerator.
java/security/SecureRandom/ApiTest.java
java/security/SecureRandom/EnoughSeedTest.java

From the output you can see it's running the following code, the test is waiting on this.

15:04:17  "SeedGenerator Thread" prio=1 Id=27 RUNNABLE
15:04:17  	at java.base@11.0.17/sun.security.provider.SeedGenerator$ThreadedSeedGenerator.run(SeedGenerator.java:362)
15:04:17  	at java.base@11.0.17/java.lang.Thread.run(Thread.java:839)

Adoptium is the same

16:19:36  "SeedGenerator Thread" #15 daemon prio=1 os_prio=-2 cpu=958187.50ms elapsed=960.47s tid=0x0000020fb1e0a000 nid=0x195c runnable  [0x000000fde35fe000]
16:19:36     java.lang.Thread.State: RUNNABLE
16:19:36  	at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.run(java.base@11.0.18/SeedGenerator.java:362)
16:19:36  	at java.lang.Thread.run(java.base@11.0.18/Thread.java:829)

In some of my grinders I collected diagnostics. It's burning a lot of system CPU time (16+ min) running the ThreadedSeedGenerator loop. It's possible the ThreadedSeedGenerator code is somewhat bogus, it looks to me like the loop counter < 64000 condition could result in running for a long time if the latch value, which loops around, ends up being quite small.

3XMTHREADINFO      "SeedGenerator Thread" J9VMThread:0x000000000025EA00, omrthread_t:0x0000023DC1DDB0B8, java/lang/Thread:0x00000000E0132C08, state:R, prio=1
3XMJAVALTHREAD            (java/lang/Thread getId:0x1B, isDaemon:true)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000E0041570)
3XMTHREADINFO1            (native thread ID:0xEC, native priority:0x1, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMCPUTIME               CPU usage total: 998.531250000 secs, user: 12.875000000 secs, system: 985.656250000 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=94144 (0x16FC0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at sun/security/provider/SeedGenerator$ThreadedSeedGenerator.run(SeedGenerator.java:362(Compiled Code))
4XESTACKTRACE                at java/lang/Thread.run(Thread.java:839)

The code waiting for the random bytes

4XESTACKTRACE                at java/lang/Object.wait(Native Method)
4XESTACKTRACE                at java/lang/Object.wait(Object.java:190(Compiled Code))
4XESTACKTRACE                at sun/security/provider/SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:404(Compiled Code))
4XESTACKTRACE                at sun/security/provider/SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:393)
4XESTACKTRACE                at sun/security/provider/SeedGenerator.generateSeed(SeedGenerator.java:144)
4XESTACKTRACE                at sun/security/provider/AbstractDrbg.engineGenerateSeed(AbstractDrbg.java:456)
4XESTACKTRACE                at sun/security/provider/DRBG.engineGenerateSeed(DRBG.java:238)
4XESTACKTRACE                at java/security/SecureRandom.generateSeed(SecureRandom.java:874)
4XESTACKTRACE                at ApiTest.verifyAPI(ApiTest.java:258)

@pshipton
Copy link
Member Author

pshipton commented Feb 15, 2023

About the use of the SeedGenerator, if something is calling SecureRandom, that uses the SeedGenerator on Windows. Running on Linux, I set the following option -Xdump:stack:events=load,filter=java/security/SecureRandom when running the test to find the following:

        at java/util/UUID$Holder.<clinit>()V (UUID.java:102)
        at java/util/UUID.randomUUID()Ljava/util/UUID; (UUID.java:148)
        at org/testng/xml/XmlTest.init(Lorg/testng/xml/XmlSuite;I)V (XmlTest.java:66)
        at org/testng/xml/XmlTest.<init>(Lorg/testng/xml/XmlSuite;)V (XmlTest.java:57)
        at org/testng/TestNG.createCommandLineSuitesForClasses([Ljava/lang/Class;)Ljava/util/List; (TestNG.java:557)
        at org/testng/TestNG.initializeCommandLineSuites()V (TestNG.java:825)
        at org/testng/TestNG.initializeEverything()V (TestNG.java:1020)
        at org/testng/TestNG.run()V (TestNG.java:1029)
        at com/sun/javatest/regtest/agent/TestNGRunner.main(Ljava/lang/ClassLoader;[Ljava/lang/String;)V (TestNGRunner.java:94)
        at com/sun/javatest/regtest/agent/TestNGRunner.main([Ljava/lang/String;)V (TestNGRunner.java:54)
        at java/lang/invoke/LambdaForm$DMH/0x00000000a40071f0.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)V (LambdaForm$DMH)
        at java/lang/invoke/LambdaForm$MH/0x00000000a40260b0.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (LambdaForm$MH)
        at java/lang/invoke/LambdaForm$MH/0x00000000a40265d0.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (LambdaForm$MH)
        at jdk/internal/reflect/DirectMethodHandleAccessor.invokeImpl(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DirectMethodHandleAccessor.java:155)
        at jdk/internal/reflect/DirectMethodHandleAccessor.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DirectMethodHandleAccessor.java:104)
        at java/lang/reflect/Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:578)
        at com/sun/javatest/regtest/agent/MainWrapper$MainThread.run()V (MainWrapper.java:127)
        at java/lang/Thread.run()V (Thread.java:1573)

@pshipton
Copy link
Member Author

pshipton commented Feb 15, 2023

The test itself is invoked via the following (-Xdump:stack:events=load,filter=HoldsLock)

        at java/lang/Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class; (Class.java:432)
        at com/sun/javatest/regtest/agent/TestNGRunner.main(Ljava/lang/ClassLoader;[Ljava/lang/String;)V (TestNGRunner.java:85)
        at com/sun/javatest/regtest/agent/TestNGRunner.main([Ljava/lang/String;)V (TestNGRunner.java:54)
        at java/lang/invoke/LambdaForm$DMH/0x000000005c008940.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)V (LambdaForm$DMH)
        at java/lang/invoke/LambdaForm$MH/0x000000005c026240.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (LambdaForm$MH)
        at java/lang/invoke/LambdaForm$MH/0x000000005c026760.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (LambdaForm$MH)
        at jdk/internal/reflect/DirectMethodHandleAccessor.invokeImpl(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DirectMethodHandleAccessor.java:155)
        at jdk/internal/reflect/DirectMethodHandleAccessor.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DirectMethodHandleAccessor.java:104)
        at java/lang/reflect/Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:578)
        at com/sun/javatest/regtest/agent/MainWrapper$MainThread.run()V (MainWrapper.java:127)
        at java/lang/Thread.run()V (Thread.java:1573)

@fengxue-IS
Copy link
Contributor

I've re-run the build on a windows 2016 machine which normally don't fail. (https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder/31074/consoleText)
the test cmdline/parameter is identical to windows 11 machine except -concurrency:4, (win11 uses -concurrency:2)

[2023-02-15T01:12:28.582Z] echo "";	echo "TEST SETUP:"; \
[2023-02-15T01:12:28.582Z] "C:/Users/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image\\bin\\java" -Xshareclasses:destroyAll; "C:/Users/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image\\bin\\java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done"; \
[2023-02-15T01:12:28.582Z] mkdir -p "C:/Users/jenkins/workspace/Grinder/aqa-tests/\\TKG\\output_16764234911383\\jdk_custom_0"; \
[2023-02-15T01:12:28.582Z] cd "C:/Users/jenkins/workspace/Grinder/aqa-tests/\\TKG\\output_16764234911383\\jdk_custom_0"; \
[2023-02-15T01:12:28.582Z] echo "";	echo "TESTING:"; \
[2023-02-15T01:12:28.582Z] "C:/Users/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image\\bin\\java" -Xmx512m -jar "C:/Users/jenkins/workspace/Grinder/aqa-tests///..//jvmtest\\openjdk\\jtreg\\lib\\jtreg.jar" \
[2023-02-15T01:12:28.582Z] -agentvm -a -ea -esa -v:fail,error,time,nopass -retain:fail,error,*.dmp,javacore.*,heapdump.*,*.trc -ignore:quiet -timeoutFactor:8 -xml:verify  -concurrency:4 -nativepath:"C:\Users\jenkins\workspace\Grinder/openjdkbinary/openjdk-test-image\\jdk\\jtreg\\native" -vmoptions:"-Xmx512m  -XX:+UseCompressedOops  " \
[2023-02-15T01:12:28.582Z] -timeoutHandler:jtreg.openj9.CoreDumpTimeoutHandler -timeoutHandlerDir:"C:/Users/jenkins/workspace/Grinder/aqa-tests/\\TKG\\lib\\openj9jtregtimeouthandler.jar" \
[2023-02-15T01:12:28.582Z] -w ""C:/Users/jenkins/workspace/Grinder/aqa-tests/\\TKG\\output_16764234911383\\jdk_custom_0"\\work" \
[2023-02-15T01:12:28.582Z] -r ""C:/Users/jenkins/workspace/Grinder/aqa-tests/\\TKG\\output_16764234911383\\jdk_custom_0"\\report" \
[2023-02-15T01:12:28.582Z] -jdk:"C:/Users/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image" \
[2023-02-15T01:12:28.582Z] C:/Users/jenkins/workspace/Grinder/aqa-tests/\\openjdk\\openjdk-jdk\\test\\jdk\\java/lang/Thread/virtual/HoldsLock.java;

With windows 2016, the <SeedGenerator Thread> are not being exited (imply not created either).
I am not aware of any special handling between different windows levels. maybe this is due to machine setup?
So far I wasn't able to reproduce the failure on windows 2016, will try running this on 2012 which most of the nightly build failure is found on.

@pshipton
Copy link
Member Author

I suspect it's related to #16710, i.e. machines where access to the MS Crypto API isn't allowed, otherwise the Window NativeSeedGenerator would be used. Plus the speed of the machines.

@pshipton
Copy link
Member Author

How about we just fix the test to check for nulls, in the case where the thread no longer exists.

@fengxue-IS
Copy link
Contributor

How about we just fix the test to check for nulls, in the case where the thread no longer exists.

I will create a PR to fix the test in the extension repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants