Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[win] i-Builds tests do not finish since making Edge the default on windows due to workspace lock being held #2651

Open
jukzi opened this issue Dec 11, 2024 · 26 comments
Labels
regression Regression defect

Comments

@jukzi
Copy link
Contributor

jukzi commented Dec 11, 2024

https://ci.eclipse.org/releng/job/AutomatedTests/job/ep435I-unit-win32-x86_64-java17/
image

@jukzi jukzi added the regression Regression defect label Dec 11, 2024
@jukzi
Copy link
Contributor Author

jukzi commented Dec 11, 2024

cc @BeckerWdf got time to help?

@BeckerWdf
Copy link
Contributor

Sorry. From your description I don't know what to do.

@jukzi
Copy link
Contributor Author

jukzi commented Dec 11, 2024

Sorry. From your description I don't know what to do.

Probably nobody does -> investigate what exactly went wrong, find out why, solve.

@BeckerWdf
Copy link
Contributor

What does "DNF" mean?

@merks
Copy link
Contributor

merks commented Dec 11, 2024

Did Not Finish.

@jukzi
Copy link
Contributor Author

jukzi commented Dec 11, 2024

@jukzi
Copy link
Contributor Author

jukzi commented Dec 11, 2024

SWT had 1 error then platform.ui DNF:
image

@HeikoKlare
Copy link
Contributor

I just commented on that issue here: #2648 (comment)

It was very likely caused by merging this PR yesterday: eclipse-platform/eclipse.platform.swt#1637

Sorry that no one of us responded on the issues so far. I just pinged the colleagues to have a look. I am currently out of office and thus, unfortunately, cannot help except for pointing to the cause. At worst, we could revert that PR if there is no quick solution.

@fedejeanne
Copy link
Contributor

fedejeanne commented Dec 12, 2024

I'm looking into it as we speak. The error log mentions the folder EBWebView so I will look closer to when the userdata directory for Edge was introduced in eclipse-platform/eclipse.platform.swt#1548.

Unable to delete file C:\Users\genie.releng\workspace\AutomatedTests\ep435I-unit-win32-x86_64-java17\workarea\I20241211-1800\eclipse-testing\test-eclipse\eclipse\update_configurator_folder\.metadata\.plugins\org.eclipse.swt\EBWebView\lockfile

@amartya4256 ping!

@sratz
Copy link
Member

sratz commented Dec 12, 2024

I think this is a timing issue.
The eclipse.platform.releng.aggregator\production\testScripts\configuration\sdk.tests\testScripts\test.xml script starts many runtimes in quick succession:

  • Run tests
  • Delete workspace
  • Run tests
  • Delete workspace
  • ...

Now, with the Edge browser, we spawn msedgewebview2.exe child processes.
When shutting down the runtime, it looks like those processes linger around for a little while, holding on to a lock in the workspace.

What I am not quite sure about: Why does this happen after the org.eclipse.update.configurator.tests.AutomatedConfiguratorSuite but not after the other ones before it?

@jukzi
Copy link
Contributor Author

jukzi commented Dec 12, 2024

When shutting down the runtime, it looks like those processes linger around for a little while, holding on to a lock in the workspace.

junit tests are responsible to wait until their resources have been cleaned. Especially a (the) failed JUnit test should not leak such resources. please adapt the code as necessary.

@sratz
Copy link
Member

sratz commented Dec 12, 2024

junit tests are responsible to wait until their resources have been cleaned. Especially a (the) failed JUnit test should not leak such resources. please adapt the code as necessary.

Not sure if the individual tests can (or should) influence this. They do not spawn those WebView2 processes explicitly.

If anything, there is some Browser instance that is not disposed correctly. I'll have a look at the org.eclipse.update.configurator.tests.AutomatedConfiguratorSuite test suite to see if I can find anything suspicious.

@fedejeanne
Copy link
Contributor

I think this is a timing issue. The eclipse.platform.releng.aggregator\production\testScripts\configuration\sdk.tests\testScripts\test.xml script starts many runtimes in quick succession:

* Run tests

* Delete workspace

* Run tests

* Delete workspace

* ...

Now, with the Edge browser, we spawn msedgewebview2.exe child processes. When shutting down the runtime, it looks like those processes linger around for a little while, holding on to a lock in the workspace.

This was exactly what we thought and @amartya4256 is trying to validate this theory locally as we speak.

What I am not quite sure about: Why does this happen after the org.eclipse.update.configurator.tests.AutomatedConfiguratorSuite but not after the other ones before it?

Probably this configuration:
https://github.com/eclipse-platform/eclipse.platform/blob/master/update/org.eclipse.update.configurator.tests/test.xml#L37-L47

@fedejeanne
Copy link
Contributor

@sratz I think I posted 2 seconds after you so I'll ping you gain, just in case you missed it :-)

What I am not quite sure about: Why does this happen after the org.eclipse.update.configurator.tests.AutomatedConfiguratorSuite but not after the other ones before it?

Probably this configuration: https://github.com/eclipse-platform/eclipse.platform/blob/master/update/org.eclipse.update.configurator.tests/test.xml#L37-L47

☝️

@sratz
Copy link
Member

sratz commented Dec 12, 2024

AutomatedConfiguratorSuite only contains a single test class/method, has no dependency to SWT whatsoever.
Why is an Edge initialized in the first place?

@jukzi
Copy link
Contributor Author

jukzi commented Dec 12, 2024

If there is no solution in sight today i suggest a revert until problem is understood.

@fedejeanne
Copy link
Contributor

AutomatedConfiguratorSuite only contains a single test class/method, has no dependency to SWT whatsoever. Why is an Edge initialized in the first place?

The Intro view... for whatever reason.

image

If put a breakpoint in org.eclipse.swt.browser.Edge.create(Composite, int) and debug the test, you'll see it.

@akurtakov
Copy link
Member

Intro view is a web page so it loads the default browser engine on each platform in order to show itself (render the webpage).

@sratz
Copy link
Member

sratz commented Dec 12, 2024

True, but according to the build log we have 47 other rests suite run before update.configurator.tests.

I'd expect the intro page to come up in all of them. What makes the update configurator one special?

@sratz
Copy link
Member

sratz commented Dec 12, 2024

Cannot reproduce locally:

java -jar plugins\org.eclipse.equinox.launcher_1.6.900.v20240613-2009.jar -application org.eclipse.test.uitestapplication -data .ws -testPluginName org.eclipse.update.configurator.tests -className org.eclipse.update.configurator.tests.AutomatedConfiguratorSuite -os win32 -ws win32 -arch x86_64 -consolelog
rmdir /s /q .ws

is able to delete the whole workspace without problem :(

@sratz sratz transferred this issue from eclipse-platform/eclipse.platform.ui Dec 12, 2024
@sratz sratz changed the title [win] i-Builds repeatedly DNF I20241211-0310 [win] i-Builds tests do not finish since making Edge the default on windows due to workspace lock being held Dec 12, 2024
sratz added a commit to sratz/eclipse.platform.releng.aggregator that referenced this issue Dec 12, 2024
@sratz
Copy link
Member

sratz commented Dec 12, 2024

I opened #2652. Maybe with that all tests can run and the log file gives us some indication for which test suites msedgewebview2.exe were lingering after platform shutdown to find some pattern.

@fedejeanne
Copy link
Contributor

Cannot reproduce locally:

java -jar plugins\org.eclipse.equinox.launcher_1.6.900.v20240613-2009.jar -application org.eclipse.test.uitestapplication -data .ws -testPluginName org.eclipse.update.configurator.tests -className org.eclipse.update.configurator.tests.AutomatedConfiguratorSuite -os win32 -ws win32 -arch x86_64 -consolelog
rmdir /s /q .ws

is able to delete the whole workspace without problem :(

Can you try that in a loop, say 100 times?

Or can you tell me where exactly did you run the command so I can test it too?

@sratz
Copy link
Member

sratz commented Dec 12, 2024

Looks like this helped. All tests have run again and the log says that some msedgewebview2.exe instances were indeed killed.

This is something we need to handle more nicely in the test infrastructure than just killing processes.

But in general this is something that we will have to deal with with Edge in some way or the other due to WebView2's multiprocess / asynchronous architecture.

@fedejeanne
Copy link
Contributor

Looks like this helped. All tests have run again and the log says that some msedgewebview2.exe instances were indeed killed.

Awesome, thank you so much for the fix!

This is something we need to handle more nicely in the test infrastructure than just killing processes.

But in general this is something that we will have to deal with with Edge in some way or the other due to WebView2's multiprocess / asynchronous architecture.

Agreed. We should revisit this issue early next year.

@HeikoKlare
Copy link
Contributor

Thank you all for having a look and thank you @sratz for finding a solution so fast (even if it may not be the final solution)!
Also great to see that all downstream tests of the I-Build seem to run fine in the Jenkins infrastructure with Edge being used as default.

@merks
Copy link
Contributor

merks commented Dec 12, 2024

Indeed. Thanks to all of you. Also @jukzi and @akurtakov. So many people involved to keep things running smoothly. Much of it thankless work. But essential to running a good ship.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression Regression defect
Projects
None yet
Development

No branches or pull requests

7 participants