Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partition helix test runs using historical execution time data #62138

Merged
merged 32 commits into from
Aug 18, 2022

Conversation

dibarbet
Copy link
Member

@dibarbet dibarbet commented Jun 24, 2022

Resolves #62036

This does a few different things to fix our test partitioning

  1. Run real test discovery to find the real set of tests to run and puts that in a json file in prepare-tests. We must do it here because discovery must be run before test payload minimization.
  2. During partitioning, we lookup the test execution history for the appropriate stage to estimate the execution time of each test. We then use this info to partition the tests into work items that run under a certain time (currently 2:30). There is one test that does not run in under 2:30 consistently - Microsoft.CodeAnalysis.CSharp.UnitTests.OverloadResolutionPerfTests.NestedLambdas_01 but that can be fixed as a followup
  3. Run tests via vstest.console.dll directly. This is what already gets called when using dotnet test and passing a specific dll in. We run directly on vstest.console.dll so we can utilize RSP files to output a test filter that contains all fully qualified test method names (there is a bug where RSP files passed to dotnet test do not get passed down correctly, see Long response files don't work with dotnet test & dotnet vstest  microsoft/vstest#3513). RSP files get around max path limitations and avoid escaping issues on different platforms.

@dibarbet dibarbet force-pushed the partition_tests branch 9 times, most recently from 607bcfc to 841d67c Compare June 30, 2022 00:47
@dibarbet dibarbet force-pushed the partition_tests branch 4 times, most recently from 6f5c82a to b6ada4d Compare July 8, 2022 21:16
@RikkiGibson RikkiGibson self-assigned this Jul 8, 2022
@dibarbet dibarbet force-pushed the partition_tests branch 8 times, most recently from ab3b550 to 793e9fd Compare July 11, 2022 22:53
@dibarbet dibarbet requested a review from jaredpar August 5, 2022 05:38
@dibarbet
Copy link
Member Author

dibarbet commented Aug 5, 2022

@jaredpar this is ready for round 2

Copy link
Contributor

@RikkiGibson RikkiGibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great progress.

src/Tools/PrepareTests/TestDiscovery.cs Outdated Show resolved Hide resolved
src/Tools/PrepareTests/TestDiscovery.cs Outdated Show resolved Hide resolved
src/Tools/PrepareTests/TestDiscovery.cs Show resolved Hide resolved
src/Tools/Source/RunTests/AssemblyInfo.cs Outdated Show resolved Hide resolved
src/Tools/Source/RunTests/AssemblyScheduler.cs Outdated Show resolved Hide resolved
src/Tools/Source/RunTests/AssemblyScheduler.cs Outdated Show resolved Hide resolved
src/Tools/Source/RunTests/ProcessRunner.cs Outdated Show resolved Hide resolved
src/Tools/Source/RunTests/ProcessTestExecutor.cs Outdated Show resolved Hide resolved
timer.Start();
for (var i = 0; i < totalTests; i += MaxTestsReturnedPerRequest)
{
var testResults = await GetTestResultsAsync(runForThisStage, i, MaxTestsReturnedPerRequest, testClient, cancellationToken);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this phase takes maybe 30 seconds on desktop. Should we tinker with firing the all the tasks off immediately, pushing them into an array or something, then awaiting them one by one?

There's no need to spend time on it in this PR, but thought we should keep it in our back pocket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - we can play around with it (will do that separately). We have to be careful not to run too many though that we get rate limited.

@jaredpar
Copy link
Member

Can we file a work item to track refactoring run tests and prepare tests to have the better separation of responsibilities? Essentially at this point I think we should just move all helix ops into prepare tests and dumb down run tests a bit (perhaps just remove it from our core infra loop).

@dibarbet
Copy link
Member Author

Can we file a work item to track refactoring run tests and prepare tests to have the better separation of responsibilities? Essentially at this point I think we should just move all helix ops into prepare tests and dumb down run tests a bit (perhaps just remove it from our core infra loop).

Done - #63413

@dibarbet
Copy link
Member Author

dibarbet commented Aug 15, 2022

test failures look like a known outage uploading results
https://helix.dot.net/api/jobs/b0ef8a27-a0a8-466f-b918-0152d8be7506/workitems/Microsoft.Build.Tasks.CodeAnalysis.UnitTests_Microsoft.CodeAnalysis.CSharp.CodeStyle.UnitTests_1?api-version=2019-06-17

there is an ICM open for it right now. Will retry once it is closed.

@dibarbet
Copy link
Member Author

dibarbet commented Aug 16, 2022

looks like there are some new breakages that were not showing up before. They also aren't showing up in main so likely some main change/helix rollout + partitioning is breaking this.

[xUnit.net 00:02:10.41]     Microsoft.CodeAnalysis.Editor.UnitTests.CommentSelection.CommentUncommentSelectionCommandHandlerTests.Uncomment_AtBeginningOfEndOfBlockComment [FAIL]
  Failed Microsoft.CodeAnalysis.Editor.UnitTests.CommentSelection.CommentUncommentSelectionCommandHandlerTests.Uncomment_MatchesBlockComment [1 ms]
  Error Message:
   System.TypeInitializationException : The type initializer for 'Roslyn.Test.Utilities.StaTaskScheduler' threw an exception.
---- System.IO.FileLoadException : Could not load file or assembly 'file:///C:/h/w/B66909A9/w/A15A08F5/e/Microsoft.CodeAnalysis.EditorFeatures.UnitTests/Debug/net472/Microsoft.CodeAnalysis.XunitHook.DLL' or one of its dependencies. Operation is not supported. (Exception from HRESULT: 0x80131515)
-------- System.NotSupportedException : An attempt was made to load an assembly from a network location which would have caused the assembly to be sandboxed in previous versions of the .NET Framework. This release of the .NET Framework does not enable CAS policy by default, so this load may be dangerous. If this load is not intended to sandbox the assembly, please enable the loadFromRemoteSources switch. See http://go.microsoft.com/fwlink/?LinkId=155569 for more information.
  Stack Trace:
     at Roslyn.Test.Utilities.StaTaskScheduler.get_DefaultSta()

@dibarbet
Copy link
Member Author

Going to merge this. Will followup on why exactly https://github.com/dotnet/roslyn/blob/main/src/Compilers/CSharp/Test/WinRT/Metadata/WinMdMetadataTests.cs#L141 causes the xunit hook to fail to load when run before any wpffact test.

@dibarbet dibarbet merged commit c5cf429 into dotnet:main Aug 18, 2022
@ghost ghost added this to the Next milestone Aug 18, 2022
@dibarbet dibarbet deleted the partition_tests branch August 22, 2022 18:10
@dibarbet dibarbet modified the milestones: Next, 17.4 P2 Sep 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Helix test runner does not optimally partition tests between work items
5 participants