MSBuild task hangs occasionally in Linux when invoking 'dotnet run' #9671

Open
jorgensigvardsson opened this issue Mar 26, 2019 · 14 comments
Labels
needs-triage Have yet to determine what bucket this goes in.
Milestone

Comments

@jorgensigvardsson

Steps to reproduce

Create a project and add a target that runs before the BeforeBuild target. That target should then invoke <Exec Command="dotnet run -c $(Configuration) -p ../OtherProject" />.

Expected behavior

I expect the command to run and finish, so that msbuild can continue executing targets.

Actual behavior

MSBuild appears to hang, as if it cannot detect that OtherProject has exited. This only occurs on Linux, and only sometimes. It always hangs when I run the task on the Ubuntu 1604 hosted agent in Azure DevOps. It sometimes hangs when I run the same task in Docker on my Windows desktop machine.

Environment data

I am using the docker image microsoft/dotnet:2.2-sdk as a base for my own image. I have stripped it down to a bare minimum with /bin/bash as ENTRYPOINT, so that I have been able to run the commands manually.

dotnet --info output:

.NET Core SDK (reflecting any global.json):
 Version:   2.2.104
 Commit:    73f036d4ac

Runtime Environment:
 OS Name:     debian
 OS Version:  9
 OS Platform: Linux
 RID:         debian.9-x64
 Base Path:   /usr/share/dotnet/sdk/2.2.104/

Host (useful for support):
  Version: 2.2.2
  Commit:  a4fd7b2c84

.NET Core SDKs installed:
  2.2.104 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

The "offending" target looks like this in my .csproj:

<Target Name="GenerateBuilders" BeforeTargets="BeforeBuild">
  <Message Text="Generating model builder classes..." Importance="High" />
  <Exec Command="dotnet run -c $(Configuration) -p ../Tracy.Core.Dal.ModelBuilderGenerator Builders.cs" />
  <Message Text="Finished." Importance="High" />
</Target>

The project Tracy.Core.Dal.ModelBuilderGenerator is a custom project that generates code at runtime for other projects to consume. In the logs I can see all the output from the generator project. The very last output is right before return 0;.

The workaround I have now is to tag the target with Condition="'$(BuildingInsideVisualStudio)' == 'true'" so that it works as expected during development. For the CI build, I publish the tool to an executable in my Dockerfile and run it before the initial dotnet invocation.
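A minimal sketch of the publish-then-run half of this workaround, as it might appear in a Dockerfile RUN step or a CI script. The output directory, the function name `generate_builders`, and the assumption that the published assembly is named after the project are illustrative and not taken from the original setup.

```shell
# Project path as used in the Exec task above.
GENERATOR_PROJECT="../Tracy.Core.Dal.ModelBuilderGenerator"
# Hypothetical output directory for the published tool.
PUBLISH_DIR="${PUBLISH_DIR:-/tmp/model-builder-generator}"

generate_builders() {
    # Publish the generator once, outside of any MSBuild target...
    dotnet publish "$GENERATOR_PROJECT" -c Release -o "$PUBLISH_DIR" &&
    # ...then run the published assembly directly instead of `dotnet run`.
    # Assumption: the assembly name matches the project name.
    dotnet "$PUBLISH_DIR/Tracy.Core.Dal.ModelBuilderGenerator.dll" Builders.cs
}
```

Calling `generate_builders` from the consuming project's directory, before the main `dotnet build`, would write Builders.cs to the same place the Exec task did.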

Source code access

Access to source code etc can be arranged privately if needed.

@livarcocc
Contributor

Can you capture a dump when this happens? Do you see the dotnet run process still running when this happens?

@jorgensigvardsson
Author

I only had bash and three dotnet processes. I don't have ps -ef output right now, but can of course provide it on Monday.

The dotnet processes were all sleeping, as if they were waiting for something. Also, IIRC, the processes were grandparent, parent and child. I believe dotnet run was the child, but I am not 100% sure about that.

The project I run is just a simple console app that doesn't read from stdin; it only grabs some CLR metadata that it writes to a file.

I did try to do a dotnet publish to generate a binary to execute instead, but it hung as well.
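For anyone collecting the process listing mentioned above, a small sketch that prints the parent/child relationships and the state of each dotnet process. The function name `show_dotnet_processes` is hypothetical, and the column selection assumes a procps-style `ps`.

```shell
show_dotnet_processes() {
    # PID and PPID show the grandparent/parent/child chain; a STAT value
    # starting with S means the process is sleeping.
    # The [d] in the pattern keeps awk from matching its own command line.
    ps -eo pid,ppid,stat,args | awk 'NR == 1 || /[d]otnet/'
}

show_dotnet_processes
```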

@jorgensigvardsson
Author

I just tried to reproduce the error on my own Docker host, but I cannot reproduce the hanging dotnet run process. I can't get the access I need to the Docker host in Azure DevOps, so I'm a bit clueless/powerless now.

@msftgits msftgits transferred this issue from dotnet/cli Jan 31, 2020
@msftgits msftgits added this to the Discussion milestone Jan 31, 2020
@MartinKarlgrenIMI

We have the same problem. We noticed that the task actually completes successfully after 15 minutes.

The DefaultNodeConnectionTimeout is 900 seconds -- possibly related?

@baronfel
Member

@ladipro /@rainersigwald is there any debugging information in MSBuild that could help diagnose if this is a node connection issue?

@MartinKarlgrenIMI

Setting MSBUILDNODECONNECTIONTIMEOUT="30000" in the environment does indeed reduce the waiting time, and the task finishes successfully after 30 seconds instead of 15 minutes.
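As a sketch, the setting described above could be applied in the shell that launches the build. Only the variable name and value come from this comment; where it is set (shell, Dockerfile, CI configuration) is up to the reader.

```shell
# Value is in milliseconds: 30000 ms = 30 s, versus the 900 s default
# mentioned earlier in the thread.
export MSBUILDNODECONNECTIONTIMEOUT=30000

# dotnet/MSBuild invocations started from this shell inherit the setting.
echo "MSBUILDNODECONNECTIONTIMEOUT=$MSBUILDNODECONNECTIONTIMEOUT"
```

Note that this only shortens the wait; per the comment above, the task still does not finish until the timeout elapses.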

@ladipro
Member

ladipro commented Jan 19, 2024

@MartinKarlgrenIMI, with MSBUILDDEBUGCOMM set to 1, MSBuild writes node communication logs to files named MSBuild_CommTrace_PID_*.txt in the temp directory. Would it be possible to share these logs from a problematic build?
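A rough sketch of enabling the trace and collecting the files afterwards. The helper name `list_comm_traces` is hypothetical, and the search root (`$TMPDIR`, falling back to `/tmp`) is an assumption about where the temp directory lives on a given Linux machine.

```shell
# Enable node communication tracing for builds started from this shell.
export MSBUILDDEBUGCOMM=1

list_comm_traces() {
    # Search the given directory (default: $TMPDIR or /tmp) recursively,
    # in case the files land in a per-user subdirectory.
    find "${1:-${TMPDIR:-/tmp}}" -name 'MSBuild_CommTrace_PID_*.txt' 2>/dev/null
}
```

After reproducing the hang, running `list_comm_traces` prints the paths of any trace files found so they can be attached to the issue.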

@MartinKarlgrenIMI

@ladipro, sure, files below.
(This was a build with a 30000 ms timeout; I noticed that in the *_1794.txt file the timeout is hit for one thread.)

MSBuild_CommTrace_PID_1794.txt
MSBuild_CommTrace_PID_1767.txt
MSBuild_CommTrace_PID_1709.txt
MSBuild_CommTrace_PID_1670.txt

@ladipro ladipro transferred this issue from dotnet/sdk Jan 22, 2024
@ladipro ladipro added the needs-triage label Jan 22, 2024
@ladipro
Member

ladipro commented Jan 22, 2024

It looks like ToolTask doesn't receive the Process.Exited notification if the tool process is dotnet build / dotnet run which creates a new OOP node process. Or rather, it receives it only after the node process has exited.
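One plausible way a parent can end up waiting on a lingering grandchild, shown with plain shell tools. This is an assumption for illustration only: the thread does not confirm that an inherited output pipe is the mechanism here, and the snippet does not exercise MSBuild's actual code path.

```shell
start=$(date +%s)

# The inner shell stands in for the tool: it prints and exits immediately,
# but leaves a background process (standing in for a lingering worker node)
# that still holds the write end of the pipe. The reader (cat) only sees
# end-of-stream once that background process exits too.
sh -c 'sleep 2 & echo "tool exited"' | cat

end=$(date +%s)
echo "reader saw end-of-stream after $((end - start))s"
```

If the background process's output is redirected away from the pipe (for example `sleep 2 >/dev/null 2>&1 &`), the reader finishes as soon as the inner shell exits.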

@ladipro
Member

ladipro commented Jan 22, 2024

Likely the same root cause as dotnet/sdk#9452. Could be specific to AzDO environment.

@ladipro
Member

ladipro commented Jan 22, 2024

@MartinKarlgrenIMI can you please try passing the --init flag per the last couple of comments in dotnet/runtime#27115 ?

@SeijiSuenaga

@ladipro unfortunately dotnet run --init --project abc didn't fix the 15-minute hang in my case.

@ladipro
Member

ladipro commented Feb 9, 2024

> @ladipro unfortunately dotnet run --init --project abc didn't fix the 15-minute hang in my case.

@SeijiSuenaga my understanding is that --init should be passed to docker, not dotnet. See https://docs.docker.com/engine/reference/commandline/container_run/#init
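To make the distinction concrete, a sketch of where the flag goes when the build runs in a container. The function name `build_in_container` and the image argument are placeholders; `dotnet build` stands in for whatever command the CI job runs.

```shell
build_in_container() {
    # $1: name of the build image (hypothetical placeholder).
    # --init is an option of `docker run`, so it goes before the image name;
    # everything after the image name is the command run inside the container.
    docker run --init --rm "$1" dotnet build
}
```

For example, `build_in_container my-build-image` starts the container with Docker's init process as PID 1 and runs the build under it.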

@SeijiSuenaga

SeijiSuenaga commented Feb 9, 2024

@ladipro Ah, sorry. Just tried that as well, but it still hung for 15 minutes. (In my case, the hangs are happening in GitLab CI, so I tested it by enabling their FF_USE_INIT_WITH_DOCKER_EXECUTOR feature flag.)

That said, I did find a workaround for my particular scenario. In case it helps anyone else, I found that my MSBuild target was only hanging when executing as part of dotnet test, not dotnet build for the same project. So I adjusted my CI script to run dotnet build first, then dotnet test, and now it runs completely normally. 🤔

7 participants