Deadlock in GC when attached profiler calls ICorProfilerInfo4::EnumThreads using .NET9 runtime #110062
Tagging subscribers to this area: @tommcdon
The reason why … Among other things, holding the thread store lock ensures that no runtime-managed threads are concurrently added or removed, so enumerating threads would typically require acquiring the thread store lock. Also, it looks like it has been like this for many releases; I am not sure why it started causing problems now. Perhaps some dependencies on … Enforcing that the lock is recursive might be a breaking change. It would most certainly rule out the entire scenario where threads are enumerated in … On the other hand, there is no danger in actually supporting recursive use. At least, this is my current theory.
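To make the recursion point concrete, here is a minimal, hypothetical sketch in plain C++17 (not CoreCLR code). Both classes assume the underlying lock tolerates re-entry, modeled here with `std::recursive_mutex`, and track the holding thread separately, loosely like `ThreadStore::m_HoldingThread`. `NaiveHolderLock` clears the holder on every release, so a nested acquire/release on the owning thread erases the outer holder; `RecursiveHolderLock` counts nesting depth and clears the holder only on the outermost release, which is one possible way to support recursive use. All class and function names are made up for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model, not CoreCLR code. The underlying lock tolerates re-entry
// (std::recursive_mutex); the "holding thread" is tracked separately.
class NaiveHolderLock {
public:
    void lock() {
        m_mutex.lock();
        m_holder = std::this_thread::get_id();
    }
    void unlock() {
        m_holder = std::thread::id{};  // clears the holder even on a nested release
        m_mutex.unlock();
    }
    bool holding() const { return m_holder.load() == std::this_thread::get_id(); }

private:
    std::recursive_mutex m_mutex;
    std::atomic<std::thread::id> m_holder{std::thread::id{}};  // default id = no holder
};

// Same model, but recursion-aware: only the outermost release clears the holder.
class RecursiveHolderLock {
public:
    void lock() {
        m_mutex.lock();
        if (m_depth++ == 0) m_holder = std::this_thread::get_id();
    }
    void unlock() {
        assert(m_depth > 0 && "unlock without matching lock");
        if (--m_depth == 0) m_holder = std::thread::id{};
        m_mutex.unlock();
    }
    bool holding() const { return m_holder.load() == std::this_thread::get_id(); }

private:
    std::recursive_mutex m_mutex;
    int m_depth = 0;  // only touched while m_mutex is held
    std::atomic<std::thread::id> m_holder{std::thread::id{}};
};

// Outer acquire (like SuspendEE), then a nested acquire/release on the same
// thread (like EnumThreads called from RuntimeSuspendFinished). Returns whether
// the lock still reports the current thread as its holder afterwards.
template <typename Lock>
bool holderSurvivesNestedUse(Lock& lock) {
    lock.lock();
    {
        std::lock_guard<Lock> nested(lock);  // nested acquire, released at scope end
    }
    const bool stillHolding = lock.holding();
    lock.unlock();
    return stillHolding;
}
```

With these definitions, `holderSurvivesNestedUse` returns `false` for a `NaiveHolderLock` and `true` for a `RecursiveHolderLock`.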
Thanks for pointing out the problematic recursive ThreadStoreLock! I looked into it a bit more, and the ThreadStoreLock is being acquired recursively because … (see runtime/src/coreclr/vm/threadsuspend.cpp, lines 2133 to 2142 in 2d6ea8d).
Do we need to update the …?
Sincere thanks for the quick turnaround! Any idea which release this will be included in?
Currently, it will only be included in .NET 10, which you can preview in the latest .NET SDK builds when the next version is out. Would that work for you, or do you need it in .NET 9?
This issue currently makes our profiler unusable with any application running on .NET 9, so I would say a fix for .NET 9 is critical.
Description
Background: I am a member of the team responsible for maintaining a closed-source .NET profiler.
Issue: I am currently validating .NET 9 with our profiler, and while running regression tests I've observed the process stalling when the .NET 9 runtime is used with the profiler attached.
Root Cause: I've managed to narrow down the cause of the issue to an invocation of `ICorProfilerInfo4::EnumThreads` within our profiler's overridden `ICorProfilerCallback::RuntimeSuspendFinished` method, which I'll share in the "Actual behavior" section.

Reproduction Steps
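In the profiler DLL project, define a class like this:

```cpp
class ProfilerCallback : public ICorProfilerCallback5
{
public:
    virtual COM_METHOD(HRESULT) Initialize(IUnknown* pICorProfilerInfoUnk)
    {
        return pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo8, (void**)&m_pProfilerInfo);
    }
    // Calling EnumThreads from this callback is what stalls the process on .NET 9
    virtual COM_METHOD(HRESULT) RuntimeSuspendFinished()
    {
        ICorProfilerThreadEnum* pEnum = nullptr;
        if (SUCCEEDED(m_pProfilerInfo->EnumThreads(&pEnum))) pEnum->Release();
        return S_OK;
    }
    // (remaining ICorProfilerCallback5 methods omitted)
    ICorProfilerInfo8* m_pProfilerInfo = nullptr;
};
```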
1. Attach this class as the profiler (using the guide …).
2. Run the application and observe the process stall/deadlock after `RuntimeSuspendFinished` is called.

Expected behavior
The process should not deadlock.
Actual behavior
The process deadlocks.
This is the chain of calls that leads to a deadlock:
1. In `RuntimeSuspendFinished`, `EnumThreads` is called.
2. In the `StateHolder` destructor in `ProfilerThreadEnum::Init`, `ThreadStore::s_pThreadStore->m_HoldingThread` is set to `NULL`.
3. `RuntimeSuspendFinished` returns.
4. Back in `WKS::GCHeap::GarbageCollectGeneration`, `Thread::RareDisablePreemptiveGC` is called.
5. Because `m_HoldingThread == NULL`, `ThreadStore::HoldingThreadStore` no longer identifies the current thread as the thread store holder, so `Thread::RareDisablePreemptiveGC` does not take its early exit.
6. A call to `GCHeapUtilities::GetGCHeap()->WaitUntilGCComplete()` is made, which deadlocks because `GCHeap::SetWaitForGCEvent()` has not yet been called by `ThreadSuspend::RestartEE`.
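The chain of calls above can be sketched as a tiny single-threaded model, purely to show why clearing the holder turns an early exit into a wait that nothing will ever satisfy. This is a hypothetical illustration, not runtime code: `GcModel`, `gcThreadProceeds`, and the 50 ms timeout are invented here, the condition variable stands in for the event that `RestartEE` would eventually signal, and a timed-out wait stands in for the real indefinite hang.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, single-threaded model of the call chain described in the issue.
// Names echo CoreCLR for orientation only; none of this is runtime code.
struct GcModel {
    std::thread::id holdingThread{};  // ~ ThreadStore::m_HoldingThread (default id = none)
    std::mutex m;
    std::condition_variable gcDone;   // ~ the wait-for-GC event RestartEE would signal
    bool gcComplete = false;

    // ~ ThreadStore::HoldingThreadStore(this)
    bool holdingThreadStore() const { return holdingThread == std::this_thread::get_id(); }

    // ~ Thread::RareDisablePreemptiveGC: the lock holder exits early; otherwise the
    // caller waits for the GC to finish. Returns false if the wait times out, which
    // stands in for the indefinite hang in the real runtime.
    bool disablePreemptive(std::chrono::milliseconds timeout) {
        if (holdingThreadStore()) return true;  // early exit
        std::unique_lock<std::mutex> lk(m);
        return gcDone.wait_for(lk, timeout, [this] { return gcComplete; });
    }
};

// `nestedReleaseClearsHolder` models the .NET 9 behavior of EnumThreads.
bool gcThreadProceeds(bool nestedReleaseClearsHolder) {
    GcModel gc;
    gc.holdingThread = std::this_thread::get_id();  // SuspendEE takes the thread store lock
    assert(gc.holdingThreadStore());
    if (nestedReleaseClearsHolder) {
        // RuntimeSuspendFinished -> EnumThreads: the StateHolder destructor in
        // ProfilerThreadEnum::Init resets the holder to "none".
        gc.holdingThread = std::thread::id{};
    }
    // Back in GarbageCollectGeneration; RestartEE has not run, so gcDone is never signaled.
    return gc.disablePreemptive(std::chrono::milliseconds(50));
}
```

In this model, `gcThreadProceeds(false)` returns `true` immediately through the early exit, while `gcThreadProceeds(true)` returns `false` after the timeout.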
Below are the relevant stack traces:
Resetting holding thread:
```
coreclr.dll!ThreadSuspend::UnlockThreadStore(int bThreadDestroyed, ThreadSuspend::SUSPEND_REASON) Line 1934 C++
[Inline Frame] coreclr.dll!ThreadStore::UnlockThreadStore() Line 5110 C++
[Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::Release() Line 359 C++
[Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::{dtor}() Line 340 C++
coreclr.dll!ProfilerThreadEnum::Init() Line 585 C++
coreclr.dll!ProfToEEInterfaceImpl::EnumThreads(ICorProfilerThreadEnum * * ppEnum) Line 10366 C++
CoreRewriter_x64.dll!ProfilerCallback::RuntimeSuspendFinished() Line 2417 C++
coreclr.dll!EEToProfInterfaceImpl::RuntimeSuspendFinished() Line 5056 C++
coreclr.dll!ProfControlBlock::DoProfilerCallbackHelper<int (__cdecl*)(ProfilerInfo *),long (__cdecl*)(EEToProfInterfaceImpl *)>(ProfilerInfo * pProfilerInfo, int(*)(ProfilerInfo *) condition, HRESULT(*)(EEToProfInterfaceImpl *) callback, HRESULT * pHR) Line 284 C++
coreclr.dll!ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON reason) Line 5647 C++
coreclr.dll!GCToEEInterface::SuspendEE(SUSPEND_REASON reason) Line 51 C++
coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51030 C++
```
Checking null holding thread:
```
coreclr.dll!ThreadStore::HoldingThreadStore(Thread * pThread) Line 7624 C++
coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2123 C++
[Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++
coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++
```
Deadlock:
```
coreclr.dll!WKS::GCHeap::WaitUntilGCComplete(bool bConsiderGCStart) Line 238 C++
coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2212 C++
[Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++
coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++
```
Regression?
We have not encountered this GC deadlock on .NET 8 (and presumably earlier versions).
Known Workarounds
I haven't tried it yet, but there's probably some way to force `ThreadStore::s_pThreadStore->m_HoldingThread` back into its previous correct state after calling `EnumThreads`.
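The "restore the previous state" idea can be illustrated with a small RAII guard in a toy model. This is only a hypothetical sketch: a real profiler has no supported way to reach `ThreadStore::s_pThreadStore->m_HoldingThread`, so `ThreadStoreModel`, `enumThreadsModel`, `HolderRestorer`, and `holderSurvives` are invented stand-ins that show the snapshot-and-restore pattern, not something to ship.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the runtime's thread store bookkeeping. A real profiler
// cannot reach ThreadStore::m_HoldingThread; this only models the idea.
struct ThreadStoreModel {
    std::thread::id holdingThread{};  // default id means "no holder"
};

// Stand-in for EnumThreads on .NET 9: the inner holder's release clears the field.
void enumThreadsModel(ThreadStoreModel& ts) { ts.holdingThread = std::thread::id{}; }

// RAII guard: snapshot the holder on entry and put it back on scope exit.
class HolderRestorer {
public:
    explicit HolderRestorer(ThreadStoreModel& ts) : m_ts(ts), m_saved(ts.holdingThread) {}
    ~HolderRestorer() { m_ts.holdingThread = m_saved; }
    HolderRestorer(const HolderRestorer&) = delete;
    HolderRestorer& operator=(const HolderRestorer&) = delete;

private:
    ThreadStoreModel& m_ts;
    std::thread::id m_saved;
};

// Simulates RuntimeSuspendFinished calling EnumThreads while the GC thread holds
// the lock. Returns true if the GC thread is still recorded as the holder afterwards.
bool holderSurvives(bool useRestorer) {
    ThreadStoreModel ts;
    ts.holdingThread = std::this_thread::get_id();  // as set by SuspendEE
    assert(ts.holdingThread == std::this_thread::get_id());
    if (useRestorer) {
        HolderRestorer guard(ts);
        enumThreadsModel(ts);
    } else {
        enumThreadsModel(ts);
    }
    return ts.holdingThread == std::this_thread::get_id();
}
```

Here `holderSurvives(false)` returns `false` and `holderSurvives(true)` returns `true`; the guard restores the holder even if the nested call exits early, which is why RAII is used rather than a manual save/restore.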
Configuration
- .NET: 9.0.100
- Processor: 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz
- Installed RAM: 32.0 GB (31.7 GB usable)
- System type: 64-bit operating system, x64-based processor
This has also been reproduced on a build server; I don't know its specs.
Other information
This line causes `ThreadStore::s_pThreadStore->m_HoldingThread` to be set to `NULL`, which has knock-on effects later that lead to the deadlock.

I have minidumps, but they are quite large to upload, so let me know if there is any way I can send them.