See CLRIE in Azure.
When an issue occurs in production, the first step is to mitigate it so the application continues to run. A common cause of problems is an update to either CLRIE or one of the Instrumentation Method clients that introduces a conflict with another client or an incompatibility with the app.
The most direct way to stop CLRIE and all Instrumentation Methods is to remove the ICorProfiler environment variables that are applied to the process and then restart the process. Depending on the environment, this may require updating applicationHost.xdt files, changing the InstrumentationEngine_EXTENSION_VERSION Application Setting, or modifying registry keys. Without the ICorProfiler environment variables, the runtime does not load CLRIE, so CLRIE and all Instrumentation Methods (IMs) are removed from the process. If the application still encounters errors without CLRIE, then the issue is something the app owner will need to address.
Alternatively, if the user cannot modify the ICorProfiler environment variables, they may at least be able to remove the DLLs pointed to by the COR_PROFILER_PATH/CORECLR_PROFILER_PATH environment variables. A process restart is still required.
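As a minimal sketch of the environment-variable mitigation described above, the following Python helper finds the ICorProfiler variables in an environment and produces a copy without them. The helper names are hypothetical, and matching by name prefix (to also catch the bitness-specific variants such as COR_PROFILER_PATH_64) is an assumption of this sketch. It only inspects a dictionary; how the variables are actually applied (applicationHost.xdt, Application Settings, registry) still depends on the hosting environment.

```python
from typing import Dict, Mapping

# Name prefixes of the ICorProfiler variables for .NET Framework (COR_*)
# and .NET Core (CORECLR_*). Prefix matching is an assumption of this
# sketch; it also catches variants such as COR_PROFILER_PATH_64.
PROFILER_PREFIXES = (
    "COR_ENABLE_PROFILING",
    "COR_PROFILER",
    "CORECLR_ENABLE_PROFILING",
    "CORECLR_PROFILER",
)


def is_profiler_variable(name: str) -> bool:
    """Return True if the variable name is one of the ICorProfiler variables."""
    # Environment variable names are case-insensitive on Windows.
    return name.upper().startswith(PROFILER_PREFIXES)


def strip_profiler_variables(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env without any ICorProfiler variables."""
    return {k: v for k, v in env.items() if not is_profiler_variable(k)}


if __name__ == "__main__":
    import os

    found = sorted(n for n in os.environ if is_profiler_variable(n))
    print("ICorProfiler variables in this environment:", found or "none")
```

Running the script inside the affected process's environment (for example from a console session on the same host) shows which variables would need to be removed before the restart.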
For Azure VM/VMSS, the Visual Studio Production Diagnostics team ships a CLRIE-hosted MSI in the Microsoft.Insights.VMDiagnosticsSettings extension (Type: Microsoft.Azure.Diagnostics.IaaSDiagnostics). This extension applies the ICorProfiler variables to the registry for IIS at HKLM\SYSTEM\CurrentControlSet\Services\W3SVC and HKLM\SYSTEM\CurrentControlSet\Services\WAS. To disable CLRIE, you will need to remove the VM extension entirely, either from the Azure Portal or via the Azure PowerShell cmdlet Remove-AzVMExtension.
Disabling an individual Instrumentation Method, rather than CLRIE as a whole, is currently not supported. We could potentially create a tool that supports an allow/block list of InstrumentationMethods, which CLRIE would use to ignore the corresponding environment variables on startup. Concerns with such a design include the security implications of a file on disk and the need for privileged user access.
Once the issue is mitigated and the application is back up and running, the next step is to identify which component produced the error. We recommend reproducing the issue in a test environment rather than in production, so that customers are not exposed to further failures.
We recommend first enabling CLRIE without any Instrumentation Methods to confirm there is no regression in CLRIE's steady-state behavior. This is especially important if CLRIE was recently updated, because the update may have introduced a regression or an incompatibility with the current application. We expect these issues to be rare, as CLRIE is tested on a variety of platforms and apps, and features are added incrementally with an emphasis on backwards compatibility.
Profilers, hosted by CLRIE as Instrumentation Methods, are the main feature providers and so should be the focal point of investigation. Instrumentation Methods are loaded by CLRIE via environment variables that have the prefix MicrosoftInstrumentationEngine_ConfigPath64_ or MicrosoftInstrumentationEngine_ConfigPath32_, so app owners can easily determine which Instrumentation Methods are running in their process.
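The prefix convention above lends itself to a quick check. The following Python sketch lists the Instrumentation Method configuration variables found in an environment. The function name is hypothetical, and the sample variable name and path used for illustration are made up.

```python
from typing import Dict, Mapping, Tuple

# Prefixes named in the text above, mapped to the bitness they apply to.
CONFIG_PREFIXES = {
    "MicrosoftInstrumentationEngine_ConfigPath64_": "64-bit",
    "MicrosoftInstrumentationEngine_ConfigPath32_": "32-bit",
}


def list_instrumentation_methods(
    env: Mapping[str, str],
) -> Dict[str, Tuple[str, str, str]]:
    """Map each matching variable name to (bitness, suffix, config path)."""
    found = {}
    for name, value in env.items():
        for prefix, bitness in CONFIG_PREFIXES.items():
            # Environment variable names are case-insensitive on Windows.
            if name.lower().startswith(prefix.lower()):
                found[name] = (bitness, name[len(prefix):], value)
    return found


if __name__ == "__main__":
    # Hypothetical sample; pass os.environ to inspect a real environment.
    sample = {
        "MicrosoftInstrumentationEngine_ConfigPath64_ExampleIM": r"C:\im\example.config",
        "PATH": r"C:\Windows",
    }
    for name, (bitness, suffix, path) in list_instrumentation_methods(sample).items():
        print(f"{suffix} ({bitness}): {path}")
```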
The user should enable each Instrumentation Method by itself and see if the application runs into the issue. This confirms whether the issue reproduces with that Instrumentation Method alone. It does not distinguish between a bug in the Instrumentation Method itself and a bug in the CLRIE behavior that the Instrumentation Method relies on.
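One way to organize this one-at-a-time testing is to derive a set of trial environments from the original one. The sketch below is hypothetical throughout: the function names are made up, and it assumes that the 64-bit and 32-bit variables sharing the same suffix belong to the same Instrumentation Method. The first trial keeps CLRIE enabled with no Instrumentation Methods, and each later trial re-enables exactly one.

```python
from typing import Dict, Iterator, Mapping, Optional, Tuple

CONFIG_PREFIXES = (
    "microsoftinstrumentationengine_configpath64_",
    "microsoftinstrumentationengine_configpath32_",
)


def im_suffix(name: str) -> Optional[str]:
    """Return the part of the name after the ConfigPath prefix, or None."""
    lowered = name.lower()
    for prefix in CONFIG_PREFIXES:
        if lowered.startswith(prefix):
            return name[len(prefix):]
    return None


def isolation_environments(
    env: Mapping[str, str],
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (label, environment) pairs, one Instrumentation Method at a time."""
    base = {k: v for k, v in env.items() if im_suffix(k) is None}
    # Baseline trial: CLRIE only, no Instrumentation Methods.
    yield "(no Instrumentation Methods)", dict(base)
    suffixes = sorted({s for s in map(im_suffix, env) if s is not None})
    for suffix in suffixes:
        trial = dict(base)
        # Assumption: variables with the same suffix describe the same IM.
        trial.update({k: v for k, v in env.items() if im_suffix(k) == suffix})
        yield suffix, trial
```

Each yielded environment can then be applied to the test deployment in turn, noting which trials reproduce the issue.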
The more complex scenarios occur when multiple Instrumentation Methods are running, as their behaviors may conflict without either party being aware. For these scenarios it is difficult to find a sole owner. We recommend starting a discussion by posting a GitHub issue with tags for each of the involved Instrumentation Method products. Private discussions, or discussions involving sensitive information, can move to email or other channels once the involved parties are established.
A quick way to see if any errors are occurring in the CLR Instrumentation Engine or Instrumentation Methods is to enable logging. Please see Environment Variables for detailed information on the allowed values to set. For Instrumentation Method authors, we recommend enabling the "Dumps" log level which shows the IL transformations.
Setting MicrosoftInstrumentationEngine_LogLevel will enable logging to the local computer Application log under the source name "Instrumentation Engine". Currently LogLevel only supports the Error level. All will default to Error. If this variable is missing or set to None, then no event logging will occur.
These error event logs can be seen in the eventlog.xml in Azure App Services or in Event Viewer on Windows.
If MicrosoftInstrumentationEngine_FileLogPath is set along with MicrosoftInstrumentationEngine_LogLevel, then logging to the filepath will occur. If the FilePath is set to a directory, then an automatically generated %TEMP%\ProfilerLog_[PID].txt file is used, where [PID] represents the process id and the %TEMP% variable is evaluated against the user account running the process.
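The rules above can be checked mechanically before restarting a process. The following Python sketch reports which kinds of logging a given environment would enable according to this section. The function name is hypothetical, and treating a LogLevel of None as also disabling file logging is an assumption of this sketch, not something the text states.

```python
from typing import Dict, Mapping

LOG_LEVEL = "MicrosoftInstrumentationEngine_LogLevel"
FILE_LOG_PATH = "MicrosoftInstrumentationEngine_FileLogPath"


def describe_logging(env: Mapping[str, str]) -> Dict[str, bool]:
    """Report which logging outputs the environment enables."""
    level = env.get(LOG_LEVEL, "").strip()
    path = env.get(FILE_LOG_PATH, "").strip()
    # Event logging requires LogLevel to be present and not "None".
    level_enabled = bool(level) and level.lower() != "none"
    return {
        "event_log": level_enabled,
        # File logging requires FileLogPath together with LogLevel.
        # Assumption: LogLevel=None turns file logging off as well.
        "file_log": level_enabled and bool(path),
    }
```

A common mistake this catches is setting only the file path: `describe_logging({FILE_LOG_PATH: r"C:\logs"})` reports both outputs as disabled because LogLevel is missing.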
CLRIE's logging infrastructure is outlined here. During ICorProfilerCallback::Initialize(), InstrumentationMethods can obtain an InstrumentationMethod-specific logger via IProfilerManager::GetLoggingInstance(), which prefixes log messages with [IM:GUID]. In addition, IMs can log at a different log level from CLRIE by setting an additional environment variable, allowing IMs to filter out CLRIE or other IM logs. See the ProfilerManagerForInstrumentationMethod section for more details.
Log parsing or an event collector can utilize these patterns to track InstrumentationMethod behaviors and potentially help diagnose issues.
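As an illustration of such log parsing, the sketch below groups log lines by the [IM:GUID] prefix. The function name and sample lines are hypothetical, and the exact textual shape of the GUID (standard 8-4-4-4-12 hex groups, with optional braces) is an assumption, since only the [IM:GUID] pattern is described here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: the GUID is written as 8-4-4-4-12 hex digits, optionally in braces.
IM_TAG = re.compile(
    r"\[IM:\s*\{?([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})\}?\]"
)


def group_by_instrumentation_method(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by Instrumentation Method GUID.

    Lines without an [IM:GUID] tag are collected under "untagged".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = IM_TAG.search(line)
        key = match.group(1).upper() if match else "untagged"
        groups[key].append(line.rstrip("\n"))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "[IM:{11111111-2222-3333-4444-555555555555}] instrumenting method",
        "engine message without a tag",
    ]
    for guid, entries in group_by_instrumentation_method(sample).items():
        print(guid, len(entries))
```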
When using a local build of the Instrumentation Engine or testing your InstrumentationMethod, you will need to disable signing validations and checks by setting the MicrosoftInstrumentationEngine_DisableCodeSignatureValidation environment variable to 1.
Since the CLR Instrumentation Engine initializes on process start, breakpoints in the initialization logic will only hit if you set the MicrosoftInstrumentationEngine_DebugWait environment variable to 1. This causes the Instrumentation Engine to wait on startup for a debugger attach event.
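For local debugging, the two variables can be applied to a single child process instead of the whole machine. The following Python sketch shows one way to do that. The helper names are hypothetical, and it assumes the ICorProfiler variables that load CLRIE are already present in the parent environment.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def debug_environment(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the local-debugging variables set."""
    env = dict(os.environ if base is None else base)
    # Skip signature checks so a locally built, unsigned engine or IM can load.
    env["MicrosoftInstrumentationEngine_DisableCodeSignatureValidation"] = "1"
    # Make the engine wait on startup until a debugger attaches.
    env["MicrosoftInstrumentationEngine_DebugWait"] = "1"
    return env


def launch_for_debugging(command: List[str]) -> subprocess.Popen:
    """Start the target process with the debugging variables applied."""
    process = subprocess.Popen(command, env=debug_environment())
    print(f"Started PID {process.pid}; attach a debugger to continue startup.")
    return process
```

Scoping the variables to one launched process avoids leaving signature validation disabled for other applications on the machine.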
The best place to start debugging is in ProfilerManager.cpp. ProfilerManager is CLRIE's ICorProfiler implementation which gets the first-chance callback during profiling. It controls setting up InstrumentationMethods and forwarding ICorProfiler callback events to them.
Setting a breakpoint inside CProfilerManager::Initialize() provides the earliest point where CLRIE is beginning to initialize in the process.
CProfilerManager::AddInstrumentationMethod() provides the logic for loading each InstrumentationMethod and calling Initialize() on them.
CLRIE's main purpose is to coordinate instrumentation, so callbacks such as CProfilerManager::JITCompilationStarted() and CProfilerManager::GetReJITParameters() are also good places to inspect.
To debug issues, it is important to have symbols (i.e. pdb files) for the modules/assemblies that the process is using. This will aid in investigating IL issues at the point of failure (e.g. InvalidProgramException).
If you are debugging with Visual Studio, make sure you have "Microsoft Symbol Server" enabled for symbol loading in Tools > Options > Debugging > Symbols > Symbol file locations. Symbols for both runtime modules such as clr.dll and operating system modules like kernel32.dll are available to be loaded.