Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant timeout exceptions #959

Closed
Astral100 opened this issue Oct 28, 2023 · 7 comments
Closed

Constant timeout exceptions #959

Astral100 opened this issue Oct 28, 2023 · 7 comments
Assignees
Milestone

Comments

@Astral100
Copy link

Hello,

I have 6 SQL trigger functions running on the same DB (different tables), and I constantly get timeout exceptions in the logs.

I wonder what could be the reason?

Here is an example of one such log:

2023-10-28T09:20:04Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:20:05Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:20:30Z   [Warning]   Possible thread pool starvation detected.
2023-10-28T09:20:30Z   [Information]   Executing StatusCodeResult, setting HTTP status code 429
2023-10-28T09:20:30Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:20:44Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:20:44Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:20:52Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(1669580986);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '5108c00480c44150' AND UserTableID = 1669580986;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '5108c00480c44150' AND UserTableID = 1669580986;
            
2023-10-28T09:20:52Z   [Error]   Failed to check for changes in table 'dbo.AltairUserContractFields' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:52Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(67531324);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '71ff6c58b99126a6' AND UserTableID = 67531324;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '71ff6c58b99126a6' AND UserTableID = 67531324;
            
2023-10-28T09:20:53Z   [Error]   Failed to check for changes in table 'dbo.AgencyFlags' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:53Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(512720879);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '8df7bc48feee4230' AND UserTableID = 512720879;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '8df7bc48feee4230' AND UserTableID = 512720879;
            
2023-10-28T09:20:53Z   [Error]   Failed to check for changes in table 'dbo.AltairAlternateCalendar' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:54Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(2007014231);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = 'f2f4cb674326d195' AND UserTableID = 2007014231;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = 'f2f4cb674326d195' AND UserTableID = 2007014231;
            
2023-10-28T09:20:54Z   [Error]   Failed to check for changes in table 'dbo.AltairAgencyRules' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:54Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(345768289);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '9d7cfa3d330ff016' AND UserTableID = 345768289;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '9d7cfa3d330ff016' AND UserTableID = 345768289;
            
2023-10-28T09:20:54Z   [Error]   Failed to check for changes in table 'dbo.AltairUsers' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:54Z   [Error]   Failed to check for changes in table 'dbo.AltairUserContractFields' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:54Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(2007014231);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = 'f2f4cb674326d195' AND UserTableID = 2007014231;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = 'f2f4cb674326d195' AND UserTableID = 2007014231;
            
2023-10-28T09:20:54Z   [Error]   Failed to check for changes in table 'dbo.AltairAgencyRules' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:54Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(67531324);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '71ff6c58b99126a6' AND UserTableID = 67531324;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '71ff6c58b99126a6' AND UserTableID = 67531324;
            
2023-10-28T09:20:54Z   [Error]   Failed to check for changes in table 'dbo.AgencyFlags' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:20:57Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:20:57Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:21:10Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:21:11Z   [Warning]   Possible thread pool starvation detected.
2023-10-28T09:21:11Z   [Information]   Executing StatusCodeResult, setting HTTP status code 429
2023-10-28T09:21:23Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:21:23Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(67531324);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '71ff6c58b99126a6' AND UserTableID = 67531324;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '71ff6c58b99126a6' AND UserTableID = 67531324;
            
2023-10-28T09:21:23Z   [Error]   Failed to check for changes in table 'dbo.AgencyFlags' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:21:24Z   [Information]   Executing StatusCodeResult, setting HTTP status code 200
2023-10-28T09:21:24Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(512720879);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '8df7bc48feee4230' AND UserTableID = 512720879;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '8df7bc48feee4230' AND UserTableID = 512720879;
            
2023-10-28T09:21:24Z   [Error]   Failed to check for changes in table 'dbo.AltairAlternateCalendar' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:21:26Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(2007014231);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = 'f2f4cb674326d195' AND UserTableID = 2007014231;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = 'f2f4cb674326d195' AND UserTableID = 2007014231;
            
2023-10-28T09:21:26Z   [Error]   Failed to check for changes in table 'dbo.AltairAgencyRules' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:21:26Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                DECLARE @min_valid_version bigint;
                SET @min_valid_version = CHANGE_TRACKING_MIN_VALID_VERSION(345768289);

                DECLARE @last_sync_version bigint;
                SELECT @last_sync_version = LastSyncVersion
                FROM [az_func].[GlobalState]
                WHERE UserFunctionID = '9d7cfa3d330ff016' AND UserTableID = 345768289;

                IF @last_sync_version < @min_valid_version
                    UPDATE [az_func].[GlobalState]
                    SET LastSyncVersion = @min_valid_version
                    WHERE UserFunctionID = '9d7cfa3d330ff016' AND UserTableID = 345768289;
            
2023-10-28T09:21:26Z   [Error]   Failed to check for changes in table 'dbo.AltairUsers' due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
2023-10-28T09:21:26Z   [Error]   Exception executing query. Message=Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Query=

After some time it recovers and keeps working and then starts failing again.

I wonder what could be the reason?

I have 6 SQL trigger functions running at the moment. Usually only 1 or 2 from them is actually used in a single point of time.

@MaddyDev
Copy link
Contributor

Could you please provide us with the following information: Do you have any custom timeout defined on the server (default is 15 sec) or are there any custom configuration(s) to consider and what type of database are you targeting. It would also help to know if this is a heavily used database.

Copy link

We need more info to debug your particular issue. If you could attach your logs to the issue, it would help us fix the issue much faster.
Note that these instructions are for local development only.

  1. Set the default logging level to Debug and enable logging to a file. To do this ensure the following entries are in your host.json. (instructions)
{
  "logging": {
    "fileLoggingMode": "always",
    "logLevel": {
      "default": "Debug"
    }
  }
}
  1. Restart your function and reproduce your issue

  2. Get the log file and attach it to this issue. By default this will be in %TMP%/LogFiles/Application/Functions/Host.

NOTE Debug logging will include information such as Database and table names, if you do not wish to include this information you can either redact it from the logs before attaching or let us know and we will provide a way to send logs directly to us.

@Astral100
Copy link
Author

Astral100 commented Nov 10, 2023

Hello,

Unfortunately there has not been a high load in our database recently, so I couldn't replicate the issue, however we have been investigating this issue with Azure SQL Server support team from Microsoft and we have some guesses as to why this might happen.

After thinking about it some more I believe the problem may be caused by Function App scaling.
I have 6 SQL Trigger Functions (for different tables) running on the Premium Function App, which will scale up and down depending on the load.
We normally have up to 2-5 instances of the same Function App running. When scale out happens I believe we have several of the same type of functions all trying to obtain exclusive lock on the same table. Which is why deadlocks may be happening.
I also remember when we were trying the same function app on the dedicated App Service plan (single machine per function app), the deadlocks were not happening.

Here is the situation explained in detail:

  1. We have a relatively powerful SQL Server DB that can scale up to 40Vcores.
  2. This DB had a substantial load for about 2 weeks, which you can see here:
    MicrosoftTeams-image (3)
  3. This load was triggering SQL Server Trigger functions linked to our database.
  4. We have a single Function App with 6 SQL Trigger Functions (each trigger pointing to its own table)
    image
  5. This Function App is hosted on a Premium plan EP1, with maximum burst set to 20
    image
  6. This plan is also shared between 2 other Function Apps (3 in total)
  7. This implies that our Function App is hosted on a relatively low end machine with moderate load that causes it to constantly scale up and down
  8. Looking at the number of instances we can see that we always had around 2 - 6 instances running at the same time:
    image
  9. From debug and error queries seen in the logs in the 1st post we can see that SQL Trigger function is trying to obtain an exclusive lock on a table with a timeout set to 30 seconds.
  10. My guess is that we have multiple functions all trying to obtain an exclusive lock on the same table, and possibly initial locks are not being released fast enough before new locks try to obtain locks on the same table. I am not sure what exactly is happening with those exclusive locks and why some of them are not released even after 30 seconds (which causes other locks to timeout), however there definitely is some kind of deadlock situation happening.
  11. Also timeout error sometime happens during scale down

Also on a slightly different note, I've also seen the following error occurring from time to time as well, which is also possibly related:

Exception executing query. Message=Incorrect syntax near ';'.
Query=
                DECLARE @result int;
                EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
                            @LockMode = 'Exclusive',
                            @LockTimeout = 30000
                IF @result < 0
                BEGIN
                    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
                END;

                UPDATE [az_func].[Leases_AltairUserContractFields]
                SET _az_func_LeaseExpirationTime = DATEADD(second, 60, SYSDATETIME())
                WHERE ;

and this right after:
Failed to renew leases due to exception: Microsoft.Data.SqlClient.SqlException. Exception message: Incorrect syntax near ';'.

As well as this possibly related error:
Singleton lock renewal failed for blob 'functionapp-timelock-data-api-qa/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2023-11-09T22:23:39.12Z (24083 milliseconds ago) with a duration of 8 milliseconds. The lease period was 15000 milliseconds.

If you can investigate and advice on what to do it would be great!

@Charles-Gagnon Charles-Gagnon self-assigned this Nov 30, 2023
@Charles-Gagnon
Copy link
Contributor

Charles-Gagnon commented Dec 1, 2023

Hey @Astral100 - I'm currently looking into this and have identified some improvements we can make that should help with the issues you've encountered related to the timeouts.

In the meantime though, one thing that I'll need some help with is understanding what the queries our extension is running are getting deadlocked on. To do this you'll need to create an event session and then wait for another deadlock to occur. The steps to do so are as follows (from https://learn.microsoft.com/en-us/azure/azure-sql/database/analyze-prevent-deadlocks)

  1. Create and start the event session (I have it going to a ring buffer since that's easiest to set up, but you're free to use an event file if you wish) :
CREATE EVENT SESSION [deadlocks] ON DATABASE 
ADD EVENT sqlserver.database_xml_deadlock_report
ADD TARGET package0.ring_buffer 
WITH (STARTUP_STATE=ON, MAX_MEMORY=4 MB)
GO

ALTER EVENT SESSION [deadlocks] ON DATABASE
    STATE = START;
GO
  1. Wait for deadlock to occur (you can set up an alert if you wish)
  2. Run this query to collect the deadlock information
DECLARE @tracename sysname = N'deadlocks';

WITH ring_buffer AS (
    SELECT CAST(target_data AS XML) as rb
    FROM sys.dm_xe_database_sessions AS s 
    JOIN sys.dm_xe_database_session_targets AS t 
        ON CAST(t.event_session_address AS BINARY(8)) = CAST(s.address AS BINARY(8))
    WHERE s.name = @tracename and
    t.target_name = N'ring_buffer'
), dx AS (
    SELECT 
        dxdr.evtdata.query('.') as deadlock_xml_deadlock_report
    FROM ring_buffer
    CROSS APPLY rb.nodes('/RingBufferTarget/event[@name=''database_xml_deadlock_report'']') AS dxdr(evtdata)
) 
SELECT 
    d.query('/event/data[@name=''deadlock_cycle_id'']/value').value('(/value)[1]', 'int') AS [deadlock_cycle_id],
    d.value('(/event/@timestamp)[1]', 'DateTime2') AS [deadlock_timestamp],
    d.query('/event/data[@name=''database_name'']/value').value('(/value)[1]', 'nvarchar(256)') AS [database_name],
    d.query('/event/data[@name=''xml_report'']/value/deadlock') AS deadlock_xml,
    LTRIM(RTRIM(REPLACE(REPLACE(d.value('.', 'nvarchar(2000)'),CHAR(10),' '),CHAR(13),' '))) as query_text
FROM dx
CROSS APPLY deadlock_xml_deadlock_report.nodes('(/event/data/value/deadlock/process-list/process/inputbuf)') AS ib(d)
ORDER BY [deadlock_timestamp] DESC;
GO
  1. Look through the deadlocks and find the deadlock_cycle_id pairs where at least one of the query_text fields is a query coming from our extension (they should all start with the sp_getapplock @Resource = '_az_func_Trigger', query)

Then for 3 (to start with) of the deadlocks to start with respond with the query_text for each of the queries that was deadlocked. If any of the queries are internal and you'd rather not share publicly that's fine - just indicate that and we'll work with what we have. You can go through CSS too if you wish, we have an internal ticket as well that is tracking your case there.

That'll be enough for now, and I'll update as we make changes for the other issues as those happen!

@Astral100
Copy link
Author

Hello @Charles-Gagnon, I have tried to run above queries during the last time when deadlocks were occurring, but I was not getting anything back (just empty results). I tried running those queries on both main and master databases.

We did however get the deadlock graph analysed with the Microsoft support engineer, and it seems that these 2 queries were causing deadlocks:

Query 1 (SQL Trigger query):

DECLARE @result int;
EXEC @result = sp_getapplock @Resource = '_az_func_Trigger',
            @LockMode = 'Exclusive',
            @LockTimeout = 30000
IF @result &lt; 0
BEGIN
    RAISERROR('Unable to acquire exclusive lock on _az_func_Trigger. Result = %d', 16, 1, @result)
END;
DECLARE @current_last_sync_version bigint;
SELECT @current_last_sync_version = LastSyncVersion
FROM [az_func].[GlobalState]
WHERE UserFunctionID = '5108c00480c44150' AND UserTableID = 1669580986;
DECLARE @unprocessed_changes bigint;
SELECT @unprocessed_changes = COUNT(*) FROM (
    SELECT c.SYS_CHANGE_VERSION
    FROM CHANGETABLE(CHANGES [dbo].[AltairUserContractFields], @current_last_sync_version) AS c
    LEFT OUTER JOIN [az_func].[Leases_AltairUserContractFields]
............................................................................

Query 2 (Our query that updates a few records in the table):

(@p10 int,@p0 nvarchar(4000),@p1 int,@p2 nvarchar(4000),@p3 datetime2(7),@p4 datetime2(7),@p5 datetime2(7),@p6 nvarchar(4000),@p7 nvarchar(4000),@p8 nvarchar(4000),@p9 datetime2(7),@p21 int,@p11 nvarchar(4000),@p12 int,@p13 nvarchar(4000),@p14 datetime2(7),@p15 datetime2(7),@p16 datetime2(7),@p17 nvarchar(4000),@p18 nvarchar(4000),@p19 nvarchar(4000),@p20 datetime2(7),@p32 int,@p22 nvarchar(4000),@p23 int,@p24 nvarchar(4000),@p25 datetime2(7),@p26 datetime2(7),@p27 datetime2(7),@p28 nvarchar(4000),@p29 nvarchar(4000),@p30 nvarchar(4000),@p31 datetime2(7),@p43 int,@p33 nvarchar(4000),@p34 int,@p35 nvarchar(4000),@p36 datetime2(7),@p37 datetime2(7),@p38 datetime2(7),@p39 nvarchar(4000),@p40 nvarchar(4000),@p41 nvarchar(4000),@p42 datetime2(7),@p54 int,@p44 nvarchar(4000),@p45 int,@p46 nvarchar(4000),@p47 datetime2(7),@p48 datetime2(7),@p49 datetime2(7),@p50 nvarchar(4000),@p51 nvarchar(4000),@p52 nvarchar(4000),@p53 datetime2(7),@p65 int,@p55 nvarchar(4000),@p56 int,@p57 nvarchar(4000),@p58 datetime2(7),@p59 datetime2(7),@p60 datetime2(7),@p61 nvarchar(4000),@p62 nvarchar(4000),@p63 nvarchar(4000),@p64 datetime2(7),@p76 int,@p66 nvarchar(4000),@p67 int,@p68 nvarchar(4000),@p69 datetime2(7),@p70 datetime2(7),@p71 datetime2(7),@p72 nvarchar(4000),@p73 nvarchar(4000),@p74 nvarchar(4000),@p75 datetime2(7),@p87 int,@p77 nvarchar(4000),@p78 int,@p79 nvarchar(4000),@p80 datetime2(7),@p81 datetime2(7),@p82 datetime2(7),@p83 nvarchar(4000),@p84 nvarchar(4000),@p85 nvarchar(4000),@p86 datetime2(7),@p98 int,@p88 nvarchar(4000),@p89 int,@p90 nvarchar(4000),@p91 datetime2(7),@p92 datetime2(7),@p93 datetime2(7),@p94 nvarchar(4000),@p95 nvarchar(4000),@p96 nvarchar(4000),@p97 datetime2(7),@p109 int,@p99 nvarchar(4000),@p100 int,@p101 nvarchar(4000),@p102 datetime2(7),@p103 datetime2(7),@p104 datetime2(7),@p105 nvarchar(4000),@p106 nvarchar(4000),@p107 nvarchar(4000),@p108 datetime2(7),@p120 int,@p110 nvarchar(4000),@p111 int,@p112 nvarchar(4000),@p113 datetime2(7),@p114 datetime2(7),@p115 datetime2(7),@p116 nvarchar(4000),@p117 nvarchar(4000),@p118 nvarchar(4000),@p119 datetime2(7),@p131 int,@p121 nvarchar(4000),@p122 int,@p123 nvarchar(4000),@p124 datetime2(7),@p125 datetime2(7),@p126 datetime2(7),@p127 nvarchar(4000),@p128 nvarchar(4000),@p129 nvarchar(4000),@p130 datetime2(7),@p142 int,@p132 nvarchar(4000),@p133 int,@p134 nvarchar(4000),@p135 datetime2(7),@p136 datetime2(7),@p137 datetime2(7),@p138 nvarchar(4000),@p139 nvarchar(4000),@p140 nvarchar(4000),@p141 datetime2(7),@p153 int,@p143 nvarchar(4000),@p144 int,@p145 nvarchar(4000),@p146 datetime2(7),@p147 datetime2(7),@p148 datetime2(7),@p149 nvarchar(4000),@p150 nvarchar(4000),@p151 nvarchar(4000),@p152 datetime2(7),@p164 int,@p154 nvarchar(4000),@p155 int,@p156 nvarchar(4000),@p157 datetime2(7),@p158 datetime2(7),@p159 datetime2(7),@p160 nvarchar(4000),@p161 nvarchar(4000),@p162 nvarchar(4000),@p163 datetime2(7),@p175 int,@p165 nvarchar(4000),@p166 int,@p167 nvarchar(4000),@p168 datetime2(7),@p169 datetime2(7),@p170 datetime2(7),@p171 nvarchar(4000),@p172 nvarchar(4000),@p173 nvarchar(4000),@p174 datetime2(7),@p186 int,@p176 nvarchar(4000),@p177 int,@p178 nvarchar(4000),@p179 datetime2(7),@p180 datetime2(7),@p181 datetime2(7),@p182 nvarchar(4000),@p183 nvarchar(4000),@p184 nvarchar(4000),@p185 datetime2(7),@p197 int,@p187 nvarchar(4000),@p188 int,@p189 nvarchar(4000),@p190 datetime2(7),@p191 datetime2(7),@p192 datetime2(7),@p193 nvarchar(4000),@p194 nvarchar(4000),@p195 nvarchar(4000),@p196 datetime2(7),@p208 int,@p198 nvarchar(4000),@p199 int,@p200 nvarchar(4000),@p201 datetime2(7),@p202 datetime2(7),@p203 datetime2(7),@p204 nvarchar(4000),@p205 nvarchar(4000),@p206 nvarchar(4000),@p207 datetime2(7),@p219 int,@p209 nvarchar(4000),@p210 int,@p211 nvarchar(4000),@p212 datetime2(7),@p213 datetime2(7),@p214 datetime2(7),@p215 nvarchar(4000),@p216 nvarchar(4000),@p217 nvarchar(4000),@p218 datetime2(7),@p230 int,@p220 nvarchar(4000),@p221 int,@p222 nvarchar(4000),@p223 datetime2(7),@p224 datetime2(7),@p225 datetime2(7),@p226 nvarchar(4000),@p227 nvarchar(4000),@p228 nvarchar(4000),@p229 datetime2(7),@p241 int,@p231 nvarchar(4000),@p232 int,@p233 nvarchar(4000),@p234 datetime2(7),@p235 datetime2(7),@p236 datetime2(7),@p237 nvarchar(4000),@p238 nvarchar(4000),@p239 nvarchar(4000),@p240 datetime2(7),@p252 int,@p242 nvarchar(4000),@p243 int,@p244 nvarchar(4000),@p245 datetime2(7),@p246 datetime2(7),@p247 datetime2(7),@p248 nvarchar(4000),@p249 nvarchar(4000),@p250 nvarchar(4000),@p251 datetime2(7),@p263 int,@p253 nvarchar(4000),@p254 int,@p255 nvarchar(4000),@p256 datetime2(7),@p257 datetime2(7),@p258 datetime2(7),@p259 nvarchar(4000),@p260 nvarchar(4000),@p261 nvarchar(4000),@p262 datetime2(7),@p274 int,@p264 nvarchar(4000),@p265 int,@p266 nvarchar(4000),@p267 datetime2(7),@p268 datetime2(7),@p269 datetime2(7),@p270 nvarchar(4000),@p271 nvarchar(4000),@p272 nvarchar(4000),@p273 datetime2(7),@p275 nvarchar(4000),@p276 int,@p277 nvarchar(4000),@p278 datetime2(7),@p279 datetime2(7),@p280 datetime2(7),@p281 nvarchar(4000),@p282 nvarchar(4000),@p283 nvarchar(4000),@p284 datetime2(7))
SET NOCOUNT ON;  
UPDATE [AltairUserContractFields] SET [AltairCode] = @p0, [AltairUserId] = @p1, [CreatedBy] = @p2, [CreatedOn] = @p3, [EffectiveFrom] = @p4, [EffectiveTo] = @p5, [FieldName] = @p6, [FieldValue] = @p7, [UpdatedBy] = @p8, [UpdatedOn] = @p9  WHERE [id] = @p10;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p11, [AltairUserId] = @p12, [CreatedBy] = @p13, [CreatedOn] = @p14, [EffectiveFrom] = @p15, [EffectiveTo] = @p16, [FieldName] = @p17, [FieldValue] = @p18, [UpdatedBy] = @p19, [UpdatedOn] = @p20  WHERE [id] = @p21;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p22, [AltairUserId] = @p23, [CreatedBy] = @p24, [CreatedOn] = @p25, [EffectiveFrom] = @p26, [EffectiveTo] = @p27, [FieldName] = @p28, [FieldValue] = @p29, [UpdatedBy] = @p30, [UpdatedOn] = @p31  WHERE [id] = @p32;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p33, [AltairUserId] = @p34, [CreatedBy] = @p35, [CreatedOn] = @p36, [EffectiveFrom] = @p37, [EffectiveTo] = @p38, [FieldName] = @p39, [FieldValue] = @p40, [UpdatedBy] = @p41, [UpdatedOn] = @p42  WHERE [id] = @p43;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p44, [AltairUserId] = @p45, [CreatedBy] = @p46, [CreatedOn] = @p47, [EffectiveFrom] = @p48, [EffectiveTo] = @p49, [FieldName] = @p50, [FieldValue] = @p51, [UpdatedBy] = @p52, [UpdatedOn] = @p53  WHERE [id] = @p54;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p55, [AltairUserId] = @p56, [CreatedBy] = @p57, [CreatedOn] = @p58, [EffectiveFrom] = @p59, [EffectiveTo] = @p60, [FieldName] = @p61, [FieldValue] = @p62, [UpdatedBy] = @p63, [UpdatedOn] = @p64  WHERE [id] = @p65;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p66, [AltairUserId] = @p67, [CreatedBy] = @p68, [CreatedOn] = @p69, [EffectiveFrom] = @p70, [EffectiveTo] = @p71, [FieldName] = @p72, [FieldValue] = @p73, [UpdatedBy] = @p74, [UpdatedOn] = @p75  WHERE [id] = @p76;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p77, [AltairUserId] = @p78, [CreatedBy] = @p79, [CreatedOn] = @p80, [EffectiveFrom] = @p81, [EffectiveTo] = @p82, [FieldName] = @p83, [FieldValue] = @p84, [UpdatedBy] = @p85, [UpdatedOn] = @p86  WHERE [id] = @p87;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p88, [AltairUserId] = @p89, [CreatedBy] = @p90, [CreatedOn] = @p91, [EffectiveFrom] = @p92, [EffectiveTo] = @p93, [FieldName] = @p94, [FieldValue] = @p95, [UpdatedBy] = @p96, [UpdatedOn] = @p97  WHERE [id] = @p98;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p99, [AltairUserId] = @p100, [CreatedBy] = @p101, [CreatedOn] = @p102, [EffectiveFrom] = @p103, [EffectiveTo] = @p104, [FieldName] = @p105, [FieldValue] = @p106, [UpdatedBy] = @p107, [UpdatedOn] = @p108  WHERE [id] = @p109;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p110, [AltairUserId] = @p111, [CreatedBy] = @p112, [CreatedOn] = @p113, [EffectiveFrom] = @p114, [EffectiveTo] = @p115, [FieldName] = @p116, [FieldValue] = @p117, [UpdatedBy] = @p118, [UpdatedOn] = @p119  WHERE [id] = @p120;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p121, [AltairUserId] = @p122, [CreatedBy] = @p123, [CreatedOn] = @p124, [EffectiveFrom] = @p125, [EffectiveTo] = @p126, [FieldName] = @p127, [FieldValue] = @p128, [UpdatedBy] = @p129, [UpdatedOn] = @p130  WHERE [id] = @p131;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p132, [AltairUserId] = @p133, [CreatedBy] = @p134, [CreatedOn] = @p135, [EffectiveFrom] = @p136, [EffectiveTo] = @p137, [FieldName] = @p138, [FieldValue] = @p139, [UpdatedBy] = @p140, [UpdatedOn] = @p141  WHERE [id] = @p142;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p143, [AltairUserId] = @p144, [CreatedBy] = @p145, [CreatedOn] = @p146, [EffectiveFrom] = @p147, [EffectiveTo] = @p148, [FieldName] = @p149, [FieldValue] = @p150, [UpdatedBy] = @p151, [UpdatedOn] = @p152  WHERE [id] = @p153;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p154, [AltairUserId] = @p155, [CreatedBy] = @p156, [CreatedOn] = @p157, [EffectiveFrom] = @p158, [EffectiveTo] = @p159, [FieldName] = @p160, [FieldValue] = @p161, [UpdatedBy] = @p162, [UpdatedOn] = @p163  WHERE [id] = @p164;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p165, [AltairUserId] = @p166, [CreatedBy] = @p167, [CreatedOn] = @p168, [EffectiveFrom] = @p169, [EffectiveTo] = @p170, [FieldName] = @p171, [FieldValue] = @p172, [UpdatedBy] = @p173, [UpdatedOn] = @p174  WHERE [id] = @p175;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p176, [AltairUserId] = @p177, [CreatedBy] = @p178, [CreatedOn] = @p179, [EffectiveFrom] = @p180, [EffectiveTo] = @p181, [FieldName] = @p182, [FieldValue] = @p183, [UpdatedBy] = @p184, [UpdatedOn] = @p185  WHERE [id] = @p186;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p187, [AltairUserId] = @p188, [CreatedBy] = @p189, [CreatedOn] = @p190, [EffectiveFrom] = @p191, [EffectiveTo] = @p192, [FieldName] = @p193, [FieldValue] = @p194, [UpdatedBy] = @p195, [UpdatedOn] = @p196  WHERE [id] = @p197;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p198, [AltairUserId] = @p199, [CreatedBy] = @p200, [CreatedOn] = @p201, [EffectiveFrom] = @p202, [EffectiveTo] = @p203, [FieldName] = @p204, [FieldValue] = @p205, [UpdatedBy] = @p206, [UpdatedOn] = @p207  WHERE [id] = @p208;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p209, [AltairUserId] = @p210, [CreatedBy] = @p211, [CreatedOn] = @p212, [EffectiveFrom] = @p213, [EffectiveTo] = @p214, [FieldName] = @p215, [FieldValue] = @p216, [UpdatedBy] = @p217, [UpdatedOn] = @p218  WHERE [id] = @p219;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p220, [AltairUserId] = @p221, [CreatedBy] = @p222, [CreatedOn] = @p223, [EffectiveFrom] = @p224, [EffectiveTo] = @p225, [FieldName] = @p226, [FieldValue] = @p227, [UpdatedBy] = @p228, [UpdatedOn] = @p229  WHERE [id] = @p230;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p231, [AltairUserId] = @p232, [CreatedBy] = @p233, [CreatedOn] = @p234, [EffectiveFrom] = @p235, [EffectiveTo] = @p236, [FieldName] = @p237, [FieldValue] = @p238, [UpdatedBy] = @p239, [UpdatedOn] = @p240  WHERE [id] = @p241;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p242, [AltairUserId] = @p243, [CreatedBy] = @p244, [CreatedOn] = @p245, [EffectiveFrom] = @p246, [EffectiveTo] = @p247, [FieldName] = @p248, [FieldValue] = @p249, [UpdatedBy] = @p250, [UpdatedOn] = @p251  WHERE [id] = @p252;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p253, [AltairUserId] = @p254, [CreatedBy] = @p255, [CreatedOn] = @p256, [EffectiveFrom] = @p257, [EffectiveTo] = @p258, [FieldName] = @p259, [FieldValue] = @p260, [UpdatedBy] = @p261, [UpdatedOn] = @p262  WHERE [id] = @p263;  
SELECT @@ROWCOUNT;    
UPDATE [AltairUserContractFields] SET [AltairCode] = @p264, [AltairUserId] = @p265, [CreatedBy] = @p266, [CreatedOn] = @p267, [EffectiveFrom] = @p268, [EffectiveTo] = @p269, [FieldName] = @p270, [FieldValue] = @p271, [UpdatedBy] = @p272, [UpdatedOn] = @p273  WHERE [id] = @p274;  
SELECT @@ROWCOUNT;    
INSERT INTO [AltairUserContractFields] ([AltairCode], [AltairUserId], [CreatedBy], [CreatedOn], [EffectiveFrom], [EffectiveTo], [FieldName], [FieldValue], [UpdatedBy], [UpdatedOn])  VALUES (@p275, @p276, @p277, @p278, @p279, @p280, @p281, @p282, @p283, @p284);  
SELECT [id]  FROM [AltairUserContractFields]  WHERE @@ROWCOUNT = 1 AND [id] = scope_identity();

Both of these queries have been fired several times in parallel as well.

There is a more detailed analysis in the email performed by the Microsoft engineer that I asked to be shared with your team as well.

Hopefully this can give an idea on what is causing it.

@Charles-Gagnon
Copy link
Contributor

Thanks for adding the info! I'm working with the CSS & SQL folks to figure out what exactly is the problem here, so I'll continue to go through that channel for requesting more information. I'll continue to update this thread with anything generally relevant or the final results of our investigation once we have them!

@MaddyDev MaddyDev added this to the Post GA milestone Jan 31, 2024
@Charles-Gagnon
Copy link
Contributor

Final result of our investigation :

  • The deadlocks that were observed were caused by a high MaxDOP setting. Suggestion was to reduce that and continue monitoring for issues
  • The other issues are things that I've opened separate issues for and we will be following up with fixing those going forward

#984
#983
#982

I'll close this issue then since we have the above issues for tracking, and if you see anything else (especially deadlocks again) let us know and we can investigate some more!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants