
PYTHON-4782 Fix deadlock and blocking behavior in _ACondition.wait #1875

Merged: 12 commits into mongodb:master on Sep 30, 2024

Conversation

ShaneHarvey (Member) commented Sep 23, 2024:

PYTHON-4782 Fix deadlock and blocking behavior in _ACondition.wait

TODO:

ShaneHarvey (Member, Author):

Hmm, these changes are causing a few test regressions. I believe it's due to the new server selection behavior introduced by the topology lock.wait() change. Previously, server selection would block the loop entirely for 500ms on each iteration where a server could not be selected. Now, we correctly wait without blocking the loop.
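
For readers following along, here is a simplified sketch of the difference (illustrative only, not the driver's actual code; cond stands in for the topology condition):

import asyncio
import threading

# Old behavior (simplified): a synchronous wait blocks the entire event loop
# for up to 500ms on every iteration where no suitable server is found.
async def select_server_old(cond: threading.Condition) -> None:
    with cond:
        cond.wait(0.5)  # nothing else on the loop can run during this call

# New behavior (simplified): awaiting the condition yields control, so server
# monitors and other tasks keep running and can notify the waiter as soon as
# the topology changes.
async def select_server_new(cond: asyncio.Condition) -> None:
    async with cond:
        try:
            await asyncio.wait_for(cond.wait(), 0.5)
        except asyncio.TimeoutError:
            pass  # re-check the topology and loop again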

ShaneHarvey marked this pull request as ready for review September 26, 2024 18:45
ShaneHarvey (Member, Author):

Requesting review now; I'll keep investigating the test failures.

try:
    await asyncio.wait_for(fut, timeout)
    return True
except asyncio.TimeoutError:
    return False  # Return false on timeout for sync pool compat.
Contributor:

Do we still need to acquire the lock if we time out here?

ShaneHarvey (Member, Author):

Yes, the API contract for wait() says you MUST hold the lock before calling and you MUST still hold the lock when it returns, even on timeout.
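
For readers unfamiliar with it, this mirrors the stdlib condition-variable contract; a minimal usage sketch with asyncio.Condition (illustrative only):

import asyncio

async def demo() -> None:
    cond = asyncio.Condition()
    state = {"ready": False}

    async def consumer() -> None:
        async with cond:              # must hold the lock before calling wait()
            while not state["ready"]:
                await cond.wait()     # lock is released while waiting...
            # ...and is held again here (even after a timeout or cancellation),
            # so reading the shared state is safe.
            print("ready")

    async def producer() -> None:
        async with cond:
            state["ready"] = True
            cond.notify_all()

    await asyncio.gather(consumer(), producer())

asyncio.run(demo())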

Contributor:

Got it, so you always have to acquire the lock if you call wait and don't raise an error?

Contributor:

Can you add a comment linking to the API contract here? It would be good to make it more understandable for readers unfamiliar with it.

ShaneHarvey (Member, Author):
There's already this comment:

# Must re-acquire lock even if wait is cancelled.

loop = asyncio.get_running_loop()
fut = loop.create_future()
self._waiters.append((loop, fut))
self.release()
Contributor:

Is this just to ensure we don't hold the lock while waiting for it?

ShaneHarvey (Member, Author):

Yes, without releasing the lock this code would deadlock, since nothing else would be able to acquire it and notify the waiter.
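
A contrived sketch of that failure mode (made-up names, not the PR's code):

import asyncio

waiters: list[asyncio.Future] = []
lock = asyncio.Lock()

async def broken_wait() -> None:
    await lock.acquire()
    fut = asyncio.get_running_loop().create_future()
    waiters.append(fut)
    # BUG: the lock is NOT released before awaiting, so the notifier below
    # can never acquire it to resolve the future -- both tasks hang.
    await fut
    lock.release()

async def notifier() -> None:
    async with lock:  # never succeeds: broken_wait still holds the lock
        for fut in waiters:
            fut.set_result(True)

async def main() -> None:
    try:
        await asyncio.wait_for(asyncio.gather(broken_wait(), notifier()), 1.0)
    except asyncio.TimeoutError:
        print("deadlock: neither task made progress")

asyncio.run(main())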

err = None
while True:
    try:
        await self.acquire()
Contributor:

Can this possibly loop forever?

ShaneHarvey (Member, Author):

Yes, but only if something else holds the lock forever.
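
For context, the acquire loop has the same shape as the re-acquire step in the stdlib's asyncio.Condition.wait: it only spins while another task holds the lock, and it defers cancellation until the lock is held again (rough sketch, not the PR's exact code):

import asyncio

async def reacquire(lock: asyncio.Lock) -> None:
    # Re-acquire the lock after waiting, deferring cancellation until the
    # lock is actually held again (mirrors the finally block of the stdlib
    # asyncio.Condition.wait).
    cancelled = False
    while True:
        try:
            await lock.acquire()
            break  # exits as soon as the lock becomes available
        except asyncio.CancelledError:
            cancelled = True
    if cancelled:
        raise asyncio.CancelledError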

Contributor:

And we're asserting that this new code can't do that, got it!

NoahStapp (Contributor) previously approved these changes Sep 26, 2024:

Code looks good, approved contingent on tests!

blink1073 (Member):

The failures seem relevant: test.asynchronous.test_cursor.TestCursor.test_to_list_csot_applied

blink1073 (Member):

It is the same failure across several tasks that all previously passed.

ShaneHarvey (Member, Author):

Yes, I'm looking into test_to_list_csot_applied today.

ShaneHarvey (Member, Author) commented Sep 27, 2024:

I'm running a patch with both #1870 and this PR to see if the failures will work themselves out: https://spruce.mongodb.com/version/66f712658ea18700077002df/tasks

Edit: this patch also has some failures. I pushed a fix for test_reconnect, although I am a little concerned by these failures. One explanation is that an AsyncMongoClient takes much longer to process SDAM updates (e.g. on init or when rediscovering a server after an error).

ShaneHarvey (Member, Author):

Bad timing for docker to start failing...

  ERROR: failed to solve: docker:stable: failed to resolve source metadata for docker.io/library/docker:stable: failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://auth.docker.io/token?scope=repository%3Alibrary%2Fdocker%3Apull&service=registry.docker.io: 401 Unauthorized

ShaneHarvey (Member, Author):

New patch with both #1870 and this PR: https://spruce.mongodb.com/version/66f73f178ea187000770b298/

ShaneHarvey (Member, Author):

This is ready for another look.

NoahStapp (Contributor) left a comment:

Increased timeouts make sense, but I'm still seeing test failures -- do you think they're unrelated?

blink1073 (Member):

The Windows failures seem relevant, since they're in test.asynchronous.test_locks.TestConditionStdlib.

ShaneHarvey (Member, Author):

I fixed the remaining test failures.

blink1073 (Member) left a comment:

LGTM!

ShaneHarvey merged commit 821811e into mongodb:master on Sep 30, 2024
29 of 30 checks passed
blink1073 pushed a commit that referenced this pull request Oct 1, 2024