DRIVERS-1571 Retry on different mongos when possible #1450

Merged
merged 34 commits into from
Aug 25, 2023
Changes from 20 commits
a2bb17b
1571
comandeo-mongo Aug 8, 2023
188e692
1571
comandeo-mongo Aug 11, 2023
d72723a
Add prose test for writes
comandeo-mongo Aug 16, 2023
1de68eb
Update source/retryable-reads/retryable-reads.rst
comandeo-mongo Aug 16, 2023
568f414
Update source/retryable-writes/retryable-writes.rst
comandeo-mongo Aug 16, 2023
82e65f7
Fix formatting
comandeo-mongo Aug 16, 2023
c5a4084
Update source/retryable-reads/retryable-reads.rst
comandeo-mongo Aug 17, 2023
a7a4928
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 17, 2023
fc6d772
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 17, 2023
1fd6891
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 17, 2023
ecf593c
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 17, 2023
b9f6907
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 17, 2023
f4b9259
Update source/server-selection/server-selection.rst
comandeo-mongo Aug 17, 2023
ae1c88f
Update source/server-selection/server-selection.rst
comandeo-mongo Aug 17, 2023
80b4ff7
Update source/retryable-writes/tests/README.rst
comandeo-mongo Aug 17, 2023
ac09b71
Update source/retryable-writes/tests/README.rst
comandeo-mongo Aug 17, 2023
d3ed143
Update source/retryable-writes/tests/README.rst
comandeo-mongo Aug 17, 2023
8817111
Update source/retryable-writes/retryable-writes.rst
comandeo-mongo Aug 17, 2023
28b5fb9
Update source/retryable-writes/tests/README.rst
comandeo-mongo Aug 17, 2023
1786155
Update source/retryable-writes/tests/README.rst
comandeo-mongo Aug 17, 2023
49fe649
Cleanup; improve prose tests
comandeo-mongo Aug 21, 2023
4a98a13
Clarify that deprioritized servers are only for sharded
comandeo-mongo Aug 21, 2023
4ab24d4
Update source/retryable-writes/tests/README.rst
comandeo-mongo Aug 23, 2023
30cc763
Merge remote-tracking branch 'upstream/master' into 1571-retry-on-dif…
comandeo-mongo Aug 24, 2023
fa36e20
Update source/retryable-reads/retryable-reads.rst
comandeo-mongo Aug 25, 2023
2698799
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 25, 2023
8289eb3
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 25, 2023
254051d
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 25, 2023
d9ed819
Update source/retryable-reads/tests/README.rst
comandeo-mongo Aug 25, 2023
aa2edbf
Update source/server-selection/server-selection.rst
comandeo-mongo Aug 25, 2023
1cb24ec
Update source/server-selection/server-selection.rst
comandeo-mongo Aug 25, 2023
348affb
Update source/retryable-writes/tests/README.rst
comandeo-mongo Aug 25, 2023
186cfd1
Update source/server-selection/server-selection.rst
comandeo-mongo Aug 25, 2023
df8e5b4
Update source/server-selection/server-selection.rst
comandeo-mongo Aug 25, 2023
21 changes: 16 additions & 5 deletions source/retryable-reads/retryable-reads.rst
@@ -268,10 +268,11 @@ selecting a server for a retry attempt.
3a. Selecting the server for retry
''''''''''''''''''''''''''''''''''

-If the driver cannot select a server for a retry attempt or the newly selected
-server does not support retryable reads, retrying is not possible and drivers
-MUST raise the previous retryable error. In both cases, the caller is able to
-infer that an attempt was made.
+The server on which the operation failed MUST be provided to the server selection
+mechanism as a deprioritized server. If the driver cannot select a server for
+a retry attempt or the newly selected server does not support retryable reads,
+retrying is not possible and drivers MUST raise the previous retryable error.
+In both cases, the caller is able to infer that an attempt was made.

3b. Sending an equivalent command for a retry attempt
'''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -357,9 +358,17 @@ and reflects the flow described above.
    */
   function executeRetryableRead(command, session) {
     Exception previousError = null;
+    Server previousServer = null;
     while true {
       try {
-        server = selectServer();
+        if (previousServer == null) {
+          server = selectServer();
+        } else {
+          // If a previous attempt was made, deprioritize the previous server
+          // where the command failed.
+          deprioritizedServers = [ previousServer ];
+          server = selectServer(deprioritizedServers);
+        }
       } catch (ServerSelectionException exception) {
         if (previousError == null) {
           // If this is the first attempt, propagate the exception.
@@ -416,9 +425,11 @@ and reflects the flow described above.
       } catch (NetworkException networkError) {
         updateTopologyDescriptionForNetworkError(server, networkError);
         previousError = networkError;
+        previousServer = server;
       } catch (NotWritablePrimaryException notPrimaryError) {
         updateTopologyDescriptionForNotWritablePrimaryError(server, notPrimaryError);
         previousError = notPrimaryError;
+        previousServer = server;
       } catch (DriverException error) {
         if ( previousError != null ) {
           throw previousError;
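
The retry flow in the pseudo-code above can be sketched in runnable form. This is only an illustration under stated assumptions, not the spec's reference implementation: `RetryableError`, `ServerSelectionError`, the `select_server` and `run_command` callables, and the `max_attempts` parameter are hypothetical stand-ins introduced here.

```python
class RetryableError(Exception):
    """Hypothetical stand-in for a network or not-writable-primary error."""


class ServerSelectionError(Exception):
    """Hypothetical stand-in for a server selection failure."""


def execute_retryable_read(run_command, select_server, max_attempts=2):
    """Run a read, deprioritizing the server that failed on the previous attempt.

    select_server(deprioritized) returns a server or raises ServerSelectionError.
    run_command(server) returns a result or raises RetryableError.
    max_attempts=2 assumes a single retry; it is an assumption of this sketch.
    """
    previous_error = None
    previous_server = None
    for _ in range(max_attempts):
        try:
            # On a retry, hand the failed server to selection as deprioritized.
            deprioritized = [] if previous_server is None else [previous_server]
            server = select_server(deprioritized)
        except ServerSelectionError:
            if previous_error is None:
                raise  # first attempt: propagate the selection error
            raise previous_error  # retry: surface the earlier retryable error
        try:
            return run_command(server)
        except RetryableError as error:
            previous_error = error
            previous_server = server
    raise previous_error
```

With two mongoses where the first one fails, the second attempt lands on the other mongos because the first is passed as deprioritized.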
50 changes: 50 additions & 0 deletions source/retryable-reads/tests/README.rst
@@ -232,6 +232,56 @@ This test requires MongoDB 4.2.9+ for ``blockConnection`` support in the failpoi

9. Disable the failpoint.

Retrying Reads in a Sharded Cluster
===================================

These tests will be used to ensure drivers properly retry reads on a different
mongos.

Retryable Reads Are Retried on a Different Mongos if One Is Available
---------------------------------------------------------------------

This test MUST be executed against a sharded cluster that has at least two
mongos instances.

1. Ensure that a test is run against a sharded cluster that has at least two
mongoses. If there are more than two mongoses in the cluster, pick two to
test against.

2. Create a client per mongos using the direct connection, and configure fail
points on each of the picked mongoses, so that each mongos raises
a retryable error once.

3. Create a client with ``retryReads=true`` that connects to the cluster,
providing the two selected mongoses as seeds.

4. Enable command monitoring, and execute a read command that is
supposed to fail on both mongoses.

5. Assert that there were failed command events from each mongos.

6. Disable the fail points.


Retryable Reads Are Retried on the Same Mongos if No Other Is Available
-----------------------------------------------------------------------

1. Ensure that a test is run against a sharded cluster. If there are multiple
mongoses in the cluster, pick one to test against.

2. Create a client that connects to the mongos using the direct connection,
and configure a fail point so that the mongos raises a retryable error once.

3. Create a client with ``retryReads=true`` that connects to the cluster,
providing the selected mongos as the seed.

4. Enable command monitoring, and execute a read command that is
supposed to fail.

5. Assert that there was a failed command event and a successful command event.

6. Disable the fail point.
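
The assertions in step 5 of both tests can be sketched as small predicates over captured command monitoring events. This is a minimal sketch, not any driver's API: `CommandEvent` and both helper names are hypothetical, and requiring exactly one failed and one succeeded event in the second helper is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandEvent:
    """Hypothetical record of a command monitoring event."""
    kind: str      # "failed" or "succeeded"
    address: str   # host:port of the mongos that handled the command


def failed_on_each_mongos(events, mongos_addresses):
    """True if every listed mongos produced at least one failed event."""
    failed = {e.address for e in events if e.kind == "failed"}
    return set(mongos_addresses) <= failed


def failed_then_recovered(events):
    """True if there is exactly one failed and one succeeded event (assumed)."""
    kinds = [e.kind for e in events]
    return kinds.count("failed") == 1 and kinds.count("succeeded") == 1
```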


Changelog
=========
19 changes: 12 additions & 7 deletions source/retryable-writes/retryable-writes.rst
@@ -395,11 +395,12 @@ of the following conditions is reached:
<../client-side-operations-timeout/client-side-operations-timeout.rst#retryability>`__.
- CSOT is not enabled and one retry was attempted.

-For each retry attempt, drivers MUST select a writable server. If the driver
-cannot select a server for a retry attempt or the selected server does not
-support retryable writes, retrying is not possible and drivers MUST raise the
-retryable error from the previous attempt. In both cases, the caller is able
-to infer that an attempt was made.
+For each retry attempt, drivers MUST select a writable server. The server on which
+the operation failed MUST be provided to the server selection mechanism as
+a deprioritized server. If the driver cannot select a server for a retry attempt
+or the selected server does not support retryable writes, retrying is not
+possible and drivers MUST raise the retryable error from the previous attempt.
+In both cases, the caller is able to infer that an attempt was made.

If a retry attempt also fails, drivers MUST update their topology according to
the SDAM spec (see: `Error Handling`_). If an error would not allow the caller
@@ -492,11 +493,15 @@ The above rules are implemented in the following pseudo-code:
}
}

-/* If we cannot select a writable server, do not proceed with retrying and
+/*
+ * We try to select a server that is not the one that failed by passing the
+ * failed server as a deprioritized server.
+ * If we cannot select a writable server, do not proceed with retrying and
  * throw the previous error. The caller can then infer that an attempt was
  * made and failed. */
 try {
-  server = selectServer("writable");
+  deprioritizedServers = [ server ];
+  server = selectServer("writable", deprioritizedServers);
} catch (Exception ignoredError) {
throw previousError;
}
44 changes: 44 additions & 0 deletions source/retryable-writes/tests/README.rst
@@ -456,6 +456,50 @@ and sharded clusters.
mode: "off",
})

#. Test that in a sharded cluster writes are retried on a different mongos if
one available
Comment on lines +459 to +460
Contributor

Suggested change
-#. Test that in a sharded cluster writes are retried on a different mongos if
-   one available
+#. Test that in a sharded cluster writes are retried on a different mongos if
+   one is available

This test MUST be executed against a sharded cluster that has at least two
mongos instances.

1. Ensure that a test is run against a sharded cluster that has at least two
mongoses. If there are more than two mongoses in the cluster, pick two to
test against.

2. Create a client per mongos using the direct connection, and configure fail
points on each of the picked mongoses, so that each mongos raises
a retryable error once.
Contributor

Should we include that the failpoint should be configured with a "RetryableWriteError" label?

Author

Good point! Added errorLabels: ['RetryableWriteError'] to the failpoint specification above.

3. Create a client with ``retryWrites=true`` that connects to the cluster,
providing the two selected mongoses as seeds.

4. Enable command monitoring, and execute a write command that is
supposed to fail on both mongoses.

5. Assert that there were failed command events from each mongos.

6. Disable the fail points.

#. Test that in a sharded cluster writes are retried on the same mongos if no
   other is available

This test MUST be executed against a sharded cluster.

1. Ensure that a test is run against a sharded cluster. If there are multiple
mongoses in the cluster, pick one to test against.

2. Create a client that connects to the mongos using the direct connection,
and configure a fail point so that the mongos raises a retryable error once.
Contributor

Should we include that the failpoint should be configured with a "RetryableWriteError" label?

Author

Done.

3. Create a client with ``retryWrites=true`` that connects to the cluster,
providing the selected mongos as the seed.

4. Enable command monitoring, and execute a write command that is
supposed to fail.

5. Assert that there was a failed command event and a successful command event.

6. Disable the fail point.
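
For illustration, the fail point discussed in the review thread could be expressed as a command document like the one built below. This is a hedged sketch: the helper names are hypothetical, and the choice of `insert` as the failing command and error code 6 (HostUnreachable) are assumptions of this example rather than values stated in the diff shown here. Only the `RetryableWriteError` label and the `mode: "off"` form come from the text above.

```python
def fail_command_once(command_name, error_code=6, error_labels=()):
    """Build a failCommand fail point that fires once for command_name.

    error_code=6 (HostUnreachable) is an assumed retryable error code.
    """
    data = {"failCommands": [command_name], "errorCode": error_code}
    if error_labels:
        data["errorLabels"] = list(error_labels)
    return {
        "configureFailPoint": "failCommand",
        "mode": {"times": 1},
        "data": data,
    }


def disable_fail_command():
    """Build the document that turns the fail point off again."""
    return {"configureFailPoint": "failCommand", "mode": "off"}


# Fail point for the write tests, carrying the label requested in review.
write_fail_point = fail_command_once(
    "insert", error_labels=["RetryableWriteError"]
)
```

Each document would be sent as an admin command through the direct-connection client for the mongos under test.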

Changelog
=========

8 changes: 6 additions & 2 deletions source/server-selection/server-selection.rst
@@ -843,7 +843,9 @@ For multi-threaded clients, the server selection algorithm is as follows:
2. If the topology wire version is invalid, raise an error and log a
`"Server selection failed" message`_.

-3. Find suitable servers by topology type and operation type
+3. Find suitable servers by topology type and operation type. In the case of
+   sharded clusters, a list of deprioritized servers may be provided;
+   these servers should be selected only if there are no other suitable servers.

4. Filter the suitable servers by calling the optional, application-provided server
selector.
@@ -915,7 +917,9 @@ as follows:
5. If the topology wire version is invalid, raise an error and log a
`"Server selection failed" message`_.

-6. Find suitable servers by topology type and operation type
+6. Find suitable servers by topology type and operation type. In the case of
+   sharded clusters, a list of deprioritized servers may be provided;
+   these servers should be selected only if there are no other suitable servers.

7. Filter the suitable servers by calling the optional, application-provided
server selector.
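
The deprioritization rule added to the "find suitable servers" step can be sketched as a filter with a fallback. This is a simplified illustration, not the full selection algorithm (it omits the latency window, the application-provided selector, and logging); `ServerDescription`, `select_server`, and the `topology_type` strings are hypothetical names used only here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerDescription:
    """Hypothetical minimal server record."""
    address: str


def select_server(candidates, deprioritized=(), topology_type="Sharded"):
    """Pick a server, using deprioritized ones only when nothing else fits."""
    pool = list(candidates)
    if not pool:
        raise LookupError("no suitable servers")
    # Deprioritization applies only to sharded topologies.
    if topology_type == "Sharded" and deprioritized:
        excluded = {s.address for s in deprioritized}
        preferred = [s for s in pool if s.address not in excluded]
        if preferred:
            pool = preferred  # otherwise fall back to the full candidate list
    return random.choice(pool)
```

When the only suitable mongos is also the deprioritized one, the fallback keeps it selectable, which matches the "same mongos if no other is available" tests.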