From 780d296b904f24fe086adad71d46ef8bdec18f69 Mon Sep 17 00:00:00 2001 From: Steven Silvester Date: Thu, 30 May 2024 10:07:47 -0500 Subject: [PATCH] DRIVERS-2870 Fix content of retryable-writes-test readme --- source/retryable-writes/tests/README.md | 204 ++++++++++++++++++++++-- 1 file changed, 192 insertions(+), 12 deletions(-) diff --git a/source/retryable-writes/tests/README.md b/source/retryable-writes/tests/README.md index 749d29ee54..cd8320ce03 100644 --- a/source/retryable-writes/tests/README.md +++ b/source/retryable-writes/tests/README.md @@ -71,7 +71,7 @@ Drivers should also assert that command documents are properly constructed with on whether the write operation is supported. [Command Logging and Monitoring](../../command-logging-and-monitoring/command-logging-and-monitoring.rst) may be used to check for the presence of a `txnNumber` field in the command document. Note that command documents may always include an -`lsid` field per the [Driver Session](../../sessions/driver-sessions.md) specification. +`lsid` field per the [Driver Session](../../sessions/driver-sessions.rst) specification. These tests may be run against both a replica set and shard cluster. @@ -106,17 +106,197 @@ Drivers should test that transactions IDs are always included in commands for su The following tests ensure that retryable writes work properly with replica sets and sharded clusters. -1. Test that retryable writes raise an exception when using the MMAPv1 storage engine. For this test, execute a write - operation, such as `insertOne`, which should generate an exception. Assert that the error message is the replacement - error message: +- Test that retryable writes raise an exception when using the MMAPv1 storage engine. For this test, execute a write + operation, such as `insertOne`, which should generate an exception. Assert that the error message is the replacement + error message: - ``` - This MongoDB deployment does not support retryable writes. 
Please add - retryWrites=false to your connection string. - ``` + ``` + This MongoDB deployment does not support retryable writes. Please add + retryWrites=false to your connection string. + ``` - and the error code is 20. + and the error code is 20. - [!NOTE] - storage engine in use MAY skip this test for sharded clusters, since `mongos` does not report this information in its - `serverStatus` response. + > [!NOTE] + > Drivers that rely on `serverStatus` to determine the storage engine in use MAY skip this test for sharded clusters, + > since `mongos` does not report this information in its `serverStatus` response. + +- Test that drivers properly retry after encountering PoolClearedErrors. This test MUST be implemented by any driver + that implements the CMAP specification. + + This test requires MongoDB 4.3.4+ for both the `errorLabels` and `blockConnection` fail point options. + + - Create a client with maxPoolSize=1 and retryWrites=true. If testing against a sharded deployment, be sure to connect + to only a single mongos. + + - Enable the following failpoint: + + ```javascript + { + configureFailPoint: "failCommand", + mode: { times: 1 }, + data: { + failCommands: ["insert"], + errorCode: 91, + blockConnection: true, + blockTimeMS: 1000, + errorLabels: ["RetryableWriteError"] + } + } + ``` + + - Start two threads and attempt to perform an `insertOne` simultaneously on both. + + - Verify that both `insertOne` attempts succeed. + + - Via CMAP monitoring, assert that the first check out succeeds. + + - Via CMAP monitoring, assert that a PoolClearedEvent is then emitted. + + - Via CMAP monitoring, assert that the second check out then fails due to a connection error. + + - Via Command Monitoring, assert that exactly three `insert` CommandStartedEvents were observed in total. + + - Disable the failpoint. + +- Test that drivers return the original error after encountering a WriteConcernError with a RetryableWriteError label. 
+ This test MUST + + - be implemented by any driver that implements the Command Monitoring specification, + + - only run against replica sets as mongos does not propagate the NoWritesPerformed label to the drivers. + + - be run against server versions 6.0 and above. + + Additionally, this test requires drivers to set a fail point after an `insertOne` operation but before the + subsequent retry. Drivers that are unable to set a failCommand after the CommandSucceededEvent SHOULD use mocking or + write a unit test to cover the same sequence of events. + + - Create a client with `retryWrites=true`. + + - Configure a fail point with error code `91` (ShutdownInProgress): + + ```javascript + { + configureFailPoint: "failCommand", + mode: {times: 1}, + data: { + failCommands: ["insert"], + errorLabels: ["RetryableWriteError"], + writeConcernError: { code: 91 } + } + } + ``` + + - Via the command monitoring CommandSucceededEvent, configure a fail point with error code `10107` + (NotWritablePrimary) and a NoWritesPerformed label: + + ```javascript + { + configureFailPoint: "failCommand", + mode: {times: 1}, + data: { + failCommands: ["insert"], + errorCode: 10107, + errorLabels: ["RetryableWriteError", "NoWritesPerformed"] + } + } + ``` + + Drivers SHOULD only configure the `10107` fail point command if the succeeded event is for the `91` error + configured in step 2. + + - Attempt an `insertOne` operation on any record for any database and collection. For the resulting error, assert that + the associated error code is `91`. + + - Disable the fail point: + + ```javascript + { + configureFailPoint: "failCommand", + mode: "off" + } + ``` + +- Test that in a sharded cluster writes are retried on a different mongos when one is available. This test MUST be + executed against a sharded cluster that has at least two mongos instances, supports `retryWrites=true`, has enabled + the `configureFailPoint` command, and supports the `errorLabels` field (MongoDB 4.3.1+). 
+ + Note: this test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior + intended to be tested) from "retry on a different mongos due to normal SDAM randomized suitable server selection". + Verify relevant code paths are correctly executed by the tests using external means such as logging, a debugger, a code + coverage tool, etc. + + - Create two clients `s0` and `s1` that each connect to a single mongos from the sharded cluster. They must not + connect to the same mongos. + + - Configure the following fail point for both `s0` and `s1`: + + ```javascript + { + configureFailPoint: "failCommand", + mode: { times: 1 }, + data: { + failCommands: ["insert"], + errorCode: 6, + errorLabels: ["RetryableWriteError"] + } + } + ``` + + - Create a client `client` with `retryWrites=true` that connects to the cluster using the same two mongoses as `s0` + and `s1`. + + - Enable failed command event monitoring for `client`. + + - Execute an `insert` command with `client`. Assert that the command failed. + + - Assert that two failed command events occurred. Assert that the failed command events occurred on different + mongoses. + + - Disable the fail points on both `s0` and `s1`. + +## Changelog + +- 2024-05-30: Migrated from reStructuredText to Markdown. + +- 2024-02-27: Convert legacy retryable writes tests to unified format. + +- 2024-02-21: Update prose tests 4 and 5 to work around SDAM behavior preventing\ + execution of deprioritization code + paths. + +- 2024-01-05: Fix typo in prose test title. + +- 2024-01-03: Note server version requirements for fail point options and revise\ + tests to specify the `errorLabels` + option at the top-level instead of within `writeConcernError`. + +- 2023-08-26: Add prose tests for retrying in a sharded cluster. + +- 2022-08-30: Add prose test verifying correct error handling for errors with\ + the NoWritesPerformed label, which is to + return the original error. 
+ +- 2022-04-22: Clarifications to `serverless` and `useMultipleMongoses`. + +- 2021-08-27: Add `serverless` to `runOn`. Clarify behavior of\ + `useMultipleMongoses` for `LoadBalanced` topologies. + +- 2021-04-23: Add `load-balanced` to test topology requirements. + +- 2021-03-24: Add prose test verifying `PoolClearedErrors` are retried. + +- 2019-10-21: Add `errorLabelsContain` and `errorLabelsOmit` fields to\ + `result` + +- 2019-08-07: Add Prose Tests section + +- 2019-06-07: Mention $merge stage for aggregate alongside $out + +- 2019-03-01: Add top-level `runOn` field to denote server version and/or\ + topology requirements for the + test file. Removes the `minServerVersion` and `maxServerVersion` top-level fields, which are now expressed within + `runOn` elements. + + Add test-level `useMultipleMongoses` field.
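The fail point documents in the prose tests above all share one shape, so a test harness might build them with a small helper. The following is a minimal sketch under stated assumptions, not part of the specification or of any driver API: `failCommandOn`, `failCommandOff`, and `failedOnDifferentMongoses` are hypothetical names, and the failed command event objects are assumed to expose an `address` string identifying the mongos.

```javascript
// Hypothetical helpers for the prose tests; none of these names come from a driver API.

// Build a `configureFailPoint` command document that fails the next `times` matching commands.
function failCommandOn(data, times = 1) {
  return { configureFailPoint: "failCommand", mode: { times }, data };
}

// Build the command document that disables the fail point.
function failCommandOff() {
  return { configureFailPoint: "failCommand", mode: "off" };
}

// Check the sharded-cluster assertion: exactly two failed command events,
// observed on two different mongoses. `address` is an assumed event field.
function failedOnDifferentMongoses(failedEvents) {
  return failedEvents.length === 2 && failedEvents[0].address !== failedEvents[1].address;
}

// The fail point configured on both `s0` and `s1` in the sharded-cluster test.
const mongosFailPoint = failCommandOn({
  failCommands: ["insert"],
  errorCode: 6,
  errorLabels: ["RetryableWriteError"],
});
```

A harness would send `mongosFailPoint` as an admin command through each single-mongos client, then send `failCommandOff()` during cleanup. How the command is sent depends on the driver and is left out here.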