+
#### Monitoring
diff --git a/source/logging/logging.md b/source/logging/logging.md
index 63e229367f..f8fdee7148 100644
--- a/source/logging/logging.md
+++ b/source/logging/logging.md
@@ -134,7 +134,7 @@ produce.
| Component Name | Specification(s) | Environment Variable |
| --------------- | -------------------------------------------------------------------------------------------------------------- | ------------------------------ |
| command | [Command Logging and Monitoring](../command-logging-and-monitoring/command-logging-and-monitoring.md) | `MONGODB_LOG_COMMAND` |
-| topology | [Server Discovery and Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst) | `MONGODB_LOG_TOPOLOGY` |
+| topology | [Server Discovery and Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring.md) | `MONGODB_LOG_TOPOLOGY` |
| serverSelection | [Server Selection](../server-selection/server-selection.md) | `MONGODB_LOG_SERVER_SELECTION` |
| connection | [Connection Monitoring and Pooling](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md) | `MONGODB_LOG_CONNECTION` |
diff --git a/source/ocsp-support/tests/README.rst b/source/ocsp-support/tests/README.rst
index cbb8b6b9c4..263ec7a1f1 100644
--- a/source/ocsp-support/tests/README.rst
+++ b/source/ocsp-support/tests/README.rst
@@ -14,7 +14,7 @@ drivers can use to prove their conformance to the OCSP Support
specification. These tests MUST BE implemented by all drivers.
Additional YAML and JSON tests have also been added to the `URI
-Options Tests <../../uri-options/tests/README.rst>`__. Specifically,
+Options Tests <../../uri-options/tests/README.md>`__. Specifically,
the `TLS Options Test <../../uri-options/tests/tls-options.yml>`__ has
been updated with additional tests for the new URI options
``tlsDisableOCSPEndpointCheck`` and ``tlsDisableCertificateRevocationCheck``.
diff --git a/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.rst b/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.rst
index b255c9ad68..7691629d33 100644
--- a/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.rst
+++ b/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.rst
@@ -26,7 +26,7 @@ specification's definition of monitoring a set of mongos servers in a Sharded
TopologyType.
.. _`Initial DNS Seedlist Discovery`: ../initial-dns-seedlist-discovery/initial-dns-seedlist-discovery.md
-.. _`Server Discovery and Monitoring`: ../server-discovery-and-monitoring/server-discovery-and-monitoring.rst
+.. _`Server Discovery and Monitoring`: ../server-discovery-and-monitoring/server-discovery-and-monitoring.md
META
====
@@ -144,7 +144,7 @@ Single-Threaded Drivers
The rescan MUST happen **before** scanning all servers as part of the normal
scanning_ functionality, but only if *rescanSRVIntervalMS* has passed.
-.. _scanning: https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#scanning
+.. _scanning: ../server-discovery-and-monitoring/server-discovery-and-monitoring.md#scanning
Test Plan
=========
diff --git a/source/requirements.txt b/source/requirements.txt
new file mode 100644
index 0000000000..b854bca214
--- /dev/null
+++ b/source/requirements.txt
@@ -0,0 +1 @@
+mkdocs
\ No newline at end of file
diff --git a/source/retryable-reads/retryable-reads.md b/source/retryable-reads/retryable-reads.md
index 280b6f65a6..1ffd168a93 100644
--- a/source/retryable-reads/retryable-reads.md
+++ b/source/retryable-reads/retryable-reads.md
@@ -84,7 +84,7 @@ the defined name but MAY deviate to comply with their existing conventions.
Drivers MUST verify server eligibility by ensuring that `maxWireVersion` is at least 6 because retryable reads require a
MongoDB 3.6 standalone, replica set or shard cluster; the MongoDB 3.6 server wire version is 6, as defined in the
-[Server Wire version and Feature List specification](../wireversion-featurelist.rst).
+[Server Wire version and Feature List specification](../wireversion-featurelist.md).
The minimum server version is 3.6 because
@@ -202,7 +202,7 @@ Drivers MUST only attempt to retry a read command if
If the driver decides to allow retry and the previous attempt of a retryable read command encounters a retryable error,
the driver MUST update its topology according to the Server Discovery and Monitoring spec (see
-[SDAM: Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst#error-handling)) and
+[SDAM: Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling)) and
capture this original retryable error. Drivers should then proceed with selecting a server for a retry attempt.
###### 3a. Selecting the server for retry
@@ -247,7 +247,7 @@ and the timeout has not yet expired, then the Driver MUST jump back to step 2b a
attempts.
Otherwise, drivers MUST update their topology according to the SDAM spec (see
-[SDAM: Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst#error-handling)). If an
+[SDAM: Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling)). If an
error would not allow the caller to infer that an attempt was made (e.g. connection pool exception originating from the
driver), the previous error should be raised. If a retry failed due to another retryable error or some other error
originating from the server, that error should be raised instead as the caller can infer that an attempt was made and
@@ -520,8 +520,8 @@ No.
[This is in contrast to the answer supplied in the retryable writes specification.](../retryable-writes/retryable-writes.md#can-drivers-resend-the-same-wire-protocol-message-on-retry-attempts)
However, when retryable writes were implemented, no driver actually chose to resend the same wire protocol message.
Today, if a driver attempted to resend the same wire protocol message, this could violate
-[the rules for gossiping $clusterTime](../sessions/driver-sessions.rst#gossipping-the-cluster-time): specifically
-[the rule that a driver must send the highest seen $clusterTime](../sessions/driver-sessions.rst#sending-the-highest-seen-cluster-time).
+[the rules for gossiping $clusterTime](../sessions/driver-sessions.md#gossipping-the-cluster-time): specifically
+[the rule that a driver must send the highest seen $clusterTime](../sessions/driver-sessions.md#sending-the-highest-seen-cluster-time).
Additionally, there would be a behavioral difference between a driver resending the same wire protocol message and one
that does not. For example, a driver that creates a new wire protocol message could exhibit the following
diff --git a/source/retryable-writes/retryable-writes.md b/source/retryable-writes/retryable-writes.md
index 974295302e..1adfe21e18 100644
--- a/source/retryable-writes/retryable-writes.md
+++ b/source/retryable-writes/retryable-writes.md
@@ -32,12 +32,12 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH
The transaction ID identifies the transaction as part of which the command is running. In a write
command where the client has requested retryable behavior, it is expressed by the top-level `lsid` and `txnNumber`
fields. The `lsid` component is the corresponding server session ID, which is a BSON value defined in the
-[Driver Session](../sessions/driver-sessions.rst) specification. The `txnNumber` component is a monotonically increasing
+[Driver Session](../sessions/driver-sessions.md) specification. The `txnNumber` component is a monotonically increasing
(per server session), positive 64-bit integer.
**ClientSession**\
Driver object representing a client session, which is defined in the
-[Driver Session](../sessions/driver-sessions.rst) specification. This object is always associated with a server session;
+[Driver Session](../sessions/driver-sessions.md) specification. This object is always associated with a server session;
however, drivers will pool server sessions so that creating a ClientSession will not always entail creation of a new
server session. The name of this object MAY vary across drivers.
@@ -45,7 +45,7 @@ server session. The name of this object MAY vary across drivers.
An error is considered retryable if it has a RetryableWriteError label in its top-level
"errorLabels" field. See [Determining Retryable Errors](#determining-retryable-errors) for more information.
-Additional terms may be defined in the [Driver Session](../sessions/driver-sessions.rst) specification.
+Additional terms may be defined in the [Driver Session](../sessions/driver-sessions.md) specification.
### Naming Deviations
@@ -109,10 +109,11 @@ Supported single-statement write operations include `insertOne()`, `updateOne()`
`findOneAndDelete()`, `findOneAndReplace()`, and `findOneAndUpdate()`.
Supported multi-statement write operations include `insertMany()` and `bulkWrite()`. The ordered option may be `true` or
-`false`. In the case of `bulkWrite()`, `UpdateMany` or `DeleteMany` operations within the `requests` parameter may make
-some write commands ineligible for retryability. Drivers MUST evaluate eligibility for each write command sent as part
-of the `bulkWrite()` (after order and batch splitting) individually. Drivers MUST NOT alter existing logic for order and
-batch splitting in an attempt to maximize retryability for operations within a bulk write.
+`false`. For both the collection-level and client-level `bulkWrite()` methods, a bulk write batch is only retryable if
+it does not contain any `multi: true` writes (i.e. `UpdateMany` and `DeleteMany`). Drivers MUST evaluate eligibility for
+each write command sent as part of the `bulkWrite()` (after order and batch splitting) individually. Drivers MUST NOT
+alter existing logic for order and batch splitting in an attempt to maximize retryability for operations within a bulk
+write.
These methods above are defined in the [CRUD](../crud/crud.md) specification.
@@ -215,7 +216,7 @@ The RetryableWriteError label might be added to an error in a variety of ways:
- the `writeConcernError.code` field in a mongos response
The criteria for retryable errors is similar to the discussion in the SDAM spec's section on
- [Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst#error-handling), but includes
+ [Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling), but includes
additional error codes. See [What do the additional error codes mean?](#what-do-the-additional-error-codes-mean) for
the reasoning behind these additional errors.
@@ -264,8 +265,8 @@ enabled.
When constructing a supported write command that will be executed within a MongoClient where retryable writes have been
enabled, drivers MUST increment the transaction number for the corresponding server session and include the server
session ID and transaction number in top-level `lsid` and `txnNumber` fields, respectively. `lsid` is a BSON value
-(discussed in the [Driver Session](../sessions/driver-sessions.rst) specification). `txnNumber` MUST be a positive
-64-bit integer (BSON type 0x12).
+(discussed in the [Driver Session](../sessions/driver-sessions.md) specification). `txnNumber` MUST be a positive 64-bit
+integer (BSON type 0x12).
The following example illustrates a possible write command for an `updateOne()` operation:
@@ -299,8 +300,8 @@ MUST NOT attempt to retry a write command on any other error.
If the first attempt of a write command including a transaction ID encounters a retryable error, the driver MUST update
its topology according to the SDAM spec (see:
-[Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst#error-handling)) and capture
-this original retryable error.
+[Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling)) and capture this
+original retryable error.
Drivers MUST then retry the operation as many times as necessary until any one of the following conditions is reached:
@@ -318,7 +319,7 @@ retrying is not possible and drivers MUST raise the retryable error from the pre
is able to infer that an attempt was made.
If a retry attempt also fails, drivers MUST update their topology according to the SDAM spec (see:
-[Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst#error-handling)). If an error
+[Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling)). If an error
would not allow the caller to infer that an attempt was made (e.g. connection pool exception originating from the
driver) or the error is labeled "NoWritesPerformed", the error from the previous attempt should be raised. If all server
errors are labeled "NoWritesPerformed", then the first error should be raised.
@@ -448,12 +449,12 @@ function executeRetryableWrite(command, session) {
```
`handleError` in the above pseudocode refers to the function defined in the
-[Error handling pseudocode](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst#error-handling-pseudocode)
+[Error handling pseudocode](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling-pseudocode)
section of the SDAM specification.
When retrying a write command, drivers MUST resend the command with the same transaction ID. Drivers MUST NOT resend the
original wire protocol message if doing so would violate rules for
-[gossipping the cluster time](../sessions/driver-sessions.rst#gossipping-the-cluster-time) (see:
+[gossipping the cluster time](../sessions/driver-sessions.md#gossipping-the-cluster-time) (see:
[Can drivers resend the same wire protocol message on retry attempts?](#can-drivers-resend-the-same-wire-protocol-message-on-retry-attempts)).
In the case of a multi-statement write operation split across multiple write commands, a failed retry attempt will also
@@ -512,7 +513,7 @@ driver API needs to be extended to support this behavior.
## Design Rationale
-The design of this specification piggy-backs that of the [Driver Session](../sessions/driver-sessions.rst) specification
+The design of this specification piggy-backs that of the [Driver Session](../sessions/driver-sessions.md) specification
in that it modifies the driver API as little as possible to introduce the concept of at-most-once semantics and
retryable behavior for write operations. A transaction ID will be included in all supported write commands executed
within the scope of a MongoClient where retryable writes have been enabled.
@@ -556,7 +557,7 @@ The spec concerns itself with retrying write operations that encounter a retryab
network error or a response indicating that the node is no longer a primary). A retryable error may be classified as
either a transient error (e.g. dropped connection, replica set failover) or persistent outage. In the case of a
transient error, the driver will mark the server as "unknown" per the
-[SDAM](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst) spec. A subsequent retry attempt will
+[SDAM](../server-discovery-and-monitoring/server-discovery-and-monitoring.md) spec. A subsequent retry attempt will
allow the driver to rediscover the primary within the designated server selection timeout period (30 seconds by
default). If server selection times out during this retry attempt, we can reasonably assume that there is a persistent
outage. In the case of a persistent outage, multiple retry attempts are fruitless and would waste time. See
@@ -634,7 +635,7 @@ Since retry attempts entail sending the same command and transaction ID to the s
the same wire protocol message in order to avoid constructing a new message and computing its checksum. The server will
not complain if it receives two messages with the same `requestId`, as the field is only used for logging and populating
the `responseTo` field in its replies to the client. That said, re-using a wire protocol message might violate rules for
-[gossipping the cluster time](../sessions/driver-sessions.rst#gossipping-the-cluster-time) and might also have
+[gossipping the cluster time](../sessions/driver-sessions.md#gossipping-the-cluster-time) and might also have
implications for [Command Monitoring](#command-monitoring), since the original write command and its retry attempt may
report the same `requestId`.
@@ -673,6 +674,8 @@ retryWrites is not true would be inconsistent with the server and potentially co
## Changelog
+- 2024-05-08: Add guidance for client-level `bulkWrite()` retryability.
+
- 2024-05-02: Migrated from reStructuredText to Markdown.
- 2024-04-29: Fix the link to the Driver Sessions spec.
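
The bulk write eligibility rule added in the retryable-writes.md hunk above (a batch is only retryable if it contains no `multi: true` writes, checked per batch after splitting) can be sketched roughly as follows. This is a minimal illustration, not driver code: the `is_batch_retryable` helper and the single-key model dictionaries are assumptions made for the example, loosely modeled on the operation names used in the spec text.

```python
from typing import Any, Mapping, Sequence

# Operation names treated as `multi: true` writes (taken from the spec text).
MULTI_WRITE_MODELS = {"updateMany", "deleteMany"}


def is_batch_retryable(batch: Sequence[Mapping[str, Any]]) -> bool:
    """Return True if no write model in this batch is a multi: true write.

    Each model is assumed to be a single-key mapping such as
    {"insertOne": {...}}; this shape is hypothetical and only for illustration.
    """
    return not any(name in MULTI_WRITE_MODELS for model in batch for name in model)


# Two batches as they might look after order and batch splitting.
batches = [
    [
        {"insertOne": {"document": {"_id": 4, "x": 44}}},
        {"updateOne": {"filter": {"_id": 1}, "update": {"$inc": {"x": 1}}}},
    ],
    [
        {"deleteMany": {"filter": {"x": {"$gt": 20}}}},
    ],
]

# Eligibility is evaluated for each batch individually.
eligibility = [is_batch_retryable(batch) for batch in batches]
```

Because the check runs per batch, one ineligible batch does not change how the other batches are treated, and the splitting logic itself is left untouched.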
diff --git a/source/retryable-writes/tests/README.md b/source/retryable-writes/tests/README.md
index 151b26181f..e883ca368d 100644
--- a/source/retryable-writes/tests/README.md
+++ b/source/retryable-writes/tests/README.md
@@ -1,7 +1,5 @@
# Retryable Write Tests
-______________________________________________________________________
-
## Introduction
The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of
@@ -71,7 +69,7 @@ Drivers should also assert that command documents are properly constructed with
on whether the write operation is supported.
[Command Logging and Monitoring](../../command-logging-and-monitoring/command-logging-and-monitoring.rst) may be used to
check for the presence of a `txnNumber` field in the command document. Note that command documents may always include an
-`lsid` field per the [Driver Session](../../sessions/driver-sessions.rst) specification.
+`lsid` field per the [Driver Session](../../sessions/driver-sessions.md) specification.
These tests may be run against both a replica set and shard cluster.
@@ -106,17 +104,238 @@ Drivers should test that transactions IDs are always included in commands for su
The following tests ensure that retryable writes work properly with replica sets and sharded clusters.
-1. Test that retryable writes raise an exception when using the MMAPv1 storage engine. For this test, execute a write
- operation, such as `insertOne`, which should generate an exception. Assert that the error message is the replacement
- error message:
+### 1. Test that retryable writes raise an exception when using the MMAPv1 storage engine.
+
+For this test, execute a write operation, such as `insertOne`, which should generate an exception. Assert that the error
+message is the replacement error message:
+
+```
+This MongoDB deployment does not support retryable writes. Please add
+retryWrites=false to your connection string.
+```
+
+and the error code is 20.
+
+> [!NOTE]
+> Drivers that rely on `serverStatus` to determine the storage engine in use MAY skip this test for sharded clusters,
+> since `mongos` does not report this information in its `serverStatus` response.
+
+### 2. Test that drivers properly retry after encountering PoolClearedErrors.
+
+This test MUST be implemented by any driver that implements the CMAP specification.
+
+This test requires MongoDB 4.3.4+ for both the `errorLabels` and `blockConnection` fail point options.
+
+1. Create a client with maxPoolSize=1 and retryWrites=true. If testing against a sharded deployment, be sure to connect
+ to only a single mongos.
+
+2. Enable the following failpoint:
+
+ ```javascript
+ {
+ configureFailPoint: "failCommand",
+ mode: { times: 1 },
+ data: {
+ failCommands: ["insert"],
+ errorCode: 91,
+ blockConnection: true,
+ blockTimeMS: 1000,
+ errorLabels: ["RetryableWriteError"]
+ }
+ }
+ ```
+
+3. Start two threads and attempt to perform an `insertOne` simultaneously on both.
+
+4. Verify that both `insertOne` attempts succeed.
+
+5. Via CMAP monitoring, assert that the first check out succeeds.
+
+6. Via CMAP monitoring, assert that a PoolClearedEvent is then emitted.
+
+7. Via CMAP monitoring, assert that the second check out then fails due to a connection error.
+
+8. Via Command Monitoring, assert that exactly three `insert` CommandStartedEvents were observed in total.
+9. Disable the failpoint.
+
+### 3. Test that drivers return the original error after encountering a WriteConcernError with a RetryableWriteError label.
+
+This test MUST:
+
+- be implemented by any driver that implements the Command Monitoring specification,
+- only run against replica sets as mongos does not propagate the NoWritesPerformed label to the drivers.
+- be run against server versions 6.0 and above.
+
+Additionally, this test requires drivers to set a fail point after an `insertOne` operation but before the subsequent
+retry. Drivers that are unable to set a failCommand after the CommandSucceededEvent SHOULD use mocking or write a unit
+test to cover the same sequence of events.
+
+1. Create a client with `retryWrites=true`.
+
+2. Configure a fail point with error code `91` (ShutdownInProgress):
+
+ ```javascript
+ {
+ configureFailPoint: "failCommand",
+ mode: {times: 1},
+ data: {
+ failCommands: ["insert"],
+ errorLabels: ["RetryableWriteError"],
+ writeConcernError: { code: 91 }
+ }
+ }
+ ```
+
+3. Via the command monitoring CommandSucceededEvent, configure a fail point with error code `10107` (NotWritablePrimary)
+ and a NoWritesPerformed label:
+
+ ```javascript
+ {
+ configureFailPoint: "failCommand",
+ mode: {times: 1},
+ data: {
+ failCommands: ["insert"],
+ errorCode: 10107,
+ errorLabels: ["RetryableWriteError", "NoWritesPerformed"]
+ }
+ }
```
- This MongoDB deployment does not support retryable writes. Please add
- retryWrites=false to your connection string.
+
+ Drivers SHOULD only configure the `10107` fail point command if the succeeded event is for the `91` error
+ configured in step 2.
+
+4. Attempt an `insertOne` operation on any record for any database and collection. For the resulting error, assert that
+ the associated error code is `91`.
+
+5. Disable the fail point:
+
+ ```javascript
+ {
+ configureFailPoint: "failCommand",
+ mode: "off"
+ }
```
- and the error code is 20.
+### 4. Test that in a sharded cluster writes are retried on a different mongos when one is available.
+
+This test MUST be executed against a sharded cluster that has at least two mongos instances, supports
+`retryWrites=true`, has enabled the `configureFailPoint` command, and supports the `errorLabels` field (MongoDB 4.3.1+).
+
+> [!NOTE]
+> This test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior
+> intended to be tested) from "retry on a different mongos due to normal SDAM randomized suitable server selection".
+> Verify relevant code paths are correctly executed by the tests using external means such as logging, a debugger, a code
+> coverage tool, etc.
+
+1. Create two clients `s0` and `s1` that each connect to a single mongos from the sharded cluster. They must not connect
+ to the same mongos.
+
+2. Configure the following fail point for both `s0` and `s1`:
+
+ ```javascript
+ {
+ configureFailPoint: "failCommand",
+ mode: { times: 1 },
+ data: {
+ failCommands: ["insert"],
+ errorCode: 6,
+ errorLabels: ["RetryableWriteError"]
+ }
+ }
+ ```
+
+3. Create a client `client` with `retryWrites=true` that connects to the cluster using the same two mongoses as `s0` and
+ `s1`.
+
+4. Enable failed command event monitoring for `client`.
+
+5. Execute an `insert` command with `client`. Assert that the command failed.
+
+6. Assert that two failed command events occurred. Assert that the failed command events occurred on different mongoses.
+
+7. Disable the fail points on both `s0` and `s1`.
+
+### 5. Test that in a sharded cluster writes are retried on the same mongos when no others are available.
+
+This test MUST be executed against a sharded cluster that supports `retryWrites=true`, has enabled the
+`configureFailPoint` command, and supports the `errorLabels` field (MongoDB 4.3.1+).
+
+Note: this test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior
+intended to be tested) from "retry on a different mongos due to normal SDAM behavior of randomized suitable server
+selection". Verify relevant code paths are correctly executed by the tests using external means such as logging, a
+debugger, a code coverage tool, etc.
+
+1. Create a client `s0` that connects to a single mongos from the cluster.
+
+2. Configure the following fail point for `s0`:
+
+ ```javascript
+ {
+ configureFailPoint: "failCommand",
+ mode: { times: 1 },
+ data: {
+ failCommands: ["insert"],
+ errorCode: 6,
+ errorLabels: ["RetryableWriteError"],
+ closeConnection: true
+ }
+ }
+ ```
+
+3. Create a client `client` with `directConnection=false` (when not set by default) and `retryWrites=true` that connects
+ to the cluster using the same single mongos as `s0`.
+
+4. Enable succeeded and failed command event monitoring for `client`.
+
+5. Execute an `insert` command with `client`. Assert that the command succeeded.
+
+6. Assert that exactly one failed command event and one succeeded command event occurred. Assert that both events
+ occurred on the same mongos.
+
+7. Disable the fail point on `s0`.
+
+## Changelog
+
+- 2024-05-30: Migrated from reStructuredText to Markdown.
+
+- 2024-02-27: Convert legacy retryable writes tests to unified format.
+
+- 2024-02-21: Update prose tests 4 and 5 to work around SDAM behavior preventing\
+ execution of deprioritization code
+ paths.
+
+- 2024-01-05: Fix typo in prose test title.
+
+- 2024-01-03: Note server version requirements for fail point options and revise\
+ tests to specify the `errorLabels`
+ option at the top-level instead of within `writeConcernError`.
+
+- 2023-08-26: Add prose tests for retrying in a sharded cluster.
+
+- 2022-08-30: Add prose test verifying correct error handling for errors with\
+ the NoWritesPerformed label, which is to
+ return the original error.
+
+- 2022-04-22: Clarifications to `serverless` and `useMultipleMongoses`.
+
+- 2021-08-27: Add `serverless` to `runOn`. Clarify behavior of\
+ `useMultipleMongoses` for `LoadBalanced` topologies.
+
+- 2021-04-23: Add `load-balanced` to test topology requirements.
+
+- 2021-03-24: Add prose test verifying `PoolClearedErrors` are retried.
+
+- 2019-10-21: Add `errorLabelsContain` and `errorLabelsOmit` fields to\
+ `result`
+
+- 2019-08-07: Add Prose Tests section
+
+- 2019-06-07: Mention $merge stage for aggregate alongside $out
+
+- 2019-03-01: Add top-level `runOn` field to denote server version and/or\
+ topology requirements for the
+ test file. Removes the `minServerVersion` and `maxServerVersion` top-level fields, which are now expressed within
+ `runOn` elements.
- [!NOTE]
- storage engine in use MAY skip this test for sharded clusters, since `mongos` does not report this information in its
- `serverStatus` response.
+ Add test-level `useMultipleMongoses` field.
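
Prose tests 4 and 5 above exercise two sides of the same selection behavior: retry on a different mongos when one is available, and on the same mongos when it is the only one. A rough sketch of that idea, assuming a simple "deprioritized" list of servers that just failed, is shown below. The `select_mongos` function and the host names are hypothetical and do not reflect any driver's actual server selection API.

```python
import random
from typing import Sequence


def select_mongos(candidates: Sequence[str], deprioritized: Sequence[str]) -> str:
    """Pick a mongos for the next attempt, preferring ones that did not just fail.

    `candidates` are the suitable mongos addresses; `deprioritized` holds the
    addresses used by earlier failed attempts (both are illustrative inputs).
    """
    preferred = [server for server in candidates if server not in deprioritized]
    # If every candidate is deprioritized, fall back to the full list so the
    # retry can still go to the same mongos (the situation in prose test 5).
    return random.choice(preferred or list(candidates))


# Two mongoses available: the retry avoids the one that just failed.
retry_target = select_mongos(["mongos-a:27017", "mongos-b:27017"], ["mongos-a:27017"])

# Only one mongos available: the retry goes back to the same one.
only_target = select_mongos(["mongos-a:27017"], ["mongos-a:27017"])
```

As the note in test 4 points out, observing a retry on a different mongos does not by itself prove that this code path ran, since random selection among suitable servers can produce the same result.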
diff --git a/source/retryable-writes/tests/etc/templates/handshakeError.yml.template b/source/retryable-writes/tests/etc/templates/handshakeError.yml.template
index 3974392a6f..d9037d5b20 100644
--- a/source/retryable-writes/tests/etc/templates/handshakeError.yml.template
+++ b/source/retryable-writes/tests/etc/templates/handshakeError.yml.template
@@ -51,6 +51,10 @@ tests:
# - Tests whether operation successfully retries the handshake and succeeds.
{% for operation in operations %}
- description: "{{operation.object}}.{{operation.operation_name}} succeeds after retryable handshake network error"
+ {%- if (operation.operation_name == 'clientBulkWrite') %}
+ runOnRequirements:
+ - minServerVersion: "8.0" # `bulkWrite` added to server 8.0
+ {%- endif %}
operations:
- name: failPoint
object: testRunner
@@ -95,6 +99,10 @@ tests:
commandName: {{operation.command_name}}
- description: "{{operation.object}}.{{operation.operation_name}} succeeds after retryable handshake server error (ShutdownInProgress)"
+ {%- if (operation.operation_name == 'clientBulkWrite') %}
+ runOnRequirements:
+ - minServerVersion: "8.0" # `bulkWrite` added to server 8.0
+ {%- endif %}
operations:
- name: failPoint
object: testRunner
diff --git a/source/retryable-writes/tests/unified/client-bulkWrite-clientErrors.json b/source/retryable-writes/tests/unified/client-bulkWrite-clientErrors.json
new file mode 100644
index 0000000000..e2c0fb9c0a
--- /dev/null
+++ b/source/retryable-writes/tests/unified/client-bulkWrite-clientErrors.json
@@ -0,0 +1,350 @@
+{
+ "description": "client bulkWrite retryable writes with client errors",
+ "schemaVersion": "1.21",
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0",
+ "topologies": [
+ "replicaset",
+ "sharded",
+ "load-balanced"
+ ]
+ }
+ ],
+ "createEntities": [
+ {
+ "client": {
+ "id": "client0",
+ "observeEvents": [
+ "commandStartedEvent"
+ ],
+ "useMultipleMongoses": false
+ }
+ },
+ {
+ "database": {
+ "id": "database0",
+ "client": "client0",
+ "databaseName": "retryable-writes-tests"
+ }
+ },
+ {
+ "collection": {
+ "id": "collection0",
+ "database": "database0",
+ "collectionName": "coll0"
+ }
+ }
+ ],
+ "initialData": [
+ {
+ "collectionName": "coll0",
+ "databaseName": "retryable-writes-tests",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 11
+ },
+ {
+ "_id": 2,
+ "x": 22
+ },
+ {
+ "_id": 3,
+ "x": 33
+ }
+ ]
+ }
+ ],
+ "_yamlAnchors": {
+ "namespace": "retryable-writes-tests.coll0"
+ },
+ "tests": [
+ {
+ "description": "client bulkWrite with one network error succeeds after retry",
+ "operations": [
+ {
+ "object": "testRunner",
+ "name": "failPoint",
+ "arguments": {
+ "client": "client0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "closeConnection": true
+ }
+ }
+ }
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ }
+ ],
+ "verboseResults": true
+ },
+ "expectResult": {
+ "insertedCount": 1,
+ "upsertedCount": 0,
+ "matchedCount": 0,
+ "modifiedCount": 0,
+ "deletedCount": 0,
+ "insertResults": {
+ "0": {
+ "insertedId": 4
+ }
+ },
+ "updateResults": {},
+ "deleteResults": {}
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client0",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ }
+ ]
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "coll0",
+ "databaseName": "retryable-writes-tests",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 11
+ },
+ {
+ "_id": 2,
+ "x": 22
+ },
+ {
+ "_id": 3,
+ "x": 33
+ },
+ {
+ "_id": 4,
+ "x": 44
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client bulkWrite with two network errors fails after retry",
+ "operations": [
+ {
+ "object": "testRunner",
+ "name": "failPoint",
+ "arguments": {
+ "client": "client0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 2
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "closeConnection": true
+ }
+ }
+ }
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ }
+ ],
+ "verboseResults": true
+ },
+ "expectError": {
+ "isClientError": true,
+ "errorLabelsContain": [
+ "RetryableWriteError"
+ ]
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client0",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ }
+ ]
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "coll0",
+ "databaseName": "retryable-writes-tests",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 11
+ },
+ {
+ "_id": 2,
+ "x": 22
+ },
+ {
+ "_id": 3,
+ "x": 33
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}
diff --git a/source/retryable-writes/tests/unified/client-bulkWrite-clientErrors.yml b/source/retryable-writes/tests/unified/client-bulkWrite-clientErrors.yml
new file mode 100644
index 0000000000..85696e89db
--- /dev/null
+++ b/source/retryable-writes/tests/unified/client-bulkWrite-clientErrors.yml
@@ -0,0 +1,172 @@
+description: "client bulkWrite retryable writes with client errors"
+schemaVersion: "1.21"
+runOnRequirements:
+ - minServerVersion: "8.0"
+ topologies:
+ - replicaset
+ - sharded
+ - load-balanced
+
+createEntities:
+ - client:
+ id: &client0 client0
+ observeEvents: [ commandStartedEvent ]
+ useMultipleMongoses: false
+ - database:
+ id: &database0 database0
+ client: *client0
+ databaseName: &database0Name retryable-writes-tests
+ - collection:
+ id: &collection0 collection0
+ database: *database0
+ collectionName: &collection0Name coll0
+
+initialData:
+ - collectionName: *collection0Name
+ databaseName: *database0Name
+ documents:
+ - { _id: 1, x: 11 }
+ - { _id: 2, x: 22 }
+ - { _id: 3, x: 33 }
+
+_yamlAnchors:
+ namespace: &namespace "retryable-writes-tests.coll0"
+
+tests:
+ - description: "client bulkWrite with one network error succeeds after retry"
+ operations:
+ - object: testRunner
+ name: failPoint
+ arguments:
+ client: *client0
+ failPoint:
+ configureFailPoint: failCommand
+ mode:
+ times: 1
+ data:
+ failCommands: [ bulkWrite ]
+ closeConnection: true
+ - object: *client0
+ name: clientBulkWrite
+ arguments:
+ models:
+ - insertOne:
+ namespace: *namespace
+ document: { _id: 4, x: 44 }
+ verboseResults: true
+ expectResult:
+ insertedCount: 1
+ upsertedCount: 0
+ matchedCount: 0
+ modifiedCount: 0
+ deletedCount: 0
+ insertResults:
+ 0:
+ insertedId: 4
+ updateResults: {}
+ deleteResults: {}
+ expectEvents:
+ - client: *client0
+ events:
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ nsInfo:
+ - ns: *namespace
+ # An implicit session is included with the transaction number:
+ lsid: { "$$exists": true }
+ txnNumber: { "$$exists": true }
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ nsInfo:
+ - ns: *namespace
+ # An implicit session is included with the transaction number:
+ lsid: { "$$exists": true }
+ txnNumber: { "$$exists": true }
+ outcome:
+ - collectionName: *collection0Name
+ databaseName: *database0Name
+ documents:
+ - { _id: 1, x: 11 }
+ - { _id: 2, x: 22 }
+ - { _id: 3, x: 33 }
+ - { _id: 4, x: 44 }
+ - description: "client bulkWrite with two network errors fails after retry"
+ operations:
+ - object: testRunner
+ name: failPoint
+ arguments:
+ client: *client0
+ failPoint:
+ configureFailPoint: failCommand
+ mode:
+ times: 2
+ data:
+ failCommands: [ bulkWrite ]
+ closeConnection: true
+ - object: *client0
+ name: clientBulkWrite
+ arguments:
+ models:
+ - insertOne:
+ namespace: *namespace
+ document: { _id: 4, x: 44 }
+ verboseResults: true
+ expectError:
+ isClientError: true
+ errorLabelsContain: ["RetryableWriteError"] # Error label added by driver.
+ expectEvents:
+ - client: *client0
+ events:
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ nsInfo:
+ - ns: *namespace
+ # An implicit session is included with the transaction number:
+ lsid: { "$$exists": true }
+ txnNumber: { "$$exists": true }
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ nsInfo:
+ - ns: *namespace
+ # An implicit session is included with the transaction number:
+ lsid: { "$$exists": true }
+ txnNumber: { "$$exists": true }
+ outcome:
+ - collectionName: *collection0Name
+ databaseName: *database0Name
+ documents:
+ - { _id: 1, x: 11 }
+ - { _id: 2, x: 22 }
+ - { _id: 3, x: 33 }
diff --git a/source/retryable-writes/tests/unified/client-bulkWrite-serverErrors.json b/source/retryable-writes/tests/unified/client-bulkWrite-serverErrors.json
new file mode 100644
index 0000000000..4a0b210eb5
--- /dev/null
+++ b/source/retryable-writes/tests/unified/client-bulkWrite-serverErrors.json
@@ -0,0 +1,872 @@
+{
+ "description": "client bulkWrite retryable writes",
+ "schemaVersion": "1.21",
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0",
+ "topologies": [
+ "replicaset",
+ "sharded",
+ "load-balanced"
+ ]
+ }
+ ],
+ "createEntities": [
+ {
+ "client": {
+ "id": "client0",
+ "observeEvents": [
+ "commandStartedEvent"
+ ],
+ "useMultipleMongoses": false
+ }
+ },
+ {
+ "client": {
+ "id": "clientRetryWritesFalse",
+ "uriOptions": {
+ "retryWrites": false
+ },
+ "observeEvents": [
+ "commandStartedEvent"
+ ],
+ "useMultipleMongoses": false
+ }
+ },
+ {
+ "database": {
+ "id": "database0",
+ "client": "client0",
+ "databaseName": "retryable-writes-tests"
+ }
+ },
+ {
+ "collection": {
+ "id": "collection0",
+ "database": "database0",
+ "collectionName": "coll0"
+ }
+ }
+ ],
+ "initialData": [
+ {
+ "collectionName": "coll0",
+ "databaseName": "retryable-writes-tests",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 11
+ },
+ {
+ "_id": 2,
+ "x": 22
+ },
+ {
+ "_id": 3,
+ "x": 33
+ }
+ ]
+ }
+ ],
+ "_yamlAnchors": {
+ "namespace": "retryable-writes-tests.coll0"
+ },
+ "tests": [
+ {
+ "description": "client bulkWrite with no multi: true operations succeeds after retryable top-level error",
+ "operations": [
+ {
+ "object": "testRunner",
+ "name": "failPoint",
+ "arguments": {
+ "client": "client0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "errorCode": 189,
+ "errorLabels": [
+ "RetryableWriteError"
+ ]
+ }
+ }
+ }
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ },
+ {
+ "updateOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 1
+ },
+ "update": {
+ "$inc": {
+ "x": 1
+ }
+ }
+ }
+ },
+ {
+ "replaceOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 2
+ },
+ "replacement": {
+ "x": 222
+ }
+ }
+ },
+ {
+ "deleteOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 3
+ }
+ }
+ }
+ ],
+ "verboseResults": true
+ },
+ "expectResult": {
+ "insertedCount": 1,
+ "upsertedCount": 0,
+ "matchedCount": 2,
+ "modifiedCount": 2,
+ "deletedCount": 1,
+ "insertResults": {
+ "0": {
+ "insertedId": 4
+ }
+ },
+ "updateResults": {
+ "1": {
+ "matchedCount": 1,
+ "modifiedCount": 1,
+ "upsertedId": {
+ "$$exists": false
+ }
+ },
+ "2": {
+ "matchedCount": 1,
+ "modifiedCount": 1,
+ "upsertedId": {
+ "$$exists": false
+ }
+ }
+ },
+ "deleteResults": {
+ "3": {
+ "deletedCount": 1
+ }
+ }
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client0",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 1
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 1
+ }
+ },
+ "multi": false
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 2
+ },
+ "updateMods": {
+ "x": 222
+ },
+ "multi": false
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "_id": 3
+ },
+ "multi": false
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 1
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 1
+ }
+ },
+ "multi": false
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 2
+ },
+ "updateMods": {
+ "x": 222
+ },
+ "multi": false
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "_id": 3
+ },
+ "multi": false
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ }
+ ]
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "coll0",
+ "databaseName": "retryable-writes-tests",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 12
+ },
+ {
+ "_id": 2,
+ "x": 222
+ },
+ {
+ "_id": 4,
+ "x": 44
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client bulkWrite with multi: true operations fails after retryable top-level error",
+ "operations": [
+ {
+ "object": "testRunner",
+ "name": "failPoint",
+ "arguments": {
+ "client": "client0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "errorCode": 189,
+ "errorLabels": [
+ "RetryableWriteError"
+ ]
+ }
+ }
+ }
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "models": [
+ {
+ "updateMany": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 1
+ },
+ "update": {
+ "$inc": {
+ "x": 1
+ }
+ }
+ }
+ },
+ {
+ "deleteMany": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 3
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "errorCode": 189,
+ "errorLabelsContain": [
+ "RetryableWriteError"
+ ]
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client0",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": true,
+ "ordered": true,
+ "ops": [
+ {
+ "update": 0,
+ "filter": {
+ "_id": 1
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 1
+ }
+ },
+ "multi": true
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "_id": 3
+ },
+ "multi": true
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ]
+ }
+ }
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client bulkWrite with no multi: true operations succeeds after retryable writeConcernError",
+ "operations": [
+ {
+ "object": "testRunner",
+ "name": "failPoint",
+ "arguments": {
+ "client": "client0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "errorLabels": [
+ "RetryableWriteError"
+ ],
+ "writeConcernError": {
+ "code": 91,
+ "errmsg": "Replication is being shut down"
+ }
+ }
+ }
+ }
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ },
+ {
+ "updateOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 1
+ },
+ "update": {
+ "$inc": {
+ "x": 1
+ }
+ }
+ }
+ },
+ {
+ "replaceOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 2
+ },
+ "replacement": {
+ "x": 222
+ }
+ }
+ },
+ {
+ "deleteOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 3
+ }
+ }
+ }
+ ],
+ "verboseResults": true
+ },
+ "expectResult": {
+ "insertedCount": 1,
+ "upsertedCount": 0,
+ "matchedCount": 2,
+ "modifiedCount": 2,
+ "deletedCount": 1,
+ "insertResults": {
+ "0": {
+ "insertedId": 4
+ }
+ },
+ "updateResults": {
+ "1": {
+ "matchedCount": 1,
+ "modifiedCount": 1,
+ "upsertedId": {
+ "$$exists": false
+ }
+ },
+ "2": {
+ "matchedCount": 1,
+ "modifiedCount": 1,
+ "upsertedId": {
+ "$$exists": false
+ }
+ }
+ },
+ "deleteResults": {
+ "3": {
+ "deletedCount": 1
+ }
+ }
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client0",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 1
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 1
+ }
+ },
+ "multi": false
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 2
+ },
+ "updateMods": {
+ "x": 222
+ },
+ "multi": false
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "_id": 3
+ },
+ "multi": false
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 1
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 1
+ }
+ },
+ "multi": false
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 2
+ },
+ "updateMods": {
+ "x": 222
+ },
+ "multi": false
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "_id": 3
+ },
+ "multi": false
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ],
+ "lsid": {
+ "$$exists": true
+ },
+ "txnNumber": {
+ "$$exists": true
+ }
+ }
+ }
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client bulkWrite with multi: true operations fails after retryable writeConcernError",
+ "operations": [
+ {
+ "object": "testRunner",
+ "name": "failPoint",
+ "arguments": {
+ "client": "client0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "errorLabels": [
+ "RetryableWriteError"
+ ],
+ "writeConcernError": {
+ "code": 91,
+ "errmsg": "Replication is being shut down"
+ }
+ }
+ }
+ }
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "models": [
+ {
+ "updateMany": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 1
+ },
+ "update": {
+ "$inc": {
+ "x": 1
+ }
+ }
+ }
+ },
+ {
+ "deleteMany": {
+ "namespace": "retryable-writes-tests.coll0",
+ "filter": {
+ "_id": 3
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "writeConcernErrors": [
+ {
+ "code": 91,
+ "message": "Replication is being shut down"
+ }
+ ]
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client0",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": true,
+ "ordered": true,
+ "ops": [
+ {
+ "update": 0,
+ "filter": {
+ "_id": 1
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 1
+ }
+ },
+ "multi": true
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "_id": 3
+ },
+ "multi": true
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ]
+ }
+ }
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client bulkWrite with retryWrites: false does not retry",
+ "operations": [
+ {
+ "object": "testRunner",
+ "name": "failPoint",
+ "arguments": {
+ "client": "clientRetryWritesFalse",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "errorCode": 189,
+ "errorLabels": [
+ "RetryableWriteError"
+ ]
+ }
+ }
+ }
+ },
+ {
+ "object": "clientRetryWritesFalse",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "retryable-writes-tests.coll0",
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "errorCode": 189,
+ "errorLabelsContain": [
+ "RetryableWriteError"
+ ]
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "clientRetryWritesFalse",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "bulkWrite": 1,
+ "errorsOnly": true,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 4,
+ "x": 44
+ }
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "retryable-writes-tests.coll0"
+ }
+ ]
+ }
+ }
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}
diff --git a/source/retryable-writes/tests/unified/client-bulkWrite-serverErrors.yml b/source/retryable-writes/tests/unified/client-bulkWrite-serverErrors.yml
new file mode 100644
index 0000000000..23d2c622ee
--- /dev/null
+++ b/source/retryable-writes/tests/unified/client-bulkWrite-serverErrors.yml
@@ -0,0 +1,412 @@
+description: "client bulkWrite retryable writes"
+schemaVersion: "1.21"
+runOnRequirements:
+ - minServerVersion: "8.0"
+ topologies:
+ - replicaset
+ - sharded
+ - load-balanced
+
+createEntities:
+ - client:
+ id: &client0 client0
+ observeEvents: [ commandStartedEvent ]
+ useMultipleMongoses: false
+ - client:
+ id: &clientRetryWritesFalse clientRetryWritesFalse
+ uriOptions:
+ retryWrites: false
+ observeEvents: [ commandStartedEvent ]
+ useMultipleMongoses: false
+ - database:
+ id: &database0 database0
+ client: *client0
+ databaseName: &database0Name retryable-writes-tests
+ - collection:
+ id: &collection0 collection0
+ database: *database0
+ collectionName: &collection0Name coll0
+
+initialData:
+ - collectionName: *collection0Name
+ databaseName: *database0Name
+ documents:
+ - { _id: 1, x: 11 }
+ - { _id: 2, x: 22 }
+ - { _id: 3, x: 33 }
+
+_yamlAnchors:
+ namespace: &namespace "retryable-writes-tests.coll0"
+
+tests:
+ - description: "client bulkWrite with no multi: true operations succeeds after retryable top-level error"
+ operations:
+ - object: testRunner
+ name: failPoint
+ arguments:
+ client: *client0
+ failPoint:
+ configureFailPoint: failCommand
+ mode:
+ times: 1
+ data:
+ failCommands: [ bulkWrite ]
+ errorCode: 189 # PrimarySteppedDown
+ errorLabels: [ RetryableWriteError ]
+ - object: *client0
+ name: clientBulkWrite
+ arguments:
+ models:
+ - insertOne:
+ namespace: *namespace
+ document: { _id: 4, x: 44 }
+ - updateOne:
+ namespace: *namespace
+ filter: { _id: 1 }
+ update:
+ $inc: { x: 1 }
+ - replaceOne:
+ namespace: *namespace
+ filter: { _id: 2 }
+ replacement: { x: 222 }
+ - deleteOne:
+ namespace: *namespace
+ filter: { _id: 3 }
+ verboseResults: true
+ expectResult:
+ insertedCount: 1
+ upsertedCount: 0
+ matchedCount: 2
+ modifiedCount: 2
+ deletedCount: 1
+ insertResults:
+ 0:
+ insertedId: 4
+ updateResults:
+ 1:
+ matchedCount: 1
+ modifiedCount: 1
+ upsertedId: { $$exists: false }
+ 2:
+ matchedCount: 1
+ modifiedCount: 1
+ upsertedId: { $$exists: false }
+ deleteResults:
+ 3:
+ deletedCount: 1
+ expectEvents:
+ - client: *client0
+ events:
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ - update: 0
+ filter: { _id: 1 }
+ updateMods:
+ $inc: { x: 1 }
+ multi: false
+ - update: 0
+ filter: { _id: 2 }
+ updateMods: { x: 222 }
+ multi: false
+ - delete: 0
+ filter: { _id: 3 }
+ multi: false
+ nsInfo:
+ - ns: *namespace
+ lsid: { $$exists: true }
+ txnNumber: { $$exists: true }
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ - update: 0
+ filter: { _id: 1 }
+ updateMods:
+ $inc: { x: 1 }
+ multi: false
+ - update: 0
+ filter: { _id: 2 }
+ updateMods: { x: 222 }
+ multi: false
+ - delete: 0
+ filter: { _id: 3 }
+ multi: false
+ nsInfo:
+ - ns: *namespace
+ lsid: { $$exists: true }
+ txnNumber: { $$exists: true }
+ outcome:
+ - collectionName: *collection0Name
+ databaseName: *database0Name
+ documents:
+ - { _id: 1, x: 12 }
+ - { _id: 2, x: 222 }
+ - { _id: 4, x: 44 }
+ - description: "client bulkWrite with multi: true operations fails after retryable top-level error"
+ operations:
+ - object: testRunner
+ name: failPoint
+ arguments:
+ client: *client0
+ failPoint:
+ configureFailPoint: failCommand
+ mode:
+ times: 1
+ data:
+ failCommands: [ bulkWrite ]
+ errorCode: 189 # PrimarySteppedDown
+ errorLabels: [ RetryableWriteError ]
+ - object: *client0
+ name: clientBulkWrite
+ arguments:
+ models:
+ - updateMany:
+ namespace: *namespace
+ filter: { _id: 1 }
+ update:
+ $inc: { x: 1 }
+ - deleteMany:
+ namespace: *namespace
+ filter: { _id: 3 }
+ expectError:
+ errorCode: 189
+ errorLabelsContain: [ RetryableWriteError ]
+ expectEvents:
+ - client: *client0
+ events:
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: true
+ ordered: true
+ ops:
+ - update: 0
+ filter: { _id: 1 }
+ updateMods:
+ $inc: { x: 1 }
+ multi: true
+ - delete: 0
+ filter: { _id: 3 }
+ multi: true
+ nsInfo:
+ - ns: *namespace
+ - description: "client bulkWrite with no multi: true operations succeeds after retryable writeConcernError"
+ operations:
+ - object: testRunner
+ name: failPoint
+ arguments:
+ client: *client0
+ failPoint:
+ configureFailPoint: failCommand
+ mode:
+ times: 1
+ data:
+ failCommands: [ bulkWrite ]
+ errorLabels: [ RetryableWriteError ]
+ writeConcernError:
+ code: 91
+ errmsg: "Replication is being shut down"
+ - object: *client0
+ name: clientBulkWrite
+ arguments:
+ models:
+ - insertOne:
+ namespace: *namespace
+ document: { _id: 4, x: 44 }
+ - updateOne:
+ namespace: *namespace
+ filter: { _id: 1 }
+ update:
+ $inc: { x: 1 }
+ - replaceOne:
+ namespace: *namespace
+ filter: { _id: 2 }
+ replacement: { x: 222 }
+ - deleteOne:
+ namespace: *namespace
+ filter: { _id: 3 }
+ verboseResults: true
+ expectResult:
+ insertedCount: 1
+ upsertedCount: 0
+ matchedCount: 2
+ modifiedCount: 2
+ deletedCount: 1
+ insertResults:
+ 0:
+ insertedId: 4
+ updateResults:
+ 1:
+ matchedCount: 1
+ modifiedCount: 1
+ upsertedId: { $$exists: false }
+ 2:
+ matchedCount: 1
+ modifiedCount: 1
+ upsertedId: { $$exists: false }
+ deleteResults:
+ 3:
+ deletedCount: 1
+ expectEvents:
+ - client: *client0
+ events:
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ - update: 0
+ filter: { _id: 1 }
+ updateMods:
+ $inc: { x: 1 }
+ multi: false
+ - update: 0
+ filter: { _id: 2 }
+ updateMods: { x: 222 }
+ multi: false
+ - delete: 0
+ filter: { _id: 3 }
+ multi: false
+ nsInfo:
+ - ns: *namespace
+ lsid: { $$exists: true }
+ txnNumber: { $$exists: true }
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: false
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ - update: 0
+ filter: { _id: 1 }
+ updateMods:
+ $inc: { x: 1 }
+ multi: false
+ - update: 0
+ filter: { _id: 2 }
+ updateMods: { x: 222 }
+ multi: false
+ - delete: 0
+ filter: { _id: 3 }
+ multi: false
+ nsInfo:
+ - ns: *namespace
+ lsid: { $$exists: true }
+ txnNumber: { $$exists: true }
+ - description: "client bulkWrite with multi: true operations fails after retryable writeConcernError"
+ operations:
+ - object: testRunner
+ name: failPoint
+ arguments:
+ client: *client0
+ failPoint:
+ configureFailPoint: failCommand
+ mode:
+ times: 1
+ data:
+ failCommands: [ bulkWrite ]
+ errorLabels: [ RetryableWriteError ]
+ writeConcernError:
+ code: 91
+ errmsg: "Replication is being shut down"
+ - object: *client0
+ name: clientBulkWrite
+ arguments:
+ models:
+ - updateMany:
+ namespace: *namespace
+ filter: { _id: 1 }
+ update:
+ $inc: { x: 1 }
+ - deleteMany:
+ namespace: *namespace
+ filter: { _id: 3 }
+ expectError:
+ writeConcernErrors:
+ - code: 91
+ message: "Replication is being shut down"
+ expectEvents:
+ - client: *client0
+ events:
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: true
+ ordered: true
+ ops:
+ - update: 0
+ filter: { _id: 1 }
+ updateMods:
+ $inc: { x: 1 }
+ multi: true
+ - delete: 0
+ filter: { _id: 3 }
+ multi: true
+ nsInfo:
+ - ns: *namespace
+ - description: "client bulkWrite with retryWrites: false does not retry"
+ operations:
+ - object: testRunner
+ name: failPoint
+ arguments:
+ client: *clientRetryWritesFalse
+ failPoint:
+ configureFailPoint: failCommand
+ mode:
+ times: 1
+ data:
+ failCommands: [ bulkWrite ]
+ errorCode: 189 # PrimarySteppedDown
+ errorLabels: [ RetryableWriteError ]
+ - object: *clientRetryWritesFalse
+ name: clientBulkWrite
+ arguments:
+ models:
+ - insertOne:
+ namespace: *namespace
+ document: { _id: 4, x: 44 }
+ expectError:
+ errorCode: 189
+ errorLabelsContain: [ RetryableWriteError ]
+ expectEvents:
+ - client: *clientRetryWritesFalse
+ events:
+ - commandStartedEvent:
+ commandName: bulkWrite
+ databaseName: admin
+ command:
+ bulkWrite: 1
+ errorsOnly: true
+ ordered: true
+ ops:
+ - insert: 0
+ document: { _id: 4, x: 44 }
+ nsInfo:
+ - ns: *namespace
diff --git a/source/retryable-writes/tests/unified/handshakeError.json b/source/retryable-writes/tests/unified/handshakeError.json
index df37bd7232..3c46463759 100644
--- a/source/retryable-writes/tests/unified/handshakeError.json
+++ b/source/retryable-writes/tests/unified/handshakeError.json
@@ -53,6 +53,222 @@
}
],
"tests": [
+ {
+ "description": "client.clientBulkWrite succeeds after retryable handshake network error",
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0"
+ }
+ ],
+ "operations": [
+ {
+ "name": "failPoint",
+ "object": "testRunner",
+ "arguments": {
+ "client": "client",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 2
+ },
+ "data": {
+ "failCommands": [
+ "ping",
+ "saslContinue"
+ ],
+ "closeConnection": true
+ }
+ }
+ }
+ },
+ {
+ "name": "runCommand",
+ "object": "database",
+ "arguments": {
+ "commandName": "ping",
+ "command": {
+ "ping": 1
+ }
+ },
+ "expectError": {
+ "isError": true
+ }
+ },
+ {
+ "name": "clientBulkWrite",
+ "object": "client",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "retryable-writes-handshake-tests.coll",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ }
+ ]
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client",
+ "eventType": "cmap",
+ "events": [
+ {
+ "connectionCheckOutStartedEvent": {}
+ },
+ {
+ "connectionCheckOutStartedEvent": {}
+ },
+ {
+ "connectionCheckOutStartedEvent": {}
+ },
+ {
+ "connectionCheckOutStartedEvent": {}
+ }
+ ]
+ },
+ {
+ "client": "client",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "command": {
+ "ping": 1
+ },
+ "databaseName": "retryable-writes-handshake-tests"
+ }
+ },
+ {
+ "commandFailedEvent": {
+ "commandName": "ping"
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite"
+ }
+ },
+ {
+ "commandSucceededEvent": {
+ "commandName": "bulkWrite"
+ }
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client.clientBulkWrite succeeds after retryable handshake server error (ShutdownInProgress)",
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0"
+ }
+ ],
+ "operations": [
+ {
+ "name": "failPoint",
+ "object": "testRunner",
+ "arguments": {
+ "client": "client",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 2
+ },
+ "data": {
+ "failCommands": [
+ "ping",
+ "saslContinue"
+ ],
+ "closeConnection": true
+ }
+ }
+ }
+ },
+ {
+ "name": "runCommand",
+ "object": "database",
+ "arguments": {
+ "commandName": "ping",
+ "command": {
+ "ping": 1
+ }
+ },
+ "expectError": {
+ "isError": true
+ }
+ },
+ {
+ "name": "clientBulkWrite",
+ "object": "client",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "retryable-writes-handshake-tests.coll",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ }
+ ]
+ }
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client",
+ "eventType": "cmap",
+ "events": [
+ {
+ "connectionCheckOutStartedEvent": {}
+ },
+ {
+ "connectionCheckOutStartedEvent": {}
+ },
+ {
+ "connectionCheckOutStartedEvent": {}
+ },
+ {
+ "connectionCheckOutStartedEvent": {}
+ }
+ ]
+ },
+ {
+ "client": "client",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "command": {
+ "ping": 1
+ },
+ "databaseName": "retryable-writes-handshake-tests"
+ }
+ },
+ {
+ "commandFailedEvent": {
+ "commandName": "ping"
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite"
+ }
+ },
+ {
+ "commandSucceededEvent": {
+ "commandName": "bulkWrite"
+ }
+ }
+ ]
+ }
+ ]
+ },
{
"description": "collection.insertOne succeeds after retryable handshake network error",
"operations": [
diff --git a/source/retryable-writes/tests/unified/handshakeError.yml b/source/retryable-writes/tests/unified/handshakeError.yml
index 9b2774bc77..131bbf2e5c 100644
--- a/source/retryable-writes/tests/unified/handshakeError.yml
+++ b/source/retryable-writes/tests/unified/handshakeError.yml
@@ -50,6 +50,96 @@ tests:
# - Triggers failpoint (second time).
# - Tests whether operation successfully retries the handshake and succeeds.
+ - description: "client.clientBulkWrite succeeds after retryable handshake network error"
+ runOnRequirements:
+ - minServerVersion: "8.0" # `bulkWrite` added to server 8.0
+ operations:
+ - name: failPoint
+ object: testRunner
+ arguments:
+ client: *client
+ failPoint:
+ configureFailPoint: failCommand
+ mode: { times: 2 }
+ data:
+ failCommands: [ping, saslContinue]
+ closeConnection: true
+ - name: runCommand
+ object: *database
+ arguments: { commandName: ping, command: { ping: 1 } }
+ expectError: { isError: true }
+ - name: clientBulkWrite
+ object: *client
+ arguments:
+ models:
+ - insertOne:
+ namespace: retryable-writes-handshake-tests.coll
+ document: { _id: 8, x: 88 }
+ expectEvents:
+ - client: *client
+ eventType: cmap
+ events:
+ - { connectionCheckOutStartedEvent: {} }
+ - { connectionCheckOutStartedEvent: {} }
+ - { connectionCheckOutStartedEvent: {} }
+ - { connectionCheckOutStartedEvent: {} }
+ - client: *client
+ events:
+ - commandStartedEvent:
+ command: { ping: 1 }
+ databaseName: *databaseName
+ - commandFailedEvent:
+ commandName: ping
+ - commandStartedEvent:
+ commandName: bulkWrite
+ - commandSucceededEvent:
+ commandName: bulkWrite
+
+ - description: "client.clientBulkWrite succeeds after retryable handshake server error (ShutdownInProgress)"
+ runOnRequirements:
+ - minServerVersion: "8.0" # `bulkWrite` added to server 8.0
+ operations:
+ - name: failPoint
+ object: testRunner
+ arguments:
+ client: *client
+ failPoint:
+ configureFailPoint: failCommand
+ mode: { times: 2 }
+ data:
+ failCommands: [ping, saslContinue]
+ closeConnection: true
+ - name: runCommand
+ object: *database
+ arguments: { commandName: ping, command: { ping: 1 } }
+ expectError: { isError: true }
+ - name: clientBulkWrite
+ object: *client
+ arguments:
+ models:
+ - insertOne:
+ namespace: retryable-writes-handshake-tests.coll
+ document: { _id: 8, x: 88 }
+ expectEvents:
+ - client: *client
+ eventType: cmap
+ events:
+ - { connectionCheckOutStartedEvent: {} }
+ - { connectionCheckOutStartedEvent: {} }
+ - { connectionCheckOutStartedEvent: {} }
+ - { connectionCheckOutStartedEvent: {} }
+ - client: *client
+ events:
+ - commandStartedEvent:
+ command: { ping: 1 }
+ databaseName: *databaseName
+ - commandFailedEvent:
+ commandName: ping
+ - commandStartedEvent:
+ commandName: bulkWrite
+ - commandSucceededEvent:
+ commandName: bulkWrite
+
- description: "collection.insertOne succeeds after retryable handshake network error"
operations:
- name: failPoint
diff --git a/source/run-command/run-command.rst b/source/run-command/run-command.rst
index a51f2b0261..5e7e667966 100644
--- a/source/run-command/run-command.rst
+++ b/source/run-command/run-command.rst
@@ -76,7 +76,7 @@ The following represents how a runCommand API SHOULD be exposed.
* An optional explicit client session.
* The associated logical session id (`lsid`) the driver MUST apply to the command.
*
- * @see https://github.com/mongodb/specifications/blob/master/source/sessions/driver-sessions.rst#clientsession
+ * @see ../sessions/driver-sessions.md#clientsession
*/
session?: ClientSession;
@@ -129,11 +129,11 @@ Drivers MUST NOT attempt to check the command document for the presence of an ``
Every ClientSession has a corresponding logical session ID representing the server-side session ID.
The logical session ID MUST be included under ``lsid`` in the command sent to the server without modifying user input.
-* See Driver Sessions' section on `Sending the session ID to the server on all commands
`_
+* See Driver Sessions' section on `Sending the session ID to the server on all commands <../sessions/driver-sessions.md#sending-the-session-id-to-the-server-on-all-commands>`_
The command sent to the server MUST gossip the ``$clusterTime`` if cluster time support is detected.
-* See Driver Sessions' section on `Gossipping the cluster time `_
+* See Driver Sessions' section on `Gossipping the cluster time <../sessions/driver-sessions.md#gossipping-the-cluster-time>`_
Transactions
""""""""""""
@@ -274,7 +274,7 @@ All ``getMore`` commands constructed for this cursor MUST send the same ``lsid``
A cursor is considered exhausted or closed when the server reports its ``id`` as zero.
When the cursor is exhausted the client session MUST be ended and the server session returned to the pool as early as possible rather than waiting for a caller to completely iterate the final batch.
-* See Drivers Sessions' section on `Sessions and Cursors `_
+* See Drivers Sessions' section on `Sessions and Cursors <../sessions/driver-sessions.md#sessions-and-cursors>`_
Server Selection
""""""""""""""""
@@ -320,7 +320,7 @@ Drivers MUST provide an explicit mechanism for releasing the cursor resources, t
If the cursor id is nonzero a KillCursors operation MUST be attempted, the result of the operation SHOULD be ignored.
The ClientSession associated with the cursor MUST be ended and the ServerSession returned to the pool.
-* See Driver Sessions' section on `When sending a killCursors command `_
+* See Driver Sessions' section on `When sending a killCursors command <../sessions/driver-sessions.md#when-sending-a-killcursors-command>`_
* See Find, getMore and killCursors commands' section on `killCursors `_
Client Side Operations Timeout
diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md
index a7589a6472..5437f0eb75 100644
--- a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md
+++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md
@@ -33,8 +33,7 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH
`Server`
> The term `Server` refers to the implementation in the driver's language of an abstraction of a mongod or mongos
-> process, or a load balancer, as defined by the
-> [SDAM specification](https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#server).
+> process, or a load balancer, as defined by the [SDAM specification](server-discovery-and-monitoring.md#server).
### Specification
@@ -369,59 +368,14 @@ The following table describes the rules for determining if a topology type has r
preference is passed to `hasReadableServer`, the driver MUST default the value to the default read preference,
`primary`, or treat the call as if `primary` was provided.
-
-
-
-
-
-
-
-
-
-
-
-Unknown |
-false |
-false |
-
-
-Single |
-true if the server is available |
-true if the server is available |
-
-
-ReplicaSetNoPrimary |
-Called with primary :
-false
-Called with any other option: uses the read preference to determine if
-any server in the cluster is suitable for reading.
-Called with no option: false |
-false |
-
-
-ReplicaSetWithPrimary |
-Called with any valid option: uses the read
-preference to determine if any server in the cluster is suitable for
-reading.
-Called with no option: true |
-true |
-
-
-Sharded |
-true if 1+ servers are available |
-true if 1+ servers are available |
-
-
-LoadBalanced |
-true |
-true |
-
-
-
+| Topology Type | `hasReadableServer` | `hasWritableServer` |
+| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- |
+| Unknown | `false` | `false` |
+| Single | `true` if the server is available | `true` if the server is available |
+| ReplicaSetNoPrimary | Called with `primary`: `false`<br>Called with any other option: uses the read preference to determine if any server in the cluster is suitable for reading.<br>Called with no option: `false` | `false` |
+| ReplicaSetWithPrimary | Called with any valid option: uses the read preference to determine if any server in the cluster is suitable for reading.<br>Called with no option: `true` | `true` |
+| Sharded | `true` if 1+ servers are available | `true` if 1+ servers are available |
+| LoadBalanced | `true` | `true` |
### Log Messages
@@ -617,13 +571,9 @@ See the [README](tests/monitoring/README.md).
- 2021-05-06: Updated to use modern terminology.
-# \<\<\<\<\<\<\< HEAD :2024-03-29: Updated to clarify expected initial value of TopologyDescriptionChangedEvent's previousDescription field :2024-01-17: Updated to require that `TopologyDescriptionChangedEvent` should be emitted before just `TopologyClosedEvent` is emitted :2024-01-04: Updated to clarify when ServerHeartbeatStartedEvent should be emitted :2023-03-31: Renamed to include "logging" in the title. Reorganized contents and made consistent with CLAM spec, and added requirements for SDAM log messages. :2022-10-05: Remove spec front matter and reformat changelog. :2021-05-06: Updated to use modern terminology. :2020-04-20: Add rules for streaming heartbeat protocol and add "awaited" field to heartbeat events. :2018:12-12: Clarified table of rules for readable/writable servers :2016-08-31: Added table of rules for determining if topology has readable/writable servers. :2016-10-11: TopologyDescription objects MAY have additional methods and properties. ||||||| parent of 469393fd (DRIVERS-2789 Convert SDAM Spec to Markdown) :2024-03-29: Updated to clarify expected initial value of TopologyDescriptionChangedEvent's previousDescription field :2024-01-04: Updated to clarify when ServerHeartbeatStartedEvent should be emitted :2023-03-31: Renamed to include "logging" in the title. Reorganized contents and made consistent with CLAM spec, and added requirements for SDAM log messages. :2022-10-05: Remove spec front matter and reformat changelog. :2021-05-06: Updated to use modern terminology. :2020-04-20: Add rules for streaming heartbeat protocol and add "awaited" field to heartbeat events. :2018:12-12: Clarified table of rules for readable/writable servers :2016-08-31: Added table of rules for determining if topology has readable/writable servers. :2016-10-11: TopologyDescription objects MAY have additional methods and properties.
-
- 2020-04-20: Add rules for streaming heartbeat protocol and add "awaited" field to heartbeat events.
-> > > > > > > 469393fd (DRIVERS-2789 Convert SDAM Spec to Markdown)
-
-- 2018:12-12: Clarified table of rules for readable/writable servers
+- 2018-12-12: Clarified table of rules for readable/writable servers
- 2016-08-31: Added table of rules for determining if topology has readable/writable servers.
diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md b/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md
new file mode 100644
index 0000000000..498fde0eaf
--- /dev/null
+++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md
@@ -0,0 +1,1990 @@
+# Server Discovery And Monitoring
+
+- Status: Accepted
+- Minimum Server Version: 2.4
+
+______________________________________________________________________
+
+## Abstract
+
+This spec defines how a MongoDB client discovers and monitors one or more servers. It covers monitoring a single server,
+a set of mongoses, or a replica set. How does the client determine what type of servers they are? How does it keep this
+information up to date? How does the client find an entire replica set from a seed list, and how does it respond to a
+stepdown, election, reconfiguration, or network error?
+
+All drivers must answer these questions the same way. Or, where platforms' limitations require differences among drivers,
+there must be as few answers as possible and each must be clearly explained in this spec. Even in cases where several
+answers seem equally good, drivers must agree on one way to do it.
+
+MongoDB users and driver authors benefit from having one way to discover and monitor servers. Users can substantially
+understand their driver's behavior without inspecting its code or asking its author. Driver authors can avoid subtle
+mistakes when they take advantage of a design that has been well-considered, reviewed, and tested.
+
+The server discovery and monitoring method is specified in four sections. First, a client is
+[configured](#configuration). Second, it begins [monitoring](#monitoring) by calling
+[hello or legacy hello](../mongodb-handshake/handshake.rst#terms) on all servers. (Multi-threaded and asynchronous
+monitoring is described first, then single-threaded monitoring.) Third, as hello or legacy hello responses are received
+the client [parses them](#parsing-a-hello-or-legacy-hello-response), and fourth, it
+[updates its view of the topology](#updating-the-topologydescription).
+
+Finally, this spec describes how [drivers update their topology view in response to errors](#error-handling), and
+includes generous implementation notes for driver authors.
+
+This spec does not describe how a client chooses a server for an operation; that is the domain of the Server Selection
+Spec. But there is a section describing the interaction between monitoring and server selection.
+
+There is no discussion of driver architecture and data structures, nor is there any specification of a user-facing API.
+This spec is only concerned with the algorithm for monitoring the server topology.
+
+## Meta
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
+
+## Specification
+
+### General Requirements
+
+**Direct connections:** A client MUST be able to connect to a single server of any type. This includes querying hidden
+replica set members, and connecting to uninitialized members (see [RSGhost](#rsghost-and-rsother)) in order to run
+"replSetInitiate". Setting a read preference MUST NOT be necessary to connect to a secondary. Of course, the secondary
+will reject all operations done with the PRIMARY read preference because the secondaryOk bit is not set, but the initial
+connection itself succeeds. Drivers MAY allow direct connections to arbiters (for example, to run administrative
+commands).
+
+**Replica sets:** A client MUST be able to discover an entire replica set from a seed list containing one or more
+replica set members. It MUST be able to continue monitoring the replica set even when some members go down, or when
+reconfigs add and remove members. A client MUST be able to connect to a replica set while there is no primary, or the
+primary is down.
+
+**Mongos:** A client MUST be able to connect to a set of mongoses and monitor their availability and
+[round trip time](#round-trip-time). This spec defines how mongoses are discovered and monitored, but does not define
+which mongos is selected for a given operation.
+
+### Terms
+
+#### Server
+
+A mongod or mongos process, or a load balancer.
+
+#### Deployment
+
+One or more servers: either a standalone, a replica set, or one or more mongoses.
+
+#### Topology
+
+The state of the deployment: its type (standalone, replica set, or sharded), which servers are up, what type of servers
+they are, which is primary, and so on.
+
+#### Client
+
+Driver code responsible for connecting to MongoDB.
+
+#### Seed list
+
+Server addresses provided to the client in its initial configuration, for example from the
+[connection string](https://www.mongodb.com/docs/manual/reference/connection-string/).
+
+#### Data-Bearing Server Type
+
+A server type from which a client can receive application data:
+
+- Mongos
+- RSPrimary
+- RSSecondary
+- Standalone
+- LoadBalancer
+
+#### Round trip time
+
+Also known as RTT.
+
+The client's measurement of the duration of one hello or legacy hello call. The round trip time is used to support the
+"localThresholdMS"[^1] option in the Server Selection Spec.
+
+#### hello or legacy hello outcome
+
+The result of an attempt to call the hello or legacy hello command on a server. It consists of three elements: a boolean
+indicating the success or failure of the attempt, a document containing the command response (or null if it failed), and
+the round trip time to execute the command (or null if it failed).
+
+#### check
+
+The client checks a server by attempting to call hello or legacy hello on it, and recording the outcome.
+
+#### scan
+
+The process of checking all servers in the deployment.
+
+#### suitable
+
+A server is judged "suitable" for an operation if the client can use it for a particular operation. For example, a write
+requires a standalone, primary, or mongos. Suitability is fully specified in the
+[Server Selection Spec](../server-selection/server-selection.md).
+
+#### address
+
+The hostname or IP address, and port number, of a MongoDB server.
+
+#### network error
+
+An error that occurs while reading from or writing to a network socket.
+
+#### network timeout
+
+A timeout that occurs while reading from or writing to a network socket.
+
+#### minHeartbeatFrequencyMS
+
+Defined in the [Server Monitoring spec](server-monitoring.rst). This value MUST be 500 ms, and it MUST NOT be
+configurable.
+
+#### pool generation number
+
+The pool's generation number which starts at 0 and is incremented each time the pool is cleared. Defined in the
+[Connection Monitoring and Pooling spec](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md).
+
+#### connection generation number
+
+The pool's generation number at the time this connection was created. Defined in the
+[Connection Monitoring and Pooling spec](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md).
+
+#### error generation number
+
+The error's generation number is the generation of the connection on which the application error occurred. Note that
+when a network error occurs before the handshake completes then the error's generation number is the generation of the
+pool at the time the connection attempt was started.
+
+#### State Change Error
+
+A server reply document indicating a "not writable primary" or "node is recovering" error. Starting in MongoDB 4.4 these
+errors may also include a [topologyVersion](#topologyversion) field.
+
+### Data structures
+
+This spec uses a few data structures to describe the client's view of the topology. It must be emphasized that a driver
+is free to implement the same behavior using different data structures. This spec uses these enums and structs in order
+to describe driver **behavior**, not to mandate how a driver represents the topology, nor to mandate an API.
+
+#### Constants
+
+##### clientMinWireVersion and clientMaxWireVersion
+
+Integers. The wire protocol range supported by the client.
+
+#### Enums
+
+##### TopologyType
+
+Single, ReplicaSetNoPrimary, ReplicaSetWithPrimary, Sharded, LoadBalanced, or Unknown.
+
+See [updating the TopologyDescription](#updating-the-topologydescription).
+
+##### ServerType
+
+Standalone, Mongos, PossiblePrimary, RSPrimary, RSSecondary, RSArbiter, RSOther, RSGhost, LoadBalancer or Unknown.
+
+See [parsing a hello or legacy hello response](#parsing-a-hello-or-legacy-hello-response).
+
+> [!NOTE]
+> Single-threaded clients use the PossiblePrimary type to maintain proper
+> [scanning order](server-monitoring.rst#scanning-order). Multi-threaded and asynchronous clients do not need this
+> ServerType; it is synonymous with Unknown.
+
+#### TopologyDescription
+
+The client's representation of everything it knows about the deployment's topology.
+
+Fields:
+
+- type: a [TopologyType](#topologytype) enum value. See [initial TopologyType](#initial-topologytype).
+- setName: the replica set name. Default null.
+- maxElectionId: an ObjectId or null. The largest electionId ever reported by a primary. Default null. Part of the
+ (`electionId`, `setVersion`) tuple.
+- maxSetVersion: an integer or null. The largest setVersion ever reported by a primary. It may not monotonically
+ increase, as electionId takes precedence in ordering. Default null. Part of the (`electionId`, `setVersion`) tuple.
+- servers: a set of ServerDescription instances. Default contains one server: "localhost:27017", ServerType Unknown.
+- stale: a boolean for single-threaded clients, whether the topology must be re-scanned. (Not related to
+ maxStalenessSeconds, nor to [stale primaries](#using-electionid-and-setversion-to-detect-stale-primaries).)
+- compatible: a boolean. False if any server's wire protocol version range is incompatible with the client's. Default
+ true.
+- compatibilityError: a string. The error message if "compatible" is false, otherwise null.
+- logicalSessionTimeoutMinutes: integer or null. Default null. See [logical session timeout](#logical-session-timeout).
+
+#### ServerDescription
+
+The client's view of a single server, based on the most recent hello or legacy hello outcome.
+
+Again, drivers may store this information however they choose; this data structure is defined here merely to describe
+the monitoring algorithm.
+
+Fields:
+
+- address: the hostname or IP, and the port number, that the client connects to. Note that this is **not** the "me"
+ field in the server's hello or legacy hello response, in the case that the server reports an address different from
+ the address the client uses.
+- (=) error: information about the last error related to this server. Default null.
+- roundTripTime: the duration of the hello or legacy hello call. Default null.
+- minRoundTripTime: the minimum RTT for the server. Default null.
+- lastWriteDate: a 64-bit BSON datetime or null. The "lastWriteDate" from the server's most recent hello or legacy hello
+ response.
+- opTime: an opTime or null. An opaque value representing the position in the oplog of the most recently seen write.
+ Default null. (Only mongos and shard servers record this field when monitoring config servers as replica sets, at
+ least until
+ [drivers allow applications to use readConcern "afterOptime".](../max-staleness/max-staleness.md#future-feature-to-support-readconcern-afteroptime))
+- (=) type: a [ServerType](#servertype) enum value. Default Unknown.
+- (=) minWireVersion, maxWireVersion: the wire protocol version range supported by the server. Both default to 0.
+ [Use min and maxWireVersion only to determine compatibility](#use-min-and-maxwireversion-only-to-determine-compatibility).
+- (=) me: The hostname or IP, and the port number, that this server was configured with in the replica set. Default
+ null.
+- (=) hosts, passives, arbiters: Sets of addresses. This server's opinion of the replica set's members, if any. These
+ [hostnames are normalized to lower-case](#hostnames-are-normalized-to-lower-case). Default empty. The client
+ monitors all three types of servers in a replica set.
+- (=) tags: map from string to string. Default empty.
+- (=) setName: string or null. Default null.
+- (=) electionId: an ObjectId, if this is a MongoDB 2.6+ replica set member that believes it is primary. See
+ [using electionId and setVersion to detect stale primaries](#using-electionid-and-setversion-to-detect-stale-primaries).
+ Default null.
+- (=) setVersion: integer or null. Default null.
+- (=) primary: an address. This server's opinion of who the primary is. Default null.
+- lastUpdateTime: when this server was last checked. Default "infinity ago".
+- (=) logicalSessionTimeoutMinutes: integer or null. Default null.
+- (=) topologyVersion: A topologyVersion or null. Default null. The "topologyVersion" from the server's most recent
+ hello or legacy hello response or [State Change Error](#state-change-error).
+- (=) iscryptd: boolean indicating if the server is a
+ [mongocryptd](../client-side-encryption/client-side-encryption.md#mongocryptd) server. Default null.
+
+"Passives" are priority-zero replica set members that cannot become primary. The client treats them precisely the same
+as other members.
+
+Fields marked (=) are used for [Server Description Equality](#server-description-equality) comparison.
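
As a rough illustration of the (=) marking, the sketch below models a reduced ServerDescription as a Python dataclass in which only the (=) fields take part in equality. The class layout, the snake_case field names, and the choice of a subset of fields are assumptions made for brevity; the spec does not mandate any particular representation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServerDescription:
    # Not marked (=): the address is excluded from the equality comparison.
    address: str = field(compare=False)
    # Marked (=): these fields decide whether two descriptions compare equal.
    type: str = "Unknown"
    set_name: Optional[str] = None
    set_version: Optional[int] = None
    primary: Optional[str] = None
    hosts: frozenset = frozenset()
    # Not marked (=): client-tracked measurements, ignored by equality.
    round_trip_time: Optional[float] = field(default=None, compare=False)
    last_update_time: Optional[float] = field(default=None, compare=False)


a = ServerDescription("a:27017", type="RSSecondary", set_name="rs", round_trip_time=0.004)
b = ServerDescription("a:27017", type="RSSecondary", set_name="rs", round_trip_time=0.009)
c = ServerDescription("a:27017", type="RSPrimary", set_name="rs", round_trip_time=0.004)
```

Here `a` and `b` compare equal because they differ only in round trip time, while `a` and `c` differ in `type` and so compare unequal.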
+
+### Configuration
+
+#### No breaking changes
+
+This spec does not intend to require any drivers to make breaking changes regarding what configuration options are
+available, how options are named, or what combinations of options are allowed.
+
+#### Initial TopologyDescription
+
+The default values for [TopologyDescription](#topologydescription) fields are described above. Users may override the
+defaults as follows:
+
+##### Initial Servers
+
+The user MUST be able to set the initial servers list to a [seed list](#seed-list) of one or more addresses.
+
+The hostname portion of each address MUST be normalized to lower-case.
+
+##### Initial TopologyType
+
+If the `directConnection` URI option is specified when a MongoClient is constructed, the TopologyType must be
+initialized based on the value of the `directConnection` option and the presence of the `replicaSet` option according to
+the following table:
+
+| directConnection | replicaSet present | Initial TopologyType |
+| ---------------- | ------------------ | -------------------- |
+| true | no | Single |
+| true | yes | Single |
+| false | no | Unknown |
+| false | yes | ReplicaSetNoPrimary |
+
+If the `directConnection` option is not specified, newly developed drivers MUST behave as if it was specified with the
+false value.
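
The table above can be read as a small decision function. The following is a minimal sketch, assuming a hypothetical helper named `initial_topology_type` that receives the already-parsed URI options; treating an unspecified `directConnection` as false follows the rule for newly developed drivers and may not match an existing driver's legacy default.

```python
from typing import Optional


def initial_topology_type(direct_connection: Optional[bool],
                          replica_set: Optional[str]) -> str:
    """Return the initial TopologyType name for the given URI options.

    `direct_connection` is None when the option was not specified.
    """
    if direct_connection is None:
        # Newly developed drivers behave as if directConnection=false was given.
        direct_connection = False
    if direct_connection:
        # Single regardless of whether replicaSet is present.
        return "Single"
    return "ReplicaSetNoPrimary" if replica_set is not None else "Unknown"
```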
+
+Since changing the starting topology can reasonably be considered a backwards-breaking change, existing drivers SHOULD
+stage implementation according to semantic versioning guidelines. Specifically, support for the `directConnection` URI
+option can be added in a minor release. In a subsequent major release, the default starting topology can be changed to
+Unknown. Drivers MUST document this in a prior minor release.
+
+Existing drivers MUST deprecate other URI options, if any, for controlling topology discovery or specifying the
+deployment topology. If such a legacy option is specified and the `directConnection` option is also specified, and the
+values of the two options are semantically different, the driver MUST report an error during URI option parsing.
+
+The API for initializing TopologyType using language-specific native options is not specified here. Drivers might
+already have a convention, e.g. a single seed means Single, a setName means ReplicaSetNoPrimary, and a list of seeds
+means Unknown. There are variations, however: In the Java driver a single seed means Single, but a **list** containing
+one seed means Unknown, so it can transition to replica-set monitoring if the seed is discovered to be a replica set
+member. In contrast, PyMongo requires a non-null setName in order to begin replica-set monitoring, regardless of the
+number of seeds. This spec does not cover language-specific native options that a driver may provide.
+
+##### Initial setName
+
+It is allowed to use `directConnection=true` in conjunction with the `replicaSet` URI option. The driver must connect in
+Single topology and verify that setName matches the specified name, as per
+[verifying setName with TopologyType Single](#verifying-setname-with-topologytype-single).
+
+When a MongoClient is initialized using language-specific native options, the user MUST be able to set the client's
+initial replica set name. A driver MAY require the set name in order to connect to a replica set, or it MAY be able to
+discover the replica set name as it connects.
+
+##### Allowed configuration combinations
+
+Drivers MUST enforce:
+
+- TopologyType Single cannot be used with multiple seeds.
+- `directConnection=true` cannot be used with multiple seeds.
+- If setName is not null, only TopologyType ReplicaSetNoPrimary, and possibly Single, are allowed. (See
+ [verifying setName with TopologyType Single](#verifying-setname-with-topologytype-single).)
+- `loadBalanced=true` cannot be used in conjunction with `directConnection=true` or `replicaSet`.
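
One way to picture these checks is a single validation pass over the parsed options. This is only a sketch: `ConfigurationError`, `validate_options`, and the parameter names are hypothetical, and a real driver would report errors through its own option-parsing machinery.

```python
from typing import Optional, Sequence


class ConfigurationError(ValueError):
    """Hypothetical error type for invalid option combinations."""


def validate_options(seeds: Sequence[str],
                     direct_connection: bool = False,
                     replica_set: Optional[str] = None,
                     load_balanced: bool = False,
                     topology_type: Optional[str] = None) -> None:
    """Raise ConfigurationError if the combination is not allowed."""
    if topology_type == "Single" and len(seeds) > 1:
        raise ConfigurationError("TopologyType Single cannot be used with multiple seeds")
    if direct_connection and len(seeds) > 1:
        raise ConfigurationError("directConnection=true cannot be used with multiple seeds")
    allowed_with_set_name = (None, "ReplicaSetNoPrimary", "Single")
    if replica_set is not None and topology_type not in allowed_with_set_name:
        raise ConfigurationError("setName requires ReplicaSetNoPrimary or Single")
    if load_balanced and (direct_connection or replica_set is not None):
        raise ConfigurationError(
            "loadBalanced=true cannot be combined with directConnection=true or replicaSet")
```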
+
+##### Handling of SRV URIs resolving to single host
+
+When a driver is given an SRV URI, if the `directConnection` URI option is not specified, and the `replicaSet` URI
+option is not specified, the driver MUST start in Unknown topology, and follow the rules in the
+[TopologyType table](#topologytype-table) for transitioning to other topologies. In particular, the driver MUST NOT use
+the number of hosts from the initial SRV lookup to decide what topology to start in.
+
+#### heartbeatFrequencyMS
+
+The interval between server [checks](#check), counted from the end of the previous check until the beginning of the next
+one.
+
+For multi-threaded and asynchronous drivers it MUST default to 10 seconds and MUST be configurable. For single-threaded
+drivers it MUST default to 60 seconds and MUST be configurable. It MUST be called heartbeatFrequencyMS unless this
+breaks backwards compatibility.
+
+For both multi- and single-threaded drivers, the driver MUST NOT permit users to configure it less than
+minHeartbeatFrequencyMS (500ms).
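
The defaults and the lower bound can be sketched as follows. The function name `resolve_heartbeat_frequency_ms` and the use of `ValueError` are assumptions for illustration; only the 10 second, 60 second, and 500 ms values come from the text above.

```python
from typing import Optional

MIN_HEARTBEAT_FREQUENCY_MS = 500  # fixed by the spec, not configurable


def resolve_heartbeat_frequency_ms(configured_ms: Optional[int],
                                   single_threaded: bool) -> int:
    """Return the effective heartbeatFrequencyMS for a client."""
    if configured_ms is None:
        # 60 seconds for single-threaded drivers, 10 seconds otherwise.
        return 60_000 if single_threaded else 10_000
    if configured_ms < MIN_HEARTBEAT_FREQUENCY_MS:
        raise ValueError(
            f"heartbeatFrequencyMS must be at least {MIN_HEARTBEAT_FREQUENCY_MS}")
    return configured_ms
```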
+
+(See
+[heartbeatFrequencyMS defaults to 10 seconds or 60 seconds](#heartbeatfrequencyms-defaults-to-10-seconds-or-60-seconds)
+and [what's the point of periodic monitoring?](#whats-the-point-of-periodic-monitoring))
+
+### Client construction
+
+Except for [initial DNS seed list discovery](../initial-dns-seedlist-discovery/initial-dns-seedlist-discovery.md) when
+given a connection string with `mongodb+srv` scheme, the client's constructor MUST NOT do any I/O. This means that the
+constructor does not throw an exception if servers are unavailable: the topology is not yet known when the constructor
+returns. Similarly if a server has an incompatible wire protocol version, the constructor does not throw. Instead, all
+subsequent operations on the client fail as long as the error persists.
+
+See [clients do no I/O in the constructor](#clients-do-no-io-in-the-constructor) for the justification.
+
+#### Multi-threaded and asynchronous client construction
+
+The constructor MAY start the monitors as background tasks and return immediately. Or the monitors MAY be started by
+some method separate from the constructor; for example they MAY be started by some "initialize" method (by any name), or
+on the first use of the client for an operation.
+
+#### Single-threaded client construction
+
+Single-threaded clients do no I/O in the constructor. They MUST [scan](#scan) the servers on demand, when the first
+operation is attempted.
+
+### Client closing
+
+When a client is closing, before it emits the `TopologyClosedEvent` as per the
+[Events API](./server-discovery-and-monitoring-logging-and-monitoring.md#events-api), it SHOULD [remove](#remove) all
+servers from its `TopologyDescription` and set its `TopologyType` to `Unknown`, emitting the corresponding
+`TopologyDescriptionChangedEvent`.
+
+### Monitoring
+
+See the [Server Monitoring spec](server-monitoring.rst) for how a driver monitors each server. In summary, the client
+monitors each server in the topology. The scope of server monitoring is to provide the topology with updated
+ServerDescriptions based on hello or legacy hello command responses.
+
+### Parsing a hello or legacy hello response
+
+The client represents its view of each server with a [ServerDescription](#serverdescription). Each time the client
+[checks](#check) a server, it MUST replace its description of that server with a new one if and only if the new
+ServerDescription's [topologyVersion](#topologyversion) is greater than or equal to the current ServerDescription's
+[topologyVersion](#topologyversion).
+
+(See [Replacing the TopologyDescription](#replacing-the-topologydescription) for an example implementation.)
+
+This replacement MUST happen even if the new server description compares equal to the previous one, in order to keep
+client-tracked attributes like last update time and round trip time up to date.
+
+Drivers MUST be able to handle responses to both `hello` and legacy hello commands. When checking results, drivers MUST
+first check for the `isWritablePrimary` field and fall back to checking for an `ismaster` field if `isWritablePrimary`
+was not found.
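
The field precedence described above might look like this in practice. The helper name `is_writable_primary` and the plain-dict representation of the response are assumptions for the sake of the example.

```python
from typing import Any, Mapping


def is_writable_primary(response: Mapping[str, Any]) -> bool:
    """Read the primary flag from a hello or legacy hello response."""
    # Check the modern field first.
    if "isWritablePrimary" in response:
        return bool(response["isWritablePrimary"])
    # Fall back to the legacy field only when the modern one is absent.
    return bool(response.get("ismaster", False))
```

Note that a response carrying `isWritablePrimary: false` is not overridden by a stray `ismaster: true`, because the fallback applies only when the first field is missing.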
+
+ServerDescriptions are created from hello or legacy hello outcomes as follows:
+
+#### type
+
+The new ServerDescription's type field is set to a [ServerType](#servertype). Note that these states do **not** exactly
+correspond to [replica set member states](https://www.mongodb.com/docs/manual/reference/replica-states/). For example,
+some replica set member states like STARTUP and RECOVERING are identical from the client's perspective, so they are
+merged into "RSOther". Additionally, states like Standalone and Mongos are not replica set member states at all.
+
+| State | Symptoms |
+| --------------- | ------------------------------------------------------------------------------------------------------------------------- |
+| Unknown | Initial, or after a network error or failed hello or legacy hello call, or "ok: 1" not in hello or legacy hello response. |
+| Standalone | No "msg: isdbgrid", no setName, and no "isreplicaset: true". |
+| Mongos | "msg: isdbgrid". |
+| PossiblePrimary | Not yet checked, but another member thinks it is the primary. |
+| RSPrimary | "isWritablePrimary: true" or "ismaster: true", "setName" in response. |
+| RSSecondary | "secondary: true", "setName" in response. |
+| RSArbiter | "arbiterOnly: true", "setName" in response. |
+| RSOther | "setName" in response, "hidden: true" or not primary, secondary, nor arbiter. |
+| RSGhost | "isreplicaset: true" in response. |
+| LoadBalancer | "loadBalanced=true" in URI. |
+
+A server can transition from any state to any other. For example, an administrator could shut down a secondary and bring
+up a mongos in its place.
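
The symptoms table can be approximated by a classifier over the response document. This sketch makes several assumptions: the function name `server_type` is hypothetical, the response is a plain mapping (or `None` for a failed check), and the order in which overlapping symptoms are tested (for example `hidden` before the primary flag) is one reasonable reading of the table rather than something the table states. PossiblePrimary and the load balancer type are not derived from a response, so they are omitted.

```python
from typing import Any, Mapping, Optional


def server_type(response: Optional[Mapping[str, Any]]) -> str:
    """Classify a server from a hello or legacy hello response."""
    # A failed check, or a response without "ok: 1", yields Unknown.
    if response is None or response.get("ok") != 1:
        return "Unknown"
    if response.get("msg") == "isdbgrid":
        return "Mongos"
    if response.get("isreplicaset"):
        return "RSGhost"
    if response.get("setName") is not None:
        if response.get("hidden"):
            return "RSOther"
        # Prefer isWritablePrimary, falling back to the legacy ismaster field.
        if response.get("isWritablePrimary", response.get("ismaster", False)):
            return "RSPrimary"
        if response.get("secondary"):
            return "RSSecondary"
        if response.get("arbiterOnly"):
            return "RSArbiter"
        return "RSOther"
    # No "msg: isdbgrid", no setName, and no "isreplicaset: true".
    return "Standalone"
```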
+
+##### RSGhost and RSOther
+
+The client MUST monitor replica set members even when they cannot be queried. These members are in state RSGhost or
+RSOther.
+
+**RSGhost** members occur in at least three situations:
+
+- briefly during server startup,
+- in an uninitialized replica set,
+- or when the server is shunned (removed from the replica set config).
+
+An RSGhost server has neither a hosts list nor a setName. Therefore the client MUST NOT attempt to use its hosts list or
+check its setName (see [JAVA-1161](https://jira.mongodb.org/browse/JAVA-1161) or
+[CSHARP-671](https://jira.mongodb.org/browse/CSHARP-671)). However, the client MUST keep the RSGhost member in its
+TopologyDescription, in case the client's only hope for staying connected to the replica set is that this member will
+transition to a more useful state.
+
+For simplicity, this is the rule: any server that reports "isreplicaset: true" is an RSGhost.
+
+Non-ghost replica set members have reported their setNames since MongoDB 1.6.2. See
+[only support replica set members running MongoDB 1.6.2 or later](#only-support-replica-set-members-running-mongodb-162-or-later).
+
+> [!NOTE]
+> The Java driver does not have a separate state for RSGhost; it is an RSOther server with no hosts list.
+
+**RSOther** servers may be hidden, starting up, or recovering. They cannot be queried, but their hosts lists are useful
+for discovering the current replica set configuration.
+
+If a [hidden member](https://www.mongodb.com/docs/manual/core/replica-set-hidden-member/) is provided as a seed, the
+client can use it to find the primary. Since the hidden member does not appear in the primary's host list, it will be
+removed once the primary is checked.
+
+#### error
+
+If the client experiences any error when checking a server, it stores error information in the ServerDescription's error
+field.
+
+#### roundTripTime
+
+Drivers MUST record the server's [round trip time](#round-trip-time) (RTT) after each successful call to hello or legacy
+hello. The Server Selection Spec describes how RTT is averaged and how it is used in server selection. Drivers MUST also
+record the server's minimum RTT per [Server Monitoring (Measuring RTT)](server-monitoring.rst#measuring-rtt).
+
+If a hello or legacy hello call fails, the RTT is not updated. Furthermore, while a server's type is Unknown its RTT is
+null, and if it changes from a known type to Unknown its RTT is set to null. However, if it changes from one known type
+to another (e.g. from RSPrimary to RSSecondary) its RTT is updated normally, not set to null nor restarted from scratch.
+
+#### lastWriteDate and opTime
+
+The hello or legacy hello response of a replica set member running MongoDB 3.4 and later contains a `lastWrite`
+subdocument with fields `lastWriteDate` and `opTime` ([SERVER-8858](https://jira.mongodb.org/browse/SERVER-8858)). If
+these fields are available, parse them from the hello or legacy hello response, otherwise set them to null.
+
+Clients MUST NOT attempt to compensate for the network latency between when the server generated its hello or legacy
+hello response and when the client records `lastUpdateTime`.
+
+#### lastUpdateTime
+
+Clients SHOULD set lastUpdateTime with a monotonic clock.
+
+#### Hostnames are normalized to lower-case
+
+As with seeds provided in the initial configuration, all hostnames in the hello or legacy hello response's
+"me", "hosts", "passives", and "arbiters" entries MUST be lower-cased.
+
+This prevents unnecessary work rediscovering a server if a seed "A" is provided and the server responds that "a" is in
+the replica set.
+
+[RFC 4343](http://tools.ietf.org/html/rfc4343):
+
+> Domain Name System (DNS) names are "case insensitive".
+
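As an illustration of the lower-casing rule, the sketch below normalizes the four fields named in this section. The helper name `normalize_hostnames` and the dict-shaped reply are assumptions made for the example; a real driver would apply this wherever it parses the response.

```python
HOST_LIST_FIELDS = ("hosts", "passives", "arbiters")

def normalize_hostnames(response):
    """Return a copy of the reply with "me" and the host lists lower-cased."""
    normalized = dict(response)
    if isinstance(normalized.get("me"), str):
        normalized["me"] = normalized["me"].lower()
    for field in HOST_LIST_FIELDS:
        if field in normalized:
            normalized[field] = [host.lower() for host in normalized[field]]
    return normalized
```

With this in place, a seed "A:27017" that has been lower-cased to "a:27017" matches the "a:27017" entry reported by the server, so no rediscovery is needed.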
+#### logicalSessionTimeoutMinutes
+
+MongoDB 3.6 and later include a `logicalSessionTimeoutMinutes` field if logical sessions are enabled in the deployment.
+Clients MUST check for this field and set the ServerDescription's logicalSessionTimeoutMinutes field to this value, or
+to null otherwise.
+
+#### topologyVersion
+
+MongoDB 4.4 and later include a `topologyVersion` field in all hello or legacy hello and
+[State Change Error](#state-change-error) responses. Clients MUST check for this field and set the ServerDescription's
+topologyVersion field to this value, if present. The topologyVersion helps the client and server determine the relative
+freshness of topology information in concurrent messages. (See
+[What is the purpose of topologyVersion?](#what-is-the-purpose-of-topologyversion))
+
+The topologyVersion is a subdocument with two fields, "processId" and "counter":
+
+```typescript
+{
+    topologyVersion: {processId: <ObjectId>, counter: <int64>},
+    ( ... other fields ...)
+}
+```
+
+##### topologyVersion Comparison
+
+To compare a topologyVersion from a hello or legacy hello or State Change Error response to the current
+ServerDescription's topologyVersion:
+
+1. If the response topologyVersion is unset or the ServerDescription's topologyVersion is null, the client MUST assume
+ the response is more recent.
+2. If the response's topologyVersion.processId is not equal to the ServerDescription's, the client MUST assume the
+ response is more recent.
+3. If the response's topologyVersion.processId is equal to the ServerDescription's, the client MUST use the counter
+ field to determine which topologyVersion is more recent.
+
+See [Replacing the TopologyDescription](#replacing-the-topologydescription) for an example implementation of
+topologyVersion comparison.
+
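The three comparison rules can be sketched as a three-way comparison. This is an illustrative sketch only: the name `compare_topology_version` and the dict representation are assumptions, as is the reading that a larger counter means a more recent topologyVersion.

```python
def compare_topology_version(current_tv, response_tv):
    """Return -1 if the response is more recent than the current
    topologyVersion, 0 if they are equal, and 1 if the response is older."""
    if current_tv is None or response_tv is None:
        # Rule 1: a missing topologyVersion means "assume the response is more recent".
        return -1
    if current_tv["processId"] != response_tv["processId"]:
        # Rule 2: a different process means "assume the response is more recent".
        return -1
    # Rule 3: same process, so the counters decide (assumed to increase over time).
    current_counter = current_tv["counter"]
    response_counter = response_tv["counter"]
    if current_counter == response_counter:
        return 0
    return -1 if current_counter < response_counter else 1
```

A caller that wants to discard stale application errors would treat a result of 0 or 1 as stale, while a caller processing a hello or legacy hello response would typically discard only a result of 1.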
+#### serviceId
+
+MongoDB 5.0 and later, as well as any mongos-like service, include a `serviceId` field when the service is configured
+behind a load balancer.
+
+#### Other ServerDescription fields
+
+Other required fields defined in the [ServerDescription](#serverdescription) data structure are parsed from the hello or
+legacy hello response in the obvious way.
+
+#### Server Description Equality
+
+For the purpose of determining whether to publish SDAM events, two server descriptions having the same address MUST be
+considered equal if and only if the values of [ServerDescription](#serverdescription) fields marked (=) are respectively
+equal.
+
+This specification does not prescribe how to compare server descriptions with different addresses for equality.
+
+### Updating the TopologyDescription
+
+Each time the client checks a server, it processes the outcome (successful or not) to create a
+[ServerDescription](#serverdescription), and then it processes the ServerDescription to update its
+[TopologyDescription](#topologydescription).
+
+The TopologyDescription's [TopologyType](#topologytype) influences how the ServerDescription is processed. The following
+subsection specifies how the client updates its TopologyDescription when the TopologyType is Single. The next subsection
+treats the other types.
+
+#### TopologyType Single
+
+The TopologyDescription's type was initialized as Single and remains Single forever. There is always one
+ServerDescription in TopologyDescription.servers.
+
+Whenever the client checks a server (successfully or not), and regardless of whether the new server description is equal
+to the previous server description as defined in [Server Description Equality](#server-description-equality), the
+ServerDescription in TopologyDescription.servers MUST be replaced with the new ServerDescription.
+
+##### Checking wire protocol compatibility
+
+A ServerDescription which is not Unknown is incompatible if:
+
+- minWireVersion > clientMaxWireVersion, or
+- maxWireVersion \< clientMinWireVersion
+
+If any ServerDescription is incompatible, the client MUST set the TopologyDescription's "compatible" field to false and
+fill out the TopologyDescription's "compatibilityError" field like so:
+
+- if ServerDescription.minWireVersion > clientMaxWireVersion:
+
+ "Server at $host:$port requires wire version $minWireVersion, but this version of $driverName only supports up to
+ $clientMaxWireVersion."
+
+- if ServerDescription.maxWireVersion \< clientMinWireVersion:
+
+ "Server at $host:$port reports wire version $maxWireVersion, but this version of $driverName requires at least
+ $clientMinWireVersion (MongoDB $mongoVersion)."
+
+Replace $mongoVersion with the appropriate MongoDB minor version. For example, if clientMinWireVersion is 2 and the
+client connects to MongoDB 2.4, format the error like:
+
+> "Server at example.com:27017 reports wire version 0, but this version of My Driver requires at least 2 (MongoDB 2.6)."
+
+In this second case, the exact required MongoDB version is known and can be named in the error message, whereas in the
+first case the implementer does not know which MongoDB versions will be compatible or incompatible in the future.
+
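The compatibility check and its two error messages can be sketched as a single function. The function name and parameters are hypothetical; in particular, `min_mongo_version` stands for the MongoDB version string that the driver author associates with `client_min_wire`. Callers are assumed to skip ServerDescriptions of type Unknown before calling it.

```python
def compatibility_error(address, min_wire, max_wire, client_min_wire, client_max_wire,
                        driver_name, min_mongo_version):
    """Return the compatibilityError string for a known server, or None if compatible."""
    if min_wire > client_max_wire:
        return (f"Server at {address} requires wire version {min_wire}, but this version of "
                f"{driver_name} only supports up to {client_max_wire}.")
    if max_wire < client_min_wire:
        return (f"Server at {address} reports wire version {max_wire}, but this version of "
                f"{driver_name} requires at least {client_min_wire} (MongoDB {min_mongo_version}).")
    return None
```

A topology would then set "compatible" to false as soon as this returns a message for any of its servers, and to true otherwise.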
+##### Verifying setName with TopologyType Single
+
+A client MAY allow the user to supply a setName with an initial TopologyType of Single. In this case, if the
+ServerDescription's setName is null or wrong, the ServerDescription MUST be replaced with a default ServerDescription of
+type Unknown.
+
+#### TopologyType LoadBalanced
+
+See the [Load Balancer Specification](../load-balancers/load-balancers.md#server-discovery-logging-and-monitoring) for
+details.
+
+#### Other TopologyTypes
+
+If the TopologyType is **not** Single, the topology can contain zero or more servers. The state of a topology containing
+zero servers is terminal (because servers can only be added if they are reported by a server already in the topology). A
+client SHOULD emit a warning if it is constructed with no seeds in the initial seed list. A client SHOULD emit a warning
+when, in the process of updating its topology description, it removes the last server from the topology.
+
+Whenever a client completes a hello or legacy hello call, it creates a new ServerDescription with the proper
+[ServerType](#servertype). It replaces the server's previous description in TopologyDescription.servers with the new
+one.
+
+Apply the logic for [checking wire protocol compatibility](#checking-wire-protocol-compatibility) to each
+ServerDescription in the topology. If any server's wire protocol version range does not overlap with the client's, the
+client updates the "compatible" and "compatibilityError" fields as described above for TopologyType Single. Otherwise
+"compatible" is set to true.
+
+It is possible for a multi-threaded client to receive a hello or legacy hello outcome from a server after the server has
+been removed from the TopologyDescription. For example, a monitor begins checking a server "A", then a different monitor
+receives a response from the primary claiming that "A" has been removed from the replica set, so the client removes "A"
+from the TopologyDescription. Then, the check of server "A" completes.
+
+In all cases, the client MUST ignore hello or legacy hello outcomes from servers that are not in the
+TopologyDescription.
+
+The following subsections explain in detail what actions the client takes after replacing the ServerDescription.
+
+##### TopologyType table
+
+The new ServerDescription's type is the vertical axis, and the current TopologyType is the horizontal. Where a
+ServerType and a TopologyType intersect, the table shows what action the client takes.
+
+"no-op" means, do nothing **after** replacing the server's old description with the new one.
+
+| | TopologyType Unknown | TopologyType Sharded | TopologyType ReplicaSetNoPrimary | TopologyType ReplicaSetWithPrimary |
+| ---------------------- | ----------------------------------------------------------------------------------------------- | -------------------- | ------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
+| ServerType Unknown | no-op | no-op | no-op | [checkIfHasPrimary](#checkifhasprimary) |
+| ServerType Standalone | [updateUnknownWithStandalone](#updateunknownwithstandalone) | [remove](#remove) | [remove](#remove) | [remove](#remove) and [checkIfHasPrimary](#checkifhasprimary) |
+| ServerType Mongos | Set topology type to Sharded | no-op | [remove](#remove) | [remove](#remove) and [checkIfHasPrimary](#checkifhasprimary) |
+| ServerType RSPrimary | Set topology type to ReplicaSetWithPrimary then [updateRSFromPrimary](#updatersfromprimary) | [remove](#remove) | Set topology type to ReplicaSetWithPrimary then [updateRSFromPrimary](#updatersfromprimary) | [updateRSFromPrimary](#updatersfromprimary) |
+| ServerType RSSecondary | Set topology type to ReplicaSetNoPrimary then [updateRSWithoutPrimary](#updaterswithoutprimary) | [remove](#remove) | [updateRSWithoutPrimary](#updaterswithoutprimary) | [updateRSWithPrimaryFromMember](#updaterswithprimaryfrommember) |
+| ServerType RSArbiter | Set topology type to ReplicaSetNoPrimary then [updateRSWithoutPrimary](#updaterswithoutprimary) | [remove](#remove) | [updateRSWithoutPrimary](#updaterswithoutprimary) | [updateRSWithPrimaryFromMember](#updaterswithprimaryfrommember) |
+| ServerType RSOther | Set topology type to ReplicaSetNoPrimary then [updateRSWithoutPrimary](#updaterswithoutprimary) | [remove](#remove) | [updateRSWithoutPrimary](#updaterswithoutprimary) | [updateRSWithPrimaryFromMember](#updaterswithprimaryfrommember) |
+| ServerType RSGhost | no-op[^2] | [remove](#remove) | no-op | [checkIfHasPrimary](#checkifhasprimary) |
+
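One way to implement the table is as a lookup keyed by ServerType and TopologyType. The sketch below is only one possible encoding: the tuple-of-action-names representation and every identifier in it are assumptions, and an empty tuple stands for "no-op".

```python
NO_OP = ()
REMOVE = ("remove",)
CHECK = ("checkIfHasPrimary",)
REMOVE_AND_CHECK = ("remove", "checkIfHasPrimary")

# Column order matches the table: one entry per current TopologyType.
COLUMNS = ("Unknown", "Sharded", "ReplicaSetNoPrimary", "ReplicaSetWithPrimary")

# RSSecondary, RSArbiter and RSOther share the same row.
RS_MEMBER = (
    ("setTopologyType:ReplicaSetNoPrimary", "updateRSWithoutPrimary"),
    REMOVE,
    ("updateRSWithoutPrimary",),
    ("updateRSWithPrimaryFromMember",),
)

ROWS = {
    "Unknown": (NO_OP, NO_OP, NO_OP, CHECK),
    "Standalone": (("updateUnknownWithStandalone",), REMOVE, REMOVE, REMOVE_AND_CHECK),
    "Mongos": (("setTopologyType:Sharded",), NO_OP, REMOVE, REMOVE_AND_CHECK),
    "RSPrimary": (
        ("setTopologyType:ReplicaSetWithPrimary", "updateRSFromPrimary"),
        REMOVE,
        ("setTopologyType:ReplicaSetWithPrimary", "updateRSFromPrimary"),
        ("updateRSFromPrimary",),
    ),
    "RSSecondary": RS_MEMBER,
    "RSArbiter": RS_MEMBER,
    "RSOther": RS_MEMBER,
    "RSGhost": (NO_OP, REMOVE, NO_OP, CHECK),
}

def actions_for(server_type, topology_type):
    """Return the ordered action names to run after replacing the description."""
    return ROWS[server_type][COLUMNS.index(topology_type)]
```

For instance, `actions_for("Mongos", "ReplicaSetWithPrimary")` yields the remove action followed by checkIfHasPrimary, matching the last column of the Mongos row.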
+##### TopologyType explanations
+
+This subsection complements the [TopologyType table](#topologytype-table) with prose explanations of the TopologyTypes
+(besides Single and LoadBalanced).
+
+TopologyType Unknown\
+A starting state.
+
+**Actions**:
+
+- If the incoming ServerType is Unknown (that is, the hello or legacy hello call failed), keep the server in
+ TopologyDescription.servers. The TopologyType remains Unknown.
+- The
+ [TopologyType remains Unknown when an RSGhost is discovered](#topologytype-remains-unknown-when-an-rsghost-is-discovered),
+ too.
+- If the type is Standalone, run [updateUnknownWithStandalone](#updateunknownwithstandalone).
+- If the type is Mongos, set the TopologyType to Sharded.
+- If the type is RSPrimary, record its setName and call [updateRSFromPrimary](#updatersfromprimary).
+- If the type is RSSecondary, RSArbiter or RSOther, record its setName, set the TopologyType to ReplicaSetNoPrimary, and
+ call [updateRSWithoutPrimary](#updaterswithoutprimary).
+
+TopologyType Sharded\
+A steady state. Connected to one or more mongoses.
+
+**Actions**:
+
+- If the server is Unknown or Mongos, keep it.
+- Remove others.
+
+TopologyType ReplicaSetNoPrimary\
+A starting state. The topology is definitely a replica set, but no primary is known.
+
+**Actions**:
+
+- Keep Unknown servers.
+- Keep RSGhost servers: they are members of some replica set, perhaps this one, and may recover. (See
+ [RSGhost and RSOther](#rsghost-and-rsother).)
+- Remove any Standalones or Mongoses.
+- If the type is RSPrimary call [updateRSFromPrimary](#updatersfromprimary).
+- If the type is RSSecondary, RSArbiter or RSOther, run [updateRSWithoutPrimary](#updaterswithoutprimary).
+
+TopologyType ReplicaSetWithPrimary\
+A steady state. The primary is known.
+
+**Actions**:
+
+- If the server type is Unknown, keep it, and run [checkIfHasPrimary](#checkifhasprimary).
+- Keep RSGhost servers: they are members of some replica set, perhaps this one, and may recover. (See
+ [RSGhost and RSOther](#rsghost-and-rsother).) Run [checkIfHasPrimary](#checkifhasprimary).
+- Remove any Standalones or Mongoses and run [checkIfHasPrimary](#checkifhasprimary).
+- If the type is RSPrimary run [updateRSFromPrimary](#updatersfromprimary).
+- If the type is RSSecondary, RSArbiter or RSOther, run [updateRSWithPrimaryFromMember](#updaterswithprimaryfrommember).
+
+#### Actions
+
+##### updateUnknownWithStandalone
+
+This subroutine is executed with the ServerDescription from a Standalone when the TopologyType is Unknown:
+
+```python
+if description.address not in topologyDescription.servers:
+    return
+
+if settings.seeds has one seed:
+    topologyDescription.type = Single
+else:
+    remove this server from topologyDescription and stop monitoring it
+```
+
+See
+[TopologyType remains Unknown when one of the seeds is a Standalone](#topologytype-remains-unknown-when-one-of-the-seeds-is-a-standalone).
+
+##### updateRSWithoutPrimary
+
+This subroutine is executed with the ServerDescription from an RSSecondary, RSArbiter, or RSOther when the TopologyType
+is ReplicaSetNoPrimary:
+
+```python
+if description.address not in topologyDescription.servers:
+    return
+
+if topologyDescription.setName is null:
+    topologyDescription.setName = description.setName
+
+else if topologyDescription.setName != description.setName:
+    remove this server from topologyDescription and stop monitoring it
+    return
+
+for each address in description's "hosts", "passives", and "arbiters":
+    if address is not in topologyDescription.servers:
+        add new default ServerDescription of type "Unknown"
+        begin monitoring the new server
+
+if description.primary is not null:
+    find the ServerDescription in topologyDescription.servers whose
+    address equals description.primary
+
+    if its type is Unknown, change its type to PossiblePrimary
+
+if description.address != description.me:
+    remove this server from topologyDescription and stop monitoring it
+    return
+```
+
+Unlike [updateRSFromPrimary](#updatersfromprimary), this subroutine does **not** remove any servers from the
+TopologyDescription based on the list of servers in the "hosts" field of the hello or legacy hello response. The only
+server that might be removed is the server itself that the hello or legacy hello response is from.
+
+The special handling of description.primary ensures that a single-threaded client [scans](#scan) the possible primary
+before other members.
+
+See [replica set monitoring with and without a primary](#replica-set-monitoring-with-and-without-a-primary).
+
+##### updateRSWithPrimaryFromMember
+
+This subroutine is executed with the ServerDescription from an RSSecondary, RSArbiter, or RSOther when the TopologyType
+is ReplicaSetWithPrimary:
+
+```python
+if description.address not in topologyDescription.servers:
+    # While we were checking this server, another thread heard from the
+    # primary that this server is not in the replica set.
+    return
+
+# SetName is never null here.
+if topologyDescription.setName != description.setName:
+    remove this server from topologyDescription and stop monitoring it
+    checkIfHasPrimary()
+    return
+
+if description.address != description.me:
+    remove this server from topologyDescription and stop monitoring it
+    checkIfHasPrimary()
+    return
+
+# Had this member been the primary?
+if there is no primary in topologyDescription.servers:
+    topologyDescription.type = ReplicaSetNoPrimary
+
+    if description.primary is not null:
+        find the ServerDescription in topologyDescription.servers whose
+        address equals description.primary
+
+        if its type is Unknown, change its type to PossiblePrimary
+```
+
+The special handling of description.primary ensures that a single-threaded client [scans](#scan) the possible primary
+before other members.
+
+##### updateRSFromPrimary
+
+This subroutine is executed with a ServerDescription of type RSPrimary:
+
+```python
+if serverDescription.address not in topologyDescription.servers:
+    return
+
+if topologyDescription.setName is null:
+    topologyDescription.setName = serverDescription.setName
+
+else if topologyDescription.setName != serverDescription.setName:
+    # We found a primary but it doesn't have the setName
+    # provided by the user or previously discovered.
+    remove this server from topologyDescription and stop monitoring it
+    checkIfHasPrimary()
+    return
+
+# Election ids are ObjectIds; see
+# "Using electionId and setVersion to detect stale primaries"
+# for comparison rules.
+
+if serverDescription.maxWireVersion >= 17:  # MongoDB 6.0+
+    # Null values for electionId and setVersion always compare as less than non-null values.
+    if serverDescription.electionId > topologyDescription.maxElectionId or (
+        serverDescription.electionId == topologyDescription.maxElectionId
+        and serverDescription.setVersion >= topologyDescription.maxSetVersion
+    ):
+        topologyDescription.maxElectionId = serverDescription.electionId
+        topologyDescription.maxSetVersion = serverDescription.setVersion
+    else:
+        # Stale primary.
+        # replace serverDescription with a default ServerDescription of type "Unknown"
+        checkIfHasPrimary()
+        return
+else:
+    # Maintain old comparison rules, namely setVersion is checked before electionId
+    if serverDescription.setVersion is not null and serverDescription.electionId is not null:
+        if (
+            topologyDescription.maxSetVersion is not null
+            and topologyDescription.maxElectionId is not null
+            and (
+                topologyDescription.maxSetVersion > serverDescription.setVersion
+                or (
+                    topologyDescription.maxSetVersion == serverDescription.setVersion
+                    and topologyDescription.maxElectionId > serverDescription.electionId
+                )
+            )
+        ):
+            # Stale primary.
+            # replace serverDescription with a default ServerDescription of type "Unknown"
+            checkIfHasPrimary()
+            return
+
+        topologyDescription.maxElectionId = serverDescription.electionId
+
+    if serverDescription.setVersion is not null and (
+        topologyDescription.maxSetVersion is null
+        or serverDescription.setVersion > topologyDescription.maxSetVersion
+    ):
+        topologyDescription.maxSetVersion = serverDescription.setVersion
+
+
+for each server in topologyDescription.servers:
+    if server.address != serverDescription.address:
+        if server.type is RSPrimary:
+            # See note below about invalidating an old primary.
+            replace the server with a default ServerDescription of type "Unknown"
+
+for each address in serverDescription's "hosts", "passives", and "arbiters":
+    if address is not in topologyDescription.servers:
+        add new default ServerDescription of type "Unknown"
+        begin monitoring the new server
+
+for each server in topologyDescription.servers:
+    if server.address not in serverDescription's "hosts", "passives", or "arbiters":
+        remove the server and stop monitoring it
+
+checkIfHasPrimary()
+```
+
+A note on invalidating the old primary: when a new primary is discovered, the client finds the previous primary (there
+should be none or one) and replaces its description with a default ServerDescription of type "Unknown." A multi-threaded
+client MUST [request an immediate check](server-monitoring.rst#requesting-an-immediate-check) for that server as soon as
+possible.
+
+If the old primary server version is 4.0 or earlier, the client MUST clear its connection pool for the old primary, too:
+the connections are all bad because the old primary has closed its sockets. If the old primary server version is 4.2 or
+newer, the client MUST NOT clear its connection pool for the old primary.
+
+See [replica set monitoring with and without a primary](#replica-set-monitoring-with-and-without-a-primary).
+
+If the server is primary with an obsolete electionId or setVersion, it is likely a stale primary that is going to step
+down. Mark it Unknown and let periodic monitoring detect when it becomes secondary. See
+[using electionId and setVersion to detect stale primaries](#using-electionid-and-setversion-to-detect-stale-primaries).
+
+A note on checking "me": Unlike `updateRSWithPrimaryFromMember`, there is no need to remove the server if the address is
+not equal to "me": since the server address will not be a member of either "hosts", "passives", or "arbiters", the
+server will already have been removed.
+
+##### checkIfHasPrimary
+
+Set TopologyType to ReplicaSetWithPrimary if there is an RSPrimary in TopologyDescription.servers, otherwise set it to
+ReplicaSetNoPrimary.
+
+For example, if the TopologyType is ReplicaSetWithPrimary and the client is processing a new ServerDescription of type
+Unknown, that could mean the primary just disconnected, so checkIfHasPrimary must run to check if the TopologyType
+should become ReplicaSetNoPrimary.
+
+Another example is if the client first reaches the primary via its external IP, but the response's host list includes
+only internal IPs. In that case the client adds the primary's internal IP to the TopologyDescription and begins
+monitoring it, and removes the external IP. Right after removing the external IP from the description, the TopologyType
+MUST be ReplicaSetNoPrimary, since no primary is available at this moment.
+
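A minimal sketch of checkIfHasPrimary follows. The function name and the choice to pass the ServerType names of the servers currently in the topology are assumptions made for illustration.

```python
def check_if_has_primary(server_types):
    """Return the TopologyType implied by the ServerTypes currently in the topology."""
    if any(server_type == "RSPrimary" for server_type in server_types):
        return "ReplicaSetWithPrimary"
    return "ReplicaSetNoPrimary"
```

In the external-IP example above, the topology holds only the newly added internal address (type Unknown) right after the external address is removed, so the result is ReplicaSetNoPrimary until the internal address has been checked.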
+##### remove
+
+Remove the server from TopologyDescription.servers and stop monitoring it.
+
+In multi-threaded clients, a monitor may be currently checking this server and may not immediately abort. Once the check
+completes, this server's hello or legacy hello outcome MUST be ignored, and the monitor SHOULD halt.
+
+#### Logical Session Timeout
+
+Whenever a client updates the TopologyDescription from a hello or legacy hello response, it MUST set
+TopologyDescription.logicalSessionTimeoutMinutes to the smallest logicalSessionTimeoutMinutes value among
+ServerDescriptions of all data-bearing server types. If any have a null logicalSessionTimeoutMinutes, then
+TopologyDescription.logicalSessionTimeoutMinutes MUST be set to null.
+
+See the Driver Sessions Spec for the purpose of this value.
+
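The rule for TopologyDescription.logicalSessionTimeoutMinutes can be sketched as follows. The function name, the pair-based input, and the set of data-bearing type names are assumptions for this example (the data-bearing server types are defined elsewhere in this specification), and returning null when there are no data-bearing servers at all is also an assumption.

```python
# Assumed set of data-bearing server types; see the specification's own definition.
DATA_BEARING_TYPES = {"Mongos", "RSPrimary", "RSSecondary", "Standalone", "LoadBalanced"}

def topology_session_timeout(servers):
    """servers: iterable of (server_type, logicalSessionTimeoutMinutes) pairs.

    Returns the smallest timeout among data-bearing servers, or None if any
    data-bearing server reports None (or if there are no data-bearing servers).
    """
    timeouts = [timeout for server_type, timeout in servers
                if server_type in DATA_BEARING_TYPES]
    if not timeouts or any(timeout is None for timeout in timeouts):
        return None
    return min(timeouts)
```

Note that an arbiter or an Unknown server with a null value does not force the topology value to null, because only data-bearing servers are considered.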
+### Connection Pool Management
+
+For drivers that support connection pools, after a server check is completed successfully, if the server is determined
+to be [data-bearing](server-discovery-and-monitoring.md#data-bearing-server-type) or a
+[direct connection](server-discovery-and-monitoring.md#general-requirements) to the server is requested, and does not
+already have a connection pool, the driver MUST create the connection pool for the server. Additionally, if a driver
+implements a CMAP compliant connection pool, the server's pool (even if it already existed) MUST be marked as "ready".
+See the [Server Monitoring spec](server-monitoring.rst) for more information.
+
+Clearing the connection pool for a server MUST be synchronized with the update to the corresponding ServerDescription
+(e.g. by holding the lock on the TopologyDescription when clearing the pool). This prevents a possible race between the
+monitors and application threads. See
+[Why synchronize clearing a server's pool with updating the topology?](#why-synchronize-clearing-a-servers-pool-with-updating-the-topology)
+for more information.
+
+### Error handling
+
+#### Network error during server check
+
+See error handling in the [Server Monitoring spec](server-monitoring.rst).
+
+#### Application errors
+
+When processing a network or command error, clients MUST first check the error's
+[generation number](#generation-number). If the error's generation number is equal to the pool's generation number,
+then error handling MUST continue
+according to [Network error when reading or writing](#network-error-when-reading-or-writing) or
+["not writable primary" and "node is recovering"](#not-writable-primary-and-node-is-recovering). Otherwise, the error is
+considered stale and the client MUST NOT update any topology state. (See
+[Why ignore errors based on CMAP's generation number?](#why-ignore-errors-based-on-cmaps-generation-number))
+
+##### Error handling pseudocode
+
+Application operations can fail in various places, for example:
+
+- A network error, network timeout, or command error may occur while establishing a new connection. Establishing a
+ connection includes the MongoDB handshake and completing authentication (if configured).
+- A network error or network timeout may occur while reading or writing to an established connection.
+- A command error may be returned from the server.
+- A "writeConcernError" field may be included in the command response.
+
+Depending on the context, these errors may update SDAM state by marking the server Unknown and may clear the server's
+connection pool. Some errors also require other side effects, like cancelling a check or requesting an immediate check.
+Drivers may use the following pseudocode to guide their implementation:
+
+```python
+def handleError(error):
+    address = error.address
+    topologyVersion = error.topologyVersion
+
+    with client.lock:
+        # Ignore stale errors based on generation and topologyVersion.
+        if isStaleError(client.topologyDescription, error):
+            return
+
+        if isStateChangeError(error):
+            # Don't mark server unknown in load balanced mode.
+            if type != LoadBalanced:
+                # Mark the server Unknown
+                unknown = new ServerDescription(type=Unknown, error=error, topologyVersion=topologyVersion)
+                onServerDescriptionChanged(unknown, connection pool for server)
+            if isShutdown(error.code) or (error was from <4.2):
+                # the pools must only be cleared while the lock is held.
+                if type == LoadBalanced:
+                    clear connection pool for serviceId
+                else:
+                    clear connection pool for server
+            if multi-threaded:
+                request immediate check
+            else:
+                # Check right now if this is "not writable primary", since it might be a
+                # useful secondary. If it's "node is recovering" leave it for the
+                # next full scan.
+                if isNotWritablePrimary(error.errmsg, error.code):
+                    check failing server
+        elif isNetworkError(error) or (not error.completedHandshake and (isNetworkTimeout(error) or isAuthError(error))):
+            if type != LoadBalanced:
+                # Mark the server Unknown
+                unknown = new ServerDescription(type=Unknown, error=error)
+                onServerDescriptionChanged(unknown, connection pool for server)
+                clear connection pool for server
+            else:
+                if serviceId:
+                    clear connection pool for serviceId
+            # Cancel inprogress check
+            cancel monitor check
+
+def isStaleError(topologyDescription, error):
+    currentServer = topologyDescription.servers[error.address]
+    currentGeneration = currentServer.pool.generation
+    generation = get connection generation from error
+    if generation < currentGeneration:
+        # Stale generation number.
+        return True
+
+    currentTopologyVersion = currentServer.topologyVersion
+    # True if the server's current topologyVersion is greater than or equal to the error's.
+    # We use >= instead of > because any state change should result in a new topologyVersion
+    return compareTopologyVersion(currentTopologyVersion, error.commandResponse.get("topologyVersion")) >= 0
+```
+
+The following pseudocode checks a response for a "not writable primary" or "node is recovering" error:
+
+```python
+recoveringCodes = [11600, 11602, 13436, 189, 91]
+notWritablePrimaryCodes = [10107, 13435, 10058]
+shutdownCodes = [11600, 91]
+
+def isRecovering(message, code):
+    if code:
+        return code in recoveringCodes
+    # if no code, use the error message.
+    return ("not master or secondary" in message
+            or "node is recovering" in message)
+
+def isNotWritablePrimary(message, code):
+    if code:
+        return code in notWritablePrimaryCodes
+    # if no code, use the error message.
+    if isRecovering(message, None):
+        return False
+    return "not master" in message
+
+def isShutdown(code):
+    return bool(code) and code in shutdownCodes
+
+def isStateChangeError(error):
+    message = error.errmsg
+    code = error.code
+    return isRecovering(message, code) or isNotWritablePrimary(message, code)
+
+def parseGle(response):
+    if "err" in response:
+        handleError(CommandError(response, response["err"], response["code"]))
+
+# Parse response to any command besides getLastError.
+def parseCommandResponse(response):
+    if not response["ok"]:
+        handleError(CommandError(response, response["errmsg"], response["code"]))
+    elif "writeConcernError" in response:
+        wce = response["writeConcernError"]
+        handleError(WriteConcernError(response, wce["errmsg"], wce["code"]))
+
+def parseQueryResponse(response):
+    if the "QueryFailure" bit is set in response flags:
+        handleError(CommandError(response, response["$err"], response["code"]))
+```
+
+The following sections describe the handling of different classes of application errors in detail including network
+errors, network timeout errors, state change errors, and authentication errors.
+
+##### Network error when reading or writing
+
+To describe how the client responds to network errors during application operations, we distinguish two phases of
+connecting to a server and using it for application operations:
+
+- *Before the handshake completes*: the client establishes a new connection to the server and completes an initial
+ handshake by calling "hello" or legacy hello and reading the response, and optionally completing authentication
+- *After the handshake completes*: the client uses the established connection for application operations
+
+If there is a network error or timeout on the connection before the handshake completes, the client MUST replace the
+server's description with a default ServerDescription of type Unknown when the TopologyType is not LoadBalanced, and
+fill the ServerDescription's error field with useful information.
+
+If there is a network error or timeout on the connection before the handshake completes, and the TopologyType is
+LoadBalanced, the client MUST keep the ServerDescription as LoadBalancer.
+
+If there is a network timeout on the connection after the handshake completes, the client MUST NOT mark the server
+Unknown. (A timeout may indicate a slow operation on the server, rather than an unavailable server.) If, however, there
+is some other network error on the connection after the handshake completes, the client MUST replace the server's
+description with a default ServerDescription of type Unknown if the TopologyType is not LoadBalanced, and fill the
+ServerDescription's error field with useful information, the same as if an error or timeout occurred before the
+handshake completed.
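
The rules in this section reduce to a small decision function. The following is a minimal sketch, not part of the reference implementation: the names `Phase` and `should_mark_unknown` are hypothetical, and the TopologyType is assumed to be passed as a plain string.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical names for the two phases described above."""
    BEFORE_HANDSHAKE = "before the handshake completes"
    AFTER_HANDSHAKE = "after the handshake completes"


def should_mark_unknown(phase, is_timeout, topology_type):
    """Return True if the server's description is replaced with Unknown."""
    if topology_type == "LoadBalanced":
        # The ServerDescription is kept as LoadBalancer.
        return False
    if phase is Phase.BEFORE_HANDSHAKE:
        # Any network error or timeout before the handshake completes.
        return True
    # After the handshake, a timeout may only indicate a slow operation,
    # so only non-timeout network errors mark the server Unknown.
    return not is_timeout
```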
+
+When the client marks a server Unknown due to a network error or timeout, the Unknown ServerDescription MUST be sent
+through the same process for [updating the TopologyDescription](#updating-the-topologydescription) as if it had been a
+failed hello or legacy hello outcome from a server check: for example, if the TopologyType is ReplicaSetWithPrimary and
+a write to the RSPrimary server fails because of a network error (other than timeout), then a new ServerDescription is
+created for the primary, with type Unknown, and the client executes the proper subroutine for an Unknown server when the
+TopologyType is ReplicaSetWithPrimary: referring to the table above we see the subroutine is
+[checkIfHasPrimary](#checkifhasprimary). The result is the TopologyType changes to ReplicaSetNoPrimary. See the test
+scenario called "Network error writing to primary".
+
+The client MUST close all idle sockets in its connection pool for the server: if one socket is bad, it is likely that
+all are.
+
+Clients MUST NOT request an immediate check of the server; since application sockets are used frequently, a network
+error likely means the server has just become unavailable, so an immediate refresh is likely to get a network error,
+too.
+
+The server will not remain Unknown forever. It will be refreshed by the next periodic check or, if an application
+operation needs the server sooner than that, then a re-check will be triggered by the server selection algorithm.
+
+##### "not writable primary" and "node is recovering"
+
+These errors are detected from a getLastError response, write command response, or query response. Clients MUST check if
+the server error is a "node is recovering" error or a "not writable primary" error.
+
+If the response includes an error code, the code alone MUST be used to determine whether the error is a "node is
+recovering" or "not writable primary" error. Clients MUST match the errors by the numeric error code and not by the code
+name, as the code name can change from one server version to the next.
+
+The following error codes indicate a replica set member is temporarily unusable. These are called "node is recovering"
+errors:
+
+| Error Name | Error Code |
+| ------------------------------- | ---------- |
+| InterruptedAtShutdown | 11600 |
+| InterruptedDueToReplStateChange | 11602 |
+| NotPrimaryOrSecondary | 13436 |
+| PrimarySteppedDown | 189 |
+| ShutdownInProgress | 91 |
+
+And the following error codes indicate a "not writable primary" error:
+
+| Error Name | Error Code |
+| ----------------------- | ---------- |
+| NotWritablePrimary | 10107 |
+| NotPrimaryNoSecondaryOk | 13435 |
+| LegacyNotPrimary | 10058 |
+
+Clients MUST fall back to checking the error message if and only if the response does not include an error code. The
+error is considered a "node is recovering" error if either of the substrings "node is recovering" or "not master or
+secondary" appears anywhere in the error message. Otherwise, if the substring "not master" is in the error message, it
+is a "not writable primary" error.
+
+Additionally, if the response includes a write concern error, then the code and message of the write concern error MUST
+be checked the same way a response error is checked above.
+
+Errors contained within the writeErrors field MUST NOT be checked.
+
+See the test scenario called "parsing 'not writable primary' and 'node is recovering' errors" for example response
+documents.
+
+When the client sees a "not writable primary" or "node is recovering" error and the error's
+[topologyVersion](#topologyversion) is strictly greater than the current ServerDescription's topologyVersion, it MUST
+replace the server's description with a ServerDescription of type Unknown. Clients MUST store useful information in the
+new ServerDescription's error field, including the error message from the server. Clients MUST store the error's
+[topologyVersion](#topologyversion) field in the new ServerDescription if present. (See
+[What is the purpose of topologyVersion?](#what-is-the-purpose-of-topologyversion))
+
+Multi-threaded and asynchronous clients MUST
+[request an immediate check](server-monitoring.rst#requesting-an-immediate-check) of the server. Unlike in the "network
+error" scenario above, a "not writable primary" or "node is recovering" error means the server is available but the
+client is wrong about its type, thus an immediate re-check is likely to provide useful information.
+
+For single-threaded clients, in the case of a "not writable primary" or "node is shutting down" error, the client MUST
+mark the topology as "stale" so the next server selection scans all servers. For a "node is recovering" error,
+single-threaded clients MUST NOT mark the topology as "stale". If a node is recovering for some time, an immediate scan
+may not gain useful information.
+
+The following subset of "node is recovering" errors is defined to be "node is shutting down" errors:
+
+| Error Name | Error Code |
+| --------------------- | ---------- |
+| InterruptedAtShutdown | 11600 |
+| ShutdownInProgress | 91 |
+
+When handling a "not writable primary" or "node is recovering" error, the client MUST clear the server's connection pool
+if and only if the error is "node is shutting down" or the error originated from server version \< 4.2.
+
+(See
+[when does a client see "not writable primary" or "node is recovering"?](#when-does-a-client-see-not-writable-primary-or-node-is-recovering),
+[use error messages to detect "not master" and "node is recovering"](#use-error-messages-to-detect-not-master-and-node-is-recovering),
+[other transient errors](#other-transient-errors), and
+[Why close connections when a node is shutting down?](#why-close-connections-when-a-node-is-shutting-down).)
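
The follow-up actions described in this section can be summarized in one function. This is a minimal sketch under stated assumptions: it only considers responses that carry an error code (the message fallback is omitted), it assumes the topologyVersion staleness check has already passed, it represents "server version < 4.2" as a maxWireVersion below 8, and the function name and action strings are hypothetical.

```python
RECOVERING_CODES = {11600, 11602, 13436, 189, 91}
NOT_WRITABLE_PRIMARY_CODES = {10107, 13435, 10058}
SHUTDOWN_CODES = {11600, 91}
WIRE_VERSION_4_2 = 8  # assumption: maxWireVersion reported by MongoDB 4.2


def plan_state_change_handling(code, max_wire_version, single_threaded):
    """Return the set of actions to take for a state change error code."""
    if code not in RECOVERING_CODES | NOT_WRITABLE_PRIMARY_CODES:
        return set()  # not a state change error
    actions = {"mark_unknown"}
    shutting_down = code in SHUTDOWN_CODES
    # Clear the pool only for "node is shutting down" or pre-4.2 servers.
    if shutting_down or max_wire_version < WIRE_VERSION_4_2:
        actions.add("clear_pool")
    if single_threaded:
        # A plain "node is recovering" error does not mark the topology stale.
        if code in NOT_WRITABLE_PRIMARY_CODES or shutting_down:
            actions.add("mark_topology_stale")
    else:
        actions.add("request_immediate_check")
    return actions
```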
+
+##### Authentication errors
+
+If the authentication handshake fails for a connection, drivers MUST mark the server Unknown and clear the server's
+connection pool if the TopologyType is not LoadBalanced. (See
+[Why mark a server Unknown after an auth error?](#why-mark-a-server-unknown-after-an-auth-error))
+
+### Monitoring SDAM events
+
+The required driver specification for providing lifecycle hooks into server discovery and monitoring for applications to
+consume can be found in the [SDAM Monitoring Specification](server-discovery-and-monitoring-logging-and-monitoring.rst).
+
+### Implementation notes
+
+This section intends to provide generous guidance to driver authors. It is complementary to the reference
+implementations. Words like "should", "may", and so on are used more casually here.
+
+See also, the implementation notes in the [Server Monitoring spec](server-monitoring.rst).
+
+#### Multi-threaded or asynchronous server selection
+
+While no suitable server is available for an operation,
+[the client MUST re-check all servers every minHeartbeatFrequencyMS](#the-client-must-re-check-all-servers-every-minheartbeatfrequencyms).
+(See [requesting an immediate check](server-monitoring.rst#requesting-an-immediate-check).)
+
+#### Single-threaded server selection
+
+When a client that uses [single-threaded monitoring](server-monitoring.rst#single-threaded-monitoring) fails to select a
+suitable server for any operation, it [scans](#scan) the servers, then attempts selection again, to see if the scan
+discovered suitable servers. It repeats, waiting [minHeartbeatFrequencyMS](#minheartbeatfrequencyms) after each scan,
+until a timeout.
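
A minimal sketch of that loop follows. The function name and its parameters are hypothetical: `scan` and `try_select` stand in for the driver's own scan and selection routines, and the clock and sleep functions are injectable so the loop can be exercised without real delays. The 0.5 second default is an assumed value for minHeartbeatFrequencyMS.

```python
import time


def select_server_single_threaded(scan, try_select, timeout_s,
                                  min_heartbeat_s=0.5,
                                  clock=time.monotonic, sleep=time.sleep):
    """Scan and retry selection until a server is found or time runs out."""
    deadline = clock() + timeout_s
    server = try_select()
    while server is None:
        if clock() >= deadline:
            raise TimeoutError("no suitable server found before the timeout")
        scan()                      # check all servers
        server = try_select()       # see if the scan discovered one
        if server is None:
            sleep(min_heartbeat_s)  # wait before the next scan
    return server
```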
+
+#### Documentation
+
+##### Giant seed lists
+
+Drivers' manuals should warn against huge seed lists, since they slow initialization for single-threaded clients and
+generate load for multi-threaded and asynchronous drivers.
+
+#### Multi-threaded
+
+##### Warning about the maxWireVersion from a monitor's hello or legacy hello response
+
+Clients consult some fields from a server's hello or legacy hello response to decide how to communicate with it:
+
+- maxWireVersion
+- maxBsonObjectSize
+- maxMessageSizeBytes
+- maxWriteBatchSize
+
+It is tempting to take these values from the last hello or legacy hello response a *monitor* received and store them in
+the ServerDescription, but this is an anti-pattern. Multi-threaded and asynchronous clients that do so are prone to
+several classes of race, for example:
+
+- Setup: A MongoDB 3.0 Standalone has authentication enabled, so the client must log in with SCRAM-SHA-1.
+- The monitor thread discovers the server and stores maxWireVersion on the ServerDescription.
+- An application thread wants a socket, selects the Standalone, and is about to check the maxWireVersion on its
+ ServerDescription when...
+- The monitor thread gets disconnected from the server and marks it Unknown, with default maxWireVersion of 0.
+- The application thread resumes, creates a socket, and attempts to log in using MONGODB-CR, since maxWireVersion is
+ *now* reported as 0.
+- Authentication fails because the server requires SCRAM-SHA-1.
+
+Better to call hello or legacy hello for each new socket, as required by the [Auth Spec](../auth/auth.md), and use the
+hello or legacy hello response associated with that socket for maxWireVersion, maxBsonObjectSize, etc.: all the fields
+required to correctly communicate with the server.
+
+The hello or legacy hello responses received by monitors determine if the topology as a whole is compatible with the
+driver, and which servers are suitable for selection. The monitors' responses should not be used to determine how to
+format wire protocol messages to the servers.
+
+##### Immutable data
+
+Multi-threaded drivers should treat ServerDescriptions and TopologyDescriptions as immutable: the client replaces them,
+rather than modifying them, in response to new information about the topology. Thus readers of these data structures can
+simply acquire a reference to the current one and read it, without holding a lock that would block a monitor from making
+further updates.
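
One way to get this behavior in Python is with frozen dataclasses and a read-only mapping. This is a minimal sketch with hypothetical, heavily simplified types (real descriptions carry many more fields); `with_server` is an illustrative helper, not a function defined by this spec.

```python
from dataclasses import dataclass, replace
from types import MappingProxyType


@dataclass(frozen=True)
class ServerDescription:
    address: str
    type: str = "Unknown"


@dataclass(frozen=True)
class TopologyDescription:
    type: str
    servers: MappingProxyType  # read-only view: address -> ServerDescription


def with_server(topology, server):
    """Return a new TopologyDescription; the old one is left untouched."""
    servers = dict(topology.servers)
    servers[server.address] = server
    return replace(topology, servers=MappingProxyType(servers))
```

A reader that grabbed a reference to the old TopologyDescription keeps seeing a consistent snapshot, while the client swaps in the object returned by `with_server`.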
+
+##### Process one hello or legacy hello outcome at a time
+
+Although servers are checked in parallel, the function that actually creates the new TopologyDescription should be
+synchronized so only one thread can run it at a time.
+
+##### Replacing the TopologyDescription
+
+Drivers may use the following pseudocode to guide their implementation. The client object has a lock and a condition
+variable. It uses the lock to ensure that only one new ServerDescription is processed at a time, and it must be acquired
+before invoking this function. Once the client has taken the lock it must do no I/O:
+
+```python
+def onServerDescriptionChanged(server, pool):
+    # "server" is the new ServerDescription.
+    # "pool" is the pool associated with the server.
+
+    if server.address not in client.topologyDescription.servers:
+        # The server was once in the topologyDescription, otherwise
+        # we wouldn't have been monitoring it, but an intervening
+        # state-change removed it. E.g., we got a host list from
+        # the primary that didn't include this server.
+        return
+
+    newTopologyDescription = client.topologyDescription.copy()
+
+    # Ignore this update if the current topologyVersion is greater than
+    # the new ServerDescription's.
+    if isStaleServerDescription(newTopologyDescription, server):
+        return
+
+    # Replace server's previous description.
+    address = server.address
+    newTopologyDescription.servers[address] = server
+
+    # For drivers that implement CMAP, mark the connection pool as ready after a successful check.
+    if (server.type in (Mongos, RSPrimary, RSSecondary, Standalone, LoadBalanced)
+            or (server.type != Unknown and newTopologyDescription.type == Single)):
+        pool.ready()
+
+    # Take any additional actions,
+    # depending on the TopologyType and server...
+
+    # Replace TopologyDescription and notify waiters.
+    client.topologyDescription = newTopologyDescription
+    client.condition.notifyAll()
+
+def compareTopologyVersion(tv1, tv2):
+    """Return -1 if tv1<tv2, 0 if tv1==tv2, 1 if tv1>tv2"""
+    if tv1 is None or tv2 is None:
+        # Assume tv2 is greater.
+        return -1
+    pid1 = tv1['processId']
+    pid2 = tv2['processId']
+    if pid1 == pid2:
+        counter1 = tv1['counter']
+        counter2 = tv2['counter']
+        if counter1 == counter2:
+            return 0
+        elif counter1 < counter2:
+            return -1
+        else:
+            return 1
+    else:
+        # Assume tv2 is greater.
+        return -1
+
+def isStaleServerDescription(topologyDescription, server):
+    # True if the current server's topologyVersion is strictly greater than
+    # the new ServerDescription's.
+    currentServer = topologyDescription.servers[server.address]
+    currentTopologyVersion = currentServer.topologyVersion
+    return compareTopologyVersion(currentTopologyVersion, server.topologyVersion) > 0
+```
+
+Notifying the condition unblocks threads waiting in the server-selection loop for a suitable server to be discovered.
+
+> [!NOTE]
+> The Java driver uses a CountDownLatch instead of a condition variable, and it atomically swaps the old and new
+> CountDownLatches so it does not need "client.lock". It does, however, use a lock to ensure that only one thread runs
+> onServerDescriptionChanged at a time.
+
+## Rationale
+
+### Clients do no I/O in the constructor
+
+An alternative proposal was to distinguish between "discovery" and "monitoring". When discovery begins, the client
+checks all its seeds, and discovery is complete once all servers have been checked, or after some maximum time.
+Application operations cannot proceed until discovery is complete.
+
+If the discovery phase is distinct, then single- and multi-threaded drivers could accomplish discovery in the
+constructor, and throw an exception from the constructor if the deployment is unavailable or misconfigured. This is
+consistent with prior behavior for many drivers. It will surprise some users that the constructor now succeeds, but all
+operations fail.
+
+Similarly for misconfigured seed lists: the client may discover a mix of mongoses and standalones, or find multiple
+replica set names. It may surprise some users that the constructor succeeds and the client attempts to proceed with a
+compatible subset of the deployment.
+
+Nevertheless, this spec prohibits I/O in the constructor for the following reasons:
+
+#### Common case
+
+In the common case, the deployment is available and usable. This spec favors allowing operations to proceed as soon as
+possible in the common case, at the cost of surprising behavior in uncommon cases.
+
+#### Simplicity
+
+It is simpler to omit a special discovery phase and treat all server [checks](#check) the same.
+
+#### Consistency
+
+Asynchronous clients cannot do I/O in a constructor, so it is consistent to prohibit I/O in other clients' constructors
+as well.
+
+#### Restarts
+
+If clients can be constructed when the deployment is in some states but not in other states, it leads to an unfortunate
+scenario: When the deployment is passing through a strange state, long-running clients may keep working, but any clients
+restarted during this period fail.
+
+Say an administrator changes one replica set member's setName. Clients that are already constructed remove the bad
+member and stay usable, but if any client is restarted its constructor fails. Web servers that dynamically adjust their
+process pools will show particularly undesirable behavior.
+
+### heartbeatFrequencyMS defaults to 10 seconds or 60 seconds
+
+Many drivers have different values. The time has come to standardize. Lacking a rigorous methodology for calculating the
+best frequency, this spec chooses 10 seconds for multi-threaded or asynchronous drivers because some already use that
+value.
+
+Because scanning has a greater impact on the performance of single-threaded drivers, they MUST default to a longer
+frequency (60 seconds).
+
+An alternative is to check servers less and less frequently the longer they remain unchanged. This idea is rejected
+because it is a goal of this spec to answer questions about monitoring such as,
+
+- "How rapidly can I rotate a replica set to a new set of hosts?"
+- "How soon after I add a secondary will query load be rebalanced?"
+- "How soon will a client notice a change in round trip time, or tags?"
+
+Having a constant monitoring frequency allows us to answer these questions simply and definitively. Losing the ability
+to answer these questions is not worth any minor gain in efficiency from a more complex scheduling method.
+
+### The client MUST re-check all servers every minHeartbeatFrequencyMS
+
+While an application is waiting to do an operation for which there is no suitable server, a multi-threaded client MUST
+re-check all servers very frequently. The slight cost is worthwhile in many scenarios. For example:
+
+1. A client and a MongoDB server are started simultaneously.
+2. The client checks the server before it begins listening, so the check fails.
+3. The client waits in the server-selection loop for the topology to change.
+
+In this state, the client should check the server very frequently, to give it ample opportunity to connect to the server
+before timing out in server selection.
+
+### No knobs
+
+This spec does not intend to introduce any new configuration options unless absolutely necessary.
+
+### The client MUST monitor arbiters
+
+Mongos 2.6 does not monitor arbiters, but it costs little to do so, and in the rare case that all data members are moved
+to new hosts in a short time, an arbiter may be the client's last hope to find the new replica set configuration.
+
+### Only support replica set members running MongoDB 1.6.2 or later
+
+Replica set members began reporting their setNames in that version. Supporting earlier versions is impractical.
+
+### TopologyType remains Unknown when an RSGhost is discovered
+
+If the TopologyType is Unknown and the client receives a hello or legacy hello response from
+an [RSGhost](#rsghost-and-rsother), the TopologyType could be set to ReplicaSetNoPrimary. However, an RSGhost does not
+report its setName, so the setName would still be unknown. This adds an additional state to the existing list:
+"TopologyType ReplicaSetNoPrimary **and** no setName." The additional state adds substantial complexity without any
+benefit, so this spec says clients MUST NOT change the TopologyType when an RSGhost is discovered.
+
+### TopologyType remains Unknown when one of the seeds is a Standalone
+
+If TopologyType is Unknown and there are multiple seeds, and one of them is discovered to be a standalone, it MUST be
+removed. The TopologyType remains Unknown.
+
+This rule supports the following common scenario:
+
+1. Servers A and B are in a replica set.
+2. A seed list with A and B is stored in a configuration file.
+3. An administrator removes B from the set and brings it up as standalone for maintenance, without changing its port
+ number.
+4. The client is initialized with seeds A and B, TopologyType Unknown, and no setName.
+5. The first hello or legacy hello response is from B, the standalone.
+
+What if the client changed TopologyType to Single at this point? It would be unable to use the replica set; it would
+have to remove A from the TopologyDescription once A's hello or legacy hello response comes.
+
+The user's intent in this case is clearly to use the replica set, despite the outdated seed list. So this spec requires
+clients to remove B from the TopologyDescription and keep the TopologyType as Unknown. Then when A's response arrives,
+the client can set its TopologyType to ReplicaSet (with or without primary).
+
+On the other hand, if there is only one seed and the seed is discovered to be a Standalone, the TopologyType MUST be set
+to Single.
+
+See the "member brought up as standalone" test scenario.
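
The rule can be sketched as follows, assuming the TopologyType is currently Unknown. The function name is hypothetical and addresses are plain strings.

```python
def on_standalone_discovered(server_addresses, standalone_address):
    """Return (new_topology_type, remaining_addresses).

    Assumes the current TopologyType is Unknown and that
    standalone_address just responded as a Standalone.
    """
    if len(server_addresses) == 1:
        # A single seed that is a Standalone: use it directly.
        return "Single", list(server_addresses)
    # Multiple seeds: drop the standalone and keep looking for the replica set.
    remaining = [a for a in server_addresses if a != standalone_address]
    return "Unknown", remaining
```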
+
+### Replica set monitoring with and without a primary
+
+The client strives to fill the "servers" list only with servers that the **primary** said were members of the replica
+set, when the client most recently contacted the primary.
+
+The primary's view of the replica set is authoritative for two reasons:
+
+1. The primary is never on the minority side of a network partition. During a partition it is the primary's list of
+ servers the client should use.
+2. Since reconfigs must be executed on the primary, the primary is the first to know of them. Reconfigs propagate to
+ non-primaries eventually, but the client can receive hello or legacy hello responses from non-primaries that reflect
+ any past state of the replica set. See the "Replica set discovery" test scenario.
+
+If at any time the client believes there is no primary, the TopologyDescription's type is set to ReplicaSetNoPrimary.
+While there is no known primary, the client MUST **add** servers from non-primaries' host lists, but it MUST NOT remove
+servers from the TopologyDescription.
+
+Eventually, when a primary is discovered, any hosts not in the primary's host list are removed.
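
A minimal sketch of this add-only versus authoritative behavior is below. The function name is hypothetical, host lists are modeled as sets of address strings, and leaving the list unchanged for a non-primary response while a primary is known is this sketch's reading of the rule rather than a quotation of it.

```python
def merge_host_list(known, reported, reporter_is_primary, primary_known):
    """Return the new set of server addresses in the TopologyDescription."""
    if reporter_is_primary:
        # The primary's view is authoritative: hosts it omits are removed.
        return set(reported)
    if not primary_known:
        # No known primary: add hosts from non-primaries, never remove.
        return set(known) | set(reported)
    # A primary is known: a non-primary's host list changes nothing.
    return set(known)
```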
+
+### Using electionId and setVersion to detect stale primaries
+
+Replica set members running MongoDB 2.6.10+ or 3.0+ include an integer called "setVersion" and an ObjectId called
+"electionId" in their hello or legacy hello response. Starting with MongoDB 3.2.0, replica sets can use two different
+replication protocol versions; electionIds from one protocol version must not be compared to electionIds from a
+different protocol.
+
+Because protocol version changes require replica set reconfiguration, clients use the tuple (electionId, setVersion) to
+detect stale primaries. The tuple MUST be compared in the order of electionId followed by setVersion, since that order
+of comparison guarantees monotonicity.
+
+The client remembers the greatest electionId and setVersion reported by a primary, and distrusts primaries from older
+electionIds or from the same electionId but with lesser setVersion.
+
+- It compares electionIds as 12-byte sequences, i.e. by memory comparison.
+- It compares setVersions as integer values.
+
+This prevents the client from oscillating between the old and new primary during a split-brain period, and helps provide
+read-your-writes consistency with write concern "majority" and read preference "primary".
+
+Prior to MongoDB server version 6.0, drivers ordered the tuple by `setVersion` before `electionId`, the opposite of the
+server-side Replica Set Management logic. To remain compatible with backup systems, etc., drivers keep the reversed
+logic when connected to a topology that reports a maxWireVersion less than `17`. For server versions 6.0 and beyond,
+drivers MUST order the tuple by `electionId` then `setVersion`.
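
The comparison can be sketched with Python tuples, which compare element by element, and `bytes`, which compare bytewise. This is a minimal sketch, not the reference logic: the function names are hypothetical, electionIds are modeled as 12-byte `bytes` values, and missing (null) electionId or setVersion values are not handled.

```python
def primary_sort_key(election_id, set_version, max_wire_version):
    """Order by electionId first on wire version 17+, else by setVersion first."""
    if max_wire_version >= 17:
        return (election_id, set_version)
    return (set_version, election_id)


def is_stale_primary(reported, max_seen, max_wire_version):
    """reported and max_seen are (election_id: bytes, set_version: int) pairs."""
    return (primary_sort_key(*reported, max_wire_version)
            < primary_sort_key(*max_seen, max_wire_version))
```

For the same pair of responses, the verdict can differ between the two orderings, which is why the maxWireVersion has to be part of the decision.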
+
+#### Requirements for read-your-writes consistency
+
+Using (electionId, setVersion) only provides read-your-writes consistency if:
+
+- The application uses the same MongoClient instance for write-concern "majority" writes and read-preference "primary"
+ reads, and
+- All members use MongoDB 2.6.10+, 3.0.0+ or 3.2.0+ with replication protocol 0 and clocks are *less* than 30 seconds
+ skewed, or
+- All members run MongoDB 3.2.0 and replication protocol 1 and clocks are *less* skewed than the election timeout
+ (`electionTimeoutMillis`, which defaults to 10 seconds), or
+- All members run MongoDB 3.2.1+ and replication protocol 1 (in which case clocks need not be synchronized).
+
+#### Scenario
+
+Consider the following situation:
+
+1. Server A is primary.
+2. A network partition isolates A from the set, but the client still sees it.
+3. Server B is elected primary.
+4. The client discovers that B is primary, does a write-concern "majority" write operation on B and receives
+ acknowledgment.
+5. The client receives a hello or legacy hello response from A, claiming A is still primary.
+6. If the client trusts that A is primary, the next read-preference "primary" read sees stale data from A that may *not*
+ include the write sent to B.
+
+See [SERVER-17975](https://jira.mongodb.org/browse/SERVER-17975), "Stale reads with WriteConcern Majority and
+ReadPreference Primary."
+
+#### Detecting a stale primary
+
+To prevent this scenario, the client uses electionId and setVersion to determine which primary was elected last. In this
+case, it would not consider "A" a primary, nor read from it because server B will have a greater electionId but the same
+setVersion.
+
+#### Monotonicity
+
+The electionId is an ObjectId compared bytewise in order.
+
+(i.e. 000000000000000000000001 > 000000000000000000000000, FF0000000000000000000000 > FE0000000000000000000000, etc.)
+
+In some server versions, it is monotonic with respect to a particular server's system clock, but is not globally
+monotonic across a deployment. However, if inter-server clock skews are small, it can be treated as a monotonic value.
+
+In MongoDB 2.6.10+ (which has [SERVER-13542](https://jira.mongodb.org/browse/SERVER-13542) backported), MongoDB 3.0.0+
+or MongoDB 3.2+ (under replication protocol version 0), the electionId's leading bytes are a server timestamp. As long
+as server clocks are skewed *less* than 30 seconds, electionIds can be reliably compared. (This is precise enough,
+because in replication protocol version 0, servers are designed not to complete more than one election every 30 seconds.
+Elections do not take 30 seconds--they are typically much faster than that--but there is a 30-second cooldown before the
+next election can complete.)
+
+Beginning in MongoDB 3.2.0, under replication protocol version 1, the electionId begins with a timestamp, but the
+cooldown is shorter. As long as inter-server clock skew is *less* than the configured election timeout
+(`electionTimeoutMillis`, which defaults to 10 seconds), then electionIds can be reliably compared.
+
+Beginning in MongoDB 3.2.1, under replication protocol version 1, the electionId is guaranteed monotonic without relying
+on any clock synchronization.
+
+### Using me field to detect seed list members that do not match host names in the replica set configuration
+
+Removal from the topology of seed list members where the "me" property does not match the address used to connect
+prevents clients from being able to select a server, only to fail to re-select that server once the primary has
+responded.
+
+This scenario illustrates the problems that arise if this is NOT done:
+
+- The client specifies a seed list of A, B, C
+- Server A responds as a secondary with hosts D, E, F
+- The client executes a query with read preference of secondary, and server A is selected
+- Server B responds as a primary with hosts D, E, F. Servers A, B, C are removed, as they don't appear in the primary's
+ hosts list
+- The client iterates the cursor and attempts to execute a getMore against server A.
+- Server selection fails because server A is no longer part of the topology.
+
+With checking for "me" in place, it looks like this instead:
+
+- The client specifies a seed list of A, B, C
+- Server A responds as a secondary with hosts D, E, F, where "me" is D, and so the client adds D, E, F as type "Unknown"
+ and starts monitoring them, but removes A from the topology.
+- The client executes a query with read preference of secondary, and goes into the server selection loop
+- Server D responds as a secondary where "me" is D
+- Server selection completes by matching D
+- The client iterates the cursor and attempts to execute a getMore against server D.
+- Server selection completes by matching D.
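
The "me" check in the second walkthrough can be sketched as below. This is a minimal sketch with hypothetical names: the topology is modeled as a dict from address to server type, the hello response as a dict with "me" and "hosts" keys, and address normalization is assumed to have happened already.

```python
def apply_me_check(servers, connected_address, hello_response):
    """Return an updated copy of the address -> type mapping."""
    updated = dict(servers)
    # Newly reported hosts are added as Unknown and will be monitored.
    for host in hello_response.get("hosts", []):
        updated.setdefault(host, "Unknown")
    me = hello_response.get("me")
    if me is not None and me != connected_address:
        # The address used to connect is not the member's own name: remove it.
        updated.pop(connected_address, None)
    return updated
```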
+
+### Ignore setVersion unless the server is primary
+
+It was thought that if all replica set members report a setVersion, and a secondary's response has a higher setVersion
+than any seen, that the secondary's host list could be considered as authoritative as the primary's. (See
+[Replica set monitoring with and without a primary](#replica-set-monitoring-with-and-without-a-primary).)
+
+This scenario illustrates the problem with setVersion:
+
+- We have a replica set with servers A, B, and C.
+- Server A is the primary, with setVersion 4.
+- An administrator runs replSetReconfig on A, which increments its setVersion to 5.
+- The client checks Server A and receives the new config.
+- Server A crashes before any secondary receives the new config.
+- Server B is elected primary. It has the old setVersion 4.
+- The client ignores B's version of the config because its setVersion is not greater than 5.
+
+The client may never correct its view of the topology.
+
+Even worse:
+
+- An administrator runs replSetReconfig on Server B, which increments its setVersion to 5.
+- Server A restarts. This results in *two* versions of the config, both claiming to be version 5.
+
+If the client trusted the setVersion in this scenario, it would trust whichever config it received first.
+
+mongos 2.6 ignores setVersion and only trusts the primary. This spec requires all clients to ignore setVersion from
+non-primaries.
+
+### Use error messages to detect "not master" and "node is recovering"
+
+When error codes are not available, error messages are checked for the substrings "not master" and "node is recovering".
+This is because older server versions returned unstable error codes or no error codes in many circumstances.
+
+### Other transient errors
+
+There are other transient errors a server may return, e.g. retryable errors listed in the retryable writes spec. SDAM
+does not consider these because they do not imply the connected server should be marked as "Unknown". For example, the
+following errors may be returned from a mongos when it cannot route to a shard:
+
+| Error Name | Error Code |
+| --------------- | ---------- |
+| HostNotFound | 7 |
+| HostUnreachable | 6 |
+| NetworkTimeout | 89 |
+| SocketException | 9001 |
+
+When these are returned, the mongos should *not* be marked as "Unknown", since it is more likely an issue with the
+shard.
+
+### Why ignore errors based on CMAP's generation number?
+
+Using CMAP's generation number solves the following race condition among application threads and the monitor during
+error handling:
+
+1. Two concurrent writes begin on application threads A and B.
+2. The server restarts.
+3. Thread A receives the first non-timeout network error, and the client marks the server Unknown, and clears the
+ server's pool.
+4. The client re-checks the server and marks it Primary.
+5. Thread B receives the second non-timeout network error and the client marks the server Unknown again.
+
+The core issue is that the client processes errors in arbitrary order and may overwrite fresh information about the
+server's status with stale information. Using CMAP's generation number avoids the race condition because the duplicate
+(or stale) network error can be identified (changes in **bold**):
+
+1. Two concurrent writes begin on application threads A and B, **with generation 1**.
+2. The server restarts.
+3. Thread A receives the first non-timeout network error, and the client marks the server Unknown, and clears the
+ server's pool. **The pool's generation is now 2.**
+4. The client re-checks the server and marks it Primary.
+5. Thread B receives the second non-timeout network error, **and the client ignores the error because the error
+ originated from a connection with generation 1.**
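+
+The stale-error check in step 5 can be sketched as follows. This is a minimal illustration, not any driver's actual
+implementation: `Pool`, `clearPool`, and `shouldIgnoreNetworkError` are hypothetical names, and the pool is reduced to
+its generation counter.
+
+```typescript
+// Hypothetical, simplified pool: only the generation counter is modelled.
+interface Pool {
+  generation: number;
+}
+
+// Clearing the pool increments its generation.
+function clearPool(pool: Pool): void {
+  pool.generation += 1;
+}
+
+// An error is stale when its connection was created before the pool was last cleared.
+function shouldIgnoreNetworkError(errorGeneration: number, pool: Pool): boolean {
+  return errorGeneration < pool.generation;
+}
+
+// Replay of the scenario above: threads A and B both hold generation-1 connections.
+const pool: Pool = { generation: 1 };
+const threadAIgnored = shouldIgnoreNetworkError(1, pool); // current error, so it is handled
+clearPool(pool); // handling thread A's error clears the pool; generation is now 2
+const threadBIgnored = shouldIgnoreNetworkError(1, pool); // stale error, so it is ignored
+```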
+
+### Why synchronize clearing a server's pool with updating the topology?
+
+Doing so solves the following race condition among application threads and the monitor during error handling, similar to
+the previous example:
+
+1. A write begins on an application thread.
+2. The server restarts.
+3. The application thread receives a non-timeout network error.
+4. The application thread acquires the lock on the TopologyDescription, marks the Server as Unknown, and releases the
+ lock.
+5. The monitor re-checks the server and marks it Primary and its pool as "ready".
+6. Several other application threads enter the WaitQueue of the server's pool.
+7. The application thread clears the server's pool, evicting all those new threads from the WaitQueue, causing them to
+ return errors or to retry. Additionally, the pool is now "paused", but the server is considered the Primary, meaning
+ future operations will be routed to the server and fail until the next heartbeat marks the pool as "ready" again.
+
+If marking the server as Unknown and clearing its pool were synchronized, then the monitor marking the server as Primary
+after its check would happen after the pool was cleared and thus avoid putting it in an inconsistent state.
+
+### What is the purpose of topologyVersion?
+
+[topologyVersion](#topologyversion) solves the following race condition among application threads and the monitor when
+handling [State Change Errors](#state-change-error):
+
+1. Two concurrent writes begin on application threads A and B.
+2. The primary steps down.
+3. Thread A receives the first State Change Error, and the client marks the server Unknown.
+4. The client re-checks the server and marks it Secondary.
+5. Thread B receives a delayed State Change Error and the client marks the server Unknown again.
+
+The core issue is that the client processes errors in arbitrary order and may overwrite fresh information about the
+server's status with stale information. Using topologyVersion avoids the race condition because the duplicate (or stale)
+State Change Errors can be identified (changes in **bold**):
+
+1. Two concurrent writes begin on application threads A and B.
+ 1. **The primary's ServerDescription.topologyVersion == tv1**
+2. The primary steps down **and sets its topologyVersion to tv2**.
+3. Thread A receives the first State Change Error **containing tv2**, and the client marks the server Unknown (**with
+   topologyVersion: tv2**).
+4. The client re-checks the server and marks it Secondary (**with topologyVersion: tv2**).
+5. Thread B receives a delayed State Change Error (**with topologyVersion: tv2**) **and the client ignores the error
+ because the error's topologyVersion (tv2) is not greater than the current ServerDescription (tv2).**
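+
+The comparison in step 5 can be sketched as below. This is a hedged illustration: it assumes a topologyVersion consists
+of a `processId` and a `counter`, and that versions with different `processId` values are never treated as stale. The
+type and function names are hypothetical, `processId` is simplified to a string, and `counter` to a plain number.
+
+```typescript
+// Hypothetical, simplified shape of a topologyVersion.
+interface TopologyVersion {
+  processId: string;
+  counter: number;
+}
+
+// Returns true when the error carries no newer information than the current ServerDescription.
+function isStaleStateChangeError(current: TopologyVersion | null, error: TopologyVersion | null): boolean {
+  // If either side lacks a topologyVersion, staleness cannot be established.
+  if (current === null || error === null) {
+    return false;
+  }
+  // Assumption: a different processId means the counters are not comparable.
+  if (current.processId !== error.processId) {
+    return false;
+  }
+  // Stale unless the error's counter is strictly greater than the current one.
+  return error.counter <= current.counter;
+}
+
+const tv1: TopologyVersion = { processId: "p1", counter: 1 };
+const tv2: TopologyVersion = { processId: "p1", counter: 2 };
+const threadAStale = isStaleStateChangeError(tv1, tv2); // thread A: error is newer than tv1
+const threadBStale = isStaleStateChangeError(tv2, tv2); // thread B: error is not greater than tv2
+```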
+
+### Why mark a server Unknown after an auth error?
+
+The [Authentication spec](../auth/auth.md) requires that when authentication fails on a server, the driver MUST clear
+the server's connection pool. Clearing the pool without marking the server Unknown would leave the pool in the "paused"
+state while the server is still selectable. When auth fails due to invalid credentials, marking the server Unknown also
+serves to rate limit new connections; future operations will need to wait for the server to be rediscovered.
+
+Note that authentication may fail for a variety of reasons, for example:
+
+- A network error, or network timeout error may occur.
+- The server may return a [State Change Error](#state-change-error).
+- The server may return an AuthenticationFailed command error (error code 18) indicating that the provided credentials
+  are invalid.
+
+Does this mean that authentication failures due to invalid credentials will manifest as server selection timeout errors?
+No, authentication errors are still returned to the application immediately. A subsequent operation will block until the
+server is rediscovered and immediately attempt authentication on a new connection.
+
+### Clients use the hostnames listed in the replica set config, not the seed list
+
+Very often users have DNS aliases they use in their [seed list](#seed-list) instead of the hostnames in the replica set
+config. For example, the name "host_alias" might refer to a server also known as "host1", and the URI is:
+
+```
+mongodb://host_alias/?replicaSet=rs
+```
+
+When the client connects to "host_alias", its hello or legacy hello response includes the list of hostnames from the
+replica set config, which does not include the seed:
+
+```
+{
+ hosts: ["host1:27017", "host2:27017"],
+ setName: "rs",
+ ... other hello or legacy hello response fields ...
+}
+```
+
+This spec requires clients to connect to the hostnames listed in the hello or legacy hello response. Furthermore, if the
+response is from a primary, the client MUST remove all hostnames not listed. In this case, the client disconnects from
+"host_alias" and tries "host1" and "host2". (See [updateRSFromPrimary](#updatersfromprimary).)
+
+Thus, replica set members must be reachable from the client by the hostnames listed in the replica set config.
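+
+The host replacement described above can be sketched as follows. This is an illustration under simplifying
+assumptions, not the full updateRSFromPrimary logic: the function name is hypothetical, addresses are plain strings,
+and only the `hosts` field of the response is considered.
+
+```typescript
+// Returns the addresses to monitor after a hello response from a primary (simplified).
+function monitoredAfterPrimaryResponse(current: string[], hostsFromPrimary: string[]): string[] {
+  // Hostnames are compared in lower case.
+  const listed = hostsFromPrimary.map((host) => host.toLowerCase());
+  // Keep a currently monitored address only if the primary lists it.
+  const kept = current.filter((address) => listed.indexOf(address) !== -1);
+  // Start monitoring every listed address that is not yet known.
+  const added = listed.filter((address) => kept.indexOf(address) === -1);
+  return kept.concat(added);
+}
+
+// The seed "host_alias" is dropped because the primary's response does not list it.
+const monitored = monitoredAfterPrimaryResponse(["host_alias:27017"], ["host1:27017", "host2:27017"]);
+```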
+
+An alternative proposal is for clients to continue using the hostnames in the seed list. It could add new hosts from the
+hello or legacy hello response, and where a host is known by two names, the client can deduplicate them using the "me"
+field and prefer the name in the seed list.
+
+This proposal was rejected because it does not support key features of replica sets: failover and zero-downtime
+reconfiguration.
+
+In our example, if "host1" and "host2" are not reachable from the client, the client continues to use "host_alias" only.
+If that server goes down or is removed by a replica set reconfig, the client is suddenly unable to reach the replica set
+at all: by allowing the client to use the alias, we have hidden the fact that the replica set's failover feature will
+not work in a crisis or during a reconfig.
+
+In conclusion, to support key features of replica sets, we require that the hostnames used in a replica set config are
+reachable from the client.
+
+## Backwards Compatibility
+
+The Java driver 2.12.1 has a "heartbeatConnectRetryFrequency". Since this spec recommends the option be named
+"minHeartbeatFrequencyMS", the Java driver must deprecate its old option and rename it minHeartbeatFrequency (for
+consistency with its other options which also lack the "MS" suffix).
+
+## Reference Implementation
+
+- Java driver 3.x
+- PyMongo 3.x
+- Perl driver 1.0.0 (in progress)
+
+## Future Work
+
+MongoDB is likely to add some of the following features, which will require updates to this spec:
+
+- Eventually consistent collections (SERVER-2956)
+- Mongos discovery (SERVER-1834)
+- Put individual databases into maintenance mode, instead of the whole server (SERVER-7826)
+- Put setVersion in write-command responses (SERVER-13909)
+
+## Questions and Answers
+
+### When does a client see "not writable primary" or "node is recovering"?
+
+These errors indicate one of these:
+
+- A write was attempted on an unwritable server (arbiter, secondary, ghost, or recovering).
+- A read was attempted on an unreadable server (arbiter, ghost, or recovering) or a read was attempted on a read-only
+ server without the secondaryOk bit set.
+- An operation was attempted on a server that is now shutting down.
+
+In any case the error is a symptom that a ServerDescription's type no longer reflects reality.
+
+On MongoDB 4.0 and earlier, a primary closes its connections when it steps down, so in many cases the next operation
+causes a network error rather than "not writable primary". The driver can see a "not writable primary" error in the
+following scenario:
+
+1. The client discovers the primary.
+2. The primary steps down.
+3. Before the client checks the server and discovers the stepdown, the application attempts an operation.
+4. The client's connection pool is empty, either because it has never attempted an operation on this server, or because
+ all connections are in use by other threads.
+5. The client creates a connection to the old primary.
+6. The client attempts to write, or to read without the secondaryOk bit, and receives "not writable primary".
+
+See ["not writable primary" and "node is recovering"](#not-writable-primary-and-node-is-recovering), and the test
+scenario called "parsing 'not writable primary' and 'node is recovering' errors".
+
+### Why close connections when a node is shutting down?
+
+When a server shuts down, it will return one of the "node is shutting down" errors for each attempted operation and
+eventually will close all connections. Keeping a connection to a server which is shutting down open would only produce
+errors on this connection - such a connection will never be usable for any operations. In contrast, when a server 4.2 or
+later returns "not writable primary" error the connection may be usable for other operations (such as secondary reads).
+
+### What's the point of periodic monitoring?
+
+Why not just wait until a "not writable primary" error or "node is recovering" error informs the client that its
+TopologyDescription is wrong? Or wait until server selection fails to find a suitable server, and only scan all servers
+then?
+
+Periodic monitoring accomplishes three objectives:
+
+- Update each server's type, tags, and [round trip time](#round-trip-time). Read preferences and the mongos selection
+  algorithm require that this information remain up to date.
+- Discover new secondaries so that secondary reads are evenly spread.
+- Detect incremental changes to the replica set configuration, so that the client remains connected to the set even
+ while it is migrated to a completely new set of hosts.
+
+If the application uses some servers very infrequently, monitoring can also proactively detect state changes (primary
+stepdown, server becoming unavailable) that would otherwise cause future errors.
+
+### Why is auto-discovery the preferred default?
+
+Auto-discovery is most resilient and is therefore preferred.
+
+### Why is it possible for maxSetVersion to go down?
+
+`maxElectionId` and `maxSetVersion` are considered a pair of values. Drivers MAY implement the comparison in code as a
+tuple of the two to ensure they are always updated together:
+
+```typescript
+// New tuple old tuple
+{ electionId: 2, setVersion: 1 } > { electionId: 1, setVersion: 50 }
+```
+
+In this scenario, the maxSetVersion goes from 50 to 1, but the maxElectionId is raised to 2.
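+
+A runnable version of that comparison might look like the sketch below. It is illustrative only: `ElectionTuple` and
+`compareTuples` are hypothetical names, and `electionId` is simplified to a number as in the example above, whereas a
+real electionId is an ObjectId or null.
+
+```typescript
+// Hypothetical tuple type; electionId is simplified to a number.
+interface ElectionTuple {
+  electionId: number;
+  setVersion: number;
+}
+
+// Lexicographic comparison: electionId decides first, setVersion only breaks ties.
+function compareTuples(a: ElectionTuple, b: ElectionTuple): number {
+  if (a.electionId !== b.electionId) {
+    return a.electionId < b.electionId ? -1 : 1;
+  }
+  if (a.setVersion !== b.setVersion) {
+    return a.setVersion < b.setVersion ? -1 : 1;
+  }
+  return 0;
+}
+
+const oldTuple: ElectionTuple = { electionId: 1, setVersion: 50 };
+const newTuple: ElectionTuple = { electionId: 2, setVersion: 1 };
+// The new tuple wins even though its setVersion is lower.
+const newIsGreater = compareTuples(newTuple, oldTuple) > 0;
+```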
+
+## Acknowledgments
+
+Jeff Yemin's code for the Java driver 2.12, and his patient explanation thereof, is the major inspiration for this spec.
+Mathias Stearn's beautiful design for replica set monitoring in mongos 2.6 contributed as well. Bernie Hackett gently
+oversaw the specification process.
+
+## Changelog
+
+- 2024-05-08: Migrated from reStructuredText to Markdown.
+
+- 2015-12-17: Require clients to compare (setVersion, electionId) tuples.
+
+- 2015-10-09: Specify electionID comparison method.
+
+- 2015-06-16: Added cooldownMS.
+
+- 2016-05-04: Added link to SDAM monitoring.
+
+- 2016-07-18: Replace mentions of the "Read Preferences Spec" with "Server Selection Spec", and
+  "secondaryAcceptableLatencyMS" with "localThresholdMS".
+
+- 2016-07-21: Updated for Max Staleness support.
+
+- 2016-08-04: Explain better why clients use the hostnames in RS config, not URI.
+
+- 2016-08-31: Multi-threaded clients SHOULD use hello or legacy hello replies to update the topology when they
+  handshake application connections.
+
+- 2016-10-06: In updateRSWithoutPrimary the hello or legacy hello response's "primary" field should be used to update
+  the topology description, even if address != me.
+
+- 2016-10-29: Allow for idleWritePeriodMS to change someday.
+
+- 2016-11-01: "Unknown" is no longer the default TopologyType, the default is now explicitly unspecified. Update
+  instructions for setting the initial TopologyType when running the spec tests.
+
+- 2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the future.
+
+- 2017-02-28: Update "network error when reading or writing": timeout while connecting does mark a server Unknown,
+  unlike a timeout while reading or writing. Justify the different behaviors, and also remove obsolete reference to
+  auto-retry.
+
+- 2017-06-13: Move socketCheckIntervalMS to Server Selection Spec.
+
+- 2017-08-01: Parse logicalSessionTimeoutMinutes from hello or legacy hello reply.
+
+- 2017-08-11: Clearer specification of "incompatible" logic.
+
+- 2017-09-01: Improved incompatibility error messages.
+
+- 2018-03-28: Specify that monitoring must not do mechanism negotiation or authentication.
+
+- 2019-05-29: Renamed InterruptedDueToStepDown to InterruptedDueToReplStateChange.
+
+- 2020-02-13: Drivers must run SDAM flow even when server description is equal to the last one.
+
+- 2020-03-31: Add topologyVersion to ServerDescription. Add rules for ignoring stale application errors.
+
+- 2020-05-07: Include error field in ServerDescription equality comparison.
+
+- 2020-06-08: Clarify reasoning behind how SDAM determines if a topologyVersion is stale.
+
+- 2020-12-17: Mark the pool for a server as "ready" after performing a successful check. Synchronize pool clearing with
+  SDAM updates.
+
+- 2021-01-17: Require clients to compare (electionId, setVersion) tuples.
+
+- 2021-02-11: Errors encountered during auth are handled by SDAM. Auth errors mark the server Unknown and clear the
+  pool.
+
+- 2021-04-12: Adding in behaviour for load balancer mode.
+
+- 2021-05-03: Require parsing "isWritablePrimary" field in responses.
+
+- 2021-06-09: Connection pools must be created and eventually marked ready for any server if a direct connection is
+  used.
+
+- 2021-06-29: Updated to use modern terminology.
+
+- 2022-01-19: Add iscryptd and 90th percentile RTT fields to ServerDescription.
+
+- 2022-07-11: Convert integration tests to the unified format.
+
+- 2022-09-30: Update `updateRSFromPrimary` to include logic before and after 6.0 servers.
+
+- 2022-10-05: Remove spec front matter, move footnote, and reformat changelog.
+
+- 2022-11-17: Add minimum RTT tracking and remove 90th percentile RTT.
+
+- 2024-01-17: Add section on expected client close behaviour.
+
+______________________________________________________________________
+
+[^1]: "localThresholdMS" was called "secondaryAcceptableLatencyMS" in the Read Preferences Spec, before it was superseded
+ by the Server Selection Spec.
+
+[^2]: [TopologyType remains Unknown when an RSGhost is discovered](#topologytype-remains-unknown-when-an-rsghost-is-discovered).
diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst b/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst
index 2594b090a7..ddd00719ec 100644
--- a/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst
+++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst
@@ -1,2573 +1,4 @@
-===============================
-Server Discovery And Monitoring
-===============================
-
-:Status: Accepted
-:Minimum Server Version: 2.4
-
-.. contents::
-
---------
-
-Abstract
---------
-
-This spec defines how a MongoDB client discovers and monitors one or more servers.
-It covers monitoring a single server, a set of mongoses, or a replica set.
-How does the client determine what type of servers they are?
-How does it keep this information up to date?
-How does the client find an entire replica set from a seed list,
-and how does it respond to a stepdown, election, reconfiguration, or network error?
-
-All drivers must answer these questions the same.
-Or, where platforms' limitations require differences among drivers,
-there must be as few answers as possible and each must be clearly explained in this spec.
-Even in cases where several answers seem equally good, drivers must agree on one way to do it.
-
-MongoDB users and driver authors benefit from having one way to discover and monitor servers.
-Users can substantially understand their driver's behavior without inspecting its code or asking its author.
-Driver authors can avoid subtle mistakes
-when they take advantage of a design that has been well-considered, reviewed, and tested.
-
-The server discovery and monitoring method is specified in four sections.
-First, a client is `configured`_.
-Second, it begins `monitoring`_ by calling `hello or legacy hello`_ on all servers.
-(Multi-threaded and asynchronous monitoring is described first,
-then single-threaded monitoring.)
-Third, as hello or legacy hello responses are received
-the client `parses them`_,
-and fourth, it `updates its view of the topology`_.
-
-Finally, this spec describes how `drivers update their topology view
-in response to errors`_,
-and includes generous implementation notes for driver authors.
-
-This spec does not describe how a client chooses a server for an operation;
-that is the domain of the Server Selection Spec.
-But there is a section describing
-the `interaction between monitoring and server selection`_.
-
-There is no discussion of driver architecture and data structures,
-nor is there any specification of a user-facing API.
-This spec is only concerned with the algorithm for monitoring the server topology.
-
-Meta
-----
-
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
-NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
-"OPTIONAL" in this document are to be interpreted as described in
-`RFC 2119`_.
-
-.. _RFC 2119: https://www.ietf.org/rfc/rfc2119.txt
-
-Specification
--------------
-
-General Requirements
-''''''''''''''''''''
-
-**Direct connections:**
-A client MUST be able to connect to a single server of any type.
-This includes querying hidden replica set members,
-and connecting to uninitialized members (see `RSGhost`_) in order to run
-"replSetInitiate".
-Setting a read preference MUST NOT be necessary to connect to a secondary.
-Of course,
-the secondary will reject all operations done with the PRIMARY read preference
-because the secondaryOk bit is not set,
-but the initial connection itself succeeds.
-Drivers MAY allow direct connections to arbiters
-(for example, to run administrative commands).
-
-**Replica sets:**
-A client MUST be able to discover an entire replica set from
-a seed list containing one or more replica set members.
-It MUST be able to continue monitoring the replica set
-even when some members go down,
-or when reconfigs add and remove members.
-A client MUST be able to connect to a replica set
-while there is no primary, or the primary is down.
-
-**Mongos:**
-A client MUST be able to connect to a set of mongoses
-and monitor their availability and `round trip time`_.
-This spec defines how mongoses are discovered and monitored,
-but does not define which mongos is selected for a given operation.
-
-Terms
-'''''
-
-Server
-``````
-
-A mongod or mongos process, or a load balancer.
-
-Deployment
-``````````
-
-One or more servers:
-either a standalone, a replica set, or one or more mongoses.
-
-Topology
-````````
-
-The state of the deployment:
-its type (standalone, replica set, or sharded),
-which servers are up, what type of servers they are,
-which is primary, and so on.
-
-Client
-``````
-
-Driver code responsible for connecting to MongoDB.
-
-Seed list
-`````````
-
-Server addresses provided to the client in its initial configuration,
-for example from the `connection string`_.
-
-Data-Bearing Server Type
-````````````````````````
-
-A server type from which a client can receive application data:
-
-* Mongos
-* RSPrimary
-* RSSecondary
-* Standalone
-* LoadBalanced
-
-Round trip time
-```````````````
-
-Also known as RTT.
-
-The client's measurement of the duration of one hello or legacy hello call.
-The round trip time is used to support the "localThresholdMS" [1]_
-option in the Server Selection Spec.
-
-.. [1] "localThresholdMS" was called "secondaryAcceptableLatencyMS" in the Read
- Preferences Spec, before it was superseded by the Server Selection Spec.
-
-hello or legacy hello outcome
-`````````````````````````````
-
-The result of an attempt to call the hello or legacy hello command on a server.
-It consists of three elements:
-a boolean indicating the success or failure of the attempt,
-a document containing the command response (or null if it failed),
-and the round trip time to execute the command (or null if it failed).
-
-.. _checks: #check
-
-check
-`````
-
-The client checks a server by attempting to call hello or legacy hello on it,
-and recording the outcome.
-
-.. _scans: #scan
-
-scan
-````
-
-The process of checking all servers in the deployment.
-
-suitable
-````````
-
-A server is judged "suitable" for an operation if the client can use it
-for a particular operation.
-For example, a write requires a standalone, primary, or mongos.
-Suitability is fully specified in the `Server Selection Spec
-<../server-selection/server-selection.md>`_.
-
-address
-```````
-
-The hostname or IP address, and port number, of a MongoDB server.
-
-network error
-`````````````
-
-An error that occurs while reading from or writing to a network socket.
-
-network timeout
-```````````````
-
-A timeout that occurs while reading from or writing to a network socket.
-
-
-minHeartbeatFrequencyMS
-```````````````````````
-
-Defined in the `Server Monitoring spec`_. This value MUST be 500 ms, and
-it MUST NOT be configurable.
-
-.. _generation number:
-
-pool generation number
-``````````````````````
-
-The pool's generation number which starts at 0 and is incremented each time
-the pool is cleared. Defined in the `Connection Monitoring and Pooling spec`_.
-
-connection generation number
-````````````````````````````
-
-The pool's generation number at the time this connection was created.
-Defined in the `Connection Monitoring and Pooling spec`_.
-
-error generation number
-```````````````````````
-
-The error's generation number is the generation of the connection on which the
-application error occurred. Note that when a network error occurs before the
-handshake completes then the error's generation number is the generation of
-the pool at the time the connection attempt was started.
-
-.. _State Change Errors:
-
-State Change Error
-``````````````````
-
-A server reply document indicating a "not writable primary" or "node is recovering"
-error. Starting in MongoDB 4.4 these errors may also include a
-`topologyVersion`_ field.
-
-Data structures
-'''''''''''''''
-
-This spec uses a few data structures
-to describe the client's view of the topology.
-It must be emphasized that
-a driver is free to implement the same behavior
-using different data structures.
-This spec uses these enums and structs in order to describe driver **behavior**,
-not to mandate how a driver represents the topology,
-nor to mandate an API.
-
-Constants
-`````````
-
-clientMinWireVersion and clientMaxWireVersion
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Integers. The wire protocol range supported by the client.
-
-Enums
-`````
-
-TopologyType
-~~~~~~~~~~~~
-
-Single, ReplicaSetNoPrimary, ReplicaSetWithPrimary, Sharded, LoadBalanced, or Unknown.
-
-See `updating the TopologyDescription`_.
-
-ServerType
-~~~~~~~~~~
-
-Standalone, Mongos,
-PossiblePrimary, RSPrimary, RSSecondary, RSArbiter, RSOther, RSGhost,
-LoadBalancer or Unknown.
-
-See `parsing a hello or legacy hello response`_.
-
-.. note:: Single-threaded clients use the PossiblePrimary type
- to maintain proper `scanning order`_.
- Multi-threaded and asynchronous clients do not need this ServerType;
- it is synonymous with Unknown.
-
-TopologyDescription
-```````````````````
-
-The client's representation of everything it knows about the deployment's topology.
-
-Fields:
-
-* type: a `TopologyType`_ enum value. See `initial TopologyType`_.
-* setName: the replica set name. Default null.
-* maxElectionId: an ObjectId or null. The largest electionId ever reported by
- a primary. Default null. Part of the (``electionId``, ``setVersion``) tuple.
-* maxSetVersion: an integer or null. The largest setVersion ever reported by
- a primary. It may not monotonically increase, as electionId takes precedence in ordering
- Default null. Part of the (``electionId``, ``setVersion``) tuple.
-* servers: a set of ServerDescription instances.
- Default contains one server: "localhost:27017", ServerType Unknown.
-* stale: a boolean for single-threaded clients, whether the topology must
- be re-scanned.
- (Not related to maxStalenessSeconds, nor to `stale primaries`_.)
-* compatible: a boolean.
- False if any server's wire protocol version range
- is incompatible with the client's.
- Default true.
-* compatibilityError: a string.
- The error message if "compatible" is false, otherwise null.
-* logicalSessionTimeoutMinutes: integer or null. Default null. See
- `logical session timeout`_.
-
-ServerDescription
-`````````````````
-
-The client's view of a single server,
-based on the most recent hello or legacy hello outcome.
-
-Again, drivers may store this information however they choose;
-this data structure is defined here
-merely to describe the monitoring algorithm.
-
-Fields:
-
-* address: the hostname or IP, and the port number,
- that the client connects to.
- Note that this is **not** the "me" field in the server's hello or legacy hello response,
- in the case that the server reports an address different
- from the address the client uses.
-* (=) error: information about the last error related to this server. Default null.
-* roundTripTime: the duration of the hello or legacy hello call. Default null.
-* minRoundTripTime: the minimum RTT for the server. Default null.
-* lastWriteDate: a 64-bit BSON datetime or null.
- The "lastWriteDate" from the server's most recent hello or legacy hello response.
-* opTime: an opTime or null.
- An opaque value representing the position in the oplog of the most recently seen write. Default null.
- (Only mongos and shard servers record this field when monitoring
- config servers as replica sets, at least until `drivers allow applications to use readConcern "afterOptime". <../max-staleness/max-staleness.md#future-feature-to-support-readconcern-afteroptime>`_)
-* (=) type: a `ServerType`_ enum value. Default Unknown.
-* (=) minWireVersion, maxWireVersion:
- the wire protocol version range supported by the server.
- Both default to 0.
- `Use min and maxWireVersion only to determine compatibility`_.
-* (=) me: The hostname or IP, and the port number, that this server was
- configured with in the replica set. Default null.
-* (=) hosts, passives, arbiters: Sets of addresses.
- This server's opinion of the replica set's members, if any.
- These `hostnames are normalized to lower-case`_.
- Default empty.
- The client `monitors all three types of servers`_ in a replica set.
-* (=) tags: map from string to string. Default empty.
-* (=) setName: string or null. Default null.
-* (=) electionId: an ObjectId, if this is a MongoDB 2.6+ replica set member that
- believes it is primary. See `using electionId and setVersion to detect stale primaries`_.
- Default null.
-* (=) setVersion: integer or null. Default null.
-* (=) primary: an address. This server's opinion of who the primary is.
- Default null.
-* lastUpdateTime: when this server was last checked. Default "infinity ago".
-* (=) logicalSessionTimeoutMinutes: integer or null. Default null.
-* (=) topologyVersion: A topologyVersion or null. Default null.
- The "topologyVersion" from the server's most recent hello or legacy hello response or
- `State Change Error`_.
-* (=) iscryptd: boolean indicating if the server is a
- `mongocryptd <../client-side-encryption/client-side-encryption.md#mongocryptd>`_
- server. Default null.
-
-"Passives" are priority-zero replica set members that cannot become primary.
-The client treats them precisely the same as other members.
-
-Fields marked (=) are used for `Server Description Equality`_ comparison.
-
-.. _configured: #configuration
-
-Configuration
-'''''''''''''
-
-No breaking changes
-```````````````````
-
-This spec does not intend
-to require any drivers to make breaking changes regarding
-what configuration options are available,
-how options are named,
-or what combinations of options are allowed.
-
-Initial TopologyDescription
-```````````````````````````
-
-The default values for `TopologyDescription`_ fields are described above.
-Users may override the defaults as follows:
-
-Initial Servers
-~~~~~~~~~~~~~~~
-
-The user MUST be able to set the initial servers list to a `seed list`_
-of one or more addresses.
-
-The hostname portion of each address MUST be normalized to lower-case.
-
-Initial TopologyType
-~~~~~~~~~~~~~~~~~~~~
-
-If the ``directConnection`` URI option is specified when a MongoClient is
-constructed, the TopologyType must be initialized based on the value of
-the ``directConnection`` option and the presence of the ``replicaSet`` option
-according to the following table:
-
-+------------------+-----------------------+-----------------------+
-| directConnection | replicaSet present | Initial TopologyType |
-+==================+=======================+=======================+
-| true | no | Single |
-+------------------+-----------------------+-----------------------+
-| true | yes | Single |
-+------------------+-----------------------+-----------------------+
-| false | no | Unknown |
-+------------------+-----------------------+-----------------------+
-| false | yes | ReplicaSetNoPrimary |
-+------------------+-----------------------+-----------------------+
-
-If the ``directConnection`` option is not specified, newly developed drivers
-MUST behave as if it was specified with the false value.
-
-Since changing the starting topology can reasonably be considered a
-backwards-breaking change, existing drivers SHOULD stage implementation
-according to semantic versioning guidelines. Specifically, support for the
-``directConnection`` URI option can be added in a minor release.
-In a subsequent major release, the default starting topology can be changed
-to Unknown. Drivers MUST document this in a prior minor release.
-
-Existing drivers MUST deprecate other URI options, if any, for controlling
-topology discovery or specifying the deployment topology. If such a legacy
-option is specified and the ``directConnection`` option is also
-specified, and the values of the two options are semantically different,
-the driver MUST report an error during URI option parsing.
-
-The API for initializing TopologyType using language-specific native options
-is not specified here. Drivers might already have a convention, e.g. a single
-seed means Single, a setName means ReplicaSetNoPrimary, and a list of seeds
-means Unknown. There are variations, however: In the Java driver a single seed
-means Single, but a **list** containing one seed means Unknown, so it can
-transition to replica-set monitoring if the seed is discovered to be a
-replica set member. In contrast, PyMongo requires a non-null setName in order
-to begin replica-set monitoring, regardless of the number of seeds.
-This spec does not cover language-specific native options that a driver may
-provide.
-
-Initial setName
-~~~~~~~~~~~~~~~
-
-It is allowed to use ``directConnection=true`` in conjunction with the
-``replicaSet`` URI option. The driver must connect in Single topology and
-verify that setName matches the specified name, as per
-`verifying setName with TopologyType Single`_.
-
-When a MongoClient is initialized using language-specific native options,
-the user MUST be able to set the client's initial replica set name.
-A driver MAY require the set name in order to connect to a replica set,
-or it MAY be able to discover the replica set name as it connects.
-
-Allowed configuration combinations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Drivers MUST enforce:
-
-* TopologyType Single cannot be used with multiple seeds.
-* ``directConnection=true`` cannot be used with multiple seeds.
-* If setName is not null, only TopologyType ReplicaSetNoPrimary,
- and possibly Single,
- are allowed.
- (See `verifying setName with TopologyType Single`_.)
-* ``loadBalanced=true`` cannot be used in conjunction with
- ``directConnection=true`` or ``replicaSet``
-
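The combinations above can be checked in one place at option-parsing time. The following is a minimal sketch, not a prescribed API: ``ConfigurationError``, ``validate_topology_options`` and its parameter names are hypothetical, and TopologyTypes are represented as plain strings for illustration.

```python
from typing import Optional, Sequence


class ConfigurationError(ValueError):
    """Hypothetical error type for invalid option combinations."""


def validate_topology_options(
    seeds: Sequence[str],
    direct_connection: Optional[bool] = None,
    replica_set: Optional[str] = None,
    load_balanced: bool = False,
    topology_type: Optional[str] = None,
) -> None:
    """Raise ConfigurationError if the options violate the rules listed above."""
    multiple_seeds = len(seeds) > 1

    if topology_type == "Single" and multiple_seeds:
        raise ConfigurationError("TopologyType Single cannot be used with multiple seeds")

    if direct_connection and multiple_seeds:
        raise ConfigurationError("directConnection=true cannot be used with multiple seeds")

    # A non-null setName only permits ReplicaSetNoPrimary (and possibly Single).
    allowed_with_set_name = (None, "ReplicaSetNoPrimary", "Single")
    if replica_set is not None and topology_type not in allowed_with_set_name:
        raise ConfigurationError("setName requires TopologyType ReplicaSetNoPrimary or Single")

    if load_balanced and (direct_connection or replica_set is not None):
        raise ConfigurationError(
            "loadBalanced=true cannot be used with directConnection=true or replicaSet"
        )
```

A driver that rejects the "possibly Single" case would drop ``"Single"`` from ``allowed_with_set_name``.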
-Handling of SRV URIs resolving to single host
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When a driver is given an SRV URI, if the ``directConnection`` URI option
-is not specified, and the ``replicaSet`` URI option is not specified, the
-driver MUST start in Unknown topology, and follow the rules in the
-`TopologyType table`_ for transitioning to other topologies. In particular,
-the driver MUST NOT use the number of hosts from the initial SRV lookup
-to decide what topology to start in.
-
-heartbeatFrequencyMS
-````````````````````
-
-The interval between server `checks`_, counted from the end of the previous
-check until the beginning of the next one.
-
-For multi-threaded and asynchronous drivers
-it MUST default to 10 seconds and MUST be configurable.
-For single-threaded drivers it MUST default to 60 seconds
-and MUST be configurable.
-It MUST be called heartbeatFrequencyMS
-unless this breaks backwards compatibility.
-
-For both multi- and single-threaded drivers,
-the driver MUST NOT permit users to configure it to less than minHeartbeatFrequencyMS (500ms).
-
-(See `heartbeatFrequencyMS defaults to 10 seconds or 60 seconds`_
-and `what's the point of periodic monitoring?`_)
-
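As an illustration of the defaults and the lower bound described above, a driver might resolve the effective value roughly as follows. This is a sketch under assumptions: the function name, its parameters, and the choice to raise ``ValueError`` are hypothetical, not part of this specification.

```python
from typing import Optional

MIN_HEARTBEAT_FREQUENCY_MS = 500
DEFAULT_MULTI_THREADED_MS = 10_000   # multi-threaded and asynchronous drivers
DEFAULT_SINGLE_THREADED_MS = 60_000  # single-threaded drivers


def resolve_heartbeat_frequency_ms(
    configured_ms: Optional[int] = None, single_threaded: bool = False
) -> int:
    """Return the effective heartbeatFrequencyMS, rejecting values below the minimum."""
    if configured_ms is None:
        return DEFAULT_SINGLE_THREADED_MS if single_threaded else DEFAULT_MULTI_THREADED_MS
    if configured_ms < MIN_HEARTBEAT_FREQUENCY_MS:
        raise ValueError(
            f"heartbeatFrequencyMS must be at least {MIN_HEARTBEAT_FREQUENCY_MS}"
        )
    return configured_ms
```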
-Client construction
-'''''''''''''''''''
-
-Except for `initial DNS seed list discovery
-<../initial-dns-seedlist-discovery/initial-dns-seedlist-discovery.md>`_
-when given a connection string with ``mongodb+srv`` scheme,
-the client's constructor MUST NOT do any I/O.
-This means that the constructor does not throw an exception
-if servers are unavailable:
-the topology is not yet known when the constructor returns.
-Similarly if a server has an incompatible wire protocol version,
-the constructor does not throw.
-Instead, all subsequent operations on the client fail
-as long as the error persists.
-
-See `clients do no I/O in the constructor`_ for the justification.
-
-Multi-threaded and asynchronous client construction
-```````````````````````````````````````````````````
-
-The constructor MAY start the monitors as background tasks
-and return immediately.
-Or the monitors MAY be started by some method separate from the constructor;
-for example they MAY be started by some "initialize" method (by any name),
-or on the first use of the client for an operation.
-
-Single-threaded client construction
-```````````````````````````````````
-
-Single-threaded clients do no I/O in the constructor.
-They MUST `scan`_ the servers on demand,
-when the first operation is attempted.
-
-Client closing
-''''''''''''''
-
-When a client is closing, before it emits the ``TopologyClosedEvent`` as per the
-`Events API `_,
-it SHOULD `remove`_ all servers from its ``TopologyDescription`` and set its
-``TopologyType`` to ``Unknown``, emitting the corresponding
-``TopologyDescriptionChangedEvent``.
-
-Monitoring
-''''''''''
-
-See the `Server Monitoring spec`_ for how a driver monitors each server. In
-summary, the client monitors each server in the topology. The scope of server
-monitoring is to provide the topology with updated ServerDescriptions based on
-hello or legacy hello command responses.
-
-.. _parses them: #parsing-a-hello-or-legacy-hello-response
-
-Parsing a hello or legacy hello response
-''''''''''''''''''''''''''''''''''''''''
-
-The client represents its view of each server with a `ServerDescription`_.
-Each time the client `checks`_ a server, it MUST replace its description of
-that server with a new one if and only if the new ServerDescription's
-`topologyVersion`_ is greater than or equal to the current ServerDescription's
-`topologyVersion`_.
-
-(See `Replacing the TopologyDescription`_ for an example implementation.)
-
-This replacement MUST happen even if the new server description compares equal
-to the previous one, in order to keep client-tracked attributes like last
-update time and round trip time up to date.
-
-Drivers MUST be able to handle responses to both ``hello`` and legacy hello
-commands. When checking results, drivers MUST first check for the
-``isWritablePrimary`` field and fall back to checking for an ``ismaster`` field
-if ``isWritablePrimary`` was not found.
-
-ServerDescriptions are created from hello or legacy hello outcomes as follows:
-
-type
-````
-
-The new ServerDescription's type field is set to a `ServerType`_.
-Note that these states do **not** exactly correspond to
-`replica set member states
-`_.
-For example, some replica set member states like STARTUP and RECOVERING
-are identical from the client's perspective, so they are merged into "RSOther".
-Additionally, states like Standalone and Mongos
-are not replica set member states at all.
-
-+-------------------+---------------------------------------------------------------+
-| State             | Symptoms                                                      |
-+===================+===============================================================+
-| Unknown           | Initial, or after a network error or failed hello or legacy   |
-|                   | hello call, or "ok: 1" not in hello or legacy hello response. |
-+-------------------+---------------------------------------------------------------+
-| Standalone        | No "msg: isdbgrid", no setName, and no "isreplicaset: true".  |
-+-------------------+---------------------------------------------------------------+
-| Mongos            | "msg: isdbgrid".                                              |
-+-------------------+---------------------------------------------------------------+
-| PossiblePrimary   | Not yet checked, but another member thinks it is the primary. |
-+-------------------+---------------------------------------------------------------+
-| RSPrimary         | "isWritablePrimary: true" or "ismaster: true",                |
-|                   | "setName" in response.                                        |
-+-------------------+---------------------------------------------------------------+
-| RSSecondary       | "secondary: true", "setName" in response.                     |
-+-------------------+---------------------------------------------------------------+
-| RSArbiter         | "arbiterOnly: true", "setName" in response.                   |
-+-------------------+---------------------------------------------------------------+
-| RSOther           | "setName" in response, "hidden: true" or not primary,         |
-|                   | secondary, nor arbiter.                                       |
-+-------------------+---------------------------------------------------------------+
-| RSGhost           | "isreplicaset: true" in response.                             |
-+-------------------+---------------------------------------------------------------+
-| LoadBalanced      | "loadBalanced=true" in URI.                                   |
-+-------------------+---------------------------------------------------------------+
-
-A server can transition from any state to any other. For example, an
-administrator could shut down a secondary and bring up a mongos in its place.
-
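The table above can be read as a classification function over a successful or failed response. The sketch below is illustrative only: the function name is hypothetical, ServerTypes are plain strings, and the order of the checks is an assumption chosen to agree with the table (for example, testing "hidden" before the primary and secondary flags). PossiblePrimary and LoadBalanced are not derived from a response, so they do not appear here.

```python
from typing import Any, Mapping


def server_type_from_hello(response: Mapping[str, Any]) -> str:
    """Map a hello or legacy hello response to a ServerType name."""
    if response.get("ok") != 1:
        return "Unknown"
    if response.get("msg") == "isdbgrid":
        return "Mongos"
    if response.get("isreplicaset"):
        return "RSGhost"
    if response.get("setName") is None:
        return "Standalone"

    # From here on the server is a replica set member that reports a setName.
    if response.get("hidden"):
        return "RSOther"

    # Prefer isWritablePrimary and fall back to the legacy ismaster field.
    writable = response.get("isWritablePrimary")
    if writable is None:
        writable = response.get("ismaster")
    if writable:
        return "RSPrimary"
    if response.get("secondary"):
        return "RSSecondary"
    if response.get("arbiterOnly"):
        return "RSArbiter"
    return "RSOther"
```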
-.. _RSGhost: #RSGhost-and-RSOther
-
-RSGhost and RSOther
-~~~~~~~~~~~~~~~~~~~
-
-The client MUST monitor replica set members
-even when they cannot be queried.
-These members are in state RSGhost or RSOther.
-
-**RSGhost** members occur in at least three situations:
-
-* briefly during server startup,
-* in an uninitialized replica set,
-* or when the server is shunned (removed from the replica set config).
-
-An RSGhost server has no hosts list nor setName.
-Therefore the client MUST NOT attempt to use its hosts list
-nor check its setName
-(see `JAVA-1161 `_
-or `CSHARP-671 `_.)
-However, the client MUST keep the RSGhost member in its TopologyDescription,
-in case the client's only hope for staying connected to the replica set
-is that this member will transition to a more useful state.
-
-For simplicity, this is the rule:
-any server is an RSGhost that reports "isreplicaset: true".
-
-Non-ghost replica set members have reported their setNames
-since MongoDB 1.6.2.
-See `only support replica set members running MongoDB 1.6.2 or later`_.
-
-.. note:: The Java driver does not have a separate state for RSGhost;
- it is an RSOther server with no hosts list.
-
-**RSOther** servers may be hidden, starting up, or recovering.
-They cannot be queried, but their hosts lists are useful
-for discovering the current replica set configuration.
-
-If a `hidden member `_
-is provided as a seed,
-the client can use it to find the primary.
-Since the hidden member does not appear in the primary's host list,
-it will be removed once the primary is checked.
-
-error
-`````
-
-If the client experiences any error when checking a server,
-it stores error information in the ServerDescription's error field.
-
-roundTripTime
-`````````````
-
-Drivers MUST record the server's `round trip time`_ (RTT) after each
-successful call to hello or legacy hello. The Server Selection Spec
-describes how RTT is averaged and how it is used in server selection.
-Drivers MUST also record the server's minimum RTT per
-`Server Monitoring (Measuring RTT)`_.
-
-If a hello or legacy hello call fails, the RTT is not updated.
-Furthermore, while a server's type is Unknown its RTT is null,
-and if it changes from a known type to Unknown its RTT is set to null.
-However, if it changes from one known type to another
-(e.g. from RSPrimary to RSSecondary) its RTT is updated normally,
-not set to null nor restarted from scratch.
-
-lastWriteDate and opTime
-````````````````````````
-
-The hello or legacy hello response of a replica set member running MongoDB 3.4 and later
-contains a ``lastWrite`` subdocument with fields ``lastWriteDate`` and ``opTime``
-(`SERVER-8858`_).
-If these fields are available, parse them from the hello or legacy hello response,
-otherwise set them to null.
-
-Clients MUST NOT attempt to compensate for the network latency between when the server
-generated its hello or legacy hello response and when the client records ``lastUpdateTime``.
-
-.. _SERVER-8858: https://jira.mongodb.org/browse/SERVER-8858
-
-lastUpdateTime
-``````````````
-
-Clients SHOULD set lastUpdateTime with a monotonic clock.
-
-Hostnames are normalized to lower-case
-``````````````````````````````````````
-
-The same as with seeds provided in the initial configuration,
-all hostnames in the hello or legacy hello response's "me", "hosts", "passives", and "arbiters"
-entries MUST be lower-cased.
-
-This prevents unnecessary work rediscovering a server
-if a seed "A" is provided and the server
-responds that "a" is in the replica set.
-
-`RFC 4343 `_:
-
- Domain Name System (DNS) names are "case insensitive".
-
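A minimal sketch of this normalization, assuming the response is available as a dictionary (the function name is hypothetical):

```python
from typing import Any, Dict, Mapping


def normalize_hello_hosts(response: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the response with all host entries lower-cased."""
    normalized = dict(response)
    if isinstance(normalized.get("me"), str):
        normalized["me"] = normalized["me"].lower()
    for field in ("hosts", "passives", "arbiters"):
        if field in normalized:
            normalized[field] = [address.lower() for address in normalized[field]]
    return normalized
```

With seeds lower-cased the same way, a seed given as "A:27017" and a "hosts" entry of "a:27017" refer to the same server entry.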
-logicalSessionTimeoutMinutes
-````````````````````````````
-
-MongoDB 3.6 and later include a ``logicalSessionTimeoutMinutes`` field if
-logical sessions are enabled in the deployment. Clients MUST check for this
-field and set the ServerDescription's logicalSessionTimeoutMinutes field to this
-value, or to null otherwise.
-
-topologyVersion
-```````````````
-
-MongoDB 4.4 and later include a ``topologyVersion`` field in all hello or legacy hello
-and `State Change Error`_ responses. Clients MUST check for this field and set
-the ServerDescription's topologyVersion field to this value, if present.
-The topologyVersion helps the client and server determine the relative
-freshness of topology information in concurrent messages.
-(See `What is the purpose of topologyVersion?`_)
-
-The topologyVersion is a subdocument with two fields, "processId" and
-"counter":
-
-.. code:: typescript
-
- {
- topologyVersion: {processId: , counter: },
- ( ... other fields ...)
- }
-
-topologyVersion Comparison
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-To compare a topologyVersion from a hello or legacy hello or State Change Error
-response to the current ServerDescription's topologyVersion:
-
-#. If the response topologyVersion is unset or the ServerDescription's
- topologyVersion is null, the client MUST assume the response is more recent.
-#. If the response's topologyVersion.processId is not equal to the
- ServerDescription's, the client MUST assume the response is more recent.
-#. If the response's topologyVersion.processId is equal to the
- ServerDescription's, the client MUST use the counter field to determine
- which topologyVersion is more recent.
-
-See `Replacing the TopologyDescription`_ for an example implementation of
-topologyVersion comparison.
-
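The three comparison rules can be sketched as a single predicate that answers whether a response should be treated as at least as recent as the current description. This is an illustrative sketch, not the example implementation referenced above: the function name is hypothetical, topologyVersions are modeled as dictionaries or ``None``, and plain strings stand in for the processId values.

```python
from typing import Any, Mapping, Optional

TopologyVersion = Optional[Mapping[str, Any]]


def response_is_at_least_as_recent(current: TopologyVersion, response: TopologyVersion) -> bool:
    """Return True if the response's topologyVersion is not older than the current one."""
    # Rule 1: with either side missing, assume the response is more recent.
    if response is None or current is None:
        return True
    # Rule 2: a different processId means the server restarted; assume more recent.
    if response["processId"] != current["processId"]:
        return True
    # Rule 3: same process, so the counter decides.
    return response["counter"] >= current["counter"]
```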
-serviceId
-`````````
-
-MongoDB 5.0 and later, as well as any mongos-like service, include a ``serviceId``
-field when the service is configured behind a load balancer.
-
-Other ServerDescription fields
-``````````````````````````````
-
-Other required fields
-defined in the `ServerDescription`_ data structure
-are parsed from the hello or legacy hello response in the obvious way.
-
-.. _updates its view of the topology:
-
-Server Description Equality
-```````````````````````````
-
-For the purpose of determining whether to publish SDAM events, two server
-descriptions having the same address MUST be considered equal if and only if
-the values of `ServerDescription`_ fields marked (=) are respectively equal.
-
-This specification does not prescribe how to compare server descriptions
-with different addresses for equality.
-
-Updating the TopologyDescription
-''''''''''''''''''''''''''''''''
-
-Each time the client checks a server,
-it processes the outcome (successful or not)
-to create a `ServerDescription`_,
-and then it processes the ServerDescription to update its `TopologyDescription`_.
-
-The TopologyDescription's `TopologyType`_ influences
-how the ServerDescription is processed.
-The following subsection
-specifies how the client updates its TopologyDescription
-when the TopologyType is Single.
-The next subsection treats the other types.
-
-TopologyType Single
-```````````````````
-
-The TopologyDescription's type was initialized as Single
-and remains Single forever.
-There is always one ServerDescription in TopologyDescription.servers.
-
-Whenever the client checks a server (successfully or not), and regardless of
-whether the new server description is equal to the previous server description
-as defined in `Server Description Equality`_,
-the ServerDescription in TopologyDescription.servers
-MUST be replaced with the new ServerDescription.
-
-.. _is compatible:
-
-
-Checking wire protocol compatibility
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-A ServerDescription which is not Unknown is incompatible if:
-
-* minWireVersion > clientMaxWireVersion, or
-* maxWireVersion < clientMinWireVersion
-
-If any ServerDescription is incompatible, the client MUST set the
-TopologyDescription's "compatible" field to false and fill out the
-TopologyDescription's "compatibilityError" field like so:
-
-- if ServerDescription.minWireVersion > clientMaxWireVersion:
-
- "Server at $host:$port requires wire version $minWireVersion, but this version
- of $driverName only supports up to $clientMaxWireVersion."
-
-- if ServerDescription.maxWireVersion < clientMinWireVersion:
-
- "Server at $host:$port reports wire version $maxWireVersion, but this version
- of $driverName requires at least $clientMinWireVersion (MongoDB
- $mongoVersion)."
-
-Replace $mongoVersion with the appropriate MongoDB minor version, for example if
-clientMinWireVersion is 2 and it connects to MongoDB 2.4, format the error like:
-
- "Server at example.com:27017 reports wire version 0, but this version
- of My Driver requires at least 2 (MongoDB 2.6)."
-
-In this second case, the exact required MongoDB version is known and can be
-named in the error message, whereas in the first case the implementer does not
-know which MongoDB versions will be compatible or incompatible in the future.
-
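The two incompatibility cases and their messages can be sketched as follows. The function and parameter names are hypothetical; in particular, ``min_mongo_version`` is assumed to be supplied by the driver, since the mapping from wire versions to MongoDB versions is not defined in this section.

```python
from typing import Optional


def compatibility_error(
    address: str,
    min_wire_version: int,
    max_wire_version: int,
    client_min_wire_version: int,
    client_max_wire_version: int,
    driver_name: str,
    min_mongo_version: str,
) -> Optional[str]:
    """Return the compatibilityError text, or None if the server is compatible."""
    if min_wire_version > client_max_wire_version:
        return (
            f"Server at {address} requires wire version {min_wire_version}, but this version "
            f"of {driver_name} only supports up to {client_max_wire_version}."
        )
    if max_wire_version < client_min_wire_version:
        return (
            f"Server at {address} reports wire version {max_wire_version}, but this version "
            f"of {driver_name} requires at least {client_min_wire_version} "
            f"(MongoDB {min_mongo_version})."
        )
    return None
```

A ServerDescription of type Unknown is never incompatible, so a caller would skip such servers before applying this check.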
-Verifying setName with TopologyType Single
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-A client MAY allow the user to supply a setName with an initial TopologyType
-of Single. In this case, if the ServerDescription's setName is null or wrong,
-the ServerDescription MUST be replaced with a default ServerDescription of
-type Unknown.
-
-
-TopologyType LoadBalanced
-`````````````````````````
-
-See the `Load Balancer Specification <../load-balancers/load-balancers.md#server-discovery-logging-and-monitoring>`__ for details.
-
-Other TopologyTypes
-```````````````````
-
-If the TopologyType is **not** Single, the topology can contain zero or more
-servers. The state of a topology containing zero servers is terminal
-(because servers can only be added if they are reported by a server already
-in the topology). A client SHOULD emit a warning if it is constructed
-with no seeds in the initial seed list. A client SHOULD emit a warning when,
-in the process of updating its topology description, it removes the last
-server from the topology.
-
-Whenever a client completes a hello or legacy hello call,
-it creates a new ServerDescription with the proper `ServerType`_.
-It replaces the server's previous description in TopologyDescription.servers
-with the new one.
-
-Apply the logic for `checking wire protocol compatibility`_ to each
-ServerDescription in the topology.
-If any server's wire protocol version range does not overlap with the client's,
-the client updates the "compatible" and "compatibilityError" fields
-as described above for TopologyType Single.
-Otherwise "compatible" is set to true.
-
-It is possible for a multi-threaded client to receive a hello or legacy hello outcome
-from a server after the server has been removed from the TopologyDescription.
-For example, a monitor begins checking a server "A",
-then a different monitor receives a response from the primary
-claiming that "A" has been removed from the replica set,
-so the client removes "A" from the TopologyDescription.
-Then, the check of server "A" completes.
-
-In all cases, the client MUST ignore hello or legacy hello outcomes from servers
-that are not in the TopologyDescription.
-
-The following subsections explain in detail what actions the client takes
-after replacing the ServerDescription.
-
-TopologyType table
-~~~~~~~~~~~~~~~~~~
-
-The new ServerDescription's type is the vertical axis,
-and the current TopologyType is the horizontal.
-Where a ServerType and a TopologyType intersect,
-the table shows what action the client takes.
-
-"no-op" means,
-do nothing **after** replacing the server's old description
-with the new one.
-
-.. csv-table::
- :header-rows: 1
- :stub-columns: 1
-
- ,TopologyType Unknown,TopologyType Sharded,TopologyType ReplicaSetNoPrimary,TopologyType ReplicaSetWithPrimary
- ServerType Unknown,no-op,no-op,no-op,`checkIfHasPrimary`_
- ServerType Standalone,`updateUnknownWithStandalone`_,`remove`_,`remove`_,`remove`_ and `checkIfHasPrimary`_
- ServerType Mongos,Set topology type to Sharded,no-op,`remove`_,`remove`_ and `checkIfHasPrimary`_
- ServerType RSPrimary,Set topology type to ReplicaSetWithPrimary then `updateRSFromPrimary`_,`remove`_,Set topology type to ReplicaSetWithPrimary then `updateRSFromPrimary`_,`updateRSFromPrimary`_
- ServerType RSSecondary,Set topology type to ReplicaSetNoPrimary then `updateRSWithoutPrimary`_,`remove`_,`updateRSWithoutPrimary`_,`updateRSWithPrimaryFromMember`_
- ServerType RSArbiter,Set topology type to ReplicaSetNoPrimary then `updateRSWithoutPrimary`_,`remove`_,`updateRSWithoutPrimary`_,`updateRSWithPrimaryFromMember`_
- ServerType RSOther,Set topology type to ReplicaSetNoPrimary then `updateRSWithoutPrimary`_,`remove`_,`updateRSWithoutPrimary`_,`updateRSWithPrimaryFromMember`_
- ServerType RSGhost,no-op [#]_,`remove`_,no-op,`checkIfHasPrimary`_
-
-.. [#] `TopologyType remains Unknown when an RSGhost is discovered`_.
-
-TopologyType explanations
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This subsection complements the `TopologyType table`_
-with prose explanations of the TopologyTypes (besides Single and LoadBalanced).
-
-TopologyType Unknown
- A starting state.
-
- **Actions**:
-
- * If the incoming ServerType is Unknown (that is, the hello or legacy hello call failed),
- keep the server in TopologyDescription.servers.
- The TopologyType remains Unknown.
- * The `TopologyType remains Unknown when an RSGhost is discovered`_, too.
- * If the type is Standalone, run `updateUnknownWithStandalone`_.
- * If the type is Mongos, set the TopologyType to Sharded.
- * If the type is RSPrimary, record its setName
- and call `updateRSFromPrimary`_.
- * If the type is RSSecondary, RSArbiter or RSOther, record its setName,
- set the TopologyType to ReplicaSetNoPrimary,
- and call `updateRSWithoutPrimary`_.
-
-TopologyType Sharded
- A steady state. Connected to one or more mongoses.
-
- **Actions**:
-
- * If the server is Unknown or Mongos, keep it.
- * Remove others.
-
-TopologyType ReplicaSetNoPrimary
- A starting state.
- The topology is definitely a replica set,
- but no primary is known.
-
- **Actions**:
-
- * Keep Unknown servers.
- * Keep RSGhost servers: they are members of some replica set,
- perhaps this one, and may recover.
- (See `RSGhost and RSOther`_.)
- * Remove any Standalones or Mongoses.
- * If the type is RSPrimary call `updateRSFromPrimary`_.
- * If the type is RSSecondary, RSArbiter or RSOther,
- run `updateRSWithoutPrimary`_.
-
-TopologyType ReplicaSetWithPrimary
- A steady state. The primary is known.
-
- **Actions**:
-
- * If the server type is Unknown, keep it,
- and run `checkIfHasPrimary`_.
- * Keep RSGhost servers: they are members of some replica set,
- perhaps this one, and may recover.
- (See `RSGhost and RSOther`_.)
- Run `checkIfHasPrimary`_.
- * Remove any Standalones or Mongoses
- and run `checkIfHasPrimary`_.
- * If the type is RSPrimary run `updateRSFromPrimary`_.
- * If the type is RSSecondary, RSArbiter or RSOther,
- run `updateRSWithPrimaryFromMember`_.
-
-Actions
-```````
-
-updateUnknownWithStandalone
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This subroutine is executed with the ServerDescription from a Standalone
-when the TopologyType is Unknown:
-
-.. code-block:: python
-
-    if description.address not in topologyDescription.servers:
-        return
-
-    if settings.seeds has one seed:
-        topologyDescription.type = Single
-    else:
-        remove this server from topologyDescription and stop monitoring it
-
-See `TopologyType remains Unknown when one of the seeds is a Standalone`_.
-
-updateRSWithoutPrimary
-~~~~~~~~~~~~~~~~~~~~~~
-
-This subroutine is executed
-with the ServerDescription from an RSSecondary, RSArbiter, or RSOther
-when the TopologyType is ReplicaSetNoPrimary:
-
-.. code-block:: python
-
-    if description.address not in topologyDescription.servers:
-        return
-
-    if topologyDescription.setName is null:
-        topologyDescription.setName = description.setName
-
-    else if topologyDescription.setName != description.setName:
-        remove this server from topologyDescription and stop monitoring it
-        return
-
-    for each address in description's "hosts", "passives", and "arbiters":
-        if address is not in topologyDescription.servers:
-            add new default ServerDescription of type "Unknown"
-            begin monitoring the new server
-
-    if description.primary is not null:
-        find the ServerDescription in topologyDescription.servers whose
-        address equals description.primary
-
-        if its type is Unknown, change its type to PossiblePrimary
-
-    if description.address != description.me:
-        remove this server from topologyDescription and stop monitoring it
-        return
-
-Unlike `updateRSFromPrimary`_,
-this subroutine does **not** remove any servers from the TopologyDescription
-based on the list of servers in the "hosts" field of the hello or legacy hello
-response. The only server that might be removed is the server itself that the
-hello or legacy hello response is from.
-
-The special handling of description.primary
-ensures that a single-threaded client
-`scans`_ the possible primary before other members.
-
-See `replica set monitoring with and without a primary`_.
-
-updateRSWithPrimaryFromMember
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This subroutine is executed with the ServerDescription from
-an RSSecondary, RSArbiter, or RSOther when the TopologyType is ReplicaSetWithPrimary:
-
-.. code-block:: python
-
-    if description.address not in topologyDescription.servers:
-        # While we were checking this server, another thread heard from the
-        # primary that this server is not in the replica set.
-        return
-
-    # SetName is never null here.
-    if topologyDescription.setName != description.setName:
-        remove this server from topologyDescription and stop monitoring it
-        checkIfHasPrimary()
-        return
-
-    if description.address != description.me:
-        remove this server from topologyDescription and stop monitoring it
-        checkIfHasPrimary()
-        return
-
-    # Had this member been the primary?
-    if there is no primary in topologyDescription.servers:
-        topologyDescription.type = ReplicaSetNoPrimary
-
-    if description.primary is not null:
-        find the ServerDescription in topologyDescription.servers whose
-        address equals description.primary
-
-        if its type is Unknown, change its type to PossiblePrimary
-
-The special handling of description.primary
-ensures that a single-threaded client
-`scans`_ the possible primary before other members.
-
-
-updateRSFromPrimary
-~~~~~~~~~~~~~~~~~~~
-
-This subroutine is executed with a ServerDescription of type RSPrimary:
-
-.. code-block:: python
-
-    if serverDescription.address not in topologyDescription.servers:
-        return
-
-    if topologyDescription.setName is null:
-        topologyDescription.setName = serverDescription.setName
-
-    else if topologyDescription.setName != serverDescription.setName:
-        # We found a primary but it doesn't have the setName
-        # provided by the user or previously discovered.
-        remove this server from topologyDescription and stop monitoring it
-        checkIfHasPrimary()
-        return
-
-    # Election ids are ObjectIds; see
-    # "Using electionId and setVersion to detect stale primaries"
-    # for comparison rules.
-
-    if serverDescription.maxWireVersion >= 17:  # MongoDB 6.0+
-        # Null values for electionId and setVersion are always considered less than non-null values.
-        if serverDescription.electionId > topologyDescription.maxElectionId or (
-            serverDescription.electionId == topologyDescription.maxElectionId
-            and serverDescription.setVersion >= topologyDescription.maxSetVersion
-        ):
-            topologyDescription.maxElectionId = serverDescription.electionId
-            topologyDescription.maxSetVersion = serverDescription.setVersion
-        else:
-            # Stale primary.
-            # replace serverDescription with a default ServerDescription of type "Unknown"
-            checkIfHasPrimary()
-            return
-    else:
-        # Maintain old comparison rules, namely setVersion is checked before electionId
-        if serverDescription.setVersion is not null and serverDescription.electionId is not null:
-            if (
-                topologyDescription.maxSetVersion is not null
-                and topologyDescription.maxElectionId is not null
-                and (
-                    topologyDescription.maxSetVersion > serverDescription.setVersion
-                    or (
-                        topologyDescription.maxSetVersion == serverDescription.setVersion
-                        and topologyDescription.maxElectionId > serverDescription.electionId
-                    )
-                )
-            ):
-                # Stale primary.
-                # replace serverDescription with a default ServerDescription of type "Unknown"
-                checkIfHasPrimary()
-                return
-
-            topologyDescription.maxElectionId = serverDescription.electionId
-
-        if serverDescription.setVersion is not null and (
-            topologyDescription.maxSetVersion is null
-            or serverDescription.setVersion > topologyDescription.maxSetVersion
-        ):
-            topologyDescription.maxSetVersion = serverDescription.setVersion
-
-
-    for each server in topologyDescription.servers:
-        if server.address != serverDescription.address:
-            if server.type is RSPrimary:
-                # See note below about invalidating an old primary.
-                replace the server with a default ServerDescription of type "Unknown"
-
-    for each address in serverDescription's "hosts", "passives", and "arbiters":
-        if address is not in topologyDescription.servers:
-            add new default ServerDescription of type "Unknown"
-            begin monitoring the new server
-
-    for each server in topologyDescription.servers:
-        if server.address not in serverDescription's "hosts", "passives", or "arbiters":
-            remove the server and stop monitoring it
-
-    checkIfHasPrimary()
-
-A note on invalidating the old primary:
-when a new primary is discovered,
-the client finds the previous primary (there should be none or one)
-and replaces its description
-with a default ServerDescription of type "Unknown."
-A multi-threaded client MUST `request an immediate check`_ for that server as
-soon as possible.
-
-If the old primary server version is 4.0 or earlier,
-the client MUST clear its connection pool for the old primary, too:
-the connections are all bad because the old primary has closed its sockets.
-If the old primary server version is 4.2 or newer, the client MUST NOT
-clear its connection pool for the old primary.
-
-See `replica set monitoring with and without a primary`_.
-
-If the server is primary with an obsolete electionId or setVersion, it is
-likely a stale primary that is going to step down. Mark it Unknown and let periodic
-monitoring detect when it becomes secondary. See
-`using electionId and setVersion to detect stale primaries`_.
-
-A note on checking "me": Unlike `updateRSWithPrimaryFromMember`_, there is no need to remove the server if the address is not equal to
-"me": since the server address will not be a member of either "hosts", "passives", or "arbiters", the server will already have been
-removed.
-
-checkIfHasPrimary
-~~~~~~~~~~~~~~~~~
-
-Set TopologyType to ReplicaSetWithPrimary if there is an RSPrimary
-in TopologyDescription.servers, otherwise set it to ReplicaSetNoPrimary.
-
-For example, if the TopologyType is ReplicaSetWithPrimary
-and the client is processing a new ServerDescription of type Unknown,
-that could mean the primary just disconnected,
-so checkIfHasPrimary must run to check if the TopologyType should become
-ReplicaSetNoPrimary.
-
-Another example is if the client first reaches the primary via its external
-IP, but the response's host list includes only internal IPs.
-In that case the client adds the primary's internal IP to the
-TopologyDescription and begins monitoring it, and removes the external IP.
-Right after removing the external IP from the description,
-the TopologyType MUST be ReplicaSetNoPrimary, since no primary is
-available at this moment.
-
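The rule is small enough to sketch directly. Here the servers are modeled as a hypothetical mapping from address to ServerType name, which is an assumption made for illustration:

```python
from typing import Mapping


def check_if_has_primary(server_types: Mapping[str, str]) -> str:
    """Return the TopologyType implied by the ServerTypes currently in the topology."""
    if any(server_type == "RSPrimary" for server_type in server_types.values()):
        return "ReplicaSetWithPrimary"
    return "ReplicaSetNoPrimary"
```

In the external-IP example above, the internal address has just been added as Unknown and the external address removed, so no RSPrimary remains and the result is ReplicaSetNoPrimary until the internal address is checked.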
-remove
-~~~~~~
-
-Remove the server from TopologyDescription.servers and stop monitoring it.
-
-In multi-threaded clients, a monitor may be currently checking this server
-and may not immediately abort.
-Once the check completes, this server's hello or legacy hello outcome MUST be
-ignored, and the monitor SHOULD halt.
-
-Logical Session Timeout
-```````````````````````
-
-Whenever a client updates the TopologyDescription from a hello or legacy hello response,
-it MUST set TopologyDescription.logicalSessionTimeoutMinutes to the smallest
-logicalSessionTimeoutMinutes value among ServerDescriptions of all data-bearing
-server types. If any have a null logicalSessionTimeoutMinutes,
-then TopologyDescription.logicalSessionTimeoutMinutes MUST be set to null.
-
-See the Driver Sessions Spec for the purpose of this value.
-
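A sketch of this computation follows. The set of data-bearing types is taken from this specification's definition of data-bearing server types; the function name and the dictionary shape of each server entry are hypothetical. Returning null when there are no data-bearing servers is an assumption, since the rule above does not address that case.

```python
from typing import Any, Iterable, Mapping, Optional

DATA_BEARING_TYPES = frozenset(
    {"Standalone", "Mongos", "RSPrimary", "RSSecondary", "LoadBalanced"}
)


def topology_session_timeout(servers: Iterable[Mapping[str, Any]]) -> Optional[int]:
    """Return the smallest logicalSessionTimeoutMinutes among data-bearing servers."""
    timeouts = [
        server.get("logicalSessionTimeoutMinutes")
        for server in servers
        if server["type"] in DATA_BEARING_TYPES
    ]
    # Assumption: with no data-bearing servers there is nothing to report.
    if not timeouts or any(timeout is None for timeout in timeouts):
        return None
    return min(timeouts)
```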
-.. _drivers update their topology view in response to errors:
-
-
-Connection Pool Management
-''''''''''''''''''''''''''
-
-For drivers that support connection pools, after a server check is
-completed successfully, if the server is determined to be
-`data-bearing `_
-or a
-`direct connection `__
-to the server is requested,
-and does not already have a connection pool, the driver MUST create
-the connection pool for the server. Additionally, if a driver
-implements a CMAP compliant connection pool, the server's pool (even
-if it already existed) MUST be marked as "ready". See the `Server
-Monitoring spec`_ for more information.
-
-Clearing the connection pool for a server MUST be synchronized with
-the update to the corresponding ServerDescription (e.g. by holding the
-lock on the TopologyDescription when clearing the pool). This prevents
-a possible race between the monitors and application threads. See `Why
-synchronize clearing a server's pool with updating the topology?`_ for
-more information.
-
-Error handling
-''''''''''''''
-
-Network error during server check
-`````````````````````````````````
-
-See error handling in the `Server Monitoring spec`_.
-
-Application errors
-``````````````````
-
-When processing a network or command error, clients MUST first check the
-error's `generation number`_. If the error's generation number is equal to
-the pool's generation number then error handling MUST continue according to
-`Network error when reading or writing`_ or
-`"not writable primary" and "node is recovering"`_. Otherwise, the error is considered
-stale and the client MUST NOT update any topology state.
-(See `Why ignore errors based on CMAP's generation number?`_)
-
-Error handling pseudocode
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Application operations can fail in various places, for example:
-
-- A network error, network timeout, or command error may occur while
- establishing a new connection. Establishing a connection includes the
- MongoDB handshake and completing authentication (if configured).
-- A network error or network timeout may occur while reading or writing to an
- established connection.
-- A command error may be returned from the server.
-- A "writeConcernError" field may be included in the command response.
-
-Depending on the context, these errors may update SDAM state by marking
-the server Unknown and may clear the server's connection pool. Some errors
-also require other side effects, like cancelling a check or requesting an
-immediate check. Drivers may use the following pseudocode to guide their
-implementation:
-
-.. code-block:: python
-
-    def handleError(error):
-        address = error.address
-        topologyVersion = error.topologyVersion
-
-        with client.lock:
-            # Ignore stale errors based on generation and topologyVersion.
-            if isStaleError(client.topologyDescription, error):
-                return
-
-            if isStateChangeError(error):
-                # Don't mark server unknown in load balanced mode.
-                if type != LoadBalanced:
-                    # Mark the server Unknown
-                    unknown = new ServerDescription(type=Unknown, error=error, topologyVersion=topologyVersion)
-                    onServerDescriptionChanged(unknown, connection pool for server)
-                if isShutdown(error.code) or (error was from <4.2):
-                    # the pools must only be cleared while the lock is held.
-                    if type == LoadBalanced:
-                        clear connection pool for serviceId
-                    else:
-                        clear connection pool for server
-                if multi-threaded:
-                    request immediate check
-                else:
-                    # Check right now if this is "not writable primary", since it might be a
-                    # useful secondary. If it's "node is recovering" leave it for the
-                    # next full scan.
-                    if isNotWritablePrimary(error.errmsg, error.code):
-                        check failing server
-            elif isNetworkError(error) or (not error.completedHandshake and (isNetworkTimeout(error) or isAuthError(error))):
-                if type != LoadBalanced:
-                    # Mark the server Unknown
-                    unknown = new ServerDescription(type=Unknown, error=error)
-                    onServerDescriptionChanged(unknown, connection pool for server)
-                    clear connection pool for server
-                else:
-                    if serviceId:
-                        clear connection pool for serviceId
-                # Cancel inprogress check
-                cancel monitor check
-
-    def isStaleError(topologyDescription, error):
-        currentServer = topologyDescription.servers[error.address]
-        currentGeneration = currentServer.pool.generation
-        generation = get connection generation from error
-        if generation < currentGeneration:
-            # Stale generation number.
-            return True
-
-        currentTopologyVersion = currentServer.topologyVersion
-        # True if the server's current topologyVersion is greater than or equal to the error's.
-        # We use >= instead of > because any state change should result in a new topologyVersion
-        return compareTopologyVersion(currentTopologyVersion, error.commandResponse.get("topologyVersion")) >= 0
-
-The following pseudocode checks a response for a "not writable primary" or "node is
-recovering" error:
-
-.. code-block:: python
-
-    recoveringCodes = [11600, 11602, 13436, 189, 91]
-    notWritablePrimaryCodes = [10107, 13435, 10058]
-    shutdownCodes = [11600, 91]
-
-    def isRecovering(message, code):
-        if code:
-            # If there is a code, it alone decides the outcome.
-            return code in recoveringCodes
-        else:
-            # if no code, use the error message.
-            return ("not master or secondary" in message
-                    or "node is recovering" in message)
-
-    def isNotWritablePrimary(message, code):
-        if code:
-            # If there is a code, it alone decides the outcome.
-            return code in notWritablePrimaryCodes
-        else:
-            # if no code, use the error message.
-            if isRecovering(message, None):
-                return False
-            return ("not master" in message)
-
-    def isShutdown(code):
-        if code and code in shutdownCodes:
-            return True
-        return False
-
-    def isStateChangeError(error):
-        message = error.errmsg
-        code = error.code
-        return isRecovering(message, code) or isNotWritablePrimary(message, code)
-
-    def parseGle(response):
-        if "err" in response:
-            handleError(CommandError(response, response["err"], response["code"]))
-
-    # Parse response to any command besides getLastError.
-    def parseCommandResponse(response):
-        if not response["ok"]:
-            handleError(CommandError(response, response["errmsg"], response["code"]))
-        elif response.get("writeConcernError"):
-            wce = response["writeConcernError"]
-            handleError(WriteConcernError(response, wce["errmsg"], wce["code"]))
-
-    def parseQueryResponse(response):
-        if the "QueryFailure" bit is set in response flags:
-            handleError(CommandError(response, response["$err"], response["code"]))
-
-The following sections describe the handling of different classes of
-application errors in detail including network errors, network timeout errors,
-state change errors, and authentication errors.
-
-Network error when reading or writing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-To describe how the client responds to network errors during application operations,
-we distinguish two phases of connecting to a server and using it for application operations:
-
-- *Before the handshake completes*: the client establishes a new connection to the server
- and completes an initial handshake by calling "hello" or legacy hello and reading the
- response, and optionally completing authentication
-- *After the handshake completes*: the client uses the established connection for
- application operations
-
-If there is a network error or timeout on the connection before the handshake completes,
-the client MUST replace the server's description
-with a default ServerDescription of type Unknown when the TopologyType is not
-LoadBalanced, and fill the ServerDescription's error field with useful information.
-
-If there is a network error or timeout on the connection before the handshake completes,
-and the TopologyType is LoadBalanced, the client MUST keep the ServerDescription
-as LoadBalancer.
-
-If there is a network timeout on the connection after the handshake completes,
-the client MUST NOT mark the server Unknown.
-(A timeout may indicate a slow operation on the server,
-rather than an unavailable server.)
-If, however, there is some other network error on the connection after the
-handshake completes, the client MUST replace the server's description
-with a default ServerDescription of type Unknown if the TopologyType is not
-LoadBalanced, and fill the ServerDescription's error field with useful information,
-the same as if an error or timeout occurred before the handshake completed.
-
-When the client marks a server Unknown due to a network error or timeout,
-the Unknown ServerDescription MUST be sent through the same process for
-`updating the TopologyDescription`_ as if it had been a failed hello or legacy hello outcome
-from a server check: for example, if the TopologyType is ReplicaSetWithPrimary
-and a write to the RSPrimary server fails because of a network error
-(other than timeout), then a new ServerDescription is created for the primary,
-with type Unknown, and the client executes the proper subroutine for an
-Unknown server when the TopologyType is ReplicaSetWithPrimary:
-referring to the table above we see the subroutine is `checkIfHasPrimary`_.
-The result is the TopologyType changes to ReplicaSetNoPrimary.
-See the test scenario called "Network error writing to primary".
-
-The client MUST close all idle sockets in its connection pool for the server:
-if one socket is bad, it is likely that all are.
-
-Clients MUST NOT request an immediate check of the server;
-since application sockets are used frequently, a network error likely means
-the server has just become unavailable,
-so an immediate refresh is likely to get a network error, too.
-
-The server will not remain Unknown forever.
-It will be refreshed by the next periodic check or,
-if an application operation needs the server sooner than that,
-then a re-check will be triggered by the server selection algorithm.
-
-"not writable primary" and "node is recovering"
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-These errors are detected from a getLastError response,
-write command response, or query response. Clients MUST check if the server
-error is a "node is recovering" error or a "not writable primary" error.
-
-If the response includes an error code, it MUST be solely used to determine
-if the error is a "node is recovering" or "not writable primary" error.
-Clients MUST match the errors by the numeric error code and not by the code
-name, as the code name can change from one server version to the next.
-
-The following error codes indicate a replica set member is temporarily
-unusable. These are called "node is recovering" errors:
-
-.. list-table::
- :header-rows: 1
-
- * - Error Name
- - Error Code
- * - InterruptedAtShutdown
- - 11600
- * - InterruptedDueToReplStateChange
- - 11602
- * - NotPrimaryOrSecondary
- - 13436
- * - PrimarySteppedDown
- - 189
- * - ShutdownInProgress
- - 91
-
-And the following error codes indicate a "not writable primary" error:
-
-.. list-table::
- :header-rows: 1
-
- * - Error Name
- - Error Code
- * - NotWritablePrimary
- - 10107
- * - NotPrimaryNoSecondaryOk
- - 13435
- * - LegacyNotPrimary
- - 10058
-
-Clients MUST fall back to checking the error message if and only if the
-response does not include an error code. The error is considered a "node
-is recovering" error if the substrings "node is recovering" or "not master or
-secondary" are anywhere in the error message. Otherwise, if the substring "not
-master" is in the error message it is a "not writable primary" error.
-
-Additionally, if the response includes a write concern error, then the code
-and message of the write concern error MUST be checked the same way a response
-error is checked above.
-
-Errors contained within the writeErrors field MUST NOT be checked.
-
-See the test scenario called
-"parsing 'not writable primary' and 'node is recovering' errors"
-for example response documents.
-
-When the client sees a "not writable primary" or "node is recovering" error and
-the error's `topologyVersion`_ is strictly greater than the current
-ServerDescription's topologyVersion, it MUST replace the server's description
-with a ServerDescription of type Unknown.
-Clients MUST store useful information in the new ServerDescription's error
-field, including the error message from the server.
-Clients MUST store the error's `topologyVersion`_ field in the new
-ServerDescription if present.
-(See `What is the purpose of topologyVersion?`_)
-
-Multi-threaded and asynchronous clients MUST `request an immediate check`_
-of the server.
-Unlike in the "network error" scenario above,
-a "not writable primary" or "node is recovering" error means the server is available
-but the client is wrong about its type,
-thus an immediate re-check is likely to provide useful information.
-
-For single-threaded clients, in the case of a "not writable primary" or "node is
-shutting down" error, the client MUST mark the topology as "stale" so the next
-server selection scans all servers. For a "node is recovering" error,
-single-threaded clients MUST NOT mark the topology as "stale". If a node is
-recovering for some time, an immediate scan may not gain useful information.
-
-The following subset of "node is recovering" errors is defined to be "node is
-shutting down" errors:
-
-.. list-table::
- :header-rows: 1
-
- * - Error Name
- - Error Code
- * - InterruptedAtShutdown
- - 11600
- * - ShutdownInProgress
- - 91
-
-When handling a "not writable primary" or "node is recovering" error, the client MUST
-clear the server's connection pool if and only if the error is
-"node is shutting down" or the error originated from server version < 4.2.
-
-(See `when does a client see "not writable primary" or "node is recovering"?`_, `use
-error messages to detect "not master" and "node is recovering"`_, `other
-transient errors`_, and `Why close connections when a node is shutting down?`_.)
-
-Authentication errors
-~~~~~~~~~~~~~~~~~~~~~
-
-If the authentication handshake fails for a connection, drivers MUST mark the
-server Unknown and clear the server's connection pool if the TopologyType is
-not LoadBalanced. (See `Why mark a server Unknown after an auth error?`_)
-
-Monitoring SDAM events
-''''''''''''''''''''''
-
-The required driver specification for providing lifecycle hooks into server
-discovery and monitoring for applications to consume can be found in the
-`SDAM Monitoring Specification`_.
-
-Implementation notes
-''''''''''''''''''''
-
-This section intends to provide generous guidance to driver authors.
-It is complementary to the reference implementations.
-Words like "should", "may", and so on are used more casually here.
-
-See also, the implementation notes in the `Server Monitoring spec`_.
-
-.. _interaction between monitoring and server selection:
-
-Multi-threaded or asynchronous server selection
-```````````````````````````````````````````````
-
-While no suitable server is available for an operation,
-`the client MUST re-check all servers every minHeartbeatFrequencyMS`_.
-(See `requesting an immediate check`_.)
-
-Single-threaded server selection
-````````````````````````````````
-
-When a client that uses `single-threaded monitoring`_
-fails to select a suitable server for any operation,
-it `scans`_ the servers, then attempts selection again,
-to see if the scan discovered suitable servers. It repeats, waiting
-`minHeartbeatFrequencyMS`_ after each scan, until a timeout.
-
-Documentation
-`````````````
-
-Giant seed lists
-~~~~~~~~~~~~~~~~
-
-Drivers' manuals should warn against huge seed lists,
-since they slow initialization for single-threaded clients
-and generate load for multi-threaded and asynchronous drivers.
-
-.. _implementation notes for multi-threaded clients:
-
-Multi-threaded
-``````````````
-
-.. _use min and maxWireVersion only to determine compatibility:
-
-Warning about the maxWireVersion from a monitor's hello or legacy hello response
-````````````````````````````````````````````````````````````````````````````````
-
-Clients consult some fields from a server's hello or legacy hello response
-to decide how to communicate with it:
-
-* maxWireVersion
-* maxBsonObjectSize
-* maxMessageSizeBytes
-* maxWriteBatchSize
-
-It is tempting to take these values
-from the last hello or legacy hello response a *monitor* received
-and store them in the ServerDescription, but this is an anti-pattern.
-Multi-threaded and asynchronous clients that do so
-are prone to several classes of race, for example:
-
-* Setup: A MongoDB 3.0 Standalone with authentication enabled;
- the client must log in with SCRAM-SHA-1.
-* The monitor thread discovers the server
- and stores maxWireVersion on the ServerDescription
-* An application thread wants a socket, selects the Standalone,
- and is about to check the maxWireVersion on its ServerDescription when...
-* The monitor thread gets disconnected from server and marks it Unknown,
- with default maxWireVersion of 0.
-* The application thread resumes, creates a socket,
- and attempts to log in using MONGODB-CR,
- since maxWireVersion is *now* reported as 0.
-* Authentication fails because the server requires SCRAM-SHA-1.
-
-Better to call hello or legacy hello for each new socket, as required by the `Auth Spec
-<../auth/auth.md>`_,
-and use the hello or legacy hello response associated with that socket
-for maxWireVersion, maxBsonObjectSize, etc.:
-all the fields required to correctly communicate with the server.
-
-The hello or legacy hello responses received by monitors determine if the topology
-as a whole `is compatible`_ with the driver,
-and which servers are suitable for selection.
-The monitors' responses should not be used to determine how to format
-wire protocol messages to the servers.
-
-Immutable data
-~~~~~~~~~~~~~~
-
-Multi-threaded drivers should treat
-ServerDescriptions and
-TopologyDescriptions as immutable:
-the client replaces them, rather than modifying them,
-in response to new information about the topology.
-Thus readers of these data structures
-can simply acquire a reference to the current one
-and read it, without holding a lock that would block a monitor
-from making further updates.
-
-Process one hello or legacy hello outcome at a time
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Although servers are checked in parallel,
-the function that actually creates the new TopologyDescription
-should be synchronized so only one thread can run it at a time.
-
-.. _onServerDescriptionChanged:
-
-Replacing the TopologyDescription
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Drivers may use the following pseudocode to guide their
-implementation. The client object has a lock and a condition
-variable. It uses the lock to ensure that only one new
-ServerDescription is processed at a time, and it must be acquired
-before invoking this function. Once the client has taken the lock it
-must do no I/O::
-
-    def onServerDescriptionChanged(server, pool):
-        # "server" is the new ServerDescription.
-        # "pool" is the pool associated with the server
-
-        if server.address not in client.topologyDescription.servers:
-            # The server was once in the topologyDescription, otherwise
-            # we wouldn't have been monitoring it, but an intervening
-            # state-change removed it. E.g., we got a host list from
-            # the primary that didn't include this server.
-            return
-
-        newTopologyDescription = client.topologyDescription.copy()
-
-        # Ignore this update if the current topologyVersion is greater than
-        # the new ServerDescription's.
-        if isStaleServerDescription(newTopologyDescription, server):
-            return
-
-        # Replace server's previous description.
-        address = server.address
-        newTopologyDescription.servers[address] = server
-
-        # for drivers that implement CMAP, mark the connection pool as ready after a successful check
-        if ((server.type in (Mongos, RSPrimary, RSSecondary, Standalone, LoadBalanced))
-                or (server.type != Unknown and newTopologyDescription.type == Single)):
-            pool.ready()
-
-        take any additional actions,
-        depending on the TopologyType and server...
-
-        # Replace TopologyDescription and notify waiters.
-        client.topologyDescription = newTopologyDescription
-        client.condition.notifyAll()
-
-    def compareTopologyVersion(tv1, tv2):
-        """Return -1 if tv1<tv2, 0 if tv1==tv2, 1 if tv1>tv2"""
-        if tv1 is None or tv2 is None:
-            # Assume greater.
-            return -1
-        pid1 = tv1['processId']
-        pid2 = tv2['processId']
-        if pid1 == pid2:
-            counter1 = tv1['counter']
-            counter2 = tv2['counter']
-            if counter1 == counter2:
-                return 0
-            elif counter1 < counter2:
-                return -1
-            else:
-                return 1
-        else:
-            # Assume greater.
-            return -1
-
-    def isStaleServerDescription(topologyDescription, server):
-        # True if the current server's topologyVersion is strictly greater
-        # than the new ServerDescription's.
-        currentServer = topologyDescription.servers[server.address]
-        currentTopologyVersion = currentServer.topologyVersion
-        return compareTopologyVersion(currentTopologyVersion, server.topologyVersion) > 0
-
-.. https://github.com/mongodb/mongo-java-driver/blob/5fb47a3bf86c56ed949ce49258a351773f716d07/src/main/com/mongodb/BaseCluster.java#L160
-
-Notifying the condition unblocks threads waiting in the server-selection loop
-for a suitable server to be discovered.
-
-.. note::
- The Java driver uses a CountDownLatch instead of a condition variable,
- and it atomically swaps the old and new CountDownLatches
- so it does not need "client.lock".
- It does, however, use a lock to ensure that only one thread runs
- onServerDescriptionChanged at a time.
-
-Rationale
----------
-
-Clients do no I/O in the constructor
-''''''''''''''''''''''''''''''''''''
-
-An alternative proposal was to distinguish between "discovery" and "monitoring".
-When discovery begins, the client checks all its seeds,
-and discovery is complete once all servers have been checked,
-or after some maximum time.
-Application operations cannot proceed until discovery is complete.
-
-If the discovery phase is distinct,
-then single- and multi-threaded drivers
-could accomplish discovery in the constructor,
-and throw an exception from the constructor
-if the deployment is unavailable or misconfigured.
-This is consistent with prior behavior for many drivers.
-It will surprise some users that the constructor now succeeds,
-but all operations fail.
-
-Similarly for misconfigured seed lists:
-the client may discover a mix of mongoses and standalones,
-or find multiple replica set names.
-It may surprise some users that the constructor succeeds
-and the client attempts to proceed with a compatible subset of the deployment.
-
-Nevertheless, this spec prohibits I/O in the constructor
-for the following reasons:
-
-Common case
-```````````
-
-In the common case, the deployment is available and usable.
-This spec favors allowing operations to proceed as soon as possible
-in the common case,
-at the cost of surprising behavior in uncommon cases.
-
-Simplicity
-``````````
-
-It is simpler to omit a special discovery phase
-and treat all server `checks`_ the same.
-
-Consistency
-```````````
-
-Asynchronous clients cannot do I/O in a constructor,
-so it is consistent to prohibit I/O in other clients' constructors as well.
-
-Restarts
-````````
-
-If clients can be constructed when the deployment is in some states
-but not in other states,
-it leads to an unfortunate scenario:
-When the deployment is passing through a strange state,
-long-running clients may keep working,
-but any clients restarted during this period fail.
-
-Say an administrator changes one replica set member's setName.
-Clients that are already constructed remove the bad member and stay usable,
-but if any client is restarted its constructor fails.
-Web servers that dynamically adjust their process pools
-will show particularly undesirable behavior.
-
-heartbeatFrequencyMS defaults to 10 seconds or 60 seconds
-'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Many drivers have different values. The time has come to standardize.
-Lacking a rigorous methodology for calculating the best frequency,
-this spec chooses 10 seconds for multi-threaded or asynchronous drivers
-because some already use that value.
-
-Because scanning has a greater impact on
-the performance of single-threaded drivers,
-they MUST default to a longer frequency (60 seconds).
-
-An alternative is to check servers less and less frequently
-the longer they remain unchanged.
-This idea is rejected because
-it is a goal of this spec to answer questions about monitoring such as,
-
-* "How rapidly can I rotate a replica set to a new set of hosts?"
-* "How soon after I add a secondary will query load be rebalanced?"
-* "How soon will a client notice a change in round trip time, or tags?"
-
-Having a constant monitoring frequency allows us to answer these questions
-simply and definitively.
-Losing the ability to answer these questions is not worth
-any minor gain in efficiency from a more complex scheduling method.
-
-The client MUST re-check all servers every minHeartbeatFrequencyMS
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-While an application is waiting to do an operation
-for which there is no suitable server,
-a multi-threaded client MUST re-check all servers very frequently.
-The slight cost is worthwhile in many scenarios. For example:
-
-#. A client and a MongoDB server are started simultaneously.
-#. The client checks the server before it begins listening,
- so the check fails.
-#. The client waits in the server-selection loop for the topology to change.
-
-In this state, the client should check the server very frequently,
-to give it ample opportunity to connect to the server before
-timing out in server selection.
-
-No knobs
-''''''''
-
-This spec does not intend to introduce any new configuration options
-unless absolutely necessary.
-
-.. _monitors all three types of servers:
-
-The client MUST monitor arbiters
-''''''''''''''''''''''''''''''''
-
-Mongos 2.6 does not monitor arbiters,
-but it costs little to do so,
-and in the rare case that
-all data members are moved to new hosts in a short time,
-an arbiter may be the client's last hope
-to find the new replica set configuration.
-
-Only support replica set members running MongoDB 1.6.2 or later
-'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Replica set members began reporting their setNames in that version.
-Supporting earlier versions is impractical.
-
-TopologyType remains Unknown when an RSGhost is discovered
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-If the TopologyType is Unknown and the client receives a hello or legacy hello response
-from an `RSGhost`_, the TopologyType could be set to ReplicaSetNoPrimary.
-However, an RSGhost does not report its setName,
-so the setName would still be unknown.
-This adds an additional state to the existing list:
-"TopologyType ReplicaSetNoPrimary **and** no setName."
-The additional state adds substantial complexity
-without any benefit, so this spec says clients MUST NOT change the TopologyType
-when an RSGhost is discovered.
-
-TopologyType remains Unknown when one of the seeds is a Standalone
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-If TopologyType is Unknown and there are multiple seeds,
-and one of them is discovered to be a standalone,
-it MUST be removed.
-The TopologyType remains Unknown.
-
-This rule supports the following common scenario:
-
-#. Servers A and B are in a replica set.
-#. A seed list with A and B is stored in a configuration file.
-#. An administrator removes B from the set and brings it up as standalone
- for maintenance, without changing its port number.
-#. The client is initialized with seeds A and B,
- TopologyType Unknown, and no setName.
-#. The first hello or legacy hello response is from B, the standalone.
-
-What if the client changed TopologyType to Single at this point?
-It would be unable to use the replica set; it would have to remove A
-from the TopologyDescription once A's hello or legacy hello response comes.
-
-The user's intent in this case is clearly to use the replica set,
-despite the outdated seed list. So this spec requires clients to remove B
-from the TopologyDescription and keep the TopologyType as Unknown.
-Then when A's response arrives, the client can set its TopologyType
-to ReplicaSet (with or without primary).
-
-On the other hand,
-if there is only one seed and the seed is discovered to be a Standalone,
-the TopologyType MUST be set to Single.
-
-See the "member brought up as standalone" test scenario.
-
-
-Replica set monitoring with and without a primary
-'''''''''''''''''''''''''''''''''''''''''''''''''
-
-The client strives to fill the "servers" list
-only with servers that the **primary**
-said were members of the replica set,
-when the client most recently contacted the primary.
-
-The primary's view of the replica set is authoritative for two reasons:
-
-1. The primary is never on the minority side of a network partition.
- During a partition it is the primary's list of
- servers the client should use.
-2. Since reconfigs must be executed on the primary,
- the primary is the first to know of them.
- Reconfigs propagate to non-primaries eventually,
- but the client can receive hello or legacy hello responses from non-primaries
- that reflect any past state of the replica set.
- See the "Replica set discovery" test scenario.
-
-If at any time the client believes there is no primary,
-the TopologyDescription's type is set to ReplicaSetNoPrimary.
-While there is no known primary,
-the client MUST **add** servers from non-primaries' host lists,
-but it MUST NOT remove servers from the TopologyDescription.
-
-Eventually, when a primary is discovered, any hosts not in the primary's host
-list are removed.
-
-.. _stale primaries:
-
-Using electionId and setVersion to detect stale primaries
-'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Replica set members running MongoDB 2.6.10+ or 3.0+ include an integer called
-"setVersion" and an ObjectId called
-"electionId" in their hello or legacy hello response.
-Starting with MongoDB 3.2.0, replica sets can use two different replication
-protocol versions; electionIds from one protocol version must not be compared
-to electionIds from a different protocol.
-
-Because protocol version changes require replica set reconfiguration,
-clients use the tuple (electionId, setVersion) to detect stale primaries.
-The tuple comparison MUST be performed in the order of electionId followed
-by setVersion, since that order of comparison guarantees monotonicity.
-
-The client remembers the greatest electionId and setVersion reported by a primary,
-and distrusts primaries from older electionIds or from the same electionId
-but with lesser setVersion.
-
-- It compares electionIds as 12-byte sequences, i.e. by memory comparison.
-- It compares setVersions as integer values.
-
-This prevents the client from oscillating
-between the old and new primary during a split-brain period,
-and helps provide read-your-writes consistency with write concern "majority"
-and read preference "primary".
-
-Prior to MongoDB server version 6.0, drivers ordered the tuple by ``setVersion`` before ``electionId``,
-the opposite of the server-side Replica Set Management logic.
-To remain compatible with backup systems, etc., drivers continue to
-maintain the reversed logic when connected to a topology that reports a maxWireVersion less than ``17``.
-When connected to server versions 6.0 and beyond, drivers MUST order the tuple by ``electionId`` then ``setVersion``.
-
-Requirements for read-your-writes consistency
-`````````````````````````````````````````````
-
-Using (electionId, setVersion) only provides read-your-writes consistency if:
-
-* The application uses the same MongoClient instance for write-concern
- "majority" writes and read-preference "primary" reads, and
-* All members use MongoDB 2.6.10+, 3.0.0+ or 3.2.0+ with replication protocol 0
- and clocks are *less* than 30 seconds skewed, or
-* All members run MongoDB 3.2.0 and replication protocol 1
- and clocks are *less* skewed than the election timeout
- (`electionTimeoutMillis`, which defaults to 10 seconds), or
-* All members run MongoDB 3.2.1+ and replication protocol 1
- (in which case clocks need not be synchronized).
-
-Scenario
-````````
-
-Consider the following situation:
-
-1. Server A is primary.
-2. A network partition isolates A from the set, but the client still sees it.
-3. Server B is elected primary.
-4. The client discovers that B is primary, does a write-concern "majority"
- write operation on B and receives acknowledgment.
-5. The client receives a hello or legacy hello response from A, claiming A is still primary.
-6. If the client trusts that A is primary, the next read-preference "primary"
- read sees stale data from A that may *not* include the write sent to B.
-
-See `SERVER-17975 `_, "Stale
-reads with WriteConcern Majority and ReadPreference Primary."
-
-Detecting a stale primary
-`````````````````````````
-
-To prevent this scenario, the client uses electionId and setVersion to
-determine which primary was elected last. In this case, it would not consider
-"A" a primary, nor read from it because server B will have a greater electionId
-but the same setVersion.
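The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the spec's normative pseudocode: the helper names `election_tuple` and `is_stale_primary` are hypothetical, the electionId is represented as a 24-character hex string, and plain tuple ordering is used as a simplification of the full rules.

```python
def election_tuple(election_id_hex, set_version, max_wire_version):
    """Build the comparison key for a primary's hello response.

    Wire version 17 (MongoDB 6.0) and later order by electionId first;
    older servers keep the legacy (setVersion, electionId) order.
    """
    election_id = bytes.fromhex(election_id_hex)  # ObjectIds compare bytewise
    if max_wire_version >= 17:
        return (election_id, set_version)
    return (set_version, election_id)


def is_stale_primary(candidate, known_max):
    """Assumption: a primary is stale if its tuple is lower than the highest seen."""
    return candidate < known_max


# Scenario from the text: B was elected after A, with the same setVersion.
server_b = election_tuple("000000000000000000000002", 1, 17)
server_a = election_tuple("000000000000000000000001", 1, 17)
print(is_stale_primary(server_a, known_max=server_b))  # True
```

With these inputs the client would keep B as primary and refuse to treat A's later response as authoritative.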
-
-Monotonicity
-````````````
-
-The electionId is an ObjectId compared bytewise in order.
-
-(i.e. 000000000000000000000001 > 000000000000000000000000, FF0000000000000000000000 > FE0000000000000000000000, etc.)
-
-In some server versions, it is monotonic with respect
-to a particular server's system clock, but is not globally monotonic across
-a deployment. However, if inter-server clock skews are small, it can be
-treated as a monotonic value.
-
-In MongoDB 2.6.10+ (which has `SERVER-13542 `_ backported),
-MongoDB 3.0.0+ or MongoDB 3.2+ (under replication protocol version 0),
-the electionId's leading bytes are a server timestamp.
-As long as server clocks are skewed *less* than 30 seconds,
-electionIds can be reliably compared.
-(This is precise enough, because in replication protocol version 0, servers
-are designed not to complete more than one election every 30 seconds.
-Elections do not take 30 seconds--they are typically much faster than that--but
-there is a 30-second cooldown before the next election can complete.)
-
-Beginning in MongoDB 3.2.0, under replication protocol version 1,
-the electionId begins with a timestamp, but
-the cooldown is shorter. As long as inter-server clock skew is *less* than
-the configured election timeout (`electionTimeoutMillis`, which defaults to
-10 seconds), then electionIds can be reliably compared.
-
-Beginning in MongoDB 3.2.1, under replication protocol version 1,
-the electionId is guaranteed monotonic
-without relying on any clock synchronization.
-
-Using me field to detect seed list members that do not match host names in the replica set configuration
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Removal from the topology of seed list members where the "me" property does not
-match the address used to connect prevents clients from being able to select
-a server, only to fail to re-select that server once the primary has responded.
-
-This scenario illustrates the problems that arise if this is NOT done:
-
-* The client specifies a seed list of A, B, C
-* Server A responds as a secondary with hosts D, E, F
-* The client executes a query with read preference of secondary, and server A
- is selected
-* Server B responds as a primary with hosts D, E, F. Servers A, B, C are
- removed, as they don't appear in the primary's hosts list
-* The client iterates the cursor and attempts to execute a getMore against
- server A.
-* Server selection fails because server A is no longer part of the topology.
-
-With checking for "me" in place, it looks like this instead:
-
-* The client specifies a seed list of A, B, C
-* Server A responds as a secondary with hosts D, E, F, where "me" is D, and so
- the client adds D, E, F as type "Unknown" and starts monitoring them, but
- removes A from the topology.
-* The client executes a query with read preference of secondary, and goes into
- the server selection loop
-* Server D responds as a secondary where "me" is D
-* Server selection completes by matching D
-* The client iterates the cursor and attempts to execute a getMore against
- server D.
-* Server selection completes by matching D.
-
-Ignore setVersion unless the server is primary
-''''''''''''''''''''''''''''''''''''''''''''''
-
-It was thought that if all replica set members report a setVersion,
-and a secondary's response has a higher setVersion than any seen,
-that the secondary's host list could be considered as authoritative
-as the primary's. (See `Replica set monitoring with and without a primary`_.)
-
-This scenario illustrates the problem with setVersion:
-
-* We have a replica set with servers A, B, and C.
-* Server A is the primary, with setVersion 4.
-* An administrator runs replSetReconfig on A,
- which increments its setVersion to 5.
-* The client checks Server A and receives the new config.
-* Server A crashes before any secondary receives the new config.
-* Server B is elected primary. It has the old setVersion 4.
-* The client ignores B's version of the config
- because its setVersion is not greater than 5.
-
-The client may never correct its view of the topology.
-
-Even worse:
-
-* An administrator runs replSetReconfig
- on Server B, which increments its setVersion to 5.
-* Server A restarts.
- This results in *two* versions of the config,
- both claiming to be version 5.
-
-If the client trusted the setVersion in this scenario,
-it would trust whichever config it received first.
-
-mongos 2.6 ignores setVersion and only trusts the primary.
-This spec requires all clients to ignore setVersion from non-primaries.
-
-Use error messages to detect "not master" and "node is recovering"
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-When error codes are not available, error messages are checked for the
-substrings "not master" and "node is recovering". This is because older server
-versions returned unstable error codes or no error codes in many
-circumstances.
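A minimal sketch of this fallback, assuming the server reply is available as a dict with optional `code` and `errmsg` keys. The function name and the dict shape are hypothetical, and handling by error code is deliberately left out.

```python
def is_state_change_by_message(response):
    """Fall back to substring matching only when no error code is present."""
    if "code" in response:
        # An error code is available, so message matching is not used here.
        return False
    errmsg = response.get("errmsg", "")
    return "not master" in errmsg or "node is recovering" in errmsg


print(is_state_change_by_message({"ok": 0, "errmsg": "not master"}))  # True
print(is_state_change_by_message({"ok": 0, "errmsg": "not master", "code": 1}))  # False
```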
-
-Other transient errors
-''''''''''''''''''''''
-
-There are other transient errors a server may return, e.g. retryable errors
-listed in the retryable writes spec. SDAM does not consider these because they
-do not imply the connected server should be marked as "Unknown". For example,
-the following errors may be returned from a mongos when it cannot route to a
-shard:
-
-.. list-table::
- :header-rows: 1
-
- * - Error Name
- - Error Code
- * - HostNotFound
- - 7
- * - HostUnreachable
- - 6
- * - NetworkTimeout
- - 89
- * - SocketException
- - 9001
-
-When these are returned, the mongos should *not* be marked as "Unknown", since
-it is more likely an issue with the shard.
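As an illustration only, the codes in the table above could be kept in a lookup that an error handler consults before touching the topology. The names `MONGOS_ROUTING_ERRORS` and `leaves_server_description_unchanged` are hypothetical and not part of any driver API.

```python
# Error codes from the table above: a mongos may return these when it
# cannot route to a shard.
MONGOS_ROUTING_ERRORS = {
    "HostNotFound": 7,
    "HostUnreachable": 6,
    "NetworkTimeout": 89,
    "SocketException": 9001,
}


def leaves_server_description_unchanged(code):
    """True if the code is one of the routing errors listed above.

    For these codes the mongos is not marked Unknown.
    """
    return code in MONGOS_ROUTING_ERRORS.values()


print(leaves_server_description_unchanged(89))  # True
```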
-
-Why ignore errors based on CMAP's generation number?
-''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Using CMAP's `generation number`_ solves the following race condition among
-application threads and the monitor during error handling:
-
-#. Two concurrent writes begin on application threads A and B.
-#. The server restarts.
-#. Thread A receives the first non-timeout network error, and the client
- marks the server Unknown, and clears the server's pool.
-#. The client re-checks the server and marks it Primary.
-#. Thread B receives the second non-timeout network error and the client
- marks the server Unknown again.
-
-The core issue is that the client processes errors in arbitrary order
-and may overwrite fresh information about the server's status with stale
-information. Using CMAP's generation number avoids the race condition because
-the duplicate (or stale) network error can be identified (changes in
-**bold**):
-
-#. Two concurrent writes begin on application threads A and B, **with
- generation 1**.
-#. The server restarts.
-#. Thread A receives the first non-timeout network error, and the client
- marks the server Unknown, and clears the server's pool. **The
- pool's generation is now 2.**
-#. The client re-checks the server and marks it Primary.
-#. Thread B receives the second non-timeout network error, **and the
- client ignores the error because the error originated from a
- connection with generation 1.**
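The numbered steps above can be replayed with a small model. This is a sketch under assumptions: `Pool`, `Server`, and `handle_network_error` are hypothetical stand-ins for a driver's internals, and locking is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    generation: int = 1

    def clear(self):
        # Clearing the pool invalidates every connection from older generations.
        self.generation += 1


@dataclass
class Server:
    type: str = "Primary"
    pool: Pool = field(default_factory=Pool)


def handle_network_error(server, connection_generation):
    """Apply a non-timeout network error; return False if ignored as stale."""
    if connection_generation < server.pool.generation:
        return False
    server.type = "Unknown"
    server.pool.clear()
    return True


server = Server()
gen_a = gen_b = server.pool.generation  # both writes use generation-1 connections
handle_network_error(server, gen_a)     # thread A: server Unknown, pool generation 2
server.type = "Primary"                 # the monitor re-checks the server
handled = handle_network_error(server, gen_b)  # thread B: stale error, ignored
print(handled, server.type, server.pool.generation)  # False Primary 2
```

Because thread B's connection predates the cleared pool, its error cannot overwrite the fresh Primary state.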
-
-Why synchronize clearing a server's pool with updating the topology?
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Doing so solves the following race condition among application threads
-and the monitor during error handling, similar to the previous
-example:
-
-#. A write begins on an application thread.
-#. The server restarts.
-#. The application thread receives a non-timeout network error.
-#. The application thread acquires the lock on the TopologyDescription, marks
- the Server as Unknown, and releases the lock.
-#. The monitor re-checks the server and marks it Primary and its pool
- as "ready".
-#. Several other application threads enter the WaitQueue of the
- server's pool.
-#. The application thread clears the server's pool, evicting all those
- new threads from the WaitQueue, causing them to return errors or to
- retry. Additionally, the pool is now "paused", but the server is
- considered the Primary, meaning future operations will be routed to
- the server and fail until the next heartbeat marks the pool as
- "ready" again.
-
-If marking the server as Unknown and clearing its pool were
-synchronized, then the monitor marking the server as Primary after its
-check would happen after the pool was cleared and thus avoid putting
-it in an inconsistent state.
-
-What is the purpose of topologyVersion?
-'''''''''''''''''''''''''''''''''''''''
-
-`topologyVersion`_ solves the following race condition among application
-threads and the monitor when handling `State Change Errors`_:
-
-#. Two concurrent writes begin on application threads A and B.
-#. The primary steps down.
-#. Thread A receives the first State Change Error, the client marks the
- server Unknown.
-#. The client re-checks the server and marks it Secondary.
-#. Thread B receives a delayed State Change Error and the client marks
- the server Unknown again.
-
-The core issue is that the client processes errors in arbitrary order
-and may overwrite fresh information about the server's status with stale
-information. Using topologyVersion avoids the race condition because the
-duplicate (or stale) State Change Errors can be identified (changes in
-**bold**):
-
-#. Two concurrent writes begin on application threads A and B.
-
- a. **The primary's ServerDescription.topologyVersion == tv1**
-
-#. The primary steps down **and sets its topologyVersion to tv2**.
-#. Thread A receives the first State Change Error **containing tv2**,
- the client marks the server Unknown (**with topologyVersion: tv2**).
-#. The client re-checks the server and marks it Secondary (**with
- topologyVersion: tv2**).
-#. Thread B receives a delayed State Change Error (**with
- topologyVersion: tv2**) **and the client ignores the error because
- the error's topologyVersion (tv2) is not greater than the current
- ServerDescription (tv2).**
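The staleness check in the last step can be sketched as below. This is a hedged illustration: the function name `is_stale_error` is hypothetical, and modelling topologyVersion as a dict with `processId` and `counter` keys (including the different-process handling) is an assumption that this passage does not spell out.

```python
def is_stale_error(current_tv, error_tv):
    """Return True if the error carries no newer topologyVersion than we have."""
    if current_tv is None or error_tv is None:
        return False  # nothing to compare, so the error cannot be called stale
    if current_tv["processId"] != error_tv["processId"]:
        return False  # assumption: a different server process is treated as newer
    return error_tv["counter"] <= current_tv["counter"]


tv1 = {"processId": "p1", "counter": 1}
tv2 = {"processId": "p1", "counter": 2}

print(is_stale_error(tv1, tv2))  # False: thread A's error is newer and is handled
print(is_stale_error(tv2, tv2))  # True: thread B's delayed error is ignored
```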
-
-Why mark a server Unknown after an auth error?
-''''''''''''''''''''''''''''''''''''''''''''''
-
-The `Authentication spec`_ requires that when authentication fails on a server,
-the driver MUST clear the server's connection pool. Clearing the pool without
-marking the server Unknown would leave the pool in the "paused" state while
-the server is still selectable. When auth fails due to invalid credentials,
-marking the server Unknown also serves to rate limit new connections;
-future operations will need to wait for the server to be rediscovered.
-
-Note that authentication may fail for a variety of reasons, for example:
-
-- A network error, or network timeout error may occur.
-- The server may return a `State Change Error`_.
-- The server may return an AuthenticationFailed command error (error code 18)
- indicating that the provided credentials are invalid.
-
-Does this mean that authentication failures due to invalid credentials will
-manifest as server selection timeout errors? No, authentication errors are
-still returned to the application immediately. A subsequent operation will
-block until the server is rediscovered and immediately attempt
-authentication on a new connection.
-
-Clients use the hostnames listed in the replica set config, not the seed list
-'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Very often users have DNS aliases they use in their `seed list`_ instead of
-the hostnames in the replica set config. For example, the name "host_alias"
-might refer to a server also known as "host1", and the URI is::
-
- mongodb://host_alias/?replicaSet=rs
-
-When the client connects to "host_alias", its hello or legacy hello response includes the
-list of hostnames from the replica set config, which does not include the seed::
-
- {
- hosts: ["host1:27017", "host2:27017"],
- setName: "rs",
- ... other hello or legacy hello response fields ...
- }
-
-This spec requires clients to connect to the hostnames listed in the hello or legacy hello
-response. Furthermore, if the response is from a primary, the client MUST
-remove all hostnames not listed. In this case, the client disconnects from
-"host_alias" and tries "host1" and "host2". (See `updateRSFromPrimary`_.)
-
-Thus, replica set members must be reachable from the client by the hostnames
-listed in the replica set config.
-
-An alternative proposal is for clients to continue using the hostnames in the
-seed list. It could add new hosts from the hello or legacy hello response, and where a host
-is known by two names, the client can deduplicate them using the "me" field and
-prefer the name in the seed list.
-
-This proposal was rejected because it does not support key features of replica
-sets: failover and zero-downtime reconfiguration.
-
-In our example, if "host1" and "host2" are not reachable from the client, the
-client continues to use "host_alias" only. If that server goes down or is
-removed by a replica set reconfig, the client is suddenly unable to reach the
-replica set at all: by allowing the client to use the alias, we have hidden the
-fact that the replica set's failover feature will not work in a crisis or
-during a reconfig.
-
-In conclusion, to support key features of replica sets, we require that the
-hostnames used in a replica set config are reachable from the client.
-
-Backwards Compatibility
------------------------
-
-The Java driver 2.12.1 has a "heartbeatConnectRetryFrequency".
-Since this spec recommends the option be named "minHeartbeatFrequencyMS",
-the Java driver must deprecate its old option
-and rename it minHeartbeatFrequency (for consistency with its other options
-which also lack the "MS" suffix).
-
-Reference Implementation
-------------------------
-
-* Java driver 3.x
-* PyMongo 3.x
-* Perl driver 1.0.0 (in progress)
-
-Future Work
------------
-
-MongoDB is likely to add some of the following features,
-which will require updates to this spec:
-
-* Eventually consistent collections (SERVER-2956)
-* Mongos discovery (SERVER-1834)
-* Put individual databases into maintenance mode,
- instead of the whole server (SERVER-7826)
-* Put setVersion in write-command responses (SERVER-13909)
-
-Questions and Answers
----------------------
-
-When does a client see "not writable primary" or "node is recovering"?
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-These errors indicate one of these:
-
-* A write was attempted on an unwritable server
- (arbiter, secondary, ghost, or recovering).
-* A read was attempted on an unreadable server
- (arbiter, ghost, or recovering)
- or a read was attempted on a read-only server without the secondaryOk bit set.
-* An operation was attempted on a server that is now shutting down.
-
-In any case the error is a symptom that
-a ServerDescription's type no longer reflects reality.
-
-On MongoDB 4.0 and earlier, a primary closes its connections when it steps
-down, so in many cases the next operation causes a network error
-rather than "not writable primary".
-The driver can see a "not writable primary" error in the following scenario:
-
-#. The client discovers the primary.
-#. The primary steps down.
-#. Before the client checks the server and discovers the stepdown,
- the application attempts an operation.
-#. The client's connection pool is empty,
- either because it has
- never attempted an operation on this server,
- or because all connections are in use by other threads.
-#. The client creates a connection to the old primary.
-#. The client attempts to write, or to read without the secondaryOk bit,
- and receives "not writable primary".
-
-See `"not writable primary" and "node is recovering"`_,
-and the test scenario called
-"parsing 'not writable primary' and 'node is recovering' errors".
-
-Why close connections when a node is shutting down?
-'''''''''''''''''''''''''''''''''''''''''''''''''''
-
-When a server shuts down, it will return one of the "node is shutting down"
-errors for each attempted operation and eventually will close all connections.
-Keeping a connection to a server which is shutting down open would only
-produce errors on this connection - such a connection will never be usable for
-any operations. In contrast, when a 4.2 or later server returns a "not writable primary"
-error, the connection may still be usable for other operations (such as secondary reads).
-
-What's the point of periodic monitoring?
-''''''''''''''''''''''''''''''''''''''''
-
-Why not just wait until a "not writable primary" error or
-"node is recovering" error informs the client that its
-TopologyDescription is wrong? Or wait until server selection
-fails to find a suitable server, and only scan all servers then?
-
-Periodic monitoring accomplishes three objectives:
-
-* Update each server's type, tags, and `round trip time`_.
- Read preferences and the mongos selection algorithm
-  require that this information remain up to date.
-* Discover new secondaries so that secondary reads are evenly spread.
-* Detect incremental changes to the replica set configuration,
- so that the client remains connected to the set
- even while it is migrated to a completely new set of hosts.
-
-If the application uses some servers very infrequently,
-monitoring can also proactively detect state changes
-(primary stepdown, server becoming unavailable)
-that would otherwise cause future errors.
-
-Why is auto-discovery the preferred default?
-''''''''''''''''''''''''''''''''''''''''''''
-
-Auto-discovery is most resilient and is therefore preferred.
-
-Why is it possible for maxSetVersion to go down?
-''''''''''''''''''''''''''''''''''''''''''''''''
-
-``maxElectionId`` and ``maxSetVersion`` are actually considered a pair of values.
-Drivers MAY consider implementing comparison in code as a tuple of the two to ensure they are always updated together:
-
-.. code:: typescript
-
- // New tuple old tuple
- { electionId: 2, setVersion: 1 } > { electionId: 1, setVersion: 50 }
-
-In this scenario, the maxSetVersion goes from 50 to 1, but the maxElectionId is raised to 2.
-
-Acknowledgments
----------------
-
-Jeff Yemin's code for the Java driver 2.12,
-and his patient explanation thereof,
-is the major inspiration for this spec.
-Mathias Stearn's beautiful design for replica set monitoring in mongos 2.6
-contributed as well.
-Bernie Hackett gently oversaw the specification process.
-
-Changelog
----------
-
-:2015-12-17: Require clients to compare (setVersion, electionId) tuples.
-:2015-10-09: Specify electionID comparison method.
-:2015-06-16: Added cooldownMS.
-:2016-05-04: Added link to SDAM monitoring.
-:2016-07-18: Replace mentions of the "Read Preferences Spec" with "Server
- Selection Spec", and "secondaryAcceptableLatencyMS" with
- "localThresholdMS".
-:2016-07-21: Updated for Max Staleness support.
-:2016-08-04: Explain better why clients use the hostnames in RS config, not URI.
-:2016-08-31: Multi-threaded clients SHOULD use hello or legacy hello replies to
- update the topology when they handshake application connections.
-:2016-10-06: In updateRSWithoutPrimary the hello or legacy hello response's
- "primary" field should be used to update the topology description,
- even if address != me.
-:2016-10-29: Allow for idleWritePeriodMS to change someday.
-:2016-11-01: "Unknown" is no longer the default TopologyType, the default is now
- explicitly unspecified. Update instructions for setting the initial
- TopologyType when running the spec tests.
-:2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the
- future.
-:2017-02-28: Update "network error when reading or writing": timeout while
- connecting does mark a server Unknown, unlike a timeout while
- reading or writing. Justify the different behaviors, and also
- remove obsolete reference to auto-retry.
-:2017-06-13: Move socketCheckIntervalMS to Server Selection Spec.
-:2017-08-01: Parse logicalSessionTimeoutMinutes from hello or legacy hello reply.
-:2017-08-11: Clearer specification of "incompatible" logic.
-:2017-09-01: Improved incompatibility error messages.
-:2018-03-28: Specify that monitoring must not do mechanism negotiation or authentication.
-:2019-05-29: Renamed InterruptedDueToStepDown to InterruptedDueToReplStateChange
-:2020-02-13: Drivers must run SDAM flow even when server description is equal to
- the last one.
-:2020-03-31: Add topologyVersion to ServerDescription. Add rules for ignoring
- stale application errors.
-:2020-05-07: Include error field in ServerDescription equality comparison.
-:2020-06-08: Clarify reasoning behind how SDAM determines if a topologyVersion is stale.
-:2020-12-17: Mark the pool for a server as "ready" after performing a successful
- check. Synchronize pool clearing with SDAM updates.
-:2021-01-17: Require clients to compare (electionId, setVersion) tuples.
-:2021-02-11: Errors encountered during auth are handled by SDAM. Auth errors
- mark the server Unknown and clear the pool.
-:2021-04-12: Adding in behaviour for load balancer mode.
-:2021-05-03: Require parsing "isWritablePrimary" field in responses.
-:2021-06-09: Connection pools must be created and eventually marked ready for
- any server if a direct connection is used.
-:2021-06-29: Updated to use modern terminology.
-:2022-01-19: Add iscryptd and 90th percentile RTT fields to ServerDescription.
-:2022-07-11: Convert integration tests to the unified format.
-:2022-09-30: Update ``updateRSFromPrimary`` to include logic before and after 6.0 servers
-:2022-10-05: Remove spec front matter, move footnote, and reformat changelog.
-:2022-11-17: Add minimum RTT tracking and remove 90th percentile RTT.
-:2024-01-17: Add section on expected client close behaviour
-
-----
-
-.. Section for links.
-
-.. _hello or legacy hello: /source/mongodb-handshake/handshake.rst#terms
-.. _connection string: https://www.mongodb.com/docs/manual/reference/connection-string/
-.. _Server Monitoring spec: server-monitoring.rst
-.. _SDAM Monitoring Specification: server-discovery-and-monitoring-logging-and-monitoring.rst
-.. _requesting an immediate check: server-monitoring.rst#requesting-an-immediate-check
-.. _request an immediate check: server-monitoring.rst#requesting-an-immediate-check
-.. _scanning order: server-monitoring.rst#scanning-order
-.. _clients update the topology from each handshake: server-monitoring.rst#clients-update-the-topology-from-each-handshake
-.. _single-threaded monitoring: server-monitoring.rst#single-threaded-monitoring
-.. _Connection Monitoring and Pooling spec: ../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md
-.. _CMAP spec: ../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md
-.. _Authentication spec: ../auth/auth.md
-.. _Server Monitoring (Measuring RTT): server-monitoring.rst#measuring-rtt
+ This specification has been converted to Markdown and renamed to
+ `server-discovery-and-monitoring.md `_.
diff --git a/source/server-discovery-and-monitoring/server-monitoring.md b/source/server-discovery-and-monitoring/server-monitoring.md
index 2e30583a4d..b03a960166 100644
--- a/source/server-discovery-and-monitoring/server-monitoring.md
+++ b/source/server-discovery-and-monitoring/server-monitoring.md
@@ -581,7 +581,7 @@ class Monitor(Thread):
wait()
def setUpConnection():
- # Take the mutex to avoid a data race becauase this code writes to the connection field and a concurrent
+ # Take the mutex to avoid a data race because this code writes to the connection field and a concurrent
# cancelCheck call could be reading from it.
with lock:
# Server API versioning implies that the server supports hello.
@@ -874,7 +874,7 @@ above mentioned concerns.
In the streaming protocol, clients use the hello or legacy hello command on a dedicated connection to measure a server's
RTT. However, errors encountered when running the RTT command MUST NOT mark a server Unknown. We reached this decision
-because the dedicate RTT connection does not come from a connection pool and thus does not have a generation number
+because the dedicated RTT connection does not come from a connection pool and thus does not have a generation number
associated with it. Without a generation number we cannot handle errors from the RTT command without introducing race
conditions. Introducing such a generation number would add complexity to this design without much benefit. It is safe to
ignore these errors because the Monitor will soon discover the server's state regardless (either through an updated
diff --git a/source/server-discovery-and-monitoring/tests/README.md b/source/server-discovery-and-monitoring/tests/README.md
index a96bcb6490..23f4fe00ab 100644
--- a/source/server-discovery-and-monitoring/tests/README.md
+++ b/source/server-discovery-and-monitoring/tests/README.md
@@ -193,10 +193,9 @@ Run the following test(s) on MongoDB 4.4+.
6. Wait for the server's RTT to exceed 250ms. Eventually the average RTT should also exceed 500ms but we use 250ms to
speed up the test. Note that the
- [Server Description Equality](/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#server-description-equality)
- rule means that ServerDescriptionChangedEvents will not be published. This test may need to use a driver specific
- helper to obtain the latest RTT instead. If the RTT does not exceed 250ms after 10 seconds, consider the test
- failed.
+ [Server Description Equality](../server-discovery-and-monitoring.md#server-description-equality) rule means that
+ ServerDescriptionChangedEvents will not be published. This test may need to use a driver specific helper to obtain
+ the latest RTT instead. If the RTT does not exceed 250ms after 10 seconds, consider the test failed.
7. Disable the failpoint:
diff --git a/source/server-selection/server-selection.md b/source/server-selection/server-selection.md
index 644c67c806..063d67af68 100644
--- a/source/server-selection/server-selection.md
+++ b/source/server-selection/server-selection.md
@@ -78,7 +78,7 @@ An OP_QUERY operation targeting the '$cmd' collection namespace.
A driver connection mode that sends all database operations to a single server without regard for
type.
-
+
**Eligible**\
Describes candidate servers that also meet the criteria specified by the `tag_sets` and
@@ -228,7 +228,7 @@ once after server selection fails, then either selects a server or raises an err
The serverSelectionTryOnce option MUST be true by default. If it is set false, then the driver repeatedly searches for
an appropriate server until the selection process times out (pausing
-[minHeartbeatFrequencyMS](https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#minheartbeatfrequencyms)
+[minHeartbeatFrequencyMS](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#minheartbeatfrequencyms)
between attempts, as required by the
[Server Discovery and Monitoring](https://github.com/mongodb/specifications/tree/master/source/server-discovery-and-monitoring)
spec).
@@ -249,8 +249,8 @@ for a ["try once" mode](#try-once-mode).)
#### heartbeatFrequencyMS
This controls when topology updates are scheduled. See
-[heartbeatFrequencyMS](https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#heartbeatfrequencyms)
-in the
+[heartbeatFrequencyMS](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#heartbeatfrequencyms) in
+the
[Server Discovery and Monitoring](https://github.com/mongodb/specifications/tree/master/source/server-discovery-and-monitoring)
spec for details.
@@ -1576,8 +1576,6 @@ maxStalenessSeconds first, then tag_sets, and select Node 2.
## Changelog
-- 2024-02-07: Migrated from reStructuredText to Markdown.
-
- 2015-06-26: Updated single-threaded selection logic with "stale" and serverSelectionTryOnce.
- 2015-08-10: Updated single-threaded selection logic to ensure a scan always\
@@ -1656,5 +1654,7 @@ maxStalenessSeconds first, then tag_sets, and select Node 2.
- 2023-08-26: Add list of deprioritized servers for sharded cluster topology.
+- 2024-02-07: Migrated from reStructuredText to Markdown.
+
[^1]: mongos 3.4 refuses to connect to mongods with maxWireVersion \< 5, so it does no additional wire version checks
related to maxStalenessSeconds.
diff --git a/source/server-selection/tests/logging/operation-id.json b/source/server-selection/tests/logging/operation-id.json
index 276e4b8d6d..6cdbcb3f5a 100644
--- a/source/server-selection/tests/logging/operation-id.json
+++ b/source/server-selection/tests/logging/operation-id.json
@@ -47,6 +47,9 @@
}
}
],
+ "_yamlAnchors": {
+ "namespace": "logging-tests.server-selection"
+ },
"tests": [
{
"description": "Successful bulkWrite operation: log messages have operationIds",
@@ -224,6 +227,190 @@
]
}
]
+ },
+ {
+ "description": "Successful client bulkWrite operation: log messages have operationIds",
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0"
+ }
+ ],
+ "operations": [
+ {
+ "name": "waitForEvent",
+ "object": "testRunner",
+ "arguments": {
+ "client": "client",
+ "event": {
+ "topologyDescriptionChangedEvent": {}
+ },
+ "count": 2
+ }
+ },
+ {
+ "name": "clientBulkWrite",
+ "object": "client",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "logging-tests.server-selection",
+ "document": {
+ "x": 1
+ }
+ }
+ }
+ ]
+ }
+ }
+ ],
+ "expectLogMessages": [
+ {
+ "client": "client",
+ "messages": [
+ {
+ "level": "debug",
+ "component": "serverSelection",
+ "data": {
+ "message": "Server selection started",
+ "operationId": {
+ "$$type": [
+ "int",
+ "long"
+ ]
+ },
+ "operation": "bulkWrite"
+ }
+ },
+ {
+ "level": "debug",
+ "component": "serverSelection",
+ "data": {
+ "message": "Server selection succeeded",
+ "operationId": {
+ "$$type": [
+ "int",
+ "long"
+ ]
+ },
+ "operation": "bulkWrite"
+ }
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "Failed client bulkWrite operation: log messages have operationIds",
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0"
+ }
+ ],
+ "operations": [
+ {
+ "name": "failPoint",
+ "object": "testRunner",
+ "arguments": {
+ "client": "failPointClient",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": "alwaysOn",
+ "data": {
+ "failCommands": [
+ "hello",
+ "ismaster"
+ ],
+ "appName": "loggingClient",
+ "closeConnection": true
+ }
+ }
+ }
+ },
+ {
+ "name": "waitForEvent",
+ "object": "testRunner",
+ "arguments": {
+ "client": "client",
+ "event": {
+ "serverDescriptionChangedEvent": {
+ "newDescription": {
+ "type": "Unknown"
+ }
+ }
+ },
+ "count": 1
+ }
+ },
+ {
+ "name": "bulkWrite",
+ "object": "client",
+ "arguments": {
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "logging-tests.server-selection",
+ "document": {
+ "x": 1
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "isClientError": true
+ }
+ }
+ ],
+ "expectLogMessages": [
+ {
+ "client": "client",
+ "messages": [
+ {
+ "level": "debug",
+ "component": "serverSelection",
+ "data": {
+ "message": "Server selection started",
+ "operationId": {
+ "$$type": [
+ "int",
+ "long"
+ ]
+ },
+ "operation": "bulkWrite"
+ }
+ },
+ {
+ "level": "info",
+ "component": "serverSelection",
+ "data": {
+ "message": "Waiting for suitable server to become available",
+ "operationId": {
+ "$$type": [
+ "int",
+ "long"
+ ]
+ },
+ "operation": "bulkWrite"
+ }
+ },
+ {
+ "level": "debug",
+ "component": "serverSelection",
+ "data": {
+ "message": "Server selection failed",
+ "operationId": {
+ "$$type": [
+ "int",
+ "long"
+ ]
+ },
+ "operation": "bulkWrite"
+ }
+ }
+ ]
+ }
+ ]
}
]
}
diff --git a/source/server-selection/tests/logging/operation-id.yml b/source/server-selection/tests/logging/operation-id.yml
index 430e81a58b..24e48f9410 100644
--- a/source/server-selection/tests/logging/operation-id.yml
+++ b/source/server-selection/tests/logging/operation-id.yml
@@ -30,6 +30,9 @@ createEntities:
- client:
id: &failPointClient failPointClient
+_yamlAnchors:
+ namespace: &namespace "logging-tests.server-selection"
+
tests:
- description: "Successful bulkWrite operation: log messages have operationIds"
operations:
@@ -122,3 +125,97 @@ tests:
operationId: { $$type: [int, long] }
operation: insert
+ - description: "Successful client bulkWrite operation: log messages have operationIds"
+ runOnRequirements:
+ - minServerVersion: "8.0" # required for bulkWrite command
+ operations:
+ # ensure we've discovered the server so it is immediately available
+ # and no extra "waiting for suitable server" messages are emitted.
+ # expected topology events reflect initial server discovery and server connect event.
+ - name: waitForEvent
+ object: testRunner
+ arguments:
+ client: *client
+ event:
+ topologyDescriptionChangedEvent: {}
+ count: 2
+ - name: clientBulkWrite
+ object: *client
+ arguments:
+ models:
+ - insertOne:
+ namespace: *namespace
+ document: { x: 1 }
+ expectLogMessages:
+ - client: *client
+ messages:
+ - level: debug
+ component: serverSelection
+ data:
+ message: "Server selection started"
+ operationId: { $$type: [int, long] }
+ operation: bulkWrite
+ - level: debug
+ component: serverSelection
+ data:
+ message: "Server selection succeeded"
+ operationId: { $$type: [int, long] }
+ operation: bulkWrite
+
+ - description: "Failed client bulkWrite operation: log messages have operationIds"
+ runOnRequirements:
+ - minServerVersion: "8.0" # required for bulkWrite command
+ operations:
+ # fail all hello/legacy hello commands for the main client.
+ - name: failPoint
+ object: testRunner
+ arguments:
+ client: *failPointClient
+ failPoint:
+ configureFailPoint: failCommand
+ mode: alwaysOn
+ data:
+ failCommands: ["hello", "ismaster"]
+ appName: *appName
+ closeConnection: true
+ # wait until we've marked the server unknown due
+ # to a failed heartbeat.
+ - name: waitForEvent
+ object: testRunner
+ arguments:
+ client: *client
+ event:
+ serverDescriptionChangedEvent:
+ newDescription:
+ type: Unknown
+ count: 1
+ - name: bulkWrite
+ object: *client
+ arguments:
+ models:
+ - insertOne:
+ namespace: *namespace
+ document: { x: 1 }
+ expectError:
+ isClientError: true # server selection timeout
+ expectLogMessages:
+ - client: *client
+ messages:
+ - level: debug
+ component: serverSelection
+ data:
+ message: "Server selection started"
+ operationId: { $$type: [int, long] }
+ operation: bulkWrite
+ - level: info
+ component: serverSelection
+ data:
+ message: "Waiting for suitable server to become available"
+ operationId: { $$type: [int, long] }
+ operation: bulkWrite
+ - level: debug
+ component: serverSelection
+ data:
+ message: "Server selection failed"
+ operationId: { $$type: [int, long] }
+ operation: bulkWrite
diff --git a/source/server_write_commands.rst b/source/server_write_commands.rst
index 0626d54829..237f022a25 100644
--- a/source/server_write_commands.rst
+++ b/source/server_write_commands.rst
@@ -464,13 +464,21 @@ response would look, had the request asked for that write concern.
FAQ
---
-Can a driver still use the OP_INSERT, OP_DELETE, OP_UPDATE?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Why are ``_id`` values generated client-side by default for new documents?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Yes, a 2.6 server will still support those. But it is unlikely that a 2.8 server would. Of course, when talking to older servers, the usual op codes will continue working the same. An older server is one that reports ``hello.maxWireVersion`` to be less than 2 or does not include the field.
+Though drivers may expose configuration options to prevent this behavior, by default a new ``ObjectId`` value will be created client-side before an ``insert`` operation.
+
+This design decision primarily stems from the fact that MongoDB is a distributed database and the typical unique auto-incrementing scalar value most RDBMSs use for generating a primary key would not be robust enough, necessitating a more robust data type (``ObjectId`` in this case). These ``_id`` values can be generated either on the client or the server; however, when done client-side, a new document's ``_id`` value is immediately available for use without the need for a network round trip.
+
+Prior to MongoDB 3.6, an ``insert`` operation would use the ``OP_INSERT`` opcode of the wire protocol to send the operation, and retrieve the results subsequently with a ``getLastError`` command. If client-side ``_id`` values were omitted, this command response wouldn't contain the server-created ``_id`` values for new documents. Following MongoDB 3.6 when all commands would be issued using the ``OP_MSG`` wire protocol opcode (``insert`` included), the response to the command still wouldn't contain the ``_id`` values for inserted documents.
+
+
+Can a driver still use the ``OP_INSERT``, ``OP_DELETE``, ``OP_UPDATE``?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `legacy opcodes were removed in MongoDB 6.0 `_. As of MongoDB 3.6 these opcodes were superseded by `OP_MSG `_, however all server versions up until 6.0 continued to support the legacy opcodes.
-The rationale here is that we may choose to divert all the write traffic to the new
-protocol. (This depends on the having the overhead to issue a batch with one item very low.)
Can an application still issue requests with write concerns {w: 0}?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -494,10 +502,12 @@ Yes but as of 2.6 the existing getLastError behavior is supported for backward c
Changelog
---------
-:2014-05-14: First public version
+:2024-06-04: Add FAQ entry outlining client-side _id value generation
+ Update FAQ to indicate legacy opcodes were removed
+:2022-10-05: Revise spec front matter and reformat changelog.
+:2022-07-25: Remove outdated value for ``maxWriteBatchSize``
+:2021-04-22: Updated to use hello command
:2014-05-15: Removed text related to bulk operations; see the Bulk API spec for
bulk details. Clarified some paragraphs; re-ordered the response
field sections.
-:2021-04-22: Updated to use hello command
-:2022-07-25: Remove outdated value for ``maxWriteBatchSize``
-:2022-10-05: Revise spec front matter and reformat changelog.
+:2014-05-14: First public version
diff --git a/source/sessions/driver-sessions.md b/source/sessions/driver-sessions.md
new file mode 100644
index 0000000000..c277494694
--- /dev/null
+++ b/source/sessions/driver-sessions.md
@@ -0,0 +1,939 @@
+# Sessions Specification
+
+- Status: Accepted
+- Minimum Server Version: 3.6
+
+______________________________________________________________________
+
+## Abstract
+
+Version 3.6 of the server introduces the concept of logical sessions for clients. A session is an abstract concept that
+represents a set of sequential operations executed by an application that are related in some way. This specification is
+limited to how applications start and end sessions. Other specifications define various ways in which sessions are used
+(e.g. causally consistent reads, retryable writes, or transactions).
+
+This specification also discusses how drivers participate in distributing the cluster time throughout a deployment, a
+process known as "gossipping the cluster time". While gossipping the cluster time is somewhat orthogonal to sessions,
+any driver that implements sessions MUST also implement gossipping the cluster time, so it is included in this
+specification.
+
+## Definitions
+
+### META
+
+The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
+
+### Terms
+
+**ClientSession**\
+The driver object representing a client session and the operations that can be performed on it.
+Depending on the language a driver is written in, this might be an interface or a class. See also `ServerSession`.
+
+**Deployment**\
+A set of servers that are all part of a single MongoDB cluster. We avoid the word "cluster" because some
+people interpret "cluster" to mean "sharded cluster".
+
+**Explicit session**\
+A session that was started explicitly by the application by calling `startSession` and passed as
+an argument to an operation.
+
+**MongoClient**\
+The root object of a driver's API. MAY be named differently in some drivers.
+
+**Implicit session**\
+A session that was started implicitly by the driver because the application called an operation
+without providing an explicit session.
+
+**MongoCollection**\
+The driver object representing a collection and the operations that can be performed on it. MAY be
+named differently in some drivers.
+
+**MongoDatabase**\
+The driver object representing a database and the operations that can be performed on it. MAY be
+named differently in some drivers.
+
+**ServerSession**\
+The driver object representing a server session. This type is an implementation detail and does not
+need to be public. See also `ClientSession`.
+
+**Server session ID**\
+A server session ID is a token used to identify a particular server session. A driver can ask the
+server for a session ID using the `startSession` command or it can generate one locally (see Generating a Session ID
+locally).
+
+**Session**\
+A session is an abstract concept that represents a set of sequential operations executed by an application
+that are related in some way. Other specifications define the various ways in which operations can be related, but
+examples include causally consistent reads and retryable writes.
+
+**Topology**\
+The current configuration and state of a deployment.
+
+**Unacknowledged writes**\
+Unacknowledged writes are write operations that are sent to the server without waiting for a
+reply acknowledging the write. See the "When using unacknowledged writes" section below for information on how
+unacknowledged writes interact with sessions.
+
+**Network error**\
+Any network exception writing to or reading from a socket (e.g. a socket timeout or error).
+
+## Specification
+
+Drivers currently have no concept of a session. The driver API will be expanded to provide a way for applications to
+start and end sessions and to execute operations in the context of a session. The goal is to expand the API in a way
+that introduces no backward breaking changes. Existing applications that don't use sessions don't need to be changed,
+and new applications that don't need sessions can continue to be written using the existing API.
+
+To use sessions an application will call new (or overloaded) methods that take a session parameter.
+
+## Naming variations
+
+This specification defines names for new methods and types. To the extent possible, these names SHOULD be used by
+drivers. However, where a driver and/or language's naming conventions differ, those naming conventions SHOULD be used.
+For example, a driver might name a method `StartSession` or `start_session` instead of `startSession`, or might name a
+type `client_session` instead of `ClientSession`.
+
+## High level summary of the API changes for sessions
+
+This section is just a high level summary of the new API. Details are provided further on.
+
+Applications start a new session like this:
+
+```typescript
+options = new SessionOptions(/* various settings */);
+session = client.startSession(options);
+```
+
+The `SessionOptions` will be individually defined in several other specifications. It is expected that the set of
+`SessionOptions` will grow over time as sessions are used for new purposes.
+
+Applications use a session by passing it as an argument to operation methods. For example:
+
+```typescript
+collection.InsertOne(session /* etc. */)
+collection.UpdateOne(session /* etc. */)
+```
+
+Applications end a session like this:
+
+```typescript
+session.endSession()
+```
+
+This specification does not deal with multi-document transactions, which are covered in
+[their own specification](../transactions/transactions.md).
+
+## MongoClient changes
+
+`MongoClient` interface summary
+
+```java
+class SessionOptions {
+ // various other options as defined in other specifications
+}
+
+interface MongoClient {
+ ClientSession startSession(SessionOptions options);
+ // other existing members of MongoClient
+}
+```
+
+Each new member is documented below.
+
+While it is not part of the public API, `MongoClient` also needs a private (or internal) `clusterTime` member
+(containing either a BSON document or null) to record the highest `clusterTime` observed in a deployment (as described
+below in [Gossipping the cluster time](#gossipping-the-cluster-time)).
+
+### startSession
+
+The `startSession` method starts a new `ClientSession` with the provided options.
+
+It MUST NOT be possible to change the options provided to `startSession` after `startSession` has been called. This can
+be accomplished by making the `SessionOptions` class immutable or using some equivalent mechanism that is idiomatic for
+your language.
+
+It is valid to call `startSession` with no options set. This will result in a `ClientSession` that has no effect on the
+operations performed in the context of that session, other than to include a session ID in commands sent to the server.
+
+The `SessionOptions` MAY be a strongly typed class in some drivers, or MAY be a loosely typed dictionary in other
+drivers. Drivers MUST define `SessionOptions` in such a way that new options can be added in a backward compatible way
+(it is acceptable for backward compatibility to be at the source level).
+
+A `ClientSession` MUST be associated with a `ServerSession` at the time `startSession` is called. As an implementation
+optimization drivers MUST reuse `ServerSession` instances across multiple `ClientSession` instances subject to the rule
+that a server session MUST NOT be used by two `ClientSession` instances at the same time (see the Server Session Pool
+section). Additionally, a `ClientSession` may only ever be associated with one `ServerSession` for its lifetime.
+
+Drivers MUST NOT check for session support in `startSession`. Instead, if sessions are not supported, the error MUST be
+reported the first time the session is used for an operation (See
+[How to Tell Whether a Connection Supports Sessions](#how-to-tell-whether-a-connection-supports-sessions)).
+
+### Explicit vs implicit sessions
+
+An explicit session is one started explicitly by the application by calling `startSession`. An implicit session is one
+started implicitly by the driver because the application called an operation without providing an explicit session.
+Internally, a driver must be able to distinguish between explicit and implicit sessions, but no public API for this is
+necessary because an application will never see an implicit session.
+
+The motivation for starting an implicit session for all methods that don't take an explicit session parameter is to make
+sure that all commands that are sent to the server are tagged with a session ID. This improves the ability of an
+operations team to monitor (and kill if necessary) long running operations. Tagging an operation with a session ID is
+especially useful if a deployment wide operation needs to be killed.
+
+### Authentication
+
+When using authentication, using a session requires that only a single user be authenticated. Drivers that still support
+authenticating multiple users at once MAY continue to do so, but MUST NOT allow sessions to be used under such
+circumstances.
+
+If `startSession` is called when multiple users are authenticated drivers MUST raise an error with the error message
+"Cannot call startSession when multiple users are authenticated."
+
+If a driver allows authentication to be changed on the fly (presumably few still do) the driver MUST either prevent
+`ClientSession` instances from being used with a connection that doesn't have matching authentication or MUST return an
+error if such use is attempted.
+
+## ClientSession
+
+`ClientSession` instances are not thread safe or fork safe. They can only be used by one thread or process at a time.
+
+Drivers MUST document the thread-safety and fork-safety limitations of sessions. Drivers MUST NOT attempt to detect
+simultaneous use by multiple threads or processes (see Q&A for the rationale).
+
+ClientSession interface summary:
+
+```java
+interface ClientSession {
+ MongoClient client;
+ Optional<BsonDocument> clusterTime;
+ SessionOptions options;
+ BsonDocument sessionId;
+
+ void advanceClusterTime(BsonDocument clusterTime);
+ void endSession();
+}
+```
+
+While it is not part of the public API, a `ClientSession` also has a private (or internal) reference to a
+`ServerSession`.
+
+Each member is documented below.
+
+### client
+
+This property returns the `MongoClient` that was used to start this `ClientSession`.
+
+### clusterTime
+
+This property returns the most recent cluster time seen by this session. If no operations have been executed using this
+session this value will be null unless `advanceClusterTime` has been called. This value will also be null when a cluster
+does not report cluster times.
+
+When a driver is gossipping the cluster time it should send the more recent of the `clusterTime` values held by the
+`ClientSession` and the `MongoClient`.
+
+### options
+
+This property returns the `SessionOptions` that were used to start this `ClientSession`.
+
+### sessionId
+
+This property returns the session ID of this session. Note that since `ServerSessions` are pooled, different
+`ClientSession` instances can have the same session ID, but never at the same time.
+
+### advanceClusterTime
+
+This method advances the `clusterTime` for a session. If the new `clusterTime` is greater than the session's current
+`clusterTime` then the session's `clusterTime` MUST be advanced to the new `clusterTime`. If the new `clusterTime` is
+less than or equal to the session's current `clusterTime` then the session's `clusterTime` MUST NOT be changed.
+
+This method MUST NOT advance the `clusterTime` in `MongoClient` because we have no way of verifying that the supplied
+`clusterTime` is valid. If the `clusterTime` in `MongoClient` were set to an invalid value all future operations with
+this `MongoClient` would result in the server returning an error. The `clusterTime` in `MongoClient` should only be
+advanced with a `$clusterTime` received directly from a server.
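+
The comparison rule for `advanceClusterTime` can be sketched as follows. This is a minimal illustration, not the spec's reference implementation: the `ClusterTimestamp` and `ClusterTimeDoc` shapes and the `SessionClusterTime` class are hypothetical names, and the sketch assumes a cluster time can be ordered by a seconds value followed by an increment.

```typescript
// Hypothetical shapes, assumed for illustration only.
interface ClusterTimestamp {
  t: number; // seconds
  i: number; // increment within the same second
}

interface ClusterTimeDoc {
  clusterTime: ClusterTimestamp;
}

// Returns true when `a` is strictly later than `b`.
function isGreater(a: ClusterTimeDoc, b: ClusterTimeDoc): boolean {
  if (a.clusterTime.t !== b.clusterTime.t) {
    return a.clusterTime.t > b.clusterTime.t;
  }
  return a.clusterTime.i > b.clusterTime.i;
}

class SessionClusterTime {
  private current: ClusterTimeDoc | null = null;

  get clusterTime(): ClusterTimeDoc | null {
    return this.current;
  }

  // Advances only when the candidate is strictly greater; an equal or
  // older value leaves the session's cluster time unchanged.
  advanceClusterTime(candidate: ClusterTimeDoc): void {
    if (this.current === null || isGreater(candidate, this.current)) {
      this.current = candidate;
    }
  }
}
```

Note that the sketch only touches the session's own value; it deliberately has no access to the client's `clusterTime`.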
+
+### endSession
+
+This method ends a `ClientSession`.
+
+In languages that have idiomatic ways of disposing of resources, drivers SHOULD support that in addition to or instead
+of `endSession`. For example, in the .NET driver `ClientSession` would implement `IDisposable` and the application could
+choose to call `session.Dispose` or put the session in a using statement instead of calling `session.endSession`. If
+your language has an idiomatic way of disposing resources you MAY choose to implement that in addition to or instead of
+`endSession`, whichever is more idiomatic for your language.
+
+A driver MUST allow multiple calls to `endSession` (or `Dispose`). All calls after the first one are ignored.
+
+Conceptually, calling `endSession` implies ending the corresponding server session (by calling the `endSessions`
+command). As an implementation detail drivers SHOULD cache server sessions for reuse (see Server Session Pool).
+
+Once a `ClientSession` has ended, drivers MUST report an error if any operations are attempted with that
+`ClientSession`.
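+
The two `endSession` requirements above (repeat calls are ignored, and use after ending is an error) could look roughly like this. `SketchClientSession`, `assertUsable`, and the `onEnd` callback are hypothetical names; `onEnd` merely stands in for whatever the driver does with the server session, such as returning it to a pool.

```typescript
// Hypothetical minimal session used only to illustrate the end-of-life rules.
class SketchClientSession {
  private ended = false;
  private readonly onEnd: () => void;

  constructor(onEnd: () => void) {
    this.onEnd = onEnd;
  }

  // Safe to call repeatedly: only the first call has any effect.
  endSession(): void {
    if (this.ended) {
      return;
    }
    this.ended = true;
    this.onEnd();
  }

  // Operations would call this before using the session.
  assertUsable(): void {
    if (this.ended) {
      throw new Error("Cannot use a session that has ended");
    }
  }
}
```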
+
+## ServerSession
+
+A `ServerSession` is the driver object that tracks a server session. This object is an implementation detail and does
+not need to be public. Drivers may store this information however they choose; this data structure is defined here
+merely to describe the operation of the server session pool.
+
+ServerSession interface summary
+
+```java
+interface ServerSession {
+ BsonDocument sessionId;
+ DateTime lastUse;
+}
+```
+
+### sessionId
+
+This property returns the server session ID.
+
+### lastUse
+
+The driver MUST update the value of this property with the current DateTime every time the server session ID is sent to
+the server. This allows the driver to track with reasonable accuracy the server's view of when a server session was last
+used.
+
+### Creating a ServerSession
+
+When a driver needs to create a new `ServerSession` instance the only information it needs is the session ID to use for
+the new session. It can either get the session ID from the server by running the `startSession` command, or it can
+generate it locally.
+
+In either case, the lastUse field of the `ServerSession` MUST be set to the current time when the `ServerSession` is
+created.
+
+### Generating a session ID locally
+
+Running the `startSession` command to get a session ID for a new session requires a round trip to the server. As an
+optimization the server allows drivers to generate new session IDs locally and to just start using them. When a server
+sees a new session ID that it has never seen before it simply assumes that it is a new session.
+
+A session ID is a `BsonDocument` that has the following form:
+
+```typescript
+interface SessionId {
+ id: UUID
+}
+```
+
+Where the UUID is encoded as a BSON binary value of subtype 4.
+
+The id field of the session ID is a version 4 UUID that must comply with the format described in RFC 4122. Section 4.4
+describes an algorithm for generating correctly-versioned UUIDs from a pseudo-random number generator.
+
+If a driver is unable to generate a version 4 UUID it MAY instead run the `startSession` command and let the server
+generate the session ID.
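+
As a rough sketch of local generation, the function below fills 16 bytes and then sets the version and variant bits as RFC 4122 section 4.4 describes. The names `generateUuidV4Bytes`, `LocalSessionId`, and `generateSessionId` are hypothetical, and the `Math.random` default is an assumption made to keep the example dependency-free; a real driver would be expected to use a stronger random source and to encode the result as BSON binary subtype 4.

```typescript
// Builds the 16 bytes of a version 4 UUID.
// `randomByte` is a hypothetical injection point for the random source.
function generateUuidV4Bytes(
  randomByte: () => number = () => Math.floor(Math.random() * 256)
): Uint8Array {
  const bytes = new Uint8Array(16);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = randomByte() & 0xff;
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4 in the high nibble
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant in the top two bits
  return bytes;
}

// Hypothetical in-memory shape of a locally generated session ID.
interface LocalSessionId {
  id: Uint8Array;
}

function generateSessionId(): LocalSessionId {
  return { id: generateUuidV4Bytes() };
}
```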
+
+## MongoDatabase changes
+
+All `MongoDatabase` methods that talk to the server MUST send a session ID with the command when connected to a
+deployment that supports sessions so that the server can associate the operation with a session ID.
+
+### New database methods that take an explicit session
+
+All `MongoDatabase` methods that talk to the server SHOULD be overloaded to take an explicit session parameter. (See
+[why is session an explicit parameter?](#why-is-session-an-explicit-parameter).)
+
+When overloading methods to take a session parameter, the session parameter SHOULD be the first parameter. If
+overloading is not possible for your language, it MAY be in a different position or MAY be embedded in an options
+structure.
+
+Methods that have a session parameter MUST check that the session argument is not null and was created by the same
+`MongoClient` that this `MongoDatabase` came from and report an error if they do not match.
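+
A minimal sketch of that check, assuming a session exposes the client that started it. `SketchClient`, `SketchOwnedSession`, and `validateSession` are hypothetical names, and the error messages are illustrative rather than mandated.

```typescript
// Hypothetical stand-ins for driver types.
class SketchClient {
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
}

interface SketchOwnedSession {
  client: SketchClient;
}

// Rejects a missing session or one started by a different client.
function validateSession(
  session: SketchOwnedSession | null | undefined,
  owner: SketchClient
): SketchOwnedSession {
  if (session === null || session === undefined) {
    throw new Error("session must not be null");
  }
  if (session.client !== owner) {
    throw new Error("session was started by a different MongoClient");
  }
  return session;
}
```

The comparison is by identity rather than by configuration, so two clients built from the same connection string still count as different owners in this sketch.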
+
+### Existing database methods that start an implicit session
+
+When an existing `MongoDatabase` method that does not take a session is called, the driver MUST behave as if a new
+`ClientSession` was started just for this one operation and ended immediately after this operation completes. The actual
+implementation will likely involve calling `client.startSession`, but that is not required by this spec. Regardless,
+please consult the startSession section to replicate the required steps for creating a session. The driver MUST NOT use
+the session if the checked out connection does not support sessions (see
+[How to Tell Whether a Connection Supports Sessions](#how-to-tell-whether-a-connection-supports-sessions)) and, in all
+cases, MUST NOT consume a server session id until after the connection is checked out and session support is confirmed.
+
+## MongoCollection changes
+
+All `MongoCollection` methods that talk to the server MUST send a session ID with the command when connected to a
+deployment that supports sessions so that the server can associate the operation with a session ID.
+
+### New collection methods that take an explicit session
+
+All `MongoCollection` methods that talk to the server, with the exception of `estimatedDocumentCount`, SHOULD be
+overloaded to take an explicit session parameter. (See
+[why is session an explicit parameter?](#why-is-session-an-explicit-parameter).)
+
+When overloading methods to take a session parameter, the session parameter SHOULD be the first parameter. If
+overloading is not possible for your language, it MAY be in a different position or MAY be embedded in an options
+structure.
+
+Methods that have a session parameter MUST check that the session argument is not null and was created by the same
+`MongoClient` that this `MongoCollection` came from and report an error if they do not match.
+
+The `estimatedDocumentCount` helper does not support an explicit session parameter. The underlying command, `count`, is
+not supported in a transaction, so supporting an explicit session would likely confuse application developers. The
+helper returns an estimate of the documents in a collection and causal consistency is unlikely to improve the accuracy
+of the estimate.
+
+### Existing collection methods that start an implicit session
+
+When an existing `MongoCollection` method that does not take a session is called, the driver MUST behave as if a new
+`ClientSession` was started just for this one operation and ended immediately after this operation completes. The actual
+implementation will likely involve calling `client.startSession`, but that is not required by this spec. Regardless,
+please consult the startSession section to replicate the required steps for creating a session. The driver MUST NOT use
+the session if the checked out connection does not support sessions (see
+[How to Tell Whether a Connection Supports Sessions](#how-to-tell-whether-a-connection-supports-sessions)) and, in all
+cases, MUST NOT consume a server session id until after the connection is checked out and session support is confirmed.
+
+## Sessions and Cursors
+
+When an operation using a session returns a cursor, all subsequent `GETMORE` commands for that cursor MUST be run using
+the same session ID.
+
+If a driver decides to run a `KILLCURSORS` command on the cursor, it also MAY be run using the same session ID. See the
+Exceptions below for when it is permissible to not include a session ID in a `KILLCURSORS` command.
+
+## Sessions and Connections
+
+To reduce the number of `ServerSessions` created, the driver MUST only obtain an implicit session's `ServerSession`
+after it successfully checks out a connection. A driver SHOULD NOT attempt to release the acquired session before
+connection check in.
+
+Explicit sessions MAY be changed to allocate a server session similarly.
+
+## How to Tell Whether a Connection Supports Sessions
+
+A driver can determine whether a connection supports sessions by checking whether the `logicalSessionTimeoutMinutes`
+property of the establishing handshake response has a value or not. If it has a value, sessions are supported.
+
+In the case of an explicit session, if sessions are not supported, the driver MUST raise an error. In the case of an
+implicit session, if sessions are not supported, the driver MUST ignore the session.
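+
The check and the two outcomes described above might be sketched like this. `HandshakeFields`, `supportsSessions`, and `shouldUseSession` are hypothetical names, and the interface models only the one field the check needs.

```typescript
// Only the field relevant to this check; other handshake fields are omitted.
interface HandshakeFields {
  logicalSessionTimeoutMinutes?: number | null;
}

function supportsSessions(hello: HandshakeFields): boolean {
  const timeout = hello.logicalSessionTimeoutMinutes;
  return timeout !== undefined && timeout !== null;
}

// Returns true when the session should be used on this connection.
// Explicit sessions raise when unsupported; implicit ones are ignored.
function shouldUseSession(hello: HandshakeFields, isExplicit: boolean): boolean {
  if (supportsSessions(hello)) {
    return true;
  }
  if (isExplicit) {
    throw new Error("Sessions are not supported by this connection");
  }
  return false;
}
```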
+
+### Possible race condition when checking for session support
+
+There is a possible race condition that can happen between the time the driver checks whether sessions are supported and
+subsequently sends a command to the server:
+
+- The server might have supported sessions at the time the connection was first opened (and reported a value for
+ logicalSessionTimeoutMinutes in the initial response to the [handshake](../mongodb-handshake/handshake.rst)), but have
+ subsequently been downgraded to not support sessions. The server does not close the socket in this scenario, so the
+ driver will conclude that the server at the other end of this connection supports sessions.
+
+There is nothing that the driver can do about this race condition, and the server will just return an error in this
+scenario.
+
+## Sending the session ID to the server on all commands
+
+When connected to a server that supports sessions a driver MUST append the session ID to every command it sends to the
+server (with the exceptions noted in the following section). It does this by adding a top level `lsid` field to the
+command sent to the server. A driver MUST do this without modifying any data supplied by the application (e.g. the
+command document passed to runCommand):
+
+```typescript
+interface ExampleCommandWithLSID {
+ foo: 1;
+ lsid: SessionId;
+}
+```
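+
One way to add `lsid` without touching the application's document is to build a shallow copy, as in this sketch. `CommandDocument`, `SessionIdDocument`, and `withSessionId` are hypothetical names, and the string `id` is a placeholder for the BSON UUID a real session ID carries.

```typescript
type CommandDocument = Record<string, unknown>;

// Placeholder shape; a real session ID holds a BSON binary UUID.
interface SessionIdDocument {
  id: string;
}

// Returns a shallow copy with `lsid` appended, leaving the caller's
// document untouched and the command name as the first key.
function withSessionId(
  command: CommandDocument,
  lsid: SessionIdDocument
): CommandDocument {
  return { ...command, lsid };
}
```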
+
+## Exceptions to sending the session ID to the server on all commands
+
+There are some exceptions to the rule that a driver MUST append the session ID to every command it sends to the server.
+
+### When opening and authenticating a connection
+
+A driver MUST NOT append a session ID to any command sent during the process of opening and authenticating a connection.
+
+### When monitoring the state of a deployment
+
+A driver MAY omit a session ID in hello and legacy hello commands sent solely for the purposes of monitoring the state
+of a deployment.
+
+### When sending a parallelCollectionScan command
+
+Sessions are designed for sequential operations and `parallelCollectionScan` is designed for parallel operation. Because
+these are fundamentally incompatible goals, drivers MUST NOT append session ID to the `parallelCollectionScan` command
+so that the resulting cursors have no associated session ID and thus can be used in parallel.
+
+### When sending a killCursors command
+
+A driver MAY omit a session ID in `killCursors` commands for two reasons. First, `killCursors` is only ever sent to a
+particular server, so operation teams wouldn't need the `lsid` for cluster-wide killOp. An admin can manually kill the
+op with its operation id in the case that it is slow. Secondly, some drivers have a background cursor reaper to kill
+cursors that aren't exhausted and closed. Due to GC semantics, it can't use the same `lsid` for `killCursors` as was
+used for a cursor's `find` and `getMore`, so there's no point in using any `lsid` at all.
+
+### When multiple users are authenticated and the session is implicit
+
+The driver MUST NOT send a session ID from an implicit session when multiple users are authenticated. If possible the
+driver MUST NOT start an implicit session when multiple users are authenticated. Alternatively, if the driver cannot
+determine whether multiple users are authenticated at the point in time that an implicit session is started, then the
+driver MUST ignore any implicit sessions that subsequently end up being used on a connection that has multiple users
+authenticated.
+
+### When using unacknowledged writes
+
+A session ID MUST NOT be used simultaneously by more than one operation. Since drivers don't wait for a response for an
+unacknowledged write a driver would not know when the session ID could be reused. In theory a driver could use a new
+session ID for each unacknowledged write, but that would result in many orphaned sessions building up at the server.
+
+Therefore drivers MUST NOT send a session ID with unacknowledged writes under any circumstances:
+
+- For unacknowledged writes with an explicit session, drivers SHOULD raise an error. If a driver allows users to provide
+ an explicit session with an unacknowledged write (e.g. for backwards compatibility), the driver MUST NOT send the
+ session ID.
+- For unacknowledged writes without an explicit session, drivers SHOULD NOT use an implicit session. If a driver creates
+ an implicit session for unacknowledged writes without an explicit session, the driver MUST NOT send the session ID.
+
+Drivers MUST document the behavior of unacknowledged writes for both explicit and implicit sessions.
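+
The decision can be summarized in a small sketch. `mayAttachSessionId` is a hypothetical helper, and it assumes the driver follows the SHOULD and raises an error for an explicit session combined with an unacknowledged write; a driver that permits this for backwards compatibility would return `false` in that branch instead.

```typescript
// Decides whether a session ID may accompany a write.
// Assumes an explicit session with an unacknowledged write raises an error.
function mayAttachSessionId(isExplicit: boolean, acknowledged: boolean): boolean {
  if (acknowledged) {
    return true;
  }
  if (isExplicit) {
    throw new Error("Explicit sessions cannot be used with unacknowledged writes");
  }
  return false; // unacknowledged write with an implicit session: send no lsid
}
```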
+
+### When wrapping commands in a `$query` field
+
+If the driver is wrapping the command in a `$query` field for non-OP_MSG messages in order to pass a readPreference to a
+mongos (see [ReadPreference and Mongos](../find_getmore_killcursors_commands.rst#readpreference-and-mongos)), the driver
+SHOULD NOT add the `lsid` as a top-level field, and MUST add the `lsid` as a field of the `$query`:
+
+```typescript
+// Wrapped command:
+interface WrappedCommandExample {
+ $query: {
+ find: { foo: 1 }
+ },
+ $readPreference: {}
+}
+
+// Correct application of lsid
+interface CorrectLSIDUsageExample {
+ $query: {
+ find: { foo: 1 },
+ lsid: SessionId
+ },
+ $readPreference: {}
+}
+
+// Incorrect application of lsid
+interface IncorrectLSIDUsageExample {
+ $query: {
+ find: { foo: 1 }
+ },
+ $readPreference: {},
+ lsid: SessionId
+}
+```
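+
+A helper that applies this rule might look like the following sketch. `CommandDocument` and `applySessionId` are
+hypothetical names, and commands are modeled as plain objects rather than BSON documents.
+
+```typescript
+// Hypothetical, simplified command shape (a real driver would use BSON documents).
+type CommandDocument = Record<string, unknown>;
+
+// Attaches lsid to a command that may or may not be wrapped in a $query field.
+function applySessionId(command: CommandDocument, lsid: CommandDocument): CommandDocument {
+  const inner = command["$query"];
+  if (typeof inner === "object" && inner !== null) {
+    // Wrapped command: lsid goes inside $query, not at the top level.
+    return { ...command, $query: { ...(inner as CommandDocument), lsid } };
+  }
+  // Unwrapped command: lsid is an ordinary top-level field.
+  return { ...command, lsid };
+}
+```
+
+The function returns a new object instead of mutating its input, so a caller can reuse the original command for a
+retry with a different session.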
+
+## Server Commands
+
+### startSession
+
+The `startSession` server command has the following format:
+
+```typescript
+interface StartSessionCommand {
+ startSession: 1;
+ $clusterTime?: ClusterTime;
+}
+```
+
+The `$clusterTime` field should only be sent when gossipping the cluster time. See the section "Gossipping the cluster
+time" for information on `$clusterTime`.
+
+The `startSession` command MUST be sent to the `admin` database.
+
+The server response has the following format:
+
+```typescript
+interface StartSessionResponse {
+ ok: 1;
+ id: BsonDocument;
+}
+```
+
+In case of an error, the server response has the following format:
+
+```typescript
+interface StartSessionError {
+ ok: 0;
+ errmsg: string;
+ code: number;
+}
+```
+
+When connected to a replica set the `startSession` command MUST be sent to the primary if the primary is available. The
+`startSession` command MAY be sent to a secondary if there is no primary available at the time the `startSession`
+command needs to be run.
+
+Drivers SHOULD generate session IDs locally if possible instead of running the `startSession` command, since running the
+command requires a network round trip.
+
+### endSessions
+
+The `endSessions` server command has the following format:
+
+```typescript
+interface EndSessionCommand {
+ endSessions: Array<SessionId>;
+ $clusterTime?: ClusterTime;
+}
+```
+
+The `$clusterTime` field should only be sent when gossipping the cluster time. See the section "Gossipping the cluster
+time" for information on `$clusterTime`.
+
+The `endSessions` command MUST be sent to the `admin` database.
+
+The server response has the following format:
+
+```typescript
+interface EndSessionResponse {
+ ok: 1;
+}
+```
+
+In case of an error, the server response has the following format:
+
+```typescript
+interface EndSessionError {
+ ok: 0;
+ errmsg: string;
+ code: number;
+}
+```
+
+Drivers MUST ignore any errors returned by the `endSessions` command.
+
+The `endSessions` command MUST be run once when the `MongoClient` instance is shut down. If the number of sessions is
+very large the `endSessions` command SHOULD be run multiple times to end 10,000 sessions at a time (in order to avoid
+creating excessively large commands).
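+
+Splitting the pooled session IDs into batches can be sketched as follows. `PooledSessionId`, `EndSessionsBatch`, and
+`buildEndSessionsCommands` are hypothetical names, and session IDs are modeled as plain objects with a string `id`
+rather than BSON documents.
+
+```typescript
+// Hypothetical placeholder for a session ID; real IDs are BSON documents.
+interface PooledSessionId {
+  id: string;
+}
+
+interface EndSessionsBatch {
+  endSessions: PooledSessionId[];
+}
+
+const MAX_SESSIONS_PER_COMMAND = 10000;
+
+// Builds one endSessions command per batch of at most `batchSize` session IDs.
+function buildEndSessionsCommands(
+  sessionIds: PooledSessionId[],
+  batchSize: number = MAX_SESSIONS_PER_COMMAND,
+): EndSessionsBatch[] {
+  const commands: EndSessionsBatch[] = [];
+  for (let start = 0; start < sessionIds.length; start += batchSize) {
+    commands.push({ endSessions: sessionIds.slice(start, start + batchSize) });
+  }
+  return commands;
+}
+```
+
+An empty pool yields no commands at all, so a client that never used a session does not need a round trip at
+shutdown.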
+
+When connected to a sharded cluster the `endSessions` command can be sent to any mongos. When connected to a replica set
+the `endSessions` command MUST be sent to the primary if the primary is available, otherwise it MUST be sent to any
+available secondary.
+
+## Server Session Pool
+
+Conceptually, each `ClientSession` can be thought of as having a new corresponding `ServerSession`. However, starting a
+server session might require a round trip to the server (which can be avoided by generating the session ID locally) and
+ending a session requires a separate round trip to the server. Drivers can operate more efficiently and put less load on
+the server if they cache `ServerSession` instances for reuse. To this end drivers MUST implement a server session pool
+containing `ServerSession` instances available for reuse. A `ServerSession` pool MUST belong to a `MongoClient` instance
+and have the same lifetime as the `MongoClient` instance.
+
+When a new implicit `ClientSession` is started it MUST NOT attempt to acquire a server session from the server session
+pool immediately. When a new explicit `ClientSession` is started it MAY attempt to acquire a server session from the
+server session pool immediately. See the algorithm below for the steps to follow when attempting to acquire a
+`ServerSession` from the server session pool.
+
+Note that `ServerSession` instances acquired from the server session pool might have as little as one minute left before
+becoming stale and being discarded server side. Drivers MUST document that if an application waits more than one minute
+after calling `startSession` to perform operations with that session it risks getting errors due to the server session
+going stale before it was used.
+
+A server session is considered stale by the server when it has not been used for a certain amount of time. The default
+amount of time is 30 minutes, but this value is configurable on the server. Servers that support sessions will report
+this value in the `logicalSessionTimeoutMinutes` field of the reply to the hello and legacy hello commands. The smallest
+reported timeout is recorded in the `logicalSessionTimeoutMinutes` property of the `TopologyDescription`. See the Server
+Discovery And Monitoring specification for details.
+
+When a `ClientSession` is ended it MUST return the server session to the server session pool. See the algorithm below
+for the steps to follow when returning a `ServerSession` instance to the server session pool.
+
+The server session pool has no maximum size. The pool only shrinks when a server session is acquired for use or
+discarded.
+
+When a `MongoClient` instance is closed the driver MUST proactively inform the server that the pooled server sessions
+will no longer be used by sending one or more `endSessions` commands to the server.
+
+The server session pool is modeled as a double ended queue. The algorithms below require the ability to add and remove
+`ServerSession` instances from the front of the queue and to inspect and possibly remove `ServerSession` instances from
+the back of the queue. The front of the queue holds `ServerSession` instances that have been released recently and
+should be the first to be reused. The back of the queue holds `ServerSession` instances that have not been used recently
+and that potentially will be discarded if they are not used again before they expire.
+
+An implicit session MUST be returned to the pool immediately following the completion of an operation. When an implicit
+session is associated with a cursor for use with `getMore` operations, the session MUST be returned to the pool
+immediately following a `getMore` operation that indicates that the cursor has been exhausted. In particular, the driver
+MUST NOT wait until all documents have been iterated by the application or until the application disposes of the cursor. For
+language runtimes that provide the ability to attach finalizers to objects that are run prior to garbage collection, the
+cursor class SHOULD return an implicit session to the pool in the finalizer if the cursor has not already been
+exhausted.
+
+If a driver supports process forking, the session pool needs to be cleared on one side of the forked processes (just
+like sockets need to reconnect). Drivers MUST provide a way to clear the session pool without sending `endSessions`.
+Drivers MAY make this automatic when the process ID changes. If they do not, they MUST document how to clear the session
+pool wherever they document fork support. After clearing the session pool in this way, drivers MUST ensure that sessions
+already checked out are not returned to the new pool.
+
+If a driver has a server session pool and a network error is encountered when executing any command with a
+`ClientSession`, the driver MUST mark the associated `ServerSession` as dirty. Dirty server sessions are discarded when
+returned to the server session pool. It is valid for a dirty session to be used for subsequent commands (e.g. an
+implicit retry attempt, a later command in a bulk write, or a later operation on an explicit session); however, it MUST
+remain dirty for the remainder of its lifetime regardless of whether later commands succeed.
+
+### Algorithm to acquire a ServerSession instance from the server session pool
+
+1. If the server session pool is empty, create a new `ServerSession` and use it.
+2. Otherwise remove a `ServerSession` from the front of the queue and examine it:
+ - If the driver is in load balancer mode, use this `ServerSession`.
+ - If it has at least one minute left before becoming stale, use this `ServerSession`.
+ - If it has less than one minute left before becoming stale, discard it (let it be garbage collected) and return to
+ step 1.
+
+See the [Load Balancer Specification](../load-balancers/load-balancers.md#session-expiration) for details on session
+expiration.
+
+### Algorithm to return a ServerSession instance to the server session pool
+
+1. Before returning a server session to the pool a driver MUST first check the server session pool for server sessions
+ at the back of the queue that are about to expire (meaning they will expire in less than one minute). A driver MUST
+ stop checking server sessions once it encounters a server session that is not about to expire. Any server sessions
+ found that are about to expire are removed from the end of the queue and discarded (or allowed to be garbage
+ collected).
+2. Then examine the server session that is being returned to the pool and:
+ - If this session is marked dirty (i.e. it was involved in a network error), discard it (let it be garbage collected).
+ - If it will expire in less than one minute, discard it (let it be garbage collected).
+ - If it won't expire for at least one minute, add it to the front of the queue.
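+
+The acquire and return algorithms above can be sketched together as a small pool class. This is a minimal
+illustration under stated assumptions: `PooledServerSession`, `ServerSessionPool`, and their members are hypothetical
+names, the current time is passed in explicitly (in milliseconds) so the behavior is deterministic, the caller is
+expected to update `lastUsedMs` whenever the session is used, and load balancer mode is not modeled.
+
+```typescript
+// Hypothetical session record for illustration only.
+interface PooledServerSession {
+  id: number;
+  lastUsedMs: number; // time of last use, maintained by the caller
+  dirty: boolean; // set after a network error
+}
+
+const ONE_MINUTE_MS = 60000;
+
+class ServerSessionPool {
+  private queue: PooledServerSession[] = []; // index 0 is the front of the queue
+  private nextId = 0;
+  private timeoutMs: number;
+
+  constructor(logicalSessionTimeoutMinutes: number) {
+    this.timeoutMs = logicalSessionTimeoutMinutes * ONE_MINUTE_MS;
+  }
+
+  // True when the session has less than one minute left before becoming stale.
+  private aboutToExpire(session: PooledServerSession, nowMs: number): boolean {
+    const remainingMs = this.timeoutMs - (nowMs - session.lastUsedMs);
+    return remainingMs < ONE_MINUTE_MS;
+  }
+
+  acquire(nowMs: number): PooledServerSession {
+    for (;;) {
+      const candidate = this.queue.shift(); // remove from the front
+      if (candidate === undefined) {
+        // Pool is empty: create a new session.
+        return { id: this.nextId++, lastUsedMs: nowMs, dirty: false };
+      }
+      if (!this.aboutToExpire(candidate, nowMs)) {
+        return candidate;
+      }
+      // Otherwise discard the candidate and try again.
+    }
+  }
+
+  release(session: PooledServerSession, nowMs: number): void {
+    // Step 1: discard sessions at the back that are about to expire.
+    let last = this.queue[this.queue.length - 1];
+    while (last !== undefined && this.aboutToExpire(last, nowMs)) {
+      this.queue.pop();
+      last = this.queue[this.queue.length - 1];
+    }
+    // Step 2: keep the returned session only if it is clean and not about to expire.
+    if (!session.dirty && !this.aboutToExpire(session, nowMs)) {
+      this.queue.unshift(session);
+    }
+  }
+
+  get size(): number {
+    return this.queue.length;
+  }
+}
+```
+
+Because sessions are always added to and taken from the front, the back of the queue holds the least recently used
+sessions, which is why pruning in `release` can stop at the first session that is not about to expire.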
+
+## Gossipping the cluster time
+
+Drivers MUST gossip the cluster time when connected to a deployment that uses cluster times.
+
+Gossipping the cluster time is a process in which the driver participates in distributing the logical cluster time in a
+deployment. Drivers learn the current cluster time (from a particular server's perspective) in responses they receive
+from servers. Drivers in turn forward the highest cluster time they have seen so far to any server they subsequently
+send commands to.
+
+A driver detects that it MUST participate in gossipping the cluster time when it sees a `$clusterTime` in a response
+received from a server.
+
+### Receiving the current cluster time
+
+Drivers MUST examine all responses to server commands to see if they contain a top level field named
+`$clusterTime` formatted as follows:
+
+```typescript
+interface ClusterTime {
+ clusterTime: Timestamp;
+ signature: {
+ hash: Binary;
+ keyId: Int64;
+ };
+}
+
+interface AnyServerResponse {
+ // ... other properties ...
+ $clusterTime: ClusterTime;
+}
+```
+
+Whenever a driver receives a cluster time from a server it MUST compare it to the current highest seen cluster time for
+the deployment. If the new cluster time is higher than the highest seen cluster time it MUST become the new highest seen
+cluster time. Two cluster times are compared using only the BsonTimestamp value of the `clusterTime` embedded field (be
+sure to include both the timestamp and the increment of the BsonTimestamp in the comparison). The signature field does
+not participate in the comparison.
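+
+The comparison can be sketched as follows. The shapes are simplified and hypothetical: `SimpleTimestamp` stands in
+for the driver's BSON timestamp type, with `t` as the timestamp part and `i` as the increment, and
+`compareClusterTimes` and `advanceHighestSeen` are illustrative names.
+
+```typescript
+// Hypothetical stand-in for a BSON timestamp.
+interface SimpleTimestamp {
+  t: number; // timestamp part
+  i: number; // increment part
+}
+
+interface GossipClusterTime {
+  clusterTime: SimpleTimestamp;
+  signature?: unknown; // carried along but never compared
+}
+
+// Negative if a < b, zero if equal, positive if a > b.
+function compareClusterTimes(a: GossipClusterTime, b: GossipClusterTime): number {
+  if (a.clusterTime.t !== b.clusterTime.t) {
+    return a.clusterTime.t - b.clusterTime.t;
+  }
+  return a.clusterTime.i - b.clusterTime.i;
+}
+
+// Returns the highest seen cluster time after observing a server response.
+function advanceHighestSeen(
+  current: GossipClusterTime | null,
+  received: GossipClusterTime | null | undefined,
+): GossipClusterTime | null {
+  if (received === null || received === undefined) {
+    return current; // the response carried no $clusterTime
+  }
+  if (current === null || compareClusterTimes(received, current) > 0) {
+    return received;
+  }
+  return current;
+}
+```
+
+The whole received document, including its signature, is retained when it wins the comparison, so that it can later
+be sent back in the same format in which it was received.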
+
+### Sending the highest seen cluster time
+
+Whenever a driver sends a command to a server it MUST include the highest seen cluster time in a top level field called
+`$clusterTime`, in the same format as it was received in (but see Gossipping with mixed server versions below).
+
+### How to compute the `$clusterTime` to send to a server
+
+When sending `$clusterTime` to the server the driver MUST send the greater of the `clusterTime` values from
+`MongoClient` and `ClientSession`. Normally a session's `clusterTime` will be less than or equal to the `clusterTime` in
+`MongoClient`, but it could be greater than the `clusterTime` in `MongoClient` if `advanceClusterTime` was called with a
+`clusterTime` that came from somewhere else.
+
+A driver MUST NOT use the `clusterTime` of a `ClientSession` anywhere else except when executing an operation with this
+session. This rule protects the driver from the scenario where `advanceClusterTime` was called with an invalid
+`clusterTime` by limiting the resulting server errors to the one session. The `clusterTime` of a `MongoClient` MUST NOT
+be advanced by any `clusterTime` other than a `$clusterTime` received directly from a server.
+
+The safe way to compute the `$clusterTime` to send to a server is:
+
+1. When the `ClientSession` is first started its `clusterTime` is set to null.
+
+2. When the driver sends `$clusterTime` to the server it should send the greater of the `ClientSession` `clusterTime`
+ and the `MongoClient` `clusterTime` (either one could be null).
+
+3. When the driver receives a `$clusterTime` from the server it should advance both the `ClientSession` and the
+ `MongoClient` `clusterTime`. The `clusterTime` of a `ClientSession` can also be advanced by calling
+ `advanceClusterTime`.
+
+This sequence ensures that if the `clusterTime` of a `ClientSession` is invalid only that one session will be affected.
+The `MongoClient` `clusterTime` is only updated with `$clusterTime` values known to be valid because they were received
+directly from a server.
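+
+The three steps above can be sketched as follows. All names are hypothetical (`TrackedClusterTime`, `HasClusterTime`,
+`greaterClusterTime`, `clusterTimeToSend`, `onServerClusterTime`), the timestamp is reduced to a plain `{ t, i }`
+pair, and the function called `advanceClusterTime` here only mirrors the session method of the same name.
+
+```typescript
+// Hypothetical simplified shape; `t` and `i` stand in for a BSON timestamp.
+interface TrackedClusterTime {
+  clusterTime: { t: number; i: number };
+}
+
+// Both MongoClient and ClientSession are modeled as holders of a nullable clusterTime.
+interface HasClusterTime {
+  clusterTime: TrackedClusterTime | null;
+}
+
+// Returns the greater of two cluster times; either one may be null.
+function greaterClusterTime(
+  a: TrackedClusterTime | null,
+  b: TrackedClusterTime | null,
+): TrackedClusterTime | null {
+  if (a === null) return b;
+  if (b === null) return a;
+  const byTime = a.clusterTime.t - b.clusterTime.t;
+  const diff = byTime !== 0 ? byTime : a.clusterTime.i - b.clusterTime.i;
+  return diff >= 0 ? a : b;
+}
+
+// Step 2: the value for the $clusterTime field of an outgoing command.
+function clusterTimeToSend(client: HasClusterTime, session: HasClusterTime): TrackedClusterTime | null {
+  return greaterClusterTime(client.clusterTime, session.clusterTime);
+}
+
+// Step 3: a $clusterTime received directly from a server advances both holders.
+function onServerClusterTime(
+  client: HasClusterTime,
+  session: HasClusterTime,
+  received: TrackedClusterTime,
+): void {
+  client.clusterTime = greaterClusterTime(client.clusterTime, received);
+  session.clusterTime = greaterClusterTime(session.clusterTime, received);
+}
+
+// A value supplied by the application only ever touches the session.
+function advanceClusterTime(session: HasClusterTime, value: TrackedClusterTime): void {
+  session.clusterTime = greaterClusterTime(session.clusterTime, value);
+}
+```
+
+Since `advanceClusterTime` never writes to the client, an invalid value supplied by the application can only affect
+commands executed with that one session.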
+
+### Tracking the highest seen cluster time does not require checking the deployment topology or the server version
+
+Drivers do not need to check the deployment topology or the server version they are connected to in order to track the
+highest seen `$clusterTime`. They simply need to check for the presence of the `$clusterTime` field in responses
+received from servers.
+
+### Gossipping with mixed server versions
+
+Drivers MUST check that the server they are sending a command to supports `$clusterTime` before adding `$clusterTime` to
+the command. A server supports `$clusterTime` when the `maxWireVersion` >= 6.
+
+This supports the (presumably short lived) scenario where not all servers have been upgraded to 3.6.
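+
+A minimal sketch of this check follows. `GossipCommand` and `withClusterTime` are hypothetical names, and commands
+are modeled as plain objects.
+
+```typescript
+// Hypothetical, simplified command shape.
+type GossipCommand = Record<string, unknown>;
+
+const MIN_WIRE_VERSION_FOR_CLUSTER_TIME = 6;
+
+// Adds $clusterTime only when a value is known and the target server supports it.
+function withClusterTime(
+  command: GossipCommand,
+  maxWireVersion: number,
+  highestSeen: GossipCommand | null,
+): GossipCommand {
+  if (highestSeen === null || maxWireVersion < MIN_WIRE_VERSION_FOR_CLUSTER_TIME) {
+    return command;
+  }
+  return { ...command, $clusterTime: highestSeen };
+}
+```
+
+The check is made per target server, so the same highest seen value may be attached to a command sent to one server
+and omitted from a command sent to another.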
+
+## Test Plan
+
+See the [README](tests/README.md) for tests.
+
+## Motivation
+
+Drivers currently have no concept of a session. The driver API needs to be extended to support sessions.
+
+## Design Rationale
+
+The goal is to modify the driver API in such a way that existing programs that don't use sessions continue to compile
+and run correctly. This goal is met by defining new methods (or overloads) that take a session parameter. An application
+does not need to be modified unless it wants to take advantage of the new features supported by sessions.
+
+## Backwards Compatibility
+
+The API changes to support sessions extend the existing API but do not introduce any backward breaking changes. Existing
+programs that don't use sessions continue to compile and run correctly.
+
+## Reference Implementation (always required)
+
+A reference implementation must be completed before any spec is given status "Final", but it need not be completed
+before the spec is "Accepted". While there is merit to the approach of reaching consensus on the specification and
+rationale before writing code, the principle of "rough consensus and running code" is still useful when it comes to
+resolving many discussions of spec details. A final reference implementation must include test code and documentation.
+
+The C and C# drivers will do initial POC implementations.
+
+## Future work (optional)
+
+Use this section to discuss any possible work for a future spec. This could cover issues where no consensus could be
+reached but that don't block this spec, changes that were rejected due to unclear use cases, etc.
+
+## Open questions
+
+## Q&A
+
+### Why do we say drivers MUST NOT attempt to detect unsafe multi-threaded or multi-process use of `ClientSession`?
+
+Because doing so would provide an illusion of safety. It doesn't make these instances thread safe. And even if no such
+exceptions are encountered when testing an application, that doesn't prove anything. The application might still be
+using the instances in a thread-unsafe way and just didn't happen to do so during a test run. The final argument is that
+checking this would require overhead that doesn't provide any clear benefit.
+
+### Why is session an explicit parameter?
+
+A previous draft proposed that ClientSession would be a MongoClient-like object added to the object hierarchy:
+
+```javascript
+session = client.startSession(...)
+database = session.getDatabase(...) // database is associated with session
+collection = database.getCollection(...) // collection is associated with session
+// operations on collection implicitly use session
+collection.insertOne({})
+session.endSession()
+```
+
+The central feature of this design is that a MongoCollection (or database, or perhaps a GridFS object) is associated
+with a session, which is then an implied parameter to any operations executed using that MongoCollection.
+
+This API was rejected, with the justification that a ClientSession does not naturally belong to the state of a
+MongoCollection. MongoCollection has up to now been a stable long-lived object that could be widely shared, and in most
+drivers it is thread safe. Once we associate a ClientSession with it, the MongoCollection object becomes short-lived and
+is no longer thread safe. It is a bad sign that MongoCollection's thread safety and lifetime vary depending on how its
+parent MongoDatabase is created.
+
+Instead, we require users to pass session as a parameter to each function:
+
+```javascript
+session = client.startSession(...)
+database = client.getDatabase(...)
+collection = database.getCollection(...)
+// users must explicitly pass session to operations
+collection.insertOne(session, {})
+session.endSession()
+```
+
+### Why does a network error cause the `ServerSession` to be discarded from the pool?
+
+When a network error is encountered when executing an operation with a `ClientSession`, the operation may be left
+running on the server. Re-using this `ServerSession` can lead to parallel operations, which violates the rule that a
+session must be used sequentially. This results in multiple problems:
+
+1. killSessions to end an earlier operation would surprisingly also end a later operation.
+2. An otherwise unrelated operation that just happens to use that same server session will potentially block waiting for
+ the previous operation to complete. For example, a transactional write will block a subsequent transactional write.
+
+### Why do automatic retry attempts re-use a dirty implicit session?
+
+The retryable writes spec requires that both the original and retry attempt use the same server session. The server will
+block the retry attempt until the initial attempt completes at which point the retry attempt will continue executing.
+
+For retryable reads that use an implicit session, drivers could choose to use a new server session for the retry
+attempt; however, this would lose the information that these two reads are related.
+
+### Why don't drivers run the endSessions command to cleanup dirty server sessions?
+
+Drivers do not run the endSessions command when discarding a dirty server session because disconnects should be
+relatively rare and the server won't normally accumulate a large number of abandoned dirty sessions. Any abandoned
+sessions will be automatically cleaned up by the server after the configured `logicalSessionTimeoutMinutes`.
+
+### Why must drivers wait to consume a server session until after a connection is checked out?
+
+The problem arises when the number of concurrent application requests is larger than the number of available
+connections: the driver may generate many more implicit sessions than connections. For example with maxPoolSize=1 and
+100 threads, 100 implicit sessions may be created. This increases the load on the server since session state is cached
+in memory. In the worst case this kind of workload can hit the session limit and trigger TooManyLogicalSessions.
+
+In order to address this, drivers MUST NOT consume a server session id until after the connection is checked out. This
+change will limit the number of "in use" server sessions to no greater than an application's maxPoolSize.
+
+The language here is specific about obtaining a server session as opposed to creating the implicit session to permit
+drivers to take an implementation approach where the implicit session creation logic largely remains unchanged. Implicit
+session creation can be left as is, as long as the underlying server resource isn't allocated until it is needed and
+known to be used, that is, after connection checkout succeeds.
+
+It is still possible that via explicit sessions or cursors, which hold on to the session they started with, a driver
+could over-allocate sessions. But those scenarios are exceptional and outside the scope of this spec.
+
+### Why should drivers NOT attempt to release a serverSession before checking back in the operation's connection?
+
+There are a variety of cases, such as retryable operations or cursor creating operations, where a `ServerSession` must
+remain acquired by the `ClientSession` after an operation is attempted. Attempting to account for all these scenarios
+carries risks that are not justified by the potential benefit of a guaranteed limit on `ServerSession` allocation.
+
+## Changelog
+
+- 2024-05-08: Migrated from reStructuredText to Markdown.
+- 2017-09-13: If causalConsistency option is omitted assume true
+- 2017-09-16: Omit session ID when opening and authenticating a connection
+- 2017-09-18: Drivers MUST gossip the cluster time when they see a `$clusterTime`.
+- 2017-09-19: How to safely use initialClusterTime
+- 2017-09-29: Add an exception to the rule that `KILLCURSORS` commands always require a session id
+- 2017-10-03: startSession and endSessions commands MUST be sent to the admin database
+- 2017-10-03: Fix format of endSessions command
+- 2017-10-04: Added advanceClusterTime
+- 2017-10-06: Added descriptions of explicit and implicit sessions
+- 2017-10-17: Implicit sessions MUST NOT be used when multiple users authenticated
+- 2017-10-19: Possible race conditions when checking whether a deployment supports sessions
+- 2017-11-21: Drivers MUST NOT send a session ID for unacknowledged writes
+- 2018-01-10: Note that MongoClient must retain highest clusterTime
+- 2018-01-10: Update test plan for drivers without APM
+- 2018-01-11: Clarify that sessions require replica sets or sharded clusters
+- 2018-02-20: Add implicit/explicit session tests
+- 2018-02-20: Drivers SHOULD error if unacknowledged writes are used with sessions
+- 2018-05-23: Drivers MUST not use session ID with parallelCollectionScan
+- 2018-06-07: Document that estimatedDocumentCount does not support explicit sessions
+- 2018-07-19: Justify why session must be an explicit parameter to each function
+- 2018-10-11: Session pools must be cleared in child process after fork
+- 2019-05-15: A ServerSession that is involved in a network error MUST be discarded
+- 2019-10-22: Drivers may defer checking if a deployment supports sessions until the first
+- 2021-04-08: Updated to use hello and legacy hello
+- 2021-04-08: Adding in behaviour for load balancer mode.
+- 2020-05-26: Simplify logic for determining sessions support
+- 2022-01-28: Implicit sessions MUST obtain server session after connection checkout succeeds
+- 2022-03-24: ServerSession Pooling is required and clarifies session acquisition bounding
+- 2022-06-13: Move prose tests to test README and apply new ordering
+- 2022-10-05: Remove spec front matter
+- 2023-02-24: Defer checking for session support until after connection checkout
diff --git a/source/sessions/driver-sessions.rst b/source/sessions/driver-sessions.rst
index 493bcc8ff7..9cd2e55e32 100644
--- a/source/sessions/driver-sessions.rst
+++ b/source/sessions/driver-sessions.rst
@@ -1,1154 +1,4 @@
-=============================
-Driver Sessions Specification
-=============================
-:Status: Accepted
-:Minimum Server Version: 3.6
-
-.. contents::
-
---------
-
-Abstract
-========
-
-Version 3.6 of the server introduces the concept of logical sessions for
-clients. A session is an abstract concept that represents a set of sequential
-operations executed by an application that are related in some way. This
-specification is limited to how applications start and end sessions. Other
-specifications define various ways in which sessions are used (e.g. causally
-consistent reads, retryable writes, or transactions).
-
-This specification also discusses how drivers participate in distributing the
-cluster time throughout a deployment, a process known as "gossipping the
-cluster time". While gossipping the cluster time is somewhat orthogonal to
-sessions, any driver that implements sessions MUST also implement gossipping
-the cluster time, so it is included in this specification.
-
-Definitions
-===========
-
-META
-----
-
-The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
-“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be
-interpreted as described in `RFC 2119 `_.
-
-Terms
------
-
-ClientSession
- The driver object representing a client session and the operations that can
- be performed on it. Depending on the language a driver is written in this
- might be an interface or a class. See also ``ServerSession``.
-
-Deployment
- A set of servers that are all part of a single MongoDB cluster. We avoid the
- word "cluster" because some people interpret "cluster" to mean "sharded cluster".
-
-Explicit session
- A session that was started explicitly by the application by calling ``startSession``
- and passed as an argument to an operation.
-
-MongoClient
- The root object of a driver's API. MAY be named differently in some drivers.
-
-Implicit session
- A session that was started implicitly by the driver because the application
- called an operation without providing an explicit session.
-
-MongoCollection
- The driver object representing a collection and the operations that can be
- performed on it. MAY be named differently in some drivers.
-
-MongoDatabase
- The driver object representing a database and the operations that can be
- performed on it. MAY be named differently in some drivers.
-
-ServerSession
- The driver object representing a server session. This type is an
- implementation detail and does not need to be public. See also
- ``ClientSession``.
-
-Server session ID
- A server session ID is a token used to identify a particular server
- session. A driver can ask the server for a session ID using the
- ``startSession`` command or it can generate one locally (see Generating a
- Session ID locally).
-
-Session
- A session is an abstract concept that represents a set of sequential
- operations executed by an application that are related in some way. Other
- specifications define the various ways in which operations can be related,
- but examples include causally consistent reads and retryable writes.
-
-Topology
- The current configuration and state of a deployment.
-
-Unacknowledged writes
- Unacknowledged writes are write operations that are sent to the server
- without waiting for a reply acknowledging the write. See the "When using
- unacknowledged writes" section below for information on how unacknowledged
- writes interact with sessions.
-
-Network error
- Any network exception writing to or reading from a socket (e.g. a socket
- timeout or error).
-
-Specification
-=============
-
-Drivers currently have no concept of a session. The driver API will be expanded
-to provide a way for applications to start and end sessions and to execute
-operations in the context of a session. The goal is to expand the API in a way
-that introduces no backward breaking changes. Existing applications that don't
-use sessions don't need to be changed, and new applications that don't need
-sessions can continue to be written using the existing API.
-
-To use sessions an application will call new (or overloaded) methods that take
-a session parameter.
-
-Naming variations
-=================
-
-This specification defines names for new methods and types. To the extent
-possible, these names SHOULD be used by drivers. However, where a driver and/or
-language's naming conventions differ, those naming conventions SHOULD be used.
-For example, a driver might name a method ``StartSession`` or ``start_session`` instead
-of ``startSession``, or might name a type ``client_session`` instead of ``ClientSession``.
-
-High level summary of the API changes for sessions
-==================================================
-
-This section is just a high level summary of the new API. Details are provided
-further on.
-
-Applications start a new session like this:
-
-.. code:: typescript
-
- options = new SessionOptions(/* various settings */);
- session = client.startSession(options);
-
-The ``SessionOptions`` will be individually defined in several other
-specifications. It is expected that the set of ``SessionOptions`` will grow over
-time as sessions are used for new purposes.
-
-Applications use a session by passing it as an argument to operation methods.
-For example:
-
-.. code:: typescript
-
- collection.InsertOne(session /* etc. */)
- collection.UpdateOne(session /* etc. */)
-
-Applications end a session like this:
-
-.. code:: typescript
-
- session.endSession()
-
-This specification does not deal with multi-document transactions, which
-are covered in `their own specification <../transactions/transactions.md>`_.
-
-MongoClient changes
-===================
-
-``MongoClient`` interface summary
-
-.. code:: java
-
- class SessionOptions {
- // various other options as defined in other specifications
- }
-
- interface MongoClient {
- ClientSession startSession(SessionOptions options);
- // other existing members of MongoClient
- }
-
-Each new member is documented below.
-
-While it is not part of the public API, ``MongoClient`` also needs a private
-(or internal) ``clusterTime`` member (containing either a BSON document or
-null) to record the highest ``clusterTime`` observed in a deployment (as
-described below in `Gossipping the cluster time`_).
-
-startSession
-------------
-
-The ``startSession`` method starts a new ``ClientSession`` with the provided options.
-
-It MUST NOT be possible to change the options provided to ``startSession`` after
-``startSession`` has been called. This can be accomplished by making the
-``SessionOptions`` class immutable or using some equivalent mechanism that is
-idiomatic for your language.
-
-It is valid to call ``startSession`` with no options set. This will result in a
-``ClientSession`` that has no effect on the operations performed in the context of
-that session, other than to include a session ID in commands sent to the
-server.
-
-The ``SessionOptions`` MAY be a strongly typed class in some drivers, or MAY be a
-loosely typed dictionary in other drivers. Drivers MUST define ``SessionOptions``
-in such a way that new options can be added in a backward compatible way (it is
-acceptable for backward compatibility to be at the source level).
-
-A ``ClientSession`` MUST be associated with a ``ServerSession`` at the time
-``startSession`` is called. As an implementation optimization drivers MUST reuse
-``ServerSession`` instances across multiple ``ClientSession`` instances subject
-to the rule that a server session MUST NOT be used by two ``ClientSession``
-instances at the same time (see the Server Session Pool section). Additionally,
-a ``ClientSession`` may only ever be associated with one ``ServerSession`` for
-its lifetime.
-
-Drivers MUST NOT check for session support in `startSession`. Instead, if sessions
-are not supported, the error MUST be reported the first time the session is used
-for an operation (See `How to Tell Whether a Connection Supports Sessions`_).
-
-Explicit vs implicit sessions
------------------------------
-
-An explicit session is one started explicitly by the application by calling
-``startSession``. An implicit session is one started implicitly by the driver
-because the application called an operation without providing an explicit
-session. Internally, a driver must be able to distinguish between explicit and
-implicit sessions, but no public API for this is necessary because an
-application will never see an implicit session.
-
-The motivation for starting an implicit session for all methods that don't
-take an explicit session parameter is to make sure that all commands that are
-sent to the server are tagged with a session ID. This improves the ability of
-an operations team to monitor (and kill if necessary) long running operations.
-Tagging an operation with a session ID is especially useful if a deployment-wide
-operation needs to be killed.
-
-Authentication
---------------
-
-When using authentication, using a session requires that only a single user be
-authenticated. Drivers that still support authenticating multiple users at once
-MAY continue to do so, but MUST NOT allow sessions to be used under such
-circumstances.
-
-If ``startSession`` is called when multiple users are authenticated, drivers MUST
-raise an error with the error message "Cannot call startSession when multiple
-users are authenticated."
-
-If a driver allows authentication to be changed on the fly (presumably few
-still do) the driver MUST either prevent ``ClientSession`` instances from being used with a
-connection that doesn't have matching authentication or MUST return an error if
-such use is attempted.
-
-ClientSession
-=============
-
-``ClientSession`` instances are not thread safe or fork safe. They can only be
-used by one thread or process at a time.
-
-Drivers MUST document the thread-safety and fork-safety limitations of sessions.
-Drivers MUST NOT attempt to detect simultaneous use by multiple threads or
-processes (see Q&A for the rationale).
-
-ClientSession interface summary:
-
-.. code:: java
-
- interface ClientSession {
- MongoClient client;
- Optional<BsonDocument> clusterTime;
- SessionOptions options;
- BsonDocument sessionId;
-
- void advanceClusterTime(BsonDocument clusterTime);
- void endSession();
- }
-
-While it is not part of the public API, a ``ClientSession`` also has a private
-(or internal) reference to a ``ServerSession``.
-
-Each member is documented below.
-
-client
-------
-
-This property returns the ``MongoClient`` that was used to start this
-``ClientSession``.
-
-clusterTime
------------
-
-This property returns the most recent cluster time seen by this session. If no
-operations have been executed using this session this value will be null unless
-``advanceClusterTime`` has been called. This value will also be null when a
-cluster does not report cluster times.
-
-When a driver is gossiping the cluster time it should send the more recent
-``clusterTime`` of the ``ClientSession`` and the ``MongoClient``.
-
-options
--------
-
-This property returns the ``SessionOptions`` that were used to start this
-``ClientSession``.
-
-sessionId
----------
-
-This property returns the session ID of this session. Note that since ``ServerSessions``
-are pooled, different ``ClientSession`` instances can have the same session ID,
-but never at the same time.
-
-advanceClusterTime
-------------------
-
-This method advances the ``clusterTime`` for a session. If the new
-``clusterTime`` is greater than the session's current ``clusterTime`` then the
-session's ``clusterTime`` MUST be advanced to the new ``clusterTime``. If the
-new ``clusterTime`` is less than or equal to the session's current
-``clusterTime`` then the session's ``clusterTime`` MUST NOT be changed.
-
-This method MUST NOT advance the ``clusterTime`` in ``MongoClient`` because we
-have no way of verifying that the supplied ``clusterTime`` is valid. If the
-``clusterTime`` in ``MongoClient`` were set to an invalid value all future
-operations with this ``MongoClient`` would result in the server returning an
-error. The ``clusterTime`` in ``MongoClient`` should only be advanced with a
-``$clusterTime`` received directly from a server.
-
-endSession
-----------
-
-This method ends a ``ClientSession``.
-
-In languages that have idiomatic ways of disposing of resources, drivers SHOULD
-support that in addition to or instead of ``endSession``. For example, in the .NET
-driver ``ClientSession`` would implement ``IDisposable`` and the application could
-choose to call ``session.Dispose`` or put the session in a using statement instead
-of calling ``session.endSession``. If your language has an idiomatic way of
-disposing resources you MAY choose to implement that in addition to or instead
-of ``endSession``, whichever is more idiomatic for your language.
-
-A driver MUST allow multiple calls to ``endSession`` (or ``Dispose``). All calls after
-the first one are ignored.
-
-Conceptually, calling ``endSession`` implies ending the corresponding server
-session (by calling the ``endSessions`` command). As an implementation detail
-drivers SHOULD cache server sessions for reuse (see Server Session Pool).
-
-Once a ``ClientSession`` has ended, drivers MUST report an error if any operations
-are attempted with that ``ClientSession``.
-
-ServerSession
-=============
-
-A ``ServerSession`` is the driver object that tracks a server session. This object
-is an implementation detail and does not need to be public. Drivers may store
-this information however they choose; this data structure is defined here
-merely to describe the operation of the server session pool.
-
-ServerSession interface summary:
-
-.. code:: java
-
- interface ServerSession {
- BsonDocument sessionId;
- DateTime lastUse;
- }
-
-sessionId
----------
-
-This property returns the server session ID.
-
-lastUse
--------
-
-The driver MUST update the value of this property with the current DateTime
-every time the server session ID is sent to the server. This allows the driver
-to track with reasonable accuracy the server's view of when a server session
-was last used.
-
-Creating a ServerSession
-------------------------
-
-When a driver needs to create a new ``ServerSession`` instance the only information
-it needs is the session ID to use for the new session. It can either get the
-session ID from the server by running the ``startSession`` command, or it can
-generate it locally.
-
-In either case, the ``lastUse`` field of the ``ServerSession`` MUST be set to the
-current time when the ``ServerSession`` is created.
-
-Generating a session ID locally
--------------------------------
-
-Running the ``startSession`` command to get a session ID for a new session requires
-a round trip to the server. As an optimization the server allows drivers to
-generate new session IDs locally and to just start using them. When a server
-sees a new session ID that it has never seen before it simply assumes that it
-is a new session.
-
-A session ID is a ``BsonDocument`` that has the following form:
-
-.. code:: typescript
-
- interface SessionId {
- id: UUID
- }
-
-Where the UUID is encoded as a BSON binary value of subtype 4.
-
-The id field of the session ID is a version 4 UUID that must comply with the
-format described in RFC 4122. Section 4.4 describes an algorithm for generating
-correctly-versioned UUIDs from a pseudo-random number generator.
-
-If a driver is unable to generate a version 4 UUID it MAY instead run the
-``startSession`` command and let the server generate the session ID.
-
-MongoDatabase changes
-=====================
-
-All ``MongoDatabase`` methods that talk to the server MUST send a session ID
-with the command when connected to a deployment that supports sessions so that
-the server can associate the operation with a session ID.
-
-New database methods that take an explicit session
---------------------------------------------------
-
-All ``MongoDatabase`` methods that talk to the server SHOULD be overloaded to
-take an explicit session parameter. (See `why is session an explicit parameter?`_.)
-
-When overloading methods to take a session parameter, the session parameter
-SHOULD be the first parameter. If overloading is not possible for your
-language, it MAY be in a different position or MAY be embedded in an options
-structure.
-
-Methods that have a session parameter MUST check that the session argument is
-not null and was created by the same ``MongoClient`` that this ``MongoDatabase`` came
-from and report an error if they do not match.
-
-Existing database methods that start an implicit session
---------------------------------------------------------
-
-When an existing ``MongoDatabase`` method that does not take a session is called,
-the driver MUST behave as if a new ``ClientSession`` was started just for this one
-operation and ended immediately after this operation completes. The actual
-implementation will likely involve calling ``client.startSession``, but that is not
-required by this spec. Regardless, please consult the startSession section to
-replicate the required steps for creating a session.
-The driver MUST NOT use the session if the checked out connection does not support sessions
-(see `How to Tell Whether a Connection Supports Sessions`_) and, in all cases, MUST NOT consume a server
-session id until after the connection is checked out and session support is confirmed.
-
-MongoCollection changes
-=======================
-
-All ``MongoCollection`` methods that talk to the server MUST send a session ID
-with the command when connected to a deployment that supports sessions so that
-the server can associate the operation with a session ID.
-
-New collection methods that take an explicit session
-----------------------------------------------------
-
-All ``MongoCollection`` methods that talk to the server, with the exception of
-``estimatedDocumentCount``, SHOULD be overloaded to take an explicit session
-parameter. (See `why is session an explicit parameter?`_.)
-
-When overloading methods to take a session parameter, the session parameter
-SHOULD be the first parameter. If overloading is not possible for your
-language, it MAY be in a different position or MAY be embedded in an options
-structure.
-
-Methods that have a session parameter MUST check that the session argument is
-not null and was created by the same ``MongoClient`` that this ``MongoCollection`` came
-from and report an error if they do not match.
-
-The ``estimatedDocumentCount`` helper does not support an explicit session
-parameter. The underlying command, ``count``, is not supported in a transaction,
-so supporting an explicit session would likely confuse application developers.
-The helper returns an estimate of the documents in a collection and
-causal consistency is unlikely to improve the accuracy of the estimate.
-
-Existing collection methods that start an implicit session
-----------------------------------------------------------
-
-When an existing ``MongoCollection`` method that does not take a session is called,
-the driver MUST behave as if a new ``ClientSession`` was started just for this one
-operation and ended immediately after this operation completes. The actual
-implementation will likely involve calling ``client.startSession``, but that is not
-required by this spec. Regardless, please consult the startSession section to
-replicate the required steps for creating a session.
-The driver MUST NOT use the session if the checked out connection does not support sessions
-(see `How to Tell Whether a Connection Supports Sessions`_) and, in all cases, MUST NOT consume a server
-session id until after the connection is checked out and session support is confirmed.
-
-Sessions and Cursors
-====================
-
-When an operation using a session returns a cursor, all subsequent ``GETMORE``
-commands for that cursor MUST be run using the same session ID.
-
-If a driver decides to run a ``KILLCURSORS`` command on the cursor, it MAY also
-run that command using the same session ID. See the Exceptions below for when it is
-permissible to omit the session ID from a ``KILLCURSORS`` command.
-
-Sessions and Connections
-========================
-
-To reduce the number of ``ServerSessions`` created, the driver MUST only obtain an implicit session's
-``ServerSession`` after it successfully checks out a connection.
-A driver SHOULD NOT attempt to release the acquired session before connection check in.
-
-Explicit sessions MAY be changed to allocate a server session similarly.
-
-How to Tell Whether a Connection Supports Sessions
-===================================================
-
-A driver can determine whether a connection supports sessions by checking whether
-the ``logicalSessionTimeoutMinutes`` property of the establishing handshake response has
-a value or not. If it has a value, sessions are supported.
-
-In the case of an explicit session, if sessions are not supported, the driver MUST raise an error.
-In the case of an implicit session, if sessions are not supported, the driver MUST ignore the session.
-
-Possible race condition when checking for session support
----------------------------------------------------------
-
-There is a possible race condition that can happen between the time the
-driver checks whether sessions are supported and subsequently sends a command
-to the server:
-
-* The server might have supported sessions at the time the connection was first
- opened (and reported a value for ``logicalSessionTimeoutMinutes`` in the initial
- response to the `handshake `_),
- but have subsequently been downgraded to not support sessions. The server does
- not close the socket in this scenario, so the driver will conclude that
- the server at the other end of this connection supports sessions.
-
-There is nothing that the driver can do about this race condition, and the server
-will just return an error in this scenario.
-
-Sending the session ID to the server on all commands
-====================================================
-
-When connected to a server that supports sessions a driver MUST append the
-session ID to every command it sends to the server (with the exceptions noted
-in the following section). It does this by adding a
-top level ``lsid`` field to the command sent to the server. A driver MUST do this
-without modifying any data supplied by the application (e.g. the command
-document passed to ``runCommand``):
-
-.. code:: typescript
-
- interface ExampleCommandWithLSID {
- foo: 1;
- lsid: SessionId;
- }
-
-Exceptions to sending the session ID to the server on all commands
-==================================================================
-
-There are some exceptions to the rule that a driver MUST append the session ID to
-every command it sends to the server.
-
-When opening and authenticating a connection
---------------------------------------------
-
-A driver MUST NOT append a session ID to any command sent during the process of
-opening and authenticating a connection.
-
-When monitoring the state of a deployment
------------------------------------------
-
-A driver MAY omit a session ID in hello and legacy hello commands sent solely
-for the purposes of monitoring the state of a deployment.
-
-When sending a parallelCollectionScan command
----------------------------------------------
-
-Sessions are designed for sequential operations and ``parallelCollectionScan``
-is designed for parallel operation. Because these are fundamentally
-incompatible goals, drivers MUST NOT append a session ID to the
-``parallelCollectionScan`` command so that the resulting cursors have
-no associated session ID and thus can be used in parallel.
-
-When sending a killCursors command
-----------------------------------
-
-A driver MAY omit a session ID in ``killCursors`` commands for two reasons.
-First, ``killCursors`` is only ever sent to a particular server, so operations teams
-wouldn't need the ``lsid`` for a cluster-wide killOp. An admin can manually kill the op with
-its operation ID if it is slow. Second, some drivers have a background
-cursor reaper to kill cursors that aren't exhausted and closed. Due to GC semantics,
-the reaper can't use the same ``lsid`` for ``killCursors`` as was used for a cursor's ``find`` and ``getMore``,
-so there's no point in using any ``lsid`` at all.
-
-When multiple users are authenticated and the session is implicit
------------------------------------------------------------------
-
-The driver MUST NOT send a session ID from an implicit session when multiple
-users are authenticated. If possible the driver MUST NOT start an implicit
-session when multiple users are authenticated. Alternatively, if the driver
-cannot determine whether multiple users are authenticated at the point in time
-that an implicit session is started, then the driver MUST ignore any implicit
-sessions that subsequently end up being used on a connection that has multiple
-users authenticated.
-
-When using unacknowledged writes
---------------------------------
-
-A session ID MUST NOT be used simultaneously by more than one operation. Since
-drivers don't wait for a response for an unacknowledged write a driver would
-not know when the session ID could be reused. In theory a driver could use a
-new session ID for each unacknowledged write, but that would result in many
-orphaned sessions building up at the server.
-
-Therefore drivers MUST NOT send a session ID with unacknowledged writes under
-any circumstances:
-
-* For unacknowledged writes with an explicit session, drivers SHOULD raise an
- error. If a driver allows users to provide an explicit session with an
- unacknowledged write (e.g. for backwards compatibility), the driver MUST NOT
- send the session ID.
-
-* For unacknowledged writes without an explicit session, drivers SHOULD NOT use
- an implicit session. If a driver creates an implicit session for
- unacknowledged writes without an explicit session, the driver MUST NOT send
- the session ID.
-
-Drivers MUST document the behavior of unacknowledged writes for both explicit
-and implicit sessions.
-
-When wrapping commands in a ``$query`` field
---------------------------------------------
-
-If the driver is wrapping the command in a ``$query`` field for non-OP_MSG messages in order to pass a readPreference to a
-mongos (see `ReadPreference and Mongos <./find_getmore_killcursors_commands.rst#readpreference-and-mongos>`_),
-the driver SHOULD NOT add the ``lsid`` as a top-level field, and MUST add the ``lsid`` as a field of the ``$query`` document:
-
-.. code:: typescript
-
- // Wrapped command:
- interface WrappedCommandExample {
- $query: {
- find: { foo: 1 }
- },
- $readPreference: {}
- }
-
- // Correct application of lsid
- interface CorrectLSIDUsageExample {
- $query: {
- find: { foo: 1 },
- lsid: SessionId
- },
- $readPreference: {}
- }
-
- // Incorrect application of lsid
- interface IncorrectLSIDUsageExample {
- $query: {
- find: { foo: 1 }
- },
- $readPreference: {},
- lsid: SessionId
- }
-
-
-Server Commands
-===============
-
-startSession
-------------
-
-The ``startSession`` server command has the following format:
-
-.. code:: typescript
-
- interface StartSessionCommand {
- startSession: 1;
- $clusterTime?: ClusterTime;
- }
-
-The ``$clusterTime`` field should only be sent when gossipping the cluster time. See the
-section "Gossipping the cluster time" for information on ``$clusterTime``.
-
-The ``startSession`` command MUST be sent to the ``admin`` database.
-
-The server response has the following format:
-
-.. code:: typescript
-
- interface StartSessionResponse {
- ok: 1;
- id: BsonDocument;
- }
-
-In case of an error, the server response has the following format:
-
-.. code:: typescript
-
- interface StartSessionError {
- ok: 0;
- errmsg: string;
- code: number;
- }
-
-When connected to a replica set the ``startSession`` command MUST be sent to the
-primary if the primary is available. The ``startSession`` command MAY be sent to a
-secondary if there is no primary available at the time the ``startSession`` command
-needs to be run.
-
-Drivers SHOULD generate session IDs locally if possible instead of running the
-``startSession`` command, since running the command requires a network round trip.
-
-endSessions
------------
-
-The ``endSessions`` server command has the following format:
-
-.. code:: typescript
-
- interface EndSessionCommand {
- endSessions: Array<SessionId>;
- $clusterTime?: ClusterTime;
- }
-
-The ``$clusterTime`` field should only be sent when gossipping the cluster time. See the
-section "Gossipping the cluster time" for information on ``$clusterTime``.
-
-The ``endSessions`` command MUST be sent to the ``admin`` database.
-
-The server response has the following format:
-
-.. code:: typescript
-
- interface EndSessionResponse {
- ok: 1;
- }
-
-In case of an error, the server response has the following format:
-
-.. code:: typescript
-
- interface EndSessionError {
- ok: 0;
- errmsg: string;
- code: number;
- }
-
-Drivers MUST ignore any errors returned by the ``endSessions`` command.
-
-The ``endSessions`` command MUST be run once when the ``MongoClient`` instance is shut down.
-If the number of sessions is very large the ``endSessions`` command SHOULD be run
-multiple times to end 10,000 sessions at a time (in order to avoid creating excessively large commands).
-
-When connected to a sharded cluster the ``endSessions`` command can be sent to any
-mongos. When connected to a replica set the ``endSessions`` command MUST be sent to
-the primary if the primary is available, otherwise it MUST be sent to any
-available secondary.
-
-Server Session Pool
-===================
-
-Conceptually, each ``ClientSession`` can be thought of as having a new
-corresponding ``ServerSession``. However, starting a server session might require a
-round trip to the server (which can be avoided by generating the session ID
-locally) and ending a session requires a separate round trip to the server.
-Drivers can operate more efficiently and put less load on the server if they
-cache ``ServerSession`` instances for reuse. To this end drivers MUST
-implement a server session pool containing ``ServerSession`` instances
-available for reuse. A ``ServerSession`` pool MUST belong to a ``MongoClient``
-instance and have the same lifetime as the ``MongoClient`` instance.
-
-When a new implicit ``ClientSession`` is started it MUST NOT attempt to acquire a server
-session from the server session pool immediately. When a new explicit ``ClientSession`` is started
-it MAY attempt to acquire a server session from the server session pool immediately.
-See the algorithm below for the steps to follow when attempting to acquire a ``ServerSession`` from the server session pool.
-
-Note that ``ServerSession`` instances acquired from the server session pool might have as
-little as one minute left before becoming stale and being discarded server
-side. Drivers MUST document that if an application waits more than one minute
-after calling ``startSession`` to perform operations with that session it risks
-getting errors due to the server session going stale before it was used.
-
-A server session is considered stale by the server when it has not been used
-for a certain amount of time. The default amount of time is 30 minutes, but
-this value is configurable on the server. Servers that support sessions will
-report this value in the ``logicalSessionTimeoutMinutes`` field of the reply
-to the hello and legacy hello commands. The smallest reported timeout is recorded in the
-``logicalSessionTimeoutMinutes`` property of the ``TopologyDescription``. See the
-Server Discovery And Monitoring specification for details.
-
-When a ``ClientSession`` is ended it MUST return the server session to the server session pool.
-See the algorithm below for the steps to follow when returning a ``ServerSession`` instance to the server
-session pool.
-
-The server session pool has no maximum size. The pool only shrinks when a
-server session is acquired for use or discarded.
-
-When a ``MongoClient`` instance is closed the driver MUST proactively inform the
-server that the pooled server sessions will no longer be used by sending one or
-more ``endSessions`` commands to the server.
-
-The server session pool is modeled as a double ended queue. The algorithms
-below require the ability to add and remove ``ServerSession`` instances from the front of
-the queue and to inspect and possibly remove ``ServerSession`` instances from the back of
-the queue. The front of the queue holds ``ServerSession`` instances that have been released
-recently and should be the first to be reused. The back of the queue holds
-``ServerSession`` instances that have not been used recently and that potentially will be
-discarded if they are not used again before they expire.
-
-An implicit session MUST be returned to the pool immediately following the completion of
-an operation. When an implicit session is associated with a cursor for use with ``getMore``
-operations, the session MUST be returned to the pool immediately following a ``getMore``
-operation that indicates that the cursor has been exhausted. In particular, it MUST NOT wait
-until all documents have been iterated by the application or until the application disposes
-of the cursor. For language runtimes that provide the ability to attach finalizers to objects
-that are run prior to garbage collection, the cursor class SHOULD return an implicit session
-to the pool in the finalizer if the cursor has not already been exhausted.
-
-If a driver supports process forking, the session pool needs to be cleared on
-one side of the forked processes (just like sockets need to reconnect).
-Drivers MUST provide a way to clear the session pool without sending
-``endSessions``. Drivers MAY make this automatic when the process ID changes.
-If they do not, they MUST document how to clear the session pool wherever they
-document fork support. After clearing the session pool in this way, drivers
-MUST ensure that sessions already checked out are not returned to the new pool.
-
-If a driver has a server session pool and a network error is encountered when
-executing any command with a ``ClientSession``, the driver MUST mark the
-associated ``ServerSession`` as dirty. Dirty server sessions are discarded
-when returned to the server session pool. It is valid for a dirty session to be
-used for subsequent commands (e.g. an implicit retry attempt, a later command
-in a bulk write, or a later operation on an explicit session); however, it MUST
-remain dirty for the remainder of its lifetime regardless of whether later commands
-succeed.
-
-Algorithm to acquire a ServerSession instance from the server session pool
---------------------------------------------------------------------------
-
-1. If the server session pool is empty, create a new ``ServerSession`` and use it.
-
-2. Otherwise remove a ``ServerSession`` from the front of the queue and examine it:
-
- * If the driver is in load balancer mode, use this ``ServerSession``.
- * If it has at least one minute left before becoming stale, use this ``ServerSession``.
- * If it has less than one minute left before becoming stale, discard it (let it be garbage collected) and return to step 1.
-
-See the `Load Balancer Specification <../load-balancers/load-balancers.md#session-expiration>`__
-for details on session expiration.
-
-
-Algorithm to return a ServerSession instance to the server session pool
------------------------------------------------------------------------
-
-1. Before returning a server session to the pool a driver MUST first check the
- server session pool for server sessions at the back of the queue that are about
- to expire (meaning they will expire in less than one minute). A driver MUST
- stop checking server sessions once it encounters a server session that is not
- about to expire. Any server sessions found that are about to expire are removed
- from the end of the queue and discarded (or allowed to be garbage collected).
-
-2. Then examine the server session that is being returned to the pool and:
-
- * If this session is marked dirty (i.e. it was involved in a network error),
- discard it (let it be garbage collected).
- * If it will expire in less than one minute, discard it
- (let it be garbage collected).
- * If it won't expire for at least one minute, add it to the front of the queue.
-
-Gossipping the cluster time
-===========================
-
-Drivers MUST gossip the cluster time when connected to a deployment that uses
-cluster times.
-
-Gossipping the cluster time is a process in which the driver participates in
-distributing the logical cluster time in a deployment. Drivers learn the
-current cluster time (from a particular server's perspective) in responses
-they receive from servers. Drivers in turn forward the highest cluster
-time they have seen so far to any server they subsequently send commands
-to.
-
-A driver detects that it MUST participate in gossipping the cluster time when it sees
-a ``$clusterTime`` in a response received from a server.
-
-Receiving the current cluster time
-----------------------------------
-
-Drivers MUST examine all responses to server
-commands to see if they contain a top level field named ``$clusterTime`` formatted
-as follows:
-
-.. code:: typescript
-
- interface ClusterTime {
- clusterTime: Timestamp;
- signature: {
- hash: Binary;
- keyId: Int64;
- };
- }
-
- interface AnyServerResponse {
- // ... other properties ...
- $clusterTime: ClusterTime;
- }
-
-Whenever a driver receives a cluster time from a server it MUST compare it to
-the current highest seen cluster time for the deployment. If the new cluster time
-is higher than the highest seen cluster time it MUST become the new highest
-seen cluster time. Two cluster times are compared using only the BsonTimestamp
-value of the ``clusterTime`` embedded field (be sure to include both the timestamp
-and the increment of the BsonTimestamp in the comparison). The signature field
-does not participate in the comparison.
-
-Sending the highest seen cluster time
--------------------------------------
-
-Whenever a driver sends a command to a server it MUST include the highest
-seen cluster time in a top level field called ``$clusterTime``, in the same format
-as it was received in (but see Gossipping with mixed server versions below).
-
-How to compute the $clusterTime to send to a server
----------------------------------------------------
-
-When sending ``$clusterTime`` to the server the driver MUST send the greater of
-the ``clusterTime`` values from ``MongoClient`` and ``ClientSession``. Normally
-a session's ``clusterTime`` will be less than or equal to the ``clusterTime``
-in ``MongoClient``, but it could be greater than the ``clusterTime`` in
-``MongoClient`` if ``advanceClusterTime`` was called with a ``clusterTime``
-that came from somewhere else.
-
-A driver MUST NOT use the ``clusterTime`` of a ``ClientSession`` anywhere else
-except when executing an operation with this session. This rule protects the
-driver from the scenario where ``advanceClusterTime`` was called with an
-invalid ``clusterTime`` by limiting the resulting server errors to the one
-session. The ``clusterTime`` of a ``MongoClient`` MUST NOT be advanced by any
-``clusterTime`` other than a ``$clusterTime`` received directly from a server.
-
-The safe way to compute the ``$clusterTime`` to send to a server is:
-
-1. When the ``ClientSession`` is first started its ``clusterTime`` is set to
-null.
-
-2. When the driver sends ``$clusterTime`` to the server it should send the
-greater of the ``ClientSession`` ``clusterTime`` and the ``MongoClient``
-``clusterTime`` (either one could be null).
-
-3. When the driver receives a ``$clusterTime`` from the server it should advance
-both the ``ClientSession`` and the ``MongoClient`` ``clusterTime``. The ``clusterTime``
-of a ``ClientSession`` can also be advanced by calling ``advanceClusterTime``.
-
-This sequence ensures that if the ``clusterTime`` of a ``ClientSession`` is invalid only that
-one session will be affected. The ``MongoClient`` ``clusterTime`` is only
-updated with ``$clusterTime`` values known to be valid because they were
-received directly from a server.
-
-Tracking the highest seen cluster time does not require checking the deployment topology or the server version
---------------------------------------------------------------------------------------------------------------
-
-Drivers do not need to check the deployment topology or the server version they
-are connected to in order to track the highest seen ``$clusterTime``. They simply
-need to check for the presence of the ``$clusterTime`` field in responses received
-from servers.
-
-Gossipping with mixed server versions
--------------------------------------
-
-Drivers MUST check that the server they are sending a command to supports
-``$clusterTime`` before adding ``$clusterTime`` to the command. A server supports
-``$clusterTime`` when the ``maxWireVersion`` >= 6.
-
-This supports the (presumably short lived) scenario where not all servers have
-been upgraded to 3.6.
-
-Test Plan
-=========
-
-See the `README `_ for tests.
-
-Motivation
-==========
-
-Drivers currently have no concept of a session. The driver API needs to be
-extended to support sessions.
-
-Design Rationale
-================
-
-The goal is to modify the driver API in such a way that existing programs that
-don't use sessions continue to compile and run correctly. This goal is met by
-defining new methods (or overloads) that take a session parameter. An
-application does not need to be modified unless it wants to take advantage of
-the new features supported by sessions.
-
-Backwards Compatibility
-=======================
-
-The API changes to support sessions extend the existing API but do not
-introduce any backward breaking changes. Existing programs that don't use
-sessions continue to compile and run correctly.
-
-Reference Implementation (always required)
-==========================================
-
-A reference implementation must be completed before any spec is given status
-"Final", but it need not be completed before the spec is “Accepted”. While
-there is merit to the approach of reaching consensus on the specification and
-rationale before writing code, the principle of "rough consensus and running
-code" is still useful when it comes to resolving many discussions of spec
-details. A final reference implementation must include test code and
-documentation.
-
-The C and C# drivers will do initial POC implementations.
-
-Future work (optional)
-======================
-
-Use this section to discuss any possible work for a future spec. This could
-cover issues where no consensus could be reached but that don’t block this
-spec, changes that were rejected due to unclear use cases, etc.
-
-Open questions
-==============
-
-Q&A
-===
-
-Why do we say drivers MUST NOT attempt to detect unsafe multi-threaded or multi-process use of ``ClientSession``?
------------------------------------------------------------------------------------------------------------------
-
-Because doing so would provide an illusion of safety. It doesn't make these
-instances thread safe. And even if when testing an application no such exceptions
-are encountered, that doesn't prove anything. The application might still be
-using the instances in a thread-unsafe way and just didn't happen to do so during
-a test run. The final argument is that checking this would require overhead
-that doesn't provide any clear benefit.
-
-Why is session an explicit parameter?
--------------------------------------
-
-A previous draft proposed that ClientSession would be a MongoClient-like object added to the object hierarchy::
-
- session = client.startSession(...)
- database = session.getDatabase(...) // database is associated with session
- collection = database.getCollection(...) // collection is associated with session
- // operations on collection implicitly use session
- collection.insertOne({})
- session.endSession()
-
-The central feature of this design is that a MongoCollection (or database, or perhaps a GridFS object) is associated with a session, which is then an implied parameter to any operations executed using that MongoCollection.
-
-This API was rejected, with the justification that a ClientSession does not naturally belong to the state of a MongoCollection. MongoCollection has up to now been a stable long-lived object that could be widely shared, and in most drivers it is thread safe. Once we associate a ClientSession with it, the MongoCollection object becomes short-lived and is no longer thread safe. It is a bad sign that MongoCollection's thread safety and lifetime vary depending on how its parent MongoDatabase is created.
-
-Instead, we require users to pass session as a parameter to each function::
-
- session = client.startSession(...)
- database = client.getDatabase(...)
- collection = database.getCollection(...)
- // users must explicitly pass session to operations
- collection.insertOne(session, {})
- session.endSession()
-
-Why does a network error cause the ``ServerSession`` to be discarded from the pool?
------------------------------------------------------------------------------------
-
-When a network error is encountered when executing an operation with a
-``ClientSession``, the operation may be left running on the server. Re-using
-this ``ServerSession`` can lead to parallel operations which violates the
-rule that a session must be used sequentially. This results in multiple
-problems:
-
-#. killSessions to end an earlier operation would surprisingly also end a
- later operation.
-#. An otherwise unrelated operation that just happens to use that same server
- session will potentially block waiting for the previous operation to
- complete. For example, a transactional write will block a subsequent
- transactional write.
-
-Why do automatic retry attempts re-use a dirty implicit session?
-----------------------------------------------------------------
-
-The retryable writes spec requires that both the original and retry attempt
-use the same server session. The server will block the retry attempt until the
-initial attempt completes at which point the retry attempt will continue
-executing.
-
-For retryable reads that use an implicit session, drivers could choose to use a
-new server session for the retry attempt however this would lose the
-information that these two reads are related.
-
-Why don't drivers run the endSessions command to cleanup dirty server sessions?
--------------------------------------------------------------------------------
-
-Drivers do not run the endSessions command when discarding a dirty server
-session because disconnects should be relatively rare and the server won't
-normally accumulate a large number of abandoned dirty sessions. Any abandoned
-sessions will be automatically cleaned up by the server after the
-configured ``logicalSessionTimeoutMinutes``.
-
-
-Why must drivers wait to consume a server session until after a connection is checked out?
-------------------------------------------------------------------------------------------
-
-The problem that may occur is when the number of concurrent application requests are larger than the number of available connections,
-the driver may generate many more implicit sessions than connections.
-For example with maxPoolSize=1 and 100 threads, 100 implicit sessions may be created.
-This increases the load on the server since session state is cached in memory.
-In the worst case this kind of workload can hit the session limit and trigger TooManyLogicalSessions.
-
-In order to address this, drivers MUST NOT consume a server session id until after the connection is checked out.
-This change will limit the number of "in use" server sessions to no greater than an application's maxPoolSize.
-
-The language here is specific about obtaining a server session as opposed to creating the implicit session
-to permit drivers to take an implementation approach where the implicit session creation logic largely remains unchanged.
-Implicit session creation can be left as is, as long as the underlying server resource isn't allocated until it
-is needed and, known it will be used, after connection checkout succeeds.
-
-It is still possible that via explicit sessions or cursors, which hold on to the session they started with, a driver could over allocate sessions.
-But those scenarios are extenuating and outside the scope of solving in this spec.
-
-Why should drivers NOT attempt to release a serverSession before checking back in the operation's connection?
--------------------------------------------------------------------------------------------------------------
-
-There are a variety of cases, such as retryable operations or cursor creating operations,
-where a ``serverSession`` must remain acquired by the ``ClientSession`` after an operation is attempted.
-Attempting to account for all these scenarios has risks that do not justify the potential guaranteed ``ServerSession`` allocation limiting.
-
-Changelog
-=========
-
-:2017-09-13: If causalConsistency option is omitted assume true
-:2017-09-16: Omit session ID when opening and authenticating a connection
-:2017-09-18: Drivers MUST gossip the cluster time when they see a $clusterTime
-:2017-09-19: How to safely use initialClusterTime
-:2017-09-29: Add an exception to the rule that ``KILLCURSORS`` commands always require a session id
-:2017-10-03: startSession and endSessions commands MUST be sent to the admin database
-:2017-10-03: Fix format of endSessions command
-:2017-10-04: Added advanceClusterTime
-:2017-10-06: Added descriptions of explicit and implicit sessions
-:2017-10-17: Implicit sessions MUST NOT be used when multiple users authenticated
-:2017-10-19: Possible race conditions when checking whether a deployment supports sessions
-:2017-11-21: Drivers MUST NOT send a session ID for unacknowledged writes
-:2018-01-10: Note that MongoClient must retain highest clusterTime
-:2018-01-10: Update test plan for drivers without APM
-:2018-01-11: Clarify that sessions require replica sets or sharded clusters
-:2018-02-20: Add implicit/explicit session tests
-:2018-02-20: Drivers SHOULD error if unacknowledged writes are used with sessions
-:2018-05-23: Drivers MUST not use session ID with parallelCollectionScan
-:2018-06-07: Document that estimatedDocumentCount does not support explicit sessions
-:2018-07-19: Justify why session must be an explicit parameter to each function
-:2018-10-11: Session pools must be cleared in child process after fork
-:2019-05-15: A ServerSession that is involved in a network error MUST be discarded
-:2019-10-22: Drivers may defer checking if a deployment supports sessions until the first
-:2021-04-08: Updated to use hello and legacy hello
-:2021-04-08: Adding in behaviour for load balancer mode.
-:2020-05-26: Simplify logic for determining sessions support
-:2022-01-28: Implicit sessions MUST obtain server session after connection checkout succeeds
-:2022-03-24: ServerSession Pooling is required and clarifies session acquisition bounding
-:2022-06-13: Move prose tests to test README and apply new ordering
-:2022-10-05: Remove spec front matter
-:2023-02-24: Defer checking for session support until after connection checkout
+.. note::
+ This specification has been converted to Markdown and renamed to
+ `driver-sessions.md `_.
diff --git a/source/sessions/snapshot-sessions.md b/source/sessions/snapshot-sessions.md
new file mode 100644
index 0000000000..c34aa7b89c
--- /dev/null
+++ b/source/sessions/snapshot-sessions.md
@@ -0,0 +1,243 @@
+# Snapshot Reads Specification
+
+- Status: Accepted
+- Minimum Server Version: 5.0
+
+______________________________________________________________________
+
+## Abstract
+
+Version 5.0 of the server introduces support for read concern level "snapshot" (non-speculative) for read commands
+outside of transactions, including on secondaries. This spec builds upon the
+[Sessions Specification](./driver-sessions.md) to define how an application requests "snapshot" level read concern and
+how a driver interacts with the server to implement snapshot reads.
+
+## Definitions
+
+### META
+
+The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
+
+### Terms
+
+**ClientSession**\
+The driver object representing a client session and the operations that can be performed on it.
+
+**MongoClient**\
+The root object of a driver's API. MAY be named differently in some drivers.
+
+**MongoCollection**\
+The driver object representing a collection and the operations that can be performed on it. MAY be
+named differently in some drivers.
+
+**MongoDatabase**\
+The driver object representing a database and the operations that can be performed on it. MAY be
+named differently in some drivers.
+
+**ServerSession**\
+The driver object representing a server session.
+
+**Session**\
+A session is an abstract concept that represents a set of sequential operations executed by an application
+that are related in some way. This specification defines how sessions are used to implement snapshot reads.
+
+**Snapshot reads**\
+Reads with read concern level `snapshot` that occur outside of transactions on both the primary and
+secondary nodes, including in sharded clusters. Snapshot reads are majority-committed reads.
+
+**Snapshot timestamp**\
+The timestamp of the first supported read operation (i.e.
+find/aggregate/distinct) in the session. The server creates a cursor in response to a snapshot find/aggregate command
+and reports `atClusterTime` within the `cursor` field in the response. For the distinct command the server adds a
+top-level `atClusterTime` field to the response. The `atClusterTime` field represents the timestamp of the read and is
+guaranteed to be majority committed.
+
+## Specification
+
+An application requests snapshot reads by creating a `ClientSession` with options that specify that snapshot reads are
+desired. An application then passes the session as an argument to methods in the `MongoDatabase` and `MongoCollection`
+classes. Read operations (find/aggregate/distinct) performed against that session will be read from the same snapshot.
+
+## High level summary of the API changes for snapshot reads
+
+Snapshot reads are built on top of client sessions.
+
+Applications will start a new client session for snapshot reads like this:
+
+```typescript
+options = new SessionOptions(snapshot = true);
+session = client.startSession(options);
+```
+
+All read operations performed using this session will be read from the same snapshot.
+
+If no value is provided for `snapshot` a value of false is implied. There are no MongoDatabase, MongoClient, or
+MongoCollection API changes.
+
+## SessionOptions changes
+
+`SessionOptions` change summary
+
+```typescript
+class SessionOptions {
+ Optional snapshot;
+
+ // other options defined by other specs
+}
+```
+
+In order to support snapshot reads a new property named `snapshot` is added to `SessionOptions`. Applications set
+`snapshot` when starting a client session to indicate whether they want snapshot reads. All read operations performed
+using that client session will share the same snapshot.
+
+Each new member is documented below.
+
+### snapshot
+
+Applications set `snapshot` when starting a session to indicate whether they want snapshot reads.
+
+Note that the `snapshot` property is optional. The default value of this property is false.
+
+Snapshot reads and causal consistency are mutually exclusive. Therefore if `snapshot` is set to true,
+`causalConsistency` must be false. The client MUST throw an error if both `snapshot` and `causalConsistency` are set to
+true. Snapshot reads are supported on both primaries and secondaries.
+
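The mutual exclusion rule for `snapshot` and `causalConsistency` could be enforced when a session is started. The following is a minimal sketch, not any driver's actual implementation: the `SessionOptions` shape, the `validateSessionOptions` helper, and the error message are illustrative assumptions. It only inspects explicitly provided values and treats an omitted `snapshot` as false.

```typescript
// Hypothetical shape of the options; real drivers define their own type.
interface SessionOptions {
  snapshot?: boolean;
  causalConsistency?: boolean;
}

// Throws if the options request both snapshot reads and causal consistency.
function validateSessionOptions(options: SessionOptions): void {
  const snapshot = options.snapshot ?? false; // an omitted value means false
  if (snapshot && options.causalConsistency === true) {
    // The message text is an assumption; the spec only requires an error.
    throw new Error("snapshot and causalConsistency cannot both be true");
  }
}
```

A driver would presumably call such a check inside `startSession` before creating the `ClientSession`.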
+## ClientSession changes
+
+Transactions are not allowed with snapshot sessions. Calling `session.startTransaction(options)` on a snapshot session
+MUST raise an error.
+
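As an illustration of the transaction restriction, a session object can reject `startTransaction` when it was created for snapshot reads. This is a sketch under stated assumptions: the `ClientSession` class below models only the snapshot flag, and the error message is hypothetical.

```typescript
// Hypothetical minimal session type; only the snapshot flag is modeled.
class ClientSession {
  readonly snapshot: boolean;

  constructor(snapshot: boolean) {
    this.snapshot = snapshot;
  }

  startTransaction(): void {
    if (this.snapshot) {
      // The message text is an assumption; the spec only requires an error.
      throw new Error("Transactions are not supported in snapshot sessions");
    }
    // The normal transaction start logic would go here.
  }
}
```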
+## ReadConcern changes
+
+`snapshot` added to [ReadConcernLevel enumeration](../read-write-concern/read-write-concern.rst#read-concern).
+
+## Server Commands
+
+There are no new server commands related to snapshot reads. Instead, snapshot reads are implemented by:
+
+1. Saving the `atClusterTime` returned by 5.0+ servers for the first find/aggregate/distinct operation in a private
+ `snapshotTime` property of the `ClientSession` object. Drivers MUST save `atClusterTime` in the `ClientSession`
+ object.
+2. Passing that `snapshotTime` in the `atClusterTime` field of the `readConcern` field for subsequent snapshot read
+ operations (i.e. find/aggregate/distinct commands).
+
+## Server Command Responses
+
+For find/aggregate commands the server returns `atClusterTime` within the `cursor` field of the response.
+
+```typescript
+{
+ ok : 1 or 0,
+ ... // the rest of the command reply
+ cursor : {
+ ... // the rest of the cursor reply
+ atClusterTime :
+ }
+}
+```
+
+For distinct commands the server returns `atClusterTime` as a top-level field in the response.
+
+```typescript
+{
+ ok : 1 or 0,
+ ... // the rest of the command reply
+ atClusterTime :
+}
+```
+
+The `atClusterTime` timestamp MUST be stored in the `ClientSession` to later be passed as the `atClusterTime` field of
+the `readConcern` with a `snapshot` level in subsequent read operations.
+
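The two response shapes can be handled by one helper that looks in `cursor.atClusterTime` first and then at the top level. The sketch below is illustrative only: `Timestamp`, `CommandReply`, `SnapshotSession`, and `recordSnapshotTime` are assumed names, and the `Timestamp` interface merely stands in for a BSON timestamp type.

```typescript
// Hypothetical stand-in for a BSON timestamp.
interface Timestamp {
  t: number;
  i: number;
}

// Only the reply fields this sketch inspects.
interface CommandReply {
  ok: number;
  atClusterTime?: Timestamp; // distinct
  cursor?: { atClusterTime?: Timestamp }; // find / aggregate
}

interface SnapshotSession {
  snapshot: boolean;
  snapshotTime?: Timestamp;
}

// Saves atClusterTime from the first snapshot read reply; later replies are ignored.
function recordSnapshotTime(session: SnapshotSession, reply: CommandReply): void {
  if (!session.snapshot || session.snapshotTime !== undefined) {
    return;
  }
  const time = reply.cursor?.atClusterTime ?? reply.atClusterTime;
  if (time !== undefined) {
    session.snapshotTime = time;
  }
}
```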
+## Server Errors
+
+1. The server may reply to read commands with a `SnapshotTooOld(239)` error if the client's `atClusterTime` value is not
+ available in the server's history.
+2. The server will return an `InvalidOptions(72)` error if both the `atClusterTime` and `afterClusterTime` options are
+   specified.
+3. The server will return an `InvalidOptions(72)` error if the command does not support readConcern.level "snapshot".
+
+## Snapshot Read Commands
+
+For snapshot reads the driver MUST first obtain `atClusterTime` from the server response to a find/aggregate/distinct
+command sent with a `readConcern` of level `snapshot`, and store it as `snapshotTime` in the `ClientSession`
+object.
+
+```typescript
+{
+ find : , // or other read command
+ ... // the rest of the command parameters
+ readConcern :
+ {
+ level : "snapshot"
+ }
+}
+```
+
+For subsequent reads in the same session, the driver MUST send the `snapshotTime` saved in the `ClientSession` as the
+value of the `atClusterTime` field of the `readConcern` with a `snapshot` level:
+
+```typescript
+{
+ find : , // or other read command
+ ... // the rest of the command parameters
+ readConcern :
+ {
+ level : "snapshot",
+ atClusterTime :
+ }
+}
+```
+
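The two command shapes above differ only in whether `atClusterTime` is present, so the `readConcern` document can be derived from the session's saved `snapshotTime`. This is a minimal sketch; `Timestamp`, `SnapshotReadConcern`, and `snapshotReadConcern` are hypothetical names, and `Timestamp` stands in for a BSON timestamp type.

```typescript
// Hypothetical stand-in for a BSON timestamp.
interface Timestamp {
  t: number;
  i: number;
}

interface SnapshotReadConcern {
  level: "snapshot";
  atClusterTime?: Timestamp;
}

// Returns the readConcern for a command run in a snapshot session.
// Before the first read there is no snapshotTime, so only the level is sent.
function snapshotReadConcern(snapshotTime?: Timestamp): SnapshotReadConcern {
  return snapshotTime === undefined
    ? { level: "snapshot" }
    : { level: "snapshot", atClusterTime: snapshotTime };
}
```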
+List of commands that support snapshot reads:
+
+1. find
+2. aggregate
+3. distinct
+
+## Sending readConcern to the server on all commands
+
+Drivers MUST set the readConcern `level` and `atClusterTime` fields (as outlined above) on all commands in a snapshot
+session, including commands that do not accept a readConcern (e.g. insert, update). This ensures that the server will
+return an error for invalid operations, such as writes, within a session configured for snapshot reads.
+
+## Requires MongoDB 5.0+
+
+Snapshot reads require MongoDB 5.0+. When the connected server's maxWireVersion is less than 13, drivers MUST throw an
+exception with the message "Snapshot reads require MongoDB 5.0 or later".
+
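The version requirement amounts to a single comparison against the server's `maxWireVersion`. The sketch below uses the threshold and message stated above; the constant and function names are illustrative, and where a real driver performs this check is not shown.

```typescript
// Snapshot reads need a server reporting maxWireVersion 13 or higher.
const MIN_SNAPSHOT_WIRE_VERSION = 13;

// Hypothetical helper: throws the client-side error for servers older than 5.0.
function assertSnapshotSupported(maxWireVersion: number): void {
  if (maxWireVersion < MIN_SNAPSHOT_WIRE_VERSION) {
    throw new Error("Snapshot reads require MongoDB 5.0 or later");
  }
}
```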
+## Motivation
+
+This specification adds driver support for snapshot reads, which require server version 5.0 or newer.
+
+## Design Rationale
+
+The goal is to modify the driver API as little as possible so that existing programs that don't need snapshot reads
+don't have to be changed. This goal is met by defining a `SessionOptions` field that applications use to start a
+`ClientSession` that can be used for snapshot reads. An alternative explicit approach of obtaining `atClusterTime` from
+the `cursor` object and passing it to a read concern object was considered initially. A session-based approach was chosen as
+it aligns better with the existing API, and requires minimal API changes. Future extensibility for snapshot reads would
+be best served by a session-based approach, as no API changes will be required.
+
+## Backwards Compatibility
+
+The API changes to support snapshot reads extend the existing API but do not introduce any backward breaking changes.
+Existing programs that don't use snapshot reads continue to compile and run correctly.
+
+## Reference Implementation
+
+The C# driver will provide the reference implementation. The corresponding ticket is
+[CSHARP-3668](https://jira.mongodb.org/browse/CSHARP-3668).
+
+## Q&A
+
+## Changelog
+
+- 2024-05-08: Migrated from reStructuredText to Markdown.
+- 2021-06-15: Initial version.
+- 2021-06-28: Raise client side error on \< 5.0.
+- 2021-06-29: Send readConcern with all snapshot session commands.
+- 2021-07-16: Grammar revisions. Change SHOULD to MUST for startTransaction error to comply with existing tests.
+- 2021-08-09: Updated client-side error spec tests to use correct syntax for `test.expectEvents`
+- 2022-10-05: Remove spec front matter
diff --git a/source/sessions/snapshot-sessions.rst b/source/sessions/snapshot-sessions.rst
index ffa9ceeb94..244a49ce25 100644
--- a/source/sessions/snapshot-sessions.rst
+++ b/source/sessions/snapshot-sessions.rst
@@ -1,287 +1,4 @@
-============================
-Snapshot Reads Specification
-============================
-:Status: Accepted
-:Minimum Server Version: 5.0
-
-.. contents::
-
---------
-
-Abstract
-========
-
-Version 5.0 of the server introduces support for read concern level "snapshot" (non-speculative)
-for read commands outside of transactions, including on secondaries.
-This spec builds upon the `Sessions Specification <../driver-sessions.rst>`_ to define how an application
-requests "snapshot" level read concern and how a driver interacts with the server
-to implement snapshot reads.
-
-Definitions
-===========
-
-META
-----
-
-The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
-“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be
-interpreted as described in `RFC 2119 `_.
-
-Terms
------
-
-ClientSession
- The driver object representing a client session and the operations that can be
- performed on it.
-
-MongoClient
- The root object of a driver's API. MAY be named differently in some drivers.
-
-MongoCollection
- The driver object representing a collection and the operations that can be
- performed on it. MAY be named differently in some drivers.
-
-MongoDatabase
- The driver object representing a database and the operations that can be
- performed on it. MAY be named differently in some drivers.
-
-ServerSession
- The driver object representing a server session.
-
-Session
- A session is an abstract concept that represents a set of sequential
- operations executed by an application that are related in some way. This
- specification defines how sessions are used to implement snapshot reads.
-
-Snapshot reads
- Reads with read concern level ``snapshot`` that occur outside of transactions on
- both the primary and secondary nodes, including in sharded clusters.
- Snapshots reads are majority committed reads.
-
-Snapshot timestamp
- Snapshot timestamp, representing timestamp of the first supported read operation (i.e. find/aggregate/distinct) in the session.
- The server creates a cursor in response to a snapshot find/aggregate command and
- reports ``atClusterTime`` within the ``cursor`` field in the response. For the distinct command the server adds a top-level ``atClusterTime`` field to the response.
- The ``atClusterTime`` field represents the timestamp of the read and is guaranteed to be majority committed.
-
-Specification
-=============
-
-An application requests snapshot reads by creating a ``ClientSession``
-with options that specify that snapshot reads are desired. An
-application then passes the session as an argument to methods in the
-``MongoDatabase`` and ``MongoCollection`` classes. Read operations (find/aggregate/distinct) performed against
-that session will be read from the same snapshot.
-
-High level summary of the API changes for snapshot reads
-========================================================
-
-Snapshot reads are built on top of client sessions.
-
-Applications will start a new client session for snapshot reads like
-this:
-
-.. code:: typescript
-
- options = new SessionOptions(snapshot = true);
- session = client.startSession(options);
-
-All read operations performed using this session will be read from the same snapshot.
-
-If no value is provided for ``snapshot`` a value of false is
-implied.
-There are no MongoDatabase, MongoClient, or MongoCollection API changes.
-
-SessionOptions changes
-======================
-
-``SessionOptions`` change summary
-
-.. code:: typescript
-
- class SessionOptions {
- Optional snapshot;
-
- // other options defined by other specs
- }
-
-In order to support snapshot reads a new property named
-``snapshot`` is added to ``SessionOptions``. Applications set
-``snapshot`` when starting a client session to indicate
-whether they want snapshot reads. All read operations performed
-using that client session will share the same snapshot.
-
-Each new member is documented below.
-
-snapshot
---------
-
-Applications set ``snapshot`` when starting a session to
-indicate whether they want snapshot reads.
-
-Note that the ``snapshot`` property is optional. The default value of
-this property is false.
-
-Snapshot reads and causal consistency are mutually exclusive. Therefore if ``snapshot`` is set to true,
-``causalConsistency`` must be false. Client MUST throw an error if both ``snapshot`` and ``causalConsistency`` are set to true.
-Snapshot reads are supported on both primaries and secondaries.
-
-ClientSession changes
-=====================
-
-Transactions are not allowed with snapshot sessions.
-Calling ``session.startTransaction(options)`` on a snapshot session MUST raise an error.
-
-ReadConcern changes
-===================
-
-``snapshot`` added to `ReadConcernLevel enumeration <../read-write-concern/read-write-concern.rst#read-concern>`_.
-
-Server Commands
-===============
-
-There are no new server commands related to snapshot reads. Instead,
-snapshot reads are implemented by:
-
-1. Saving the ``atClusterTime`` returned by 5.0+ servers for the first find/aggregate/distinct operation in a
- private ``snapshotTime`` property of the ``ClientSession`` object. Drivers MUST save ``atClusterTime``
- in the ``ClientSession`` object.
-
-2. Passing that ``snapshotTime`` in the ``atClusterTime`` field of the ``readConcern`` field
- for subsequent snapshot read operations (i.e. find/aggregate/distinct commands).
-
-Server Command Responses
-========================
-
-For find/aggregate commands the server returns ``atClusterTime`` within the ``cursor``
-field of the response.
-
-.. code:: typescript
-
- {
- ok : 1 or 0,
- ... // the rest of the command reply
- cursor : {
- ... // the rest of the cursor reply
- atClusterTime :
- }
- }
-
-For distinct commands the server returns ``atClusterTime`` as a top-level field in the
-response.
-
-.. code:: typescript
-
- {
- ok : 1 or 0,
- ... // the rest of the command reply
- atClusterTime :
- }
-
-The ``atClusterTime`` timestamp MUST be stored in the ``ClientSession`` to later be passed as the
-``atClusterTime`` field of the ``readConcern`` with a ``snapshot`` level in subsequent read operations.
-
-Server Errors
-=============
-1. The server may reply to read commands with a ``SnapshotTooOld(239)`` error if the client's ``atClusterTime`` value is not available in the server's history.
-2. The server will return ``InvalidOptions(72)`` error if both ``atClusterTime`` and ``afterClusterTime`` options are set to true.
-3. The server will return ``InvalidOptions(72)`` error if the command does not support readConcern.level "snapshot".
-
-Snapshot Read Commands
-======================
-
-For snapshot reads the driver MUST first obtain ``atClusterTime`` from the server response of a find/aggregate/distinct command,
-by specifying ``readConcern`` with ``snapshot`` level field, and store it as ``snapshotTime`` in the
-``ClientSession`` object.
-
-.. code:: typescript
-
- {
- find : , // or other read command
- ... // the rest of the command parameters
- readConcern :
- {
- level : "snapshot"
- }
- }
-
-For subsequent reads in the same session, the driver MUST send the ``snapshotTime`` saved in
-the ``ClientSession`` as the value of the ``atClusterTime`` field of the
-``readConcern`` with a ``snapshot`` level:
-
-.. code:: typescript
-
- {
- find : , // or other read command
- ... // the rest of the command parameters
- readConcern :
- {
- level : "snapshot",
- atClusterTime :
- }
- }
-
-Lists of commands that support snapshot reads:
-
-1. find
-2. aggregate
-3. distinct
-
-Sending readConcern to the server on all commands
-=================================================
-
-Drivers MUST set the readConcern ``level`` and ``atClusterTime`` fields (as
-outlined above) on all commands in a snapshot session, including commands that
-do not accept a readConcern (e.g. insert, update). This ensures that the server
-will return an error for invalid operations, such as writes, within a session
-configured for snapshot reads.
-
-Requires MongoDB 5.0+
-=====================
-
-Snapshot reads require MongoDB 5.0+. When the connected server's
-maxWireVersion is less than 13, drivers MUST throw an exception with the
-message "Snapshot reads require MongoDB 5.0 or later".
-
-Motivation
-==========
-
-To support snapshot reads. Only supported with server version 5.0+ or newer.
-
-Design Rationale
-================
-
-The goal is to modify the driver API as little as possible so that existing
-programs that don't need snapshot reads don't have to be changed.
-This goal is met by defining a ``SessionOptions`` field that applications use to
-start a ``ClientSession`` that can be used for snapshot reads. Alternative explicit approach of
-obtaining ``atClusterTime`` from ``cursor`` object and passing it to read concern object was considered initially.
-A session-based approach was chosen as it aligns better with the existing API, and requires minimal API changes.
-Future extensibility for snapshot reads would be best served by a session-based approach, as no API changes will be required.
-
-Backwards Compatibility
-=======================
-
-The API changes to support snapshot reads extend the existing API but do not
-introduce any backward breaking changes. Existing programs that don't use
-snapshot reads continue to compile and run correctly.
-
-Reference Implementation
-========================
-
-C# driver will provide the reference implementation.
-The corresponding ticket is `CSHARP-3668 `_.
-
-Q&A
-===
-
-Changelog
-=========
-
-:2021-06-15: Initial version.
-:2021-06-28: Raise client side error on < 5.0.
-:2021-06-29: Send readConcern with all snapshot session commands.
-:2021-07-16: Grammar revisions. Change SHOULD to MUST for startTransaction error to comply with existing tests.
-:2021-08-09: Updated client-side error spec tests to use correct syntax for ``test.expectEvents``
-:2022-10-05: Remove spec front matter
+.. note::
+ This specification has been converted to Markdown and renamed to
+ `snapshot-sessions.md `_.
diff --git a/source/sessions/tests/README.md b/source/sessions/tests/README.md
new file mode 100644
index 0000000000..218e481a2f
--- /dev/null
+++ b/source/sessions/tests/README.md
@@ -0,0 +1,249 @@
+# Driver Session Tests
+
+______________________________________________________________________
+
+## Introduction
+
+The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of
+sessions. These tests utilize the [Unified Test Format](../../unified-test-format/unified-test-format.md).
+
+### Snapshot session tests
+
+The default snapshot history window on the server is 5 minutes. Running the tests in debug mode, or in any other slow
+configuration, may lead to `SnapshotTooOld` errors. Drivers can work around this issue by increasing the server's
+`minSnapshotHistoryWindowInSeconds` parameter, for example:
+
+```python
+client.admin.command('setParameter', 1, minSnapshotHistoryWindowInSeconds=600)
+```
+
+### Testing against servers that do not support sessions
+
+Since all regular 3.6+ servers support sessions, the prose tests which test for session non-support SHOULD use a
+mongocryptd server as the test server (available with server versions 4.2+); however, if future versions of mongocryptd
+support sessions or if mongocryptd is not a viable option for the driver implementing these tests, another server MAY be
+substituted as long as it does not return a non-null value for `logicalSessionTimeoutMinutes`; in the event that no such
+server is readily available, a mock server may be used as a last resort.
+
+As part of the test setup for these cases, create a `MongoClient` pointed at the test server with the options specified
+in the test case and verify that the test server does NOT define a value for `logicalSessionTimeoutMinutes` by sending a
+hello command and checking the response.
+
+## Prose tests
+
+### 1. Setting both `snapshot` and `causalConsistency` to true is not allowed
+
+Snapshot session tests require a server of version 5.0 or higher and a replica set or sharded cluster deployment.
+
+- `client.startSession(snapshot = true, causalConsistency = true)`
+- Assert that an error was raised by the driver
+
+### 2. Pool is LIFO
+
+This test applies to drivers with session pools.
+
+- Call `MongoClient.startSession` twice to create two sessions; call them `A` and `B`.
+- Call `A.endSession`, then `B.endSession`.
+- Call `MongoClient.startSession`: the resulting session must have the same session ID as `B`.
+- Call `MongoClient.startSession` again: the resulting session must have the same session ID as `A`.
+
+### 3. `$clusterTime` in commands
+
+- Turn `heartbeatFrequencyMS` up to a very large number.
+- Register a command-started and a command-succeeded APM listener. If the driver has no APM support, inspect
+ commands/replies in another idiomatic way, such as monkey-patching or a mock server.
+- Send a `ping` command to the server with the generic `runCommand` method.
+- Assert that the command passed to the command-started listener includes `$clusterTime` if and only if `maxWireVersion`
+ \>= 6.
+- Record the `$clusterTime`, if any, in the reply passed to the command-succeeded APM listener.
+- Send another `ping` command.
+- Assert that `$clusterTime` in the command passed to the command-started listener, if any, equals the `$clusterTime` in
+ the previous server reply. (Turning `heartbeatFrequencyMS` up prevents an intervening heartbeat from advancing the
+ `$clusterTime` between these final two steps.)
+
+Repeat the above for:
+
+- An aggregate command from the `aggregate` helper method
+- A find command from the `find` helper method
+- An insert command from the `insert_one` helper method
+
+### 4. Explicit and implicit session arguments
+
+- Register a command-started APM listener. If the driver has no APM support, inspect commands in another idiomatic way,
+ such as monkey-patching or a mock server.
+- Create `client1`
+- Get `database` from `client1`
+- Get `collection` from `database`
+- Start `session` from `client1`
+- Call `collection.insertOne(session,...)`
+- Assert that the command passed to the command-started listener contained the session `lsid` from `session`.
+- Call `collection.insertOne(,...)` (*without* a session argument)
+- Assert that the command passed to the command-started listener contained a session `lsid`.
+
+Repeat the above for all methods that take a session parameter.
+
+### 5. Session argument is for the right client
+
+- Create `client1` and `client2`
+- Get `database` from `client1`
+- Get `collection` from `database`
+- Start `session` from `client2`
+- Call `collection.insertOne(session,...)`
+- Assert that an error was reported because `session` was not started from `client1`
+
+Repeat the above for all methods that take a session parameter.
+
+### 6. No further operations can be performed using a session after `endSession` has been called
+
+- Start a `session`
+- End the `session`
+- Call `collection.insertOne(session, ...)`
+- Assert that the proper error was reported
+
+Repeat the above for all methods that take a session parameter.
+
+If your driver implements a platform-dependent idiomatic disposal pattern, test that also (if the idiomatic disposal
+pattern calls `endSession` it would be sufficient to only test the disposal pattern since that ends up calling
+`endSession`).
+
+### 7. Authenticating as multiple users suppresses implicit sessions
+
+Skip this test if your driver does not allow simultaneous authentication with multiple users.
+
+- Authenticate as two users
+- Call `findOne` with no explicit session
+- Capture the command sent to the server
+- Assert that the command sent to the server does not have an `lsid` field
+
+### 8. Client-side cursor that exhausts the results on the initial query immediately returns the implicit session to the pool
+
+- Insert two documents into a collection
+- Execute a find operation on the collection and iterate past the first document
+- Assert that the implicit session is returned to the pool. This can be done in several ways:
+ - Track in-use count in the server session pool and assert that the count has dropped to zero
+ - Track the lsid used for the find operation (e.g. with APM) and then do another operation and assert that the same
+ lsid is used as for the find operation.
+
+### 9. Client-side cursor that exhausts the results after a `getMore` immediately returns the implicit session to the pool
+
+- Insert five documents into a collection
+- Execute a find operation on the collection with batch size of 3
+- Iterate past the first four documents, forcing the final `getMore` operation
+- Assert that the implicit session is returned to the pool prior to iterating past the last document
+
+### 10. No remaining sessions are checked out after each functional test
+
+At the end of every individual functional test of the driver, there SHOULD be an assertion that there are no remaining
+sessions checked out from the pool. This may require changes to existing tests to ensure that they close any explicit
+client sessions and any unexhausted cursors.
+
+### 11. For every combination of topology and readPreference, ensure that `find` and `getMore` both send the same session id
+
+- Insert three documents into a collection
+- Execute a `find` operation on the collection with a batch size of 2
+- Assert that the server receives a non-zero lsid
+- Iterate through enough documents (3) to force a `getMore`
+- Assert that the server receives a non-zero lsid equal to the lsid that `find` sent.
+
+### 12. Session pool can be cleared after forking without calling `endSession`
+
+Skip this test if your driver does not allow forking.
+
+- Create ClientSession
+- Record its lsid
+- Delete it (so the lsid is pushed into the pool)
+- Fork
+- In the parent, create a ClientSession and assert its lsid is the same.
+- In the child, create a ClientSession and assert its lsid is different.
+
+### 13. Existing sessions are not checked into a cleared pool after forking
+
+Skip this test if your driver does not allow forking.
+
+- Create ClientSession
+- Record its lsid
+- Fork
+- In the parent, return the ClientSession to the pool, create a new ClientSession, and assert its lsid is the same.
+- In the child, return the ClientSession to the pool, create a new ClientSession, and assert its lsid is different.
+
+### 14. Implicit sessions only allocate their server session after a successful connection checkout
+
+- Create a MongoClient with the following options: `maxPoolSize=1` and `retryWrites=true`. If testing against a sharded
+ deployment, the test runner MUST ensure that the MongoClient connects to only a single mongos host.
+- Attach a command started listener that collects each command's lsid
+- Initiate the following concurrent operations
+ - `insertOne({ }),`
+ - `deleteOne({ }),`
+ - `updateOne({ }, { $set: { a: 1 } }),`
+ - `bulkWrite([{ updateOne: { filter: { }, update: { $set: { a: 1 } } } }]),`
+ - `findOneAndDelete({ }),`
+ - `findOneAndUpdate({ }, { $set: { a: 1 } }),`
+ - `findOneAndReplace({ }, { a: 1 }),`
+ - `find().toArray()`
+- Wait for all operations to complete successfully
+- Assert the following across at least 5 retries of the above test:
+ - Drivers MUST assert that exactly one session is used for all operations at least once across the retries of this
+ test.
+  - Note that it's possible, although rare, for more than one server session to be used because the session is not
+    released until after the connection is checked in.
+ - Drivers MUST assert that the number of allocated sessions is strictly less than the number of concurrent operations
+ in every retry of this test. In this instance it would be less than (but NOT equal to) 8.
+
+### 15. `lsid` is added inside `$query` when using OP_QUERY
+
+This test only applies to drivers that have not implemented OP_MSG and still use OP_QUERY.
+
+- For a command to a mongos that includes a readPreference, verify that the `lsid` on query commands is added inside the
+ `$query` field, and NOT as a top-level field.
+
+### 16. Authenticating as a second user after starting a session results in a server error
+
+This test only applies to drivers that allow authentication to be changed on the fly.
+
+- Authenticate as the first user
+- Start a session by calling `startSession`
+- Authenticate as a second user
+- Call `findOne` using the session as an explicit session
+- Assert that the driver returned an error because multiple users are authenticated
+
+### 17. Driver verifies that the session is owned by the current user
+
+This test only applies to drivers that allow authentication to be changed on the fly.
+
+- Authenticate as user A
+- Start a session by calling `startSession`
+- Log out user A
+- Authenticate as user B
+- Call `findOne` using the session as an explicit session
+- Assert that the driver returned an error because the session is owned by a different user
+
+### 18. Implicit session is ignored if connection does not support sessions
+
+Refer to [Testing against servers that do not support sessions](#testing-against-servers-that-do-not-support-sessions)
+and configure a `MongoClient` with command monitoring enabled.
+
+- Send a read command to the server (e.g., `findOne`), ignoring any errors from the server response
+- Check the corresponding `commandStarted` event: verify that `lsid` is not set
+- Send a write command to the server (e.g., `insertOne`), ignoring any errors from the server response
+- Check the corresponding `commandStarted` event: verify that `lsid` is not set
+
+### 19. Explicit session raises an error if connection does not support sessions
+
+Refer to [Testing against servers that do not support sessions](#testing-against-servers-that-do-not-support-sessions)
+and configure a `MongoClient` with default options.
+
+- Create a new explicit session by calling `startSession` (this MUST NOT error)
+- Attempt to send a read command to the server (e.g., `findOne`) with the explicit session passed in
+- Assert that a client-side error is generated indicating that sessions are not supported
+- Attempt to send a write command to the server (e.g., `insertOne`) with the explicit session passed in
+- Assert that a client-side error is generated indicating that sessions are not supported
+
+## Changelog
+
+- 2024-05-08: Migrated from reStructuredText to Markdown.
+- 2019-05-15: Initial version.
+- 2021-06-15: Added snapshot-session tests. Introduced legacy and unified folders.
+- 2021-07-30: Use numbering for prose test
+- 2022-02-11: Convert legacy tests to unified format
+- 2022-06-13: Relocate prose test from spec document and apply new ordering
+- 2023-02-24: Fix formatting and add new prose tests 18 and 19
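
Reviewer sketch for prose test 2 ("Pool is LIFO") in the README added above: a minimal, driver-independent illustration of the ordering the test expects. `SessionPool`, `start_session`, and `end_session` are hypothetical names invented for this sketch, and random UUIDs stand in for session IDs; none of this is a real driver API.

```python
import uuid


class SessionPool:
    """Toy stand-in for a driver's server session pool (hypothetical, for illustration only)."""

    def __init__(self):
        # A Python list used as a stack: the last ID returned is the first reused.
        self._stack = []

    def start_session(self):
        # Reuse the most recently returned session ID, or mint a new one if the pool is empty.
        if self._stack:
            return self._stack.pop()
        return uuid.uuid4()

    def end_session(self, session_id):
        # Returned session IDs are pushed onto the top of the stack.
        self._stack.append(session_id)


pool = SessionPool()
a = pool.start_session()
b = pool.start_session()

# End A first, then B, as the prose test prescribes.
pool.end_session(a)
pool.end_session(b)

first = pool.start_session()
second = pool.start_session()

assert first == b   # B was returned last, so it is reused first
assert second == a  # A is reused next
```

A real driver test would compare the `lsid` of the sessions returned by `MongoClient.startSession` rather than raw UUIDs; the sketch only shows why ending `A` before `B` means `B` comes back first.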
diff --git a/source/sessions/tests/README.rst b/source/sessions/tests/README.rst
deleted file mode 100644
index 51efce8009..0000000000
--- a/source/sessions/tests/README.rst
+++ /dev/null
@@ -1,276 +0,0 @@
-====================
-Driver Session Tests
-====================
-
-.. contents::
-
-----
-
-Introduction
-============
-
-The YAML and JSON files in this directory are platform-independent tests
-meant to exercise a driver's implementation of sessions. These tests utilize the
-`Unified Test Format <../../unified-test-format/unified-test-format.md>`__.
-
-Snapshot session tests
-~~~~~~~~~~~~~~~~~~~~~~
-The default snapshot history window on the server is 5 minutes. Running the test in debug mode, or in any other slow configuration
-may lead to `SnapshotTooOld` errors. Drivers can work around this issue by increasing the server's `minSnapshotHistoryWindowInSeconds` parameter, for example:
-
-.. code:: python
-
- client.admin.command('setParameter', 1, minSnapshotHistoryWindowInSeconds=600)
-
-Testing against servers that do not support sessions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Since all regular 3.6+ servers support sessions, the prose tests which test for session non-support SHOULD
-use a mongocryptd server as the test server (available with server versions 4.2+); however, if future versions of mongocryptd
-support sessions or if mongocryptd is not a viable option for the driver implementing these tests, another server MAY be
-substituted as long as it does not return a non-null value for ``logicalSessionTimeoutMinutes``;
-in the event that no such server is readily available, a mock server may be used as a last resort.
-
-As part of the test setup for these cases, create a ``MongoClient`` pointed at the test server with the options
-specified in the test case and verify that the test server does NOT define a value for ``logicalSessionTimeoutMinutes``
-by sending a hello command and checking the response.
-
-Prose tests
-===========
-
-1. Setting both ``snapshot`` and ``causalConsistency`` to true is not allowed
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Snapshot sessions tests require server of version 5.0 or higher and
-replica set or a sharded cluster deployment.
-
-* ``client.startSession(snapshot = true, causalConsistency = true)``
-* Assert that an error was raised by driver
-
-2. Pool is LIFO
-~~~~~~~~~~~~~~~
-
-This test applies to drivers with session pools.
-
-* Call ``MongoClient.startSession`` twice to create two sessions, let us call them ``A`` and ``B``.
-* Call ``A.endSession``, then ``B.endSession``.
-* Call ``MongoClient.startSession``: the resulting session must have the same session ID as ``B``.
-* Call ``MongoClient.startSession`` again: the resulting session must have the same session ID as ``A``.
-
-3. ``$clusterTime`` in commands
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Turn ``heartbeatFrequencyMS`` up to a very large number.
-* Register a command-started and a command-succeeded APM listener. If the driver has no APM support, inspect commands/replies in another idiomatic way, such as monkey-patching or a mock server.
-* Send a ``ping`` command to the server with the generic ``runCommand`` method.
-* Assert that the command passed to the command-started listener includes ``$clusterTime`` if and only if ``maxWireVersion`` >= 6.
-* Record the ``$clusterTime``, if any, in the reply passed to the command-succeeded APM listener.
-* Send another ``ping`` command.
-* Assert that ``$clusterTime`` in the command passed to the command-started listener, if any, equals the ``$clusterTime`` in the previous server reply. (Turning ``heartbeatFrequencyMS`` up prevents an intervening heartbeat from advancing the ``$clusterTime`` between these final two steps.)
-
-Repeat the above for:
-
-* An aggregate command from the ``aggregate`` helper method
-* A find command from the ``find`` helper method
-* An insert command from the ``insert_one`` helper method
-
-4. Explicit and implicit session arguments
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Register a command-started APM listener. If the driver has no APM support, inspect commands in another idiomatic way, such as monkey-patching or a mock server.
-* Create ``client1``
-* Get ``database`` from ``client1``
-* Get ``collection`` from ``database``
-* Start ``session`` from ``client1``
-* Call ``collection.insertOne(session,...)``
-* Assert that the command passed to the command-started listener contained the session ``lsid`` from ``session``.
-* Call ``collection.insertOne(,...)`` (*without* a session argument)
-* Assert that the command passed to the command-started listener contained a session ``lsid``.
-
-Repeat the above for all methods that take a session parameter.
-
-5. Session argument is for the right client
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Create ``client1`` and ``client2``
-* Get ``database`` from ``client1``
-* Get ``collection`` from ``database``
-* Start ``session`` from ``client2``
-* Call ``collection.insertOne(session,...)``
-* Assert that an error was reported because ``session`` was not started from ``client1``
-
-Repeat the above for all methods that take a session parameter.
-
-6. No further operations can be performed using a session after ``endSession`` has been called
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Start a ``session``
-* End the ``session``
-* Call ``collection.InsertOne(session, ...)``
-* Assert that the proper error was reported
-
-Repeat the above for all methods that take a session parameter.
-
-If your driver implements a platform dependent idiomatic disposal pattern, test
-that also (if the idiomatic disposal pattern calls ``endSession`` it would be
-sufficient to only test the disposal pattern since that ends up calling
-``endSession``).
-
-7. Authenticating as multiple users suppresses implicit sessions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Skip this test if your driver does not allow simultaneous authentication with multiple users.
-
-* Authenticate as two users
-* Call ``findOne`` with no explicit session
-* Capture the command sent to the server
-* Assert that the command sent to the server does not have an ``lsid`` field
-
-8. Client-side cursor that exhausts the results on the initial query immediately returns the implicit session to the pool
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Insert two documents into a collection
-* Execute a find operation on the collection and iterate past the first document
-* Assert that the implicit session is returned to the pool. This can be done in several ways:
-
- * Track in-use count in the server session pool and assert that the count has dropped to zero
- * Track the lsid used for the find operation (e.g. with APM) and then do another operation and
- assert that the same lsid is used as for the find operation.
-
-9. Client-side cursor that exhausts the results after a ``getMore`` immediately returns the implicit session to the pool
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Insert five documents into a collection
-* Execute a find operation on the collection with batch size of 3
-* Iterate past the first four documents, forcing the final ``getMore`` operation
-* Assert that the implicit session is returned to the pool prior to iterating past the last document
-
-10. No remaining sessions are checked out after each functional test
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-At the end of every individual functional test of the driver, there SHOULD be an
-assertion that there are no remaining sessions checked out from the pool. This
-may require changes to existing tests to ensure that they close any explicit
-client sessions and any unexhausted cursors.
-
-11. For every combination of topology and readPreference, ensure that ``find`` and ``getMore`` both send the same session id
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Insert three documents into a collection
-* Execute a ``find`` operation on the collection with a batch size of 2
-* Assert that the server receives a non-zero lsid
-* Iterate through enough documents (3) to force a ``getMore``
-* Assert that the server receives a non-zero lsid equal to the lsid that ``find`` sent.
-
-12. Session pool can be cleared after forking without calling ``endSession``
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Skip this test if your driver does not allow forking.
-
-* Create ClientSession
-* Record its lsid
-* Delete it (so the lsid is pushed into the pool)
-* Fork
-* In the parent, create a ClientSession and assert its lsid is the same.
-* In the child, create a ClientSession and assert its lsid is different.
-
-13. Existing sessions are not checked into a cleared pool after forking
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Skip this test if your driver does not allow forking.
-
-* Create ClientSession
-* Record its lsid
-* Fork
-* In the parent, return the ClientSession to the pool, create a new ClientSession, and assert its lsid is the same.
-* In the child, return the ClientSession to the pool, create a new ClientSession, and assert its lsid is different.
-
-14. Implicit sessions only allocate their server session after a successful connection checkout
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-* Create a MongoClient with the following options: ``maxPoolSize=1`` and ``retryWrites=true``. If testing against a sharded deployment, the test runner MUST ensure that the MongoClient connects to only a single mongos host.
-* Attach a command started listener that collects each command's lsid
-* Initiate the following concurrent operations
-
- * ``insertOne({ }),``
- * ``deleteOne({ }),``
- * ``updateOne({ }, { $set: { a: 1 } }),``
- * ``bulkWrite([{ updateOne: { filter: { }, update: { $set: { a: 1 } } } }]),``
- * ``findOneAndDelete({ }),``
- * ``findOneAndUpdate({ }, { $set: { a: 1 } }),``
- * ``findOneAndReplace({ }, { a: 1 }),``
- * ``find().toArray()``
-
-* Wait for all operations to complete successfully
-* Assert the following across at least 5 retries of the above test:
-
- * Drivers MUST assert that exactly one session is used for all operations at
- least once across the retries of this test.
- * Note that it's possible, although rare, for >1 server session to be used
- because the session is not released until after the connection is checked in.
- * Drivers MUST assert that the number of allocated sessions is strictly less
- than the number of concurrent operations in every retry of this test. In
- this instance it would be less than (but NOT equal to) 8.
-
-15. ``lsid`` is added inside ``$query`` when using OP_QUERY
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This test only applies to drivers that have not implemented OP_MSG and still use OP_QUERY.
-
-* For a command to a mongos that includes a readPreference, verify that the
- ``lsid`` on query commands is added inside the ``$query`` field, and NOT as a
- top-level field.
-
-16. Authenticating as a second user after starting a session results in a server error
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This test only applies to drivers that allow authentication to be changed on the fly.
-
-* Authenticate as the first user
-* Start a session by calling ``startSession``
-* Authenticate as a second user
-* Call ``findOne`` using the session as an explicit session
-* Assert that the driver returned an error because multiple users are authenticated
-
-17. Driver verifies that the session is owned by the current user
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This test only applies to drivers that allow authentication to be changed on the fly.
-
-* Authenticate as user A
-* Start a session by calling ``startSession``
-* Logout user A
-* Authenticate as user B
-* Call ``findOne`` using the session as an explicit session
-* Assert that the driver returned an error because the session is owned by a different user
-
-18. Implicit session is ignored if connection does not support sessions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Refer to `Testing against servers that do not support sessions`_ and configure a ``MongoClient``
-with command monitoring enabled.
-
-* Send a read command to the server (e.g., ``findOne``), ignoring any errors from the server response
-* Check the corresponding ``commandStarted`` event: verify that ``lsid`` is not set
-* Send a write command to the server (e.g., ``insertOne``), ignoring any errors from the server response
-* Check the corresponding ``commandStarted`` event: verify that lsid is not set
-
-19. Explicit session raises an error if connection does not support sessions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Refer to `Testing against servers that do not support sessions`_ and configure a ``MongoClient``
-with default options.
-
-* Create a new explicit session by calling ``startSession`` (this MUST NOT error)
-* Attempt to send a read command to the server (e.g., ``findOne``) with the explicit session passed in
-* Assert that a client-side error is generated indicating that sessions are not supported
-* Attempt to send a write command to the server (e.g., ``insertOne``) with the explicit session passed in
-* Assert that a client-side error is generated indicating that sessions are not supported
-
-Changelog
-=========
-
-:2019-05-15: Initial version.
-:2021-06-15: Added snapshot-session tests. Introduced legacy and unified folders.
-:2021-07-30: Use numbering for prose test
-:2022-02-11: Convert legacy tests to unified format
-:2022-06-13: Relocate prose test from spec document and apply new ordering
-:2023-02-24: Fix formatting and add new prose tests 18 and 19
diff --git a/source/transactions-convenient-api/transactions-convenient-api.rst b/source/transactions-convenient-api/transactions-convenient-api.rst
index 82f1136193..668a165331 100644
--- a/source/transactions-convenient-api/transactions-convenient-api.rst
+++ b/source/transactions-convenient-api/transactions-convenient-api.rst
@@ -44,7 +44,7 @@ ClientSession
`Driver Session`_ specification. The name of this object MAY vary across
drivers.
-.. _Driver Session: ../sessions/driver-sessions.rst
+.. _Driver Session: ../sessions/driver-sessions.md
MongoClient
The root object of a driver's API. The name of this object MAY vary across
diff --git a/source/transactions/tests/unified/client-bulkWrite.json b/source/transactions/tests/unified/client-bulkWrite.json
new file mode 100644
index 0000000000..f8f1d97169
--- /dev/null
+++ b/source/transactions/tests/unified/client-bulkWrite.json
@@ -0,0 +1,592 @@
+{
+ "description": "client bulkWrite transactions",
+ "schemaVersion": "1.3",
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0",
+ "topologies": [
+ "replicaset",
+ "sharded",
+ "load-balanced"
+ ]
+ }
+ ],
+ "createEntities": [
+ {
+ "client": {
+ "id": "client0",
+ "observeEvents": [
+ "commandStartedEvent"
+ ]
+ }
+ },
+ {
+ "database": {
+ "id": "database0",
+ "client": "client0",
+ "databaseName": "transaction-tests"
+ }
+ },
+ {
+ "collection": {
+ "id": "collection0",
+ "database": "database0",
+ "collectionName": "coll0"
+ }
+ },
+ {
+ "session": {
+ "id": "session0",
+ "client": "client0"
+ }
+ },
+ {
+ "client": {
+ "id": "client_with_wmajority",
+ "uriOptions": {
+ "w": "majority"
+ },
+ "observeEvents": [
+ "commandStartedEvent"
+ ]
+ }
+ },
+ {
+ "session": {
+ "id": "session_with_wmajority",
+ "client": "client_with_wmajority"
+ }
+ }
+ ],
+ "_yamlAnchors": {
+ "namespace": "transaction-tests.coll0"
+ },
+ "initialData": [
+ {
+ "databaseName": "transaction-tests",
+ "collectionName": "coll0",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 11
+ },
+ {
+ "_id": 2,
+ "x": 22
+ },
+ {
+ "_id": 3,
+ "x": 33
+ },
+ {
+ "_id": 5,
+ "x": 55
+ },
+ {
+ "_id": 6,
+ "x": 66
+ },
+ {
+ "_id": 7,
+ "x": 77
+ }
+ ]
+ }
+ ],
+ "tests": [
+ {
+ "description": "client bulkWrite in a transaction",
+ "operations": [
+ {
+ "object": "session0",
+ "name": "startTransaction"
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "session": "session0",
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "transaction-tests.coll0",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ },
+ {
+ "updateOne": {
+ "namespace": "transaction-tests.coll0",
+ "filter": {
+ "_id": 1
+ },
+ "update": {
+ "$inc": {
+ "x": 1
+ }
+ }
+ }
+ },
+ {
+ "updateMany": {
+ "namespace": "transaction-tests.coll0",
+ "filter": {
+ "$and": [
+ {
+ "_id": {
+ "$gt": 1
+ }
+ },
+ {
+ "_id": {
+ "$lte": 3
+ }
+ }
+ ]
+ },
+ "update": {
+ "$inc": {
+ "x": 2
+ }
+ }
+ }
+ },
+ {
+ "replaceOne": {
+ "namespace": "transaction-tests.coll0",
+ "filter": {
+ "_id": 4
+ },
+ "replacement": {
+ "x": 44
+ },
+ "upsert": true
+ }
+ },
+ {
+ "deleteOne": {
+ "namespace": "transaction-tests.coll0",
+ "filter": {
+ "_id": 5
+ }
+ }
+ },
+ {
+ "deleteMany": {
+ "namespace": "transaction-tests.coll0",
+ "filter": {
+ "$and": [
+ {
+ "_id": {
+ "$gt": 5
+ }
+ },
+ {
+ "_id": {
+ "$lte": 7
+ }
+ }
+ ]
+ }
+ }
+ }
+ ],
+ "verboseResults": true
+ },
+ "expectResult": {
+ "insertedCount": 1,
+ "upsertedCount": 1,
+ "matchedCount": 3,
+ "modifiedCount": 3,
+ "deletedCount": 3,
+ "insertResults": {
+ "0": {
+ "insertedId": 8
+ }
+ },
+ "updateResults": {
+ "1": {
+ "matchedCount": 1,
+ "modifiedCount": 1,
+ "upsertedId": {
+ "$$exists": false
+ }
+ },
+ "2": {
+ "matchedCount": 2,
+ "modifiedCount": 2,
+ "upsertedId": {
+ "$$exists": false
+ }
+ },
+ "3": {
+ "matchedCount": 1,
+ "modifiedCount": 0,
+ "upsertedId": 4
+ }
+ },
+ "deleteResults": {
+ "4": {
+ "deletedCount": 1
+ },
+ "5": {
+ "deletedCount": 2
+ }
+ }
+ }
+ },
+ {
+ "object": "session0",
+ "name": "commitTransaction"
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client0",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "lsid": {
+ "$$sessionLsid": "session0"
+ },
+ "txnNumber": 1,
+ "startTransaction": true,
+ "autocommit": false,
+ "writeConcern": {
+ "$$exists": false
+ },
+ "bulkWrite": 1,
+ "errorsOnly": false,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 1
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 1
+ }
+ },
+ "multi": false
+ },
+ {
+ "update": 0,
+ "filter": {
+ "$and": [
+ {
+ "_id": {
+ "$gt": 1
+ }
+ },
+ {
+ "_id": {
+ "$lte": 3
+ }
+ }
+ ]
+ },
+ "updateMods": {
+ "$inc": {
+ "x": 2
+ }
+ },
+ "multi": true
+ },
+ {
+ "update": 0,
+ "filter": {
+ "_id": 4
+ },
+ "updateMods": {
+ "x": 44
+ },
+ "upsert": true,
+ "multi": false
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "_id": 5
+ },
+ "multi": false
+ },
+ {
+ "delete": 0,
+ "filter": {
+ "$and": [
+ {
+ "_id": {
+ "$gt": 5
+ }
+ },
+ {
+ "_id": {
+ "$lte": 7
+ }
+ }
+ ]
+ },
+ "multi": true
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "transaction-tests.coll0"
+ }
+ ]
+ }
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "commandName": "commitTransaction",
+ "databaseName": "admin",
+ "command": {
+ "commitTransaction": 1,
+ "lsid": {
+ "$$sessionLsid": "session0"
+ },
+ "txnNumber": 1,
+ "startTransaction": {
+ "$$exists": false
+ },
+ "autocommit": false,
+ "writeConcern": {
+ "$$exists": false
+ }
+ }
+ }
+ }
+ ]
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "coll0",
+ "databaseName": "transaction-tests",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 12
+ },
+ {
+ "_id": 2,
+ "x": 24
+ },
+ {
+ "_id": 3,
+ "x": 35
+ },
+ {
+ "_id": 4,
+ "x": 44
+ },
+ {
+ "_id": 8,
+ "x": 88
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client writeConcern ignored for client bulkWrite in transaction",
+ "operations": [
+ {
+ "object": "session_with_wmajority",
+ "name": "startTransaction",
+ "arguments": {
+ "writeConcern": {
+ "w": 1
+ }
+ }
+ },
+ {
+ "object": "client_with_wmajority",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "session": "session_with_wmajority",
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "transaction-tests.coll0",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ }
+ ]
+ },
+ "expectResult": {
+ "insertedCount": 1,
+ "upsertedCount": 0,
+ "matchedCount": 0,
+ "modifiedCount": 0,
+ "deletedCount": 0,
+ "insertResults": {
+ "$$unsetOrMatches": {}
+ },
+ "updateResults": {
+ "$$unsetOrMatches": {}
+ },
+ "deleteResults": {
+ "$$unsetOrMatches": {}
+ }
+ }
+ },
+ {
+ "object": "session_with_wmajority",
+ "name": "commitTransaction"
+ }
+ ],
+ "expectEvents": [
+ {
+ "client": "client_with_wmajority",
+ "events": [
+ {
+ "commandStartedEvent": {
+ "commandName": "bulkWrite",
+ "databaseName": "admin",
+ "command": {
+ "lsid": {
+ "$$sessionLsid": "session_with_wmajority"
+ },
+ "txnNumber": 1,
+ "startTransaction": true,
+ "autocommit": false,
+ "writeConcern": {
+ "$$exists": false
+ },
+ "bulkWrite": 1,
+ "errorsOnly": true,
+ "ordered": true,
+ "ops": [
+ {
+ "insert": 0,
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ ],
+ "nsInfo": [
+ {
+ "ns": "transaction-tests.coll0"
+ }
+ ]
+ }
+ }
+ },
+ {
+ "commandStartedEvent": {
+ "command": {
+ "commitTransaction": 1,
+ "lsid": {
+ "$$sessionLsid": "session_with_wmajority"
+ },
+ "txnNumber": {
+ "$numberLong": "1"
+ },
+ "startTransaction": {
+ "$$exists": false
+ },
+ "autocommit": false,
+ "writeConcern": {
+ "w": 1
+ }
+ },
+ "commandName": "commitTransaction",
+ "databaseName": "admin"
+ }
+ }
+ ]
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "coll0",
+ "databaseName": "transaction-tests",
+ "documents": [
+ {
+ "_id": 1,
+ "x": 11
+ },
+ {
+ "_id": 2,
+ "x": 22
+ },
+ {
+ "_id": 3,
+ "x": 33
+ },
+ {
+ "_id": 5,
+ "x": 55
+ },
+ {
+ "_id": 6,
+ "x": 66
+ },
+ {
+ "_id": 7,
+ "x": 77
+ },
+ {
+ "_id": 8,
+ "x": 88
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "description": "client bulkWrite with writeConcern in a transaction causes a transaction error",
+ "operations": [
+ {
+ "object": "session0",
+ "name": "startTransaction"
+ },
+ {
+ "object": "client0",
+ "name": "clientBulkWrite",
+ "arguments": {
+ "session": "session0",
+ "writeConcern": {
+ "w": 1
+ },
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "transaction-tests.coll0",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "isClientError": true,
+ "errorContains": "Cannot set write concern after starting a transaction"
+ }
+ }
+ ]
+ }
+ ]
+}
diff --git a/source/transactions/tests/unified/client-bulkWrite.yml b/source/transactions/tests/unified/client-bulkWrite.yml
new file mode 100644
index 0000000000..eda2babbe7
--- /dev/null
+++ b/source/transactions/tests/unified/client-bulkWrite.yml
@@ -0,0 +1,262 @@
+description: "client bulkWrite transactions"
+schemaVersion: "1.3"
+runOnRequirements:
+  - minServerVersion: "8.0"
+    topologies:
+      - replicaset
+      - sharded
+      - load-balanced
+
+createEntities:
+  - client:
+      id: &client0 client0
+      observeEvents: [ commandStartedEvent ]
+  - database:
+      id: &database0 database0
+      client: *client0
+      databaseName: &database0Name transaction-tests
+  - collection:
+      id: &collection0 collection0
+      database: *database0
+      collectionName: &collection0Name coll0
+  - session:
+      id: &session0 session0
+      client: *client0
+  - client:
+      id: &client_with_wmajority client_with_wmajority
+      uriOptions:
+        w: majority
+      observeEvents:
+        - commandStartedEvent
+  - session:
+      id: &session_with_wmajority session_with_wmajority
+      client: *client_with_wmajority
+
+_yamlAnchors:
+  namespace: &namespace "transaction-tests.coll0"
+
+initialData:
+  - databaseName: *database0Name
+    collectionName: *collection0Name
+    documents:
+      - { _id: 1, x: 11 }
+      - { _id: 2, x: 22 }
+      - { _id: 3, x: 33 }
+      - { _id: 5, x: 55 }
+      - { _id: 6, x: 66 }
+      - { _id: 7, x: 77 }
+
+tests:
+  - description: "client bulkWrite in a transaction"
+    operations:
+      - object: *session0
+        name: startTransaction
+      - object: *client0
+        name: clientBulkWrite
+        arguments:
+          session: *session0
+          models:
+            - insertOne:
+                namespace: *namespace
+                document: { _id: 8, x: 88 }
+            - updateOne:
+                namespace: *namespace
+                filter: { _id: 1 }
+                update: { $inc: { x: 1 } }
+            - updateMany:
+                namespace: *namespace
+                filter:
+                  $and: [ { _id: { $gt: 1 } }, { _id: { $lte: 3 } } ]
+                update: { $inc: { x: 2 } }
+            - replaceOne:
+                namespace: *namespace
+                filter: { _id: 4 }
+                replacement: { x: 44 }
+                upsert: true
+            - deleteOne:
+                namespace: *namespace
+                filter: { _id: 5 }
+            - deleteMany:
+                namespace: *namespace
+                filter:
+                  $and: [ { _id: { $gt: 5 } }, { _id: { $lte: 7 } } ]
+          verboseResults: true
+        expectResult:
+          insertedCount: 1
+          upsertedCount: 1
+          matchedCount: 3
+          modifiedCount: 3
+          deletedCount: 3
+          insertResults:
+            0:
+              insertedId: 8
+          updateResults:
+            1:
+              matchedCount: 1
+              modifiedCount: 1
+              upsertedId: { $$exists: false }
+            2:
+              matchedCount: 2
+              modifiedCount: 2
+              upsertedId: { $$exists: false }
+            3:
+              matchedCount: 1
+              modifiedCount: 0
+              upsertedId: 4
+          deleteResults:
+            4:
+              deletedCount: 1
+            5:
+              deletedCount: 2
+      - object: *session0
+        name: commitTransaction
+    expectEvents:
+      - client: *client0
+        events:
+          - commandStartedEvent:
+              commandName: bulkWrite
+              databaseName: admin
+              command:
+                lsid: { $$sessionLsid: *session0 }
+                txnNumber: 1
+                startTransaction: true
+                autocommit: false
+                writeConcern: { $$exists: false }
+                bulkWrite: 1
+                errorsOnly: false
+                ordered: true
+                ops:
+                  - insert: 0
+                    document: { _id: 8, x: 88 }
+                  - update: 0
+                    filter: { _id: 1 }
+                    updateMods: { $inc: { x: 1 } }
+                    multi: false
+                  - update: 0
+                    filter:
+                      $and: [ { _id: { $gt: 1 } }, { _id: { $lte: 3 } } ]
+                    updateMods: { $inc: { x: 2 } }
+                    multi: true
+                  - update: 0
+                    filter: { _id: 4 }
+                    updateMods: { x: 44 }
+                    upsert: true
+                    multi: false
+                  - delete: 0
+                    filter: { _id: 5 }
+                    multi: false
+                  - delete: 0
+                    filter:
+                      $and: [ { _id: { $gt: 5 } }, { _id: { $lte: 7 } } ]
+                    multi: true
+                nsInfo:
+                  - ns: *namespace
+          - commandStartedEvent:
+              commandName: commitTransaction
+              databaseName: admin
+              command:
+                commitTransaction: 1
+                lsid: { $$sessionLsid: *session0 }
+                txnNumber: 1
+                startTransaction: { $$exists: false }
+                autocommit: false
+                writeConcern: { $$exists: false }
+    outcome:
+      - collectionName: *collection0Name
+        databaseName: *database0Name
+        documents:
+          - { _id: 1, x: 12 }
+          - { _id: 2, x: 24 }
+          - { _id: 3, x: 35 }
+          - { _id: 4, x: 44 }
+          - { _id: 8, x: 88 }
+  - description: 'client writeConcern ignored for client bulkWrite in transaction'
+    operations:
+      - object: *session_with_wmajority
+        name: startTransaction
+        arguments:
+          writeConcern:
+            w: 1
+      - object: *client_with_wmajority
+        name: clientBulkWrite
+        arguments:
+          session: *session_with_wmajority
+          models:
+            - insertOne:
+                namespace: *namespace
+                document: { _id: 8, x: 88 }
+        expectResult:
+          insertedCount: 1
+          upsertedCount: 0
+          matchedCount: 0
+          modifiedCount: 0
+          deletedCount: 0
+          insertResults:
+            $$unsetOrMatches: {}
+          updateResults:
+            $$unsetOrMatches: {}
+          deleteResults:
+            $$unsetOrMatches: {}
+      - object: *session_with_wmajority
+        name: commitTransaction
+    expectEvents:
+      -
+        client: *client_with_wmajority
+        events:
+          - commandStartedEvent:
+              commandName: bulkWrite
+              databaseName: admin
+              command:
+                lsid: { $$sessionLsid: *session_with_wmajority }
+                txnNumber: 1
+                startTransaction: true
+                autocommit: false
+                writeConcern: { $$exists: false }
+                bulkWrite: 1
+                errorsOnly: true
+                ordered: true
+                ops:
+                  - insert: 0
+                    document: { _id: 8, x: 88 }
+                nsInfo:
+                  - ns: *namespace
+          -
+            commandStartedEvent:
+              command:
+                commitTransaction: 1
+                lsid: { $$sessionLsid: *session_with_wmajority }
+                txnNumber: { $numberLong: '1' }
+                startTransaction: { $$exists: false }
+                autocommit: false
+                writeConcern:
+                  w: 1
+              commandName: commitTransaction
+              databaseName: admin
+    outcome:
+      - collectionName: *collection0Name
+        databaseName: *database0Name
+        documents:
+          - { _id: 1, x: 11 }
+          - { _id: 2, x: 22 }
+          - { _id: 3, x: 33 }
+          - { _id: 5, x: 55 }
+          - { _id: 6, x: 66 }
+          - { _id: 7, x: 77 }
+          - { _id: 8, x: 88 }
+  - description: "client bulkWrite with writeConcern in a transaction causes a transaction error"
+    operations:
+      - object: *session0
+        name: startTransaction
+      - object: *client0
+        name: clientBulkWrite
+        arguments:
+          session: *session0
+          writeConcern:
+            w: 1
+          models:
+            - insertOne:
+                namespace: *namespace
+                document: { _id: 8, x: 88 }
+        expectError:
+          isClientError: true
+          errorContains: "Cannot set write concern after starting a transaction"
diff --git a/source/transactions/tests/unified/mongos-pin-auto-tests.py b/source/transactions/tests/unified/mongos-pin-auto-tests.py
index 99a34b485d..ad2aeabd17 100644
--- a/source/transactions/tests/unified/mongos-pin-auto-tests.py
+++ b/source/transactions/tests/unified/mongos-pin-auto-tests.py
@@ -291,6 +291,11 @@
insert: *collection_name
documents:
- { _id : 1 }'''),
+    # clientBulkWrite:
+    'clientBulkWrite': ('bulkWrite', '*client0', r'''models:
+            - insertOne:
+                namespace: database0.collection0
+                document: { _id: 8, x: 88 }'''),
}
# Maps from error_name to error_data.
@@ -313,7 +318,11 @@ def create_pin_test(op_name, error_name):
     error_data = NON_TRANSIENT_ERRORS[error_name]
     if op_name.startswith('bulkWrite'):
         op_name = 'bulkWrite'
-    return TEMPLATE.format(**locals())
+    test = TEMPLATE.format(**locals())
+    if op_name == 'clientBulkWrite':
+        test += '    runOnRequirements:\n'
+        test += '      - minServerVersion: "8.0" # `bulkWrite` added to server 8.0\n'
+    return test
def create_unpin_test(op_name, error_name):
@@ -324,7 +333,12 @@ def create_unpin_test(op_name, error_name):
     error_data = TRANSIENT_ERRORS[error_name]
     if op_name.startswith('bulkWrite'):
         op_name = 'bulkWrite'
-    return TEMPLATE.format(**locals())
+    test = TEMPLATE.format(**locals())
+    if op_name == 'clientBulkWrite':
+        test += '    runOnRequirements:\n'
+        test += '      - minServerVersion: "8.0" # `bulkWrite` added to server 8.0\n'
+    return test
+
tests = []
diff --git a/source/transactions/tests/unified/mongos-pin-auto.json b/source/transactions/tests/unified/mongos-pin-auto.json
index 93eac8bb77..27db520401 100644
--- a/source/transactions/tests/unified/mongos-pin-auto.json
+++ b/source/transactions/tests/unified/mongos-pin-auto.json
@@ -2004,6 +2004,104 @@
}
]
},
+ {
+ "description": "remain pinned after non-transient Interrupted error on clientBulkWrite bulkWrite",
+ "operations": [
+ {
+ "object": "session0",
+ "name": "startTransaction"
+ },
+ {
+ "object": "collection0",
+ "name": "insertOne",
+ "arguments": {
+ "session": "session0",
+ "document": {
+ "_id": 3
+ }
+ },
+ "expectResult": {
+ "$$unsetOrMatches": {
+ "insertedId": {
+ "$$unsetOrMatches": 3
+ }
+ }
+ }
+ },
+ {
+ "name": "targetedFailPoint",
+ "object": "testRunner",
+ "arguments": {
+ "session": "session0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "errorCode": 11601
+ }
+ }
+ }
+ },
+ {
+ "name": "clientBulkWrite",
+ "object": "client0",
+ "arguments": {
+ "session": "session0",
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "database0.collection0",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "errorLabelsOmit": [
+ "TransientTransactionError"
+ ]
+ }
+ },
+ {
+ "object": "testRunner",
+ "name": "assertSessionPinned",
+ "arguments": {
+ "session": "session0"
+ }
+ },
+ {
+ "object": "session0",
+ "name": "abortTransaction"
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "test",
+ "databaseName": "transaction-tests",
+ "documents": [
+ {
+ "_id": 1
+ },
+ {
+ "_id": 2
+ }
+ ]
+ }
+ ],
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0"
+ }
+ ]
+ },
{
"description": "unpin after transient connection error on insertOne insert",
"operations": [
@@ -5175,6 +5273,202 @@
]
}
]
+ },
+ {
+ "description": "unpin after transient connection error on clientBulkWrite bulkWrite",
+ "operations": [
+ {
+ "object": "session0",
+ "name": "startTransaction"
+ },
+ {
+ "object": "collection0",
+ "name": "insertOne",
+ "arguments": {
+ "session": "session0",
+ "document": {
+ "_id": 3
+ }
+ },
+ "expectResult": {
+ "$$unsetOrMatches": {
+ "insertedId": {
+ "$$unsetOrMatches": 3
+ }
+ }
+ }
+ },
+ {
+ "name": "targetedFailPoint",
+ "object": "testRunner",
+ "arguments": {
+ "session": "session0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "closeConnection": true
+ }
+ }
+ }
+ },
+ {
+ "name": "clientBulkWrite",
+ "object": "client0",
+ "arguments": {
+ "session": "session0",
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "database0.collection0",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "errorLabelsContain": [
+ "TransientTransactionError"
+ ]
+ }
+ },
+ {
+ "object": "testRunner",
+ "name": "assertSessionUnpinned",
+ "arguments": {
+ "session": "session0"
+ }
+ },
+ {
+ "object": "session0",
+ "name": "abortTransaction"
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "test",
+ "databaseName": "transaction-tests",
+ "documents": [
+ {
+ "_id": 1
+ },
+ {
+ "_id": 2
+ }
+ ]
+ }
+ ],
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0"
+ }
+ ]
+ },
+ {
+ "description": "unpin after transient ShutdownInProgress error on clientBulkWrite bulkWrite",
+ "operations": [
+ {
+ "object": "session0",
+ "name": "startTransaction"
+ },
+ {
+ "object": "collection0",
+ "name": "insertOne",
+ "arguments": {
+ "session": "session0",
+ "document": {
+ "_id": 3
+ }
+ },
+ "expectResult": {
+ "$$unsetOrMatches": {
+ "insertedId": {
+ "$$unsetOrMatches": 3
+ }
+ }
+ }
+ },
+ {
+ "name": "targetedFailPoint",
+ "object": "testRunner",
+ "arguments": {
+ "session": "session0",
+ "failPoint": {
+ "configureFailPoint": "failCommand",
+ "mode": {
+ "times": 1
+ },
+ "data": {
+ "failCommands": [
+ "bulkWrite"
+ ],
+ "errorCode": 91
+ }
+ }
+ }
+ },
+ {
+ "name": "clientBulkWrite",
+ "object": "client0",
+ "arguments": {
+ "session": "session0",
+ "models": [
+ {
+ "insertOne": {
+ "namespace": "database0.collection0",
+ "document": {
+ "_id": 8,
+ "x": 88
+ }
+ }
+ }
+ ]
+ },
+ "expectError": {
+ "errorLabelsContain": [
+ "TransientTransactionError"
+ ]
+ }
+ },
+ {
+ "object": "testRunner",
+ "name": "assertSessionUnpinned",
+ "arguments": {
+ "session": "session0"
+ }
+ },
+ {
+ "object": "session0",
+ "name": "abortTransaction"
+ }
+ ],
+ "outcome": [
+ {
+ "collectionName": "test",
+ "databaseName": "transaction-tests",
+ "documents": [
+ {
+ "_id": 1
+ },
+ {
+ "_id": 2
+ }
+ ]
+ }
+ ],
+ "runOnRequirements": [
+ {
+ "minServerVersion": "8.0"
+ }
+ ]
}
]
}
diff --git a/source/transactions/tests/unified/mongos-pin-auto.yml b/source/transactions/tests/unified/mongos-pin-auto.yml
index 7a76347555..a80dd62031 100644
--- a/source/transactions/tests/unified/mongos-pin-auto.yml
+++ b/source/transactions/tests/unified/mongos-pin-auto.yml
@@ -676,6 +676,36 @@ tests:
       - *abortTransaction
     outcome: *outcome
+  - description: remain pinned after non-transient Interrupted error on clientBulkWrite bulkWrite
+    operations:
+      - *startTransaction
+      - *initialCommand
+      - name: targetedFailPoint
+        object: testRunner
+        arguments:
+          session: *session0
+          failPoint:
+            configureFailPoint: failCommand
+            mode: {times: 1}
+            data:
+              failCommands: ["bulkWrite"]
+              errorCode: 11601
+      - name: clientBulkWrite
+        object: *client0
+        arguments:
+          session: *session0
+          models:
+            - insertOne:
+                namespace: database0.collection0
+                document: { _id: 8, x: 88 }
+        expectError:
+          errorLabelsOmit: ["TransientTransactionError"]
+      - *assertSessionPinned
+      - *abortTransaction
+    outcome: *outcome
+    runOnRequirements:
+      - minServerVersion: "8.0" # `bulkWrite` added to server 8.0
+
   - description: unpin after transient connection error on insertOne insert
     operations:
       - *startTransaction
@@ -1614,3 +1644,63 @@ tests:
       - *abortTransaction
     outcome: *outcome
+  - description: unpin after transient connection error on clientBulkWrite bulkWrite
+    operations:
+      - *startTransaction
+      - *initialCommand
+      - name: targetedFailPoint
+        object: testRunner
+        arguments:
+          session: *session0
+          failPoint:
+            configureFailPoint: failCommand
+            mode: {times: 1}
+            data:
+              failCommands: ["bulkWrite"]
+              closeConnection: true
+      - name: clientBulkWrite
+        object: *client0
+        arguments:
+          session: *session0
+          models:
+            - insertOne:
+                namespace: database0.collection0
+                document: { _id: 8, x: 88 }
+        expectError:
+          errorLabelsContain: ["TransientTransactionError"]
+      - *assertSessionUnpinned
+      - *abortTransaction
+    outcome: *outcome
+    runOnRequirements:
+      - minServerVersion: "8.0" # `bulkWrite` added to server 8.0
+
+  - description: unpin after transient ShutdownInProgress error on clientBulkWrite bulkWrite
+    operations:
+      - *startTransaction
+      - *initialCommand
+      - name: targetedFailPoint
+        object: testRunner
+        arguments:
+          session: *session0
+          failPoint:
+            configureFailPoint: failCommand
+            mode: {times: 1}
+            data:
+              failCommands: ["bulkWrite"]
+              errorCode: 91
+      - name: clientBulkWrite
+        object: *client0
+        arguments:
+          session: *session0
+          models:
+            - insertOne:
+                namespace: database0.collection0
+                document: { _id: 8, x: 88 }
+        expectError:
+          errorLabelsContain: ["TransientTransactionError"]
+      - *assertSessionUnpinned
+      - *abortTransaction
+    outcome: *outcome
+    runOnRequirements:
+      - minServerVersion: "8.0" # `bulkWrite` added to server 8.0
+
diff --git a/source/transactions/transactions.md b/source/transactions/transactions.md
index 76745b59bb..484ab33fb7 100644
--- a/source/transactions/transactions.md
+++ b/source/transactions/transactions.md
@@ -1,4 +1,4 @@
-# Driver Transactions Specification
+# Transactions Specification
- Status: Accepted
- Minimum Server Version: 4.0
@@ -8,8 +8,8 @@ ______________________________________________________________________
## **Abstract**
Version 4.0 of the server introduces multi-statement transactions. This spec builds upon the
-[Driver Sessions Specification](../sessions/driver-sessions.rst) to define how an application uses transactions and how
-a driver interacts with the server to implement transactions.
+[Driver Sessions Specification](../sessions/driver-sessions.md) to define how an application uses transactions and how a
+driver interacts with the server to implement transactions.
The API for transactions must be specified to ensure that all drivers and the mongo shell are consistent with each
other, and to provide a natural interface for application developers and DBAs who use multi-statement transactions.
@@ -23,7 +23,7 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH
### **Terms**
-This specification uses the terms defined in the [Driver Sessions Specification](../sessions/driver-sessions.rst) and
+This specification uses the terms defined in the [Driver Sessions Specification](../sessions/driver-sessions.md) and
[Retryable Writes Specification](../retryable-writes/retryable-writes.md). Additional terms are defined below.
#### Resource Management Block
@@ -289,7 +289,7 @@ containing the message "Transaction already in progress" without modifying any s
startTransaction SHOULD report an error if the driver can detect that transactions are not supported by the deployment.
A deployment does not support transactions when the deployment does not support sessions, or maxWireVersion \< 7, or the
maxWireVersion \< 8 and the topology type is Sharded, see
-[How to Check Whether a Deployment Supports Sessions](https://github.com/mongodb/specifications/blob/master/source/sessions/driver-sessions.rst#how-to-check-whether-a-deployment-supports-sessions).
+[How to Check Whether a Deployment Supports Sessions](../sessions/driver-sessions.md#how-to-check-whether-a-deployment-supports-sessions).
Note that checking the maxWireVersion does not guarantee that the deployment supports transactions, for example a
MongoDB 4.0 replica set using MMAPv1 will report maxWireVersion 7 but does not support transactions. In this case,
Drivers rely on the deployment to report an error when a transaction is started.
@@ -636,7 +636,7 @@ Drivers MUST unpin a ClientSession in the following situations:
1. The transaction is aborted. The session MUST be unpinned regardless of whether or the `abortTransaction` command
succeeds or fails, or was executed at all. If the operation fails with a retryable error, the session MUST be
unpinned before performing server selection for the retry.
-2. Any operation in the transcation, including `commitTransaction` fails with a TransientTransactionError. Transient
+2. Any operation in the transaction, including `commitTransaction`, fails with a TransientTransactionError. Transient
errors indicate that the transaction in question has already been aborted or that the pinnned mongos is
down/unavailable. Unpinning the session ensures that a subsequent `abortTransaction` (or `commitTransaction`) does
not block waiting on a server that is unreachable.
@@ -778,7 +778,7 @@ The Python driver serves as a reference implementation.
## **Design Rationale**
-The design of this specification builds on the [Driver Sessions Specification](../sessions/driver-sessions.rst) and
+The design of this specification builds on the [Driver Sessions Specification](../sessions/driver-sessions.md) and
modifies the driver API as little as possible.
Drivers will rely on the server to yield an error if an unsupported command is executed within a transaction. This will
@@ -859,7 +859,7 @@ execute a command directly with minimum additional client-side logic.
This specification depends on:
-1. [Driver Sessions Specification](../sessions/driver-sessions.rst)
+1. [Driver Sessions Specification](../sessions/driver-sessions.md)
2. [Retryable Writes Specification](../retryable-writes/retryable-writes.md)
## **Backwards Compatibility**
@@ -1009,6 +1009,7 @@ The following commands are allowed inside transactions:
10. geoSearch
11. create
12. createIndexes on an empty collection created in the same transaction or on a non-existing collection
+13. bulkWrite
### Why don’t drivers automatically retry commit after a write concern timeout error?
@@ -1072,6 +1073,8 @@ objective of avoiding duplicate commits.
## **Changelog**
+- 2024-05-08: Add bulkWrite to the list of commands allowed in transactions.
+
- 2024-02-15: Migrated from reStructuredText to Markdown.
- 2023-11-22: Specify that non-transient transaction errors abort the transaction\
diff --git a/source/unified-test-format/schema-1.21.json b/source/unified-test-format/schema-1.21.json
new file mode 100644
index 0000000000..9d22fe6209
--- /dev/null
+++ b/source/unified-test-format/schema-1.21.json
@@ -0,0 +1,1116 @@
+{
+ "$schema": "http://json-schema.org/draft-07/schema#",
+ "title": "Unified Test Format",
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "description",
+ "schemaVersion",
+ "tests"
+ ],
+ "properties": {
+ "description": {
+ "type": "string"
+ },
+ "schemaVersion": {
+ "$ref": "#/definitions/version"
+ },
+ "runOnRequirements": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/runOnRequirement"
+ }
+ },
+ "createEntities": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/entity"
+ }
+ },
+ "initialData": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/collectionData"
+ }
+ },
+ "tests": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/test"
+ }
+ },
+ "_yamlAnchors": {
+ "type": "object",
+ "additionalProperties": true
+ }
+ },
+ "definitions": {
+ "version": {
+ "type": "string",
+ "pattern": "^[0-9]+(\\.[0-9]+){1,2}$"
+ },
+ "runOnRequirement": {
+ "type": "object",
+ "additionalProperties": false,
+ "minProperties": 1,
+ "properties": {
+ "maxServerVersion": {
+ "$ref": "#/definitions/version"
+ },
+ "minServerVersion": {
+ "$ref": "#/definitions/version"
+ },
+ "topologies": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "string",
+ "enum": [
+ "single",
+ "replicaset",
+ "sharded",
+ "sharded-replicaset",
+ "load-balanced"
+ ]
+ }
+ },
+ "serverless": {
+ "type": "string",
+ "enum": [
+ "require",
+ "forbid",
+ "allow"
+ ]
+ },
+ "serverParameters": {
+ "type": "object",
+ "minProperties": 1
+ },
+ "auth": {
+ "type": "boolean"
+ },
+ "authMechanism": {
+ "type": "string"
+ },
+ "csfle": {
+ "type": "boolean"
+ }
+ }
+ },
+ "entity": {
+ "type": "object",
+ "additionalProperties": false,
+ "maxProperties": 1,
+ "minProperties": 1,
+ "properties": {
+ "client": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "uriOptions": {
+ "type": "object"
+ },
+ "useMultipleMongoses": {
+ "type": "boolean"
+ },
+ "observeEvents": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "string",
+ "enum": [
+ "commandStartedEvent",
+ "commandSucceededEvent",
+ "commandFailedEvent",
+ "poolCreatedEvent",
+ "poolReadyEvent",
+ "poolClearedEvent",
+ "poolClosedEvent",
+ "connectionCreatedEvent",
+ "connectionReadyEvent",
+ "connectionClosedEvent",
+ "connectionCheckOutStartedEvent",
+ "connectionCheckOutFailedEvent",
+ "connectionCheckedOutEvent",
+ "connectionCheckedInEvent",
+ "serverDescriptionChangedEvent",
+ "topologyDescriptionChangedEvent"
+ ]
+ }
+ },
+ "ignoreCommandMonitoringEvents": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "string"
+ }
+ },
+ "storeEventsAsEntities": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/storeEventsAsEntity"
+ }
+ },
+ "observeLogMessages": {
+ "type": "object",
+ "minProperties": 1,
+ "additionalProperties": false,
+ "properties": {
+ "command": {
+ "$ref": "#/definitions/logSeverityLevel"
+ },
+ "topology": {
+ "$ref": "#/definitions/logSeverityLevel"
+ },
+ "serverSelection": {
+ "$ref": "#/definitions/logSeverityLevel"
+ },
+ "connection": {
+ "$ref": "#/definitions/logSeverityLevel"
+ }
+ }
+ },
+ "serverApi": {
+ "$ref": "#/definitions/serverApi"
+ },
+ "observeSensitiveCommands": {
+ "type": "boolean"
+ }
+ }
+ },
+ "clientEncryption": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id",
+ "clientEncryptionOpts"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "clientEncryptionOpts": {
+ "$ref": "#/definitions/clientEncryptionOpts"
+ }
+ }
+ },
+ "database": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id",
+ "client",
+ "databaseName"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "client": {
+ "type": "string"
+ },
+ "databaseName": {
+ "type": "string"
+ },
+ "databaseOptions": {
+ "$ref": "#/definitions/collectionOrDatabaseOptions"
+ }
+ }
+ },
+ "collection": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id",
+ "database",
+ "collectionName"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "database": {
+ "type": "string"
+ },
+ "collectionName": {
+ "type": "string"
+ },
+ "collectionOptions": {
+ "$ref": "#/definitions/collectionOrDatabaseOptions"
+ }
+ }
+ },
+ "session": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id",
+ "client"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "client": {
+ "type": "string"
+ },
+ "sessionOptions": {
+ "type": "object"
+ }
+ }
+ },
+ "bucket": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id",
+ "database"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "database": {
+ "type": "string"
+ },
+ "bucketOptions": {
+ "type": "object"
+ }
+ }
+ },
+ "thread": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ }
+ }
+ }
+ }
+ },
+ "logComponent": {
+ "type": "string",
+ "enum": [
+ "command",
+ "topology",
+ "serverSelection",
+ "connection"
+ ]
+ },
+ "logSeverityLevel": {
+ "type": "string",
+ "enum": [
+ "emergency",
+ "alert",
+ "critical",
+ "error",
+ "warning",
+ "notice",
+ "info",
+ "debug",
+ "trace"
+ ]
+ },
+ "clientEncryptionOpts": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "keyVaultClient",
+ "keyVaultNamespace",
+ "kmsProviders"
+ ],
+ "properties": {
+ "keyVaultClient": {
+ "type": "string"
+ },
+ "keyVaultNamespace": {
+ "type": "string"
+ },
+ "kmsProviders": {
+ "$ref": "#/definitions/kmsProviders"
+ }
+ }
+ },
+ "kmsProviders": {
+ "$defs": {
+ "stringOrPlaceholder": {
+ "oneOf": [
+ {
+ "type": "string"
+ },
+ {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "$$placeholder"
+ ],
+ "properties": {
+ "$$placeholder": {}
+ }
+ }
+ ]
+ }
+ },
+ "type": "object",
+ "additionalProperties": false,
+ "patternProperties": {
+ "^aws(:[a-zA-Z0-9_]+)?$": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "accessKeyId": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ },
+ "secretAccessKey": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ },
+ "sessionToken": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ }
+ }
+ },
+ "^azure(:[a-zA-Z0-9_]+)?$": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "tenantId": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ },
+ "clientId": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ },
+ "clientSecret": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ },
+ "identityPlatformEndpoint": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ }
+ }
+ },
+ "^gcp(:[a-zA-Z0-9_]+)?$": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "email": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ },
+ "privateKey": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ },
+ "endpoint": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ }
+ }
+ },
+ "^kmip(:[a-zA-Z0-9_]+)?$": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "endpoint": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ }
+ }
+ },
+ "^local(:[a-zA-Z0-9_]+)?$": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "key": {
+ "$ref": "#/definitions/kmsProviders/$defs/stringOrPlaceholder"
+ }
+ }
+ }
+ }
+ },
+ "storeEventsAsEntity": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "id",
+ "events"
+ ],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "events": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "string",
+ "enum": [
+ "PoolCreatedEvent",
+ "PoolReadyEvent",
+ "PoolClearedEvent",
+ "PoolClosedEvent",
+ "ConnectionCreatedEvent",
+ "ConnectionReadyEvent",
+ "ConnectionClosedEvent",
+ "ConnectionCheckOutStartedEvent",
+ "ConnectionCheckOutFailedEvent",
+ "ConnectionCheckedOutEvent",
+ "ConnectionCheckedInEvent",
+ "CommandStartedEvent",
+ "CommandSucceededEvent",
+ "CommandFailedEvent",
+ "ServerDescriptionChangedEvent",
+ "TopologyDescriptionChangedEvent"
+ ]
+ }
+ }
+ }
+ },
+ "collectionData": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "collectionName",
+ "databaseName",
+ "documents"
+ ],
+ "properties": {
+ "collectionName": {
+ "type": "string"
+ },
+ "databaseName": {
+ "type": "string"
+ },
+ "createOptions": {
+ "type": "object",
+ "properties": {
+ "writeConcern": false
+ }
+ },
+ "documents": {
+ "type": "array",
+ "items": {
+ "type": "object"
+ }
+ }
+ }
+ },
+ "expectedEventsForClient": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "client",
+ "events"
+ ],
+ "properties": {
+ "client": {
+ "type": "string"
+ },
+ "eventType": {
+ "type": "string",
+ "enum": [
+ "command",
+ "cmap",
+ "sdam"
+ ]
+ },
+ "events": {
+ "type": "array"
+ },
+ "ignoreExtraEvents": {
+ "type": "boolean"
+ }
+ },
+ "oneOf": [
+ {
+ "required": [
+ "eventType"
+ ],
+ "properties": {
+ "eventType": {
+ "const": "command"
+ },
+ "events": {
+ "type": "array",
+ "items": {
+ "$ref": "#/definitions/expectedCommandEvent"
+ }
+ }
+ }
+ },
+ {
+ "required": [
+ "eventType"
+ ],
+ "properties": {
+ "eventType": {
+ "const": "cmap"
+ },
+ "events": {
+ "type": "array",
+ "items": {
+ "$ref": "#/definitions/expectedCmapEvent"
+ }
+ }
+ }
+ },
+ {
+ "required": [
+ "eventType"
+ ],
+ "properties": {
+ "eventType": {
+ "const": "sdam"
+ },
+ "events": {
+ "type": "array",
+ "items": {
+ "$ref": "#/definitions/expectedSdamEvent"
+ }
+ }
+ }
+ },
+ {
+ "additionalProperties": false,
+ "properties": {
+ "client": {
+ "type": "string"
+ },
+ "events": {
+ "type": "array",
+ "items": {
+ "$ref": "#/definitions/expectedCommandEvent"
+ }
+ },
+ "ignoreExtraEvents": {
+ "type": "boolean"
+ }
+ }
+ }
+ ]
+ },
+ "expectedCommandEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "maxProperties": 1,
+ "minProperties": 1,
+ "properties": {
+ "commandStartedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "command": {
+ "type": "object"
+ },
+ "commandName": {
+ "type": "string"
+ },
+ "databaseName": {
+ "type": "string"
+ },
+ "hasServiceId": {
+ "type": "boolean"
+ },
+ "hasServerConnectionId": {
+ "type": "boolean"
+ }
+ }
+ },
+ "commandSucceededEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "reply": {
+ "type": "object"
+ },
+ "commandName": {
+ "type": "string"
+ },
+ "databaseName": {
+ "type": "string"
+ },
+ "hasServiceId": {
+ "type": "boolean"
+ },
+ "hasServerConnectionId": {
+ "type": "boolean"
+ }
+ }
+ },
+ "commandFailedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "commandName": {
+ "type": "string"
+ },
+ "databaseName": {
+ "type": "string"
+ },
+ "hasServiceId": {
+ "type": "boolean"
+ },
+ "hasServerConnectionId": {
+ "type": "boolean"
+ }
+ }
+ }
+ }
+ },
+ "expectedCmapEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "maxProperties": 1,
+ "minProperties": 1,
+ "properties": {
+ "poolCreatedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "poolReadyEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "poolClearedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "hasServiceId": {
+ "type": "boolean"
+ },
+ "interruptInUseConnections": {
+ "type": "boolean"
+ }
+ }
+ },
+ "poolClosedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "connectionCreatedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "connectionReadyEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "connectionClosedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "reason": {
+ "type": "string"
+ }
+ }
+ },
+ "connectionCheckOutStartedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "connectionCheckOutFailedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "reason": {
+ "type": "string"
+ }
+ }
+ },
+ "connectionCheckedOutEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "connectionCheckedInEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ }
+ }
+ },
+ "expectedSdamEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "maxProperties": 1,
+ "minProperties": 1,
+ "properties": {
+ "serverDescriptionChangedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "previousDescription": {
+ "$ref": "#/definitions/serverDescription"
+ },
+ "newDescription": {
+ "$ref": "#/definitions/serverDescription"
+ }
+ }
+ },
+ "topologyDescriptionChangedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "previousDescription": {
+ "$ref": "#/definitions/topologyDescription"
+ },
+ "newDescription": {
+ "$ref": "#/definitions/topologyDescription"
+ }
+ }
+ },
+ "serverHeartbeatStartedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "awaited": {
+ "type": "boolean"
+ }
+ }
+ },
+ "serverHeartbeatSucceededEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "awaited": {
+ "type": "boolean"
+ }
+ }
+ },
+ "serverHeartbeatFailedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "awaited": {
+ "type": "boolean"
+ }
+ }
+ },
+ "topologyOpeningEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ },
+ "topologyClosedEvent": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {}
+ }
+ }
+ },
+ "serverDescription": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "type": {
+ "type": "string",
+ "enum": [
+ "Standalone",
+ "Mongos",
+ "PossiblePrimary",
+ "RSPrimary",
+ "RSSecondary",
+ "RSOther",
+ "RSArbiter",
+ "RSGhost",
+ "LoadBalancer",
+ "Unknown"
+ ]
+ }
+ }
+ },
+ "topologyDescription": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "type": {
+ "type": "string",
+ "enum": [
+ "Single",
+ "Unknown",
+ "ReplicaSetNoPrimary",
+ "ReplicaSetWithPrimary",
+ "Sharded",
+ "LoadBalanced"
+ ]
+ }
+ }
+ },
+ "expectedLogMessagesForClient": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "client",
+ "messages"
+ ],
+ "properties": {
+ "client": {
+ "type": "string"
+ },
+ "messages": {
+ "type": "array",
+ "items": {
+ "$ref": "#/definitions/expectedLogMessage"
+ }
+ },
+ "ignoreExtraMessages": {
+ "type": "boolean"
+ },
+ "ignoreMessages": {
+ "type": "array",
+ "items": {
+ "$ref": "#/definitions/expectedLogMessage"
+ }
+ }
+ }
+ },
+ "expectedLogMessage": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "level",
+ "component",
+ "data"
+ ],
+ "properties": {
+ "level": {
+ "$ref": "#/definitions/logSeverityLevel"
+ },
+ "component": {
+ "$ref": "#/definitions/logComponent"
+ },
+ "data": {
+ "type": "object"
+ },
+ "failureIsRedacted": {
+ "type": "boolean"
+ }
+ }
+ },
+ "collectionOrDatabaseOptions": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "readConcern": {
+ "type": "object"
+ },
+ "readPreference": {
+ "type": "object"
+ },
+ "writeConcern": {
+ "type": "object"
+ },
+ "timeoutMS": {
+ "type": "integer"
+ }
+ }
+ },
+ "serverApi": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "version"
+ ],
+ "properties": {
+ "version": {
+ "type": "string"
+ },
+ "strict": {
+ "type": "boolean"
+ },
+ "deprecationErrors": {
+ "type": "boolean"
+ }
+ }
+ },
+ "operation": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "name",
+ "object"
+ ],
+ "properties": {
+ "name": {
+ "type": "string"
+ },
+ "object": {
+ "type": "string"
+ },
+ "arguments": {
+ "type": "object"
+ },
+ "ignoreResultAndError": {
+ "type": "boolean"
+ },
+ "expectError": {
+ "$ref": "#/definitions/expectedError"
+ },
+ "expectResult": {},
+ "saveResultAsEntity": {
+ "type": "string"
+ }
+ },
+ "allOf": [
+ {
+ "not": {
+ "required": [
+ "expectError",
+ "expectResult"
+ ]
+ }
+ },
+ {
+ "not": {
+ "required": [
+ "expectError",
+ "saveResultAsEntity"
+ ]
+ }
+ },
+ {
+ "not": {
+ "required": [
+ "ignoreResultAndError",
+ "expectResult"
+ ]
+ }
+ },
+ {
+ "not": {
+ "required": [
+ "ignoreResultAndError",
+ "expectError"
+ ]
+ }
+ },
+ {
+ "not": {
+ "required": [
+ "ignoreResultAndError",
+ "saveResultAsEntity"
+ ]
+ }
+ }
+ ]
+ },
+ "expectedError": {
+ "type": "object",
+ "additionalProperties": false,
+ "minProperties": 1,
+ "properties": {
+ "isError": {
+ "type": "boolean",
+ "const": true
+ },
+ "isClientError": {
+ "type": "boolean"
+ },
+ "isTimeoutError": {
+ "type": "boolean"
+ },
+ "errorContains": {
+ "type": "string"
+ },
+ "errorCode": {
+ "type": "integer"
+ },
+ "errorCodeName": {
+ "type": "string"
+ },
+ "errorLabelsContain": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "string"
+ }
+ },
+ "errorLabelsOmit": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "type": "string"
+ }
+ },
+ "writeErrors": {
+ "type": "object"
+ },
+ "writeConcernErrors": {
+ "type": "array",
+ "items": {
+ "type": "object"
+ }
+ },
+ "errorResponse": {
+ "type": "object"
+ },
+ "expectResult": {}
+ }
+ },
+ "test": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": [
+ "description",
+ "operations"
+ ],
+ "properties": {
+ "description": {
+ "type": "string"
+ },
+ "runOnRequirements": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/runOnRequirement"
+ }
+ },
+ "skipReason": {
+ "type": "string"
+ },
+ "operations": {
+ "type": "array",
+ "items": {
+ "$ref": "#/definitions/operation"
+ }
+ },
+ "expectEvents": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/expectedEventsForClient"
+ }
+ },
+ "expectLogMessages": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/expectedLogMessagesForClient"
+ }
+ },
+ "outcome": {
+ "type": "array",
+ "minItems": 1,
+ "items": {
+ "$ref": "#/definitions/collectionData"
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/source/unified-test-format/tests/Makefile b/source/unified-test-format/tests/Makefile
index 5c30f9a66d..a2b79e3f70 100644
--- a/source/unified-test-format/tests/Makefile
+++ b/source/unified-test-format/tests/Makefile
@@ -1,4 +1,4 @@
-SCHEMA=../schema-1.20.json
+SCHEMA=../schema-1.21.json
.PHONY: all invalid valid-fail valid-pass atlas-data-lake versioned-api load-balancers gridfs transactions transactions-convenient-api crud collection-management read-write-concern retryable-reads retryable-writes sessions command-logging-and-monitoring client-side-operations-timeout HAS_AJV
diff --git a/source/unified-test-format/unified-test-format.md b/source/unified-test-format/unified-test-format.md
index 17a7e5bf12..dba41fb34b 100644
--- a/source/unified-test-format/unified-test-format.md
+++ b/source/unified-test-format/unified-test-format.md
@@ -31,10 +31,10 @@ This test format can be used to define tests for the following specifications:
- [GridFS](../gridfs/gridfs-spec.md)
- [Retryable Reads](../retryable-reads/retryable-reads.md)
- [Retryable Writes](../retryable-writes/retryable-writes.md)
-- [Sessions](../sessions/driver-sessions.rst)
+- [Sessions](../sessions/driver-sessions.md)
- [Transactions](../transactions/transactions.md)
- [Convenient API for Transactions](../transactions-convenient-api/transactions-convenient-api.rst)
-- [Server Discovery and Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring.rst)
+- [Server Discovery and Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring.md)
This is not an exhaustive list. Specifications that are known to not be supported by this format may be discussed under
[Future Work](#future-work).
@@ -188,7 +188,7 @@ Test runners MUST support the following types of entities:
- ClientSession. See [entity_session](#entity_session) and [Session Operations](#session-operations).
- GridFS Bucket. See [entity_bucket](#entity_bucket) and [Bucket Operations](#bucket-operations).
-