diff --git a/source/index.md b/source/index.md index 47298cf74c..ae4d3ebf1d 100644 --- a/source/index.md +++ b/source/index.md @@ -26,6 +26,10 @@ - [OP_MSG](message/OP_MSG.md) - [Retryable Reads](retryable-reads/retryable-reads.md) - [Retryable Writes](retryable-writes/retryable-writes.md) +- [SDAM Logging and Monitoring Specification](server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md) +- [Server Discovery And Monitoring -- Summary](server-discovery-and-monitoring/server-discovery-and-monitoring-summary.md) +- [Server Discovery And Monitoring -- Test Plan](server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md) +- [Server Monitoring](server-discovery-and-monitoring/server-monitoring.md) - [Server Selection](server-selection/server-selection.md) - [Server Selection Test Plan](server-selection/server-selection-tests.md) - [Unified Test Format](unified-test-format/unified-test-format.md) diff --git a/source/load-balancers/load-balancers.md b/source/load-balancers/load-balancers.md index 79ca900a94..f9198d18b2 100644 --- a/source/load-balancers/load-balancers.md +++ b/source/load-balancers/load-balancers.md @@ -88,14 +88,14 @@ events when operating in this mode. #### Log Messages SDAM events details described in [Monitoring](#monitoring) apply to corresponding log messages. Please refer to the -[SDAM logging specification](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#log-messages) +[SDAM logging specification](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#log-messages) for details on SDAM logging. 
Drivers MUST emit the relevant SDAM log messages, such as: -- [Starting Topology Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#starting-topology-monitoring-log-message) -- [Stopped Topology Mmonitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#stopped-topology-monitoring-log-message) -- [Starting Server Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#starting-server-monitoring-log-message) -- [Stopped Server Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#stopped-server-monitoring-log-message) -- [Topology Description Changed](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#topology-description-changed-log-message) +- [Starting Topology Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#starting-topology-monitoring-log-message) +- [Stopped Topology Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#stopped-topology-monitoring-log-message) +- [Starting Server Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#starting-server-monitoring-log-message) +- [Stopped Server Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#stopped-server-monitoring-log-message) +- [Topology Description Changed](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#topology-description-changed-log-message) ### Driver Sessions diff --git a/source/mongodb-handshake/handshake.rst b/source/mongodb-handshake/handshake.rst index d1462cab86..8a589c444a 100644 --- a/source/mongodb-handshake/handshake.rst +++ b/source/mongodb-handshake/handshake.rst @@ -79,7 +79,7 @@ ASIDE: If
the legacy handshake response includes ``helloOk: true``, then subsequent topology monitoring commands MUST use the ``hello`` command. If the legacy handshake response does not include ``helloOk: true``, then subsequent topology monitoring commands MUST use the legacy hello command. See the -`Server Discovery and Monitoring spec <../server-discovery-and-monitoring/server-discovery-and-monitoring-summary.rst>`__ +`Server Discovery and Monitoring spec <../server-discovery-and-monitoring/server-discovery-and-monitoring-summary.md>`__ for further information. The initial handshake MUST be performed on every socket to any and all servers diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md new file mode 100644 index 0000000000..a7589a6472 --- /dev/null +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md @@ -0,0 +1,632 @@ +# SDAM Logging and Monitoring Specification + +- Status: Accepted +- Minimum Server Version: 2.4 + +______________________________________________________________________ + +## Abstract + +The SDAM logging and monitoring specification defines a set of behaviors in the driver for providing runtime information +about server discovery and monitoring (SDAM) in log messages, as well as in events that users can consume +programmatically, either directly or by integrating with third-party APM libraries. + +### Definitions + +#### META + +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and +"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). + +#### Terms + +`ServerAddress` + +> The term `ServerAddress` refers to the implementation in the driver's language of a server host/port pair. This may be +> an object or a string. 
The name of this object is NOT REQUIRED. + +`TopologyType` + +> The term `TopologyType` refers to the implementation in the driver's language of a topology type (standalone, sharded, +> etc.). This may be a string or object. The name of the object is NOT REQUIRED. + +`Server` + +> The term `Server` refers to the implementation in the driver's language of an abstraction of a mongod or mongos +> process, or a load balancer, as defined by the +> [SDAM specification](https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#server). + +### Specification + +### Guidance + +#### Documentation + +The documentation provided in the code below is merely for driver authors and SHOULD NOT be taken as required +documentation for the driver. + +#### Messages and Events + +All drivers MUST implement the specified event types as well as log messages. + +Implementation details are noted below when a specific implementation is required. Within each event and log message, +all properties are REQUIRED unless noted otherwise. + +#### Naming + +All drivers MUST name types, properties, and log message values as defined in the following sections. Exceptions to this +rule are noted in the appropriate section. Class and interface names may vary according to the driver and language best +practices. + +#### Publishing and Subscribing + +The driver SHOULD publish events in a manner that is standard to the driver's language publish/subscribe patterns and is +not strictly mandated in this specification. + +Similarly, as described in the [logging specification](../logging/logging.md#implementation-requirements), the driver +SHOULD emit log messages in a manner that is standard for the language. + +### Guarantees + +#### Event Order and Concurrency + +Events and log messages MUST be published in the order that their corresponding changes are processed in the driver.
+Events MUST NOT be published concurrently for the same topology ID or server ID, but MAY be published concurrently for +differing topology IDs and server IDs. + +#### Heartbeats + +The driver MUST guarantee that every `ServerHeartbeatStartedEvent` has either a correlating +`ServerHeartbeatSucceededEvent` or `ServerHeartbeatFailedEvent`, and that every "server heartbeat started" log message +has either a correlating "server heartbeat succeeded" or "server heartbeat failed" log message. + +Drivers that use the streaming heartbeat protocol MUST publish a `ServerHeartbeatStartedEvent` and "server heartbeat +started" log message before attempting to read the next `hello` or legacy hello exhaust response. + +#### Error Handling + +If an exception occurs while sending the `hello` or legacy hello operation to the server, the driver MUST generate a +`ServerHeartbeatFailedEvent` and "server heartbeat failed" log message with the exception or message and re-raise the +exception. The SDAM-mandated retry of the `hello` or legacy hello call should be visible to consumers. + +#### Topology IDs + +A topology ID MUST be a unique value that is specific to the Topology for which the events and log messages are emitted. The +language may decide how to generate the value and what type the value is, as long as it is unique to the Topology. The +ID MUST be created once when the Topology is created and remain the same until the Topology is destroyed. + +#### Initial Server Description + +`ServerDescription` objects MUST be initialized with a default description in an "unknown" state, guaranteeing that the +previous description in the events and log messages will never be null. + +#### Initial Topology Description + +The first `TopologyDescriptionChangedEvent` to be emitted from a monitored Topology MUST set its `previousDescription` +property to be a `TopologyDescription` object in the "unknown" state.
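The heartbeat pairing and error-handling guarantees above can be illustrated with a minimal TypeScript sketch. It assumes two hypothetical stand-ins for a driver's real event and log message publishers: a synchronous `sendHello` callback and a `HeartbeatListener` interface. Neither is part of this specification's API. The sketch shows one way to ensure three things: every "started" notification is matched by exactly one "succeeded" or "failed" notification, an `ok: 0` reply counts as a failure, and the error is re-raised after the failure is published.

```typescript
// Hypothetical shapes for illustration only; real drivers define their own
// event types and publishing mechanism.
type HelloReply = { ok: number; [key: string]: unknown };

interface HeartbeatListener {
  started(e: { connectionId: string; awaited: boolean }): void;
  succeeded(e: { connectionId: string; awaited: boolean; durationMS: number; reply: HelloReply }): void;
  failed(e: { connectionId: string; awaited: boolean; durationMS: number; failure: unknown }): void;
}

/**
 * Runs one heartbeat and publishes exactly one "succeeded" or "failed"
 * notification for the "started" notification it publishes.
 */
function runHeartbeat(
  connectionId: string,
  awaited: boolean,
  sendHello: () => HelloReply, // hypothetical: one hello or legacy hello round trip
  listener: HeartbeatListener,
): HelloReply {
  listener.started({ connectionId, awaited });
  // Date.now() keeps the sketch portable; a driver should use its
  // highest-resolution clock (and the same one it uses for RTT).
  const start = Date.now();

  let outcome: { reply: HelloReply } | { error: unknown };
  try {
    const reply = sendHello();
    // An "ok: 0" reply is a heartbeat failure, just like a socket exception.
    outcome = reply.ok === 1 ? { reply } : { error: new Error(`hello failed with ok: ${reply.ok}`) };
  } catch (error) {
    outcome = { error };
  }

  const durationMS = Date.now() - start;
  if ("error" in outcome) {
    listener.failed({ connectionId, awaited, durationMS, failure: outcome.error });
    // Re-raise so the SDAM-mandated retry logic still sees the error.
    throw outcome.error;
  }
  listener.succeeded({ connectionId, awaited, durationMS, reply: outcome.reply });
  return outcome.reply;
}
```

Publishing "succeeded" outside the `try` block is deliberate. If a listener itself throws, the code does not go on to report a second, spurious "failed" notification for the same heartbeat.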
+ +#### Closing Topology Description + +When a `Topology` object or equivalent is being shut down or closed, the driver MUST change the `TopologyDescription` to +an "unknown" state. + +______________________________________________________________________ + +## Events API + +This specification defines 9 main events that MUST be published in the scenarios described. 6 of these events are the +core behaviour within the cluster lifecycle, and the remaining 3 server heartbeat events are fired from the server +monitor and follow the guidelines for publishing in the command monitoring specification. + +Events that MUST be published (with their conditions) are as follows. + +| Event Type | Condition | +| --------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `TopologyOpeningEvent` | When a topology description is initialized - this MUST be the first SDAM event fired. | +| `ServerOpeningEvent` | Published when the server description is instantiated with its defaults, and MUST be the first operation to happen after the defaults are set. This is before the Monitor is created and the Monitor socket connection is opened. | +| `ServerDescriptionChangedEvent` | When the old server description is not equal to the new server description | +| `TopologyDescriptionChangedEvent` | When the old topology description is not equal to the new topology description. | +| `ServerClosedEvent` | Published when the server monitor's connection is closed and the server is shut down. | +| `TopologyClosedEvent` | When a topology is shut down - this MUST be the last SDAM event fired.
| +| `ServerHeartbeatStartedEvent` | Published when the server monitor sends its `hello` or legacy hello call to the server. When the monitor is creating a new connection, this event MUST be published just before the socket is created. | +| `ServerHeartbeatSucceededEvent` | Published on successful completion of the server monitor's `hello` or legacy hello call. | +| `ServerHeartbeatFailedEvent` | Published on failure of the server monitor's `hello` or legacy hello call, either with an ok: 0 result or a socket exception from the connection. | + +```typescript +/** + * Published when server description changes, but does NOT include changes to the RTT. + */ +interface ServerDescriptionChangedEvent { + + /** + * Returns the address (host/port pair) of the server. + */ + address: ServerAddress; + + /** + * Returns a unique identifier for the topology. + */ + topologyId: Object; + + /** + * Returns the previous server description. + */ + previousDescription: ServerDescription; + + /** + * Returns the new server description. + */ + newDescription: ServerDescription; +} + +/** + * Published when server is initialized. + */ +interface ServerOpeningEvent { + + /** + * Returns the address (host/port pair) of the server. + */ + address: ServerAddress; + + /** + * Returns a unique identifier for the topology. + */ + topologyId: Object; +} + +/** + * Published when server is closed. + */ +interface ServerClosedEvent { + + /** + * Returns the address (host/port pair) of the server. + */ + address: ServerAddress; + + /** + * Returns a unique identifier for the topology. + */ + topologyId: Object; +} + +/** + * Published when topology description changes. + */ +interface TopologyDescriptionChangedEvent { + + /** + * Returns a unique identifier for the topology. + */ + topologyId: Object; + + /** + * Returns the old topology description. + */ + previousDescription: TopologyDescription; + + /** + * Returns the new topology description. 
+ */ + newDescription: TopologyDescription; +} + +/** + * Published when topology is initialized. + */ +interface TopologyOpeningEvent { + + /** + * Returns a unique identifier for the topology. + */ + topologyId: Object; +} + +/** + * Published when topology is closed. + */ +interface TopologyClosedEvent { + + /** + * Returns a unique identifier for the topology. + */ + topologyId: Object; +} + +/** + * Fired when the server monitor's ``hello`` or legacy hello command is started - immediately before + * the ``hello`` or legacy hello command is serialized into raw BSON and written to the socket. + * When the monitor is creating a new monitoring connection, this event is fired just before the + * socket is opened. + */ +interface ServerHeartbeatStartedEvent { + + /** + * Returns the connection id for the command. The connection id is the unique + * identifier of the driver's Connection object that wraps the socket. For languages that + * do not have this object, this MUST be a string of "hostname:port" or an object that + * contains the hostname and port as attributes. + * + * The name of this field is flexible to match the object that is returned from the driver. + * Examples include, but are not limited to, 'address', 'serverAddress', and 'connectionId'. + */ + connectionId: ConnectionId; + + /** + * Determines if this heartbeat event is for an awaitable ``hello`` or legacy hello. + */ + awaited: Boolean; + +} + +/** + * Fired when the server monitor's ``hello`` or legacy hello succeeds. + */ +interface ServerHeartbeatSucceededEvent { + + /** + * Returns the execution time of the event in the highest possible resolution for the platform. + * The calculated value MUST be the time to send the message and receive the reply from the server, + * including BSON serialization and deserialization. The name can imply the units in which the + * value is returned, e.g. durationMS, durationNanos.
+ * + * When the awaited field is false, the time measurement used MUST be the + * same measurement used for the RTT calculation. When the awaited field is + * true, the time measurement is not used for RTT calculation. + */ + duration: Int64; + + /** + * Returns the command reply. + */ + reply: Document; + + /** + * Returns the connection id for the command. For languages that do not have this, + * this MUST return the driver equivalent which MUST include the server address and port. + * The name of this field is flexible to match the object that is returned from the driver. + */ + connectionId: ConnectionId; + + /** + * Determines if this heartbeat event is for an awaitable ``hello`` or legacy hello. If + * true, then the duration field cannot be used for RTT calculation + * because the command blocks on the server. + */ + awaited: Boolean; + +} + +/** + * Fired when the server monitor's ``hello`` or legacy hello fails, either with an "ok: 0" or a socket exception. + */ +interface ServerHeartbeatFailedEvent { + + /** + * Returns the execution time of the event in the highest possible resolution for the platform. + * The calculated value MUST be the time to send the message and receive the reply from the server, + * including BSON serialization and deserialization. The name can imply the units in which the + * value is returned, e.g. durationMS, durationNanos. + */ + duration: Int64; + + /** + * Returns the failure. Based on the language, this SHOULD be a message string, + * exception object, or error document. + */ + failure: String | Exception | Document; + + /** + * Returns the connection id for the command. For languages that do not have this, + * this MUST return the driver equivalent which MUST include the server address and port. + * The name of this field is flexible to match the object that is returned from the driver. + */ + connectionId: ConnectionId; + + /** + * Determines if this heartbeat event is for an awaitable ``hello`` or legacy hello.
If + * true, then the duration field cannot be used for RTT calculation + * because the command blocks on the server. + */ + awaited: Boolean; +} +``` + +The `TopologyDescription` object MUST expose the new methods defined in the API below, in order for subscribers to take +action on certain conditions based on the driver options. + +`TopologyDescription` objects MAY have additional methods and properties. + +```typescript +/** + * Describes the current topology. + */ +interface TopologyDescription { + + /** + * Determines if the topology has a readable server available. See the table in the + * following section for behaviour rules. + */ + hasReadableServer(readPreference: Optional<ReadPreference>): Boolean + + /** + * Determines if the topology has a writable server available. See the table in the + * following section for behaviour rules. + */ + hasWritableServer(): Boolean +} +``` + +### Determining If A Topology Has Readable/Writable Servers + +The following table describes the rules for determining if a topology type has readable or writable servers. If no read +preference is passed to `hasReadableServer`, the driver MUST default the value to the default read preference, +`primary`, or treat the call as if `primary` was provided. + +| Topology Type | hasReadableServer | hasWritableServer | +| --- | --- | --- | +| Unknown | false | false | +| Single | true if the server is available | true if the server is available | +| ReplicaSetNoPrimary | Called with `primary`: false<br>Called with any other option: uses the read preference to determine if any server in the cluster is suitable for reading.<br>Called with no option: false | false | +| ReplicaSetWithPrimary | Called with any valid option: uses the read preference to determine if any server in the cluster is suitable for reading.<br>Called with no option: true | true | +| Sharded | true if 1+ servers are available | true if 1+ servers are available | +| LoadBalanced | true | true |
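The table above maps naturally onto a pair of switch statements over the topology type. The following TypeScript sketch is one possible encoding, not a reference implementation. The `TopologyDescription` class, the `suitableForRead` helper, and the string-literal types are hypothetical. Read preference suitability is reduced to a check on server type, ignoring tag sets, `maxStalenessSeconds`, and the latency window. A server is treated as "available" whenever its type is not `Unknown`, which is also an assumption of this sketch.

```typescript
// Hypothetical, simplified types for illustration; a real driver's
// TopologyDescription and ServerDescription carry much more state.
type TopologyType =
  | "Unknown"
  | "Single"
  | "ReplicaSetNoPrimary"
  | "ReplicaSetWithPrimary"
  | "Sharded"
  | "LoadBalanced";

type ServerType = "Unknown" | "Standalone" | "Mongos" | "RSPrimary" | "RSSecondary" | "RSArbiter" | "LoadBalancer";

type ReadPreferenceMode = "primary" | "primaryPreferred" | "secondary" | "secondaryPreferred" | "nearest";

interface ServerDescription {
  address: string;
  type: ServerType;
}

// Simplification: only checks server types. Real server selection also applies
// tag sets, maxStalenessSeconds, and the latency window.
function suitableForRead(type: ServerType, mode: ReadPreferenceMode): boolean {
  switch (mode) {
    case "primary":
      return type === "RSPrimary";
    case "secondary":
      return type === "RSSecondary";
    default:
      // primaryPreferred, secondaryPreferred, and nearest accept either member type.
      return type === "RSPrimary" || type === "RSSecondary";
  }
}

class TopologyDescription {
  readonly type: TopologyType;
  readonly servers: ServerDescription[];

  constructor(type: TopologyType, servers: ServerDescription[]) {
    this.type = type;
    this.servers = servers;
  }

  // Assumption: any server whose type is not Unknown counts as available.
  private hasAvailableServer(): boolean {
    return this.servers.some((s) => s.type !== "Unknown");
  }

  // Calling with no argument behaves as if `primary` was passed.
  hasReadableServer(mode: ReadPreferenceMode = "primary"): boolean {
    switch (this.type) {
      case "Unknown":
        return false;
      case "Single":
      case "Sharded":
        return this.hasAvailableServer();
      case "LoadBalanced":
        return true;
      default:
        // ReplicaSetNoPrimary and ReplicaSetWithPrimary defer to the read preference.
        return this.servers.some((s) => suitableForRead(s.type, mode));
    }
  }

  hasWritableServer(): boolean {
    switch (this.type) {
      case "Single":
      case "Sharded":
        return this.hasAvailableServer();
      case "ReplicaSetWithPrimary":
      case "LoadBalanced":
        return true;
      default:
        // Unknown and ReplicaSetNoPrimary.
        return false;
    }
  }
}
```

Defaulting `mode` to `"primary"` produces both "Called with no option" rows of the table without special-casing them. `ReplicaSetNoPrimary` has no primary, so the call yields false. `ReplicaSetWithPrimary` has one, so the call yields true.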
+ +### Log Messages + +Please refer to the [logging specification](../logging/logging.md) for details on logging implementations in general, +including log levels, log components, and structured versus unstructured logging. + +Drivers MUST support logging of SDAM information via the following types of log messages. These messages MUST be logged +at `Debug` level and use the `topology` log component. + +A number of the log messages are intended to match the information contained in the events above. However, note that a +log message regarding a server description change (which would correspond to `ServerDescriptionChangedEvent`) has been +intentionally omitted since the information it would contain is redundant with `TopologyDescriptionChangedEvent` and the +equivalent log message. + +Drivers MAY implement SDAM logging support via an event subscriber if it is convenient to do so. + +The types used in the structured message definitions below are demonstrative, and drivers MAY use similar types instead +so long as the information is present (e.g. a double instead of an integer, or a string instead of an integer if the +structured logging framework does not support numeric types.) 
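Drivers MAY implement SDAM logging through an event subscriber, and one possible shape is sketched below in TypeScript. The `SdamEvent` union, `Logger` interface, `TEMPLATES` table, and `makeLoggingSubscriber` function are all hypothetical. The sketch covers only the four start/stop monitoring messages and assumes every server has a port, so Unix domain socket endpoints are not handled. The structured `message` values and unstructured templates are taken from the log message sections of this specification. Each `{{placeholder}}` is filled from the structured fields, as the spec describes.

```typescript
// Hypothetical event and logger shapes for illustration only.
type SdamEvent =
  | { kind: "topologyOpening"; topologyId: string }
  | { kind: "topologyClosed"; topologyId: string }
  | { kind: "serverOpening"; topologyId: string; serverHost: string; serverPort: number }
  | { kind: "serverClosed"; topologyId: string; serverHost: string; serverPort: number };

type StructuredMessage = Record<string, string | number | boolean>;

interface Logger {
  debug(component: "topology", fields: StructuredMessage, unstructured: string): void;
}

// [structured "message" value, unstructured template] for each event kind.
const TEMPLATES = {
  topologyOpening: ["Starting topology monitoring", "Starting monitoring for topology with ID {{topologyId}}"],
  topologyClosed: ["Stopped topology monitoring", "Stopped monitoring for topology with ID {{topologyId}}"],
  serverOpening: [
    "Starting server monitoring",
    "Starting monitoring for server {{serverHost}}:{{serverPort}} in topology with ID {{topologyId}}",
  ],
  serverClosed: [
    "Stopped server monitoring",
    "Stopped monitoring for server {{serverHost}}:{{serverPort}} in topology with ID {{topologyId}}",
  ],
} as const;

// Replaces each {{key}} with the matching structured field; unknown keys are left as-is.
function render(template: string, fields: StructuredMessage): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    key in fields ? String(fields[key]) : match,
  );
}

function makeLoggingSubscriber(logger: Logger): (event: SdamEvent) => void {
  return (event) => {
    const [message, template] = TEMPLATES[event.kind];
    const fields: StructuredMessage = { message, topologyId: event.topologyId };
    if ("serverHost" in event) {
      // Server-specific messages also carry the serverHost/serverPort common fields.
      fields.serverHost = event.serverHost;
      fields.serverPort = event.serverPort;
    }
    // SDAM log messages are emitted at Debug level under the "topology" component.
    logger.debug("topology", fields, render(template, fields));
  };
}
```

The unstructured string is rendered from the same `fields` object that is logged in structured form. Both representations therefore come from a single set of values and cannot drift apart.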
+ +#### Common Fields + +The following key-value pairs are common to all or several log messages and MUST be included in the "applicable +messages": + +| Key | Applicable Messages | Suggested Type | Value | +| ------------------ | ---------------------------------------------------------------------------------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| topologyId | All messages | Flexible | The driver's unique ID for this topology as discussed in [Topology IDs](#topology-ids). The type is flexible depending on the driver's choice of type for topology ID. | +| serverHost | Log messages specific to a particular server, including heartbeat-related messages | String | The hostname, IP address, or Unix domain socket path for the endpoint the pool is for. | +| serverPort | Log messages specific to a particular server, including heartbeat-related messages | Int | (Only present for server-specific log messages) The port for the endpoint the pool is for. Optional; not present for Unix domain sockets. When the user does not specify a port and the default (27017) is used, the driver SHOULD include it here. | +| driverConnectionId | Heartbeat-related log messages | Int | The driver-generated ID for the monitoring connection as defined in the [connection monitoring and pooling specification](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md). 
Unlike `connectionId` in the above events, this field MUST NOT contain the host/port; that information MUST be in the above fields, `serverHost` and `serverPort`. This field is optional for drivers that do not implement CMAP if they do have an equivalent concept of a connection ID. | +| serverConnectionId | Heartbeat-related log messages | Int | The server's ID for the monitoring connection, if known. This value will be unknown and can be omitted in certain cases, e.g. the first "heartbeat started" message for a monitoring connection. Only present on server versions 4.2+. | + +#### "Starting Topology Monitoring" Log Message + +This message MUST be published under the same circumstances as a `TopologyOpeningEvent` as detailed in +[Events API](#events-api). + +In addition to the relevant common fields, these messages MUST contain the following key-value pair: + +| Key | Suggested Type | Value | +| ------- | -------------- | ------------------------------ | +| message | String | "Starting topology monitoring" | + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Starting monitoring for topology with ID {{topologyId}} + +#### "Stopped Topology Monitoring" Log Message + +This message MUST be published under the same circumstances as a `TopologyClosedEvent` as detailed in +[Events API](#events-api). 
+ +In addition to the relevant common fields, these messages MUST contain the following key-value pair: + +| Key | Suggested Type | Value | +| ------- | -------------- | ----------------------------- | +| message | String | "Stopped topology monitoring" | + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Stopped monitoring for topology with ID {{topologyId}} + +#### "Starting Server Monitoring" Log Message + +This message MUST be published under the same circumstances as a `ServerOpeningEvent` as detailed in +[Events API](#events-api). + +In addition to the relevant common fields, these messages MUST contain the following key-value pair: + +| Key | Suggested Type | Value | +| ------- | -------------- | ---------------------------- | +| message | String | "Starting server monitoring" | + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Starting monitoring for server {{serverHost}}:{{serverPort}} in topology with ID {{topologyId}} + +#### "Stopped Server Monitoring" Log Message + +This message MUST be published under the same circumstances as a `ServerClosedEvent` as detailed in +[Events API](#events-api). 
+ +In addition to the relevant common fields, these messages MUST contain the following key-value pair: + +| Key | Suggested Type | Value | +| ------- | -------------- | --------------------------- | +| message | String | "Stopped server monitoring" | + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Stopped monitoring for server {{serverHost}}:{{serverPort}} in topology with ID {{topologyId}} + +#### "Topology Description Changed" Log Message + +This message MUST be published under the same circumstances as a `TopologyDescriptionChangedEvent` as detailed in +[Events API](#events-api). + +In addition to the relevant common fields, these messages MUST contain the following key-value pairs: + +| Key | Suggested Type | Value | +| ------------------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| message | String | "Topology description changed" | +| previousDescription | String | A string representation of the previous description of the topology. The format is flexible and could be e.g. the `toString()` implementation for a driver's topology description type, or an extended JSON representation of the topology object. | +| newDescription | String | A string representation of the new description of the topology. The format is flexible and could be e.g. the `toString()` implementation for a driver's topology description type, or an extended JSON representation of the topology object. | + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Description changed for topology with ID {{topologyId}}. Previous description: {{previousDescription}}.
New +> description: {{newDescription}} + +#### "Server Heartbeat Started" Log Message + +This message MUST be published under the same circumstances as a `ServerHeartbeatStartedEvent` as detailed in +[Events API](#events-api). + +In addition to the relevant common fields, these messages MUST contain the following key-value pairs: + +| Key | Suggested Type | Value | +| ------- | -------------- | --------------------------------------------------------------------- | +| message | String | "Server heartbeat started" | +| awaited | Boolean | Whether this log message is for an awaitable hello or legacy "hello". | + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Heartbeat started for {{serverHost}}:{{serverPort}} on connection with driver-generated ID {{driverConnectionId}} and +> server-generated ID {{serverConnectionId}} in topology with ID {{topologyId}}. Awaited: {{awaited}} + +#### "Server Heartbeat Succeeded" Log Message + +This message MUST be published under the same circumstances as a `ServerHeartbeatSucceededEvent` as detailed in +[Events API](#events-api). + +In addition to the relevant common fields, these messages MUST contain the following key-value pairs: + +| Key | Suggested Type | Value | +| ---------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| message | String | "Server heartbeat succeeded" | +| awaited | Boolean | Whether this log message is for an awaitable hello or legacy "hello". | +| durationMS | Int | The execution time for the heartbeat in milliseconds. See `ServerHeartbeatSucceededEvent` in [Events API](#events-api) for details on calculating this value. | +| reply | String | Relaxed extended JSON representation of the reply to the heartbeat command. 
| + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Heartbeat succeeded in {{durationMS}} ms for {{serverHost}}:{{serverPort}} on connection with driver-generated ID +> {{driverConnectionId}} and server-generated ID {{serverConnectionId}} in topology with ID {{topologyId}}. Awaited: +> {{awaited}}. Reply: {{reply}} + +#### "Server Heartbeat Failed" Log Message + +This message MUST be published under the same circumstances as a `ServerHeartbeatFailedEvent` as detailed in +[Events API](#events-api). + +In addition to the relevant common fields, these messages MUST contain the following key-value pairs: + +| Key | Suggested Type | Value | +| ---------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| message | String | "Server heartbeat failed" | +| awaited | Boolean | Whether this log message is for an awaitable hello or legacy "hello". | +| durationMS | Int | The execution time for the heartbeat in milliseconds. See `ServerHeartbeatFailedEvent` in [Events API](#events-api) for details on calculating this value. | +| failure | Flexible | The error. The type and format of this value is flexible; see the [logging specification](../logging/logging.md#representing-errors-in-log-messages) for details on representing errors in log messages. If the command is considered sensitive, the error MUST be redacted and replaced with a language-appropriate alternative for a redacted error, e.g. an empty string, empty document, or null. 
| + +The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in +placeholders as appropriate: + +> Heartbeat failed in {{durationMS}} ms for {{serverHost}}:{{serverPort}} on connection with driver-generated ID +> {{driverConnectionId}} and server-generated ID {{serverConnectionId}} in topology with ID {{topologyId}}. Awaited: +> {{awaited}}. Failure: {{failure}} + +### Tests + +See the [README](tests/monitoring/README.md). + +## Changelog + +- 2024-05-02: Migrated from reStructuredText to Markdown. + +- 2024-03-29: Updated to clarify expected initial value of TopologyDescriptionChangedEvent's previousDescription field. + +- 2024-01-17: Updated to require that a `TopologyDescriptionChangedEvent` is emitted just before the `TopologyClosedEvent`. + +- 2024-01-04: Updated to clarify when ServerHeartbeatStartedEvent should be emitted. + +- 2023-03-31: Renamed to include "logging" in the title. Reorganized contents and made consistent with CLAM spec, and + added requirements for SDAM log messages. + +- 2022-10-05: Remove spec front matter and reformat changelog. + +- 2021-05-06: Updated to use modern terminology. + +- 2020-04-20: Add rules for streaming heartbeat protocol and add "awaited" field to heartbeat events. + +- 2018-12-12: Clarified table of rules for readable/writable servers. + +- 2016-10-11: TopologyDescription objects MAY have additional methods and properties. + +- 2016-08-31: Added table of rules for determining if topology has readable/writable servers. + +______________________________________________________________________ diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst index 36cf12f9d4..e2e01525df 100644 --- a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst @@ -1,730 +1,4 @@ -..
role:: javascript(code) - :language: javascript -========================================= -SDAM Logging and Monitoring Specification -========================================= - -:Status: Accepted -:Minimum Server Version: 2.4 - -.. contents:: - --------- - -Abstract -======== - -The SDAM logging and monitoring specification defines a set of behaviors in the driver for providing runtime information about server discovery and monitoring (SDAM) in log messages, as well as in events that users can consume programmatically, either directly or by integrating with third-party APM libraries. - ------------ -Definitions ------------ - -META ----- - -The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in `RFC 2119 `_. - -Terms ------ - -``ServerAddress`` - - The term ``ServerAddress`` refers to the implementation in the driver's language of a server host/port pair. This may be an object or a string. The name of this object is NOT REQUIRED. - -``TopologyType`` - - The term ``TopologyType`` refers to the implementation in the driver's language of a topology type (standalone, sharded, etc.). This may be a string or object. The name of the object is NOT REQUIRED. - -``Server`` - - The term ``Server`` refers to the implementation in the driver's language of an abstraction of a mongod or mongos process, or a load balancer, as defined by the - `SDAM specification `_. - -------------- -Specification -------------- - --------- -Guidance --------- - -Documentation -------------- - -The documentation provided in the code below is merely for driver authors and SHOULD NOT be taken as required documentation for the driver. - -Messages and Events -------------------- - -All drivers MUST implement the specified event types as well as log messages. - -Implementation details are noted below when a specific implementation is required. 
Within each event and log message, all properties are REQUIRED unless noted otherwise. - -Naming ------- - -All drivers MUST name types, properties, and log message values as defined in the following sections. Exceptions to this rule are noted in the appropriate section. Class and interface names may vary according to the driver and language best practices. - -Publishing and Subscribing --------------------------- - -The driver SHOULD publish events in a manner that is standard to the driver's language publish/subscribe patterns and is not strictly mandated in this specification. - -Similarly, as described in the `logging specification <../logging/logging.md#implementation-requirements>`__ the driver SHOULD emit log messages in a manner that is standard for the language. - ----------- -Guarantees ----------- - -Event Order and Concurrency ---------------------------- - -Events and log messages MUST be published in the order that their corresponding changes are processed in the driver. -Events MUST NOT be published concurrently for the same topology ID or server ID, but MAY be published concurrently for differing topology IDs and server IDs. - -Heartbeats ----------- - -The driver MUST guarantee that every ``ServerHeartbeatStartedEvent`` has either a correlating ``ServerHeartbeatSucceededEvent`` or ``ServerHeartbeatFailedEvent``, and that -every "server heartbeat started" log message has either a correlating "server heartbeat succeeded" or "server heartbeat failed" log message. - -Drivers that use the streaming heartbeat protocol MUST publish a ``ServerHeartbeatStartedEvent`` and "server heartbeat started" log message before attempting to read the next -``hello`` or legacy hello exhaust response. 
- -Error Handling --------------- - -If an exception occurs while sending the ``hello`` or legacy hello operation to the server, the driver MUST generate a ``ServerHeartbeatFailedEvent`` and "server heartbeat failed" -log message with the exception or message and re-raise the exception. The SDAM mandated retry of the ``hello`` or legacy hello call should be visible to consumers. - -Topology IDs ------------- - -These MUST be a unique value that is specific to the Topology for which the events and log messages are emitted. The language may decide how to generate the value and what type the value is, -as long as it is unique to the Topology. The ID MUST be created once when the Topology is created and remain the same until the Topology is destroyed. - - -Initial Server Description --------------------------- - -``ServerDescription`` objects MUST be initialized with a default description in an “unknown” state, guaranteeing that the previous description in the events and log messages will never be null. - - -Initial Topology Description ----------------------------- - -The first ``TopologyDescriptionChangedEvent`` to be emitted from a monitored Topology MUST set its ``previousDescription`` property to be a ``TopologyDescription`` object in the "unknown" state. - -Closing Topology Description ----------------------------- - -When a ``Topology`` object or equivalent is being shut-down or closed, the driver MUST change the -``TopologyDescription`` to an "unknown" state. - ----------- -Events API ----------- - -This specification defines 9 main events that MUST be published in the scenarios described. 6 of these events are the core behaviour within the cluster lifecycle, and the remaining 3 server heartbeat events are fired from the server monitor and follow the guidelines for publishing in the command monitoring specification. - -Events that MUST be published (with their conditions) are as follows. - -.. 
list-table:: - :header-rows: 1 - :widths: 50 50 - - * - Event Type - - Condition - * - ``TopologyOpeningEvent`` - - When a topology description is initialized - this MUST be the first SDAM event fired. - * - ``ServerOpeningEvent`` - - Published when the server description is instantiated with its defaults, and MUST be the first operation to happen after the defaults are set. This is before the Monitor is created and the Monitor socket connection is opened. - * - ``ServerDescriptionChangedEvent`` - - When the old server description is not equal to the new server description - * - ``TopologyDescriptionChangedEvent`` - - When the old topology description is not equal to the new topology description. - * - ``ServerClosedEvent`` - - Published when the server monitor's connection is closed and the server is shutdown. - * - ``TopologyClosedEvent`` - - When a topology is shut down - this MUST be the last SDAM event fired. - * - ``ServerHeartbeatStartedEvent`` - - Published when the server monitor sends its ``hello`` or legacy hello call to the server. When the monitor is creating a new connection, this event MUST be published just before the socket is created. - * - ``ServerHeartbeatSucceededEvent`` - - Published on successful completion of the server monitor's ``hello`` or legacy hello call. - * - ``ServerHeartbeatFailedEvent`` - - Published on failure of the server monitor's ``hello`` or legacy hello call, either with an ok: 0 result or a socket exception from the connection. - - -.. code:: typescript - - /** - * Published when server description changes, but does NOT include changes to the RTT. - */ - interface ServerDescriptionChangedEvent { - - /** - * Returns the address (host/port pair) of the server. - */ - address: ServerAddress; - - /** - * Returns a unique identifier for the topology. - */ - topologyId: Object; - - /** - * Returns the previous server description. - */ - previousDescription: ServerDescription; - - /** - * Returns the new server description. 
- */ - newDescription: ServerDescription; - } - - /** - * Published when server is initialized. - */ - interface ServerOpeningEvent { - - /** - * Returns the address (host/port pair) of the server. - */ - address: ServerAddress; - - /** - * Returns a unique identifier for the topology. - */ - topologyId: Object; - } - - /** - * Published when server is closed. - */ - interface ServerClosedEvent { - - /** - * Returns the address (host/port pair) of the server. - */ - address: ServerAddress; - - /** - * Returns a unique identifier for the topology. - */ - topologyId: Object; - } - - /** - * Published when topology description changes. - */ - interface TopologyDescriptionChangedEvent { - - /** - * Returns a unique identifier for the topology. - */ - topologyId: Object; - - /** - * Returns the old topology description. - */ - previousDescription: TopologyDescription; - - /** - * Returns the new topology description. - */ - newDescription: TopologyDescription; - } - - /** - * Published when topology is initialized. - */ - interface TopologyOpeningEvent { - - /** - * Returns a unique identifier for the topology. - */ - topologyId: Object; - } - - /** - * Published when topology is closed. - */ - interface TopologyClosedEvent { - - /** - * Returns a unique identifier for the topology. - */ - topologyId: Object; - } - - /** - * Fired when the server monitor's ``hello`` or legacy hello command is started - immediately before - * the ``hello`` or legacy hello command is serialized into raw BSON and written to the socket. - * When the monitor is creating a new monitoring connection, this event is fired just before the - * socket is opened. - */ - interface ServerHeartbeatStartedEvent { - - /** - * Returns the connection id for the command. The connection id is the unique - * identifier of the driver's Connection object that wraps the socket. 
For languages that - * do not have this object, this MUST a string of “hostname:port” or an object that - * that contains the hostname and port as attributes. - * - * The name of this field is flexible to match the object that is returned from the driver. - * Examples are, but not limited to, 'address', 'serverAddress', 'connectionId', - */ - connectionId: ConnectionId; - - /** - * Determines if this heartbeat event is for an awaitable ``hello`` or legacy hello. - */ - awaited: Boolean; - - } - - /** - * Fired when the server monitor's ``hello`` or legacy hello succeeds. - */ - interface ServerHeartbeatSucceededEvent { - - /** - * Returns the execution time of the event in the highest possible resolution for the platform. - * The calculated value MUST be the time to send the message and receive the reply from the server, - * including BSON serialization and deserialization. The name can imply the units in which the - * value is returned, i.e. durationMS, durationNanos. - * - * When the awaited field is false, the time measurement used MUST be the - * same measurement used for the RTT calculation. When the awaited field is - * true, the time measurement is not used for RTT calculation. - */ - duration: Int64; - - /** - * Returns the command reply. - */ - reply: Document; - - /** - * Returns the connection id for the command. For languages that do not have this, - * this MUST return the driver equivalent which MUST include the server address and port. - * The name of this field is flexible to match the object that is returned from the driver. - */ - connectionId: ConnectionId; - - /** - * Determines if this heartbeat event is for an awaitable ``hello`` or legacy hello. If - * true, then the duration field cannot be used for RTT calculation - * because the command blocks on the server. - */ - awaited: Boolean; - - } - - /** - * Fired when the server monitor's ``hello`` or legacy hello fails, either with an “ok: 0” or a socket exception. 
- */ - interface ServerHeartbeatFailedEvent { - - /** - * Returns the execution time of the event in the highest possible resolution for the platform. - * The calculated value MUST be the time to send the message and receive the reply from the server, - * including BSON serialization and deserialization. The name can imply the units in which the - * value is returned, i.e. durationMS, durationNanos. - */ - duration: Int64; - - /** - * Returns the failure. Based on the language, this SHOULD be a message string, - * exception object, or error document. - */ - failure: String,Exception,Document; - - /** - * Returns the connection id for the command. For languages that do not have this, - * this MUST return the driver equivalent which MUST include the server address and port. - * The name of this field is flexible to match the object that is returned from the driver. - */ - connectionId: ConnectionId; - - /** - * Determines if this heartbeat event is for an awaitable ``hello`` or legacy hello. If - * true, then the duration field cannot be used for RTT calculation - * because the command blocks on the server. - */ - awaited: Boolean; - } - - -The ``TopologyDescription`` object MUST expose the new methods defined in the API below, in order for subscribers to take action on certain conditions based on the driver options. - -``TopologyDescription`` objects MAY have additional methods and properties. - -.. code:: typescript - - /** - * Describes the current topology. - */ - interface TopologyDescription { - - /** - * Determines if the topology has a readable server available. See the table in the - * following section for behaviour rules. - */ - hasReadableServer(readPreference: Optional): Boolean - - /** - * Determines if the topology has a writable server available. See the table in the - * following section for behaviour rules. 
- */ - hasWritableServer(): Boolean - } - -------------------------------------------------------- -Determining If A Topology Has Readable/Writable Servers -------------------------------------------------------- - -The following table describes the rules for determining if a topology type has readable or -writable servers. If no read preference is passed to ``hasReadableServer``, the driver MUST default -the value to the default read preference, ``primary``, or treat the call as if ``primary`` was provided. - -+-----------------------+----------------------------------------+----------------------------------------+ -| Topology Type | ``hasReadableServer`` | ``hasWritableServer`` | -+=======================+========================================+========================================+ -| Unknown | ``false`` | ``false`` | -+-----------------------+----------------------------------------+----------------------------------------+ -| Single | ``true`` if the server is available | ``true`` if the server is available | -+-----------------------+----------------------------------------+----------------------------------------+ -| ReplicaSetNoPrimary | | Called with ``primary``: ``false`` | ``false`` | -| | | Called with any other option: uses | | -| | the read preference to determine if | | -| | any server in the cluster is | | -| | suitable for reading. | | -| | | Called with no option: ``false`` | | -+-----------------------+----------------------------------------+----------------------------------------+ -| ReplicaSetWithPrimary | | Called with any valid option: uses | ``true`` | -| | the read preference to determine if | | -| | any server in the cluster is | | -| | suitable for reading. 
| | -| | | Called with no option: ``true`` | | -+-----------------------+----------------------------------------+----------------------------------------+ -| Sharded | ``true`` if 1+ servers are available | ``true`` if 1+ servers are available | -+-----------------------+----------------------------------------+----------------------------------------+ -| LoadBalanced | ``true`` | ``true`` | -+-----------------------+----------------------------------------+----------------------------------------+ - ------------- -Log Messages ------------- -Please refer to the `logging specification <../logging/logging.md>`__ for details on logging implementations in general, including log levels, log -components, and structured versus unstructured logging. - -Drivers MUST support logging of SDAM information via the following types of log messages. These messages MUST be logged at ``Debug`` level and use -the ``topology`` log component. - -A number of the log messages are intended to match the information contained in the events above. However, note that a log message regarding a server -description change (which would correspond to ``ServerDescriptionChangedEvent``) has been intentionally omitted since the information it would contain -is redundant with ``TopologyDescriptionChangedEvent`` and the equivalent log message. - -Drivers MAY implement SDAM logging support via an event subscriber if it is convenient to do so. - -The types used in the structured message definitions below are demonstrative, and drivers MAY use similar types instead so long as the information -is present (e.g. a double instead of an integer, or a string instead of an integer if the structured logging framework does not support numeric types.) - -Common Fields -------------- -The following key-value pairs are common to all or several log messages and MUST be included in the "applicable messages": - -.. 
list-table:: - :header-rows: 1 - :widths: 1 1 1 1 - - * - Key - - Applicable Messages - - Suggested Type - - Value - - * - topologyId - - All messages - - Flexible - - The driver's unique ID for this topology as discussed in `Topology IDs <#topology-ids>`_. The type - is flexible depending on the driver's choice of type for topology ID. - - * - serverHost - - Log messages specific to a particular server, including heartbeat-related messages - - String - - The hostname, IP address, or Unix domain socket path for the endpoint the pool is for. - - * - serverPort - - Log messages specific to a particular server, including heartbeat-related messages - - Int - - (Only present for server-specific log messages) The port for the endpoint the pool is for. Optional; not present for Unix domain sockets. When - the user does not specify a port and the default (27017) is used, the driver SHOULD include it here. - - * - driverConnectionId - - Heartbeat-related log messages - - Int - - The driver-generated ID for the monitoring connection as defined in the - `connection monitoring and pooling specification <../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md>`_. Unlike - ``connectionId`` in the above events, this field MUST NOT contain the host/port; that information MUST be in the above fields, - ``serverHost`` and ``serverPort``. This field is optional for drivers that do not implement CMAP if they do have an equivalent concept of - a connection ID. - - * - serverConnectionId - - Heartbeat-related log messages - - Int - - The server's ID for the monitoring connection, if known. This value will be unknown and can be omitted in certain cases, e.g. the first - "heartbeat started" message for a monitoring connection. Only present on server versions 4.2+. 
- -"Starting Topology Monitoring" Log Message ------------------------------------------- -This message MUST be published under the same circumstances as a ``TopologyOpeningEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pair: - -.. list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Starting topology monitoring" - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Starting monitoring for topology with ID {{topologyId}} - -"Stopped Topology Monitoring" Log Message ------------------------------------------- -This message MUST be published under the same circumstances as a ``TopologyClosedEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pair: - -.. list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Stopped topology monitoring" - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Stopped monitoring for topology with ID {{topologyId}} - -"Starting Server Monitoring" Log Message ----------------------------------------- -This message MUST be published under the same circumstances as a ``ServerOpeningEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pair: - -.. 
list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Starting server monitoring" - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Starting monitoring for server {{serverHost}}:{{serverPort}} in topology with ID {{topologyId}} - -"Stopped Server Monitoring" Log Message ----------------------------------------- -This message MUST be published under the same circumstances as a ``ServerClosedEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pair: - -.. list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Stopped server monitoring" - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Stopped monitoring for server {{serverHost}}:{{serverPort}} in topology with ID {{topologyId}} - -"Topology Description Changed" Log Message ------------------------------------------- -This message MUST be published under the same circumstances as a ``TopologyDescriptionChangedEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pairs: - -.. list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Topology description changed" - - * - previousDescription - - String - - A string representation of the previous description of the topology. The format is flexible and could be e.g. the ``toString()`` implementation - for a driver's topology description type, or an extended JSON representation of the topology object. - - * - newDescription - - String - - A string representation of the new description of the server. 
The format is flexible and could be e.g. the ``toString()`` implementation - for a driver's topology description type, or an extended JSON representation of the topology object. - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Description changed for topology with ID {{topologyId}}. Previous description: {{previousDescription}}. New description: {{newDescription}} - -"Server Heartbeat Started" Log Message --------------------------------------- -This message MUST be published under the same circumstances as a ``ServerHeartbeatStartedEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pairs: - -.. list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Server heartbeat started" - - * - awaited - - Boolean - - Whether this log message is for an awaitable hello or legacy "hello". - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Heartbeat started for {{serverHost}}:{{serverPort}} on connection with driver-generated ID {{driverConnectionId}} and server-generated ID - {{serverConnectionId}} in topology with ID {{topologyId}}. Awaited: {{awaited}} - -"Server Heartbeat Succeeded" Log Message ----------------------------------------- -This message MUST be published under the same circumstances as a ``ServerHeartbeatSucceededEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pairs: - -.. 
list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Server heartbeat succeeded" - - * - awaited - - Boolean - - Whether this log message is for an awaitable hello or legacy "hello". - - * - durationMS - - Int - - The execution time for the heartbeat in milliseconds. See ``ServerHeartbeatSucceededEvent`` in `Events API <#events-api>`_ for details - on calculating this value. - - * - reply - - String - - Relaxed extended JSON representation of the reply to the heartbeat command. - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Heartbeat succeeded in {{durationMS}} ms for {{serverHost}}:{{serverPort}} on connection with driver-generated ID {{driverConnectionId}} - and server-generated ID {{serverConnectionId}} in topology with ID {{topologyId}}. Awaited: {{awaited}}. Reply: {{reply}} - -"Server Heartbeat Failed" Log Message -------------------------------------- -This message MUST be published under the same circumstances as a ``ServerHeartbeatFailedEvent`` as detailed in `Events API <#events-api>`_. - -In addition to the relevant common fields, these messages MUST contain the following key-value pairs: - -.. list-table:: - :header-rows: 1 - :widths: 1 1 1 - - * - Key - - Suggested Type - - Value - - * - message - - String - - "Server heartbeat failed" - - * - awaited - - Boolean - - Whether this log message is for an awaitable hello or legacy "hello". - - * - durationMS - - Int - - The execution time for the heartbeat in milliseconds. See ``ServerHeartbeatFailedEvent`` in `Events API <#events-api>`_ for details - on calculating this value. - - * - failure - - Flexible - - The error. The type and format of this value is flexible; see the `logging specification <../logging/logging.md#representing-errors-in-log-messages>`__ - for details on representing errors in log messages. 
If the command is considered sensitive, the error MUST be redacted and replaced with a - language-appropriate alternative for a redacted error, e.g. an empty string, empty document, or null. - -The unstructured form SHOULD be as follows, using the values defined in the structured format above to fill in placeholders as appropriate: - - Heartbeat failed in {{durationMS}} ms for {{serverHost}}:{{serverPort}} on connection with driver-generated ID {{driverConnectionId}} and - server-generated ID {{serverConnectionId}} in topology with ID {{topologyId}}. Awaited: {{awaited}}. Failure: {{failure}} - ------ -Tests ------ - -See the `README `_. - - -Changelog -========= - -:2024-03-29: Updated to clarify expected initial value of TopologyDescriptionChangedEvent's - previousDescription field -:2024-01-17: Updated to require that ``TopologyDescriptionChangedEvent`` should be emitted before just ``TopologyClosedEvent`` is emitted -:2024-01-04: Updated to clarify when ServerHeartbeatStartedEvent should be emitted -:2023-03-31: Renamed to include "logging" in the title. Reorganized contents and made consistent with CLAM spec, and added requirements - for SDAM log messages. -:2022-10-05: Remove spec front matter and reformat changelog. -:2021-05-06: Updated to use modern terminology. -:2020-04-20: Add rules for streaming heartbeat protocol and add "awaited" field to heartbeat events. -:2018:12-12: Clarified table of rules for readable/writable servers -:2016-08-31: Added table of rules for determining if topology has readable/writable servers. -:2016-10-11: TopologyDescription objects MAY have additional methods and properties. - ----- - -.. Section for links. - -.. _Server Discovery And Monitoring: server-discovery-and-monitoring.rst +.. note:: + This specification has been converted to Markdown and renamed to + `server-discovery-and-monitoring-logging-and-monitoring.md `_. 
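As an illustration of the "Server Heartbeat Failed" log message defined above, the following TypeScript sketch builds the structured message and renders its SHOULD-level unstructured form. It is a minimal, non-authoritative sketch and not part of the specification: `heartbeatFailedLog`, `toUnstructured`, and the concrete field types are hypothetical (the spec leaves `topologyId` and `failure` flexible), and the rendering of a missing port or an unknown server-generated ID is an assumption because the template does not specify it.

```typescript
// Structured "Server heartbeat failed" log message. Field names follow the spec tables;
// the concrete TypeScript types are assumptions (several are "Flexible" in the spec).
interface HeartbeatFailedLogMessage {
  message: "Server heartbeat failed";
  topologyId: string; // "Flexible" in the spec; a string is assumed here
  serverHost: string;
  serverPort?: number; // omitted for Unix domain sockets
  driverConnectionId: number;
  serverConnectionId?: number; // may be unknown, e.g. before the first reply
  awaited: boolean;
  durationMS: number;
  failure: string; // "Flexible" in the spec; the error message is assumed here
}

type HeartbeatFailedFields = Omit<HeartbeatFailedLogMessage, "message" | "failure">;

// Hypothetical helper: builds the structured message and redacts the error when the
// heartbeat command is considered sensitive (an empty string is one suggested redaction).
function heartbeatFailedLog(
  fields: HeartbeatFailedFields,
  error: Error,
  sensitive: boolean,
): HeartbeatFailedLogMessage {
  return {
    ...fields,
    message: "Server heartbeat failed",
    failure: sensitive ? "" : error.message,
  };
}

// Renders the unstructured form by filling in the template placeholders.
function toUnstructured(log: HeartbeatFailedLogMessage): string {
  // Assumption: the template does not say how to render a missing port or an unknown
  // server-generated ID, so this sketch drops ":port" and prints "unknown".
  const endpoint =
    log.serverPort === undefined ? log.serverHost : `${log.serverHost}:${log.serverPort}`;
  const serverId = log.serverConnectionId ?? "unknown";
  return (
    `Heartbeat failed in ${log.durationMS} ms for ${endpoint} on connection with ` +
    `driver-generated ID ${log.driverConnectionId} and server-generated ID ${serverId} ` +
    `in topology with ID ${log.topologyId}. Awaited: ${log.awaited}. Failure: ${log.failure}`
  );
}
```

For a non-awaited heartbeat to `localhost:27017` that failed after 12 ms with "connection refused", this sketch renders `Heartbeat failed in 12 ms for localhost:27017 on connection with driver-generated ID 3 and server-generated ID unknown in topology with ID t1. Awaited: false. Failure: connection refused`.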
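The rules in the "Determining If A Topology Has Readable/Writable Servers" table above can be sketched in TypeScript as follows. This is a non-authoritative illustration, not the specification's API: the `ServerDescription` and `TopologyDescription` shapes are reduced to the fields the rules need, "available" is assumed to mean a server whose type is not `Unknown`, and `suitableForRead` is a hypothetical stand-in for full server selection that checks only the read preference mode (no tag sets or `maxStalenessSeconds`).

```typescript
// Topology and server type names as used by the SDAM specification.
type TopologyType =
  | "Unknown"
  | "Single"
  | "ReplicaSetNoPrimary"
  | "ReplicaSetWithPrimary"
  | "Sharded"
  | "LoadBalanced";
type ServerType =
  | "Unknown" | "Standalone" | "Mongos" | "PossiblePrimary" | "RSPrimary"
  | "RSSecondary" | "RSArbiter" | "RSOther" | "RSGhost" | "LoadBalancer";
type ReadPreferenceMode =
  | "primary" | "primaryPreferred" | "secondary" | "secondaryPreferred" | "nearest";

// Hypothetical minimal shapes; real descriptions carry many more fields.
interface ServerDescription { address: string; type: ServerType; }
interface TopologyDescription { type: TopologyType; servers: ServerDescription[]; }

// Assumption: "available" means the server's type is known.
function isAvailable(s: ServerDescription): boolean {
  return s.type !== "Unknown";
}

// Simplified stand-in for server selection: checks the mode only.
function suitableForRead(s: ServerDescription, mode: ReadPreferenceMode): boolean {
  switch (mode) {
    case "primary":
      return s.type === "RSPrimary";
    case "secondary":
      return s.type === "RSSecondary";
    default: // primaryPreferred, secondaryPreferred, nearest
      return s.type === "RSPrimary" || s.type === "RSSecondary";
  }
}

// A missing read preference is treated as "primary".
function hasReadableServer(td: TopologyDescription, mode: ReadPreferenceMode = "primary"): boolean {
  switch (td.type) {
    case "LoadBalanced":
      return true;
    case "Single":
    case "Sharded":
      return td.servers.some(isAvailable);
    case "ReplicaSetNoPrimary":
    case "ReplicaSetWithPrimary":
      return td.servers.some((s) => suitableForRead(s, mode));
    default: // "Unknown"
      return false;
  }
}

function hasWritableServer(td: TopologyDescription): boolean {
  switch (td.type) {
    case "LoadBalanced":
    case "ReplicaSetWithPrimary":
      return true;
    case "Single":
    case "Sharded":
      return td.servers.some(isAvailable);
    default: // "Unknown" and "ReplicaSetNoPrimary"
      return false;
  }
}
```

Defaulting `mode` to `"primary"` covers the "Called with no option" rows: it yields `false` for `ReplicaSetNoPrimary` (there is no primary to find) and `true` for `ReplicaSetWithPrimary`.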
diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-summary.md b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-summary.md new file mode 100644 index 0000000000..be72be9519 --- /dev/null +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-summary.md @@ -0,0 +1,143 @@ +# Server Discovery And Monitoring -- Summary + +- Status: Accepted +- Minimum Server Version: 2.4 + +______________________________________________________________________ + +## Abstract + +This spec defines how a MongoDB client discovers and monitors one or more servers. It covers monitoring a single server, +a set of mongoses, or a replica set. How does the client determine what type of servers they are? How does it keep this +information up to date? How does the client find an entire replica set from a seed list, and how does it respond to a +stepdown, election, reconfiguration, or network error? + +All drivers must answer these questions the same. Or, where platforms' limitations require differences among drivers, +there must be as few answers as possible and each must be clearly explained in this spec. Even in cases where several +answers seem equally good, drivers must agree on one way to do it. + +The server discovery and monitoring method is specified in five sections. First, a client is constructed. Second, it +begins monitoring the topology by calling `hello` or legacy hello on all servers. (Multi-threaded and asynchronous +monitoring is described first, then single-threaded monitoring.) Third, as `hello` or legacy hello responses are +received the client parses them, and fourth, it updates its view of the topology. Finally, this spec describes how +drivers update their topology view in response to errors. + +This spec does not describe how a client selects a server for an operation; that is the domain of the specs for Mongos +High Availability and for Read Preferences. 
There is no discussion of driver architecture and data structures, nor is +there any specification of a user-facing API. This spec is only concerned with the algorithm for monitoring the +topology. + +The Java driver 2.12.1 is the reference implementation for multi-threaded drivers. Mongos 2.6's replica set monitor is +the reference implementation for single-threaded drivers, with a few differences. + +## General Requirements + +A client MUST be able to connect to a single server of any type. This includes querying hidden replica set members, and +connecting to uninitialized members in order to run "replSetInitiate". + +A client MUST be able to discover an entire replica set from a seed list containing one or more replica set members. It +MUST be able to continue monitoring the replica set even when some members go down, or when reconfigs add and remove +members. A client MUST be able to connect to a replica set while there is no primary, or while the primary is down. + +A client MUST be able to connect to a set of mongoses and monitor their availability and round trip time. This spec +defines how mongoses are discovered and monitored, but does not define which mongos is selected for a given operation. + +Multi-threaded or asynchronous clients MUST unblock waiting operations as soon as a suitable server is found, rather +than waiting for all servers to be checked. For example, if the client is discovering a replica set and the application +attempts a write operation, the write operation MUST proceed as soon as the primary is found, rather than blocking until +the client has checked all members. + +## Client Construction + +This spec does not intend to require any drivers to make breaking changes regarding what configuration options are +available, how options are named, or what combinations of options are allowed. + +A multi-threaded or asynchronous client's constructor MUST NOT do any I/O. 
The constructor MAY start the monitors as +background tasks, or they MAY be started by some "initialize" method (by any name), or on the first use of the client +for an operation. This means that the constructor does not throw an exception if the deployment is unavailable: Instead, +all subsequent operations on the client fail as long as the error persists. + +The justification is that if clients can be constructed when the deployment is in some states but not in other states, +it leads to an unfortunate scenario: When the deployment is passing through a strange state, long-running applications +may keep working, but any applications restarted during this period fail. + +Additionally, since asynchronous clients cannot do I/O in a constructor, it is consistent to prohibit I/O in other +clients' constructors as well. + +Single-threaded clients also MUST NOT do I/O in the constructor. They scan the servers on demand, when the first +operation is attempted. Thus they behave consistently with multi-threaded and asynchronous clients. + +## Monitoring + +The client monitors servers by checking them periodically, or after an error indicates that the client's view of the +topology is wrong, or when no suitable server is found for a write or a read. + +Drivers differ from Mongos 2.6 in two respects. First, if a client frequently rechecks a server, it MUST wait at least +10 ms since the previous check to avoid excessive effort. + +Second, Mongos 2.6 does not monitor arbiters, but drivers MUST do so. It costs little, and in rare cases an arbiter may +be the client's last hope to find the new replica set configuration. + +### Multi-threaded or asynchronous monitoring + +All servers' monitors run independently, in parallel: If some monitors block calling `hello` or legacy hello over slow +connections, other monitors MUST proceed unimpeded. The natural implementation is a thread per server, but the decision +is left to the implementer. 
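The monitoring rules in this summary can be sketched in a few lines: one independent monitor per server, a periodic check every 10 seconds by default, early re-checks on demand, and a floor of 10 ms between checks. This is a minimal Python sketch under stated assumptions, not a driver API. `Monitor`, `check`, and `request_check` are hypothetical names, and the `check` callable stands in for calling `hello` or legacy hello and reporting the outcome to the topology.

```python
import threading
import time
from typing import Callable


class Monitor:
    """Hypothetical per-server monitor: one background thread per server."""

    def __init__(
        self,
        address: str,
        check: Callable[[str], None],        # stand-in for a hello / legacy hello check
        heartbeat_frequency_s: float = 10.0,  # default periodic check interval
        min_interval_s: float = 0.010,        # never check more often than this
    ) -> None:
        self.address = address
        self._check = check
        self._heartbeat_frequency_s = heartbeat_frequency_s
        self._min_interval_s = min_interval_s
        self._wake = threading.Event()  # set to request an early check
        self._stop = threading.Event()  # set to shut the monitor down
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def request_check(self) -> None:
        # E.g. after an error, or when no suitable server is found.
        self._wake.set()

    def close(self) -> None:
        self._stop.set()
        self._wake.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            started = time.monotonic()
            self._check(self.address)
            # Wait for the next periodic check, or return early if woken.
            self._wake.wait(self._heartbeat_frequency_s)
            self._wake.clear()
            # Enforce the minimum interval; waiting on _stop lets close() interrupt it.
            remaining = self._min_interval_s - (time.monotonic() - started)
            if remaining > 0:
                self._stop.wait(remaining)
```

Because each server has its own thread, a slow check on one server does not delay the others. Repeated `request_check` calls are coalesced by the event, so consecutive checks of one server are always at least `min_interval_s` apart.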
+ +Multi-threaded and asynchronous drivers MUST call `hello` or legacy hello on servers every 10 seconds by default. (10 +seconds is Mongos's frequency.) This frequency MAY be configurable. + +### Single-threaded monitoring + +Single-threaded clients MUST scan all servers synchronously, inline with regular application operations. For +single-threaded drivers the default frequency MUST be 60 seconds and MUST be configurable. + +If the topology is a replica set, a single-threaded client attempts to contact the primary as soon as possible to get an +authoritative list of members. Otherwise, the client attempts to check all members it knows of, in order from the +least-recently to the most-recently checked. The scanning order is described completely in the spec. + +## Parsing `hello` or legacy hello + +The full algorithm for determining server type from a `hello` or legacy hello response is specified and test cases are +provided. + +Drivers MUST record the server's round trip time after each successful call to `hello` or legacy hello. How round trip +times are averaged is not in this spec's scope. + +## Updating the Topology View + +After each attempt to call `hello` or legacy hello on a server, the client updates its topology view. Initial topology +discovery and long-running monitoring are both specified by the same detailed algorithm. + +When monitoring a replica set, the client strives to use only the servers that the primary says are members. While there +is no known primary, the client MUST add servers from non-primaries' host lists, but it MUST NOT remove servers. +Eventually, when a primary is discovered, any hosts not in the primary's host list are removed from the client's view of +the topology. + +The client MUST NOT use replica set members' "setVersion" to detect reconfigs, since race conditions with setVersion +make it inferior to simply trusting the primary. 
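The host-list rules for a replica set can be illustrated with a heavily simplified model: add hosts from non-primaries only while no primary is known, and let the primary's host list both add and remove servers. This Python sketch is illustrative only. `ReplicaSetView` and `on_hello` are hypothetical names, and the model deliberately ignores `setVersion` (as the summary requires) while omitting most of the full algorithm, such as replica set name checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ReplicaSetView:
    """Hypothetical, heavily simplified view of a replica set's members."""

    servers: Set[str] = field(default_factory=set)
    primary: Optional[str] = None

    def on_hello(self, address: str, is_primary: bool, hosts: List[str]) -> None:
        if address not in self.servers:
            return  # Ignore replies from servers already removed from the view.
        if is_primary:
            # The primary's host list is authoritative: add new members and
            # remove any server the primary does not list.
            self.servers = set(hosts)
            self.primary = address if address in self.servers else None
            return
        if address == self.primary:
            self.primary = None  # The former primary now reports it is not primary.
        if self.primary is None:
            # No known primary: add hosts from non-primaries, but never remove any.
            self.servers |= set(hosts)
```

For example, starting from the seed `a:1`, a secondary's reply can grow the view to `{a:1, b:1, c:1}`; once `c:1` replies as primary listing only `b:1` and `c:1`, the server `a:1` is dropped.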
+ +## Error handling + +When an application operation fails because of any network error besides a socket timeout, the client MUST mark the +server "down". The server will eventually be re-checked by periodic monitoring. The specific operation that discovered +the error MUST abort and raise an exception if it was a write. It MAY be retried if it was a read. (The Read Preferences +spec includes retry rules for reads.) + +If a monitor's `hello` or legacy hello call fails on a server, the behavior is different from a failed application +operation. The `hello` or legacy hello call is retried once, immediately, before the server is marked "down". + +In either case the client MUST clear its connection pool for the server: if one socket is bad, it is likely that all +are. + +An algorithm is specified for inspecting error codes (MongoDB 3.6+) and falling back to parsing error messages when +error codes are unavailable (MongoDB 3.4 and earlier). When the client sees such an error it knows its topology view is +out of date. It MUST mark the server type "unknown." Multi-threaded and asynchronous clients MUST re-check the server +soon, and single-threaded clients MUST request a scan before the next operation. The client MUST clear its connection +pool for the server if the server is 4.0 or earlier, and MUST NOT clear its connection pool for the server if the server +is 4.2 or later. + +## Changelog + +- 2024-05-02: Migrated from reStructuredText to Markdown. +- 2022-10-05: Revise spec front matter and add changelog. 
diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-summary.rst b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-summary.rst index 50f1b0d9ea..de235231e0 100644 --- a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-summary.rst +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-summary.rst @@ -1,223 +1,4 @@ -========================================== -Server Discovery And Monitoring -- Summary -========================================== -:Status: Accepted -:Minimum Server Version: 2.4 - -.. contents:: - --------- - -Abstract --------- - -This spec defines how a MongoDB client discovers and monitors one or more servers. -It covers monitoring a single server, a set of mongoses, or a replica set. -How does the client determine what type of servers they are? -How does it keep this information up to date? -How does the client find an entire replica set from a seed list, -and how does it respond to a stepdown, election, reconfiguration, or network error? - -All drivers must answer these questions the same. -Or, where platforms' limitations require differences among drivers, -there must be as few answers as possible and each must be clearly explained in this spec. -Even in cases where several answers seem equally good, drivers must agree on one way to do it. - -The server discovery and monitoring method is specified in five sections. -First, a client is constructed. -Second, it begins monitoring the topology by calling ``hello`` or legacy hello on all servers. -(Multi-threaded and asynchronous monitoring is described first, -then single-threaded monitoring.) -Third, as ``hello`` or legacy hello responses are received -the client parses them, -and fourth, it updates its view of the topology. -Finally, this spec describes how drivers update their topology view -in response to errors. 
- -This spec does not describe how a client selects a server for an operation; -that is the domain of the specs for Mongos High Availability -and for Read Preferences. -There is no discussion of driver architecture and data structures, -nor is there any specification of a user-facing API. -This spec is only concerned with the algorithm for monitoring the topology. - -The Java driver 2.12.1 is the reference implementation -for multi-threaded drivers. -Mongos 2.6's replica set monitor -is the reference implementation for single-threaded drivers, -with a few differences. - -General Requirements --------------------- - -A client MUST be able to connect to a single server of any type. -This includes querying hidden replica set members, -and connecting to uninitialized members in order to run -"replSetInitiate". - -A client MUST be able to discover an entire replica set from -a seed list containing one or more replica set members. -It MUST be able to continue monitoring the replica set -even when some members go down, -or when reconfigs add and remove members. -A client MUST be able to connect to a replica set -while there is no primary, or while the primary is down. - -A client MUST be able to connect to a set of mongoses -and monitor their availability and round trip time. -This spec defines how mongoses are discovered and monitored, -but does not define which mongos is selected for a given operation. - -Multi-threaded or asynchronous clients -MUST unblock waiting operations -as soon as a suitable server is found, -rather than waiting for all servers to be checked. -For example, if the client is discovering a replica set -and the application attempts a write operation, -the write operation MUST proceed as soon as the primary is found, -rather than blocking until the client has checked all members. 
- -Client Construction -------------------- - -This spec does not intend -to require any drivers to make breaking changes regarding -what configuration options are available, -how options are named, -or what combinations of options are allowed. - -A multi-threaded or asynchronous client's constructor MUST NOT do any I/O. -The constructor MAY start the monitors as background tasks, -or they MAY be started by some "initialize" method (by any name), -or on the first use of the client for an operation. -This means that the constructor does not throw an exception -if the deployment is unavailable: -Instead, all subsequent operations on the client fail -as long as the error persists. - -The justification is that -if clients can be constructed when the deployment is in some states -but not in other states, -it leads to an unfortunate scenario: -When the deployment is passing through a strange state, -long-running applications may keep working, -but any applications restarted during this period fail. - -Additionally, since asynchronous clients cannot do I/O in a constructor, -it is consistent to prohibit I/O in other clients' constructors as well. - -Single-threaded clients also MUST NOT do I/O in the constructor. -They scan the servers on demand, -when the first operation is attempted. -Thus they behave consistently with multi-threaded and asynchronous clients. - -Monitoring ----------- - -The client monitors servers by checking them periodically, -or after an error indicates that the client's view of the topology is wrong, -or when no suitable server is found for a write or a read. - -Drivers differ from Mongos 2.6 in two respects. First, -if a client frequently rechecks a server, -it MUST wait at least 10 ms -since the previous check to avoid excessive effort. - -Second, Mongos 2.6 does not monitor arbiters, but drivers MUST do so. -It costs little, and in rare cases an arbiter may be the client's last hope -to find the new replica set configuration. 
- -Multi-threaded or asynchronous monitoring -''''''''''''''''''''''''''''''''''''''''' - -All servers' monitors run independently, in parallel: -If some monitors block calling ``hello`` or legacy hello over slow connections, -other monitors MUST proceed unimpeded. -The natural implementation is a thread per server, -but the decision is left to the implementer. - -Multi-threaded and asynchronous drivers -MUST call ``hello`` or legacy hello on servers every 10 seconds by default. -(10 seconds is Mongos's frequency.) -This frequency MAY be configurable. - -Single-threaded monitoring -'''''''''''''''''''''''''' - -Single-threaded clients MUST scan all servers synchronously, -inline with regular application operations. -For single-threaded drivers the default frequency MUST be 60 seconds -and MUST be configurable. - -If the topology is a replica set, -a single-threaded client attempts to contact the primary as soon as possible -to get an authoritative list of members. -Otherwise, the client attempts to check all members it knows of, -in order from the least-recently to the most-recently checked. -The scanning order is described completely in the spec. - -Parsing ``hello`` or legacy hello ---------------------------------- - -The full algorithm for determining server type from a ``hello`` or legacy hello response -is specified and test cases are provided. - -Drivers MUST record the server's round trip time -after each successful call to ``hello`` or legacy hello. -How round trip times are averaged is not in this spec's scope. - -Updating the Topology View --------------------------- - -After each attempt to call ``hello`` or legacy hello on a server, -the client updates its topology view. -Initial topology discovery and long-running monitoring -are both specified by the same detailed algorithm. - -When monitoring a replica set, -the client strives to use only the servers that the primary says are members. 
-While there is no known primary, -the client MUST add servers from non-primaries' host lists, -but it MUST NOT remove servers. -Eventually, when a primary is discovered, any hosts not in the primary's host -list are removed from the client's view of the topology. - -The client MUST NOT use replica set members' "setVersion" -to detect reconfigs, since race conditions with setVersion -make it inferior to simply trusting the primary. - -Error handling --------------- - -When an application operation fails because of -any network error besides a socket timeout, -the client MUST mark the server "down". -The server will eventually be re-checked by periodic monitoring. -The specific operation that discovered the error -MUST abort and raise an exception if it was a write. -It MAY be retried if it was a read. -(The Read Preferences spec includes retry rules for reads.) - -If a monitor's ``hello`` or legacy hello call fails on a server, -the behavior is different from a failed application operation. -The ``hello`` or legacy hello call is retried once, immediately, -before the server is marked "down". - -In either case the client MUST clear its connection pool for the server: -if one socket is bad, it is likely that all are. - -An algorithm is specified for inspecting error codes (MongoDB 3.6+) and -falling back to parsing error messages when error codes are unavailable (MongoDB 3.4 and earlier). -When the client sees such an error it knows its topology view is out of date. -It MUST mark the server type "unknown." -Multi-threaded and asynchronous clients MUST re-check the server soon, -and single-threaded clients MUST request a scan before the next operation. -The client MUST clear its connection pool for the server if the -server is 4.0 or earlier, and MUST NOT clear its connection pool for the -server if the server is 4.2 or later. - -Changelog ---------- - -:2022-10-05: Revise spec front matter and add changelog. +.. 
note:: + This specification has been converted to Markdown and renamed to + `server-discovery-and-monitoring-summary.md `_. diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md new file mode 100644 index 0000000000..b542844894 --- /dev/null +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md @@ -0,0 +1,186 @@ +# Server Discovery And Monitoring -- Test Plan + +- Status: Accepted + +\- Minimum Server Version: 2.4 See also the YAML test files and their accompanying README in the "tests" directory. + +______________________________________________________________________ + +## All servers unavailable + +A MongoClient can be constructed without an exception, even with all seeds unavailable. + +## Network error writing to primary + +Scenario: With TopologyType ReplicaSetWithPrimary, a write to the primary fails with a network error other than timeout. + +Outcome: The former primary's ServerType MUST become Unknown. The TopologyType MUST change to ReplicaSetNoPrimary. The +client MUST NOT immediately re-check the former primary. + +## "Not writable primary" error when reading without SecondaryOk bit + +Scenario: With TopologyType ReplicaSetWithPrimary, we read from a server we thought was RSPrimary. Thus the SecondaryOk +bit is not set. + +The server response should indicate an error due to the server not being a primary. + +Outcome: The former primary's ServerType MUST become Unknown. The TopologyType MUST change to ReplicaSetNoPrimary. The +client MUST NOT immediately re-check the former primary. + +## "Node is recovering" error reading with SecondaryOk bit + +Scenario: With TopologyType ReplicaSetWithPrimary, we read from a server we thought was RSSecondary. Thus the +SecondaryOk bit *is* set. + +The server response should indicate an error due to the server being in recovering state. 
+ +Outcome: The former secondary's ServerType MUST become Unknown. The TopologyType MUST remain ReplicaSetWithPrimary. A +multi-threaded client MUST immediately re-check the former secondary; a single-threaded client MUST NOT. + +## "Node is recovering" error from a write concern error + +Scenario: With TopologyType ReplicaSetWithPrimary, a write to the primary responds with the following document: + +> { ok: 1, writeConcernError: {code: 91, errmsg: "Replication is being shut down"} } + +Outcome: The former primary's ServerType MUST become Unknown. The TopologyType MUST change to ReplicaSetNoPrimary. A +multi-threaded client MUST immediately re-check the former primary; a single-threaded client MUST NOT. + +## Parsing "not writable primary" and "node is recovering" errors + +For all these example responses, the client MUST mark the server "Unknown" and store the error message in the +ServerDescription's error field. + +Clients MUST NOT depend on any particular field order in these responses. + +### getLastError + +GLE response after OP_INSERT on an arbiter, secondary, recovering member, or ghost: + +> {ok: 1, err: "not writable primary"} + +[Possible GLE response in MongoDB 2.6](https://jira.mongodb.org/browse/SERVER-9617) during failover: + +> {ok: 1, err: "replicatedToNum called but not master anymore"} + +Note that this error message contains "not master" but does not start with it. + +### Write command + +Response to an "insert" command on an arbiter, secondary, recovering member, or ghost: + +> {ok: 0, errmsg: "not writable primary"} + +### Query with SecondaryOk bit + +Response from an arbiter, recovering member, or ghost when SecondaryOk is true: + +> {$err: "not primary or secondary; cannot currently read from this replSet member"} + +The QueryFailure bit is set in responseFlags.
+ +### Query without SecondaryOk bit + +Response from an arbiter, recovering member, ghost, or secondary when SecondaryOk is false: + +> {$err: "not writable primary and SecondaryOk=false"} + +The QueryFailure bit is set in responseFlags. + +### Count with SecondaryOk bit + +Command response on an arbiter, recovering member, or ghost when SecondaryOk is true: + +> {ok: 0, errmsg: "node is recovering"} + +### Count without SecondaryOk bit + +Command response on an arbiter, recovering member, ghost, or secondary when SecondaryOk is false: + +> {ok: 0, errmsg: "not writable primary"} + +## Topology discovery and direct connection + +### Topology discovery + +Scenario: given a replica set deployment with a secondary, where HOST is the address of the secondary, create a +MongoClient using `mongodb://HOST/?directConnection=false` as the URI. Attempt a write to a collection. + +Outcome: Verify that the write succeeded. + +### Direct connection + +Scenario: given a replica set deployment with a secondary, where HOST is the address of the secondary, create a +MongoClient using `mongodb://HOST/?directConnection=true` as the URI. Attempt a write to a collection. + +Outcome: Verify that the write failed with a NotWritablePrimary error. + +### Existing behavior + +Scenario: given a replica set deployment with a secondary, where HOST is the address of the secondary, create a +MongoClient using `mongodb://HOST/` as the URI. Attempt a write to a collection. + +Outcome: Verify that the write succeeded or failed depending on existing driver behavior with respect to the starting +topology. + +## Monitors sleep at least minHeartbeatFrequencyMS between checks + +This test will be used to ensure monitors sleep for an appropriate amount of time between failed server checks so as to +not flood the server with new connection creations. + +This test requires MongoDB 4.9.0+. + +1. 
Enable the following failpoint: + + ``` + { + configureFailPoint: "failCommand", + mode: { times: 5 }, + data: { + failCommands: ["hello"], // or legacy hello command + errorCode: 1234, + appName: "SDAMMinHeartbeatFrequencyTest" + } + } + ``` + +2. Create a client with directConnection=true, appName="SDAMMinHeartbeatFrequencyTest", and + serverSelectionTimeoutMS=5000. + +3. Start a timer. + +4. Execute a `ping` command. + +5. Stop the timer. Assert that the `ping` took between 2 seconds and 3.5 seconds to complete. + +## Connection Pool Management + +This test will be used to ensure monitors properly create and unpause connection pools when they discover servers. + +This test requires failCommand appName support, which is only available in MongoDB 4.2.9+. + +1. Create a client with directConnection=true, appName="SDAMPoolManagementTest", and heartbeatFrequencyMS=500 (or lower + if possible). + +2. Verify via SDAM and CMAP event monitoring that a ConnectionPoolReadyEvent occurs after the first + ServerHeartbeatSucceededEvent does. + +3. Enable the following failpoint: + + ``` + { + configureFailPoint: "failCommand", + mode: { times: 2 }, + data: { + failCommands: ["hello"], // or legacy hello command + errorCode: 1234, + appName: "SDAMPoolManagementTest" + } + } + ``` + +4. Verify that a ServerHeartbeatFailedEvent and a ConnectionPoolClearedEvent (CMAP) are emitted. + +5. Then verify that a ServerHeartbeatSucceededEvent and a ConnectionPoolReadyEvent (CMAP) are emitted. + +6. Disable the failpoint.
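The event checks in the Connection Pool Management test reduce to asserting that certain event names appear in a given order within a longer log, with unrelated events allowed in between. The following Python helper is a hypothetical sketch of that check. It assumes the test has already recorded SDAM and CMAP event names as strings via whatever listener mechanism the driver provides. The example log is illustrative, not real driver output, and the sketch assumes the heartbeat failure is observed before the pool clear.

```python
from typing import Iterable, Sequence


def occurs_in_order(events: Iterable[str], expected: Sequence[str]) -> bool:
    """True if `expected` appears in `events` in order, possibly with gaps."""
    remaining = iter(events)
    # `name in remaining` consumes the iterator up to and including the match,
    # so each expected event must occur after the previous one.
    return all(name in remaining for name in expected)


# Illustrative event log, as hypothetically captured during the test.
log = [
    "ServerHeartbeatStartedEvent",
    "ServerHeartbeatSucceededEvent",
    "ConnectionPoolReadyEvent",
    "ServerHeartbeatStartedEvent",
    "ServerHeartbeatFailedEvent",
    "ConnectionPoolClearedEvent",
    "ServerHeartbeatStartedEvent",
    "ServerHeartbeatSucceededEvent",
    "ConnectionPoolReadyEvent",
]
assert occurs_in_order(log, [
    "ServerHeartbeatSucceededEvent", "ConnectionPoolReadyEvent",  # step 2
    "ServerHeartbeatFailedEvent", "ConnectionPoolClearedEvent",   # step 4
    "ServerHeartbeatSucceededEvent", "ConnectionPoolReadyEvent",  # step 5
])
```

A subsequence check is used instead of exact equality because drivers may emit additional events, such as heartbeat-started events, between the ones the test cares about.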
diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.rst b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.rst index 39bb671533..635ffb3c80 100644 --- a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.rst +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.rst @@ -1,235 +1,4 @@ -============================================ -Server Discovery And Monitoring -- Test Plan -============================================ -:Status: Accepted -:Minimum Server Version: 2.4 - -See also the YAML test files and their accompanying README in the "tests" -directory. - -.. contents:: - --------- - -All servers unavailable ------------------------ - -A MongoClient can be constructed without an exception, -even with all seeds unavailable. - -Network error writing to primary --------------------------------- - -Scenario: With TopologyType ReplicaSetWithPrimary, a write to the primary fails -with a network error other than timeout. - -Outcome: The former primary's ServerType MUST become Unknown. -The TopologyType MUST change to ReplicaSetNoPrimary. -The client MUST NOT immediately re-check the former primary. - -"Not writable primary" error when reading without SecondaryOk bit ------------------------------------------------------------------ - -Scenario: With TopologyType ReplicaSetWithPrimary, we read from a server we -thought was RSPrimary. Thus the SecondaryOk bit is not set. - -The server response should indicate an error due to the server not being a primary. - -Outcome: The former primary's ServerType MUST become Unknown. -The TopologyType MUST change to ReplicaSetNoPrimary. -The client MUST NOT immediately re-check the former primary. - -"Node is recovering" error reading with SecondaryOk bit -------------------------------------------------------- - -Scenario: With TopologyType ReplicaSetWithPrimary, we read from a server we -thought was RSSecondary. 
Thus the SecondaryOk bit *is* set. - -The server response should indicate an error due to the server being in recovering state. - -Outcome: The former secondary's ServerType MUST become Unknown. -The TopologyType MUST remain ReplicaSetWithPrimary. -A multi-threaded client MUST immediately re-check the former secondary, -a single-threaded client MUST NOT. - -"Node is recovering" error from a write concern error ------------------------------------------------------ - -Scenario: With TopologyType ReplicaSetWithPrimary, a write to the primary responds -with the following document: - - { ok: 1, writeConcernError: {code: 91, errmsg: "Replication is being shut down"} } - -Outcome: The former primary's ServerType MUST become Unknown. -The TopologyType MUST change to ReplicaSetNoPrimary. -A multi-threaded client MUST immediately re-check the former secondary, -a single-threaded client MUST NOT. - -Parsing "not writable primary" and "node is recovering" errors --------------------------------------------------------------- - -For all these example responses, -the client MUST mark the server "Unknown" -and store the error message in the ServerDescription's error field. - -Clients MUST NOT depend on any particular field order in these responses. - -getLastError -'''''''''''' - -GLE response after OP_INSERT on an arbiter, secondary, recovering member, or ghost: - - {ok: 1, err: "not writable primary"} - -`Possible GLE response in MongoDB 2.6`_ during failover: - - {ok: 1, err: "replicatedToNum called but not master anymore"} - -Note that this error message contains "not master" but does not start with it. - -.. 
_Possible GLE response in MongoDB 2.6: https://jira.mongodb.org/browse/SERVER-9617 - -Write command -''''''''''''' - -Response to an "insert" command on an arbiter, secondary, recovering member, or ghost: - - {ok: 0, errmsg: "not writable primary"} - -Query with SecondaryOk bit -'''''''''''''''''''''''''' - -Response from an arbiter, recovering member, or ghost -when SecondaryOk is true: - - {$err: "not primary or secondary; cannot currently read from this replSet member"} - -The QueryFailure bit is set in responseFlags. - -Query without SecondaryOk bit -''''''''''''''''''''''''''''' - -Response from an arbiter, recovering member, ghost, or secondary -when SecondaryOk is false: - - {$err: "not writable primary and SecondaryOk=false"} - -The QueryFailure bit is set in responseFlags. - -Count with SecondaryOk bit -'''''''''''''''''''''''''' - -Command response on an arbiter, recovering member, or ghost -when SecondaryOk is true: - - {ok: 0, errmsg: "node is recovering"} - -Count without SecondaryOk bit -''''''''''''''''''''''''''''' - -Command response on an arbiter, recovering member, ghost, or secondary -when SecondaryOk is false: - - {ok: 0, errmsg: "not writable primary"} - - -Topology discovery and direct connection ----------------------------------------- - -Topology discovery -'''''''''''''''''' - -Scenario: given a replica set deployment with a secondary, where HOST -is the address of the secondary, create a MongoClient using -``mongodb://HOST/?directConnection=false`` as the URI. -Attempt a write to a collection. - -Outcome: Verify that the write succeeded. - -Direct connection -''''''''''''''''' - -Scenario: given a replica set deployment with a secondary, where HOST -is the address of the secondary, create a MongoClient using -``mongodb://HOST/?directConnection=true`` as the URI. -Attempt a write to a collection. - -Outcome: Verify that the write failed with a NotWritablePrimary error. 
- -Existing behavior -''''''''''''''''' - -Scenario: given a replica set deployment with a secondary, where HOST -is the address of the secondary, create a MongoClient using -``mongodb://HOST/`` as the URI. -Attempt a write to a collection. - -Outcome: Verify that the write succeeded or failed depending on existing -driver behavior with respect to the starting topology. - -Monitors sleep at least minHeartbeatFrequencyMS between checks --------------------------------------------------------------- - -This test will be used to ensure monitors sleep for an appropriate amount of -time between failed server checks so as to not flood the server with new -connection creations. - -This test requires MongoDB 4.9.0+. - -1. Enable the following failpoint:: - - { - configureFailPoint: "failCommand", - mode: { times: 5 }, - data: { - failCommands: ["hello"], // or legacy hello command - errorCode: 1234, - appName: "SDAMMinHeartbeatFrequencyTest" - } - } - -2. Create a client with directConnection=true, appName="SDAMMinHeartbeatFrequencyTest", and - serverSelectionTimeoutMS=5000. - -3. Start a timer. - -4. Execute a ``ping`` command. - -5. Stop the timer. Assert that the ``ping`` took between 2 seconds and 3.5 - seconds to complete. - -Connection Pool Management --------------------------- - -This test will be used to ensure monitors properly create and unpause connection -pools when they discover servers. - -This test requires failCommand appName support which is only available in -MongoDB 4.2.9+. - -1. Create a client with directConnection=true, appName="SDAMPoolManagementTest", - and heartbeatFrequencyMS=500 (or lower if possible). - -2. Verify via SDAM and CMAP event monitoring that a ConnectionPoolReadyEvent occurs - after the first ServerHeartbeatSucceededEvent event does. - -3. 
Enable the following failpoint:: - - { - configureFailPoint: "failCommand", - mode: { times: 2 }, - data: { - failCommands: ["hello"], // or legacy hello command - errorCode: 1234, - appName: "SDAMPoolManagementTest" - } - } - -4. Verify that a ServerHeartbeatFailedEvent and a ConnectionPoolClearedEvent (CMAP) are - emitted. - -5. Then verify that a ServerHeartbeatSucceededEvent and a ConnectionPoolReadyEvent (CMAP) - are emitted. - -6. Disable the failpoint. +.. note:: + This specification has been converted to Markdown and renamed to + `server-discovery-and-monitoring-tests.md `_. diff --git a/source/server-discovery-and-monitoring/server-monitoring.md b/source/server-discovery-and-monitoring/server-monitoring.md new file mode 100644 index 0000000000..2e30583a4d --- /dev/null +++ b/source/server-discovery-and-monitoring/server-monitoring.md @@ -0,0 +1,994 @@ +# Server Monitoring + +- Status: Accepted +- Minimum Server Version: 2.4 + +______________________________________________________________________ + +## Abstract + +This spec defines how a driver monitors a MongoDB server. In summary, the client monitors each server in the topology. +The scope of server monitoring is to provide the topology with updated ServerDescriptions based on hello or legacy hello +command responses. + +## META + +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and +"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). + +## Specification + +### Terms + +See the terms in the [main SDAM spec](server-discovery-and-monitoring.rst). + +#### check + +The client checks a server by attempting to call hello or legacy hello on it, and recording the outcome. + +#### client + +A process that initiates a connection to a MongoDB server. This includes mongod and mongos processes in a replica set or +sharded cluster, as well as drivers, the shell, tools, etc. 
+ +#### scan + +The process of checking all servers in the deployment. + +#### suitable + +A server is judged "suitable" for an operation if the client can use it for a particular operation. For example, a write +requires a standalone, primary, or mongos. Suitability is fully specified in the +[Server Selection Spec](../server-selection/server-selection.md). + +#### significant topology change + +A change in the server's state that is relevant to the client's view of the server, e.g. a change in the server's +replica set member state, or its replica set tags. In SDAM terms, a significant topology change on the server means the +client's ServerDescription is out of date. Standalones and mongos do not currently experience significant topology +changes but they may in the future. + +#### regular hello or legacy hello command + +A default `{hello: 1}` or legacy hello command where the server responds immediately. + +#### streamable hello or legacy hello command + +The hello or legacy hello command feature which allows the server to stream multiple replies back to the client. + +#### RTT + +Round trip time. The client's measurement of the duration of one hello or legacy hello call. The RTT is used to support +[localThresholdMS](../server-selection/server-selection.md#localThresholdMS) from the Server Selection spec and +[timeoutMS](../client-side-operations-timeout/client-side-operations-timeout.md#timeoutMS) from the +[Client Side Operations Timeout Spec](../client-side-operations-timeout/client-side-operations-timeout.md). + +#### FaaS + +A Function-as-a-Service (FaaS) environment like AWS Lambda. + +#### serverMonitoringMode + +The serverMonitoringMode option configures which server monitoring protocol to use. Valid modes are "stream", "poll", or +"auto". The default value MUST be "auto": + +- With "stream" mode, the client MUST use the streaming protocol when the server supports it or fall back to the polling + protocol otherwise. 
+- With "poll" mode, the client MUST use the polling protocol.
- With "auto" mode, the client MUST behave the same as "poll" mode when running on a FaaS platform or the same as
  "stream" mode otherwise. The client detects that it's running on a FaaS platform via the same rules for generating the
  `client.env` handshake metadata field in the [MongoDB Handshake spec](../mongodb-handshake/handshake.rst#client-env).

Multi-threaded or asynchronous drivers MUST implement this option. See
[Why disable the streaming protocol on FaaS platforms like AWS Lambda?](#why-disable-the-streaming-protocol-on-faas-platforms-like-aws-lambda)
and [Why introduce a knob for serverMonitoringMode?](#why-introduce-a-knob-for-servermonitoringmode).

### Monitoring

The client monitors servers using the hello or legacy hello commands. In MongoDB 4.4+, a monitor uses the
[Streaming Protocol](#streaming-protocol) to continuously stream hello or legacy hello responses from the server. In
MongoDB \<= 4.2, a monitor uses the [Polling Protocol](#polling-protocol), pausing for heartbeatFrequencyMS between
[checks](#check). Clients check servers sooner in response to certain events.

If a [server API version](../versioned-api/versioned-api.rst) is requested, then the driver must use hello for
monitoring. If a server API version is not requested, the initial handshake using the legacy hello command must include
`helloOk: true`. If the response contains `helloOk: true`, then the driver must use the `hello` command for monitoring.
If the response does not contain `helloOk: true`, then the driver must use the legacy hello command for monitoring.

The socket used to check a server MUST use the same
[connectTimeoutMS](https://www.mongodb.com/docs/manual/reference/connection-string/) as regular sockets. Multi-threaded
clients SHOULD set monitoring sockets' socketTimeoutMS to the connectTimeoutMS.
(See +[socket timeout for monitoring is connectTimeoutMS](#socket-timeout-for-monitoring-is-connecttimeoutms). Drivers MAY let +users configure the timeouts for monitoring sockets separately if necessary to preserve backwards compatibility.) + +The client begins monitoring a server when: + +- ... the client is initialized and begins monitoring each seed. See + [initial servers](server-discovery-and-monitoring.rst#initial-servers). +- ... [updateRSWithoutPrimary](server-discovery-and-monitoring.rst#updateRSWithoutPrimary) or + [updateRSFromPrimary](server-discovery-and-monitoring.rst#updateRSFromPrimary) discovers new replica set members. + +The following subsections specify how monitoring works, first in multi-threaded or asynchronous clients, and second in +single-threaded clients. This spec provides detailed requirements for monitoring because it intends to make all drivers +behave consistently. + +#### Multi-threaded or asynchronous monitoring + +##### Servers are monitored in parallel + +All servers' monitors run independently, in parallel: If some monitors block calling hello or legacy hello over slow +connections, other monitors MUST proceed unimpeded. + +The natural implementation is a thread per server, but the decision is left to the implementer. (See +[thread per server](#thread-per-server).) + +##### Servers are monitored with dedicated sockets + +[A monitor SHOULD NOT use the client's regular connection pool](#a-monitor-should-not-use-the-clients-regular-connection-pool) +to acquire a socket; it uses a dedicated socket that does not count toward the pool's maximum size. + +Drivers MUST NOT authenticate on sockets used for monitoring nor include SCRAM mechanism negotiation (i.e. +`saslSupportedMechs`), as doing so would make monitoring checks more expensive for the server. + +##### Servers are checked periodically + +Each monitor [checks](#check) its server and notifies the client of the outcome so the client can update the +TopologyDescription. 
+ +After each check, the next check SHOULD be scheduled [heartbeatFrequencyMS](#heartbeatfrequencyms) later; a check MUST +NOT run while a previous check is still in progress. + +##### Requesting an immediate check + +At any time, the client can request that a monitor check its server immediately. (For example, after a "not writable +primary" error. See [error handling](server-discovery-and-monitoring.rst#error-handling).) If the monitor is sleeping +when this request arrives, it MUST wake and check as soon as possible. If a hello or legacy hello call is already in +progress, the request MUST be ignored. If the previous check ended less than +[minHeartbeatFrequencyMS](#minheartbeatfrequencyms) ago, the monitor MUST sleep until the minimum delay has passed, then +check the server. + +##### Application operations are unblocked when a server is found + +Each time a check completes, threads waiting for a [suitable](#suitable) server are unblocked. Each unblocked thread +MUST proceed if the new TopologyDescription now contains a suitable server. + +##### Clients update the topology from each handshake + +When a monitor check creates a new connection, the [connection handshake](../mongodb-handshake/handshake.rst) response +MUST be used to satisfy the check and update the topology. + +When a client successfully calls hello or legacy hello to handshake a new connection for application operations, it +SHOULD use the hello or legacy hello reply to update the ServerDescription and TopologyDescription, the same as with a +hello or legacy hello reply on a monitoring socket. If the hello or legacy hello call fails, the client SHOULD mark the +server Unknown and update its TopologyDescription, the same as a failed server check on monitoring socket. 
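
The scheduling rules above (the next check SHOULD follow the previous one by heartbeatFrequencyMS, a requested check shortens the wait to minHeartbeatFrequencyMS, and a request that arrives while a check is in progress is ignored) reduce to a small delay calculation. This is a minimal Python sketch, not part of the spec: `CheckSchedule` and its fields are hypothetical names, and times are plain millisecond numbers supplied by the caller.

```python
from dataclasses import dataclass

# Fixed by the spec: MUST be 500 ms and MUST NOT be configurable.
MIN_HEARTBEAT_FREQUENCY_MS = 500


@dataclass
class CheckSchedule:
    """Hypothetical helper that decides how long a monitor should sleep."""

    heartbeat_frequency_ms: int
    last_check_end_ms: float
    check_in_progress: bool = False
    check_requested: bool = False

    def request_check(self) -> None:
        # A request that arrives while a hello call is in flight is ignored.
        if not self.check_in_progress:
            self.check_requested = True

    def delay_ms(self, now_ms: float) -> float:
        """Milliseconds to sleep before the next check (0 means check now)."""
        elapsed = now_ms - self.last_check_end_ms
        if self.check_requested:
            # Wake early, but never sooner than minHeartbeatFrequencyMS after
            # the previous check ended.
            return max(0.0, MIN_HEARTBEAT_FREQUENCY_MS - elapsed)
        return max(0.0, self.heartbeat_frequency_ms - elapsed)
```

For example, with heartbeatFrequencyMS=10000 and the last check ending at t=0, `delay_ms(2000)` is 8000; after `request_check()`, `delay_ms(200)` is 300 and `delay_ms(2000)` is 0. The Monitor thread pseudocode in the implementation notes gets the same effect with an event object.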
+ +##### Clients use the streaming protocol when supported + +When a monitor discovers that the server supports the streamable hello or legacy hello command and the client does not +have [streaming disabled](#streaming-disabled), it MUST use the [streaming protocol](#streaming-protocol). + +#### Single-threaded monitoring + +##### cooldownMS + +After a single-threaded client gets a network error trying to [check](#check) a server, the client skips re-checking the +server until cooldownMS has passed. + +This avoids spending connectTimeoutMS on each unavailable server during each scan. + +This value MUST be 5000 ms, and it MUST NOT be configurable. + +##### Scanning + +Single-threaded clients MUST [scan](#scan) all servers synchronously, inline with regular application operations. Before +each operation, the client checks if [heartbeatFrequencyMS](#heartbeatfrequencyms) has passed since the previous scan +ended, or if the topology is marked "stale"; if so it scans all the servers before selecting a server and performing the +operation. + +Selection failure triggers an immediate scan. When a client that uses single-threaded monitoring fails to select a +suitable server for any operation, it [scans](#scan) the servers, then attempts selection again, to see if the scan +discovered suitable servers. It repeats, waiting [minHeartbeatFrequencyMS](#minheartbeatfrequencyms) after each scan, +until a timeout. + +##### Scanning order + +If the topology is a replica set, the client attempts to contact the primary as soon as possible to get an authoritative +list of members. Otherwise, the client attempts to check all members it knows of, in order from the least-recently to +the most-recently checked. + +When all servers have been checked the scan is complete. New servers discovered **during** the scan MUST be checked +before the scan is complete. Sometimes servers are removed during a scan so they are not checked, depending on the order +of events. 
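
The two scan triggers described under Scanning (scan before an operation when heartbeatFrequencyMS has elapsed or the topology is stale, and rescan after a selection failure, waiting minHeartbeatFrequencyMS between attempts until a timeout) can be sketched as follows. This is a minimal Python illustration under stated assumptions: `select` and `scan` are hypothetical callables standing in for server selection and a full scan, and the clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional

MIN_HEARTBEAT_FREQUENCY_MS = 500


def should_scan(now_ms: float, last_scan_end_ms: float,
                heartbeat_frequency_ms: float, stale: bool) -> bool:
    """Scan before an operation if the topology is stale or heartbeatFrequencyMS has passed."""
    return stale or now_ms - last_scan_end_ms >= heartbeat_frequency_ms


def select_with_rescans(select: Callable[[], Optional[str]],
                        scan: Callable[[], None],
                        timeout_s: float,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry selection, scanning after each failure, until timeout_s elapses."""
    deadline = clock() + timeout_s
    server = select()
    while server is None:
        if clock() >= deadline:
            raise TimeoutError("no suitable server found")
        scan()
        server = select()
        if server is None:
            # Wait minHeartbeatFrequencyMS after each scan, capped by the deadline.
            remaining = deadline - clock()
            sleep(max(0.0, min(MIN_HEARTBEAT_FREQUENCY_MS / 1000, remaining)))
    return server
```

The sketch only covers when to scan; the order in which a scan checks servers is a separate concern.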
+ +The scanning order is expressed in this pseudocode: + +``` +scanStartTime = now() +# You'll likely need to convert units here. +beforeCoolDown = scanStartTime - cooldownMS + +while true: + serversToCheck = all servers with lastUpdateTime before scanStartTime + + remove from serversToCheck any Unknowns with lastUpdateTime > beforeCoolDown + + if no serversToCheck: + # This scan has completed. + break + + if a server in serversToCheck is RSPrimary: + check it + else if there is a PossiblePrimary: + check it + else if any servers are not of type Unknown or RSGhost: + check the one with the oldest lastUpdateTime + if several servers have the same lastUpdateTime, choose one at random + else: + check the Unknown or RSGhost server with the oldest lastUpdateTime + if several servers have the same lastUpdateTime, choose one at random +``` + +This algorithm might be better understood with an example: + +1. The client is configured with one seed and TopologyType Unknown. It begins a scan. +2. When it checks the seed, it discovers a secondary. +3. The secondary's hello or legacy hello response includes the "primary" field with the address of the server that the + secondary thinks is primary. +4. The client creates a ServerDescription with that address, type PossiblePrimary, and lastUpdateTime "infinity ago". + (See [updateRSWithoutPrimary](server-discovery-and-monitoring.rst#updateRSWithoutPrimary).) +5. On the next iteration, there is still no RSPrimary, so the new PossiblePrimary is the top-priority server to check. +6. The PossiblePrimary is checked and replaced with an RSPrimary. The client has now acquired an authoritative host + list. Any new hosts in the list are added to the TopologyDescription with lastUpdateTime "infinity ago". (See + [updateRSFromPrimary](server-discovery-and-monitoring.rst#updateRSFromPrimary).) +7. The client continues scanning until all known hosts have been checked. + +Another common case might be scanning a pool of mongoses. 
When the client first scans its seed list, they all have the +default lastUpdateTime "infinity ago", so it scans them in random order. This randomness provides some load-balancing if +many clients start at once. A client's subsequent scans of the mongoses are always in the same order, since their +lastUpdateTimes are always in the same order by the time a scan ends. + +#### minHeartbeatFrequencyMS + +If a client frequently rechecks a server, it MUST wait at least minHeartbeatFrequencyMS milliseconds since the previous +check ended, to avoid pointless effort. This value MUST be 500 ms, and it MUST NOT be configurable (no knobs). + +#### heartbeatFrequencyMS + +The interval between server [checks](#check), counted from the end of the previous check until the beginning of the next +one. + +For multi-threaded and asynchronous drivers it MUST default to 10 seconds and MUST be configurable. For single-threaded +drivers it MUST default to 60 seconds and MUST be configurable. It MUST be called heartbeatFrequencyMS unless this +breaks backwards compatibility. + +For both multi- and single-threaded drivers, the driver MUST NOT permit users to configure it less than +minHeartbeatFrequencyMS (500ms). + +(See [heartbeatFrequencyMS in the main SDAM spec](server-discovery-and-monitoring.rst#heartbeatFrequencyMS).) + +### Awaitable hello or legacy hello Server Specification + +As of MongoDB 4.4 the hello or legacy hello command can wait to reply until there is a topology change or a maximum time +has elapsed. Clients opt in to this "awaitable hello" feature by passing new parameters "topologyVersion" and +"maxAwaitTimeMS" to the hello or legacy hello commands. Exhaust support has also been added, which clients can enable in +the usual manner by setting the [OP_MSG exhaustAllowed flag](../message/OP_MSG.md#exhaustAllowed). 
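
As a rough illustration of this opt-in, a monitor might first resolve serverMonitoringMode and then build its next command from the previous reply. This is a minimal Python sketch, not driver code: the function names are hypothetical, commands are plain dicts rather than BSON (a real driver would encode maxAwaitTimeMS as an int64), and only `hello` is shown. The OP_MSG exhaustAllowed flag is a wire-level flag rather than a command field, so it does not appear here.

```python
from typing import Any, Dict, Optional


def streaming_enabled(server_monitoring_mode: str, running_on_faas: bool) -> bool:
    """Resolve the serverMonitoringMode option ("stream", "poll", or "auto")."""
    if server_monitoring_mode == "stream":
        return True
    if server_monitoring_mode == "poll":
        return False
    if server_monitoring_mode == "auto":
        return not running_on_faas
    raise ValueError(f"invalid serverMonitoringMode: {server_monitoring_mode!r}")


def next_monitor_command(previous_reply: Optional[Dict[str, Any]],
                         heartbeat_frequency_ms: int,
                         streaming: bool) -> Dict[str, Any]:
    """Build an awaitable hello when possible, otherwise a regular hello."""
    command: Dict[str, Any] = {"hello": 1}
    # Feature discovery: only servers that report topologyVersion support awaitable hello.
    topology_version = (previous_reply or {}).get("topologyVersion")
    if streaming and topology_version is not None:
        # topologyVersion and maxAwaitTimeMS must be sent together.
        command["topologyVersion"] = topology_version
        command["maxAwaitTimeMS"] = heartbeat_frequency_ms
    return command
```

With "stream" mode and a pre-4.4 server (no topologyVersion in the reply), this falls back to a regular hello, matching the polling fallback described for that mode.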
+ +Clients use the awaitable hello feature as the basis of the streaming heartbeat protocol to learn much sooner about +stepdowns, elections, reconfigs, and other events. + +#### topologyVersion + +A server that supports awaitable hello or legacy hello includes a "topologyVersion" field in all hello or legacy hello +replies and State Change Error replies. The topologyVersion is a subdocument with two fields, "processId" and "counter": + +```typescript +{ + topologyVersion: {processId: , counter: }, + ( ... other fields ...) +} +``` + +##### processId + +An ObjectId maintained in memory by the server. It is reinitialized by the server using the standard ObjectId logic each +time this server process starts. + +##### counter + +An int64 State change counter, maintained in memory by the server. It begins at 0 when the server starts, and it is +incremented whenever there is a significant topology change. + +#### maxAwaitTimeMS + +To enable awaitable hello or legacy hello, the client includes a new int64 field "maxAwaitTimeMS" in the hello or legacy +hello request. This field determines the maximum duration in milliseconds a server will wait for a significant topology +change before replying. + +#### Feature Discovery + +To discover if the connected server supports awaitable hello or legacy hello, a client checks the most recent hello or +legacy hello command reply. If the reply includes "topologyVersion" then the server supports awaitable hello or legacy +hello. + +#### Awaitable hello or legacy hello Protocol + +To initiate an awaitable hello or legacy hello command, the client includes both maxAwaitTimeMS and topologyVersion in +the request, for example: + +```typescript +{ + hello: 1, + maxAwaitTimeMS: 10000, + topologyVersion: {processId: , counter: }, + ( ... other fields ...) +} +``` + +Clients MAY additionally set the [OP_MSG exhaustAllowed flag](../message/OP_MSG.md#exhaustAllowed) to enable streaming +hello or legacy hello. 
With streaming hello or legacy hello, the server MAY send multiple hello or legacy hello
responses without waiting for further requests.

A server that implements the new protocol follows these rules:

- Always include the server's topologyVersion in hello, legacy hello, and State Change Error replies.
- If the request includes topologyVersion without maxAwaitTimeMS or vice versa, return an error.
- If the request omits topologyVersion and maxAwaitTimeMS, reply immediately.
- If the request includes topologyVersion and maxAwaitTimeMS, then reply immediately if the server's
  topologyVersion.processId does not match the request's, otherwise reply when the server's topologyVersion.counter is
  greater than the request's, or maxAwaitTimeMS elapses, whichever comes first.
- Following the [OP_MSG spec](../message/OP_MSG.md), if the request omits the exhaustAllowed flag, the server MUST NOT
  set the moreToCome flag on the reply. If the request's exhaustAllowed flag is set, the server MAY set the moreToCome
  flag on the reply. If the server sets moreToCome, it MUST continue streaming replies without awaiting further
  requests. Between replies it MUST wait until the server's topologyVersion.counter is incremented or maxAwaitTimeMS
  elapses, whichever comes first. If the reply includes `ok: 0` the server MUST NOT set the moreToCome flag.
- On a topology change that changes the horizon parameters, the server will close all application connections.

Example awaitable hello conversation:

| Client                                                     | Server                          |
| ---------------------------------------------------------- | ------------------------------- |
| hello handshake ->                                         |                                 |
|                                                            | \<- reply with topologyVersion  |
| hello as OP_MSG with maxAwaitTimeMS and topologyVersion -> |                                 |
|                                                            | wait for change or timeout      |
|                                                            | \<- OP_MSG with topologyVersion |
| ...                                                        |                                 |

Example streaming hello conversation (awaitable hello with exhaust):

| Client                                                                      | Server                                         |
| --------------------------------------------------------------------------- | ---------------------------------------------- |
| hello handshake ->                                                          |                                                |
|                                                                             | \<- reply with topologyVersion                 |
| hello as OP_MSG with exhaustAllowed, maxAwaitTimeMS, and topologyVersion -> |                                                |
|                                                                             | wait for change or timeout                     |
|                                                                             | \<- OP_MSG with moreToCome and topologyVersion |
|                                                                             | wait for change or timeout                     |
|                                                                             | \<- OP_MSG with moreToCome and topologyVersion |
|                                                                             | ...                                            |
|                                                                             | \<- OP_MSG without moreToCome                  |
| ...                                                                         |                                                |

### Streaming Protocol

The streaming protocol is used to monitor MongoDB 4.4+ servers and minimizes the time it takes for a client to discover
server state changes. Multi-threaded or asynchronous drivers MUST use the streaming protocol when connected to a server
that supports the awaitable hello or legacy hello commands. This protocol requires an extra thread and an extra socket
for each monitor to perform RTT calculations.

#### Streaming disabled

The streaming protocol MUST be disabled when any of the following is true:

- the client is configured with serverMonitoringMode=poll, or
- the client is configured with serverMonitoringMode=auto and a FaaS platform is detected, or
- the server does not support streaming (e.g. MongoDB \< 4.4).

When the streaming protocol is disabled, the client MUST use the [polling protocol](#polling-protocol) and MUST NOT start
an extra thread or connection for [Measuring RTT](#measuring-rtt).

See
[Why disable the streaming protocol on FaaS platforms like AWS Lambda?](#why-disable-the-streaming-protocol-on-faas-platforms-like-aws-lambda).

#### Streaming hello or legacy hello

The streaming hello or legacy hello protocol uses awaitable hello or legacy hello with the OP_MSG exhaustAllowed flag to
continuously stream hello or legacy hello responses from the server.
Drivers MUST set the OP_MSG exhaustAllowed flag +with the awaitable hello or legacy hello command and MUST process each hello or legacy hello response. (I.e., they MUST +process responses strictly in the order they were received.) + +A client follows these rules when processing the hello or legacy hello exhaust response: + +- If the response indicates a command error, or a network error or timeout occurs, the client MUST close the connection + and restart the monitoring protocol on a new connection. (See + [Network or command error during server check](#network-or-command-error-during-server-check).) +- If the response is successful (includes "ok:1") and includes the OP_MSG moreToCome flag, then the client begins + reading the next response. +- If the response is successful (includes "ok:1") and does not include the OP_MSG moreToCome flag, then the client + initiates a new awaitable hello or legacy hello with the topologyVersion field from the previous response. + +#### Socket timeout + +Clients MUST use connectTimeoutMS as the timeout for the connection handshake. When connectTimeoutMS=0, the timeout is +unlimited and MUST remain unlimited for awaitable hello and legacy hello replies. Otherwise, connectTimeoutMS is +non-zero and clients MUST use connectTimeoutMS + heartbeatFrequencyMS as the timeout for awaitable hello and legacy +hello replies. + +#### Measuring RTT + +When using the streaming protocol, clients MUST issue a hello or legacy hello command to each server to measure RTT +every heartbeatFrequencyMS. The RTT command MUST be run on a dedicated connection to each server. Clients MUST NOT use +dedicated connections to measure RTT when the streaming protocol is not used. (See +[Monitors MUST use a dedicated connection for RTT commands](#monitors-must-use-a-dedicated-connection-for-rtt-commands).) + +Clients MUST update the RTT from the hello or legacy hello duration of the initial connection handshake. 
Clients MUST +NOT update RTT based on streaming hello or legacy hello responses. + +Clients MUST ignore the response to the hello or legacy hello command when measuring RTT. Errors encountered when +running a hello or legacy hello command MUST NOT update the topology. (See +[Why don't clients mark a server unknown when an RTT command fails?](#why-dont-clients-mark-a-server-unknown-when-an-rtt-command-fails)) + +Clients MUST track the minimum RTT out of the (at most) last 10 samples. Clients MUST report the minimum RTT as 0 until +at least 2 samples have been gathered. + +When constructing a ServerDescription from a streaming hello or legacy hello response, clients MUST set the average and +minimum round trip times from the RTT task as the "roundTripTime" and "minRoundTripTime" fields, respectively. + +See the pseudocode in the [RTT thread](#rtt-thread) section for an example implementation. + +#### SDAM Monitoring + +Clients MUST publish a ServerHeartbeatStartedEvent before attempting to read the next hello or legacy hello exhaust +response. (See +[Why must streaming hello or legacy hello clients publish ServerHeartbeatStartedEvents?](#why-must-streaming-hello-or-legacy-hello-clients-publish-serverheartbeatstartedevents)) + +Clients MUST NOT publish any events when running an RTT command. (See +[Why don't streaming hello or legacy hello clients publish events for RTT commands?](#why-dont-streaming-hello-or-legacy-hello-clients-publish-events-for-rtt-commands)) + +#### Heartbeat frequency + +In the polling protocol, a client sleeps between each hello or legacy hello check (for at least minHeartbeatFrequencyMS +and up to heartbeatFrequencyMS). In the streaming protocol, after processing an "ok:1" hello or legacy hello response, +the client MUST NOT sleep and MUST begin the next check immediately. + +Clients MUST set [maxAwaitTimeMS](#maxawaittimems) to heartbeatFrequencyMS. 
+

#### hello or legacy hello Cancellation

When a client is closed, clients MUST cancel all hello and legacy hello checks; a monitor blocked waiting for the next
streaming hello or legacy hello response MUST be interrupted such that threads may exit promptly without waiting
maxAwaitTimeMS.

When a client marks a server Unknown from
[Network error when reading or writing](server-discovery-and-monitoring.rst#network-error-when-reading-or-writing),
clients MUST cancel the hello or legacy hello check on that server and close the current monitoring connection. (See
[Drivers cancel in-progress monitor checks](#drivers-cancel-in-progress-monitor-checks).)

### Polling Protocol

The polling protocol is used to monitor MongoDB \< 4.4 servers or when [streaming is disabled](#streaming-disabled).
The client [checks](#check) a server with a hello or legacy hello command and then sleeps for heartbeatFrequencyMS
before running another check.

### Marking the connection pool as ready (CMAP only)

When a monitor completes a successful check against a server, it MUST mark the connection pool for that server as
"ready", and doing so MUST be synchronized with the update to the topology (e.g. by marking the pool as ready in
onServerDescriptionChanged). This is required to ensure a server does not get selected while its pool is still paused.
See the [Connection Pool](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md#Connection-Pool)
definition in the CMAP specification for more details on marking the pool as "ready".

### Error handling

#### Network or command error during server check

When a server [check](#check) fails due to a network error (including a network timeout) or a command error (`ok: 0`),
the client MUST follow these steps:

1. Close the current monitoring connection.
2. Mark the server Unknown.
3. Clear the connection pool for the server (see
   [Clear the connection pool on both network and command errors](#clear-the-connection-pool-on-both-network-and-command-errors)).
   For CMAP compliant drivers, clearing the pool MUST be synchronized with marking the server as Unknown (see
   [Why synchronize clearing a server's pool with updating the topology?](server-discovery-and-monitoring.rst#why-synchronize-clearing-a-server-s-pool-with-updating-the-topology?)).
   If this was a network timeout error, then the pool MUST be cleared with interruptInUseConnections = true (see
   [Why does the pool need to support closing in use connections as part of its clear logic?](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md#Why-does-the-pool-need-to-support-closing-in-use-connections-as-part-of-its-clear-logic?)).
4. If this was a network error and the server was in a known state before the error, the client MUST NOT sleep and MUST
   begin the next check immediately. (See
   [retry hello or legacy hello calls once](#retry-hello-or-legacy-hello-calls-once) and
   [JAVA-1159](https://jira.mongodb.org/browse/JAVA-1159).)
5. Otherwise, wait for heartbeatFrequencyMS (or minHeartbeatFrequencyMS if a check is requested) before restarting the
   monitoring protocol on a new connection.
   - Note that even in the streaming protocol, a monitor in this state will wait for an application operation to
     [request an immediate check](#requesting-an-immediate-check) or for the heartbeatFrequencyMS timeout to expire
     before beginning the next check.

See the pseudocode in the [Monitor thread](#monitor-thread) section.

Note that this rule applies only to server checks during monitoring. It does *not* apply when multi-threaded
[clients update the topology from each handshake](#clients-update-the-topology-from-each-handshake).

### Implementation notes

This section intends to provide generous guidance to driver authors. It is complementary to the reference
implementations.
Words like "should", "may", and so on are used more casually here. + +#### Monitor thread + +Most platforms can use an event object to control the monitor thread. The event API here is assumed to be like the +standard [Python Event](https://docs.python.org/2/library/threading.html#event-objects). +[heartbeatFrequencyMS](#heartbeatfrequencyms) is configurable, [minHeartbeatFrequencyMS](#minheartbeatfrequencyms) is +always 500 milliseconds: + +```python +class Monitor(Thread): + def __init__(): + # Monitor options: + serverAddress = serverAddress + connectTimeoutMS = connectTimeoutMS + heartbeatFrequencyMS = heartbeatFrequencyMS + minHeartbeatFrequencyMS = 500 + stableApi = stableApi + if serverMonitoringMode == "stream": + streamingEnabled = True + elif serverMonitoringMode == "poll": + streamingEnabled = False + else: # serverMonitoringMode == "auto" + streamingEnabled = not isFaas() + + # Internal Monitor state: + connection = Null + # Server API versioning implies that the server supports hello. + helloOk = stableApi != Null + description = default ServerDescription + lock = Mutex() + rttMonitor = RttMonitor(serverAddress, stableApi) + + def run(): + while this monitor is not stopped: + previousDescription = description + try: + description = checkServer(previousDescription) + except CheckCancelledError: + if this monitor is stopped: + # The client was closed. + return + # The client marked this server Unknown and cancelled this + # check during "Network error when reading or writing". + # Wait before running the next check. + wait() + continue + + with client.lock: + topology.onServerDescriptionChanged(description, connection pool for server) + if description.error != Null: + # Clear the connection pool only after the server description is set to Unknown. 
+ clear(interruptInUseConnections: isNetworkTimeout(description.error)) connection pool for server + + # Immediately proceed to the next check if the previous response + # was successful and included the topologyVersion field, or the + # previous response included the moreToCome flag, or the server + # has just transitioned to Unknown from a network error. + serverSupportsStreaming = description.type != Unknown and description.topologyVersion != Null + connectionIsStreaming = connection != Null and connection.moreToCome + transitionedWithNetworkError = isNetworkError(description.error) and previousDescription.type != Unknown + if streamingEnabled and serverSupportsStreaming and not rttMonitor.started: + # Start the RttMonitor. + rttMonitor.run() + if (streamingEnabled and (serverSupportsStreaming or connectionIsStreaming)) or transitionedWithNetworkError: + continue + + wait() + + def setUpConnection(): + # Take the mutex to avoid a data race becauase this code writes to the connection field and a concurrent + # cancelCheck call could be reading from it. + with lock: + # Server API versioning implies that the server supports hello. + helloOk = stableApi != Null + connection = new Connection(serverAddress) + set connection timeout to connectTimeoutMS + + # Do any potentially blocking operations after releasing the mutex. + create the socket and perform connection handshake + + def checkServer(previousDescription): + try: + # The connection is null if this is the first check. It's closed if there was an error during the previous + # check or the previous check was cancelled. 
+ + if helloOk: + helloCommand = hello + else + helloCommand = legacy hello + + if not connection or connection.isClosed(): + setUpConnection() + rttMonitor.addSample(connection.handshakeDuration) + response = connection.handshakeResponse + elif connection.moreToCome: + response = read next helloCommand exhaust response + elif streamingEnabled and previousDescription.topologyVersion: + # Initiate streaming hello or legacy hello + if connectTimeoutMS != 0: + set connection timeout to connectTimeoutMS+heartbeatFrequencyMS + response = call {helloCommand: 1, helloOk: True, topologyVersion: previousDescription.topologyVersion, maxAwaitTimeMS: heartbeatFrequencyMS} + else: + # The server does not support topologyVersion or streamingEnabled=False. + response = call {helloCommand: 1, helloOk: True} + + # If the server supports hello, then response.helloOk will be true + # and hello will be used for subsequent monitoring commands. + # If the server does not support hello, then response.helloOk will be undefined + # and legacy hello will be used for subsequent monitoring commands. + helloOk = response.helloOk + + return ServerDescription(response, rtt=rttMonitor.average(), ninetiethPercentileRtt=rttMonitor.ninetiethPercentile()) + except Exception as exc: + close connection + rttMonitor.reset() + return ServerDescription(type=Unknown, error=exc) + + def wait(): + start = gettime() + + # Can be awakened by requestCheck(). + event.wait(heartbeatFrequencyMS) + event.clear() + + waitTime = gettime() - start + if waitTime < minHeartbeatFrequencyMS: + # Cannot be awakened. + sleep(minHeartbeatFrequencyMS - waitTime) +``` + +[Requesting an immediate check](#requesting-an-immediate-check): + +```python +def requestCheck(): + event.set() +``` + +[hello or legacy hello Cancellation](#hello-or-legacy-hello-cancellation): + +```python +def cancelCheck(): + # Take the mutex to avoid reading the connection value while setUpConnection is writing to it. 
+ # Copy the connection value in the lock but do the actual cancellation outside. + with lock: + tempConnection = connection + + if tempConnection: + interrupt connection read + close tempConnection +``` + +#### RTT thread + +The requirements in the [Measuring RTT](#measuring-rtt) section can be satisfied with an additional thread that +periodically runs the hello or legacy hello command on a dedicated connection, for example: + +```python +class RttMonitor(Thread): + def __init__(): + # Options: + serverAddress = serverAddress + connectTimeoutMS = connectTimeoutMS + heartbeatFrequencyMS = heartbeatFrequencyMS + stableApi = stableApi + + # Internal state: + connection = Null + # Server API versioning implies that the server supports hello. + helloOk = stableApi != Null + lock = Mutex() + movingAverage = MovingAverage() + # Track the min RTT seen in the most recent 10 samples. + recentSamples = deque(maxlen=10) + + def reset(): + with lock: + movingAverage.reset() + recentSamples.clear() + + def addSample(rtt): + with lock: + movingAverage.update(rtt) + recentSamples.append(rtt) + + def average(): + with lock: + return movingAverage.get() + + def min(): + with lock: + # Need at least 2 RTT samples. + if len(recentSamples) < 2: + return 0 + return min(recentSamples) + + def run(): + while this monitor is not stopped: + try: + rtt = pingServer() + addSample(rtt) + except Exception as exc: + # Don't call reset() here. The Monitor thread is responsible + # for resetting the average RTT. + close connection + connection = Null + helloOk = stableApi != Null + + # Can be awakened when the client is closed. + event.wait(heartbeatFrequencyMS) + event.clear() + + def setUpConnection(): + # Server API versioning implies that the server supports hello. 
+ helloOk = stableApi != Null + connection = new Connection(serverAddress) + set connection timeout to connectTimeoutMS + perform connection handshake + + def pingServer(): + if helloOk: + helloCommand = hello + else: + helloCommand = legacy hello + + if not connection: + setUpConnection() + return RTT of the connection handshake + + start = time() + response = call {helloCommand: 1, helloOk: True} + rtt = time() - start + helloOk = response.helloOk + return rtt +``` + +## Design Alternatives + +### Alternating hello or legacy hello to check servers and RTT without adding an extra connection + +The streaming hello or legacy hello protocol is optimal in terms of latency: clients are always blocked waiting for the +server to stream updated hello or legacy hello information, so they learn of server state changes as soon as possible. +However, streaming hello or legacy hello has two downsides: + +1. Streaming hello or legacy hello requires a new connection to each server to calculate the RTT. +2. Streaming hello or legacy hello requires a new thread (or threads) to calculate the RTT of each server. + +To address these concerns, we designed the alternating hello or legacy hello protocol. This protocol would have +alternated between awaitable hello or legacy hello and regular hello or legacy hello. The awaitable hello or legacy +hello would replace the polling protocol's client-side sleep and allow the client to receive updated hello or legacy hello +responses sooner. The regular hello or legacy hello would allow the client to maintain accurate RTT calculations without +requiring any extra threads or sockets. + +We reject this design because streaming hello or legacy hello is strictly better at reducing the client's +time-to-recovery. We determined that one extra connection per server per MongoClient is reasonable for all drivers. +Applications that upgrade may see a modest increase in connections and memory usage on the server.
We don't expect this +increase to be problematic; however, we have several projects planned for future MongoDB releases to make the streaming +hello or legacy hello protocol cheaper server-side, which should mitigate the cost of the extra monitoring connections. + +### Use TCP smoothed round-trip time instead of measuring RTT explicitly + +TCP sockets internally maintain a "smoothed round-trip time" or SRTT. Drivers could use this SRTT instead of measuring +RTT explicitly via hello or legacy hello commands. The server could even include this value on all hello or legacy hello +responses. We reject this idea for a few reasons: + +- Not all programming languages have an API to access the TCP socket's RTT. +- On Windows, RTT access requires Admin privileges. +- TCP's SRTT would likely differ substantially from RTT measurements in the current protocol. For example, the SRTT can + be reset on [retransmission timeouts](https://tools.ietf.org/html/rfc2988#section-5). + +## Rationale + +### Thread per server + +Mongos uses a monitor thread per replica set, rather than a thread per server. A thread per server is impractical if +mongos is monitoring a large number of replica sets. A driver, by contrast, monitors only one. + +In mongos, threads trying to do reads and writes join the effort to scan the replica set. Such threads are more likely +to be abundant in mongos than in drivers, so mongos can rely on them to help with monitoring. + +In short: mongos has different scaling concerns than a multi-threaded or asynchronous driver, so it allocates threads +differently. + +### Socket timeout for monitoring is connectTimeoutMS + +When a client waits for a server to respond to a connection, the client does not know if the server will respond +eventually or if it is down.
Users can help the client guess correctly by supplying a reasonable connectTimeoutMS for +their network: on some networks a server is probably down if it hasn't responded in 10 ms, on others a server might +still be up even if it hasn't responded in 10 seconds. + +The socketTimeoutMS, on the other hand, must account for both network latency and the operation's duration on the +server. Applications should typically set a very long or infinite socketTimeoutMS so they can wait for long-running +MongoDB operations. + +Multi-threaded clients use distinct sockets for monitoring and for application operations. A socket used for monitoring +does two things: it connects and calls hello or legacy hello. Both operations are fast on the server, so only network +latency matters. Thus both operations SHOULD use connectTimeoutMS, since that is the value users supply to help the +client guess if a server is down, based on users' knowledge of expected latencies on their networks. + +### A monitor SHOULD NOT use the client's regular connection pool + +If a multi-threaded driver's connection pool enforces a maximum size and monitors use sockets from the pool, there are +two bad options: either monitors compete with the application for sockets, or monitors have the exceptional ability to +create sockets even when the pool has reached its maximum size. The former risks starving the monitor. The latter is +more complex than it is worth. (A lesson learned from PyMongo 2.6's pool, which implemented this option.) + +Since this rule is justified for drivers that enforce a maximum pool size, this spec recommends that all drivers follow +the same rule for the sake of consistency. 
+ +### Monitors MUST use a dedicated connection for RTT commands + +When using the streaming protocol, a monitor needs to maintain an extra dedicated connection to periodically update its +average round-trip time in order to support [localThresholdMS](../server-selection/server-selection.md#localThresholdMS) +from the Server Selection spec. + +It could pop a connection from its regular pool, but we rejected this option for a few reasons: + +- Under contention the RTT task may block application operations from completing in a timely manner. +- Under contention the application may block the RTT task from completing in a timely manner. +- Under contention the RTT task may often result in an extra connection anyway because the pool creates new connections + under contention up to maxPoolSize. +- This would be inconsistent with the rule that a monitor SHOULD NOT use the client's regular connection pool. + +The client could open and close a new connection for each RTT check. We rejected this design because if we ping every +heartbeatFrequencyMS (default 10 seconds), then the cost to the client and the server of creating and destroying the +connection might exceed the cost of keeping a dedicated connection open. + +Instead, the client must use a dedicated connection reserved for RTT commands. Despite the cost of the additional +connection per server, we chose this option as the safest and least likely to result in surprising behavior under load. + +### Monitors MUST use the hello or legacy hello command to measure RTT + +In the streaming protocol, clients could use the "ping", "hello", or legacy hello commands to measure RTT. This spec +chooses "hello" or legacy hello for consistency with the polling protocol as well as consistency with the initial RTT +provided by the connection handshake, which also uses the hello or legacy hello commands. Additionally, mongocryptd does not +allow the ping command but does allow hello or legacy hello.
+ +### Why not use `awaitedTimeMS` in the server response to calculate RTT in the streaming protocol? + +One approach to calculating RTT in the streaming protocol would be to have the server return an `awaitedTimeMS` in its +`hello` or legacy hello response. A driver could then determine the RTT by calculating the difference between the +initial request, or last response, and the `awaitedTimeMS`. + +We rejected this design because of a number of issues with the unreliability of clocks in distributed systems. Local and +remote system clocks skew relative to each other. This approach mixes two notions of time: the local clock times the whole +operation while the remote clock times the wait. This means that if these clocks tick at different rates, or there are +anomalies like clock changes, you will get bad results. To make matters worse, you will be comparing times from multiple +servers that could each have clocks ticking at different rates. This approach will bias toward servers with the fastest +ticking clock, since it will seem like it spends the least time on the wire. + +Additionally, systems using NTP will experience clock "slew". ntpd "slews" time by up to 500 parts-per-million to have +the local time gradually approach the "true" time without big jumps; over a 10-second window, that means a 5 ms +difference. If both sides are slewing in opposite directions, that can result in an effective difference of 10 ms. Both +of these times are close enough to [localThresholdMS](../server-selection/server-selection.md#localThresholdMS) to +significantly affect which servers are viable in NEAREST calculations. + +Ensuring that all measurements use the same clock obviates the need for a more complicated solution and mitigates the +above-mentioned concerns. + +### Why don't clients mark a server unknown when an RTT command fails? + +In the streaming protocol, clients use the hello or legacy hello command on a dedicated connection to measure a server's +RTT.
However, errors encountered when running the RTT command MUST NOT mark a server Unknown. We reached this decision +because the dedicated RTT connection does not come from a connection pool and thus does not have a generation number +associated with it. Without a generation number, we cannot handle errors from the RTT command without introducing race +conditions. Introducing such a generation number would add complexity to this design without much benefit. It is safe to +ignore these errors because the Monitor will soon discover the server's state regardless (either through an updated +streaming response, an error on the streaming connection, or by handling an error on an application connection). + +### Drivers cancel in-progress monitor checks + +When an application operation fails with a non-timeout network error, drivers cancel the in-progress check of that server's monitor. + +We assume that a non-timeout network error on one application connection implies that all other connections to that +server are also bad. This means that it is redundant to continue reading on the current monitoring connection. Instead, +we cancel the current monitor check, close the monitoring connection, and start a new check soon. Note that we rely on +the connection/pool generation number checking to avoid races and ensure that the monitoring connection is only closed +once. + +This approach also handles the rare case where the client sees a network error on an application connection but the +monitoring connection is still healthy. If we did not cancel the monitor check in this scenario, then the server would +remain in the Unknown state until the next hello or legacy hello response (up to maxAwaitTimeMS). A potential real-world +example of this behavior is when Azure closes an idle connection in the application pool. + +### Retry hello or legacy hello calls once + +A monitor's connection to a server is long-lived and used only for hello or legacy hello calls.
So if a server has +responded in the past, a network error on the monitor's connection means that there was a network glitch or a server +restart since the last check, or that the server is truly down. To handle the case that the server is truly down, the +monitor makes the server unselectable by marking it Unknown. To handle the case of a transient network glitch or +restart, the monitor immediately runs the next check without waiting. + +### Clear the connection pool on both network and command errors + +A monitor clears the connection pool when a server check fails with a network or command error +([Network or command error during server check](#network-or-command-error-during-server-check)). When the check fails +with a network error, it is likely that all connections to that server are also closed. (See +[JAVA-1252](https://jira.mongodb.org/browse/JAVA-1252)). When the check fails with a network timeout error, a monitor +MUST set interruptInUseConnections to true. See +[Why does the pool need to support closing in use connections as part of its clear logic?](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md#Why-does-the-pool-need-to-support-closing-in-use-connections-as-part-of-its-clear-logic?). + +When the server is shutting down, it may respond to hello or legacy hello commands with ShutdownInProgress errors before +closing connections. In this case, the monitor clears the connection pool because all connections will be closed soon. +Other command errors are unexpected but are handled identically. + +### Why must streaming hello or legacy hello clients publish ServerHeartbeatStartedEvents? + +The [SDAM Monitoring spec](server-discovery-and-monitoring-logging-and-monitoring.md#heartbeats) guarantees that every +ServerHeartbeatStartedEvent has either a correlating ServerHeartbeatSucceededEvent or ServerHeartbeatFailedEvent.
This +is consistent with Command Monitoring on exhaust cursors, where the driver publishes a fake CommandStartedEvent before +reading the next getMore response. + +### Why don't streaming hello or legacy hello clients publish events for RTT commands? + +In the streaming protocol, clients MUST NOT publish any events (server, topology, command, CMAP, etc.) when running an +RTT command. We considered introducing new RTT events (ServerRTTStartedEvent, ServerRTTSucceededEvent, +ServerRTTFailedEvent), but it is not clear that there is demand for them. Applications can still monitor changes to a +server's RTT by listening to TopologyDescriptionChangedEvents. + +### What is the purpose of the "awaited" field on server heartbeat events? + +ServerHeartbeatSucceededEvents published from awaitable hello or legacy hello responses will regularly have 10-second +durations (the default heartbeatFrequencyMS, which is sent as maxAwaitTimeMS). The spec introduces the "awaited" field on server heartbeat events so that applications can differentiate a +slow heartbeat in the polling protocol from a normal awaitable hello or legacy hello heartbeat in the new protocol. + +### Why disable the streaming protocol on FaaS platforms like AWS Lambda? + +The streaming protocol relies on the assumption that the client can read the server's heartbeat responses in a timely +manner; otherwise, the client will be acting on stale information. On many FaaS platforms, like AWS Lambda, host +applications may be suspended and resumed many minutes later. This behavior causes a buildup of heartbeat responses, +and the client can end up spending a long time in a catch-up phase processing outdated responses. This problem was +discovered in [DRIVERS-2246](https://jira.mongodb.org/browse/DRIVERS-2246). + +Additionally, the streaming protocol requires an extra connection and thread per monitored server, which is expensive on +platforms like AWS Lambda. The extra connection is particularly inefficient when thousands of AWS instances, and thus +thousands of clients, are used.
+ +We decided to make polling the default behavior when running on FaaS platforms like AWS Lambda to improve scalability, +performance, and reliability. + +### Why introduce a knob for serverMonitoringMode? + +The serverMonitoringMode knob provides a workaround in cases where the polling protocol would be a better choice but the +driver is not running on a FaaS platform. It also provides a workaround in case the FaaS detection logic becomes +outdated or inaccurate. + +## Changelog + +- 2024-05-02: Migrated from reStructuredText to Markdown. + +- 2020-02-20: Extracted server monitoring from SDAM into this new spec. + +- 2020-03-09: A monitor check that creates a new connection MUST use the connection's handshake to update the topology. + +- 2020-04-20: Add streaming heartbeat protocol. + +- 2020-05-20: Include rationale for why we don't use `awaitedTimeMS`. + +- 2020-06-11: Support connectTimeoutMS=0 in streaming heartbeat protocol. + +- 2020-12-17: Mark the pool for a server as "ready" after performing a successful check. Synchronize pool clearing with + SDAM updates. + +- 2021-06-21: Added support for hello/helloOk to handshake and monitoring. + +- 2021-06-24: Remove optimization mention that no longer applies. + +- 2022-01-19: Add 90th percentile RTT tracking. + +- 2022-02-24: Rename Versioned API to Stable API. + +- 2022-04-05: Preemptively cancel in-progress operations when SDAM heartbeats time out. + +- 2022-10-05: Remove spec front matter and reformat changelog. + +- 2022-11-17: Add minimum RTT tracking and remove 90th percentile RTT. + +- 2023-10-05: Add serverMonitoringMode and default to the polling protocol on FaaS. Clients MUST NOT use dedicated + connections to measure RTT when using the polling protocol.
+ +______________________________________________________________________ diff --git a/source/server-discovery-and-monitoring/server-monitoring.rst b/source/server-discovery-and-monitoring/server-monitoring.rst index c475ba73d1..11355ed1dc 100644 --- a/source/server-discovery-and-monitoring/server-monitoring.rst +++ b/source/server-discovery-and-monitoring/server-monitoring.rst @@ -1,1260 +1,4 @@ -================= -Server Monitoring -================= -:Status: Accepted -:Minimum Server Version: 2.4 - -.. contents:: - --------- - -Abstract --------- - -This spec defines how a driver monitors a MongoDB server. In summary, the -client monitors each server in the topology. The scope of server monitoring is -to provide the topology with updated ServerDescriptions based on hello or -legacy hello command responses. - -META ----- - -The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", -"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be -interpreted as described in `RFC 2119 `_. - -Specification -------------- - -Terms -''''' - -See the terms in the `main SDAM spec`_. - -.. _checking: #check -.. _checks: #check - -check -````` - -The client checks a server by attempting to call hello or legacy hello on it, -and recording the outcome. - -client -`````` - -A process that initiates a connection to a MongoDB server. This includes -mongod and mongos processes in a replica set or sharded cluster, as well as -drivers, the shell, tools, etc. - -.. _scans: #scans - -scan -```` - -The process of checking all servers in the deployment. - -suitable -```````` - -A server is judged "suitable" for an operation if the client can use it -for a particular operation. -For example, a write requires a standalone, primary, or mongos. -Suitability is fully specified in the `Server Selection Spec`_. 
- -significant topology change -``````````````````````````` - -A change in the server's state that is relevant to the client's view of the -server, e.g. a change in the server's replica set member state, or its replica -set tags. In SDAM terms, a significant topology change on the server means the -client's ServerDescription is out of date. Standalones and mongos do not -currently experience significant topology changes but they may in the future. - -regular hello or legacy hello command -````````````````````````````````````` - -A default ``{hello: 1}`` or legacy hello command where the server responds immediately. - - -streamable hello or legacy hello command -```````````````````````````````````````` - -The hello or legacy hello command feature which allows the server to stream multiple -replies back to the client. - -RTT -``` - -Round trip time. The client's measurement of the duration of one hello or legacy hello call. -The RTT is used to support `localThresholdMS`_ from the Server Selection spec -and `timeoutMS`_ from the `Client Side Operations Timeout Spec`_. - -FaaS -```` - -A Function-as-a-Service (FaaS) environment like AWS Lambda. - -serverMonitoringMode -```````````````````` - -The serverMonitoringMode option configures which server monitoring protocol to use. Valid modes are -"stream", "poll", or "auto". The default value MUST be "auto": - -- With "stream" mode, the client MUST use the streaming protocol when the server supports - it or fall back to the polling protocol otherwise. -- With "poll" mode, the client MUST use the polling protocol. -- With "auto" mode, the client MUST behave the same as "poll" mode when running on a FaaS - platform or the same as "stream" mode otherwise. The client detects that it's - running on a FaaS platform via the same rules for generating the ``client.env`` - handshake metadata field in the `MongoDB Handshake spec`_. - -Multi-threaded or asynchronous drivers MUST implement this option. 
-See `Why disable the streaming protocol on FaaS platforms like AWS Lambda?`_ and -`Why introduce a knob for serverMonitoringMode?`_ - -Monitoring -'''''''''' - -The client monitors servers using the hello or legacy hello commands. In MongoDB 4.4+, a -monitor uses the `Streaming Protocol`_ to continuously stream hello or legacy hello -responses from the server. In MongoDB <= 4.2, a monitor uses the -`Polling Protocol`_ pausing heartbeatFrequencyMS between `checks`_. -Clients check servers sooner in response to certain events. - -If a `server API version`_ is requested, then the driver must use hello for monitoring. -If a server API version is not requested, the initial handshake using the legacy hello -command must include `helloOk: true`. If the response contains `helloOk: true`, then the -driver must use the `hello` command for monitoring. If the response does not contain -`helloOk: true`, then the driver must use the legacy hello command for monitoring. - -The socket used to check a server MUST use the same -`connectTimeoutMS `_ -as regular sockets. -Multi-threaded clients SHOULD set monitoring sockets' socketTimeoutMS to the -connectTimeoutMS. -(See `socket timeout for monitoring is connectTimeoutMS`_. -Drivers MAY let users configure the timeouts for monitoring sockets -separately if necessary to preserve backwards compatibility.) - -The client begins monitoring a server when: - -* ... the client is initialized and begins monitoring each seed. - See `initial servers`_. -* ... `updateRSWithoutPrimary`_ or `updateRSFromPrimary`_ - discovers new replica set members. - -The following subsections specify how monitoring works, -first in multi-threaded or asynchronous clients, -and second in single-threaded clients. -This spec provides detailed requirements for monitoring -because it intends to make all drivers behave consistently. 
- -Multi-threaded or asynchronous monitoring -````````````````````````````````````````` - -Servers are monitored in parallel -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -All servers' monitors run independently, in parallel: -If some monitors block calling hello or legacy hello over slow connections, -other monitors MUST proceed unimpeded. - -The natural implementation is a thread per server, -but the decision is left to the implementer. -(See `thread per server`_.) - -Servers are monitored with dedicated sockets -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -`A monitor SHOULD NOT use the client's regular connection pool`_ -to acquire a socket; -it uses a dedicated socket that does not count toward the pool's -maximum size. - -Drivers MUST NOT authenticate on sockets used for monitoring nor include -SCRAM mechanism negotiation (i.e. ``saslSupportedMechs``), as doing so would -make monitoring checks more expensive for the server. - -Servers are checked periodically -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Each monitor `checks`_ its server and notifies the client of the outcome -so the client can update the TopologyDescription. - -After each check, the next check SHOULD be scheduled `heartbeatFrequencyMS`_ later; -a check MUST NOT run while a previous check is still in progress. - -.. _request an immediate check: - -Requesting an immediate check -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -At any time, the client can request that a monitor check its server immediately. -(For example, after a "not writable primary" error. See `error handling`_.) -If the monitor is sleeping when this request arrives, -it MUST wake and check as soon as possible. -If a hello or legacy hello call is already in progress, -the request MUST be ignored. -If the previous check ended less than `minHeartbeatFrequencyMS`_ ago, -the monitor MUST sleep until the minimum delay has passed, -then check the server. 
- -Application operations are unblocked when a server is found -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Each time a check completes, threads waiting for a `suitable`_ server -are unblocked. Each unblocked thread MUST proceed if the new TopologyDescription -now contains a suitable server. - -Clients update the topology from each handshake -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When a monitor check creates a new connection, the `connection handshake`_ -response MUST be used to satisfy the check and update the topology. - -When a client successfully calls hello or legacy hello to handshake a new connection for application -operations, it SHOULD use the hello or legacy hello reply to update the ServerDescription -and TopologyDescription, the same as with a hello or legacy hello reply on a monitoring -socket. If the hello or legacy hello call fails, the client SHOULD mark the server Unknown -and update its TopologyDescription, the same as a failed server check on -monitoring socket. - -Clients use the streaming protocol when supported -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When a monitor discovers that the server supports the streamable hello or legacy hello -command and the client does not have `streaming disabled`_, it MUST use the `streaming protocol`_. - -Single-threaded monitoring -`````````````````````````` - -cooldownMS -~~~~~~~~~~ - -After a single-threaded client gets a network error trying to `check`_ a -server, the client skips re-checking the server until cooldownMS has passed. - -This avoids spending connectTimeoutMS on each unavailable server -during each scan. - -This value MUST be 5000 ms, and it MUST NOT be configurable. - -Scanning -~~~~~~~~ - -Single-threaded clients MUST `scan`_ all servers synchronously, -inline with regular application operations. 
-Before each operation, the client checks if `heartbeatFrequencyMS`_ has -passed since the previous scan ended, or if the topology is marked "stale"; -if so it scans all the servers before -selecting a server and performing the operation. - -Selection failure triggers an immediate scan. -When a client that uses single-threaded monitoring -fails to select a suitable server for any operation, -it `scans`_ the servers, then attempts selection again, -to see if the scan discovered suitable servers. It repeats, waiting -`minHeartbeatFrequencyMS`_ after each scan, until a timeout. - -Scanning order -~~~~~~~~~~~~~~ - -If the topology is a replica set, -the client attempts to contact the primary as soon as possible -to get an authoritative list of members. -Otherwise, the client attempts to check all members it knows of, -in order from the least-recently to the most-recently checked. - -When all servers have been checked the scan is complete. -New servers discovered **during** the scan -MUST be checked before the scan is complete. -Sometimes servers are removed during a scan -so they are not checked, depending on the order of events. - -The scanning order is expressed in this pseudocode:: - - scanStartTime = now() - # You'll likely need to convert units here. - beforeCoolDown = scanStartTime - cooldownMS - - while true: - serversToCheck = all servers with lastUpdateTime before scanStartTime - - remove from serversToCheck any Unknowns with lastUpdateTime > beforeCoolDown - - if no serversToCheck: - # This scan has completed. 
- break - - if a server in serversToCheck is RSPrimary: - check it - else if there is a PossiblePrimary: - check it - else if any servers are not of type Unknown or RSGhost: - check the one with the oldest lastUpdateTime - if several servers have the same lastUpdateTime, choose one at random - else: - check the Unknown or RSGhost server with the oldest lastUpdateTime - if several servers have the same lastUpdateTime, choose one at random - -This algorithm might be better understood with an example: - -#. The client is configured with one seed and TopologyType Unknown. - It begins a scan. -#. When it checks the seed, it discovers a secondary. -#. The secondary's hello or legacy hello response includes the "primary" field - with the address of the server that the secondary thinks is primary. -#. The client creates a ServerDescription with that address, - type PossiblePrimary, and lastUpdateTime "infinity ago". - (See `updateRSWithoutPrimary`_.) -#. On the next iteration, there is still no RSPrimary, - so the new PossiblePrimary is the top-priority server to check. -#. The PossiblePrimary is checked and replaced with an RSPrimary. - The client has now acquired an authoritative host list. - Any new hosts in the list are added to the TopologyDescription - with lastUpdateTime "infinity ago". - (See `updateRSFromPrimary`_.) -#. The client continues scanning until all known hosts have been checked. - -Another common case might be scanning a pool of mongoses. -When the client first scans its seed list, -they all have the default lastUpdateTime "infinity ago", -so it scans them in random order. -This randomness provides some load-balancing if many clients start at once. -A client's subsequent scans of the mongoses -are always in the same order, -since their lastUpdateTimes are always in the same order -by the time a scan ends. 
- -minHeartbeatFrequencyMS -``````````````````````` - -If a client frequently rechecks a server, -it MUST wait at least minHeartbeatFrequencyMS milliseconds -since the previous check ended, to avoid pointless effort. -This value MUST be 500 ms, and it MUST NOT be configurable (no knobs). - -heartbeatFrequencyMS -```````````````````` - -The interval between server `checks`_, counted from the end of the previous -check until the beginning of the next one. - -For multi-threaded and asynchronous drivers -it MUST default to 10 seconds and MUST be configurable. -For single-threaded drivers it MUST default to 60 seconds -and MUST be configurable. -It MUST be called heartbeatFrequencyMS -unless this breaks backwards compatibility. - -For both multi- and single-threaded drivers, -the driver MUST NOT permit users to configure it less than minHeartbeatFrequencyMS (500ms). - -(See `heartbeatFrequencyMS in the main SDAM spec`_.) - -Awaitable hello or legacy hello Server Specification -'''''''''''''''''''''''''''''''''''''''''''''''''''' - -As of MongoDB 4.4 the hello or legacy hello command can wait to reply until -there is a topology change or a maximum time has elapsed. Clients opt in to -this "awaitable hello" feature by passing new parameters "topologyVersion" -and "maxAwaitTimeMS" to the hello or legacy hello commands. Exhaust support -has also been added, which clients can enable in the usual manner by -setting the `OP_MSG exhaustAllowed flag`_. - -Clients use the awaitable hello feature as the basis of the streaming -heartbeat protocol to learn much sooner about stepdowns, elections, reconfigs, -and other events. - -topologyVersion -``````````````` - -A server that supports awaitable hello or legacy hello includes a "topologyVersion" -field in all hello or legacy hello replies and State Change Error replies. -The topologyVersion is a subdocument with two fields, "processId" and -"counter": - -.. code:: typescript - - { - topologyVersion: {processId: , counter: }, - ( ... 
other fields ...) - } - -processId -~~~~~~~~~ - -An ObjectId maintained in memory by the server. It is reinitialized by the -server using the standard ObjectId logic each time this server process starts. - -counter -~~~~~~~ - -An int64 State change counter, maintained in memory by the server. It begins -at 0 when the server starts, and it is incremented whenever there is a -significant topology change. - -maxAwaitTimeMS -`````````````` - -To enable awaitable hello or legacy hello, the client includes a new int64 field -"maxAwaitTimeMS" in the hello or legacy hello request. This field determines the maximum -duration in milliseconds a server will wait for a significant topology change -before replying. - -Feature Discovery -````````````````` - -To discover if the connected server supports awaitable hello or legacy hello, a client -checks the most recent hello or legacy hello command reply. If the reply includes -"topologyVersion" then the server supports awaitable hello or legacy hello. - -Awaitable hello or legacy hello Protocol -```````````````````````````````````````` - -To initiate an awaitable hello or legacy hello command, the client includes both -maxAwaitTimeMS and topologyVersion in the request, for example: - -.. code:: typescript - - { - hello: 1, - maxAwaitTimeMS: 10000, - topologyVersion: {processId: , counter: }, - ( ... other fields ...) - } - -Clients MAY additionally set the `OP_MSG exhaustAllowed flag`_ to enable -streaming hello or legacy hello. With streaming hello or legacy hello, the server -MAY send multiple hello or legacy hello responses without waiting for further requests. - -A server that implements the new protocol follows these rules: - -- Always include the server's topologyVersion in hello, legacy hello, and State Change - Error replies. -- If the request includes topologyVersion without maxAwaitTimeMS or vice versa, - return an error. -- If the request omits topologyVersion and maxAwaitTimeMS, reply immediately. 
-- If the request includes topologyVersion and maxAwaitTimeMS, then reply - immediately if the server's topologyVersion.processId does not match the - request's, otherwise reply when the server's topologyVersion.counter is - greater than the request's, or maxAwaitTimeMS elapses, whichever comes first. -- Following the `OP_MSG spec`_, if the request omits the exhaustAllowed flag, - the server MUST NOT set the moreToCome flag on the reply. If the request's - exhaustAllowed flag is set, the server MAY set the moreToCome flag on the - reply. If the server sets moreToCome, it MUST continue streaming replies - without awaiting further requests. Between replies it MUST wait until the - server's topologyVersion.counter is incremented or maxAwaitTimeMS elapses, - whichever comes first. If the reply includes ``ok: 0`` the server MUST NOT - set the moreToCome flag. -- On a topology change that changes the horizon parameters, the server will - close all application connections. - - -Example awaitable hello conversation: - -+---------------------------------------+--------------------------------+ -| Client | Server | -+=======================================+================================+ -| hello handshake -> | | -+---------------------------------------+--------------------------------+ -| | <- reply with topologyVersion | -+---------------------------------------+--------------------------------+ -| hello as OP_MSG with | | -| maxAwaitTimeMS and topologyVersion -> | | -+---------------------------------------+--------------------------------+ -| | wait for change or timeout | -+---------------------------------------+--------------------------------+ -| | <- OP_MSG with topologyVersion | -+---------------------------------------+--------------------------------+ -| ... 
| | -+---------------------------------------+--------------------------------+ - -Example streaming hello conversation (awaitable hello with exhaust): - -+---------------------------------------+--------------------------------+ -| Client | Server | -+=======================================+================================+ -| hello handshake -> | | -+---------------------------------------+--------------------------------+ -| | <- reply with topologyVersion | -+---------------------------------------+--------------------------------+ -| hello as OP_MSG with | | -| exhaustAllowed, maxAwaitTimeMS, | | -| and topologyVersion -> | | -+---------------------------------------+--------------------------------+ -| | wait for change or timeout | -+---------------------------------------+--------------------------------+ -| | <- OP_MSG with moreToCome | -| | and topologyVersion | -+---------------------------------------+--------------------------------+ -| | wait for change or timeout | -+---------------------------------------+--------------------------------+ -| | <- OP_MSG with moreToCome | -| | and topologyVersion | -+---------------------------------------+--------------------------------+ -| | ... | -+---------------------------------------+--------------------------------+ -| | <- OP_MSG without moreToCome | -+---------------------------------------+--------------------------------+ -| ... | | -+---------------------------------------+--------------------------------+ - - -Streaming Protocol -'''''''''''''''''' - -The streaming protocol is used to monitor MongoDB 4.4+ servers and optimally -reduces the time it takes for a client to discover server state changes. -Multi-threaded or asynchronous drivers MUST use the streaming protocol when -connected to a server that supports the awaitable hello or legacy hello commands. -This protocol requires an extra thread and an extra socket for -each monitor to perform RTT calculations. - -.. 
_streaming is disabled: - -Streaming disabled -`````````````````` - -The streaming protocol MUST be disabled when either: - -- the client is configured with serverMonitoringMode=poll, or -- the client is configured with serverMonitoringMode=auto and a FaaS platform is detected, or -- the server does not support streaming (eg MongoDB < 4.4). - -When the streaming protocol is disabled the client MUST use the `polling protocol`_ -and MUST NOT start an extra thread or connection for `Measuring RTT`_. - -See `Why disable the streaming protocol on FaaS platforms like AWS Lambda?`_. - -Streaming hello or legacy hello -``````````````````````````````` - -The streaming hello or legacy hello protocol uses awaitable hello or legacy hello -with the OP_MSG exhaustAllowed flag to continuously stream hello or legacy hello -responses from the server. Drivers MUST set the OP_MSG exhaustAllowed flag -with the awaitable hello or legacy hello command and MUST process each -hello or legacy hello response. (I.e., they MUST process responses strictly -in the order they were received.) - -A client follows these rules when processing the hello or legacy hello -exhaust response: - -- If the response indicates a command error, or a network error or timeout - occurs, the client MUST close the connection and restart the monitoring - protocol on a new connection. (See - `Network or command error during server check`_.) -- If the response is successful (includes "ok:1") and includes the OP_MSG - moreToCome flag, then the client begins reading the next response. -- If the response is successful (includes "ok:1") and does not include the - OP_MSG moreToCome flag, then the client initiates a new awaitable hello - or legacy hello with the topologyVersion field from the previous response. - -Socket timeout -`````````````` - -Clients MUST use connectTimeoutMS as the timeout for the connection handshake. 
-When connectTimeoutMS=0, the timeout is unlimited and MUST remain unlimited -for awaitable hello and legacy hello replies. Otherwise, connectTimeoutMS is -non-zero and clients MUST use connectTimeoutMS + heartbeatFrequencyMS as the -timeout for awaitable hello and legacy hello replies. - -Measuring RTT -````````````` - -When using the streaming protocol, clients MUST issue a hello or legacy hello -command to each server to measure RTT every heartbeatFrequencyMS. The RTT command -MUST be run on a dedicated connection to each server. Clients MUST NOT use -dedicated connections to measure RTT when the streaming protocol is not used. (See -`Monitors MUST use a dedicated connection for RTT commands`_.) - -Clients MUST update the RTT from the hello or legacy hello duration of the initial -connection handshake. Clients MUST NOT update RTT based on streaming hello or -legacy hello responses. - -Clients MUST ignore the response to the hello or legacy hello command when measuring RTT. -Errors encountered when running a hello or legacy hello command MUST NOT update the topology. -(See `Why don't clients mark a server unknown when an RTT command fails?`_) - -Clients MUST track the minimum RTT out of the (at most) last 10 samples. Clients -MUST report the minimum RTT as 0 until at least 2 samples have been gathered. - -When constructing a ServerDescription from a streaming hello or legacy hello response, -clients MUST set the average and minimum round trip times from the RTT task as the -"roundTripTime" and "minRoundTripTime" fields, respectively. - -See the pseudocode in the `RTT thread`_ section for an example implementation. - -SDAM Monitoring -``````````````` - -Clients MUST publish a ServerHeartbeatStartedEvent before attempting to -read the next hello or legacy hello exhaust response. (See -`Why must streaming hello or legacy hello clients publish ServerHeartbeatStartedEvents?`_) - -Clients MUST NOT publish any events when running an RTT command. 
(See -`Why don't streaming hello or legacy hello clients publish events for RTT commands?`_) - -Heartbeat frequency -``````````````````` - -In the polling protocol, a client sleeps between each hello or legacy hello check (for at -least minHeartbeatFrequencyMS and up to heartbeatFrequencyMS). In the -streaming protocol, after processing an "ok:1" hello or legacy hello response, the client -MUST NOT sleep and MUST begin the next check immediately. - -Clients MUST set `maxAwaitTimeMS`_ to heartbeatFrequencyMS. - -hello or legacy hello Cancellation -`````````````````````````````````` - -When a client is closed, clients MUST cancel all hello and legacy hello checks; a monitor -blocked waiting for the next streaming hello or legacy hello response MUST be interrupted -such that threads may exit promptly without waiting maxAwaitTimeMS. - -When a client marks a server Unknown from `Network error when reading or writing`_, -clients MUST cancel the hello or legacy hello check on that server and close the -current monitoring connection. (See `Drivers cancel in-progress monitor checks`_.) - -Polling Protocol -'''''''''''''''' - -The polling protocol is used to monitor MongoDB < 4.4 servers or when `streaming is disabled`_. -The client `checks`_ a server with a hello or legacy hello command and then sleeps for -heartbeatFrequencyMS before running another check. - -Marking the connection pool as ready (CMAP only) -'''''''''''''''''''''''''''''''''''''''''''''''' - -When a monitor completes a successful check against a server, it MUST mark the -connection pool for that server as "ready", and doing so MUST be synchronized -with the update to the topology (e.g. by marking the pool as ready in -onServerDescriptionChanged). This is required to ensure a server does not get -selected while its pool is still paused. See the `Connection Pool`_ definition -in the CMAP specification for more details on marking the pool as "ready". 
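A minimal sketch of that synchronization follows. It assumes a single topology lock that both the monitor and server selection take. `Pool`, `Topology`, `on_check_succeeded`, and `select` are hypothetical names used only for illustration. They are not CMAP or SDAM APIs, and real server selection does much more than look up one address. The point is that the description update and the pool's transition to "ready" happen inside the same critical section.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Pool:
    # Hypothetical CMAP-like pool: starts "paused" and is marked "ready"
    # only after a successful monitor check.
    state: str = "paused"

    def mark_ready(self) -> None:
        self.state = "ready"


class Topology:
    """Toy topology; `lock` stands in for the client's topology lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.server_types: Dict[str, str] = {}
        self.pools: Dict[str, Pool] = {}

    def on_check_succeeded(self, address: str, server_type: str) -> None:
        # Update the description and mark the pool ready under one lock,
        # so selection never sees the new description without the ready pool.
        with self.lock:
            self.server_types[address] = server_type
            self.pools.setdefault(address, Pool()).mark_ready()

    def select(self, address: str) -> Optional[Pool]:
        with self.lock:
            if self.server_types.get(address, "Unknown") == "Unknown":
                return None
            # Because of the shared lock, a selectable server's pool is ready.
            return self.pools[address]
```

If the pool were marked ready after releasing the lock, a concurrent `select` could return a server whose pool is still paused, which is exactly the window the rule above closes.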
- -Error handling -'''''''''''''' - -Network or command error during server check -```````````````````````````````````````````` - -When a server `check`_ fails due to a network error (including a network -timeout) or a command error (``ok: 0``), the client MUST follow these steps: - -#. Close the current monitoring connection. -#. Mark the server Unknown. -#. Clear the connection pool for the server (see `Clear the connection pool on - both network and command errors`_). For CMAP compliant drivers, clearing the - pool MUST be synchronized with marking the server as Unknown (see `Why - synchronize clearing a server's pool with updating the topology?`_). If this - was a network timeout error, then the pool MUST be cleared with interruptInUseConnections = true - (see `Why does the pool need to support closing in use connections as part of - its clear logic?`_). -#. If this was a network error and the server was in a known state before the - error, the client MUST NOT sleep and MUST begin the next check immediately. - (See `retry hello or legacy hello calls once`_ and - `JAVA-1159 `_.) -#. Otherwise, wait for heartbeatFrequencyMS (or minHeartbeatFrequencyMS if a - check is requested) before restarting the monitoring protocol on a new - connection. - - - Note that even in the streaming protocol, a monitor in this state will - wait for an application operation to `request an immediate check`_ or - for the heartbeatFrequencyMS timeout to expire before beginning the next - check. - -See the pseudocode in the `Monitor thread`_ section. - -Note that this rule applies only to server checks during monitoring. -It does *not* apply when multi-threaded -`clients update the topology from each handshake`_. - -Implementation notes -'''''''''''''''''''' - -This section intends to provide generous guidance to driver authors. -It is complementary to the reference implementations. -Words like "should", "may", and so on are used more casually here.
- -Monitor thread -`````````````` - -Most platforms can use an event object to control the monitor thread. -The event API here is assumed to be like the standard `Python Event -`_. -`heartbeatFrequencyMS`_ is configurable, -`minHeartbeatFrequencyMS`_ is always 500 milliseconds: - -.. code-block:: python - - class Monitor(Thread): - def __init__(): - # Monitor options: - serverAddress = serverAddress - connectTimeoutMS = connectTimeoutMS - heartbeatFrequencyMS = heartbeatFrequencyMS - minHeartbeatFrequencyMS = 500 - stableApi = stableApi - if serverMonitoringMode == "stream": - streamingEnabled = True - elif serverMonitoringMode == "poll": - streamingEnabled = False - else: # serverMonitoringMode == "auto" - streamingEnabled = not isFaas() - - # Internal Monitor state: - connection = Null - # Server API versioning implies that the server supports hello. - helloOk = stableApi != Null - description = default ServerDescription - lock = Mutex() - rttMonitor = RttMonitor(serverAddress, stableApi) - - def run(): - while this monitor is not stopped: - previousDescription = description - try: - description = checkServer(previousDescription) - except CheckCancelledError: - if this monitor is stopped: - # The client was closed. - return - # The client marked this server Unknown and cancelled this - # check during "Network error when reading or writing". - # Wait before running the next check. - wait() - continue - - with client.lock: - topology.onServerDescriptionChanged(description, connection pool for server) - if description.error != Null: - # Clear the connection pool only after the server description is set to Unknown. - clear(interruptInUseConnections: isNetworkTimeout(description.error)) connection pool for server - - # Immediately proceed to the next check if the previous response - # was successful and included the topologyVersion field, or the - # previous response included the moreToCome flag, or the server - # has just transitioned to Unknown from a network error. 
- serverSupportsStreaming = description.type != Unknown and description.topologyVersion != Null - connectionIsStreaming = connection != Null and connection.moreToCome - transitionedWithNetworkError = isNetworkError(description.error) and previousDescription.type != Unknown - if streamingEnabled and serverSupportsStreaming and not rttMonitor.started: - # Start the RttMonitor. - rttMonitor.run() - if (streamingEnabled and (serverSupportsStreaming or connectionIsStreaming)) or transitionedWithNetworkError: - continue - - wait() - - def setUpConnection(): - # Take the mutex to avoid a data race because this code writes to the connection field and a concurrent - # cancelCheck call could be reading from it. - with lock: - # Server API versioning implies that the server supports hello. - helloOk = stableApi != Null - connection = new Connection(serverAddress) - set connection timeout to connectTimeoutMS - - # Do any potentially blocking operations after releasing the mutex. - create the socket and perform connection handshake - - def checkServer(previousDescription): - try: - # The connection is null if this is the first check. It's closed if there was an error during the previous - # check or the previous check was cancelled. - - if helloOk: - helloCommand = hello - else: - helloCommand = legacy hello - - if not connection or connection.isClosed(): - setUpConnection() - rttMonitor.addSample(connection.handshakeDuration) - response = connection.handshakeResponse - elif connection.moreToCome: - response = read next helloCommand exhaust response - elif streamingEnabled and previousDescription.topologyVersion: - # Initiate streaming hello or legacy hello - if connectTimeoutMS != 0: - set connection timeout to connectTimeoutMS+heartbeatFrequencyMS - response = call {helloCommand: 1, helloOk: True, topologyVersion: previousDescription.topologyVersion, maxAwaitTimeMS: heartbeatFrequencyMS} - else: - # The server does not support topologyVersion or streamingEnabled=False.
- response = call {helloCommand: 1, helloOk: True} - - # If the server supports hello, then response.helloOk will be true - # and hello will be used for subsequent monitoring commands. - # If the server does not support hello, then response.helloOk will be undefined - # and legacy hello will be used for subsequent monitoring commands. - helloOk = response.helloOk - - return ServerDescription(response, rtt=rttMonitor.average(), minRtt=rttMonitor.min()) - except Exception as exc: - close connection - rttMonitor.reset() - return ServerDescription(type=Unknown, error=exc) - - def wait(): - start = gettime() - - # Can be awakened by requestCheck(). - event.wait(heartbeatFrequencyMS) - event.clear() - - waitTime = gettime() - start - if waitTime < minHeartbeatFrequencyMS: - # Cannot be awakened. - sleep(minHeartbeatFrequencyMS - waitTime) - - -`Requesting an immediate check`_: - -.. code-block:: python - - def requestCheck(): - event.set() - - -`hello or legacy hello Cancellation`_: - -.. code-block:: python - - def cancelCheck(): - # Take the mutex to avoid reading the connection value while setUpConnection is writing to it. - # Copy the connection value in the lock but do the actual cancellation outside. - with lock: - tempConnection = connection - - if tempConnection: - interrupt tempConnection read - close tempConnection - -RTT thread -`````````` - -The requirements in the `Measuring RTT`_ section can be satisfied with an -additional thread that periodically runs the hello or legacy hello command -on a dedicated connection, for example: - -.. code-block:: python - - class RttMonitor(Thread): - def __init__(): - # Options: - serverAddress = serverAddress - connectTimeoutMS = connectTimeoutMS - heartbeatFrequencyMS = heartbeatFrequencyMS - stableApi = stableApi - - # Internal state: - connection = Null - # Server API versioning implies that the server supports hello.
- helloOk = stableApi != Null - lock = Mutex() - movingAverage = MovingAverage() - # Track the min RTT seen in the most recent 10 samples. - recentSamples = deque(maxlen=10) - - def reset(): - with lock: - movingAverage.reset() - recentSamples.clear() - - def addSample(rtt): - with lock: - movingAverage.update(rtt) - recentSamples.append(rtt) - - def average(): - with lock: - return movingAverage.get() - - def min(): - with lock: - # Need at least 2 RTT samples. - if len(recentSamples) < 2: - return 0 - return min(recentSamples) - - def run(): - while this monitor is not stopped: - try: - rtt = pingServer() - addSample(rtt) - except Exception as exc: - # Don't call reset() here. The Monitor thread is responsible - # for resetting the average RTT. - close connection - connection = Null - helloOk = stableApi != Null - - # Can be awakened when the client is closed. - event.wait(heartbeatFrequencyMS) - event.clear() - - def setUpConnection(): - # Server API versioning implies that the server supports hello. - helloOk = stableApi != Null - connection = new Connection(serverAddress) - set connection timeout to connectTimeoutMS - perform connection handshake - - def pingServer(): - if helloOk: - helloCommand = hello - else: - helloCommand = legacy hello - - if not connection: - setUpConnection() - return RTT of the connection handshake - - start = time() - response = call {helloCommand: 1, helloOk: True} - rtt = time() - start - helloOk = response.helloOk - return rtt - - -Design Alternatives -------------------- - -Alternating hello or legacy hello to check servers and RTT without adding an extra connection -''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -The streaming hello or legacy hello protocol is optimal in terms of latency; -clients are always blocked waiting for the server to stream updated hello or -legacy hello information, so they learn of server state changes as soon as possible.
-However, streaming hello or legacy hello has two downsides: - -1. Streaming hello or legacy hello requires a new connection to each server to - calculate the RTT. -2. Streaming hello or legacy hello requires a new thread (or threads) to calculate - the RTT of each server. - -To address these concerns we designed the alternating hello or legacy hello protocol. -This protocol would have alternated between awaitable hello or legacy hello and regular -hello or legacy hello. The awaitable hello or legacy hello replaces the polling protocol's -client side sleep and allows the client to receive updated hello or legacy hello -responses sooner. The regular hello or legacy hello allows the client to maintain -accurate RTT calculations without requiring any extra threads or -sockets. - -We reject this design because streaming hello or legacy hello is strictly better at -reducing the client's time-to-recovery. We determined that one extra -connection per server per MongoClient is reasonable for all drivers. -Applications that upgrade may see a modest increase in connections and -memory usage on the server. We don't expect this increase to be -problematic; however, we have several projects planned for future -MongoDB releases to make the streaming hello or legacy hello protocol cheaper -server-side which should mitigate the cost of the extra monitoring -connections. - -Use TCP smoothed round-trip time instead of measuring RTT explicitly -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -TCP sockets internally maintain a "smoothed round-trip time" or SRTT. Drivers -could use this SRTT instead of measuring RTT explicitly via hello or legacy hello commands. -The server could even include this value on all hello or legacy hello responses. We reject -this idea for a few reasons: - -- Not all programming languages have an API to access the TCP socket's RTT. -- On Windows, RTT access requires Admin privileges. 
-- TCP's SRTT would likely differ substantially from RTT measurements in - the current protocol. For example, the SRTT can be reset on - `retransmission timeouts `_. - -Rationale ---------- - -Thread per server -''''''''''''''''' - -Mongos uses a monitor thread per replica set, rather than a thread per server. -A thread per server is impractical if mongos is monitoring a large number of -replica sets. -But a driver only monitors one. - -In mongos, threads trying to do reads and writes join the effort to scan -the replica set. -Such threads are more likely to be abundant in mongos than in drivers, -so mongos can rely on them to help with monitoring. - -In short: mongos has different scaling concerns than -a multi-threaded or asynchronous driver, -so it allocates threads differently. - -Socket timeout for monitoring is connectTimeoutMS -''''''''''''''''''''''''''''''''''''''''''''''''' - -When a client waits for a server to respond to a connection, -the client does not know if the server will respond eventually or if it is down. -Users can help the client guess correctly -by supplying a reasonable connectTimeoutMS for their network: -on some networks a server is probably down if it hasn't responded in 10 ms, -on others a server might still be up even if it hasn't responded in 10 seconds. - -The socketTimeoutMS, on the other hand, must account for both network latency -and the operation's duration on the server. -Applications should typically set a very long or infinite socketTimeoutMS -so they can wait for long-running MongoDB operations. - -Multi-threaded clients use distinct sockets for monitoring and for application -operations. -A socket used for monitoring does two things: it connects and calls hello or legacy hello. -Both operations are fast on the server, so only network latency matters. 
-Thus both operations SHOULD use connectTimeoutMS, since that is the value -users supply to help the client guess if a server is down, -based on users' knowledge of expected latencies on their networks. - -A monitor SHOULD NOT use the client's regular connection pool -''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -If a multi-threaded driver's connection pool enforces a maximum size -and monitors use sockets from the pool, -there are two bad options: -either monitors compete with the application for sockets, -or monitors have the exceptional ability -to create sockets even when the pool has reached its maximum size. -The former risks starving the monitor. -The latter is more complex than it is worth. -(A lesson learned from PyMongo 2.6's pool, which implemented this option.) - -Since this rule is justified for drivers that enforce a maximum pool size, -this spec recommends that all drivers follow the same rule -for the sake of consistency. - -Monitors MUST use a dedicated connection for RTT commands -''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -When using the streaming protocol, a monitor needs to maintain an extra -dedicated connection to periodically update its average round trip time in -order to support `localThresholdMS`_ from the Server Selection spec. - -It could pop a connection from its regular pool, but we rejected this option -for a few reasons: - -- Under contention the RTT task may block application operations from - completing in a timely manner. -- Under contention the application may block the RTT task from completing in - a timely manner. -- Under contention the RTT task may often result in an extra connection - anyway because the pool creates new connections under contention up to maxPoolSize. -- This would be inconsistent with the rule that a monitor SHOULD NOT use the - client's regular connection pool. - -The client could open and close a new connection for each RTT check. 
-We rejected this design because if we ping every heartbeatFrequencyMS -(default 10 seconds), then the cost to the client and the server of creating -and destroying the connection might exceed the cost of keeping a dedicated -connection open. - -Instead, the client must use a dedicated connection reserved for RTT commands. -Despite the cost of the additional connection per server, we chose this option -as the safest and least likely to result in surprising behavior under load. - -Monitors MUST use the hello or legacy hello command to measure RTT -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -In the streaming protocol, clients could use the "ping", "hello", or legacy hello -commands to measure RTT. This spec chooses "hello" or legacy hello for consistency -with the polling protocol as well as consistency with the initial RTT provided by the -connection handshake, which also uses the hello or legacy hello commands. Additionally, -mongocryptd does not allow the ping command but does allow hello or legacy hello. - -Why not use `awaitedTimeMS` in the server response to calculate RTT in the streaming protocol? -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -One approach to calculating RTT in the streaming protocol would be to have the server -return an ``awaitedTimeMS`` in its ``hello`` or legacy hello response. A driver could then -determine the RTT by calculating the difference between the initial request, or last response, -and the ``awaitedTimeMS``. - -We rejected this design because of a number of issues with the unreliability of clocks in -distributed systems. Local and remote system clocks can skew relative to each other. This approach mixes -two notions of time: the local clock times the whole operation while the remote clock times -the wait. This means that if these clocks tick at different rates, or there are anomalies -like clock changes, you will get bad results.
To make matters worse, you will be comparing -times from multiple servers that could each have clocks ticking at different rates. This -approach will bias toward servers with the fastest ticking clock, since it will seem like it -spends the least time on the wire. - -Additionally, systems using NTP will experience clock "slew". ntpd "slews" time by up to 500 -parts-per-million to have the local time gradually approach the "true" time without big -jumps; over a 10 second window that means a 5ms difference. If both sides are slewing in -opposite directions, that can result in an effective difference of 10ms. Both of these differences -are close enough to `localThresholdMS`_ to significantly affect which servers are viable -in NEAREST calculations. - -Ensuring that all measurements use the same clock obviates the need for a more complicated -solution, and mitigates the above-mentioned concerns. - -Why don't clients mark a server unknown when an RTT command fails? -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -In the streaming protocol, clients use the hello or legacy hello command on a dedicated -connection to measure a server's RTT. However, errors encountered when running -the RTT command MUST NOT mark a server Unknown. We reached this decision -because the dedicated RTT connection does not come from a connection pool and -thus does not have a generation number associated with it. Without a generation -number, we cannot handle errors from the RTT command without introducing race -conditions. Introducing such a generation number would add complexity to this -design without much benefit. It is safe to ignore these errors because the -Monitor will soon discover the server's state regardless (either through an -updated streaming response, an error on the streaming connection, or by -handling an error on an application connection).
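The generation-number argument above can be illustrated with a small, hypothetical model. It assumes that each pooled connection records the pool's generation when it is created and that clearing the pool increments that generation; an error reported by a connection with an older generation is treated as stale and ignored. `Pool`, `Connection`, and `should_handle_error` are illustrative names only, loosely modeled on the generation checks described in the main SDAM and CMAP specs. A dedicated RTT connection has no `generation` to compare, which is why its errors cannot be filtered this way.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    # Incremented each time the pool is cleared.
    generation: int = 0

    def clear(self) -> None:
        self.generation += 1


@dataclass
class Connection:
    # Generation of the pool at the time this connection was created.
    generation: int


def should_handle_error(pool: Pool, conn: Connection) -> bool:
    """Return True if an error on `conn` should mark the server Unknown
    and clear the pool, or False if the error is stale."""
    if conn.generation < pool.generation:
        # The pool was cleared after this connection was created, so the
        # failure it reports has already been handled.
        return False
    pool.clear()
    return True
```

For example, if two connections created at generation 0 fail concurrently, only the first error clears the pool; the second sees a newer pool generation and is ignored, so the pool is not cleared twice for one failure.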
- -Drivers cancel in-progress monitor checks -''''''''''''''''''''''''''''''''''''''''' - -When an application operation fails with a non-timeout network error, drivers -cancel that monitor's in-progress check. - -We assume that a non-timeout network error on one application connection -implies that all other connections to that server are also bad. This means -that it is redundant to continue reading on the current monitoring connection. -Instead, we cancel the current monitor check, close the monitoring connection, -and start a new check soon. Note that we rely on the connection/pool -generation number checking to avoid races and ensure that the monitoring -connection is only closed once. - -This approach also handles the rare case where the client sees a network error -on an application connection but the monitoring connection is still healthy. -If we did not cancel the monitor check in this scenario, then the server would -remain in the Unknown state until the next hello or legacy hello response (up to -maxAwaitTimeMS). A potential real world example of this behavior is when -Azure closes an idle connection in the application pool. - -Retry hello or legacy hello calls once -'''''''''''''''''''''''''''''''''''''' - -A monitor's connection to a server is long-lived and used only for hello or legacy hello -calls. So if a server has responded in the past, a network error on the -monitor's connection means that there was a network glitch, or a server restart -since the last check, or that the server is truly down. To handle the case -that the server is truly down, the monitor makes the server unselectable by -marking it Unknown. To handle the case of a transient network glitch or -restart, the monitor immediately runs the next check without waiting. 
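The "retry once" behavior can be sketched as a pure function that picks the delay before the next check after a failed one. This is a simplified, hypothetical model: `next_check_delay_ms` and its parameters are illustrative names, server types are plain strings, and a requested check is assumed to arrive before minHeartbeatFrequencyMS has elapsed (so only the minimum spacing applies).

```python
def next_check_delay_ms(
    network_error: bool,
    previous_type: str,
    heartbeat_frequency_ms: int = 10_000,
    check_requested: bool = False,
    min_heartbeat_frequency_ms: int = 500,
) -> int:
    """Delay in ms before the next monitor check after a failed check (simplified)."""
    if network_error and previous_type != "Unknown":
        # The server was known until now: assume a glitch or restart and
        # recheck immediately. The server is now Unknown, so a second
        # consecutive failure falls through to the normal wait below.
        return 0
    if check_requested:
        # An application operation asked for a check; only the minimum
        # spacing between checks applies.
        return min_heartbeat_frequency_ms
    return heartbeat_frequency_ms
```

Two consecutive network failures against a previously known primary yield delays of 0 and then 10000 ms, which is why the retry happens at most once.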
- - -Clear the connection pool on both network and command errors -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -A monitor clears the connection pool when a server check fails with a network -or command error (`Network or command error during server check`_). -When the check fails with a network error, it is likely that all connections -to that server are also closed. -(See `JAVA-1252 `_). When the check fails -with a network timeout error, a monitor MUST set interruptInUseConnections to true. -See `Why does the pool need to support closing in use connections as part of its clear logic?`_. - -When the server is shutting down, it may respond to hello or legacy hello commands with -ShutdownInProgress errors before closing connections. In this case, the -monitor clears the connection pool because all connections will be closed soon. -Other command errors are unexpected but are handled identically. - -Why must streaming hello or legacy hello clients publish ServerHeartbeatStartedEvents? -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -The `SDAM Monitoring spec`_ guarantees that every ServerHeartbeatStartedEvent -has either a correlating ServerHeartbeatSucceededEvent or -ServerHeartbeatFailedEvent. This is consistent with Command Monitoring on -exhaust cursors where the driver publishes a fake CommandStartedEvent before -reading the next getMore response. - -Why don't streaming hello or legacy hello clients publish events for RTT commands? -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -In the streaming protocol, clients MUST NOT publish any events -(server, topology, command, CMAP, etc.) when running an RTT command. We -considered introducing new RTT events (ServerRTTStartedEvent, -ServerRTTSucceededEvent, ServerRTTFailedEvent) but it's not clear that -there is a demand for this.
Applications can still monitor changes to a -server's RTT by listening to TopologyDescriptionChangedEvents. - -What is the purpose of the "awaited" field on server heartbeat events? -'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -ServerHeartbeatSucceededEvents published from awaitable hello or legacy hello -responses will regularly have 10 second durations. The spec introduces -the "awaited" field on server heartbeat events so that applications can -differentiate a slow heartbeat in the polling protocol from a normal -awaitable hello or legacy hello heartbeat in the new protocol. - -Why disable the streaming protocol on FaaS platforms like AWS Lambda? -''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' - -The streaming protocol relies on the assumption that the client -can read the server's heartbeat responses in a timely manner, otherwise the -client will be acting on stale information. In many FaaS platforms, like AWS -Lambda, host applications will be suspended and resumed many minutes later. -This behavior causes a build up of heartbeat responses and the client can end -up spending a long time in a catch up phase processing outdated responses. -This problem was discovered in `DRIVERS-2246`_. - -Additionally, the streaming protocol requires an extra connection and thread -per monitored server which is expensive on platforms like AWS Lambda. The -extra connection is particularly inefficient when thousands of AWS instances -and thus thousands of clients are used. - -We decided to make polling the default behavior when running on FaaS platforms -like AWS Lambda to improve scalability, performance, and reliability. - -Why introduce a knob for serverMonitoringMode? -'''''''''''''''''''''''''''''''''''''''''''''' - -The serverMonitoringMode knob provides a workaround in cases where the polling -protocol would be a better choice but the driver is not running on a FaaS -platform. 
It also provides a workaround in case the FaaS detection -logic becomes outdated or inaccurate. - -Changelog ---------- - -:2020-02-20: Extracted server monitoring from SDAM into this new spec. -:2020-03-09: A monitor check that creates a new connection MUST use the - connection's handshake to update the topology. -:2020-04-20: Add streaming heartbeat protocol. -:2020-05-20: Include rationale for why we don't use `awaitedTimeMS` -:2020-06-11: Support connectTimeoutMS=0 in streaming heartbeat protocol. -:2020-12-17: Mark the pool for a server as "ready" after performing a successful - check. Synchronize pool clearing with SDAM updates. -:2021-06-21: Added support for hello/helloOk to handshake and monitoring. -:2021-06-24: Remove optimization mention that no longer applies -:2022-01-19: Add 90th percentile RTT tracking. -:2022-02-24: Rename Versioned API to Stable API -:2022-04-05: Preemptively cancel in progress operations when SDAM heartbeats timeout. -:2022-10-05: Remove spec front matter reformat changelog. -:2022-11-17: Add minimum RTT tracking and remove 90th percentile RTT. -:2023-10-05: Add serverMonitoringMode and default to the polling protocol on FaaS. - Clients MUST NOT use dedicated connections to measure RTT when using the polling protocol. - ----- - -.. Section for links. - -.. _Server Selection Spec: ../server-selection/server-selection.md -.. _main SDAM spec: server-discovery-and-monitoring.rst -.. _Server Discovery And Monitoring: server-discovery-and-monitoring.rst -.. _server API version: /source/versioned-api/versioned-api.rst -.. _heartbeatFrequencyMS in the main SDAM spec: server-discovery-and-monitoring.rst#heartbeatFrequencyMS -.. _error handling: server-discovery-and-monitoring.rst#error-handling -.. _initial servers: server-discovery-and-monitoring.rst#initial-servers -.. _updateRSWithoutPrimary: server-discovery-and-monitoring.rst#updateRSWithoutPrimary -.. _updateRSFromPrimary: server-discovery-and-monitoring.rst#updateRSFromPrimary -.. 
_Network error when reading or writing: server-discovery-and-monitoring.rst#network-error-when-reading-or-writing -.. _connection handshake: mongodb-handshake/handshake.rst -.. _localThresholdMS: ../server-selection/server-selection.md#localThresholdMS -.. _SDAM Monitoring spec: server-discovery-and-monitoring-logging-and-monitoring.rst#heartbeats -.. _OP_MSG Spec: ../message/OP_MSG.md -.. _OP_MSG exhaustAllowed flag: ../message/OP_MSG.md#exhaustAllowed -.. _Connection Pool: /source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md#Connection-Pool -.. _Why synchronize clearing a server's pool with updating the topology?: server-discovery-and-monitoring.rst#why-synchronize-clearing-a-server-s-pool-with-updating-the-topology? -.. _Client Side Operations Timeout Spec: /source/client-side-operations-timeout/client-side-operations-timeout.rst -.. _timeoutMS: /source/client-side-operations-timeout/client-side-operations-timeout.rst#timeoutMS -.. _Why does the pool need to support closing in use connections as part of its clear logic?: /source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md#Why-does-the-pool-need-to-support-closing-in-use-connections-as-part-of-its-clear-logic? -.. _DRIVERS-2246: https://jira.mongodb.org/browse/DRIVERS-2246 -.. _MongoDB Handshake spec: /source/mongodb-handshake/handshake.rst#client-env +.. note:: + This specification has been converted to Markdown and renamed to + `server-monitoring.md `_. 
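The rationale removed above (now carried in `server-monitoring.md`) covers how a monitor reacts to a failed check. It retries hello or legacy hello once after a network error, it clears the pool on both network and command errors, and it sets `interruptInUseConnections` on timeouts. Those rules fit in a small decision function. This is a minimal JavaScript sketch, not driver code. The error shape (`{ kind: 'network' | 'timeout' | 'command' }`) and the returned action names are hypothetical. The sketch also makes two assumptions: "the server has responded in the past" means its previous ServerType was not `Unknown`, and a command error also marks the server Unknown.

```javascript
// Minimal sketch of a monitor's reaction to a failed hello / legacy hello check.
// Hypothetical: the error shape { kind: 'network' | 'timeout' | 'command' } and
// the names of the returned actions are illustrative, not a real driver API.
function planAfterFailedCheck(previousServerType, error) {
  // A network timeout is still a network error.
  const isNetworkError = error.kind === 'network' || error.kind === 'timeout';
  const isCommandError = error.kind === 'command';

  return {
    // Make the server unselectable until a later check succeeds (assumed for
    // command errors as well as network errors).
    markUnknown: isNetworkError || isCommandError,
    // A network error suggests every pooled connection is dead. A command error
    // such as ShutdownInProgress means they will be closed soon.
    clearPool: isNetworkError || isCommandError,
    // On a network timeout, the pool clear must also interrupt in-use connections.
    interruptInUseConnections: error.kind === 'timeout',
    // If the server had responded before, treat the failure as a possible glitch
    // or restart and check again at once instead of waiting heartbeatFrequencyMS.
    checkImmediately: isNetworkError && previousServerType !== 'Unknown',
  };
}
```

The "once" in "retry once" follows from the state change. After the first failure the server is `Unknown`, so a second consecutive network error returns `checkImmediately: false` and the monitor falls back to its normal wait.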
diff --git a/source/server-discovery-and-monitoring/tests/README.md b/source/server-discovery-and-monitoring/tests/README.md new file mode 100644 index 0000000000..a96bcb6490 --- /dev/null +++ b/source/server-discovery-and-monitoring/tests/README.md @@ -0,0 +1,239 @@ +# Server Discovery And Monitoring Tests + +______________________________________________________________________ + +The YAML and JSON files in this directory tree are platform-independent tests that drivers can use to prove their +conformance to the Server Discovery And Monitoring Spec. + +Additional prose tests, that cannot be represented as spec tests, are described and MUST be implemented. + +## Version + +Files in the "specifications" repository have no version scheme. They are not tied to a MongoDB server version. + +## Format + +Each YAML file has the following keys: + +- description: A textual description of the test. +- uri: A connection string. +- phases: An array of "phase" objects. A phase of the test optionally sends inputs to the client, then tests the + client's resulting TopologyDescription. + +Each phase object has the following keys: + +- description: (optional) A textual description of this phase. +- responses: (optional) An array of "response" objects. If not provided, the test runner should construct the client and + perform assertions specified in the outcome object without processing any responses. +- applicationErrors: (optional) An array of "applicationError" objects. +- outcome: An "outcome" object representing the TopologyDescription. + +A response is a pair of values: + +- The source, for example "a:27017". This is the address the client sent the "hello" or legacy hello command to. +- A hello or legacy hello response, for example `{ok: 1, helloOk: true, isWritablePrimary: true}`. If the response + includes an electionId it is shown in extended JSON like `{"$oid": "000000000000000000000002"}`. 
The empty response + `{}` indicates a network error when attempting to call "hello" or legacy hello. + +An "applicationError" object has the following keys: + +- address: The source address, for example "a:27017". +- generation: (optional) The error's generation number, for example `1`. When absent this value defaults to the pool's + current generation number. +- maxWireVersion: The `maxWireVersion` of the connection the error occurs on, for example `9`. Added to support testing + the behavior of "not writable primary" errors on \<4.2 and >=4.2 servers. +- when: A string describing when this mock error should occur. Supported values are: + - "beforeHandshakeCompletes": Simulate this mock error as if it occurred during a new connection's handshake for an + application operation. + - "afterHandshakeCompletes": Simulate this mock error as if it occurred on an established connection for an + application operation (i.e. after the connection pool check out succeeds). +- type: The type of error to mock. Supported values are: + - "command": A command error. Always accompanied with a "response". + - "network": A non-timeout network error. + - "timeout": A network timeout error. +- response: (optional) A command error response, for example `{ok: 0, errmsg: "not primary"}`. Present if and only if + `type` is "command". Note the server only returns "not primary" if the "hello" command has been run on this + connection. Otherwise the legacy error message is returned. + +In non-monitoring tests, an "outcome" represents the correct TopologyDescription that results from processing the +responses in the phases so far. It has the following keys: + +- topologyType: A string like "ReplicaSetNoPrimary". +- setName: A string with the expected replica set name, or null. +- servers: An object whose keys are addresses like "a:27017", and whose values are "server" objects. +- logicalSessionTimeoutMinutes: null or an integer. +- maxSetVersion: absent or an integer. 
+- maxElectionId: absent or a BSON ObjectId. +- compatible: absent or a bool. + +A "server" object represents a correct ServerDescription within the client's current TopologyDescription. It has the +following keys: + +- type: A ServerType name, like "RSSecondary". See [ServerType](../server-discovery-and-monitoring.rst#servertype) for + details pertaining to async and multi-threaded drivers. +- setName: A string with the expected replica set name, or null. +- setVersion: absent or an integer. +- electionId: absent, null, or an ObjectId. +- logicalSessionTimeoutMinutes: absent, null, or an integer. +- minWireVersion: absent or an integer. +- maxWireVersion: absent or an integer. +- topologyVersion: absent, null, or a topologyVersion document. +- pool: (optional) A "pool" object. + +A "pool" object represents a correct connection pool for a given server. It has the following keys: + +- generation: This server's expected pool generation, like `0`. + +In monitoring tests, an "outcome" contains a list of SDAM events that should have been published by the client as a +result of processing hello or legacy hello responses in the current phase. Any SDAM events published by the client +during its construction (that is, prior to processing any of the responses) should be combined with the events published +during processing of hello or legacy hello responses of the first phase of the test. A test MAY explicitly verify events +published during client construction by providing an empty responses array for the first phase. + +## Use as unittests + +### Mocking + +Drivers should be able to test their server discovery and monitoring logic without any network I/O, by parsing hello (or +legacy hello) and application error from the test file and passing them into the driver code. Parts of the client and +monitoring code may need to be mocked or subclassed to achieve this. 
+[A reference implementation for PyMongo 3.10.1 is available here](https://github.com/mongodb/mongo-python-driver/blob/3.10.1/test/test_discovery_and_monitoring.py). + +### Initialization + +For each file, create a fresh client object initialized with the file's "uri". + +All files in the "single" directory include a connection string with one host and no "replicaSet" option. Set the +client's initial TopologyType to Single, however that is achieved using the client's API. (The spec says "The user MUST +be able to set the initial TopologyType to Single" without specifying how.) + +All files in the "sharded" directory include a connection string with multiple hosts and no "replicaSet" option. Set the +client's initial TopologyType to Unknown or Sharded, depending on the client's API. + +All files in the "rs" directory include a connection string with a "replicaSet" option. Set the client's initial +TopologyType to ReplicaSetNoPrimary. (For most clients, parsing a connection string with a "replicaSet" option +automatically sets the TopologyType to ReplicaSetNoPrimary.) Some of the files in "rs" are suffixed with "pre-6.0". +These files test the `updateRSFromPrimary` behavior prior to maxWireVersion 17; there should be no special handling +required for these tests. + +Set up a listener to collect SDAM events published by the client, including events published during client construction. + +### Test Phases + +For each phase in the file: + +1. Parse the "responses" array. Pass the responses to the driver code in order. If a response is the empty object + `{}`, simulate a network error. +2. Parse the "applicationErrors" array. For each element, simulate the given error as if it occurred while running an + application operation. Note that it is sufficient to construct a mock error and call the procedure which updates the + topology, e.g. `topology.handleApplicationError(address, generation, maxWireVersion, error)`.
+ +For non-monitoring tests, once all responses are processed, assert that the phase's "outcome" object is equivalent to +the driver's current TopologyDescription. + +For monitoring tests, once all responses are processed, assert that the events collected so far by the SDAM event +listener are equivalent to the events specified in the phase. + +Some fields such as "logicalSessionTimeoutMinutes", "compatible", and "topologyVersion" were added later and haven't +been added to all test files. If these fields are present, test that they are equivalent to the fields of the driver's +current TopologyDescription or ServerDescription. + +For monitoring tests, clear the list of events collected so far. + +Continue until all phases have been executed. + +## Integration Tests + +Integration tests are provided in the "unified" directory and are written in the +[Unified Test Format](../../unified-test-format/unified-test-format.md). + +## Prose Tests + +The following prose tests cannot be represented as spec tests and MUST be implemented. + +### Streaming protocol Tests + +Drivers that implement the streaming protocol (multi-threaded or asynchronous drivers) must implement the following +tests. Each test should be run against a standalone, replica set, and sharded cluster unless otherwise noted. + +Some of these cases should already be tested with the old protocol; in that case just verify the test cases succeed with +the new protocol. + +1. Configure the client with heartbeatFrequencyMS set to 500, overriding the default of 10000. Assert the client + processes hello and legacy hello replies more frequently (approximately every 500ms). + +### RTT Tests + +Run the following test(s) on MongoDB 4.4+. + +1. Test that RTT is continuously updated. + 1. Create a client with `heartbeatFrequencyMS=500`, `appName=streamingRttTest`, and subscribe to server events. + + 2. Run a find command to wait for the server to be discovered. + + 3. Sleep for 2 seconds. 
This must be long enough for multiple heartbeats to succeed. + + 4. Assert that each `ServerDescriptionChangedEvent` includes a non-zero RTT. + + 5. Configure the following failpoint to block hello or legacy hello commands for 500ms, which should add extra latency + to each RTT check: + + ``` + db.adminCommand({ + configureFailPoint: "failCommand", + mode: {times: 1000}, + data: { + failCommands: ["hello"], // or the legacy hello command + blockConnection: true, + blockTimeMS: 500, + appName: "streamingRttTest", + }, + }); + ``` + + 6. Wait for the server's RTT to exceed 250ms. Eventually the average RTT should also exceed 500ms, but we use 250ms to + speed up the test. Note that the + [Server Description Equality](/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#server-description-equality) + rule means that ServerDescriptionChangedEvents will not be published. This test may need to use a driver-specific + helper to obtain the latest RTT instead. If the RTT does not exceed 250ms after 10 seconds, consider the test + failed. + + 7. Disable the failpoint: + + ``` + db.adminCommand({ + configureFailPoint: "failCommand", + mode: "off", + }); + ``` + +### Heartbeat Tests + +1. Test that `ServerHeartbeatStartedEvent` is emitted before the monitoring socket is created + 1. Create a mock TCP server (example shown below, using the Node.js `net` module) that pushes a `client connected` event to a shared array when a + client connects and a `client hello received` event when the server receives the client hello and then closes the + connection: + + ``` + const { createServer } = require('net'); + + const events = []; + const server = createServer(clientSocket => { + events.push('client connected'); + + clientSocket.on('data', () => { + events.push('client hello received'); + clientSocket.destroy(); + }); + }); + server.listen(9999); + ``` + + 2.
Create a client with `serverSelectionTimeoutMS: 500` and listen to `ServerHeartbeatStartedEvent` and + `ServerHeartbeatFailedEvent`, pushing the event name to the same shared array as the mock TCP server. + + 3. Attempt to connect the client to the previously created TCP server, catching the error when the client fails to connect. + + 4. Assert that the first four elements in the array are: + + ``` + ['serverHeartbeatStartedEvent', 'client connected', 'client hello received', 'serverHeartbeatFailedEvent'] + ``` diff --git a/source/server-discovery-and-monitoring/tests/README.rst b/source/server-discovery-and-monitoring/tests/README.rst deleted file mode 100644 index 95b9865b7f..0000000000 --- a/source/server-discovery-and-monitoring/tests/README.rst +++ /dev/null @@ -1,293 +0,0 @@ -===================================== -Server Discovery And Monitoring Tests -===================================== - -.. contents:: - ----- - -The YAML and JSON files in this directory tree are platform-independent tests -that drivers can use to prove their conformance to the -Server Discovery And Monitoring Spec. - -Additional prose tests, that cannot be represented as spec tests, are -described and MUST be implemented. - -Version -------- - -Files in the "specifications" repository have no version scheme. They are not -tied to a MongoDB server version. - -Format ------- - -Each YAML file has the following keys: - -- description: A textual description of the test. -- uri: A connection string. -- phases: An array of "phase" objects. - A phase of the test optionally sends inputs to the client, - then tests the client's resulting TopologyDescription. - -Each phase object has the following keys: - -- description: (optional) A textual description of this phase. -- responses: (optional) An array of "response" objects. If not provided, - the test runner should construct the client and perform assertions specified - in the outcome object without processing any responses.
-- applicationErrors: (optional) An array of "applicationError" objects. -- outcome: An "outcome" object representing the TopologyDescription. - -A response is a pair of values: - -- The source, for example "a:27017". - This is the address the client sent the "hello" or legacy hello command to. -- A hello or legacy hello response, for example ``{ok: 1, helloOk: true, isWritablePrimary: true}``. - If the response includes an electionId it is shown in extended JSON like - ``{"$oid": "000000000000000000000002"}``. - The empty response `{}` indicates a network error - when attempting to call "hello" or legacy hello. - -An "applicationError" object has the following keys: - -- address: The source address, for example "a:27017". -- generation: (optional) The error's generation number, for example ``1``. - When absent this value defaults to the pool's current generation number. -- maxWireVersion: The ``maxWireVersion`` of the connection the error occurs - on, for example ``9``. Added to support testing the behavior of "not writable primary" - errors on <4.2 and >=4.2 servers. -- when: A string describing when this mock error should occur. Supported - values are: - - - "beforeHandshakeCompletes": Simulate this mock error as if it occurred - during a new connection's handshake for an application operation. - - "afterHandshakeCompletes": Simulate this mock error as if it occurred - on an established connection for an application operation (i.e. after - the connection pool check out succeeds). - -- type: The type of error to mock. Supported values are: - - - "command": A command error. Always accompanied with a "response". - - "network": A non-timeout network error. - - "timeout": A network timeout error. - -- response: (optional) A command error response, for example - ``{ok: 0, errmsg: "not primary"}``. Present if and only if ``type`` is - "command". Note the server only returns "not primary" if the "hello" command - has been run on this connection. 
Otherwise the legacy error message is returned. - -In non-monitoring tests, an "outcome" represents the correct -TopologyDescription that results from processing the responses in the phases -so far. It has the following keys: - -- topologyType: A string like "ReplicaSetNoPrimary". -- setName: A string with the expected replica set name, or null. -- servers: An object whose keys are addresses like "a:27017", and whose values - are "server" objects. -- logicalSessionTimeoutMinutes: null or an integer. -- maxSetVersion: absent or an integer. -- maxElectionId: absent or a BSON ObjectId. -- compatible: absent or a bool. - -A "server" object represents a correct ServerDescription within the client's -current TopologyDescription. It has the following keys: - -- type: A ServerType name, like "RSSecondary". See `ServerType <../server-discovery-and-monitoring.rst#servertype>`_ for details pertaining to async and multi-threaded drivers. -- setName: A string with the expected replica set name, or null. -- setVersion: absent or an integer. -- electionId: absent, null, or an ObjectId. -- logicalSessionTimeoutMinutes: absent, null, or an integer. -- minWireVersion: absent or an integer. -- maxWireVersion: absent or an integer. -- topologyVersion: absent, null, or a topologyVersion document. -- pool: (optional) A "pool" object. - -A "pool" object represents a correct connection pool for a given server. -It has the following keys: - -- generation: This server's expected pool generation, like ``0``. - -In monitoring tests, an "outcome" contains a list of SDAM events that should -have been published by the client as a result of processing hello or legacy hello -responses in the current phase. Any SDAM events published by the client during its -construction (that is, prior to processing any of the responses) should be -combined with the events published during processing of hello or legacy hello -responses of the first phase of the test. 
A test MAY explicitly verify events -published during client construction by providing an empty responses array for the -first phase. - - -Use as unittests ----------------- - -Mocking -~~~~~~~ - -Drivers should be able to test their server discovery and monitoring logic without -any network I/O, by parsing hello (or legacy hello) and application error from the -test file and passing them into the driver code. Parts of the client and -monitoring code may need to be mocked or subclassed to achieve this. -`A reference implementation for PyMongo 3.10.1 is available here -`_. - -Initialization -~~~~~~~~~~~~~~ - -For each file, create a fresh client object initialized with the file's "uri". - -All files in the "single" directory include a connection string with one host -and no "replicaSet" option. -Set the client's initial TopologyType to Single, however that is achieved using the client's API. -(The spec says "The user MUST be able to set the initial TopologyType to Single" -without specifying how.) - -All files in the "sharded" directory include a connection string with multiple hosts -and no "replicaSet" option. -Set the client's initial TopologyType to Unknown or Sharded, depending on the client's API. - -All files in the "rs" directory include a connection string with a "replicaSet" option. -Set the client's initial TopologyType to ReplicaSetNoPrimary. -(For most clients, parsing a connection string with a "replicaSet" option -automatically sets the TopologyType to ReplicaSetNoPrimary.) -Some of the files in "rs" are post-fixed with "pre-6.0". These files test the ``updateRSFromPrimary`` behavior -prior to maxWireVersion 17, there should be no special handling required for these tests. - -Set up a listener to collect SDAM events published by the client, including -events published during client construction. - -Test Phases -~~~~~~~~~~~ - -For each phase in the file: - -#. Parse the "responses" array. Pass in the responses in order to the driver - code. 
If a response is the empty object ``{}``, simulate a network error. - -#. Parse the "applicationErrors" array. For each element, simulate the given - error as if it occurred while running an application operation. Note that - it is sufficient to construct a mock error and call the procedure which - updates the topology, e.g. - ``topology.handleApplicationError(address, generation, maxWireVersion, error)``. - -For non-monitoring tests, -once all responses are processed, assert that the phase's "outcome" object -is equivalent to the driver's current TopologyDescription. - -For monitoring tests, once all responses are processed, assert that the -events collected so far by the SDAM event listener are equivalent to the -events specified in the phase. - -Some fields such as "logicalSessionTimeoutMinutes", "compatible", and -"topologyVersion" were added later and haven't been added to all test files. -If these fields are present, test that they are equivalent to the fields of -the driver's current TopologyDescription or ServerDescription. - -For monitoring tests, clear the list of events collected so far. - -Continue until all phases have been executed. - -Integration Tests ------------------ - -Integration tests are provided in the "unified" directory and are -written in the `Unified Test Format -<../../unified-test-format/unified-test-format.md>`_. - -Prose Tests ------------ - -The following prose tests cannot be represented as spec tests and MUST be -implemented. - -Streaming protocol Tests -~~~~~~~~~~~~~~~~~~~~~~~~ - -Drivers that implement the streaming protocol (multi-threaded or -asynchronous drivers) must implement the following tests. Each test should be -run against a standalone, replica set, and sharded cluster unless otherwise -noted. - -Some of these cases should already be tested with the old protocol; in -that case just verify the test cases succeed with the new protocol. - -1. 
Configure the client with heartbeatFrequencyMS set to 500, - overriding the default of 10000. Assert the client processes - hello and legacy hello replies more frequently (approximately every 500ms). - -RTT Tests -~~~~~~~~~ - -Run the following test(s) on MongoDB 4.4+. - -1. Test that RTT is continuously updated. - - #. Create a client with ``heartbeatFrequencyMS=500``, - ``appName=streamingRttTest``, and subscribe to server events. - - #. Run a find command to wait for the server to be discovered. - - #. Sleep for 2 seconds. This must be long enough for multiple heartbeats - to succeed. - - #. Assert that each ``ServerDescriptionChangedEvent`` includes a non-zero - RTT. - - #. Configure the following failpoint to block hello or legacy hello commands - for 250ms which should add extra latency to each RTT check:: - - db.adminCommand({ - configureFailPoint: "failCommand", - mode: {times: 1000}, - data: { - failCommands: ["hello"], // or the legacy hello command - blockConnection: true, - blockTimeMS: 500, - appName: "streamingRttTest", - }, - }); - - #. Wait for the server's RTT to exceed 250ms. Eventually the average RTT - should also exceed 500ms but we use 250ms to speed up the test. Note - that the `Server Description Equality`_ rule means that - ServerDescriptionChangedEvents will not be published. This test may - need to use a driver specific helper to obtain the latest RTT instead. - If the RTT does not exceed 250ms after 10 seconds, consider the test - failed. - - #. Disable the failpoint:: - - db.adminCommand({ - configureFailPoint: "failCommand", - mode: "off", - }); - -Heartbeat Tests -~~~~~~~~~~~~~~~ - -1. Test that ``ServerHeartbeatStartedEvent`` is emitted before the monitoring socket was created - - #. 
Create a mock TCP server (example shown below) that pushes a ``client connected`` event to a shared array when a client connects and a ``client hello received`` event when the server receives the client hello and then closes the connection:: - - let events = []; - server = createServer(clientSocket => { - events.push('client connected'); - - clientSocket.on('data', () => { - events.push('client hello received'); - clientSocket.destroy(); - }); - }); - server.listen(9999); - - #. Create a client with ``serverSelectionTimeoutMS: 500`` and listen to ``ServerHeartbeatStartedEvent`` and ``ServerHeartbeatFailedEvent``, pushing the event name to the same shared array as the mock TCP server - - #. Attempt to connect client to previously created TCP server, catching the error when the client fails to connect - - #. Assert that the first four elements in the array are: :: - - ['serverHeartbeatStartedEvent', 'client connected', 'client hello received', 'serverHeartbeatFailedEvent'] - -.. Section for links. - -.. _Server Description Equality: /source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#server-description-equality diff --git a/source/server-discovery-and-monitoring/tests/monitoring/README.md b/source/server-discovery-and-monitoring/tests/monitoring/README.md new file mode 100644 index 0000000000..fd463c48db --- /dev/null +++ b/source/server-discovery-and-monitoring/tests/monitoring/README.md @@ -0,0 +1,9 @@ +# SDAM Monitoring Tests + +The YAML and JSON files in this directory tree are platform-independent tests that drivers can use to prove their +conformance to the SDAM Monitoring spec. + +## Format + +The format of the tests follows the standard SDAM test and should be able to leverage the existing test runner in each +language for the SDAM tests. 
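The monitoring tests above reuse the standard SDAM runner. The phase procedure in the new `tests/README.md` works as follows: feed the responses in order, treat `{}` as a network error, apply the application errors, then check the outcome. It can be sketched as a small loop. This is a minimal JavaScript sketch under stated assumptions. `topology` is a hypothetical test double, and of its methods only the argument order of `handleApplicationError(address, generation, maxWireVersion, error)` comes from the README. `handleHelloResponse`, `handleNetworkError`, and `describe` are invented names, and each response is assumed to be parsed as an `[address, reply]` pair.

```javascript
// Minimal sketch of the SDAM test-phase loop. `topology` is a hypothetical
// test double; only handleApplicationError's argument order is from the README.
function runPhases(topology, phases, assertOutcome) {
  for (const phase of phases) {
    // 1. Feed each [address, reply] pair to the driver code, in order.
    for (const [address, reply] of phase.responses ?? []) {
      if (Object.keys(reply).length === 0) {
        // The empty document {} stands for a network error on hello / legacy hello.
        topology.handleNetworkError(address);
      } else {
        topology.handleHelloResponse(address, reply);
      }
    }
    // 2. Simulate each application error as if an application operation hit it.
    for (const appError of phase.applicationErrors ?? []) {
      topology.handleApplicationError(
        appError.address,
        appError.generation, // undefined means "use the pool's current generation"
        appError.maxWireVersion,
        appError,
      );
    }
    // 3. Hand the resulting TopologyDescription to the caller's comparison.
    //    A monitoring-test runner would compare collected SDAM events instead (not shown).
    assertOutcome(topology.describe(), phase.outcome, phase.description);
  }
}
```

Keeping the comparison in a caller-supplied `assertOutcome` callback lets the same loop drive both non-monitoring tests (compare TopologyDescriptions) and monitoring tests (compare, then clear, the collected events).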
diff --git a/source/server-discovery-and-monitoring/tests/monitoring/README.rst b/source/server-discovery-and-monitoring/tests/monitoring/README.rst deleted file mode 100644 index 7c741544ec..0000000000 --- a/source/server-discovery-and-monitoring/tests/monitoring/README.rst +++ /dev/null @@ -1,12 +0,0 @@ -===================== -SDAM Monitoring Tests -===================== - -The YAML and JSON files in this directory tree are platform-independent tests -that drivers can use to prove their conformance to the SDAM Monitoring spec. - -Format ------- - -The format of the tests follows the standard SDAM test and should be able to leverage -the existing test runner in each language for the SDAM tests. diff --git a/source/unified-test-format/unified-test-format.md b/source/unified-test-format/unified-test-format.md index 3f5def1348..4086c3b92f 100644 --- a/source/unified-test-format/unified-test-format.md +++ b/source/unified-test-format/unified-test-format.md @@ -957,7 +957,7 @@ The structure of each object is as follows: - `eventType`: Optional string. Specifies the type of the monitor which captured the events. Valid values are `command` for [Command Monitoring](../command-logging-and-monitoring/command-logging-and-monitoring.rst#events-api) events, `cmap` for [CMAP](../connection-monitoring-and-pooling/connection-monitoring-and-pooling.md#events) events, and `sdam` - for [SDAM](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#events) + for [SDAM](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#events) events. Defaults to `command` if omitted. - `events`: Required array of [expectedEvent](#expectedevent) objects. List of events, which are expected to be observed (in this order) on the corresponding client while executing [operations](#test_operations). 
If the array is empty, the @@ -1115,7 +1115,7 @@ The structure of this object is as follows: - `serverDescriptionChangedEvent`: Optional object. Assertions for one or more - [ServerDescriptionChangedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#events) + [ServerDescriptionChangedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#events) fields. The structure of this object is as follows: @@ -1136,7 +1136,7 @@ The structure of this object is as follows: - `serverHeartbeatStartedEvent`: Optional object. Assertions for one or more - [ServerHeartbeatStartedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#events) + [ServerHeartbeatStartedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#events) fields. The structure of this object is as follows: @@ -1146,7 +1146,7 @@ The structure of this object is as follows: - `serverHeartbeatSucceededEvent`: Optional object. Assertions for one or more - [ServerHeartbeatSucceededEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#events) + [ServerHeartbeatSucceededEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#events) fields. The structure of this object is as follows: @@ -1156,7 +1156,7 @@ The structure of this object is as follows: - `serverHeartbeatFailedEvent`: Optional object. Assertions for one or more - [ServerHeartbeatFailedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#events) + [ServerHeartbeatFailedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#events) fields. 
The structure of this object is as follows: @@ -1166,7 +1166,7 @@ The structure of this object is as follows: - `topologyDescriptionChangedEvent`: Optional object. Assertions for one - [TopologyDescriptionChangedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.rst#events) + [TopologyDescriptionChangedEvent](../server-discovery-and-monitoring/server-discovery-and-monitoring-logging-and-monitoring.md#events) object. The structure of this object is as follows: