diff --git a/source/auth/auth.md b/source/auth/auth.md index 65329d0255..8e81de18fa 100644 --- a/source/auth/auth.md +++ b/source/auth/auth.md @@ -24,24 +24,30 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Definitions -- Credential\ - The pieces of information used to establish the authenticity of a user. This is composed of an identity - and some form of evidence such as a password or a certificate. +- Credential + + The pieces of information used to establish the authenticity of a user. This is composed of an identity and some form + of evidence such as a password or a certificate. + +- FQDN -- FQDN\ Fully Qualified Domain Name -- Mechanism\ +- Mechanism + A SASL implementation of a particular type of credential negotiation. -- Source\ - The authority used to establish credentials and/or privileges in reference to a mongodb server. In practice, - it is the database to which sasl authentication commands are sent. +- Source + + The authority used to establish credentials and/or privileges in reference to a mongodb server. In practice, it is the + database to which sasl authentication commands are sent. + +- Realm -- Realm\ The authority used to establish credentials and/or privileges in reference to GSSAPI. -- SASL\ +- SASL + Simple Authentication and Security Layer - [RFC 4422](http://www.ietf.org/rfc/rfc4422.txt) ### Client Implementation @@ -299,19 +305,24 @@ RESP = {ok: 1} #### [MongoCredential](#mongocredential) Properties -- username\ +- username + MUST be specified and non-zero length. -- source\ +- source + MUST be specified. Defaults to the database name if supplied on the connection string or `admin`. -- password\ +- password + MUST be specified. -- mechanism\ +- mechanism + MUST be "MONGODB-CR" -- mechanism_properties\ +- mechanism_properties + MUST NOT be specified. ### MONGODB-X509 @@ -358,19 +369,24 @@ RESP = {"dbname" : "$external", "user" : "C=IS,ST=Reykjavik,L=Reykjavik,O=MongoD #### [MongoCredential](#mongocredential) Properties -- username\ +- username + SHOULD NOT be provided for MongoDB 3.4+ MUST be specified and non-zero length for MongoDB prior to 3.4 -- source\ +- source + MUST be "$external". Defaults to `$external`. -- password\ +- password + MUST NOT be specified. -- mechanism\ +- mechanism + MUST be "MONGODB-X509" -- mechanism_properties\ +- mechanism_properties + MUST NOT be specified. TODO: Errors @@ -412,7 +428,8 @@ scratch. ### GSSAPI -- Since:\ +- Since: + 2.4 Enterprise 2.6 Enterprise on Windows @@ -422,41 +439,49 @@ proprietary implementation called SSPI which is compatible with both Windows and [MongoCredential](#mongocredential) properties: -- username\ +- username + MUST be specified and non-zero length. -- source\ +- source + MUST be "$external". Defaults to `$external`. -- password\ - MAY be specified. If omitted, drivers MUST NOT pass the username without password to SSPI on Windows and - instead use the default credentials. +- password + + MAY be specified. If omitted, drivers MUST NOT pass the username without password to SSPI on Windows and instead use + the default credentials. + +- mechanism -- mechanism\ MUST be "GSSAPI" - mechanism_properties - - SERVICE_NAME\ + - SERVICE_NAME + Drivers MUST allow the user to specify a different service name. The default is "mongodb". - - CANONICALIZE_HOST_NAME\ - Drivers MAY allow the user to request canonicalization of the hostname. This might be - required when the hosts report different hostnames than what is used in the kerberos database. 
The value is a string - of either "none", "forward", or "forwardAndReverse". "none" is the default and performs no canonicalization. - "forward" performs a forward DNS lookup to canonicalize the hostname. "forwardAndReverse" performs a forward DNS - lookup and then a reverse lookup on that value to canonicalize the hostname. The driver MUST fallback to the - provided host if any lookup errors or returns no results. Drivers MAY decide to also keep the legacy boolean values - where `true` equals the "forwardAndReverse" behaviour and `false` equals "none". + - CANONICALIZE_HOST_NAME - - SERVICE_REALM\ - Drivers MAY allow the user to specify a different realm for the service. This might be necessary to - support cross-realm authentication where the user exists in one realm and the service in another. + Drivers MAY allow the user to request canonicalization of the hostname. This might be required when the hosts report + different hostnames than what is used in the kerberos database. The value is a string of either "none", "forward", + or "forwardAndReverse". "none" is the default and performs no canonicalization. "forward" performs a forward DNS + lookup to canonicalize the hostname. "forwardAndReverse" performs a forward DNS lookup and then a reverse lookup on + that value to canonicalize the hostname. The driver MUST fallback to the provided host if any lookup errors or + returns no results. Drivers MAY decide to also keep the legacy boolean values where `true` equals the + "forwardAndReverse" behaviour and `false` equals "none". - - SERVICE_HOST\ - Drivers MAY allow the user to specify a different host for the service. This is stored in the service - principal name instead of the standard host name. This is generally used for cases where the initial role is being - created from localhost but the actual service host would differ. + - SERVICE_REALM + + Drivers MAY allow the user to specify a different realm for the service. This might be necessary to support + cross-realm authentication where the user exists in one realm and the service in another. + + - SERVICE_HOST + + Drivers MAY allow the user to specify a different host for the service. This is stored in the service principal name + instead of the standard host name. This is generally used for cases where the initial role is being created from + localhost but the actual service host would differ. #### Hostname Canonicalization @@ -558,19 +583,24 @@ MongoDB supports either of these forms. #### [MongoCredential](#mongocredential) Properties -- username\ +- username + MUST be specified and non-zero length. -- source\ +- source + MUST be specified. Defaults to the database name if supplied on the connection string or `$external`. -- password\ +- password + MUST be specified. -- mechanism\ +- mechanism + MUST be "PLAIN" -- mechanism_properties\ +- mechanism_properties + MUST NOT be specified. ### SCRAM-SHA-1 @@ -639,19 +669,24 @@ RESP = {conversationId: 1, payload: BinData(0,"dj1VTVdlSTI1SkQxeU5ZWlJNcFo0Vkh2a #### [MongoCredential](#mongocredential) Properties -- username\ +- username + MUST be specified and non-zero length. -- source\ +- source + MUST be specified. Defaults to the database name if supplied on the connection string or `admin`. -- password\ +- password + MUST be specified. -- mechanism\ +- mechanism + MUST be "SCRAM-SHA-1" -- mechanism_properties\ +- mechanism_properties + MUST NOT be specified. 
### SCRAM-SHA-256 @@ -700,19 +735,24 @@ RESP = {conversationId: 1, payload: BinData(0, "dj02cnJpVFJCaTIzV3BSUi93dHVwK21N #### [MongoCredential](#mongocredential) Properties -- username\ +- username + MUST be specified and non-zero length. -- source\ +- source + MUST be specified. Defaults to the database name if supplied on the connection string or `admin`. -- password\ +- password + MUST be specified. -- mechanism\ +- mechanism + MUST be "SCRAM-SHA-256" -- mechanism_properties\ +- mechanism_properties + MUST NOT be specified. ### MONGODB-AWS @@ -897,23 +937,27 @@ Examples are provided below. #### [MongoCredential](#mongocredential) Properties -- username\ +- username + MAY be specified. The non-sensitive AWS access key. -- source\ +- source + MUST be "$external". Defaults to `$external`. -- password\ +- password + MAY be specified. The sensitive AWS secret key. -- mechanism\ +- mechanism + MUST be "MONGODB-AWS" - mechanism_properties - - AWS_SESSION_TOKEN\ - Drivers MUST allow the user to specify an AWS session token for authentication with temporary - credentials. + - AWS_SESSION_TOKEN + + Drivers MUST allow the user to specify an AWS session token for authentication with temporary credentials. #### Obtaining Credentials @@ -1201,51 +1245,60 @@ in the MONGODB-OIDC specification, including sections or blocks that specificall #### [MongoCredential](#mongocredential) Properties -- username\ +- username + MAY be specified. Its meaning varies depending on the OIDC provider integration used. -- source\ +- source + MUST be "$external". Defaults to `$external`. -- password\ +- password + MUST NOT be specified. -- mechanism\ +- mechanism + MUST be "MONGODB-OIDC" - mechanism_properties - - ENVIRONMENT\ - Drivers MUST allow the user to specify the name of a built-in OIDC application environment integration - to use to obtain credentials. If provided, the value MUST be one of `["test", "azure", "gcp"]`. If both - `ENVIRONMENT` and an [OIDC Callback](#oidc-callback) or [OIDC Human Callback](#oidc-human-callback) are provided for - the same `MongoClient`, the driver MUST raise an error. - - - TOKEN_RESOURCE\ - The URI of the target resource. If `TOKEN_RESOURCE` is provided and `ENVIRONMENT` is not one of - `["azure", "gcp"]` or `TOKEN_RESOURCE` is not provided and `ENVIRONMENT` is one of `["azure", "gcp"]`, the driver - MUST raise an error. Note: because the `TOKEN_RESOURCE` is often itself a URL, drivers MUST document that a - `TOKEN_RESOURCE` with a comma `,` must be given as a `MongoClient` configuration and not as part of the connection - string, and that the `TOKEN_RESOURCE` value can contain a colon `:` character. - - - OIDC_CALLBACK\ - An [OIDC Callback](#oidc-callback) that returns OIDC credentials. Drivers MAY allow the user to - specify an [OIDC Callback](#oidc-callback) using a `MongoClient` configuration instead of a mechanism property, - depending on what is idiomatic for the driver. Drivers MUST NOT support both the `OIDC_CALLBACK` mechanism property - and a `MongoClient` configuration. - - - OIDC_HUMAN_CALLBACK\ - An [OIDC Human Callback](#oidc-human-callback) that returns OIDC credentials. Drivers MAY allow - the user to specify a [OIDC Human Callback](#oidc-human-callback) using a `MongoClient` configuration instead of a - mechanism property, depending on what is idiomatic for the driver. Drivers MUST NOT support both the - `OIDC_HUMAN_CALLBACK` mechanism property and a `MongoClient` configuration. 
Drivers MUST return an error if both an - [OIDC Callback](#oidc-callback) and `OIDC Human Callback` are provided for the same `MongoClient`. This property is - only required for drivers that support the [Human Authentication Flow](#human-authentication-flow). - - - ALLOWED_HOSTS\ - The list of allowed hostnames or ip-addresses (ignoring ports) for MongoDB connections. The hostnames - may include a leading "\*." wildcard, which allows for matching (potentially nested) subdomains. `ALLOWED_HOSTS` is - a security feature and MUST default to + - ENVIRONMENT + + Drivers MUST allow the user to specify the name of a built-in OIDC application environment integration to use to + obtain credentials. If provided, the value MUST be one of `["test", "azure", "gcp"]`. If both `ENVIRONMENT` and an + [OIDC Callback](#oidc-callback) or [OIDC Human Callback](#oidc-human-callback) are provided for the same + `MongoClient`, the driver MUST raise an error. + + - TOKEN_RESOURCE + + The URI of the target resource. If `TOKEN_RESOURCE` is provided and `ENVIRONMENT` is not one of `["azure", "gcp"]` + or `TOKEN_RESOURCE` is not provided and `ENVIRONMENT` is one of `["azure", "gcp"]`, the driver MUST raise an error. + Note: because the `TOKEN_RESOURCE` is often itself a URL, drivers MUST document that a `TOKEN_RESOURCE` with a comma + `,` must be given as a `MongoClient` configuration and not as part of the connection string, and that the + `TOKEN_RESOURCE` value can contain a colon `:` character. + + - OIDC_CALLBACK + + An [OIDC Callback](#oidc-callback) that returns OIDC credentials. Drivers MAY allow the user to specify an + [OIDC Callback](#oidc-callback) using a `MongoClient` configuration instead of a mechanism property, depending on + what is idiomatic for the driver. Drivers MUST NOT support both the `OIDC_CALLBACK` mechanism property and a + `MongoClient` configuration. + + - OIDC_HUMAN_CALLBACK + + An [OIDC Human Callback](#oidc-human-callback) that returns OIDC credentials. Drivers MAY allow the user to specify + a [OIDC Human Callback](#oidc-human-callback) using a `MongoClient` configuration instead of a mechanism property, + depending on what is idiomatic for the driver. Drivers MUST NOT support both the `OIDC_HUMAN_CALLBACK` mechanism + property and a `MongoClient` configuration. Drivers MUST return an error if both an [OIDC Callback](#oidc-callback) + and `OIDC Human Callback` are provided for the same `MongoClient`. This property is only required for drivers that + support the [Human Authentication Flow](#human-authentication-flow). + + - ALLOWED_HOSTS + + The list of allowed hostnames or ip-addresses (ignoring ports) for MongoDB connections. The hostnames may include a + leading "\*." wildcard, which allows for matching (potentially nested) subdomains. `ALLOWED_HOSTS` is a security + feature and MUST default to `["*.mongodb.net", "*.mongodb-qa.net", "*.mongodb-dev.net", "*.mongodbgov.net", "localhost", "127.0.0.1", "::1"]`. When MONGODB-OIDC authentication using a [OIDC Human Callback](#oidc-human-callback) is attempted against a hostname that does not match any of list of allowed hosts, the driver MUST raise a client-side error without invoking any @@ -1808,13 +1861,15 @@ def reauth(connection): #### Auth Related Options -- authMechanism\ +- authMechanism + MONGODB-CR, MONGODB-X509, GSSAPI, PLAIN, SCRAM-SHA-1, SCRAM-SHA-256, MONGODB-AWS -Sets the Mechanism property on the MongoCredential. 
When not set, the default will be one of SCRAM-SHA-256, SCRAM-SHA-1 -or MONGODB-CR, following the auth spec default mechanism rules. + Sets the Mechanism property on the MongoCredential. When not set, the default will be one of SCRAM-SHA-256, + SCRAM-SHA-1 or MONGODB-CR, following the auth spec default mechanism rules. + +- authSource -- authSource\ Sets the Source property on the MongoCredential. For GSSAPI, MONGODB-X509 and MONGODB-AWS authMechanisms the authSource defaults to `$external`. For PLAIN the authSource @@ -1822,14 +1877,15 @@ defaults to the database name if supplied on the connection string or `$external SCRAM-SHA-256 authMechanisms, the authSource defaults to the database name if supplied on the connection string or `admin`. -- authMechanismProperties=PROPERTY_NAME:PROPERTY_VALUE,PROPERTY_NAME2:PROPERTY_VALUE2\ - A generic method to set mechanism - properties in the connection string. +- authMechanismProperties=PROPERTY_NAME:PROPERTY_VALUE,PROPERTY_NAME2:PROPERTY_VALUE2 + + A generic method to set mechanism properties in the connection string. For example, to set REALM and CANONICALIZE_HOST_NAME, the option would be `authMechanismProperties=CANONICALIZE_HOST_NAME:forward,SERVICE_REALM:AWESOME`. -- gssapiServiceName (deprecated)\ +- gssapiServiceName (deprecated) + An alias for `authMechanismProperties=SERVICE_NAME:mongodb`. #### Errors @@ -1970,22 +2026,24 @@ The Java and .NET drivers currently uses eager authentication and abide by this ## Q & A Q: According to [Authentication Handshake](#authentication-handshake), we are calling `hello` or legacy hello for every -socket. Isn't this a lot?\ -Drivers should be pooling connections and, as such, new sockets getting opened should be -relatively infrequent. It's simply part of the protocol for setting up a socket to be used. +socket. Isn't this a lot? + +Drivers should be pooling connections and, as such, new sockets getting opened should be relatively infrequent. It's +simply part of the protocol for setting up a socket to be used. -Q: Where is information related to user management?\ -Not here currently. Should it be? This is about authentication, not -user management. Perhaps a new spec is necessary. +Q: Where is information related to user management? + +Not here currently. Should it be? This is about authentication, not user management. Perhaps a new spec is necessary. Q: It's possible to continue using authenticated sockets even if new sockets fail authentication. Why can't we do that -so that applications continue to work.\ -Yes, that's technically true. The issue with doing that is for drivers using -connection pooling. An application would function normally until an operation needed an additional connection(s) during -a spike. Each new connection would fail to authenticate causing intermittent failures that would be very difficult to -understand for a user. +so that applications continue to work. + +Yes, that's technically true. The issue with doing that is for drivers using connection pooling. An application would +function normally until an operation needed an additional connection(s) during a spike. Each new connection would fail +to authenticate causing intermittent failures that would be very difficult to understand for a user. + +Q: Should a driver support multiple credentials? -Q: Should a driver support multiple credentials?\ No. Historically, the MongoDB server and drivers have supported multiple credentials, one per authSource, on a single @@ -2011,16 +2069,16 @@ feature that builds on sessions (e.g. 
retryable writes). Drivers should therefore guide application creators in the right direction by supporting the association of at most one credential with a MongoClient instance. -Q: Should a driver support lazy authentication?\ -No, for the same reasons as given in the previous section, as lazy -authentication is another mechanism for allowing multiple credentials to be associated with a single MongoClient -instance. +Q: Should a driver support lazy authentication? + +No, for the same reasons as given in the previous section, as lazy authentication is another mechanism for allowing +multiple credentials to be associated with a single MongoClient instance. -Q: Why does SCRAM sometimes SASLprep and sometimes not?\ -When MongoDB implemented SCRAM-SHA-1, it required drivers to -*NOT* SASLprep usernames and passwords. The primary reason for this was to allow a smooth upgrade path from MongoDB-CR -using existing usernames and passwords. Also, because MongoDB's SCRAM-SHA-1 passwords are hex characters of a digest, -SASLprep of passwords was irrelevant. +Q: Why does SCRAM sometimes SASLprep and sometimes not? + +When MongoDB implemented SCRAM-SHA-1, it required drivers to *NOT* SASLprep usernames and passwords. The primary reason +for this was to allow a smooth upgrade path from MongoDB-CR using existing usernames and passwords. Also, because +MongoDB's SCRAM-SHA-1 passwords are hex characters of a digest, SASLprep of passwords was irrelevant. With the introduction of SCRAM-SHA-256, MongoDB requires users to explicitly create new SCRAM-SHA-256 credentials distinct from those used for MONGODB-CR and SCRAM-SHA-1. This means SCRAM-SHA-256 passwords are not digested and any @@ -2034,9 +2092,10 @@ SASLprep username. After considering various options to address or workaround th best user experience on upgrade and lowest technical risk of implementation is to require drivers to continue to not SASLprep usernames in SCRAM-SHA-256. -Q: Should drivers support accessing Amazon EC2 instance metadata in Amazon ECS?\ -No. While it's possible to allow access -to EC2 instance metadata in ECS, for security reasons, Amazon states it's best practice to avoid this. (See +Q: Should drivers support accessing Amazon EC2 instance metadata in Amazon ECS? + +No. While it's possible to allow access to EC2 instance metadata in ECS, for security reasons, Amazon states it's best +practice to avoid this. (See [accessing EC2 metadata in ECS](https://aws.amazon.com/premiumsupport/knowledge-center/ecs-container-ec2-metadata/) and [IAM Roles for Tasks](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html)) @@ -2067,8 +2126,7 @@ to EC2 instance metadata in ECS, for security reasons, Amazon states it's best p - 2024-01-31: Migrated from reStructuredText to Markdown. -- 2024-01-17: Added MONGODB-OIDC machine auth flow spec and combine with human\ - auth flow specs. +- 2024-01-17: Added MONGODB-OIDC machine auth flow spec and combine with human auth flow specs. - 2023-04-28: Added MONGODB-OIDC auth mechanism @@ -2096,13 +2154,11 @@ to EC2 instance metadata in ECS, for security reasons, Amazon states it's best p - 2020-02-04: Support shorter SCRAM conversation starting in version 4.4 of the server. -- 2020-01-31: Clarify that drivers must raise an error when a connection string\ - has an empty value for authSource. +- 2020-01-31: Clarify that drivers must raise an error when a connection string has an empty value for authSource. - 2020-01-23: Clarify when authentication will occur. 
-- 2020-01-22: Clarify that authSource in URI is not treated as a user configuring\ - auth credentials. +- 2020-01-22: Clarify that authSource in URI is not treated as a user configuring auth credentials. - 2019-12-05: Added MONGODB-IAM auth mechanism @@ -2112,8 +2168,7 @@ to EC2 instance metadata in ECS, for security reasons, Amazon states it's best p - Clarify that database name in URI is not treated as a user configuring auth credentials. -- 2018-08-08: Unknown users don't cause handshake errors. This was changed before\ - server 4.0 GA in SERVER-34421, so the +- 2018-08-08: Unknown users don't cause handshake errors. This was changed before server 4.0 GA in SERVER-34421, so the auth spec no longer refers to such a possibility. - 2018-04-17: Clarify authSource defaults diff --git a/source/bson-corpus/bson-corpus.md b/source/bson-corpus/bson-corpus.md index 7969b138e2..dcb9f9c520 100644 --- a/source/bson-corpus/bson-corpus.md +++ b/source/bson-corpus/bson-corpus.md @@ -339,29 +339,24 @@ development. - 2024-01-22: Migrated from reStructuredText to Markdown. -- 2023-06-14: Add decimal128 Extended JSON parse tests for clamped zeros with\ - very large exponents. +- 2023-06-14: Add decimal128 Extended JSON parse tests for clamped zeros with very large exponents. - 2022-10-05: Remove spec front matter and reformat changelog. - 2021-09-09: Clarify error expectation rules for `parseErrors`. -- 2021-09-02: Add spec and prose tests for prohibiting null bytes in\ - null-terminated strings within document field - names and regular expressions. Clarify type-specific rules for `parseErrors`. +- 2021-09-02: Add spec and prose tests for prohibiting null bytes in null-terminated strings within document field names + and regular expressions. Clarify type-specific rules for `parseErrors`. -- 2017-05-26: Revised to be consistent with Extended JSON spec 2.0: valid case\ - fields have changed, as have the test +- 2017-05-26: Revised to be consistent with Extended JSON spec 2.0: valid case fields have changed, as have the test assertions. -- 2017-01-23: Added `multi-type.json` to test encoding and decoding all BSON\ - types within the same document. Amended - all extended JSON strings to adhere to the Extended JSON Specification. Modified the "Use of extjson" section of this +- 2017-01-23: Added `multi-type.json` to test encoding and decoding all BSON types within the same document. Amended all + extended JSON strings to adhere to the Extended JSON Specification. Modified the "Use of extjson" section of this specification to note that canonical extended JSON is now used. - 2016-11-14: Removed "invalid flags" BSON Regexp case. -- 2016-10-25: Added a "non-alphabetized flags" case to the BSON Regexp corpus\ - file; decoders must be able to read +- 2016-10-25: Added a "non-alphabetized flags" case to the BSON Regexp corpus file; decoders must be able to read non-alphabetized flags, but encoders must emit alphabetized flags. Added an "invalid flags" case to the BSON Regexp corpus file. 
diff --git a/source/bson-decimal128/decimal128.md b/source/bson-decimal128/decimal128.md index 488896f16e..fbb301df19 100644 --- a/source/bson-decimal128/decimal128.md +++ b/source/bson-decimal128/decimal128.md @@ -32,25 +32,27 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ## Terminology -**IEEE 754-2008 128-bit decimal floating point (Decimal128)**:\ -The Decimal128 specification supports 34 decimal digits -of precision, a max value of approximately `10^6145`, and min value of approximately `-10^6145`. This is the new -`BSON Decimal128` type (`"\x13"`). - -**Clamping**:\ -Clamping happens when a value's exponent is too large for the destination format. This works by adding -zeros to the coefficient to reduce the exponent to the largest usable value. An overflow occurs if the number of digits -required is more than allowed in the destination format. - -**Binary Integer Decimal (BID)**:\ -MongoDB uses this binary encoding for the coefficient as specified in `IEEE 754-2008` -section 3.5.2 using method 2 "binary encoding" rather than method 1 "decimal encoding". The byte order is little-endian, -like the rest of the BSON types. - -**Value Object**:\ -An immutable container type representing a value (e.g. Decimal128). This Value Object MAY provide -accessors that retrieve the abstracted value as a different type (e.g. casting it). -`double x = valueObject.getAsDouble();` +**IEEE 754-2008 128-bit decimal floating point (Decimal128)** + +The Decimal128 specification supports 34 decimal digits of precision, a max value of approximately `10^6145`, and min +value of approximately `-10^6145`. This is the new `BSON Decimal128` type (`"\x13"`). + +**Clamping** + +Clamping happens when a value's exponent is too large for the destination format. This works by adding zeros to the +coefficient to reduce the exponent to the largest usable value. An overflow occurs if the number of digits required is +more than allowed in the destination format. + +**Binary Integer Decimal (BID)** + +MongoDB uses this binary encoding for the coefficient as specified in `IEEE 754-2008` section 3.5.2 using method 2 +"binary encoding" rather than method 1 "decimal encoding". The byte order is little-endian, like the rest of the BSON +types. + +**Value Object** + +An immutable container type representing a value (e.g. Decimal128). This Value Object MAY provide accessors that +retrieve the abstracted value as a different type (e.g. casting it). `double x = valueObject.getAsDouble();` ## Specification diff --git a/source/causal-consistency/causal-consistency.md b/source/causal-consistency/causal-consistency.md index 494151edfa..f80bca1632 100644 --- a/source/causal-consistency/causal-consistency.md +++ b/source/causal-consistency/causal-consistency.md @@ -20,52 +20,63 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**Causal consistency**\ -A property that guarantees that an application can read its own writes and that a later read -will never observe a version of the data that is older than an earlier read. +**Causal consistency** + +A property that guarantees that an application can read its own writes and that a later read will never observe a +version of the data that is older than an earlier read. + +**ClientSession** -**ClientSession**\ The driver object representing a client session and the operations that can be performed on it. -**Cluster time**\ -The current cluster time. 
The server reports its view of the current cluster time in the -`$clusterTime` field in responses from the server and the driver participates in distributing the current cluster time -to all nodes (called "gossipping the cluster time") by sending the highest `$clusterTime` it has seen so far in messages -it sends to mongos servers. The current cluster time is a logical time, but is digitally signed to prevent malicious -clients from propagating invalid cluster times. Cluster time is only used in replica sets and sharded clusters. +**Cluster time** + +The current cluster time. The server reports its view of the current cluster time in the `$clusterTime` field in +responses from the server and the driver participates in distributing the current cluster time to all nodes (called +"gossipping the cluster time") by sending the highest `$clusterTime` it has seen so far in messages it sends to mongos +servers. The current cluster time is a logical time, but is digitally signed to prevent malicious clients from +propagating invalid cluster times. Cluster time is only used in replica sets and sharded clusters. -**Logical time**\ -A time-like quantity that can be used to determine the order in which events occurred. Logical time is -represented as a BsonTimestamp. +**Logical time** + +A time-like quantity that can be used to determine the order in which events occurred. Logical time is represented as a +BsonTimestamp. + +**MongoClient** -**MongoClient**\ The root object of a driver's API. MAY be named differently in some drivers. -**MongoCollection**\ -The driver object representing a collection and the operations that can be performed on it. MAY be -named differently in some drivers. +**MongoCollection** + +The driver object representing a collection and the operations that can be performed on it. MAY be named differently in +some drivers. -**MongoDatabase**\ -The driver object representing a database and the operations that can be performed on it. MAY be -named differently in some drivers. +**MongoDatabase** -**Operation time**\ -The logical time at which an operation occurred. The server reports the operation time in the -response to all commands, including error responses. The operation time by definition is always less than or equal to -the cluster time. Operation times are tracked on a per `ClientSession` basis, so the `operationTime` of each -`ClientSession` corresponds to the time of the last operation performed in that particular `ClientSession`. +The driver object representing a database and the operations that can be performed on it. MAY be named differently in +some drivers. + +**Operation time** + +The logical time at which an operation occurred. The server reports the operation time in the response to all commands, +including error responses. The operation time by definition is always less than or equal to the cluster time. Operation +times are tracked on a per `ClientSession` basis, so the `operationTime` of each `ClientSession` corresponds to the time +of the last operation performed in that particular `ClientSession`. + +**ServerSession** -**ServerSession**\ The driver object representing a server session. -**Session**\ -A session is an abstract concept that represents a set of sequential operations executed by an application -that are related in some way. This specification defines how sessions are used to implement causal consistency. 
+**Session** -**Unacknowledged writes**\ -Unacknowledged writes are write operations that are sent to the server without waiting for a -reply acknowledging the write. See the "Unacknowledged Writes" section below for information on how unacknowledged -writes interact with causal consistency. +A session is an abstract concept that represents a set of sequential operations executed by an application that are +related in some way. This specification defines how sessions are used to implement causal consistency. + +**Unacknowledged writes** + +Unacknowledged writes are write operations that are sent to the server without waiting for a reply acknowledging the +write. See the "Unacknowledged Writes" section below for information on how unacknowledged writes interact with causal +consistency. ## Specification @@ -412,10 +423,8 @@ resolving many discussions of spec details. A final reference implementation mus - 2017-10-04: Added advanceOperationTime -- 2017-09-28: Remove remaining references to collections being associated with\ - sessions. Update spec to reflect that +- 2017-09-28: Remove remaining references to collections being associated with sessions. Update spec to reflect that replica sets use $clusterTime also now. -- 2017-09-13: Renamed "causally consistent reads" to "causal consistency". If no\ - value is supplied for +- 2017-09-13: Renamed "causally consistent reads" to "causal consistency". If no value is supplied for `causallyConsistent` assume true. diff --git a/source/change-streams/change-streams.md b/source/change-streams/change-streams.md index 32464cf776..a39fb18ea1 100644 --- a/source/change-streams/change-streams.md +++ b/source/change-streams/change-streams.md @@ -1024,23 +1024,19 @@ There should be no backwards compatibility concerns. - 2022-05-17: Add `wallTime` to `ChangeStreamDocument`. -- 2022-04-13: Support returning point-in-time pre and post-images with\ - `fullDocumentBeforeChange` and `fullDocument`. +- 2022-04-13: Support returning point-in-time pre and post-images with `fullDocumentBeforeChange` and `fullDocument`. - 2022-03-25: Do not error when parsing change stream event documents. - 2022-02-28: Add `to` to `ChangeStreamDocument`. -- 2022-02-10: Specify that `getMore` command must explicitly send inherited\ - `comment`. +- 2022-02-10: Specify that `getMore` command must explicitly send inherited `comment`. - 2022-02-01: Add `comment` to `ChangeStreamOptions`. -- 2022-01-19: Require that timeouts be applied per the client-side operations\ - timeout specification. +- 2022-01-19: Require that timeouts be applied per the client-side operations timeout specification. -- 2021-09-01: Clarify that server selection during resumption should respect\ - normal server selection rules. +- 2021-09-01: Clarify that server selection during resumption should respect normal server selection rules. - 2021-04-29: Add `load-balanced` to test topology requirements. @@ -1050,22 +1046,19 @@ There should be no backwards compatibility concerns. - 2020-02-10: Change error handling approach to use an allow list. -- 2019-07-15: Clarify resume process for change streams started with the\ - `startAfter` option. +- 2019-07-15: Clarify resume process for change streams started with the `startAfter` option. - 2019-07-09: Change `fullDocument` to be an optional string. - 2019-07-02: Fix server version for `startAfter`. -- 2019-07-01: Clarify that close may be implemented with more idiomatic\ - patterns instead of a method. 
+- 2019-07-01: Clarify that close may be implemented with more idiomatic patterns instead of a method. - 2019-06-20: Fix server version for addition of `postBatchResumeToken`. - 2019-04-12: Clarify caching process for resume token. -- 2019-04-03: Update the lowest server version that supports\ - `postBatchResumeToken`. +- 2019-04-03: Update the lowest server version that supports `postBatchResumeToken`. - 2019-01-10: Clarify error handling for killing the cursor. @@ -1083,15 +1076,13 @@ There should be no backwards compatibility concerns. - 2018-05-24: Change `startAtClusterTime` to `startAtOperationTime`. -- 2018-04-18: Add helpers for Database and MongoClient, and add\ - `startAtClusterTime` option. +- 2018-04-18: Add helpers for Database and MongoClient, and add `startAtClusterTime` option. - 2018-04-17: Clarify that the initial aggregate should not be retried. - 2017-12-13: Default read concern is also accepted, not just "majority". -- 2017-11-06: Defer to Read and Write concern spec for determining a read\ - concern for the helper method. +- 2017-11-06: Defer to Read and Write concern spec for determining a read concern for the helper method. - 2017-09-26: Clarify that change stream options may be added later. diff --git a/source/client-side-encryption/client-side-encryption.md b/source/client-side-encryption/client-side-encryption.md index 47d7e4b5f7..e0d416739f 100644 --- a/source/client-side-encryption/client-side-encryption.md +++ b/source/client-side-encryption/client-side-encryption.md @@ -23,24 +23,27 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ## Terms -**encrypted MongoClient**\ +**encrypted MongoClient** + A MongoClient with client side encryption enabled. -**data key**\ -A key used to encrypt and decrypt BSON values. Data keys are encrypted with a key management service (e.g. -AWS KMS) and stored within a document in the MongoDB key vault collection (see +**data key** + +A key used to encrypt and decrypt BSON values. Data keys are encrypted with a key management service (e.g. AWS KMS) and +stored within a document in the MongoDB key vault collection (see [Key vault collection schema for data keys](#key-vault-collection-schema-for-data-keys) for a description of the data key document). Therefore, a client needs access to both MongoDB and the external KMS service to utilize a data key. -**MongoDB key vault collection**\ -A MongoDB collection designated to contain data keys. This can either be co-located -with the data-bearing cluster, or in a separate external MongoDB cluster. +**MongoDB key vault collection** + +A MongoDB collection designated to contain data keys. This can either be co-located with the data-bearing cluster, or in +a separate external MongoDB cluster. -**Key Management Service (KMS)**\ -An external service providing fixed-size encryption/decryption. Only data keys are -encrypted and decrypted with KMS. +**Key Management Service (KMS)** -**KMS providers**\\ +An external service providing fixed-size encryption/decryption. Only data keys are encrypted and decrypted with KMS. + +**KMS providers** > A map of KMS providers to credentials. Configured client-side. Example: > @@ -56,40 +59,46 @@ encrypted and decrypted with KMS. > } > ``` -**KMS provider**\ -A configured KMS. Identified by a key in the KMS providers map. The key has the form -"" or ":". Examples: "aws" or "aws:myname". In -[libmongocrypt](#libmongocrypt), the key is referred to as the KMS ID. +**KMS provider** + +A configured KMS. 
Identified by a key in the KMS providers map. The key has the form "" or +":". Examples: "aws" or "aws:myname". In [libmongocrypt](#libmongocrypt), the key +is referred to as the KMS ID. + +**KMS provider type** -**KMS provider type**\ The type of backing KMS. Identified by the string: "aws", "azure", "gcp", "kmip", or "local". -**KMS provider name**\ -An optional name to identify a KMS provider. Enables configuring multiple KMS providers with the -same KMS provider type (e.g. "aws:name1" and "aws:name2" can refer to different AWS accounts). +**KMS provider name** + +An optional name to identify a KMS provider. Enables configuring multiple KMS providers with the same KMS provider type +(e.g. "aws:name1" and "aws:name2" can refer to different AWS accounts). + +**Customer Master Key (CMK)** -**Customer Master Key (CMK)**\ The underlying key AWS KMS uses to encrypt and decrypt. See [AWS Key Management Service Concepts](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys). -**schema**\ -A MongoDB JSON Schema (either supplied by the server or client-side) which may include metadata about -encrypted fields. This is a JSON Schema based on draft 4 of the JSON Schema specification, +**schema** + +A MongoDB JSON Schema (either supplied by the server or client-side) which may include metadata about encrypted fields. +This is a JSON Schema based on draft 4 of the JSON Schema specification, [as documented in the MongoDB manual.](https://www.mongodb.com/docs/manual/reference/operator/query/jsonSchema/). -**[libmongocrypt](#libmongocrypt)**\ -A library, written in C, that coordinates communication, does -encryption/decryption, caches key and schemas. [Located here](https://github.com/mongodb/libmongocrypt). +**[libmongocrypt](#libmongocrypt)** -**[mongocryptd](#mongocryptd)**\ -A local process the driver communicates with to determine how to encrypt values in a -command. +A library, written in C, that coordinates communication, does encryption/decryption, caches key and schemas. +[Located here](https://github.com/mongodb/libmongocrypt). -**[crypt_shared](#crypt_shared)**\ -This term, spelled in all-lowercase with an underscore, refers to the client-side -field-level-encryption dynamic library provided as part of a MongoDB Enterprise distribution. It replaces -[mongocryptd](#mongocryptd) as the method of +**[mongocryptd](#mongocryptd)** + +A local process the driver communicates with to determine how to encrypt values in a command. + +**[crypt_shared](#crypt_shared)** + +This term, spelled in all-lowercase with an underscore, refers to the client-side field-level-encryption dynamic library +provided as part of a MongoDB Enterprise distribution. It replaces [mongocryptd](#mongocryptd) as the method of `marking-up a database command for encryption `. See also: @@ -97,25 +106,30 @@ See also: > - [Introduction on crypt_shared](#crypt_shared) > - [Enabling crypt_shared](#enabling-crypt_shared) -**ciphertext**\ +**ciphertext** + One of the data formats of [BSON binary subtype 6](https://github.com/mongodb/specifications/tree/master/source/client-side-encryption/subtype6.rst), representing an encoded BSON document containing encrypted ciphertext and metadata. -**FLE**\ -FLE is the first version of Client-Side Field Level Encryption. FLE is almost entirely client-side with the -exception of server-side JSON schema. +**FLE** + +FLE is the first version of Client-Side Field Level Encryption. FLE is almost entirely client-side with the exception of +server-side JSON schema. 
-**Queryable Encryption**\ -Queryable Encryption the second version of Client-Side Field Level Encryption. Data is -encrypted client-side. Queryable Encryption supports indexed encrypted fields, which are further processed server-side. +**Queryable Encryption** + +Queryable Encryption the second version of Client-Side Field Level Encryption. Data is encrypted client-side. Queryable +Encryption supports indexed encrypted fields, which are further processed server-side. + +**In-Use Encryption** -**In-Use Encryption**\ Is an umbrella term describing the both FLE and Queryable Encryption. -**encryptedFields**\ -A BSON document describing the Queryable Encryption encrypted fields. This is analogous to the JSON -Schema in FLE. The following is an example encryptedFields in extended canonical JSON: +**encryptedFields** + +A BSON document describing the Queryable Encryption encrypted fields. This is analogous to the JSON Schema in FLE. The +following is an example encryptedFields in extended canonical JSON: ```javascript { @@ -1656,9 +1670,10 @@ documentation in MongoClient: ### Appendix terms -intent-to-encrypt marking\ -One of the data formats of BSON binary subtype 6, representing an encoded BSON document -containing plaintext and metadata. +**intent-to-encrypt marking** + +One of the data formats of BSON binary subtype 6, representing an encoded BSON document containing plaintext and +metadata. ### Key vault collection schema for data keys diff --git a/source/client-side-operations-timeout/client-side-operations-timeout.md b/source/client-side-operations-timeout/client-side-operations-timeout.md index c63b991279..b99754d6ae 100644 --- a/source/client-side-operations-timeout/client-side-operations-timeout.md +++ b/source/client-side-operations-timeout/client-side-operations-timeout.md @@ -20,9 +20,10 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -min(a, b)\ -Shorthand for "the minimum of a and b" where `a` and `b` are numeric values. For any cases where 0 means -"infinite" (e.g. [timeoutMS](#timeoutms)), `min(0, other)` MUST evaluate to `other`. +**min(a, b)** + +Shorthand for "the minimum of a and b" where `a` and `b` are numeric values. For any cases where 0 means "infinite" +(e.g. [timeoutMS](#timeoutms)), `min(0, other)` MUST evaluate to `other`. ### MongoClient Configuration diff --git a/source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md b/source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md index 45fc9159a1..1474c5000c 100644 --- a/source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md +++ b/source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md @@ -156,7 +156,7 @@ interface ConnectionPoolOptions { /** * An alternative way of setting waitQueueSize, it specifies * the maximum number of threads that can wait per connection. - * waitQueueSize === waitQueueMultiple \* maxPoolSize + * waitQueueSize === waitQueueMultiple * maxPoolSize */ waitQueueMultiple?: number } @@ -699,7 +699,7 @@ interrupting in-use connections, its next run MUST be scheduled as soon as possi The pool MUST only interrupt in-use Connections whose generation is less than or equal to the generation of the pool at the moment of the clear (before the increment) that used the interruptInUseConnections flag. Any operations that have their Connections interrupted in this way MUST fail with a retryable error. 
If possible, the error SHOULD be a -PoolClearedError with the following message: "Connection to \ interrupted due to server monitor timeout". +PoolClearedError with the following message: "Connection to interrupted due to server monitor timeout". ##### Clearing a load balanced pool @@ -1376,8 +1376,7 @@ to close and remove from its pool a [Connection](#connection) which has unread e - 2019-06-06: Add "connectionError" as a valid reason for ConnectionCheckOutFailedEvent -- 2020-09-03: Clarify Connection states and definition. Require the use of a\ - background thread and/or async I/O. Add +- 2020-09-03: Clarify Connection states and definition. Require the use of a background thread and/or async I/O. Add tests to ensure ConnectionReadyEvents are fired after ConnectionCreatedEvents. - 2020-09-24: Introduce maxConnecting requirement @@ -1386,8 +1385,7 @@ to close and remove from its pool a [Connection](#connection) which has unread e - 2021-01-12: Clarify "clear" method behavior in load balancer mode. -- 2021-01-19: Require that timeouts be applied per the client-side operations\ - timeout specification. +- 2021-01-19: Require that timeouts be applied per the client-side operations timeout specification. - 2021-04-12: Adding in behaviour for load balancer mode. diff --git a/source/connection-string/connection-string-spec.md b/source/connection-string/connection-string-spec.md index 5ee78434a4..f08eae5abe 100644 --- a/source/connection-string/connection-string-spec.md +++ b/source/connection-string/connection-string-spec.md @@ -149,11 +149,13 @@ for legacy reasons. A key value pair represents the option key and its associated value. The key is everything up to the first equals sign ("=") and the value is everything afterwards. Key values contain the following information: -- Key:\ - The connection option's key string. Keys should be normalised and character case should be ignored. +- Key: -- Value: (optional)\ - The value if provided otherwise it defaults to an empty string. +The connection option's key string. Keys should be normalised and character case should be ignored. + +- Value: (optional) + +The value if provided otherwise it defaults to an empty string. ### Defining connection options @@ -348,61 +350,67 @@ Given the string `mongodb://foo:bar%3A@mongodb.example.com,%2Ftmp%2Fmongodb-2701 1. Auth database: `admin`. 2. Connection options: `w=1`. 7. URL decode the auth database. In this example, the auth database is `admin`. -8. Validate the \[database contains no prohibited characters\](#database contains no prohibited characters). +8. Validate the database contains no prohibited characters. 9. Validate, split, and URL decode the connection options. In this example, the connection options are `{w: 1}`. ### Q&A -Q: What about existing Connection Options that aren't currently defined in a specification?\ -Ideally all MongoClient -options would already belong in their relevant specifications. As we iterate and produce more specifications these -options should be covered. +Q: What about existing Connection Options that aren't currently defined in a specification + +Ideally all MongoClient options would already belong in their relevant specifications. As we iterate and produce more +specifications these options should be covered. -Q: Why is it recommended that Connection Options take precedence over application set options?\ -This is only a -recommendation but the reasoning is application code is much harder to change across deployments. 
By making the -Connection String take precedence from outside the application it would be easier for the application to be portable -across environments. The order of precedence of MongoClient hosts and options is recommended to be from low to high: +Q: Why is it recommended that Connection Options take precedence over application set options + +This is only a recommendation but the reasoning is application code is much harder to change across deployments. By +making the Connection String take precedence from outside the application it would be easier for the application to be +portable across environments. The order of precedence of MongoClient hosts and options is recommended to be from low to +high: 1. Default values 2. MongoClient hosts and options 3. Connection String hosts and options -Q: Why WARN level warning on unknown options rather than throwing an exception?\ -It is responsible to inform users of -possible misconfigurations and both methods achieve that. However, there are conflicting requirements of a Connection -String. One goal is that any given driver should be configurable by a connection string but different drivers and -languages have different feature sets. Another goal is that Connection Strings should be portable and as such some -options supported by language X might not be relevant to language Y. Any given driver does not know is an option is -specific to a different driver or is misspelled or just not supported. So the only way to stay portable and support -configuration of all options is to not throw an exception but rather log a warning. - -Q: How long should deprecation options be supported?\ -This is not declared in this specification. It's not deemed -responsible to give a single timeline for how long deprecated options should be supported. As such any specifications -that deprecate options that do have the context of the decision should provide the timeline. - -Q: Why can I not use a standard URI parser?\ -The connection string format does not follow the standard URI format (as -described in [RFC 3986](http://tools.ietf.org/html/rfc3986)) we differ in two key areas: - -1. Hosts\ - The connection string allows for multiple hosts for high availability reasons but standard URI's only ever - define a single host. - -2. Query Parameters / Connection Options\ - The connection string provides a concreted definition on how the Connection - Options are parsed, including definitions of different data types. The [RFC 3986](http://tools.ietf.org/html/rfc3986) - only defines that they are `key=value` pairs and gives no instruction on parsing. In fact different languages handle - the parsing of query parameters in different ways and as such there is no such thing as a standard URI parser. - -Q: Can the connection string contain non-ASCII characters?\ -The connection string can contain non-ASCII characters. The -connection string is text, which can be encoded in any way appropriate for the application (e.g. the C Driver requires -you to pass it a UTF-8 encoded connection string). - -Q: Why does reference implementation check for a `.sock` suffix when parsing a socket path and possible auth -database?\ +Q: Why WARN level warning on unknown options rather than throwing an exception + +It is responsible to inform users of possible misconfigurations and both methods achieve that. However, there are +conflicting requirements of a Connection String. 
One goal is that any given driver should be configurable by a +connection string but different drivers and languages have different feature sets. Another goal is that Connection +Strings should be portable and as such some options supported by language X might not be relevant to language Y. Any +given driver does not know is an option is specific to a different driver or is misspelled or just not supported. So the +only way to stay portable and support configuration of all options is to not throw an exception but rather log a +warning. + +Q: How long should deprecation options be supported + +This is not declared in this specification. It's not deemed responsible to give a single timeline for how long +deprecated options should be supported. As such any specifications that deprecate options that do have the context of +the decision should provide the timeline. + +Q: Why can I not use a standard URI parser + +The connection string format does not follow the standard URI format (as described in +[RFC 3986](http://tools.ietf.org/html/rfc3986)) we differ in two key areas: + +1. Hosts + + The connection string allows for multiple hosts for high availability reasons but standard URI's only ever define a + single host. + +2. Query Parameters / Connection Options + + The connection string provides a concreted definition on how the Connection Options are parsed, including definitions + of different data types. The [RFC 3986](http://tools.ietf.org/html/rfc3986) only defines that they are `key=value` + pairs and gives no instruction on parsing. In fact different languages handle the parsing of query parameters in + different ways and as such there is no such thing as a standard URI parser. + +Q: Can the connection string contain non-ASCII characters + +The connection string can contain non-ASCII characters. The connection string is text, which can be encoded in any way +appropriate for the application (e.g. the C Driver requires you to pass it a UTF-8 encoded connection string). + +Q: Why does reference implementation check for a `.sock` suffix when parsing a socket path and possible auth database + To simplify parsing of a socket path followed by an auth database, we rely on MongoDB's [naming restrictions](https://www.mongodb.com/docs/manual/reference/limits/#naming-restrictions)), which do not allow database names to contain a dot character, and the fact that socket paths must end with `.sock`. This allows us to @@ -412,10 +420,10 @@ on the basis of the dot alone, this specification is primarily concerned with br (e.g. host types, database names, allowed values for an option). Additionally, some drivers might allow a namespace (e.g. `"db.collection"`) for the auth database part, so we do not want to be more strict than is necessary for parsing. -Q: Why throw an exception if the userinfo contains a percent sign ("%"), at-sign ("@"), or more than one colon -(":")?\ -This is done to help users format the connection string correctly. Although at-signs ("@") or colons (":") in -the username must be URL encoded, users may not be aware of that requirement. Take the following example: +Q: Why throw an exception if the userinfo contains a percent sign ("%"), at-sign ("@"), or more than one colon (":") + +This is done to help users format the connection string correctly. Although at-signs ("@") or colons (":") in the +username must be URL encoded, users may not be aware of that requirement. 
Take the following example: ``` mongodb://anne:bob:pass@localhost:27017 @@ -426,9 +434,9 @@ as the userinfo could cause authentication to fail, causing confusion for the us and percent symbols would invite further ambiguity. By throwing an exception users are made aware and then update the connection string so to be explicit about what forms the username and password. -Q: Why must UNIX domain sockets be URL encoded?\ -This has been done to reduce ambiguity between the socket name and the -database name. Take the following example: +Q: Why must UNIX domain sockets be URL encoded + +This has been done to reduce ambiguity between the socket name and the database name. Take the following example: ``` mongodb:///tmp/mongodb.sock/mongodb.sock @@ -439,10 +447,10 @@ Is the host `/tmp/mongodb.sock` and the auth database `mongodb.sock` or does the be explicit about the host and the auth database. By requiring an exception to be thrown when the host contains a slash ("/") users can be informed on how to migrate their connection strings. -Q: Why must the auth database be URL decoded by the parser?\ -On Linux systems database names can contain a question mark -("?"), in these rare cases the auth database must be URL encoded. This disambiguates between the auth database and the -connection options. Take the following example: +Q: Why must the auth database be URL decoded by the parser + +On Linux systems database names can contain a question mark ("?"), in these rare cases the auth database must be URL +encoded. This disambiguates between the auth database and the connection options. Take the following example: ``` mongodb://localhost/admin%3F?w=1 @@ -450,10 +458,11 @@ mongodb://localhost/admin%3F?w=1 In this case the auth database would be `admin?` and the connection options `w=1`. -Q: How should the space character be encoded in a connection string?\ -Space characters SHOULD be encoded as `%20` rather -than `+`, this will be portable across all implementations. Implementations MAY support decoding `+` into a space, as -many languages treat strings as `x-www-form-urlencoded` data by default. +Q: How should the space character be encoded in a connection string + +Space characters SHOULD be encoded as `%20` rather than `+`, this will be portable across all implementations. +Implementations MAY support decoding `+` into a space, as many languages treat strings as `x-www-form-urlencoded` data +by default. ## Changelog @@ -465,18 +474,15 @@ many languages treat strings as `x-www-form-urlencoded` data by default. - 2017-01-09: In Userinfo section, clarify that percent signs must be encoded. -- 2017-06-10: In Userinfo section, require username and password to be fully URI\ - encoded, not just "%", "@", and ":". - In Auth Database, list the prohibited characters. In Reference Implementation, split at the first "/", not the last. +- 2017-06-10: In Userinfo section, require username and password to be fully URI encoded, not just "%", "@", and ":". In + Auth Database, list the prohibited characters. In Reference Implementation, split at the first "/", not the last. - 2018-01-09: Clarified that space characters should be encoded to `%20`. -- 2018-06-04: Revised Userinfo section to provide an explicit list of allowed\ - characters and clarify rules for +- 2018-06-04: Revised Userinfo section to provide an explicit list of allowed characters and clarify rules for exceptions. 
-- 2019-02-04: In Repeated Keys section, clarified that the URI options spec may\ - override the repeated key behavior +- 2019-02-04: In Repeated Keys section, clarified that the URI options spec may override the repeated key behavior described here for certain options. - 2019-03-04: Require drivers to document option precedence rules @@ -487,8 +493,6 @@ many languages treat strings as `x-www-form-urlencoded` data by default. - 2022-10-05: Remove spec front matter and reformat changelog. -- 2022-12-27: Note that host information ends with a "/" character in connection\ - options description. +- 2022-12-27: Note that host information ends with a "/" character in connection options description. -- 2023-08-02: Make delimiting slash between host information and connection options\ - optional and update tests +- 2023-08-02: Make delimiting slash between host information and connection options optional and update tests diff --git a/source/crud/crud.md b/source/crud/crud.md index ea96efec9a..cd9389c4e5 100644 --- a/source/crud/crud.md +++ b/source/crud/crud.md @@ -25,15 +25,16 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH #### Terms -**Collection:**\ -The term `interface Collection` will be seen in most of the sections. Each driver will likely have a -class or interface defined for the concept of a collection. Operations appearing inside the `interface Collection` are -required operations to be present on a driver's concept of a collection. +**Collection:** -**Iterable:**\ -The term `Iterable` will be seen as a return type from some of the [Read](#read) methods. Its use is as -that of a sequence of items. For instance, `collection.find({})` returns a sequence of documents that can be iterated -over. +The term `interface Collection` will be seen in most of the sections. Each driver will likely have a class or interface +defined for the concept of a collection. Operations appearing inside the `interface Collection` are required operations +to be present on a driver's concept of a collection. + +**Iterable:** + +The term `Iterable` will be seen as a return type from some of the [Read](#read) methods. Its use is as that of a +sequence of items. For instance, `collection.find({})` returns a sequence of documents that can be iterated over. ### Guidance @@ -2230,105 +2231,113 @@ deviations from the Naming section are still permissible. ## Q & A -Q: Why do the names of the fields differ from those defined in the MongoDB manual?\ -Documentation and commands often -refer to same-purposed fields with different names making it difficult to have a cohesive API. In addition, occasionally -the name was correct at one point and its purpose has expanded to a point where the initial name doesn't accurately -describe its current function. +Q: Why do the names of the fields differ from those defined in the MongoDB manual? + +Documentation and commands often refer to same-purposed fields with different names making it difficult to have a +cohesive API. In addition, occasionally the name was correct at one point and its purpose has expanded to a point where +the initial name doesn't accurately describe its current function. In addition, responses from the servers are sometimes cryptic and used for the purposes of compactness. In these cases, we felt the more verbose form was desirable for self-documentation purposes. -Q: Where is read preference?\ -Read preference is about selecting a server with which to perform a read operation, such -as a query, a count, or an aggregate. 
Since all operations defined in this specification are performed on a collection, -it's uncommon that two different read operations on the same collection would use a different read preference, -potentially getting out-of-sync results. As such, the most natural place to indicate read preference is on the client, -the database, or the collection itself and not the operations within it. +Q: Where is read preference? + +Read preference is about selecting a server with which to perform a read operation, such as a query, a count, or an +aggregate. Since all operations defined in this specification are performed on a collection, it's uncommon that two +different read operations on the same collection would use a different read preference, potentially getting out-of-sync +results. As such, the most natural place to indicate read preference is on the client, the database, or the collection +itself and not the operations within it. However, it might be that a driver needs to expose this selection filter to a user per operation for various reasons. As noted before, it is permitted to specify this, along with other driver-specific options, in some alternative way. -Q: Where is read concern?\ -Read concern is about indicating how reads are handled. Since all operations defined in this -specification are performed on a collection, it's uncommon that two different read operations on the same collection -would use a different read concern, potentially causing mismatched and out-of-sync data. As such, the most natural place -to indicate read concern is on the client, the database, or the collection itself and not the operations within it. +Q: Where is read concern? + +Read concern is about indicating how reads are handled. Since all operations defined in this specification are performed +on a collection, it's uncommon that two different read operations on the same collection would use a different read +concern, potentially causing mismatched and out-of-sync data. As such, the most natural place to indicate read concern +is on the client, the database, or the collection itself and not the operations within it. However, it might be that a driver needs to expose read concern to a user per operation for various reasons. As noted before, it is permitted to specify this, along with other driver-specific options, in some alternative way. -Q: Where is write concern?\ -Write concern is about indicating how writes are acknowledged. Since all operations defined -in this specification are performed on a collection, it's uncommon that two different write operations on the same -collection would use a different write concern, potentially causing mismatched and out-of-sync data. As such, the most -natural place to indicate write concern is on the client, the database, or the collection itself and not the operations -within it. See the [Read/Write Concern specification](../read-write-concern/read-write-concern.md) for the API of -constructing a read/write concern and associated API. +Q: Where is write concern? + +Write concern is about indicating how writes are acknowledged. Since all operations defined in this specification are +performed on a collection, it's uncommon that two different write operations on the same collection would use a +different write concern, potentially causing mismatched and out-of-sync data. As such, the most natural place to +indicate write concern is on the client, the database, or the collection itself and not the operations within it. 
See +the [Read/Write Concern specification](../read-write-concern/read-write-concern.md) for the API of constructing a +read/write concern and associated API. However, it might be that a driver needs to expose write concern to a user per operation for various reasons. As noted before, it is permitted to specify this, along with other driver-specific options, in some alternative way. -Q: How do I throttle unacknowledged writes now that write concern is no longer defined on a per operation basis?\ -Some -users used to throttle unacknowledged writes by using an acknowledged write concern every X number of operations. Going -forward, the proper way to handle this is by using the bulk write API. +Q: How do I throttle unacknowledged writes now that write concern is no longer defined on a per operation basis? -Q: What is the logic for adding "One" or "Many" into the method and model names?\ -If the maximum number of documents -affected can only be one, we added "One" into the name. This makes it explicit that the maximum number of documents that -could be affected is one vs. infinite. +Some users used to throttle unacknowledged writes by using an acknowledged write concern every X number of operations. +Going forward, the proper way to handle this is by using the bulk write API. + +Q: What is the logic for adding "One" or "Many" into the method and model names? + +If the maximum number of documents affected can only be one, we added "One" into the name. This makes it explicit that +the maximum number of documents that could be affected is one vs. infinite. In addition, the current API exposed by all our drivers has the default value for "one" or "many" set differently for update and delete. This generally causes some issues for new developers and is a minor annoyance for existing developers. The safest way to combat this without introducing discrepancies between drivers/driver versions or breaking backwards compatibility was to use multiple methods, each signifying the number of documents that could be affected. -Q: Speaking of "One", where is `findOne`?\ -If your driver wishes to offer a `findOne` method, that is perfectly fine. If -you choose to implement `findOne`, please keep to the naming conventions followed by the `FindOptions` and keep in mind -that certain things don't make sense like limit (which should be -1), tailable, awaitData, etc... +Q: Speaking of "One", where is `findOne`? + +If your driver wishes to offer a `findOne` method, that is perfectly fine. If you choose to implement `findOne`, please +keep to the naming conventions followed by the `FindOptions` and keep in mind that certain things don't make sense like +limit (which should be -1), tailable, awaitData, etc... -Q: What considerations have been taken for the eventual merging of query and the aggregation framework?\ -In the future, -it is probable that a new query engine (QE) will look very much like the aggregation framework. Given this assumption, -we know that both `find` and `aggregate` will be renderable in QE, each maintaining their ordering guarantees for full -backwards compatibility. +Q: What considerations have been taken for the eventual merging of query and the aggregation framework? + +In the future, it is probable that a new query engine (QE) will look very much like the aggregation framework. Given +this assumption, we know that both `find` and `aggregate` will be renderable in QE, each maintaining their ordering +guarantees for full backwards compatibility. 
Hence, the only real concern is how to initiate a query using QE. While `find` is preferable, it would be a backwards breaking change. It might be decided that `find` is what should be used, and all drivers will release major revisions with this backwards breaking change. Alternatively, it might be decided that another initiator would be used. -Q: Didn't we just build a bulk API?\ -Yes, most drivers did just build out a bulk API (fluent-bulk-api). While -unfortunate, we felt it better to have the bulk api be consistent with the rest of the methods in the CRUD family of -operations. However, the fluent-bulk-api is still able to be used as this change is non-backwards breaking. Any driver -which implemented the fluent bulk API should deprecate it and drivers that have not built it should not do so. - -Q: What about explain?\ -Explain has been determined to be not a normal use-case for a driver. We'd like users to use the -shell for this purpose. However, explain is still possible from a driver. For find, it can be passed as a modifier. -Aggregate can be run using a runCommand method passing the explain option. In addition, server 3.0 offers an explain -command that can be run using a runCommand method. - -Q: Where did modifiers go in FindOptions?\ -MongoDB 3.2 introduced the find command. As opposed to using the general -"modifiers" field any longer, each relevant option is listed explicitly. Some options, such as "tailable" or -"singleBatch" are not listed as they are derived from other fields. Upgrading a driver should be a simple procedure of -deprecating the "modifiers" field and introducing the new fields. When a collision occurs, the explicitly specified -field should override the value in "modifiers". - -Q: Where is `save`?\ -Drivers have historically provided a `save` method, which was syntactic sugar for upserting or -inserting a document based on whether it contained an identifier, respectively. While the `save` method may be -convenient for interactive environments, such as the shell, it was intentionally excluded from the CRUD specification -for language drivers for several reasons. The `save` method promotes a design pattern of "fetch, modify, replace" and -invites race conditions in application logic. Additionally, the split nature of `save` makes it difficult to discern at -a glance if application code will perform an insert or potentially dangerous full-document replacement. Instead of -relying on `save`, application code should know whether document already has an identifier and explicitly call -`insertOne` or `replaceOne` with the `upsert` option. - -Q: Where is `useCursor` in AggregateOptions?\ +Q: Didn't we just build a bulk API? + +Yes, most drivers did just build out a bulk API (fluent-bulk-api). While unfortunate, we felt it better to have the bulk +api be consistent with the rest of the methods in the CRUD family of operations. However, the fluent-bulk-api is still +able to be used as this change is non-backwards breaking. Any driver which implemented the fluent bulk API should +deprecate it and drivers that have not built it should not do so. + +Q: What about explain? + +Explain has been determined to be not a normal use-case for a driver. We'd like users to use the shell for this purpose. +However, explain is still possible from a driver. For find, it can be passed as a modifier. Aggregate can be run using a +runCommand method passing the explain option. In addition, server 3.0 offers an explain command that can be run using a +runCommand method. 
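
For illustration only, here is a minimal sketch of running explain through a driver's generic runCommand helper, using
PyMongo; the database, collection name, and filter are assumptions, not part of this specification:

```python
from pymongo import MongoClient

client = MongoClient()  # assumes a locally running mongod
db = client["test"]

# Explain a find by wrapping the find command in the server's explain command
# and sending it through the generic command (runCommand) helper.
find_plan = db.command({
    "explain": {"find": "users", "filter": {"age": {"$gt": 21}}},
    "verbosity": "queryPlanner",
})
print(find_plan["queryPlanner"]["winningPlan"])

# An aggregation can be explained the same way by wrapping the aggregate command.
agg_plan = db.command({
    "explain": {"aggregate": "users", "pipeline": [{"$match": {"age": {"$gt": 21}}}], "cursor": {}},
    "verbosity": "queryPlanner",
})
```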
+ +Q: Where did modifiers go in FindOptions? + +MongoDB 3.2 introduced the find command. As opposed to using the general "modifiers" field any longer, each relevant +option is listed explicitly. Some options, such as "tailable" or "singleBatch" are not listed as they are derived from +other fields. Upgrading a driver should be a simple procedure of deprecating the "modifiers" field and introducing the +new fields. When a collision occurs, the explicitly specified field should override the value in "modifiers". + +Q: Where is `save`? + +Drivers have historically provided a `save` method, which was syntactic sugar for upserting or inserting a document +based on whether it contained an identifier, respectively. While the `save` method may be convenient for interactive +environments, such as the shell, it was intentionally excluded from the CRUD specification for language drivers for +several reasons. The `save` method promotes a design pattern of "fetch, modify, replace" and invites race conditions in +application logic. Additionally, the split nature of `save` makes it difficult to discern at a glance if application +code will perform an insert or potentially dangerous full-document replacement. Instead of relying on `save`, +application code should know whether document already has an identifier and explicitly call `insertOne` or `replaceOne` +with the `upsert` option. + +Q: Where is `useCursor` in AggregateOptions? + Inline aggregation results are no longer supported in server 3.5.2+. The [aggregate command](https://www.mongodb.com/docs/manual/reference/command/aggregate/) must be provided either the `cursor` document or the `explain` boolean. AggregateOptions does not define an `explain` option. If a driver does @@ -2337,27 +2346,29 @@ document must be added to the `aggregate` command. Regardless, `useCursor` is no a backwards breaking change, so drivers should first deprecate this option in a minor release, and remove it in a major release. -Q: Where is `singleBatch` in FindOptions?\ -Drivers have historically allowed users to request a single batch of results -(after which the cursor is closed) by specifying a negative value for the `limit` option. For servers \< 3.2, a single -batch may be requested by specifying a negative value in the `numberToReturn` wire protocol field. For servers >= 3.2, -the `find` command defines `limit` as a non-negative integer option but introduces a `singleBatch` boolean option. -Rather than introduce a `singleBatch` option to FindOptions, the spec preserves the existing API for `limit` and -instructs drivers to convert negative values accordingly for servers >= 3.2. - -Q: Why are client-side errors raised for some unsupported options?\ -Server versions before 3.4 were inconsistent about -reporting errors for unrecognized command options and may simply ignore them, which means a client-side error is the -only way to inform users that such options are unsupported. For unacknowledged writes using OP_MSG, a client-side error -is necessary because the server has no chance to return a response (even though a 3.6+ server is otherwise capable of -reporting errors for unrecognized options). For unacknowledged writes using legacy opcodes (i.e. OP_INSERT, OP_UPDATE, -and OP_DELETE), the message body has no field with which to express these options so a client-side error is the only -mechanism to inform the user that such options are unsupported. 
The spec does not explicitly refer to unacknowledged -writes using OP_QUERY primarily because a response document is always returned and drivers generally would not consider -using OP_QUERY precisely for that reason. +Q: Where is `singleBatch` in FindOptions? + +Drivers have historically allowed users to request a single batch of results (after which the cursor is closed) by +specifying a negative value for the `limit` option. For servers \< 3.2, a single batch may be requested by specifying a +negative value in the `numberToReturn` wire protocol field. For servers >= 3.2, the `find` command defines `limit` as a +non-negative integer option but introduces a `singleBatch` boolean option. Rather than introduce a `singleBatch` option +to FindOptions, the spec preserves the existing API for `limit` and instructs drivers to convert negative values +accordingly for servers >= 3.2. + +Q: Why are client-side errors raised for some unsupported options? + +Server versions before 3.4 were inconsistent about reporting errors for unrecognized command options and may simply +ignore them, which means a client-side error is the only way to inform users that such options are unsupported. For +unacknowledged writes using OP_MSG, a client-side error is necessary because the server has no chance to return a +response (even though a 3.6+ server is otherwise capable of reporting errors for unrecognized options). For +unacknowledged writes using legacy opcodes (i.e. OP_INSERT, OP_UPDATE, and OP_DELETE), the message body has no field +with which to express these options so a client-side error is the only mechanism to inform the user that such options +are unsupported. The spec does not explicitly refer to unacknowledged writes using OP_QUERY primarily because a response +document is always returned and drivers generally would not consider using OP_QUERY precisely for that reason. Q: Why does reverting to using `count` instead of `aggregate` with `$collStats` for estimatedDocumentCount not require a -major version bump in the drivers, even though it might break users of the Stable API?\ +major version bump in the drivers, even though it might break users of the Stable API? + SemVer [allows](https://semver.org/#what-if-i-inadvertently-alter-the-public-api-in-a-way-that-is-not-compliant-with-the-version-number-change-ie-the-code-incorrectly-introduces-a-major-breaking-change-in-a-patch-release) for a library to include a breaking change in a minor or patch version if the change is required to fix another @@ -2383,20 +2394,17 @@ aforementioned allowance in the SemVer spec. - 2022-01-27: Use optional return types for write commands and findAndModify -- 2022-01-19: Deprecate the maxTimeMS option and require that timeouts be applied\ - per the client-side operations - timeout spec. +- 2022-01-19: Deprecate the maxTimeMS option and require that timeouts be applied per the client-side operations timeout + spec. - 2022-01-14: Add let to ReplaceOptions -- 2021-11-10: Revise rules for applying read preference for aggregations with\ - $out and $merge. Add let to FindOptions, +- 2021-11-10: Revise rules for applying read preference for aggregations with $out and $merge. Add let to FindOptions, UpdateOptions, DeleteOptions, FindOneAndDeleteOptions, FindOneAndReplaceOptions, FindOneAndUpdateOptions - 2021-09-28: Support aggregations with $out and $merge on 5.0+ secondaries -- 2021-08-31: Allow unacknowledged hints on write operations if supported by\ - server (reverts previous change). 
+- 2021-08-31: Allow unacknowledged hints on write operations if supported by server (reverts previous change). - 2021-06-02: Introduce WriteError.details and clarify WriteError construction @@ -2404,12 +2412,10 @@ aforementioned allowance in the SemVer spec. - 2021-01-21: Update estimatedDocumentCount to use $collStats stage for servers >= 4.9 -- 2020-04-17: Specify that the driver must raise an error for unacknowledged\ - hints on any write operation, regardless - of server version. +- 2020-04-17: Specify that the driver must raise an error for unacknowledged hints on any write operation, regardless of + server version. -- 2020-03-19: Clarify that unacknowledged update, findAndModify, and delete\ - operations with a hint option should raise +- 2020-03-19: Clarify that unacknowledged update, findAndModify, and delete operations with a hint option should raise an error on older server versions. - 2020-03-06: Added hint option for DeleteOne, DeleteMany, and FindOneAndDelete operations. @@ -2422,8 +2428,7 @@ aforementioned allowance in the SemVer spec. - 2020-01-10: Clarify client-side error reporting for unsupported options -- 2020-01-10: Error if hint specified for unacknowledged update using OP_UPDATE\ - or OP_MSG for servers \< 4.2 +- 2020-01-10: Error if hint specified for unacknowledged update using OP_UPDATE or OP_MSG for servers \< 4.2 - 2019-10-28: Removed link to old language examples. @@ -2443,15 +2448,13 @@ aforementioned allowance in the SemVer spec. - 2018-07-25: Added upsertedCount to UpdateResult. -- 2018-06-07: Deprecated the count helper. Added the estimatedDocumentCount and\ - countDocuments helpers. +- 2018-06-07: Deprecated the count helper. Added the estimatedDocumentCount and countDocuments helpers. - 2018-03-05: Deprecate snapshot option - 2018-03-01: Deprecate maxScan query option. -- 2018-02-06: Note that batchSize in FindOptions and AggregateOptions should also\ - apply to getMore. +- 2018-02-06: Note that batchSize in FindOptions and AggregateOptions should also apply to getMore. - 2018-01-26: Only send bypassDocumentValidation option if it's true, don't send false. @@ -2459,14 +2462,12 @@ aforementioned allowance in the SemVer spec. - 2017-10-17: Document negative limit for FindOptions. -- 2017-10-09: Bumped minimum server version to 2.6 and removed references to\ - older versions in spec and tests. +- 2017-10-09: Bumped minimum server version to 2.6 and removed references to older versions in spec and tests. - 2017-10-09: Prohibit empty insertMany() and bulkWrite() operations. -- 2017-10-09: Split UpdateOptions and ReplaceOptions. Since replaceOne()\ - previously used UpdateOptions, this may have - BC implications for drivers using option classes. +- 2017-10-09: Split UpdateOptions and ReplaceOptions. Since replaceOne() previously used UpdateOptions, this may have BC + implications for drivers using option classes. - 2017-10-05: Removed useCursor option from AggregateOptions. @@ -2486,12 +2487,10 @@ aforementioned allowance in the SemVer spec. - 2017-01-09: Removed modifiers from FindOptions and added in all options. -- 2017-01-09: Changed the value type of FindOptions.skip and FindOptions.limit to\ - Int64 with a note related to +- 2017-01-09: Changed the value type of FindOptions.skip and FindOptions.limit to Int64 with a note related to calculating batchSize for opcode writes. -- 2017-01-09: Reworded description of how default values are handled and when to\ - send certain options. 
+- 2017-01-09: Reworded description of how default values are handled and when to send certain options. - 2016-09-23: Included collation option in the bulk write models. @@ -2501,8 +2500,7 @@ aforementioned allowance in the SemVer spec. - 2015-10-16: Added maxAwaitTimeMS to FindOptions. -- 2015-10-01: Moved bypassDocumentValidation into BulkWriteOptions and removed it\ - from the individual write models. +- 2015-10-01: Moved bypassDocumentValidation into BulkWriteOptions and removed it from the individual write models. - 2015-09-16: Added bypassDocumentValidation. diff --git a/source/enumerate-collections.md b/source/enumerate-collections.md index 7d23e47329..ce239f07ca 100644 --- a/source/enumerate-collections.md +++ b/source/enumerate-collections.md @@ -19,13 +19,15 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**MongoClient**\ -Driver object representing a connection to MongoDB. This is the root object of a driver's API and MAY -be named differently in some drivers. +**MongoClient** -**Iterable**\ -An object or data structure that is a sequence of elements that can be iterated over. This spec is -flexible on what that means as different drivers will have different requirements, types, and idioms. +Driver object representing a connection to MongoDB. This is the root object of a driver's API and MAY be named +differently in some drivers. + +**Iterable** + +An object or data structure that is a sequence of elements that can be iterated over. This spec is flexible on what that +means as different drivers will have different requirements, types, and idioms. ### listCollections Database Command @@ -298,31 +300,24 @@ The shell implements the first algorithm for falling back if the `listCollection - 2022-02-01: Add `comment` option to `listCollections` command. -- 2022-01-20: Require that timeouts be applied per the client-side operations\ - timeout spec. +- 2022-01-20: Require that timeouts be applied per the client-side operations timeout spec. -- 2021-12-17: Support `authorizedCollections` option in `listCollections`\ - command. +- 2021-12-17: Support `authorizedCollections` option in `listCollections` command. - 2021-04-22: Update to use secondaryOk. -- 2020-03-18: MongoDB 4.4 no longer includes `ns` field in `idIndex` field\ - for `listCollections` responses. +- 2020-03-18: MongoDB 4.4 no longer includes `ns` field in `idIndex` field for `listCollections` responses. -- 2019-03-21: The method that returns a list of collection names should be named\ - `listCollectionNames`. The method that +- 2019-03-21: The method that returns a list of collection names should be named `listCollectionNames`. The method that returns a list of collection objects may be named `listMongoCollections`. -- 2018-07-03: Clarify that `nameOnly` must not be used with filters other than\ - `name`. +- 2018-07-03: Clarify that `nameOnly` must not be used with filters other than `name`. - 2018-05-18: Support `nameOnly` option in `listCollections` command. - 2017-09-27: Clarify reason for filtering collection names containing '$'. -- 2015-01-14: Clarify trimming of database name. Put preferred method name for\ - listing collections with a cursor as +- 2015-01-14: Clarify trimming of database name. Put preferred method name for listing collections with a cursor as return value. -- 2014-12-18: Update with the server change to return a cursor for\ - `listCollections`. +- 2014-12-18: Update with the server change to return a cursor for `listCollections`. 
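
For illustration, the listCollections command described in this specification can also be sent directly through a
driver's generic command helper. A short sketch using PyMongo follows; the database name and the regular expression in
the filter are assumptions, not part of this specification:

```python
from pymongo import MongoClient

client = MongoClient()  # assumes a locally running mongod
db = client["test"]

# Run the listCollections command directly. Per this specification, nameOnly
# must not be combined with filters on fields other than "name".
reply = db.command({"listCollections": 1, "nameOnly": True,
                    "filter": {"name": {"$regex": "^orders"}}})
# For brevity this reads only the first batch; a complete implementation would
# iterate the returned cursor.
names = [info["name"] for info in reply["cursor"]["firstBatch"]]

# Drivers expose the same behaviour through helpers; in PyMongo this is
# list_collection_names().
names_via_helper = db.list_collection_names()
```
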
diff --git a/source/enumerate-databases.md b/source/enumerate-databases.md index 0b9f916875..3d300afb2b 100644 --- a/source/enumerate-databases.md +++ b/source/enumerate-databases.md @@ -19,21 +19,25 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**MongoClient**\ -Driver object representing a connection to MongoDB. This is the root object of a driver's API and MAY -be named differently in some drivers. +**MongoClient** -**MongoDatabase**\ -Driver object representing a database and the operations that can be performed on it. MAY be named +Driver object representing a connection to MongoDB. This is the root object of a driver's API and MAY be named differently in some drivers. -**Iterable**\ -An object or data structure that is a sequence of elements that can be iterated over. This spec is -flexible on what that means as different drivers will have different requirements, types, and idioms. +**MongoDatabase** -**Document**\ -An object or data structure used by the driver to represent a BSON document. This spec is flexible on what -that means as different drivers will have different requirements, types, and idioms. +Driver object representing a database and the operations that can be performed on it. MAY be named differently in some +drivers. + +**Iterable** + +An object or data structure that is a sequence of elements that can be iterated over. This spec is flexible on what that +means as different drivers will have different requirements, types, and idioms. + +**Document** + +An object or data structure used by the driver to represent a BSON document. This spec is flexible on what that means as +different drivers will have different requirements, types, and idioms. ### Naming Deviations @@ -233,8 +237,7 @@ array of database information documents. - 2024-07-26: Migrated from reStructuredText to Markdown. Removed note that applied to pre-3.6 servers. -- 2022-10-05: Remove spec front matter and reformat changelog. Also reverts the\ - minimum server version to 3.6, which is +- 2022-10-05: Remove spec front matter and reformat changelog. Also reverts the minimum server version to 3.6, which is where `nameOnly` and `filter` options were first introduced for `listDatabases`. - 2022-08-17: Clarify the behavior of comment on pre-4.4 servers. diff --git a/source/extended-json.md b/source/extended-json.md index c60186bceb..bb3c7b367a 100644 --- a/source/extended-json.md +++ b/source/extended-json.md @@ -172,8 +172,8 @@ or: { "zipCode" : { $type : "string" } } ``` -A parser SHOULD support at least 200 \[levels of nesting\](#levels of nesting) in an Extended JSON document but MAY set -other limits on strings it can accept as defined in [section 9](https://tools.ietf.org/html/rfc7159#section-9) of the +A parser SHOULD support at least 200 levels of nesting in an Extended JSON document but MAY set other limits on strings +it can accept as defined in [section 9](https://tools.ietf.org/html/rfc7159#section-9) of the [JSON specification](https://tools.ietf.org/html/rfc7159). When parsing a JSON object other than the top-level object, the presence of a `$`-prefixed key indicates the object @@ -601,13 +601,17 @@ If a BSON type fell into category (3), above, this specification creates a type following new Extended JSON type wrappers are introduced by this spec: - `$dbPointer`- See above. + - `$numberInt` - This is used to preserve the "int32" BSON type in Canonical Extended JSON. 
Without using `$numberInt`, this type will be indistinguishable from a double in certain languages where the distinction does not exist, such as Javascript. + - `$numberDouble` - This is used to preserve the `double`type in Canonical Extended JSON, as some JSON generators might - omit a trailing ".0" for integral types.\ - It also supports representing non-finite values like NaN or Infinity which - are prohibited in the JSON specification for numbers. + omit a trailing ".0" for integral types. + + It also supports representing non-finite values like NaN or Infinity which are prohibited in the JSON specification + for numbers. + - `$symbol` - The use of the `$symbol` key preserves the symbol type in Canonical Extended JSON, distinguishing it from JSON strings. @@ -644,34 +648,36 @@ This specification will need to be amended if future BSON types are added to the ## Q&A -**Q**. Why was version 2 of the spec necessary? **A**. After Version 1 was released, several stakeholders raised -concerns that not providing an option to output BSON numbers as ordinary JSON numbers limited the utility of Extended -JSON for common historical uses. We decided to provide a second format option and more clearly distinguish the use cases -(and limitations) inherent in each format. +**Q**. Why was version 2 of the spec necessary? + +**A**. After Version 1 was released, several stakeholders raised concerns that not providing an option to output BSON +numbers as ordinary JSON numbers limited the utility of Extended JSON for common historical uses. We decided to provide +a second format option and more clearly distinguish the use cases (and limitations) inherent in each format. **Q**. My BSON parser doesn't distinguish every BSON type. Does my Extended JSON generator need to distinguish these -types?\ -**A**. No. Some BSON parsers do not emit a unique type for each BSON type, making round-tripping BSON through -such libraries impossible without changing the document. For example, a `DBPointer` will be parsed into a `DBRef` by -PyMongo. In such cases, a generator must emit the Extended JSON form for whatever type the BSON parser emitted. It does -not need to preserve type information when that information has been lost by the BSON parser. +types? + +**A**. No. Some BSON parsers do not emit a unique type for each BSON type, making round-tripping BSON through such +libraries impossible without changing the document. For example, a `DBPointer` will be parsed into a `DBRef` by PyMongo. +In such cases, a generator must emit the Extended JSON form for whatever type the BSON parser emitted. It does not need +to preserve type information when that information has been lost by the BSON parser. **Q**. How can implementations which require backwards compatibility with Legacy Extended JSON, in which BSON regular expressions were represented with `$regex`, handle parsing of extended JSON test representing a MongoDB query filter -containing the `$regex` operator?\ -**A**. An implementation can handle this in a number of ways: - Introduce an -enumeration that determines the behavior of the parser. If the value is LEGACY, it will parse `$regex`and not treat -`$regularExpression` specially, and if the value is CANONICAL, it will parse `$regularExpression` and not treat `$regex` -specially. - Support both legacy and canonical forms in the parser without requiring the application to specify one or -the other. 
Making that work for the `$regex` query operator use case will require that the rules set forth in the 1.0.0
-version of this specification are followed for `$regex`; specifically, that a document with a `$regex` key whose value
-is a JSON object should be parsed as a normal document and not reported as an error.
-**Q**. How can implementations which require backwards compatibility with Legacy Extended JSON, in which BSON binary
-values were represented like `{"$binary": "AQIDBAU=", "$type": "80"}`, handle parsing of extended JSON test representing
-a MongoDB query filter containing the `$type`operator?\
-**A**. An implementation can handle this in a number of ways:\
-\-
+containing the `$regex` operator?
+
+**A**. An implementation can handle this in a number of ways: - Introduce an enumeration that determines the behavior of
+the parser. If the value is LEGACY, it will parse `$regex` and not treat `$regularExpression` specially, and if the value
+is CANONICAL, it will parse `$regularExpression` and not treat `$regex` specially. - Support both legacy and canonical
+forms in the parser without requiring the application to specify one or the other. Making that work for the `$regex`
+query operator use case will require that the rules set forth in the 1.0.0 version of this specification are followed
+for `$regex`; specifically, that a document with a `$regex` key whose value is a JSON object should be parsed as a
+normal document and not reported as an error.
+
+**Q**. How can implementations which require backwards compatibility with Legacy Extended JSON, in which BSON binary
+values were represented like `{"$binary": "AQIDBAU=", "$type": "80"}`, handle parsing of extended JSON text representing
+a MongoDB query filter containing the `$type` operator?
+
+**A**. An implementation can handle this in a number of ways: - Introduce an enumeration that determines the behavior of
 the parser. If the value is LEGACY, it will parse the new binary form and not treat the legacy one specially, and if
 the value is CANONICAL, it will parse the new form and not treat the legacy form specially. - Support both legacy and
 canonical forms in the parser without requiring the
@@ -680,8 +686,8 @@ rules set forth in the 1.0.0 version of this specification are followed for `$ty
 a `$type` key whose value is an integral type, or a document with a `$type` key but without a `$binary` key, should be
 parsed as a normal document and not reported as an error.

-**Q**. Sometimes I see the term "extjson" used in other specifications. Is "extjson" related to this
-specification?\
+**Q**. Sometimes I see the term "extjson" used in other specifications. Is "extjson" related to this specification?
+
 **A**. Yes, "extjson" is short for "Extended JSON".

 ### Changelog

@@ -699,7 +705,7 @@ specification?\
 - Added support for parsing `$uuid` fields as BSON Binary subtype 4.
 - Changed the example to using the MongoDB Python Driver. It previously used the MongoDB Java Driver. The new example
   excludes the following BSON types that are unsupported in Python - `Symbol`, `SpecialFloat`, `DBPointer`, and
-  `Undefined`. Transformations for these types are now only documented in the `Conversion table`\_.
+  `Undefined`. Transformations for these types are now only documented in the [Conversion table](#conversion-table).
- 2017-07-20:
  - Bumped specification to version 2.0.
  - Added "Relaxed" format. 
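
To make the difference between the Canonical and Relaxed formats and the type wrappers discussed above concrete, here
is a small sketch using PyMongo's `bson.json_util` helpers; the document contents are assumptions, chosen only to show
the numeric wrappers:

```python
from bson.int64 import Int64
from bson.json_util import dumps, loads, CANONICAL_JSON_OPTIONS, RELAXED_JSON_OPTIONS

doc = {"count": 42, "total": Int64(42), "ratio": 0.5}

# Canonical Extended JSON preserves the BSON numeric types with type wrappers, e.g.
# {"count": {"$numberInt": "42"}, "total": {"$numberLong": "42"}, "ratio": {"$numberDouble": "0.5"}}
canonical = dumps(doc, json_options=CANONICAL_JSON_OPTIONS)

# Relaxed Extended JSON emits plain JSON numbers instead, e.g.
# {"count": 42, "total": 42, "ratio": 0.5}
relaxed = dumps(doc, json_options=RELAXED_JSON_OPTIONS)

# Parsing the canonical text restores the original BSON types.
roundtripped = loads(canonical, json_options=CANONICAL_JSON_OPTIONS)
assert roundtripped == doc
```
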
diff --git a/source/find_getmore_killcursors_commands.md b/source/find_getmore_killcursors_commands.md index 6553425955..3f77738181 100644 --- a/source/find_getmore_killcursors_commands.md +++ b/source/find_getmore_killcursors_commands.md @@ -462,26 +462,20 @@ More in depth information about passing read preferences to Mongos can be found - 2022-10-05: Remove spec front matter and reformat changelog. -- 2022-02-01: Replace examples/tables for find, getMore, and killCursors with\ - server manual links. +- 2022-02-01: Replace examples/tables for find, getMore, and killCursors with server manual links. -- 2021-12-14: Exhaust cursors may fallback to non-exhaust cursors on 5.1+\ - servers. Relax requirement of OP_MSG for +- 2021-12-14: Exhaust cursors may fallback to non-exhaust cursors on 5.1+ servers. Relax requirement of OP_MSG for exhaust cursors. - 2021-08-27: Exhaust cursors must use OP_MSG on 3.6+ servers. - 2021-04-06: Updated to use hello and secondaryOk. -- 2015-10-21: If no **maxAwaitTimeMS** is specified, the driver SHOULD not set\ - **maxTimeMS** on the **getMore** - command. +- 2015-10-21: If no **maxAwaitTimeMS** is specified, the driver SHOULD not set **maxTimeMS** on the **getMore** command. -- 2015-10-13: Added guidance on batchSize values as related to the **getMore**\ - command. Legacy secondaryOk flag SHOULD +- 2015-10-13: Added guidance on batchSize values as related to the **getMore** command. Legacy secondaryOk flag SHOULD not be set on getMore and killCursors commands. Introduced maxAwaitTimeMS option for setting maxTimeMS on getMore commands when the cursor is a tailable cursor with awaitData set. -- 2015-09-30: Legacy secondaryOk flag must be set to true on **getMore** and\ - **killCursors** commands to make drivers +- 2015-09-30: Legacy secondaryOk flag must be set to true on **getMore** and **killCursors** commands to make drivers have same behavior as for OP_GET_MORE and OP_KILL_CURSORS. diff --git a/source/gridfs/gridfs-spec.md b/source/gridfs/gridfs-spec.md index dca3fe4df4..a16e00f1aa 100644 --- a/source/gridfs/gridfs-spec.md +++ b/source/gridfs/gridfs-spec.md @@ -28,14 +28,16 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**Bucket name**\ -A prefix under which a GridFS system"s collections are stored. Collection names for the files and -chunks collections are prefixed with the bucket name. The bucket name MUST be configurable by the user. Multiple buckets -may exist within a single database. The default bucket name is "fs". +**Bucket name** -**Chunk**\ -A section of a user file, stored as a single document in the "chunks" collection of a GridFS bucket. The -default size for the data field in chunks is 255 KiB. Chunk documents have the following form: +A prefix under which a GridFS system"s collections are stored. Collection names for the files and chunks collections are +prefixed with the bucket name. The bucket name MUST be configurable by the user. Multiple buckets may exist within a +single database. The default bucket name is "fs". + +**Chunk** + +A section of a user file, stored as a single document in the "chunks" collection of a GridFS bucket. The default size +for the data field in chunks is 255 KiB. Chunk documents have the following form: ```javascript { @@ -46,34 +48,41 @@ default size for the data field in chunks is 255 KiB. 
Chunk documents have the following form:

 ```javascript
 {
@@ -46,34 +48,41 @@ default size for the data field in chunks is 255 KiB. Chunk documents have the f
 }
 ```

-**\_id**\
+**\_id**
+
 a unique ID for this document of type BSON ObjectId

-**files_id**\
-the id for this file (the `_id` from the files collection document). This field takes the type of the
-corresponding `_id` in the files collection.
+**files_id**
+
+the id for this file (the `_id` from the files collection document). This field takes the type of the corresponding
+`_id` in the files collection.
+
+**n**

-**n**\
 the index number of this chunk, zero-based.

-**data**\
+**data**
+
 a chunk of data from the user file

-**Chunks collection**\
-A collection in which chunks of a user file are stored. The name for this collection is the word
-'chunks' prefixed by the bucket name. The default is "fs.chunks".
+**Chunks collection**
+
+A collection in which chunks of a user file are stored. The name for this collection is the word 'chunks' prefixed by
+the bucket name. The default is "fs.chunks".
+
+**Empty chunk**

-**Empty chunk**\
 A chunk with a zero length "data" field.

-**Files collection**\
-A collection in which information about stored files is stored. There will be one files collection
-document per stored file. The name for this collection is the word "files" prefixed by the bucket name. The default is
-"fs.files".
+**Files collection**
+
+A collection in which information about stored files is stored. There will be one files collection document per stored
+file. The name for this collection is the word "files" prefixed by the bucket name. The default is "fs.files".
+
+**Files collection document**
+
+A document stored in the files collection that contains information about a single stored file. Files collection
+documents have the following form:

 ```javascript
 {
@@ -89,34 +98,42 @@ file. Files collection documents have the following form:
 }
 ```

-**\_id**\
+**\_id**
+
 a unique ID for this document. Usually this will be of type ObjectId, but a custom `_id` value provided by the
 application may be of any type.

-**length**\
+**length**
+
 the length of this stored file, in bytes

-**chunkSize**\
-the size, in bytes, of each data chunk of this file. This value is configurable by file. The default is
-255 KiB.
+**chunkSize**
+
+the size, in bytes, of each data chunk of this file. This value is configurable by file. The default is 255 KiB.
+
+**uploadDate**
+
+the date and time this file was added to GridFS, stored as a BSON datetime value. The value of this field MUST be the
+datetime when the upload completed, not the datetime when it was begun. 
+ +**md5** -**md5**\ DEPRECATED, a hash of the contents of the stored file -**filename**\ +**filename** + the name of this stored file; this does not need to be unique -**contentType**\ +**contentType** + DEPRECATED, any MIME type, for application use only -**aliases**\ +**aliases** + DEPRECATED, for application use only -**metadata**\ +**metadata** + any additional application data the user wishes to store Note: some older versions of GridFS implementations allowed applications to add arbitrary fields to the files collection @@ -127,27 +144,30 @@ Note: drivers SHOULD store length as Int64 and chunkSize as Int32 when creating be able to handle existing GridFS files where the length and chunkSize fields might have been stored using a different numeric data type. -**Orphaned chunk**\ -A document in the chunks collections for which the "files_id" does not match any `_id` in the files -collection. Orphaned chunks may be created if write or delete operations on GridFS fail part-way through. +**Orphaned chunk** + +A document in the chunks collections for which the "files_id" does not match any `_id` in the files collection. Orphaned +chunks may be created if write or delete operations on GridFS fail part-way through. + +**Stored File** + +A user file that has been stored in GridFS, consisting of a files collection document in the files collection and zero +or more documents in the chunks collection. + +**Stream** + +An abstraction that represents streamed I/O. In some languages a different word is used to represent this abstraction. -**Stored File**\ -A user file that has been stored in GridFS, consisting of a files collection document in the files -collection and zero or more documents in the chunks collection. +**TFileId** -**Stream**\ -An abstraction that represents streamed I/O. In some languages a different word is used to represent this -abstraction. +While GridFS file id values are ObjectIds by default, an application may choose to use custom file id values, which may +be of any type. In this spec the term TFileId refers to whatever data type is appropriate in the driver's programming +language to represent a file id. This would be something like object, BsonValue or a generic `` type parameter. -**TFileId**\ -While GridFS file id values are ObjectIds by default, an application may choose to use custom file id -values, which may be of any type. In this spec the term TFileId refers to whatever data type is appropriate in the -driver's programming language to represent a file id. This would be something like object, BsonValue or a generic -`` type parameter. +**User File** -**User File**\ -A data added by a user to GridFS. This data may map to an actual file on disk, a stream of input, a large -data object, or any other large amount of consecutive data. +A data added by a user to GridFS. This data may map to an actual file on disk, a stream of input, a large data object, +or any other large amount of consecutive data. ## Specification @@ -842,159 +862,172 @@ that are undesirable or incorrect. ## Design Rationale -Why is the default chunk size 255 KiB?\ -On MMAPv1, the server provides documents with extra padding to allow for -in-place updates. When the "data" field of a chunk is limited to 255 KiB, it ensures that the whole chunk document (the -chunk data along with an `_id` and other information) will fit into a 256 KiB section of memory, making the best use of -the provided padding. 
Users setting custom chunk sizes are advised not to use round power-of-two values, as the whole -chunk document is likely to exceed that space and demand extra padding from the system. WiredTiger handles its memory -differently, and this optimization does not apply. However, because application code generally won"t know what storage -engine will be used in the database, always avoiding round power-of-two chunk sizes is recommended. - -Why can"t I alter documents once they are in the system?\ -GridFS works with documents stored in multiple collections -within MongoDB. Because there is currently no way to atomically perform operations across collections in MongoDB, there -is no way to alter stored files in a way that prevents race conditions between GridFS clients. Updating GridFS stored -files without that server functionality would involve a data model that could support this type of concurrency, and -changing the GridFS data model is outside of the scope of this spec. - -Why provide a "rename" method?\ -By providing users with a reasonable alternative for renaming a file, we can discourage -users from writing directly to the files collections under GridFS. With this approach we can prevent critical files -collection documents fields from being mistakenly altered. - -Why is there no way to perform arbitrary updates on the files collection?\ -The rename helper defined in this spec allows -users to easily rename a stored file. While updating files collection documents in other, more granular ways might be -helpful for some users, validating such updates to ensure that other files collection document fields remain protected -is a complicated task. We leave the decision of how best to provide this functionality to a future spec. - -What is the "md5" field of a files collection document and how was it used?\ -"md5" holds an MD5 checksum that is -computed from the original contents of a user file. Historically, GridFS did not use acknowledged writes, so this -checksum was necessary to ensure that writes went through properly. With acknowledged writes, the MD5 checksum is still -useful to ensure that files in GridFS have not been corrupted. A third party directly accessing the 'files' and "chunks" -collections under GridFS could, inadvertently or maliciously, make changes to documents that would make them unusable by -GridFS. Comparing the MD5 in the files collection document to a re-computed MD5 allows detecting such errors and -corruption. However, drivers now assume that the stored file is not corrupted, and applications that want to use the MD5 -value to check for corruption must do so themselves. - -Why store the MD5 checksum instead of creating the hash as-needed?\ -The MD5 checksum must be computed when a file is -initially uploaded to GridFS, as this is the only time we are guaranteed to have the entire uncorrupted file. Computing -it on-the-fly as a file is read from GridFS would ensure that our reads were successful, but guarantees nothing about -the state of the file in the system. A successful check against the stored MD5 checksum guarantees that the stored file -matches the original and no corruption has occurred. - -Why are MD5 checksums now deprecated? What should users do instead?\ -MD5 is prohibited by FIPS 140-2. Operating systems -and libraries operating in FIPS mode do not provide the MD5 algorithm. 
To avoid a broken GridFS feature on such systems, -the use of MD5 with GridFS is deprecated, should not be added to new implementations, and should be removed from -existing implementations according to the deprecation policy of individual drivers. Applications that desire a file -digest should implement it outside of GridFS and store it with other file metadata. - -Why do drivers no longer need to call the filemd5 command on upload?\ -When a chunk is inserted and no error occurs the -application can assume that the chunk was correctly inserted. No other operations that insert or modify data require the -driver to double check that the operation succeeded. It can be assumed that any errors would have been detected by use -of the appropriate write concern. Using filemd5 also prevents users from sharding chunk keys. - -What about write concern?\ -This spec leaves the choice of how to set write concern to driver authors. Implementers may -choose to accept write concern through options on the given methods, to set a configurable write concern on the GridFS -object, to enforce a single write concern for all GridFS operations, or to do something different. +Why is the default chunk size 255 KiB? + +On MMAPv1, the server provides documents with extra padding to allow for in-place updates. When the "data" field of a +chunk is limited to 255 KiB, it ensures that the whole chunk document (the chunk data along with an `_id` and other +information) will fit into a 256 KiB section of memory, making the best use of the provided padding. Users setting +custom chunk sizes are advised not to use round power-of-two values, as the whole chunk document is likely to exceed +that space and demand extra padding from the system. WiredTiger handles its memory differently, and this optimization +does not apply. However, because application code generally won"t know what storage engine will be used in the database, +always avoiding round power-of-two chunk sizes is recommended. + +Why can"t I alter documents once they are in the system? + +GridFS works with documents stored in multiple collections within MongoDB. Because there is currently no way to +atomically perform operations across collections in MongoDB, there is no way to alter stored files in a way that +prevents race conditions between GridFS clients. Updating GridFS stored files without that server functionality would +involve a data model that could support this type of concurrency, and changing the GridFS data model is outside of the +scope of this spec. + +Why provide a "rename" method? + +By providing users with a reasonable alternative for renaming a file, we can discourage users from writing directly to +the files collections under GridFS. With this approach we can prevent critical files collection documents fields from +being mistakenly altered. + +Why is there no way to perform arbitrary updates on the files collection? + +The rename helper defined in this spec allows users to easily rename a stored file. While updating files collection +documents in other, more granular ways might be helpful for some users, validating such updates to ensure that other +files collection document fields remain protected is a complicated task. We leave the decision of how best to provide +this functionality to a future spec. + +What is the "md5" field of a files collection document and how was it used? + +"md5" holds an MD5 checksum that is computed from the original contents of a user file. 
Historically, GridFS did not use +acknowledged writes, so this checksum was necessary to ensure that writes went through properly. With acknowledged +writes, the MD5 checksum is still useful to ensure that files in GridFS have not been corrupted. A third party directly +accessing the 'files' and "chunks" collections under GridFS could, inadvertently or maliciously, make changes to +documents that would make them unusable by GridFS. Comparing the MD5 in the files collection document to a re-computed +MD5 allows detecting such errors and corruption. However, drivers now assume that the stored file is not corrupted, and +applications that want to use the MD5 value to check for corruption must do so themselves. + +Why store the MD5 checksum instead of creating the hash as-needed? + +The MD5 checksum must be computed when a file is initially uploaded to GridFS, as this is the only time we are +guaranteed to have the entire uncorrupted file. Computing it on-the-fly as a file is read from GridFS would ensure that +our reads were successful, but guarantees nothing about the state of the file in the system. A successful check against +the stored MD5 checksum guarantees that the stored file matches the original and no corruption has occurred. + +Why are MD5 checksums now deprecated? What should users do instead? + +MD5 is prohibited by FIPS 140-2. Operating systems and libraries operating in FIPS mode do not provide the MD5 +algorithm. To avoid a broken GridFS feature on such systems, the use of MD5 with GridFS is deprecated, should not be +added to new implementations, and should be removed from existing implementations according to the deprecation policy of +individual drivers. Applications that desire a file digest should implement it outside of GridFS and store it with other +file metadata. + +Why do drivers no longer need to call the filemd5 command on upload? + +When a chunk is inserted and no error occurs the application can assume that the chunk was correctly inserted. No other +operations that insert or modify data require the driver to double check that the operation succeeded. It can be assumed +that any errors would have been detected by use of the appropriate write concern. Using filemd5 also prevents users from +sharding chunk keys. + +What about write concern? + +This spec leaves the choice of how to set write concern to driver authors. Implementers may choose to accept write +concern through options on the given methods, to set a configurable write concern on the GridFS object, to enforce a +single write concern for all GridFS operations, or to do something different. If a user has given GridFS a write concern of 0, should we perform MD5 calculations? (If supported for backwards -compatibility)\ +compatibility) + Yes, because the checksum is used for detecting future corruption or misuse of GridFS collections. -Is GridFS limited by sharded systems?\ -For best performance, clients using GridFS on a sharded system should use a shard -key that ensures all chunks for a given stored file are routed to the same shard. Therefore, if the chunks collection is -sharded, you should shard on the files_id. Normally only the chunks collection benefits from sharding, since the files -collection is usually small. Otherwise, there are no limitations to GridFS on sharded systems. - -Why is contentType deprecated?\ -Most fields in the files collection document are directly used by the driver, with the -exception of: metadata, contentType and aliases. 
All information that is purely for use of the application should be -embedded in the 'metadata' document. Users of GridFS who would like to store a contentType for use in their applications -are encouraged to add a 'contentType' field to the "metadata" document instead of using the deprecated top-level -"contentType" field. - -Why are aliases deprecated?\ -The "aliases" field of the files collection documents was misleading. It implies that a -file in GridFS could be accessed by alternate names when, in fact, none of the existing implementations offer this -functionality. For GridFS implementations that retrieve stored files by filename or support specifying specific -revisions of a stored file, it is unclear how "aliases" should be interpreted. Users of GridFS who would like to store -alternate filenames for use in their applications are encouraged to add an "aliases" field to the "metadata" document -instead of using the deprecated top-level "aliases" field. - -What happened to the put and get methods from earlier drafts?\ -Upload and download are more idiomatic names that more -clearly indicate their purpose. Get and put are often associated with getting and setting properties of a class, and -using them instead of download and upload was confusing. - -Why aren't there methods to upload and download byte arrays?\ -We assume that GridFS files are usually quite large and -therefore that the GridFS API must support streaming. Most languages have easy ways to wrap a stream around a byte -array. Drivers are free to add helper methods that directly support uploading and downloading GridFS files as byte -arrays. - -Should drivers report an error if a stored file has extra chunks?\ -The length and the chunkSize fields of the files -collection document together imply exactly how many chunks a stored file should have. If the chunks collection has any -extra chunks the stored file is in an inconsistent state. Ideally we would like to report that as an error, but this is -an extremely unlikely state and we don't want to pay a performance penalty checking for an error that is almost never -there. Therefore, drivers MAY ignore extra chunks. - -Why have we changed our mind about requiring the file id to be an ObjectId?\ -This spec originally required the file id -for all new GridFS files to be an ObjectId and specified that the driver itself would be the one to generate the -ObjectId when a new file was uploaded. While this sounded like a good idea, it has since become evident that there are -valid use cases for an application to want to generate its own file id, and that an application wouldn't necessarily -want to use ObjectId as the type of the file id. The most common case where an application would want to use a custom -file id is when the chunks collection is to be sharded and the application wants to use a custom file id that is -suitable for sharding. Accordingly, we have relaxed this spec to allow an application to supply a custom file id (of any -type) when uploading a new file. - -How can we maintain backward compatibility while supporting custom file ids?\ -For most methods supporting custom file -ids is as simple as relaxing the type of the id parameter from ObjectId to something more general like object or BSON -value (or to a type parameter like `` in languages that support generic methods). In a few cases new methods -were added to support custom file ids. 
The original upload_from_stream method returned an ObjectId, and support for -custom file ids is implemented by adding a new method that takes the custom file id as an additional parameter. Drivers -should continue to support the original method if possible to maintain backward compatibility. This spec does not -attempt to completely mandate how each driver should maintain backward compatibility, as different languages have -different approaches and capabilities for maintaining backward compatibility. +Is GridFS limited by sharded systems? + +For best performance, clients using GridFS on a sharded system should use a shard key that ensures all chunks for a +given stored file are routed to the same shard. Therefore, if the chunks collection is sharded, you should shard on the +files_id. Normally only the chunks collection benefits from sharding, since the files collection is usually small. +Otherwise, there are no limitations to GridFS on sharded systems. + +Why is contentType deprecated? + +Most fields in the files collection document are directly used by the driver, with the exception of: metadata, +contentType and aliases. All information that is purely for use of the application should be embedded in the 'metadata' +document. Users of GridFS who would like to store a contentType for use in their applications are encouraged to add a +'contentType' field to the "metadata" document instead of using the deprecated top-level "contentType" field. + +Why are aliases deprecated? + +The "aliases" field of the files collection documents was misleading. It implies that a file in GridFS could be accessed +by alternate names when, in fact, none of the existing implementations offer this functionality. For GridFS +implementations that retrieve stored files by filename or support specifying specific revisions of a stored file, it is +unclear how "aliases" should be interpreted. Users of GridFS who would like to store alternate filenames for use in +their applications are encouraged to add an "aliases" field to the "metadata" document instead of using the deprecated +top-level "aliases" field. + +What happened to the put and get methods from earlier drafts? + +Upload and download are more idiomatic names that more clearly indicate their purpose. Get and put are often associated +with getting and setting properties of a class, and using them instead of download and upload was confusing. + +Why aren't there methods to upload and download byte arrays? + +We assume that GridFS files are usually quite large and therefore that the GridFS API must support streaming. Most +languages have easy ways to wrap a stream around a byte array. Drivers are free to add helper methods that directly +support uploading and downloading GridFS files as byte arrays. + +Should drivers report an error if a stored file has extra chunks? + +The length and the chunkSize fields of the files collection document together imply exactly how many chunks a stored +file should have. If the chunks collection has any extra chunks the stored file is in an inconsistent state. Ideally we +would like to report that as an error, but this is an extremely unlikely state and we don't want to pay a performance +penalty checking for an error that is almost never there. Therefore, drivers MAY ignore extra chunks. + +Why have we changed our mind about requiring the file id to be an ObjectId? 
+ +This spec originally required the file id for all new GridFS files to be an ObjectId and specified that the driver +itself would be the one to generate the ObjectId when a new file was uploaded. While this sounded like a good idea, it +has since become evident that there are valid use cases for an application to want to generate its own file id, and that +an application wouldn't necessarily want to use ObjectId as the type of the file id. The most common case where an +application would want to use a custom file id is when the chunks collection is to be sharded and the application wants +to use a custom file id that is suitable for sharding. Accordingly, we have relaxed this spec to allow an application to +supply a custom file id (of any type) when uploading a new file. + +How can we maintain backward compatibility while supporting custom file ids? + +For most methods supporting custom file ids is as simple as relaxing the type of the id parameter from ObjectId to +something more general like object or BSON value (or to a type parameter like `` in languages that support +generic methods). In a few cases new methods were added to support custom file ids. The original upload_from_stream +method returned an ObjectId, and support for custom file ids is implemented by adding a new method that takes the custom +file id as an additional parameter. Drivers should continue to support the original method if possible to maintain +backward compatibility. This spec does not attempt to completely mandate how each driver should maintain backward +compatibility, as different languages have different approaches and capabilities for maintaining backward compatibility. ## Backwards Compatibility This spec presents a new API for GridFS systems, which may break existing functionality for some drivers. The following are suggestions for ways to mitigate these incompatibilities. -File revisions\ -This document presents a basic API that does not support specifying specific revisions of a stored file, -and an advanced API that does. Drivers MAY choose to implement whichever API is closest to the functionality they now -support. Note that the methods for file insertion are the same whether specifying specific revisions is supported or -not. - -Method names\ -If drivers provide methods that conform to the functionality outlined in this document, drivers MAY -continue to provide those methods under their existing names. In this case, drivers SHOULD make it clear in their -documentation that these methods have equivalents defined in the spec under a different name. - -ContentType field\ -Drivers MAY continue to create a "contentType'" field within files collection documents, so that -applications depending on this field continue to work. However, drivers SHOULD make it clear in their documentation that -this field is deprecated, and is not used at all in driver code. Documentation SHOULD encourage users to store -contentType in the "metadata" document instead. - -Aliases field\ -Drivers MAY continue to create an "aliases" field within files collection documents, so that applications -depending on this field continue to work. However, drivers SHOULD make it clear in their documentation that this field -is deprecated, and is not used at all in driver code. Documentation SHOULD encourage users to store aliases in the +File revisions + +This document presents a basic API that does not support specifying specific revisions of a stored file, and an advanced +API that does. 
Drivers MAY choose to implement whichever API is closest to the functionality they now support. Note that
+the methods for file insertion are the same whether specifying specific revisions is supported or not.
+
+Method names
+
+If drivers provide methods that conform to the functionality outlined in this document, drivers MAY continue to provide
+those methods under their existing names. In this case, drivers SHOULD make it clear in their documentation that these
+methods have equivalents defined in the spec under a different name.
+
+ContentType field
+
+Drivers MAY continue to create a "contentType" field within files collection documents, so that applications depending
+on this field continue to work. However, drivers SHOULD make it clear in their documentation that this field is
+deprecated, and is not used at all in driver code. Documentation SHOULD encourage users to store contentType in the
"metadata" document instead.

+Aliases field
+
+Drivers MAY continue to create an "aliases" field within files collection documents, so that applications depending on
+this field continue to work. However, drivers SHOULD make it clear in their documentation that this field is deprecated,
+and is not used at all in driver code. Documentation SHOULD encourage users to store aliases in the "metadata" document
+instead.
+
## Reference Implementation

TBD
diff --git a/source/index-management/index-management.md b/source/index-management/index-management.md
index 7a19e14e21..667af29530 100644
--- a/source/index-management/index-management.md
+++ b/source/index-management/index-management.md
@@ -20,23 +20,27 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH

#### Terms

-**Collection**\
+**Collection**
+
The term `Collection` references the object in the driver that represents a collection on the server.

-**Cursor**\
+**Cursor**
+
The term `Cursor` references the driver's cursor object.

-**Iterable**\
+**Iterable**
+
The term `Iterable` is to describe an object that is a sequence of elements that can be iterated over.

-**Document**\
+**Document**
+
The term `Document` refers to the implementation in the driver's language of a BSON document.

-**Result**\
-The term `Result` references the object that is normally returned by the driver as the result of a command
-execution. In the case of situations where an actual command is not executed, rather an insert or a query, an object
-that adheres to the same interface must be returned with as much information as possible that could be obtained from the
-operation.
+**Result**
+
+The term `Result` references the object that is normally returned by the driver as the result of a command execution. In
+the case of situations where an actual command is not executed, rather an insert or a query, an object that adheres to
+the same interface must be returned with as much information as possible that could be obtained from the operation.

### Guidance

@@ -1053,12 +1057,12 @@ interface SearchIndexView extends Iterable {

### Q & A

-Q: Where is write concern?\
-The `createIndexes` and `dropIndexes` commands take a write concern that indicates how the
-write is acknowledged. Since all operations defined in this specification are performed on a collection, it's uncommon
-that two different index operations on the same collection would use a different write concern. As such, the most
-natural place to indicate write concern is on the client, the database, or the collection itself and not the operations
-within it.
+Q: Where is write concern? 
+
+The `createIndexes` and `dropIndexes` commands take a write concern that indicates how the write is acknowledged. Since
+all operations defined in this specification are performed on a collection, it's uncommon that two different index
+operations on the same collection would use a different write concern. As such, the most natural place to indicate write
+concern is on the client, the database, or the collection itself and not the operations within it.

However, it might be that a driver needs to expose write concern to a user per operation for various reasons. It is
permitted to allow a write concern option, but since writeConcern is a top-level command option, it MUST NOT be
@@ -1066,11 +1070,12 @@ specified as part of an `IndexModel` passed into the helper. It SHOULD be specif
helper. For example, it would be ambiguous to specify write concern for one or more models passed to `createIndexes()`,
but it would not be to specify it via the `CreateIndexesOptions`.

-Q: What does the commitQuorum option do?\
-Prior to MongoDB 4.4, secondaries would simply replicate index builds once
-they were completed on the primary. Building indexes requires an exclusive lock on the collection being indexed, so the
-secondaries would be blocked from replicating all other operations while the index build took place. This would
-introduce replication lag correlated to however long the index build took.
+Q: What does the commitQuorum option do?
+
+Prior to MongoDB 4.4, secondaries would simply replicate index builds once they were completed on the primary. Building
+indexes requires an exclusive lock on the collection being indexed, so the secondaries would be blocked from replicating
+all other operations while the index build took place. This would introduce replication lag correlated to however long
+the index build took.

Starting in MongoDB 4.4, secondaries build indexes simultaneously with the primary, and after starting an index build,
the primary will wait for a certain number of data-bearing nodes, including itself, to have completed the build before
@@ -1083,11 +1088,12 @@ committing the index.

The server-default value for `commitQuorum` is "votingMembers", which means the primary will wait for all voting
data-bearing nodes to complete building the index before it commits it.

-Q: Why would a user want to specify a non-default `commitQuorum`?\
-Like `w: "majority"`, `commitQuorum: "votingMembers"`
-doesn't consider non-voting data-bearing nodes such as analytics nodes. If a user wanted to ensure these nodes didn't
-lag behind, then they would specify `commitQuorum: `.
-Alternatively, if they wanted to ensure only specific non-voting nodes didn't lag behind, they could specify a
+Q: Why would a user want to specify a non-default `commitQuorum`?
+
+Like `w: "majority"`, `commitQuorum: "votingMembers"` doesn't consider non-voting data-bearing nodes such as analytics
+nodes. If a user wanted to ensure these nodes didn't lag behind, then they would specify
+`commitQuorum: `. Alternatively, if they wanted to
+ensure only specific non-voting nodes didn't lag behind, they could specify a
[custom getLastErrorMode based on the nodes' tag sets](https://www.mongodb.com/docs/manual/reference/replica-configuration/#rsconf.settings.getLastErrorModes)
(e.g. `commitQuorum: `). 
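As an illustration of the option discussed above, the sketch below sends `createIndexes` with a non-default `commitQuorum` through pymongo's generic command helper. The deployment URI, database, collection, and index names are placeholders, not part of this specification; a conforming driver would expose the option through its `CreateIndexesOptions` rather than a hand-built command document.

```python
from pymongo import MongoClient

# Placeholder deployment and namespace, for illustration only (MongoDB 4.4+).
client = MongoClient("mongodb://localhost:27017")
db = client["test"]

# commitQuorum is a top-level option of the createIndexes command, so it sits
# alongside the index models rather than inside any single IndexModel.
reply = db.command(
    {
        "createIndexes": "events",
        "indexes": [{"key": {"timestamp": 1}, "name": "timestamp_1"}],
        # Wait for a majority of voting, data-bearing members instead of the
        # server default of "votingMembers".
        "commitQuorum": "majority",
    }
)
print(reply["ok"])
```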
@@ -1095,11 +1101,12 @@ Additionally, if a user has a high tolerance for replication lag, they can set a useful for situations where certain secondaries take longer to build indexes than the primaries, and the user doesn't care if they lag behind. -Q: What is the difference between write concern and `commitQuorum`?\ -While these two options share a lot in terms of how -they are specified, they configure entirely different things. `commitQuorum` determines how much new replication lag an -index build can tolerably introduce, but it says nothing of durability. Write concern specifies the durability -requirements of an index build, but it makes no guarantees about introducing replication lag. +Q: What is the difference between write concern and `commitQuorum`? + +While these two options share a lot in terms of how they are specified, they configure entirely different things. +`commitQuorum` determines how much new replication lag an index build can tolerably introduce, but it says nothing of +durability. Write concern specifies the durability requirements of an index build, but it makes no guarantees about +introducing replication lag. For instance, an index built with `writeConcern: { w: 1 }, commitQuorum: "votingMembers"` could possibly be rolled back, but it will not introduce any new replication lag. Likewise, an index built with @@ -1110,61 +1117,50 @@ lag. To ensure the index is both durable and will not introduce replication lag Also note that, since indexes are built simultaneously, higher values of `commitQuorum` are not as expensive as higher values of `writeConcern`. -Q: Why does the driver manually throw errors if the `commitQuorum` option is specified against a pre 4.4 -server?\ -Starting in 3.4, the server validates all options passed to the `createIndexes` command, but due to a bug in -versions 4.2.0-4.2.5 of the server (SERVER-47193), specifying `commitQuorum` does not result in an error. The option is -used internally by the server on those versions, and its value could have adverse effects on index builds. To prevent -users from mistakenly specifying this option, drivers manually verify it is only sent to 4.4+ servers. +Q: Why does the driver manually throw errors if the `commitQuorum` option is specified against a pre 4.4 server?
+Starting in 3.4, the server validates all options passed to the `createIndexes` command, but due to a bug in versions +4.2.0-4.2.5 of the server (SERVER-47193), specifying `commitQuorum` does not result in an error. The option is used +internally by the server on those versions, and its value could have adverse effects on index builds. To prevent users +from mistakenly specifying this option, drivers manually verify it is only sent to 4.4+ servers. #### Changelog - 2024-03-05: Migrated from reStructuredText to Markdown. -- 2023-11-08: Clarify that `readConcern` and `writeConcern` must not be\ - applied to search index management commands. +- 2023-11-08: Clarify that `readConcern` and `writeConcern` must not be applied to search index management commands. - 2023-07-27: Add search index management clarifications. - 2023-05-18: Add the search index management API. -- 2023-05-10: Merge index enumeration and index management specs and get rid of references\ - to legacy server versions. +- 2023-05-10: Merge index enumeration and index management specs and get rid of references to legacy server versions. - 2022-10-05: Remove spec front matter and reformat changelog. -- 2022-04-18: Added the `clustered` attribute to `IndexOptions` in order to\ - support clustered collections. +- 2022-04-18: Added the `clustered` attribute to `IndexOptions` in order to support clustered collections. -- 2022-02-10: Specified that `getMore` command must explicitly send inherited\ - comment. +- 2022-02-10: Specified that `getMore` command must explicitly send inherited comment. - 2022-02-01: Added comment field to helper methods. -- 2022-01-19: Require that timeouts be applied per the client-side operations\ - timeout spec. +- 2022-01-19: Require that timeouts be applied per the client-side operations timeout spec. -- 2020-03-30: Added options types to various helpers. Introduced `commitQuorum`\ - option. Added deprecation message for +- 2020-03-30: Added options types to various helpers. Introduced `commitQuorum` option. Added deprecation message for `background` option. -- 2019-04-24: Added `wildcardProjection` attribute to `IndexOptions` in order\ - to support setting a wildcard projection +- 2019-04-24: Added `wildcardProjection` attribute to `IndexOptions` in order to support setting a wildcard projection on a wildcard index. - 2017-06-07: Include listIndexes() in Q&A about maxTimeMS. - 2017-05-31: Add Q & A addressing write concern and maxTimeMS option. -- 2016-10-11: Added note on 3.4 servers validation options passed to\ - `createIndexes`. Add note on server generated name +- 2016-10-11: Added note on 3.4 servers validation options passed to `createIndexes`. Add note on server generated name for the `_id` index. - 2016-08-08: Fixed `collation` language to not mention a collection default. -- 2016-05-19: Added `collation` attribute to `IndexOptions` in order to\ - support setting a collation on an index. +- 2016-05-19: Added `collation` attribute to `IndexOptions` in order to support setting a collation on an index. -- 2015-09-17: Added `partialFilterExpression` attribute to `IndexOptions` in\ - order to support partial indexes. Fixed +- 2015-09-17: Added `partialFilterExpression` attribute to `IndexOptions` in order to support partial indexes. Fixed "provides" typo. 
diff --git a/source/initial-dns-seedlist-discovery/initial-dns-seedlist-discovery.md b/source/initial-dns-seedlist-discovery/initial-dns-seedlist-discovery.md index 4f4c340a31..65e4275b8c 100644 --- a/source/initial-dns-seedlist-discovery/initial-dns-seedlist-discovery.md +++ b/source/initial-dns-seedlist-discovery/initial-dns-seedlist-discovery.md @@ -258,51 +258,41 @@ In the future we could consider using the priority and weight fields of the SRV - 2022-10-05: Revise spec front matter and reformat changelog. -- 2021-10-14: Add `srvMaxHosts` MongoClient option and restructure Seedlist\ - Discovery section. Improve documentation - for the `srvServiceName` MongoClient option and add a new URI Validation section. +- 2021-10-14: Add `srvMaxHosts` MongoClient option and restructure Seedlist Discovery section. Improve documentation for + the `srvServiceName` MongoClient option and add a new URI Validation section. -- 2021-09-15: Clarify that service name only defaults to `mongodb`, and should\ - be defined by the `srvServiceName` URI +- 2021-09-15: Clarify that service name only defaults to `mongodb`, and should be defined by the `srvServiceName` URI option. - 2021-04-15: Adding in behaviour for load balancer mode. - 2019-03-07: Clarify that CNAME is not supported -- 2018-02-08: Clarify that `{options}}` in the [Specification](#specification) section includes\ - all the optional +- 2018-02-08: Clarify that `{options}}` in the [Specification](#specification) section includes all the optional elements from the Connection String specification. -- 2017-11-21: Add clause that using `mongodb+srv://` implies enabling TLS. Add\ - restriction that only `authSource` and +- 2017-11-21: Add clause that using `mongodb+srv://` implies enabling TLS. Add restriction that only `authSource` and `replicaSet` are allows in TXT records. Add restriction that only one TXT record is supported share the same parent domain name as the given host name. -- 2017-11-17: Add new rule that indicates that host names in returned SRV records\ - MUST share the same parent domain - name as the given host name. Remove language and tests for non-ASCII characters. +- 2017-11-17: Add new rule that indicates that host names in returned SRV records MUST share the same parent domain name + as the given host name. Remove language and tests for non-ASCII characters. -- 2017-11-07: Clarified that all parts of listable options such as\ - readPreferenceTags are ignored if they are also +- 2017-11-07: Clarified that all parts of listable options such as readPreferenceTags are ignored if they are also present in options to the MongoClient constructor. Clarified which host names to use for SRV and TXT DNS queries. - 2017-11-01: Clarified that individual TXT records can have multiple strings. -- 2017-10-31: Added a clause that specifying two host names with a\ - `mongodb+srv://` URI is not allowed. Added a few - more test cases. +- 2017-10-31: Added a clause that specifying two host names with a `mongodb+srv://` URI is not allowed. Added a few more + test cases. - 2017-10-18: Removed prohibition of raising DNS related errors when parsing the URI. -- 2017-10-04: Removed from [Future Work](#future-work) the line about multiple MongoS\ - discovery. The current +- 2017-10-04: Removed from [Future Work](#future-work) the line about multiple MongoS discovery. The current specification already allows for it, as multiple host names which are all MongoS servers is already allowed under SDAM. And this specification does not modify SDAM. 
Added support for connection string options through TXT records. -- 2017-09-19: Clarify that host names in `mongodb+srv://` URLs work like normal\ - host specifications. +- 2017-09-19: Clarify that host names in `mongodb+srv://` URLs work like normal host specifications. -- 2017-09-01: Updated test plan with YAML tests, and moved prose tests for URI\ - parsing into invalid-uris.yml in the +- 2017-09-01: Updated test plan with YAML tests, and moved prose tests for URI parsing into invalid-uris.yml in the Connection String Spec tests. diff --git a/source/logging/logging.md b/source/logging/logging.md index f6578f3832..6c260b68dd 100644 --- a/source/logging/logging.md +++ b/source/logging/logging.md @@ -19,13 +19,14 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**Structured logging**\ -Structured logging refers to producing log messages in a structured format, i.e. a series of -key-value pairs, which can be converted to external formats such as JSON. +**Structured logging** -**Unstructured logging**\ -Unstructured logging refers to producing string log messages which embed all attached -information within that string. +Structured logging refers to producing log messages in a structured format, i.e. a series of key-value pairs, which can +be converted to external formats such as JSON. + +**Unstructured logging** + +Unstructured logging refers to producing string log messages which embed all attached information within that string. ### Implementation requirements @@ -279,30 +280,31 @@ of drivers for our internal teams, and improve our documentation around troubles ### Truncation of large documents -1. Why have an option?\ - We considered a number of approaches for dealing with documents of potentially very large size - in log messages, e.g. command documents, including 1) always logging the full document, 2) only logging documents - with the potential to be large when the user opts in, and 3) truncating large documents by default, but allowing the - user to adjust the maximum length logged. We chose the third option as we felt it struck the best balance between - concerns around readability and usability of log messages. In the case where data is sufficiently small, the default - behavior will show the user the full data. In the case where data is large, the user will receive a readable message - with truncated data, but have the option to see more or all of the data. - -2. Why are the units for max document length flexible?\ - String APIs vary across languages, and not all drivers will be - able to easily and efficiently truncate strings in the same exact manner. The important thing is that the option - exists and that its default value is reasonable, and for all possible unit choices (byte, code point, code unit, or - grapheme) we felt 1000 was a reasonable default. See [here](https://exploringjs.com/impatient-js/ch_unicode.html) for - a helpful primer on related Unicode concepts. - -3. Why do we implement naive truncation rather than truncating the JSON so it is still valid?\ - Designing and - implementing a truncation algorithm for JSON that outputs valid JSON, but fits in as much of the original JSON as - possible, would be non-trivial. The server team wrote an entire separate truncation design document when they - implemented this for their log messages. 
This is more of a necessity for the server where the entire log message is - JSON, but we don't know if parsing the documents included in log messages is something that users will actually need - to do. Furthermore, any users who want parseable documents have an escape hatch to do so: they can set the max - document length to a very large value. If we hear of use cases in the future for parsing the documents in log +1. Why have an option? + + We considered a number of approaches for dealing with documents of potentially very large size in log messages, e.g. + command documents, including 1) always logging the full document, 2) only logging documents with the potential to be + large when the user opts in, and 3) truncating large documents by default, but allowing the user to adjust the + maximum length logged. We chose the third option as we felt it struck the best balance between concerns around + readability and usability of log messages. In the case where data is sufficiently small, the default behavior will + show the user the full data. In the case where data is large, the user will receive a readable message with truncated + data, but have the option to see more or all of the data. + +2. Why are the units for max document length flexible? + + String APIs vary across languages, and not all drivers will be able to easily and efficiently truncate strings in the + same exact manner. The important thing is that the option exists and that its default value is reasonable, and for + all possible unit choices (byte, code point, code unit, or grapheme) we felt 1000 was a reasonable default. See + [here](https://exploringjs.com/impatient-js/ch_unicode.html) for a helpful primer on related Unicode concepts. + +3. Why do we implement naive truncation rather than truncating the JSON so it is still valid? + + Designing and implementing a truncation algorithm for JSON that outputs valid JSON, but fits in as much of the + original JSON as possible, would be non-trivial. The server team wrote an entire separate truncation design document + when they implemented this for their log messages. This is more of a necessity for the server where the entire log + message is JSON, but we don't know if parsing the documents included in log messages is something that users will + actually need to do. Furthermore, any users who want parseable documents have an escape hatch to do so: they can set + the max document length to a very large value. If we hear of use cases in the future for parsing the documents in log messages, we could make an additive change to this specification to permit a smarter truncation algorithm. ### Structured versus Unstructured Logging @@ -411,8 +413,7 @@ on individual clients or for particular namespaces. - 2022-12-29: Fix typo in trace log level example -- 2023-01-04: Elaborate on treatment of invalid values of environment variables.\ - Permit drivers to omit direct support +- 2023-01-04: Elaborate on treatment of invalid values of environment variables. Permit drivers to omit direct support for logging to file so long as they provide a straightforward way for users to consume the log messages programmatically and write to a file themselves. Require that programmatic configuration take precedence over environment variables. diff --git a/source/max-staleness/max-staleness.md b/source/max-staleness/max-staleness.md index 1770beb630..8062386330 100644 --- a/source/max-staleness/max-staleness.md +++ b/source/max-staleness/max-staleness.md @@ -485,10 +485,8 @@ client-side setting. 
- 2016-10-24: Rename option from "maxStalenessMS" to "maxStalenessSeconds". -- 2016-10-25: Change minimum maxStalenessSeconds value from 2 \*\ - heartbeatFrequencyMS to heartbeatFrequencyMS + +- 2016-10-25: Change minimum maxStalenessSeconds value from 2 * heartbeatFrequencyMS to heartbeatFrequencyMS + idleWritePeriodMS (with proper conversions of course). -- 2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the\ - future, require maxStalenessSeconds to +- 2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the future, require maxStalenessSeconds to be at least 90. diff --git a/source/mongodb-handshake/handshake.md b/source/mongodb-handshake/handshake.md index e540ccc430..0ec83a53e9 100644 --- a/source/mongodb-handshake/handshake.md +++ b/source/mongodb-handshake/handshake.md @@ -27,18 +27,19 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ## Terms -**hello command**\ -The command named `hello`. It is the preferred and modern command for handshakes and topology -monitoring. - -**legacy hello command**\ -The command named `isMaster`. It is the deprecated equivalent of the `hello` command. It was -deprecated in MongoDB 5.0. - -**isMaster / ismaster**\ -The correct casing is `isMaster`, but servers will accept the alternate casing `ismaster`. -Other case variations result in `CommandNotFound`. Drivers MUST take this case variation into account when determining -which commands to encrypt, redact, or otherwise treat specially. +**hello command** + +The command named `hello`. It is the preferred and modern command for handshakes and topology monitoring. + +**legacy hello command** + +The command named `isMaster`. It is the deprecated equivalent of the `hello` command. It was deprecated in MongoDB 5.0. + +**isMaster / ismaster** + +The correct casing is `isMaster`, but servers will accept the alternate casing `ismaster`. Other case variations result +in `CommandNotFound`. Drivers MUST take this case variation into account when determining which commands to encrypt, +redact, or otherwise treat specially. ## Specification diff --git a/source/objectid.md b/source/objectid.md index 31da8c5357..34da2d7234 100644 --- a/source/objectid.md +++ b/source/objectid.md @@ -126,14 +126,11 @@ Currently there is no full reference implementation yet. - 2022-10-05: Remove spec front matter and reformat changelog. -- 2019-01-14: Clarify that the random numbers don't need to be cryptographically\ - secure. Add a test to test that the +- 2019-01-14: Clarify that the random numbers don't need to be cryptographically secure. Add a test to test that the unique value is different in forked processes. -- 2018-10-11: Clarify that the *Timestamp* and *Counter* fields are big endian,\ - and add the reason why. +- 2018-10-11: Clarify that the *Timestamp* and *Counter* fields are big endian, and add the reason why. 
-- 2018-07-02: Replaced Machine ID and Process ID fields with a single 5-byte\ - unique value +- 2018-07-02: Replaced Machine ID and Process ID fields with a single 5-byte unique value - 2018-05-22: Initial Release diff --git a/source/ocsp-support/ocsp-support.md b/source/ocsp-support/ocsp-support.md index 759d9662c1..8629600823 100644 --- a/source/ocsp-support/ocsp-support.md +++ b/source/ocsp-support/ocsp-support.md @@ -468,8 +468,7 @@ find ~/profile/Library/Keychains -name 'ocspcache.sqlite3' \ -exec sqlite3 "{}" 'DELETE FROM responses ;' \; ``` -To delete only "Let's Encrypt" related entries, the following command\ -could be used: +To delete only "Let's Encrypt" related entries, the following command could be used: ```bash find ~/profile/Library/Keychains -name 'ocspcache.sqlite3' \ @@ -543,16 +542,13 @@ library has contacted the OCSP endpoint specified in the server's certificate. T - 2021-04-07: Updated terminology to use allowList. -- 2020-07-01: Default tlsDisableOCSPEndpointCheck or\ - tlsDisableCertificateRevocationCheck to true in the case that a +- 2020-07-01: Default tlsDisableOCSPEndpointCheck or tlsDisableCertificateRevocationCheck to true in the case that a driver's TLS library exhibits hard-fail behavior and add provision for platform-specific defaults. -- 2020-03-20: Clarify OCSP documentation requirements for drivers unable to\ - enable OCSP by default on a per MongoClient +- 2020-03-20: Clarify OCSP documentation requirements for drivers unable to enable OCSP by default on a per MongoClient basis. -- 2020-03-03: Add tlsDisableCertificateRevocationCheck URI option. Add Go as a\ - reference implementation. Add hard-fail +- 2020-03-03: Add tlsDisableCertificateRevocationCheck URI option. Add Go as a reference implementation. Add hard-fail backwards compatibility documentation requirements. - 2020-02-26: Add tlsDisableOCSPEndpointCheck URI option. @@ -561,8 +557,7 @@ library has contacted the OCSP endpoint specified in the server's certificate. T - 2020-02-10: Add cache requirement. -- 2020-01-31: Add SNI requirement and clarify design rationale regarding\ - minimizing round trips. +- 2020-01-31: Add SNI requirement and clarify design rationale regarding minimizing round trips. - 2020-01-28: Clarify behavior regarding nonces and tolerance periods. diff --git a/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.md b/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.md index 974f7ba3ff..d1c2326b78 100644 --- a/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.md +++ b/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.md @@ -171,6 +171,5 @@ No future work is expected. - 2021-10-14: Specify behavior for `srvMaxHosts` MongoClient option. -- 2021-09-15: Clarify that service name only defaults to `mongodb`, and should\ - be defined by the `srvServiceName` URI +- 2021-09-15: Clarify that service name only defaults to `mongodb`, and should be defined by the `srvServiceName` URI option. diff --git a/source/retryable-reads/retryable-reads.md b/source/retryable-reads/retryable-reads.md index 634c7f7f58..e301aeb6e9 100644 --- a/source/retryable-reads/retryable-reads.md +++ b/source/retryable-reads/retryable-reads.md @@ -548,18 +548,14 @@ any customers experiencing degraded performance can simply disable `retryableRea - 2024-04-30: Migrated from reStructuredText to Markdown. 
-- 2023-12-05: Add that any server information associated with retryable\
-  exceptions MUST reflect the originating server,
+- 2023-12-05: Add that any server information associated with retryable exceptions MUST reflect the originating server,
  even in the presence of retries.

-- 2023-11-30: Add ReadConcernMajorityNotAvailableYet to the list of error codes\
-  that should be retried.
+- 2023-11-30: Add ReadConcernMajorityNotAvailableYet to the list of error codes that should be retried.

-- 2023-11-28: Add ExceededTimeLimit to the list of error codes that should\
-  be retried.
+- 2023-11-28: Add ExceededTimeLimit to the list of error codes that should be retried.

-- 2023-08-26: Require that in a sharded cluster the server on which the\
-  operation failed MUST be provided to the server
+- 2023-08-26: Require that in a sharded cluster the server on which the operation failed MUST be provided to the server
  selection mechanism as a deprioritized server.

- 2023-08-21: Update Q&A that contradicts SDAM transient error logic

@@ -572,8 +568,7 @@ any customers experiencing degraded performance can simply disable `retryableRea

- 2022-01-25: Note that drivers should retry handshake network failures.

-- 2021-04-26: Replaced deprecated terminology; removed requirement to parse error\
-  message text as MongoDB 3.6+ servers
+- 2021-04-26: Replaced deprecated terminology; removed requirement to parse error message text as MongoDB 3.6+ servers
  will always return an error code

- 2021-03-23: Require that PoolClearedErrors are retried
diff --git a/source/retryable-reads/tests/README.md b/source/retryable-reads/tests/README.md
index 2f1e18fad0..aad7f25539 100644
--- a/source/retryable-reads/tests/README.md
+++ b/source/retryable-reads/tests/README.md
@@ -129,9 +129,8 @@ This test MUST be executed against a sharded cluster that supports `retryReads=t

- 2024-03-06: Convert legacy retryable reads tests to unified format.

-- 2024-02-21: Update mongos redirection prose tests to workaround SDAM behavior\
-  preventing execution of
-  deprioritization code paths.
+- 2024-02-21: Update mongos redirection prose tests to workaround SDAM behavior preventing execution of deprioritization
+  code paths.

- 2023-08-26: Add prose tests for retrying in a sharded cluster.

@@ -141,8 +140,7 @@ This test MUST be executed against a sharded cluster that supports `retryReads=t

- 2021-08-27: Clarify behavior of `useMultipleMongoses` for `LoadBalanced` topologies.

-- 2019-03-19: Add top-level `runOn` field to denote server version and/or\
-  topology requirements requirements for the
+- 2019-03-19: Add top-level `runOn` field to denote server version and/or topology requirements for the
  test file. Removes the `minServerVersion` and `topology` top-level fields, which are now expressed within `runOn`
  elements.
diff --git a/source/retryable-writes/retryable-writes.md b/source/retryable-writes/retryable-writes.md
index 1adfe21e18..9856a23956 100644
--- a/source/retryable-writes/retryable-writes.md
+++ b/source/retryable-writes/retryable-writes.md
@@ -28,22 +28,25 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH

### Terms

-**Transaction ID**\
-The transaction ID identifies the transaction as part of which the command is running. In a write
-command where the client has requested retryable behavior, it is expressed by the top-level `lsid` and `txnNumber`
-fields. The `lsid` component is the corresponding server session ID. 
which is a BSON value defined in the
+**Transaction ID**
+
+The transaction ID identifies the transaction as part of which the command is running. In a write command where the
+client has requested retryable behavior, it is expressed by the top-level `lsid` and `txnNumber` fields. The `lsid`
+component is the corresponding server session ID, which is a BSON value defined in the
[Driver Session](../sessions/driver-sessions.md) specification. The `txnNumber` component is a monotonically increasing
(per server session), positive 64-bit integer.

-**ClientSession**\
-Driver object representing a client session, which is defined in the
-[Driver Session](../sessions/driver-sessions.md) specification. This object is always associated with a server session;
-however, drivers will pool server sessions so that creating a ClientSession will not always entail creation of a new
-server session. The name of this object MAY vary across drivers.
+**ClientSession**
+
+Driver object representing a client session, which is defined in the [Driver Session](../sessions/driver-sessions.md)
+specification. This object is always associated with a server session; however, drivers will pool server sessions so
+that creating a ClientSession will not always entail creation of a new server session. The name of this object MAY vary
+across drivers.
+
+**Retryable Error**

-**Retryable Error**\
-An error is considered retryable if it has a RetryableWriteError label in its top-level
-"errorLabels" field. See [Determining Retryable Errors](#determining-retryable-errors) for more information.
+An error is considered retryable if it has a RetryableWriteError label in its top-level "errorLabels" field. See
+[Determining Retryable Errors](#determining-retryable-errors) for more information.

Additional terms may be defined in the [Driver Session](../sessions/driver-sessions.md) specification.

@@ -680,25 +683,21 @@ retryWrites is not true would be inconsistent with the server and potentially co

- 2024-04-29: Fix the link to the Driver Sessions spec.

-- 2024-01-16: Do not use `writeConcernError.code` in pre-4.4 mongos response to\
-  determine retryability. Do not use
+- 2024-01-16: Do not use `writeConcernError.code` in pre-4.4 mongos response to determine retryability. Do not use
  `writeErrors[].code` in pre-4.4 server responses to determine retryability.

- 2023-12-06: Clarify that writes are not retried within transactions.

-- 2023-12-05: Add that any server information associated with retryable\
-  exceptions MUST reflect the originating server,
+- 2023-12-05: Add that any server information associated with retryable exceptions MUST reflect the originating server,
  even in the presence of retries.

- 2023-10-02: When CSOT is not enabled, one retry attempt occurs.

-- 2023-08-26: Require that in a sharded cluster the server on which the\
-  operation failed MUST be provided to the server
+- 2023-08-26: Require that in a sharded cluster the server on which the operation failed MUST be provided to the server
  selection mechanism as a deprioritized server.

-- 2022-11-17: Add logic for persisting "currentError" as "previousError" on first\
-  retry attempt, avoiding raising
-  "null" errors.
+- 2022-11-17: Add logic for persisting "currentError" as "previousError" on first retry attempt, avoiding raising "null"
+  errors.

- 2022-11-09: CLAM must apply both events and log messages. 
@@ -708,32 +707,26 @@ retryWrites is not true would be inconsistent with the server and potentially co

- 2022-01-25: Note that drivers should retry handshake network failures.

-- 2021-11-02: Clarify that error labels are only specified in a top-level field\
-  of an error.
+- 2021-11-02: Clarify that error labels are only specified in a top-level field of an error.

- 2021-04-26: Replaced deprecated terminology

- 2021-03-24: Require that PoolClearedErrors be retried

-- 2020-09-01: State the the driver should only add the RetryableWriteError label\
-  to network errors when connected to a
+- 2020-09-01: State that the driver should only add the RetryableWriteError label to network errors when connected to a
  4.4+ server.

-- 2020-02-25: State that the driver should only add the RetryableWriteError label\
-  when retryWrites is on, and make it
+- 2020-02-25: State that the driver should only add the RetryableWriteError label when retryWrites is on, and make it
  clear that mongos will sometimes perform internal retries and not return the RetryableWriteError label.

- 2020-02-10: Remove redundant content in Tests section.

-- 2020-01-14: Add ExceededTimeLimit to the list of error codes that should\
-  receive a RetryableWriteError label.
+- 2020-01-14: Add ExceededTimeLimit to the list of error codes that should receive a RetryableWriteError label.

-- 2019-10-21: Change the definition of "retryable write" to be based on the\
-  RetryableWriteError label. Stop requiring
+- 2019-10-21: Change the definition of "retryable write" to be based on the RetryableWriteError label. Stop requiring
  drivers to parse errmsg to categorize retryable errors for pre-4.4 servers.

-- 2019-07-30: Drivers must rewrite error messages for error code 20 when\
-  txnNumber is not supported by the storage
+- 2019-07-30: Drivers must rewrite error messages for error code 20 when txnNumber is not supported by the storage
  engine.

- 2019-06-07: Mention `$merge` stage for aggregate alongside `$out`

@@ -742,9 +735,7 @@ retryWrites is not true would be inconsistent with the server and potentially co

- 2019-03-06: retryWrites now defaults to true.

-- 2019-03-05: Prohibit resending wire protocol messages if doing so would violate\
-  rules for gossipping the cluster
-  time.
+- 2019-03-05: Prohibit resending wire protocol messages if doing so would violate rules for gossipping the cluster time.

- 2018-06-07: WriteConcernFailed is not a retryable error code.

- 2018-03-14: Clarify that retryable writes may fail with a FCV 3.4 shard.

-- 2017-11-02: Drivers should not raise errors if selected server does not support\
-  retryable writes and instead fall
-  back to non-retryable behavior. In addition to wire protocol version, drivers may check for
-  `logicalSessionTimeoutMinutes` to determine if a server supports sessions and retryable writes.
+- 2017-11-02: Drivers should not raise errors if selected server does not support retryable writes and instead fall back
+  to non-retryable behavior. In addition to wire protocol version, drivers may check for `logicalSessionTimeoutMinutes`
+  to determine if a server supports sessions and retryable writes.

-- 2017-10-26: Errors when retrying may be raised instead of the original error\
-  provided they allow the user to infer
+- 2017-10-26: Errors when retrying may be raised instead of the original error provided they allow the user to infer
  that an attempt was made. 
- 2017-10-23: Drivers must document operations that support retryability.

-- 2017-10-23: Raise the original retryable error if server selection or wire\
-  protocol checks fail during the retry
+- 2017-10-23: Raise the original retryable error if server selection or wire protocol checks fail during the retry
  attempt. Encourage drivers to provide intermediary write results after an unrecoverable failure during a bulk write.

- 2017-10-18: Standalone servers do not support retryable writes.

- 2017-10-18: Also retry writes after a "not writable primary" error.

-- 2017-10-08: Renamed `txnNum` to `txnNumber` and noted that it must be a\
-  64-bit integer (BSON type 0x12).
+- 2017-10-08: Renamed `txnNum` to `txnNumber` and noted that it must be a 64-bit integer (BSON type 0x12).

-- 2017-08-25: Drivers will maintain an allow list so that only supported write\
-  operations may be retried. Transaction
+- 2017-08-25: Drivers will maintain an allow list so that only supported write operations may be retried. Transaction
  IDs will not be included in unsupported write commands, irrespective of the `retryWrites` option.

- 2017-08-18: `retryWrites` is now a MongoClient option.
diff --git a/source/retryable-writes/tests/README.md b/source/retryable-writes/tests/README.md
index e883ca368d..48d805f836 100644
--- a/source/retryable-writes/tests/README.md
+++ b/source/retryable-writes/tests/README.md
@@ -301,40 +301,33 @@ debugger, code coverage tool, etc.

- 2024-02-27: Convert legacy retryable writes tests to unified format.

-- 2024-02-21: Update prose test 4 and 5 to workaround SDAM behavior preventing\
-  execution of deprioritization code
-  paths.
+- 2024-02-21: Update prose test 4 and 5 to workaround SDAM behavior preventing execution of deprioritization code paths.

- 2024-01-05: Fix typo in prose test title.

-- 2024-01-03: Note server version requirements for fail point options and revise\
-  tests to specify the `errorLabels`
+- 2024-01-03: Note server version requirements for fail point options and revise tests to specify the `errorLabels`
  option at the top-level instead of within `writeConcernError`.

- 2023-08-26: Add prose tests for retrying in a sharded cluster.

-- 2022-08-30: Add prose test verifying correct error handling for errors with\
-  the NoWritesPerformed label, which is to
+- 2022-08-30: Add prose test verifying correct error handling for errors with the NoWritesPerformed label, which is to
  return the original error.

- 2022-04-22: Clarifications to `serverless` and `useMultipleMongoses`.

-- 2021-08-27: Add `serverless` to `runOn`. Clarify behavior of\
-  `useMultipleMongoses` for `LoadBalanced` topologies.
+- 2021-08-27: Add `serverless` to `runOn`. Clarify behavior of `useMultipleMongoses` for `LoadBalanced` topologies.

- 2021-04-23: Add `load-balanced` to test topology requirements.

- 2021-03-24: Add prose test verifying `PoolClearedErrors` are retried.

-- 2019-10-21: Add `errorLabelsContain` and `errorLabelsContain` fields to\
-  `result`
+- 2019-10-21: Add `errorLabelsContain` and `errorLabelsContain` fields to `result`

- 2019-08-07: Add Prose Tests section

- 2019-06-07: Mention $merge stage for aggregate alongside $out

-- 2019-03-01: Add top-level `runOn` field to denote server version and/or\
-  topology requirements requirements for the
+- 2019-03-01: Add top-level `runOn` field to denote server version and/or topology requirements for the
  test file. Removes the `minServerVersion` and `maxServerVersion` top-level fields, which are now expressed within
  `runOn` elements. 
diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md index b542844894..c6fe6b3f94 100644 --- a/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring-tests.md @@ -2,7 +2,7 @@ - Status: Accepted -\- Minimum Server Version: 2.4 See also the YAML test files and their accompanying README in the "tests" directory. +- Minimum Server Version: 2.4 See also the YAML test files and their accompanying README in the "tests" directory. ______________________________________________________________________ diff --git a/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md b/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md index be7315c39f..cdbece9bad 100644 --- a/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md +++ b/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md @@ -663,7 +663,8 @@ ServerType and a TopologyType intersect, the table shows what action the client This subsection complements the [TopologyType table](#topologytype-table) with prose explanations of the TopologyTypes (besides Single and LoadBalanced). -TopologyType Unknown\ +**TopologyType Unknown** + A starting state. **Actions**: @@ -679,7 +680,8 @@ A starting state. - If the type is RSSecondary, RSArbiter or RSOther, record its setName, set the TopologyType to ReplicaSetNoPrimary, and call [updateRSWithoutPrimary](#updaterswithoutprimary). -TopologyType Sharded\ +**TopologyType Sharded** + A steady state. Connected to one or more mongoses. **Actions**: @@ -687,7 +689,8 @@ A steady state. Connected to one or more mongoses. - If the server is Unknown or Mongos, keep it. - Remove others. -TopologyType ReplicaSetNoPrimary\ +**TopologyType ReplicaSetNoPrimary** + A starting state. The topology is definitely a replica set, but no primary is known. **Actions**: @@ -699,7 +702,8 @@ A starting state. The topology is definitely a replica set, but no primary is kn - If the type is RSPrimary call [updateRSFromPrimary](#updatersfromprimary). - If the type is RSSecondary, RSArbiter or RSOther, run [updateRSWithoutPrimary](#updaterswithoutprimary). -TopologyType ReplicaSetWithPrimary\ +**TopologyType ReplicaSetWithPrimary** + A steady state. The primary is known. **Actions**: @@ -1902,33 +1906,27 @@ oversaw the specification process. - 2016-05-04: Added link to SDAM monitoring. -- 2016-07-18: Replace mentions of the "Read Preferences Spec" with "Server\ - Selection Spec", and +- 2016-07-18: Replace mentions of the "Read Preferences Spec" with "Server Selection Spec", and "secondaryAcceptableLatencyMS" with "localThresholdMS". - 2016-07-21: Updated for Max Staleness support. - 2016-08-04: Explain better why clients use the hostnames in RS config, not URI. -- 2016-08-31: Multi-threaded clients SHOULD use hello or legacy hello replies to\ - update the topology when they - handshake application connections. +- 2016-08-31: Multi-threaded clients SHOULD use hello or legacy hello replies to update the topology when they handshake + application connections. -- 2016-10-06: In updateRSWithoutPrimary the hello or legacy hello response's\ - "primary" field should be used to update +- 2016-10-06: In updateRSWithoutPrimary the hello or legacy hello response's "primary" field should be used to update the topology description, even if address != me. 
- 2016-10-29: Allow for idleWritePeriodMS to change someday. -- 2016-11-01: "Unknown" is no longer the default TopologyType, the default is now\ - explicitly unspecified. Update +- 2016-11-01: "Unknown" is no longer the default TopologyType, the default is now explicitly unspecified. Update instructions for setting the initial TopologyType when running the spec tests. -- 2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the\ - future. +- 2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the future. -- 2017-02-28: Update "network error when reading or writing": timeout while\ - connecting does mark a server Unknown, +- 2017-02-28: Update "network error when reading or writing": timeout while connecting does mark a server Unknown, unlike a timeout while reading or writing. Justify the different behaviors, and also remove obsolete reference to auto-retry. @@ -1944,32 +1942,27 @@ oversaw the specification process. - 2019-05-29: Renamed InterruptedDueToStepDown to InterruptedDueToReplStateChange -- 2020-02-13: Drivers must run SDAM flow even when server description is equal to\ - the last one. +- 2020-02-13: Drivers must run SDAM flow even when server description is equal to the last one. -- 2020-03-31: Add topologyVersion to ServerDescription. Add rules for ignoring\ - stale application errors. +- 2020-03-31: Add topologyVersion to ServerDescription. Add rules for ignoring stale application errors. - 2020-05-07: Include error field in ServerDescription equality comparison. - 2020-06-08: Clarify reasoning behind how SDAM determines if a topologyVersion is stale. -- 2020-12-17: Mark the pool for a server as "ready" after performing a successful\ - check. Synchronize pool clearing with +- 2020-12-17: Mark the pool for a server as "ready" after performing a successful check. Synchronize pool clearing with SDAM updates. - 2021-01-17: Require clients to compare (electionId, setVersion) tuples. -- 2021-02-11: Errors encountered during auth are handled by SDAM. Auth errors\ - mark the server Unknown and clear the +- 2021-02-11: Errors encountered during auth are handled by SDAM. Auth errors mark the server Unknown and clear the pool. - 2021-04-12: Adding in behaviour for load balancer mode. - 2021-05-03: Require parsing "isWritablePrimary" field in responses. -- 2021-06-09: Connection pools must be created and eventually marked ready for\ - any server if a direct connection is +- 2021-06-09: Connection pools must be created and eventually marked ready for any server if a direct connection is used. - 2021-06-29: Updated to use modern terminology. diff --git a/source/server-selection/server-selection-tests.md b/source/server-selection/server-selection-tests.md index 3aaa2cb73d..def273f726 100644 --- a/source/server-selection/server-selection-tests.md +++ b/source/server-selection/server-selection-tests.md @@ -236,8 +236,7 @@ Multi-threaded and async drivers MUST also implement the following prose test: 8. Start 10 concurrent threads / tasks that each run 100 `findOne` operations with empty filters using that client. -9. Using command monitoring events, assert that each mongos was selected\ - roughly 50% of the time (within +/- 10%). +9. Using command monitoring events, assert that each mongos was selected roughly 50% of the time (within +/- 10%). 
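The distribution assertion in step 9 can be implemented with a command listener that tallies `find` commands by the
mongos each one was sent to. The sketch below is illustrative only and not part of the test format; it assumes
PyMongo-style monitoring APIs and placeholder mongos addresses:

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

from pymongo import MongoClient, monitoring


class SelectionCounter(monitoring.CommandListener):
    """Tallies command-started events by the (host, port) of the selected mongos."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = Counter()

    def started(self, event):
        if event.command_name == "find":
            with self._lock:
                self.counts[event.connection_id] += 1

    def succeeded(self, event):
        pass

    def failed(self, event):
        pass


listener = SelectionCounter()
# The two mongos addresses are placeholders for the deployment described above.
client = MongoClient("mongodb://mongos1:27017,mongos2:27017", event_listeners=[listener])
collection = client.test.coll


def run_find_ones():
    for _ in range(100):
        collection.find_one({})


# 10 concurrent tasks x 100 findOne operations = 1000 selections in total.
with ThreadPoolExecutor(max_workers=10) as executor:
    for _ in range(10):
        executor.submit(run_find_ones)

total = sum(listener.counts.values())
for address, count in listener.counts.items():
    share = count / total
    # Each mongos should have been selected roughly 50% of the time (+/- 10%).
    assert 0.40 <= share <= 0.60, (address, share)
```

Counting `find` command-started events per connection address follows the spec's instruction to use command monitoring
events rather than inspecting driver internals.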
## Application-Provided Server Selector diff --git a/source/server-selection/server-selection.md b/source/server-selection/server-selection.md index 8c87423a20..13ba821e25 100644 --- a/source/server-selection/server-selection.md +++ b/source/server-selection/server-selection.md @@ -56,115 +56,141 @@ This specification does not apply to commands issued for server monitoring or au ### Terms -**Available**\ -Describes a server that is believed to be reachable over the network and able to respond to requests. A -server of type Unknown or PossiblePrimary is not available; other types are available. +**Available** + +Describes a server that is believed to be reachable over the network and able to respond to requests. A server of type +Unknown or PossiblePrimary is not available; other types are available. + +**Client** -**Client**\ Software that communicates with a MongoDB deployment. This includes both drivers and mongos. -**Candidate**\ -Describes servers in a deployment that enter the selection process, determined by the read preference -`mode` parameter and the servers' type. Depending on the `mode`, candidate servers might only include secondaries or -might apply to all servers in the deployment. +**Candidate** + +Describes servers in a deployment that enter the selection process, determined by the read preference `mode` parameter +and the servers' type. Depending on the `mode`, candidate servers might only include secondaries or might apply to all +servers in the deployment. + +**Deployment** -**Deployment**\ One or more servers that collectively provide access to a single logical set of MongoDB databases. -**Command**\ +**Command** + An OP_QUERY operation targeting the '$cmd' collection namespace. -**Direct connection**\ -A driver connection mode that sends all database operations to a single server without regard for -type. +**Direct connection** + +A driver connection mode that sends all database operations to a single server without regard for type. -**Eligible**\ -Describes candidate servers that also meet the criteria specified by the `tag_sets` and -`maxStalenessSeconds` read preference parameters. +**Eligible** + +Describes candidate servers that also meet the criteria specified by the `tag_sets` and `maxStalenessSeconds` read +preference parameters. + +**Hedged Read** -**Hedged Read**\ A server mode in which the same query is dispatched in parallel to multiple replica set members. -**Immediate topology check**\ -For a multi-threaded or asynchronous client, this means waking all server monitors for an -immediate check. For a single-threaded client, this means a (blocking) scan of all servers. +**Immediate topology check** + +For a multi-threaded or asynchronous client, this means waking all server monitors for an immediate check. For a +single-threaded client, this means a (blocking) scan of all servers. -**Latency window**\ -When choosing between several suitable servers, the latency window is the range of acceptable RTTs -from the shortest RTT to the shortest RTT plus the local threshold. E.g. if the shortest RTT is 15ms and the local -threshold is 200ms, then the latency window ranges from 15ms - 215ms. +**Latency window** -**Local threshold**\ -The maximum acceptable difference in milliseconds between the shortest RTT and the longest RTT of -servers suitable to be selected. +When choosing between several suitable servers, the latency window is the range of acceptable RTTs from the shortest RTT +to the shortest RTT plus the local threshold. E.g. 
if the shortest RTT is 15ms and the local threshold is 200ms, then +the latency window ranges from 15ms - 215ms. -**Mode**\ -One of several enumerated values used as part of a read preference, defining which server types are candidates -for reads and the semantics for choosing a specific one. +**Local threshold** + +The maximum acceptable difference in milliseconds between the shortest RTT and the longest RTT of servers suitable to be +selected. + +**Mode** + +One of several enumerated values used as part of a read preference, defining which server types are candidates for reads +and the semantics for choosing a specific one. + +**Primary** -**Primary**\ Describes a server of type RSPrimary. -**Query**\ +**Query** + An OP_QUERY operation targeting a regular (non '$cmd') collection namespace. -**Read preference**\ -The parameters describing which servers in a deployment can receive read operations, including -`mode`, `tag_sets`, `maxStalenessSeconds`, and `hedge`. +**Read preference** + +The parameters describing which servers in a deployment can receive read operations, including `mode`, `tag_sets`, +`maxStalenessSeconds`, and `hedge`. + +**RS** -**RS**\ Abbreviation for "replica set". -**RTT**\ +**RTT** + Abbreviation for "round trip time". -**Round trip time**\ -The time in milliseconds to execute a `hello` or legacy hello command and receive a response for a -given server. This spec differentiates between the RTT of a single `hello` or legacy hello command and a server's -*average* RTT over several such commands. +**Round trip time** + +The time in milliseconds to execute a `hello` or legacy hello command and receive a response for a given server. This +spec differentiates between the RTT of a single `hello` or legacy hello command and a server's *average* RTT over +several such commands. + +**Secondary** -**Secondary**\ A server of type RSSecondary. -**Staleness**\ +**Staleness** + A worst-case estimate of how far a secondary's replication lags behind the primary's last write. -**Server**\ +**Server** + A mongod or mongos process. -**Server selection**\ -The process by which a server is chosen for a database operation out of all potential servers in a -deployment. +**Server selection** + +The process by which a server is chosen for a database operation out of all potential servers in a deployment. + +**Server type** -**Server type**\ -An enumerated type indicating whether a server is up or down, whether it is a mongod or mongos, whether -it belongs to a replica set and, if so, what role it serves in the replica set. See the +An enumerated type indicating whether a server is up or down, whether it is a mongod or mongos, whether it belongs to a +replica set and, if so, what role it serves in the replica set. See the [Server Discovery and Monitoring](https://github.com/mongodb/specifications/tree/master/source/server-discovery-and-monitoring) spec for more details. -**Suitable**\ +**Suitable** + Describes a server that meets all specified criteria for a read or write operation. -**Tag**\ -A single key/value pair describing either (1) a user-specified characteristic of a replica set member or (2) a -desired characteristic for the target of a read operation. The key and value have no semantic meaning to the driver; -they are arbitrary user choices. +**Tag** + +A single key/value pair describing either (1) a user-specified characteristic of a replica set member or (2) a desired +characteristic for the target of a read operation. 
The key and value have no semantic meaning to the driver; they are +arbitrary user choices. + +**Tag set** -**Tag set**\ A document of zero or more tags. Each member of a replica set can be configured with zero or one tag set. -**Tag set list**\ -A list of zero or more tag sets. A read preference might have a tag set list used for selecting -servers. +**Tag set list** + +A list of zero or more tag sets. A read preference might have a tag set list used for selecting servers. + +**Topology** -**Topology**\ The state of a deployment, including its type, which servers are members, and the server types of members. -**Topology type**\ -An enumerated type indicating the semantics for monitoring servers and selecting servers for database -operations. See the +**Topology type** + +An enumerated type indicating the semantics for monitoring servers and selecting servers for database operations. See +the [Server Discovery and Monitoring](https://github.com/mongodb/specifications/tree/master/source/server-discovery-and-monitoring) spec for more details. @@ -308,22 +334,27 @@ are described elsewhere. Clients MUST support these modes: -**primary**\ +**primary** + Only an available primary is suitable. -**secondary**\ -All secondaries (and *only* secondaries) are candidates, but only [eligible](#eligible) candidates (i.e. -after applying `tag_sets` and `maxStalenessSeconds`) are suitable. +**secondary** + +All secondaries (and *only* secondaries) are candidates, but only [eligible](#eligible) candidates (i.e. after applying +`tag_sets` and `maxStalenessSeconds`) are suitable. + +**primaryPreferred** + +If a primary is available, only the primary is suitable. Otherwise, all secondaries are candidates, but only eligible +secondaries are suitable. + +**secondaryPreferred** -**primaryPreferred**\ -If a primary is available, only the primary is suitable. Otherwise, all secondaries are -candidates, but only eligible secondaries are suitable. +All secondaries are candidates. If there is at least one eligible secondary, only eligible secondaries are suitable. +Otherwise, when there are no eligible secondaries, the primary is suitable. -**secondaryPreferred**\ -All secondaries are candidates. If there is at least one eligible secondary, only eligible -secondaries are suitable. Otherwise, when there are no eligible secondaries, the primary is suitable. +**nearest** -**nearest**\ The primary and all secondaries are candidates, but only eligible candidates are suitable. *Note on other server types*: The @@ -1582,49 +1613,40 @@ maxStalenessSeconds first, then tag_sets, and select Node 2. - 2015-06-26: Updated single-threaded selection logic with "stale" and serverSelectionTryOnce. -- 2015-08-10: Updated single-threaded selection logic to ensure a scan always\ - happens at least once under +- 2015-08-10: Updated single-threaded selection logic to ensure a scan always happens at least once under serverSelectionTryOnce if selection fails. Removed the general selection algorithm and put full algorithms for each of the single- and multi-threaded sections. Added a requirement that single-threaded drivers document selection time expectations. - 2016-07-21: Updated for Max Staleness support. -- 2016-08-03: Clarify selection algorithm, in particular that maxStalenessMS\ - comes before tag_sets. +- 2016-08-03: Clarify selection algorithm, in particular that maxStalenessMS comes before tag_sets. - 2016-10-24: Rename option from "maxStalenessMS" to "maxStalenessSeconds". 
-- 2016-10-25: Change minimum maxStalenessSeconds value from 2 \*\ - heartbeatFrequencyMS to heartbeatFrequencyMS + +- 2016-10-25: Change minimum maxStalenessSeconds value from 2 * heartbeatFrequencyMS to heartbeatFrequencyMS + idleWritePeriodMS (with proper conversions of course). -- 2016-11-01: Update formula for secondary staleness estimate with the\ - equivalent, and clearer, expression of this +- 2016-11-01: Update formula for secondary staleness estimate with the equivalent, and clearer, expression of this formula from the Max Staleness Spec -- 2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the\ - future, require maxStalenessSeconds to +- 2016-11-21: Revert changes that would allow idleWritePeriodMS to change in the future, require maxStalenessSeconds to be at least 90. -- 2017-06-07: Clarify socketCheckIntervalMS behavior, single-threaded drivers\ - must retry selection after checking an +- 2017-06-07: Clarify socketCheckIntervalMS behavior, single-threaded drivers must retry selection after checking an idle socket and discovering it is broken. - 2017-11-10: Added application-configurated server selector. -- 2017-11-12: Specify read preferences for OP_MSG with direct connection, and\ - delete obsolete comment direct - connections to secondaries getting "not writable primary" errors by design. +- 2017-11-12: Specify read preferences for OP_MSG with direct connection, and delete obsolete comment direct connections + to secondaries getting "not writable primary" errors by design. - 2018-01-22: Clarify that $query wrapping is only for OP_QUERY -- 2018-01-22: Clarify that $out on aggregate follows the "$out Aggregation\ - Pipeline Operator" spec and warns if read +- 2018-01-22: Clarify that $out on aggregate follows the "$out Aggregation Pipeline Operator" spec and warns if read preference is not primary. -- 2018-01-29: Remove reference to '$out Aggregation spec'. Clarify runCommand\ - selection rules. +- 2018-01-29: Remove reference to '$out Aggregation spec'. Clarify runCommand selection rules. - 2018-12-13: Update tag_set example to use only String values @@ -1646,9 +1668,8 @@ maxStalenessSeconds first, then tag_sets, and select Node 2. - 2021-09-03: Clarify that wire version check only applies to available servers. -- 2021-09-28: Note that 5.0+ secondaries support aggregate with write stages\ - (e.g. `$out` and `$merge`). Clarify - setting `SecondaryOk` wire protocol flag or `$readPreference` global command argument for replica set topology. +- 2021-09-28: Note that 5.0+ secondaries support aggregate with write stages (e.g. `$out` and `$merge`). Clarify setting + `SecondaryOk` wire protocol flag or `$readPreference` global command argument for replica set topology. 
- 2022-01-19: Require that timeouts be applied per the client-side operations timeout spec diff --git a/source/server-selection/tests/README.md b/source/server-selection/tests/README.md index 8699168cb0..426dc872d7 100644 --- a/source/server-selection/tests/README.md +++ b/source/server-selection/tests/README.md @@ -68,8 +68,7 @@ Each YAML file for these tests has the following format: - `address`: a unique address identifying this server - `operation_count`: the `operationCount` for this server -- `iterations`: the number of selections that should be run as part of this\ - test +- `iterations`: the number of selections that should be run as part of this test - `outcome`: an object describing the expected outcome of the selections diff --git a/source/server_write_commands.md b/source/server_write_commands.md index dd4febf860..9c245cf448 100644 --- a/source/server_write_commands.md +++ b/source/server_write_commands.md @@ -488,8 +488,7 @@ Yes but as of 2.6 the existing getLastError behavior is supported for backward c - 2024-07-31: Migrated from reStructuredText to Markdown. -- 2024-06-04: Add FAQ entry outlining client-side `_id` value generation\ - Update FAQ to indicate legacy opcodes were +- 2024-06-04: Add FAQ entry outlining client-side `_id` value generation Update FAQ to indicate legacy opcodes were removed - 2022-10-05: Revise spec front matter and reformat changelog. @@ -498,8 +497,7 @@ Yes but as of 2.6 the existing getLastError behavior is supported for backward c - 2021-04-22: Updated to use hello command -- 2014-05-15: Removed text related to bulk operations; see the Bulk API spec for\ - bulk details. Clarified some +- 2014-05-15: Removed text related to bulk operations; see the Bulk API spec for bulk details. Clarified some paragraphs; re-ordered the response field sections. - 2014-05-14: First public version diff --git a/source/sessions/driver-sessions.md b/source/sessions/driver-sessions.md index 126c5aae30..d98cc9fe54 100644 --- a/source/sessions/driver-sessions.md +++ b/source/sessions/driver-sessions.md @@ -26,56 +26,68 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**ClientSession**\ -The driver object representing a client session and the operations that can be performed on it. -Depending on the language a driver is written in this might be an interface or a class. See also `ServerSession`. +**ClientSession** -**Deployment**\ -A set of servers that are all part of a single MongoDB cluster. We avoid the word "cluster" because some -people interpret "cluster" to mean "sharded cluster". +The driver object representing a client session and the operations that can be performed on it. Depending on the +language a driver is written in this might be an interface or a class. See also `ServerSession`. -**Explicit session**\ -A session that was started explicitly by the application by calling `startSession` and passed as -an argument to an operation. +**Deployment** + +A set of servers that are all part of a single MongoDB cluster. We avoid the word "cluster" because some people +interpret "cluster" to mean "sharded cluster". + +**Explicit session** + +A session that was started explicitly by the application by calling `startSession` and passed as an argument to an +operation. + +**MongoClient** -**MongoClient**\ The root object of a driver's API. MAY be named differently in some drivers. 
-**Implicit session**\ -A session that was started implicitly by the driver because the application called an operation -without providing an explicit session. +**Implicit session** + +A session that was started implicitly by the driver because the application called an operation without providing an +explicit session. + +**MongoCollection** + +The driver object representing a collection and the operations that can be performed on it. MAY be named differently in +some drivers. -**MongoCollection**\ -The driver object representing a collection and the operations that can be performed on it. MAY be -named differently in some drivers. +**MongoDatabase** -**MongoDatabase**\ -The driver object representing a database and the operations that can be performed on it. MAY be -named differently in some drivers. +The driver object representing a database and the operations that can be performed on it. MAY be named differently in +some drivers. -**ServerSession**\ -The driver object representing a server session. This type is an implementation detail and does not -need to be public. See also `ClientSession`. +**ServerSession** -**Server session ID**\ -A server session ID is a token used to identify a particular server session. A driver can ask the -server for a session ID using the `startSession` command or it can generate one locally (see Generating a Session ID -locally). +The driver object representing a server session. This type is an implementation detail and does not need to be public. +See also `ClientSession`. -**Session**\ -A session is an abstract concept that represents a set of sequential operations executed by an application -that are related in some way. Other specifications define the various ways in which operations can be related, but -examples include causally consistent reads and retryable writes. +**Server session ID** + +A server session ID is a token used to identify a particular server session. A driver can ask the server for a session +ID using the `startSession` command or it can generate one locally (see Generating a Session ID locally). + +**Session** + +A session is an abstract concept that represents a set of sequential operations executed by an application that are +related in some way. Other specifications define the various ways in which operations can be related, but examples +include causally consistent reads and retryable writes. + +**Topology** -**Topology**\ The current configuration and state of a deployment. -**Unacknowledged writes**\ -Unacknowledged writes are write operations that are sent to the server without waiting for a -reply acknowledging the write. See the "When using unacknowledged writes" section below for information on how -unacknowledged writes interact with sessions. +**Unacknowledged writes** + +Unacknowledged writes are write operations that are sent to the server without waiting for a reply acknowledging the +write. See the "When using unacknowledged writes" section below for information on how unacknowledged writes interact +with sessions. + +**Network error** -**Network error**\ Any network exception writing to or reading from a socket (e.g. a socket timeout or error). 
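To make the distinction between implicit and explicit sessions concrete, here is a minimal sketch. It is illustrative
only and assumes PyMongo-style APIs (`start_session` and a `session` option on each operation) with a placeholder
connection string:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
collection = client.test.items

# Implicit session: the driver checks a ServerSession out of its pool for the
# duration of this one operation and returns it afterwards.
collection.insert_one({"x": 1})

# Explicit session: the application starts a ClientSession and passes it to each
# operation, so both operations are associated with the same server session.
with client.start_session() as session:
    collection.insert_one({"x": 2}, session=session)
    doc = collection.find_one({"x": 2}, session=session)
```

In both cases the commands carry a session ID; the difference is only whether the application or the driver owns the
ClientSession's lifetime.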
## Specification diff --git a/source/sessions/snapshot-sessions.md b/source/sessions/snapshot-sessions.md index bac56b8cfc..d1813442d1 100644 --- a/source/sessions/snapshot-sessions.md +++ b/source/sessions/snapshot-sessions.md @@ -21,37 +21,44 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**ClientSession**\ +**ClientSession** + The driver object representing a client session and the operations that can be performed on it. -**MongoClient**\ +**MongoClient** + The root object of a driver's API. MAY be named differently in some drivers. -**MongoCollection**\ -The driver object representing a collection and the operations that can be performed on it. MAY be -named differently in some drivers. +**MongoCollection** + +The driver object representing a collection and the operations that can be performed on it. MAY be named differently in +some drivers. + +**MongoDatabase** -**MongoDatabase**\ -The driver object representing a database and the operations that can be performed on it. MAY be -named differently in some drivers. +The driver object representing a database and the operations that can be performed on it. MAY be named differently in +some drivers. + +**ServerSession** -**ServerSession**\ The driver object representing a server session. -**Session**\ -A session is an abstract concept that represents a set of sequential operations executed by an application -that are related in some way. This specification defines how sessions are used to implement snapshot reads. +**Session** + +A session is an abstract concept that represents a set of sequential operations executed by an application that are +related in some way. This specification defines how sessions are used to implement snapshot reads. + +**Snapshot reads** + +Reads with read concern level `snapshot` that occur outside of transactions on both the primary and secondary nodes, +including in sharded clusters. Snapshots reads are majority committed reads. -**Snapshot reads**\ -Reads with read concern level `snapshot` that occur outside of transactions on both the primary and -secondary nodes, including in sharded clusters. Snapshots reads are majority committed reads. +**Snapshot timestamp** -**Snapshot timestamp**\ -Snapshot timestamp, representing timestamp of the first supported read operation (i.e. -find/aggregate/distinct) in the session. The server creates a cursor in response to a snapshot find/aggregate command -and reports `atClusterTime` within the `cursor` field in the response. For the distinct command the server adds a -top-level `atClusterTime` field to the response. The `atClusterTime` field represents the timestamp of the read and is -guaranteed to be majority committed. +Snapshot timestamp, representing timestamp of the first supported read operation (i.e. find/aggregate/distinct) in the +session. The server creates a cursor in response to a snapshot find/aggregate command and reports `atClusterTime` within +the `cursor` field in the response. For the distinct command the server adds a top-level `atClusterTime` field to the +response. The `atClusterTime` field represents the timestamp of the read and is guaranteed to be majority committed. ## Specification diff --git a/source/sessions/tests/README.md b/source/sessions/tests/README.md index 218e481a2f..652b3c0668 100644 --- a/source/sessions/tests/README.md +++ b/source/sessions/tests/README.md @@ -54,7 +54,7 @@ This test applies to drivers with session pools. 
commands/replies in another idiomatic way, such as monkey-patching or a mock server. - Send a `ping` command to the server with the generic `runCommand` method. - Assert that the command passed to the command-started listener includes `$clusterTime` if and only if `maxWireVersion` - \>= 6. + > = 6. - Record the `$clusterTime`, if any, in the reply passed to the command-succeeded APM listener. - Send another `ping` command. - Assert that `$clusterTime` in the command passed to the command-started listener, if any, equals the `$clusterTime` in diff --git a/source/transactions/tests/README.md b/source/transactions/tests/README.md index 82b9ced512..f7a3936f24 100644 --- a/source/transactions/tests/README.md +++ b/source/transactions/tests/README.md @@ -76,6 +76,5 @@ driver, use command monitoring instead. ## Changelog - 2024-02-15: Migrated from reStructuredText to Markdown. -- 2024-02-07: Converted legacy transaction tests to unified format and moved the\ - legacy test format docs to a separate +- 2024-02-07: Converted legacy transaction tests to unified format and moved the legacy test format docs to a separate file. diff --git a/source/transactions/tests/legacy-test-format.md b/source/transactions/tests/legacy-test-format.md index db2fe9fe0d..eb05c1f9fe 100644 --- a/source/transactions/tests/legacy-test-format.md +++ b/source/transactions/tests/legacy-test-format.md @@ -477,13 +477,10 @@ sharded transaction that uses the `dbVersion` concept so it is the only command - 2019-03-25: Add workaround for StaleDbVersion on distinct. -- 2019-03-01: Add top-level `runOn` field to denote server version and/or\ - topology requirements requirements for the +- 2019-03-01: Add top-level `runOn` field to denote server version and/or topology requirements requirements for the test file. Removes the `topology` top-level field, which is now expressed within `runOn` elements. -- 2019-02-28: `useMultipleMongoses: true` and non-targeted fail points are\ - mutually exclusive. +- 2019-02-28: `useMultipleMongoses: true` and non-targeted fail points are mutually exclusive. -- 2019-02-13: Modify test format for 4.2 sharded transactions, including\ - "useMultipleMongoses", `object: testRunner`, +- 2019-02-13: Modify test format for 4.2 sharded transactions, including "useMultipleMongoses", `object: testRunner`, the `targetedFailPoint` operation, and recoveryToken assertions. diff --git a/source/transactions/transactions.md b/source/transactions/transactions.md index 484ab33fb7..59aaf2502f 100644 --- a/source/transactions/transactions.md +++ b/source/transactions/transactions.md @@ -259,7 +259,7 @@ ClientSession is in one of five states: "no transaction", "starting transaction" diagram: states\ +style="width:6.5in;height:3.68056in" alt="states" /> ([GraphViz source](client-session-transaction-states.dot)) When a ClientSession is created it starts in the "no transaction" state. Starting, committing, and aborting a @@ -875,18 +875,15 @@ The [Python driver](https://github.com/mongodb/mongo-python-driver/) serves as a - Support retryable writes within a transaction. -- Support transactions on secondaries. In this case, drivers would be\ - required to pin a transaction to the server +- Support transactions on secondaries. In this case, drivers would be required to pin a transaction to the server selected for the initial operation. All subsequent operations in the transaction would go to the pinned server. -- Support for transactions that read from multiple nodes in a replica\ - set. 
One interesting use case would be to run a +- Support for transactions that read from multiple nodes in a replica set. One interesting use case would be to run a single transaction that performs low-latency reads with readPreference "nearest" followed by some writes. -- Support for unacknowledged transaction commits. This might be useful\ - when data consistency is paramount but - durability is optional. Imagine a system that increments two counters in two different collections. The system may - want to use transactions to guarantee that both counters are always incremented together or not at all. +- Support for unacknowledged transaction commits. This might be useful when data consistency is paramount but durability + is optional. Imagine a system that increments two counters in two different collections. The system may want to use + transactions to guarantee that both counters are always incremented together or not at all. ## **Justifications** @@ -1077,8 +1074,7 @@ objective of avoiding duplicate commits. - 2024-02-15: Migrated from reStructuredText to Markdown. -- 2023-11-22: Specify that non-transient transaction errors abort the transaction\ - on the server. +- 2023-11-22: Specify that non-transient transaction errors abort the transaction on the server. - 2022-10-05: Remove spec front matter and reformat changelog @@ -1088,8 +1084,7 @@ objective of avoiding duplicate commits. - 2021-04-12: Adding in behaviour for load balancer mode. -- 2020-04-07: Clarify that all abortTransaction attempts should unpin the session,\ - even if the command is not executed. +- 2020-04-07: Clarify that all abortTransaction attempts should unpin the session, even if the command is not executed. - 2020-04-07: Specify that sessions should be unpinned once a transaction is aborted. @@ -1101,8 +1096,7 @@ objective of avoiding duplicate commits. - 2019-06-07: Mention `$merge` stage for aggregate alongside `$out` -- 2019-05-13: Add support for maxTimeMS on transaction commit, MaxTimeMSExpired\ - errors on commit are labelled +- 2019-05-13: Add support for maxTimeMS on transaction commit, MaxTimeMSExpired errors on commit are labelled UnknownTransactionCommitResult. - 2019-02-19: Add support for sharded transaction recoveryToken. @@ -1113,13 +1107,11 @@ objective of avoiding duplicate commits. - 2018-11-13: Add mongos pinning to support sharded transaction. -- 2018-06-18: Explicit readConcern and/or writeConcern are prohibited within\ - transactions, with a client-side error. +- 2018-06-18: Explicit readConcern and/or writeConcern are prohibited within transactions, with a client-side error. - 2018-06-07: The count command is not supported within transactions. -- 2018-06-14: Any retryable writes error raised by commitTransaction must be\ - labelled "UnknownTransactionCommitResult". +- 2018-06-14: Any retryable writes error raised by commitTransaction must be labelled "UnknownTransactionCommitResult". [^1]: In 4.2, a new mongos waits for the *outcome* of the transaction but will never itself cause the transaction to be committed. 
If the initial commit on the original mongos itself failed to initiate the transaction's commit sequence, diff --git a/source/unified-test-format/unified-test-format.md b/source/unified-test-format/unified-test-format.md index 7ef53ed5fe..9dd15b94d1 100644 --- a/source/unified-test-format/unified-test-format.md +++ b/source/unified-test-format/unified-test-format.md @@ -1247,15 +1247,13 @@ A log message which is expected to be observed while executing the test's operat The structure of each object is as follows: -- `level`: Required string. This MUST be one of the level names listed - in\ - [log severity levels](logging/logging.rst#log-severity-levels). This specifies the expected level for the log - message and corresponds to the level used for the message in the specification that defines it. Note that since not - all drivers will necessarily support all log levels, some driver may need to map the specified level to the - corresponding driver-supported level. Test runners MUST assert that the actual level matches this value. - -- `component`: Required string. This MUST be one of the component names listed\ - in +- `level`: Required string. This MUST be one of the level names listed in + [log severity levels](logging/logging.rst#log-severity-levels). This specifies the expected level for the log message + and corresponds to the level used for the message in the specification that defines it. Note that since not all + drivers will necessarily support all log levels, some driver may need to map the specified level to the corresponding + driver-supported level. Test runners MUST assert that the actual level matches this value. + +- `component`: Required string. This MUST be one of the component names listed in [components](../logging/logging.md#components). This specifies the expected component for the log message. Note that since naming variations are permitted for components, some drivers may need to map this to a corresponding language-specific component name. Test runners MUST assert that the actual component matches this value. @@ -3488,15 +3486,16 @@ other specs *and* collating spec changes developed in parallel or during the sam ## Changelog -- 2024-05-08: **Schema version 1.21.**\ - Add `writeErrors` and `writeConcernErrors` field to `expectedError` for the - client-level bulk write API. +- 2024-05-08: **Schema version 1.21.** + + Add `writeErrors` and `writeConcernErrors` field to `expectedError` for the client-level bulk write API. - 2024-04-15: Note that when `directConnection` is set to true test runners should only provide a single seed. -- 2024-03-25: **Schema version 1.20.**\ - Add `previousDescription` and `newDescription` assertions to - `topologyDescriptionChangedEvent` when checking events with `expectEvents` +- 2024-03-25: **Schema version 1.20.** + + Add `previousDescription` and `newDescription` assertions to `topologyDescriptionChangedEvent` when checking events + with `expectEvents` - 2024-03-11: Note that `killAllSessions` should not be executed on Atlas Data Lake @@ -3511,43 +3510,42 @@ other specs *and* collating spec changes developed in parallel or during the sam - 2024-02-06: Migrated from reStructuredText to Markdown. -- 2024-01-17: **Schema version 1.19.**\ - Add `authMechanism` to `runOnRequirement` and require that `uriOptions` supports - placeholder documents. +- 2024-01-17: **Schema version 1.19.** -- 2024-01-11: **Schema version 1.18.**\ - Allow named KMS providers in `kmsProviders`. Note location of Client-Side - Encryption test credentials. 
+ Add `authMechanism` to `runOnRequirement` and require that `uriOptions` supports placeholder documents. -- 2024-01-03: Document server version requirements for `errorLabels` and\ - `blockConnection` options for `failCommand` +- 2024-01-11: **Schema version 1.18.** + + Allow named KMS providers in `kmsProviders`. Note location of Client-Side Encryption test credentials. + +- 2024-01-03: Document server version requirements for `errorLabels` and `blockConnection` options for `failCommand` fail point. -- 2023-10-04: **Schema version 1.17.**\ - Add `serverHeartbeatStartedEvent`, `serverHeartbeatSucceededEvent`, and - `serverHeartbeatFailedEvent` for asserting on SDAM server heartbeat events. +- 2023-10-04: **Schema version 1.17.** + + Add `serverHeartbeatStartedEvent`, `serverHeartbeatSucceededEvent`, and `serverHeartbeatFailedEvent` for asserting on + SDAM server heartbeat events. - 2023-09-25: Clarify that the UTR is intended to be run against enterprise servers. -- 2022-07-18: **Schema version 1.16.**\ - Add `ignoreMessages` and `ignoreExtraMessages` fields to - `expectedLogMessagesForClient` section. +- 2022-07-18: **Schema version 1.16.** -- 2023-06-26: `runOnRequirement.csfle` should check for crypt_shared and/or\ - mongocryptd. + Add `ignoreMessages` and `ignoreExtraMessages` fields to `expectedLogMessagesForClient` section. + +- 2023-06-26: `runOnRequirement.csfle` should check for crypt_shared and/or mongocryptd. + +- 2023-06-13: **Schema version 1.15.** -- 2023-06-13: **Schema version 1.15.**\ Add `databaseName` field to `CommandFailedEvent` and `CommandSucceededEvent`. -- 2023-05-26: **Schema version 1.14.**\ +- 2023-05-26: **Schema version 1.14.** + Add `topologyDescriptionChangedEvent`. -- 2023-05-17: Add `runCursorCommand` and `createCommandCursor` operations.\ - Added `commandCursor` entity type which can +- 2023-05-17: Add `runCursorCommand` and `createCommandCursor` operations. Added `commandCursor` entity type which can be used with existing cursor operations. -- 2023-05-12: Deprecate "sharded-replicaset" topology type. Note that server 3.6+\ - requires replica sets for shards, +- 2023-05-12: Deprecate "sharded-replicaset" topology type. Note that server 3.6+ requires replica sets for shards, which is also relevant to load balanced topologies. - 2023-04-13: Remove `readConcern` and `writeConcern` options from `runCommand` operation. @@ -3556,116 +3554,112 @@ other specs *and* collating spec changes developed in parallel or during the sam - 2022-10-17: Add description of a `close` operation for client entities. -- 2022-10-14: **Schema version 1.13.**\ - Add support for logging assertions via the `observeLogMessages` field for client - entities, along with a new top-level field `expectLogMessages` containing `expectedLogMessagesForClient` objects. Add - new special matching operators to enable command logging assertions, `$$matchAsDocument` and `$$matchAsRoot`. +- 2022-10-14: **Schema version 1.13.** + + Add support for logging assertions via the `observeLogMessages` field for client entities, along with a new top-level + field `expectLogMessages` containing `expectedLogMessagesForClient` objects. Add new special matching operators to + enable command logging assertions, `$$matchAsDocument` and `$$matchAsRoot`. + +- 2022-10-14: **Schema version 1.12.** -- 2022-10-14: **Schema version 1.12.**\ Add `errorResponse` to `expectedError`. -- 2022-10-05: Remove spec front matter, add "Current Schema Version" field, and\ - reformat changelog. 
Add comment to +- 2022-10-05: Remove spec front matter, add "Current Schema Version" field, and reformat changelog. Add comment to remind editors to note schema version bumps in changelog updates (where applicable). -- 2022-09-02: **Schema version 1.11.**\ +- 2022-09-02: **Schema version 1.11.** + Add `interruptInUseConnections` field to `poolClearedEvent` -- 2022-07-28: **Schema version 1.10.**\ - Add support for `thread` entities (`runOnThread`, `waitForThread`), - TopologyDescription entities (`recordTopologyDescription`, `waitForPrimaryChange`, `assertTopologyType`), testRunner - event assertion operations (`waitForEvent`, `assertEventCount`), expected SDAM events, and the `wait` operation. +- 2022-07-28: **Schema version 1.10.** + + Add support for `thread` entities (`runOnThread`, `waitForThread`), TopologyDescription entities + (`recordTopologyDescription`, `waitForPrimaryChange`, `assertTopologyType`), testRunner event assertion operations + (`waitForEvent`, `assertEventCount`), expected SDAM events, and the `wait` operation. -- 2022-07-27: Retroactively note schema version bumps in the changelog and\ - require doing so for future changes. +- 2022-07-27: Retroactively note schema version bumps in the changelog and require doing so for future changes. -- 2022-07-11: Update [Future Work](#future-work) to reflect that support for ignoring extra\ - observed events was added - in schema version 1.7. +- 2022-07-11: Update [Future Work](#future-work) to reflect that support for ignoring extra observed events was added in + schema version 1.7. - 2022-06-16: Require server 4.2+ for `csfle: true`. -- 2022-05-10: Add reference to Client Side Encryption spec - under\ +- 2022-05-10: Add reference to Client Side Encryption spec under [ClientEncryption Operations](#clientencryption-operations). -- 2022-04-27: **Schema version 1.9.**\ - Added `createOptions` field to `initialData`, introduced a new `timeoutMS` field - in `collectionOrDatabaseOptions`, and added an `isTimeoutError` field to `expectedError`. Also introduced the - `createEntities` operation. +- 2022-04-27: **Schema version 1.9.** + + Added `createOptions` field to `initialData`, introduced a new `timeoutMS` field in `collectionOrDatabaseOptions`, and + added an `isTimeoutError` field to `expectedError`. Also introduced the `createEntities` operation. + +- 2022-04-27: **Schema version 1.8.** -- 2022-04-27: **Schema version 1.8.**\ Add `runOnRequirement.csfle`. - 2022-04-26: Add `clientEncryption` entity and `$$placeholder` syntax. -- 2022-04-22: Revise `useMultipleMongoses` and "Initializing the Test Runner"\ - for Atlas Serverless URIs using a load +- 2022-04-22: Revise `useMultipleMongoses` and "Initializing the Test Runner" for Atlas Serverless URIs using a load balancer fronting a single proxy. -- 2022-03-01: **Schema version 1.7.**\ +- 2022-03-01: **Schema version 1.7.** + Add `ignoreExtraEvents` field to `expectedEventsForClient`. - 2022-02-24: Rename Versioned API to Stable API -- 2021-08-30: **Schema version 1.6.**\ - Add `hasServerConnectionId` field to `commandStartedEvent`, - `commandSuccededEvent` and `commandFailedEvent`. +- 2021-08-30: **Schema version 1.6.** + + Add `hasServerConnectionId` field to `commandStartedEvent`, `commandSuccededEvent` and `commandFailedEvent`. -- 2021-08-30: Test runners may create an internal MongoClient for each mongos.\ - Better clarify how internal MongoClients +- 2021-08-30: Test runners may create an internal MongoClient for each mongos. 
Better clarify how internal MongoClients may be used. Clarify that drivers creating an internal MongoClient for each mongos should use those clients for `targetedFailPoint` operations. - 2021-08-23: Allow `runOnRequirement` conditions to be evaluated in any order. -- 2021-08-09: Updated all existing schema files to require at least one element\ - in `test.expectEvents` if specified. +- 2021-08-09: Updated all existing schema files to require at least one element in `test.expectEvents` if specified. -- 2021-07-29: Note that events for sensitive commands will have redacted\ - commands and replies when using +- 2021-07-29: Note that events for sensitive commands will have redacted commands and replies when using `observeSensitiveCommands`, and how that affects conditionally sensitive commands such as `hello` and legacy hello. -- 2021-07-01: Note that `expectError.expectResult` should use\ - `$$unsetOrMatches` when the result is optional. +- 2021-07-01: Note that `expectError.expectResult` should use `$$unsetOrMatches` when the result is optional. + +- 2021-06-09: **Schema version 1.5.** -- 2021-06-09: **Schema version 1.5.**\ Added an `observeSensitiveCommands` property to the `client` entity type. - 2021-05-17: Ensure old JSON schema files remain in place -- 2021-04-19: **Schema version 1.4.**\ +- 2021-04-19: **Schema version 1.4.** + Introduce `serverless` [runOnRequirement](#runonrequirement). -- 2021-04-12: **Schema version 1.3.**\ - Added a `FindCursor` entity type. Defined a set of cursor operations. Added an - `auth` property to `runOnRequirements` and modified the `topologies` property to accept `load-balanced`. Added CMAP - events to the possible event types for `expectedEvent`. Add `assertNumberConnectionsCheckedOut` operation. Add - `ignoreResultAndError` operation option. +- 2021-04-12: **Schema version 1.3.** + + Added a `FindCursor` entity type. Defined a set of cursor operations. Added an `auth` property to `runOnRequirements` + and modified the `topologies` property to accept `load-balanced`. Added CMAP events to the possible event types for + `expectedEvent`. Add `assertNumberConnectionsCheckedOut` operation. Add `ignoreResultAndError` operation option. -- 2021-04-08: List additional error codes that may be ignored when calling\ - `killAllSessions` and note that the command +- 2021-04-08: List additional error codes that may be ignored when calling `killAllSessions` and note that the command should not be called when connected to Atlas. -- 2021-03-22: Split `serverApi` into its own section. Note types for `loop`\ - operation arguments. Clarify how `loop` +- 2021-03-22: Split `serverApi` into its own section. Note types for `loop` operation arguments. Clarify how `loop` iterations are counted for `storeIterationsAsEntity`. -- 2021-03-10: Clarify that `observedAt` field measures time in seconds for\ - `storeEventsAsEntities`. +- 2021-03-10: Clarify that `observedAt` field measures time in seconds for `storeEventsAsEntities`. -- 2021-03-09: Clarify which components of a version string are relevant for\ - comparisons. +- 2021-03-09: Clarify which components of a version string are relevant for comparisons. -- 2021-03-04: Change `storeEventsAsEntities` from a map to an array of\ - `storeEventsAsEntity` objects. +- 2021-03-04: Change `storeEventsAsEntities` from a map to an array of `storeEventsAsEntity` objects. -- 2021-03-01: **Schema version 1.2.**\ - Added `storeEventsAsEntities` option for client entities and `loop` operation, - which is needed for Atlas Driver Testing. 
+- 2021-03-01: **Schema version 1.2.** + + Added `storeEventsAsEntities` option for client entities and `loop` operation, which is needed for Atlas Driver + Testing. - 2020-12-23: Clarify how JSON schema is renamed for new minor versions. -- 2020-11-06: **Schema version 1.1.**\ - Added `serverApi` option for client entities, `_yamlAnchors` property to define - values for later use in YAML tests, and `serverParameters` property for `runOnRequirements`. +- 2020-11-06: **Schema version 1.1.** + + Added `serverApi` option for client entities, `_yamlAnchors` property to define values for later use in YAML tests, + and `serverParameters` property for `runOnRequirements`. diff --git a/source/uri-options/uri-options.md b/source/uri-options/uri-options.md index 7f37ac1169..2e636cc812 100644 --- a/source/uri-options/uri-options.md +++ b/source/uri-options/uri-options.md @@ -194,8 +194,7 @@ changes. - 2021-11-08: Add maxConnecting option. -- 2021-10-14: Add srvMaxHosts option. Merge headings discussing URI validation\ - for directConnection option. +- 2021-10-14: Add srvMaxHosts option. Merge headings discussing URI validation for directConnection option. - 2021-09-15: Add srvServiceName option diff --git a/source/uuid.md b/source/uuid.md index 515b79239c..ced26373fa 100644 --- a/source/uuid.md +++ b/source/uuid.md @@ -28,10 +28,12 @@ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SH ### Terms -**UUID**\ +**UUID** + A Universally Unique IDentifier -**BsonBinary**\ +**BsonBinary** + An object that wraps an instance of a BSON binary value ### Naming Deviations
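As a concrete illustration of the two terms above, a driver typically exposes a UUID to the application as a native
type and stores it as a BSON binary value. The sketch below is illustrative only and assumes PyMongo's `bson` package,
where subtype 4 is the standard UUID representation:

```python
import uuid

from bson.binary import Binary, UuidRepresentation

native = uuid.uuid4()

# Wrap the native UUID as a BSON binary value (subtype 4, the standard encoding).
wrapped = Binary.from_uuid(native, UuidRepresentation.STANDARD)
assert wrapped.subtype == 4

# Unwrap it back into the native type.
assert wrapped.as_uuid(UuidRepresentation.STANDARD) == native
```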