-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRIVERS-2828 Update mongos redirection prose tests to workaround SDAM behavior #1504
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The test improvements are much appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Poke: @alexbevi @comandeo @prestonvasquez Your reviews were requested since you reviewed the prior PR (#1450). Given the expected impact of DRIVERS-1571, I think we should give some priority to this PR (DRIVERS-2828). I also think it is valuable to validate these changes with at least one more Driver (Ruby? Go?). If unwilling or unable to review, please remove yourself from the reviewers list to unblock this PR. Thank you! |
I'll defer to engineering to LGTM this, but from a Product POV it LGTM 😺 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Please complete the following before merging:
clusters, and serverless).
Description
This PR is a followup to #1450 that proposes changes to the mongos redirection prose tests introduced in DRIVERS-2828 to workaround SDAM behavior preventing execution of server deprioritization code paths.
SDAM Network Error Handling
SDAM error handling behavior labels a server as Unknown when a post-handshake non-timeout network error occurs. This makes the server ineligible during server selection during the retry operation before the "on other mongos" prose tests have a chance to deprioritize the intended server (only one suitable server). This appears to be caused by the
closeConnection: true
field in the configured fail points. Therefore, this PR proposes their removal.Randomized Server Selection
Normal SDAM behavior randomly selects a server from the list of suitable servers. Consequently, the "on other mongos" prose tests cannot distinguish that a different mongos was used during the retry operation due to deprioritization (the behavior intended to be tested) from lucky randomized server selection. A note is added to the prose tests encouraging Drivers to manually inspect and ensure the deprioritization code paths are being executed by the prose tests as intended.
If Drivers have an alternative means of verifying that deprioritization code paths were executed, such as capturing debug/trace log output, they can do so, but this is not noted in the proposed PR due to falling outside the spec. This can be included in the note as an encouragement to Drivers if preferred, but I believe it unnecessary.
Initial TopologyType
There is variation among Drivers w.r.t. how they handle the directConnection URI option. Many Drivers appear to use
directConnection=true
by default, which means, given only a single host in the initial seed list, many Drivers will consider the topology of a single mongos connection to be Single rather than Sharded. Server deprioritization is only applicable for Sharded topologies. Therefore, the "on same mongos" prose test is likely not executing deprioritization code paths on many Drivers as currently written. Therefore, this PR adds the requirement thatdirectConnection=false
be included in the URI options when it not the default.