Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(NODE-6370) response messages to large commands can be lost under load #4245

Closed
wants to merge 1 commit into from

Conversation

hunkydoryrepair
Copy link

@hunkydoryrepair hunkydoryrepair commented Sep 21, 2024

Description

NODE-6370 describes scenarios where executing commands can hang indefinitely.
Debugging uncovered that the response messages were sometimes being lost because the 'data' listener on the socket was not active in time, which was caused by awaiting the 'drain' event before adding the 'data' listener (in readMany).
This changes when the drain event is handled. This would generally only be observable in these scenarios:

  • The command being sent was large enough that it could not be fully written to the kernel socket and was queued in user memory.
  • The load on the driver process is such that the task awaiting the 'drain' event would not be resumed before the data is received from the server. This was reproducible only scenarios where a lot of network activity is happening at once, as well as a lot of other synchronous activity (like mongoose schema parsing). In those conditions, the 'data' event may be emitted and handled (or in this case, not handled) before the task awaiting 'drain' is resumed.

What is changing?

The drain event is not awaited when the response will be read. The response cannot arrive until after socket is drained, so is unnecessary in that case. This ensures that the 'data' event will be listened to.

Is there new documentation needed for these changes?

No

What is the motivation for this change?

Current driver is unsafe for use in production environments that experience high load and save large documents.

Release Highlight

Fill in title or leave empty for no highlight

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

(this area has a lot of tests covering it. A test for the actual issue is impractical)

@dariakp dariakp added tracked-in-jira Ticket filed in MongoDB's Jira system External Submission PR submitted from outside the team labels Sep 27, 2024
@nbbeeken
Copy link
Contributor

Thanks again for your help @hunkydoryrepair! Rather than elevating the promise, we went with pause/resume so we can leave the logic contained in the write step as it relates to a new feature we're working on. Let us know if you get a chance to check if this passes your use case if you have the bandwidth.

@nbbeeken nbbeeken closed this Sep 27, 2024
@nbbeeken
Copy link
Contributor

Fixed by: #4249

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
External Submission PR submitted from outside the team tracked-in-jira Ticket filed in MongoDB's Jira system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants