-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Fix the shuffle hanging problem when fetch data from presto … #24301
Conversation
2507aa3
to
8e06b65
Compare
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ed590e7
to
491cd92
Compare
ackSequenceOpt = atol(nextTokenStr.c_str()); | ||
} else { | ||
VELOX_CHECK_EQ( | ||
contentLength, 0, "next token is not set non-empty data response"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... is not set for non-empty ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By having this check, I assume we are not supporting data fetching from java?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We support and only java won't set next token str in get data size request with empty data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks % for some minor
sequence_ = ackSequence; | ||
if (ackSequenceOpt.has_value()) { | ||
sequence_ = ackSequenceOpt.value(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have an else block and perform some finish checks? Assuming we only send empty page at completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also see empty sequence in case of get data size. Get data size and get data share the same response handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanjialiang thanks!
ackSequenceOpt = atol(nextTokenStr.c_str()); | ||
} else { | ||
VELOX_CHECK_EQ( | ||
contentLength, 0, "next token is not set non-empty data response"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We support and only java won't set next token str in get data size request with empty data.
sequence_ = ackSequence; | ||
if (ackSequenceOpt.has_value()) { | ||
sequence_ = ackSequenceOpt.value(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also see empty sequence in case of get data size. Get data size and get data share the same response handler.
…java (coordinator) Presto java worker doesn't set next token in response for a get data size request and it simply returns the buffered data from the source. The Prestissimo worker updates 'sequence_' with ack sequence encoded by next token and uses it as data stream offset to fetch data from the source. If the next token str is not set in the data response handler header, then ack sequence is set to zero which resets the 'sequence_' to zero, and the rollback of 'sequence_' can cause data hanging problem when Prestissimo worker fetch data from a Presto java worker (which can only happen in the case that Presto java worker run by the coordinator as some metadata operation can only execute on the coordinator) from the reset 'sequence_'. This PR fixes the issue by avoiding updating 'sequence_' if next token is not set in data response handler.
Presto java worker doesn't set next token in response for a get data size request
and it simply returns the buffered data from the source. The Prestissimo worker
updates 'sequence_' with ack sequence encoded by next token and uses it as data
stream offset to fetch data from the source. If the next token str is not set in
the data response handler header, then ack sequence is set to zero which resets the
'sequence_' to zero, and the rollback of 'sequence_' can cause data hanging problem
when Prestissimo worker fetch data from a Presto java worker (which can only happen
in the case that Presto java worker run by the coordinator as some metadata operation
can only execute on the coordinator) from the reset 'sequence_'.
This PR fixes the issue by avoiding updating 'sequence_' if next token is not set
in data response handler.