-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(hashjoin): Create new VectorHashers for listNullKeyRows to prevent dangling pointer access #12106
base: main
Are you sure you want to change the base?
fix(hashjoin): Create new VectorHashers for listNullKeyRows to prevent dangling pointer access #12106
Conversation
✅ Deploy Preview for meta-velox canceled.
|
876e69a
to
9ea00a3
Compare
… hash mode tables
9ea00a3
to
da85c80
Compare
@xiaoxmeng and @Yuhta , could you please help to take a look this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhli1142015 LGTM. Thanks!
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
9282af2
to
c20627c
Compare
velox/exec/HashTable.h
Outdated
|
||
/// If true, the probe will only consider rows with null keys. | ||
/// During probing, only the hash value is compared, and key comparison is | ||
/// skipped to avoid potential issues. This is used by listNullKeyRows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key comparison is skipped to avoid potential issues
Just curious, why would hashers_[0]->decodedVector().base()
be a dangling pointer ? is there some issue with the lifetime of the vector passed to this hasher? OR do we never pass any vector to it for decoding?
Just want to make sure we dont end up masking any inconsistent state in our data structures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashers_
comes from the hash table, and it should be used during the hash build phase. Also, baseVector_
(hashers_[0]->decodedVector().base()
) is a raw pointer, so we cannot guarantee that baseVector_
is available during probe phase. I think in listNullKeyRows
, comparing only the hash value should be sufficient to find null keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
baseVector_ (hashers_[0]->decodedVector().base()) is a raw pointer, so we cannot guarantee that baseVector_ is available during probe phase.
A decodedVector is intended for reading existing vectors, and we should ensure its validity if it's accessible. The fact that it was a dangling pointer suggests some action released those vectors while still allowing access, which this PR aims to fix. However, it would be beneficial to investigate the root cause of this inconsistent state to prevent future potential bugs and ensure we're not just addressing this specific symptom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned above, the vector whose pointer becomes dangling here should be an input vector for HashBuild. It seems a bit odd to me that during the hash probe phase, we need to access the build-side input vector. From my understanding, the vector being released here is expected behavior by design. The issue here is that we should not be attempting to access it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vector is decoded in hashers_[0]
within the addInput
function of HashBuild. Since input
is not referenced by either HashBuild or HashTable, its lifetime may end before probing occurs. As a result, baseVector_ (hashers_[0]->decodedVector().base())
becomes a dangling pointer in this scenario.
https://github.com/facebookincubator/velox/blob/main/velox/exec/HashBuild.cpp#L316C17-L316C23
c20627c
to
2642a07
Compare
I reviewed the code and believe that creating separate VectorHashers with properly decoded vectors (a single row with null keys) would be a more suitable fix. This approach helps prevent potential future bugs. @bikramSingh91 and @xiaoxmeng could you help to take a look again? |
2642a07
to
9a397ce
Compare
This PR addresses an issue in the
listNullKeyRows
function that occurs in hashmode. In this function,
hashers_
from HashTable is used to construct the new HashLookup.The
joinProbe
function requires accessinghashers_[0]->decodedVector().base()
for key comparison. However, this pointer can be dangling, causing the error
described below.
We propose fixing this issue by using separate VectorHashers with properly decoded vectors (1 row with null keys).