fix(hashjoin): Create new VectorHashers for listNullKeyRows to prevent dangling pointer access #12106

Open: wants to merge 3 commits into base: main

zhli1142015
Contributor

@zhli1142015 zhli1142015 commented Jan 17, 2025

This PR addresses an issue in the listNullKeyRows function that occurs in hash
mode. In this function, the hashers_ from HashTable are used to construct the new HashLookup.
The joinProbe function accesses hashers_[0]->decodedVector().base()
for key comparison. However, this pointer can be dangling, which causes the error
shown below.

We propose fixing this issue by using separate VectorHashers with properly decoded vectors (one row with null keys).

[ RUN      ] HashJoinTest/MultiThreadedHashJoinTest.hashModeNullAwareAntiJoinWithFilterAndNullKey/1
unknown file: Failure
C++ exception with description "Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Specified element is not found : -96
Retriable: False
Function: mapTypeKindToName
File: /var/git/velox/velox/type/Type.cpp
Line: 116
Stack trace:
# 0  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 1  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxUserError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 2  facebook::velox::mapTypeKindToName[abi:cxx11](facebook::velox::TypeKind const&)
# 3  facebook::velox::exec::HashTable<false>::compareKeys(char const*, facebook::velox::exec::HashLookup&, int)
# 4  facebook::velox::exec::HashTable<false>::joinProbe(facebook::velox::exec::HashLookup&)
# 5  facebook::velox::exec::HashTable<false>::listNullKeyRows(facebook::velox::exec::BaseHashTable::NullKeyRowsIterator*, int, char**)
# 6  facebook::velox::exec::HashProbe::applyFilterOnTableRowsForNullAwareJoin(facebook::velox::SelectivityVector const&, facebook::velox::SelectivityVector&, std::function<int (char**, int)>) [clone .part.0]
# 7  facebook::velox::exec::HashProbe::evalFilterForNullAwareJoin(int, bool)
# 8  facebook::velox::exec::HashProbe::evalFilter(int)
# 9  facebook::velox::exec::HashProbe::getOutputInternal(bool)
# 10 facebook::velox::exec::HashProbe::getOutput()
# 11 facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)::{lambda()#5}::operator()() const
# 12 facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)
# 13 facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>)
# 14 void folly::detail::function::call_<facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::{lambda()#1}, true, false, void>(, folly::detail::function::Data&)
# 15 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&)
# 16 folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)
# 17 void folly::detail::function::call_<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>, true, false, void>(, folly::detail::function::Data&)
# 18 0x00000000000dc252
# 19 0x0000000000094ac2
# 20 0x000000000012684f
" thrown in the test body.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 17, 2025

@zhli1142015 zhli1142015 force-pushed the fix_dangling_ptr_access branch 2 times, most recently from 876e69a to 9ea00a3 Compare January 22, 2025 01:18
@zhli1142015 zhli1142015 force-pushed the fix_dangling_ptr_access branch from 9ea00a3 to da85c80 Compare January 22, 2025 04:56
@zhli1142015
Contributor Author

@xiaoxmeng and @Yuhta, could you please take a look at this?
Thanks.

Contributor

@xiaoxmeng xiaoxmeng left a comment

@zhli1142015 LGTM. Thanks!

velox/exec/HashTable.cpp (outdated, resolved)
@facebook-github-bot
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@zhli1142015 zhli1142015 force-pushed the fix_dangling_ptr_access branch from 9282af2 to c20627c Compare January 23, 2025 08:46

/// If true, the probe will only consider rows with null keys.
/// During probing, only the hash value is compared, and key comparison is
/// skipped to avoid potential issues. This is used by listNullKeyRows.
Contributor

key comparison is skipped to avoid potential issues

Just curious, why would hashers_[0]->decodedVector().base() be a dangling pointer? Is there some issue with the lifetime of the vector passed to this hasher, or do we never pass any vector to it for decoding?

I just want to make sure we don't end up masking an inconsistent state in our data structures.

Contributor Author

@zhli1142015 zhli1142015 Jan 24, 2025

hashers_ comes from the hash table, and it should be used during the hash build phase. Also, baseVector_ (hashers_[0]->decodedVector().base()) is a raw pointer, so we cannot guarantee that baseVector_ is still valid during the probe phase. I think that in listNullKeyRows, comparing only the hash value should be sufficient to find null keys.

Contributor

baseVector_ (hashers_[0]->decodedVector().base()) is a raw pointer, so we cannot guarantee that baseVector_ is available during probe phase.

A decodedVector is intended for reading existing vectors, and we should ensure its validity if it's accessible. The fact that it was a dangling pointer suggests some action released those vectors while still allowing access, which this PR aims to fix. However, it would be beneficial to investigate the root cause of this inconsistent state to prevent future potential bugs and ensure we're not just addressing this specific symptom.

Contributor Author

@zhli1142015 zhli1142015 Jan 24, 2025

As I mentioned above, the vector whose pointer becomes dangling here should be an input vector for HashBuild. It seems a bit odd to me that during the hash probe phase, we need to access the build-side input vector. From my understanding, the vector being released here is expected behavior by design. The issue here is that we should not be attempting to access it.

Contributor Author

The vector is decoded in hashers_[0] within the addInput function of HashBuild. Since input is not referenced by either HashBuild or HashTable, its lifetime may end before probing occurs. As a result, baseVector_ (hashers_[0]->decodedVector().base()) becomes a dangling pointer in this scenario.
https://github.com/facebookincubator/velox/blob/main/velox/exec/HashBuild.cpp#L316C17-L316C23
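
The lifetime problem described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyVector, ToyHasher, and buildPhase are hypothetical stand-ins rather than Velox types, and the weak_ptr exists solely so the sketch can detect expiry without dereferencing a dangling pointer.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for a key column: nullable 64-bit keys.
using ToyVector = std::vector<std::optional<long>>;

// Hypothetical stand-in for a hasher that keeps a decoded view of its input.
struct ToyHasher {
  const ToyVector* base = nullptr;   // non-owning, like a raw base pointer
  std::weak_ptr<ToyVector> liveness; // sketch-only: lets us detect expiry

  void decode(const std::shared_ptr<ToyVector>& input) {
    base = input.get();
    liveness = input;
  }

  // True while the decoded input is still alive.
  bool baseIsValid() const { return !liveness.expired(); }
};

// Build phase: decode the input, then drop the only owning reference.
inline void buildPhase(ToyHasher& hasher) {
  auto input = std::make_shared<ToyVector>(
      ToyVector{std::optional<long>{1}, std::nullopt, std::optional<long>{3}});
  hasher.decode(input);
  assert(hasher.baseIsValid()); // reading hasher.base is safe here
} // input is destroyed here, so hasher.base now dangles
```

Once buildPhase(hasher) has returned, hasher.baseIsValid() reports false, so any probe-side code that still read through hasher.base would be touching freed memory.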

@zhli1142015 zhli1142015 force-pushed the fix_dangling_ptr_access branch from c20627c to 2642a07 Compare January 27, 2025 03:19
@zhli1142015 zhli1142015 changed the title from "fix(hashjoin): Resolve dangling pointer access in listNullKeyRows for hash mode tables" to "fix(hashjoin): Create new VectorHashers for listNullKeyRows to prevent dangling pointer access" Jan 27, 2025
@zhli1142015
Contributor Author

zhli1142015 commented Jan 27, 2025

I reviewed the code and believe that creating separate VectorHashers with properly decoded vectors (a single row with null keys) would be a more suitable fix. This approach helps prevent potential future bugs. @bikramSingh91 and @xiaoxmeng, could you please take another look?
Thanks.
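
The idea of giving the null-key lookup its own hashers can be sketched roughly as follows. This is not the code in this PR: OwningToyHasher and makeNullKeyHashers are hypothetical names, and letting the hasher co-own its input is an assumption made here to keep the toy self-contained. The sketch only shows that each fresh hasher decodes a one-row vector whose single key is null, so nothing depends on build-side input that may already be gone.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a key column: nullable 64-bit keys.
using ToyVector = std::vector<std::optional<long>>;

// Hypothetical hasher that co-owns its decoded input (an assumption of this
// sketch), so its view of the data cannot dangle.
struct OwningToyHasher {
  std::shared_ptr<const ToyVector> decoded;

  void decode(std::shared_ptr<const ToyVector> input) {
    decoded = std::move(input);
  }

  bool isNullAt(std::size_t row) const {
    assert(decoded != nullptr);
    return !decoded->at(row).has_value();
  }
};

// Creates one fresh hasher per key column, each decoding a one-row vector
// whose only value is null.
inline std::vector<OwningToyHasher> makeNullKeyHashers(std::size_t numKeys) {
  std::vector<OwningToyHasher> hashers(numKeys);
  for (auto& hasher : hashers) {
    hasher.decode(std::make_shared<ToyVector>(std::size_t{1}, std::nullopt));
  }
  return hashers;
}
```

A caller would build these hashers right before the null-key lookup and discard them afterwards, leaving the table's own hashers untouched.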
