-
Notifications
You must be signed in to change notification settings - Fork 833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter() #6306
Conversation
…_iter() and suffix_iter()
/// Returns an iterator over the first `prefix_len` bytes of each array | ||
/// element, including null values. | ||
/// | ||
/// If `prefix_len` is larger than the element's length, the iterator will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels rather surprising to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a statement or a question?
If you have a suggestion of how to make it less surprising perhaps @xinlifoobar would be willing to give it a try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I would have thought returning Option<&[u8]>
would be perhaps more intuitive and likely perform similarly? That being said I am surprised that skipping the null mask yields performance returns, as this hasn't been my experience with regular StringArray, so my intuition about performance may be off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions! @tustvold.
I created a new PR #6312 to put this discussion. I did this in the early version but didn't see performance gains like this time - it did well for inline values. We might do a combination based on the bench result, e.g., null masks for prefix_iter and slice for suffix_iter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The results on #6312 seem to show that using null masks does indeed reduce performance
So is the conclusion that these APIs are good to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, I think this API would be better if it returned None for a prefix too long, even if it continues to not inspect null masks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs look good, we can always update should we alter the signature
Ok, merging in these docs so they match the current code. As @tustvold says if the code needs to be updated we can also update the docs |
Which issue does this PR close?
Follow on to #6231 from @xinlifoobar
Rationale for this change
These new APIs were added in #6231 and I wanted to clarify their behavior (so we can potentailly use them in DataFusion)
What changes are included in this PR?
Update doc comments
Are there any user-facing changes?
Documentation, no functional change