Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter() #6306

Merged
merged 1 commit into from
Aug 28, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 25, 2024

Which issue does this PR close?

Follow on to #6231 from @xinlifoobar

Rationale for this change

These new APIs were added in #6231 and I wanted to clarify their behavior (so we can potentailly use them in DataFusion)

What changes are included in this PR?

Update doc comments

Are there any user-facing changes?

Documentation, no functional change

@alamb alamb added the documentation Improvements or additions to documentation label Aug 25, 2024
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 25, 2024
/// Returns an iterator over the first `prefix_len` bytes of each array
/// element, including null values.
///
/// If `prefix_len` is larger than the element's length, the iterator will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels rather surprising to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a statement or a question?

If you have a suggestion of how to make it less surprising perhaps @xinlifoobar would be willing to give it a try

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would have thought returning Option<&[u8]> would be perhaps more intuitive and likely perform similarly? That being said I am surprised that skipping the null mask yields performance returns, as this hasn't been my experience with regular StringArray, so my intuition about performance may be off

Copy link
Contributor

@xinlifoobar xinlifoobar Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions! @tustvold.

I created a new PR #6312 to put this discussion. I did this in the early version but didn't see performance gains like this time - it did well for inline values. We might do a combination based on the bench result, e.g., null masks for prefix_iter and slice for suffix_iter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results on #6312 seem to show that using null masks does indeed reduce performance

So is the conclusion that these APIs are good to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, I think this API would be better if it returned None for a prefix too long, even if it continues to not inspect null masks

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs look good, we can always update should we alter the signature

@alamb
Copy link
Contributor Author

alamb commented Aug 28, 2024

Ok, merging in these docs so they match the current code. As @tustvold says if the code needs to be updated we can also update the docs

@alamb alamb merged commit dc8427f into apache:master Aug 28, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants