Add Null Mask to Prefix and Suffix Iters #6312

Closed
wants to merge 1 commit into from

xinlifoobar
Contributor

Which issue does this PR close?

This is a (possible) follow-up to #6231 and to the discussion at https://github.com/apache/arrow-rs/pull/6306/files#r1731703196.

# xinli @ arch-dev in ~/source/repos/arrow-rs on git:master o [20:44:53] 
$ critcmp master_08_27 nullable_08_27                                            
group                                        master_08_27                           nullable_08_27
-----                                        ------------                           --------------
like_utf8view scalar complex                 1.00    171.9±3.44ms        ? ?/sec    1.00    171.2±2.01ms        ? ?/sec
like_utf8view scalar contains                1.00    127.1±1.40ms        ? ?/sec    1.02    129.7±2.94ms        ? ?/sec
like_utf8view scalar ends with 13 bytes      1.00     30.9±0.95ms        ? ?/sec    1.09     33.8±0.72ms        ? ?/sec
like_utf8view scalar ends with 4 bytes       1.00     31.9±0.55ms        ? ?/sec    1.10     35.0±0.51ms        ? ?/sec
like_utf8view scalar ends with 6 bytes       1.00     31.5±0.45ms        ? ?/sec    1.10     34.7±0.42ms        ? ?/sec
like_utf8view scalar equals                  1.00     26.9±1.08ms        ? ?/sec    1.00     26.9±0.44ms        ? ?/sec
like_utf8view scalar starts with 13 bytes    1.00     30.5±0.49ms        ? ?/sec    1.14     34.6±0.42ms        ? ?/sec
like_utf8view scalar starts with 4 bytes     1.25     21.9±0.31ms        ? ?/sec    1.00     17.5±0.51ms        ? ?/sec
like_utf8view scalar starts with 6 bytes     1.00     30.9±0.46ms        ? ?/sec    1.15     35.7±0.61ms        ? ?/sec

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@@ -134,7 +134,9 @@ impl<'a> Predicate<'a> {
                 string_view_array
                     .prefix_bytes_iter(v.len())
                     .map(|haystack| {
-                        equals_bytes(haystack, v.as_bytes(), equals_kernel) != negate
+                        haystack

Contributor

What happens if you use unwrap_or_default instead of map_or?

Contributor Author

This should be worse than map_or(false, ...) as an additional vtable call is introduced.

https://doc.rust-lang.org/1.80.1/src/core/option.rs.html#1003-1013

Contributor

Where is the vtable? The types are all known statically here.

Contributor Author

@xinlifoobar xinlifoobar Aug 28, 2024

Sorry, I made a mistake. The above statement is wrong.
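
For reference, the two forms compared in this thread can be sketched as below. This is a minimal sketch, not the code from the PR: `starts_with` is a hypothetical placeholder for the kernel's `equals_bytes`, and the function names are made up for illustration. Both `Option::map_or` and `Option::unwrap_or_default` are ordinary generic methods, so with the types known statically neither involves dynamic dispatch. Since `bool::default()` is `false`, the two forms return the same value.

```rust
/// Hypothetical placeholder for the kernel's comparison function.
fn starts_with(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.starts_with(needle)
}

/// Form discussed for the diff: a `None` haystack maps to `false`.
fn with_map_or(haystack: Option<&[u8]>, needle: &[u8], negate: bool) -> bool {
    haystack.map_or(false, |h| starts_with(h, needle) != negate)
}

/// Form suggested in review: `bool::default()` is `false`, so `None` also maps to `false`.
fn with_unwrap_or_default(haystack: Option<&[u8]>, needle: &[u8], negate: bool) -> bool {
    haystack
        .map(|h| starts_with(h, needle) != negate)
        .unwrap_or_default()
}

fn main() {
    let cases: [Option<&[u8]>; 3] = [
        Some("arrow".as_bytes()),
        Some("parquet".as_bytes()),
        None,
    ];
    // Both forms agree for every input, including the null case.
    for haystack in cases {
        for negate in [false, true] {
            assert_eq!(
                with_map_or(haystack, b"arr", negate),
                with_unwrap_or_default(haystack, b"arr", negate)
            );
        }
    }
}
```

Any performance difference between the two would have to come from how the optimizer handles each form, which only a benchmark can show.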

pub fn prefix_bytes_iter(&self, prefix_len: usize) -> impl Iterator<Item = Option<&[u8]>> {
    self.views().into_iter().enumerate().map(move |(i, v)| {
        if self.is_null(i) {
            None

Contributor

What happens to performance if you don't check the null mask, but still return Option?

Contributor Author

is_null reads cached null values, which might be faster.

Contributor

I was suggesting simply not checking the null mask at all.

Contributor Author

Sorry, I didn't get that. The latest commit has removed the is_null check.

Contributor Author

@xinlifoobar xinlifoobar Aug 28, 2024

This comment still points to the old code. I have squashed everything into one commit; please see the changes.
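
To make the two variants in this thread concrete, here is a minimal sketch of an iterator that checks the null mask per row next to one that keeps the `Option` item type but never reads the mask. `MiniArray`, its fields, and both method names are hypothetical stand-ins and do not reflect the arrow-rs `StringViewArray` layout or API. The sketch also assumes rows shorter than the prefix yield an empty slice, and that a caller of the unchecked variant applies the validity mask to the result separately.

```rust
/// Hypothetical stand-in for a string view array: row payloads plus a validity mask.
struct MiniArray {
    values: Vec<Vec<u8>>,
    validity: Vec<bool>, // true = row is valid (non-null)
}

/// First `prefix_len` bytes of `v`, or an empty slice if `v` is shorter (an assumption).
fn prefix(v: &[u8], prefix_len: usize) -> &[u8] {
    v.get(..prefix_len).unwrap_or(&[])
}

impl MiniArray {
    /// Variant 1: consult the null mask for every row and yield `None` for nulls.
    fn prefix_iter_checked(&self, prefix_len: usize) -> impl Iterator<Item = Option<&[u8]>> + '_ {
        self.values.iter().enumerate().map(move |(i, v)| {
            if self.validity[i] {
                Some(prefix(v, prefix_len))
            } else {
                None
            }
        })
    }

    /// Variant 2: keep the `Option` item type but never look at the null mask.
    /// Null rows are processed like any other row.
    fn prefix_iter_unchecked(&self, prefix_len: usize) -> impl Iterator<Item = Option<&[u8]>> + '_ {
        self.values.iter().map(move |v| Some(prefix(v, prefix_len)))
    }
}

fn main() {
    let arr = MiniArray {
        values: vec![b"arrow".to_vec(), Vec::new(), b"rs".to_vec()],
        validity: vec![true, false, true],
    };
    let checked: Vec<_> = arr.prefix_iter_checked(3).collect();
    let unchecked: Vec<_> = arr.prefix_iter_unchecked(3).collect();
    println!("{:?}", checked);
    println!("{:?}", unchecked);
}
```

Comparing the two variants under the same benchmark separates the cost of the `Option` wrapper from the cost of the per-row mask lookup.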

@xinlifoobar
Contributor Author

New benchmark

like_utf8view scalar complex
----------------------------
nullable_08_27       1.00     171.2±2.01ms       ? ?/sec
master_08_27         1.00     171.9±3.44ms       ? ?/sec
nullable_08_27_2     1.02     174.8±6.85ms       ? ?/sec

like_utf8view scalar contains
-----------------------------
nullable_08_27_2     1.00     124.7±1.50ms       ? ?/sec
master_08_27         1.02     127.1±1.40ms       ? ?/sec
nullable_08_27       1.04     129.7±2.94ms       ? ?/sec

like_utf8view scalar ends with 13 bytes
---------------------------------------
master_08_27         1.00      30.9±0.95ms       ? ?/sec
nullable_08_27       1.09      33.8±0.72ms       ? ?/sec
nullable_08_27_2     1.19      36.8±0.79ms       ? ?/sec

like_utf8view scalar ends with 4 bytes
--------------------------------------
master_08_27         1.00      31.9±0.55ms       ? ?/sec
nullable_08_27       1.10      35.0±0.51ms       ? ?/sec
nullable_08_27_2     1.19      38.0±3.34ms       ? ?/sec

like_utf8view scalar ends with 6 bytes
--------------------------------------
master_08_27         1.00      31.5±0.45ms       ? ?/sec
nullable_08_27       1.10      34.7±0.42ms       ? ?/sec
nullable_08_27_2     1.18      37.1±0.90ms       ? ?/sec

like_utf8view scalar equals
---------------------------
nullable_08_27       1.00      26.9±0.44ms       ? ?/sec
master_08_27         1.00      26.9±1.08ms       ? ?/sec
nullable_08_27_2     1.02      27.4±1.58ms       ? ?/sec

like_utf8view scalar starts with 13 bytes
-----------------------------------------
master_08_27         1.00      30.5±0.49ms       ? ?/sec
nullable_08_27       1.14      34.6±0.42ms       ? ?/sec
nullable_08_27_2     1.19      36.2±0.61ms       ? ?/sec

like_utf8view scalar starts with 4 bytes
----------------------------------------
nullable_08_27       1.00      17.5±0.51ms       ? ?/sec
master_08_27         1.25      21.9±0.31ms       ? ?/sec
nullable_08_27_2     1.29      22.6±0.30ms       ? ?/sec

like_utf8view scalar starts with 6 bytes
----------------------------------------
master_08_27         1.00      30.9±0.46ms       ? ?/sec
nullable_08_27       1.15      35.7±0.61ms       ? ?/sec
nullable_08_27_2     1.23      38.2±2.02ms       ? ?/sec

Update comment

Labels
arrow Changes to the arrow crate
2 participants