Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean up ByteView construction #5879

Merged
merged 2 commits into from
Jun 13, 2024
Merged

Conversation

XiangpengHao
Copy link
Contributor

Which issue does this PR close?

Closes #5878 .

Rationale for this change

Avoids manual construction, use unchecked when it is safe.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Jun 13, 2024
@XiangpengHao
Copy link
Contributor Author

CI failed because the new linting rule is being applied in rust 1.79 👀

@XiangpengHao
Copy link
Contributor Author

CI failed because the new linting rule is being applied in rust 1.79 👀

Waiting for #5881 to merge

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

Waiting for #5881 to merge

Merged!

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

I merged up from main to get a clean CI run

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @XiangpengHao -- very cool. Nice cleanup and speed improvement

offset,
};
views_builder.append(view.into());
// Safety: the input was a valid array so it valid UTF8 (if string). And
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

let len = array.len();
let str_values_buf = data.buffers()[1].clone();
let offsets = data.buffers()[0].typed_data::<FROM::Offset>();
let str_values_buf = byte_array.values().clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice cleanuo

builder
.try_append_view(block, start.as_usize() as u32, len as u32)
.unwrap();
// Safety: (1) the buffer is valid (2) the offsets are valid (3) the values in between are of ByteViewType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the benchmarks and this seems to have gotten another 8% performance improvement:

++ critcmp master byte-view-clean
group                                                                          byte-view-clean                        master
-----                                                                          ---------------                        ------
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs     1.00    511.7±1.16µs        ? ?/sec    1.08    554.8±4.84µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs    1.00    711.4±1.29µs        ? ?/sec    1.03    729.7±1.41µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs      1.00    513.7±1.25µs        ? ?/sec    1.08    555.4±1.51µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs          1.00    558.3±2.88µs        ? ?/sec    1.08    602.8±2.51µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs         1.00    740.2±3.73µs        ? ?/sec    1.03    765.2±4.20µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs           1.00    561.7±2.62µs        ? ?/sec    1.08    606.8±2.68µs        ? ?/sec

@alamb alamb merged commit e8285be into apache:master Jun 13, 2024
27 checks passed
@XiangpengHao XiangpengHao deleted the byte-view-clean branch June 13, 2024 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cleanup ByteView construction
2 participants