-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of casting DictionaryArray
to StringViewArray
#5871
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @XiangpengHao -- This is looking great.
arrow-cast/src/cast/dictionary.rs
Outdated
StringViewArray::new_unchecked( | ||
view_buffer, | ||
vec![value_buffer], | ||
dict_array.nulls().cloned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think calling nulls()
doesn't handle the case where the dictionary value itself (rather than the key) was null
fn nulls(&self) -> Option<&NullBuffer> { |
I think this should call logical_nulls()
instead:
fn logical_nulls(&self) -> Option<NullBuffer> { |
Also, it would be good to create a test case that covers this too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! added a new test to cover this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(fyi there seem to be no new commits to this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added now!
arrow-cast/src/cast/mod.rs
Outdated
#[test] | ||
fn test_dict_to_view() { | ||
let string_view_array = StringViewArray::from_iter(VIEW_TEST_DATA); | ||
let string_dict_array: DictionaryArray<Int8Type> = VIEW_TEST_DATA.into_iter().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please update this test to have a dictionary array that has:
- Repeated use of dictionary values
- keys that are not all increasing
- Nulls in the values (as well as the keys)
Perhaps what you can do is create a StringArray
from VIEW_TEST_DATA
and then make create the indexes manually
Like this example https://docs.rs/arrow/latest/arrow/array/struct.DictionaryArray.html#example-from-existing-arrays
Does that make sense?
arrow-cast/src/cast/dictionary.rs
Outdated
let length = end - offset; | ||
let value_buf = &value_buffer[offset as usize..end as usize]; | ||
|
||
if length <= 12 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating the views directly, what do you think about using try_append_view
and try_append_block
added in this PR: #5796
Maybe as bonus points you could add a benchmark for this casting operation, and see if adding append_view_unchecked
would make any difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, adding append_view_unchecked
improved the performance by 50%
dict to view time: [38.116 µs 38.122 µs 38.127 µs]
change: [-50.618% -50.455% -50.290%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
I think it justifies adding append_view_unchecked
, what do you think?
Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it justifies adding append_view_unchecked, what do you think?
Makes sense to me
Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..
I do think we should add the benchmark to the repo (so we can use it for future optimizations)
In terms of justifying append_view_unsafe
I think running the benchmark on a local checkout that calls append_view
and then revert and run the same benchmark is fine (which is presumably what you did)
BTW, with this pull request, casting a dictionary array to string view with 10_000 items improved performance by 25%
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- I think this could be made significantly faster but I think we should add a benchmark and optimize as a follow on PR
Thank you @XiangpengHao
let typed_dict = string_dict_array.downcast_dict::<StringArray>().unwrap(); | ||
|
||
let string_view_array = { | ||
let mut builder = StringViewBuilder::new().with_block_size(8); // multiple buffers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let value_offsets = array.value_offsets(); | ||
let mut builder = GenericByteViewBuilder::<T>::with_capacity(keys.len()); | ||
builder.append_block(value_buffer.clone()); | ||
for i in keys.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another potential optimization is a separate loop if there are no nulls in keys (so we can avoid the branch)
Another potental idea is to use value_offsets.windows(2)
as an iterator to avoid the bounds checks in value_offsets
However, I think we should merge this basic PR in as is, and then add a bencmark and optimize this kenrnel as a follow on PR (if we care). I can file a ticket if @tustvold agrees
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ops, I checked in a append_view_unchecked
.
But if we do try_append_view
, it is even slower than the unpack_dictionary
approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also move append_view_unchecked
as a follow-up PR and potentially clean up other use cases where ByteViews are manually constructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XiangpengHao --
This looks really nice to me. Can you please make a separate PR with the cast dictionary --> view benchmark? Then we can use that benchmark to ensure that this approach is faster than "unpack dictionary" (which I am sure it will be)
}; | ||
self.views_builder.append(view.into()); | ||
unsafe { | ||
self.append_view_unchecked(block, offset, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
My plan is to merge this branch up to main, run the benchmarks added in #5874 on it and then assuming they look good merge it in |
The bechmark I ran suggests this kernel is almost 2x as fast as the existing approach (which makes sense as it avoids a copy). Nice work @XiangpengHao
|
DictionaryArray
to StringViewArray
Which issue does this PR close?
Part of #5861 .
Rationale for this change
Casting from
DictionaryArray
toString/BinaryView
was previously handled byunpack_dictionary
: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast/dictionary.rs#L93-L105which incurs unnecessary copy to the value buffer. This pr handles Utf8View and BinaryView so that it will reuse the value buffer instead of creating a new one.
What changes are included in this PR?
Are there any user-facing changes?
No