Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of casting StringView/BinaryView to DictionaryArray #5872

Merged
merged 10 commits into from
Jun 13, 2024

Conversation

XiangpengHao
Copy link
Contributor

@XiangpengHao XiangpengHao commented Jun 11, 2024

Which issue does this PR close?

Closes #5861 . This pr should only be reviewed/merged after #5871 is merged.

Rationale for this change

Cast view types to dictionary arrays, the opposite of #5871

Note that casting was handled by pack_byte_to_dictionary:
https://github.com/XiangpengHao/arrow-rs/blob/dict-to-view/arrow-cast/src/cast/dictionary.rs#L307

But it takes two steps: Utf8View -> Utf8 -> Dictionary, which causes two copies.
This PR only do one copy (directly from Utf8View -> Dictionary<>)

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 11, 2024
@alamb alamb changed the title View types to DictionaryArray Add cast View types to DictionaryArray Jun 12, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @XiangpengHao --

Since the rationale for these changes is performance, I think we should have benchmark results showing how much faster it is.

Thus, can you please make a PR with a banchmark (a single PR with benchmarks both for this change and the one in #5871)

Then we'll use the benchmarks to verify that this PR really does improve performance (I am sure it will, but we should double check)

@alamb alamb changed the title Add cast View types to DictionaryArray Improve performance of casting StringView/BinaryView to DictionaryArray Jun 13, 2024
@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

I took the liberty of merging this PR up from main. Running performance benchmarks on it now

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I measure a 30% performance improvement on this branch. Thank you @XiangpengHao

++ critcmp master dict-to-view
group                       dict-to-view                           master
-----                       ------------                           ------
cast dict to string view    1.00     69.0±2.58µs        ? ?/sec    1.02    70.6±12.48µs        ? ?/sec
cast string view to dict    1.00    203.3±0.32µs        ? ?/sec    1.34    272.1±0.69µs        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

Does this PR close #5861 I wonder 🤔

@XiangpengHao
Copy link
Contributor Author

Does this PR close #5861 I wonder 🤔

I guess it should. Are there any remaining items?

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

I guess it should. Are there any remaining items?

Not that I know of

@alamb
Copy link
Contributor

alamb commented Jun 13, 2024

🚀

@alamb alamb merged commit 8752e01 into apache:master Jun 13, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cast kernel support for StringViewArray and BinaryViewArray <--> DictionaryArray`
2 participants