ORC-1604: Deprecate non-utf8 bloom filter for Java writer #1776
Hi, @cxzl25. I agree with your intention.
However, the Apache ORC community needs a deprecation process before the removal of any public API. I don't think we can remove the API.
Could you convert this PR into a deprecation PR instead of a removal PR?
This reverts commit d2e021a.
+1, LGTM.
### What changes were proposed in this pull request?

This PR aims to deprecate non-utf8 bloom filter for writer.

1. deprecate `org.apache.orc.OrcFile.WriterOptions#bloomFilterVersion`
2. deprecate `org.apache.orc.OrcFile.WriterOptions#getBloomFilterVersion`
3. deprecate `org.apache.orc.impl.writer.WriterContext#getBloomFilterVersion`

### Why are the changes needed?

1. `orc.bloom.filter.write.version=original` will write two copies of data instead of one, which increases the size of ORC and will also cause Spark2.x to fail to read `BloomFilterUtf8` [comment-17800800](https://issues.apache.org/jira/browse/ORC-297?focusedCommentId=17800800&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17800800)
2. C++ writer does not implement original
3. Plan to remove non-utf8 bloom filter in `orc-format` `ORCv2.md`

### How was this patch tested?

GA

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #1776 from cxzl25/ORC-1604.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6c3c451)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Thank you for the updates. Merged to main/2.0 for Apache ORC 2.0.
What changes were proposed in this pull request?
This PR aims to deprecate non-utf8 bloom filter for writer.
1. deprecate `org.apache.orc.OrcFile.WriterOptions#bloomFilterVersion`
2. deprecate `org.apache.orc.impl.writer.WriterContext#getBloomFilterVersion`
Why are the changes needed?
1. `orc.bloom.filter.write.version=original` will write two copies of data instead of one, which increases the size of ORC and will also cause Spark2.x to fail to read `BloomFilterUtf8` (comment-17800800)
2. Plan to remove non-utf8 bloom filter in `orc-format` `ORCv2.md`
How was this patch tested?
GA
Was this patch authored or co-authored using generative AI tooling?
No
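For readers unfamiliar with the deprecate-before-remove approach requested in the review, here is a minimal, self-contained sketch of the pattern. It is not the actual patch: `BloomFilterDeprecationSketch` and its nested `WriterOptions` and `BloomFilterVersion` types are simplified, hypothetical stand-ins for the ORC classes named above, and the `utf8` default is an assumption made for the illustration.

```java
public class BloomFilterDeprecationSketch {

    /** Hypothetical stand-in for the two bloom filter write versions. */
    enum BloomFilterVersion {
        ORIGINAL("original"),
        UTF8("utf8");

        private final String id;

        BloomFilterVersion(String id) {
            this.id = id;
        }

        /** Maps a config value such as "original" or "utf8" to a version. */
        static BloomFilterVersion fromString(String s) {
            for (BloomFilterVersion v : values()) {
                if (v.id.equals(s)) {
                    return v;
                }
            }
            throw new IllegalArgumentException("Unknown bloom filter version: " + s);
        }
    }

    /** Hypothetical, simplified stand-in for a writer options builder. */
    static class WriterOptions {
        // Assumption for this sketch: utf8 is the default write version.
        private BloomFilterVersion bloomFilterVersion = BloomFilterVersion.UTF8;

        /**
         * @deprecated kept so existing callers still compile; marked for
         *             removal in a later release instead of being deleted now.
         */
        @Deprecated
        WriterOptions bloomFilterVersion(BloomFilterVersion version) {
            this.bloomFilterVersion = version;
            return this;
        }

        /** @deprecated see {@link #bloomFilterVersion(BloomFilterVersion)}. */
        @Deprecated
        BloomFilterVersion getBloomFilterVersion() {
            return bloomFilterVersion;
        }
    }

    public static void main(String[] args) {
        // Existing code that still selects the original version keeps working,
        // but compilers and IDEs can now flag the call as deprecated.
        WriterOptions options = new WriterOptions()
            .bloomFilterVersion(BloomFilterVersion.fromString("original"));
        System.out.println(options.getBloomFilterVersion());
    }
}
```

The point of the sketch is that the methods stay callable, so downstream users get a warning period before any removal.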