feat(function): add least function #13786
for array in arrays_iter {
    smallest = keep_smallest(Arc::clone(array), smallest)?;
}
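The loop above folds the argument arrays into a single running "smallest" array. A minimal, std-only sketch of that idea follows. It is an illustration, not the PR's code: `Vec<Option<i64>>` stands in for an Arrow array, the `least` wrapper is hypothetical, and the rule that a null (`None`) is skipped unless every input is null is an assumption made for this sketch.

```rust
/// Element-wise minimum of two equal-length columns.
/// Assumption for this sketch: a null (`None`) is skipped, so a row is null
/// only when both inputs are null in that row.
fn keep_smallest(
    lhs: &[Option<i64>],
    rhs: &[Option<i64>],
) -> Result<Vec<Option<i64>>, String> {
    if lhs.len() != rhs.len() {
        return Err("all arrays should have the same length".to_string());
    }
    Ok(lhs
        .iter()
        .zip(rhs)
        .map(|(l, r)| match (l, r) {
            (Some(a), Some(b)) => Some((*a).min(*b)),
            (Some(a), None) => Some(*a),
            (None, b) => *b,
        })
        .collect())
}

/// Hypothetical wrapper: fold every column into one running result,
/// mirroring the `for array in arrays_iter` loop.
fn least(columns: &[Vec<Option<i64>>]) -> Result<Vec<Option<i64>>, String> {
    let mut iter = columns.iter();
    let first = iter
        .next()
        .ok_or_else(|| "least needs at least one argument".to_string())?;
    let mut smallest = first.clone();
    for column in iter {
        smallest = keep_smallest(column, &smallest)?;
    }
    Ok(smallest)
}
```

Each pass compares one more column against the running result, so the fold needs only one intermediate array at a time regardless of how many arguments are passed.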
@waynexia this is the same as in greatest; see the reasoning in #12474 (comment)
LGTM, thanks!
Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
Thank you @rluvaton. The least/greatest implementations are almost identical (how they handle input arguments, the constant folding optimization, etc.); the major differences are the sort options for the comparator and the method called on the comparator. Perhaps these functions could share one parameterized implementation? In that case it would be easier to fix or update their behavior if required, since it would take one implementation change instead of two.
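One way to picture the parameterized implementation suggested here is a shared generic function with a small trait that captures the only part that differs. This is a hedged sketch in plain Rust: the trait `ExtremumKind`, the marker types, and `pick_rowwise` are hypothetical names, and `Vec<i64>` stands in for an Arrow array. It is not the code that was eventually merged.

```rust
use std::cmp::Ordering;

/// Hypothetical trait: the single point where `greatest` and `least` differ.
trait ExtremumKind {
    const NAME: &'static str;
    /// Given `candidate.cmp(current)`, should the candidate replace the current value?
    fn prefers(candidate_vs_current: Ordering) -> bool;
}

struct Greatest;
struct Least;

impl ExtremumKind for Greatest {
    const NAME: &'static str = "greatest";
    fn prefers(candidate_vs_current: Ordering) -> bool {
        candidate_vs_current == Ordering::Greater
    }
}

impl ExtremumKind for Least {
    const NAME: &'static str = "least";
    fn prefers(candidate_vs_current: Ordering) -> bool {
        candidate_vs_current == Ordering::Less
    }
}

/// Shared row-wise implementation; argument handling lives in one place.
fn pick_rowwise<K: ExtremumKind>(columns: &[Vec<i64>]) -> Result<Vec<i64>, String> {
    let (first, rest) = columns
        .split_first()
        .ok_or_else(|| format!("{} requires at least one argument", K::NAME))?;
    let mut best = first.clone();
    for column in rest {
        if column.len() != best.len() {
            return Err(format!(
                "internal error: {} received arrays of different lengths",
                K::NAME
            ));
        }
        for (current, candidate) in best.iter_mut().zip(column) {
            if K::prefers(candidate.cmp(&*current)) {
                *current = *candidate;
            }
        }
    }
    Ok(best)
}
```

With this shape, `pick_rowwise::<Least>(&cols)` and `pick_rowwise::<Greatest>(&cols)` share argument validation and the fold, so a behavior fix lands in one function.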
Thank you. The test coverage is great; I left several minor suggestions.
For the code copied from greatest, I think we should at least add comments on the code that must be kept in sync, so that anyone who changes one implementation in the future remembers to change the other (and preferably refactors to a parameterized implementation).
if lhs.len() != rhs.len() {
    return exec_err!("All arrays should have the same length for least comparison");
}
I suggest using debug_assert here, given it's a simple invariant check.
internal_err would also be appropriate, to signal that this is not an expected error.
Used internal_err.
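The options discussed in this thread differ in when the check runs and in what the caller sees. The sketch below contrasts them in plain Rust; `SketchError` and the three helper functions are hypothetical stand-ins that loosely mirror the execution/internal error split, not DataFusion's actual types or macros. Treating a zero-argument call as a user-facing error is also an assumption made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum SketchError {
    /// Caused by user input, such as a bad query or bad data.
    Execution(String),
    /// Signals a bug: a broken invariant that users should never be able to trigger.
    Internal(String),
}

/// User-facing validation: reported as an execution error.
fn check_arg_count(n: usize) -> Result<(), SketchError> {
    if n == 0 {
        return Err(SketchError::Execution(
            "least requires at least one argument".to_string(),
        ));
    }
    Ok(())
}

/// Invariant check that is always performed and reported as an internal error.
fn check_same_length(lhs_len: usize, rhs_len: usize) -> Result<(), SketchError> {
    if lhs_len != rhs_len {
        return Err(SketchError::Internal(format!(
            "arrays should have the same length, got {lhs_len} and {rhs_len}"
        )));
    }
    Ok(())
}

/// Invariant check that panics in debug builds and is compiled out in
/// release builds (unless debug assertions are explicitly enabled).
fn check_same_length_debug(lhs_len: usize, rhs_len: usize) {
    debug_assert_eq!(lhs_len, rhs_len, "arrays should have the same length");
}
```

The trade-off: `debug_assert_eq!` costs nothing in release builds but also catches nothing there, while returning an internal error keeps the check in every build and gives the caller a recoverable `Result`.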
    Ok(ColumnarValue::Array(smallest))
}

fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
It's possible to reuse greatest's implementation here.
Merged the implementation with greatest.
Thank you @rluvaton, @Omega359 and @2010YOUY01 for the reviews 🙏
I think @2010YOUY01's comments are good and ideally should be addressed prior to merge; however, I also think we could do them as a follow-on.
I also took the liberty of merging up from main and running ./dev/update_function_docs.sh to fix the CI check, then pushing to the branch.
@rluvaton -- let me know if you are able to address @2010YOUY01's comments in this PR, or if I should merge it and we can address them in a follow-on.
Thank you. I have not had time to update the PR based on the review; I'll do it in the coming days.
So I merged both implementations into one. I don't really like it: the naming I chose is off and I need a better name, but all comments are resolved.
Thank you @rluvaton -- I think this looks like a significant improvement to me.
I added the ASF header to the new file and merged up from main in preparation for merging this PR.
In terms of names / files, a pattern that is more common in datafusion would be something like datafusion/functions/src/core/greatest_least.rs (i.e. put the two functions that share most of their implementation in the same file).
However, that is something we can improve as a follow-on if someone wants to improve the code.
Thanks again @rluvaton, @Omega359 and @2010YOUY01 :rocket:
Which issue does this PR close?
Closes #6531
Rationale for this change
Adding support for more expressions; I already added greatest.
What changes are included in this PR?
Merged the execution logic for the greatest and least functions and added what was left.
Copied the greatest function I wrote in #12474 and modified it. I chose copying over a macro for the following reasons:
1. Easier debugging and maintainability
2. Least has custom logic regarding scalars
Are these changes tested?
yes
Are there any user-facing changes?
yes