Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(function): add least function #13786

Merged
merged 17 commits into from
Dec 20, 2024
Merged

feat(function): add least function #13786

merged 17 commits into from
Dec 20, 2024

Conversation

rluvaton
Copy link
Contributor

@rluvaton rluvaton commented Dec 14, 2024

Which issue does this PR close?

Closes #6531

Rationale for this change

adding more expressions support, and I already added greatest

What changes are included in this PR?

Merged the execute for greatest and least function and and added what left

copied the greatest function I did in #12474 and modified.

I choose coping over macro for the following reasons:
1. Easier debugging and maintainability
2. Least has custom logic regarding scalars

Are these changes tested?

yes

Are there any user-facing changes?

yes

@github-actions github-actions bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) functions labels Dec 14, 2024
Comment on lines 224 to 226
for array in arrays_iter {
smallest = keep_smallest(Arc::clone(array), smallest)?;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waynexia this is the same as the greatest, see why #12474 (comment)

@Omega359
Copy link
Contributor

LGTM. thanks!

rluvaton and others added 2 commits December 15, 2024 01:55
Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
@korowa
Copy link
Contributor

korowa commented Dec 15, 2024

Thank you @rluvaton

Taking into account that the least/greatest implementations are almost identical (the way how these functions handle input arguments / constant folding optimization etc) and the major difference is sort options for the comparator and the method called on the comparator, perhaps it is possible to make these functions share the same parameterized implementation? I suppose, in this case it would be easier to fix/update functions behavior if required (since it'll require only one implementation modification instead of two).

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you: The test coverage is great, I left several minor advices.

For copy paste codes from greatest, I think at least we should add comments to those code which should be kept in sync, so that if someone changes one implementation in the future, they will remember to change the other (and preferably refactor to a parameterized implementation)

datafusion/functions/src/core/least.rs Outdated Show resolved Hide resolved

if lhs.len() != rhs.len() {
return exec_err!("All arrays should have the same length for least comparison");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use debug_assert, given it's a simple invariant check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal_err also would be appropriate to signal that this is not an expected error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used internal_error

datafusion/functions/src/core/least.rs Outdated Show resolved Hide resolved
Ok(ColumnarValue::Array(smallest))
}

fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to reuse greatest's implementation here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged implementation with greatest

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rluvaton and @Omega359 and @2010YOUY01 for the reivews

🙏

I think @2010YOUY01 's comments are good and ideally should be addressed prior to merge, however, I also think we could do them as a follow on

I also took the liberty of merging up from main and running ./dev/update_function_docs.sh to fix the CI check and pushing to the branch.

@rluvaton -- let me know if you are able to address @2010YOUY01 's comments in this PR or if I should merge it and we can address in a follow on


if lhs.len() != rhs.len() {
return exec_err!("All arrays should have the same length for least comparison");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal_err also would be appropriate to signal that this is not an expected error

@rluvaton
Copy link
Contributor Author

Thank you, I did not have time to update based on the review, I'll do it in the coming days

@alamb alamb changed the title feat(function): add least function feat(function): add least function Dec 19, 2024
@rluvaton
Copy link
Contributor Author

So I merged both implementations into one, I don't really like it, the naming I choose is off, I need a better name but all comments and resolved

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rluvaton -- I think this looks like a significant improvement to me.

I added the ASF header to the new file and merged up from main in preparation of merging this PR.

In terms of names / files, a pattern that is more common in datafusion would be something like datafusion/functions/src/core/greatest_least.rs (aka put the two functions that share most implementation in the same file)

However that is something we can improve as a follow on if someone wants to improve the code

@alamb alamb merged commit 667c77a into apache:main Dec 20, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Dec 20, 2024

Thanks again @rluvaton @Omega359 and @2010YOUY01 :rockeet

@rluvaton rluvaton deleted the add-least branch December 21, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation functions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add greatest(T,...) and least(T,...) SQL functions
5 participants