Fix result for implicit GROUP BY without matches #1389

Open: wants to merge 4 commits into base: master

hannahbast

For an implicit GROUP BY, the standard dictates that there should be a result (a single line) even when there are no matches. The aggregate value then depends on the aggregate function: 0 for COUNT, SUM, AVG, undefined for MIN and MAX, empty string for GROUP_CONCAT. This is now properly implemented and tested. Here is an example query: https://qlever.cs.uni-freiburg.de/olympics/iq1oph
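The rule above can be sketched as a small lookup. This is only an illustration of the values listed in the description: the names `Aggregate`, `Value`, and `resultForEmptyGroup` (as a free function) are hypothetical, and `std::monostate` stands in for an undefined value; QLever's actual types differ.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical enumeration of the aggregate functions named above.
enum class Aggregate { Count, Sum, Avg, Min, Max, GroupConcat };

// Hypothetical result type; `std::monostate` models "undefined".
using Value = std::variant<std::monostate, std::int64_t, std::string>;

// Result of an implicit GROUP BY whose single group has no matches.
inline Value resultForEmptyGroup(Aggregate agg) {
  switch (agg) {
    case Aggregate::Count:
    case Aggregate::Sum:
    case Aggregate::Avg:
      return std::int64_t{0};
    case Aggregate::GroupConcat:
      return std::string{};
    case Aggregate::Min:
    case Aggregate::Max:
      return std::monostate{};
  }
  return std::monostate{};  // Unreachable for valid enum values.
}
```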

Along the way, improve the documentation in src/engine/sparqlExpressions/AggregateExpression.{h,cpp} considerably, which makes the really complex code there a bit easier to follow and extend.

@hannahbast hannahbast requested a review from joka921 July 7, 2024 16:11

codecov bot commented Jul 7, 2024

Codecov Report

Attention: Patch coverage is 94.73684% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.06%. Comparing base (14d6e1c) to head (83cbbc7).

| Files | Patch % | Lines |
|---|---|---|
| src/engine/GroupBy.cpp | 0.00% | 0 Missing and 1 partial ⚠️ |
| src/engine/sparqlExpressions/AggregateExpression.h | 97.05% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1389   +/-   ##
=======================================
  Coverage   89.06%   89.06%           
=======================================
  Files         328      328           
  Lines       29294    29306   +12     
  Branches     3262     3263    +1     
=======================================
+ Hits        26090    26101   +11     
  Misses       2054     2054           
- Partials     1150     1151    +1     


sonarcloud bot commented Jul 7, 2024


@joka921 joka921 left a comment


A thorough review.

The functionality in GroupBy.h is currently untested,
and it is probably hard to test, because the GroupBy unit tests as a whole are currently a mess.
An easy way to circumvent this for now (the untestedness, not the coverage in Codecov) would be to add an E2E test for an implicit GROUP BY.


template class AggregateExpression<
AGG_OP<decltype(addForSum), NumericValueGetter>, decltype(averageFinalOp)>;
// Explicit instantiatio for the AVG expression.

Suggested change
// Explicit instantiatio for the AVG expression.
// Explicit instantiation for the AVG expression.

//
// For example, for `SUM(?x + 5)`, `child` is the expression for `?x + 5`,
// `distinct` is `false`, and `aggregateOp` is the operation for computing
// the sum.

If you are iterating over the comments anyway:
Typically the AggregateOperation has no state and can thus be default-constructed (hence the default argument).
The only exception in the standard is GROUP_CONCAT, where the separator has to be passed in.
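A minimal sketch of that distinction, with hypothetical names (`AddOp`, `ConcatOp`, `Aggregator`) that are not taken from the QLever code: a stateless operation can rely on the default argument, while a GROUP_CONCAT-like operation must be constructed with its separator.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stateless operation: default-constructible, so callers need not pass it.
struct AddOp {
  int operator()(int a, int b) const { return a + b; }
};

// Stateful operation: the separator has to be supplied at construction.
struct ConcatOp {
  explicit ConcatOp(std::string separator)
      : separator_{std::move(separator)} {}
  std::string operator()(const std::string& a, const std::string& b) const {
    return a + separator_ + b;
  }

 private:
  std::string separator_;
};

// Hypothetical stand-in for an aggregate expression. The default argument
// `Op{}` is only instantiated when a caller actually relies on it.
template <typename Op, typename T>
class Aggregator {
 public:
  explicit Aggregator(Op op = Op{}) : op_{std::move(op)} {}

  // Fold the values from left to right. Precondition: `values` is non-empty.
  T aggregate(const std::vector<T>& values) const {
    T result = values.front();
    for (std::size_t i = 1; i < values.size(); ++i) {
      result = op_(result, values[i]);
    }
    return result;
  }

 private:
  Op op_;
};
```

Here `Aggregator<AddOp, int> sum;` compiles without arguments, whereas a `ConcatOp`-based aggregator has to be written as `Aggregator<ConcatOp, std::string> concat{ConcatOp{", "}};`.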

// `ExpressionResult` variants). Used in the `evaluate` function.
//
// TODO: Why is this a lambda and not a normal member function? It's rather
// long and complex.

It cannot be a member function because it is templated on its argument, but has to be passed to another function (std::visit in this case).
You could, however, reimplement this as a (non-templated) class with a templated call operator (the exact equivalent of the lambda); then the definition can land in the cpp file.
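The suggested shape could look roughly as follows. This is a sketch under assumptions: `Input` is a hypothetical stand-in for the `ExpressionResult` variant, and `DescribeType` is an invented visitor that only reports which alternative it received.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the `ExpressionResult` variant.
using Input = std::variant<int, double, std::string>;

// Non-templated class with a templated call operator, i.e. the named
// equivalent of a generic lambda.
struct DescribeType {
  template <typename T>
  std::string operator()(const T& value) const;
};

// Out-of-class definition. In a header/source split this could move to the
// .cpp file, together with explicit instantiations for each alternative.
template <typename T>
std::string DescribeType::operator()(const T&) const {
  if constexpr (std::is_same_v<T, int>) {
    return "int";
  } else if constexpr (std::is_same_v<T, double>) {
    return "double";
  } else {
    return "string";
  }
}

// A single object can be handed to `std::visit`, which then selects the
// matching instantiation of the call operator.
inline std::string describe(const Input& input) {
  return std::visit(DescribeType{}, input);
}
```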

Comment on lines +185 to +188
// NOTE: If the GROUP BY is implicit and we have a single group, that
// group can be empty. Then we cannot start with the first value and
// successively aggregate the others. Instead, we have to return the
// "neutral element" of that aggregation operation.

As I understand it, this case has already been handled long before we reach this place, so this comment is a bit long for "Note that we have already handled the case of an empty input".
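The control flow under discussion can be sketched like this, assuming a hypothetical helper `foldGroup` (not a function from the PR): the empty group is dealt with up front, so the fold itself may safely start from the first value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: aggregate one group with the binary operation `op`.
// `neutral` is the value returned for an empty group.
template <typename T, typename Op>
T foldGroup(const std::vector<T>& values, Op op, T neutral) {
  // The empty case is handled first ...
  if (values.empty()) {
    return neutral;
  }
  // ... so from here on the input is known to be non-empty, and we can start
  // with the first value and successively aggregate the others.
  T result = values.front();
  for (std::size_t i = 1; i < values.size(); ++i) {
    result = op(result, values[i]);
  }
  return result;
}
```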

using SumExpressionBase = AGG_EXP<decltype(addForSum), NumericValueGetter>;
class SumExpression : public AGG_EXP<decltype(addForSum), NumericValueGetter> {
using SumExpressionBase::SumExpressionBase;
ValueId resultForEmptyGroup() const override { return Id::makeFromInt(0); }

I think you can make the defaultResultForEmptyGroup a template parameter of the AggregateExpression class. Then all this code goes away and you can again write something like

using SumExpression = AGG_EXP<decltype(addForSum), NumericValueGetter, Id::makeFromInt(0)>;

(Finally a good use of all the makeFrom... functions being constexpr. If I've forgotten one of them, please add the constexpr.)
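A rough sketch of the template-parameter idea, under stated assumptions: `Id` here is a tiny hypothetical stand-in for QLever's `ValueId`, and `EmptyIsZero` is an invented tag type. Passing a class-type value such as `Id::makeFromInt(0)` directly as a template argument needs C++20, so this sketch routes the value through a tag type to stay valid C++17; the principle (the empty-group result is fixed at compile time and no override is needed) is the same.

```cpp
#include <cstdint>

// Hypothetical stand-in for QLever's `ValueId`.
struct Id {
  std::int64_t bits;
  static constexpr Id makeFromInt(std::int64_t i) { return Id{i}; }
};

// Hypothetical tag type that carries the result for an empty group.
struct EmptyIsZero {
  static constexpr Id value = Id::makeFromInt(0);
};

// Hypothetical stateless aggregate operation.
struct AddForSum {
  constexpr std::int64_t operator()(std::int64_t a, std::int64_t b) const {
    return a + b;
  }
};

// The result for an empty group is part of the type, so no derived class
// with an overridden member function is required.
template <typename Op, typename EmptyResult>
class AggregateExpression {
 public:
  static constexpr Id resultForEmptyGroup() { return EmptyResult::value; }
};

using SumExpression = AggregateExpression<AddForSum, EmptyIsZero>;
static_assert(SumExpression::resultForEmptyGroup().bits == 0);
```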

hannahbast (Member, Author) replied:

My first implementation used a template parameter (slightly differently from what you suggest, but in principle the same). But I didn't like the additional code required in the .cpp file: the template ... in front of all the function definitions then becomes longer, and so do the explicit instantiations at the end.

Do you have a strong argument in favor of the template parameter vs. an additional member function? Efficiency is not an issue here as far as I can see (because it's only for a very special case with a single group).

Comment on lines +324 to +330
// An Operation that consists of a `FunctionAndValueGetters` that takes
// `NumOperands` parameters. The `FunctionForSetOfIntervalsType` is a function,
// that can efficiently perform the operation when all the operands are
// `SetOfInterval`s. It is necessary to use the `FunctionAndValueGetters`
// struct to allow for multiple `ValueGetters` (a parameter pack, that has to
// appear at the end of the template declaration) and the default parameter for
// the `FunctionForSetOfIntervals` (which also has to appear at the end).

Suggested change
// An Operation that consists of a `FunctionAndValueGetters` that takes
// `NumOperands` parameters. The `FunctionForSetOfIntervalsType` is a function,
// that can efficiently perform the operation when all the operands are
// `SetOfInterval`s. It is necessary to use the `FunctionAndValueGetters`
// struct to allow for multiple `ValueGetters` (a parameter pack, that has to
// appear at the end of the template declaration) and the default parameter for
// the `FunctionForSetOfIntervals` (which also has to appear at the end).
// The `SpecializedFunction`s can be used to choose a more efficient implementation given the types of the operands.
// For example, expressions like `logical-or` or `logical-and` can be implemented more efficiently if all the inputs are `SetOfInterval`s.

(The mechanism has become more generic in the meantime).
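To illustrate why such a specialization can pay off, here is a hedged sketch of a `logical-or` on interval sets. It assumes, purely for illustration, that a `SetOfIntervals` is a sorted list of disjoint half-open index ranges in which the expression is true; the alias and the function `logicalOr` are hypothetical and not QLever's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Assumption: half-open range [first, second) of rows where the value is true.
using Interval = std::pair<std::size_t, std::size_t>;
// Assumption: sorted by start and pairwise disjoint.
using SetOfIntervals = std::vector<Interval>;

// Union of two interval sets. The cost is linear in the number of intervals,
// independent of how many rows the intervals cover.
inline SetOfIntervals logicalOr(const SetOfIntervals& a,
                                const SetOfIntervals& b) {
  // Merge both sorted inputs into one list sorted by start.
  SetOfIntervals merged;
  merged.reserve(a.size() + b.size());
  std::merge(a.begin(), a.end(), b.begin(), b.end(),
             std::back_inserter(merged));

  // Coalesce overlapping or directly adjacent intervals.
  SetOfIntervals result;
  for (const auto& [begin, end] : merged) {
    if (!result.empty() && begin <= result.back().second) {
      result.back().second = std::max(result.back().second, end);
    } else {
      result.emplace_back(begin, end);
    }
  }
  return result;
}
```

A row-by-row evaluation would instead touch every row of the input, which is what the specialized function avoids.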

auto testCountString = testAggregate<CountExpression, IdOrLiteralOrIri, Id>;
testCountString({lit("alpha"), lit("äpfel"), lit(""), lit("unfug")}, I(4));
auto testMaxString = testAggregate<MaxExpression, IdOrLiteralOrIri>;
// TODO<joka921> Implement correct comparison on strings

Actually, I am on it :)
