Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SI-2749-first-and-last-agg #47

Merged
merged 6 commits into from
Jul 10, 2024
Merged

Conversation

Allyson-English
Copy link
Member

@Allyson-English Allyson-English commented Jul 3, 2024

Return the first and last values, chronologically, that appear in the bucket
ex:

  • argMin: return value associated with minimum timestamp
  • argMax: return value associated with maximum timestamp

Note:

  • I changed how we were inserting dummy string values in insertTestData to make sure that the first string value and the last string value were distinct so that the test was more clear. This also required me to update the expected response for StringAggregationUnique

Copy link

linear bot commented Jul 3, 2024

@elffjs
Copy link
Member

elffjs commented Jul 3, 2024

Validate the assumption. I was afraid so I suggested something else.

@elffjs
Copy link
Member

elffjs commented Jul 3, 2024

Early experiments seem pretty good.

@elffjs
Copy link
Member

elffjs commented Jul 3, 2024

image

@elffjs
Copy link
Member

elffjs commented Jul 3, 2024

@KevinJoiner If this does work then we can probably replace the argmaxes for latest signals.

Comment on lines 48 to 49
firstGroup = "first_value(" + vss.ValueNumberCol + ")"
lastGroup = "last_value(" + vss.ValueNumberCol + ")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don't think we want to use these since they are not deterministic, since first_value is an alias for any

danger
The query can be executed in any order and even in a different order each time, so the result of this function is indeterminate. To get a determinate result, you can use the min or max function instead of any.

-- https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/any

anyLast

Selects the last value encountered. The result is just as indeterminate as for the any function.

--https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/anylast

Copy link
Member

@elffjs elffjs Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the first_value doc

As with any, without Window Functions the result will be random if the source stream is not ordered and the return type matches the input type

Maybe the bucket here is an ordered stream (inheriting some ordering from the index). It seems to behave that way.

I’m nervous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, this is what I was referencing when initially thinking that the source was ordered https://github.com/DIMO-Network/telemetry-api/blob/main/internal/service/ch/queries.go#L316

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s ordering the buckets.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ordered based on the signal table's create statement, but I don't love relying on that. max and min seem safest unless we hit some performance problems

@KevinJoiner
Copy link
Collaborator

I think you need to rebase with main to get CI working.

@Allyson-English Allyson-English merged commit 5967cda into main Jul 10, 2024
3 checks passed
@Allyson-English Allyson-English deleted the SI-2749-first-and-last-agg branch July 10, 2024 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants