feat(eap): Mutations consumer MVP #6216
Conversation
));
}

let mut body = format!(
TODO: Write cancellation rows
let mut body = format!(
    "
    INSERT INTO {table}
@untitaker two thoughts on this query:
- We could rewrite this to avoid the JOIN by using a WITH clause. We would need to look into the performance differences.
- What if we tried a UNION to get both the cancellation and the update in this one query? (SELECT ... UNION SELECT ...)

I'm also wondering if there's a way to do something like INSERT INTO ... VALUES (SELECT ...) (SELECT ...) so we can have two insertions back to back.

Just some thoughts -- I can try these out on some local data (a rough sketch of the UNION idea is below).
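A minimal sketch of the UNION ALL idea, written in the same format!-builds-SQL style the consumer uses. The `mutations` source, the attribute map, and the `sign` column used to negate the old row are assumptions for illustration, not the actual eap_spans schema, and the real query would also have to cooperate with whatever feeds the mutation payloads in (e.g. the input() table function mentioned in the reply below).

```rust
// Illustrative only: one INSERT INTO ... SELECT that emits a cancellation row
// (sign = -1) and an updated row (sign = 1) for every matching mutation.
// Column names, the `mutations` source and the `sign` column are hypothetical.
fn build_mutation_query(table: &str) -> String {
    format!(
        "
        INSERT INTO {table}
        SELECT old.organization_id, old.span_id, old.attributes, -1 AS sign
        FROM {table} AS old
        INNER JOIN mutations AS m
            ON old.organization_id = m.organization_id AND old.span_id = m.span_id
        UNION ALL
        SELECT old.organization_id, old.span_id,
               -- mapUpdate overlays the new attribute values onto the old ones
               mapUpdate(old.attributes, m.attributes) AS attributes, 1 AS sign
        FROM {table} AS old
        INNER JOIN mutations AS m
            ON old.organization_id = m.organization_id AND old.span_id = m.span_id
        "
    )
}

fn main() {
    println!("{}", build_mutation_query("eap_spans_local"));
}
```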
I think WITH conflicts directly with input() in my tests. I was also trying to use WITH to make the cancellation rows happen (using both WITH and UNION ALL). I'm not sure how it would help to avoid the join, though.
If you have a specific query, let's discuss 1:1, or just push it if it works already.

> so we can have 2 insertions back to back?

Then we should just merge those MutationBatches at the JSON level; I don't think the query needs to adapt to accommodate larger batches.
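To make the "merge at the JSON level" point concrete, here is a rough sketch with a deliberately simplified stand-in for MutationBatch (the real type in the consumer is different); the idea is just that batches are combined before the query is built, so the query itself never has to grow.

```rust
use std::collections::HashMap;

// Simplified stand-in for the consumer's MutationBatch: attribute updates
// keyed by span id. The real layout in Snuba is different.
type MutationBatch = HashMap<String, HashMap<String, String>>;

// Merge `b` into `a`; for a span present in both, later attribute values win.
fn merge_batches(mut a: MutationBatch, b: MutationBatch) -> MutationBatch {
    for (span_id, updates) in b {
        a.entry(span_id).or_default().extend(updates);
    }
    a
}

fn main() {
    let mut a = MutationBatch::new();
    a.insert("span-1".into(), HashMap::from([("user".into(), "alice".into())]));
    let mut b = MutationBatch::new();
    b.insert("span-1".into(), HashMap::from([("user".into(), "bob".into())]));
    assert_eq!(merge_batches(a, b)["span-1"]["user"], "bob");
}
```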
From local testing:

@ayirr7 with the test command I gave you, it sends duplicate rows, and the rows contain timestamps that are immediately subject to TTL.
❌ 1 Tests Failed. To view individual test run-time comparisons against the main branch, go to the Test Analytics Dashboard.
I have published the new topic and its schema: getsentry/sentry-kafka-schemas#333. Will bump it in Snuba tomorrow. As I see it, we'll have to modify the
As the payload will not be directly deserialized into the
@@ -82,19 +134,21 @@ fn fnv_1a(input: &[u8]) -> u32 {
impl From<FromSpanMessage> for EAPSpan {
    fn from(from: FromSpanMessage) -> EAPSpan {
        let mut res = Self {
            organization_id: from.organization_id,
            primary_key: PrimaryKey {
@untitaker I was curious why exactly we added the PrimaryKey initialization here. Is it just so that we have some consistency between the mutations and EAP spans schemas?
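For context, a hypothetical sketch of what grouping the key columns into a struct buys: both the spans ingestion path and the mutations path can embed the same type, so the key columns cannot silently drift between the two schemas. Field names and types below are assumptions, not the actual Snuba definitions.

```rust
// Hypothetical key grouping; the real PrimaryKey in Snuba may have different
// fields and types. Deriving Eq/Hash lets the same struct key a mutation
// lookup and identify the target row for the update query.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PrimaryKey {
    pub organization_id: u64,
    pub _sort_timestamp: u32,
    pub trace_id: String,
    pub span_id: String,
}
```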
Doing this separately so we can roll it out independently of the actual consumer (#6216). This will be rolled out after the schema is published: getsentry/sentry-kafka-schemas#333
Going to try to get this merged even though the query is not performant yet -- otherwise we'll keep fighting merge conflicts.
));
}

// rewriting the update query
Do these still need to be here?
A follow-up to #6216: more rigorous testing for the main query run on the mutations consumer. We need to be able to catch any correctness issues with the query in CI itself.

Co-authored-by: Markus Unterwaditzer <markus-tarpit+git@unterwaditzer.net>
Co-authored-by: Markus Unterwaditzer <markus-honeypot@unterwaditzer.net>
* span_id should be a string
* _sort_timestamp is no longer directly exposed; instead users provide start_timestamp_ms and it gets converted using the same logic used in the ingestion consumer (a sketch of this conversion follows)

This is a follow-up to #6216
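A sketch of that conversion, under the assumption that (as in the ingestion path) _sort_timestamp is the start timestamp truncated to whole seconds; the helper name and rounding behaviour here are assumptions, not the actual Snuba code.

```rust
// Assumed conversion: milliseconds since the epoch, truncated to whole
// seconds. If the ingestion consumer rounds instead of truncating, this
// must be changed to match it exactly.
fn sort_timestamp_from_ms(start_timestamp_ms: u64) -> u32 {
    (start_timestamp_ms / 1000) as u32
}

fn main() {
    // 2024-01-01T00:00:00.500Z expressed in milliseconds.
    assert_eq!(sort_timestamp_from_ms(1_704_067_200_500), 1_704_067_200);
}
```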
Rolling out the mutations consumer, which uses the general INSERT INTO ... SELECT FROM pattern against ClickHouse to simultaneously insert updated and cancelled/negated rows reflecting a single mutation.
This PR includes: