
Output: Clickhouse #212

Open
samcm opened this issue Sep 28, 2023 · 3 comments

Comments

samcm (Member) commented Sep 28, 2023

Create a new output in pkg/output that directly exports events to a ClickHouse cluster. The output should make use of proto/xatu.Filter, like the other outputs, to handle routing. The output should have per-event configuration, allowing for things like batching.
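
A rough sketch of what the output's config and connection could look like using the clickhouse-go v2 client. The field names (BatchSize, FlushInterval, PerEvent, etc.) are just illustrative, not a settled schema:

```go
// Sketch only: a possible shape for the ClickHouse output's config and
// connection. Only the clickhouse-go v2 API calls are real; the config
// fields are hypothetical.
package clickhouse

import (
	"context"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// Config is a hypothetical per-output configuration with per-event overrides.
type Config struct {
	Addr          []string      `yaml:"addr"`
	Database      string        `yaml:"database"`
	Username      string        `yaml:"username"`
	Password      string        `yaml:"password"`
	BatchSize     int           `yaml:"batchSize"`
	FlushInterval time.Duration `yaml:"flushInterval"`
	// PerEvent lets individual event types override table/batching settings.
	PerEvent map[string]EventConfig `yaml:"perEvent"`
}

type EventConfig struct {
	Table     string `yaml:"table"`
	BatchSize int    `yaml:"batchSize"`
}

// connect opens a native-protocol connection to the cluster and pings it.
func connect(ctx context.Context, cfg Config) (driver.Conn, error) {
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: cfg.Addr,
		Auth: clickhouse.Auth{
			Database: cfg.Database,
			Username: cfg.Username,
			Password: cfg.Password,
		},
	})
	if err != nil {
		return nil, err
	}

	return conn, conn.Ping(ctx)
}
```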

00x-dx (Contributor) commented Oct 22, 2023

@samcm I can start looking into this.

samcm (Member, Author) commented Oct 23, 2023

That would be incredible! We have a bunch of migrations here: https://github.com/ethpandaops/analytics-pipeline/tree/master/clickhouse/migrations. Maybe they can live in this repo if we officially support ClickHouse as an output?

One thing I'm not certain about is batching. I believe inserting data in bulk is significantly more efficient, but the config for that in xatu would look really gross and verbose (imagine an output for every table, where we filter for only the events relevant to that table 🤮). Maybe the output maintains its own buffer and batches per table?
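
Roughly what I'm imagining (all names here are made up for illustration; only the clickhouse-go PrepareBatch/Append/Send calls are real):

```go
// Sketch of "the output maintains its own buffer and batches per table".
// A production version would also flush on a timer and handle retries.
package clickhouse

import (
	"context"
	"fmt"
	"sync"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

type batcher struct {
	mu      sync.Mutex
	conn    driver.Conn
	maxRows int
	// Pending rows keyed by destination table, so we never need one
	// configured output per table just to get bulk inserts.
	pending map[string][][]any
}

// add buffers a single row for a table and flushes that table's buffer
// once it reaches maxRows.
func (b *batcher) add(ctx context.Context, table string, row []any) error {
	b.mu.Lock()
	defer b.mu.Unlock()

	b.pending[table] = append(b.pending[table], row)
	if len(b.pending[table]) < b.maxRows {
		return nil
	}

	return b.flushLocked(ctx, table)
}

// flushLocked sends one bulk INSERT for everything buffered for the table.
func (b *batcher) flushLocked(ctx context.Context, table string) error {
	rows := b.pending[table]
	if len(rows) == 0 {
		return nil
	}

	batch, err := b.conn.PrepareBatch(ctx, fmt.Sprintf("INSERT INTO %s", table))
	if err != nil {
		return err
	}

	for _, row := range rows {
		if err := batch.Append(row...); err != nil {
			return err
		}
	}

	delete(b.pending, table)

	return batch.Send()
}
```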

00x-dx (Contributor) commented Oct 23, 2023

  • Migrations can live there as we move forward with supporting ClickHouse as an official output plugin.
  • Today, Xatu has collection + filtering + batching + output all in one place; this could make extending to outputs like NoSQL stores (ClickHouse, Elasticsearch, etc.) extremely difficult. Segregating them into separate components could ease this: xatu-collector (collection + filtering), xatu-buffer (Kafka/memory), and xatu-outputs (ClickHouse, Elasticsearch, MySQL, etc.), each with its own config (just a thought; a rough interface sketch follows this list). With this approach, all filtering happens at the collection layer, and the buffer is independent of event collection and storage.
  • It's probably worth thinking about the limitations of filtering at some point in the future (or we'll know this better by the end of this output plugin), e.g. isolating it at the component level (sentry/cannon/discovery, etc.) or deriving a base structure for the different event types and attaching the filtering layer at the event definition.
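
As a thought experiment, the collector/buffer/output split could look something like this. None of these interfaces exist in xatu today; the names and shapes are made up purely to illustrate the separation:

```go
// Hypothetical boundaries for a collector/buffer/output split.
package pipeline

import "context"

// Event stands in for a decorated xatu event after collection + filtering.
type Event struct {
	Type string
	Data []byte
}

// Buffer decouples collection from storage (could be Kafka- or memory-backed).
type Buffer interface {
	Publish(ctx context.Context, events ...*Event) error
	Subscribe(ctx context.Context, handler func(*Event) error) error
}

// Output is a storage sink such as ClickHouse, Elasticsearch, or MySQL.
// It only consumes from the buffer; filtering already happened upstream.
type Output interface {
	Write(ctx context.Context, events ...*Event) error
	Close(ctx context.Context) error
}
```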
