Add Source Connectivity #5
Hi guy(s)!
@LeonardoBonacci It's definitely in the works, but I've run into some snags. ArangoDB in cluster mode doesn't have a good way to output data events. They have a Write-Ahead Log (WAL) API, but it's not supported for clusters, so the only way to get the events is to use a

It should be doable -- ArangoDB's paid offering uses a proprietary Kafka utility to perform cross-cluster syncing. That's not the goal of this project, but I think we'd be retrieving the data from the database in a similar way.

I'm working on all this, but it's not trivial, especially since I have a day job unrelated to this 🙂 But it's good to hear that there is interest!
Thanks a lot, @jaredpetersen, for your elaboration! This morning I thought of ArangoDB and of potentially offering (you) some help writing the source connector. Now, reading about the very technical issues you are facing, and not being an ArangoDB insider myself, I can only wish you the best of luck! :)
Add support for using ArangoDB as a source system. Read from the ArangoDB write-ahead log (WAL) and create Kafka messages based on the record changes.
This has some complications. ArangoDB's WAL API is only supported for single-server instances.
When you use ArangoDB in cluster mode, you end up with multiple DB-Servers, each of which maintains its own write-ahead log. If you tail each log, you can end up with duplicates, depending on how replication was set up. Additionally, some work has to be done to ensure that the records from the individual logs are written into Kafka in the order in which they were written into ArangoDB. While we're doing that work, we may be able to do some de-duplication as well, based on timestamps. We'll have to see when we get there.
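To make the ordering/de-duplication idea concrete, here is a minimal sketch in Python. It is not the connector's actual code (the event shape, the `tick` timestamp field, and the `_key`/`_rev` identifiers are assumptions for illustration); it just shows one way to merge several per-server WAL streams into a single timestamp-ordered stream while dropping replica duplicates.

```python
import heapq


def merge_wal_streams(streams):
    """Merge per-DB-Server WAL event streams into one timestamp-ordered
    stream, dropping duplicate events introduced by replication.

    Each event is assumed (hypothetically) to be a dict carrying a
    monotonic 'tick' timestamp, a document '_key', and a '_rev' revision.
    """
    seen = set()  # (_key, _rev) pairs we have already emitted
    # heapq.merge lazily interleaves the already-ordered per-server streams
    # by tick, giving us the global write order.
    for event in heapq.merge(*streams, key=lambda e: e["tick"]):
        ident = (event["_key"], event["_rev"])
        if ident in seen:
            continue  # a replica's copy of an event we already produced
        seen.add(ident)
        yield event
```

A real implementation would have to bound the `seen` set (e.g. evict entries older than some tick window) rather than grow it forever, and would feed the merged events to the Kafka producer instead of yielding them.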
In all likelihood, we won't be able to provide an exactly-once delivery guarantee; consumers will have to expect duplicate messages. This should be acceptable: exactly-once only holds while nothing goes wrong with the producer, and when something does go wrong, consumers can expect to receive messages at least once anyway.
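With at-least-once delivery, the usual remedy is to make consumption idempotent. The sketch below (again illustrative only; the `(_key, _rev)` event identity is a hypothetical choice, not something this connector defines) shows a consumer that tolerates duplicate messages by remembering which events it has already handled.

```python
def process_at_least_once(events, handler, processed_ids=None):
    """Apply handler to each logical event once, even if the connector
    delivers some events more than once.

    Events are assumed (hypothetically) to be dicts identified by their
    ('_key', '_rev') pair. Returns the number of events actually handled.
    """
    if processed_ids is None:
        processed_ids = set()
    handled = 0
    for event in events:
        event_id = (event["_key"], event["_rev"])
        if event_id in processed_ids:
            continue  # duplicate delivery; already processed
        handler(event)
        processed_ids.add(event_id)
        handled += 1
    return handled
```

In production, `processed_ids` would live in durable storage (or the handler would be naturally idempotent, e.g. an upsert keyed by `_key`/`_rev`) so that deduplication survives consumer restarts.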
One important thing to note is that this feature is not intended to be used for datacenter-to-datacenter replication. ArangoDB has its own solution for that as part of its enterprise offering (which incidentally uses Kafka). Our goal here is not to make enterprise features free; it's to hook up ArangoDB to Kafka.