Kafka_franz info #2676

Closed
Strafo opened this issue Jun 24, 2024 · 1 comment

Strafo commented Jun 24, 2024

I would like some clarification on how Redpanda Connect commits Kafka offsets.

We have a use case structured as follows:

      kafka/redpanda --> sink process --> database

Currently, we have a process that disables auto-commit and manually commits the offset only after the data has been written to the database.
This way, if the process crashes, the batch of data consumed from Kafka is replayed to the sink. (We have a requirement not to lose any messages.)

The RedPanda-connect documentation states: "It implements transaction-based resiliency with back pressure, so when connecting to at-least-once sources and sinks, it’s able to guarantee at-least-once delivery without needing to persist messages during transit."

Looking at the code from https://github.com/redpanda-data/connect/blob/main/internal/impl/kafka/input_kafka_franz.go, I see that the library used to communicate with Kafka is twmb/franz-go.

In the example https://github.com/twmb/franz-go/blob/master/examples/group_consuming/main.go of the franz-go library, I see that kgo.DisableAutoCommit() is used to disable autocommit.
I searched the redpanda-connect repo and found no trace of this call. Additionally, the documentation at https://docs.redpanda.com/redpanda-connect/components/inputs/kafka_franz/ lists a 'commit_period' setting with a default of 5s, but I see no option to disable autocommit or to opt out of this parameter.
For this reason, I deduce that autocommit is the default behavior and cannot be disabled.

After reading the introduction to Redpanda Connect, I would have expected it to commit offsets manually rather than rely on autocommit.
If it does not, isn't there a risk that, if Redpanda Connect crashes, a batch of data that was consumed from Kafka and already committed will be lost?

I assume a workaround would be to keep the processing time below 'commit_period', which would make data loss very unlikely, but we would still like to understand exactly how Redpanda Connect behaves.

Have I understood the behavior correctly, or have I missed something?

Jeffail (Collaborator) commented Jun 24, 2024

Hey @Strafo, here's the line of interest: https://github.com/redpanda-data/connect/blob/main/internal/impl/kafka/input_kafka_franz.go#L647. We only commit marked offsets, and we only mark an offset as ready to commit once its message has been written.
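
For readers following along, here is a minimal, stdlib-only sketch of that "mark after write, commit only what is marked" idea. It is a toy model, not the code in input_kafka_franz.go and not the franz-go API: `markTracker`, `consume`, and the "commit tick after every second record" are hypothetical names and simplifications chosen for illustration. It simulates a crash between a sink write and the next commit tick, then a restart from the last committed offset.

```go
package main

import "fmt"

// markTracker is a hypothetical stand-in for a consumer-group client that
// commits only offsets which were explicitly marked. Both fields hold the
// "next offset to read", as Kafka commits do.
type markTracker struct {
	marked    int64 // highest offset+1 whose record has reached the sink
	committed int64 // offset a restarted consumer would resume from
}

// mark records that the record at offset has been written to the sink.
func (t *markTracker) mark(offset int64) {
	if offset+1 > t.marked {
		t.marked = offset + 1
	}
}

// commit stands in for the periodic commit: it publishes marked offsets only.
func (t *markTracker) commit() { t.committed = t.marked }

// consume reads records from the last committed offset and writes each one to
// the sink. A commit tick fires after every second record as a stand-in for a
// commit timer. If crashAt >= 0, the process "dies" just before writing that
// offset. It returns the number of sink writes performed in this run.
func consume(records []string, t *markTracker, sink map[int64]string, crashAt int64) int {
	writes := 0
	t.marked = t.committed // a restarted process only knows the committed offset
	for off := t.committed; off < int64(len(records)); off++ {
		if off == crashAt {
			return writes // crash: record fetched but never written or marked
		}
		sink[off] = records[off] // write to the sink first...
		writes++
		t.mark(off) // ...then mark the offset as safe to commit
		if off%2 == 1 {
			t.commit() // commit tick
		}
	}
	t.commit() // final commit on clean shutdown
	return writes
}

func main() {
	records := []string{"a", "b", "c", "d", "e"}
	t := &markTracker{}
	sink := map[int64]string{}

	w1 := consume(records, t, sink, 3) // crash before offset 3 is written
	fmt.Println("committed after crash:", t.committed, "writes:", w1)

	w2 := consume(records, t, sink, -1) // restart without a crash
	fmt.Println("committed after restart:", t.committed, "writes:", w2)
	fmt.Println("records in sink:", len(sink))
}
```

In this model the first run writes offsets 0 to 2 but has only committed up to offset 2 when it crashes, so the restart resumes at offset 2 and writes it a second time. All five records end up in the sink with one duplicate write, which is the at-least-once behavior the question is asking about: a commit tick can never move past a record that has not been written.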

Closing as per #2026

Jeffail closed this as completed Jun 24, 2024