Kafka_franz info #2676

Strafo · 2024-06-24T07:06:02Z

I would like to have some clarification on RedPanda-connect.

We have a use case structured as follows:

kafka/redpanda --> sink process --> database

Currently, we have a process that disables auto-commit and manually commits the offset after writing the data to the database.
This way, we ensure that in case of a process crash, the batch of data consumed from Kafka is replayed to the sink. (We have a requirement not to lose any messages)
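To make the pattern concrete, here is a minimal sketch of "commit only after the sink write succeeds". It is an in-memory simulation, not our real process: `partitionLog`, `consumeBatch`, and `runDemo` are hypothetical names invented for illustration, and no Kafka client is involved.

```go
package main

import (
	"errors"
	"fmt"
)

// partitionLog is a hypothetical in-memory stand-in for one Kafka partition.
type partitionLog struct {
	records   []string
	committed int // next offset to read after a restart
}

// consumeBatch reads up to max records starting at the committed offset,
// hands them to the sink, and advances the committed offset only if the
// sink reports success.
func consumeBatch(log *partitionLog, max int, sink func([]string) error) error {
	end := log.committed + max
	if end > len(log.records) {
		end = len(log.records)
	}
	batch := log.records[log.committed:end]
	if err := sink(batch); err != nil {
		// Offset is left untouched, so the same batch is read again next time.
		return err
	}
	log.committed = end
	return nil
}

// runDemo simulates one failed sink write followed by successful retries.
func runDemo() (written []string, committed int) {
	log := &partitionLog{records: []string{"a", "b", "c"}}
	failOnce := true
	sink := func(batch []string) error {
		if failOnce {
			failOnce = false
			return errors.New("simulated crash before the database write")
		}
		written = append(written, batch...)
		return nil
	}
	_ = consumeBatch(log, 2, sink) // fails: nothing written, nothing committed
	_ = consumeBatch(log, 2, sink) // replays "a", "b"
	_ = consumeBatch(log, 2, sink) // reads "c"
	return written, log.committed
}

func main() {
	written, committed := runDemo()
	fmt.Println(written, committed)
}
```

The failed first attempt leaves the offset at 0, so the same two records are delivered again on the next call. That is the at-least-once behavior we rely on: a record may be seen twice, but never skipped.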

The RedPanda-connect documentation states: "It implements transaction-based resiliency with back pressure, so when connecting to at-least-once sources and sinks, it’s able to guarantee at-least-once delivery without needing to persist messages during transit."

Looking at the code from https://github.com/redpanda-data/connect/blob/main/internal/impl/kafka/input_kafka_franz.go, I see that the library used to communicate with Kafka is twmb/franz-go.

In the example https://github.com/twmb/franz-go/blob/master/examples/group_consuming/main.go of the franz-go library, I see that kgo.DisableAutoCommit() is used to disable autocommit.
Searching within the redpanda-connect repo, there doesn't seem to be any trace of this call. Additionally, from the documentation https://docs.redpanda.com/redpanda-connect/components/inputs/kafka_franz/, I see that one of the configurations is 'commit_period' with a default of 5s, but there is no way to disable autocommit and no way to not use this parameter.
For this reason, I deduce that auto commit is the default behavior and cannot be disabled in any way.

Reading the introduction of Redpanda-connect, I would have expected the interaction mechanism with Kafka to be manual commit rather than autocommit.
If this mechanism is not present, isn't there a risk that in case of a redpanda-connect crash, the batch of data consumed from Kafka and committed will be lost?

I assume one workaround would be to keep the processing time below 'commit_period', so that data loss is very unlikely, but we would still like to understand exactly how redpanda-connect works.

Have I understood the functioning correctly, or have I missed something?

Jeffail · 2024-06-24T07:24:55Z

Hey @Strafo, here's the line of interest: https://github.com/redpanda-data/connect/blob/main/internal/impl/kafka/input_kafka_franz.go#L647. We only commit marked offsets, and we only mark offsets as ready to commit once they've been written.

Closing as per #2026
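A rough way to picture the "mark, then commit" behavior described above is the simulation below. It is a sketch under assumptions, not the code at the linked line: `offsetTracker`, `markWritten`, `commitTick`, and `simulateTicks` are hypothetical names, and it assumes records are acknowledged in offset order on a single partition.

```go
package main

import "fmt"

// offsetTracker is a hypothetical model of one partition's commit state.
// marked is the next offset that is safe to commit; committed is what the
// periodic commit has actually sent to the broker.
type offsetTracker struct {
	marked    int64
	committed int64
}

// markWritten records that the record at the given offset has been
// acknowledged by the output (for example, written to the database).
func (t *offsetTracker) markWritten(offset int64) {
	if offset+1 > t.marked {
		t.marked = offset + 1
	}
}

// commitTick models one firing of the commit timer. It commits only what
// has been marked, never what has merely been fetched.
func (t *offsetTracker) commitTick() int64 {
	t.committed = t.marked
	return t.committed
}

// simulateTicks fetches records 0..4, lets the timer fire before any write,
// then acknowledges records 0 and 1 and lets the timer fire again.
func simulateTicks() []int64 {
	t := &offsetTracker{}
	var seen []int64

	// Records 0..4 have been fetched, but none are written yet.
	seen = append(seen, t.commitTick())

	// The output acknowledges records 0 and 1.
	t.markWritten(0)
	t.markWritten(1)
	seen = append(seen, t.commitTick())

	// A crash here would restart consumption at offset 2, so records 2..4
	// are redelivered rather than lost.
	return seen
}

func main() {
	fmt.Println(simulateTicks())
}
```

In this model the commit timer only controls how often marked offsets are flushed. A fetched record that has not been written is never covered by a commit, which is why a crash causes redelivery rather than loss.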

Jeffail added the question label Jun 24, 2024

Jeffail closed this as completed Jun 24, 2024
