I would like to have some clarification on RedPanda-connect.
We have a use case structured as follows:
kafka/redpanda --> sink process --> database
Currently, we have a process that disables auto-commit and manually commits the offset only after the data has been written to the database.
This way, we ensure that in case of a process crash, the batch of data consumed from Kafka is replayed to the sink. (We have a requirement not to lose any messages)
The RedPanda-connect documentation states: "It implements transaction-based resiliency with back pressure, so when connecting to at-least-once sources and sinks, it’s able to guarantee at-least-once delivery without needing to persist messages during transit."
Looking at the code from https://github.com/redpanda-data/connect/blob/main/internal/impl/kafka/input_kafka_franz.go, I see that the library used to communicate with Kafka is twmb/franz-go.
In the example https://github.com/twmb/franz-go/blob/master/examples/group_consuming/main.go of the franz-go library, I see that kgo.DisableAutoCommit() is used to disable autocommit.
Searching within the redpanda-connect repo, I could not find any trace of this call. Additionally, from the documentation https://docs.redpanda.com/redpanda-connect/components/inputs/kafka_franz/, I see that one of the configuration fields is 'commit_period' with a default of 5s, but I see no way to disable autocommit and no way to opt out of this parameter.
For this reason, I deduce that autocommit is the default behavior and cannot be disabled in any way.
Reading the introduction of Redpanda-connect, I would have expected the interaction mechanism with Kafka to be manual commit rather than autocommit.
If this mechanism is not present, isn't there a risk that in case of a redpanda-connect crash, the batch of data consumed from Kafka and committed will be lost?
I assume a workaround would be to keep the processing time below 'commit_period' so that we can be almost certain of no data loss, but we would still like to understand thoroughly how redpanda-connect works.
Have I understood the functioning correctly, or have I missed something?