These errors are not retried because they indicate that an exception occurred across the entire cluster. If you are doing a rolling restart, it stands to reason that these errors, as defined by the library, should not be triggered.
Describe the bug
During routine patching of our MSK Kafka clusters, we see a range of transient errors in our Kafka producers (using the kafka-go Writer) as the brokers are patched. Many of these errors are retried by the logic in the kafka-go Writer, but we see a small volume of EOF errors which are not retried at all and result in an immediate permanent failure of those writes. I'm not sure if this is intentional due to certain behaviours of the Kafka protocol, but from our perspective we would like to see these requests that result in an io.EOF error retried as well so that we don't lose those messages.
It looks like the related code snippets are:
kafka-go/writer.go, line 1162 in a8e5eab
kafka-go/error.go, line 601 in a8e5eab
kafka-go/error.go, line 609 in a8e5eab
Kafka Version
Kafka version: 3.7
kafka-go version: v0.4.47
To Reproduce
We see this behaviour for a small number of writes every time there is security patching or other operations on our MSK cluster that result in broker rolling replacements/restarts. When the Writer sees an io.EOF error, it is not retried and the message write fails.
Expected Behavior
Ideally all errors that can be retried by the Writer are retried so that maintenance operations on a Kafka cluster are seamless and don't cause any messages to be lost.
Observed Behavior
If an EOF is received by the kafka Writer, this is not retried and the write fails immediately with the following error:
Kafka write errors (1/1), errors: [kafka.(*Client).Produce: EOF]