EOF errors aren't retried by the Writer #1352

scottybrisbane · 2024-11-27T22:07:09Z

Describe the bug

During routine patching of our MSK Kafka clusters, we see a range of transient errors in our Kafka producers (which use the kafka-go Writer) as the brokers are patched. Many of these errors are retried by the logic in the kafka-go Writer, but a small volume of EOF errors are not retried at all and result in an immediate, permanent failure of those writes. I'm not sure whether this is intentional because of some behaviour of the Kafka protocol, but from our perspective we would like requests that fail with io.EOF to be retried as well, so that we don't lose those messages.

It looks like the related code snippets are:

kafka-go/writer.go, line 1162 in a8e5eab:

    if !isTemporary(err) && !isTransientNetworkError(err) {

kafka-go/error.go, line 601 in a8e5eab:

    func isTemporary(err error) bool {

kafka-go/error.go, line 609 in a8e5eab:

    func isTransientNetworkError(err error) bool {

Kafka Version

Kafka version: 3.7
kafka-go version: v0.4.47

To Reproduce

We see this behaviour for a small number of writes every time security patching or another operation on our MSK cluster causes rolling broker replacements or restarts. When the Writer sees an io.EOF error, it does not retry and the message write fails.

Expected Behavior

Ideally all errors that can be retried by the Writer are retried so that maintenance operations on a Kafka cluster are seamless and don't cause any messages to be lost.

Observed Behavior

If the kafka-go Writer receives an EOF, it does not retry and the write fails immediately with the following error:

Kafka write errors (1/1), errors: [kafka.(*Client).Produce: EOF]


fzj55 · 2024-11-28T13:56:57Z

These errors are not retried because they indicate that something went wrong across the entire cluster. If you are only doing a rolling restart, it stands to reason that the errors described here should not be triggered.

scottybrisbane added the bug label Nov 27, 2024
