Deadlock with parallel processing of single partition stream #852
Comments
Thanks for your report @vladimirkl 🙂 I'm wondering if this is a bug related to zio-kafka or zio-streams 🤔 Sorry for the questions. I'm trying to better understand your issue. |
It may be a zio-streams issue, but I have never encountered it without zio-kafka, even though I have similar aggregations in other places. It locks exactly on |
I even tried to replace |
Thanks for the additional details :) |
I used this program to test the issue: https://gist.github.com/erikvanoosten/5e9f34d8ff43de32b583c021c858e309 This problem is caused by an immediate unsubscribe after the stream ends (see https://github.com/zio/zio-kafka/blob/master/zio-kafka/src/main/scala/zio/kafka/consumer/Consumer.scala#L254). Since we are no longer subscribed, we (can) no longer poll. When we stop polling, the commit callbacks will not be called anymore and all progress is halted. Only programs that do not consume the entire stream (the example here uses Potential solutions:
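The mechanism described above can be illustrated with a toy model. This is only a sketch using the Scala standard library: `ToyConsumer` and `DeadlockSketch` are hypothetical names, not the real KafkaConsumer or zio-kafka API. The one assumption carried over from the discussion is that commit callbacks only run from inside `poll`, and that polling stops once the consumer is unsubscribed.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Toy stand-in for a Kafka consumer (hypothetical, NOT the real API).
// Commit callbacks are queued by commitAsync and only run from inside poll().
final class ToyConsumer {
  private val pendingCallbacks = mutable.Queue.empty[() => Unit]
  private var subscribed       = true

  def commitAsync(onComplete: () => Unit): Unit =
    pendingCallbacks.enqueue(onComplete)

  // Polling only does work while subscribed; it also drains commit callbacks.
  def poll(): Unit =
    if (subscribed) {
      while (pendingCallbacks.nonEmpty) pendingCallbacks.dequeue()()
    }

  def unsubscribe(): Unit = subscribed = false
}

object DeadlockSketch {
  // Returns true if the commit callback fired within the wait time.
  def commitCompletes(unsubscribeBeforePoll: Boolean): Boolean = {
    val consumer = new ToyConsumer
    val done     = Promise[Unit]()
    consumer.commitAsync { () => done.success(()); () }
    if (unsubscribeBeforePoll) consumer.unsubscribe()
    consumer.poll()
    // A real program would wait forever here; the sketch gives up after 200 ms.
    Try(Await.result(done.future, 200.millis)).isSuccess
  }

  def main(args: Array[String]): Unit = {
    println(commitCompletes(unsubscribeBeforePoll = false)) // prints "true"
    println(commitCompletes(unsubscribeBeforePoll = true))  // prints "false"
  }
}
```

In the second call the commit is still pending when the consumer unsubscribes, so nothing ever completes it. That matches the halted progress reported in this issue, under the assumptions stated above.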
@svroonland WDYT? |
Why same code works with |
|
Things are much worse with zio-streams 2.0.14. An even simpler example hangs forever:
Consumer.plainStream(Subscription.topics(topic), Serde.int, Serde.string)
  .take(100)
  .map(_.offset)
  .aggregateAsync(Consumer.offsetBatches)
  .debug("Offset")
  .mapZIO(_.commit)
  .debug("Commit")
  .runDrain
This code works perfectly with zio-streams 2.0.13. Unfortunately |
@erikvanoosten I believe your analysis is correct, the Regarding the possible solutions: if it's a race condition, then I'm not sure we always have pending commits to await before unsubscribing. We could look into a usage pattern that uses graceful shutdown to end the stream but keep the subscription. |
@svroonland Have a look at #890. The issue seems to be with the |
As in, commitAsync requires poll calls to complete? Yeah |
I encountered a few more issues with hanging commits in other scenarios: when the broker dies, the KafkaConsumer dies too, but the async commit hangs. I compared this with the fs2-kafka implementation: it has similar behaviour, but uses a timeout for the commit operation (15 seconds by default). zio-kafka users can add timeouts to commit everywhere in their code, but I think it's a good idea to add a default timeout for safety, similar to fs2-kafka. I can create a PR. What do you think? |
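Until a default timeout exists, the user-side workaround mentioned above could look roughly like the sketch below. `withCommitTimeout` is a hypothetical helper, not part of zio-kafka, and `"my-topic"` is a placeholder. The 15 seconds simply mirror the fs2-kafka default mentioned above. The stream itself is the example from earlier in this thread, without `take` and `debug`.

```scala
import java.util.concurrent.TimeoutException

import zio._
import zio.kafka.consumer.{ Consumer, Subscription }
import zio.kafka.serde.Serde

object CommitWithTimeout {

  // Hypothetical helper (not a zio-kafka API): fail with a TimeoutException
  // instead of waiting forever when the commit does not complete in time.
  def withCommitTimeout(commit: Task[Unit], limit: Duration): Task[Unit] =
    commit.timeoutFail(new TimeoutException(s"commit did not complete within $limit"))(limit)

  // "my-topic" is a placeholder topic name.
  val consumeAndCommit: ZIO[Consumer, Throwable, Unit] =
    Consumer
      .plainStream(Subscription.topics("my-topic"), Serde.int, Serde.string)
      .map(_.offset)
      .aggregateAsync(Consumer.offsetBatches)
      .mapZIO(batch => withCommitTimeout(batch.commit, 15.seconds))
      .runDrain
}
```

This turns a silent hang into a visible failure, but it does not address the underlying cause of the hang.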
Please do :) |
Isn't this issue fixed by #982? Can we close it? |
Not sure. A timeout is definitely better than a deadlock, but the original issue with async grouping still remains. However, if we cannot handle it all, we can close it for now. |
Hi, I need to take n records from a single partition, process them in parallel batches, and commit. Unfortunately this code causes a deadlock in Consumer on commit:
Replacing groupedWithin with grouped makes this code work. I can also leave groupedWithin in place and remove take: then no deadlock occurs, but I need to terminate the stream after exactly n records. It looks like a race condition on stream termination with take. Tested with zio-kafka 2.3.0, Scala 2.13.10 and embedded Kafka.