Ability to simulate Kafka failure #40

Open
vincentfree opened this issue Dec 4, 2019 · 8 comments

Comments

@vincentfree

I'm using kafka-junit to test against Kafka, but I also want to simulate a failing stack to test my application's resilience when using Kafka.

Could you add a method to kill Kafka in such a way that a consumer / producer treats it as an actual cluster failure?

A method to bring the stack back up would then be great, to see how an application recovers.

@Crim
Collaborator

Crim commented Dec 5, 2019

Hi @vincentfree,

Does this test case do what you're looking for? Or is there some other functionality specifically you're looking for?

https://github.com/salesforce/kafka-junit/blob/master/kafka-junit-core/src/test/java/com/salesforce/kafka/test/KafkaTestClusterTest.java#L235-L329

@vincentfree
Author

This does about the same as what I have now, which is a clean shutdown of a broker. That is useful because I can test an upgrade or patch scenario, but I also want to have an unclean shutdown of a broker. I have had resilience problems in the past with brokers going down with OOM errors and the like. I don't manage our Kafka cluster myself, so my application finds out by failing 😅.

Normally the consumers/producers should handle this themselves, but that failed in my case.

I want to be able to reproduce such failures with brokers, especially the controller node failing.

@Crim
Collaborator

Crim commented Dec 11, 2019

Hmm, no such method exists today, though it is an interesting use case. I'll poke around and see if I can come up with anything.

The underlying KafkaServerStartable doesn't provide much access to work with, so it may require bypassing that and interacting directly with its underlying KafkaServer instance....

@vincentfree
Author

It would be great to be able to do that and, after killing a server, to bring it back up after some time with the same signature for the cluster. You could then test resilience, Kafka's election process under failure, re-election when a server comes back up, and the impact on consumers and producers while all of this happens.
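For readers who need this today: one workaround outside of kafka-junit is to run a broker as a separate OS process and terminate it forcibly, so it gets no chance to run a controlled shutdown. The sketch below is not a kafka-junit feature. `HardKill`, `start` and `killAbruptly` are hypothetical names, and the command used to launch the broker is left to the caller because it depends on the local installation. Restarting with the same broker configuration and data directory afterwards would be the "bring it back" half of the test.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical helper, not part of kafka-junit: runs a broker (or any command)
// as a child process so that it can be killed without a clean shutdown.
final class HardKill {
    private HardKill() { }

    /** Launches the given command as a child process, sharing this JVM's stdout/stderr. */
    static Process start(List<String> command) throws IOException {
        return new ProcessBuilder(command).inheritIO().start();
    }

    /**
     * Forcibly terminates the process and its descendants (wrapper scripts may
     * spawn the real JVM as a child). Returns true if the process exited within
     * the timeout.
     */
    static boolean killAbruptly(Process process, long timeoutMillis) throws InterruptedException {
        // Snapshot descendants before killing the parent, while the tree is still intact.
        List<ProcessHandle> children = process.descendants().collect(Collectors.toList());
        process.destroyForcibly();
        children.forEach(ProcessHandle::destroyForcibly);
        return process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

A test could call `HardKill.start(brokerCommand)`, wait until the clients are connected, call `HardKill.killAbruptly(process, 10_000)`, and then start the same command again to observe recovery. `brokerCommand` is a placeholder for however a broker is launched locally.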

@gquintana
Contributor

gquintana commented Apr 30, 2020

I wanted to do the same: test the behaviour when a broker is down.

I tried to do:

sharedKafkaTestResource.getKafkaBrokers().getBrokerById(1).stop();

And then

sharedKafkaTestResource.getKafkaBrokers().getBrokerById(1).start();

But the stop() method seems to be asynchronous, so it's impossible to know when the test can continue. After calling KafkaServerStartable#shutdown, I need to wait for the shutdown to finish using KafkaServerStartable#awaitShutdown: https://github.com/salesforce/kafka-junit/blob/v3.2.1/kafka-junit-core/src/main/java/com/salesforce/kafka/test/KafkaTestServer.java#L307

@vincentfree
Author

The difference from your approach is that I want an abrupt shutdown without any notice. This would ensure that my applications use their resiliency functions to handle the problem, either by buffering, by using default fallbacks, or in some other way.

For your problem, do you get any type of future back or are you able to set a callback?
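If only a blocking wait is available (the thread mentions KafkaServerStartable#awaitShutdown), it can be adapted into a future by running it on a helper thread. This is a minimal sketch using only the JDK. `ShutdownFutures`, `BlockingCall` and `asFuture` are hypothetical names, and the blocking call itself is passed in by the caller rather than assumed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical adapter: turns any blocking "wait until stopped" call into a future.
final class ShutdownFutures {
    private ShutdownFutures() { }

    /** A blocking call that may throw, for example a wrapper around an awaitShutdown-style method. */
    interface BlockingCall {
        void run() throws Exception;
    }

    /**
     * Runs the blocking wait on a daemon thread and completes the returned future
     * when it returns. The future fails with a TimeoutException if the wait takes
     * longer than timeoutMillis (requires Java 9+ for orTimeout).
     */
    static CompletableFuture<Void> asFuture(BlockingCall blockingWait, long timeoutMillis) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        Thread waiter = new Thread(() -> {
            try {
                blockingWait.run();
                future.complete(null);
            } catch (Exception e) {
                future.completeExceptionally(e);
            }
        }, "shutdown-waiter");
        waiter.setDaemon(true); // do not keep the test JVM alive if the wait never returns
        waiter.start();
        return future.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

A test can then either block with `future.get()` or attach a callback with `future.thenRun(...)` before continuing.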

@Crim
Collaborator

Crim commented Apr 30, 2020

Both sound like valid use cases to test against. Unfortunately I don't believe Kafka exposes mechanisms to do what you're looking for.

Regarding async shutdowns, it may be possible to block until shutdown is complete similar to how we block waiting for startup here
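Until something like that exists in the library, a test can approximate "block until shutdown" from the outside by polling the broker's listener port until it stops accepting connections. This is only a sketch under assumptions: `PortWait` and its methods are hypothetical names, and a closed port is a proxy signal that does not prove the broker has finished all of its shutdown work.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: waits for a TCP listener (e.g. a broker port) to go away.
final class PortWait {
    private PortWait() { }

    /** Returns true if a TCP connection to host:port succeeds within connectTimeoutMillis. */
    static boolean isPortOpen(String host, int port, int connectTimeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing is listening (or reachable).
            return false;
        }
    }

    /**
     * Polls every pollMillis until the port no longer accepts connections.
     * Returns false if the port is still open once timeoutMillis has elapsed.
     */
    static boolean awaitPortClosed(String host, int port, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (!isPortOpen(host, port, 250)) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return !isPortOpen(host, port, 250);
    }
}
```

In a test this would sit between the stop() and start() calls, for example `PortWait.awaitPortClosed("localhost", brokerPort, 30_000, 100)`, where `brokerPort` is whatever port the stopped broker was listening on.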

@gquintana
Contributor

My issue is probably different from @vincentfree's.
I am roughly doing the same as this unit test:


I'll have to investigate where the difference lies.


3 participants