
Fix flaky tests with 'await' #3972

Merged
merged 7 commits on Oct 17, 2024
Conversation

atakavci
Contributor

No description provided.

@atakavci marked this pull request as draft September 27, 2024 13:07
@atakavci changed the title from "fix flaky test - set and get dummy key" to "Fix flaky test - set and get dummy key" Sep 30, 2024
@atakavci marked this pull request as ready for review September 30, 2024 07:04
Collaborator

@sazzad16 left a comment


3 extra commands seem like a lot. Isn't 1 enough?

@atakavci
Contributor Author

@sazzad16 I agree with you, it should have been OK with 1. I tried with 1 in the past, and it still fails intermittently. I just want to get rid of this flakiness once and for all.
TBH I'm looking for another approach (not a Thread.sleep) but couldn't come up with a clean one.

@sazzad16
Collaborator

I tried with 1 in the past, and it still fails intermittently.

@atakavci In that case, I'd take the Thread.sleep(); at least those would be easier to document and understand 😓

@atakavci
Contributor Author

atakavci commented Sep 30, 2024

@sazzad16 I'm strongly against the use of sleep in unit tests. Since it isn't sensitive or responsive to the environment (resource utilization, latencies, etc.), there will always be an unpredictable requirement for how long we need to wait.
Is it good enough to add some comments and signal the significance of these dummy operations to the future maintainer?

@sazzad16
Collaborator

@atakavci

there will always be an unpredictable requirement for how long we need to wait.

Well, the same thing can be said about the number of commands you're executing. It's possible that if we ran the tests enough times, we would find flaky tests failing that would've passed if 4 or 5 commands were executed instead of 3.

I'm strongly against the use of sleep in unit tests.

I respect that. But in my humble understanding, by executing dummy commands you're just doing Thread.sleep() without writing Thread.sleep(). By executing 3 commands, you're just using a larger sleep time compared to the equivalent time of executing 1 command.

... to the future maintainer

This is what I indicated in my last comment. A line of Thread.sleep() + comments would be much easier to understand than 3 lines of dummy commands + comments.

@atakavci
Contributor Author

atakavci commented Oct 1, 2024

@sazzad16 I see your point; let me try to explain mine.

Well, the same thing can be said about the number of commands you're executing.

This is not correct. When you execute a similar task, you get the same kind of delay, and the delays will always be proportional on average, since the task uses the same resources and is impacted by the same set of factors.
So when you do a 'read X' and then a 'read Y', you get the same ratio of execution times on average.

you're just doing Thread.sleep() without writing Thread.sleep().

Right, in a sense the purpose is the same, but sleep is utilization agnostic. It doesn't care whether the execution is faster or slower than usual/expected. If we put 1000 ms, it will always be 1000 ms (and it's not even guaranteed that the thread will be back exactly after 1000 ms; furthermore, if we start to spread these sleeps here and there in the tests, the pipeline will inevitably start to degrade).
So Thread.sleep will not remain proportional to system throughput and speed, as I described above, yet command execution will.

I appreciate your stance on readability and would really like to improve it, without using sleep.

@killergerbah

Hello, I am seeing some of these tests fail in my own fork of Jedis. In my case the 'simple' test is failing.

Can this be resolved by implementing an assertion method that evaluates eventual consistency? For example, repeatedly querying the key with some small sleeps and a very large total timeout (e.g. 30 seconds). This could look something like:

assertEventuallyTrue(() -> cache.getSize() == 1)
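
A minimal sketch of what such a helper could look like (the helper name follows the suggestion above; the 30-second cap, the 50 ms poll interval, and the cache.getSize() check are illustrative, not an existing Jedis utility):

import static org.junit.Assert.fail;

import java.util.function.BooleanSupplier;

// Illustrative polling assertion: re-evaluates the condition with short sleeps
// until it holds or the overall 30-second timeout is exceeded.
public final class EventualAssertions {

  public static void assertEventuallyTrue(BooleanSupplier condition) throws InterruptedException {
    long deadline = System.currentTimeMillis() + 30_000L;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        fail("condition did not become true within 30 seconds");
      }
      Thread.sleep(50); // short poll between checks
    }
  }
}

// Hypothetical usage from the comment above:
//   assertEventuallyTrue(() -> cache.getSize() == 1);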

@atakavci
Contributor Author

atakavci commented Oct 1, 2024

Hi @killergerbah, I'm trying to figure out the best options, and I'd like to have a way of addressing it without having to wait a fixed, long period of time in case of a failure.

@atakavci requested a review from tishun October 2, 2024 12:50
@tishun

tishun commented Oct 9, 2024

I am also not a fan of doing either Thread.sleep or sending some (arbitrary) number of commands.
Ideally we should be able to react to the environment change, based on some server push, but when we are testing the server push itself we are left with two choices (that I can think of):

  1. the solution that @killergerbah suggested, accepting the shortcoming that it might take us some time to fail (but this is ok as we should mostly NOT fail and would only slow down the tests in case of a regression)
  2. subscribe to an alternative mechanism, e.g. Redis keyspace notifications, where this would work (see the sketch below)

What do you think?
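
For reference, a rough sketch of what option 2 could look like with Jedis keyspace notifications (this assumes a local Redis on the default port, database 0, and that notify-keyspace-events may be enabled from the test; key, event, and class names are illustrative):

import java.util.concurrent.CountDownLatch;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class KeyspaceNotificationSketch {

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch subscribed = new CountDownLatch(1);
    CountDownLatch notified = new CountDownLatch(1);

    // psubscribe blocks, so the subscription needs its own connection and thread.
    Thread subscriber = new Thread(() -> {
      try (Jedis sub = new Jedis("localhost", 6379)) {
        sub.configSet("notify-keyspace-events", "KEA"); // enable keyspace notifications
        sub.psubscribe(new JedisPubSub() {
          @Override
          public void onPSubscribe(String pattern, int subscribedChannels) {
            subscribed.countDown(); // subscription is active; safe to perform the write
          }

          @Override
          public void onPMessage(String pattern, String channel, String message) {
            notified.countDown(); // the server pushed the SET event we were waiting for
            punsubscribe();
          }
        }, "__keyevent@0__:set"); // fired whenever a key in db 0 is SET
      }
    });
    subscriber.start();
    subscribed.await();

    try (Jedis jedis = new Jedis("localhost", 6379)) {
      jedis.set("foo", "bar"); // the write the test wants to observe
    }

    notified.await(); // only assert once the notification has arrived
    subscriber.join();
  }
}

Note that the subscription occupies a second, dedicated connection.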

@atakavci
Contributor Author

subscribe to an alternative mechanism, e.g. Redis keyspace notifications, where this would work

This requires another connection, which leaves the test case possibly flaky again. If we subscribe on the original connection, then it is no different from making a 'write' on the same connection, which will block on a response read.

The first choice remains feasible, though at its heart it is still just a 'sleep'.

There are options like tapping the port for incoming data or the pending buffer, but wouldn't that be the perfect example of over-engineering?
I believe getting the pipeline healthy again is more important than this topic. Let me put in a sleep cycle and we'll move forward.

@ggivo
Contributor

ggivo commented Oct 11, 2024

What do you think about using an already available library for this? There is a nice one that integrates with assertions, Hamcrest, etc.: https://github.com/awaitility/awaitility

@tishun

tishun commented Oct 11, 2024

What do you think about using an already available library for this? There is a nice one that integrates with assertions, Hamcrest, etc.: https://github.com/awaitility/awaitility

I vote for that. It would only add dependencies in the test phase, so it has no impact on the driver itself.
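
As a sketch of how that could look for the flaky check (timings and the cache-size condition are illustrative; Awaitility would be a test-scope dependency only):

import static org.junit.Assert.assertEquals;

import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

import org.awaitility.Awaitility;

public class AwaitilitySketch {

  // Polls every 50 ms and gives up after 5 seconds; the test proceeds as soon as
  // the asserted condition holds, so no fixed sleep is needed.
  static void awaitCacheSize(IntSupplier cacheSize, int expected) {
    Awaitility.await().atMost(5, TimeUnit.SECONDS).pollInterval(50, TimeUnit.MILLISECONDS)
        .untilAsserted(() -> assertEquals(expected, cacheSize.getAsInt()));
  }
}

e.g. awaitCacheSize(cache::getSize, 1) in place of the dummy commands or a Thread.sleep.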

@atakavci
Contributor Author

@sazzad16 It looks like there are other flaky tests in the pipeline.
Could you point out the ones you are aware of?

Comment on lines +347 to +348
Awaitility.await().atMost(5, TimeUnit.SECONDS).pollInterval(50, TimeUnit.MILLISECONDS)
.untilAsserted(() -> assertEquals("aof", j.get("wait")));
Collaborator

❤️


Love it. Got to add it to Lettuce tests too at some point!

sazzad16 previously approved these changes Oct 14, 2024
@sazzad16 changed the title from "Fix flaky test - set and get dummy key" to "Fix flaky tests with 'await'" Oct 14, 2024
@uglide changed the base branch from 5.2.0 to master October 15, 2024 11:34
@uglide dismissed sazzad16's stale review October 15, 2024 11:34

The base branch was changed.


@tishun left a comment


I think this is the best solution for now.


@atakavci merged commit 498fee3 into redis:master Oct 17, 2024
5 checks passed
atakavci added a commit to atakavci/jedis that referenced this pull request Oct 17, 2024
* fix flaky test - dumy set-get to gain time

* adding same commands for 'simple'

* introdue tryAssert in CSC tests

* remove leftovers

* introduce awaitility for polling

* nit

* fix pipelining test
untilasserted
sazzad16 pushed a commit that referenced this pull request Oct 20, 2024