This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Catch flaky tests (go) #94

Closed
Stebalien opened this issue May 17, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@Stebalien
Member

On PR:

  1. Fetch the test output from the latest test run on master to determine which tests already exist.
  2. Fetch the test output from the go-test run on the PR and diff it with master to determine new tests.
  3. Run those new tests for some period of time (consider using timing information from the PR's test run to figure out how many times the new tests should be run?).

This is by no means easy to implement, but stress-testing new tests like this should really help guard us against checking in new flaky ones.
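A rough sketch of the diffing step, assuming each run is boiled down to a plain, sorted list of test names (master-tests.txt and pr-tests.txt are hypothetical artifact names):

    # Tests present in the PR run but not in the master run are "new".
    # comm -13 prints lines that appear only in the second (sorted) file.
    comm -13 master-tests.txt pr-tests.txt > new-tests.txt

    # Stress-run only the new tests; -count=50 and -race are illustrative, not tuned,
    # and the regexp assumes top-level test names.
    if [ -s new-tests.txt ]; then
        go test -run "^($(paste -sd'|' new-tests.txt))$" -count=50 -race ./...
    fi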

@marten-seemann
Contributor

Very interesting idea!
I'm wondering if the go tool has any way to generate a list of test cases. @mvdan, any idea?

Alternatively, we might also get away with a bash grep "func Test" **/*_test.go | sort. The only complication I can see at this point is that grep is not module-aware, so we'd need a way to limit it to test files within a single module in the case of multi-module repos.
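As a sketch (untested), the grep could be scoped by walking each directory that contains a go.mod, though nested modules would still need extra filtering:

    # List top-level test functions per module root in a multi-module repo.
    find . -name go.mod -not -path '*/vendor/*' | while read -r modfile; do
        dir=$(dirname "$modfile")
        echo "== $dir"
        grep -rh --include='*_test.go' 'func Test' "$dir" | sort
    done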

@marten-seemann
Contributor

Further problem: grep won't work in the case of sub-tests.

Related: https://stackoverflow.com/questions/36938520/use-go-test-to-list-all-tests-case/44626375.

@marten-seemann
Contributor

go test -list '.*'

works, but not for sub-tests. From the testing flag documentation (go help testflag):

    -list regexp
        List tests, benchmarks, or examples matching the regular expression.
        No tests, benchmarks or examples will be run. This will only
        list top-level tests. No subtest or subbenchmarks will be shown.

marten-seemann added the enhancement label on May 17, 2021
@mvdan
Contributor

mvdan commented May 17, 2021

You can't list sub-tests because that implies running the tests themselves, i.e. arbitrary code. If you really want a full list, I think the best way might be to use go test -json ./... and look at all the result JSON objects.
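For example, something along these lines (assumes jq is available, and per the above it does actually run the tests):

    # Collect every test name, including sub-tests, from the JSON event stream.
    go test -json ./... |
        jq -r 'select(.Action == "pass" or .Action == "fail" or .Action == "skip") | .Test // empty' |
        sort -u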

There isn't a "just run the tests up until they run sub-tests and then stop" because... halting problem :) How do you know when a test is done spawning sub-tests?

As for the goal of this issue, I agree it would be nice to rely more on CI to catch flakes. But I'm less convinced that we can be clever about this. For example:

What if a test was added months ago, and today it got modified to be flaky?

What if a test hasn't been touched at all in months, but updating a Go version made it flaky? Or if the non-test code that it tests became flaky?

So I don't think we can safely determine which tests we need to hammer. Why not hammer all of them? After all, a project that gets regular pushes and merges is already running all tests easily hundreds of times per day. GitHub Actions for open source only limits the number of concurrent jobs, not the total amount of resources used over a month. We could set up something like go test -count=100 ./... as a cron or batch job that runs when most of PL is idle, possibly around 06:00 UTC, since most of the EU would still be asleep and most of the US would be away from the keyboard or asleep.
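The job itself could be as simple as the following (flag values are illustrative, not tuned):

    # Hammer the whole suite; any --- FAIL line in the log points at a likely flake.
    go test -count=100 -timeout 6h ./... | tee stress.log
    grep -E '^--- FAIL' stress.log || echo "no failures surfaced in this run"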

Another solution is to do nothing, and assume that flakes will pop up with time. File issues and investigate as they come. That's what I do personally, and it works pretty well. If a flake is so rare that I never see it without CI hammering my tests, does the flake really matter at all?

From experience, I should also note that it's somewhat common for go test ./... to succeed, but go test -count=10 ./... to fail. It can be easy to write a test in a way that isn't idempotent or otherwise doesn't allow being run twice in a row within the same Go process. Imagine a test that listens on port 1234 but doesn't close it, or a test that assumes a global is empty and fills it. Those tests should be fixed, but they're not flaky tests and they would get in the way.
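A cheap guard against that particular class of problem, which could run on every PR rather than in a nightly job, might be (again, just a sketch):

    # Running each test twice in the same process flushes out most non-idempotent
    # tests (leaked listeners, dirty globals) without the cost of a full stress run.
    go test -count=2 ./...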

@Stebalien
Member Author

I'm mostly concerned about adding new flaky tests. Specifically, ones with strong timing dependence.

Once we merge, we'll naturally catch flakes as we go. The goal here was to catch them before we merge so we're less likely to run into an "oh well, we merged a flaky test, better go fix it later" issue.

But this just may not be worth fixing (given the complexity and the fact that we can't catch everything).

@anorth

anorth commented May 20, 2021

As a first pass, I would suggest that this be solved with engineering discipline rather than tooling (but tooling later can certainly help). If a PR merges a test that's soon discovered to be flaky, revert the PR. No questions asked.

Note I say revert the PR rather than just disable the test in order to avoid the accumulation of tricky but untested code.

mvdan pushed a commit that referenced this issue Jun 7, 2021
While code coverage tools are valuable during development, aggressively enforcing coverage in CI is not a good idea: removing a bunch of well-tested but ultimately unnecessary code from a project would be regarded as an improvement by all developers, yet a coverage tool would only see a drop in coverage and fail the status check on the PR.

We can use the "non-blocking status checks" config from the Codecov manual: https://docs.codecov.io/docs/common-recipe-list#set-non-blocking-status-checks.

What we want the code coverage tool to do is report the changes, but not affect the PR status. See marten-seemann/codecov-test#5 for a real-world example of how this will look.
galargh moved this from New to Backlog in IP Productivity 🆙 on Jan 13, 2022
laurentsenta moved this from 🤔 Triage to 🥺 Backlog in InterPlanetary Developer Experience on Jun 10, 2022
@galargh
Contributor

galargh commented Aug 28, 2023

We'll be aiming to make the existing tooling for flaky test detection easier to reuse across PL as part of pl-strflt/ipdx#89. I'm going to close this issue now, but I'm linking it there because the discussion here will be really helpful.

galargh closed this as completed on Aug 28, 2023
github-project-automation bot moved this from 🥺 Backlog to 🥳 Done in InterPlanetary Developer Experience on Aug 28, 2023