Catch flaky tests (go) #94
Comments
Very interesting idea! Alternatively, we might also get away with a
Further problem: `go test -list '.*'` works, but not for sub-tests. Related: https://stackoverflow.com/questions/36938520/use-go-test-to-list-all-tests-case/44626375
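To make that limitation concrete, here is a minimal sketch (the test and sub-test names are hypothetical): `go test -list '.*'` can report `TestParent`, because it is a top-level test function the tool can find without executing anything, but the sub-test names only come into existence when `t.Run` is called at runtime.

```go
package example

import "testing"

// TestParent is a top-level test function, so `go test -list '.*'` can
// report it without running anything.
func TestParent(t *testing.T) {
	// These sub-test names only exist once TestParent executes; they could
	// just as well be computed from data loaded at runtime, which is why no
	// static listing can enumerate them.
	for _, name := range []string{"case_a", "case_b"} {
		t.Run(name, func(t *testing.T) {
			// real assertions would go here
		})
	}
}
```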
You can't list sub-tests because that implies running the tests themselves, i.e. arbitrary code. If you really want a full list, I think the best way might be to use

There isn't a "just run the tests up until they run sub-tests and then stop" because... halting problem :) How do you know when a test is done spawning sub-tests?

As for the goal of this issue, I agree it would be nice to rely more on CI to catch flakes. But I'm less convinced that we can be clever about this. For example: what if a test was added months ago, and today it got modified to be flaky? What if a test hasn't been touched at all in months, but updating a Go version made it flaky? Or if the non-test code that it tests became flaky?

So I don't think we can safely determine which tests we need to hammer. Why not hammer all of them? After all, a project that gets regular pushes and merges is already running all tests easily hundreds of times per day. GitHub Actions for open source only limits the number of concurrent jobs, not the total amount of resources used over a month. We could set up something like

Another solution is to do nothing, and assume that flakes will pop up with time. File issues and investigate as they come. That's what I do personally, and it works pretty well. If a flake is so rare that I never see it without CI hammering my tests, does the flake really matter at all?

From experience, I should also note that it's somewhat common for
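For the "hammer all of them" option, a scheduled job could simply rerun the whole suite many times. Here is a rough sketch of the idea; the round count is arbitrary and a shell loop around `go test` would work just as well, this is only an illustration:

```go
// Hypothetical "hammer everything" job: rerun the full test suite many times
// and stop at the first failure.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	const rounds = 20 // arbitrary; a nightly job could afford far more
	for i := 1; i <= rounds; i++ {
		// -count=1 defeats the test cache so every round actually runs,
		// and -race tends to surface timing-dependent flakes sooner.
		cmd := exec.Command("go", "test", "-race", "-count=1", "./...")
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "failure on round %d of %d: %v\n", i, rounds, err)
			os.Exit(1)
		}
	}
	fmt.Printf("all %d rounds passed\n", rounds)
}
```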
I'm mostly concerned about adding new flaky tests. Specifically, ones with strong timing dependence. Once we merge, we'll naturally catch flakes as we go. The goal here was to catch them before we merge so we're less likely to run into an "oh well, we merged a flaky test, better go fix it later" issue. But this just may not be worth fixing (given the complexity and the fact that we can't catch everything).
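If we did want to stress only the tests a PR adds, one sketch, within the listing limitation discussed above, would be to capture `go test -list '.*' ./...` output on the base branch and on the PR branch, diff the top-level names, and hammer just the new ones with a high `-count`. The file names and the `-count=100` value below are made up for illustration, and sub-tests added inside existing test functions would slip through:

```go
// Hypothetical sketch: compare `go test -list '.*' ./...` output captured on
// the base branch and on the PR branch, and print the newly added top-level
// tests as a -run regexp. CI could then run something like
// `go test -run "$(newtests base.txt head.txt)" -count=100 ./...`.
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// listedTests reads one test name per line, ignoring the "ok <pkg>" lines
// that `go test -list` also prints for each package.
func listedTests(path string) (map[string]bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	tests := make(map[string]bool)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		name := strings.TrimSpace(sc.Text())
		for _, prefix := range []string{"Test", "Benchmark", "Example", "Fuzz"} {
			if strings.HasPrefix(name, prefix) {
				tests[name] = true
				break
			}
		}
	}
	return tests, sc.Err()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: newtests <base-list> <head-list>")
		os.Exit(2)
	}
	base, err := listedTests(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	head, err := listedTests(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	var added []string
	for name := range head {
		if !base[name] {
			// Anchor each name so -run matches exactly this test.
			added = append(added, "^"+name+"$")
		}
	}
	if len(added) == 0 {
		fmt.Fprintln(os.Stderr, "no new top-level tests")
		return
	}
	sort.Strings(added)
	fmt.Println(strings.Join(added, "|"))
}
```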
As a first pass, I would suggest that this be solved with engineering discipline rather than tooling (though tooling can certainly help later). If a PR merges a test that's soon discovered to be flaky, revert the PR, no questions asked. Note that I say revert the PR rather than just disable the test, in order to avoid accumulating tricky but untested code.
While code coverage tools are valuable during development, having code coverage aggressively enforced in CI is not a good idea: removing a bunch of well-tested but ultimately unnecessary code from a project would be regarded as an improvement by all developers, yet a code coverage tool would only see a drop in coverage and fail the status check on the PR. We can use the "non-blocking status checks" config from the Codecov manual: https://docs.codecov.io/docs/common-recipe-list#set-non-blocking-status-checks. What we want the code coverage tool to do is report the changes, but not affect the PR status. See here for a real-world example of how this will look: marten-seemann/codecov-test#5
We'll be aiming to make the existing tooling for flaky test detection easier to reuse across PL as part of pl-strflt/ipdx#89. I'm going to close this issue now, but I'm linking it because the discussion here will be really helpful.
On PR:
This is very much not an easy thing to implement, but it should really help guard us from checking in new flaky tests by stress-testing new tests.