
[common] Introduce testlogger as a workaround of poor lifecycle #1398

Merged
9 commits merged into cadence-workflow:master on Nov 11, 2024

Conversation

@3vilhamster (Contributor) commented Nov 8, 2024

What changed?
I copied over the backend's workaround for racy logs in tests.
We have the same issue in the client, which cannot establish a clean stop for its internal processes on the first stop. This is not a problem for real workflow usage, but it can cause test failures due to racy log messages emitted after the test ends.
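
For context, this is roughly the failure mode being described (a hypothetical sketch, not code from this PR): an internal goroutine that keeps logging through *testing.T after the test body has returned.

```go
package example

import (
	"testing"
	"time"
)

// Hypothetical illustration of the race this PR works around: a background
// goroutine outlives the test and logs through *testing.T too late.
func TestRacyLog(t *testing.T) {
	go func() {
		// Simulates an internal client process that has no clean stop.
		time.Sleep(10 * time.Millisecond)
		// If this runs after the test returns, the testing package panics with
		// "Log in goroutine after TestRacyLog has completed".
		t.Log("late message")
	}()
	// The test returns immediately; the goroutine above may still be running.
}
```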

Sidenote:
I changed the approach from an atomic flag to an RWMutex, since this situation requires the following semantics (see the sketch after this list):

  1. Concurrent log writes are allowed, so taking an RLock does not block them.
  2. On Cleanup we must wait for all in-flight writes and then acquire the write lock.
  3. All writes after that point must go to the fallback logger.
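
A minimal sketch of that locking scheme (hypothetical names and structure, not the actual testlogger code), assuming the logger wraps a *testing.T and a fallback writer:

```go
package testlogger

import (
	"sync"
	"testing"
)

// Sketch of the RWMutex scheme described above; field and method names
// are illustrative only.
type logger struct {
	mu        sync.RWMutex
	completed bool
	t         *testing.T
	fallback  func(msg string) // e.g. writes to stderr instead of t
}

func newLogger(t *testing.T, fallback func(string)) *logger {
	l := &logger{t: t, fallback: fallback}
	t.Cleanup(func() {
		l.mu.Lock() // blocks until all in-flight Log calls release their RLock
		defer l.mu.Unlock()
		l.completed = true
	})
	return l
}

func (l *logger) Log(msg string) {
	l.mu.RLock() // concurrent writes may proceed in parallel
	defer l.mu.RUnlock()
	if l.completed {
		l.fallback(msg) // the test has finished: never touch t again
		return
	}
	l.t.Log(msg)
}
```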

Why?
Improving test stability

How did you test it?
I ran the unit tests a few hundred times.
This is a bit tricky, since TestLoggerShouldNotFailIfLoggedLate requires reruns, so I ran `for i in {1..100}; do go test . --count 1; done`, both with and without the race detector.

Potential risks
Some tests could still be flaky, but they were already failing before this fix.

codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.91%. Comparing base (7a3beaa) to head (e209ff6).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/common/testlogger/testlogger.go | 94.56% | 4 Missing and 1 partial ⚠️ |

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| internal/common/convert.go | 100.00% <100.00%> (ø) |
| internal/common/testlogger/testlogger.go | 94.56% <94.56%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5ec4386...e209ff6

@Groxx (Member) left a comment


LGTM though I don't see the reason for the added mutex.
If something seemed to be flaky without it, I can pretty much guarantee that it didn't fix it, so there's something else that needs fixing instead.

@Groxx (Member) commented Nov 8, 2024

From some more reading and thinking and IRL chat: I'm pretty sure the mutex doesn't change anything, but it's not risky or anything. So yeah, still approved, merge any time.

Now that there's a counter-example showing this logger isn't reliable, I think it's pretty clear that the atomics aren't ensuring "fallback logger is used after test completes". They currently allow a goroutine to pause after checking that the atomic is not yet completed, then the test completes, and the goroutine resumes and uses the T logger.

But I can make a PR for that if you want. The mutex here shouldn't make anything worse, and a mutex will be needed to fix the "atomics are not enough" issue anyway.
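
For illustration, a hypothetical atomic-only variant showing that window (not the actual implementation):

```go
package testlogger

import (
	"sync/atomic"
	"testing"
)

// Hypothetical atomic-only logger, kept only to show the gap described above.
type atomicLogger struct {
	completed atomic.Bool
	t         *testing.T
	fallback  func(msg string)
}

func (l *atomicLogger) Log(msg string) {
	if l.completed.Load() {
		l.fallback(msg)
		return
	}
	// A goroutine can be descheduled right here while the test finishes and
	// Cleanup flips completed; when it resumes, it still calls t.Log too late.
	l.t.Log(msg)
}
```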

@3vilhamster merged commit b51e891 into cadence-workflow:master on Nov 11, 2024
13 checks passed
@3vilhamster deleted the introduce-testlogger branch on November 11, 2024 at 14:09