
Random Failures on same Tests #915

Open
awiegel opened this issue Aug 5, 2024 · 10 comments


awiegel commented Aug 5, 2024

We have three different ceedling projects that run perfectly fine without errors.
However, sometimes they fail (randomly).

We executed the tests over 1000 times, and around 1% of them failed.

Some of the errors were:

  • EXCEPTION: ShellExecutionException ==> 'Default Gcov Linker' (gcc) terminated with exit code [1] and output >> "/usr/bin/ld: ... undefined reference to ..." -> the file with the definition is 100% referenced in the project.yml.
  • EXCEPTION: CeedlingException ==> Found no file 'abc.c' in search paths. However, a filename having different capitalization was found: '.../.../.../.../.../abc.c'. -> there is only one file with this name in the whole project.
  • A test assertion fails, reporting a different value than usual.

This occurred on the pre-release gem ceedling-0.32.0-2f246f1.

Output of ceedling version:

Ceedling => 0.32.0
CMock => 2.5.4
Unity => 2.6.0
CException => 1.3.4

Has anyone else noticed their projects failing randomly?

@deltalejo

Do you see the same behavior on the latest pre-release version?

@mvandervoord
Member

@awiegel -- In the project section of your project.yml file, there are likely two settings:

  :test_threads: 8
  :compile_threads: 8

If you set these to 1, do you still get the failures?
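For reference, the single-threaded configuration to try would look like this in project.yml (a sketch showing only the relevant keys):

```yaml
:project:
  :test_threads: 1     # run test suites serially
  :compile_threads: 1  # compile files serially
```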

Similarly, if you run your tests without the gcov plugin, does it still produce failures?

These symptoms sound like the result of your file system not keeping up with the new threading. I saw this on Windows with earlier pre-release versions. I haven't seen it with the latest releases, but that doesn't mean there aren't some creeping issues still to be found.

I apologize that you've run into this and hope we can uncover the source!

@mkarlesky
Member

mkarlesky commented Aug 5, 2024

@awiegel First of all, thank you so much for hammering prereleases so hard! The 1% failure rate certainly sounds like classic nondeterministic behavior such as with threading.

At least one other community member has been successfully using prerelease builds for large, complex, multi-platform test suite builds. They found threading bugs early on (reported outside GitHub). Since then, we thought we had found and fixed those problems. Perhaps not! The build you are referencing is months newer than all that work.

To pile on some other thoughts / questions:

  • The test assertion failure stands out as an especially curious problem to me. That sort of failure is typically associated with memory dereferencing issues. It might be a subtle bug in CMock or Unity. It might be that your test is comparing two values and was historically lucky on how memory references shook out. Maybe the updated tools are now disturbing memory layouts on rare occasion. Would you be able to share any of the source and test code around that problem? Feel free to anonymize it as appropriate.
  • The filename capitalization error is also curious. I actually wonder if that's not the problem at all, but something else that is triggering that validation logic and reporting. That area within Ceedling's code is complicated. Would you be able to share any further details, code, or configuration snippets related to this problem?
  • I'm quite unsure of what to think about the missing reference failure. The only request I have there is for any more project details, snippets, etc. that you can share.

As Mark suggested, please do let us know if cranking down threading to single threads makes a difference. That said, it sounds like it is a non-trivial thing to simply re-run thousands of builds with a changed configuration.

Author

awiegel commented Aug 7, 2024

Thank you for the quick responses!

I've tested different things now:

  • Using the latest pre-release gem produces the same errors. Also, it introduced some deterministic failures, which I have to investigate further.
  • Replacing gcov with test produces the same errors.
  • Setting threading to 1 indeed fixes all three problems! However, without threading, the tests take around three times longer. For now, the best solution for us is simply to retry the tests when such a nondeterministic failure occurs.

If it helps, the tests run in a Docker Ubuntu container executed on a Windows PC.

Unfortunately, I cannot share any project data.


Letme commented Aug 7, 2024

If setting threading to 1 solves the runtime problem, then you have a bad setup/teardown in your tests, as it means the tests are overwriting each other's shared memory. So your "virtualization" is not done correctly (you didn't say what CPU and hardware you are using, so we can't really point you in a better direction), and I would look into your general memory layout for the problem.

@mkarlesky
Member

@awiegel Well, we're learning something here. I'm trying to think of what to ask since confidentiality is a hurdle here.

Could you explain the new deterministic failures with the latest prerelease you mentioned?

@mkarlesky
Member

@awiegel A little progress update… Some of what you reported caused me to think about changes in how test runners are generated. And, in fact, the prerelease version you first referenced is only a week or two older than those changes. Threading behavior is hard, as we all know. I think I see some gaps in thread safety those runner generation changes may have opened. It's hard to say if what I have in mind is your problem, but I do think there's an opportunity to fortify some data structure threading protections. It may simply be that not enough people have used recent prerelease versions of Ceedling as intensely and with your specific configuration to have run into the same issue you are.

@mkarlesky
Member

@awiegel The latest prerelease has additional threading protection. I am not sure what to expect. On the one hand, I can't see any code paths that would have tripped on the lack of thread protection I just added. On the other hand, circumstantial evidence and my gut say that what I changed may be the source of your inconsistent builds. Only time and your own testing will tell.

Author

awiegel commented Sep 4, 2024

@mkarlesky A little testing update from my side. With the latest prerelease (1.0.0-3d9cd04), I still get the same errors.

  • The random compilation failures disappear when I set :compile_threads: to 1 (:test_threads: doesn't seem to affect the failures, so it can stay on :auto). So I guess there is still some error in the multithreaded compilation process, because I don't see how a bad test setup could provoke such random failures.

  • The random test assertion failure occurs too rarely to really test (about 1 out of 1000 runs of the whole test suite). It could be some bad test setup; at least it's the same test with the same assertion that fails. However, when I only execute this specific test, it never fails.
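The workaround described above corresponds to project.yml settings along these lines (a sketch; :test_threads: is left on :auto per the observation that it does not affect the failures):

```yaml
:project:
  :compile_threads: 1    # serial compilation avoids the random link/file errors
  :test_threads: :auto   # test threading did not appear to affect the failures
```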

@mkarlesky
Member

@awiegel Thank you for the followup. We've run some stress testing and have not yet triggered the problem. We're retooling now to run better multi-threaded stress testing. I know you are not able to share your code. However, could you share anything at all about your project and about the failing test?

  • Are you using a lot of mocks? No mocks?
  • Is your test suite exercising a great deal of memory operations?
  • Do you have large test files with many, many test cases?
  • Any complicated macros or conditional compilation scenarios?
  • What is your build rigging (e.g. Jenkins, CircleCI, Github Actions, etc.)?
  • Are you capturing logs directly from Ceedling, or capturing Ceedling's $stdout output as a log using your build system?
  • Could you share an anonymized version of the failing test case?

Anything that stands out to you as unique about your project might help us.
