Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] empty kernel logs on MAX_WAIT_FW_LOADING failures #1112

Closed
marc-hb opened this issue Oct 11, 2023 · 8 comments
Closed

[BUG] empty kernel logs on MAX_WAIT_FW_LOADING failures #1112

marc-hb opened this issue Oct 11, 2023 · 8 comments
Assignees
Labels
P0 urgent Call for attention for urgent issues P1 Blocker bugs or important features type:bug Something doesn't work as expected

Comments

@marc-hb
Copy link
Collaborator

marc-hb commented Oct 11, 2023

Describe the bug

#1059 added the ability to wait for the firmware to be loaded, which solved a number of issues, including some i915 timeout issues.

However sof-test does not collect kernel logs when this fails. This is very problematic because the only way to know why the firmware is not loaded is to look at kernel logs.

To Reproduce

  • sudo rm -rf sof-tests/logs/* # do not delete the logs/ directory itself
  • sof-test/tools/kmod/sof_remove.sh
  • Run any test.
  • wc -l logs///dmesg.txt

=> dmesg.txt is empty

Expected behavior

Kernel logs are collected.

Detail Info

  1. Branch name and commit hash of the 3 repositories: sof (firmware/topology), linux (kernel driver) and sof-test (test case)
    • Kernel: any
    • SOF: any
    • SOF-TEST: e3d7664
  2. Test report ID (if you find it from test report)
  3. Test DUT Model (or a brief discribtion about the device)
    • MODEL: any
  4. Test TPLG
    • TPLG: any
  5. Test case (what test script and how you run it)
    • TESTCASE: any

cc:

@marc-hb marc-hb added the type:bug Something doesn't work as expected label Oct 11, 2023
@marc-hb
Copy link
Collaborator Author

marc-hb commented Dec 5, 2023

@marc-hb marc-hb added the P1 Blocker bugs or important features label Dec 5, 2023
@marc-hb
Copy link
Collaborator Author

marc-hb commented Mar 21, 2024

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 9, 2024

I just noticed #1036 by chance, possibly a duplicate

marc-hb added a commit to marc-hb/sof-test that referenced this issue May 1, 2024
Stop unconditionally imposing start_test() to everything sourcing lib.sh

This makes it possible to source lib.sh without running anything which
can be useful for debugging.

For now this should be a no-op for all tests except two files in `tools/` which
were never really "tests" in the first place:

- tools/kmod/sof_remove.sh
- tools/sof-kernel-log-check.sh

In these, start_test() never made sense it never really run thanks the
(awkward) "is_subtest()" escape.

There are other tests where start_test() should not be invoked either,
most notably test-case/verify-kernel-boot-log.sh in order to fix
thesofproject#1112 but this is a huge
commit already; one change at a time.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof-test that referenced this issue May 1, 2024
Stop unconditionally imposing start_test() to everything sourcing lib.sh

This makes it possible to source lib.sh without running anything which
can be useful for debugging.

For now this should be a no-op for all tests except two files in `tools/` which
were never really "tests" in the first place:

- tools/kmod/sof_remove.sh
- tools/sof-kernel-log-check.sh

In these, start_test() never made sense it never really run thanks the
(awkward) "is_subtest()" escape.

There are other tests where start_test() should not be invoked either,
most notably test-case/verify-kernel-boot-log.sh in order to fix
thesofproject#1112 but this is a huge
commit already; one change at a time.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb marc-hb closed this as completed in 7c2ede0 May 6, 2024
@marc-hb marc-hb reopened this May 6, 2024
marc-hb added a commit to marc-hb/sof-test that referenced this issue May 8, 2024
Provide a new, boot_logs.txt file no matter what happens. Note that file
now includes user-space logs, not just kernel logs.

Also: fix bug where script times out and does not run when the SOF
firmware is not loaded: no need to define NO_POLL_FW_LOADING anymore.

The trick is to stop calling `start_test()`. verify-kernel-boot-log.sh
is not an audio test!

Fixes sof-test issues thesofproject#1036 and thesofproject#1112, find more details there.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented May 8, 2024

Fix submitted: #1191

EDIT: no, this was only for #1036 and verify-kernel-boot-log.sh. Need to test other tests.

marc-hb added a commit that referenced this issue May 9, 2024
Provide a new, boot_logs.txt file no matter what happens. Note that file
now includes user-space logs, not just kernel logs.

Also: fix bug where script times out and does not run when the SOF
firmware is not loaded: no need to define NO_POLL_FW_LOADING anymore.

The trick is to stop calling `start_test()`. verify-kernel-boot-log.sh
is not an audio test!

Fixes sof-test issues #1036 and #1112, find more details there.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented May 16, 2024

Lucky us we just got a complete, real-world sample case here where the firmware did not load (topology mismatch)

https://sof-ci.01.org/linuxpr/PR4995/build3008/devicetest/index.html

The recent, verify-kernel-boot-log.sh fixed in #1191 worked great.

The other, consecutive tests were not too bad. Only 2 CI TIMEOUTS, all other tests timed out after MAX_WAIT_FW_LOADING.

The lack of kernel logs in consecutive tests is expected after all: nothing new to show.

marc-hb added a commit to marc-hb/sof-test that referenced this issue May 16, 2024
i915 does not hang MTL any more, this has been fixed a long time ago.

On the other hand, 70s is a very long time to wait when the firmware
fails to load for some reason; this is wasting precious cycles. See
recent example in:
  https://sof-ci.01.org/linuxpr/PR4995/build3008/devicetest/index.html
  thesofproject#1112 (comment)

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented May 16, 2024

Only 2 CI TIMEOUTS, all other tests timed out after MAX_WAIT_FW_LOADING.

That's only because MTL still has an unreasonably long MAX_WAIT_FW_LOADING, otherwise there would have not been any CI timeout.

Fixed in:

marc-hb added a commit that referenced this issue May 16, 2024
i915 does not hang MTL any more, this has been fixed a long time ago.

On the other hand, 70s is a very long time to wait when the firmware
fails to load for some reason; this is wasting precious cycles. See
recent example in:
  https://sof-ci.01.org/linuxpr/PR4995/build3008/devicetest/index.html
  #1112 (comment)

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented May 16, 2024

#1196 merged, good enough.

@marc-hb marc-hb closed this as completed May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 urgent Call for attention for urgent issues P1 Blocker bugs or important features type:bug Something doesn't work as expected
Projects
None yet
Development

No branches or pull requests

1 participant