Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check-sof-logger.sh: fix no-log-file failures due to runtime Power Management #1109

Merged

Conversation

kv2019i
Copy link
Contributor

@kv2019i kv2019i commented Oct 4, 2023

The logger test case has a design limitation in that it assumes the SOF DSP to remain powered up between driver reload and start of the logger. This creates a possibility for a race with runtime PM (RPM).

If the DSP goes to runtime suspend during this window, it is not guaranteed that the SOF driver will provide any logs. Some logging mechanisms save the logs that were emitted before suspend (sof-logger/IPC3), but some not (cavstool/IPC3 and mtrace/IPC4).

To limit the window , add a simple ping of all PCM nodes just before logger process is started. This is a crude mechanism but something that works with all variants of SOF. End result is that DSP will be resumed to D0 just before the logger test starts.

The logger test case has a design limitation in that it
assumes the SOF DSP to remain powered up between driver reload
and start of the logger. This creates a possibility for a race
with runtime PM (RPM).

If the DSP goes to runtime suspend during this window, it is not
guaranteed that the SOF driver will provide any logs. Some logging
mechanisms save the logs that were emitted before suspend
(sof-logger/IPC3), but some not (cavstool/IPC3 and mtrace/IPC4).

To limit the window , add a simple ping of all PCM nodes just
before logger process is started. This is a crude mechanism but
something that works with all variants of SOF. End result is that
DSP will be resumed to D0 just before the logger test starts.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i requested a review from a team as a code owner October 4, 2023 11:47
@kv2019i
Copy link
Contributor Author

kv2019i commented Oct 4, 2023

Only system-PM errors in https://sof-ci.01.org/softestpr/PR1109/build763/devicetest/index.html , https://sof-ci.01.org/softestpr/PR1109/build762/devicetest/index.html and https://sof-ci.01.org/softestpr/PR1109/build761/devicetest/index.html . And no cases of empty mtrace.

marc-hb
marc-hb previously approved these changes Oct 4, 2023
@marc-hb
Copy link
Collaborator

marc-hb commented Oct 4, 2023

We don't run check-sof-logger in PR testing (like https://sof-ci.01.org/softestpr/PR1109/build761/devicetest/index.html), we never have - cause it was failing all the time. We only ever ran it in daily tests.

@fredoh9 could you add them to the PR test plan? Then we can SOFCI TEST this and merge.

@fredoh9
Copy link
Collaborator

fredoh9 commented Oct 4, 2023

I added check-sof-logger.sh to mtl pr test.

And I manually ran check-sof-logger.sh with this PR on MTLP_RVP_HDA, it still failed. But I ran the test automatically thre result looks good.
planresultdetail/32679

@fredoh9
Copy link
Collaborator

fredoh9 commented Oct 4, 2023

SOFCI TEST

fredoh9
fredoh9 previously approved these changes Oct 4, 2023
Copy link
Collaborator

@fredoh9 fredoh9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-test again after re-deploying daily build. Now constantly PASS. I did not deploy daily build before, probably the build installed wasn't stable. There is no way I can track the build installed before.
Looks good!

@fredoh9
Copy link
Collaborator

fredoh9 commented Oct 5, 2023

MTLP_RVP_HDA failed in PR test. Interestingly I tried on MTLP_RVP_HDA when I saw the failure with this PR. Apparently the result get better but not complete solution.

@kv2019i
Copy link
Contributor Author

kv2019i commented Oct 5, 2023

@fredoh9 I think this is a different problem:

Polling 'sudo test -e /proc/asound/card0/id' every 1s for 70s
Completed 'sudo test -e /proc/asound/card0/id' after 0s and 1 attempts
2023-10-04 23:00:14 UTC [REMOTE_INFO] Found /proc/asound/card0/id.
[...]
  File "/usr/local/bin/mtrace-reader.py", line 20, in 
    fd = os.open(MTRACE_FILE, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/sof/mtrace/core0'

I checked the DUT and there's the 60sec timeout due to missing i915 driver, and there's a non-SOF driver registered as card0, so the test case logic goes wrong.

We can try to make the test case logic more robust, but I think this is a different problem.

The check for DSP boot assumes SOF is always card0 in the system,
and this leads to failure.

Fix the issue by using the existing sof_firmware_boot_complete()
check to wait for DSP boot to successfully complete.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i dismissed stale reviews from fredoh9 and marc-hb via 4bb1acb October 5, 2023 18:02
@kv2019i
Copy link
Contributor Author

kv2019i commented Oct 5, 2023

Ok, @fredoh9 , added a second patch to this PR to fix the issue that affects this specific DUT. The check should now pass even if some non-SOF ALSA driver is loaded as card0 (like is the case on this DUT).

@fredoh9
Copy link
Collaborator

fredoh9 commented Oct 5, 2023

Thank you a lot! I will check again in this afternoon!

marc-hb
marc-hb previously approved these changes Oct 5, 2023
@marc-hb
Copy link
Collaborator

marc-hb commented Oct 5, 2023

Interesting, do we have another race?
https://sof-ci.01.org/softestpr/PR1109/build768/devicetest/index.html

SKIP    	snd_sof_imx8m 	(not in tree)
MODPROBE	snd_usb_audio		 
2023-10-05 18:15:08 UTC [REMOTE_INFO] Polling , waiting for DSP boot...
Polling 'sof_firmware_boot_complete --since=@1696529699' every 1s for 3s
[  232.064597] kernel: snd_sof:snd_sof_run_firmware: sof-audio-pci-intel-tgl 0000:00:1f.3: firmware boot complete
Completed 'sof_firmware_boot_complete --since=@1696529699' after 0s and 1 attempts
2023-10-05 18:15:08 UTC [REMOTE_INFO] DSP booted successfully.
2023-10-05 18:15:08 UTC [REMOTE_INFO] Trying to get Zephyr logs from mtrace with background /usr/local/bin/mtrace-reader.py ...
2023-10-05 18:15:08 UTC [REMOTE_COMMAND] sudo  /usr/local/bin/mtrace-reader.py >  /home/ubuntu/sof-test/logs/check-sof-logger/2023-10-05-18:15:03-9615/logger.etrace.txt  2>  /home/ubuntu/sof-test/logs/check-sof-logger/2023-10-05-18:15:03-9615/logger.etrace_stderr.txt
Traceback (most recent call last):
  File "/usr/local/bin/mtrace-reader.py", line 20, in 
    fd = os.open(MTRACE_FILE, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/sof/mtrace/core0'
2023-10-05 18:15:10 UTC [REMOTE_ERROR] timeout /usr/local/bin/mtrace-reader.py returned unexpected: 1
2023-10-05 18:15:10 UTC [REMOTE_INFO] Starting func_exit_handler(1)
2023-10-05 18:15:10 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2023-10-05 18:15:10 UTC [REMOTE_ERROR]  die()  @  /home/ubuntu/sof-test/case-lib/lib.sh
2023-10-05 18:15:10 UTC [REMOTE_ERROR]  run_loggers()  @  /home/ubuntu/sof-test/test-case/check-sof-logger.sh:166
2023-10-05 18:15:10 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-sof-logger.sh:271
2023-10-05 18:15:10 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-sof-logger.sh:364
2023-10-05 18:15:10 UTC [REMOTE_INFO] pkill -TERM -f mtrace-reader.py
2023-10-05 18:15:10 UTC [REMOTE_INFO] ktime=238 sof-test PID=3794: ending
2023-10-05 18:15:10 UTC [REMOTE_INFO] Test Result: FAIL!

@fredoh9
Copy link
Collaborator

fredoh9 commented Oct 5, 2023

I tried on MTLP_RVP_HDA,

2023-10-05 22:56:14 UTC [INFO] Trying to get Zephyr logs from mtrace with background /usr/local/bin/mtrace-reader.py ...
2023-10-05 22:56:14 UTC [COMMAND] sudo  /usr/local/bin/mtrace-reader.py >  /home/ubuntu/sof-test/logs/check-sof-logger/2023-10-05-22:56:08-15319/logger.etrace.txt  2>  /home/ubuntu/sof-test/logs/check-sof-logger/2023-10-05-22:56:08-15319/logger.etrace_stderr.txt
Traceback (most recent call last):
  File "/usr/local/bin/mtrace-reader.py", line 20, in <module>
    fd = os.open(MTRACE_FILE, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/sof/mtrace/core0'

And trigger sof-logger test plan also, Similar errors found.
planresultdetail/32724

lib.sh:sof_firmware_boot_complete() ensures DSP has booted, but it does
not guarantee SOF driver probe has completed. So it is still possible
some of the interfaces SOF driver exposes to user-space are not available
(including interfaces needed to run loggers).

Tracing is initialized before ALSA card registration in SOF diver (see
snd_sof_device_probe() in SOF driver), so by waiting for ALSA card to
show up in procfs, we can guarantee tracing related interfaces have been
initialized.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@marc-hb
Copy link
Collaborator

marc-hb commented Oct 6, 2023

check-sof-logger now all passed in https://sof-ci.01.org/softestpr/PR1109/build772/devicetest/index.html . By chance or design?

@marc-hb
Copy link
Collaborator

marc-hb commented Oct 6, 2023

SOFCI TEST

@fredoh9
Copy link
Collaborator

fredoh9 commented Oct 6, 2023

I re-ran sof-logger test, all passed!
planresultdetail/32767

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean code, easy to fix later if needed. Will hopefully help with current PRs, merging.


wait_for_sof_alsa_card()
{
if poll_wait_for 1 "$MAX_WAIT_FW_LOADING" sof_alsa_card_found ; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow-up PR please:

Suggested change
if poll_wait_for 1 "$MAX_WAIT_FW_LOADING" sof_alsa_card_found ; then
if poll_wait_for 1 "$1" sof_alsa_card_found ; then

@marc-hb marc-hb merged commit e3d7664 into thesofproject:main Oct 6, 2023
5 of 6 checks passed
@marc-hb
Copy link
Collaborator

marc-hb commented Oct 9, 2023

mtrace still empty in 2 out of 6 devices in today's daily test run 32787

2023-10-08 12:29:48 UTC [REMOTE_ERROR] Log header not found in /home/ubuntu/sof-test/logs/check-sof-logger/2023-10-08-12:29:19-18463/logger.etrace.txt

The No such file or directory message is gone though!

@marc-hb marc-hb changed the title check-sof-logger.sh: fix no-log-file failures due to RPM check-sof-logger.sh: fix no-log-file failures due to runtime Power Management Apr 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants