Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] FW reported error: 6 - Unknown error while processing the request when disabling a secondary core #8588

Closed
keqiaozhang opened this issue Dec 7, 2023 · 12 comments
Labels
bug Something isn't working as expected I2S Applies to I2S bus for codec connection LNL Applies to Lunar Lake platform MTL Applies to Meteor Lake platform multicore Issues observed when not only core#0 is used. P1 Blocker bugs or important features

Comments

@keqiaozhang
Copy link
Collaborator

keqiaozhang commented Dec 7, 2023

Describe the bug
I did a pre-test on mtl-007-drop-stable branch and found this issue when recording for a long time. The reproduction rate is unclear, I'm doing the manual check now.

dmesg

[ 5728.131726] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx      : 0x12010000|0x0: GLB_DELETE_PIPELINE
[ 5728.133418] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx reply: 0x32000000|0x0: GLB_DELETE_PIPELINE
[ 5728.133449] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx done : 0x12010000|0x0: GLB_DELETE_PIPELINE
[ 5728.133457] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx      : 0x47000000|0x0: MOD_SET_DX [data size: 8]
[ 5728.535101] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx reply: 0x67000006|0x0: MOD_SET_DX
[ 5728.535114] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 6 - Unknown error while processing the request
[ 5728.535172] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x47000000|0x0
[ 5728.535194] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: failed to disable target core: 2 for pipeline pipeline.11
[ 5728.535221] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to free connected widgets
[ 5728.535252] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_free on 0000:00:1f.3: -22
[ 5728.535420] kernel: snd_sof:sof_pcm_close: sof-audio-pci-intel-mtl 0000:00:1f.3: pcm: close stream 2 dir 1

To Reproduce
~/sof-test/test-case/check-capture.sh -d 500 -l 1 -r 1

Reproduction Rate
50%+

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
    • Topology: {development/sof-mtl-nocodec.tplg}
  3. Name of the platform(s) on which the bug is observed.
    • Platform: {MTLP_RVP_NOCODEC}

dmesg.txt
mtrace.txt

@keqiaozhang keqiaozhang added bug Something isn't working as expected I2S Applies to I2S bus for codec connection multicore Issues observed when not only core#0 is used. MTL Applies to Meteor Lake platform labels Dec 7, 2023
@keqiaozhang
Copy link
Collaborator Author

This issue can be reproduced manually and the reproduction rate is about 50%+ with stress test case.

@mengdonglin mengdonglin added the P1 Blocker bugs or important features label Dec 7, 2023
@keqiaozhang keqiaozhang changed the title [BUG] [MTL-007] FW reported error: 6 - Unknown error while processing the request [BUG] FW reported error: 6 - Unknown error while processing the request Dec 8, 2023
@mengdonglin
Copy link
Collaborator

mengdonglin commented Dec 8, 2023

This issue seems to be a common issue with IMR context save DISABLED + multicore in latest code base.

This issue is not only on mtl-007. It can also be reproduced on sof main branch if we disable IMR context save:
https://sof-ci.ostc.intel.com/#/result/planresultdetail/35669
case: simultaneous-playback-capture-200.sh

2023-12-08 01:33:13 UTC [ERROR] Caught kernel log error ===========================>> 
[19721.019217] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: FW reported error: 6 - Unknown error while processing the request 
[19721.019353] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc error for msg 0x47000000|0x0 
[19721.019372] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: failed to disable target core: 1 for pipeline pipeline.9 
[19721.019385] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to free connected widgets 
[19721.019406] kernel: sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_free on 0000:00:1f.3: -22

@kv2019i
Copy link
Collaborator

kv2019i commented Dec 8, 2023

The problem seems to relate to multicore scenarios and error in powering down a secondary core. In the original mtrace:

[   93.002046] <err> zephyr: cpu_disable_core: core 2 did not enter idle state
[   93.002073] <err> ipc: ipc4_module_process_dx: failed to disable core 2
[   93.002100] <err> ipc: ipc_cmd: ipc4: MODULE_MSG failed with err 6

In a more recent case (Intel test plan 35669):

[   35.326876] <inf> ipc: ipc_cmd: rx	: 0x47000000|0x0
[   35.726888] <err> zephyr: cpu_disable_core: core 1 did not enter idle state
[   35.726891] <err> ipc: ipc4_module_process_dx: failed to disable core 1
[   35.726895] <err> ipc: ipc_cmd: ipc4: MODULE_MSG failed with err 6Terminated

@keqiaozhang
Copy link
Collaborator Author

Observed this issue on LNL-NOCODEC when testing multiple-pause-resume-50. Inner test ID:35712.

@wszypelt
Copy link

wszypelt commented Dec 8, 2023

@keqiaozhang does the problem also appear on mtl-006?

@mengdonglin
Copy link
Collaborator

@keqiaozhang does the problem also appear on mtl-006?

@wszypelt we didn't observe this issue on mtl-006

@fredoh9 fredoh9 added the LNL Applies to Lunar Lake platform label Dec 13, 2023
@mengdonglin mengdonglin changed the title [BUG] FW reported error: 6 - Unknown error while processing the request [BUG] FW reported error: 6 - Unknown error while processing the request when disabling a secondary core Dec 17, 2023
@lgirdwood
Copy link
Member

@abonislawski @softwarecki any insights here ? Anything we can check in SW ?

@keqiaozhang
Copy link
Collaborator Author

This issue also can be reproduced on SOF main branch. Inner test ID:36158.
I also did a check on latest mtl-007-drop-stable/6d3061c. The reproduction rate is much lower, but it can still be reproduced with multiple-pipeline related cases.

@RanderWang
Copy link
Collaborator

This is an old bug and can be reproduced two month ago although it is rare. I debug the issue and found that: the secondary core is in idle status with "waiti 0". Primary core calls arch_sched_ipi to wake up it but failed. Idc interrupt is not triggered on secondary core

BTW I did stress test on TGL and didn't found this issue

@abonislawski
Copy link
Member

I was also debugging this one, main problem is very low reproduction rate, in the end primary core will fail on timeout because secondary core stays in idle didnt enter power-down flow.
In my tests disabling memory retention also lowered reproduction rate
(Sorry, I commented but didnt click Comment before holiday break..)

@jsarha
Copy link
Contributor

jsarha commented Jan 18, 2024

I have hit this a few times on (Onecloud) MTL RVP while debugging #8642. E.g. the bug also happens when running multiple-pipeline test.

@kv2019i
Copy link
Collaborator

kv2019i commented Mar 22, 2024

Now fixed with latest Zephyr update #8968

@kv2019i kv2019i closed this as completed Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected I2S Applies to I2S bus for codec connection LNL Applies to Lunar Lake platform MTL Applies to Meteor Lake platform multicore Issues observed when not only core#0 is used. P1 Blocker bugs or important features
Projects
None yet
Development

No branches or pull requests

9 participants