-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fw panic support for IPC4 #4369
Add fw panic support for IPC4 #4369
Conversation
b0cb9c2
to
4864d52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @RanderWang except for a minor nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @RanderWang except for a minor nit
reform the variable definition style in function sof_ipc4_intel_dump_telemetry_state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool @RanderWang ! A few comments inline.
@kv2019i update this PR for exception debugfs node support. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, looks good to go!
@plbossart @bardliao @ujfalusi @ranj063 Could this be expedited? The FW part is now merged and it seems as a side-effect , the coredump info is not duplicated to mtrace anymore. If the review needs more time, let me know and we'll need to look at reverting the FW change until driver is ready. FYI @RanderWang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the directions. I thought there was a plan to use zephyr for coredumps, not it's back with the intel telemetry with weird definitions of target code.
Not sure where this is going @RanderWang @kv2019i
sound/soc/sof/intel/telemetry.h
Outdated
COREDUMP_TGT_X86_64, | ||
COREDUMP_TGT_ARM_CORTEX_M, | ||
COREDUMP_TGT_RISC_V, | ||
COREDUMP_TGT_XTENSA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What on earth is all this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart This is interface of the coredump structure defined in Zephyr and adopted by SOF FW in current mainline.
We can drop this and only expose this via "exception" debugfs, but based on past experience, enabling kernel to parse the essential bits of the coredump directly to kernel dmesg is valuable for debug. Collecting the exception dumps via debugfs nodes is going to be harder to do in practise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this intel specific with non-intel CPUs listed?
Something does not add up.
Either this file should be moved as sound/soc/sof/ipc4-telemetry.h, or it's intel-specific and needs to be simplified.
Pick one :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart Thanks for your suggestion. Now I divide it into two parts : one in sof core layer for common part in zephyr and the other part is for intel specific. Other vendors can reuse the common part.
3e9c3a9
to
759a0ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RanderWang Uh, sorry, one late discovery in the first patch. Can you check? I think we need to fix this before merge. Otherwise re-reviewed the series and looks good.
452c8ca
3dbb0a4
to
452c8ca
Compare
@kv2019i updated to check max AR count. Thanks! |
include/sound/sof/xtensa.h
Outdated
@@ -15,6 +15,9 @@ | |||
* Architecture specific debug | |||
*/ | |||
|
|||
/* AR0 ~ AR15 */ | |||
#define XTENSA_CORE_MAX_AR_REGS_COUNT 16 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, sorry @RanderWang , your original code was correct.
I now realize xoops->plat_hdr is not directly read from the memory window, but rather parsed in kernel. On line 84 of intel/telemetry.c (you add in this patch), you already limit the AR count to 16. So this number can be trusted after all, it cannot ever be larger than 16 no matter what is in the telemetry contents.
So you can revert to the original version after all (which already had the reviews and was ready to go, apologies!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NP , reverted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the last update is longer strictly necessary (my fault, there's a check in telemetry.c), the additional check doesn't hurt either and is valid check to be done in core/xtensa.c. So giving my +1 for this in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the extra check made things better, for me it is quite the opposite.
sound/soc/sof/xtensa/core.c
Outdated
|
||
dev_printk(level, sdev->dev, "AR registers:\n"); | ||
/* the number of ar registers is a multiple of 4 */ | ||
for (i = 0; i < xoops->plat_hdr.numaregs && i < XTENSA_CORE_MAX_AR_REGS_COUNT; i += 4) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, if the xoops->plat_hdr.numaregs
is invalid then likely the whole dump is corrupted..
Printing a known corrupted dump have questionable benefit.
and the line can be shortened if it is really desired to print even if the numregs is out of range:
u32 num_regs;
...
num_regs = min(xoops->plat_hdr.numaregs, XTENSA_CORE_MAX_AR_REGS_COUNT);
for (i = 0; i < num_regs; i += 4) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi thanks, also suggested by Kai. Now revert it
On Xtensa platform ar0 is for caller address and ar1 is for stack address. The ar register dump can be used to rebuild call stack with FW elf file by debug tools. Signed-off-by: Rander Wang <rander.wang@intel.com>
The macro definitions of debug slot can be used by gdb, telemetry and mtrace log, so move these definitions to header.h from mtrace. Then these macro definitions can be shared Signed-off-by: Rander Wang <rander.wang@intel.com>
Currently IPC4 supports GDB slot, telemetry slot and debug slot. This helper function will be used to get the slot offset in debug windows for further processing. Signed-off-by: Rander Wang <rander.wang@intel.com>
Core dump includes hardware platform information, cpu registers and exception call stack. FW saves core dump to telemetry slot in shared memory window for host in the event of FW exception. This patch creates exception node in debugfs for user to dump telemetry data. Signed-off-by: Rander Wang <rander.wang@intel.com>
The exception node is created when FW is ready and clear to zero when FW post boot. Signed-off-by: Rander Wang <rander.wang@intel.com>
Telemetry data is decoded based on intel xtensa design and printed in kernel log by sof debug framework. Signed-off-by: Rander Wang <rander.wang@intel.com>
Dump dsp stack with sof_ipc4_intel_dump_telemetry_state since dsp stack information is included by telemetry data. This also supports lnl since the mtl code is reused. Signed-off-by: Rander Wang <rander.wang@intel.com>
Get the FW panic information from telemetry data in memory window and dump it to kernel log. The old platforms before CAVS 2.5+ don't support it since there is no support in FW for them. Signed-off-by: Rander Wang <rander.wang@intel.com>
Driver will receive exception IPC message and process it by snd_sof_dsp_panic. Signed-off-by: Rander Wang <rander.wang@intel.com>
452c8ca
to
f804a68
Compare
@plbossart @ujfalusi @kv2019i @lyakh did null change after getting your approve. |
It gets fw panic information from memory window and reuse the debug framework to dump information to kernel log.
The FW support is: zephyrproject-rtos/zephyr#57329
[43991.605559] sof-audio-pci-intel-tgl 0000:00:1f.3: Create widget copier.host.1.1 instance 0 - pipe 1 - core 0
[43991.605563] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx : 0x40000004|0x15: MOD_INIT_INSTANCE [data size: 84]
[43991.607399] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc rx : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[43991.607413] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[43991.607437] sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x80000000 intlctl 0x40000000 rirb 00
[43991.607444] sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x80000000 adspis 0x00000001
[43991.607472] sof-audio-pci-intel-tgl 0000:00:1f.3: Host IPC initiator: 0xc0000004|0x15|0x0, target: 0x9b0a0000|0x0|0x0, ctl: 0x3
[43991.607487] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[43991.607492] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[43991.607496] sof-audio-pci-intel-tgl 0000:00:1f.3: FW Panic
[43991.607501] sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[43991.607512] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[43991.607610] sof-audio-pci-intel-tgl 0000:00:1f.3: Core exception record version 0x2, soc: 3
[43991.607670] sof-audio-pci-intel-tgl 0000:00:1f.3: FW is built with XCC toolchain
[43991.607687] sof-audio-pci-intel-tgl 0000:00:1f.3: error: DSP Firmware Oops
[43991.607698] sof-audio-pci-intel-tgl 0000:00:1f.3: error: Exception Cause: IntegerDivideByZeroCause, QUOS, QUOU, REMS, or REMU divisor operand is zero
[43991.607708] sof-audio-pci-intel-tgl 0000:00:1f.3: EXCCAUSE 0x00000006 EXCVADDR 0x00000000 PS 0x00060220 SAR 0x0000000f
[43991.607717] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC1 0xbe0391da EPC2 0x00000000 EPC3 0x00000000 EPC4 0x00000000
[43991.607724] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC5 0x00000000 EPC6 0x00000000 EPC7 0x00000000 DEPC 0x00000000
[43991.607730] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS2 0x00000000 EPS3 0x00000000 EPS4 0x00000000 EPS5 0x00000000
[43991.607735] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS6 0x00000000 EPS7 0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
[43991.607741] sof-audio-pci-intel-tgl 0000:00:1f.3: stack dump from 0x00000000
[43991.607746] sof-audio-pci-intel-tgl 0000:00:1f.3: dump ar register number 16
[43991.607750] sof-audio-pci-intel-tgl 0000:00:1f.3: [00]: 0xbe026aa6 0xbe0945c0 0x9e0a0b40 0xbe094610
[43991.607757] sof-audio-pci-intel-tgl 0000:00:1f.3: [04]: 0xbe006000 0xbe0a0bc0 0x9e0a0b00 0x9e084ec8
[43991.607762] sof-audio-pci-intel-tgl 0000:00:1f.3: [08]: 0xbe0391c3 0x000001 0x040000 0x000000
[43991.607769] sof-audio-pci-intel-tgl 0000:00:1f.3: [12]: 0xbe0391c3 0x000001 0x040000 0x000000
[43991.607801] sof-audio-pci-intel-tgl 0000:00:1f.3: extended rom status: 0x5 0x0 0x0 0x0 0x0 0x0 0x0 0x1
[43991.607806] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------