SOS CPU stall #8744

Open
yichongt opened this issue Oct 14, 2024 · 0 comments
Labels
status: new (The issue status: new for creation)


Describe the bug
When SOS shares CPUs with Waag, the SOS kernel may panic: one of its CPUs gets stuck and stalls for a long time, and the system then reboots.

Platform
RPL-S 13700E

Codebase
Both the 3.2 release and the 3.3 release

Scenario
SOS shares CPUs with all Waag vCPUs, without own_pcpu checked.

To Reproduce

  1. Boot Waag
  2. Run the Passmark benchmark CPU test for several iterations

Expected behavior
Waag should not get stuck during benchmarking.

Additional context
SOS kernel dmesg output in the ACRN console:
[14078.492047] rcu: INFO: rcu_preempt self-detected stall on CPU
[14078.492220] rcu: 5-....: (12571 ticks this GP) idle=7b94/1/0x4000000000000000 softirq=678356/678356 fqs=4205
[14078.492452] (t=21000 jiffies g=1086585 q=128 ncpus=6)
[14078.492455] CPU: PID: 315330 Comm: snap-confine Tainted: G U 6.1.80-acrn-service-vm-375513-g7159ad071be8 #1
[14078.492459] Hardware name: Default string Default string/Default string, BIOS 5.27 06/14/2023
[14078.492460] RIP: 0010:smp_call_function_many_cond+0xfd/0x2e0
[14078.492466] Code: d0 48 89 df e8 b4 d3 5a 00 39 05 4e de fc 01 76 b0 48 63 d0 49 8b 0c 24 48 03 0c d5 c0 e8 c6 94 8b 51 08 83 e2 01 74 0a f3 90 <8b> 51 08 83 e2 01 75 f6 83 c0 01 eb c1 9c 58 fa f6 c4 02 0f 85 8f
[14078.492468] RSP: 0018:ffffb2c680f43ba0 EFLAGS: 00000202
[14078.492471] RAX: 0000000000000002 RBX
[14106.017626] watchdog: BUG: soft lockup - CPU#5 stuck for 49s! [snap-confine:315330]
[14106.017960] Kernel panic - not syncing: softlockup: hung tasks
[14106.018095] CPU: 5 PID: 315330 Comm: snap-confine Tainted: G U L 6.1.80-acrn-service-vm-375513-g7159ad071be8 #1
[14106.018344] Hardware name: Default string Default string/Default string, BIOS 5.27 06/14/2023
[14106.018533] Call Trace:
[14106.018594] <IRQ>
[14106.018646] dump_stack_lvl+0x49/0x62
[14106.018734] dump_stack+0x10/0x16
[14106.018815] panic+0x114/0x29a
[14106.018891] watchdog_timer_fn.cold.14+0xc/0x16
[14106.019000] ? softlockup_fn+0x30/0x30
[14106.019089] __hrtimer_run_queues+0xa5/0x2c0
[14106.019191] hrtimer_interrupt+0xf6/0x220
[14106.019286] __sysvec_apic_timer_interrupt+0x5f/0x110
[14106.019404] sysvec_apic_timer_interrupt+0x6f/0xa0
[14106.019517] </IRQ>
[14106.019570] <TASK>
[14106.019624] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[14106.019744] RIP: 0010:smp_call_function_many_cond+0xfd/0x2e0
[14106.019875] Code: d0 48 89 df e8 b4 d3 5a 00 39 05 4e de fc 01 76 b0 48 63 d0 49 8b 0c 24 48 03 0c d5 c0 e8 c6 94 8b 51 08 83 e2 01 74 0a f3 90 <8b> 51 08 83 e2 01 75 f6 83 c0 01 eb c1 9c 58 fa f6 c4 02 0f 85 8f
[14106.020280] RSP: 0018:ffffb2c680f43ba0 EFLAGS: 00000202
[14106.020400] RAX: 0000000000000002 RBX: ffffa2ec4856b488 RCX: ffffa2ec484adb40
[14106.020561] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffa2ec4856b488
[14106.020722] RBP: ffffb2c680f43c10 R08: 0000000000000002 R09: ffffa2ec4856b490
[14106.020882] R10: ffffb2c680f43dc0 R11: 0000000000000000 R12: ffffa2ec4856b480
[14106.021042] R13: 0000000000000001 R14: 000000000002db40 R15: 0000000000000008
[14106.021191] ? __flush_tlb_all+0x30/0x30
[14106.021270] on_each_cpu_cond_mask+0x29/0x50
[14106.021354] flush_tlb_kernel_range+0x41/0xc0
[14106.021441] __purge_vmap_area_lazy+0xba/0x6e0
[14106.021529] ? purge_fragmented_blocks_allcpus+0x40/0x220
[14106.021632] _vm_unmap_aliases+0x116/0x150
[14106.021713] vm_unmap_aliases+0x19/0x20
[14106.021788] change_page_attr_set_clr+0xa0/0x290
[14106.021880] set_memory_ro+0x29/0x30
[14106.021953] bpf_prog_select_runtime+0x11e/0x130
[14106.022044] bpf_prepare_filter+0x541/0x5c0
[14106.022127] bpf_prog_create_from_user+0xc5/0x110
[14106.022220] ? hardlockup_detector_perf_cleanup+0xa0/0xa0
[14106.022324] do_seccomp+0x2c8/0xad0
[14106.022394] __x64_sys_seccomp+0x1a/0x20
[14106.022472] do_syscall_64+0x37/0x90
[14106.022544] entry_SYSCALL_64_after_hwframe+0x64/0xce
[14106.022642] RIP: 0033:0x7ffb6c51e88d
[14106.022713] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[14106.023050] RSP: 002b:00007fff4ec7cb68 EFLAGS: 00000246 ORIG_RAX: 000000000000013d
[14106.023192] RAX: ffffffffffffffda RBX: 00007ffb6c7d36b0 RCX: 00007ffb6c51e88d
[14106.023325] RDX: 00007fff4ec7cba0 RSI: 0000000000000002 RDI: 0000000000000001
[14106.023459] RBP: 0000558781704430 R08: 00005587817121b0 R09: 00007fff4ec7cba0
[14106.023592] R10: 0000000000000c00 R11: 0000000000000246 R12: 00007fff4ec7cba0
[14106.023726] R13: 00007fff4ec7cba0 R14: 00005587817032a0 R15: 00007fff4ec7e958
[14106.023860] </TASK>
[14108.160285] Shutting down cpus with NMI
[14108.189043] Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[14127.067252] Rebooting in 10 seconds..

The call trace and the stuck CPU may differ between runs, but the kernel panic type is the same every time.
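
Since the trace varies between runs but the panic type does not, a small script can help compare captures. The following is only a sketch, assuming the console output has been saved to a text file; the function name `summarize` and the regular expressions are illustrative and match just the message formats visible in the log above (RCU self-detected stall, soft lockup, kernel panic).

```python
import re

# Patterns for the three signatures that appear in the log above.
# They are written against that output only and are an assumption
# about what other captures of this issue will contain.
RCU_STALL = re.compile(r"rcu: INFO: (\w+) self-detected stall on CPU")
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! "
    r"\[([^:\]]+):(\d+)\]"
)
PANIC = re.compile(r"Kernel panic - not syncing: (.+)")


def summarize(log_text):
    """Collect stall, soft-lockup and panic signatures from a dmesg capture."""
    summary = {"rcu_stalls": 0, "soft_lockups": [], "panics": []}
    for line in log_text.splitlines():
        # search() rather than match(): console lines can be interleaved,
        # so a message may start in the middle of a line.
        if RCU_STALL.search(line):
            summary["rcu_stalls"] += 1
        m = SOFT_LOCKUP.search(line)
        if m:
            summary["soft_lockups"].append({
                "cpu": int(m.group(1)),      # CPU reported as stuck
                "seconds": int(m.group(2)),  # how long the watchdog saw it stuck
                "comm": m.group(3),          # task name
                "pid": int(m.group(4)),
            })
        m = PANIC.search(line)
        if m:
            summary["panics"].append(m.group(1).strip())
    return summary


if __name__ == "__main__":
    import sys
    print(summarize(sys.stdin.read()))
```

For example, saved under a hypothetical name such as `triage.py`, it could be run as `python3 triage.py < sos_dmesg.txt` on each capture to check whether the stuck CPU and task change while the panic reason stays the same.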

yichongt added the status: new label Oct 14, 2024