Change the default number of Kernel dumps to 3 #20647

Open: wants to merge 1 commit into master

@bmridul commented Oct 30, 2024

Why I did it

Currently, there is no limit on the number of kernel dumps captured on the system. This leads to excessive disk usage when the system encounters many kernel crashes (e.g., during sonic-mgmt test suite runs).

According to the HLD, the default number of kdumps should be 3. However, this default is not implemented in the code.

https://github.com/sonic-net/SONiC/blob/master/doc/kdump/SONiC-kdump.md#config-kdump-num_dumps-number

This PR provides the fix.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Set the default number of kernel dumps (KDUMP_NUM_DUMPS) to 3 in /etc/default/kdump-tools.
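The change amounts to one default value in a shell-style configuration file. As a hedged sketch (not the PR's actual diff), the Python below shows how a tool could ensure exactly one active KDUMP_NUM_DUMPS assignment, assuming the file uses KEY=VALUE lines and that the setting may be absent or commented out. The function name set_num_dumps and the sample file content are hypothetical.

```python
import re

# Matches an active or commented-out assignment, e.g. "KDUMP_NUM_DUMPS=0"
# or "#KDUMP_NUM_DUMPS=0". Descriptive comments such as
# "# KDUMP_NUM_DUMPS - ..." are not matched because "=" must follow the name.
_NUM_DUMPS_RE = re.compile(r"^\s*#?\s*KDUMP_NUM_DUMPS=")


def set_num_dumps(text: str, num_dumps: int = 3) -> str:
    """Return text with exactly one active KDUMP_NUM_DUMPS assignment."""
    new_line = f"KDUMP_NUM_DUMPS={num_dumps}"
    out = []
    replaced = False
    for line in text.splitlines():
        if _NUM_DUMPS_RE.match(line):
            if not replaced:
                # Replace the first existing (possibly commented) assignment in place.
                out.append(new_line)
                replaced = True
            # Any later duplicate assignments are dropped.
            continue
        out.append(line)
    if not replaced:
        # No assignment found at all: append one at the end.
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative file content only; not the real /etc/default/kdump-tools.
    sample = 'USE_KDUMP=1\n#KDUMP_NUM_DUMPS=0\nKDUMP_COREDIR="/var/crash"\n'
    print(set_num_dumps(sample), end="")
```

Replacing the first match in place, rather than always appending, keeps the setting next to any surrounding comments that document it.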

How to verify it

UT Log:
root@sonic:/home/cisco# show reboot h
Name Cause Time User Comment
2024_08_26_19_35_54 Kernel Panic Mon Aug 26 07:32:18 PM UTC 2024 N/A N/A
2024_08_26_19_29_47 Kernel Panic Mon Aug 26 07:26:16 PM UTC 2024 N/A N/A
2024_08_26_19_04_03 Kernel Panic Mon Aug 26 07:00:36 PM UTC 2024 N/A N/A
2024_08_26_18_54_39 Kernel Panic Mon Aug 26 06:51:35 PM UTC 2024 N/A N/A
2024_08_26_18_43_13 reboot Mon Aug 26 06:36:53 PM UTC 2024 cisco N/A
...
root@sonic:/home/cisco# show kdump files
Kernel core dump files Kernel dmesg files
/var/crash/202408261932/kdump.202408261932 /var/crash/202408261932/dmesg.202408261932
/var/crash/202408261926/kdump.202408261926 /var/crash/202408261926/dmesg.202408261926
/var/crash/202408261900/kdump.202408261900 /var/crash/202408261900/dmesg.202408261900
root@sonic:/home/cisco# ls /var/crash/
202408261900 202408261926 202408261932 kdump_lock kexec_cmd
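The log above is consistent with a keep-newest-N retention policy: the reboot history lists four kernel panics, yet only three timestamped dump directories remain under /var/crash. The Python below is a minimal sketch of that policy, not kdump-tools' own implementation; split_for_retention is a hypothetical name, and treating num_dumps <= 0 as "unlimited" is an assumption.

```python
import re
from typing import List, Tuple

# Dump directories in the log are named with a YYYYMMDDHHMM timestamp
# (e.g. /var/crash/202408261932); other entries are left alone.
_STAMP_RE = re.compile(r"^\d{12}$")


def split_for_retention(entries: List[str],
                        num_dumps: int = 3) -> Tuple[List[str], List[str]]:
    """Split /var/crash entries into (kept, pruned) dump directories.

    Non-timestamp entries such as kdump_lock and kexec_cmd are ignored.
    Fixed-width timestamps sort chronologically when sorted as strings.
    """
    dumps = sorted(e for e in entries if _STAMP_RE.match(e))
    if num_dumps <= 0:
        # Assumption: a non-positive limit means "keep everything".
        return dumps, []
    cut = max(len(dumps) - num_dumps, 0)
    return dumps[cut:], dumps[:cut]


if __name__ == "__main__":
    listing = [
        "202408261851",  # hypothetical: dump from the earlier 06:51 PM panic
        "202408261900", "202408261926", "202408261932",
        "kdump_lock", "kexec_cmd",
    ]
    kept, pruned = split_for_retention(listing, 3)
    print("kept:", kept)
    print("pruned:", pruned)
```

Because the directory names are fixed-width timestamps, a plain string sort orders them by time, so no date parsing is needed.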

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

  • 202405

Description for the changelog

Set the default number of kernel dumps to 3 in /etc/default/kdump-tools.

Link to config_db schema for YANG module changes

N/A

A picture of a cute animal (not mandatory but encouraged)

@bmridul marked this pull request as ready for review October 30, 2024 07:31
@bmridul requested a review from lguohan as a code owner October 30, 2024 07:31

bmridul commented Oct 30, 2024

@prgeor, please review.


bmridul commented Oct 30, 2024

@abdosi, please check.

@abdosi self-requested a review November 1, 2024 00:18

abdosi commented Nov 1, 2024

@saiarcot895: can you please review this?

@saiarcot895 left a comment

This doesn't appear to be needed. On a device where kdump is disabled, running sudo config kdump enable modifies /etc/default/kdump-tools and sets KDUMP_NUM_DUMPS to 3.

bmridul commented Nov 4, 2024

The above assumes that kdump is explicitly enabled through the CLI. In this PR, we enabled kdump by default for Cisco platforms:
#16224
With kdump enabled by default this way, KDUMP_NUM_DUMPS is not set in /etc/default/kdump-tools.

bmridul commented Nov 13, 2024

@saiarcot895, please check the response above.

@saiarcot895 commented

@bmridul That's because hostcfgd is not applying the changes from the default state to the runtime configuration. Please modify hostcfgd to add proper support for this.
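A minimal sketch of the reconciliation the reviewer describes: compare the configured value with what the runtime file holds and rewrite it when they differ, so the default reaches the runtime configuration even when kdump was never enabled through the CLI. It assumes the KDUMP config entry is a dict of strings with fields named enabled and num_dumps, and that the runtime KDUMP_NUM_DUMPS value has already been parsed (None when absent). DEFAULT_NUM_DUMPS, effective_num_dumps, and runtime_update_needed are hypothetical names, not hostcfgd's actual code.

```python
from typing import Dict, Optional

# Hypothetical constant; 3 is the default stated in the HLD and in this PR.
DEFAULT_NUM_DUMPS = 3


def effective_num_dumps(kdump_config: Dict[str, str]) -> int:
    """Configured num_dumps, or the default when missing or not an integer."""
    raw = kdump_config.get("num_dumps")
    if raw is None:
        return DEFAULT_NUM_DUMPS
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_NUM_DUMPS


def runtime_update_needed(kdump_config: Dict[str, str],
                          runtime_num_dumps: Optional[int]) -> bool:
    """Whether the runtime KDUMP_NUM_DUMPS value must be (re)written.

    runtime_num_dumps is None when the setting is absent from the runtime
    file, which is the state reported for platforms that enable kdump by
    default without going through the CLI.
    """
    if kdump_config.get("enabled", "false") != "true":
        return False  # Nothing to reconcile while kdump is disabled.
    return runtime_num_dumps != effective_num_dumps(kdump_config)


if __name__ == "__main__":
    # kdump enabled by default, num_dumps never written to the runtime file.
    print(runtime_update_needed({"enabled": "true"}, None))  # True
```

Resolving the effective value in one place means the CLI path and the enabled-by-default path would both fall back to the same default.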

bmridul commented Nov 14, 2024

Ack. I will check.
