[armhf][Nokia-7215] Enable Watchdog service #16612
platform/marvell-armhf/sonic-platform-nokia/7215/sonic_platform/watchdog.py
6f98e0f to 388853c
388853c to 0b7e707
@prgeor @yxieca the PR LGTM. Can we merge it? @Pavan-Nokia, can you please add ADO number "24981141" to your PR description? Thanks.
@Pavan-Nokia can we split this into two PRs?
KEEPALIVE=55
sonic_logger = logger.Logger('Watchdog')
sonic_logger.set_min_log_priority_info()
time.sleep(60)
@Pavan-Nokia why is this sleep needed here?
SONiC has "watchdog-control.service", which is designed to disable the watchdog on every boot. This sleep ensures that the watchdog is enabled only after that service has completed.
Using the "After" keyword in the service file does not help either, because "After" only ensures that our service is started after watchdog-control.service is started; it does not ensure that watchdog-control.service has completed first.
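A fixed delay works but depends on timing. One alternative, offered only as an assumption and not what this PR does, is to keep "After" ordering and then poll until watchdog-control.service leaves its transitional states, bounded by the same 60-second budget. The helper names (`unit_state`, `wait_until`, `wait_for_unit_settled`) are hypothetical; the check relies on `systemctl is-active` printing the unit state (for example "activating" while a oneshot unit is still running).

```python
import subprocess
import time

# Hypothetical alternative to the PR's fixed time.sleep(60).
# States in which a unit is still starting, stopping, or reloading.
SETTLING_STATES = {"activating", "deactivating", "reloading"}


def unit_state(unit):
    """Return the state printed by `systemctl is-active <unit>`."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for inactive units
    )
    return result.stdout.strip()


def wait_until(predicate, timeout, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def wait_for_unit_settled(unit, timeout=60, get_state=unit_state, **kwargs):
    """Wait until `unit` is no longer in a transitional state."""
    return wait_until(lambda: get_state(unit) not in SETTLING_STATES, timeout, **kwargs)
```

In the service, `wait_for_unit_settled("watchdog-control.service")` would take the place of `time.sleep(60)`; a `False` return means the budget expired, and the service could log a warning before arming anyway. An "inactive" state also counts as settled, so this only works once the unit has already been started, which is what "After" ordering guarantees.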
signal.signal(signal.SIGHUP, signal.SIG_IGN)
signal.signal(signal.SIGINT, stopWdtService)
signal.signal(signal.SIGTERM, stopWdtService)
@Pavan-Nokia can we keep the watchdog running during reboot so that we never end up in a hung state if the kernel hangs during reboot? See https://github.com/sonic-net/sonic-buildimage/blob/master/platform/broadcom/sonic-platform-modules-cel/haliburton/script/cpu_wdt#L50
We cannot keep the watchdog active during reboot on the 7215-IXS-T1: the watchdog circuit is reset when the system reboots, so we have to arm it again once the system comes back up.
@Pavan-Nokia then what is the point of enabling the watchdog? I understand that if the system hangs AFTER the watchdog is enabled, it works as expected. But consider the case where the system is booting up after a reboot and hangs before the watchdog is enabled: the system is then hung and the watchdog cannot bail it out.
@prgeor You are right, we cannot bail out if the system hangs before the watchdog is enabled.
U-Boot does not currently support enabling the watchdog, and implementing that on the existing platform is complicated.
The only way we can enable the watchdog here is through a service in SONiC. This service is stopped at some point while the switch is going down and is re-enabled on boot-up.
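Because the service is stopped while the switch goes down, its SIGINT/SIGTERM handler (`stopWdtService` in the snippet above) is where the watchdog gets disarmed. The handler body is not shown on this page, so this is a minimal sketch under assumptions: `wdt` is any object with a `disarm()` method returning a bool, as in the SONiC `WatchdogBase` API, and `make_stop_handler` and `install_handlers` are hypothetical names.

```python
import signal
import sys


def make_stop_handler(wdt, log=print):
    """Build a SIGINT/SIGTERM handler that disarms `wdt` before exiting.

    `wdt` is assumed to expose disarm() -> bool; `log` is a hypothetical
    logging callable (the PR uses a sonic_py_common Logger instead).
    """
    def stop_wdt_service(signum, frame):
        if wdt.disarm():
            log("Watchdog disarmed on signal %d" % signum)
        else:
            log("Failed to disarm watchdog on signal %d" % signum)
        sys.exit(0)  # let systemd see a clean stop
    return stop_wdt_service


def install_handlers(wdt, log=print):
    """Register handlers the same way as the PR snippet does."""
    handler = make_stop_handler(wdt, log)
    signal.signal(signal.SIGHUP, signal.SIG_IGN)  # keep running on hangup
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
    return handler
```

Exiting from the handler right after `disarm()` matches the verification steps in the description: once `systemctl stop` delivers SIGTERM, `watchdogutil status` should report the watchdog as unarmed.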
Enable CPUWDT service to enable watchdog
0b7e707 to bcde0b9
@prgeor do you have any more comments on this service?
Cherry-pick PR to 202305: #17522
Hi @yxieca, can you please help backport this PR to 202205? I've verified that the backport will not introduce any regression. Thanks.
Cherry-pick PR to 202205: #17704
Why I did it
Enable watchdog for Nokia 7215 platform
Work item tracking
How I did it
Implemented a service that arms the watchdog and sends a keepalive roughly every minute (every 55 seconds, KEEPALIVE=55).
The service is started by default on boot-up.
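The arm-then-keepalive loop described above can be sketched as follows. This illustrates the idea under assumptions and is not the PR's code: it relies on the SONiC `WatchdogBase` convention that `arm(seconds)` returns the actual timeout (or -1 on failure) and that calling `arm()` on an already-armed watchdog resets its timer. `KEEPALIVE = 55` comes from the PR; `ARM_TIMEOUT = 180` and `run_keepalive` are hypothetical.

```python
import time

KEEPALIVE = 55      # seconds between keepalives (value from the PR snippet)
ARM_TIMEOUT = 180   # hypothetical arm timeout in seconds; not stated in the PR


def run_keepalive(wdt, log=print, timeout=ARM_TIMEOUT, interval=KEEPALIVE,
                  sleep=time.sleep, iterations=None):
    """Arm `wdt`, then re-arm it every `interval` seconds as a keepalive.

    `iterations=None` loops forever (service mode); a number bounds the
    loop. Returns False if arming or a keepalive fails.
    """
    if interval >= timeout:
        raise ValueError("keepalive interval must be shorter than the arm timeout")
    actual = wdt.arm(timeout)
    if actual < 0:
        log("Failed to arm watchdog")
        return False
    log("Watchdog armed with {}s timeout".format(actual))
    sent = 0
    while iterations is None or sent < iterations:
        sleep(interval)
        if wdt.arm(timeout) < 0:  # keepalive: re-arming resets the countdown
            log("Watchdog keepalive failed")
            return False
        log("Watchdog keepalive sent")
        sent += 1
    return True
```

The keepalive interval has to stay below the arm timeout, otherwise the watchdog could expire between keepalives; that is why the sketch rejects `interval >= timeout` up front.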
How to verify it
Verify that the watchdog is enabled on boot-up:
"sudo systemctl status cpu_wdt.service" should show the service as active.
admin@sonic:~$ sudo watchdogutil status
Status: Armed
Time remaining: 138 seconds
Stop the service and verify that the watchdog is disarmed:
admin@sonic:~$ sudo systemctl stop cpu_wdt.service
admin@sonic:~$ sudo watchdogutil status
Status: Unarmed
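These checks can be scripted by parsing the `watchdogutil status` text. A minimal sketch, assuming the output format matches the samples above exactly; `parse_watchdog_status` is a hypothetical helper.

```python
import re


def parse_watchdog_status(text):
    """Parse `watchdogutil status` output such as the samples above.

    Returns (armed, remaining_seconds); remaining_seconds is None when
    no "Time remaining" line is present.
    """
    status = re.search(r"^Status:\s*(\w+)", text, re.MULTILINE)
    if status is None:
        raise ValueError("no 'Status:' line found")
    armed = status.group(1) == "Armed"  # "Unarmed" does not match
    remaining = re.search(r"^Time remaining:\s*(\d+)\s*seconds", text, re.MULTILINE)
    return armed, (int(remaining.group(1)) if remaining else None)
```

For example, passing the `stdout` of `subprocess.run(["sudo", "watchdogutil", "status"], capture_output=True, text=True)` should yield `(True, <seconds>)` while cpu_wdt.service is running and `(False, None)` after it is stopped.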
Added logs to report when the watchdog is armed/disarmed and for every keepalive (every 55 seconds).
Also verified the OC test suites and created a separate PR to update sonic-mgmt:
sonic-net/sonic-mgmt#10082
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/platform_tests/api/test_watchdog.py
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/common/helpers/platform_api/watchdog.py
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Enable watchdog service for Nokia-7215 platform
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)