[armhf][Nokia-7215] Enable Watchdog service #16612

Merged 1 commit on Dec 1, 2023

Pavan-Nokia commented Sep 21, 2023

Why I did it

Enable watchdog for Nokia 7215 platform

Work item tracking
  • Microsoft ADO (number only): 24981141

How I did it

Implemented a service that arms the watchdog and sends a keepalive about every minute.
The service starts by default on boot-up.
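
A minimal sketch of such a service loop, for illustration only and not the code merged in this PR. It assumes a watchdog object exposing `arm(seconds)` and `disarm()`. `FakeWatchdog`, `run_service`, and the 180-second `TIMEOUT` are hypothetical names and values, and re-arming is used here as the keepalive.

```python
import threading

KEEPALIVE = 55   # seconds between keepalives (value quoted in this PR)
TIMEOUT = 180    # hypothetical watchdog timeout in seconds


class FakeWatchdog:
    """Stand-in for a platform watchdog; records calls instead of touching hardware."""

    def __init__(self):
        self.armed = False
        self.arm_calls = 0

    def arm(self, seconds):
        self.armed = True
        self.arm_calls += 1
        return seconds

    def disarm(self):
        self.armed = False
        return True


def run_service(watchdog, stop_event, keepalive=KEEPALIVE, timeout=TIMEOUT):
    """Arm the watchdog, refresh it until stop_event is set, then disarm."""
    watchdog.arm(timeout)
    # Event.wait() returns False on timeout and True once the event is set.
    while not stop_event.wait(keepalive):
        watchdog.arm(timeout)  # re-arming acts as the keepalive in this sketch
    watchdog.disarm()
```

In a real service, a SIGTERM or SIGINT handler could set `stop_event` so that stopping the unit disarms the watchdog, which matches the "Unarmed" status shown in the verification steps.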

How to verify it

Verify Watchdog is enabled on boot up:
"sudo systemctl status cpu_wdt.service" should show that service as active
admin@sonic:~$ sudo watchdogutil status
Status: Armed
Time remaining: 138 seconds

Stop Service and verify watchdog is disarmed:
admin@sonic:$ sudo systemctl stop cpu_wdt.service
admin@sonic:$ sudo watchdogutil status
Status: Unarmed
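
The manual check above could be scripted. The sketch below only parses text in the format shown in this description. `parse_watchdog_status` is a hypothetical helper, and the format is assumed to be exactly the `Status:` and `Time remaining:` lines quoted above.

```python
def parse_watchdog_status(text):
    """Parse `watchdogutil status` output into (armed, remaining_seconds).

    Fields that do not appear in the text are returned as None.
    """
    armed = None
    remaining = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Status":
            armed = (value == "Armed")
        elif key == "Time remaining":
            # Value looks like "138 seconds"; keep the integer part.
            remaining = int(value.split()[0])
    return armed, remaining


armed_sample = "Status: Armed\nTime remaining: 138 seconds\n"
unarmed_sample = "Status: Unarmed\n"
```

With the two samples from this description, the helper returns `(True, 138)` for the armed case and `(False, None)` after the service is stopped.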

Logs were added to report when the watchdog is armed or disarmed, and for every keepalive (every 55 seconds):

2430:2023-10-13T01:42:11.077639+00:00 sonic Watchdog: CPUWDT Enabled: watchdog armed=True
2988:Oct 13 01:43:06.132343 sonic INFO Watchdog: CPUWDT keepalive
11925:Oct 13 01:44:01.186558 sonic INFO Watchdog: CPUWDT keepalive
12418:Oct 13 01:44:56.240061 sonic INFO Watchdog: CPUWDT keepalive

47159:Oct 13 02:27:15.490356 sonic NOTICE Watchdog: CPUWDT Disabled: watchdog armed=False
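
As a quick cross-check of the 55-second interval, the gaps between the keepalive lines above can be computed from their timestamps. This is a small sketch: `keepalive_intervals` is a hypothetical helper, it assumes the syslog line format shown above, and it does not handle logs that cross midnight.

```python
import re
from datetime import datetime

LOG = """\
2988:Oct 13 01:43:06.132343 sonic INFO Watchdog: CPUWDT keepalive
11925:Oct 13 01:44:01.186558 sonic INFO Watchdog: CPUWDT keepalive
12418:Oct 13 01:44:56.240061 sonic INFO Watchdog: CPUWDT keepalive
"""

# Capture the HH:MM:SS.ffffff timestamp of each keepalive line.
PATTERN = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{6}) sonic INFO Watchdog: CPUWDT keepalive"
)


def keepalive_intervals(log_text):
    """Return the gaps, in seconds, between consecutive keepalive log lines."""
    times = [
        datetime.strptime(m.group(1), "%H:%M:%S.%f")
        for m in PATTERN.finditer(log_text)
    ]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


intervals = keepalive_intervals(LOG)
```

For the three lines quoted here, both gaps come out at about 55.05 seconds.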

Also verified the OC test suites and created a separate PR to update sonic-mgmt:
sonic-net/sonic-mgmt#10082

https://github.com/sonic-net/sonic-mgmt/blob/master/tests/platform_tests/api/test_watchdog.py
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/common/helpers/platform_api/watchdog.py

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Enable watchdog service for Nokia-7215 platform

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@Pavan-Nokia

@mlok-nokia @jon-nokia

@lguohan lguohan added the device label Sep 23, 2023
@Blueve Blueve requested a review from prgeor October 8, 2023 05:35

prgeor commented Oct 10, 2023

Pavan-Nokia changed the title from "[armhf][Nokia-7215] Enable Watchdog service" to "[armhf][Nokia-7215] Enable Watchdog service and remove platform_reboot" on Oct 13, 2023

Blueve commented Oct 19, 2023

@prgeor @yxieca the PR LGTM. Can we merge it?

@Pavan-Nokia can you help update ADO number "24981141" to your PR description? Thanks


prgeor commented Oct 20, 2023

@Pavan-Nokia can we split this into two PRs?

  1. Enable watchdog
  2. Remove platform reboot

KEEPALIVE=55
sonic_logger = logger.Logger('Watchdog')
sonic_logger.set_min_log_priority_info()
time.sleep(60)
Contributor

@Pavan-Nokia why is this sleep needed here?

Contributor Author

SONiC has a "watchdog-control.service" that is designed to disable the watchdog on every boot. This sleep delays enabling the watchdog until after that service has completed.

Using the "After" keyword in the service file does not help either, as "After" only ensures that our service is started after watchdog-control.service is started; it does not ensure that watchdog-control.service has completed before ours runs.
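
For illustration only, and not what this PR does: a fixed `time.sleep(60)` could in principle be replaced by a bounded poll, provided there is some way to tell that the other service has finished. In the sketch below, `wait_until` and the `is_done` callable are hypothetical, and how `is_done` would detect completion of watchdog-control.service is left open.

```python
import time


def wait_until(is_done, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With `timeout=60`, this waits at most as long as the fixed sleep and returns early when the condition is met sooner. The caller still has to decide what to do on a timeout.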


signal.signal(signal.SIGHUP, signal.SIG_IGN)
signal.signal(signal.SIGINT, stopWdtService)
signal.signal(signal.SIGTERM, stopWdtService)
Contributor

@Pavan-Nokia can we keep the watchdog running during reboot so that we never end up hung if the kernel hangs during reboot? See https://github.com/sonic-net/sonic-buildimage/blob/master/platform/broadcom/sonic-platform-modules-cel/haliburton/script/cpu_wdt#L50

Contributor Author

We cannot keep the watchdog active during reboot on the 7215-IXS-T1. The watchdog circuit is reset when the system is rebooted, so we have to arm it again when we come back up after reboot.

Contributor

@Pavan-Nokia then what is the point of enabling the watchdog? I understand that if the system hangs AFTER the watchdog is enabled, it works as expected. But consider a case where the system is booting up after a reboot and hangs before the watchdog is enabled: the system stays hung and the watchdog cannot bail it out.

Contributor Author

@prgeor You are right, we cannot bail out if there is a hang before the watchdog is enabled.
U-Boot does not currently support enabling the watchdog, and it is complicated to implement this on the existing platform.
The only way we can enable the watchdog here is by using a service in SONiC. This service will be stopped at some point when the switch is going down and re-enabled on boot-up.

Enable CPUWDT service to enable watchdog
Pavan-Nokia changed the title from "[armhf][Nokia-7215] Enable Watchdog service and remove platform_reboot" to "[armhf][Nokia-7215] Enable Watchdog service" on Oct 23, 2023

Blueve commented Nov 23, 2023

@prgeor do you have any more comments on this service?

@yxieca yxieca merged commit 36c1b8a into sonic-net:master Dec 1, 2023
9 checks passed
@Pavan-Nokia Pavan-Nokia deleted the dev_watchdog_1 branch December 1, 2023 13:37
yxieca pushed a commit that referenced this pull request Dec 4, 2023
Enable CPUWDT service to enable watchdog
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Dec 15, 2023
@mssonicbld

Cherry-pick PR to 202305: #17522

@lizhijianrd

Hi @yxieca, can you please help to backport this PR to 202205? I've verified the backport will not introduce any regression. Thanks.

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Jan 8, 2024
@mssonicbld

Cherry-pick PR to 202205: #17704
