[armhf][Nokia-7215] Enable Watchdog service #16612

Merged
merged 1 commit into from
Dec 1, 2023
@@ -0,0 +1,46 @@
#!/usr/bin/python

from sonic_platform.chassis import Chassis
from sonic_py_common import logger
import time
import os
import signal
import sys


TIMEOUT=170
KEEPALIVE=55
sonic_logger = logger.Logger('Watchdog')
sonic_logger.set_min_log_priority_info()
time.sleep(60)
Contributor

@Pavan-Nokia Why is this sleep needed here?

Contributor Author

SONiC has a "watchdog-control.service", which is designed to disable the watchdog on every boot. This sleep delays enabling the watchdog until after that service has completed.

Using the "After" keyword in the service file does not help either: "After" only ensures that our service is started after watchdog-control.service has been started. It does not ensure that watchdog-control.service has completed before ours runs.
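
As an illustration of the trade-off discussed in this thread, a fixed sleep could in principle be replaced by a bounded poll. The following is only a sketch and is not part of this PR: `wait_until` is a hypothetical helper, and `control_service_done` (mentioned in the usage comment) is an assumed callable, since how to detect that watchdog-control.service has finished is not covered in this discussion.

```python
import time


def wait_until(predicate, timeout_s=60.0, poll_s=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate succeeded, False on timeout.
    sleep and clock are injectable so the helper can be exercised
    without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Hypothetical usage: control_service_done is an assumed callable that
# reports whether watchdog-control.service has completed.
#
#     if not wait_until(control_service_done, timeout_s=60):
#         sonic_logger.log_notice("timed out waiting; arming anyway")
#     watchdog.arm(TIMEOUT)
```

With this shape the worst case is still bounded by the same 60 seconds, but the watchdog can be armed earlier when the condition is met sooner.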

chassis = Chassis()
watchdog = chassis.get_watchdog()

def stopWdtService(signal, frame):
    watchdog._disablewatchdog()
    sonic_logger.log_notice("CPUWDT Disabled: watchdog armed=%s" % watchdog.is_armed() )
    sys.exit()

def main():

    signal.signal(signal.SIGHUP, signal.SIG_IGN)
    signal.signal(signal.SIGINT, stopWdtService)
    signal.signal(signal.SIGTERM, stopWdtService)
Contributor

@Pavan-Nokia Can we keep the watchdog running during reboot so that we never end up in a hung state if the kernel hangs during reboot? See https://github.com/sonic-net/sonic-buildimage/blob/master/platform/broadcom/sonic-platform-modules-cel/haliburton/script/cpu_wdt#L50

Contributor Author

We cannot keep the watchdog active during reboot on the 7215-IXS-T1. The watchdog circuit is reset when the system is rebooted, so we have to arm it again when the system comes back up after the reboot.

Contributor

@Pavan-Nokia Then what is the point of enabling the watchdog? I understand that if the system hangs AFTER the watchdog is enabled, it works as expected. But consider a case where the system is booting up after a reboot and hangs before the watchdog is enabled: the system stays hung and the watchdog cannot bail it out.

Contributor Author

@prgeor You are right, we cannot bail out if there is a hang before the watchdog is enabled.
U-Boot does not currently support enabling the watchdog, and it is complicated to implement this on the existing platform.
The only way we can enable the watchdog here is by using a service in SONiC. This service will be stopped at some point when the switch is going down and re-enabled on boot-up.


    watchdog.arm(TIMEOUT)
    sonic_logger.log_notice("CPUWDT Enabled: watchdog armed=%s" % watchdog.is_armed() )


    while True:
        time.sleep(KEEPALIVE)
        watchdog._keepalive()
        sonic_logger.log_info("CPUWDT keepalive")
    done

    stopWdtService

    return


if __name__ == '__main__':
    main()
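
For illustration only, the arm / keepalive / disarm cycle of this script can also be written with a stop flag instead of calling sys.exit() inside the signal handler. This is a sketch under stated assumptions, not the merged script: `FakeWatchdog` is a stand-in for the object returned by chassis.get_watchdog(), and `run_keepalive_loop` and `max_kicks` are hypothetical names. The sketch also checks that the keepalive interval is shorter than the timeout. With TIMEOUT=170 and KEEPALIVE=55, three keepalive intervals fit into one timeout window.

```python
import threading

TIMEOUT = 170    # seconds, same value as in the script
KEEPALIVE = 55   # seconds, same value as in the script


class FakeWatchdog:
    """Stand-in for the platform watchdog object (assumption for this sketch)."""

    def __init__(self):
        self.armed = False
        self.kicks = 0

    def arm(self, seconds):
        self.armed = True
        return seconds

    def _keepalive(self):
        self.kicks += 1

    def _disablewatchdog(self):
        self.armed = False

    def is_armed(self):
        return self.armed


def run_keepalive_loop(watchdog, stop_event, timeout=TIMEOUT,
                       keepalive=KEEPALIVE, max_kicks=None):
    """Arm the watchdog, kick it every `keepalive` seconds, disarm on stop."""
    if keepalive >= timeout:
        raise ValueError("keepalive interval must be shorter than the timeout")
    watchdog.arm(timeout)
    try:
        # Event.wait() returns True as soon as stop_event is set, so a
        # shutdown request does not have to wait out a full interval.
        while not stop_event.wait(keepalive):
            watchdog._keepalive()
            # max_kicks exists only so the sketch can be run to completion.
            if max_kicks is not None and watchdog.kicks >= max_kicks:
                break
    finally:
        watchdog._disablewatchdog()
```

In a real service, the SIGINT and SIGTERM handlers would only call stop_event.set(), and the loop would then disarm the watchdog on its way out.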
@@ -0,0 +1,8 @@
[Unit]
Description=CPU WDT
After=nokia-7215init.service
[Service]
ExecStart=/usr/local/bin/cpu_wdt.py

[Install]
WantedBy=multi-user.target
@@ -8,7 +8,7 @@
import os
import fcntl
import array

import time
from sonic_platform_base.watchdog_base import WatchdogBase

""" ioctl constants """
@@ -35,7 +35,7 @@
WDIOS_ENABLECARD = 0x0002

""" watchdog sysfs """
WD_SYSFS_PATH = "/sys/class/watchdog/"
WD_SYSFS_PATH = "/sys/class/watchdog/watchdog0/"

WD_COMMON_ERROR = -1

@@ -52,16 +52,32 @@ def __init__(self, wd_device_path):
@param wd_device_path Path to watchdog device
"""
super(WatchdogImplBase, self).__init__()


self.watchdog=""
self.watchdog_path = wd_device_path
self.watchdog = os.open(self.watchdog_path, os.O_WRONLY)

# Opening a watchdog descriptor starts
# watchdog timer; by default it should be stopped
self._disablewatchdog()
self.armed = False
self.wd_state_reg = WD_SYSFS_PATH+"state"
self.wd_timeout_reg = WD_SYSFS_PATH+"timeout"
self.wd_timeleft_reg = WD_SYSFS_PATH+"timeleft"

self.timeout = self._gettimeout()

def _read_sysfs_file(self, sysfs_file):
# On successful read, returns the value read from given
# reg_name and on failure returns 'ERR'
rv = 'ERR'

if (not os.path.isfile(sysfs_file)):
return rv
try:
with open(sysfs_file, 'r') as fd:
rv = fd.read()
except Exception as e:
rv = 'ERR'

rv = rv.rstrip('\r\n')
rv = rv.lstrip(" ")
return rv

def _disablewatchdog(self):
"""
Turn off the watchdog timer
@@ -102,11 +118,10 @@ def _gettimeout(self):
Get watchdog timeout
@return watchdog timeout
"""
timeout=0
timeout=self._read_sysfs_file(self.wd_timeout_reg)

req = array.array('I', [0])
fcntl.ioctl(self.watchdog, WDIOC_GETTIMEOUT, req, True)

return int(req[0])
return timeout

def _gettimeleft(self):
"""
@@ -127,15 +142,20 @@ def arm(self, seconds):
ret = WD_COMMON_ERROR
if seconds < 0:
return ret


# Stop the watchdog service to gain access of watchdog file pointer
if self.is_armed():
os.popen("systemctl stop cpu_wdt.service")
time.sleep(2)
if not self.watchdog:
self.watchdog = os.open(self.watchdog_path, os.O_WRONLY)
try:
if self.timeout != seconds:
self.timeout = self._settimeout(seconds)
if self.armed:
if self.is_armed():
self._keepalive()
else:
self._enablewatchdog()
self.armed = True
ret = self.timeout
except IOError:
pass
@@ -150,22 +170,31 @@ def disarm(self):
A boolean, True if watchdog is disarmed successfully, False
if not
"""

try:
self._disablewatchdog()
self.armed = False
self.timeout = 0
except IOError:
return False

if self.is_armed():
os.popen("systemctl stop cpu_wdt.service")
time.sleep(2)
if not self.watchdog:
self.watchdog = os.open(self.watchdog_path, os.O_WRONLY)
try:
self._disablewatchdog()
self.timeout = 0
except IOError:
return False

return True

def is_armed(self):
"""
Implements is_armed WatchdogBase API
"""
status = False

state = self._read_sysfs_file(self.wd_state_reg)
if (state != 'inactive'):
status = True

return self.armed
return status

def get_remaining_time(self):
"""
@@ -174,10 +203,7 @@ def get_remaining_time(self):

timeleft = WD_COMMON_ERROR

if self.armed:
try:
timeleft = self._gettimeleft()
except IOError:
pass
if self.is_armed():
timeleft=self._read_sysfs_file(self.wd_timeleft_reg)

return timeleft
return int(timeleft)
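
The sysfs-based reads in this diff return strings, and _read_sysfs_file returns 'ERR' on failure, so a caller that converts the result with int() would raise ValueError in that case. The following sketch shows one way to guard the conversion. It is not part of the PR: `read_sysfs_int` and `demo` are hypothetical names, and a temporary directory stands in for /sys/class/watchdog/watchdog0/.

```python
import os
import tempfile

WD_COMMON_ERROR = -1  # same sentinel value as in the diff


def read_sysfs_int(path, default=WD_COMMON_ERROR):
    """Read an integer from a sysfs-style file; return default on any failure."""
    try:
        with open(path, "r") as fd:
            return int(fd.read().strip())
    except (OSError, ValueError):
        # Missing file, unreadable file, or non-numeric content.
        return default


def demo():
    """Exercise the helper against a temporary directory (not real sysfs)."""
    with tempfile.TemporaryDirectory() as fake_sysfs:
        timeout_file = os.path.join(fake_sysfs, "timeout")
        with open(timeout_file, "w") as fd:
            fd.write("170\n")
        good = read_sysfs_int(timeout_file)
        missing = read_sysfs_int(os.path.join(fake_sysfs, "timeleft"))
    return good, missing
```

Returning an integer from the helper would also keep comparisons such as the timeout check in arm() between values of the same type.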
@@ -1,6 +1,8 @@
nokia-7215_plt_setup.sh usr/sbin
7215/scripts/nokia-7215init.sh usr/local/bin
7215/scripts/cpu_wdt.py usr/local/bin
7215/service/nokia-7215init.service etc/systemd/system
7215/service/cpu_wdt.service etc/systemd/system
7215/service/fstrim.timer/timer-override.conf /lib/systemd/system/fstrim.timer.d
7215/sonic_platform-1.0-py3-none-any.whl usr/share/sonic/device/armhf-nokia_ixs7215_52x-r0
inband_mgmt.sh etc/
@@ -7,5 +7,8 @@ sh /usr/sbin/nokia-7215_plt_setup.sh
systemctl enable nokia-7215init.service
systemctl start nokia-7215init.service

systemctl enable cpu_wdt.service
systemctl start cpu_wdt.service

exit 0