-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[System ready] Extend sysmonitor functionality to wait for host daemons #18817
base: master
Are you sure you want to change the base?
Conversation
/azpw ms_conflict |
@fastiuk , PR need to rebase master branch. |
cac3ed8
to
fa52f3f
Compare
93e5740
to
fbaa10a
Compare
@@ -108,6 +108,37 @@ function preStartAction() | |||
fi | |||
{%- elif docker_container_name == "snmp" %} | |||
$SONIC_DB_CLI STATE_DB HSET 'DEVICE_METADATA|localhost' chassis_serial_number $(decode-syseeprom -s) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you enabling this feature by default for snmp? Shouldn't it be based on configuration?
@dgsudharsan It is disabled for snmp by default: see files/build_templates/init_cfg.json.j2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is different. In the init_cfg.j2 you are considering snmp as irrelevent for system ready but using system ready as a criteria to spawn snmp. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you consider handling this logic through featured? Today we have a similar flow in featured where certain daemons are spawned after portinitdone. In this case the critera can be systemd. It can be much easy to extend this to multiple daemons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is different. In the init_cfg.j2 you are considering snmp as irrelevent for system ready but using system ready as a criteria to spawn snmp. Is this correct?
Yes, it is indeed correct. Irrelevant for system ready means that system ready
feature won't just collect readiness status of snmp systemd service. But SNMP must be started after system is ready. The logic actually does't break each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you consider handling this logic through featured? Today we have a similar flow in featured where certain daemons are spawned after portinitdone. In this case the critera can be systemd. It can be much easy to extend this to multiple daemons
I looked at the code in featured
and I don't want to break anything there by adding wait for system ready. Most of the delayed
services there depend on portinit done, and I don't want to add another wait point there as it will just make the logic there more complex and less clear.
@@ -44,6 +44,8 @@ function startplatform() { | |||
/usr/bin/mlnx-fw-upgrade.sh -v | |||
if [[ "$?" -ne "${EXIT_SUCCESS}" ]]; then | |||
debug "Failed to upgrade fw. " "$?" "Restart syncd" | |||
sonic-db-cli STATE_DB HSET "FEATURE|$DEV_SRV" fail_reason \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we considering just the asic firmware update as sysready indication for syncd. Shouldn't it be the create switch success?
@dgsudharsan not only, we have PortInitDone
event in swss and SFP ready
in pmon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is syncd up_status is just set base on startplatform completion. Is it possible that in the actual syncd flow, switch_create might fail and we will still see the up_status as true. Can we include the switch create in the definition of syncd up_status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can happen I think. Set system ready condition you see here is just an internal NV requirement, but at the same time it is very generic, so won't conflict with any other platform. I prefer to leave it as it is.
11a3471
to
0dc3fde
Compare
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
0dc3fde
to
de110cf
Compare
Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
@@ -82,11 +82,17 @@ | |||
"auto_restart": "{{autorestart}}", | |||
"support_syslog_rate_limit" : "true", | |||
{# Set check_up_status to true here when app readiness will be marked in state db #} | |||
{%- if feature in ["swss", "syncd", "pmon"] %} | |||
"check_up_status" : "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not change image config generation behavior. We will rely on external service to generate golden config, and ask sonic to config reload -f golden_config.json
to consume them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wen587 to review.
Why I did it
To extend sysmonitor to report services states according to HLD: sonic-net/SONiC#1363
How I did it
How to verify it
sysreadyshow
shows all services as OKWhich release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
Device metadata config
A picture of a cute animal (it is my cat Finn)