Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend sysmonitor functionality to wait for host daemons #1363

Merged
merged 2 commits into from
Mar 4, 2024

Conversation

fastiuk
Copy link
Contributor

@fastiuk fastiuk commented Jun 19, 2023

@fastiuk fastiuk changed the title Extend sysmonitor functionality Extend sysmonitor functionality to wait for host daemons Jun 19, 2023
@zhangyanzhao
Copy link
Collaborator

sonic community review recordings, the meeting was ended by mistake in the middle, so we have two recordings:
part 1: https://zoom.us/rec/share/YD1eE3Agsa_DVyIgXEO_qCiKphPg67nDDPtkaFGcuGr6MuhW6CvYwdNkvzYX66yV.DsPiDYdWvgsOg4Va
part 2: https://zoom.us/rec/share/Aa3ey00NjX5mF9w4vcwJGmVakued75dZ5I33prQCm1YLLM4zxKNU5JmnXHNgQFhY.-6FnIa3dDwudbhyI

* Configure system ready feature state
* Allow host daemons to report its status

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
@liat-grozovik
Copy link
Collaborator

@zhangyanzhao this HLD was reviewed in the community as avaialble for 2 months now.
if we dont have reviewers assigned nor feedback provided i believe we should assume this is approved.

@fastiuk please track. when do you expect the code PRs to be available?

@qiluo-msft
Copy link
Contributor

qiluo-msft commented Sep 18, 2023

  • Combatibility with application extension framework.

typo? #Closed


Refers to: doc/system_health_monitoring/system-ready-HLD.md:105 in d2b8071. [](commit_id = d2b8071, deletion_comment = False)

}
}
}
```

The feature configuration is controlled by `sysready_state` field of `DEVICE_METADATA` table.
Copy link
Contributor

@qiluo-msft qiluo-msft Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEVICE_METADATA

why not use FEATURE table? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used. But the parameter you highlighted is used to control the global feature state. enabled or disabled. With disabled feature, it will wait for the readiness of only one component - swss "PortInitDone"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the FEATURE is enabled, why allow user to disable it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FEATURE table controls the effect of a specific feature (sonic app extension) on sysmonitor. So it will, or won't have an effect on system startup flow.
But DEVICE_METADATA table controls the system ready state globally, with disabled state it won't wait for any daemon enabled in FEATURE table, but only for swss's "PortInitDone"

@fastiuk
Copy link
Contributor Author

fastiuk commented Sep 18, 2023

  • Combatibility with application extension framework.

typo?

Refers to: doc/system_health_monitoring/system-ready-HLD.md:105 in d2b8071. [](commit_id = d2b8071, deletion_comment = False)

I don't think so. Since affect on system readiness is configured using FEATURE table as well as JSON file, it is legit compatibility,

@qiluo-msft
Copy link
Contributor

  • Combatibility with application extension framework.

Typo on Combatibility?


In reply to: 1722855007


Refers to: doc/system_health_monitoring/system-ready-HLD.md:105 in d2b8071. [](commit_id = d2b8071, deletion_comment = False)

Copy link
Contributor

@prgeor prgeor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fastiuk who will monitor sysmonitor task if its healthy? Its a chicken-egg problem....I think its better to montior the health of the system outside the sonic...

@fastiuk
Copy link
Contributor Author

fastiuk commented Oct 2, 2023

  • Combatibility

Oh, yes :) Will fix

@fastiuk
Copy link
Contributor Author

fastiuk commented Oct 2, 2023

@fastiuk who will monitor sysmonitor task if its healthy? Its a chicken-egg problem....I think its better to montior the health of the system outside the sonic...

As part of this feature - no one

@liat-grozovik
Copy link
Collaborator

@prgeor any further comments or this HLD can be approved and merged?
@fastiuk do you have code PRs that you can add to the PR description for tracking?

@fastiuk
Copy link
Contributor Author

fastiuk commented Dec 3, 2023

@fastiuk do you have code PRs that you can add to the PR description for tracking?

Not yet, planning to have it on Monday

@qiluo-msft
Copy link
Contributor

  • Combatibility with application extension framework.

Could you fix?


In reply to: 1728546769


Refers to: doc/system_health_monitoring/system-ready-HLD.md:105 in d2b8071. [](commit_id = d2b8071, deletion_comment = False)

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
@fastiuk
Copy link
Contributor Author

fastiuk commented Jan 29, 2024

Comments addressed. Please approve and merge

@zhangyanzhao
Copy link
Collaborator

@fastiuk can you please help to add the code PRs by referring to #806 ? Thanks.

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please check with other reviewers.

@fastiuk
Copy link
Contributor Author

fastiuk commented Feb 19, 2024

@prgeor could you please take a look again? As you have done review in the past?

@prgeor prgeor merged commit ce313db into sonic-net:master Mar 4, 2024
1 check passed
@zhangyanzhao
Copy link
Collaborator

@fastiuk can you please help to provide the code PR in this HLD? Thanks.

@fastiuk
Copy link
Contributor Author

fastiuk commented Apr 28, 2024

@fastiuk can you please help to provide the code PR in this HLD? Thanks.

Added

@zhangyanzhao
Copy link
Collaborator

Still target 202405 release, code PRs need be reviewed and Nvidia will help on the code PR review. @liat-grozovik

@zhangyanzhao
Copy link
Collaborator

Existing test cases will be used for regression testing.

@zhangyanzhao
Copy link
Collaborator

code PRs are still open, move to backlog for future release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: MovedToBacklog
Development

Successfully merging this pull request may close these issues.

5 participants