DAOS-16127 tools: Add daos health check command (#14730) #14885

Merged
merged 3 commits into from
Aug 9, 2024

mjmac
Contributor

@mjmac mjmac commented Aug 6, 2024

Perform basic system health checks from the client
perspective. Checks the following:

  • Client/Server versions
  • Key library versions and paths
  • Connected system information
  • Pool status for all pools to which the user
    has access
  • Container status for all containers in the
    checked pools

Signed-off-by: Michael MacDonald <mjmac@google.com>

Perform basic system health checks from the client
perspective. Checks the following:

  * Client/Server versions
  * Key library versions and paths
  * Connected system information
  * Pool status for all pools to which the user
    has access
  * Container status for all containers in the
    checked pools

Change-Id: I9154ee7f3632996e0e67ad6f320874e1df2e0d23
Signed-off-by: Michael MacDonald <mjmac@google.com>
The commit that landed for DAOS-16127 changed the JSON output of
pool query. Several tests were written to expect the
(en|dis)abled_ranks keys to always be set, even when those arrays
were NULL. That is not idiomatic and is awkward to work with. The
test code has been updated to use the dict get() method instead,
which returns None if the response dict does not have the
requested key.

Also fixes a problem reported in DAOS-16283, where the JSON output
of `dmg pool query` differed from the JSON output of `daos pool query`
because it didn't include the usage array.

Features: pool control
Required-githooks: true
Change-Id: I4b69ed55ce6df8b3122573b4c7df8f2118a57d1b
Signed-off-by: Michael MacDonald <mjmac@google.com>
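
For illustration, the get()-based lookup described in the commit message above could look roughly like the sketch below. This is not the actual ftest code: the helper name `rank_list` and both sample responses are hypothetical, and only the (en|dis)abled_ranks key names come from the commit message.

```python
import json

# Hypothetical pool query responses. Only the (en|dis)abled_ranks key
# names come from the commit message; the values are illustrative.
FULL = json.loads('{"enabled_ranks": [0, 1, 2], "disabled_ranks": null}')
SPARSE = json.loads('{}')


def rank_list(response, key):
    """Return the ranks under `key`, or [] if the key is absent or null."""
    # dict.get() returns None for a missing key instead of raising
    # KeyError, so absent and null keys are handled the same way.
    return response.get(key) or []


for name, response in (("full", FULL), ("sparse", SPARSE)):
    enabled = rank_list(response, "enabled_ranks")
    disabled = rank_list(response, "disabled_ranks")
    print(f"{name}: enabled={enabled} disabled={disabled}")
```

Indexing with `response["enabled_ranks"]` would raise KeyError on the sparse response, which is the failure mode the updated tests avoid.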
@mjmac mjmac requested review from a team as code owners August 6, 2024 22:51
@mjmac mjmac changed the title from "mjmac/DAOS 16127 2.6" to "DAOS-16127 tools: Add daos health check command (#14730)" Aug 6, 2024

github-actions bot commented Aug 6, 2024

Ticket title is 'Add daos health check command'
Status is 'Awaiting backport'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-16127

@github-actions github-actions bot added the priority Ticket has high priority (automatically managed) label Aug 6, 2024
Contributor

@daltonbohning daltonbohning left a comment

ftest LGTM

src/client/api/SConscript (review thread resolved)
tanabarr
tanabarr previously approved these changes Aug 7, 2024
Contributor

@tanabarr tanabarr left a comment

Please add clean/unclean cherry pick label. Thanks

@mjmac mjmac added the clean-cherry-pick Cherry-pick from another branch that did not require additional edits label Aug 7, 2024
Apparently these are needed on Ubuntu.

Features: control pool
Required-githooks: true
Change-Id: Ieb0446760f0b53e2f09feeae0226ea26dd455d58
Signed-off-by: Michael MacDonald <mjmac@google.com>
Contributor

@daltonbohning daltonbohning left a comment

ftest LGTM. Thanks!

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14885/7/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14885/7/execution/node/1524/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14885/7/execution/node/1616/log

@mjmac
Contributor Author

mjmac commented Aug 9, 2024

Test failure appears to be an instance of DAOS-16035. The other failures are the usual UCX/MD-on-SSD failures that happen when I forget to deselect those stages. :/

@mjmac mjmac added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Aug 9, 2024
@mjmac mjmac requested a review from a team August 9, 2024 13:03
@mjmac
Contributor Author

mjmac commented Aug 9, 2024

@tanabarr: Mind giving this a +1 so we can get it landed? TIA

@mjmac mjmac requested a review from jolivier23 August 9, 2024 15:09
@jolivier23 jolivier23 merged commit 79fb9df into release/2.6 Aug 9, 2024
53 of 57 checks passed
@jolivier23 jolivier23 deleted the mjmac/DAOS-16127-2.6 branch August 9, 2024 15:26
Labels

  • clean-cherry-pick: Cherry-pick from another branch that did not require additional edits
  • forced-landing: The PR has known failures or has intentionally reduced testing, but should still be landed.
  • priority: Ticket has high priority (automatically managed)
5 participants