Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-13228 control: List BDEV role assignments in scan/format display #13463

Merged
merged 12 commits into from
Jan 23, 2024

Conversation

tanabarr
Copy link
Contributor

@tanabarr tanabarr commented Dec 7, 2023

Add MD-on-SSD bdev role and engine rank assignments to dmg storage
scan. Add role assignments to dmg storage format. Roles will be listed
as NA if MD-on-SSD mode is not enabled. Also show which NVMe namespace
each SMD resides on in dmg storage query list-devices through NSID
(which is useful in the case where NVMe controllers have multiple
namespaces).

Remove dmg storage scan --nvme-meta option as all info is available
through combination of dmg storage scan -v and dmg storage query
list-devices.

Make zero-value LED state N/A so NORMAL is not reported in
uninitialized controller structures.

Skip-func-hw-medium-md-on-ssd: false
Skip-func-hw-medium-verbs-provider-md-on-ssd: false
Skip-func-hw-large-md-on-ssd: false
Features: control
Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate watchers.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

@tanabarr tanabarr self-assigned this Dec 7, 2023
Copy link

github-actions bot commented Dec 7, 2023

Bug-tracker data:
Ticket title is 'vos_on_blob: dmg storage commands should list bdev_role(s)'
Status is 'In Review'
Labels: 'md_on_ssd,usability'
https://daosio.atlassian.net/browse/DAOS-13228

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/1/execution/node/1414/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13463/1/testReport/

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13463/1/testReport/

Copy link
Contributor

@knard-intel knard-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modification of the dmg command seems to break some functional tests, which will probably needed to be updated.

for _, ctrlr := range parseNvmeFormatResults(ctrlrs) {
row := txtfmt.TableRow{pciTitle: ctrlr.PciAddr}
row[resultTitle] = ctrlr.Info
roles := "???"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ???, the role data could be used for md-on-scm ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Michael-Hennecke for PMem mode I'm displaying N/A for roles Rather than "Data", is that appropriate or should I change?

Copy link
Collaborator

@Michael-Hennecke Michael-Hennecke Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while the NVMe usage clearly is for "data", the PMem mode doesn't have any roles (and we state for the yml that either all 3 roles or none of them shall be assigned). so for consistency this should probably be empty or "none" or "NULL" or "n/a". But using "???" as the text string seems odd, because there's nothing unknown about this state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michael-Hennecke , your responses in line with my approach. "???" should never end up being printed, it is simply a placeholder for either "N/A" in PMem mode or roles in MD-on-SSD mode. I didn't want to start with an empty string or "N/A" as the default as neither are useful and both could be confusing. "???" simply implies that it hasn't been set yet and shouldn't end up being displayed to the user.

@@ -227,6 +234,12 @@ func PrintNvmeControllers(controllers storage.NvmeControllers, out io.Writer, op
row[fwTitle] = ctrlr.FwRev
row[socketTitle] = fmt.Sprint(ctrlr.SocketID)
row[capacityTitle] = humanize.Bytes(ctrlr.Capacity())
roles := "???"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ???, the role data could be used for md-on-scm ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

@@ -240,7 +239,7 @@ func StorageScan(ctx context.Context, rpcClient UnaryInvoker, req *StorageScanRe
Basic: req.NvmeBasic,
// Health and meta details required to populate usage statistics.
Health: req.NvmeHealth || req.Usage,
Meta: req.NvmeMeta || req.Usage,
Meta: req.Usage,
Copy link
Contributor

@knard-intel knard-intel Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this field to for compatibility reason with the engine.
If yes, is it planned to also remove it from the engine ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where should it be removed in the engine?

@@ -43,6 +43,7 @@ const (
accelOptMoveName = "move"
accelOptCRCName = "crc"

bdevRoleNoneName = "n/a"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it make sense to have the role data instead ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per conversation with @Michael-Hennecke , "roles" should only exist in MD-on-SSD mode, so no roles should be described in PMem mode.

@tanabarr tanabarr force-pushed the tanabarr/control-add-roles-to-scan-output branch from 4570a8c to 1a8a236 Compare December 12, 2023 16:33
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CURL_VERBOSE=${CURL_VERBOSE:-""}
CURL_PROXY="${CURL_PROXY:+-x }${CURL_PROXY:-}"
CURL_OPTS="$CURL_PROXY $CURL_VERBOSE -s"
if ! output=$(curl $CURL_OPTS -s -X POST -F "jenkinsfile=<${1:-Jenkinsfile}" https://$HOST//pipeline-model-converter/validate); then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(lint) Double quote to prevent globbing and word splitting. [SC2086]
(lint) Double quote to prevent globbing and word splitting. [SC2086]

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/2/execution/node/403/log

@daosbuild1
Copy link
Collaborator

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/2/execution/node/383/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/2/execution/node/379/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/2/execution/node/398/log

@daosbuild1
Copy link
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/2/execution/node/373/log

@tanabarr tanabarr force-pushed the tanabarr/control-add-roles-to-scan-output branch from 1a8a236 to b804a85 Compare December 12, 2023 16:45
@daosbuild1 daosbuild1 dismissed their stale review December 12, 2023 16:47

Updated patch

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/3/execution/node/1163/log

@tanabarr tanabarr force-pushed the tanabarr/control-add-roles-to-scan-output branch from b804a85 to f80b489 Compare December 13, 2023 17:00
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/5/execution/node/1163/log

@tanabarr tanabarr force-pushed the tanabarr/control-add-roles-to-scan-output branch from f80b489 to a2ddaf2 Compare December 14, 2023 11:29
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@tanabarr tanabarr force-pushed the tanabarr/control-remove-bdev-scan-cache-pt2-cp branch from 6b7d53a to 46a1b63 Compare December 14, 2023 14:41
@mjmac mjmac reopened this Jan 9, 2024
@mjmac
Copy link
Contributor

mjmac commented Jan 9, 2024

Oops, I clicked the wrong thing while distracted, my apologies.

@tanabarr tanabarr requested a review from a team January 9, 2024 23:07
@tanabarr
Copy link
Contributor Author

@daos-stack/daos-gatekeeper can this PR please be force landed?

https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-13463/21/tests/ failed due to the known issues:

@tanabarr tanabarr removed the request for review from a team January 14, 2024 21:07
…d-roles-to-scan-output

Skip-func-hw-medium-md-on-ssd: false
Skip-func-hw-medium-verbs-provider-md-on-ssd: false
Skip-func-hw-large-md-on-ssd: false
Features: control pool
Required-githooks: true

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
@tanabarr tanabarr removed the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Jan 14, 2024
@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13463/22/execution/node/1463/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13463/22/testReport/

@tanabarr tanabarr requested a review from a team January 15, 2024 21:35
@tanabarr
Copy link
Contributor Author

tanabarr commented Jan 15, 2024

CI results for run no. 22 with control and pool features failed for the following known issues:

Requesting forced landing

@tanabarr tanabarr added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Jan 15, 2024
Copy link
Contributor

@NiuYawei NiuYawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks the "tests_dmg_helpers.c" needs be fixed for the json format changes? otherwise, dmg_storage_device_list() used by C testing code will fail.

@tanabarr
Copy link
Contributor Author

tanabarr commented Jan 18, 2024

Looks the "tests_dmg_helpers.c" needs be fixed for the json format changes? otherwise, dmg_storage_device_list() used by C testing code will fail.

So as to not pollute this PR unnecessarily the fix for that is in #13630 . The issue you refer to is not directly related to this PR.

@tanabarr tanabarr requested a review from a team January 18, 2024 18:35
@tanabarr tanabarr added the meta-on-ssd Metadata on SSD Feature label Jan 18, 2024
…d-roles-to-scan-output

Skip-func-hw-medium-md-on-ssd: false
Skip-func-hw-medium-verbs-provider-md-on-ssd: false
Skip-func-hw-large-md-on-ssd: false
Features: control
Required-githooks: true

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
@tanabarr
Copy link
Contributor Author

@daos-stack/daos-gatekeeper please can this PR be force landed now landings are open on master?

@mchaarawi
Copy link
Contributor

there are 2 failures on this PR that i am not aware of being as known failures:
FTEST_control.DmgSystemReformatTest
FTEST_pool.ListVerboseTest

so this seems not ready to land.

@mchaarawi
Copy link
Contributor

there are 2 failures on this PR that i am not aware of being as known failures: FTEST_control.DmgSystemReformatTest FTEST_pool.ListVerboseTest

so this seems not ready to land.

also if you think otherwise, please merge with latest master with the control Feature pragma

@tanabarr
Copy link
Contributor Author

there are 2 failures on this PR that i am not aware of being as known failures: FTEST_control.DmgSystemReformatTest FTEST_pool.ListVerboseTest

so this seems not ready to land.

These are both manifestations of https://daosio.atlassian.net/browse/DAOS-14884 as identified by @knard-intel . As the fix has landed I will merge master and repush.

@tanabarr tanabarr removed the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Jan 22, 2024
@tanabarr tanabarr requested a review from a team January 23, 2024 10:10
@NiuYawei NiuYawei merged commit 6e20194 into master Jan 23, 2024
51 checks passed
@NiuYawei NiuYawei deleted the tanabarr/control-add-roles-to-scan-output branch January 23, 2024 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
control-plane work on the management infrastructure of the DAOS Control Plane meta-on-ssd Metadata on SSD Feature
Development

Successfully merging this pull request may close these issues.

9 participants