DAOS-13292 control: Use cart API to detect fabric #13989
- Add a lib/hardware package to collect fabric interface information through CART API.
- Remove custom OFI and UCX packages and dependencies.
- Update Go githook to ignore deleted files.

Required-githooks: true
Signed-off-by: Kris Jacque <kris.jacque@intel.com>
Ticket title is 'Update control plane fabric scans to use new mercury API'
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13989/1/testReport/
Looks good to me... Much simpler to have a single provider.
Required-githooks: true
Features: control
Required-githooks: true
Signed-off-by: Kris Jacque <kris.jacque@intel.com>
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13989/2/testReport/
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13989/2/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13989/2/execution/node/1545/log
For systems without Infiniband, getting info for verbs produces a Mercury error. For all other providers, including UCX verbs, it returns no error and instead returns no results. We'll simulate that behavior here until the underlying bug is fixed.

Features: control
Signed-off-by: Kris Jacque <kris.jacque@intel.com>
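The workaround described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `fabricQuery`, `errMercury`, `isVerbs`, and `queryWithWorkaround` are hypothetical names, and the check on the `ofi+verbs` prefix is an assumption about how the provider string is written.

```go
package main

import (
	"errors"
	"strings"
)

// fabricQuery is a hypothetical stand-in for the call that asks CART/Mercury
// which interfaces support a given provider.
type fabricQuery func(provider string) ([]string, error)

// errMercury is a hypothetical placeholder for the error Mercury reports
// when verbs info is requested on a system without Infiniband.
var errMercury = errors.New("mercury: unable to get verbs info")

// isVerbs reports whether the provider string names the OFI verbs provider.
// Assumption: such strings start with "ofi+verbs".
func isVerbs(provider string) bool {
	return strings.HasPrefix(provider, "ofi+verbs")
}

// queryWithWorkaround makes a failing verbs query look like every other
// provider's "nothing found" case: no error and no results.
// Errors for any other provider are passed through unchanged.
func queryWithWorkaround(query fabricQuery, provider string) ([]string, error) {
	ifaces, err := query(provider)
	if err != nil && isVerbs(provider) {
		return nil, nil
	}
	return ifaces, err
}
```

Keeping the special case in a single wrapper like this would make it easy to delete once the underlying bug is fixed.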
Features: control
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13989/4/execution/node/1540/log
Test failure is https://daosio.atlassian.net/browse/DAOS-15598
LGTM
Features: control
nice work
Features: control
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13989/7/execution/node/1451/log
Features: control
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13989/8/execution/node/1173/log
Excluding DAOSVersion.test_version test, which has a known failure.

Test-tag: pr control,-test_version
Test failure is DAOS-15686. I'm going to exclude the affected test and re-run.
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13989/9/testReport/
Don't include ofi+tcp;ofi_rxm when ofi+tcp is requested.

Features: control
Required-githooks: true
Signed-off-by: Kris Jacque <kris.jacque@intel.com>
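One way to picture the behavior this commit describes is an exact comparison on the provider string, so that a request for `ofi+tcp` does not also pick up `ofi+tcp;ofi_rxm`. The sketch below assumes the candidate providers are available as plain strings; `matchProviders` is a hypothetical helper, and the PR's own implementation may differ.

```go
package main

// matchProviders returns only the entries of available that are exactly
// equal to requested. A prefix or substring comparison would also accept
// "ofi+tcp;ofi_rxm" when "ofi+tcp" is requested, which is the behavior the
// commit message says to avoid.
func matchProviders(available []string, requested string) []string {
	var matched []string
	for _, p := range available {
		if p == requested {
			matched = append(matched, p)
		}
	}
	return matched
}
```

With this helper, requesting `ofi+tcp` from a list containing `ofi+tcp`, `ofi+tcp;ofi_rxm`, and `ofi+verbs` keeps only the first entry.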
3f93fcd
ftest LGTM
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13989/10/execution/node/1198/log
LGTM
The failing functional test should not be related to this PR; it appears to be a transient error that corrupted the JSON output.
Features: control
Feature: control
Discussed at gatekeeping and with @kjacque. This change reuses the same information that the engines are already using, so the landing risk is low in terms of breaking other systems/clusters. It is also the right way forward: we don't want to rely on the libfabric/UCX APIs, and should instead use the unified CART-level ones to retrieve this information.
- Add a lib/hardware package to collect fabric interface information through CART API.
- Remove custom OFI and UCX packages and dependencies.
- Update Go githook to ignore deleted files.
* Compensate for DAOS-15588

For systems without Infiniband, getting info for verbs produces a Mercury error. For all other providers, including UCX verbs, it returns no error and instead returns no results. We'll simulate that behavior here until the underlying bug is fixed.

Signed-off-by: Kris Jacque <kris.jacque@intel.com>
Required-githooks: true
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: