Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-15045 vos: check tls before accessing #13637

Merged
merged 2 commits into from
Feb 4, 2024
Merged

Conversation

wangdi1
Copy link
Contributor

@wangdi1 wangdi1 commented Jan 19, 2024

Check vos tls in vos_pool_hhash_get(), since it maybe called in the system xstream.

Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate watchers.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

@wangdi1 wangdi1 requested a review from a team as a code owner January 19, 2024 19:05
Copy link

github-actions bot commented Jan 19, 2024

Bug-tracker data:
Ticket title is 'Aurora soak harasser online testing 16 ecb:8 ncn - vos EMRG src/include/daos_srv/daos_engine.h:127 dss_module_key_get() Assertion 'dtls != NULL' failed'
Status is 'In Review'
Labels: 'soak,tds,triaged'
https://daosio.atlassian.net/browse/DAOS-15045

@daosbuild1
Copy link
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13637/1/testReport/

@daosbuild1
Copy link
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13637/1/testReport/

jolivier23
jolivier23 previously approved these changes Jan 19, 2024
liuxuezhao
liuxuezhao previously approved these changes Jan 21, 2024
Comment on lines 77 to 83
#ifndef VOS_STANDALONE
/* main thread doesn't have TLS and XS context or service thread
* is not initialized yet.
*/
if (dss_tls_get() == NULL)
return NULL;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unnecessary to me, the comments logic has been handled in vos_tls_get() codes:
struct vos_tls *
vos_tls_get(bool standalone)
{
#ifdef VOS_STANDALONE
return self_mode.self_tls;
#else

I think the bug here is vos_pool_hhash_get(is_sysdb); return NULL which cause core dump inside d_uhash_link_lookup()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and I don't think this PR fixed the real bug.

It looks to me the real bug is that on mgmt module setup, it calls vos_pool_kill() on main thread (not any xstream) directly to cleanup leftover RDB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see sg_mgmt_tgt_setup() -> cleanup_leftover_pools() -> cleanup_leftover_cb() -> clear_vos_pool() -> vos_pool_kill(). It's called on main thread on start, so we'd ensure the vos_pool_kill() is called by a ULT on sys xstream.

Comment on lines 77 to 83
#ifndef VOS_STANDALONE
/* main thread doesn't have TLS and XS context or service thread
* is not initialized yet.
*/
if (dss_tls_get() == NULL)
return NULL;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and I don't think this PR fixed the real bug.

It looks to me the real bug is that on mgmt module setup, it calls vos_pool_kill() on main thread (not any xstream) directly to cleanup leftover RDB.

NiuYawei
NiuYawei previously approved these changes Jan 22, 2024
uuid_copy(arg.pool_uuid, uuid);
arg.flags = VOS_POF_RDB;

rc = dss_ult_execute(vos_pool_kill_ult, &arg, NULL, NULL, DSS_XS_SYS, 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: could reuse the tgt_kill_pool() function?

@@ -1,5 +1,5 @@
/**
* (C) Copyright 2016-2023 Intel Corporation.
* (C) Copyright 2016-2024 Intel Corporation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be reverted.

@@ -1,5 +1,5 @@
/**
* (C) Copyright 2016-2023 Intel Corporation.
* (C) Copyright 2016-2024 Intel Corporation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13637/3/execution/node/1479/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13637/3/execution/node/1433/log

Di Wang added 2 commits January 22, 2024 18:01
vos_pool_kill should be executed in system xstream.

Required-githooks: true

Signed-off-by: Di Wang <di.wang@intel.com>
@wangdi1 wangdi1 requested a review from a team February 3, 2024 20:33
@NiuYawei NiuYawei merged commit 0ecfd10 into master Feb 4, 2024
47 of 48 checks passed
@NiuYawei NiuYawei deleted the wangdi/daos_15045 branch February 4, 2024 01:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants