DAOS-15045 vos: check tls before accessing #13637
Bug-tracker data:
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13637/1/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13637/1/testReport/
3ff3501 to ae7ddc0
src/vos/vos_tls.h
Outdated
#ifndef VOS_STANDALONE
	/* main thread doesn't have TLS and XS context or service thread
	 * is not initialized yet.
	 */
	if (dss_tls_get() == NULL)
		return NULL;
#endif
This looks unnecessary to me; the case described in the code comment is already handled in vos_tls_get():
struct vos_tls *
vos_tls_get(bool standalone)
{
#ifdef VOS_STANDALONE
return self_mode.self_tls;
#else
I think the bug here is that vos_pool_hhash_get(is_sysdb) returns NULL, which causes a core dump inside d_uhash_link_lookup().
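To illustrate the failure mode described here, a minimal sketch of a lookup that checks for a missing per-thread hash table before using it. Every name below (demo_tls, demo_hhash_get, demo_pool_lookup) is a hypothetical stand-in, not a DAOS type or function, and the one-entry "table" only exists to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are DAOS types or functions. */
struct demo_hhash {
	const char *key;
	int         value;
};

struct demo_tls {
	struct demo_hhash *hhash;
};

/* Stays NULL on any thread that never set up its per-thread state. */
static _Thread_local struct demo_tls *demo_tls_cur;

/* Returns the calling thread's table, or NULL if the thread has none. */
static struct demo_hhash *demo_hhash_get(void)
{
	return demo_tls_cur != NULL ? demo_tls_cur->hhash : NULL;
}

/* Returns 0 and stores the value in *out on a hit, -1 otherwise.
 * The NULL check is the point: without it, a thread that has no
 * per-thread state would dereference a NULL table.
 */
static int demo_pool_lookup(const char *key, int *out)
{
	struct demo_hhash *h;

	assert(key != NULL && out != NULL);

	h = demo_hhash_get();
	if (h == NULL)
		return -1;
	if (strcmp(h->key, key) != 0)
		return -1;

	*out = h->value;
	return 0;
}
```

A guard like this only turns the crash into an error return; it does not address why the caller is running on a thread without per-thread state in the first place.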
Indeed, and I don't think this PR fixes the real bug.
It looks to me like the real bug is that, on mgmt module setup, vos_pool_kill() is called directly on the main thread (not on any xstream) to clean up leftover RDB.
See sg_mgmt_tgt_setup() -> cleanup_leftover_pools() -> cleanup_leftover_cb() -> clear_vos_pool() -> vos_pool_kill(). This path runs on the main thread at startup, so we should ensure that vos_pool_kill() is called by a ULT on the sys xstream.
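The general idea can be sketched with plain POSIX threads: instead of calling an operation from a thread that lacks per-thread context, hand it to a thread that has one and wait for the result. This is only an analogy for running a ULT on the sys xstream; all names (demo_run_on_service, demo_kill_pool, demo_ctx_ready) are hypothetical, and a real service thread would be long-lived rather than created per call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread context flag; set only on the service thread. */
static _Thread_local int demo_ctx_ready;

struct demo_call {
	int  (*fn)(void *arg);
	void  *arg;
	int    rc;
};

/* Thread entry point: set up the context, then run the requested call. */
static void *demo_service_main(void *p)
{
	struct demo_call *c = p;

	demo_ctx_ready = 1;
	c->rc = c->fn(c->arg);
	return NULL;
}

/* Runs fn(arg) on a service thread and waits for it to finish.
 * Returns fn's result, or -1 if the thread could not be started or joined.
 */
static int demo_run_on_service(int (*fn)(void *), void *arg)
{
	struct demo_call c = { fn, arg, 0 };
	pthread_t        tid;

	assert(fn != NULL);

	if (pthread_create(&tid, NULL, demo_service_main, &c) != 0)
		return -1;
	if (pthread_join(tid, NULL) != 0)
		return -1;
	return c.rc;
}

/* Stand-in for an operation that needs the per-thread context. */
static int demo_kill_pool(void *arg)
{
	(void)arg;
	return demo_ctx_ready ? 0 : -1;
}
```

Called directly from the starting thread, demo_kill_pool(NULL) fails because the context was never set up there; demo_run_on_service(demo_kill_pool, NULL) succeeds because the call runs where the context exists.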
ae7ddc0 to f2ab1d8
src/mgmt/srv_target.c
Outdated
	uuid_copy(arg.pool_uuid, uuid);
	arg.flags = VOS_POF_RDB;

	rc = dss_ult_execute(vos_pool_kill_ult, &arg, NULL, NULL, DSS_XS_SYS, 0, 0);
Minor: could reuse the tgt_kill_pool() function?
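For readers unfamiliar with the pattern in the snippet under review, here is a rough sketch of packing a UUID and flags into an argument struct and unpacking them in a callback that takes a single void pointer. The names (demo_kill_arg, demo_pool_kill, demo_pool_kill_cb, DEMO_FLAG_RDB) are hypothetical, the UUID is modeled as a raw 16-byte array rather than uuid_t, and nothing here reflects the actual bodies of vos_pool_kill_ult or tgt_kill_pool().

```c
#include <assert.h>
#include <string.h>

#define DEMO_UUID_LEN 16
#define DEMO_FLAG_RDB 0x1u /* hypothetical flag value */

/* Arguments bundled so they fit through a single void pointer. */
struct demo_kill_arg {
	unsigned char pool_uuid[DEMO_UUID_LEN];
	unsigned int  flags;
};

/* Recorded by the stand-in below so the effect is observable. */
static unsigned char demo_last_uuid[DEMO_UUID_LEN];
static unsigned int  demo_last_flags;

/* Stand-in for the real operation; it only records what it was given. */
static int demo_pool_kill(const unsigned char *uuid, unsigned int flags)
{
	memcpy(demo_last_uuid, uuid, DEMO_UUID_LEN);
	demo_last_flags = flags;
	return 0;
}

/* Callback with a generic signature: unpack the struct, forward the call. */
static int demo_pool_kill_cb(void *data)
{
	struct demo_kill_arg *arg = data;

	assert(arg != NULL);
	return demo_pool_kill(arg->pool_uuid, arg->flags);
}
```

Because the struct is passed by pointer, it has to stay valid until the callback has finished, which holds for a stack variable only when the caller blocks until completion.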
src/vos/vos_pool.c
Outdated
@@ -1,5 +1,5 @@
 /**
- * (C) Copyright 2016-2023 Intel Corporation.
+ * (C) Copyright 2016-2024 Intel Corporation.
This change should be reverted.
src/vos/vos_tls.h
Outdated
@@ -1,5 +1,5 @@
 /**
- * (C) Copyright 2016-2023 Intel Corporation.
+ * (C) Copyright 2016-2024 Intel Corporation.
ditto.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13637/3/execution/node/1479/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13637/3/execution/node/1433/log
vos_pool_kill should be executed in system xstream.
Required-githooks: true
Signed-off-by: Di Wang <di.wang@intel.com>
f2ab1d8 to 0caa14e
Check the VOS TLS in vos_pool_hhash_get(), since it may be called in the system xstream.
Required-githooks: true