-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16488 chk: take sd_lock before accessing VOS sys_db #15207
Conversation
Ticket title is 'lru_ref_release_internal() Assertion 'd_list_empty(&llink->ll_qlink)' failed' |
src/vos/vos_obj_cache.c
Outdated
D_ERROR("Error in creating lru cache: "DF_RC"\n", DP_RC(rc)); | ||
DL_CDEBUG(rc != 0, DLOG_ERR, DLOG_INFO, rc, "Create VOS object LRU cache %p, size %u", | ||
*occ, 1 << cache_size); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's a good idea to introduce such D_INFO logs. Discussed the ticket with @Nasf-Fan offline, seems some CR code access/modify sys_db without locking, which result in race on accessing sysdb (sysdb is accessed by all xstreams, locking is required).
Let's hold on landing this PR for this moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can know the LRU cache address and size via such log, that will help us to trace the LRU cache life-cycle.
On the other hand, I have already refreshed the patch to hold the lock against vos sys_db before accessing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other changes looks good to me. Could you change the DLOG_INFO to DB_TRACE if you need to collect the debug information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, we can check the issue is fixed after applying the lock changes or not. If failed, then add more debug log.
Refreshed the patch.
7a79614
to
cb0a574
Compare
src/vos/vos_obj_cache.c
Outdated
D_ERROR("Error in creating lru cache: "DF_RC"\n", DP_RC(rc)); | ||
DL_CDEBUG(rc != 0, DLOG_ERR, DLOG_INFO, rc, "Create VOS object LRU cache %p, size %u", | ||
*occ, 1 << cache_size); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other changes looks good to me. Could you change the DLOG_INFO to DB_TRACE if you need to collect the debug information?
src/vos/vos_obj_cache.c
Outdated
return rc; | ||
} | ||
|
||
void | ||
vos_obj_cache_destroy(struct daos_lru_cache *occ) | ||
{ | ||
D_ASSERT(occ != NULL); | ||
D_INFO("Destroy VOS object LRU cache %p\n", occ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this INFO message should be removed or changed to DB_TRACE. We really shouldn't print an internal hash create/destroy message in INFO level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this INFO message should be removed or changed to DB_TRACE. We really shouldn't print an internal hash create/destroy message in INFO level.
DB_TRACE is inconvenient for user since they usually will not enable trace level debug.
The VOS sys_db may have multuiple users, such as SMD and CHK. It is caller's duty to take lock against the VOS sys_db before accessing it to handle concurrent operations from multiple XS. Signed-off-by: Fan Yong <fan.yong@intel.com>
cb0a574
to
a1afa8b
Compare
FYI this PR should have ran the test that was reporting failing in the ticket
Generally for test failures we need to verify the test being fixed is fixed. |
The VOS sys_db may have multuiple users, such as SMD and CHK.
It is caller's duty to take lock against the VOS sys_db before
accessing it to handle concurrent operations from multiple XS.
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: