
dump of ULTs stacks only works one time for the same execution context #393

Open
bfaccini opened this issue Aug 31, 2023 · 2 comments

@bfaccini
Contributor

bfaccini commented Aug 31, 2023

While trying to dump ULT stacks by calling ABT_info_trigger_print_all_thread_stacks() from DAOS code, I found that this can only be done once in the same process/execution context.

The cause of this buggy behaviour is in ABTI_info_check_print_all_thread_stacks(). The print_stack_barrier atomic variable holds the number of previously "parked" XStreams (parking stops all ABT-related activity while an elected "master" XStream dumps the stacks of all ULTs). This variable is decremented with ABTD_atomic_fetch_sub_int(), which basically performs an atomic_fetch_sub(), and its return value is tested against 0 (NULL) to detect when print_stack_flag can be reset to PRINT_STACK_FLAG_UNSET so that a subsequent dump can be started.

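The problem is that the return value should be tested against 1, not 0, because atomic_fetch_sub() returns the value held BEFORE the subtraction, not after it! The last parked XStream therefore sees 1, the test against 0 never succeeds, and print_stack_flag is never reset.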

bfaccini added a commit to bfaccini/argobots that referenced this issue Aug 31, 2023
1/one returned value must be tested instead of 0 to
detect that last "parked" XStream is done in
ABTI_info_check_print_all_thread_stacks() and thus
that print_stack_flag can be reset to PRINT_STACK_FLAG_UNSET
to allow for a next dump to be started.

DAOS-14248 ticket, Argobots issue pmodels#393.

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
@bfaccini
Contributor Author

PR #394 should fix this.

knard-intel pushed a commit to knard-intel/argobots that referenced this issue May 15, 2024
1/one returned value must be tested instead of 0 to detect that last
"parked" XStream is done in ABTI_info_check_print_all_thread_stacks()
and thus that print_stack_flag can be reset to PRINT_STACK_FLAG_UNSET to
allow for a next dump to be started.

DAOS-14248 ticket, Argobots issue pmodels#393.

Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Co-authored-by: Bruno Faccini <bruno.faccini@intel.com>
@knard-intel

This has been fixed by PR #397.
@yfguo, could you please close this ticket?
