
Address a deadlock issue in multi-GPU scenario #1407

Open
wants to merge 1 commit into base: clang_tot_upgrade
4 changes: 2 additions & 2 deletions lib/hsa/mcwamp_hsa.cpp
@@ -4176,14 +4176,14 @@ void HSAQueue::dispose() {

    Kalmar::HSADevice* device = static_cast<Kalmar::HSADevice*>(getDev());

    wait();


Some comment would help. It seems calling wait() is equivalent to holding qmutex and calling wait_no_lock(), which is what happens at line 4185 below, after locking rocrQueuesMutex. Does this fix imply that rocrQueuesMutex should NOT be held before qmutex, since that may cause a deadlock? If so, shouldn't the locking of qmutex at line 4185 be removed?
Or, could moving the acquisition of rocrQueuesMutex to just before the call to removeRocrQueue() help?

Collaborator

wait_no_lock() could potentially call EnqueueMarkerNoLock(). If the HSAQueue does not hold a rocr queue, it will end up calling createOrstealRocrQueue() and locking rocrQueuesMutex.

Moving the lock on rocrQueuesMutex to just before line 4206 might work, too.

    // NOTE: needs to acquire rocrQueuesMutex and then the qmutex in this
    // sequence in order to avoid potential deadlock with other threads
    // executing createOrstealRocrQueue at the same time
    std::lock_guard<std::mutex> rl(device->rocrQueuesMutex);
    std::lock_guard<std::mutex> l(this->qmutex);

    wait_no_lock();

    // clear asyncOps to trigger any lingering resource cleanup while we still hold the locks
    // this implicitly waits on all remaining async ops as they destruct, including the
    // possible marker from above
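
To make the call chain described in the review thread concrete, here is a minimal, self-contained sketch of how the pre-fix dispose() path can deadlock. The identifiers mirror the names discussed above (rocrQueuesMutex, qmutex, wait_no_lock(), createOrstealRocrQueue()), but the function bodies are hypothetical stand-ins rather than the real mcwamp_hsa.cpp code; the sketch only illustrates that a non-recursive std::mutex ends up being locked a second time on the same thread.

// Hypothetical sketch of the deadlock described in the review thread above.
// Only the mutex names mirror the real code; the bodies are stand-ins.
#include <mutex>

std::mutex rocrQueuesMutex;  // stand-in for the device-level lock
std::mutex qmutex;           // stand-in for the per-HSAQueue lock

void createOrstealRocrQueue() {
    // Locks a std::mutex the calling thread already holds: undefined
    // behavior for a non-recursive mutex, and in practice a hang.
    std::lock_guard<std::mutex> rl(rocrQueuesMutex);
    // ... create or steal a rocr queue ...
}

void wait_no_lock() {
    // Per the reply above: if this HSAQueue does not hold a rocr queue,
    // enqueueing the marker ends up in createOrstealRocrQueue()
    // (via EnqueueMarkerNoLock() in the real code).
    createOrstealRocrQueue();
}

void dispose_pre_fix() {
    std::lock_guard<std::mutex> rl(rocrQueuesMutex);
    std::lock_guard<std::mutex> l(qmutex);
    wait_no_lock();  // re-enters rocrQueuesMutex above -> deadlock
}

int main() {
    dispose_pre_fix();  // expected to hang, illustrating the reported issue
}

Under this reading, calling wait() before any locks are taken, as this change does, lets a createOrstealRocrQueue() triggered by the wait acquire rocrQueuesMutex while it is still free. The open question raised in the thread is whether wait() holding qmutex at that point reintroduces an ordering hazard against other threads that take rocrQueuesMutex first, per the NOTE comment in the diff.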