GC with cancelled context caused race condition deadlock in pinner #10593
Comments
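An alternative solution is to modify Descendants so that it drains roots before returning when the context is cancelled:

func Descendants(ctx context.Context, getLinks dag.GetLinks, set *cid.Set, roots <-chan pin.StreamedPin) error {
	// ...
	for {
		select {
		case <-ctx.Done():
			// Drain roots so the sender is never left blocked on the channel.
			for range roots {
			}
			return ctx.Err()
		case wrapper, ok := <-roots:
			// ...
		}
	}
}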
Per ipfs/kubo#10593, if no one is reading from the channel returned by RecursiveKeys() and the context is cancelled, streamIndex will hang indefinitely. Proposed fix is to always select when attempting to write to the `out` channel. If the context is done and there is no one to read, we can abort.
Hello @LeeTeng2001. Thank you for tracking down this bug! A proposed fix is here: ipfs/boxo#727. Your first solution might work, but I'm not sure that cancelling the context means streamIndex will attempt to put only one more result on the channel. From a cursory look, it seems it might keep looping and try to send more results, in which case we may end up in the same situation regardless of buffering. Your second solution works but whether the
Seems reasonable, thanks for the alternative solution.
Per ipfs/kubo#10593, if no one is reading from the channel returned by RecursiveKeys() and the context is cancelled, streamIndex will hang indefinitely. Proposed fix is to always select when attempting to write to the `out` channel. If the context is done and there is no one to read, we can abort. Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
fixed
Checklist
Installation method
built from source
Version
Config
Description
I noticed a deadlock caused by GC. After debugging for a while, I found the problematic part and have a solution for it.
To reproduce: download a file with ipfs, run a GC with a cancelled context, then download a new file with ipfs. The download hangs at 100% while waiting for the pinner to pin the file.
The issue is a race condition in GC.
GC invokes ColoredSet, which in turn invokes the pinner's RecursiveKeys to get a channel. The channel is passed to Descendants, which iterates over the results. The race condition lies in the interaction between the pinner's streamIndex and Descendants.
When the passed-in context is cancelled, the pinner's streamIndex goroutine immediately returns an error, which it tries to deliver via the unbuffered out channel. On the Descendants side, that channel is never read: its select statement waits on both the out channel and the context, so when the context is cancelled it returns immediately without ever reading from out.
My proposed solution is simple: change the streamIndex channel to a buffered channel.
However, I'm not sure whether this affects other parts of the infrastructure, so please verify that my solution is valid.
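The interaction described above can be reproduced in isolation with a minimal Go sketch. The names `produce`, `consume`, and `producerExits` are hypothetical stand-ins rather than the pinner's real functions, and the sketch forces the unlucky ordering by letting the reader return before the sender runs.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// produce is a hypothetical stand-in for the sending goroutine: once ctx is
// cancelled it reports the error with a plain send, with no select around it.
func produce(ctx context.Context, out chan<- error, done chan<- struct{}) {
	defer close(done)
	<-ctx.Done()
	out <- ctx.Err() // blocks until someone receives or the buffer has room
}

// consume is a hypothetical stand-in for the reading side: it returns as soon
// as ctx is cancelled and never looks at out again.
func consume(ctx context.Context, out <-chan error) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case err := <-out:
		return err
	}
}

// producerExits forces the unlucky ordering (reader gone before the send) and
// reports whether the producer goroutine finishes within the wait window.
func producerExits(bufSize int) bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	out := make(chan error, bufSize)
	done := make(chan struct{})

	_ = consume(ctx, out) // reader returns on ctx.Done() without reading out
	go produce(ctx, out, done)

	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // producer is still parked on the send (and is leaked here)
	}
}

func main() {
	fmt.Println("unbuffered, producer exits:", producerExits(0))
	fmt.Println("buffer of 1, producer exits:", producerExits(1))
}
```

In this sketch the unbuffered sender stays parked on its send, while a buffer of 1 lets it finish because exactly one value is ever sent.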