-
Notifications
You must be signed in to change notification settings - Fork 892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GODRIVER-2520 Remove deadline setters from gridfs #1427
Conversation
mongo/gridfs/download_stream.go
Outdated
return nil | ||
// WithContext sets the context for the DownloadStream, allowing control over | ||
// the execution and behavior of operations associated with the stream. | ||
func (ds *DownloadStream) WithContext(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to this would be to have a constructor that accepts a context for initializing a download stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this setter at all? The only ways to create a DownloadStream
are using OpenDownloadStream
or OpenDownloadStreamByName
, which both accept a Context
parameter as of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context set by WithContext is specific to the read operation, which is independent of constructing a DownloadStream. For example, this:
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
mt.Cleanup(cancel)
ds, err := bucket.OpenDownloadStreamByName(ctx, fileName) // could time out finding a file, etc
assert.Nil(mt, err, "OpenDownloadStreamByName error: %v", err)
p := make([]byte, len(fileData))
_, err = ds.Read(p)
has a different intent than this:
ds, err := bucket.OpenDownloadStreamByName(context.Background(), fileName)
assert.Nil(mt, err, "OpenDownloadStreamByName error: %v", err)
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
mt.Cleanup(cancel)
ds.WithContext(ctx) // specifically trying to add a context when reading a file
p := make([]byte, len(fileData))
_, err = ds.Read(p)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point about how the Context
in OpenDownloadStream
is only used to get the file info, but is not used for the subsequent ops that read the file info from the database. I think it's worth considering how similar APIs behave.
The Go stdlib offers a few examples of how to apply timeouts to stream reader types that implement io.Reader
. The patterns are similar for applying timeouts to stream writer types that implement io.Writer
.
The Go net.Conn
allows setting a read timeout via SetReadDeadline
or SetDeadline
.
conn, _ := net.Dial(...)
conn.SetReadDeadline(time.Now().Add(15 * time.Second))
// Will time out in 15 seconds
io.ReadAll(conn)
The Go http.Client
allows setting a timeout that applies to the entire lifetime of any request, including dialing, reading headers, and reading the body.
client := &http.Client{
Timeout: 15 * time.Second,
}
resp, err := client.Get(...)
// Will time out in 15 seconds.
io.ReadAll(http.Body)
Thoughts:
- Concerning using "read deadline" vs "context", all of the underlying APIs used by the GridFS code accept a
Context
(they're all just Go driver CRUD calls), so using aContext
seems to be the best choice. - I think accepting a
Context
inOpenDownloadStream
that is not used for actually downloading the file is confusing and would surprise most users. I recommend using theContext
passed toOpenDownloadStream
(andOpenDownloadStreamByName
) as theContext
on aDownloadStream
. - If we want to allow users to override the
Context
used when actually downloading the file, we can add aSetContext
method toDownloadStream
. However, it's not immediately clear if that is necessary, so I'd recommend omitting it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion of WithContext comes directly from the http packages Request.WithContext API. Which uses a context set by this method in it's io operations. I am also open to retaining the existing API which, as you note, is the pattern used in net.Conn. I would argue the existing pattern (SetReadDeadline) is unnecessarily asymmetric as DownloadStream does not have a concept of Write and so WithContext or SetDeadline is concise.
I think accepting a Context in OpenDownloadStream that is not used for actually downloading the file is confusing and would surprise most users. I recommend using the Context passed to OpenDownloadStream (and OpenDownloadStreamByName) as the Context on a DownloadStream.
The context timeout starts ticking around when the DownloadStream is constructed. So The user will have to be judicious about how they set the context timeout and when they plan on reading from io. If we go this way, I agree with omitting a setter specific to setting context on the streaming types until it's more clear if there is a use case for it. However, in my opinion this makes the API for DownloadStream more difficult to use. What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue with going the constructor route is that if we ever needed to add a read-specific context timeout, then undoing the constructor propagation of context would be a breaking change.
For example, suppose a user is setting a context on the constructor to timeout the the find operation, i.e. the construction. And they have no intention of attempting to timeout the io read. We would be tempted on the Go Driver team to add a WithContext method to DownloadStream to accommodate this case. However, we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read. This could be an awkward situation.
I think simply having something like SetReadDeadline is the correct approach to the Download/Upload Stream objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GridFS API section of the CSOT spec actually describes the required behavior of the timeout param, which is basically "use the constructor context":
... all methods in the GridFS Bucket API MUST support the timeoutMS option. For methods that create streams (e.g.
open_upload_stream
), the option MUST cap the lifetime of the entire stream. ... Methods that interact with a user-provided stream (e.g.upload_from_stream
) MUST usetimeoutMS
as the timeout for the entire upload/download operation.
Concerning the comment
we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read
If we use the Context passed into the constructor, adding a new WithContext
method to a DownloadStream
doesn't seem like it would create a breaking change in API behavior.
For example, consider downloading a file with a 30 second timeout:
ctx, cancel := context.WithTimeout(context.Background(), 30 * time.Second)
defer cancel()
ds, _ := bucket.OpenDownloadStream(ctx, ...)
b, _ := io.ReadAll(ds)
Now consider opening a DownloadStream
with a 30 second timeout, but reading the file document(s) with no timeout:
ctx, cancel := context.WithTimeout(context.Background(), 30 * time.Second)
defer cancel()
ds, _ := bucket.OpenDownloadStream(ctx, ...)
ds.WithContext(context.Background())
b, _ := io.ReadAll(ds)
Is there an examples where those timeouts would conflict?
It's still not clear that there is a use case for having different timeout behavior for different underlying operations during a GridFS upload/download, so I still recommend omitting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewdale For this:
we couldn't simply revert the context associated with the constructor because that could break another user's logic that expects a timeout to be shared between construction and read
I agree that there wouldn't be a conflict if (1) we didn't revert the context on the constructor, and (2) (probably) the WithContext method returned a shallow copy of the DownloadStream. Consider this resolved.
I will update the code to include the requested changes, since it conforms to the specifications. But I also want to make it clear that my concern with this approach is that the context lifecycle begins at construction.
This issue is because we store context on the objects, which is an antipattern, and the documentation linked covers this exact case:
The caller’s lifetime is intermingled with a shared context, and the context is scoped to the lifetime where the Worker is created.
The docs also note that the only reason we should do this is for backwards-compatibility, which is not our issue in 2.x.
Unfortunately, if we want to time out the read operation, we have to do this. However, we can do it more modularly than at instantiation. WithContext
gives us more control over what precisely a timeout effects.
Notes:
The http packages NewRequestWithContext also notes this:
For an outgoing client request, the context controls the entire lifetime of a request and its response: obtaining a connection, sending the request, and reading the response headers and body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that we're basically using an antipattern, as described almost exactly in the "Storing context in structs leads to confusion" section of that "Contexts and structs" article. However, it's not significantly clearer if we provide a context via WithContext
on a DownloadStream
(I'd actually argue it's more confusing). It seems like we're designing an API to work around two problems:
io.Reader
andio.Writer
don't include contexts, so they basically have to be side-loaded for types that implement those interfaces. See an interesting proposal for adding contexts to those interfaces here.- The methods like
OpenDownloadStream
aren't an atomic "download a file" operation, but are currently the only way to accomplish downloading a file in the API described in the GridFS spec. That creates an conflict between Go Context best practices and the GridFS spec.
There's not much we can do about (1). However, we could separate the upload/download API into different methods, one supporting timeout and one not. For example, keep the existing methods with timeouts that only affect the initial operations but not the returned DownloadStream
(i.e. there is no way to time out Read
calls):
func (b *Bucket) OpenDownloadStream(ctx context.Context, fileID any) (*DownloadStream, error)
Then add additional methods for upload/download that apply the context to the entire operation.
func (b *Bucket) Download(ctx context.Context, dst io.WriterAt, fileID any) error
That deviates from the spec, but conforms more closely to Go Context best practices.
P.S. The Download
method signature is inspired by the AWS SDK's S3 Download method. The io.WriterAt
allows downloading multiple file chunks simultaneously. That's not something the GridFS spec covers, but that API would allow for optimization in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an offline conversation: The least surprising behavior is to make the Context apply to all I/O operations for a DownloadStream
or UploadStream
, which also matches the GridFS spec. The Download
method I proposed above was intended to provide context and is really out of scope of this ticket, so can be ignored. Consider this comment resolved.
mongo/gridfs/download_stream.go
Outdated
return nil | ||
// WithContext sets the context for the DownloadStream, allowing control over | ||
// the execution and behavior of operations associated with the stream. | ||
func (ds *DownloadStream) WithContext(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point about how the Context
in OpenDownloadStream
is only used to get the file info, but is not used for the subsequent ops that read the file info from the database. I think it's worth considering how similar APIs behave.
The Go stdlib offers a few examples of how to apply timeouts to stream reader types that implement io.Reader
. The patterns are similar for applying timeouts to stream writer types that implement io.Writer
.
The Go net.Conn
allows setting a read timeout via SetReadDeadline
or SetDeadline
.
conn, _ := net.Dial(...)
conn.SetReadDeadline(time.Now().Add(15 * time.Second))
// Will time out in 15 seconds
io.ReadAll(conn)
The Go http.Client
allows setting a timeout that applies to the entire lifetime of any request, including dialing, reading headers, and reading the body.
client := &http.Client{
Timeout: 15 * time.Second,
}
resp, err := client.Get(...)
// Will time out in 15 seconds.
io.ReadAll(http.Body)
Thoughts:
- Concerning using "read deadline" vs "context", all of the underlying APIs used by the GridFS code accept a
Context
(they're all just Go driver CRUD calls), so using aContext
seems to be the best choice. - I think accepting a
Context
inOpenDownloadStream
that is not used for actually downloading the file is confusing and would surprise most users. I recommend using theContext
passed toOpenDownloadStream
(andOpenDownloadStreamByName
) as theContext
on aDownloadStream
. - If we want to allow users to override the
Context
used when actually downloading the file, we can add aSetContext
method toDownloadStream
. However, it's not immediately clear if that is necessary, so I'd recommend omitting it for now.
@prestonvasquez can you please merge from |
API Change Report./mongo/gridfsincompatible changes(*Bucket).Delete: changed from func(interface{}) error to func(context.Context, interface{}) error |
mongo/gridfs/bucket.go
Outdated
opts ...*options.FindOneOptions, | ||
) (*DownloadStream, error) { | ||
result := b.filesColl.FindOne(ctx, filter, opts...) | ||
if err := result.Err(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Explicitly checking the error here is unnecessary because the same error will be returned when Decode
is called below. Consider handling both errors when calling Decode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
GODRIVER-2520
Summary
gridfs.Bucket
in favor of extending function signatures to includecontext.Context
DownloadStream.SetReadDeadline
withWithContext
UploadStream.SetWriteDeadline
withWithContext
Background & Motivation
The current api for many of the GridFS crud operations looks something like this:
The proposal of this ticket is to rework the logic to remove the setter and add a context to the Op() method:
gridfs.UploadStream
andgridfs.DownloadStream
are an io.Writer and io.Reader respectively. Both allow context timeouts in their read/write methods, whose signatures cannot be extended to comply with io. This PR suggests could renaming Set<> to WithContext for these structs to put make them slightly more Go-idiomatic (e.g. http(.