-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] - Support S3 Source Resumption #3570
Conversation
…y/trufflehog into feat-s3-source-resumption
96ae71f
to
094815a
Compare
5ed6639
to
b29a571
Compare
This would be a beneficial capability for other sources. e.g., resuming a large GitHub org scan. |
b29a571
to
f8cc40f
Compare
660f1cd
to
dd20664
Compare
a7dd656
to
aa167b7
Compare
375d817
to
fcdd7ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems pretty straightforward, although I would like to see a test case for (and handling of) the case where an in-progress bucket is ignored while the scan is stopped (described in an inline comment)
pkg/sources/s3/s3.go
Outdated
ctx.Logger().Error(err, "failed to get resume point") | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems drastic - what do you think of instead restarting the scan from the beginning when we can't get a resume point? I feel like we should err on the side of scanning too much rather than scanning too little.
i, | ||
len(bucketsToScan), | ||
fmt.Sprintf("Bucket: %s", bucket), | ||
s.Progress.EncodedResumeInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we re-using the existing resume info? I expected to see something from the progress tracker here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was essentially saying, "Don’t modify EncodeResumeInfo—it’s already been updated by progressTracker
, so just use it as-is." Since progressTracker
and s.Progress
reference the same underlying object, would it be clearer if we explicitly used s.progressTracker.Progress
instead?
pkg/sources/s3/s3.go
Outdated
) { | ||
for _, obj := range page.Contents { | ||
s.progressTracker.Reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't reset entirely, right? It just resets for a new page? I wish I'd been well enough to leave that comment on #3568 :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this was my mistake. I'm going to fix the tracker so it's named more accurately. here
It's no longer a progress tracker, it's pretty much just a checkpointer.
done |
Description:
This PR adds resumption to the S3 source.
Checklist:
make test-community
)?make lint
this requires golangci-lint)?