What does this change?
This adds the ability to pull multiple S3 objects in parallel.
resolves #11
How to test
Using an AWS profile with permission to download from wellcomecollection-editorial-photography...
PYTHONPATH=src python src/transferrer/download.py PBBD_TEST
The log output should show two "downloading" lines, followed by two "downloaded" lines.
How can we measure success?
Downloads of S3 folders should be faster. However, this is a premature optimisation, as I don't have a real-world example (i.e. one running on Lambda) to compare it against. Running on my computer, fetching EPOPTEST (not currently available), it seemed to shave off a few seconds.
Even so, I expect this to make it possible for us to tune the performance of the process.
Have we considered potential risks?
This adds a little complexity, which could cause the process to use significantly more memory and possibly fail. To mitigate this, the thread pool is constrained to a relatively low number (0.01 of the max threads available in Lambda).
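For reviewers who want a picture of the approach, here is a minimal sketch of a bounded thread pool fetching several objects at once. It is not the code in this PR: `download_all`, `fetch`, and the `MAX_WORKERS` value are hypothetical names and numbers chosen for illustration, and the log messages only imitate the "downloading"/"downloaded" lines described above.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)

# Hypothetical cap for illustration; the real limit is whatever this PR configures.
MAX_WORKERS = 10


def download_all(
    keys: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = MAX_WORKERS,
) -> List[str]:
    """Fetch every key with a bounded thread pool; results keep the input order."""

    def fetch_one(key: str) -> str:
        logger.info("downloading %s", key)
        result = fetch(key)
        logger.info("downloaded %s", key)
        return result

    # max_workers bounds how many fetches run at once, which limits
    # how much memory the downloads can hold concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises a worker's
        # exception when its result is reached.
        return list(pool.map(fetch_one, keys))
```

In this sketch `fetch` is any callable that takes an object key and returns something useful, such as a local path. In practice it would presumably wrap an S3 client call, but that wiring is an assumption and is left out so the example runs without AWS credentials.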