Add option to skip downloading output from S3 to local for AWS runs #578

Innixma · 2023-07-21T21:04:58Z

Currently for AWS runs, the results dir of each task is downloaded from S3 to the local machine that executed the AWS run.

With large-scale runs and additional metadata, these downloads can become very large (multiple terabytes), causing the host machine to run out of disk space and risking network errors or bandwidth limits.

It would be nice to be able to skip the local download while still having the worker nodes save the files to S3. I mostly work with the S3 files directly for post-run aggregation, which is generally more convenient.

PGijsbers · 2023-07-21T21:50:35Z

Would welcome a PR. I propose to make this configurable by adding a parameter to the aws namespace in the configuration (e.g., aws.download). Ideally it would support three options:

All: downloads all files, current behavior, should remain default
Results: download only result files (the _download_results function internally already identifies those, so it should be easy to filter?)
None: don't download files

I don't know off the top of my head whether downloading the results file is always required for the remainder of the logic to work (to know whether a task has finished). If it is, then None could simply not save it to disk (or, if there's a non-invasive way to let the task finish without downloading the file, that would work too).
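
To make the three proposed options concrete, here is a rough sketch of how the filtering might look. It is not the project's code: `DownloadMode`, `select_keys`, and the `results.csv` suffix check are all hypothetical names and assumptions for illustration, and the real `_download_results` function may identify result files differently.

```python
from enum import Enum
from typing import Iterable, List


class DownloadMode(Enum):
    """Hypothetical values for a setting such as `aws.download`."""
    ALL = "all"          # download everything (current behavior, default)
    RESULTS = "results"  # download only result files
    NONE = "none"        # download nothing; files stay on S3


def select_keys(keys: Iterable[str], mode: DownloadMode,
                result_suffix: str = "results.csv") -> List[str]:
    """Return the S3 keys that should be downloaded for the given mode.

    Assumption: result files can be recognized by their key suffix.
    """
    if mode is DownloadMode.ALL:
        return list(keys)
    if mode is DownloadMode.RESULTS:
        return [key for key in keys if key.endswith(result_suffix)]
    return []  # DownloadMode.NONE
```

A caller could build the mode from the configured string, e.g. `DownloadMode("results")`, and pass only the selected keys to the existing download logic. Whether `NONE` can skip the results file entirely depends on the open question above about task completion.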

PGijsbers added the aws AWS support label Jul 25, 2023
