Add toggle to disable results/backup/ files in AWS mode #586

Open
Innixma opened this issue Aug 24, 2023 · 1 comment

@Innixma
Collaborator

Innixma commented Aug 24, 2023

I am running large-scale benchmarks in AWS mode and finding that the files saved in results/backup/ take up significant space: more than 1 TB in total, which causes the host machine to run out of disk during the benchmark run.

Where in the code are these files being specified and how can I disable them? Are they necessary for anything? I would assume not.

The problem is that each file in backup is a CSV that concatenates all benchmark results collected so far, so total disk usage grows as N^2, where N is the number of instances being spun up (in my case, N > 20,000).

As an example:

-rw-rw-r-- 1 ubuntu ubuntu 108684498 Aug 24 17:34 results.20230824T173419.csv
-rw-rw-r-- 1 ubuntu ubuntu 108687827 Aug 24 17:34 results.20230824T173421.csv
-rw-rw-r-- 1 ubuntu ubuntu 108690007 Aug 24 17:34 results.20230824T173440.csv
-rw-rw-r-- 1 ubuntu ubuntu 108694343 Aug 24 17:34 results.20230824T173442.csv
-rw-rw-r-- 1 ubuntu ubuntu 108696534 Aug 24 17:34 results.20230824T173445.csv
-rw-rw-r-- 1 ubuntu ubuntu 108700835 Aug 24 17:34 results.20230824T173447.csv
-rw-rw-r-- 1 ubuntu ubuntu 108702942 Aug 24 17:34 results.20230824T173451.csv
-rw-rw-r-- 1 ubuntu ubuntu 108705127 Aug 24 17:35 results.20230824T173500.csv
-rw-rw-r-- 1 ubuntu ubuntu 108709478 Aug 24 17:35 results.20230824T173506.csv
-rw-rw-r-- 1 ubuntu ubuntu 108711667 Aug 24 17:35 results.20230824T173509.csv
-rw-rw-r-- 1 ubuntu ubuntu 108715990 Aug 24 17:35 results.20230824T173512.csv
-rw-rw-r-- 1 ubuntu ubuntu 108718171 Aug 24 17:35 results.20230824T173516.csv
-rw-rw-r-- 1 ubuntu ubuntu 108720361 Aug 24 17:35 results.20230824T173521.csv
-rw-rw-r-- 1 ubuntu ubuntu 108722544 Aug 24 17:35 results.20230824T173524.csv
-rw-rw-r-- 1 ubuntu ubuntu 108724739 Aug 24 17:35 results.20230824T173526.csv
-rw-rw-r-- 1 ubuntu ubuntu 108726929 Aug 24 17:35 results.20230824T173528.csv
-rw-rw-r-- 1 ubuntu ubuntu 108729124 Aug 24 17:35 results.20230824T173531.csv
-rw-rw-r-- 1 ubuntu ubuntu 108731314 Aug 24 17:35 results.20230824T173546.csv
-rw-rw-r-- 1 ubuntu ubuntu 108733514 Aug 24 17:35 results.20230824T173549.csv
-rw-rw-r-- 1 ubuntu ubuntu 108735701 Aug 24 17:36 results.20230824T173558.csv
-rw-rw-r-- 1 ubuntu ubuntu 108737904 Aug 24 17:36 results.20230824T173610.csv
-rw-rw-r-- 1 ubuntu ubuntu 108740102 Aug 24 17:36 results.20230824T173627.csv
-rw-rw-r-- 1 ubuntu ubuntu 108743879 Aug 24 17:36 results.20230824T173633.csv
-rw-rw-r-- 1 ubuntu ubuntu 108746069 Aug 24 17:36 results.20230824T173635.csv
-rw-rw-r-- 1 ubuntu ubuntu 108748269 Aug 24 17:36 results.20230824T173638.csv
-rw-rw-r-- 1 ubuntu ubuntu 108752563 Aug 24 17:36 results.20230824T173643.csv

Around 10 of these files are written per minute, each larger than the last (currently about 108 MB per file), so roughly 1 GB of disk space is consumed per minute.
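
To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch. It assumes every new result triggers one full snapshot and that each result adds a fixed number of bytes. The figure of about 2,200 bytes per result is only an estimate read off the size differences in the listing above, and both function names are made up for illustration.

```python
def total_backup_bytes(n_results: int, bytes_per_result: int) -> int:
    """Total bytes used if a full snapshot is written after every result.

    The k-th snapshot holds k results, so the total is
    bytes_per_result * (1 + 2 + ... + n) = bytes_per_result * n * (n + 1) / 2.
    """
    return bytes_per_result * n_results * (n_results + 1) // 2


def single_file_bytes(n_results: int, bytes_per_result: int) -> int:
    """Bytes used if all results live in one file (linear growth)."""
    return bytes_per_result * n_results


if __name__ == "__main__":
    n = 20_000   # lower bound on the instance count mentioned above
    row = 2_200  # ASSUMED bytes per result, estimated from the listing
    print(f"snapshots: {total_backup_bytes(n, row) / 1e9:.1f} GB")
    print(f"one file:  {single_file_bytes(n, row) / 1e9:.3f} GB")
```

Under these assumptions the snapshots add up to roughly 440 GB for 20,000 results, against about 44 MB for a single results file, which is the same order of magnitude as the disk usage I am seeing.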

@PGijsbers
Collaborator

The backup should be made in amlb/results.py#L112, called from the Benchmark, if I am not mistaken. Having an option to call save with append=True should be all it takes.
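
As a generic illustration of the difference between rewriting a full snapshot and appending, here is a minimal standard-library sketch. It is not the amlb code: `save_rows`, `count_lines`, and the `FIELDS` columns are hypothetical names chosen for the example, and the real save method may behave differently.

```python
import csv
import os
from typing import Dict, Iterable

FIELDS = ["task", "fold", "result"]  # hypothetical columns, for illustration only


def save_rows(path: str, rows: Iterable[Dict[str, object]], append: bool = False) -> None:
    """Write rows to a CSV file.

    append=False rewrites the whole file, as a full snapshot would.
    append=True adds only the new rows and writes the header once,
    when the file is missing or empty.
    """
    write_header = (
        not append or not os.path.exists(path) or os.path.getsize(path) == 0
    )
    mode = "a" if append else "w"
    with open(path, mode, newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def count_lines(path: str) -> int:
    """Count the lines in a file, header included."""
    with open(path, newline="") as f:
        return sum(1 for _ in f)
```

With appending, each result costs only its own row, so disk usage grows linearly with the number of results rather than quadratically.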

In the meantime, you could disable results.global_save. Then no results/results.csv will be written at all, which should also mean no backup is made.
