
How to benchmark datasets with a "10 times 10-fold cross-validation" evaluation procedure #515

Open
Innixma opened this issue Mar 17, 2023 · 3 comments


@Innixma (Collaborator) commented Mar 17, 2023

I tried benchmarking on task 168824, which uses a 10-times repeated 10-fold cross-validation evaluation procedure.

This requires 100 runs: 10 folds for each of the 10 repeats.

To do this, I made the following code edits: Innixma@994b92c

I tried setting a constraint to do this:

test100f:
  folds: 100
  max_runtime_seconds: 600
  cores: 4
  min_vol_size_mb: 100000

and created a benchmark yaml file:

---
# 10 times 10-fold Crossvalidation tasks


- name: Australian
  openml_task_id: 168824

However, once the benchmark reached folds beyond the first 10, I got the following error:

[INFO] [amlb:00:23:55.654] Running benchmark `constantpredictor` on `/s3bucket/user/benchmarks/test100f.yaml` framework in `local` mode.
[INFO] [amlb.frameworks.definitions:00:23:55.693] Loading frameworks definitions from ['/repo/resources/frameworks.yaml', '/s3bucket/user/frameworks.yaml'].
[INFO] [amlb.resources:00:23:56.971] Loading benchmark constraint definitions from ['/repo/resources/constraints.yaml', '/s3bucket/user/constraints.yaml'].
[INFO] [amlb.benchmarks.file:00:23:56.985] Loading benchmark definitions from /s3bucket/user/benchmarks/test100f.yaml.
[INFO] [amlb.job:00:23:56.988] 
---------------------------------------------------------------------
Starting job local.test100f.test100f.Australian.13.constantpredictor.
[INFO] [amlb.benchmark:00:23:56.991] Assigning 4 cores (total=4) for new task Australian.
[INFO] [amlb.utils.process:00:23:56.991] [MONITORING] [local.test100f.test100f.Australian.13.constantpredictor] CPU Utilization: 29.7%
[INFO] [amlb.utils.process:00:23:56.993] [MONITORING] [local.test100f.test100f.Australian.13.constantpredictor] Memory Usage: 4.1%
[INFO] [amlb.benchmark:00:23:56.994] Assigning 15086 MB (total=15734 MB) for new Australian task.
[WARNING] [amlb.benchmark:00:23:56.994] WARNING: Available storage (96248.359375 MB / total=99053.671875 MB) does not meet requirements (102048 MB)!
[INFO] [amlb.utils.process:00:23:56.994] [MONITORING] [local.test100f.test100f.Australian.13.constantpredictor] Disk Usage: 2.8%
[INFO] [root:00:23:56.994] Starting [get] request for the URL https://www.openml.org/api/v1/xml/task/168824
[INFO] [root:00:23:57.773] 0.7782946s taken for [get] request for the URL https://www.openml.org/api/v1/xml/task/168824
[INFO] [root:00:23:57.773] Starting [get] request for the URL https://www.openml.org/api/v1/xml/data/40981
[INFO] [root:00:23:58.392] 0.6187537s taken for [get] request for the URL https://www.openml.org/api/v1/xml/data/40981
[INFO] [root:00:23:58.393] Starting [get] request for the URL https://www.openml.org/api/v1/xml/data/features/40981
[INFO] [root:00:23:59.004] 0.6109660s taken for [get] request for the URL https://www.openml.org/api/v1/xml/data/features/40981
[INFO] [root:00:23:59.004] Starting [get] request for the URL https://api.openml.org/data/v1/download/18151910/Australian.arff
[INFO] [root:00:23:59.319] 0.3150778s taken for [get] request for the URL https://api.openml.org/data/v1/download/18151910/Australian.arff
[INFO] [urllib3.poolmanager:00:23:59.608] Redirecting http://openml1.win.tue.nl/dataset40981/dataset_40981.pq -> https://openml1.win.tue.nl:443/dataset40981/dataset_40981.pq
[INFO] [urllib3.poolmanager:00:24:00.179] Redirecting http://openml1.win.tue.nl/dataset40981/dataset_40981.pq -> https://openml1.win.tue.nl:443/dataset40981/dataset_40981.pq
[INFO] [root:00:24:00.391] Starting [get] request for the URL https://api.openml.org/api_splits/get/168824/Task_168824_splits.arff
[INFO] [root:00:24:01.088] 0.6965258s taken for [get] request for the URL https://api.openml.org/api_splits/get/168824/Task_168824_splits.arff
[ERROR] [amlb.job:00:24:01.292] Job `local.test100f.test100f.Australian.13.constantpredictor` failed with error: OpenML task 168824 only accepts `fold` < 10.
Traceback (most recent call last):
  File "/repo/amlb/job.py", line 92, in start
    self._setup()
  File "/repo/amlb/benchmark.py", line 518, in setup
    self.load_data()
  File "/repo/amlb/benchmark.py", line 486, in load_data
    self._dataset = Benchmark.data_loader.load(DataSourceType.openml_task, task_id=self._task_def.openml_task_id, fold=self.fold)
  File "/repo/amlb/datasets/__init__.py", line 21, in load
    return self.openml_loader.load(*args, **kwargs)
  File "/repo/amlb/utils/process.py", line 710, in profiler
    return fn(*args, **kwargs)
  File "/repo/amlb/datasets/openml.py", line 52, in load
    raise ValueError("OpenML task {} only accepts `fold` < {}.".format(task_id, nfolds))
ValueError: OpenML task 168824 only accepts `fold` < 10.

How do I make automlbenchmark work for tasks with repeated cross-validation?

@PGijsbers (Collaborator)

It's currently not supported. I'm not 100% sure of all the steps required, but a start would be to also extract the number of repeats from the split dimensions here, as in repeats, folds, _ = ... (openml-python docs), and use that in the data splitter. You can probably hack it so that a 10-repeated 10-fold task runs as if it were a 100-fold CV (though with overlap in splits) by adjusting only that file (e.g., encoding repeat and fold as dataset.fold // 10 and dataset.fold % 10, respectively).
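The decoding described above can be sketched as a small helper. This is a hypothetical sketch, not automlbenchmark's actual code: the `split_flat_fold` name is invented, while the openml-python calls shown in the comments (`get_split_dimensions`, `get_train_test_split_indices`) are the library's real API.

```python
def split_flat_fold(flat_fold: int, n_folds: int) -> tuple[int, int]:
    """Decode a flat fold index into (repeat, fold) for repeated CV."""
    return divmod(flat_fold, n_folds)

# How this could plug into the data loader (untested sketch):
#   task = openml.tasks.get_task(168824)
#   n_repeats, n_folds, _ = task.get_split_dimensions()      # e.g. (10, 10, 1)
#   repeat, fold = split_flat_fold(self.fold, n_folds)
#   train_idx, test_idx = task.get_train_test_split_indices(fold=fold, repeat=repeat)

# Flat fold 13 of a 10x10 task is repeat 1, inner fold 3 (0-indexed):
print(split_flat_fold(13, 10))
```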

For proper support I think you would need to be able to specify the repeat on invocation, and also take it into account when saving and processing the results.

@Innixma (Collaborator, Author) commented Mar 17, 2023

@PGijsbers Thanks! The hack seems pretty straightforward. Perhaps the hack is sufficient for an official implementation? I don't see a reason why we would need to treat a repeat any differently from a new fold. And if we want to figure out which repeat/fold a given flat fold corresponds to, we can reverse-engineer it from the task's evaluation procedure.

For example, if the task's evaluation procedure is "5 times 2-fold cross-validation", then we know fold 5 corresponds to the third repeat, second fold (with 0-indexing, "repeat 2, fold 1").
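The reverse-engineering above checks out arithmetically; a quick illustrative sketch of the mapping (hypothetical, just to show the indexing):

```python
# "5 times 2-fold cross-validation": n_repeats = 5, n_folds = 2, 10 flat folds total.
n_folds = 2
mapping = {flat: divmod(flat, n_folds) for flat in range(10)}

# Flat fold 5 lands in the third repeat, second fold (0-indexed: repeat 2, fold 1).
print(mapping[5])  # (2, 1)
```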

@PGijsbers (Collaborator)

Without it, a 100-fold CV would be indistinguishable from a 10-repeated 10-fold CV (or a 20-repeated 5-fold CV, or ...) without pulling the additional metadata on the estimation procedure from OpenML. I don't really like that. It might be sufficient to just add the information to the result file(s) and otherwise use the hack.

If that complicates things too much, I would consider using the hack as a temporary solution on the way to full support.
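Recording the decoded indices in the result rows, as suggested above, could be as simple as adding two columns next to the flat fold index. A hypothetical helper (the field names are illustrative, not automlbenchmark's actual result schema):

```python
def annotate_result(row: dict, n_folds: int) -> dict:
    """Return a copy of a result row with explicit repeat/inner-fold columns.

    Hypothetical sketch; automlbenchmark's real result schema may differ.
    """
    repeat, inner_fold = divmod(row["fold"], n_folds)
    return {**row, "repeat": repeat, "inner_fold": inner_fold}

row = {"task": "Australian", "fold": 13}
print(annotate_result(row, n_folds=10))
```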
