
How to benchmark datasets with a "10 times 10-fold cross-validation" evaluation procedure #515

Open
Innixma opened this issue Mar 17, 2023 · 3 comments


@Innixma (Collaborator) commented Mar 17, 2023

I tried benchmarking on task 168824, which uses a 10-times repeated 10-fold cross-validation evaluation procedure.

This requires 100 runs: 10 folds for each of the 10 repeats.

To do this, I made the following code edits: Innixma@994b92c

I tried setting a constraint to do this:

test100f:
  folds: 100
  max_runtime_seconds: 600
  cores: 4
  min_vol_size_mb: 100000

and created a benchmark yaml file:

---
# 10 times 10-fold Crossvalidation tasks


- name: Australian
  openml_task_id: 168824

However, once the benchmark reached folds beyond the first 10, I got the following error:

[INFO] [amlb:00:23:55.654] Running benchmark `constantpredictor` on `/s3bucket/user/benchmarks/test100f.yaml` framework in `local` mode.
[INFO] [amlb.frameworks.definitions:00:23:55.693] Loading frameworks definitions from ['/repo/resources/frameworks.yaml', '/s3bucket/user/frameworks.yaml'].
[INFO] [amlb.resources:00:23:56.971] Loading benchmark constraint definitions from ['/repo/resources/constraints.yaml', '/s3bucket/user/constraints.yaml'].
[INFO] [amlb.benchmarks.file:00:23:56.985] Loading benchmark definitions from /s3bucket/user/benchmarks/test100f.yaml.
[INFO] [amlb.job:00:23:56.988] 
---------------------------------------------------------------------
Starting job local.test100f.test100f.Australian.13.constantpredictor.
[INFO] [amlb.benchmark:00:23:56.991] Assigning 4 cores (total=4) for new task Australian.
[INFO] [amlb.utils.process:00:23:56.991] [MONITORING] [local.test100f.test100f.Australian.13.constantpredictor] CPU Utilization: 29.7%
[INFO] [amlb.utils.process:00:23:56.993] [MONITORING] [local.test100f.test100f.Australian.13.constantpredictor] Memory Usage: 4.1%
[INFO] [amlb.benchmark:00:23:56.994] Assigning 15086 MB (total=15734 MB) for new Australian task.
[WARNING] [amlb.benchmark:00:23:56.994] WARNING: Available storage (96248.359375 MB / total=99053.671875 MB) does not meet requirements (102048 MB)!
[INFO] [amlb.utils.process:00:23:56.994] [MONITORING] [local.test100f.test100f.Australian.13.constantpredictor] Disk Usage: 2.8%
[INFO] [root:00:23:56.994] Starting [get] request for the URL https://www.openml.org/api/v1/xml/task/168824
[INFO] [root:00:23:57.773] 0.7782946s taken for [get] request for the URL https://www.openml.org/api/v1/xml/task/168824
[INFO] [root:00:23:57.773] Starting [get] request for the URL https://www.openml.org/api/v1/xml/data/40981
[INFO] [root:00:23:58.392] 0.6187537s taken for [get] request for the URL https://www.openml.org/api/v1/xml/data/40981
[INFO] [root:00:23:58.393] Starting [get] request for the URL https://www.openml.org/api/v1/xml/data/features/40981
[INFO] [root:00:23:59.004] 0.6109660s taken for [get] request for the URL https://www.openml.org/api/v1/xml/data/features/40981
[INFO] [root:00:23:59.004] Starting [get] request for the URL https://api.openml.org/data/v1/download/18151910/Australian.arff
[INFO] [root:00:23:59.319] 0.3150778s taken for [get] request for the URL https://api.openml.org/data/v1/download/18151910/Australian.arff
[INFO] [urllib3.poolmanager:00:23:59.608] Redirecting http://openml1.win.tue.nl/dataset40981/dataset_40981.pq -> https://openml1.win.tue.nl:443/dataset40981/dataset_40981.pq
[INFO] [urllib3.poolmanager:00:24:00.179] Redirecting http://openml1.win.tue.nl/dataset40981/dataset_40981.pq -> https://openml1.win.tue.nl:443/dataset40981/dataset_40981.pq
[INFO] [root:00:24:00.391] Starting [get] request for the URL https://api.openml.org/api_splits/get/168824/Task_168824_splits.arff
[INFO] [root:00:24:01.088] 0.6965258s taken for [get] request for the URL https://api.openml.org/api_splits/get/168824/Task_168824_splits.arff
[ERROR] [amlb.job:00:24:01.292] Job `local.test100f.test100f.Australian.13.constantpredictor` failed with error: OpenML task 168824 only accepts `fold` < 10.
Traceback (most recent call last):
  File "/repo/amlb/job.py", line 92, in start
    self._setup()
  File "/repo/amlb/benchmark.py", line 518, in setup
    self.load_data()
  File "/repo/amlb/benchmark.py", line 486, in load_data
    self._dataset = Benchmark.data_loader.load(DataSourceType.openml_task, task_id=self._task_def.openml_task_id, fold=self.fold)
  File "/repo/amlb/datasets/__init__.py", line 21, in load
    return self.openml_loader.load(*args, **kwargs)
  File "/repo/amlb/utils/process.py", line 710, in profiler
    return fn(*args, **kwargs)
  File "/repo/amlb/datasets/openml.py", line 52, in load
    raise ValueError("OpenML task {} only accepts `fold` < {}.".format(task_id, nfolds))
ValueError: OpenML task 168824 only accepts `fold` < 10.

How do I make automlbenchmark work for tasks with repeated cross-validation?

@PGijsbers (Collaborator)

It's currently not supported. I'm not 100% sure of all the steps required, but a start would be to also extract the number of repeats from the split dimensions here, as in repeats, folds, _ = ... (openml-python docs), and use that in the data splitter. You can probably hack it so that a 10-repeated 10-fold task runs as if it were a 100-fold CV (though with overlap in splits) by adjusting only that file (e.g., encoding repeat and fold as dataset.fold // 10 and dataset.fold % 10, respectively).
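The decoding described above can be sketched as a small helper. This is a hypothetical sketch, not automlbenchmark's actual code: the `split_flat_fold` name is invented, while the openml-python calls shown in the comments (`get_split_dimensions`, `get_train_test_split_indices`) are the library's real API.

```python
def split_flat_fold(flat_fold: int, n_folds: int) -> tuple[int, int]:
    """Decode a flat fold index into (repeat, fold) for repeated CV."""
    return divmod(flat_fold, n_folds)

# How this could plug into the data loader (untested sketch):
#   task = openml.tasks.get_task(168824)
#   n_repeats, n_folds, _ = task.get_split_dimensions()      # e.g. (10, 10, 1)
#   repeat, fold = split_flat_fold(self.fold, n_folds)
#   train_idx, test_idx = task.get_train_test_split_indices(fold=fold, repeat=repeat)

# Flat fold 13 of a 10x10 task is repeat 1, inner fold 3 (0-indexed):
print(split_flat_fold(13, 10))
```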

For proper support I think you would need to be able to specify the repeat on invocation, and also take it into account when saving and processing the results.

@Innixma (Collaborator, Author) commented Mar 17, 2023

@PGijsbers Thanks! The hack seems pretty straightforward. Perhaps the hack is sufficient for an official implementation? I don't see a reason why we would need to treat a repeat any differently from a new fold. And if we want to figure out which repeat/fold a given flat fold corresponds to, we can reverse-engineer it from the task's evaluation procedure.

For example, if the task's evaluation procedure is "5 times 2-fold cross-validation", then we know fold 5 corresponds to the third repeat, second fold (with 0-indexing, "repeat 2, fold 1").
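The reverse-engineering above checks out arithmetically; a quick illustrative sketch of the mapping (hypothetical, just to show the indexing):

```python
# "5 times 2-fold cross-validation": n_repeats = 5, n_folds = 2, 10 flat folds total.
n_folds = 2
mapping = {flat: divmod(flat, n_folds) for flat in range(10)}

# Flat fold 5 lands in the third repeat, second fold (0-indexed: repeat 2, fold 1).
print(mapping[5])  # (2, 1)
```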

@PGijsbers (Collaborator)

Without it, a 100-fold CV would be indistinguishable from a 10-repeated 10-fold CV (or a 20-repeated 5-fold CV, or ...) without pulling the additional metadata on the estimation procedure from OpenML. I don't really like that. It might be sufficient to just add the information to the result file(s) and otherwise use the hack.

If that complicates things too much, I would consider using the hack as a temporary solution on the way to full support.
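Recording the decoded indices in the result rows, as suggested above, could be as simple as adding two columns next to the flat fold index. A hypothetical helper (the field names are illustrative, not automlbenchmark's actual result schema):

```python
def annotate_result(row: dict, n_folds: int) -> dict:
    """Return a copy of a result row with explicit repeat/inner-fold columns.

    Hypothetical sketch; automlbenchmark's real result schema may differ.
    """
    repeat, inner_fold = divmod(row["fold"], n_folds)
    return {**row, "repeat": repeat, "inner_fold": inner_fold}

row = {"task": "Australian", "fold": 13}
print(annotate_result(row, n_folds=10))
```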
