Initialization seed loop in BaseStrategy #395
Replies: 4 comments
-
At the moment we do not support hyperparameter selection or any sophisticated accumulation of metric results (the user has to implement both from scratch). We should think about how to support them better (maybe using Orion?). I'm sure @mmasana may offer more insights on this issue :)
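For reference, a minimal sketch of what that from-scratch metric accumulation could look like today; `accumulate` is a hypothetical helper, not part of Avalanche:

```python
from collections import defaultdict
import statistics

def accumulate(metric_dicts):
    """Aggregate per-run {metric_name: value} dicts (one per run)
    into {metric_name: (mean, std)}."""
    grouped = defaultdict(list)
    for metrics in metric_dicts:
        for name, value in metrics.items():
            grouped[name].append(value)
    return {
        name: (statistics.mean(vals),
               statistics.stdev(vals) if len(vals) > 1 else 0.0)
        for name, vals in grouped.items()
    }
```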
-
We used a GridSearch system similar to the one in the TIL survey with @Mattdl. However, we implemented it ourselves, so using something like Orion could be interesting to make it more standardized. Defining, for each method, which hyperparameters are important to search and which are more or less fixed can be a rabbit hole, though. I would be in favour of having some kind of accumulation of statistics over runs with the same parameters but different seeds. As of now, we solve it as mentioned: each job runs multiple seeds (in parallel or in series, depending on resources). Then we have separate scripts that read the output logs and generate graphs, tables, or whatever else is needed.
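To make that workflow concrete, here is a rough sketch of the launch side, assuming a user-supplied `run_experiment(params, seed)` callable (hypothetical, not an Avalanche or Orion API) that returns a dict of final metrics:

```python
import json
from itertools import product
from pathlib import Path

def launch(run_experiment, param_grid, seeds, out_dir="results"):
    """Run every grid configuration with every seed,
    dumping one JSON log per run for later aggregation."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        tag = "_".join(f"{k}={v}" for k, v in params.items())
        for seed in seeds:
            metrics = run_experiment(params, seed)
            with open(out / f"{tag}_seed{seed}.json", "w") as f:
                json.dump({"params": params, "seed": seed,
                           "metrics": metrics}, f)
```

A separate script can then glob `results/*.json`, group runs by `params`, and compute per-configuration mean/std over seeds to produce the graphs and tables.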
-
Yes, I agree. I think the ad-hoc solution might be the easiest, and it would certainly be useful!
-
I would keep training and model selection separated, like scikit-learn does; something like the sketch below. Keep in mind that, as of now, we don't have a method to split data streams yet, so you can't separate the train stream into train and validation portions. This will be implemented soon (hopefully).
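A hypothetical sketch of that separation; none of these names exist in Avalanche, and `make_strategy` and `score_fn` are user-supplied callables:

```python
from itertools import product

def grid_search(make_strategy, score_fn, param_grid, train_stream, val_stream):
    """scikit-learn-style model selection: train on the train stream,
    select hyperparameters purely on a held-out validation stream."""
    best_params, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        strategy = make_strategy(params)        # fresh model per candidate
        for experience in train_stream:         # training step
            strategy.train(experience)
        score = score_fn(strategy, val_stream)  # selection step, kept separate
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```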
-
CL experiments often don't take very long. On compute clusters it's easier to schedule one longer job (e.g. a single job that runs all 5 seeds) than to wait for 5 separate jobs to be scheduled.
Can this be integrated in the BaseStrategy? And is it possible to accumulate evaluation statistics over different seeds?
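This isn't an existing BaseStrategy feature, but as a sketch, an outer seed loop inside a single job could be as simple as the following, where `run_fn(seed)` is a hypothetical user-supplied callable that builds and trains one strategy and returns a dict of final metrics:

```python
import random
import statistics

import numpy as np
import torch

def set_all_seeds(seed):
    """Seed every RNG a typical Avalanche/PyTorch run depends on."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def run_seeds(run_fn, seeds=(0, 1, 2, 3, 4)):
    """Run one configuration over several seeds in a single job
    and report the mean/std of each final metric."""
    per_seed = []
    for seed in seeds:
        set_all_seeds(seed)
        per_seed.append(run_fn(seed))
    return {
        name: (statistics.mean(vals), statistics.stdev(vals))
        for name in per_seed[0]
        for vals in [[m[name] for m in per_seed]]
    }
```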