exp: Autogenerated names from modified parameter values #7877

iesahin · 2021-05-27T09:47:55Z

iesahin
May 27, 2021

Currently, the autogenerated experiment names are hashes produced from the pipeline elements. They take the form exp-12ab90, and having two different hashes for an experiment may be confusing at times (see iterative/dvc.org#2499).

Is it possible to set these experiment names from modified parameter names? For example, dvc exp run -S param.name=value could produce an experiment name like exp-param-name-value. It may be a bit long, especially for multiple parameters, but I think being able to understand the experiment from its name is worth the longer description.

There could be a length limit, e.g., 20 characters for the param section, and a 4-character hash value could be added to the end. Overall, the names could look like exp-param-name-value-1a2c, or it may be possible to skip the exp- part as well. Such names are descriptive and unique enough to use with autocompletion.

Current behavior may be the default if there is no -S option.
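
A minimal sketch of the proposed scheme in Python. The helper name `param_exp_name`, the slug rules, and the choice of SHA-1 over the sorted overrides are assumptions for illustration only and do not reflect how DVC generates names.

```python
import hashlib
import re


def param_exp_name(overrides, max_len=20, hash_len=4, prefix="exp"):
    """Build a name such as exp-param-name-value-1a2c from -S style overrides.

    Hypothetical helper: `overrides` maps parameter names to their new values.
    """
    if not overrides:
        # No -S option given: let the caller fall back to the current behavior.
        return None
    # Sort so the same set of overrides always yields the same name.
    items = sorted((str(k), str(v)) for k, v in overrides.items())
    raw = "-".join(f"{k}-{v}" for k, v in items)
    # Replace every run of non-alphanumeric characters with a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-")
    # Enforce the length limit on the param section only.
    slug = slug[:max_len].rstrip("-")
    # Hash the untruncated overrides so names that truncate alike can still differ.
    digest = hashlib.sha1(repr(items).encode()).hexdigest()[:hash_len]
    return "-".join(part for part in (prefix, slug, digest) if part)


print(param_exp_name({"param.name": "value"}))
```

A 4-character hash keeps the name short but only makes collisions unlikely, so a real implementation would still need a uniqueness check.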

dberenbaum · 2021-05-27T19:14:45Z

dberenbaum
May 27, 2021
Collaborator

🤔 It's an interesting idea. I'm not sure it's a high priority, but having more human-readable names seems like it would be nice. There are at least a couple complications, like the fact that experiment names should be unique, which this doesn't guarantee, and there can be multiple -S/--set-param values passed, but there are probably workarounds.

One potential downside is that if a user is tuning the same parameter across many experiments, the first part of each experiment name will look the same, which could make the table hard to read and mean that more characters are needed to apply or otherwise reference a specific experiment.
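
To make the readability concern concrete, here is a small Python sketch that computes how many leading characters are needed to tell each name apart from the others. The function name and the sample experiment names are made up for illustration.

```python
def shortest_unique_prefixes(names):
    """Map each name to the shortest prefix that no other name starts with."""
    result = {}
    for name in names:
        others = [n for n in names if n != name]
        for length in range(1, len(name) + 1):
            candidate = name[:length]
            if not any(other.startswith(candidate) for other in others):
                result[name] = candidate
                break
        else:
            # The whole name is a prefix of another name, so it cannot be shortened.
            result[name] = name
    return result


hashed = shortest_unique_prefixes(["exp-1a2b3", "exp-9f8e7"])
tuned = shortest_unique_prefixes(["exp-lr=0.001-1a2c", "exp-lr=0.01-9f8e"])
print(hashed)
print(tuned)
```

With the hash-style names, 5 characters are enough to disambiguate in this example, while the two names that tune the same parameter need 11.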

shcheklein · 2021-05-27T19:33:09Z

shcheklein
May 27, 2021
Maintainer

Also, a quick note: experiments are not always about changing parameter values. They are also about changing data and/or code. (This is not reflected well in the table and should be improved, but that's a separate story.)

Btw, how do we guarantee uniqueness now if I modify a comment in a Python file that is not even part of the pipeline and run an experiment again?

karajan1001 · 2021-05-28T08:22:44Z

karajan1001
May 28, 2021

I had also considered this in the past, but didn't find a good answer.

The main difficulty is the names themselves: good, representative names for each experiment are not easy to come up with. There are several ways to approach it:

1. Let users give names, but they may not be willing to do so.
2. Autogenerate names by some rules. This is quite hard because there might be several parameter differences between experiments. Which one is the most important? To make things worse, the most representative name might change after the user runs some additional experiments.
3. Train a naming model for this task, and let users amend its output. (I recommend this one.)

iesahin · 2021-05-28T19:47:41Z

iesahin
May 28, 2021
Author

IMHO some basic heuristics may work:

- I don't think users will change more than two parameters per experiment. These can be considered exceptions.
- Any pipeline changes in dvc.yaml files can also be detected, but experiment management will probably rely more on --set-param than on updating the pipeline. For a parameter of the form model.param=value, the experiment name could be like exp-param=value-66dc. For two parameters, it could be like exp-param1=value1-param2=value2-4cc3, and for more than two parameters, only the suffix hash value would change. The user is expected to make a decision on these experiments and run dvc exp apply before proceeding.
- Changes in code and data files can be tracked by the first 4 characters of their hash value. For example, if the user modified src/mymodel.py and it has a new hash value, the new experiment name could be similar to exp-mymodel-12cd-77db, with mymodel-12cd reflecting the new hash value of the file.
- It may not be feasible to track code/text files not included in dvc.yaml.

We have pipeline/parameter elements from HEAD and we also have corresponding elements in the current experiment, either modified by --set-param or by manually changing the DVC files. It should be possible to get a diff between these.
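
The heuristics above can be sketched in Python roughly as follows. Everything here is hypothetical: the function name, the shape of the inputs, and the assumption that a diff against HEAD is already available as plain dictionaries.

```python
from pathlib import PurePosixPath


def heuristic_exp_name(changed_params, changed_files, suffix, max_params=2):
    """Compose a name from changed params and files, ending in a short hash.

    Hypothetical inputs:
      changed_params: {"model.param": "value"} for params that differ from HEAD
      changed_files:  {"src/mymodel.py": "12cd34..."} mapping path to new hash
      suffix:         short experiment hash, e.g. "66dc"
    """
    parts = ["exp"]
    # With more than `max_params` changes, leave params out so only the suffix varies.
    if len(changed_params) <= max_params:
        for key, value in changed_params.items():
            short_key = key.rsplit(".", 1)[-1]  # model.param -> param
            parts.append(f"{short_key}={value}")
    for path, file_hash in changed_files.items():
        stem = PurePosixPath(path).stem  # src/mymodel.py -> mymodel
        parts.append(f"{stem}-{file_hash[:4]}")
    parts.append(suffix)
    return "-".join(parts)


print(heuristic_exp_name({"model.param": "value"}, {}, "66dc"))
print(heuristic_exp_name({}, {"src/mymodel.py": "12cd5678"}, "77db"))
```

Dropping the `model.` part of the key keeps names short, at the cost of ambiguity when two sections share a parameter name.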

iesahin · 2021-05-28T19:49:32Z

iesahin
May 28, 2021
Author

> Btw, how do we guarantee uniqueness now if I modify a comment in a python file that is not even part of the pipeline and run an experiment again?

That's a good question, but if something is not part of the pipeline, it's probably out of scope for change detection. I'll test this.

karajan1001 · 2021-05-30T04:01:08Z

karajan1001
May 30, 2021

> I don't think users will change more than two parameters per experiment. These can be considered exceptions.

Yes, we do not change several parameters at one time, but we might change param(a) in the first exp, then param(b) in the second, and finally, the last exp might have several changed params. This is what I have done in the past.

dberenbaum · 2021-05-31T01:45:09Z

dberenbaum
May 31, 2021
Collaborator

I don't think improved naming requires having a solution for every type of experiment. As @iesahin mentioned, current behavior could still be the default in edge cases.

I'd be more concerned whether any solution that requires checking for uniqueness could slow things down.

For comparison, wandb chooses random human-readable names like valiant-oath-1 presumably because they are easier to read and remember than random strings.
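
A minimal Python sketch of this style of naming. The word lists and the helper are invented for illustration, and nothing here is taken from how wandb actually generates its names.

```python
import random

# Tiny illustrative word lists; a real tool would ship much larger ones.
ADJECTIVES = ["valiant", "persevering", "quiet", "amber"]
NOUNS = ["oath", "plum", "river", "falcon"]


def random_readable_name(counter, rng=random):
    """Return a name such as valiant-oath-1; `counter` keeps names distinct."""
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}-{counter}"


print(random_readable_name(1))
```

Because the trailing counter increases with every run, uniqueness within one repository does not depend on the random words at all.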

shcheklein · 2021-05-31T02:03:08Z

shcheklein
May 31, 2021
Maintainer

A few questions:

- Do we want (and do we guarantee now) names to be globally unique (e.g., if multiple people start pushing/pulling them)?
- How do we currently generate names if pipeline files stay the same (assuming other changes in the workspace)?

> For comparison, wandb chooses random human-readable names like valiant-oath-1 presumably because they are easier to read and remember than random strings.

I like the idea of using random human-readable names. An alternative here could be something like what Sentry does. It names errors like this:

error-A
error-B
..
..
..
error-AB

It keeps the suffix simple and short. Since they don't expect thousands of them, it works nicely.
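
A short Python sketch of such a letter suffix, where A to Z is followed by AA, AB, and so on (bijective base 26). Whether Sentry uses exactly this scheme is an assumption, and the function names are made up.

```python
def letter_suffix(index):
    """Convert 1, 2, ..., 26, 27, 28 into A, B, ..., Z, AA, AB."""
    if index < 1:
        raise ValueError("index must be >= 1")
    letters = []
    while index > 0:
        # Subtracting 1 first makes 26 map to Z instead of rolling over.
        index, remainder = divmod(index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def next_exp_name(existing_count, prefix="exp"):
    """Name the next experiment given how many already exist."""
    return f"{prefix}-{letter_suffix(existing_count + 1)}"


print(next_exp_name(0))   # exp-A
print(next_exp_name(27))  # exp-AB
```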

iesahin · 2021-05-31T12:57:50Z

iesahin
May 31, 2021
Author

> Yes, we do not change several parameters at one time, but we might change param(a) in the first exp, then param(b) in the second, and finally, the last exp might have several changed params.

This is a good point.

Do the results of these two differ?

1:

dvc exp run -n exp1 -S param1=value1
dvc exp apply exp1
dvc exp run -n exp2 -S param2=value2

2:

dvc exp run -n exp -S param1=value1 -S param2=value2

As far as I know, exp and exp2 are identical. However, if we remove dvc exp apply from the first one, they are different. In that case, param1 of exp2 is identical to param1 of HEAD.

So for naming, it might be better to track differences in params.yaml against HEAD instead of the -S parameters.

In this case, you are right @karajan1001: the user may be changing more than 2 parameters, but I think it may still be possible to get the two most recently changed parameters.
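
A rough Python sketch of diffing parameters against HEAD, using the scenarios above. It assumes both versions of the params file have already been loaded into dictionaries. The helper names and the sample values are hypothetical.

```python
_MISSING = object()


def flatten(params, parent=""):
    """Flatten nested params into dotted keys, e.g. {"model": {"units": 1}} -> {"model.units": 1}."""
    flat = {}
    for key, value in params.items():
        dotted = f"{parent}.{key}" if parent else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def changed_params(head, workspace):
    """Return dotted keys whose values differ from HEAD (removed keys map to None)."""
    old, new = flatten(head), flatten(workspace)
    return {
        key: new.get(key)
        for key in sorted(old.keys() | new.keys())
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    }


# Made-up values for the scenarios discussed above.
head = {"param1": "value0", "param2": "value0"}
apply_then_exp2 = {"param1": "value1", "param2": "value2"}     # exp1 applied, then exp2
single_run = {"param1": "value1", "param2": "value2"}          # one run with two -S flags
exp2_without_apply = {"param1": "value0", "param2": "value2"}  # exp1 never applied

print(changed_params(head, apply_then_exp2))
print(changed_params(head, exp2_without_apply))
```

Diffing against HEAD gives the same result for the first two scenarios, while the third reports only param2, which matches the behavior described above.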

iesahin · 2021-05-31T13:08:32Z

iesahin
May 31, 2021
Author

> I'd be more concerned whether any solution that requires checking for uniqueness could slow things down.

Uniqueness is already checked at the end, when the generated experiment name coincides with the directory name. I don't know the implementation details but it shouldn't have an effect on speed.

> For comparison, wandb chooses random human-readable names like valiant-oath-1 presumably because they are easier to read and remember than random strings.

Docker uses such names for containers as well, but from the user's POV, I think having experiments named like `units=512-activation=tanh` is more desirable than persevering-plum-11. The latter is certainly easier to implement, though.

JamesQuirk · 2021-09-22T10:37:33Z

JamesQuirk
Sep 22, 2021

To chime in on this: I have just been looking to see if there was an option to tweak the 'exp-' prefix for experiments. Being able to do this could allow for better grouping of experiments to reflect changes to a significant param, for example a 'model_type' param when comparing architectures.

The experiment name would then still end with the hash.

dberenbaum · 2021-09-22T15:01:31Z

dberenbaum
Sep 22, 2021
Collaborator

@JamesQuirk Just making sure you are at least aware that you can manually name your experiments with dvc exp run -n if you want to group experiments like that. I can see how it would be nice to infer your preferred grouping, but wanted to make sure you know it's at least possible 😄 .

JamesQuirk · 2021-09-22T16:18:07Z

JamesQuirk
Sep 22, 2021

> @JamesQuirk Just making sure you are at least aware that you can manually name your experiments with dvc exp run -n if you want to group experiments like that. I can see how it would be nice to infer your preferred grouping, but wanted to make sure you know it's at least possible 😄 .

Yes, I have seen that. But that changes the whole name, doesn't it? So the custom name would need to account for uniqueness?

dberenbaum · 2021-09-22T20:23:28Z

dberenbaum
Sep 22, 2021
Collaborator

Correct @JamesQuirk, so it's possible to group experiments but all the naming burden falls to the user.

If I understand correctly, you would like to give multiple experiments the same name and have dvc append a unique id to the end of each one?

iesahin · 2021-09-22T20:37:18Z

iesahin
Sep 22, 2021
Author

Could you `dvc exp run -n myprefix-${RANDOM}`?

JamesQuirk · 2021-09-23T17:06:49Z

JamesQuirk
Sep 23, 2021

@dberenbaum yes, that's what I was thinking.

> Could you `dvc exp run -n myprefix-${RANDOM}`?

This is a valid option, but there is still a chance that a random number (of finite length) has already been used, which makes it inconvenient to handle. Using the start of the hash, as dvc already does, would be better.
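
A minimal Python sketch of the prefix-plus-hash idea, with the hash part lengthened when a name is already taken. The helper name, the use of SHA-256, and the `content` argument standing in for whatever identifies an experiment are all assumptions, not DVC behavior.

```python
import hashlib


def prefixed_name(prefix, content, existing, min_len=5):
    """Return prefix-<hash start>, lengthening the hash part until the name is unused.

    Hypothetical helper: `content` is the bytes that identify the experiment,
    and `existing` is the set of names already taken.
    """
    digest = hashlib.sha256(content).hexdigest()
    for length in range(min_len, len(digest) + 1):
        name = f"{prefix}-{digest[:length]}"
        if name not in existing:
            return name
    raise ValueError("identical content already uses every name under this prefix")


taken = set()
first = prefixed_name("resnet", b"params-v1", taken)
taken.add(first)
print(first)
print(prefixed_name("resnet", b"params-v2", taken))
```

Unlike a random number, the suffix is derived from the experiment itself, so rerunning the same experiment maps to the same name.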

dberenbaum · 2021-09-23T20:16:05Z

dberenbaum
Sep 23, 2021
Collaborator

Makes sense as a nice feature to have. I've extracted it into a new issue: #6680.

iesahin · 2021-09-27T10:27:33Z

iesahin
Sep 27, 2021
Author

@JamesQuirk another option might be:

dvc exp run -n myprefix-$(date +%s | md5sum | cut -c 1-5)

which guarantees unique names for experiments taking longer than 1 second :)
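
For anyone scripting this outside a shell, a Python sketch of the same suffix might look like the following. The function name is made up. The trailing newline is included because `date` prints one and `md5sum` hashes it along with the digits.

```python
import hashlib
import time


def timestamp_suffix(now=None, length=5):
    """Mimic `date +%s | md5sum | cut -c 1-5` for a given Unix timestamp."""
    seconds = int(time.time() if now is None else now)
    # `date` prints a trailing newline, and md5sum hashes it with the digits.
    data = f"{seconds}\n".encode()
    return hashlib.md5(data).hexdigest()[:length]


print(f"myprefix-{timestamp_suffix()}")
```

Five hex characters allow 16^5 = 1,048,576 distinct suffixes, so two different timestamps can still collide, though rarely.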
