remove timestamp from save path #200
base: master
Conversation
Added some very minor requests.
I think we should define what the expected default behaviour of our experiment is first (trying to resume vs. re-running from scratch).
Imo `name` should serve as a UID given by the user (if they want, they can put a timestamp there), and the default behaviour should be to try and resume the experiment state in case `os.path.join(save_path, name)` already exists, rather than changing it to be unique. The reason is that this is passed forward to `experiment._setup_training` as a default, and this state is ultimately used by the trainer to search for a previous state.
If we don't want to do that, I see no major advantage in refactoring the timestamp to an increasing integer number.
Thoughts?
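A minimal sketch of the resume-by-default behaviour proposed above, assuming a hypothetical helper (`setup_run_dir` is not part of delira; only `os.path.join(save_path, name)` comes from the discussion):

```python
import os


def setup_run_dir(save_path, name):
    """Return the run directory for `name`, resuming if it already exists."""
    run_dir = os.path.join(save_path, name)
    if os.path.isdir(run_dir):
        # A run with this name already exists: the trainer can later search
        # this directory for a previous state and resume from it.
        print("Resuming existing experiment at {}".format(run_dir))
    else:
        os.makedirs(run_dir)
    return run_dir
```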
I think that is a problem with the current design of the … I thought the increasing integer value would be a nice way to decrease the length of the experiment names 🙈
I would prefer a version that ensures unique names if a path already exists. I usually have one naming convention and rely on the timestamp part for not overwriting previous checkpoints. My proposal: we add a flag to the experiment (called …)
This flag is what I had in mind too, but we will have to adjust some things for this, as pickle dumps all information about the experiment setup at the end of training, right? It might be better to dump the configuration first and then train the network, so that we just have to search for the latest checkpoint.
Searching for the latest checkpoint is already handled by the trainer
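A rough sketch of what "search for the latest checkpoint" could look like; the file pattern `checkpoint_epoch_*.pth` and the flat directory layout are assumptions for illustration, not delira's actual conventions:

```python
import glob
import os
import re


def latest_checkpoint(run_dir):
    """Return the checkpoint file with the highest epoch number, or None."""
    candidates = glob.glob(os.path.join(run_dir, "checkpoint_epoch_*.pth"))
    if not candidates:
        return None

    def epoch_of(path):
        match = re.search(r"checkpoint_epoch_(\d+)", os.path.basename(path))
        return int(match.group(1)) if match else -1

    return max(candidates, key=epoch_of)
```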
Format to use for the stamp.
If the run should be continued, the first sequence is used if one is present.
This is a minor issue, but can we maybe put the sequence number at the end? This seems more intuitive to me.
We briefly agreed on having it this way in the meeting today; of course, the other way round makes sense too. I kinda like the idea of having it this way, so that a simple number sorting is recognised at first sight, but either way is fine for me.
Okay, that's a valid reason too, but this way you get the first run from different days sorted together instead of a "real" development over time. What do the others think about this? @haarburger @mibaumgartner ?
Oh, I would continue numbering even if the day changes.
That's something I wouldn't like. If we create an experiment, the date of the creation should be used for all runs, but if we re-initiate an experiment, I would also choose a new date.
Ok, to clarify: if I start an experiment today, it will be …
Ah okay, I would do it like …
Another problem is that we need some management of adding zeroes. E.g. if we already have 9 successful runs (which are named in one of the ways above, it doesn't matter which one exactly) and we start the 10th run, we need to move the previous runs to paths with leading zeroes to keep the sorting correct (and the same if we reach the 100th, 1000th, etc. run). This shouldn't be hard to realize, but it is something we need to be aware of. EDIT: the naming convention you proposed is also okay for me, as long as we take care of the necessary zeroes.
After looking at the pros and cons, I would prefer the number at the end (I did not think about sorting during our meeting). Furthermore, I would like to start from 0 on each day (edit: and) zero-pad to the following …
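A sketch of the scheme @mibaumgartner describes (hypothetical helper, exact date format assumed): a date prefix, the sequence number zero-padded at the end, and counting restarting from 0 on each new day. Picking a fixed pad width up front also sidesteps the renaming problem raised above, since e.g. `run_009` and `run_010` already sort correctly:

```python
import datetime
import os


def next_run_name(save_path, pad=3):
    """Return e.g. '19-04-01_run_000', counting up within the current day."""
    today = datetime.date.today().strftime("%y-%m-%d")
    if not os.path.isdir(save_path):
        return "{}_run_{:0{}d}".format(today, 0, pad)
    # Assumes contiguous numbering: the next index is the number of runs
    # already created today.
    existing = [d for d in os.listdir(save_path) if d.startswith(today)]
    return "{}_run_{:0{}d}".format(today, len(existing), pad)
```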
I'd prefer the implementation that @mibaumgartner just suggested.
@mibaumgartner after I fixed the style check, the sklearn backend now crashes. Could you check whether that is a known bug? It seems to me like it could be.
# Conflicts:
#	delira/training/base_experiment.py
The failure of the test is caused because in Python 3.5 the ordering of a dict is not preserved. EDIT: I addressed this in #227
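For context, a minimal illustration of the Python 3.5 pitfall (the keys here are illustrative, not delira's actual config): plain dicts do not guarantee insertion order before Python 3.7, so any test that compares against a fixed iteration order can fail. `collections.OrderedDict` makes the order explicit on every version:

```python
from collections import OrderedDict

# Iteration order of a plain dict is an implementation detail in Python 3.5;
# OrderedDict preserves insertion order everywhere.
settings = OrderedDict([("save_path", "./results"),
                        ("name", "experiment"),
                        ("n_epochs", 50)])

for key, value in settings.items():
    print(key, value)
```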
I think the current implementation does not completely reflect what we've been discussing, since the mode to also generate a non-unique save path is missing, which would allow automatic resumes of failed trainings.
Ok, I would actually suggest they should still be somewhat unique, so that a k-fold continuation is possible, in which we just start the highest number again.
Yeah, something like that. But we don't need to take care of the k-fold in this case, because inside the save_path the experiment will also create a separate folder for each run.
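A hedged sketch of the k-fold continuation idea from this thread: find the run directory with the highest trailing number and start it again. The trailing `_run_<n>` pattern is an assumption carried over from the naming sketch above, not delira's actual layout:

```python
import os
import re


def highest_run(save_path):
    """Return the run directory with the largest trailing number, or None."""
    runs = []
    for entry in os.listdir(save_path):
        match = re.search(r"_run_(\d+)$", entry)
        if match:
            runs.append((int(match.group(1)), entry))
    if not runs:
        return None
    return os.path.join(save_path, max(runs)[1])
```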
As this has been here for a while now, we may as well just add in training continuation, shouldn't we?
To resolve issue #197