Inform submission about `accumulated_submission_time` #785

Niccolo-Ajroldi · 2024-09-12T15:27:39Z

Description

Currently, update_params has no up-to-date information about the elapsed time since start.

My motivation for adding this feature is to simplify the implementation of a time-based learning rate schedule.

Can't a submission just keep track of time or estimate it?
In theory yes, this is allowed by the rules and feasible. However, such implementation would require synchronization among devices inside update_params when training in distributed mode, which would penalize such submission.

Why is a time-based scheduler useful?
Currently, a submission can implement a LR scheduler using step_hint as a step budget. This is a reliable estimate of the number of steps needed for (N)AdamW to reach max_runtime. However, a submission could be faster/slower than (N)AdamW, and the extent of this difference can vary based on the workload itself. This makes deriving a custom step budget from step_hint suboptimal.

Implementation

We could simply pass train_state to update_params, or even just train_state['accumulated_submission_time'].
Notice that update_params already receives in input eval_results, which informs the submission about the elapsed time at the moment of the last evaluation, which is not up-to date with accumulated_submission_time.

The text was updated successfully, but these errors were encountered:

Niccolo-Ajroldi · 2024-10-03T15:23:11Z

Remark: we should avoid that a submission modifies train_state.

A straightforward solution would be to pass it by copy. A shallow copy should be enough, since all values in train_state are primitive types, and cannot be modified in-place. Although very fast, this copying operation would however be counted as part of the submission time.

Another solution would be to just pass train_state['accumulated_submission_time'] instead.

priyakasimbeg · 2024-10-30T22:02:10Z

Fixed in #790

Niccolo-Ajroldi changed the title ~~Inform submission about 'accumulated_submission_time'~~ Inform submission about accumulated_submission_time Sep 12, 2024

Niccolo-Ajroldi mentioned this issue Oct 3, 2024

Pass train_state to update_params #790

Merged

Niccolo-Ajroldi mentioned this issue Oct 18, 2024

Introduce prepare for eval, fix evaluation bug #789

Open

priyakasimbeg closed this as completed Oct 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inform submission about `accumulated_submission_time` #785

Inform submission about `accumulated_submission_time` #785

Niccolo-Ajroldi commented Sep 12, 2024

Niccolo-Ajroldi commented Oct 3, 2024 •

edited

Loading

priyakasimbeg commented Oct 30, 2024

Inform submission about accumulated_submission_time #785

Inform submission about accumulated_submission_time #785

Comments

Niccolo-Ajroldi commented Sep 12, 2024

Description

Implementation

Niccolo-Ajroldi commented Oct 3, 2024 • edited Loading

priyakasimbeg commented Oct 30, 2024

Inform submission about `accumulated_submission_time` #785

Inform submission about `accumulated_submission_time` #785

Niccolo-Ajroldi commented Oct 3, 2024 •

edited

Loading