[Refactoring] Training functions and TrainConfig can be made more modular #570

debrevitatevitae · 2024-09-17T06:40:43Z

Describe the feature

Refactor the training system (training functions and training configuration) into smaller internal classes or functions.

It should be implemented because

Currently, the training functions and TrainConfig have implementation and readability problems:

Training functions are too long (train_grad is more than 360 lines long!). On the one hand, this means they have too many responsibilities, which makes them hard to maintain and extend. On the other hand, they are hard to read.
TrainConfig offloads too much responsibility to __post_init__, which causes unwanted behavior such as this one.
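As a generic, hypothetical illustration of how logic in __post_init__ can backfire (EagerConfig, log_folder and run_id are invented here and not taken from the actual TrainConfig): dataclasses.replace() builds the copy through __init__, so __post_init__ runs again on values it has already processed.

```python
from dataclasses import dataclass, replace


@dataclass
class EagerConfig:
    """Hypothetical config that rewrites one of its own fields in __post_init__."""

    log_folder: str = "runs"
    run_id: str = "run0"

    def __post_init__(self) -> None:
        # Derived value written back into a user-facing field at construction time.
        self.log_folder = f"{self.log_folder}/{self.run_id}"


cfg = EagerConfig()
# replace() calls __init__ (and therefore __post_init__) again with the
# already-processed log_folder, so the run suffix is appended a second time.
copied = replace(cfg, run_id="run1")
print(cfg.log_folder)     # runs/run0
print(copied.log_folder)  # runs/run0/run1
```

Keeping the dataclass as plain data and computing derived values in a separate component avoids this kind of surprise.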

Modularizing the code and separating responsibilities will improve the readability, maintainability, and extensibility of the training system.

Additional context

No response

Would you like to work on this issue?

None


mlahariya · 2024-10-03T13:38:53Z

Hey @debrevitatevitae, thanks for raising this. I have a few ideas and suggestions for the refactoring.

Logger: We can move the logging logic out of the training functions into a separate class (Logger). An instance of this class would take care of all logging operations, including defining callbacks and providing methods to log at different steps of the training loop.
ConfigHandler: We can define a config handler class that collects the methods for handling TrainConfig. This would help with the initial and final stages of training, where we can define separate methods based on the configuration provided by the user. We can also move the __post_init__ logic into this class, which would resolve this issue.
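The two components above could be sketched roughly as follows, in plain Python. Everything here is hypothetical: the TrainConfig fields (max_iter, print_every, log_folder) and the method names on ConfigHandler and Logger are illustrative, not the existing API. TrainConfig stays a plain data container, ConfigHandler validates it and computes derived values on demand instead of in __post_init__, and Logger owns callbacks and per-step logging.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class TrainConfig:
    # Hypothetical fields; plain data, no __post_init__ logic.
    max_iter: int = 100
    print_every: int = 0  # 0 disables periodic printing
    log_folder: str | None = None


class ConfigHandler:
    """Validates a TrainConfig and resolves derived settings explicitly."""

    def __init__(self, config: TrainConfig) -> None:
        self.config = config

    def validate(self) -> None:
        if self.config.max_iter <= 0:
            raise ValueError("max_iter must be positive")
        if self.config.print_every < 0:
            raise ValueError("print_every must be non-negative")

    def log_dir(self) -> Path | None:
        # Derived value computed on demand instead of mutating the config.
        folder = self.config.log_folder
        return Path(folder) if folder is not None else None


class Logger:
    """Owns all logging: per-step records plus user-registered callbacks."""

    def __init__(self, config: TrainConfig) -> None:
        self.config = config
        self.history: list[tuple[int, float]] = []
        self.callbacks: list[Callable[[int, float], None]] = []

    def add_callback(self, fn: Callable[[int, float], None]) -> None:
        self.callbacks.append(fn)

    def log_step(self, step: int, loss: float) -> None:
        self.history.append((step, loss))
        every = self.config.print_every
        if every and step % every == 0:
            print(f"step {step}: loss={loss:.4f}")
        for fn in self.callbacks:
            fn(step, loss)

    def close(self) -> None:
        # Placeholder for flushing or closing real writers at the end of training.
        self.callbacks.clear()
```

In a training loop, ConfigHandler(config).validate() would run once at the start, logger.log_step(...) once per iteration, and logger.close() at the end.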

A simple outline of the train function would look something like this:

Trainer

def train(config):
    logger = Logger(config)
    config_handler = ConfigHandler(config)

    # training
    # and logging
    ...

    # end logging and close the logger
    logger.close()
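The outline above could be turned into a runnable sketch like the following. To keep it self-contained it uses a toy one-parameter loss, (x - 3)^2, and minimal stand-ins for Logger and ConfigHandler; all names and fields here (max_iter, lr, the log/close methods) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrainConfig:
    max_iter: int = 50  # hypothetical field
    lr: float = 0.1     # hypothetical field


class ConfigHandler:
    """Minimal stand-in: validates the config once, outside __post_init__."""

    def __init__(self, config: TrainConfig) -> None:
        if config.max_iter <= 0:
            raise ValueError("max_iter must be positive")
        self.config = config


class Logger:
    """Minimal stand-in: records losses and tracks whether it was closed."""

    def __init__(self, config: TrainConfig) -> None:
        self.config = config
        self.losses: list[float] = []
        self.closed = False

    def log(self, step: int, loss: float) -> None:
        self.losses.append(loss)

    def close(self) -> None:
        self.closed = True


def train(x0: float, config: TrainConfig) -> tuple[float, Logger]:
    """Minimise the toy loss (x - 3)^2 with plain gradient descent."""
    logger = Logger(config)
    config_handler = ConfigHandler(config)

    x = x0
    # training and logging
    for step in range(config_handler.config.max_iter):
        loss = (x - 3.0) ** 2
        x -= config.lr * 2.0 * (x - 3.0)  # analytic gradient of the toy loss
        logger.log(step, loss)

    # end logging and close the logger
    logger.close()
    return x, logger
```

The train function itself only orchestrates; validation and logging details live in their own classes.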

I have two open questions where I would like your input:
Q1: Currently, training is implemented as a function. Can we move it to a Trainer class (a structure similar to the Trainer class in PyTorch Lightning)?
-- Users would be able to call trainer.fit().
-- fit, step, log, and other methods would become available for users to override later on.
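A Lightning-style Trainer class could look roughly like the sketch below. It is hypothetical (neither the PyTorch Lightning API nor existing code): fit() drives the loop, while step() and log() are small methods a user can override in a subclass. A scalar toy loss stands in for a real model.

```python
from __future__ import annotations

from typing import Callable

ScalarFn = Callable[[float], float]


class Trainer:
    """Hypothetical trainer: fit() runs the loop; step() and log() are hooks."""

    def __init__(self, loss_fn: ScalarFn, grad_fn: ScalarFn,
                 lr: float = 0.1, max_iter: int = 100) -> None:
        self.loss_fn = loss_fn
        self.grad_fn = grad_fn
        self.lr = lr
        self.max_iter = max_iter
        self.history: list[float] = []

    def step(self, x: float) -> float:
        # One gradient-descent update; a subclass may swap in another rule.
        return x - self.lr * self.grad_fn(x)

    def log(self, iteration: int, loss: float) -> None:
        # Default hook only records the loss.
        self.history.append(loss)

    def fit(self, x0: float) -> float:
        # Training loop: update, then pass the new loss to the logging hook.
        x = x0
        for i in range(self.max_iter):
            x = self.step(x)
            self.log(i, self.loss_fn(x))
        return x


class VerboseTrainer(Trainer):
    """Example user customisation: only the logging hook is overridden."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.lines: list[str] = []

    def log(self, iteration: int, loss: float) -> None:
        super().log(iteration, loss)
        self.lines.append(f"iter {iteration}: loss={loss:.6f}")
```

For example, VerboseTrainer(lambda x: (x - 3.0) ** 2, lambda x: 2.0 * (x - 3.0), max_iter=50).fit(0.0) runs the same loop as the base class while collecting custom log lines, without copying the loop itself.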

Q2: Do we need the separate functions train_with_grad and train_without_grad? Could we have a single Train function or class that handles either case based on a user-defined argument?

So: does the trainer need to be a function, and do we need separate functions for train_with_grad/train_without_grad?
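Regarding Q2, a single entry point could dispatch on a user-defined flag. The sketch below is hypothetical: use_grad and step_size are invented config fields, and the gradient-free branch uses a simple deterministic compass search as a stand-in for whatever gradient-free optimizer the real code would use.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

ScalarFn = Callable[[float], float]


@dataclass
class TrainConfig:
    max_iter: int = 100
    lr: float = 0.1         # used by the gradient-based branch
    step_size: float = 1.0  # initial step of the gradient-free branch
    use_grad: bool = True   # hypothetical switch replacing two functions


def _grad_step(x: float, loss_fn: ScalarFn, grad_fn: Optional[ScalarFn],
               state: dict, cfg: TrainConfig) -> float:
    # Plain gradient descent; needs a gradient function.
    if grad_fn is None:
        raise ValueError("use_grad=True requires grad_fn")
    return x - cfg.lr * grad_fn(x)


def _grad_free_step(x: float, loss_fn: ScalarFn, grad_fn: Optional[ScalarFn],
                    state: dict, cfg: TrainConfig) -> float:
    # Compass search: move to x +/- step if that lowers the loss,
    # otherwise halve the step. Only loss evaluations are used.
    s = state.setdefault("step", cfg.step_size)
    current = loss_fn(x)
    for candidate in (x + s, x - s):
        if loss_fn(candidate) < current:
            return candidate
    state["step"] = s / 2
    return x


def train(x0: float, loss_fn: ScalarFn, cfg: TrainConfig,
          grad_fn: Optional[ScalarFn] = None) -> float:
    # One loop for both cases; only the update rule differs.
    step_fn = _grad_step if cfg.use_grad else _grad_free_step
    state: dict = {}
    x = x0
    for _ in range(cfg.max_iter):
        x = step_fn(x, loss_fn, grad_fn, state, cfg)
    return x
```

The shared loop is where logging and config handling would plug in once, rather than being duplicated across two functions.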

Let me know what you think. Once these ideas are refined, I will start working on a PR for this.
@chMoussa @Roland-djee @smitchaudhary @n-toscano
Thanks
M

debrevitatevitae added feature New feature or request refactoring Refactoring of legacy code and removed feature New feature or request labels Sep 17, 2024

mlahariya self-assigned this Oct 3, 2024
