[Refactoring] Training functions and TrainConfig can be made more modular #570

Open
debrevitatevitae opened this issue Sep 17, 2024 · 1 comment
Assignees
Labels
refactoring Refactoring of legacy code

Comments

@debrevitatevitae
Collaborator

Describe the feature

Refactor the training system (training functions and training configuration) into smaller internal classes or functions.

It should be implemented because

Currently, the training functions and TrainConfig have implementation and readability problems:

  • Training functions are too long (train_grad has more than 360 lines of code!). They carry too many responsibilities, which makes them hard to read, maintain, and extend.
  • TrainConfig offloads too much responsibility to __post_init__, which causes unwanted behavior such as this one.

Modularization and separation of responsibilities will improve the readability, maintainability, and extensibility of the training system.

Additional context

No response

Would you like to work on this issue?

None

@debrevitatevitae added the feature (New feature or request) and refactoring (Refactoring of legacy code) labels and removed the feature label on Sep 17, 2024
@mlahariya mlahariya self-assigned this Oct 3, 2024
@mlahariya
Collaborator

mlahariya commented Oct 3, 2024

Hey @debrevitatevitae, thanks for raising this. I have a few ideas and suggestions for the refactoring.

  1. Logger: We can move the logging part of the training functions into a separate class (Logger). An instance of this class would take care of all logging operations, including defining callbacks and providing methods to log at different steps in the training loop.
  2. ConfigHandler: We can define a config handler class that collects the methods for handling TrainConfig. This would help in the initial and final stages of training, where we can define separate methods based on the configuration provided by the user. We can also move the __post_init__ logic into this class, which would let us resolve this issue.
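
To make the two ideas above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the TrainConfig fields (max_iter, print_every, log_fn), the ConfigHandler methods, and the Logger methods are hypothetical and do not reflect the current code. The point is only that defaults get resolved in the handler rather than in __post_init__, and that the logger owns the "when and how to log" decisions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrainConfig:
    """Plain data only; nothing is derived in __post_init__ (fields are hypothetical)."""

    max_iter: int = 100
    print_every: int = 10
    log_fn: Optional[Callable[[str], None]] = None


class ConfigHandler:
    """Turns the user-provided config into values the training loop can use."""

    def __init__(self, config: TrainConfig) -> None:
        self.config = config

    def resolve_log_fn(self) -> Callable[[str], None]:
        # The default is chosen here instead of inside TrainConfig.__post_init__.
        return self.config.log_fn if self.config.log_fn is not None else print

    def should_log(self, iteration: int) -> bool:
        every = self.config.print_every
        return every > 0 and iteration % every == 0


class Logger:
    """Owns all logging decisions so the training loop does not have to."""

    def __init__(self, config: TrainConfig) -> None:
        self._handler = ConfigHandler(config)
        self._emit = self._handler.resolve_log_fn()

    def log_step(self, iteration: int, loss: float) -> None:
        if self._handler.should_log(iteration):
            self._emit(f"iteration {iteration}: loss={loss:.4f}")

    def close(self) -> None:
        self._emit("training finished")
```

With this split, TrainConfig stays a passive container, and a test can inject a list's append method as log_fn to capture what would have been logged.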

A simple outline of the train function would look something like this:

Trainer

Train(config):
  logger = Logger(config)

  config_handler = ConfigHandler(config)

  # training
  # and logging

  # end logging and close the logger

I have two questions where I need input, though:
Q1: Currently this is defined as a Train function. Can we move it to a Train class (a structure similar to the Trainer class in PyTorch Lightning)?
-- You would be able to call trainer.fit().
-- fit/step/log and other methods would become available for the user to modify later on.
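
As a rough illustration of what Q1 could look like, here is a small class-based sketch. It is not an existing API: the Trainer constructor arguments, the scalar parameter, and the plain gradient-descent update are all assumptions chosen to keep the example dependency-free. It only shows how fit/step/log become separate, overridable methods.

```python
from typing import Callable, List


class Trainer:
    """Hypothetical class-based trainer with overridable fit/step/log hooks."""

    def __init__(
        self,
        loss_fn: Callable[[float], float],
        grad_fn: Callable[[float], float],
        lr: float = 0.1,
        max_iter: int = 100,
    ) -> None:
        self.loss_fn = loss_fn
        self.grad_fn = grad_fn
        self.lr = lr
        self.max_iter = max_iter
        self.history: List[float] = []

    def step(self, param: float) -> float:
        # One plain gradient-descent update on a scalar parameter.
        return param - self.lr * self.grad_fn(param)

    def log(self, iteration: int, loss: float) -> None:
        # Default behaviour: record the loss; subclasses may print or write files.
        self.history.append(loss)

    def fit(self, param: float) -> float:
        for iteration in range(self.max_iter):
            param = self.step(param)
            self.log(iteration, self.loss_fn(param))
        return param


class ClippedTrainer(Trainer):
    """Overrides only step() to cap the size of each update at 0.05."""

    def step(self, param: float) -> float:
        update = self.lr * self.grad_fn(param)
        update = max(-0.05, min(0.05, update))
        return param - update
```

A user who wants a different update rule or a different logging target would subclass and override a single method, leaving the loop in fit() untouched.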

Q2: Do we need the separate functions train_with_grad and train_without_grad? Can we have a single Train (function or class) that can be used for either, based on a user-defined argument?

So: does the trainer need to be a function, and do we need separate functions for train_with_grad and train_without_grad?
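
For Q2, a single entry point could look roughly like the sketch below. The function name train, the use_grad argument, and the scalar parameter are hypothetical. The gradient-free branch is a simple random search that I picked only as a stand-in; it is not the optimizer the current gradient-free function uses.

```python
import random
from typing import Callable, Optional


def train(
    loss_fn: Callable[[float], float],
    param: float,
    *,
    grad_fn: Optional[Callable[[float], float]] = None,
    use_grad: bool = True,
    lr: float = 0.1,
    max_iter: int = 200,
    seed: int = 0,
) -> float:
    """Single entry point; use_grad selects the update rule (hypothetical API)."""
    if use_grad and grad_fn is None:
        raise ValueError("use_grad=True requires grad_fn")

    rng = random.Random(seed)
    for _ in range(max_iter):
        if use_grad:
            # Gradient-based update.
            param = param - lr * grad_fn(param)
        else:
            # Gradient-free stand-in: keep a random nearby candidate only if it lowers the loss.
            candidate = param + rng.uniform(-lr, lr)
            if loss_fn(candidate) < loss_fn(param):
                param = candidate
    return param
```

The loop, and anything wrapped around it such as logging or checkpointing, is written once, and only the update rule differs between the two modes. Whether a boolean flag or a pluggable step object is the better interface is still open.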

Let me know what you think. Once these ideas are refined, I will start working on a PR for this.
@chMoussa @Roland-djee @smitchaudhary @n-toscano
Thanks
M
