Welcome to our Deep Learning Project Template, crafted for researchers and developers working with PyTorch. This template is designed to streamline the setup, execution, and modification of deep learning experiments, allowing you to focus more on model development and less on boilerplate code.
- Multi-GPU Support: Utilize the power of multiple GPUs or devices to accelerate your training using accelerate.
- Flexible Configuration: Easily configure your experiments with the versatile YACS configuration system, enabling quick adjustments for different scenarios.
- Clear Architecture: Our template is structured for clarity and ease of use, ensuring you can understand and modify the code with minimal effort.
- Transparent Training Process: Training progress is displayed clearly, helping you monitor performance and make necessary tweaks in real time.
Our project is organized as follows to help you navigate and manage the codebase effectively:
📦deep-learning-template
├── 📂configs # Configuration files for experiments
│ ├── 📂cifar
│ │ ├── cifar_big.yaml # Configuration for a larger model
│ │ └── cifar_small.yaml # Configuration for a smaller model
│ └── 📄default.yaml # Default config for all experiments
├── 📂dataset # Modules for data handling
│ └── 📄data_loader.py # Data loader script
├── 📂modeling # Neural network models and loss functions
│ └── 📄model.py # Example model file
├── 📂utils # Utility scripts for various tasks
│ ├── 📄base_trainer.py # Base Trainer class for printing training details
│ ├── 📄logger.py # Logging utilities
│ └── 📄metrics.py # Performance metrics
├── 📄.gitignore # Specifies intentionally untracked files to ignore
├── 📄LICENSE # License file for the project
├── 📄README.md # README file with project details
├── 📄config.py # Main configuration script
├── 📄linter.sh # Shell script for formatting the code
├── 📄requirements.txt # Dependencies and libraries
└── 📄engine.py # Main training and validation script
Configure your models and training setups with ease. Modify the config.py file to suit your experimental needs. Our system uses YACS, which supports hierarchical configuration with command-line overrides. The recommended structure is:
# Basic setup of the project
cfg = CN()
cfg._BASE_ = None
cfg.PROJECT_DIR = None
cfg.PROJECT_LOG_WITH = ["tensorboard"]
# Control the modeling settings
cfg.MODEL = CN()
# ...
# Control the loss settings
cfg.LOSS = CN()
# ...
# Control the dataset settings (e.g., path)
cfg.DATA = CN()
# ...
# Control the training setup (e.g., lr, epoch)
cfg.TRAIN = CN()
# ...
# Control the evaluation setup (e.g., batch size)
cfg.EVAL = CN()
# ...
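The hierarchy described above can be pictured as a recursive merge: defaults are defined first, and an experiment file overrides only the keys it names. The sketch below uses plain dictionaries rather than YACS itself so that it runs without dependencies. The keys `NAME`, `LR`, and `EPOCHS` and all values are hypothetical placeholders, not options defined by this template.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Return a copy of `base` with values from `override` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: descend instead of replacing wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, standing in for what config.py would define.
defaults = {
    "PROJECT_DIR": None,
    "MODEL": {"NAME": "small"},
    "TRAIN": {"LR": 0.1, "EPOCHS": 10},
}

# Hypothetical experiment file contents: only the differing keys are listed.
experiment = {
    "PROJECT_DIR": "cifar-big",
    "MODEL": {"NAME": "big"},
    "TRAIN": {"EPOCHS": 50},
}

cfg = merge_config(defaults, experiment)
print(cfg["TRAIN"])  # LR is inherited from the defaults, EPOCHS is overridden
```

With YACS, the corresponding step on a `CfgNode` is `cfg.merge_from_file(path)`.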
To start training, run:
python engine.py --config configs/your_config.yaml
# Concrete example
python engine.py --config configs/cifar/cifar_small.yaml
Once training starts, outputs are written to a folder named logs by default. To change this location, set the LOG_DIR option. Inside that folder, each run gets a subfolder named after the PROJECT_DIR defined in the config file:
📦{LOG_DIR}/{PROJECT_DIR}
├── 📂checkpoint # Folder for saving checkpoints
├── 📂... # Other files set up by tracker(s)
└── 📄train.log # Logs during training
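As a small illustration of the layout above, the paths can be composed from the two options with `pathlib`. The helper name `run_paths` is hypothetical and not part of the template; the values "logs" and "cifar-small" mirror the examples used elsewhere in this README.

```python
from pathlib import Path


def run_paths(log_dir: str, project_dir: str) -> dict:
    """Compose the output locations for one run from LOG_DIR and PROJECT_DIR."""
    root = Path(log_dir) / project_dir
    return {
        "root": root,
        "checkpoint": root / "checkpoint",  # folder for saved checkpoints
        "train_log": root / "train.log",    # text log written during training
    }


paths = run_paths("logs", "cifar-small")
print(paths["checkpoint"].as_posix())
print(paths["train_log"].as_posix())
```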
Users can override options with the --opts flag. For instance, to resume training:
python engine.py --config configs/your_config.yaml --opts TRAIN.RESUME_CHECKPOINT path/to/checkpoint
# Concrete example
python engine.py --config configs/cifar/cifar_small.yaml --opts TRAIN.RESUME_CHECKPOINT logs/cifar-small/checkpoint/best_model_epoch_10.pth
Please check the config setup section for more details.
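To make the override mechanism concrete, here is a minimal sketch of how KEY VALUE pairs after `--opts` could be collected and applied to a nested config. It assumes the flag is declared with `nargs="*"` and uses a plain dictionary in place of a YACS node; `parse_args`, `apply_opts`, and the `TRAIN.LR` key are hypothetical and may differ from what engine.py actually does.

```python
import argparse
import ast


def parse_args(argv):
    """Parse --config and collect everything after --opts as a flat list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--opts", nargs="*", default=[])
    return parser.parse_args(argv)


def apply_opts(cfg: dict, opts: list) -> dict:
    """Apply ["A.B", "value", ...] pairs to a nested dict, in place."""
    if len(opts) % 2 != 0:
        raise ValueError("--opts expects KEY VALUE pairs")
    for dotted_key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            node = node[name]
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        try:
            value = ast.literal_eval(raw)  # "0.01" -> 0.01, "True" -> True
        except (ValueError, SyntaxError):
            value = raw  # anything else (such as a path) stays a string
        node[leaf] = value
    return cfg


args = parse_args([
    "--config", "configs/your_config.yaml",
    "--opts", "TRAIN.RESUME_CHECKPOINT", "path/to/checkpoint", "TRAIN.LR", "0.01",
])
cfg = {"TRAIN": {"RESUME_CHECKPOINT": None, "LR": 0.1}}  # hypothetical defaults
apply_opts(cfg, args.opts)
print(cfg["TRAIN"])
```

With YACS, the same list can be handed to `cfg.merge_from_list(args.opts)`.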
This project template is built on accelerate to provide multi-GPU training. A simple example that trains a model on 2 GPUs:
accelerate launch --multi_gpu --num_processes=2 engine.py --config configs/your_config.yaml --opts (optional)
# Concrete example
accelerate launch --multi_gpu --num_processes=2 engine.py --config configs/cifar/cifar_small.yaml \
--opts TRAIN.RESUME_CHECKPOINT logs/cifar-small/checkpoint/best_model_epoch_10.pth
Trackers such as tensorboard, wandb, and aim can be set up through the PROJECT_LOG_WITH option. We support multiple trackers at once through accelerate! Users are encouraged to find out which one best suits their project. Below are some examples for opening the local monitor:
# tensorboard
tensorboard --logdir logs
# aim
aim up --repo logs
- Integrating New Models: Place your model files in the modeling/ folder and update the configurations accordingly.
- Adding New Datasets: Implement data handling in the dataset/ folder and reference it in your config files.
- Utility Scripts: Enhance functionality by adding utility scripts in the utils/ folder.
- Customized Training Process: Edit engine.py to modify the training process.
- Support iteration-based training with an infinite loader.
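One possible shape for this item, offered as a sketch rather than the template's implementation: wrap any re-iterable loader in a generator that restarts it when exhausted, then drive training by a fixed number of iterations instead of epochs. The name `infinite_loader` and the list standing in for a PyTorch DataLoader are assumptions made for illustration.

```python
from itertools import islice


def infinite_loader(loader):
    """Yield batches forever, restarting `loader` each time it is exhausted."""
    while True:
        yielded = False
        for batch in loader:
            yielded = True
            yield batch
        if not yielded:
            # Guard against spinning forever on an empty loader.
            raise ValueError("loader produced no batches")


loader = [[0, 1], [2, 3], [4]]  # stand-in for a DataLoader with 3 batches
total_iterations = 5            # training length in iterations, not epochs

seen = []
for step, batch in enumerate(islice(infinite_loader(loader), total_iterations), start=1):
    seen.append(batch)  # a real loop would run forward/backward here

print(seen)
```

Because a DataLoader created with shuffling draws a new order each time it is iterated, restarting it this way yields a fresh permutation on every pass.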
Special thanks to the creators of accelerate and YACS, whose tools have significantly enhanced the flexibility and usability of this template. Also, we appreciate the inspiration from existing projects like those by L1aoXingyu and victoresque.
Feel free to modify and adapt this README to better fit the specifics and details of your project.