
e. before running training/evaluation

Download the backbone checkpoint and create the work directories:

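```bash
# create the work and checkpoint dirs
mkdir -p work_dirs ckpts
# checkpoint URL from the mmcv model zoo list:
# https://github.com/open-mmlab/mmcv/blob/master/mmcv/model_zoo/open_mmlab.json
cd ckpts
wget https://download.openmmlab.com/pretrain/third_party/resnet50_msra-5891d200.pth
# return to the workspace root
cd ..
```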
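If `wget` is unavailable, the download can also be done from Python with the standard library. The sketch below is illustrative, not part of RepVF: `ensure_checkpoint` and `sha256_prefix_matches` are hypothetical helpers. It skips the download when the file already exists and optionally checks integrity. That check assumes the name suffix (`5891d200`) follows the torch.hub convention of holding the first hex digits of the file's SHA-256. If this third-party file does not follow that convention, the check reports a mismatch even for a good download.

```python
import hashlib
import re
import urllib.request
from pathlib import Path

CKPT_URL = ("https://download.openmmlab.com/pretrain/third_party/"
            "resnet50_msra-5891d200.pth")


def sha256_prefix_matches(path: Path) -> bool:
    """Compare the file's SHA-256 with the hex suffix in its name.

    Assumes the torch.hub-style naming <name>-<hash prefix>.pth;
    returns False when the name carries no such suffix.
    """
    match = re.search(r"-([a-f0-9]+)\.pth$", path.name)
    if match is None:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # hash in 1 MiB chunks so large checkpoints are not read into memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))


def ensure_checkpoint(url: str, ckpt_dir: Path) -> Path:
    """Download url into ckpt_dir unless a file of that name already exists."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    target = ckpt_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target
```

From the workspace root, `ensure_checkpoint(CKPT_URL, Path("ckpts"))` places the file where the directory tree below expects it.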

Expected directory structure:

```
repvf_workspace
├── mmdetection3d
│   ├── projects
│   │   ├── repvf (symbolic link)
│   │   └── ...
│   ├── tools
│   │   ├── train.py (modified)
│   │   └── ...
│   └── ...
├── RepVF (this repo)
├── ckpts
│   └── resnet50_msra-5891d200.pth
└── data
    └── waymo
        ├── waymo_format
        ├── openlane_format
        ├── training_filtered.pkl
        ├── validation_filtered.pkl
        ├── cam_gt_filtered.bin
        ├── training_filtered_300.pkl
        ├── validation_filtered_300.pkl
        └── cam_gt_filtered_300.bin
```
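A quick way to catch a missing file before a long run is to check the layout programmatically. The sketch below is hypothetical. The path list is copied from the tree above, and it only checks existence, not file contents. Because the target of the `repvf` symbolic link is not specified here, `repvf_link_ok` only confirms that it is a link that resolves to something.

```python
from pathlib import Path
from typing import List

# Paths relative to repvf_workspace, taken from the expected tree.
EXPECTED_PATHS = [
    "mmdetection3d/projects/repvf",
    "mmdetection3d/tools/train.py",
    "RepVF",
    "ckpts/resnet50_msra-5891d200.pth",
    "data/waymo/waymo_format",
    "data/waymo/openlane_format",
    "data/waymo/training_filtered.pkl",
    "data/waymo/validation_filtered.pkl",
    "data/waymo/cam_gt_filtered.bin",
    "data/waymo/training_filtered_300.pkl",
    "data/waymo/validation_filtered_300.pkl",
    "data/waymo/cam_gt_filtered_300.bin",
]


def missing_paths(workspace: Path) -> List[str]:
    """Return the expected entries that do not exist under workspace."""
    return [p for p in EXPECTED_PATHS if not (workspace / p).exists()]


def repvf_link_ok(workspace: Path) -> bool:
    """True if projects/repvf is a symbolic link whose target exists."""
    link = workspace / "mmdetection3d" / "projects" / "repvf"
    return link.is_symlink() and link.exists()
```

Run from the workspace root, `missing_paths(Path("."))` should return an empty list once setup is complete.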

To enable SyncBN, add the following to mmdetection3d/tools/train.py (this is only required for consistency in the 30% data experiments):

```python
...
    model.init_weights()

    # convert all BatchNorm layers to SyncBatchNorm when the config sets SyncBN = True
    if cfg.get('SyncBN', False):
        import torch.nn as nn
        model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
        logger.info("Using SyncBN")

    logger.info(f'Model:\n{model}')
...
```

We also provide our modified version of train.py, which can be used out of the box.

f. training and evaluation commands

Assuming you are in the workspace root (repvf_workspace) created above, the commands below use mmdetection3d/projects/repvf/configs/rftr_r50_15p_1000.py as an example:

For debugging or single-GPU training:

```bash
python mmdetection3d/tools/train.py mmdetection3d/projects/repvf/configs/rftr_r50_15p_1000.py --work-dir work_dirs/rftr_r50_15p_1000/
```

For DDP training/evaluation (4 is the number of GPUs):

```bash
# training
bash mmdetection3d/tools/dist_train.sh mmdetection3d/projects/repvf/configs/rftr_r50_15p_1000.py 4 --work-dir work_dirs/rftr_r50_15p_1000/
# evaluation
bash mmdetection3d/tools/dist_test.sh mmdetection3d/projects/repvf/configs/rftr_r50_15p_1000.py work_dirs/rftr_r50_15p_1000/epoch_24.pth 4 --eval bbox
```
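All of these commands follow one convention: the work dir is `work_dirs/<config name>/`, and evaluation points at `epoch_<N>.pth` inside it. A small hypothetical helper keeps the paths consistent when switching configs or GPU counts. It reproduces the argument lists shown above and is not a script shipped with RepVF.

```python
from pathlib import Path
from typing import List

TOOLS = "mmdetection3d/tools"


def work_dir_for(config: str) -> str:
    """work_dirs/<config name without .py>/, the convention used above."""
    return f"work_dirs/{Path(config).stem}/"


def train_cmd(config: str, gpus: int = 1) -> List[str]:
    """Single-card run via train.py, or DDP via dist_train.sh for gpus > 1."""
    if gpus == 1:
        return ["python", f"{TOOLS}/train.py", config,
                "--work-dir", work_dir_for(config)]
    return ["bash", f"{TOOLS}/dist_train.sh", config, str(gpus),
            "--work-dir", work_dir_for(config)]


def eval_cmd(config: str, epoch: int, gpus: int) -> List[str]:
    """Distributed evaluation of work_dirs/<config>/epoch_<epoch>.pth."""
    ckpt = f"{work_dir_for(config)}epoch_{epoch}.pth"
    return ["bash", f"{TOOLS}/dist_test.sh", config, ckpt, str(gpus),
            "--eval", "bbox"]
```

From the workspace root, a list can be launched with `subprocess.run(train_cmd(cfg, 4), check=True)`. Passing a list rather than a shell string avoids quoting problems in paths.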

Training takes about 3 to 4 days on 4 RTX 4090 GPUs. We also provide a config for a 30% data subset, rftr_r50_20p_300_syncbn_flash_bs8.py, which trains in less than one day.

To resume training (replace epoch_x.pth with the checkpoint to resume from):

```bash
bash mmdetection3d/tools/dist_train.sh mmdetection3d/projects/repvf/configs/rftr_r50_15p_1000.py 4 --work-dir work_dirs/rftr_r50_15p_1000/ --resume-from work_dirs/rftr_r50_15p_1000/epoch_x.pth
```
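To fill in `epoch_x.pth` automatically, pick the highest-numbered `epoch_<N>.pth` in the work dir. The sketch below is hypothetical and assumes checkpoints are named exactly `epoch_<N>.pth`, as in the evaluation command above. It compares epoch numbers as integers, because a plain string sort would rank `epoch_9.pth` above `epoch_10.pth`.

```python
import re
from pathlib import Path
from typing import List, Optional

_EPOCH_CKPT = re.compile(r"^epoch_(\d+)\.pth$")


def latest_epoch_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the epoch_<N>.pth in work_dir with the largest N, or None."""
    best_epoch, best_path = -1, None
    for path in Path(work_dir).glob("epoch_*.pth"):
        m = _EPOCH_CKPT.match(path.name)
        if m and int(m.group(1)) > best_epoch:
            best_epoch, best_path = int(m.group(1)), path
    return best_path


def resume_cmd(config: str, work_dir: str, gpus: int) -> List[str]:
    """dist_train.sh arguments resuming from the newest epoch checkpoint."""
    ckpt = latest_epoch_checkpoint(work_dir)
    if ckpt is None:
        raise FileNotFoundError(f"no epoch_<N>.pth found in {work_dir}")
    return ["bash", "mmdetection3d/tools/dist_train.sh", config, str(gpus),
            "--work-dir", work_dir, "--resume-from", str(ckpt)]
```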

Environment variables:

| environment variable | purpose |
| --- | --- |
| SAVE_FOR_VISUALIZATION | set to True to save predictions as numpy arrays |
| SAVE_PLT_BBOX | set to True to save visualizations |
| WANDB_API_KEY | your wandb API key; to use TensorBoard instead, modify default_runtime |
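How the code parses these variables is not shown here. The sketch below is a hypothetical launcher-side helper that sets the flags for a single evaluation run. It writes the literal value `True`, as the table specifies, and unsets disabled flags instead of writing `False`, in case the code only checks whether a variable is present. The `env_flag` reader shows one tolerant way such a flag could be interpreted and is an assumption, not the repo's logic. The shell equivalent is simply a prefix such as `SAVE_FOR_VISUALIZATION=True bash mmdetection3d/tools/dist_test.sh ...`.

```python
import os
from typing import Dict, Mapping, Optional


def env_flag(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical reader: 'True', 'true', '1' or 'yes' count as enabled."""
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() in {"true", "1", "yes"}


def eval_env(save_preds: bool, save_plots: bool) -> Dict[str, str]:
    """Copy of the current environment with the visualization flags set."""
    env = dict(os.environ)
    for name, enabled in (("SAVE_FOR_VISUALIZATION", save_preds),
                          ("SAVE_PLT_BBOX", save_plots)):
        if enabled:
            env[name] = "True"  # the value the table asks for
        else:
            env.pop(name, None)  # unset rather than "False"
    return env
```

The result can be passed as `subprocess.run(cmd, env=eval_env(True, True), check=True)` so that the flags apply only to that run.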