- Goal : model compression using Structured Sparsity (2:4 pattern)
- Base Model : ResNet18
- Dataset : ImageNet100
- Pruning Process :
    - Train the base model on the ImageNet100 dataset
    - Prune the FC and convolution layers to a 2:4 sparse pattern with ASP (see the sketch after this list)
    - Retrain (fine-tune) the pruned model
    - Convert the retrained model to a TensorRT int8 model via PTQ
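As a reference for the prune-and-retrain step, below is a minimal sketch using ASP from NVIDIA Apex (`apex.contrib.sparsity`). The model construction, hyperparameters, and the commented training helper are illustrative placeholders, not the actual code in `train.py`.

```python
import torch
import torchvision
from apex.contrib.sparsity import ASP

# Dense baseline: in this repo the ResNet18 trained on ImageNet100 would be
# loaded from a checkpoint; here it is constructed fresh for illustration.
model = torchvision.models.resnet18(num_classes=100).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Compute 2:4 masks for the supported convolution/linear layers and zero out
# the pruned weights in place.
ASP.prune_trained_model(model, optimizer)

# Fine-tune as usual; ASP re-applies the masks around optimizer.step(), so the
# 2:4 pattern is preserved during retraining.
# for epoch in range(num_epochs):
#     train_one_epoch(model, optimizer)   # hypothetical training helper
```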
- Device
    - MSI laptop
    - CPU : i7-11375H
    - GPU : RTX 3060
- Dependency
    - WSL (Ubuntu 22.04)
    - CUDA 12.1
    - cuDNN 8.9.2
    - TensorRT 8.6.1
    - PyTorch 2.1.0+cu121
Quantization_EX/
├── calibrator.py     # calibration class for TensorRT PTQ (see the sketch below)
├── common.py         # TensorRT utility functions
├── onnx_export.py    # export the ASP-pruned model to ONNX
├── train.py          # train the base model and apply ASP pruning
├── trt_infer_2.py    # build the TensorRT engine using Polygraphy
├── trt_infer_acc.py  # check TensorRT model accuracy
├── trt_infer.py      # run TensorRT model inference
├── utils.py          # shared utilities
├── LICENSE
└── README.md
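`calibrator.py` holds the calibration class used for int8 PTQ. Below is a minimal sketch of such a class written against the TensorRT 8.6 Python API with PyCUDA; the class name, cache file, and batch-feeding logic are assumptions for illustration rather than the repo's actual implementation.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed calibration batches to TensorRT during int8 PTQ."""

    def __init__(self, batches, batch_size=1, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(batches)      # iterable of float32 arrays [N, 3, 224, 224]
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None                   # no more data: calibration is done
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None                   # no cache yet: run calibration

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```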
- Benchmark : 10,000 iterations with a single input of shape [1, 3, 224, 224]
|                     | TensorRT PTQ | TensorRT PTQ with ASP |
|---------------------|--------------|-----------------------|
| Precision           | Int8         | Int8                  |
| Avg Latency [ms]    | 0.418        | 0.388                 |
| Avg FPS [frame/sec] | 2388.33      | 2572.17               |
| GPU Memory [MB]     | 123          | 119                   |
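Numbers like the ones above can be reproduced with a simple timing loop over the built engine. The sketch below shows one way to do this with Polygraphy's `TrtRunner`; the engine file name `resnet18_int8.engine` and the input tensor name `input` are assumptions, and since `runner.infer()` includes host/device copies the measured latency is end-to-end per call rather than pure GPU kernel time.

```python
import time
import numpy as np
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

# Engine path and input tensor name are assumptions for this example.
load_engine = EngineFromBytes(BytesFromPath("resnet18_int8.engine"))
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

with TrtRunner(load_engine) as runner:
    for _ in range(100):                  # warm-up
        runner.infer({"input": dummy})

    iters = 10_000
    start = time.perf_counter()
    for _ in range(iters):
        runner.infer({"input": dummy})
    elapsed = time.perf_counter() - start

print(f"Avg latency: {elapsed / iters * 1000:.3f} ms, Avg FPS: {iters / elapsed:.2f}")
```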
- Run order : train -> onnx_export -> trt_infer -> trt_infer_acc (see the export/build sketch below)
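To make the onnx_export -> trt_infer part of the pipeline concrete, here is a minimal sketch of exporting the pruned model to ONNX and building an int8 engine with Polygraphy. The checkpoint/engine file names, opset version, and the `EntropyCalibrator` instance (from the sketch above) are illustrative assumptions, not the repo's exact code; `sparse_weights=True` sets TensorRT's SPARSE_WEIGHTS builder flag so the 2:4 pattern can use the sparse tensor cores.

```python
import torch
import torchvision
from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, SaveEngine

# 1) Export the pruned + fine-tuned model to ONNX (checkpoint path is a placeholder).
model = torchvision.models.resnet18(num_classes=100)
model.load_state_dict(torch.load("resnet18_asp.pth", map_location="cpu"))
model.eval()
torch.onnx.export(
    model,
    torch.randn(1, 3, 224, 224),
    "resnet18_asp.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)

# 2) Build an int8 engine with Polygraphy. `calibrator` would be an instance of
#    the EntropyCalibrator sketched above (placeholder here).
calibrator = ...  # e.g. EntropyCalibrator(calibration_batches)
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("resnet18_asp.onnx"),
    config=CreateConfig(int8=True, calibrator=calibrator, sparse_weights=True),
)
SaveEngine(build_engine, path="resnet18_int8.engine")()
```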
- ASP (Automatic SParsity) : https://github.com/NVIDIA/apex/tree/master/apex/contrib/sparsity
- Polygraphy : https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy
- imagenet100 : https://www.kaggle.com/datasets/ambityga/imagenet100