[Enhance] remove expand from mmdet rewriter #371
Conversation
Hi, we have to test this: without the expand op, would the tensors be broadcast correctly? And is batch inference still correct?
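For reference, a minimal sketch (not from this PR; shapes are illustrative) of why dropping the explicit expand is safe: an elementwise op broadcasts a singleton batch dimension on its own, so the result is unchanged even for batched inputs.

```python
import torch

# Illustrative shapes: one shared set of priors (batch dim 1) against a
# batch of per-image predictions (batch dim 8).
priors = torch.rand(1, 100, 4)
deltas = torch.rand(8, 100, 4)

with_expand = priors.expand(8, 100, 4) + deltas  # explicit Expand op
with_broadcast = priors + deltas                 # implicit broadcasting

assert torch.allclose(with_expand, with_broadcast)
```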
@RunningLeon both have been tested.

```python
import cv2
import numpy as np
import torch
from mmcv import Config

from mmdeploy.apis.utils import build_task_processor
from mmdeploy.backend.tensorrt import TRTWrapper


def visualize(img, scale_factor, dets, window_name='img'):
    if not isinstance(dets, np.ndarray):
        dets = dets.detach().cpu().numpy()
    scores = dets[:, 4]
    bboxes = dets[:, :4]
    # map boxes back to the original image resolution
    bboxes = bboxes / scale_factor
    for score, bbox in zip(scores, bboxes):
        if score < 0.5:
            continue
        bbox = [int(b) for b in bbox]
        cv2.rectangle(
            img, tuple(bbox[:2]), tuple(bbox[2:]), (0, 0, 255), thickness=5)
    cv2.imshow(window_name, img)


def main():
    model_cfg = 'mmdetection/configs/ssd/ssd300_coco.py'
    deploy_cfg = 'mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-300x300-512x512.py'
    engine_path = 'ssd_deploy_trt/end2end.engine'
    img0_path = 'mmdetection/demo/demo.jpg'
    img1_path = 'demo_flip.jpg'

    task_processor = build_task_processor(
        Config.fromfile(model_cfg), Config.fromfile(deploy_cfg), 'cuda')
    model_input0, _ = task_processor.create_input(img0_path)
    model_input1, _ = task_processor.create_input(img1_path)
    tensor0 = model_input0['img'][0]
    tensor1 = model_input1['img'][0]
    scale_factor0 = model_input0['img_metas'][0][0]['scale_factor']
    scale_factor1 = model_input1['img_metas'][0][0]['scale_factor']

    # stack the two preprocessed images into a single batch of 2
    tensor = torch.cat([tensor0, tensor1]).cuda()
    model = TRTWrapper(engine=engine_path)
    output = model({'input': tensor})

    # visualize each image's detections separately
    img = cv2.imread(img0_path)
    dets = output['dets'][0].detach().cpu().numpy()
    visualize(img, scale_factor0, dets, 'img0')
    img = cv2.imread(img1_path)
    dets = output['dets'][1].detach().cpu().numpy()
    visualize(img, scale_factor1, dets, 'img1')
    cv2.waitKey(0)


if __name__ == '__main__':
    main()
```
@RunningLeon Note that batch support is broken even on the master branch with TensorRT 8.2.
We may need to fix batch inference before this PR is merged.
I am not sure if I can fix it. NVIDIA/TensorRT#1917 says a fix is coming in TensorRT 8.4 GA; let's see if it solves our problem.
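Until then, a hedged sketch of a runtime guard, assuming the fix lands in 8.4 GA as the upstream issue suggests (`tensorrt.__version__` is the standard version attribute):

```python
import tensorrt as trt
from packaging import version

# Warn when running on a TensorRT release that predates the expected fix.
if version.parse(trt.__version__) < version.parse('8.4'):
    print('TensorRT < 8.4: batched inference may produce incorrect results')
```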
LGTM
Motivation
Expand costs more memory at runtime, and some backends do not support it.
Modification
Remove all Expand ops whose results can be obtained through implicit broadcasting; a sketch of the resulting export behavior follows.
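A hypothetical sketch (module name and shapes are illustrative, not the actual mmdeploy rewriter code) of verifying that a broadcast-based decode exports to ONNX without an Expand node:

```python
import io

import onnx
import torch


class Decode(torch.nn.Module):

    def forward(self, priors, deltas):
        # rely on broadcasting instead of priors.expand(...)
        return priors + deltas


buf = io.BytesIO()
torch.onnx.export(
    Decode(), (torch.rand(1, 100, 4), torch.rand(8, 100, 4)), buf)
graph = onnx.load_from_string(buf.getvalue())
# the exported graph should contain only an Add node, no Expand
assert 'Expand' not in [node.op_type for node in graph.graph.node]
```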