
[Enhance] remove expand from mmdet rewriter #371

Merged 1 commit into dev-v0.5.0 on Apr 29, 2022

Conversation

grimoire (Member)

Motivation

Expand costs more memory at runtime, and some backends do not support it.

Modification

Try to remove all expand ops whose effect can be achieved by implicit broadcasting. A minimal sketch of the idea is shown below.
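
For illustration only (hypothetical shapes and names, not the actual rewriter code): when one operand is already broadcastable to the other's shape, the explicit expand can be dropped and implicit broadcasting produces the same values without materializing the larger tensor or emitting an Expand node in the exported graph.

import torch

# Hypothetical shapes: per-location priors shared across the batch.
priors = torch.rand(1, 100, 4)   # broadcastable to (B, 100, 4)
deltas = torch.rand(8, 100, 4)   # batch of 8

# Pattern being removed: explicit expand materializes an (8, 100, 4)
# view of `priors` and emits an Expand op on export.
out_expand = priors.expand_as(deltas) + deltas

# Replacement: implicit broadcasting computes the same result with no
# explicit Expand op.
out_broadcast = priors + deltas

assert torch.equal(out_expand, out_broadcast)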

grimoire added the enhancement (New feature or request) label on Apr 15, 2022
RunningLeon (Collaborator) left a comment


Hi, have you tested that without the expand op the tensors are still broadcast correctly? And is batch inference correct?

grimoire (Member, Author)

@RunningLeon Both have been tested; here is the script I used:

import cv2
import numpy as np
import torch
from mmcv import Config
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.backend.tensorrt import TRTWrapper


def visualize(img, scale_factor, dets, window_name='img'):
    # dets: (N, 5) array of [x1, y1, x2, y2, score] in network-input scale
    if not isinstance(dets, np.ndarray):
        dets = dets.detach().cpu().numpy()
    scores = dets[:, 4]
    bboxes = dets[:, :4]
    # map boxes back to the original image resolution
    bboxes = bboxes / scale_factor

    for score, bbox in zip(scores, bboxes):
        if score < 0.5:
            continue
        bbox = [int(b) for b in bbox]
        cv2.rectangle(
            img, tuple(bbox[:2]), tuple(bbox[2:]), (0, 0, 255), thickness=5)
    cv2.imshow(window_name, img)


def main():
    model_cfg = 'mmdetection/configs/ssd/ssd300_coco.py'
    deploy_cfg = 'mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-300x300-512x512.py'
    engine_path = 'ssd_deploy_trt/end2end.engine'
    img0_path = 'mmdetection/demo/demo.jpg'
    img1_path = 'demo_flip.jpg'

    task_processor = build_task_processor(
        Config.fromfile(model_cfg), Config.fromfile(deploy_cfg), 'cuda')
    model_input0, _ = task_processor.create_input(img0_path)
    model_input1, _ = task_processor.create_input(img1_path)

    tensor0 = model_input0['img'][0]
    tensor1 = model_input1['img'][0]
    scale_factor0 = model_input0['img_metas'][0][0]['scale_factor']
    scale_factor1 = model_input1['img_metas'][0][0]['scale_factor']

    # stack the two preprocessed images into a single batch of size 2
    tensor = torch.cat([tensor0, tensor1]).cuda()

    model = TRTWrapper(engine=engine_path)

    # batched inference through the TensorRT engine
    output = model({'input': tensor})

    img = cv2.imread(img0_path)
    dets = output['dets'][0].detach().cpu().numpy()
    visualize(img, scale_factor0, dets, 'img0')

    img = cv2.imread(img1_path)
    dets = output['dets'][1].detach().cpu().numpy()
    visualize(img, scale_factor1, dets, 'img1')

    cv2.waitKey(0)


if __name__ == '__main__':
    main()
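
(For reproducibility: demo_flip.jpg is not part of the repo. Assuming a horizontal flip was used, the test image could be generated with something like the following.)

import cv2

img = cv2.imread('mmdetection/demo/demo.jpg')
cv2.imwrite('demo_flip.jpg', cv2.flip(img, 1))  # flipCode=1: horizontal flip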

grimoire (Member, Author)

@RunningLeon Note that batch support is broken even on the master branch with TensorRT 8.2.

RunningLeon (Collaborator)

> @RunningLeon Note that batch support is broken even on the master branch with TensorRT 8.2.

We may need to fix batch inference before this PR is merged.

grimoire (Member, Author)

> We may need to fix batch inference before this PR is merged.

I am not sure I can fix it on our side. NVIDIA/TensorRT#1917 says there will be a fix in TensorRT 8.4 GA; let's see whether that solves our problem.

RunningLeon (Collaborator) left a comment


LGTM

lvhan028 merged commit 5cbb065 into open-mmlab:dev-v0.5.0 on Apr 29, 2022
lvhan028 pushed a commit to lvhan028/mmdeploy that referenced this pull request on Jun 3, 2022