TRT support for MAISI #701

borisfom · 2024-10-16T01:27:09Z

Description

TRT optimization support for MAISI.
Depends on Project-MONAI/MONAI#8153

Status

Work in progress

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Nic-Ma · 2024-10-22T00:08:29Z

Hi @yiheng-wang-nv ,

Is the CI pipeline broken?

Thanks.

yiheng-wang-nv · 2024-10-22T03:29:46Z

Hi @KumoLiu, just FYI: the MAISI TensorRT enhancement PR contains the content of this PR: Project-MONAI/MONAI#8153.

We may need to merge this one first, before merging the MAISI one.

binliunls · 2024-10-30T16:04:44Z

Hi @borisfom ,
I got the error shown below on an A100 40GB GPU. Is this expected?

Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_item.py", line 374, in evaluate
    return eval(value[len(self.prefix) :], globals_, locals)
  File "<string>", line 1, in <module>
  File "/home/liubin/data/bundles/maisi_ct_generative/scripts/sample.py", line 681, in sample_multiple_images
    synthetic_images, synthetic_labels = self.sample_one_pair(
  File "/home/liubin/data/bundles/maisi_ct_generative/scripts/sample.py", line 759, in sample_one_pair
    synthetic_images, synthetic_labels = ldm_conditional_sample_one_image(
  File "/home/liubin/data/bundles/maisi_ct_generative/scripts/sample.py", line 245, in ldm_conditional_sample_one_image
    down_block_res_samples, mid_block_res_sample = controlnet(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/monai/monai/networks/trt_compiler.py", line 609, in trt_forward
    return self._trt_compiler.forward(self, argv, kwargs)
  File "/opt/monai/monai/networks/trt_compiler.py", line 454, in forward
    raise e
  File "/opt/monai/monai/networks/trt_compiler.py", line 445, in forward
    self._build_and_save(model, build_args)
  File "/opt/monai/monai/networks/trt_compiler.py", line 590, in _build_and_save
    convert_to_onnx(
  File "/opt/monai/monai/networks/utils.py", line 699, in convert_to_onnx
    torch.onnx.export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/__init__.py", line 377, in export
    export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 502, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
    graph = _optimize_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 639, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1848, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_helper.py", line 281, in wrapper
    return fn(g, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_opset14.py", line 173, in scaled_dot_product_attention
    query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 92, in op
    return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 239, in _add_op
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 239, in <listcomp>
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 270, in _const_if_tensor
    return _add_op(graph_context, "onnx::Constant", value_z=arg)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 247, in _add_op
    node = _create_node(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 306, in _create_node
    _add_attribute(node, key, value, aten=aten)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 336, in _add_attribute
    return getattr(node, f"{kind}_")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Invoked with: %728 : Tensor = onnx::Constant(), scope: monai.apps.generation.maisi.networks.controlnet_maisi.ControlNetMaisi::/monai.networks.nets.diffusion_model_unet.AttnDownBlock::down_blocks.2/monai.networks.blocks.spatialattention.SpatialAttentionBlock::attentions.0/monai.networks.blocks.selfattention.SABlock::attn
, 'value', 0.1767766952966369 
(Occurred when translating scaled_dot_product_attention).

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/monai/monai/bundle/__main__.py", line 31, in <module>
    fire.Fire()
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/monai/monai/bundle/scripts.py", line 1010, in run
    workflow.run()
  File "/opt/monai/monai/bundle/workflows.py", line 363, in run
    return self._run_expr(id=self.run_id)
  File "/opt/monai/monai/bundle/workflows.py", line 397, in _run_expr
    return self.parser.get_parsed_content(id, **kwargs) if id in self.parser else None
  File "/opt/monai/monai/bundle/config_parser.py", line 290, in get_parsed_content
    return self.ref_resolver.get_resolved_content(id=id, **kwargs)
  File "/opt/monai/monai/bundle/reference_resolver.py", line 193, in get_resolved_content
    return self._resolve_one_item(id=id, **kwargs)
  File "/opt/monai/monai/bundle/reference_resolver.py", line 163, in _resolve_one_item
    self._resolve_one_item(id=d, waiting_list=waiting_list, **kwargs)
  File "/opt/monai/monai/bundle/reference_resolver.py", line 175, in _resolve_one_item
    item.evaluate(globals={f"{self._vars}": self.resolved_content}) if run_eval else item
  File "/opt/monai/monai/bundle/config_item.py", line 376, in evaluate
    raise RuntimeError(f"Failed to evaluate {self}") from e
RuntimeError: Failed to evaluate ConfigExpression: 
"$__local_refs['ldm_sampler'].sample_multiple_images(__local_refs['num_output_samples'])"

Thanks,
Bin

binliunls · 2024-10-31T09:34:49Z

Should be fine for MAISI as I tested.

Stage	MAISI bundle inference (ms)	TRT bundle inference (ms)
end2end (latent feature generation)	80237.45	40979.48
end2end (image_decoding)	2187.02	2777.27

Thanks,
Bin Liu

borisfom · 2024-10-31T19:40:34Z

@binliunls: how come image_decoding is much slower with TRT? How do I run a test for that?

binliunls · 2024-11-01T13:54:53Z

> @binliunls: how come image_decoding is much slower with TRT? How do I run a test for that?

I ran the following command on an A100 40GB GPU: python -m monai.bundle run --config_file="['configs/inference.json', 'configs/inference_trt.json']" --output_size_xy=256 --output_size_z=256. The bundle then prints the latency for image decoding. This was a single run, because my Colossus node shut down before I could repeat it several times, so the numbers may be biased. I will run it again once I get a new Colossus node.
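To reduce the bias of a single run, the same command could be repeated and summarized. The following is only a sketch: the helper names `build_command` and `time_runs` are hypothetical, the command-line flags are copied from the command above, and the timing is the wall-clock time of the whole process rather than the per-stage latency that the bundle prints.

```python
import statistics
import subprocess
import time


def build_command(output_size_xy=256, output_size_z=256):
    """Assemble the bundle command from the comment above as an argument list."""
    configs = "['configs/inference.json', 'configs/inference_trt.json']"
    return [
        "python", "-m", "monai.bundle", "run",
        f"--config_file={configs}",
        f"--output_size_xy={output_size_xy}",
        f"--output_size_z={output_size_z}",
    ]


def time_runs(cmd, repeats=5, runner=subprocess.run):
    """Run cmd `repeats` times and return (mean, median) wall-clock seconds.

    `runner` is injectable so the loop can be exercised without a GPU.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        runner(cmd, check=True)  # raises if the bundle run fails
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.median(durations)
```

Calling `time_runs(build_command())` from the bundle root would launch the real inference; the first iteration may include one-time setup cost, so discarding it as a warm-up is probably reasonable.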

KumoLiu · 2024-11-15T07:50:13Z

Project-MONAI/MONAI#8153 has been merged.
Do we need to update the README for MAISI and include the benchmark data there? @binliunls @yiheng-wang-nv

binliunls · 2024-11-16T09:28:27Z

Hi @borisfom ,
Here are the benchmark details from 100 runs of MAISI with a 256x256x256 input shape on an A100 80GB. I am not sure why Image Decoding slows down. It could be some overhead issue. I will try to figure it out later.

Latency Type	TRT Mean Latency (s)	Bundle Mean Latency (s)
Mask Preparation	2.897087729	2.793987193
Feature Generation	35.12124193	76.54545327
Image Decoding	1.483238726	1.194563277

Latency Type	TRT Median Latency (s)	Bundle Median Latency (s)
Mask Preparation	2.90212667	2.80731046
Feature Generation	35.12641037	76.54435086
Image Decoding	1.490729215	1.17928219

Thanks,
Bin
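For reference, the per-stage ratios implied by the mean latencies above can be worked out as follows. This is just arithmetic on the reported numbers; the variable names are illustrative, and summing the three stages as an "end-to-end" figure assumes the stages do not overlap, which may not hold.

```python
# Mean latencies in seconds, copied from the table above: (TRT, bundle).
mean_latency = {
    "Mask Preparation": (2.897087729, 2.793987193),
    "Feature Generation": (35.12124193, 76.54545327),
    "Image Decoding": (1.483238726, 1.194563277),
}

# Speedup > 1 means the TRT bundle is faster for that stage.
speedup = {stage: bundle / trt for stage, (trt, bundle) in mean_latency.items()}

# Naive end-to-end ratio, assuming the stages run strictly one after another.
total_trt = sum(trt for trt, _ in mean_latency.values())
total_bundle = sum(bundle for _, bundle in mean_latency.values())
total_speedup = total_bundle / total_trt

for stage, ratio in speedup.items():
    print(f"{stage}: {ratio:.2f}x")
print(f"Sum of stages: {total_speedup:.2f}x")
```

Feature Generation dominates the total, so the small Image Decoding regression has little effect on the summed figure.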

borisfom · 2024-11-16T09:39:02Z

Well, stage-by-stage measurements are tricky, as processing is asynchronous and one section may spill into the next. Unless you synchronize between stages, the numbers may be misleading. Is TRT even used in image decoding? The only 100% right way to measure whether, say, converting controlnet to TRT has a positive or negative impact is to compare end-to-end runs with the original controlnet and the TRT controlnet, etc.
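A minimal sketch of what synchronizing between stages could look like, assuming the stages run on a CUDA device through PyTorch. The names `cuda_sync` and `timed_stage` are hypothetical helpers, not part of MONAI or the bundle; only `torch.cuda.is_available()` and `torch.cuda.synchronize()` are real PyTorch calls.

```python
import time
from contextlib import contextmanager


def cuda_sync():
    """Block until queued CUDA work finishes; no-op without PyTorch or a GPU."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.synchronize()


@contextmanager
def timed_stage(name, results, sync=cuda_sync):
    """Record the wall-clock time of one stage in results[name]."""
    sync()  # drain work queued by earlier stages so it is not billed here
    start = time.perf_counter()
    try:
        yield
    finally:
        sync()  # wait for this stage's own kernels before stopping the clock
        results[name] = time.perf_counter() - start


results = {}
with timed_stage("image_decoding", results):
    pass  # placeholder: the decoding call would go here
```

Even with synchronization, comparing end-to-end runs remains the more trustworthy check, since the extra sync points themselves change how the stages overlap.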

borisfom added 11 commits August 28, 2024 01:38

Added trt_compile configs for vista2d and vista3d

93a4dc5

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Merge branch 'dev' of github.com:Project-MONAI/model-zoo into vista_trt

c401a63

Stash

89dfac1

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Merge branch 'dev' into maisi-trt

c1ab420

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Merge remote-tracking branch 'origin/dev' into maisi-trt

027f71b

Working MAISI

b6628fc

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Merge remote-tracking branch 'origin/dev' into maisi-trt

860b932

Adding TRT support

e91ad57

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

cleanup

b336e06

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

fixing condition

97d2b0e

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Added output_lists option

60000df

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Nic-Ma requested review from yiheng-wang-nv and binliunls October 22, 2024 00:07

yiheng-wang-nv and others added 3 commits October 22, 2024 03:16

update pre commit config

ebb50bc

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

74a6ecc

for more information, see https://pre-commit.ci

update metadata

a9a8b03

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

yiheng-wang-nv reviewed Oct 22, 2024

models/maisi_ct_generative/configs/inference.json (outdated)

yiheng-wang-nv reviewed Oct 22, 2024

models/maisi_ct_generative/scripts/sample.py (outdated)

Addressing code review comments

e827f5c

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Merge remote-tracking branch 'origin/dev' into maisi-trt

76f8fb6

binliunls approved these changes Oct 31, 2024

borisfom mentioned this pull request Nov 1, 2024

[export] run_decomposition fails for permute->view sequence pytorch/pytorch#139508

Open

borisfom and others added 2 commits November 8, 2024 00:36

Merge branch 'dev' into maisi-trt

26c0cae

Removing dynamo repro

42fa00c

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

borisfom and others added 5 commits November 12, 2024 17:09

Merge branch 'maisi-trt' of github.com:borisfom/model-zoo into maisi-trt

94a45e0

Merge remote-tracking branch 'origin/dev' into maisi-trt

b4aefe0

[pre-commit.ci] auto fixes from pre-commit.com hooks

938b6a0

for more information, see https://pre-commit.ci

Fixing c_trt_args

658e4f5

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

fbfecd7

for more information, see https://pre-commit.ci

borisfom requested review from yiheng-wang-nv and binliunls November 15, 2024 06:10
