Exporting to ONNX: a painful road fraught with poor late-on-a-Friday decisions. #1144
MathijsdeBoer started this conversation in Show and tell
This is a story of a poor PhD student sitting in the office late on a Friday evening. "Can we serve this trained model up?" they had asked a few days ago. "Sure," the student had said, "we can package the trained model with a Dockerfile and run predictions on some bind-mounted directories!"
"But won't that require a GPU?", the response scared the student, ever so slightly. "Well yeah, but running on a CPU will probably take a very long time. Our GPU server with slightly old and underpowered hardware already takes about 5-7 minutes to predict on one sample!" "So no way then?" "Not with just the nnUNet code, no." A short silence fell.
"I suppose you could try exporting to ONNX, I'm sure there's a CPU runtime for that?"
Just give me the solution, old man!
Fine.
Minimal-ish example:
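Roughly this, reconstructed from the steps below; the model path, task name, and fold are placeholders, and it assumes the nnUNet v1 API:

```python
import torch

from nnunet.training.model_restore import load_model_and_checkpoint_files

# Placeholder: wherever your trained model lives
model_folder = "/path/to/nnUNet_trained_models/nnUNet/3d_fullres/TaskXXX_MyTask/nnUNetTrainerV2__nnUNetPlansv2.1"

# Restore the trainer and the checkpoint parameters (here: fold 0 only)
trainer, params = load_model_and_checkpoint_files(
    model_folder,
    folds=[0],
    mixed_precision=False,
    checkpoint_name="model_final_checkpoint",
)
trainer.load_checkpoint_ram(params[0], False)

model = trainer.network
model.eval()

# nnUNetTrainerV2 networks have deep supervision enabled by default, which
# would give the exported graph multiple outputs; turn it off if present.
if hasattr(model, "do_ds"):
    model.do_ds = False

# The moral of the story: the network lives on the GPU, so the dummy
# input used for tracing has to live there too.
input_shape = (1, trainer.num_input_channels, *[int(i) for i in trainer.patch_size])
dummy_input = torch.rand(input_shape).cuda()

torch.onnx.export(model, dummy_input, "nnunet.onnx")
```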
There have been a few discussions on this topic; however, the usual response has been "look at the code" (which is fair enough, the maintainers aren't obliged to support every weird thing people try with their code). I've been doing just that, and I've hit a bit of a snag. Because I'm only interested in exporting the actual model, I'm disregarding the pre- and postprocessing steps here for brevity.
To export to ONNX with PyTorch, you have to call:
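```python
# Schematically; model and dummy_input still need to be acquired
torch.onnx.export(model, dummy_input, "model.onnx")
```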
Simple enough: acquire the `nn.Module` object and a random input, and feed them both to the `export` function. I've gone through the code, and I've ended up acquiring the `trainer` and `params` objects with:
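A sketch, with the arguments cribbed from how `nnunet/inference/predict.py` does it; `model_folder`, the fold, and the checkpoint name depend on your setup:

```python
from nnunet.training.model_restore import load_model_and_checkpoint_files

# model_folder is a placeholder for your trained model directory
trainer, params = load_model_and_checkpoint_files(
    model_folder,
    folds=[0],
    mixed_precision=False,
    checkpoint_name="model_final_checkpoint",
)
```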
I reckon the model parameters are loaded with:
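```python
# params holds one checkpoint per requested fold; False means "not training"
trainer.load_checkpoint_ram(params[0], False)
```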
Finding the input shape for the model isn't too difficult, as it's in `trainer`, too:
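```python
# trainer.patch_size comes from the plans file (e.g. something like
# [64, 160, 160] for a 3d_fullres model); prepend a batch dimension
# and the number of input channels
input_shape = (1, trainer.num_input_channels, *[int(i) for i in trainer.patch_size])
```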
Generating a random input array with:
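```python
import torch

# The values are irrelevant for tracing, only the shape matters
dummy_input = torch.rand(input_shape)
```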
Getting the `nn.Module` object seems to work like this:
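```python
# The trainer holds the actual network, which is a plain nn.Module subclass
model = trainer.network
model.eval()  # we're exporting for inference, not training
```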
Now, `onnx.export()` will run the input `dummy_input` through the model, tracing each step as it goes to build the graph. Per the PyTorch documentation, this line should do:
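```python
# Straight out of the PyTorch docs (spoiler: this is about to go wrong)
torch.onnx.export(model, dummy_input, "nnunet.onnx")
```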
Unfortunately, that errors out with an impressive stacktrace:
*(stacktrace collapsed)*
Ok, maybe we made a mistake?
Let's try feeding the patch directly to the model:
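```python
# No torch.onnx, no jit; just a plain forward pass
model(dummy_input)
```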
*(stacktrace collapsed)*
Nope, same error. This is strange, because this time there's no weird `torch.onnx` or `jit` stuff that might get in our way. I've stepped through the code to see how the normal prediction command, `nnUNet_predict`, works. First it calls:
`nnunet/inference/predict_simple.py:main()`
That one calls:
`nnunet/inference/predict.py:predict_from_folder()`
to:
`nnunet/inference/predict.py:predict_cases()`
to:
`nnunet/training/network_training/nnUNetTrainer.py:nnUNetTrainer.predict_preprocessed_data_return_seg_and_softmax()`
to:
`nnunet/network_architecture/neural_network.py:SegmentationNetwork.predict_3D()`
Here we have a choice: tiled or untiled. As we're just feeding a single patch through, it makes sense to follow the untiled path:
`nnunet/network_architecture/neural_network.py:SegmentationNetwork._internal_predict_3D_3Dconv()`
Are we there yet?
This one calls:
`nnunet/network_architecture/neural_network.py:SegmentationNetwork._internal_maybe_mirror_and_pred_3D()`
Which contains the line:
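Quoting roughly from memory:

```python
pred = self.inference_apply_nonlin(self(x))
```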
A-ha!
But wait, `self(x)` is just the same call we already made with `model(dummy_input)`! Because at this point we're living on a Friday evening at about 18:15, my brain cells are a little boiled. But even a boiled stew is right sometimes, or whatever the saying was.
Could it be?
Could it be that I forgot to send my tensor to the GPU?
I try:
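```python
# Send the dummy input to the GPU, where the model has been living all along
model(dummy_input.cuda())
```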
[You can imagine your favorite expletive and paste it here]
I quickly change the command to:
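```python
# Create the dummy input on the GPU from the start
dummy_input = torch.rand(input_shape).cuda()
```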
So I plug that into the earlier `onnx.export()` call:
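```python
torch.onnx.export(model, dummy_input, "nnunet.onnx")
```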
Well now, that seems to have worked!
There is indeed an `.onnx` file written out, and there's only a minimal amount of scary red text. Note that I haven't tried to load the model in an actual ONNX runtime yet; it's currently nearing 19:00 on the same Friday evening, and I am ready for the weekend.
I hope my painful day helps some other people out, and that this may serve as a cautionary tale... Don't export nnUNets on a Friday evening after 18:00... They grow evil, or something like that.
I would also like to mention that I did this all to myself, none of the people referred to in the intro actually made me do this, I just really wanted to figure this puzzle out.