
Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[Unsqueeze_93...Softmax_2088]} #1917

Closed · Xinchengzelin opened this issue Apr 12, 2022 · 23 comments
Labels: triaged (Issue has been triaged by maintainers)

Xinchengzelin commented Apr 12, 2022

I used trtexec (TensorRT 8.2.4.2 GA, CUDA 11.4) to convert my ONNX model to a TRT engine, and it shows the error below. I tried TensorRT 8.4 EA with CUDA 11.5, and the conversion works. I also checked that the supported operators in 8.2 GA include Unsqueeze and Softmax, the same as in 8.4 EA, so I don't know why this error happens.

```
[04/12/2022-11:30:51] [V] [TRT] *************** Autotuning format combination: Bool(450,1), Bool(22500,50,1), Float(202500,450,9,1), Float(900,2,1) -> Float(512,1), Float(3,1) ***************
[04/12/2022-11:30:51] [V] [TRT] --------------- Timing Runner: {ForeignNode[Unsqueeze_93...Softmax_2088]} (Myelin)
[04/12/2022-11:30:51] [W] [TRT] Skipping tactic 0 due to insuficient memory on requested size of 26931712 detected for tactic 0.
[04/12/2022-11:30:51] [V] [TRT] Fastest Tactic: -3360065831133338131 Time: inf
[04/12/2022-11:30:51] [E] Error[10]: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[Unsqueeze_93...Softmax_2088]}.)
[04/12/2022-11:30:51] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[04/12/2022-11:30:51] [E] Engine could not be created from network
[04/12/2022-11:30:51] [E] Building engine failed
[04/12/2022-11:30:51] [E] Failed to create engine from model.
[04/12/2022-11:30:51] [E] Engine set up failed
```
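For context, a build invocation of roughly this shape is what triggers the failure (the model path is hypothetical); on TensorRT 8.2 the default workspace is only 16 MiB, which is consistent with the insufficient-memory warning in the log above:

```shell
# Hypothetical reproduction; assumes trtexec from TensorRT 8.2 is on PATH
# and model.onnx is the exported ONNX model. On 8.2, --workspace is in MiB.
trtexec --onnx=model.onnx \
        --saveEngine=model.trt \
        --workspace=4096 \
        --verbose
```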
ttyio (Collaborator) commented Apr 12, 2022

@Xinchengzelin , the ForeignNode[Unsqueeze_93...Softmax_2088] includes several nodes that are handled by our internal Myelin compiler. It seems we fixed a failure between 8.2 and 8.4; could you use 8.4, since it fixes your issue? Thanks.

ttyio added the triaged (Issue has been triaged by maintainers) and Topic: Myelin labels Apr 12, 2022
Xinchengzelin (Author) commented

> @Xinchengzelin , the ForeignNode[Unsqueeze_93...Softmax_2088] includes several nodes that are handled by our internal Myelin compiler. It seems we fixed a failure between 8.2 and 8.4; could you use 8.4, since it fixes your issue? Thanks.

Because the model deployment environment is 8.2, I can't change the version.
Besides, in TensorRT 8.2, when I use trtexec to convert the ONNX model to a .trt engine, adding --workspace=32 lets trtexec generate the .trt engine. I can load the engine and run inference in Python, but it fails in C++. I'm still confused.

Xinchengzelin (Author) commented

> @Xinchengzelin , the ForeignNode[Unsqueeze_93...Softmax_2088] includes several nodes that are handled by our internal Myelin compiler. It seems we fixed a failure between 8.2 and 8.4; could you use 8.4, since it fixes your issue? Thanks.

I found this in the release notes:

> The --workspace flag in trtexec has been deprecated. TensorRT now allocates as much workspace as available GPU memory by default when the --workspace/--memPoolSize flags are not given, instead of the 16 MB default workspace size limit that trtexec had in TensorRT 8.2. To limit the workspace size, use the --memPoolSize=workspace: flag instead.

Unfortunately, my problem seems to be related to this. Is there a solution? The error happens in my C++ code when calling createExecutionContext.
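For anyone hitting the same workspace limit, a sketch of the flag difference between the two releases (model and engine paths are hypothetical):

```shell
# TensorRT 8.2: workspace limit is given in MiB (the default is only 16 MiB)
trtexec --onnx=model.onnx --workspace=2048 --saveEngine=model.trt

# TensorRT 8.4+: --workspace is deprecated; use the memory-pool flag instead,
# or omit it entirely to let TensorRT use all available GPU memory
trtexec --onnx=model.onnx --memPoolSize=workspace:2048MiB --saveEngine=model.trt
```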

handoku commented Apr 13, 2022

```
[04/13/2022-11:48:34] [W] [TRT] Skipping tactic 0 due to Myelin error: Copy operation "concat" has 513 inputs.
[04/13/2022-11:48:34] [E] Error[10]: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[MPS_VAR_3/strided_slice_1__843:0[Constant]...strided_slice_8__923]}.)
[04/13/2022-11:48:34] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[04/13/2022-11:48:34] [E] Engine could not be created from network
[04/13/2022-11:48:34] [E] Building engine failed
[04/13/2022-11:48:34] [E] Failed to create engine from model.
[04/13/2022-11:48:34] [E] Engine set up failed
```

Similar problem here. Can it be solved by any method other than using TRT 8.4?

ttyio (Collaborator) commented Apr 14, 2022

> I found this in the release notes:
>
> > The --workspace flag in trtexec has been deprecated. […]
>
> Unfortunately, my problem seems to be related to this. The error happens in my C++ code when calling createExecutionContext.

Yes, we changed the default workspace to the maximum available GPU memory starting from 8.4, so there is no need to set the workspace size in 8.4.
Did you serialize the CUDA engine using trtexec and load it from C++? Since trtexec works, can you check the trtexec source code and fix your own code accordingly? https://github.com/NVIDIA/TensorRT/tree/release/8.2/samples/trtexec

ttyio (Collaborator) commented Apr 14, 2022

> [04/13/2022-11:48:34] [E] Error[10]: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[MPS_VAR_3/strided_slice_1__843:0[Constant]...strided_slice_8__923]}.)
> […]
>
> Similar problem here. Can it be solved by any method other than using TRT 8.4?

@handoku , sorry, it is better to upgrade to TRT 8.4.

Xinchengzelin (Author) commented

> Yes, we changed the default workspace to the maximum available GPU memory starting from 8.4, so there is no need to set the workspace size in 8.4. Did you serialize the CUDA engine using trtexec and load it from C++? Since trtexec works, can you check the trtexec source code and fix your own code accordingly? https://github.com/NVIDIA/TensorRT/tree/release/8.2/samples/trtexec

@ttyio Yes, I serialized the CUDA engine using trtexec and loaded it from C++.
You mean I should compare my code against the trtexec source? Could you tell me specifically how to fix my code?

ttyio (Collaborator) commented Apr 14, 2022

@Xinchengzelin , trtexec also supports --saveEngine and --loadEngine. Do you also hit the failure when loading the engine with trtexec? If not, you can check the trtexec source to see what's different between your code and trtexec.
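The suggestion above can be sketched as a two-step round trip (file names are hypothetical); if step 2 succeeds but your own C++ application still fails, the bug is in the application's engine-loading code rather than in the engine itself:

```shell
# Step 1: build and serialize the engine with trtexec (TensorRT 8.2, size in MiB)
trtexec --onnx=model.onnx --workspace=2048 --saveEngine=model.trt

# Step 2: deserialize the same engine file with trtexec and run its timing loop
trtexec --loadEngine=model.trt
```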

Xinchengzelin (Author) commented

> @Xinchengzelin , trtexec also supports --saveEngine and --loadEngine. […] You can check the trtexec source to see what's different between your code and trtexec.

@ttyio Thank you very much! I can load the engine with trtexec, so now I can check the difference between my code and trtexec.cpp, right?


handoku commented Apr 20, 2022

> > Similar problem here. Can it be solved by any method other than using TRT 8.4?
>
> @handoku , sorry, it is better to upgrade to TRT 8.4.

@ttyio I have just tried TRT 8.4.0.6 and still get the same error.

handoku commented Apr 20, 2022

Is there any way to prevent Myelin from doing this op fusion?

It seems that Myelin fused several nodes into a single one, but then couldn't find a corresponding implementation to run it.

ttyio (Collaborator) commented Apr 20, 2022

@handoku , we cannot disable Myelin.
Have you tried increasing the workspace size (--workspace if you are using trtexec)? If it still fails, could you share your ONNX model here? Thanks.

handoku commented Apr 20, 2022

@ttyio Hello, I have already set workspace=12 GB on a T4. The model can be found here (model.onnx). Thanks for looking into this.

ttyio (Collaborator) commented Apr 21, 2022

Thanks @handoku , an internal issue has been created to track the failure.

ttyio (Collaborator) commented Apr 26, 2022

@handoku , your issue is fixed in 8.4 GA; please upgrade to 8.4 GA once we release the binary at https://developer.nvidia.com/tensorrt, thanks!

Xinchengzelin (Author) commented

> @handoku , your issue is fixed in 8.4 GA; please upgrade to 8.4 GA once we release the binary at https://developer.nvidia.com/tensorrt, thanks!

Thanks for your advice. I compared my code with the trtexec code and fixed it; it works now. Thank you very much.

handoku commented Apr 26, 2022

@ttyio Thanks. When will the GA version be released?

ttyio (Collaborator) commented Apr 26, 2022

@handoku , GA should be available in early June, thanks!

edric1261234 commented

I had a similar problem, but fixed it by adding /path/to/TensorRT-8.2.4.2/lib to LD_LIBRARY_PATH.
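A sketch of that fix, assuming the stock TensorRT tarball layout (the install path is an example; substitute your own):

```shell
# Prepend the TensorRT library directory so the loader finds libnvinfer
# and the other TensorRT shared libraries at runtime
export LD_LIBRARY_PATH=/path/to/TensorRT-8.2.4.2/lib:$LD_LIBRARY_PATH
```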

a227799770055 commented Jan 17, 2023

Hi, I met a similar problem, and I have tried --workspace=32, but the problem still occurs.
My TensorRT version is 8.5.1.7, and the OS is Ubuntu 20.
Thanks!

```
[01/17/2023-10:14:48] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: autotuning: CUDA error 2 allocating 4362077693-byte buffer: out of memory
[01/17/2023-10:14:48] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 1590) [Shuffle]...Reshape_11153 + Transpose_11154]}.)
[01/17/2023-10:14:48] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[01/17/2023-10:14:48] [E] Engine could not be created from network
[01/17/2023-10:14:48] [E] Building engine failed
[01/17/2023-10:14:48] [E] Failed to create engine from model or file.
[01/17/2023-10:14:48] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=/home/insign/Doc/insign/Monocular-Depth-Estimation-Toolbox/toTRT/depth2023_sim.onnx --int8
```

proevgenii commented May 2, 2023

Hi, I met a similar problem.
My TensorRT version is 8.5.3.1.

```
[05/02/2023-12:00:12] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[05/02/2023-12:00:12] [E] Engine could not be created from network
[05/02/2023-12:00:12] [E] Building engine failed
[05/02/2023-12:00:12] [E] Failed to create engine from model or file.
[05/02/2023-12:00:12] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8503] # trtexec --onnx=/workspace/vit_bert_model.onnx --minShapes=images:1x3x224x224,input_ids:1x128,attention_mask:1x128 --optShapes=images:4x3x224x224,input_ids:4x128,attention_mask:4x128 --maxShapes=images:8x3x224x224,input_ids:8x128,attention_mask:8x128 --explicitBatch --workspace=8000 --saveEngine=/workspace/vit_bert_model_fp16_dyn_1_4_8.trt --fp16
```

UPD: When I run trtexec without --fp16, the model exports normally.
So, is there a way to make it work with the FP16 precision flag?

lix19937 commented

> CUDA error 2 allocating 4362077693-byte buffer: out of memory

7 participants