spend more than 20ms on preprocess on jetson nano, batch=1 #109
@JinRanYAO Is the data you're testing an image or a video?
@JinRanYAO Try the following command to build an FP16 engine; it improves performance by about 100%:

```shell
./trtexec --onnx=yolov8n-pose.onnx --saveEngine=yolov8n-pose-fp16.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:2x3x640x640 --maxShapes=images:4x3x640x640 --fp16
```
@FeiYull Thank you for your quick reply!
@JinRanYAO See `void YOLOv8Pose::preprocess(const std::vector<cv::Mat>& imgsBatch)`.
@FeiYull It seems that resize, bgr2rgb, norm, and hwc2chw each cost about the same, roughly 5 ms per stage. Could I use the equivalent OpenCV functions when I receive the image, instead of running these stages here?
@JinRanYAO You can merge the following operations into one: inside the CUDA kernel called by `resizeDevice`, modify the following:

[modify before] `TensorRT-Alpha/utils/kernel_function.cu`, line 142 at commit `bca9575`

[modify after] fold the `// bgr2rgb` and `// normalization` steps into that same kernel.
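The fusion suggested above can be illustrated with a plain C++ reference. This is a hypothetical sketch, not the actual `kernel_function.cu` code, and `fusedPreprocess` is an invented name: each output pixel is sampled once, and the bgr2rgb swap, /255 normalization, and HWC-to-CHW store all happen in that single pass (in the real CUDA kernel, the loop body runs once per thread).

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical CPU reference for a fused preprocess pass:
// nearest-neighbor resize + BGR->RGB swap + /255 normalization,
// writing planar CHW float output (the layout TensorRT expects).
void fusedPreprocess(const uint8_t* src, int srcW, int srcH,
                     float* dst, int dstW, int dstH) {
    const float sx = static_cast<float>(srcW) / dstW;
    const float sy = static_cast<float>(srcH) / dstH;
    const int plane = dstW * dstH;
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            const int srcX = static_cast<int>(x * sx);
            const int srcY = static_cast<int>(y * sy);
            const uint8_t* p = src + (srcY * srcW + srcX) * 3;  // BGR pixel
            // bgr2rgb + normalization fused into the same store;
            // HWC -> CHW is handled by the planar offsets.
            dst[0 * plane + y * dstW + x] = p[2] / 255.0f;  // R
            dst[1 * plane + y * dstW + x] = p[1] / 255.0f;  // G
            dst[2 * plane + y * dstW + x] = p[0] / 255.0f;  // B
        }
    }
}
```

The point of the fusion is that each source pixel is read once and each destination value written once, instead of four separate passes over the whole image.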
@FeiYull Thanks for your advice; the preprocess time drops to 8 ms after merging resize, bgr2rgb, and norm into one kernel. I then resize each frame to the engine's input size as soon as it is received, so src_size and dst_size are equal in yolov8-pose. Finally, I simplified the preprocess code by deleting the affine matrix and interpolation to save more time. Here is my code now.

After simplifying, the preprocess time drops to about 6 ms, with correct inference results. Is this code all right, or can anything else be improved?
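For reference, the simplified path described here (frame already at the engine's input size, so no affine matrix and no interpolation) reduces to a pure color/normalize/layout conversion. A hypothetical CPU sketch of that reduced step follows; `convertOnly` is an invented name, not code from the repository:

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical sketch of the simplified path: when src and dst sizes
// are equal, preprocessing is just BGR->RGB + /255 + HWC->CHW, with
// no coordinate mapping at all.
void convertOnly(const uint8_t* src, float* dst, int w, int h) {
    const int plane = w * h;
    for (int i = 0; i < plane; ++i) {
        const uint8_t* p = src + i * 3;       // BGR pixel i
        dst[0 * plane + i] = p[2] / 255.0f;   // R
        dst[1 * plane + i] = p[1] / 255.0f;   // G
        dst[2 * plane + i] = p[0] / 255.0f;   // B
    }
}
```

Dropping the per-pixel coordinate math is where the remaining savings come from, at the cost of doing the resize earlier in the pipeline.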
Hello, thank you for your excellent work. I am trying to use your yolov8-pose code in jetson nano for real-time detection. I set batchs=1, imageshape=640(h)x384(w). I can get right result, and I found that it costs 40+ ms on inference, but 20+ ms on preprocess. I think it takes too long time on preprocess. Is there anything wrong, and is there anything I can do to optimize it?
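To see where those 20+ ms go, timing each preprocess stage separately is the usual first step. A minimal host-side sketch follows (`StageTimer` is an invented helper, not part of the project); note that for CUDA kernels you must synchronize before stopping the clock, or use `cudaEventRecord`/`cudaEventElapsedTime` pairs, because kernel launches return immediately:

```cpp
#include <chrono>
#include <cassert>

// Minimal wall-clock stage timer. For GPU stages, call
// cudaDeviceSynchronize() before stopMs(), or use CUDA events instead,
// since kernel launches are asynchronous.
struct StageTimer {
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    double stopMs() const {
        const auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
};
```

Typical usage would be one `StageTimer` around each of resize, bgr2rgb, norm, and hwc2chw, printing the milliseconds per stage; that is how the roughly-5-ms-each breakdown discussed above can be obtained.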