Advanced Triton Inference Pipeline for CRAFT (Character-Region Awareness For Text detection)

Overview

A new inference pipeline for the CRAFT text detector (PyTorch) using NVIDIA Triton Inference Server. The repository includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines for both standalone TensorRT and the Triton server (multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.

Author

k9ele7en. Give the repo a star if you find some value in it.
Thank you.

License

[BSD-3-Clause License] The BSD 3-clause license allows almost unlimited freedom with the software, as long as you include the BSD copyright and license notice (found in the full license text).

Updates

13 Jul, 2021: Initial update; the preparation scripts run well.

14 Jul, 2021: Inference on the Triton server runs well (single request); the TensorRT format gives the best performance.

Getting started

1. Install dependencies

Requirements

$ pip install -r requirements.txt

2. Install the required environment for inference using the Triton server

Check ./README_ENV.md for details. Tools/packages to install include (a quick sanity-check sketch follows the list):

  • TensorRT
  • Docker
  • nvidia-docker
  • PyCUDA ...
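
Once installed, a quick sanity check from Python can confirm the GPU stack is visible. This is a minimal sketch assuming the standard tensorrt and pycuda package names; it is not part of the repository's scripts:

import tensorrt as trt        # TensorRT Python bindings
import pycuda.driver as cuda  # PyCUDA

cuda.init()
print("TensorRT version:", trt.__version__)
print("CUDA devices    :", cuda.Device.count())
print("Device 0        :", cuda.Device(0).name())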

3. Training

The code for training is not included in this repository, the same as in the original ClovaAI repository.

4. Inference instructions using pretrained models

  • Download the trained models
Model name    Used datasets           Languages   Purpose                        Model Link
General       SynthText, IC13, IC17   Eng + MLT   For general purpose            Click
IC15          SynthText, IC15         Eng         For IC15 only                  Click
LinkRefiner   CTW1500                 -           Used with the General Model    Click

5. Model preparation before running the Triton server:

a. Triton Inference Server inference: see details in ./README_ENV.md
Initially, you need to run a shell (.sh) script to prepare the Model Repository; after that, you only need to run the Docker image whenever you run inference. The script gets everything ready for the Triton server and covers these steps (an export sketch follows the list):

  • Convert the downloaded pretrained model into multiple formats
  • Place the converted model formats into Triton's Model Repository
  • Run the Triton Server image from NGC (pull it first if it does not exist locally)
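
For reference, the PyTorch -> ONNX step typically looks like the sketch below. It assumes the CRAFT model definition (craft.py) from the original CRAFT-pytorch code and the downloaded craft_mlt_25k.pth checkpoint; the preparation script performs the actual conversion, so treat this only as an illustration:

# Illustrative PyTorch -> ONNX export; the preparation script does the real work.
from collections import OrderedDict
import torch
from craft import CRAFT  # model definition from the original CRAFT-pytorch code

def copy_state_dict(state_dict):
    # Strip the "module." prefix left by DataParallel checkpoints
    return OrderedDict((k[7:] if k.startswith("module.") else k, v)
                       for k, v in state_dict.items())

model = CRAFT()
model.load_state_dict(copy_state_dict(torch.load("craft_mlt_25k.pth", map_location="cpu")))
model.eval()

dummy = torch.randn(1, 3, 768, 768)  # NCHW image tensor
torch.onnx.export(
    model, dummy, "detec.onnx",
    input_names=["input"],  # matches the input name expected by the Triton models
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}},
    opset_version=11,
)
# The ONNX file can then be built into a TensorRT engine (e.g. with trtexec) for detec_trt.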

Check whether the server is running correctly:

$ curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
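
The same readiness check can be done from Python; a small sketch, assuming the tritonclient package is installed (pip install tritonclient[http]):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready   :", client.is_server_ready())
print("detec_trt ready:", client.is_model_ready("detec_trt", "1"))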

Now everything is ready; start inference as follows:

  • Run the Docker image of the Triton server (replace the -v mount path with your full path to model_repository; the expected repository layout is sketched after the server log below):
$ sudo docker run --gpus all --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/maverick911/repo/Triton-TensorRT-Inference-CRAFT-pytorch/model_repository:/models nvcr.io/nvidia/tritonserver:21.05-py3 tritonserver --model-repository=/models
...
+------------+---------+--------+
| Model      | Version | Status |
+------------+---------+--------+
| detec_onnx | 1       | READY  |
| detec_pt   | 1       | READY  |
| detec_trt  | 1       | READY  |
+------------+---------+--------+
I0714 00:37:55.265177 1 grpc_server.cc:4062] Started GRPCInferenceService at 0.0.0.0:8001
I0714 00:37:55.269588 1 http_server.cc:2887] Started HTTPService at 0.0.0.0:8000
I0714 00:37:55.312507 1 http_server.cc:2906] Started Metrics Service at 0.0.0.0:8002
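
For reference, the mounted model_repository follows Triton's standard layout: one directory per model, a config.pbtxt, and a numeric version subdirectory. The exact files are produced by the preparation script; the sketch below uses Triton's default file names (model.plan / model.pt / model.onnx) together with the model names from the log above:

model_repository/
├── detec_trt/
│   ├── config.pbtxt
│   └── 1/model.plan    # TensorRT engine
├── detec_pt/
│   ├── config.pbtxt
│   └── 1/model.pt      # TorchScript
└── detec_onnx/
    ├── config.pbtxt
    └── 1/model.onnx    # ONNX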

Run inference with:

$ python infer_triton.py -m='detec_trt' -x=1 --test_folder='./images' -i='grpc' -u='localhost:8001'
Request 1, batch size 1s/sample.jpg
elapsed time : 0.9521937370300293s
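
Under the hood, infer_triton.py sends requests through the tritonclient API. The sketch below shows the shape of such a gRPC request; the input name "input" matches the name reported in the error under Notes, while the output name "output" and the dummy preprocessing are assumptions made for illustration, so check the model's config.pbtxt and ./infer_triton.py for the real details:

# Sketch of a single gRPC inference request (see infer_triton.py for the actual client).
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

img = np.random.rand(1, 3, 768, 768).astype(np.float32)  # stands in for a preprocessed image
infer_input = grpcclient.InferInput("input", list(img.shape), "FP32")
infer_input.set_data_from_numpy(img)
requested = grpcclient.InferRequestedOutput("output")    # output name assumed for illustration

result = client.infer(model_name="detec_trt", model_version="1",
                      inputs=[infer_input], outputs=[requested])
score_maps = result.as_numpy("output")  # region / affinity score maps
print(score_maps.shape)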

Output from Triton:

Performance benchmarks: single image (sample.jpg), time in seconds

  • Triton server (gRPC / HTTP):

    Model format   gRPC (s)   HTTP (s)
    TensorRT       0.946      0.952
    Torchscript    1.244      1.098
    ONNX           1.052      1.060
  • Classic PyTorch: 1.319 s

Arguments

  • -m: name of the model (with format suffix, e.g. detec_trt)
  • -x: version of the model
  • --test_folder: input image/folder
  • -i: protocol (HTTP/gRPC)
  • -u: URL of the corresponding protocol endpoint (HTTP: port 8000, gRPC: port 8001)
  • ... (details in ./infer_triton.py)

Notes:

  • The error below is caused by invalid dynamic input shapes; check that the input image shape falls within the dynamic-shape range defined in the model config (a clamping sketch follows the error message).
inference failed: [StatusCode.INTERNAL] request specifies invalid shape for input 'input' for detec_trt_0_gpu0. Error details: model expected the shape of dimension 2 to be between 256 and 1200 but received 1216
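
One way to avoid this error is to clamp the resize target to the engine's allowed range before sending the request. A minimal sketch, assuming spatial dimensions must stay within [256, 1200] (the range reported above) and remain multiples of 32 as in CRAFT's usual preprocessing:

# Illustrative guard against out-of-range dynamic shapes (assumed range: [256, 1200]).
MIN_SIDE, MAX_SIDE = 256, 1200

def clamp_target_size(h, w, min_side=MIN_SIDE, max_side=MAX_SIDE):
    scale = min(1.0, max_side / max(h, w))               # shrink only when too large
    new_h = min(max_side, max(min_side, int(h * scale) // 32 * 32))
    new_w = min(max_side, max(min_side, int(w * scale) // 32 * 32))
    return new_h, new_w

print(clamp_target_size(1216, 800))  # -> (1184, 768), inside the valid range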

b. Classic PyTorch (.pth) inference:

$ python test.py --trained_model=[weightfile] --test_folder=[folder path to test images]

The result images and score maps will be saved to ./result by default.
