Poor inference results of TensorRT 8.6.3 when running INT8-calibration on GPU RTX3090 #3708
Comments
For transformer-based models, PTQ cannot provide good accuracy; you can try QAT.
Can you elaborate on this? Since EfficientViT avoids softmax activations by using ReLU and convolutional layers, I would have thought that it would not experience the same performance hit. But I am just theorizing, and I guess the results speak for themselves. Are you suggesting using the PyTorch integration for QAT? In your opinion, could a conservative mixed-precision setup work without the need for training? What are the obstacles to trying the PTQ methods in tensorrt-llm (e.g. SmoothQuant) on a vision model?
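To make the PTQ concern concrete, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization with max calibration. It is an illustration under assumptions, not a diagnosis of this issue: the activation values are made up, and `fake_quant_int8` is a hypothetical helper, not a TensorRT API. It shows how a single large outlier stretches the quantization range and raises the error on the small values that make up most of the tensor.

```python
def fake_quant_int8(values, amax):
    """Quantize to symmetric INT8 with range [-amax, amax], then dequantize."""
    scale = amax / 127.0
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


# Hypothetical activations: many small values plus one large outlier.
small = [0.01 * i for i in range(-50, 51)]  # values in [-0.5, 0.5]
with_outlier = small + [60.0]

# Max calibration: the range is set by the largest absolute value seen.
amax_tight = max(abs(v) for v in small)
amax_wide = max(abs(v) for v in with_outlier)

err_tight = mean_abs_error(small, fake_quant_int8(small, amax_tight))
err_wide = mean_abs_error(small, fake_quant_int8(small, amax_wide))

print(f"error on small values, no outlier:   {err_tight:.5f}")
print(f"error on small values, with outlier: {err_wide:.5f}")
```

Keeping outlier-heavy layers in FP16 is the usual motivation for a conservative mixed-precision setup; whether EfficientViT-SAM has such layers is not established in this thread.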
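Hello, do you have any experience with quantizing diffusion models? On an A10 card, when I quantize the sdxl-turbo model, UNet inference takes longer than it does without quantization. (I only quantized nn.Linear; quantizing both the convolutional and linear layers does not fit in 22 GB of GPU memory.)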
@bernardrb @ApolloRay, we have an SD INT8 sample at https://github.com/NVIDIA/TensorRT/tree/release/10.0/demo/Diffusion; not sure if this helps.
Description
I tried to run EfficientViT-SAM on an RTX 3090, but quantization to 8-bit gave severely distorted results. I am unsure whether the issue lies in the calibration code or in the nature of quantization. I've altered the ImageBatcher with my model in mind.
The INT8 result below comes from an engine calibrated on 10,000 images from the Meta SAM dataset:
[Image: INT8 result]
[Image: FP16 result]
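One way to put a number on "severely distorted" is to compare the INT8 and FP16 masks directly. The sketch below computes intersection-over-union between two binary masks in pure Python. The masks are toy data and `mask_iou` is a hypothetical helper; neither comes from the issue's files.

```python
def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two equally sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as identical.
    return inter / union if union else 1.0


# Toy 3x3 masks standing in for the FP16 and INT8 engine outputs.
fp16_mask = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
int8_mask = [[0, 1, 0],
             [0, 1, 1],
             [0, 1, 0]]

print(mask_iou(fp16_mask, int8_mask))
```

Tracking this score per image across precision settings would show whether a change to calibration actually narrows the gap.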
Adapted from samples/EfficientDet:
class ImageBatcher:
    def __init__(
        self,
        input,
        shape,
        dtype,
        max_num_images=None,
        exact_batches=False,
        preprocessor="EfficientDet",
        shuffle_files=False,
    ):
        ...  # body not shown in the issue


class SamResize:
    def __init__(self, size: int) -> None:
        self.size = size
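The issue does not show what `SamResize` does beyond storing `size`. A minimal sketch, assuming it follows the common SAM convention of scaling the longest image side to `size` while keeping the aspect ratio, would compute the target shape as below. The function name `longest_side_shape` is hypothetical. The general point is that calibration images need the same preprocessing as inference images; otherwise the activation ranges collected during calibration may not match what the engine sees at run time.

```python
def longest_side_shape(old_h: int, old_w: int, size: int) -> tuple[int, int]:
    """Return (new_h, new_w) such that the longer side equals `size`.

    Assumption: this mirrors a longest-side resize; it is not taken
    from the issue's source code.
    """
    scale = size / max(old_h, old_w)
    new_h = int(old_h * scale + 0.5)  # round half up
    new_w = int(old_w * scale + 0.5)
    return new_h, new_w


# Example: a 1500x2250 (H x W) image resized for a 1024-pixel model input.
print(longest_side_shape(1500, 2250, 1024))
```

Applying the same shape computation in both the calibration batcher and the inference script removes one possible source of mismatch between the two paths.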
Environment
TensorRT Version: 8.6.3
NVIDIA GPU: RTX 3090
NVIDIA Driver Version: 525.147.05
CUDA Version: 12.0
CUDNN Version: 9.0.0
Operating System: Ubuntu 22.04
Python Version (if applicable): 3.10.12
PyTorch Version (if applicable): 2.2.1
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.02-py3
Relevant Files
Model link: https://github.com/mit-han-lab/efficientvit/blob/master/applications/sam.md
All files: https://drive.google.com/drive/folders/16Qe72Kf1SmXobz9X1YKuDB8pDGQVAheK?usp=sharing
Includes minimal setup, source code, logs, results
Steps To Reproduce
Commands or scripts:
For convenience,
docker compose up build_engine
docker compose up inference
scripts/quantize.sh to build engine
scripts/inference.sh to run inference
Have you tried the latest release?: Yes.