Example pytorch_quantization doesn't show speed up #3808
Comments
You can use https://github.com/NVIDIA/TensorRT/blob/master/tools/pytorch-quantization/examples/torchvision/models/classification/resnet.py#L370 to define an nn.Module and then export a quantized ONNX model; it differs from https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py.
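For reference, a minimal sketch of that flow, assuming the pytorch-quantization toolkit is installed (the calibration step is required before export and is omitted here for brevity):

```python
import torch
import torchvision.models as models
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch.nn layers with quantized equivalents *before* building
# the model, so every Conv/Linear gets input and weight TensorQuantizers.
quant_modules.initialize()
model = models.resnet50(pretrained=True).eval()

# ... run calibration here so every TensorQuantizer has an amax ...

# Export the fake-quant ops as ONNX QuantizeLinear/DequantizeLinear (Q/DQ) nodes.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50_quant.onnx", opset_version=13)
```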
What's the difference between these two versions of resnet?
I printed the model precision and found the model is actually still in fp32. This is the script I copied from the documentation; did I miss anything?
quant.pt has more scale layers than no_quant.pt, but the exported param.dtype is always fp32.
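That matches how fake quantization behaves: parameters are never actually cast to int8 in PyTorch, so printing dtypes will always show fp32. A quick sketch for inspecting this, assuming the model was built after quant_modules.initialize():

```python
import torch
import torchvision.models as models
from pytorch_quantization import quant_modules
from pytorch_quantization.nn import TensorQuantizer

quant_modules.initialize()   # must be active before the model is built
model = models.resnet50()

# The inserted TensorQuantizers hold the extra scale (amax) entries that
# make quant.pt larger than no_quant.pt ...
print(sum(isinstance(m, TensorQuantizer) for m in model.modules()), "quantizers")

# ... while every parameter stays fp32: fake quantization only simulates int8.
print({p.dtype for p in model.parameters()})   # {torch.float32}
```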
Sorry, I think you didn't quite answer my question... Did I save the model incorrectly? If so, how should I save the quantized model in this case, so that next time I can directly load the same modeling code with the new quantized checkpoint?
Quant version (with Q/DQ nodes inserted) vs. no-quant version: the quant version adds more scale layers (mul ops). I think you should read more of the quantization docs.
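For completeness: no speedup is expected from the fake-quantized PyTorch model itself. The int8 speedup only materializes when TensorRT builds an engine from the exported Q/DQ ONNX, e.g. `trtexec --onnx=resnet50_quant.onnx --int8`, or as a Python sketch (file names assumed from the example above):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("resnet50_quant.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # int8 kernels; scales come from the Q/DQ nodes

engine = builder.build_serialized_network(network, config)
with open("resnet50_int8.engine", "wb") as f:
    f.write(engine)
```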
Okay, good to know, but to be honest this is not explicitly stated in the documentation. If I missed it, I would appreciate you kindly pointing it out, thanks.
Besides, I am trying to use the tool to quantize a latent encoder; this is the model code: https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/diffusionmodules/model.py#L368. The inference code has this model inside a pl.LightningModule.
If I use the plain nn.Module, quantization works, but the exported ONNX produces quite wrong results. Would you have some clue why this happens?
Does the tool support pl.LightningModule out of the box?
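One workaround sketch, since pl.LightningModule subclasses torch.nn.Module: quantize and export the module it wraps directly. The `encoder` attribute name below is hypothetical; use whatever attribute the LightningModule stores the latent encoder under.

```python
import torch
from pytorch_quantization import nn as quant_nn

def export_wrapped_encoder(lit_model, sample_input, path="encoder_quant.onnx"):
    """Export the nn.Module wrapped by a pl.LightningModule as Q/DQ ONNX.

    Assumes the encoder was built with quant modules enabled and calibrated.
    """
    encoder = lit_model.encoder.eval()   # hypothetical attribute name
    quant_nn.TensorQuantizer.use_fb_fake_quant = True
    torch.onnx.export(encoder, sample_input, path, opset_version=13)
```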
What is your qat.py or calib.py?
This is all the code #3808 (comment), except I swapped the model for the latent encoder model. Am I missing something?
Would you recommend that I try the latest AMMO from NVIDIA?
Ref https://pypi.org/project/nvidia-ammo/. I think AMMO contains the pytorch-quantization functions plus other new features supporting LLM (transformer-based) models. Both TensorRT-LLM and diffusion models use nvidia-ammo.
We encourage using AMMO now; pytorch-quantization will be deprecated in the future.
Description
I have been following this documentation to quantize the pretrained resnet to get a feel for how it works. However, the quantized resnet model is the same size as the PyTorch model, with the same runtime and memory usage. Is this expected?
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html
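For reference, the calibration step this documentation describes looks roughly like the sketch below; the `data_loader` name and the (images, labels) batch structure are assumptions:

```python
import torch
from pytorch_quantization import nn as quant_nn

def calibrate(model, data_loader, num_batches=16):
    # Switch every TensorQuantizer from quantizing to collecting statistics.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()

    # Feed a few batches so the calibrators can observe activation ranges.
    with torch.no_grad():
        for i, (images, _) in enumerate(data_loader):
            model(images)
            if i + 1 >= num_batches:
                break

    # Compute amax from the statistics and re-enable quantization.
    # (Histogram calibrators need a method, e.g.
    #  load_calib_amax("percentile", percentile=99.99).)
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.load_calib_amax()
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()
```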
Environment
Built pytorch_quantization from source as instructed in the README.
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version: 525.105
CUDA Version: 12.1
CUDNN Version:
Operating System:
Python Version (if applicable): 3.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.1.0+cu121
Baremetal or Container (if so, version):