
How to Apply Different Quantization Settings Per Layer in ExecuTorch? #6846

Open
crinex opened this issue Nov 14, 2024 · 2 comments
Labels: module: quantization, triaged

Comments


crinex commented Nov 14, 2024

Dear @kimishpatel @jerryzh168 @shewu-quic

I want to split a model (e.g., Llama-3.2-3B) into multiple layers and apply different quantization settings (qnn_8a8w, qnn_16a4w, ...) to each layer.
Has such a method been tested in ExecuTorch?
If not, could you suggest how this can be achieved?

Thank you

kimishpatel (Contributor) commented:

Is this specific to the QNN backend, or is your question about backends in general?

GregoryComer added the triaged label on Nov 14, 2024

crinex commented Nov 15, 2024

@kimishpatel

What I meant was the QNN backend.
I’ve noticed that when I quantize the Llama 3.2-3B model using QNN_8A8W, inference doesn’t work properly on the device. Others seem to be reporting similar results in their issues as well.
Therefore, I’m considering solving this problem using mixed precision on the QNN backend. Would that be possible?
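
For reference, per-module quantization configs can be expressed in the PT2E flow that ExecuTorch backends consume. The sketch below uses XNNPACKQuantizer's `set_module_name` only to illustrate the mechanism; whether QnnQuantizer exposes an equivalent per-module hook, and how the qnn_8a8w / qnn_16a4w configs would map onto it, is exactly the open question here and would need confirmation from the QNN backend maintainers. `TinyModel` and the two configs are illustrative stand-ins, not the actual Llama layers or QNN configs.

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class TinyModel(torch.nn.Module):
    """Stand-in for a larger model whose sub-modules get different quant configs."""

    def __init__(self):
        super().__init__()
        self.layer_a = torch.nn.Linear(16, 16)
        self.layer_b = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.layer_b(torch.relu(self.layer_a(x)))


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# Capture the model for PT2E quantization (export_for_training on recent
# PyTorch; older ExecuTorch examples use capture_pre_autograd_graph instead).
captured = torch.export.export_for_training(model, example_inputs).module()

# One quantizer, different configs per named sub-module. The two configs here
# (per-channel vs. per-tensor weights) only stand in for "8a8w vs. 16a4w".
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
quantizer.set_module_name("layer_a", get_symmetric_quantization_config(is_per_channel=True))
quantizer.set_module_name("layer_b", get_symmetric_quantization_config(is_per_channel=False))

prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # run calibration data through the observers
quantized = convert_pt2e(prepared)
```

The same pattern would apply if the QNN quantizer accepts per-module or per-node configs: assign the lower-precision config globally and override the accuracy-sensitive layers by name before lowering to the backend.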
