NUQMM https://arxiv.org/pdf/2206.09557.pdf
https://arxiv.org/pdf/1706.02021.pdf
https://arxiv.org/pdf/1802.00150.pdf
https://arxiv.org/pdf/2208.11580.pdf
https://proceedings.neurips.cc/paper/1989/file/6c9882bbac1c7093bd25041881277658-Paper.pdf
https://authors.library.caltech.edu/54981/1/Optimal%20Brain%20Surgeon%20and%20general%20network%20pruning.pdf
LLM.int8() https://arxiv.org/abs/2208.07339
https://arxiv.org/pdf/2212.09720.pdf
SmoothQuant https://arxiv.org/abs/2211.10438
LLM PTQ https://arxiv.org/abs/2303.08302
Pareto Optimal Quantization https://arxiv.org/abs/2105.03536
GPTQ https://arxiv.org/abs/2210.17323
Survey https://arxiv.org/abs/2103.13630