
ONNX Neural Compressor v1.0 Release

Released by @chensuyue on 02 Aug 02:13 · 3 commits to main since this release · commit 71c2484

ONNX Neural Compressor provides ONNX model quantization techniques inherited from Intel Neural Compressor, including Post-training Quantization and Weight-only Quantization.
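As a rough illustration of what post-training static quantization computes under the hood, the sketch below derives an asymmetric uint8 scale and zero-point from a tensor's observed range and applies them with NumPy. This is only a minimal illustration of the math, not the library's actual API.

```python
import numpy as np

def affine_quant_params(x, num_bits=8):
    """Compute scale and zero-point for asymmetric uint8 quantization."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include zero so padding stays exact.
    xmin, xmax = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = int(round(qmin - xmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
scale, zp = affine_quant_params(x)
q = quantize(x, scale, zp)
dq = (q.astype(np.float32) - zp) * scale  # dequantize; close to x
```

Dynamic quantization computes these parameters on the fly per inference from the actual activations, while the static approach fixes them ahead of time from calibration data.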

  • Features
  • Validated Configurations

Features

  • Support Post-training Quantization, with both static and dynamic approaches
  • Support SmoothQuant for Post-training Quantization
  • Support Weight-only Quantization with several algorithms, including RTN, GPTQ, and AWQ
  • Support layer-wise quantization for RTN and GPTQ
  • Validate popular LLMs such as Llama3, Phi-3, and Qwen2 with weight-only quantization on multiple Intel hardware platforms, such as Intel Xeon Scalable processors and Intel Core Ultra processors
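Of the weight-only algorithms listed above, RTN (round-to-nearest) is the simplest: each group of weights is scaled into a low-bit integer grid and rounded, with no calibration data needed. The sketch below shows symmetric per-group 4-bit fake quantization in NumPy; the group size and layout are illustrative assumptions, not the library's implementation.

```python
import numpy as np

def rtn_quantize_weight(w, num_bits=4, group_size=32):
    """RTN weight-only quantization sketch: symmetric per-group
    int4 fake quantization of a 2-D weight matrix."""
    maxq = 2 ** (num_bits - 1) - 1       # e.g. 7 for 4-bit symmetric
    out = np.empty_like(w, dtype=np.float32)
    for col in range(0, w.shape[1], group_size):
        group = w[:, col:col + group_size]
        # One scale per row per group, chosen so the largest weight maps to maxq.
        scale = np.abs(group).max(axis=1, keepdims=True) / maxq
        scale[scale == 0] = 1.0          # avoid division by zero for all-zero groups
        q = np.clip(np.round(group / scale), -maxq - 1, maxq)
        out[:, col:col + group_size] = q * scale  # dequantized ("fake-quant") weights
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
w_q = rtn_quantize_weight(w)
err = np.abs(w - w_q).max()  # per-element error bounded by about scale / 2
```

GPTQ and AWQ refine this idea by using calibration data to compensate for rounding error and to rescale salient channels, respectively; layer-wise quantization processes one layer at a time to bound peak memory.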

Validated Configurations

  • OS version: CentOS 8.4, Ubuntu 22.04
  • Python version: 3.10
  • ONNX Runtime version: 1.18.1