Release v0.10.0
Highlights
- Support for PyTorch versions up to 2.1.
- Support for the GPTQ PTQ algorithm.
- Support for the GPFQ PTQ algorithm.
- Support for the SmoothQuant / activation equalization PTQ algorithm (a usage sketch for these PTQ algorithms follows this list).
- Support for MSE-based scale and zero-point for weights and activations.
- Support for row-wise scaling at the input of QuantLinear.
- Support for quantization of a slice of a weight tensor.
- End-to-end support for learned rounding in ImageNet PTQ.
- End-to-end example training scripts for A2Q (low-precision accumulation) on super-resolution.
- Experimental support for minifloats (eXmY quantization); see the second sketch after this list.
- Experimental LLM PTQ flow with support for weight-only and weight+activation quantization, together with GPTQ, AWQ and SmoothQuant.
- Experimental Stable Diffusion PTQ flow with support for weight-only quantization.
- Deprecated the FINN ONNX export flow.
- Updated the custom value_trace FX tracer to the latest FX.
- New custom variant of the make_fx tracer with support for custom torch.library ops through the @Wrap annotation.
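
The new PTQ algorithms are exposed as context managers in the example flows. Below is a minimal sketch of applying activation equalization (SmoothQuant-style) followed by GPTQ on a quantized model with a calibration dataloader; the module paths and arguments are assumptions based on the bundled ImageNet PTQ examples and may differ slightly in your setup. GPFQ follows the same pattern via gpfq_mode.

```python
# Minimal PTQ sketch. Module paths and arguments are assumed from the
# Brevitas example flows and may differ slightly from the released API.
import torch
from brevitas.graph.equalize import activation_equalization_mode  # SmoothQuant-style
from brevitas.graph.gptq import gptq_mode  # gpfq_mode (brevitas.graph.gpfq) follows the same pattern


@torch.no_grad()
def apply_act_equalization(model, calib_loader, alpha=0.5):
    # Collect activation statistics and fold the equalization scales into the model.
    with activation_equalization_mode(model, alpha, add_mul_node=True, layerwise=True):
        for images, _ in calib_loader:
            model(images)


@torch.no_grad()
def apply_gptq(model, calib_loader):
    # GPTQ reconstructs each quantized layer from calibration data, layer by layer.
    with gptq_mode(model) as gptq:
        gptq_model = gptq.model
        for _ in range(gptq.num_layers):
            for images, _ in calib_loader:
                gptq_model(images)
            gptq.update()
```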
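The experimental minifloat support can be pulled into the standard quant layers through the quantizers under brevitas.quant.experimental.float. A minimal sketch follows; the e4m3 quantizer class name is an assumption based on that module and should be checked against the installed version.

```python
# Sketch only: the e4m3 weight quantizer name below is assumed from
# brevitas.quant.experimental.float and may differ in the installed version.
from brevitas.nn import QuantLinear
from brevitas.quant.experimental.float import Fp8e4m3WeightPerTensorFloat

# Linear layer with experimental e4m3 minifloat weight quantization.
layer = QuantLinear(512, 256, bias=True, weight_quant=Fp8e4m3WeightPerTensorFloat)
```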
What's Changed
- Feat (nn): cache modules that require subtensor slicing by @volcacius in #628
- Feat: support slicing for gptq by @Giuseppe5 in #626
- Feat: add support to row wise input quantization to QuantLinear by @volcacius in #625
- Fix (nn): disable weight tensor slicing syntax by @volcacius in #633
- Feat (core): add SliceTensor util for sub-weight quant by @volcacius in #634
- Fix (core): add missing dtype and device by @Giuseppe5 in #635
- Feat (ptq): activation equalization support by @Giuseppe5 in #541
- Feat (fx): value_trace improvements by @volcacius in #636
- Fix (core/utils): jit ignore eager mode tensor slicing impl by @volcacius in #637
- Fix (weight_eq): fix for llm equalization by @Giuseppe5 in #638
- Add missing license by @Giuseppe5 in #640
- Feat (ptq): act equalization support for vision by @Giuseppe5 in #643
- Fix (tracer): support for index and no-tracer ops by @Giuseppe5 in #644
- Setup: pin version of inflect for compatibility by @Giuseppe5 in #647
- Activation eq extension by @Giuseppe5 in #642
- Fix (core): correct forward in ParameterFromStatsFromParameter by @Giuseppe5 in #650
- Feat (zero_point): grid search for mse zp by @Giuseppe5 in #651
- Fix (weight_eq): correct handling of layernorm/batchnorm as sink by @Giuseppe5 in #646
- Feat (nn): set dim names in QuantMHA Linear by @volcacius in #629
- Fix (act_quant): flag to enable/disable stats collection by @Giuseppe5 in #641
- Feat (core): add keepdim to min/max/percentile stats by @volcacius in #657
- Fix (ptq): conflicts between gptq and equalization by @volcacius in #656
- Fix (nn): state_dict load for unpacked in_proj in MHA by @volcacius in #654
- Feat (ptq): learned round support in evaluate/benchmark by @Giuseppe5 in #639
- Feat (nn): avoid computing output scale/zp when not needed by @volcacius in #655
- Fix (QuantTensor): pixel_shuffle and unshuffle handler by @volcacius in #663
- Setup: fix installation of libgomp1 by @Giuseppe5 in #662
- Fix (quantize): fix and improvements for fx quantize by @Giuseppe5 in #661
- Fix (resnet18): fixing default weight quantizer for linear layer by @i-colbert in #660
- Fix (gptq): fix for quant convtranspose1d/2d and conv1d by @Giuseppe5 in #665
- Refactor of ptq_common by @Giuseppe5 in #649
- Examples: initial support for LLMs PTQ by @volcacius in #658
- Fix (weight_eq): maintain order of regions by @Giuseppe5 in #667
- Feat (core): simplify binary_sign impl by @volcacius in #672
- Feat (core): add permute_dims to all reshape fns by @volcacius in #671
- Feat (graph/equalize): clean up scale invariant ops by @volcacius in #669
- Misc: fix pre-commit by @volcacius in #676
- Misc: fix another pre-commit by @volcacius in #677
- Feat (examples/llm): initial support for loading AWQ results by @volcacius in #673
- Fix (espcn): updating links to use new tags by @i-colbert in #678
- Fix (ptq): fix for act quantizers by @Giuseppe5 in #675
- Fix (ptq): fix for residual with mha by @Giuseppe5 in #681
- Fix (fx): fix fx quantize for conv->bn by @Giuseppe5 in #680
- Feat (gptq): add option to return output from forward by @Giuseppe5 in #684
- Fix (a2q): correcting post-rounding scaling initialization by @i-colbert in #659
- Feat (quant): initial support for fp8 variants by @volcacius in #686
- Fix (gptq): fix for depthwise act_order by @Giuseppe5 in #688
- Feat (core): support for stochastic round by @volcacius in #689
- Fix (gptq): Caching quant_inp values for quant_weight by @i-colbert in #653
- Feat (gptq): support for groupwise conv by @Giuseppe5 in #690
- Fix (gptq): typo in variable name by @Giuseppe5 in #691
- Rename brevitas quant custom op by @jinchen62 in #693
- Change tolerance for fp16 by @jinchen62 in #694
- Fix (docs): Updating references to A2Q paper by @i-colbert in #698
- Feat (examples/llm): add first/last layer support by @volcacius in #699
- Feat (examples/llm): add packed 3/5/6b export by @volcacius in #700
- Fix (examples/llm): padding for packed 3/5/6b by @volcacius in #701
- Fix (gptq): linalg import fix by @Giuseppe5 in #705
- Examples (a2q): updating and extending ESPCN demo by @i-colbert in #706
- Examples (a2q): adding links for pretrained models by @i-colbert in #707
- Fix (nn): add missing support for padding_mode by @volcacius in #709
- Feat (examples/llm): add custom float support by @volcacius in #708
- GPFQ by @Giuseppe5 in #666
- Feat (ptq): support for float bias by @Giuseppe5 in #713
- Feat (ptq): flag to disable/enable signed activations by @Giuseppe5 in #714
- Support for minifloat benchmark by @Giuseppe5 in #712
- adding quant_format, mantissa, and exponent options to evaluate script by @fabianandresgrob in #717
- Fix (fx): import backport on 2.1 by @volcacius in #732
- Fix (ptq): correct bitwidth for layerwise int benchmark by @Giuseppe5 in #737
- Fix (ptq): fix for ptq_common by @Giuseppe5 in #739
- Fix (examples): adding bias_quant to final linear layer in resnet18 by @i-colbert in #720
- Fix (base): Updating A2Q defaults by @i-colbert in #718
- Fix (core): arithmetic of zero-point with positive only values by @volcacius in #670
- Fix (nn): QuantConv group calculation by @i-colbert in #703
- Feat (QuantTensor): QuantTensor x Tensor elementary ops dequantize to Tensor by @volcacius in #668
- Feat (examples): initial Stable Diffusion support by @volcacius in #715
- changes class_implementation to init_class in gpxq_mode by @fabianandresgrob in #754
- Fix errors in test by @Giuseppe5 in #716
- Fix (notebook): increase atol for asserts by @Giuseppe5 in #759
- Gpfq/act order by @fabianandresgrob in #729
- Fix (backport): op decomp in make_fx backport by @volcacius in #763
- Feat (export): deprecate FINN ONNX export by @Giuseppe5 in #753
- Update torch-mlir jit_ir import path by @jinchen62 in #771
- Fix (ptq): disable input_quant in graph quant by @Giuseppe5 in #770
- Setup: CI tests against pytorch 2.x by @Giuseppe5 in #760
- Fixed no cuda error by @saadulkh in #741
- Fix (jit): remove patcher by @Giuseppe5 in #752
- Fix (minifloat): add scaling_min_val to base quantizers by @Giuseppe5 in #773
- Fix (ptq/benchmark): better dataframe creation by @Giuseppe5 in #774
- Release v0.10.0 by @nickfraser in #780
- Fix (docs): README.md for pre-commit by @volcacius in #781
New Contributors
- @fabianandresgrob made their first contribution in #717
- @saadulkh made their first contribution in #741
Full Changelog: v0.9.1...v0.10.0