Releases: Xilinx/brevitas
Release v0.11.0
Breaking Changes
- Remove ONNX QOp export (#917)
- QuantTensor cannot have empty metadata fields (e.g., scale, bitwidth, etc.) (#819)
- Bias quantization now requires the specification of bit-width (see the sketch after this list) (#839)
- QuantLayers do not expose quant_metadata directly. This is delegated to the proxies (#883)
- QuantDropout has been removed (#861)
- QuantMaxPool has been removed (#858)
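Since bias quantization now needs an explicit bit-width, the predefined fixed bit-width bias quantizers continue to work unchanged. A minimal sketch using the stock Int16Bias quantizer (the layer configuration is illustrative, not taken from the release):

```python
# Sketch: bias quantization with an explicitly sized quantizer (16-bit here).
# Int16Bias and Int8ActPerTensorFloat are predefined brevitas quantizers.
import torch
from brevitas.nn import QuantLinear
from brevitas.quant import Int16Bias, Int8ActPerTensorFloat

layer = QuantLinear(
    64, 32, bias=True,
    input_quant=Int8ActPerTensorFloat,  # bias scale derives from input and weight scales
    bias_quant=Int16Bias)               # bit-width is fixed by the quantizer

y = layer(torch.randn(4, 64))
```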
Highlights
- Support for OCP/FNUZ FP8 quantization (a minimal usage sketch follows this list)
  - Compatibility with QAT/PTQ, including all currently implemented PTQ algorithms (GPTQ, LearnedRound, GPFQ, etc.)
  - Possibility to fully customize the minifloat configuration (i.e., select mantissa/exponent bit-width, exponent bias, etc.)
  - Support for ONNX QDQ export
- Support for OCP MX quantization
  - Compatibility with QAT/PTQ, including all currently implemented PTQ algorithms (GPTQ, LearnedRound, GPFQ, etc.)
  - Possibility to fully customize the minifloat configuration (i.e., select mantissa/exponent bit-width, exponent bias, group size, etc.)
- New QuantTensor subclasses:
  - FloatQuantTensor: supports OCP FP formats and general minifloat quantization
  - GroupwiseQuantTensor: supports OCP MX formats and general groupwise int/minifloat quantization
- Support for channel splitting
- Support for HQO optimization for zero point
- Support for HQO optimization for scale (prototype)
- Improved SDXL entrypoint under brevitas_examples
- Improved LLM entrypoint under brevitas_examples
  - Compatibility with accelerate
- Prototype support for torch.compile
  - Check PR #1006 for an example of how to use it
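As a minimal usage sketch of the new minifloat support, the snippet below quantizes the weights of a QuantLinear layer to FP8 e4m3. The quantizer import path is an assumption based on brevitas' experimental minifloat quantizer modules; verify it against your installed version.

```python
# Minimal sketch: FP8 (e4m3) weight quantization on a QuantLinear layer.
# The quantizer path below is an assumption based on brevitas' experimental
# minifloat modules; check your installed version.
import torch
from brevitas.nn import QuantLinear
from brevitas.quant.experimental.float import Fp8e4m3WeightPerTensorFloat

layer = QuantLinear(64, 32, bias=False, weight_quant=Fp8e4m3WeightPerTensorFloat)

x = torch.randn(8, 64)
y = layer(x)  # forward pass with fake-quantized FP8 weights
print(layer.quant_weight())  # FloatQuantTensor carrying scale and minifloat metadata
```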
What's Changed
For a more comprehensive list of changes and fixes, see the list below:
- Enhance: Importing quantized models after bias correction by @costigt-dev in #868
- Fix QCDQDecoupledWeightQuantProxyHandlerMixin return args by @costigt-dev in #870
- Fix - Speech to text: Create an empty json file by @costigt-dev in #871
- Feat (scaling/standalone): flag to retrieve full state dict by @Giuseppe5 in #874
- Notebooks: makes notebooks deterministic and prints output of asserts by @fabianandresgrob in #847
- Fix (proxy): revert value tracer change by @Giuseppe5 in #888
- Fix (proxy): fix for attributes retrieval by @Giuseppe5 in #880
- Feat (notebook): add example for dynamic quantization to ONNX export by @fabianandresgrob in #877
- Fix (gpxq): handling empty tensors with GPxQ and adding unit tests by @i-colbert in #892
- Fix (ptq): expose uint_sym_act flag and fix issue with minifloat sign by @fabianandresgrob in #898
- Feat (minifloat): add support for user specified minifloat format by @fabianandresgrob in #821
- Feat: Add QuantConv3d and QuantConv3dTranspose by @costigt-dev in #805
- Add tutorial examples of per-channel quantization by @OscarSavolainenDR in #867
- Fix (tests): revert pytest pin by @Giuseppe5 in #903
- Remove: Remove original_cat workaround by @costigt-dev in #902
- Infra: Update issue template by @nickfraser in #893
- Pull Request Template by @capnramses in #885
- Fix (core): add return in state_dict by @Giuseppe5 in #910
- Fix (quant_tensor): fix typing and remove unused checks by @Giuseppe5 in #913
- Fix (nn): removed unused caching in adaptive avgpool2d by @Giuseppe5 in #911
- Fix (quant_tensor): remove unused checks by @Giuseppe5 in #918
- Setup: pin ONNX to 1.15 due to ORT incompatibility by @Giuseppe5 in #924
- Feat (examples): add support for Stable Diffusion XL by @Giuseppe5 in #909
- Assert all ptq-common bit widths are positive integers by @OscarSavolainenDR in #931
- Enhance: Quant Tensor Test by @costigt-dev in #894
- Fix (examples/stable_diffusion): README formatting and clarification by @Giuseppe5 in #932
- Fix (examples/ptq): fix for bitwidth check by @Giuseppe5 in #934
- Feat: functionalize QuantTensor by @Giuseppe5 in #878
- Feat (minifloat): cleanup minifloat impl by @Giuseppe5 in #922
- Fix tests in dev by @Giuseppe5 in #939
- Feat (proxy): scale computation delegated to bias proxy by @Giuseppe5 in #938
- Fix (gpxq): adding input quant to process input by @i-colbert in #943
- Fix (quant): propagate device and dtype in subinjector by @Giuseppe5 in #942
- Fix (gpxq): correct variable name by @Giuseppe5 in #944
- Fix (quant_tensor): fix AvgPool functional implementation by @Giuseppe5 in #945
- Feat (quant_tensor): support for dim() and ndim by @Giuseppe5 in #947
- Fix (graph/standardize): correct check for Mean to AvgPool by @Giuseppe5 in #948
- Feat (graph/standardize): default keepdim value by @Giuseppe5 in #950
- Fix bullet formatting in getting started guide by @timkpaine in #952
- Fix (quant/float): correct scaling_impl and float_scaling_impl by @Giuseppe5 in #953
- Fix/remove-numel - Remove numel is zero check from context manager exit method by @costigt-dev in #920
- Feat (examples/ptq): support for dynamic act quant by @Giuseppe5 in #935
- Feat (quant_tensor): support for FloatQuantTensor by @Giuseppe5 in #919
- Fix (examples/llm): Add all rewriters to the list by @nickfraser in #956
- Fix (core/quant/float): use eps to avoid log(0) by @Giuseppe5 in #957
- Fix (test/actions): Excluded torch==1.9.1, platform=macos-latest tests by @nickfraser in #960
- Adding FP8 weight export by @costigt-dev in #907
- Fix (llm): fix device issue for eval when not using default device by @fabianandresgrob in #949
- Fix (GPFQ): using random projection for speed up/less memory usage by @fabianandresgrob in #964
- Fix (calibrate/minifloat): fix for act calibration by @Giuseppe5 in #966
- Fix (quant/float): restore fix for log(0) by @Giuseppe5 in #968
- Setup: pin numpy version by @Giuseppe5 in #974
- Feat (minifloat): support for FNUZ variants by @Giuseppe5 in #973
- Fix (core/float): add default for float_scaling_impl by @Giuseppe5 in #972
- Feat (graph/equalize): upcast during equalization computation by @Giuseppe5 in #970
- Generative improv by @Giuseppe5 in #965
- Fix (requirements/setuptools): Set maximum requirement for setuptools by @nickfraser in #963
- Fix: Typo fix on SDXL command line args by @nickfraser in #976
- Fix (graph/bias_correction): Fix when layer parameters are offloaded to accelerate by @nickfraser in #962
- Fix (ptq/bias_correction): remove unnecessary forward pass by @Giuseppe5 in #980
- Fix (export/qonnx): Fixed symbolic kwargs order. by @nickfraser in #988
- Various SDXL quantization fixes by @nickfraser in #977
- Fix (brevitas_examples/sdxl): Various fixes by @Giuseppe5 in #991
- Feat (proxy/parameter_quant): cache quant weights by @Giuseppe5 in #990
- Docs: Added 0.10.3 release note to README. by @nickfraser in #993
- Added some preliminary unit tests to the CNNs 'quantize_model' by @OscarSavolainenDR in #927
- Feat (tests): extended minifloat unit tests by @alexredd99 in #979
- Fix (proxy/runtime_quant): correct handling of mixed type quantization by @Giuseppe5 in #985
- docs (readme): Fixed GH actions badges by @nickfraser in #996
- Feat: Update LLM entry-point ...
Release v0.10.3
What's Changed
- Backport: Fix (export/qonnx): Fixed symbolic kwargs order. (#988) by @nickfraser in #992
- numpy version, onnx version, and maximum setuptools version set
Full Changelog: v0.10.2...v0.10.3
Release v0.10.2
What's Changed
- Fix (QuantLayer): make bias for QuantLayer optional by @fabianandresgrob in #846
- Fix (examples/llm): set group_size only for groupwise quantization by @nickfraser in #853
- Fix (gpfq): updating input processing and L1-norm constraints for GPFA2Q by @i-colbert in #852
- ImageNet PTQ example fix by @Giuseppe5 in #863
- feat (gen/quantize): Added device flag to quantize_model by @nickfraser in #860
- Docs: update README for 0.10.2 release by @Giuseppe5 in #865
Full Changelog: v0.10.1...v0.10.2
Release v0.10.1
Highlights
- A2Q+ support (paper: "A2Q+: Improving Accumulator-Aware Weight Quantization")
- A2Q+ examples with CIFAR10 and Super Resolution
- Support for concatenation equalization for weights and activations
- Support for GPFQ + A2Q L1 Norm bound
- Possibility to explicitly export Q node for weights in QCDQ export
- Support for float16 and bfloat16 for QCDQ export (see the export sketch after this list)
- Support for Dynamic Activation Quantization for ONNX QDQ export
- Support for channel splitting (paper)
- (Beta) Better compatibility with Hugging Face accelerate and optimum
- (Beta) Improved support and testing for minifloat quantization
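A minimal sketch of the QCDQ ONNX export path mentioned above, assuming the export_onnx_qcdq helper from brevitas.export; keyword argument names may vary across versions.

```python
# Sketch: export a quantized layer with QCDQ (Quantize/Clip/DeQuantize) nodes.
# export_onnx_qcdq is the brevitas.export helper; the exact keyword names are
# assumptions and may differ across brevitas versions.
import torch
from brevitas.export import export_onnx_qcdq
from brevitas.nn import QuantConv2d

model = QuantConv2d(3, 8, kernel_size=3)
ref_input = torch.randn(1, 3, 32, 32)  # reference input for tracing

export_onnx_qcdq(model, args=ref_input, export_path='quant_conv.onnx')
```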
What's Changed
- Fix (examples/generative): set weight_bit_width in weight_quant by @Giuseppe5 in #783
- Feat (graph/equalize): improvements for llm equalization by @Giuseppe5 in #784
- [graph] Fix typo in class name by @nickfraser in #765
- Fix (graph/equalize): refactor for act equalization by @Giuseppe5 in #787
- [quant_tensor] Updates __truediv__ behaviour to match "standard fixed point rules" by @nickfraser in #769
- Feat (export): (b)float16 support for qcdq export by @Giuseppe5 in #776
- Feat (ptq): Adding A2Q Upper Bound clipping to GPFQ by @fabianandresgrob in #734
- Extended equalization by @Giuseppe5 in #778
- Better Bfloat16 support by @Giuseppe5 in #777
- Fix (stats): add return statement in state_dict by @Giuseppe5 in #792
- Fix (equalize): improved cat eq checks by @Giuseppe5 in #793
- Fix (export): add CastMixin by @Giuseppe5 in #794
- Dynamic Act Quant support by @Giuseppe5 in #796
- Fix (examples/quantizers): correct dynamic zero point handling by @Giuseppe5 in #806
- Feat (a2q+): improving accumulator-aware weight quantization by @i-colbert in #797
- Feat (a2q+): adding new super resolution models to brevitas_examples by @i-colbert in #811
- Feat (Channel-Splitting): sets up first skeleton for channel-splitting by @fabianandresgrob in #772
- Feat: support for optimum by @Giuseppe5 in #826
- Fix (tests): adding tests for FloatQuant by @fabianandresgrob in #815
- Fix (export): correct q node export by @Giuseppe5 in #829
- Fix (examples/llm): correct groupwise export by @Giuseppe5 in #832
- Fix (examples/super_res): updating README by @i-colbert in #828
- Fix (examples/export): improved export by @Giuseppe5 in #838
- Fix (graph/equalize): cleanup and device management by @Giuseppe5 in #840
- Feat (examples/a2q): adding CIFAR10 example by @i-colbert in #813
- Fix (export): check for Per Group quantization by @Giuseppe5 in #848
Full Changelog: v0.10.0...v0.10.1
A2Q+ CIFAR10 model release
This release contains training code and pre-trained weights to demonstrate accumulator-aware quantization (A2Q) on an image classification task. Code is also provided to demonstrate Euclidean projection-based weight initialization (EP-init) as proposed in our paper "A2Q+: Improving Accumulator-Aware Weight Quantization".
Find the associated docs at https://github.com/Xilinx/brevitas/tree/a2q_cifar10_r1/src/brevitas_examples/imagenet_classification/a2q.
A2Q+ model release
A2Q+ Super Resolution Experiments with Brevitas
This release contains training code and pre-trained weights to demonstrate accumulator-aware quantization (A2Q+) as proposed in our paper "A2Q+: Improving Accumulator-Aware Weight Quantization" on a super resolution task.
Find the associated docs at https://github.com/Xilinx/brevitas/tree/super_res_r2/src/brevitas_examples/super_resolution.
Release v0.10.0
Highlights
- Support for PyTorch up to version 2.1.
- Support for the GPTQ PTQ algorithm (a calibration-loop sketch follows this list).
- Support for the GPFQ PTQ algorithm.
- Support for SmoothQuant / activation equalization PTQ algorithm.
- Support for MSE based scale and zero-point for weights and activations.
- Support for row-wise scaling at the input of QuantLinear.
- Support for quantization of a slice of a weight tensor.
- End-to-end support for learned rounding in ImageNet PTQ.
- End-to-end example training scripts for A2Q (low-precision accumulation) on super resolution.
- Experimental support for minifloats (eXmY quantization).
- Experimental LLM PTQ flow with support for weight-only and weight+activation quantization, together with GPTQ, AWQ and SmoothQuant.
- Experimental Stable Diffusion PTQ flow with support for weight-only quantization.
- Deprecated FINN ONNX export flow.
- Update custom value_trace FX tracer to latest FX.
- New custom variant of make_fx tracer with support for custom torch.library ops through @Wrap annotation.
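As a sketch of how the GPTQ flow is driven, the loop below follows the pattern used in the brevitas_examples PTQ scripts; gptq_mode comes from brevitas.graph.gptq, while calib_loader is an assumed calibration dataloader.

```python
# Sketch of a GPTQ post-training-quantization loop, following the pattern in
# the brevitas_examples PTQ scripts. calib_loader is an assumed dataloader
# yielding (images, labels) batches.
import torch
from brevitas.graph.gptq import gptq_mode

@torch.no_grad()
def apply_gptq(model, calib_loader):
    model.eval()
    with gptq_mode(model) as gptq:
        gptq_model = gptq.model
        for _ in range(gptq.num_layers):   # one calibration pass per layer
            for images, _ in calib_loader:
                gptq_model(images)
            gptq.update()                  # update the current layer's weights
    return model
```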
What's Changed
- Feat (nn): cache modules that require subtensor slicing by @volcacius in #628
- Feat: support slicing for gptq by @Giuseppe5 in #626
- Feat: add support to row wise input quantization to QuantLinear by @volcacius in #625
- Fix (nn): disable weight tensor slicing syntax by @volcacius in #633
- Feat (core): add SliceTensor util for sub-weight quant by @volcacius in #634
- Fix (core): add missing dtype and device by @Giuseppe5 in #635
- Feat (ptq): activation equalization support by @Giuseppe5 in #541
- Feat (fx): value_trace improvements by @volcacius in #636
- Fix (core/utils): jit ignore eager mode tensor slicing impl by @volcacius in #637
- Fix (weight_eq): fix for llm equalization by @Giuseppe5 in #638
- Add missing license by @Giuseppe5 in #640
- Feat (ptq): act equalization support for vision by @Giuseppe5 in #643
- Fix (tracer): support for index and no-tracer ops by @Giuseppe5 in #644
- Setup: pin version of inflect for compatibility by @Giuseppe5 in #647
- Activation eq extension by @Giuseppe5 in #642
- Fix (core): correct forward in ParameterFromStatsFromParameter by @Giuseppe5 in #650
- Feat (zero_point): grid search for mse zp by @Giuseppe5 in #651
- Fix (weight_eq): correct handling of layernorm/batchnorm as sink by @Giuseppe5 in #646
- Feat (nn): set dim names in QuantMHA Linear by @volcacius in #629
- Fix (act_quant): flag to enable/disable stats collection by @Giuseppe5 in #641
- Feat (core): add keepdim to min/max/percentile stats by @volcacius in #657
- Fix (ptq): conflicts between gptq and equalization by @volcacius in #656
- Fix (nn): state_dict load for unpacked in_proj in MHA by @volcacius in #654
- Feat (ptq): learned round support in evaluate/benchmark by @Giuseppe5 in #639
- Feat (nn): avoid computing output scale/zp when not needed by @volcacius in #655
- Fix (QuantTensor): pixel_shuffle and unshuffle handler by @volcacius in #663
- Setup: fix installation of libgomp1 by @Giuseppe5 in #662
- Fix (quantize): fix and improvements for fx quantize by @Giuseppe5 in #661
- Fix (resnet18): fixing default weight quantizer for linear layer by @i-colbert in #660
- Fix(gptq): fix for quant convtranspose1d/2d and conv1d by @Giuseppe5 in #665
- Refactor of ptq_common by @Giuseppe5 in #649
- Examples: initial support for LLMs PTQ by @volcacius in #658
- Fix (weight_eq): maintain order of regions by @Giuseppe5 in #667
- Feat (core): simplify binary_sign impl by @volcacius in #672
- Feat (core): add permute_dims to all reshape fns by @volcacius in #671
- Feat (graph/equalize): clean up scale invariant ops by @volcacius in #669
- Misc: fix pre-commit by @volcacius in #676
- Misc: fix another pre-commit by @volcacius in #677
- Feat (examples/llm): initial support for loading AWQ results by @volcacius in #673
- Fix (espcn): updating links to use new tags by @i-colbert in #678
- Fix (ptq): fix for act quantizers by @Giuseppe5 in #675
- Fix (ptq): fix for residual with mha by @Giuseppe5 in #681
- Fix (fx): fix fx quantize for conv->bn by @Giuseppe5 in #680
- Feat (gptq): add option to return output from forward by @Giuseppe5 in #684
- Fix (a2q): correcting post-rounding scaling initialization by @i-colbert in #659
- Feat (quant): initial support for fp8 variants by @volcacius in #686
- Fix (gptq): fix for depthwise act_order by @Giuseppe5 in #688
- Feat (core): support for stochastic round by @volcacius in #689
- Fix (gptq): Caching quant_inp values for quant_weight by @i-colbert in #653
- Feat (gptq): support for groupwise conv by @Giuseppe5 in #690
- Fix (gptq): typo in variable name by @Giuseppe5 in #691
- Rename brevitas quant custom op by @jinchen62 in #693
- Change tolerance for fp16 by @jinchen62 in #694
- Fix (docs): Updating references to A2Q paper by @i-colbert in #698
- Feat (examples/llm): add first/last layer support by @volcacius in #699
- Feat (examples/llm): add packed 3/5/6b export by @volcacius in #700
- Fix (examples/llm): padding for packed 3/5/6b by @volcacius in #701
- Fix (gptq): linalg import fix by @Giuseppe5 in #705
- Examples (a2q): updating and extending ESPCN demo by @i-colbert in #706
- Examples (a2q): adding links for pretrained models by @i-colbert in #707
- Fix (nn): add missing support for padding_mode by @volcacius in #709
- Feat (examples/llm): add custom float support by @volcacius in #708
- GPFQ by @Giuseppe5 in #666
- Feat (ptq): support for float bias by @Giuseppe5 in #713
- Feat (ptq): flag to disable/enable signed activations by @Giuseppe5 in #714
- Support for minifloat benchmark by @Giuseppe5 in #712
- adding quant_format, mantissa, and exponent options to evaluate script by @fabianandresgrob in #717
- Fix (fx): import backport on 2.1 by @volcacius in #732
- Fix (ptq): correct bitwidth for layerwise int benchmark by @Giuseppe5 in #737
- Fix (ptq): fix for ptq_common by @Giuseppe5 in #739
- Fix (examples): adding bias_quant to final linear layer in resnet18 by @i-colbert in #720
- Fix (base): Updating A2Q defaults by @i-colbert in #718
- Fix (core): arithmetic of zero-point with positive only values by @volcacius in #670
- Fix (nn): QuantConv group calculation by @i-colbert in #703
- Feat (QuantTensor): QuantTensor x Tensor elementary ops dequantize to Tensor by @volcacius in #668
- Feat (examples): initial Stable Diffusion support by @volcacius in #715
- changes class_implementation to init_class in gpxq_mode by @fabianandresgrob in #754
- Fix errors in test by @Giuseppe5 in #716
- Fix (notebook): increase atol for asserts by @Giuseppe5 in #759
- Gpfq/act order by @fabianandresgrob in #729
- Fix (backport): op decomp in make_fx backport by @volcacius in #763
- Feat (export): deprecate FINN ONNX export by @Giuseppe5 in https://github.com/Xilinx/brevitas/p...
A2Q model release
Integer-Quantized Super Resolution Experiments with Brevitas
This release contains scripts demonstrating how to train integer-quantized super resolution models using Brevitas.
Code is also provided to demonstrate accumulator-aware quantization (A2Q) as proposed in our ICCV 2023 paper "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance".
Find the associated docs at https://github.com/Xilinx/brevitas/tree/super_res_r1/src/brevitas_examples/super_resolution.
Release v0.9.1
What's Changed
- Setup: add requirements-dev with pre-commit by @Giuseppe5 in #581
- CI update by @Giuseppe5 in #570
- Fix (brevitas_examples/bnn_pynq): missing 4b resnet18 link and hash fn by @volcacius in #583
- Docs: update READMEs by @Giuseppe5 in #584
Full Changelog: v0.9.0...v0.9.1
Release v0.9.0
Highlights
- Initial support for graph quantization to programmatically generate a quantized model from a floating-point one (a minimal sketch follows this list). ImageNet examples with PTQ can be found here: https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/imagenet_classification/ptq.
- Initial support for QuantMultiheadAttention, which is leveraged, e.g., for ViT support in the PTQ examples above.
- Various improvements to graph equalization, which are leveraged in the PTQ examples above.
- New accumulator-aware quantizers to train for low-precision accumulation, based on our A2Q paper: https://arxiv.org/abs/2301.13376.
- Experimental support for BatchQuant quantizer, based on https://arxiv.org/abs/2105.08952, currently still untested.
- Initial support for learned rounding.
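A minimal sketch of the graph quantization flow, assuming the preprocess_for_quantize and quantize helpers under brevitas.graph.quantize; exact names and defaults may differ, so see the ImageNet PTQ examples linked above for the version-matched flow.

```python
# Sketch: programmatically derive a quantized model from a floating-point one.
# preprocess_for_quantize/quantize are assumed to live in brevitas.graph.quantize;
# see the ImageNet PTQ examples for the exact, version-matched flow.
import torch
from torchvision.models import resnet18
from brevitas.graph.quantize import preprocess_for_quantize, quantize

model = resnet18(weights=None).eval()
model = preprocess_for_quantize(model)  # trace, fold BN, standardize the graph
quant_model = quantize(model)           # swap float layers for quantized ones
out = quant_model(torch.randn(1, 3, 224, 224))
```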
Overview of changes
Graph quantization
- Initial graph quantization support by @Giuseppe5 in #549 #574 #532 #579
Quantized layers
- Initial support for QuantMultiheadAttention #568
- Breaking change: rename Quant(Adaptive)AvgPool to Trunc(Adaptive)AvgPool by @volcacius in #562
Quantizers
- Weight normalization-based integer quantizers by @i-colbert in #559
- Accumulator-aware weight quantization by @i-colbert in #567
- BatchQuant quantizers support by @volcacius in #563
QuantTensor
- Support to move QuantTensor across devices by @Giuseppe5 in #528
- Initial support for interpolate and pixel_shuffle by @volcacius in #578
PTQ
- Batch Norm support in graph equalization by @Giuseppe5 in #531
- Mul support in graph equalization by @Giuseppe5 in #530
- Learned round support by @Giuseppe5 in #573
- MultiheadAttention and LayerNorm support in graph equalization by @Giuseppe5 in #555
- Fix calibration over large number of batches by @Giuseppe5 in #523
Export
- Itemize scalar quantize args only in TorchScript QCDQ by @volcacius in #561
- Round avgpool export fixes by @volcacius in #562
CI, linting
- Linter isort by @Giuseppe5 in #505
- CI: bump isort from 5.10.1 to 5.11.5 by @Giuseppe5 in #540
- Test: enable parallelism with pytest-xdist by @Giuseppe5 in #513
- GHA workflow improvement by @Giuseppe5 in #507
- Add support for yapf by @Giuseppe5 in #511
FX
- Disable FX backport on 1.8.1+ by @volcacius in #504
Examples
- Pretrained Resnet18 example on CIFAR10 targeting FINN by @volcacius in #577
- Graph quantization + PTQ examples and benchmarking scripts by @Giuseppe5 in #547 #575 #576
For the full changelog, please check: v0.8.0...v0.9.0