## Breaking Changes
- Remove ONNX QOp export (#917)
- QuantTensor cannot have empty metadata fields (e.g., scale, bitwidth, etc.) (#819)
- Bias quantization now requires the bit-width to be specified explicitly (#839)
- QuantLayers do not expose quant_metadata directly. This is delegated to the proxies (#883)
- QuantDropout has been removed (#861)
- QuantMaxPool has been removed (#858)
## Highlights
- Support for OCP/FNUZ FP8 quantization
  - Compatibility with QAT/PTQ, including all currently implemented PTQ algorithms (GPTQ, LearnedRound, GPFQ, etc.)
  - Fully customizable minifloat configuration (i.e., mantissa/exponent bit-width, exponent bias, etc.)
  - Support for ONNX QDQ export
- Support for OCP MX quantization
  - Compatibility with QAT/PTQ, including all currently implemented PTQ algorithms (GPTQ, LearnedRound, GPFQ, etc.)
  - Fully customizable minifloat configuration (i.e., mantissa/exponent bit-width, exponent bias, group size, etc.)
- New QuantTensor subclasses:
  - FloatQuantTensor: supports OCP FP formats and general minifloat quantization
  - GroupwiseQuantTensor: supports OCP MX formats and general groupwise int/minifloat quantization
- Support for channel splitting
- Support for HQO optimization for zero point
- Support for HQO optimization for scale (prototype)
- Improved SDXL entrypoint under brevitas_examples
- Improved LLM entrypoint under brevitas_examples
  - Compatibility with `accelerate`
- Prototype support for `torch.compile`
  - Check PR #1006 for an example of how to use it
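To illustrate what the customizable minifloat configuration above controls (mantissa/exponent bit-width and exponent bias), here is a plain-Python sketch of minifloat rounding. This is illustrative only, not Brevitas' implementation; it uses a simple saturating maximum and ignores the special NaN encodings of the OCP formats.

```python
import math

def quantize_minifloat(x, exp_bits=4, man_bits=3, bias=None):
    """Round x to the nearest representable minifloat value.

    Illustrative sketch: exposes the same knobs as above
    (mantissa/exponent bit-width, exponent bias) for a format
    with no infinities and a plain saturating max.
    """
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1  # IEEE-style default bias
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = abs(x)
    # Clamp the exponent to the normal range of this format.
    max_exp = 2 ** exp_bits - 1 - bias
    min_exp = 1 - bias
    exp = max(min(math.floor(math.log2(mag)), max_exp), min_exp)
    # Round the mantissa to man_bits fractional bits.
    step = 2.0 ** (exp - man_bits)
    q = round(mag / step) * step
    # Saturate at the largest representable magnitude.
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    return sign * min(q, max_val)
```

For example, with the default E4M3-like settings, values round to the nearest multiple of `2**(exp - 3)` and saturate at 480 in this simplified scheme.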
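For the OCP MX formats, elements share a power-of-two scale per small group of values, which is the mechanism GroupwiseQuantTensor models. The following is a hypothetical sketch of that shared-scale scheme using int8 elements for simplicity (MX element formats are typically minifloats, but the grouping mechanism is the same); it is not the library's code.

```python
import math

def quantize_groupwise(values, group_size=32, elem_max=127):
    """Quantize a flat list with one shared power-of-two scale
    per group, as in MX-style groupwise quantization.

    Illustrative sketch: returns the quantized integers and the
    per-group scales needed to dequantize them.
    """
    out = []
    scales = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Smallest power-of-two scale such that the largest
        # magnitude in the group fits within elem_max.
        if amax > 0:
            scale = 2.0 ** math.ceil(math.log2(amax / elem_max))
        else:
            scale = 1.0
        scales.append(scale)
        out.extend(round(v / scale) for v in group)
    return out, scales
```

Dequantization multiplies each element by its group's scale; a larger group size amortizes the scale's storage cost at some accuracy loss.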
## What's Changed
For a more comprehensive list of changes and fixes, see the list below:
- Enhance: Importing quantized models after bias correction by @costigt-dev in #868
- Fix QCDQDecoupledWeightQuantProxyHandlerMixin return args by @costigt-dev in #870
- Fix - Speech to text: Create an empty json file by @costigt-dev in #871
- Feat (scaling/standalone): flag to retrieve full state dict by @Giuseppe5 in #874
- Notebooks: makes notebooks deterministic and prints output of asserts by @fabianandresgrob in #847
- Fix (proxy): revert value tracer change by @Giuseppe5 in #888
- Fix (proxy): fix for attributes retrieval by @Giuseppe5 in #880
- Feat (notebook): add example for dynamic quantization to ONNX export by @fabianandresgrob in #877
- Fix (gpxq): handling empty tensors with GPxQ and adding unit tests by @i-colbert in #892
- Fix (ptq): expose uint_sym_act flag and fix issue with minifloat sign by @fabianandresgrob in #898
- Feat (minifloat): add support for user specified minifloat format by @fabianandresgrob in #821
- Feat: Add QuantConv3d and QuantConv3dTranspose by @costigt-dev in #805
- Add tutorial examples of per-channel quantization by @OscarSavolainenDR in #867
- Fix (tests): revert pytest pin by @Giuseppe5 in #903
- Remove: Remove original_cat workaround by @costigt-dev in #902
- Infra: Update issue template by @nickfraser in #893
- Pull Request Template by @capnramses in #885
- Fix (core): add return in state_dict by @Giuseppe5 in #910
- Fix (quant_tensor): fix typing and remove unused checks by @Giuseppe5 in #913
- Fix (nn): removed unused caching in adaptive avgpool2d by @Giuseppe5 in #911
- Fix (quant_tensor): remove unused checks by @Giuseppe5 in #918
- Setup: pin ONNX to 1.15 due to ORT incompatibility by @Giuseppe5 in #924
- Feat (examples): add support for Stable Diffusion XL by @Giuseppe5 in #909
- Assert all ptq-common bit widths are positive integers by @OscarSavolainenDR in #931
- Enhance: Quant Tensor Test by @costigt-dev in #894
- Fix (examples/stable_diffusion): README formatting and clarification by @Giuseppe5 in #932
- Fix (examples/ptq): fix for bitwidth check by @Giuseppe5 in #934
- Feat: functionalize QuantTensor by @Giuseppe5 in #878
- Feat (minifloat): cleanup minifloat impl by @Giuseppe5 in #922
- Fix tests in dev by @Giuseppe5 in #939
- Feat (proxy): scale computation delegated to bias proxy by @Giuseppe5 in #938
- Fix (gpxq): adding input quant to process input by @i-colbert in #943
- Fix (quant): propagate device and dtype in subinjector by @Giuseppe5 in #942
- Fix (gpxq): correct variable name by @Giuseppe5 in #944
- Fix (quant_tensor): fix AvgPool functional implementation by @Giuseppe5 in #945
- Feat (quant_tensor): support for dim() and ndim by @Giuseppe5 in #947
- Fix (graph/standardize): correct check for Mean to AvgPool by @Giuseppe5 in #948
- Feat (graph/standardize): default keepdim value by @Giuseppe5 in #950
- Fix bullet formatting in getting started guide by @timkpaine in #952
- Fix (quant/float): correct scaling_impl and float_scaling_impl by @Giuseppe5 in #953
- Fix/remove-numel - Remove numel is zero check from context manager exit method by @costigt-dev in #920
- Feat (examples/ptq): support for dynamic act quant by @Giuseppe5 in #935
- Feat (quant_tensor): support for FloatQuantTensor by @Giuseppe5 in #919
- Fix (examples/llm): Add all rewriters to the list by @nickfraser in #956
- Fix (core/quant/float): use eps to avoid log(0) by @Giuseppe5 in #957
- Fix (test/actions): Excluded `torch==1.9.1`, `platform=macos-latest` tests by @nickfraser in #960
- Adding FP8 weight export by @costigt-dev in #907
- Fix (llm): fix device issue for eval when not using default device by @fabianandresgrob in #949
- Fix (GPFQ): using random projection for speed up/less memory usage by @fabianandresgrob in #964
- Fix (calibrate/minifloat): fix for act calibration by @Giuseppe5 in #966
- Fix (quant/float): restore fix for log(0) by @Giuseppe5 in #968
- Setup: pin numpy version by @Giuseppe5 in #974
- Feat (minifloat): support for FNUZ variants by @Giuseppe5 in #973
- Fix (core/float): add default for float_scaling_impl by @Giuseppe5 in #972
- Feat (graph/equalize): upcast during equalization computation by @Giuseppe5 in #970
- Generative improv by @Giuseppe5 in #965
- Fix (requirements/setuptools): Set maximum requirement for `setuptools` by @nickfraser in #963
- Fix: Typo fix on SDXL command line args by @nickfraser in #976
- Fix (graph/bias_correction): Fix when layer parameters are offloaded to `accelerate` by @nickfraser in #962
- Fix (ptq/bias_correction): remove unnecessary forward pass by @Giuseppe5 in #980
- Fix (export/qonnx): Fixed symbolic kwargs order. by @nickfraser in #988
- Various SDXL quantization fixes by @nickfraser in #977
- Fix (brevitas_examples/sdxl): Various fixes by @Giuseppe5 in #991
- Feat (proxy/parameter_quant): cache quant weights by @Giuseppe5 in #990
- Docs: Added 0.10.3 release note to README. by @nickfraser in #993
- Added some preliminary unit tests to the CNNs 'quantize_model' by @OscarSavolainenDR in #927
- Feat (tests): extended minifloat unit tests by @alexredd99 in #979
- Fix (proxy/runtime_quant): correct handling of mixed type quantization by @Giuseppe5 in #985
- docs (readme): Fixed GH actions badges by @nickfraser in #996
- Feat: Update LLM entry-point by @nickfraser in #987
- Feat: Support for Groupwise (MX) quantization by @Giuseppe5 in #971
- Feat(graph): better exclusion mechanism by @Giuseppe5 in #1003
- Fix (mx): input view during quantization by @Giuseppe5 in #1005
- Feat (mx): adding padding and transposed support by @Giuseppe5 in #1007
- fix (nn/avg_pool): Fix for trunc quant not being applied by @nickfraser in #1014
- Fix (nn/conv): Fixed conversion of convolutions when `padding_mode='same'` by @nickfraser in #1017
- Fix (proxy): clean-up by @Giuseppe5 in #1011
- Feat (mx): automatic group_dim in layerwise quant by @Giuseppe5 in #1012
- fix (nn/conv): Fix regression introduced in #1017 by @nickfraser in #1019
- Feat (mx): gptq compatibility and quant tests by @Giuseppe5 in #1013
- Feat (mx): PTQ MX + Float support by @Giuseppe5 in #1010
- Fix (graph/quant): Bugfix in blacklist matching in `find_module` by @nickfraser in #1021
- Test (graph/quantize): Added extra prefix test to `layerwise_quantize` by @nickfraser in #1022
- Test (example/llm): Refactor and add basic tests for the LLM entry-point by @nickfraser in #1002
- Feat (examples/sdxl): Updates to SDXL entry-point by @nickfraser in #1020
- Feat (gptq): optimizing CPU to GPU memory transfer by @i-colbert in #1009
- notebooks: rerun notebooks. by @nickfraser in #1026
- HQO for scale/zero point by @Giuseppe5 in #937
- Test calibration reference by @Giuseppe5 in #1031
- Fix (sdxl): avoid suppressing checkpoint errors by @Giuseppe5 in #1034
- Setup: update checkout version to v3 by @Giuseppe5 in #1039
- Fix (gpxq): correct index for groupwise GPxQ by @Giuseppe5 in #1040
- Fix (llm): small fixes to LLM by @Giuseppe5 in #1035
- Feat (activation_calibration): speed-up by skipping quantization by @Giuseppe5 in #1029
- Fix (proxy): fix for float quant properties is_ocp and is_fnuz by @alexredd99 in #1028
- Decoupled PerChannel/PerTensor quantization by @Giuseppe5 in #1025
- Fix (examples/llm): Fix infinite loop in LLM entrypoint with WikiText2 by @pablomlago in #1044
- Fix po2 for float quant by @Giuseppe5 in #1033
- Feat (gpfq): adding memory-efficient formulation by @i-colbert in #1043
- Fix weights mse by @Giuseppe5 in #1047
## New Contributors
- @OscarSavolainenDR made their first contribution in #861
- @costigt-dev made their first contribution in #868
- @timkpaine made their first contribution in #952
- @alexredd99 made their first contribution in #979
- @pablomlago made their first contribution in #1044
Full Changelog: v0.10.2...v0.11.0