Releases: larq/compute-engine
v0.4.3
🎉 Features
- Add support for multi threaded interpreter (#512) @lgeiger
- Add support for iterators in Interpreter.predict (#511) @lgeiger
- Add Python wrapper for LCE interpreter (#507) @lgeiger
👷‍♂️ Internal Improvements
- Remove redundant code paths: reference BGemm, BGemm functor. (#510) @AdamHillier
- Make Interpreter::get_shapes and get_types more generic (#509) @lgeiger
- Add padding to temporary arrays to ensure we don't read beyond bounds. (#505) @AdamHillier
- Simplify MAKE_ZERO macro (#503) @lgeiger
⬆️ Dependencies
- Bump DoozyX/clang-format-lint-action from v0.9 to v0.10 (#504) @dependabot
v0.4.2
🐛 Bug Fixes
- Fix bug introduced by #497. Make sure OPT kernels run. (#501) @AdamHillier
v0.4.1
🎉 Features
- Add bitpacked output support to the Aarch64 optimised int16 kernel. (#496) @AdamHillier
- Add Int8 output support to the optimised binary convolution kernels. (#492) @AdamHillier
🚀 Performance
- Add bitpacked output support to the Aarch64 optimised int16 kernel. (#496) @AdamHillier
- Update int8 reference bitpack code (#487) @Tombana
- Add an optimised Aarch64 assembly implementation of Int8 bitpacking. (#494) @AdamHillier
- Add Int8 output support to the optimised binary convolution kernels. (#492) @AdamHillier
- Fuse part of the back-transformation into the multiplier and bias. (#490) @AdamHillier
🐛 Bug Fixes
- Fix bug in 32-bit accumulator kernel introduced by 346f72e. (#491) @AdamHillier
📖 Documentation
- Fix typo in README.md (#485) @leonoverweel
👷‍♂️ Internal Improvements
- Simplify path selection and remove unnecessary kernel registrations. (#497) @AdamHillier
- Add `inline` to `round` and `saturate` (#499) @Tombana
- Remove Preprocessed from OutputTransformDetails (#495) @Tombana
- Use Bazel test_filter argument instead of action_env for tests (#488) @AdamHillier
- Clean-up bconv2d by simplifying Init/Prepare/OneTimeSetup. (#486) @AdamHillier
- Simplify BConv2DParams by removing flags. (#484) @AdamHillier
⬆️ Dependencies
- Bump DoozyX/clang-format-lint-action from v0.8 to v0.9 (#489) @dependabot
v0.4.0
⚠️ Breaking Changes ⚠️
- Split BSign into Quantize/Dequantize and separate from BConv (#457) @lgeiger, @AdamHillier @Tombana
- Change custom option enums to match flatbuffer schema (#464) @lgeiger
- Set padding and fused activations as enums (#386) @lgeiger
- Rename BConv attributes to follow TFLite more closely (#373) @lgeiger
🎉 Features
- Support full int8 QAT models in converter (#449) @lgeiger
- Add ARM32 kernel implementation (#432) @honglh
- Allow to set fake default ranges to enable latency test of int8 models (#357) @lgeiger
- Publish prebuilt benchmark binaries with each new release (#398) @lgeiger
- Publish dev docker images for releases (#397) @lgeiger
- Use thresholds for bitpacked output (#387) @Tombana
- Support dilated convolutions in converter (#385) @lgeiger
🚀 Performance
- Improve performance with optimised Ruy matrix packing. (#462) @AdamHillier
- Improve Aarch64 performance by removing NEON pipeline stalls. (#394) @AdamHillier
🐛 Bug Fixes
- Simplify make build (#480) @Tombana
- Fix int8 outputtransform (#422) @Tombana
- Fix trailing quantization op with experimental_bitpacking fusion (#402) @lgeiger
- Fix BMaxPool IR definition (#404) @lgeiger
- Fix sequential model in end2end test: add missing quantisers. (#390) @AdamHillier
📖 Documentation
- Update URLs in readme to adapt to the build docs changes (#481) @lgeiger
- Update benchmark table in Readme (#470) @lgeiger
- Fix broken links (#413) @koenhelwegen
👷‍♂️ Internal Improvements
- Remove zero_point argument from im2col (#478) @lgeiger
- Remove lce_benchmark_all binary (#477) @lgeiger
- Add profiler scopes for bitpacking functions and quantization/maxpool ops. (#471) @AdamHillier
- Remove Windows builds from release workflow (#476) @lgeiger
- Remove input/output dimensions from BConvParams (#475) @lgeiger
- Directly pass packed filter shape to bconv kernel (#474) @lgeiger
- Remove output shape argument from bitpack_tensor (#473) @lgeiger
- Simplify shape handling in kernel prepare method (#472) @lgeiger
- Fix unittest linkopts for macos (#465) @lgeiger
- Rename `packbits` to `bitpack` for consistency. (#463) @AdamHillier
- Automatically upload benchmark binaries to new releases (#456) @lgeiger
- Standardise on 32-bit bitpacking. Closes #446. (#461) @AdamHillier
- add extra braces around padding_buffer (#455) @andrewstanfordjason
- Prebuild AArch32 benchmark binary for new releases (#453) @lgeiger
- Add optimised canonical bitpacking for Aarch64. Closes #435. (#443) @AdamHillier
- Add Android AAR build to release workflow (#451) @lgeiger
- Add `make_unsigned` to `unpack_matrix` type (#442) @Tombana
- Use a 32-column C++ kernel layout when writing bitpacked output. (#441) @AdamHillier
- Test against Python 3.8 by default on CI (#438) @lgeiger
- Remove unused packbits_arm32 (#437) @lgeiger
- Test converter against TF 2.3 (#436) @lgeiger
- Update the unittest error threshold (#430) @Tombana
- Refactor OutputTransform with template types (#426) @Tombana
- Replace context->ReportError calls with TF_LITE_KERNEL_LOG (#403) @lgeiger
- Improve error message when passing wrong converter arg (#414) @lgeiger
- Use custom larq MLIR dialect for our ops (#384) @lgeiger
- Add misc updates for Micro (#401) @Tombana
- Upgrade TensorFlow to eaacee173897b77cdb6afd22d5e78154177a10f3 (#363) @lgeiger
- Slightly optimize docker image size (#399) @lgeiger
- Add new performance label to release notes (#395) @lgeiger
- Move weight bitpacking into its own converter pass. (#393) @AdamHillier
- Add custom dev docker file (#392) @lgeiger
- Remove RUY_ASM_FLAG_HAS_BIAS (#389) @lgeiger
- Remove filter_format attribute from IR and bconv kernel (#382) @lgeiger
- Add pattern to ensure post_activation_multipliers are always positive (#375) @lgeiger
- Upgrade github/actions (#381) @lgeiger
- Add testing script for faster model conversions debugging (#380) @lgeiger
⬆️ Dependencies
- Update lint dependencies (#479) @lgeiger
- Bump actions/setup-python from v2.1.1 to v2.1.2 (#460) @dependabot
- Update actions/upload-artifact requirement to v2.1.4 (#419, #425, #440, #448, #454, #459) @dependabot
- Bump GoogleCloudPlatform/github-actions from 0.1.2 to 0.1.3 (#444) @dependabot
- Upgrade tensorflow to 2.3 stable (#439) @lgeiger
- Update lce_register.cc (#434) @lgeiger
- Bump DoozyX/clang-format-lint-action from v0.6 to v0.8 (#427, #433) @dependabot
- Bump actions/setup-python from v2 to v2.1.1 (#429) @dependabot
- Upgrade TensorFlow to 2.3.0rc0 (#415) @lgeiger
- Bump actions/download-artifact from v1 to v2 (#418) @dependabot
- Bump toolmantim/release-drafter from v5.8.0 to v5.11.0 (#417) @dependabot
- Create Dependabot config file (#416) @dependabot-preview
- Upgrade TensorFlow to tensorflow@9a70ab8 (#405) @lgeiger
v0.3.1
v0.3.0
🎉 Features
- Add experimental support for bitpacked output tensors to the converter (#352) @AdamHillier
- Build wheels for Windows (#355) @lgeiger
- Add support for prebuilt Python 3.8 wheels (#347) @lgeiger
- Upgrade to TF 2.2 (#258) @lgeiger
📖 Documentation
- Update benchmark results (#359) @lgeiger
- Reformat summaries in python.converter.py (#343) @leonoverweel
- Update README with QuickNet Correction (#341) @jamescook106
👷‍♂️ Internal Improvements
- Add 8x4 Arm64 kernel to improve performance (#350) @AdamHillier
- Build wheels using AVX support (#353) @lgeiger
- Use pre-installed bazelisk on CI (#354) @lgeiger
- Upgrade ARM gcc compiler to 9.2.1 (#331) @lgeiger
- Add TF 2.2 to converter test matrix (#348) @lgeiger
- Setup possibility to build nightly wheels on CI (#346) @lgeiger
- ⬆️ lint dependencies (#342) @lgeiger
v0.2.1
🎉 Features
- Add 8-bit quantization support to kernels (#327) @Tombana
- Support reading and writing bitpacked activations in C++ kernels. (#305) @AdamHillier
- Support weight scaling in converter (#326) @lgeiger
- Add ImageNet evaluation tool from TFLite (#313) @AdamHillier
- Add reference implementation of binary convolution. (#314) @arashb
📖 Documentation
- Add new benchmark numbers (#339) @Tombana
- Fix broken links (#335) @lgeiger
- Fix typo (#334) @lgeiger
- Correct top-1 accuracies according to #315 (#319) @lgeiger
- Correct model accuracies (#315) @lgeiger
- Fix docs link in contributing guide (#299) @lgeiger
👷‍♂️ Internal Improvements
- Fix `GTEST_FILTER` on CI (#338) @Tombana
- Add bitpack order argument to packbits_tensor (#337) @Tombana
- Remove bitpacking dynamic memory allocations (#336) @Tombana
- AArch64 BMLA: Use pairwise addition to accumulate popcount results (#332) @lgeiger
- Simplify ARM compiler build files (#330) @lgeiger
- Consistently use bazel test and `--test_output=all` on CI (#329) @lgeiger
- Run AAR test outside the custom op container (#322) @lgeiger
- Switch CI testing from mac to linux (#321) @lgeiger
- fix dependency for ref. bconv (#320) @arashb
- set bazel as default builder for android AAR (#318) @arashb
- fix the header guard for packbit utils (#312) @arashb
- Add MLIR status handler to log converter errors (#310) @lgeiger
- Consolidate four vector register loads into one instruction. (#308) @AdamHillier
- add header guard for bconv impl (#304) @arashb
- Fix typos and standardise `std::namespace` usage. (#303) @AdamHillier
- Support a dynamic bitpacking bitwidth. (#302) @AdamHillier
- Don't use cache if authentication fails (#301) @lgeiger
- Consolidate MLIR and End2End tests (#298) @lgeiger
v0.2.0
⚠️ Breaking Changes ⚠️
🎉 Features
🐛 Bug Fixes
📖 Documentation
👷‍♂️ Internal Improvements
- Simplify ci builds (#297) @Tombana
- Add preprocessor statement to fix warning (#296) @Tombana
- Enable Bazel GCS remote cache (#287) @lgeiger
- Use `fastbuild` instead of `dbg` for tests. (#288) @AdamHillier
- Add +1 padding to the end2end test (#283) @lgeiger
- (ci-skip) Simplify pip package publishing (#282) @lgeiger
- Switch converter tests to be Python only (#280) @lgeiger
- Add end2end test (#269) @Tombana
- Merge some bconv test params (as the # of params is limited). (#277) @AdamHillier
- Replace mul + add with shl + sub (#268) @lgeiger
- remove armv7 android ABI from LCE Lite AAR (#274) @arashb
- Cleanup LCE AAR build (#273) @lgeiger
- Remove un-used variable, fix typo. (#272) @AdamHillier
- Run ARM tests with debug builds. Closes #270. (#271) @AdamHillier
- Reduce normal register loads (#263) @AdamHillier
- Split fused multiply into backtransform and post multiply (#260) @Tombana
- Simplify padding constraint (#264) @lgeiger
- Run CI on PRs (including from forks). (#265) @AdamHillier
- remove legacy reference impl. of LCE bconv op (#261) @arashb
- Add core tests back to CI (#259) @arashb
- Simplify is binary check (#253) @lgeiger
- Cleanup MLIR tests (#255) @lgeiger
- Support running lit tests on macOS without needing to install GNU find (#254) @lgeiger
- Update BConv TableGen definition to match TFLite Op (#256) @lgeiger
v0.1.2
🎉 Features
🐛 Bug Fixes
- Use vectors instead of tflite temp buffers (#247) @arashb
- Bitpack weights only on the first run (#246) @Tombana
📖 Documentation
- update the pixel phone benchmark results (#249) @arashb
- Move docs to larq/docs (#239) @lgeiger
- Improve converter API docs (#237) @lgeiger
- fix links in second table (#236) @koenhelwegen
- QuickNet .h5 links (#235) @jamescook106