v0.12.0
Release Note
The following are the highlights in this release:
Performance Optimization
We found that missing operator implementations on devices (GPU, Hexagon DSP, etc.) led to inefficient model execution: operators without a device implementation have to run on the CPU, and the memory synchronization between the device and the CPU consumes a lot of time. We therefore added or enhanced several operators on the GPU (reshape, lpnorm, mvnorm, etc.) and on the Hexagon DSP (s2d, d2s, sub, etc.) to improve model execution efficiency.
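To illustrate why missing device operators hurt, here is a minimal Python sketch. It is not MACE code: the function name, the op lists and the cost model (one memory synchronization per switch between CPU and GPU on consecutive ops) are hypothetical assumptions chosen only to show the effect.

```python
from typing import Iterable, Set


def count_device_transfers(ops: Iterable[str], gpu_ops: Set[str]) -> int:
    """Count CPU<->GPU switches when unsupported ops fall back to the CPU.

    Hypothetical cost model: an op runs on the GPU if it is in gpu_ops,
    otherwise on the CPU; each change of device between consecutive ops
    is counted as one memory synchronization.
    """
    transfers = 0
    prev_device = None
    for op in ops:
        device = "GPU" if op in gpu_ops else "CPU"
        if prev_device is not None and device != prev_device:
            transfers += 1
        prev_device = device
    return transfers


# Hypothetical model graph flattened into execution order.
model = ["conv2d", "reshape", "conv2d", "lpnorm", "conv2d"]
before = {"conv2d"}                       # reshape/lpnorm missing on GPU
after = {"conv2d", "reshape", "lpnorm"}   # all ops available on GPU

print(count_device_transfers(model, before))  # 4
print(count_device_transfers(model, after))   # 0
```

With two unsupported operators in the middle of the graph, execution bounces between devices four times; once every operator has a GPU implementation, no synchronization is needed between them.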
Further Support For Speech Recognition
In the last version, we added support for the Kaldi framework. In this version, we at Xiaomi did a lot of work to support speech recognition models, including support for flatten, unsample and other ONNX operators, as well as a number of bug fixes.
CMake Support
MACE continues to improve its build tools. This release adds support for building with CMake. Because the CMake build uses ccache for acceleration, it compiles much faster than the original Bazel build.
Related Docs: https://mace.readthedocs.io/en/latest/user_guide/basic_usage_cmake.html
Others
In this version, we added performance regression detection using Dana, and added a “gpu_queue_window” parameter to the yml file to solve the UI stutter caused by GPU task execution.
Related Docs: https://mace.readthedocs.io/en/latest/faq.html
Acknowledgement
Thanks to the following contributors, whose code made MACE better:
yungchienhsu, gasgallo, albu, yunikkk