Math: Optimise 16-bit matrix multiplication functions. #9088

Add Doxygen-style documentation to the matrix multiplication functions. Each function now has a clear description and parameter details, to make the functions easier to understand and use.

Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>

- Added checks for integer overflow during shifting.
- Validated matrix dimensions to prevent mismatches.
- Ensured non-null pointers before operating on matrices.

Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
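The three checks listed above could look roughly like the following. This is a minimal sketch, not the code from the patch: `struct demo_matrix_16b`, `demo_check_multiply_args()`, the `-EINVAL` return convention, and the 0 to 31 shift range are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit fixed-point matrix (not the real definition). */
struct demo_matrix_16b {
	int16_t rows;
	int16_t columns;
	int16_t fractions;	/* number of fractional bits (Q format) */
	const int16_t *data;	/* row-major elements */
};

/* Return 0 if c = a * b is well-formed, -EINVAL otherwise. */
int demo_check_multiply_args(const struct demo_matrix_16b *a,
			     const struct demo_matrix_16b *b,
			     const struct demo_matrix_16b *c)
{
	int shift;

	/* Non-null pointers before touching any matrix field. */
	if (!a || !b || !c || !a->data || !b->data || !c->data)
		return -EINVAL;

	/* Dimensions must be positive and compatible: (m x k) * (k x n) -> (m x n). */
	if (a->rows <= 0 || a->columns <= 0 || b->columns <= 0)
		return -EINVAL;
	if (a->columns != b->rows || c->rows != a->rows || c->columns != b->columns)
		return -EINVAL;

	/*
	 * A product of two fixed-point values carries a->fractions + b->fractions
	 * fractional bits, so the result is shifted right to reach c->fractions.
	 * Reject shifts outside an assumed safe range of 0..31.
	 */
	shift = a->fractions + b->fractions - c->fractions;
	if (shift < 0 || shift > 31)
		return -EINVAL;

	return 0;
}
```

Doing all validation up front keeps the inner multiplication loops free of branches.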

Changed the accumulator data type from `int64_t` to `int32_t` to reduce instruction cycle count. This change results in an approximately 8.18% gain in performance for matrix multiplication operations.

Performance Results:
Compiler Settings: -O2

+------------+------+------+--------+-----------+-----------+----------+
| Test Name  | Rows | Cols | Cycles | Max Error | RMS Error | Result   |
+------------+------+------+--------+-----------+-----------+----------+
| Test 1     | 3    | 5    | 6487   | 0.00      | 0.00      | Pass     |
| Test 2     | 6    | 8    | 6106   | 0.00      | 0.00      | Pass     |
+------------+------+------+--------+-----------+-----------+----------+

Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
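To illustrate the accumulator width trade-off, here is a sketch of a fixed-point dot product written both ways. The function names are hypothetical, rounding and saturation are left out for brevity, and the code is not taken from the patch. The 32-bit variant is only valid when the running sum of 16x16-bit products stays within the `int32_t` range; a single product can be as large as 2^30, so that condition depends on the input data.

```c
#include <stdint.h>

/* Hypothetical helper: dot product with a 64-bit accumulator. */
int16_t demo_dot_q15_acc64(const int16_t *x, const int16_t *y, int n, int shift)
{
	int64_t acc = 0;
	int i;

	for (i = 0; i < n; i++)
		acc += (int32_t)x[i] * y[i];	/* each product fits in 32 bits */

	return (int16_t)(acc >> shift);		/* truncating, no saturation */
}

/*
 * Hypothetical helper: same computation with a 32-bit accumulator.
 * Assumes the caller guarantees the sum of products fits in int32_t;
 * otherwise the signed addition overflows, which is undefined in C.
 */
int16_t demo_dot_q15_acc32(const int16_t *x, const int16_t *y, int n, int shift)
{
	int32_t acc = 0;
	int i;

	for (i = 0; i < n; i++)
		acc += (int32_t)x[i] * y[i];

	return (int16_t)(acc >> shift);
}
```

For example, with Q1.15 inputs `{0.5, 0.25, 0.125}` and `{0.5, 0.5, 0.5}` and a shift of 15, both variants return 14336, which is 0.4375 in Q1.15.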

Reworked pointer arithmetic within the loops to improve readability. This may also trim a small amount of computational overhead, contributing to overall performance improvements of around 8.23% for Test 1 and 16.00% for Test 2.

Performance Results:
Compiler Settings: -O3

+------------+------+------+--------+-----------+-----------+----------+
| Test Name  | Rows | Cols | Cycles | Max Error | RMS Error | Result   |
+------------+------+------+--------+-----------+-----------+----------+
| Test 1     | 3    | 5    | 5953   | 0.00      | 0.00      | Pass     |
| Test 2     | 6    | 8    | 5128   | 0.00      | 0.00      | Pass     |
+------------+------+------+--------+-----------+-----------+----------+

Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
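One common way to simplify pointer arithmetic in a matrix multiply is to walk the operands with incrementing pointers instead of recomputing `i * k + p` style indices on every iteration. The sketch below shows that idea for row-major data; `demo_matmul_q15()` and its parameters are hypothetical, it assumes `m`, `k`, and `n` are all at least 1, and it is not the loop structure from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: c (m x n) = a (m x k) * b (k x n), row-major,
 * with the result shifted right by 'shift' bits (truncating, no saturation).
 */
void demo_matmul_q15(const int16_t *a, const int16_t *b, int16_t *c,
		     int m, int k, int n, int shift)
{
	const int16_t *row = a;		/* start of the current row of a */
	int16_t *out = c;		/* next output element, written in order */
	int i, j, p;

	for (i = 0; i < m; i++, row += k) {
		for (j = 0; j < n; j++) {
			const int16_t *x = row;		/* steps along the row of a */
			const int16_t *y = b + j;	/* steps down column j of b */
			int64_t acc = (int32_t)*x * *y;

			for (p = 1; p < k; p++) {
				x++;		/* next element in the row */
				y += n;		/* next element in the column */
				acc += (int32_t)*x * *y;
			}

			*out++ = (int16_t)(acc >> shift);
		}
	}
}
```

Advancing `y` only before it is dereferenced keeps every pointer inside its array, which a trailing `y += n` in the loop header would not guarantee.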

Updated comments for clarity. Made cosmetic changes, such as reformatting code and renaming variables, to enhance readability without impacting functionality. With these changes, performance improves by approximately 7.97% for Test 1 and 15.00% for Test 2.

Performance Results:
Compiler Settings: -O2

+------------+------+------+--------+-----------+-----------+----------+
| Test Name  | Rows | Cols | Cycles | Max Error | RMS Error | Result   |
+------------+------+------+--------+-----------+-----------+----------+
| Test 1     | 3    | 5    | 5975   | 0.00      | 0.00      | Pass     |
| Test 2     | 6    | 8    | 5192   | 0.00      | 0.00      | Pass     |
+------------+------+------+--------+-----------+-----------+----------+

Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>


Commits on Aug 26, 2024

Commits on Aug 27, 2024
