Math: Optimise 16-bit matrix multiplication functions. #9088
base: main
Commits on Aug 26, 2024
-
Math: Add Doxygen documentation for matrix multiplication
This patch introduces Doxygen-style documentation to the matrix multiplication functions. Clear descriptions and parameter details are provided to facilitate better understanding and ease of use.
Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
Commit 4a3f6b9
-
Math: Error Checking Enhancements
- Added checks for integer overflow during shifting.
- Validated matrix dimensions to prevent mismatches.
- Ensured non-null pointers before operating on matrices.
Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
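The three checks listed in this commit message could look roughly like the sketch below. The struct layout, the function name `q_matrix16_check_multiply`, and the choice of errno values are assumptions made for illustration; they are not taken from the patch itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical matrix layout for illustration only. */
struct q_matrix16 {
	int16_t rows;
	int16_t columns;
	int16_t fractions;	/* number of fractional bits (Q format) */
	int16_t *data;		/* row-major, rows * columns elements */
};

/* Return 0 if a * b can be stored in c, a negative errno value otherwise. */
static int q_matrix16_check_multiply(const struct q_matrix16 *a,
				     const struct q_matrix16 *b,
				     const struct q_matrix16 *c)
{
	int shift;

	/* Check pointers before any field is read. */
	if (!a || !b || !c || !a->data || !b->data || !c->data)
		return -EINVAL;

	/* Inner dimensions must agree; c must be rows(a) x columns(b). */
	if (a->columns != b->rows || c->rows != a->rows ||
	    c->columns != b->columns)
		return -EINVAL;

	/*
	 * A product carries a->fractions + b->fractions fractional bits.
	 * The right shift down to c->fractions must be non-negative and
	 * smaller than the width of a 32-bit accumulator.
	 */
	shift = a->fractions + b->fractions - c->fractions;
	if (shift < 0 || shift > 31)
		return -ERANGE;

	return 0;
}
```

A caller would run this check once before entering the multiply loops, so the inner loop itself stays free of branches.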
Commit 1ac1b5d
-
Math: Change accumulator data type to int32_t for matrix multiplication
Changed the accumulator data type from `int64_t` to `int32_t` to reduce instruction cycle count. This change results in an approximate 8.18% gain in performance for matrix multiplication operations.
Performance Results:
Compiler Settings: -O2
+------------+------+------+--------+-----------+-----------+----------+
| Test Name  | Rows | Cols | Cycles | Max Error | RMS Error | Result   |
+------------+------+------+--------+-----------+-----------+----------+
| Test 1     | 3    | 5    | 6487   | 0.00      | 0.00      | Pass     |
| Test 2     | 6    | 8    | 6106   | 0.00      | 0.00      | Pass     |
+------------+------+------+--------+-----------+-----------+----------+
Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
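To make the accumulator change concrete, here is a minimal sketch of a Q1.15 dot product that sums into an `int32_t`. The helper name `q15_dot_acc32` and the fixed Q1.15 format are assumptions for illustration, not code from this commit. Note that each 16-bit by 16-bit product can be as large as 2^30 in magnitude, so a 32-bit sum is only safe when the input values and the vector length keep the total within the `int32_t` range.

```c
#include <stdint.h>

/*
 * Hypothetical helper: dot product of n Q1.15 values, returned as Q1.15
 * with rounding. Assumes the sum of products fits in 32 bits and the
 * result fits in 16 bits.
 */
static int16_t q15_dot_acc32(const int16_t *x, const int16_t *y, int n)
{
	int32_t acc = 0;	/* 32-bit accumulator instead of int64_t */
	int i;

	for (i = 0; i < n; i++)
		acc += (int32_t)x[i] * y[i];	/* each product is Q2.30 */

	/*
	 * Round and shift Q2.30 down to Q1.15. Right-shifting a negative
	 * value is implementation-defined in C; common compilers perform
	 * an arithmetic shift, which is what this sketch relies on.
	 */
	return (int16_t)((acc + (1 << 14)) >> 15);
}
```

For example, with x = {0.5, 0.5} and y = {0.5, 0.25} in Q1.15, the result is 0.375, which is 12288 in Q1.15.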
Commit 1f4f10a
-
Math: Enhance pointer arithmetic in matrix multiplication
Reworked pointer arithmetic within loops to improve readability and reduce minor computational overhead, contributing to overall performance improvements of around 8.23% for Test 1 and 16.00% for Test 2.
Performance Results:
Compiler Settings: -O3
+------------+------+------+--------+-----------+-----------+----------+
| Test Name  | Rows | Cols | Cycles | Max Error | RMS Error | Result   |
+------------+------+------+--------+-----------+-----------+----------+
| Test 1     | 3    | 5    | 5953   | 0.00      | 0.00      | Pass     |
| Test 2     | 6    | 8    | 5128   | 0.00      | 0.00      | Pass     |
+------------+------+------+--------+-----------+-----------+----------+
Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
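One way to read "pointer arithmetic within loops" is to walk the operands with pointers instead of recomputing `row * columns + col` indices on every iteration. The sketch below illustrates that idea under assumptions: the function name `q15_matmul_ptr`, the row-major layout, and the fixed Q1.15 format are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: c = a * b for row-major Q1.15 matrices, where
 * a is rows x inner, b is inner x cols, and c is rows x cols.
 * Assumes every dot product fits in 32 bits and every result in 16 bits.
 */
static void q15_matmul_ptr(const int16_t *a, const int16_t *b, int16_t *c,
			   int rows, int inner, int cols)
{
	int i, j, k;

	for (i = 0; i < rows; i++, a += inner) {
		for (j = 0; j < cols; j++) {
			const int16_t *x = a;		/* walks along row i of a */
			const int16_t *y = b + j;	/* walks down column j of b */
			int32_t acc = 0;

			k = inner;
			while (k-- > 0) {
				acc += (int32_t)*x++ * *y;
				if (k > 0)
					y += cols;	/* next row of b, never past the array */
			}

			/* Round Q2.30 to Q1.15 and store through a moving pointer. */
			*c++ = (int16_t)((acc + (1 << 14)) >> 15);
		}
	}
}
```

The guard on `y += cols` keeps the column pointer from being advanced beyond the end of `b` after the last element, at the cost of one extra comparison per step; whether that trade-off pays off depends on the target and compiler.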
Commit 5ddb7c0
-
Math: Update comments and apply cosmetic changes
Updated comments for better clarity and understanding. Made cosmetic changes such as reformatting code and renaming variables to enhance readability without impacting functionality. This resulted in approximately 7.97% and 15.00% performance improvements for Test 1 and Test 2, respectively.
Performance Results:
Compiler Settings: -O2
+------------+------+------+--------+-----------+-----------+----------+
| Test Name  | Rows | Cols | Cycles | Max Error | RMS Error | Result   |
+------------+------+------+--------+-----------+-----------+----------+
| Test 1     | 3    | 5    | 5975   | 0.00      | 0.00      | Pass     |
| Test 2     | 6    | 8    | 5192   | 0.00      | 0.00      | Pass     |
+------------+------+------+--------+-----------+-----------+----------+
Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
Commit 53dbdef
Commits on Aug 27, 2024
-
Math: Improve pointer manipulation in mat_multiply_elementwise
- Enhanced data pointers for matrix elements
- Streamlined loop iteration for matrix element-wise multiplication
- Achieved a 0.09% performance improvement in cycle count
+------+------+--------+-----------+-----------+--------+
| Rows | Cols | Cycles | Max Error | RMS Error | Result |
+------+------+--------+-----------+-----------+--------+
| 5    | 6    | 3359   | 0.00      | 0.00      | Pass   |
+------+------+--------+-----------+-----------+--------+
Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
Commit 65c21f0
-
Math: Switch mat_multiply_elementwise product type to int32_t
- Changed product variable from int64_t to int32_t
- Improved performance by reducing data size
- Achieved an 11.57% performance improvement in cycle count
+------+------+--------+-----------+-----------+--------+
| Rows | Cols | Cycles | Max Error | RMS Error | Result |
+------+------+--------+-----------+-----------+--------+
| 5    | 6    | 2972   | 0.00      | 0.00      | Pass   |
+------+------+--------+-----------+-----------+--------+
Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
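For element-wise multiplication, a 32-bit product is always wide enough, because the product of two `int16_t` values has a magnitude of at most 2^30. The sketch below shows this with a hypothetical helper `q15_mul_elementwise` operating on flat Q1.15 arrays; the name, the fixed Q format, and the saturation step are assumptions for illustration rather than the code in this commit.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: z[i] = x[i] * y[i] for n Q1.15 values,
 * with rounding and saturation to the int16_t range.
 */
static void q15_mul_elementwise(const int16_t *x, const int16_t *y,
				int16_t *z, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		/* One 16x16 product always fits in int32_t (at most 2^30). */
		int32_t p = (int32_t)x[i] * y[i];

		/*
		 * Round Q2.30 to Q1.15. Relies on arithmetic right shift
		 * for negative values, which common compilers provide.
		 */
		int32_t r = (p + (1 << 14)) >> 15;

		/* Only -1.0 * -1.0 can exceed the positive limit. */
		if (r > INT16_MAX)
			r = INT16_MAX;

		z[i] = (int16_t)r;
	}
}
```

Unlike the accumulating matrix multiply, there is no running sum here, so the narrower type cannot overflow regardless of the input values.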
Commit c3eeab1