Optimize the zgemm_tcopy_4_rvv function to be compatible with vector lengths (VLEN) of 128 and 256 #5030

Merged: 2 commits, Dec 30, 2024

tingboliao
When the vector length (VLEN) is 128 or 256, the original zgemm_tcopy_4_rvv implementation causes some of the cgemmt cases to fail during functional testing with openblas_utest_ext.
The optimized version passes the functional tests with VLEN values of 128, 256, 512, and 1024.

In addition, on the related benchmark cases, the new version outperforms the original RVV implementation on two boards: K230 [C908, vlen = 128] and K1 [C908, vlen = 256].
The performance data are shown below:

Parameter setting: OPENBLAS_LOOPS = 10000.

1. K230 [C908, vlen = 128]:

| Case | Original RVV (MFlops) | Optimized RVV (MFlops) |
|---|---:|---:|
| cher2k.goto | 4619.25 | 4753.04 |
| cherk.goto | 4117.78 | 4182.16 |
| csyr2k.goto | 4581.21 | 4701.76 |
| csyrk.goto | 4033.85 | 4126.95 |

2. K1 [C908, vlen = 256]:

| Case | Original RVV (MFlops) | Optimized RVV (MFlops) |
|---|---:|---:|
| cher2k.goto | 6697.40 | 7298.92 |
| cherk.goto | 5701.16 | 6224.16 |
| csyr2k.goto | 6558.31 | 7195.55 |
| csyrk.goto | 5599.63 | 6136.10 |

Higher MFlops values indicate better performance.

tingbo.liao added 2 commits December 24, 2024 10:33
…uations where the vector lengths(vlens) are 128 and 256.

Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
@martin-frbg martin-frbg added this to the 0.3.29 milestone Dec 30, 2024
@martin-frbg martin-frbg merged commit 73527aa into OpenMathLib:develop Dec 30, 2024
83 checks passed