Hi.
I was looking for a performance comparison between ruy and OpenBLAS and I came across this.
But when I benchmark ruy (for almost any shape, single-threaded, on a Raspberry Pi 4), my results are far behind the reported ones.
For example, for the 512x512x512 Int8 benchmark, I can only get ~10 GOPs, but the spreadsheet reports 40 GOPs.
I know the Raspberry Pi 4 CPU maxes out at 1.5 GHz while the Pixel 4 reaches 2.84 GHz, but that clock ratio is only about 1.9x, so it does not explain a 4x (30 GOPs) gap.
So I thought it would be better to ask here.
How did you measure GOPs for ruy?
I calculate GOPs with the formula `((2 * N * K * M * iterations) / time) / 10e+9`, where `time` is the sum of the execution times of `ruy::Mul` over all iterations. I pack the RHS matrix beforehand.
Am I doing anything wrong?
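For reference, the calculation described above can be sketched as follows. This is only an illustration of the arithmetic, not the benchmark code itself: the function name `gops`, its `divisor` parameter, and the timing values are hypothetical, and the sketch is in Python rather than C++ purely for brevity. One thing it makes visible is that the literal `10e+9` evaluates to 10 × 10^9 = 1e10 (in both Python and C++), so dividing by it yields a figure ten times smaller than dividing by `1e9`. Whether that plays any part in the gap described here depends on what the actual code does.

```python
def gops(m, k, n, iterations, total_seconds, divisor=1e9):
    """Throughput in giga-operations per second.

    Each multiply-accumulate is counted as 2 operations, matching the
    2 * N * K * M * iterations term in the formula above.
    """
    ops = 2 * m * k * n * iterations
    return ops / total_seconds / divisor


# Hypothetical timing, NOT a measurement: 100 iterations of a
# 512x512x512 multiplication taking 0.67 s in total.
iterations = 100
total_seconds = 0.67

# The literal 10e+9 means 10 * 10^9, i.e. 1e10, not 1e9.
print(10e+9 == 1e10)  # True

# Same timing, two different divisors: the results differ by a factor of 10.
print(gops(512, 512, 512, iterations, total_seconds, divisor=1e9))
print(gops(512, 512, 512, iterations, total_seconds, divisor=10e+9))
```

Beyond the divisor, the result also depends on what `time` covers (for example, whether warm-up iterations or packing are included), so that is worth checking too.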