KHMBB16 #131

iollmann · 2022-02-10T01:55:56Z

Similar instructions in other vector ISAs (SQRDMULH, PMULHRSW, and vmhraddshs) all round their products by adding 2**14 before right shifting. It would be helpful if this design did the same, to better enable software portability. This kind of rounding multiply is typically one of the better-performing multipliers out there.

As a bit of a history note, these things are used for fixed-point multiplication by values in [-1,1), which is exactly what you want to do in DCTs (JPEG) and other DFT-like algorithms. It can also be helpful in some image blend modes/filters, particularly those involving luminance calculation, and in Lanczos resampling. I can only imagine that there are audio applications as well, given the prevalence of FFT/mDCT in that space. The rounding helps reduce accumulated error and allows the codec to better conform to behavior on other platforms. Without it, we may expect some modest darkening of the image / quieting of the sound, or be forced to use some other instruction that runs at half the multiplication throughput in order to provide results comparable to other platforms. It also provides for more symmetric rounding. Otherwise the right shift rounds towards -Inf, which is asymmetric about 0 and not good for sinusoids.
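
To make the difference concrete, here is a minimal scalar sketch in C of the two behaviors for one Q15 lane. The function names are hypothetical, and it assumes `>>` on a negative `int32_t` is an arithmetic shift (true on mainstream compilers, but implementation-defined in C). It does not model the -32768 * -32768 corner case, where the real instructions either saturate or wrap.

```c
#include <stdint.h>

/* Hypothetical helper: Q15 multiply with plain truncation.
 * The arithmetic right shift floors, so results round towards -Inf. */
static int16_t q15_mul_trunc(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    return (int16_t)(p >> 15);             /* back to Q15, no rounding */
}

/* Hypothetical helper: Q15 multiply with rounding.
 * Adds 2**14 (half an output LSB) before the shift, as described above. */
static int16_t q15_mul_round(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    return (int16_t)((p + (1 << 14)) >> 15);
}
```

For example, with a = -1 and b = 1 the truncating version returns -1 while the rounding version returns 0, so tiny negative products no longer get pulled a full LSB towards -Inf.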
