Similar instructions in other vector ISAs (SQRDMULH, PMULHRSW, and vmhraddshs) all round their products by adding 2**14 before right shifting. It would be helpful if this design did the same, to better enable software portability. This kind of instruction is typically one of the better-performing multiplies available.
As a bit of a history note, these things are used for fixed-point multiplication by values in [-1,1), which is exactly what you want to do in DCTs (JPEG) and other DFT-like algorithms. It can also be helpful in some image blend modes/filters, particularly those involving luminance calculation, and in Lanczos resampling. I can only imagine that there are audio applications as well, given the prevalence of FFT/MDCT in that space.

The rounding helps reduce accumulated error and allows the codec to better conform to behavior on other platforms. Without it, we may expect some modest darkening of the image / quieting of the sound, or be forced to use some other instruction that runs at half the multiplication throughput in order to provide results comparable to other platforms. It also provides for more symmetric rounding. Otherwise the right shift rounds towards -Inf and is asymmetric about 0, which is not good for sinusoids.
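To make the rounding difference concrete, here is a minimal scalar sketch in Python. It assumes Q15 operands (16-bit signed values representing [-1,1)) and a right shift by 15. The function names `mulh_trunc` and `mulh_round` are made up for illustration and are not any ISA's mnemonics. It also ignores the -32768 * -32768 overflow case, which the real instructions handle in different ways.

```python
def mulh_trunc(a: int, b: int) -> int:
    """Q15 multiply with no rounding: the arithmetic shift floors toward -Inf."""
    return (a * b) >> 15


def mulh_round(a: int, b: int) -> int:
    """Q15 multiply that adds 2**14 to the product before shifting right by 15."""
    return (a * b + (1 << 14)) >> 15


# 9830 is roughly 0.3 in Q15 (0.3 * 32768 = 9830.4), so the exact
# products below are about +29.9988 and -29.9988.
for a in (100, -100):
    print(a, mulh_trunc(a, 9830), mulh_round(a, 9830))
```

With these inputs the truncating version gives 29 and -30, while the rounding version gives 30 and -30. The truncated results are not mirror images about 0, which is the asymmetry described above.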