Home
Other suggestions for a new name: select, bitsel, or bselect
-> Discuss on mailing list
In response to a request by a member of this task group: Add SH1ADD, SH2ADD, SH3ADD, SH4ADD, SH1ADDU.W, SH2ADDU.W, SH3ADDU.W, and SH4ADDU.W instructions with the following semantics.
SHnADD RD, RS1, RS2 := RD = (RS1 << n) + RS2
SHnADDU.W RD, RS1, RS2 := RD = ((RS1 & 0xFFFFFFFF) << n) + RS2
These instructions each replace only two other instructions (SLLI + C.ADD or SLLIU.W + C.ADD), but these are extremely common operations in pointer arithmetic, so it might be worth having the extra instructions.
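The semantics above and the pointer-arithmetic use case can be sketched in C. This is only an illustrative model for RV64 (XLEN = 64 is an assumption), and the helper names `shnadd`, `shnaddu_w` and `element_address` are hypothetical, not part of any proposal.

```c
#include <assert.h>
#include <stdint.h>

/* SHnADD: RD = (RS1 << n) + RS2, modeled with 64-bit registers. */
static uint64_t shnadd(uint64_t rs1, uint64_t rs2, unsigned n)
{
    assert(n >= 1 && n <= 4);            /* only SH1ADD..SH4ADD are proposed */
    return (rs1 << n) + rs2;
}

/* SHnADDU.W: same, but only the low 32 bits of RS1 are used (zero-extended). */
static uint64_t shnaddu_w(uint64_t rs1, uint64_t rs2, unsigned n)
{
    assert(n >= 1 && n <= 4);
    return ((rs1 & 0xFFFFFFFFu) << n) + rs2;
}

/* Hypothetical use: address of element `index` in an array of 8-byte items,
 * i.e. base + index * 8, which would otherwise take SLLI followed by ADD. */
static uint64_t element_address(uint64_t base, uint64_t index)
{
    return shnadd(index, base, 3);
}
```

With a 32-bit unsigned index held in a 64-bit register, `shnaddu_w` would take the place of `shnadd` so that stale upper bits do not leak into the address.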
We might also want to create a new Zba (address) category for those 8 instructions and ADD[I]WU, SUBWU, ADDU.W, SUBU.W, and SLLIU.W.
-> Discuss on mailing list
People have raised concerns that shift-ones is not common enough to justify inclusion in Zbb.
-> Discuss on mailing list
(some of those are already in the doc)
clz, ctz, pcnt:
- FP emulation
- Hamming distance, parity
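As a rough illustration of the use cases listed above, here is a C sketch with simple loop-based 32-bit models of clz, ctz and pcnt. The function names and the 32-bit width are assumptions made for the example; `normalize32` stands in for the mantissa normalization step found in software floating-point code.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros; returns 32 for x == 0. */
static int clz32(uint32_t x)
{
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* Count trailing zeros; returns 32 for x == 0. */
static int ctz32(uint32_t x)
{
    int n = 0;
    for (uint32_t bit = 1; bit != 0 && (x & bit) == 0; bit <<= 1)
        n++;
    return n;
}

/* Population count: clear the lowest set bit until nothing is left. */
static int pcnt32(uint32_t x)
{
    int n = 0;
    for (; x != 0; x &= x - 1)
        n++;
    return n;
}

/* Hamming distance: number of bit positions in which a and b differ. */
static int hamming_distance32(uint32_t a, uint32_t b) { return pcnt32(a ^ b); }

/* Parity: 1 if an odd number of bits is set. */
static int parity32(uint32_t x) { return pcnt32(x) & 1; }

/* FP emulation: shift a nonzero mantissa left until bit 31 is set and
 * report the shift amount so the exponent can be adjusted. */
static uint32_t normalize32(uint32_t mant, int *shift)
{
    assert(mant != 0);
    *shift = clz32(mant);
    return mant << *shift;
}
```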
And-with-complement (andc):
- MIX pattern
- applying masks
- And-inverter-graph evaluation
- SHA-2 (1x in each round, ≈ 3% of operations)
- SHA-3 (25x in each round, ≈ 15% of operations)
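The andc use cases above can be sketched in C as follows. The helper names are hypothetical; `sha2_ch` is the SHA-2 Ch function and `sha3_chi_row` is the chi step applied to one row of five lanes, which is where the 25 and-with-complement operations per SHA-3 round come from (5 rows of 5 lanes).

```c
#include <stdint.h>

/* andc: rs1 AND (NOT rs2), in 32-bit and 64-bit flavors. */
static uint32_t andc(uint32_t rs1, uint32_t rs2)   { return rs1 & ~rs2; }
static uint64_t andc64(uint64_t rs1, uint64_t rs2) { return rs1 & ~rs2; }

/* MIX pattern: take bits from a where mask is 1 and from b where mask is 0. */
static uint32_t mix(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) | andc(b, mask);
}

/* SHA-2: Ch(x, y, z) = (x AND y) XOR ((NOT x) AND z). */
static uint32_t sha2_ch(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ andc(z, x);
}

/* SHA-3 chi on one row: row[i] ^= (NOT row[i+1]) AND row[i+2], indices mod 5. */
static void sha3_chi_row(uint64_t row[5])
{
    uint64_t b[5];
    for (int i = 0; i < 5; i++)
        b[i] = row[i];
    for (int i = 0; i < 5; i++)
        row[i] = b[i] ^ andc64(b[(i + 2) % 5], b[(i + 1) % 5]);
}
```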
Shift-ones:
- (dyn) mask generation
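A minimal C sketch of dynamic mask generation with shift-ones, assuming the semantics slo(x, n) = ~(~x << n) and sro(x, n) = ~(~x >> n) on 32-bit values. The `field_mask` helper is hypothetical and only shows how a run-time mask could be built.

```c
#include <stdint.h>

/* Shift left, shifting in ones instead of zeros. */
static uint32_t slo32(uint32_t rs1, uint32_t rs2)
{
    int shamt = rs2 & 31;
    return ~(~rs1 << shamt);
}

/* Shift right, shifting in ones instead of zeros. */
static uint32_t sro32(uint32_t rs1, uint32_t rs2)
{
    int shamt = rs2 & 31;
    return ~(~rs1 >> shamt);
}

/* Mask with `len` ones starting at bit `lo` (len and lo in 0..31). */
static uint32_t field_mask(uint32_t lo, uint32_t len)
{
    return slo32(0, len) << lo;
}
```

Without shift-ones, the same low mask needs a load-immediate, a shift and a subtract or negate, for example `(1u << len) - 1`.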
Generalized Reverse (grev, grevi):
- bit permutation
- endian-swapping (e.g. for big-endian)
- bit reversal (e.g. for FFT)
- bitboards (e.g. for chess engines)
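To make the grev use cases above concrete, here is a 32-bit C model of generalized reverse as a sequence of conditional butterfly stages. The wrapper names `byte_swap32` and `bit_reverse32` are hypothetical; they only show which control values select endian-swapping and full bit reversal.

```c
#include <stdint.h>

/* Generalized reverse: each set bit k in the control value swaps adjacent
 * blocks of 2^k bits. */
static uint32_t grev32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = rs1;
    int shamt = rs2 & 31;
    if (shamt & 1)  x = ((x & 0x55555555u) << 1)  | ((x & 0xAAAAAAAAu) >> 1);
    if (shamt & 2)  x = ((x & 0x33333333u) << 2)  | ((x & 0xCCCCCCCCu) >> 2);
    if (shamt & 4)  x = ((x & 0x0F0F0F0Fu) << 4)  | ((x & 0xF0F0F0F0u) >> 4);
    if (shamt & 8)  x = ((x & 0x00FF00FFu) << 8)  | ((x & 0xFF00FF00u) >> 8);
    if (shamt & 16) x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
    return x;
}

/* Endian swap: swap bytes and half-words only (stages 8 and 16). */
static uint32_t byte_swap32(uint32_t x)   { return grev32(x, 24); }

/* Full bit reversal: all five stages enabled. */
static uint32_t bit_reverse32(uint32_t x) { return grev32(x, 31); }
```

A control value of 7 reverses the bits within each byte, which is the kind of partial reversal that bitboard code tends to need.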
Generalized Shuffle (shfl, unshfl, shfli, unshfli):
- bit permutation
- LUT input permutations
- bitboards (e.g. chess engines)
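A 32-bit C sketch of generalized shuffle may help when evaluating these use cases. It assumes the stage-based formulation in which each control bit enables one swap stage, and in which unshfl applies the same stages in the opposite order; the helper name `shuffle32_stage` is chosen for the example.

```c
#include <stdint.h>

/* One shuffle stage: bits under maskR move up by n, bits under maskL move
 * down by n, all other bits stay in place. */
static uint32_t shuffle32_stage(uint32_t src, uint32_t maskL, uint32_t maskR, int n)
{
    uint32_t x = src & ~(maskL | maskR);
    x |= ((src << n) & maskL) | ((src >> n) & maskR);
    return x;
}

static uint32_t shfl32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = rs1;
    int shamt = rs2 & 15;
    if (shamt & 8) x = shuffle32_stage(x, 0x00FF0000u, 0x0000FF00u, 8);
    if (shamt & 4) x = shuffle32_stage(x, 0x0F000F00u, 0x00F000F0u, 4);
    if (shamt & 2) x = shuffle32_stage(x, 0x30303030u, 0x0C0C0C0Cu, 2);
    if (shamt & 1) x = shuffle32_stage(x, 0x44444444u, 0x22222222u, 1);
    return x;
}

/* Inverse of shfl32: the same stages, applied in reverse order. */
static uint32_t unshfl32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = rs1;
    int shamt = rs2 & 15;
    if (shamt & 1) x = shuffle32_stage(x, 0x44444444u, 0x22222222u, 1);
    if (shamt & 2) x = shuffle32_stage(x, 0x30303030u, 0x0C0C0C0Cu, 2);
    if (shamt & 4) x = shuffle32_stage(x, 0x0F000F00u, 0x00F000F0u, 4);
    if (shamt & 8) x = shuffle32_stage(x, 0x00FF0000u, 0x0000FF00u, 8);
    return x;
}
```

With all four stages enabled (control value 15), `shfl32` interleaves the bits of the lower and upper half-words, so the lower half ends up in the even bit positions.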
Bit Extract/Deposit (bext, bdep):
- maybe google more examples using x86 pext/pdep
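As a starting point for collecting such examples, the semantics can be modeled in C as below. This is a sketch for 32-bit values, with bext behaving like x86 pext (gather the bits selected by the mask into the low end) and bdep like x86 pdep (scatter low bits to the mask positions).

```c
#include <stdint.h>

/* Bit extract: collect the bits of rs1 selected by mask rs2, packed at bit 0. */
static uint32_t bext32(uint32_t rs1, uint32_t rs2)
{
    uint32_t r = 0;
    for (int i = 0, j = 0; i < 32; i++)
        if ((rs2 >> i) & 1) {
            if ((rs1 >> i) & 1)
                r |= (uint32_t)1 << j;
            j++;
        }
    return r;
}

/* Bit deposit: spread the low bits of rs1 to the positions set in mask rs2. */
static uint32_t bdep32(uint32_t rs1, uint32_t rs2)
{
    uint32_t r = 0;
    for (int i = 0, j = 0; i < 32; i++)
        if ((rs2 >> i) & 1) {
            if ((rs1 >> j) & 1)
                r |= (uint32_t)1 << i;
            j++;
        }
    return r;
}
```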
Min/max instructions (min, max, minu, maxu):
- branchless code
- saturated arithmetic
- absolute value
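The three use cases above can be sketched in C, with the ternary expressions standing in for the min, max and minu instructions. The helper names `clamp32`, `sat_addu32` and `abs32` are hypothetical.

```c
#include <stdint.h>

/* Models of the signed and unsigned min/max instructions. */
static int32_t  min_s(int32_t a, int32_t b)   { return a < b ? a : b; }
static int32_t  max_s(int32_t a, int32_t b)   { return a > b ? a : b; }
static uint32_t min_u(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Branchless clamp of x into [lo, hi]: one max followed by one min. */
static int32_t clamp32(int32_t x, int32_t lo, int32_t hi)
{
    return min_s(max_s(x, lo), hi);
}

/* Saturating unsigned add: ~a is the headroom left above a, so adding
 * min(b, ~a) can never wrap around. */
static uint32_t sat_addu32(uint32_t a, uint32_t b)
{
    return a + min_u(b, ~a);
}

/* Absolute value as max(x, -x). The negation is done on the unsigned
 * representation so that INT32_MIN wraps to itself, as it would in a register. */
static int32_t abs32(int32_t x)
{
    int32_t neg = (int32_t)(0u - (uint32_t)x);
    return max_s(x, neg);
}
```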
Carry-less multiply (clmul, clmulh):
- CRC and CRC-like ("industry") algorithms
- Hashing, PRNG
- Gray decode
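As one possible illustration of the Gray decode item, the C sketch below models 32-bit clmul and clmulh with bit loops and decodes a Gray code with a single carry-less multiply by an all-ones constant. The decode formulation is an assumption chosen for the example and is not necessarily the one the task group had in mind.

```c
#include <stdint.h>

/* Carry-less multiply, low 32 bits of the 64-bit product. */
static uint32_t clmul32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 0; i < 32; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 << i;
    return x;
}

/* Carry-less multiply, high 32 bits of the 64-bit product. */
static uint32_t clmulh32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = 0;
    for (int i = 1; i < 32; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 >> (32 - i);
    return x;
}

/* Gray encode: each bit is XORed with its upper neighbor. */
static uint32_t gray_encode32(uint32_t x) { return x ^ (x >> 1); }

/* Gray decode: bit k of the result is the XOR of bits k..31 of g.
 * clmulh by all-ones yields the XOR of g >> s for s = 1..31, and the
 * final XOR with g adds the s = 0 term. */
static uint32_t gray_decode32(uint32_t g)
{
    return g ^ clmulh32(g, 0xFFFFFFFFu);
}
```

Without carry-less multiply, the same decode takes a chain of five shift-and-XOR steps on a 32-bit value.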
Bit-matrix operations (bmatxor, bmator, bitmatflip):
- bit permutation (within bytes)
- byte permutation
- bit duplication (within bytes)
- byte duplication
- many xor / many or (think "vector lite")
- full NxM bit matrix multiply (using many 8x8 ops)
- searching (finding zero bytes in 8-byte chunk)
- linear algebra in GF(2)
- arithmetic in GF(2^k) with k ≤ 8
Funnel shift (fsl, fsr, fsri):
- mask generation
- bit/byte permutations on >XLEN blocks
- consuming a non-byte-aligned bit-stream of variable-length words
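The bit-stream use case can be sketched in C with 32-bit models of fsl and fsr, assuming the semantics where rs1 and rs3 form a 64-bit value that is shifted by rs2 and the rs1 half of the result is returned. The `peek32` helper is hypothetical: it reads a 32-bit window at an arbitrary bit offset from an LSB-first stream of words and requires that the word after the addressed one exists.

```c
#include <stddef.h>
#include <stdint.h>

/* Funnel shift left: shift rs1 left, filling from the top of rs3. */
static uint32_t fsl32(uint32_t rs1, uint32_t rs2, uint32_t rs3)
{
    int shamt = rs2 & 63;
    uint32_t a = rs1, b = rs3;
    if (shamt >= 32) { shamt -= 32; a = rs3; b = rs1; }
    return shamt ? (a << shamt) | (b >> (32 - shamt)) : a;
}

/* Funnel shift right: shift rs1 right, filling from the bottom of rs3. */
static uint32_t fsr32(uint32_t rs1, uint32_t rs2, uint32_t rs3)
{
    int shamt = rs2 & 63;
    uint32_t a = rs1, b = rs3;
    if (shamt >= 32) { shamt -= 32; a = rs3; b = rs1; }
    return shamt ? (a >> shamt) | (b << (32 - shamt)) : a;
}

/* Read 32 bits starting at bit position `bitpos` of an LSB-first stream.
 * words[bitpos / 32 + 1] must be a valid element. */
static uint32_t peek32(const uint32_t *words, size_t bitpos)
{
    size_t i = bitpos / 32;
    return fsr32(words[i], (uint32_t)(bitpos % 32), words[i + 1]);
}
```

A decoder for variable-length codes would call something like `peek32`, look up the code length, and advance `bitpos` by that amount.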