Quick summary

| Instruction | General theme | Optional special features |
|---|---|---|
| `ldx` | `x[i] = memory[i]` | Load pair |
| `ldy` | `y[i] = memory[i]` | Load pair |
| `ldz`, `ldzi` | `z[_][i] = memory[i]` | Load pair, interleaved Z |
| `stx` | `memory[i] = x[i]` | Store pair |
| `sty` | `memory[i] = y[i]` | Store pair |
| `stz`, `stzi` | `memory[i] = z[_][i]` | Store pair, interleaved Z |

Instruction encoding

| Bit | Width | Meaning | Notes |
|---|---|---|---|
| 10 | 22 | A64 reserved instruction | Must be `0x201000 >> 10` |
| 5 | 5 | Instruction | `0` for `ldx`, `1` for `ldy`, `2` for `stx`, `3` for `sty`, `4` for `ldz`, `5` for `stz`, `6` for `ldzi`, `7` for `stzi` |
| 0 | 5 | 5-bit GPR index | See below for the meaning of the 64 bits in the GPR |
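As a minimal sketch of the table above, the three fields can be packed into a 32-bit instruction word as follows. The enum and the helper name `amx_ldst_encode` are hypothetical, introduced only for illustration; only the bit positions and opcode numbers come from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode numbers taken from the "Instruction" row of the encoding table. */
enum amx_ldst_op {
    AMX_OP_LDX = 0, AMX_OP_LDY = 1, AMX_OP_STX = 2, AMX_OP_STY = 3,
    AMX_OP_LDZ = 4, AMX_OP_STZ = 5, AMX_OP_LDZI = 6, AMX_OP_STZI = 7
};

/* Hypothetical helper: pack the opcode (bits 5..9) and the GPR index
 * (bits 0..4) under the fixed reserved-instruction prefix (bits 10..31). */
static uint32_t amx_ldst_encode(enum amx_ldst_op op, unsigned gpr) {
    assert((unsigned)op < 32 && gpr < 32);
    return UINT32_C(0x201000) | ((uint32_t)op << 5) | (uint32_t)gpr;
}
```

The low 10 bits of `0x201000` are zero, so OR-ing in the two 5-bit fields leaves bits 10 and above equal to `0x201000 >> 10`, as the table requires.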

Operand bitfields

For `ldx` / `ldy`:

| Bit | Width | Meaning |
|---|---|---|
| 63 | 1 | Ignored |
| 62 | 1 | Load multiple registers (`1`) or single register (`0`) |
| 61 | 1 | On M1/M2: Ignored (loads are always to consecutive registers). On M3: Load to non-consecutive registers (`1`) or to consecutive registers (`0`) |
| 60 | 1 | On M1: Ignored ("multiple" always means two registers). On M2/M3: "Multiple" means four registers (`1`) or two registers (`0`) |
| 59 | 1 | Ignored |
| 56 | 3 | X / Y register index |
| 0 | 56 | Pointer |
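A hedged sketch of how a caller might assemble the 64-bit GPR value for `ldx` / `ldy` from these fields. The helper `amx_ldxy_operand` is hypothetical, and the three flag macros borrow the names used in the emulation sample with values assumed from the bit positions in this table.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values assumed from the bit positions in the ldx / ldy table. */
#define LDST_MULTIPLE            (UINT64_C(1) << 62)
#define LDST_NON_CONSECUTIVE     (UINT64_C(1) << 61) /* honoured on M3 only */
#define LDST_MULTIPLE_MEANS_FOUR (UINT64_C(1) << 60) /* honoured on M2/M3 only */

/* Hypothetical helper: combine pointer (bits 0..55), X / Y register
 * index (bits 56..58) and mode flags into one operand. */
static uint64_t amx_ldxy_operand(const void* ptr, unsigned reg, uint64_t flags) {
    uint64_t addr = (uint64_t)(uintptr_t)ptr;
    assert(reg < 8);
    assert((addr >> 56) == 0); /* the pointer must fit in 56 bits */
    assert((flags & ~(LDST_MULTIPLE | LDST_NON_CONSECUTIVE |
                      LDST_MULTIPLE_MEANS_FOUR)) == 0);
    return addr | ((uint64_t)reg << 56) | flags;
}
```

For example, a two-register load into X3 and its neighbour from address `0x1000` would use `amx_ldxy_operand(p, 3, LDST_MULTIPLE)`.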

For `stx` / `sty`:

| Bit | Width | Meaning |
|---|---|---|
| 63 | 1 | Ignored |
| 62 | 1 | Store pair of registers (`1`) or single register (`0`) |
| 59 | 3 | Ignored |
| 56 | 3 | X / Y register index |
| 0 | 56 | Pointer |

For `ldz` / `stz`:

| Bit | Width | Meaning |
|---|---|---|
| 63 | 1 | Ignored |
| 62 | 1 | Load / store pair of registers (`1`) or single register (`0`) |
| 56 | 6 | Z row |
| 0 | 56 | Pointer |

For `ldzi` / `stzi`:

| Bit | Width | Meaning |
|---|---|---|
| 62 | 2 | Ignored |
| 57 | 5 | Z row (high 5 bits thereof) |
| 56 | 1 | Right hand half (`1`) or left hand half (`0`) of Z register pair |
| 0 | 56 | Pointer |
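The two Z layouts differ only in how bits 56 to 62 are used. The following sketch builds both kinds of operand; the helper names `amx_ldstz_operand` and `amx_ldstzi_operand` are hypothetical, and the sketch assumes that "high 5 bits" means the Z row index with its lowest bit dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for ldz / stz: 6-bit Z row at bit 56, pair flag at bit 62. */
static uint64_t amx_ldstz_operand(const void* ptr, unsigned z_row, int pair) {
    uint64_t addr = (uint64_t)(uintptr_t)ptr;
    assert(z_row < 64 && (addr >> 56) == 0);
    return addr | ((uint64_t)z_row << 56) | (pair ? UINT64_C(1) << 62 : 0);
}

/* Hypothetical helper for ldzi / stzi: the high 5 bits of the Z row go at
 * bit 57 (so either row of an even/odd pair selects the same pair), and
 * bit 56 selects the right (1) or left (0) half of that pair. */
static uint64_t amx_ldstzi_operand(const void* ptr, unsigned z_row, unsigned right_half) {
    uint64_t addr = (uint64_t)(uintptr_t)ptr;
    assert(z_row < 64 && right_half <= 1 && (addr >> 56) == 0);
    return addr | ((uint64_t)(z_row >> 1) << 57) | ((uint64_t)right_half << 56);
}
```

Under that assumption, rows 4 and 5 produce the same `ldzi` operand, because both name the Z4 / Z5 pair.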

Description

Move 64 bytes of data between memory (does not have to be aligned) and an AMX register, or move 128 bytes of data between memory (must be aligned to 128 bytes) and an adjacent pair of AMX registers. On M2/M3, can also move 256 bytes of data from memory to four consecutive X or Y registers. On M3, can move 128 or 256 bytes of data from memory to non-consecutive X or Y registers: if bit 61 is set, 128 bytes are moved to registers n and (n+4)%8, or 256 bytes are moved to registers n, (n+2)%8, (n+4)%8, (n+6)%8.
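To make the register selection rules concrete, here is a small sketch that reports which X / Y register receives each 64-byte chunk of an `ldx` / `ldy`. The enum `amx_chip` and the function `amx_ldxy_target` are hypothetical names used only for this illustration; the behaviour is derived from the paragraph above and the operand bitfields, not from hardware testing.

```c
#include <assert.h>
#include <stdint.h>

enum amx_chip { AMX_CHIP_M1 = 1, AMX_CHIP_M2 = 2, AMX_CHIP_M3 = 3 };

/* Returns the X / Y register index that receives 64-byte chunk number
 * `chunk` (0..3) of the load, or -1 if that chunk is not loaded at all. */
static int amx_ldxy_target(uint64_t operand, enum amx_chip chip, unsigned chunk) {
    unsigned n = (unsigned)(operand >> 56) & 7u; /* first register */
    unsigned count = 1;                          /* registers loaded */
    unsigned stride = 1;                         /* distance between them */
    assert(chunk < 4);
    if (operand & (UINT64_C(1) << 62)) {         /* "multiple" bit */
        count = 2;
        if (chip >= AMX_CHIP_M2 && (operand & (UINT64_C(1) << 60)))
            count = 4;                           /* four registers on M2/M3 */
        if (chip >= AMX_CHIP_M3 && (operand & (UINT64_C(1) << 61)))
            stride = (count == 4) ? 2 : 4;       /* non-consecutive on M3 */
    }
    if (chunk >= count) return -1;
    return (int)((n + stride * chunk) % 8u);
}
```

With register index 6 and bits 62, 61 and 60 all set, an M3 fills registers 6, 0, 2, 4 in that order, an M2 fills 6, 7, 0, 1, and an M1 fills only 6 and 7.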

The ldzi and stzi instructions manipulate half of a pair of Z registers. Viewing the 64 bytes of memory and the 64 bytes of every Z register as vectors of i32 / u32 / f32, the mapping between memory and Z is:

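| Memory | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Z0 | 0 L | 2 L | 4 L | 6 L | 8 L | 10 L | 12 L | 14 L | 0 R | 2 R | 4 R | 6 R | 8 R | 10 R | 12 R | 14 R |
| Z1 | 1 L | 3 L | 5 L | 7 L | 9 L | 11 L | 13 L | 15 L | 1 R | 3 R | 5 R | 7 R | 9 R | 11 R | 13 R | 15 R |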

In other words, the even Z register contains the even lanes from memory, and the odd Z register contains the odd lanes from memory. By a happy coincidence, this matches up with the "interleaved pair" lane arrangements of mixed-width mac16 / fma16 / fms16 instructions, and with the "interleaved pair" lane arrangements of other instructions when in a (16, 16, 32) arrangement.
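The lane mapping can be expressed as a short loop. This is a sketch only: `z_reg`, `ldzi_sketch` and `stzi_sketch` are hypothetical stand-ins rather than the emulator's own types or functions, and each Z register is modelled as 16 `uint32_t` lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 64-byte Z register viewed as 16 u32 lanes. */
typedef struct { uint32_t u32[16]; } z_reg;

/* Interleaved load: z_pair[0] is the even register, z_pair[1] the odd one.
 * Memory lane j goes to register (j & 1), lane (j >> 1), offset by 8 when
 * the right-hand half is selected. The other half is left untouched. */
static void ldzi_sketch(z_reg z_pair[2], const uint32_t mem[16], unsigned right_half) {
    assert(right_half <= 1);
    for (unsigned j = 0; j < 16; ++j) {
        z_pair[j & 1].u32[(j >> 1) + 8 * right_half] = mem[j];
    }
}

/* Interleaved store: the exact inverse of the load above. */
static void stzi_sketch(const z_reg z_pair[2], uint32_t mem[16], unsigned right_half) {
    assert(right_half <= 1);
    for (unsigned j = 0; j < 16; ++j) {
        mem[j] = z_pair[j & 1].u32[(j >> 1) + 8 * right_half];
    }
}
```

Loading the left half and then the right half from two different 64-byte buffers fills both registers of the pair completely, which is the layout shown in the table.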

Emulation code

See ldst.c.

A representative sample is:

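```c
// Defined below; declared first so the sample compiles in the order shown.
void ld_common(amx_reg* regs, uint64_t operand, uint32_t regmask);

void emulate_AMX_LDX(amx_state* state, uint64_t operand) {
    ld_common(state->x, operand, 7); // register index mask for X is 7
}

void ld_common(amx_reg* regs, uint64_t operand, uint32_t regmask) {
    uint32_t rn = (operand >> 56) & regmask;                    // register index
    const uint8_t* src = (const uint8_t*)((operand << 8) >> 8); // low 56 bits: pointer
    memcpy(regs + rn, src, 64);
    if (operand & LDST_MULTIPLE) {
        uint32_t rs = 1; // stride between destination registers
        if ((AMX_VER >= AMX_VER_M3) && (operand & LDST_NON_CONSECUTIVE) && (regmask <= 15)) {
            // M3 non-consecutive mode: stride 2 for four registers, 4 for two
            rs = (operand & LDST_MULTIPLE_MEANS_FOUR) ? 2 : 4;
        }
        memcpy(regs + ((rn + rs) & regmask), src + 64, 64);
        if ((AMX_VER >= AMX_VER_M2) && (operand & LDST_MULTIPLE_MEANS_FOUR) && (regmask <= 15)) {
            // M2/M3 four-register mode: load the third and fourth registers
            memcpy(regs + ((rn + rs*2) & regmask), src + 128, 64);
            memcpy(regs + ((rn + rs*3) & regmask), src + 192, 64);
        }
    }
}
```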
