So here is the idea...
To realistically simulate the MIPS32 Processor:
-
Create a main process, let's call it server. Server basically represents all the memory and control that happens in a MIPS32 Processor.
-
Create 5 subprocesses using fork, let's call them clients. The 5 clients are an abstract representation of Fetch, Decode, Execute, Memory, and Write-Back stages of the processor.
-
Execution of a cycle on the MIPS32 Processor.
- All the clients receive a signal from main server at the start of the cycle.
- They work as required on whatever is there in the input channels.
- Once done, they send the necessary info on output channels back to the server.
- The server takes the info that it has received from the clients on each channel, does whatever needs to be done to the memory, and pushes the required info onto each of the client's channels.
-
The serve
-
Ram
-
El
Max Vector (Data Segment) size: 40 MB.
- A simple clock edge triggered cycle. Three stages - fetch&decode, execute, memory. Each stage gets completed in one cycle irrelevant of the instrcution complexity.
- I will have 2 different channels that are an abstract way of implementing data and control bus on the simulator:
- dec_channel
- 32-bit instruction
- control (2 bit):
- reserved_instruction (1 bit) (check yes if the next stage will try to execute a branch instruction in the next cycle)
- actually_branching (1 bit) (will be yes if branch instruction actually led to a jump)
- exec_channel
- functionPointer (use functional with any from tuple)
- 8-bit control (only 6 bits used): specify which of the below lines actually have function argument
- 64 bit data:
- 4 8-bit lines (lsb)
- 1 16 bit line
- 1 32-bit line (msb)
- mem_channel
- memory address (32 bit)
- Control (8 bit)
- read or write (1 bit)
- width (2 bit) (how much to read or write)
- 0 (1 byte)
- 1 (2 byte)
- 2
- 3 (4 byte)
- regAddr (5 bit) (register to read from or write into)
- an additional bit to indicate whether to sign_extend (1) or not (0)
- dec_channel
- In each cycle, we execute the functions in following sequence:
- Store or load if there is anything in the mem_channel
- then execute if there is anything in the execute_channel. If a store/load instruction, push it on the mem_channel. If the instruction is branch and you do branch then check branching as yes in the execute channel, so that the next decoded instruction is discarded.
- decode if there is anything on the decode channel. If the instruction is branch, check execute_branch as yes. So that the if the very next instruction turns out to be reserved, then we can signal the reserved exception.
- fetch. Don't fetch if the execute_branch is set to yes in decode channel.
- Flushing out stages in reverse ensures that the channels are clear for the previous stage.
- R-Type: op: 6 | rs: 5 | rt: 5 | rd: 5 | sa: 5 | func: 6 |
- I-Type:
- op: 6 | rs: 5 | rt: 5 | immediate: 16 |
- op: 6 | rd: 5 | offset: 21 |
- op: 6 | rs: 5 | rt: 00 | immediate: 19 |
- J-Type: op: 6 | instr_index: 26 |
Now let's see which op codes and func codes does each format use:
1: given as op or (op, func) or (op, func, shamt)
- (0, (32, 33, 36, 39, 37, 42, 43, 34, 35, 38))
- (28, (33, 32))
- (31, 32, (0, not Zero)),
- (0, (0, 2, 3, 4, 6, 7), (0, not Zero))
- (0, (48, 49, 50, 51), (2, 3))
- (0, 9, (MSB0, MSB1))
2.1:
- 4, 5, 9, 10, 11, 12, 13, 14, 22, 23, 32, 36, 33, 37, 35, 40, 41, 43
- (1, rt = (0, 1))
- ((54, 62), rs = 0)
- ((6, 7), rs = (0, not zero), rt = (0, not zero))
- (8, (rs = zero, not zero))
- (24, rs = (0, not zero)),
- 59 (check the table for this)
2.2: 54, 62
2.5: 59
3: 2, 3, 50, 58
Notes:
-
Instructions not mentioned in volume-I A but are there in release-6: SEB, SEH, EXT, INS, WSBH, MOVN, MOVZ
-
LWUPC, LDPC (load and store instructions) mentioned in Volume-I A but not found in volume-II A of MIPS Manual. Thus not included here.
-
The term “unsigned” in ADDIU, ADDU is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow.
-
Calculative Instructions:
- ALU
- Shift
- Multiply and Divide
-
Jumpy and Branch
- PC-relative conditional branch
- PC-region unconditional jump
- PC-relative unconditional branch
- Release 6 introduces unconditional branches with 26-bit offsets.
- Absolute (register) unconditional jumps
- Register-indirect jumps
- Register indexed jumps (Release 6)
- Procedures
- A set of procedure calls that record a return link address in a general register.
- Procedure return is performed by register indirect jumps that use the contents of the link register, r31, as the branch target. This provides a hint to the hardware to adjust the call/return predictor appropriately.
16 Bit Operand:
Below instructions are all of type: op (6 bit), rs (5 bit), rt(5 bit), immediate(16 bit)
Instruction | op | op (in dec) |
---|---|---|
ADDIU RD,RS, CONST16 |
001001 | 9 |
ANDI RD, RS, CONST16 |
001100 | 12 |
ORI RD, RS, CONST16 |
001101 | 13 |
SLTI RD, RS, CONST16 |
001010 | 10 |
SLTIU RD,RS, CONST16 |
001011 | 11 |
XORI RD, RS, CONST16 |
001110 | 14 |
There is also suppose to be an LUI instruction above. But it is to be interpreted and done as AUI anyways so...
3 Operands:
Below instructions are all of type: op(6 bit) = 0, rs (5 bit), rt(5 bit), rd(5 bit), blank(5 bit) = 0, func(6 bit)
Instruction | func | (in dec) |
---|---|---|
ADD RD, RS, RT |
100000 | 32 |
ADDU RD, RS, RT |
100001 | 33 |
AND RD, RS, RT |
100100 | 36 |
NOR RD, RS, RT |
100111 | 39 |
OR RD, RS, RT |
100101 | 37 |
SLT RD, RS, RT |
101010 | 42 |
SLTU RD, RS, RT |
101011 | 43 |
SUB RD, RS, RT |
100010 | 34 |
SUBU RD, RS, RT |
100011 | 35 |
XOR RD, RS, RT |
100110 | 38 |
2 Operands:
Below instructions are all of type: op(6 bit) = 28, rs (5 bit), blank(5 bit) = 0, rd(5 bit), blank(5 bit) = 0, func(6 bit)
Instruction | func | (in dec) |
---|---|---|
CLO RD, RS |
100001 | 33 |
CLZ RD, RS |
100000 | 32 |
Below instructions are all of type: op(6 bit), rs/special (5 bit), blank(5 bit) = 0, rd(5 bit), shamt(5 bit), func(6 bit). First two have 31 as op code, the rest have zero.
Instruction | rs/special | shamt | func | func (in dec) |
---|---|---|---|---|
ALIGN RD, RS, RT, BP |
XXXXX | 010XX | 100000 | 32 |
BITSWAP RD, RT |
00000 | 00000 | 100000 | 32 |
ROTR RD, RS, BITS5 |
00001 | XXXXX | 000010 | 2 |
SRL RD, RS, SHIFT5 |
00000 | XXXXX | 000010 | 2 |
SLL RD, RS, SHIFT5 |
00000 | XXXXX | 000000 | 0 |
SLLV RD, RS, RT |
XXXXX | 00000 | 000100 | 4 |
SRA RD, RS, SHIFT5 |
00000 | XXXXX | 000011 | 3 |
SRAV RD, RS, RT |
XXXXX | 00000 | 000111 | 7 |
ROTRV RD, RS, RT |
XXXXX | 00001 | 000110 | 6 |
SRLV RD, RS, RT |
XXXXX | 00000 | 000110 | 6 |
Below instructions have binary of the form: op: 000000 | Rs: 5 | Rt: 5 | Rd: 5 | shamt: 5 | func: 6
Instruction | shamt | func | shamt (in dec) | func (in dec) |
---|---|---|---|---|
MUL RD, RS, RT |
00010 | 011000 | 2 | 48 |
MUH RD, RS, RT |
00011 | 011000 | 3 | 48 |
MULU RD, RS, RT |
00010 | 011001 | 2 | 49 |
MUHU RD, RS, RT |
00011 | 011001 | 3 | 49 |
DIV RD, RS, RT |
00010 | 011010 | 2 | 50 |
MOD RD, RS, RT |
00011 | 011010 | 3 | 50 |
DIVU RD, RS, RT |
00010 | 011011 | 2 | 51 |
MODU RD, RS, RT |
00011 | 011011 | 3 | 51 |
Unconditional Jump 256-MB span: J ADDR28
: 000010 (2) and JAL ADDR28
: 000011 (3) with the 26 bit address after that.
Unconditional Jump Absolute Address: func in dec is 9.
JALR Rd Rs and JALR.HB Rd, Rs are for the form op: 000000 | rs: XXXXX | 00000 | rd: XXXXX | shamt: 0XXXX | 001001
where the MSB of shamt is 1 instead of 0 for JALR.HB. The other 4 bits of shamt are not important.
PC-Relative Branch Instructions: All are of the form op(6) | rs(5) | rt(5) | offset(16). First two have custom rt, last three have rt as zero, and the third one has rt as 1.
Instruction | op | op(in dec) |
---|---|---|
BEQ RS, RT, OFF18 |
000100 | 4 |
BNE RS, RT, OFF18 |
000101 | 5 |
BGEZ RS, OFF18 |
000001 | 1 |
BGTZ RS, OFF18 |
000111 | 7 |
BLEZ RS, OFF18 |
000110 | 6 |
BLTZ RS, OFF18 |
000001 | 1 |
The below 3 don't have forbidden slot in the very next one, and won't cause a reserved instruction exception:
Unconditional Branch and Call: Called as BC offset
or BALC offset
with opcodes as 110010 (50) and 111010 (58) respectively. offset
is 26 bits.
Indexed Jumps (register + unscaled offset): Both of them are of the form op(6) | 0(5) | rt(5) | offset(16). JIC rt offset
: 110110 (54) and JIALC rt offset
: 111110 (62).
Compare to Zero Instructions:
BEQZC, BNEZC of the type op(6) | rs(5) | offset (21). The rest have rs and rt, and offset of length 16. A lot of transaction that end up translating to other in binary and work fine are not mentioned here. Since I won't have to simulate those.
Instruction | op | op(in dec) |
---|---|---|
BEQZC | 110110 | 54 |
BNEZC | 111110 | 62 |
BEQC | 001000 | 8 |
BNEC | 011000 | 24 |
BLTC | 010111 | 23 |
BGEC | 010110 | 22 |
BLTUC | 000111 | 7 |
BGEUC | 000110 | 6 |
Conditional Calls, Compare Against Zero:
Offset is 16 bits long in all the below instructions. And rt can never be zero.
Instruction | op | rs | op(in dec) |
---|---|---|---|
BEQZALC | 001000 | 00000 | 8 |
BNEZALC | 011000 | 00000 | 24 |
BLEZALC | 000110 | 00000 | 6 |
BGEZALC | 000110 | rt | 6 |
BGTZALC | 000111 | 00000 | 7 |
BLTZALC | 000111 | rt | 7 |
Branch if Overflow:
BOVC and BNVC have op-codes 001000
(8) and 011000
(24) respectively. They have rs, rt, and offset(16).
LSA has func 000101
at the end.
Instruction | Format | op | op (in dec) |
---|---|---|---|
LSA | rs(5) rt(5) rd(5) 000 sa(2) | 000000 | 0 |
AUI | rs(5) rd(5) immediate(16) | 001111 | 15 |
ADDIUPC | rs(5) 00 immediate(19) | 111011 | 59 |
AUIPC | rs(5) 11110 immediate(16) | 111011 | 59 |
ALUIPC | rs(5) 11111 immediate(16) | 111011 | 59 |
Unsigned here is again a misnomer that signifies no overflow trap.
Following instructions are of the format: op(6) | base(5) | rt(5) | offset(16).
inst | op |
---|---|
LB | 32 |
LBU | 36 |
LH | 33 |
LHU | 37 |
LW | 35 |
SB | 40 |
SH | 41 |
SW | 43 |
There are also lbe, lbue, lhe, lhue, lwe, sbe, swe. Which do the same thing but to and from user mode virtual address space when executing in kernel mode. So I have decided to leave them for now.
Atomic read-modify-write
LL - op(6) | base(5) rt(5) | offset(9) | 0 | func(6)
- op = 31 and func = 54.
SC
LLWP
Serialization
Instruction |
---|
SYNC |
SYNCI |
Exception
Instruction |
---|
BREAK |
SYSCALL |
Each of the below instructions is called as something like TEQ Rs Rt
. Binary from is: SPECIAL: 000000 | rs: XXXXX | rt: XXXXX | code: 10 bits | func |
Instruction | Func |
---|---|
TEQ | 110100 |
TGE | 110000 |
TGEU | 110001 |
TLT | 110010 |
TLTU | 110100 |
TNE | 110100 |
Conditional Move
Assembly: SELEQZ Rd, Rs, Rt
| Binary: SPEICAL: 000000 | Rs: XXXXX | Rt: XXXXX | Rd: XXXXX | 00000| func
Instruction | Func |
---|---|
SELEQZ | 110101 |
SELNEZ | 110111 |
PREFETCH
PREF hint, offset (base)
: SPECIAL3: 011111 | base: XXXXX | hint: XXXXX | offset: 9 bits | 0 | PREF: 110101 |
NOP
: All zero
SSNOP
: SHAMT: 1 (executes SLL)
- 32 General purpose registers
$zero
: Always holds the value 0.$at
: Reserved for the assembler.$v0
-$v1
: Used for function return values.$a0
-$a3
: Used for function arguments.$s0
-$s7
: Saved registers.$t0
-$t9
: Temporary registers.$k0
-$k1
: Reserved for the OS Kernel.$gp
: Global pointer.$sp
: Stack pointer.$fp
: Frame pointer.$ra
: Return address.
- 32 Floating Points Registers -
$f0
to$31
- Used for floating point operations (Present on CP1) - Special Registers
- Progam Counter (PC) - Holds the address of next instruction to be executed
- HI and LO Registers: Used to store the results of certain arithmetic operations like multiplication and division.
- Cycle Counter Register: Often referred to as the Count register (
$count
), it keeps track of the number of cycles.
The main MIPS processors also has four coprocessors.
- cp0: System Control Coprocessor. Provides control, memory management, and exception handling functions.
- cp1: Floating point Coprocessor. Load and store instructions (move to and from coprocessor), reserved in the main opcode space. Coprocessor-specific operations, defined entirely by the coprocessor.
- cp2 - Seems to be for double operations
- cp3
When executing a branch instruction, the very next instruction (called delay - slot instruction) comes in fetch because of pipelining. So...
- this delay slot instruction is typically executed. (Code optimization in compiler makes sure to move the branch instruction one line upward or backward depending on how you see it.)
- unless that instruction is a control transfer instruction. Pre-release 6 will give UNPREDICTABLE results, and release 6 will signal a Reserved Instruction Exception.
- Release 6 introduces conditional compact branches and compact jumps that do not have a delay slot; they have instead a forbidden slot. Release 6 unconditional compact branches have neither a delay slot nor a forbidden slot.
- FORBIDDEN: any CTI, including branches and jumps, ERET, ERETNC, DERET, WAIT, and PAUSE.
- Instruction Memory: Memory map all the instructions to this segment. Address stored in program counter.
- Data Segment: Stores initialized global and static variables.
- Block Started by Symbol (BSS) Segment: Stores uninitialized global and static variables.
- Heap: Used for dynamic memory allocation.
- Stack: Used for function calls and local variables. Address stored at stack pointer (register 29)
- Global Data Segment: Stores global variables and constants. Address stored at global pointer (register 28)
ELF file is the binary file that the CPU (cough OS cough) actually processes. These contain all the goodies and information, all the data that needs to be loaded in the memory, what ISA we are processing - 32 bit or 64 bit, little or big endian, the processor name, and such things. But ELF files are very broad, and are also used in OS for other things. Now since we aren't actually making an OS and ktthis is just a MIPS32 Simulator for a very specific case, we already would know a lot of things so we can ignore most of the fields, and only focus on the ones that we want to.
So the (maybe) important fields for us would be:
- ELF Header:
- entry point address
- program headers start offset
- sections headers start offset
- size of ELF header
- size of Program header
- Number of Program Headers
- Size of Section Headers
- Number of Section Headers
- Section Header String Table Index
- Traditional model for implementing a processor does it in five stages: Fetch, Decode, Execute, Memory, and Write-back. This processor simulator however will only have one single stage. It will take instructions from memory and execute and store back in the same stage. This decision enables me to focus on the core concepts of the computer-architecture and enable better learning.
- We will have coprocessor cp0 extension the simulator.
Refs: