title | subtitle | author
---|---|---
IPA Project Report | Final Submission |

Continuing from the mid-evals, we have now implemented a Y86 processor with pipelining support. Each stage has its own module, as in the SEQ implementation, and each intermediate pipeline register also has its own module in a separate file, named `rxxx.v`. The final module, `processor.v`, instantiates the various stages and registers and keeps track of the outputs using `$monitor`. The centralised register bank module `regarr.v` is used here as well.
There is a separate module for PC selection, `pc_select.v`, and the PC prediction logic is included in `fetch.v`. Data forwarding is handled inside the stages themselves, given the appropriate inputs, with the forwarding logic implemented in `decode.v`. The pipeline control logic also has its own module, `pipectrl.v`, which decides whether a given instruction needs to be stalled or bubbled depending on the conditions.
The processor has 2 kB of instruction memory, 14 registers and 2 kB of data memory. Variable names follow the usual conventions for a Y86 pipelined design.
- `halt`
- `nop`
- `cmovXX`
- `irmovq`
- `rmmovq`
- `mrmovq`
- `OPq`
- `jXX`
- `call`
- `ret`
- `pushq`
- `popq`
- Stalling
- Bubble
- PC prediction
- Data forwarding
\pagebreak
The fetch stage works on the instruction memory `insmem`, reading 10 bytes at a time. `icode` and `ifun` are split out of the first byte, and `valP` is decided based on the value of `icode`. The register operand specifiers are obtained from the second byte of the fetched instruction and stored. When the instruction is 10 bytes long, the eight-byte constant is stored in `valC`.
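As an illustration only, the split described above can be modelled in a few lines of Python. This is a behavioural sketch, not the Verilog in `fetch.v`: the opcode values, the `NEED_REGIDS` and `NEED_VALC` sets and the `fetch` helper are assumptions taken from the standard Y86-64 encoding, in which `jXX` and `call` also carry an eight-byte constant even though they are only 9 bytes long.

```python
# Behavioural model of the fetch-stage split (assumed Y86-64 encoding).
CMOVXX, IRMOVQ, RMMOVQ, MRMOVQ, OPQ = 0x2, 0x3, 0x4, 0x5, 0x6
JXX, CALL, PUSHQ, POPQ = 0x7, 0x8, 0xA, 0xB
RNONE = 0xF  # "no register" specifier

# Instructions that carry a register byte / an 8-byte constant.
NEED_REGIDS = {CMOVXX, IRMOVQ, RMMOVQ, MRMOVQ, OPQ, PUSHQ, POPQ}
NEED_VALC = {IRMOVQ, RMMOVQ, MRMOVQ, JXX, CALL}


def fetch(insmem: bytes, pc: int):
    """Return (icode, ifun, rA, rB, valC, valP) for the instruction at pc."""
    # 10-byte fetch window, zero-padded past the end of memory.
    inst = insmem[pc:pc + 10].ljust(10, b"\x00")
    icode, ifun = inst[0] >> 4, inst[0] & 0xF
    has_regs = icode in NEED_REGIDS
    has_valc = icode in NEED_VALC
    if has_regs:
        rA, rB = inst[1] >> 4, inst[1] & 0xF
    else:
        rA, rB = RNONE, RNONE
    # The constant starts after the register byte when there is one.
    start = 2 if has_regs else 1
    valC = int.from_bytes(inst[start:start + 8], "little") if has_valc else 0
    # valP is the address of the next sequential instruction.
    valP = pc + 1 + int(has_regs) + 8 * int(has_valc)
    return icode, ifun, rA, rB, valC, valP
```

For example, the bytes `30 F3 10 00 00 00 00 00 00 00` (`irmovq $0x10, %rbx`) at address 0 give `icode = 3`, `rB = 3`, `valC = 0x10` and `valP = 10`.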
The fetch block now also includes PC prediction and status condition setting. The `pc_select` module is instantiated here, since updating the PC requires the predicted PC from this stage. The PC is predicted according to the conditions discussed in class.
Thus, the inputs and outputs to this stage are as follows:
- Inputs: `clk`, `F_PC`
- Outputs: `f_stat`, `f_icode`, `f_ifun`, `f_rA`, `f_rB`, `f_valC`, `f_valP`
- Status conditions:
  - `inst_valid`: set when the instruction is valid
  - `imem_er`: set when the address is invalid
  - `hlt_er`: set when `halt` is encountered
  - `dmem_er`: set when a data memory error is encountered
- `insmem`: register array that functions as the instruction memory
- `inst`: register used to fetch 10 bytes from `insmem` at the location pointed to by the PC
- Forwarded values from later stages, which are used in PC prediction and selection
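The report does not spell out the prediction conditions, so the following is only a sketch of the usual textbook scheme, which we assume matches the one discussed in class: branches are predicted taken, and the selected PC is corrected from the forwarded values when a branch turns out to be mispredicted or a `ret` reaches writeback. The function names and the `F_predPC` argument are hypothetical.

```python
# Behavioural model of PC prediction and selection (assumed textbook scheme).
JXX, CALL, RET = 0x7, 0x8, 0x9


def predict_pc(f_icode, f_valC, f_valP):
    """Predict the next PC: jumps and calls go to valC, everything else to valP."""
    return f_valC if f_icode in (JXX, CALL) else f_valP


def select_pc(F_predPC, M_icode, M_cnd, M_valA, W_icode, W_valM):
    """Pick the PC to fetch from, using values forwarded from later stages."""
    if M_icode == JXX and not M_cnd:
        return M_valA   # mispredicted branch: resume at the fall-through address
    if W_icode == RET:
        return W_valM   # ret: return address read from the stack
    return F_predPC     # otherwise trust the prediction
```

Checking the misprediction case first means that a not-taken branch in the memory stage overrides the prediction made for the instructions fetched after it.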
The instructions are hardcoded into the processor in this stage in an `initial` block, as the instruction memory is local to the fetch stage.
\pagebreak
In this stage, the instruction is decoded from the `icode` value, and the required values (usually `valA` and `valB`) are read from the central register bank for the registers `rA` and `rB`, according to the operand specifiers obtained in the fetch stage. The stack pointer is also required for a few of these instructions. Values are also forwarded here from later stages, as decode is where the updated values are needed. Since `regarr` is accessed only in this stage, the register writes of the writeback stage are carried out here as well. This is the stage that differs the most from its SEQ implementation: data forwarding has been added, and the stage has been merged (at the hardware level) with the writeback stage.
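- Inputs: `D_stat`, `D_icode`, `D_ifun`, `rA`, `rB`, `e_dstE`, `M_dstE`, `M_dstM`, `W_dstM`, `W_dstE`, `D_valC`, `D_valP`, `e_valE`, `M_valE`, `m_valM`, `W_valM`, `W_valE`
- Outputs: `d_stat`, `d_icode`, `d_ifun`, `d_valC`, `d_valA`, `d_valB`, `d_dstE`, `d_dstM`, `d_srcA`, `d_srcB`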
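The forwarding inputs listed above suggest a priority selection between the register file value and the values still in flight. The sketch below models that selection in Python under the usual textbook priority (execute first, then memory, then writeback); the `forward` helper and its argument order are assumptions, not the logic in `decode.v`.

```python
# Behavioural model of forwarding into decode (assumed textbook priority).
RNONE = 0xF  # "no register" specifier


def forward(src, reg_val,
            e_dstE, e_valE, M_dstM, m_valM, M_dstE, M_valE,
            W_dstM, W_valM, W_dstE, W_valE):
    """Return the most recent value of register `src`.

    reg_val is what the register file currently holds; a later stage that is
    about to write `src` overrides it, with the youngest instruction winning.
    """
    if src == RNONE:
        return reg_val
    sources = (
        (e_dstE, e_valE),  # ALU result computed this cycle
        (M_dstM, m_valM),  # value just read from data memory
        (M_dstE, M_valE),  # ALU result held in the memory stage
        (W_dstM, W_valM),  # memory value pending writeback
        (W_dstE, W_valE),  # ALU result pending writeback
    )
    for dst, val in sources:
        if dst == src:
            return val
    return reg_val
```

The same function would serve both `d_valA` (called with `d_srcA`) and `d_valB` (called with `d_srcB`).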
\pagebreak
The ALU is instantiated in this stage, and the results of computations on `valA` and `valB` are stored in `valE` (where applicable). In most cases, the computation is an `OPq` instruction. The three flags used by this architecture, `zf`, `of` and `sf`, are also computed in this stage. The flags are set by `OPq` instructions and are used by the conditional instructions. This module has not changed much from SEQ, except that it takes inputs from later stages to confirm the status of the processor.
- Inputs: `E_stat`, `m_stat`, `W_stat`, `E_icode`, `E_ifun`, `E_dstE`, `E_dstM`, `E_valA`, `E_valB`, `E_valC`
- Outputs: `e_stat`, `e_icode`, `e_dstE`, `e_dstM`, `e_valA`, `e_valE`
- Condition codes: `e_cnd`, `of`, `zf`, `sf`
Note: The ALU module includes its submodules using absolute paths. These will have to be modified to match the local directory layout.
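To make the flag behaviour concrete, here is a small Python model of a 64-bit ALU that returns `valE` together with `zf`, `sf` and `of`, plus the condition check that would produce `e_cnd`. The `ifun` encodings are the standard Y86-64 ones and are an assumption here, as are the helper names `alu` and `cond`; the actual ALU is built from its own Verilog submodules.

```python
# Behavioural model of the OPq ALU and condition evaluation (assumed encodings).
MASK = (1 << 64) - 1
SIGN = 1 << 63


def alu(ifun, valA, valB):
    """OPq on 64-bit values: 0 add, 1 sub (valB - valA), 2 and, 3 xor.

    Returns (valE, zf, sf, of).
    """
    a, b = valA & MASK, valB & MASK
    if ifun == 0:
        r = (b + a) & MASK
    elif ifun == 1:
        r = (b - a) & MASK
    elif ifun == 2:
        r = b & a
    elif ifun == 3:
        r = b ^ a
    else:
        raise ValueError("unknown ALU function")
    zf = r == 0
    sf = bool(r & SIGN)
    sa, sb = bool(a & SIGN), bool(b & SIGN)
    if ifun == 0:
        of = (sa == sb) and (sf != sa)   # same-sign operands, result flips sign
    elif ifun == 1:
        of = (sa != sb) and (sf != sb)   # opposite signs, result leaves valB's sign
    else:
        of = False
    return r, zf, sf, of


def cond(ifun, zf, sf, of):
    """Condition for jXX/cmovXX: 0 always, 1 le, 2 l, 3 e, 4 ne, 5 ge, 6 g."""
    table = {
        0: True,
        1: (sf != of) or zf,
        2: sf != of,
        3: zf,
        4: not zf,
        5: sf == of,
        6: (sf == of) and not zf,
    }
    return table[ifun]
```

For instance, adding 1 to the largest positive 64-bit value sets both `sf` and `of`, so a following "less than" test (`sf != of`) is false, as it should be for a result that overflowed.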
\pagebreak
Instructions that read from or write to data memory do so in this stage. It is here that we interact with the actual memory of the device through read and write operations. The `fwd` variables carry the same values as the stage outputs, but are sent unaltered as forwarded values to other stages, notably the decode stage.
- Inputs: `M_cnd`, `M_stat`, `M_icode`, `M_valA`, `M_dstE`, `M_dstM`, `M_valE`
- Outputs: `m_stat`, `m_valE`, `m_valM`, `m_valAfwd`, `m_valEfwd`, `m_dstE`, `m_dstM`, `m_icode`, `M_cndfwd`
- `datamem`: a register that serves as the data memory and is local to the memory stage
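A rough Python model of the read and write selection is given below. It assumes the usual Y86-64 rules for which instructions access memory and which value supplies the address, and it treats `datamem` as a word-addressed list for simplicity; both are assumptions for illustration and may differ from the Verilog in the memory stage.

```python
# Behavioural model of the memory stage (assumed Y86-64 access rules).
RMMOVQ, MRMOVQ, CALL, RET, PUSHQ, POPQ = 0x4, 0x5, 0x8, 0x9, 0xA, 0xB


def memory_stage(datamem, M_icode, M_valE, M_valA):
    """Return (m_valM, dmem_er) and update datamem in place on a write.

    datamem is a hypothetical word-addressed list of 64-bit values.
    """
    # popq and ret read at the old stack pointer (valA); the rest use valE.
    addr = M_valA if M_icode in (POPQ, RET) else M_valE
    reads = M_icode in (MRMOVQ, POPQ, RET)
    writes = M_icode in (RMMOVQ, PUSHQ, CALL)
    if not (reads or writes):
        return 0, False             # instruction does not touch memory
    if not 0 <= addr < len(datamem):
        return 0, True              # address out of range: data memory error
    if writes:
        datamem[addr] = M_valA      # store the register value or return address
        return 0, False
    return datamem[addr], False
```

An out-of-range address is reported through the second return value, which plays the role of the `dmem_er` status condition.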
\pagebreak
This stage writes either `valE` or `valM` to the destination registers of the instructions that call for it, and so handles register updates. The updates are done through the register write functionality of the reg file implemented in the decode block.
- Inputs: `W_stat`, `W_icode`, `W_dstM`, `W_dstE`, `W_valM`, `W_valE`
- Outputs: `w_stat`, `w_icode`, `w_dstM`, `w_dstE`, `w_valM`, `w_valE`
\pagebreak
The `processor.v` file includes the files for all the stages as well as the register bank. This code is mainly meant to set the status conditions for the processor, and to monitor execution and end it if necessary. The clock is also driven from this code with an `always` statement.
This module takes no arguments, as all the required files are included and the modules are instantiated inside it.
The register bank module stores the data contained in all 14 registers and the stack pointer in a two-dimensional array, `regArr`. It can read from and write to this array according to the input specifiers given, at most two at a time. It takes in the specifiers of the required registers, along with the values to be written to them (if applicable), and performs the required operations.
- `PC`
- `rA`: input for the read operation
- `rB`: input for the read operation
- `dstM`: input address for the write operation
- `dstE`: input address for the write operation
- `wrtA`: input value to be written
- `wrtB`: input value to be written
- `valA`: value in the register specified by `rA`
- `valB`: value in the register specified by `rB`
- `valStk`: value of the stack pointer `%rsp`
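The two-read, two-write behaviour can be pictured with the Python model below. It is a sketch only: the class name `RegArr`, the method names and the parameter names `valE` and `valM` are hypothetical (the report does not say which of `wrtA` and `wrtB` pairs with which destination), and the register numbering with `%rsp` at index 4 and `0xF` meaning "no register" is the standard Y86-64 one.

```python
# Behavioural model of a register bank with two read and two write ports.
RNONE = 0xF  # "no register" specifier (assumed Y86-64 numbering)
RSP = 0x4    # index of %rsp (assumed Y86-64 numbering)


class RegArr:
    def __init__(self):
        # 14 general registers plus the stack pointer.
        self.regs = [0] * 15

    def read(self, rA, rB):
        """Return (valA, valB, valStk); an RNONE specifier reads as 0."""
        valA = self.regs[rA] if rA != RNONE else 0
        valB = self.regs[rB] if rB != RNONE else 0
        return valA, valB, self.regs[RSP]

    def write(self, dstE, valE, dstM, valM):
        """Write up to two registers; RNONE destinations are skipped."""
        if dstE != RNONE:
            self.regs[dstE] = valE & ((1 << 64) - 1)
        # Written second so the memory value wins if both target one register.
        if dstM != RNONE:
            self.regs[dstM] = valM & ((1 << 64) - 1)
```

Giving the memory port the last word when both destinations match follows the textbook treatment of `popq %rsp`; whether `regarr.v` does the same is not stated in the report.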
Implemented in `pipectrl.v`, this module is responsible for handling the hazards that pipelining introduces. It sets the control signals that stall or bubble the fetch, decode and execute stages where required, in response to load/use hazards, the processing of `ret` instructions, and mispredicted branches in jump instructions. These control signals are passed as inputs to the respective registers between stages so that their output values at each positive clock edge can be set accordingly.
- Inputs: `e_cnd`, `d_srcA`, `d_srcB`, `D_icode`, `E_dstM`, `E_icode`, `M_icode`, `m_stat`, `W_stat`
- Outputs: `F_stall`, `D_bubble`, `D_stall`, `E_bubble`
- `luhaz`: set in case of a load/use hazard
- `inret`: set in case of `ret` instruction processing
- `misbranch`: set in case of a mispredicted branch
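The three hazard conditions and the four control outputs listed above combine in a fairly mechanical way. The Python function below sketches the textbook combination, which we assume is close to what `pipectrl.v` does; the function name is hypothetical, and the `m_stat` and `W_stat` inputs are left out because the report does not say how they are used.

```python
# Behavioural model of the pipeline control logic (assumed textbook rules).
MRMOVQ, JXX, RET, POPQ = 0x5, 0x7, 0x9, 0xB


def pipe_control(D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_cnd, M_icode):
    """Return the stall/bubble signals for the F, D and E pipeline registers."""
    # Load/use: the instruction in execute loads a register that decode needs.
    luhaz = E_icode in (MRMOVQ, POPQ) and E_dstM in (d_srcA, d_srcB)
    # A ret is somewhere between decode and memory, so the next PC is unknown.
    inret = RET in (D_icode, E_icode, M_icode)
    # A conditional jump in execute turned out not to be taken.
    misbranch = E_icode == JXX and not e_cnd
    return {
        "F_stall": luhaz or inret,
        "D_stall": luhaz,
        "D_bubble": misbranch or (inret and not luhaz),
        "E_bubble": misbranch or luhaz,
    }
```

The `inret and not luhaz` term keeps the decode register from being asked to stall and bubble in the same cycle when a load/use hazard and a `ret` coincide.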
The code for the processor compiles without any errors, and the executable file `a.out` runs without issues on the instruction memory given. The processor produces output for the instructions given to it; however, there are a few faults in the data forwarding logic that still need to be dealt with. The test instructions run fine, and data is being forwarded to earlier stages.