Author | Editor | Title |
---|---|---|
Jason Lowe-Power |
Maryam Babaie |
DINO CPU Assignment 3 |
Originally from ECS 154B Lab 3, Winter 2019.
Modified for ECS 154B Lab 3, Winter 2021.
Part 3.1 Due on 02/05/2021.
See End of Part 3.1 for details on turning in lab 3.1.
Part 3.2 Due on 02/14/2021
See Grading for details on turning in lab 3.2.
- Introduction
- Pipelined CPU design
- Part I: Re-implement the CPU logic and add pipeline registers
- Part II: Implementing forwarding
- Part III: Implementing branching and flushing
- Part IV: Hazard detection
- Grading
- Submission
- Hints
In the last assignment, you implemented a full single cycle RISC-V CPU. In this assignment, you will be extending this design to be a 5 stage pipeline instead of a single cycle. You will also be implementing full forwarding for ALU instructions and hazard detection. The simple in-order CPU design is based closely on the CPU model in Patterson and Hennessey's Computer Organization and Design.
The DINO CPU code must be updated before you can run each lab. You should read up on how to update your code to get the assignment 3 template from GitHub.
You can check out the main branch to get the template code for this lab.
If you want to use your solution from lab2 as a starting point, you can merge your commits with the origin
main by running git pull
or git fetch; git merge origin/main
.
The goal of this assignment is to implement a pipelined RISC-V CPU which can execute all of the RISC-V integer instructions. Like the previous assignment, you will be implementing this step by step starting with a simple pipeline in Part 1. After that, you will add to your design to implement forwarding and hazard detection.
We are making one major constraint on how you are implementing your CPU. You may not modify the I/O for any module. This is the same constraint that you had in Lab 2. We will be testing your data path, your hazard detection unit, and our forwarding unit in isolation. Therefore, you must keep the exact same I/O. You will get errors on Gradescope (and thus no credit) if you modify the I/O.
- Learn how to implement a pipelined CPU.
- Learn what information must be stored in pipeline registers.
- Learn which combinations of instructions cause hazards, and which can be overcome with forwarding.
Below is a diagram of the pipelined DINO CPU. This diagram includes all control wires unlike the diagram in Assignment 2 in addition to all of the MUXes needed.
Notice: Please be aware that in some of the connections we have not specified what bits of a signal must be connected to the input of a module. While working on this assignment, please make sure you specify the proper bits, wherever needed in your implementation.
The pipelined design is based very closely on the single cycle design.
You may notice there are a few minor changes (e.g., the location of the PC MUX).
You can take your code from the Assignment 2 as a starting point, or you can use the code provided in src/main/scala/single-cycle/cpu.scala
, which is my solution to Assignment 2.
Just like with the last lab, run the following command to go through all tests once you believe you're done.
sbt:dinocpu> test
You can also run the individual tests for each part with testOnly
.
The command to run the tests for each part are included in each part below.
When you see something like the following output when running a test:
- should run branch bne-False *** FAILED ***
This means that the test bne-False
failed.
For this assignment, it would be a good idea to single step through each one of the failed tests. You can find out more information on this in the DINO CPU documentation and in the video DinoCPU - Debugging your implementation.
You may also want to add your own printf
statements to help you debug.
Details on how to do this were are in the Chisel notes.
In this part, you will be implementing a full pipelined processor with the exception of forwarding and hazards. After you finish this part, you should be able to correctly execute any single instruction application.
This is the biggest part of this assignment, and is worth the most points. Unfortunately, there isn't an easy way to break this part down into smaller components. I suggest working from left to right through the pipeline as shown in the diagram above. We have already implemented the instruction fetch (IF) stage for you.
Each of the pipeline registers is defined as a Bundle
at the top of the CPU definition.
For instance, the IF/ID register contains the PC
and instruction
as shown below.
// Everything in the register between IF and ID stages
class IFIDBundle extends Bundle {
val instruction = UInt(32.W)
val pc = UInt(32.W)
}
We have also grouped the control into three different blocks to match the book.
These blocks are the EXControl
, MControl
, and WBControl
.
We have given you the signals that are needed in the EX stage as an example of how to use these bundles.
class EXControl extends Bundle {
val itype = Bool()
val aluop = Bool()
val resultselect = Bool()
val xsrc = Bool()
val ysrc = Bool()
val plus4 = Bool()
val branch = Bool()
val jal = Bool()
val jalr = Bool()
}
You can also create registers for the controls, and in the template we have split these out into other StageReg
s.
We have given you the control registers.
However, each control register simply holds a set of bundles.
You have to set the correct signals in these bundles.
Note that to access the control signals, you may need an "extra" indirection. See the example below:
class EXControl extends Bundle {
val itype = Bool()
val aluop = Bool()
val resultselect = Bool()
val xsrc = Bool()
val ysrc = Bool()
val plus4 = Bool()
val branch = Bool()
val jal = Bool()
val jalr = Bool()
}
class IDEXControl extends Bundle {
val ex_ctrl = new EXControl
val mem_ctrl = new MControl
val wb_ctrl = new WBControl
}
val id_ex_ctrl = Module(new StageReg(new IDEXControl))
...
id_ex_ctrl.io.in.ex_ctrl.aluop := control.io.aluop
Specifically in id_ex_ctrl.io.in.ex_ctrl.aluop
you have to specify ex_ctrl.aluop
since you are are getting a signal out of the ex_ctrl
part of the IDEXControl
bundle.
This pipeline register/bundle isn't complete. It's missing a lot of important signals, which you'll need to add.
Again, I suggest working your way left to right through the pipeline.
For each stage, you can copy the datapath for that stage from the previous lab in src/main/scala/single-cycle/cpu.scala
.
Then, you can add the required signals to drive the datapath to the register that feeds that stage.
Throughout the given template code in src/main/scala/pipelined/cpu.scala
, we have given hints on where to find the datapath components from Lab 2.
We have also already instantiated each of the pipeline registers for you as shown below.
val if_id = Module(new StageReg(new IFIDBundle))
val id_ex = Module(new StageReg(new IDEXBundle))
val id_ex_ctrl = Module(new StageReg(new IDEXControl))
val ex_mem = Module(new StageReg(new EXMEMBundle))
val ex_mem_ctrl = Module(new StageReg(new EXMEMControl))
val mem_wb = Module(new StageReg(new MEMWBBundle))
val mem_wb_ctrl = Module(new StageReg(new MEMWBControl))
For the StageReg
, you have to specify whether the inputs are valid via the valid
signal.
When this signal is high, this tells the register to write the values on the in
lines to the register.
Similarly, there is a flush
signal that when high will set all of the register values to 0
flushing the register.
In Part III, when implementing the hazard unit, you will have to wire these signals to the hazard detection unit.
For Part I, all of the registers (including the control registers) should always be valid
and not flush
as shown below.
if_id.io.valid := true.B
if_id.io.flush := false.B
For Part I, you do not need to use the hazard detection unit or the forwarding unit.
These will be used in later parts of the assignment.
You also do not need to add the forwarding MUXes or worry about the PC containing any value except PC+4
, for the same reason.
Important: Remember to remove the *.io := DontCare
at the top of the cpu.scala
file as you flesh out the I/O for each module.
You can run the tests for this part with the following commands:
sbt:dinocpu> testOnly dinocpu.RTypeTesterLab3
sbt:dinocpu> testOnly dinocpu.ITypeTesterLab3
sbt:dinocpu> testOnly dinocpu.UTypeTesterLab3
sbt:dinocpu> testOnly dinocpu.MemoryTesterLab3
Don't forget about how to single-step through the pipelined CPU and DinoCPU - Debugging your implementation.
Hint-1: auipc1
and auipc3
actually execute two instructions (the first is a nop
) so even though this section is about single instructions, you still need to think about the value of the PC
.
Note: These instructions don't require forwarding.
Hint-2: You may need to include other modules to properly drive the pipelined CPU. We strongly encourage you to analyze the data and control path we provided to make sure you have included all the modules.
Grading will be done automatically on Gradescope. See the Submission section for more information on how to submit to Gradescope.
Name | Percentage |
---|---|
Part I | 60% |
Warning: read the submission instructions carefully. Failure to adhere to the instructions will result in a loss of points.
You will upload the file that you changed to Gradescope on the Assignment 3.1 assignment.
src/main/scala/pipelined/cpu.scala
Once uploaded, Gradescope will automatically download and run your code. This should take less than 5 minutes. For each part of the assignment, you will receive a grade. If all of your tests are passing locally, they should also pass on Gradescope unless you made changes to the I/O, which you are not allowed to do.
Note: There is no partial credit on Gradescope. Each part is all or nothing. Either the test passes or it fails.
You are to work on this project individually. You may discuss high level concepts with one another (e.g., talking about the diagram), but all work must be completed on your own.
Remember, DO NOT POST YOUR CODE PUBLICLY ON GITHUB! Any code found on GitHub that is not the base template you are given will be reported to SJA. If you want to sidestep this problem entirely, don't create a public fork and instead create a private repository to store your work. GitHub now allows everybody to create unlimited private repositories for up to three collaborators, and you shouldn't have any collaborators for your code in this class.
There are three steps to implementing forwarding.
- Add the forwarding MUXes to the execute stage, as seen below.
- Wire the forwarding unit into the processor.
- Implement the forwarding logic in the forwarding unit.
For #3, you may want to consult Section 4.7 of Patterson and Hennessy.
Specifically, Figure 4.53 will be helpful.
Think about the conditions you want to forward and what you want to forward under each condition.
when/elsewhen/otherwise
statements will be useful here.
After this, you can remove the forwarding.io := DontCare
from the top of the file.
With forwarding, you can now execute applications with multiple R-type and/or I-type instructions! The following tests should now pass.
sbt:dinocpu> testOnly dinocpu.ITypeMultiCycleTesterLab3
sbt:dinocpu> testOnly dinocpu.RTypeMultiCycleTesterLab3
Don't forget about how to single-step through the pipelined CPU and DinoCPU - Debugging your implementation.
There are five steps to implementing branches and flushing.
- Add MUXes for PC stall and PC from taken
- Add code to bubble for ID/EX and EX/MEM
- Add code to flush IF/ID
- Connect the taken signal to the hazard detection unit
- Add the logic to the hazard detection unit for when branches are taken.
Before you dive into this part, give some thought to what it means to bubble ID/EX and EX/MEM, how you will implement bubbles, and what it means to flush IF/ID. Section 4.8 of Patterson and Hennessy will be helpful for understanding this part.
With the branch part of the hazard detection unit implemented, you should now be able to execute branch and jump instructions! The following tests should now pass.
sbt:dinocpu> testOnly dinocpu.BranchTesterLab3
sbt:dinocpu> testOnly dinocpu.JumpTesterLab3
Don't forget about how to single-step through the pipelined CPU and DinoCPU - Debugging your implementation.
For the final part of the pipelined CPU, you need to detect hazards for certain combinations of instructions. There are only three remaining steps!
- Wire the rest of the hazard detection unit.
- Modify the PC MUX.
- Add code to bubble in IF/ID.
Again, section 4.8 of Patterson and Hennessy will be helpful here.
After this, you can remove the hazard.io := DontCare
line from the top of the file.
With the full hazard detection implemented, you should now be able to execute any RISC-V application! The following tests should now pass.
sbt:dinocpu> testOnly dinocpu.MemoryMultiCycleTesterLab3
sbt:dinocpu> testOnly dinocpu.ApplicationsTesterLab3
Don't forget about how to single-step through the pipelined CPU and DinoCPU - Debugging your implementation.
To make debugging easier, below are links to the full application traces from the solution to Lab 3.
To check your design, you can use the singlestep program.
If you run print inst
at the prompt in the singlestep program it will print the current PC and the instruction at that PC (in the fetch stage).
- Fibonacci, which computes the nth Fibonacci number. The initial value of
t1
contains the Fibonacci number to compute, and after computing, the value is found int0
. - Natural sum, which computes the sum of numbers from 1 to 10 and stores the result (55) in the data memory at address 0x400 (and it can be found in
t0
). - Multiplier, which multiplies two numbers initially in registers
t0
andt1
. It stores the result of multiplication in the data memory at address 0x500 (and it can be found int0
). - Divider, which divides the value in
t0
by the value int1
and the results can be found int2
and stored in data memory at address 0x450.
Grading will be done automatically on Gradescope. See the Submission section for more information on how to submit to Gradescope.
Name | Percentage |
---|---|
Part I | 50% |
Part II | 10% |
Part III | 10% |
Part IV | 20% |
Warning: read the submission instructions carefully. Failure to adhere to the instructions will result in a loss of points.
You will upload the three files that you changed to Gradescope on the Assignment 3.2 assignment.
src/main/scala/components/forwarding.scala
src/main/scala/components/hazard.scala
src/main/scala/pipelined/cpu.scala
Once uploaded, Gradescope will automatically download and run your code. This should take less than 5 minutes. For each part of the assignment, you will receive a grade. If all of your tests are passing locally, they should also pass on Gradescope unless you made changes to the I/O, which you are not allowed to do.
Note: There is no partial credit on Gradescope. Each part is all or nothing. Either the test passes or it fails.
You are to work on this project individually. You may discuss high level concepts with one another (e.g., talking about the diagram), but all work must be completed on your own.
Remember, DO NOT POST YOUR CODE PUBLICLY ON GITHUB! Any code found on GitHub that is not the base template you are given will be reported to SJA. If you want to sidestep this problem entirely, don't create a public fork and instead create a private repository to store your work. GitHub now allows everybody to create unlimited private repositories for up to three collaborators, and you shouldn't have any collaborators for your code in this class.
- You have commented out or removed any extra debug statements.
- You have uploaded three files:
cpu.scala
,hazard.scala
, andforwarding.scala
.
- Start early! Start early and ask questions on Discord and in discussion.
- If you need help, come to office hours for the TA, or post your questions on Discord.
- See common errors for some common errors and their solutions.