Introduction to RISC-V Basic keywords
RISC-V (pronounced "risk-five") is an open-source instruction set architecture (ISA) for computer processors. It has gained significant attention in recent years due to its simplicity, flexibility, and potential for customization. The basic keywords below will help you understand RISC-V:
RISC-V: Open-source computer architecture with simple instructions.
ISA: Instruction Set Architecture defines processor instructions.
Registers: Temporary data storage in the processor.
Load-Store: Instructions access memory via loads and stores.
Privilege Levels: User, supervisor, and machine modes.
Formats: Instruction types like R, I, S, B, U, J.
Pipeline: Concurrent instruction execution stages.
Branch: Instructions for decision-making and jumping.
Immediate: Constants embedded in instructions.
Opcode: Binary code for specifying operations.
Assembly: Human-readable machine code representation.
Toolchain: Software tools for RISC-V development.
ABI: Interface for binary-level software interaction.
CISC vs. RISC: Complex vs. reduced (simple) instruction set architectures.
Labwork for RISC-V software toolchain
C program to compute the sum from 1 to N.
#include <stdio.h>

int main(void)
{
    int i, sum = 0, n = 100;

    /* Accumulate 1 + 2 + ... + n. */
    for (i = 1; i <= n; ++i) {
        sum += i;
    }
    printf("sum of numbers from 1 to %d is %d\n", n, sum);
    return 0;
}
RISC-V Compile and Disassemble
riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.
Simulate and debug with Spike (the -d flag starts the interactive debugger)
spike -d pk sum1ton.o
Integer number Representation
For a given width, unsigned integers cover a larger positive range, while signed integers split the same number of bit patterns between negative and non-negative values because one bit is needed for the sign.
64-Bit Number system for Unsigned Number
- Range: 0 to 18,446,744,073,709,551,615 (2^64 - 1)
- All 64 bits are used to represent the magnitude of the number.
- There is no sign bit: the leftmost (most significant) bit is an ordinary value bit with weight 2^63.
- Every bit pattern therefore represents a non-negative value.
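The unsigned range can be checked with a minimal C sketch, assuming a C99 compiler with <stdint.h>. The helper names `u64_max` and `u64_add` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Largest unsigned 64-bit value: all 64 bits set, i.e. 2^64 - 1. */
uint64_t u64_max(void) {
    return UINT64_MAX;
}

/* Unsigned arithmetic in C is modulo 2^64, so adding 1 to the
   maximum value wraps around to 0. */
uint64_t u64_add(uint64_t a, uint64_t b) {
    return a + b;
}
```

Calling `u64_add(u64_max(), 1)` gives 0, which shows that no bit pattern is left over for negative values.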
64-Bit Number system for Signed Number
- Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (-2^63 to 2^63 - 1)
- The leftmost (most significant) bit is the sign bit.
- 0 in the sign bit represents a non-negative number, and 1 represents a negative number.
- Values are stored in two's complement: the sign bit carries a weight of -2^63 and the remaining 63 bits carry their usual positive weights.
- To negate a value (for example, to obtain the bit pattern of -5 from that of 5), invert all the bits and add 1.
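The invert-and-add-1 rule can be sketched in C as follows. This is an illustration only; the function names are hypothetical, and the arithmetic is done on the unsigned bit pattern so that it stays well defined in C for every input.

```c
#include <stdint.h>

/* Negate a 64-bit two's complement bit pattern:
   invert every bit, then add 1 (wrapping modulo 2^64). */
uint64_t twos_complement_negate(uint64_t bits) {
    return ~bits + 1u;
}

/* The sign of a signed 64-bit value is held in bit 63. */
int sign_bit(int64_t x) {
    return (int)(((uint64_t)x >> 63) & 1u);
}
```

For example, negating the pattern of 5 gives 0xFFFFFFFFFFFFFFFB, which is the 64-bit representation of -5. The most negative value, 0x8000000000000000, negates to itself because +2^63 is outside the signed range.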
Application Binary Interface
Introduction to Application Binary Interface
An Application Binary Interface (ABI) is a set of rules and conventions that define how software components interact with each other at the binary level. It covers details such as which registers carry function arguments and return values, how data is laid out in memory, and how system calls are made. Because libraries, applications, and the operating system all follow the same ABI, separately compiled pieces of code can be linked and run together.
Memory Allocation for Doublewords
In RISC-V, a word is 32 bits (4 bytes) and a doubleword is 64 bits (8 bytes), so allocating memory for doublewords means reserving space in units of 8 bytes. Memory is byte-addressed, and a doubleword is naturally aligned when its address is divisible by 8. Keeping data naturally aligned is important for efficient memory access and performance.
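A small C sketch of the alignment rule, assuming the RISC-V definition in which a doubleword is 64 bits (8 bytes). The constant and function names are hypothetical.

```c
#include <stdint.h>

#define DOUBLEWORD_BYTES 8u  /* assumption: RISC-V doubleword = 64 bits */

/* An address is doubleword-aligned when it is divisible by 8. */
int is_doubleword_aligned(uint64_t addr) {
    return (addr % DOUBLEWORD_BYTES) == 0;
}

/* Round an address up to the next doubleword boundary
   (addresses that are already aligned are returned unchanged). */
uint64_t align_up_doubleword(uint64_t addr) {
    return (addr + DOUBLEWORD_BYTES - 1) & ~(uint64_t)(DOUBLEWORD_BYTES - 1);
}
```

With these helpers, address 12 is reported as misaligned and `align_up_doubleword(13)` returns 16.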
Load Store and Add Instructions with Examples
Load and Store Instructions:
Load and store instructions are fundamental in RISC architectures like RISC-V. They handle data movement between memory and registers.
- Load: Moves data from memory to registers. Examples are LW (Load Word) and LD (Load Doubleword), which fetch 32 and 64 bits respectively.
- Store: Moves data from registers to memory. SW (Store Word) and SD (Store Doubleword) are common examples.
These instructions are essential for accessing data stored in memory and interacting with it.
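The behaviour of these four instructions can be modelled in C. This is a rough sketch under stated assumptions, not a model of any particular core: the 64-byte `mem` array is hypothetical, memory is treated as little-endian, no bounds checking is done, and LW sign-extends its 32-bit result to 64 bits as it does on RV64.

```c
#include <stdint.h>

/* Hypothetical 64-byte data memory, byte-addressed and little-endian.
   No bounds checking: callers must keep addr + size <= 64. */
static uint8_t mem[64];

/* SD: store the full 64-bit register value at addr. */
void store_doubleword(uint64_t addr, uint64_t value) {
    for (int i = 0; i < 8; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* SW: store only the low 32 bits of the register value at addr. */
void store_word(uint64_t addr, uint64_t value) {
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* LD: load 64 bits from addr. */
uint64_t load_doubleword(uint64_t addr) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)mem[addr + i] << (8 * i);
    return v;
}

/* LW: load 32 bits from addr and sign-extend them to 64 bits. */
uint64_t load_word(uint64_t addr) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)mem[addr + i] << (8 * i);
    return (uint64_t)(int64_t)(int32_t)v;
}
```

After `store_doubleword(0, 0x1122334455667788)`, `load_word(0)` returns 0x55667788, the low half of the stored value, because the least significant byte sits at the lowest address.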
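Add Instructions:
Add instructions perform addition operations in processors.
- ADD: Adds the values of two registers and stores the result in a destination register. Overflow is ignored; the result wraps around.
- ADDI: Adds a sign-extended 12-bit immediate to the value in a register, storing the result in a destination register.
- RISC-V has no ADDU instruction (ADDU belongs to MIPS). Because ADD already ignores overflow, no separate unsigned variant is needed; RV64I instead adds ADDW and ADDIW for 32-bit additions.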
These instructions are core arithmetic operations and are used for various calculations in programs.
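As an illustration, the 64-bit semantics of ADD and ADDI can be sketched in C. The function names are hypothetical; the sketch assumes RV64, where registers are 64 bits wide and the ADDI immediate is a 12-bit field that is sign-extended before the addition.

```c
#include <stdint.h>

/* Sign-extend the 12-bit immediate field used by ADDI. */
int64_t sign_extend_12(uint32_t imm12) {
    imm12 &= 0xFFFu;
    return (imm12 & 0x800u) ? (int64_t)imm12 - 4096 : (int64_t)imm12;
}

/* ADD: rd = rs1 + rs2, wrapping modulo 2^64 with no overflow trap. */
uint64_t rv_add(uint64_t rs1, uint64_t rs2) {
    return rs1 + rs2;
}

/* ADDI: rd = rs1 + sign_extend(imm12). */
uint64_t rv_addi(uint64_t rs1, uint32_t imm12) {
    return rs1 + (uint64_t)sign_extend_12(imm12);
}
```

An immediate field of 0xFFF represents -1, so `rv_addi(10, 0xFFF)` returns 9.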
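32 registers and their respective ABI names
RISC-V provides 32 integer registers, x0 to x31, which are storage locations inside the processor that instructions access directly. They hold temporary data during program execution, and their width matches the architecture (32 bits on RV32, 64 bits on RV64). The ABI gives each register a name that reflects its role in function calls, for example a0 to a7 for function arguments (a0 and a1 also carry return values).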
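For reference, the mapping from register number to ABI name in the standard RISC-V calling convention can be written as a lookup table. The `abi_name` helper is a hypothetical name used only for this sketch.

```c
#include <stddef.h>

/* ABI names of the 32 integer registers x0..x31
   in the standard RISC-V calling convention. */
static const char *const abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Return the ABI name of register x<reg>, or NULL if reg is out of range. */
const char *abi_name(unsigned reg) {
    return reg < 32 ? abi_names[reg] : NULL;
}
```

For example, x10 is a0 (first argument and return value), x1 is ra (return address), and x2 is sp (stack pointer).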
Lab work using ABI function calls
Study a new algorithm for the sum 1 to N using ASM, and simulate the new C program with a function call
Combinational Logic
Logic Gates
Logic gates are the fundamental building blocks of digital circuits. Each gate performs a logical operation on binary inputs (0s and 1s), and gates are combined to build more complex functions and operations. There are several types of logic gates, each with its own specific behavior.
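The behaviour of the common gates can be sketched with C's bitwise operators. This is a software illustration only, with hypothetical function names; inputs are assumed to be 0 or 1.

```c
/* Each gate takes inputs that are 0 or 1 and returns 0 or 1. */
int gate_and(int a, int b)  { return a & b; }
int gate_or(int a, int b)   { return a | b; }
int gate_xor(int a, int b)  { return a ^ b; }
int gate_not(int a)         { return !a; }
int gate_nand(int a, int b) { return !(a & b); }
int gate_nor(int a, int b)  { return !(a | b); }

/* A half adder built from two gates: sum = a XOR b, carry = a AND b. */
void half_adder(int a, int b, int *sum, int *carry) {
    *sum = gate_xor(a, b);
    *carry = gate_and(a, b);
}
```

The half adder shows how gates compose: adding 1 and 1 produces a sum bit of 0 and a carry bit of 1.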
Combinational Calculator
Pipelined Logic
Error Conditions within Computation Pipeline
RISC-V CPU
1. Program Counter (PC) - The program counter is a special register in a CPU that keeps track of the memory address of the next instruction to be fetched and executed. It is incremented as instructions are fetched, and it provides the address to the instruction memory for fetching the next instruction in the program.
2. Instruction Decoder - The instruction decoder is a circuit within the CPU that interprets the machine instructions fetched from memory. It decodes the binary representation of the instruction and generates control signals that govern the operation of other components in the CPU to execute the instruction.
3. Instruction Memory - The instruction memory is a storage component that holds the machine instructions of a program. It is typically read-only and stores the binary instructions that the CPU fetches and decodes. The program counter provides the address to the instruction memory for fetching the next instruction.
4. Data Memory - The data memory is a storage component used to store data that is manipulated by instructions during program execution. Unlike instruction memory, data memory can be both read from and written to. It holds variables, data arrays, and other information that the program uses during its execution.
5. ALU (Arithmetic Logic Unit) - The ALU is a fundamental digital circuit within the CPU that performs arithmetic and logical operations on data. It can perform tasks such as addition, subtraction, multiplication, division, bitwise operations (AND, OR, XOR), and comparisons. The ALU generates results that are used in various computations specified by the instructions.
6. Read Register File - The read register file is a component that stores a set of registers used to hold data during the execution of instructions. Instructions often involve reading data from these registers. The instruction specifies which registers to read, and the data from these registers can be used as operands for operations performed by the ALU or other components.
7. Write Register File - The write register file is responsible for storing the results of operations back into registers. After an instruction is executed, the result is often written back to the register file. This ensures that the updated data is available for subsequent instructions.
These components work together to execute machine instructions in a CPU. The program counter guides the instruction fetch process, the instruction decoder interprets instructions, the ALU performs computations, the register files hold data, and the memory components provide data storage and access. This orchestration allows a CPU to carry out the tasks required by a program's instructions.
Fetch and Decode
Template For Running Viz:
\m4_TLV_version 1d: tl-x.org
\SV
   // This code can be found in: https://github.com/stevehoover/RISC-V_MYTH_Workshop
   m4_include_lib(['https://raw.githubusercontent.com/BalaDhinesh/RISC-V_MYTH_Workshop/master/tlv_lib/risc-v_shell_lib.tlv'])
\SV
   m4_makerchip_module // (Expanded in Nav-TLV pane.)
\TLV
   // /====================\
   // | Sum 1 to 9 Program |
   // \====================/
   //
   // Program for MYTH Workshop to test RV32I
   // Add 1,2,3,...,9 (in that order).
   //
   // Regs:
   // r10 (a0): In: 0, Out: final sum
   // r12 (a2): 10
   // r13 (a3): 1..10
   // r14 (a4): Sum
   //
   // External to function:
   m4_asm(ADD, r10, r0, r0) // Initialize r10 (a0) to 0.
   // Function:
   m4_asm(ADD, r14, r10, r0) // Initialize sum register a4 with 0x0
   m4_asm(ADDI, r12, r10, 1010) // Store count of 10 in register a2.
   m4_asm(ADD, r13, r10, r0) // Initialize intermediate sum register a3 with 0
   // Loop:
   m4_asm(ADD, r14, r13, r14) // Incremental addition
   m4_asm(ADDI, r13, r13, 1) // Increment intermediate register by 1
   m4_asm(BLT, r13, r12, 1111111111000) // If a3 is less than a2, branch to label named <loop>
   m4_asm(ADD, r10, r14, r0) // Store final result to register a0 so that it can be read by main program
   // Optional:
   // m4_asm(JAL, r7, 00000000000000000000) // Done. Jump to itself (infinite loop). (Up to 20-bit signed immediate plus implicit 0 bit (unlike JALR) provides byte address; last immediate bit should also be 0)
   m4_define_hier(['M4_IMEM'], M4_NUM_INSTRS)
   |cpu
      @0
         $reset = *reset;
      // YOUR CODE HERE
      // ...
      // Note: Because of the magic we are using for visualisation, if visualisation is enabled below,
      // be sure to avoid having unassigned signals (which you might be using for random inputs)
      // other than those specifically expected in the labs. You'll get strange errors for these.
   // Assert these to end simulation (before Makerchip cycle limit).
   *passed = *cyc_cnt > 40;
   *failed = 1'b0;
   // Macro instantiations for:
   // o instruction memory
   // o register file
   // o data memory
   // o CPU visualization
   |cpu
      //m4+imem(@1) // Args: (read stage)
      //m4+rf(@1, @1) // Args: (read stage, write stage) - if equal, no register bypass is required
      //m4+dmem(@4) // Args: (read/write stage)
      //m4+myth_fpga(@0) // Uncomment to run on fpga
      //m4+cpu_viz(@4) // For visualisation, argument should be at least equal to the last stage of CPU logic. @4 would work for all labs.
\SV
   endmodule
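To see what result the test program in the template should leave in a0, its register behaviour can be mirrored in plain C. This is a software model of the seven assembled instructions only, not of the hardware; the function name `run_sum_program` is hypothetical, and the register numbers follow the comments in the template (a0 = x10, a2 = x12, a3 = x13, a4 = x14).

```c
#include <stdint.h>

/* C model of the test program's register behaviour.
   x[0] is the hard-wired zero register and is never written. */
uint32_t run_sum_program(void) {
    uint32_t x[32] = {0};
    x[10] = x[0] + x[0];          /* ADD  r10, r0, r0               */
    x[14] = x[10] + x[0];         /* ADD  r14, r10, r0              */
    x[12] = x[10] + 10;           /* ADDI r12, r10, 1010 (decimal 10) */
    x[13] = x[10] + x[0];         /* ADD  r13, r10, r0              */
    do {
        x[14] = x[13] + x[14];    /* ADD  r14, r13, r14             */
        x[13] = x[13] + 1;        /* ADDI r13, r13, 1               */
    } while ((int32_t)x[13] < (int32_t)x[12]);  /* BLT r13, r12, loop */
    x[10] = x[14] + x[0];         /* ADD  r10, r14, r0              */
    return x[10];
}
```

The loop adds 0, 1, ..., 9, so the model returns 45. The same value should appear in r10 in the visualization once the CPU is complete.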
L1 - Implementation Plan and Lab for PC
L2 - Lab for instruction fetch logic
Fetch block diagram and output of the fetch logic.
L3 - Lab for RV instruction types IRSBJU Decode Logic
L4 - Lab for instruction immediate decode logic for RV ISBUJ
L5 - Lab to decode other fields of Instruction of RV ISBUJ
L6 - Lab to decode instruction fields based on Instruction type RV ISBUJ
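The decode labs above extract fixed bit fields and immediates from a 32-bit instruction word. A minimal C sketch of that extraction for the I and B formats follows; the function names are hypothetical, and the bit positions are those of the RV32I base instruction formats.

```c
#include <stdint.h>

/* Fields that sit at the same bit positions in every format that has them. */
uint32_t field_opcode(uint32_t instr) { return instr & 0x7Fu; }
uint32_t field_rd(uint32_t instr)     { return (instr >> 7) & 0x1Fu; }
uint32_t field_funct3(uint32_t instr) { return (instr >> 12) & 0x7u; }
uint32_t field_rs1(uint32_t instr)    { return (instr >> 15) & 0x1Fu; }
uint32_t field_rs2(uint32_t instr)    { return (instr >> 20) & 0x1Fu; }

/* I-type immediate: instr[31:20], sign-extended from 12 bits. */
int32_t imm_i(uint32_t instr) {
    int32_t v = (int32_t)(instr >> 20);
    return (v & 0x800) ? v - 4096 : v;
}

/* B-type immediate: {instr[31], instr[7], instr[30:25], instr[11:8], 0},
   sign-extended from 13 bits. */
int32_t imm_b(uint32_t instr) {
    uint32_t v = (((instr >> 31) & 0x1u) << 12) |
                 (((instr >> 7) & 0x1u) << 11) |
                 (((instr >> 25) & 0x3Fu) << 5) |
                 (((instr >> 8) & 0xFu) << 1);
    return (v & 0x1000u) ? (int32_t)v - 8192 : (int32_t)v;
}
```

As a usage example, the word 0x00A50613 (the encoding of ADDI x12, x10, 10) decodes to opcode 0x13, rd 12, rs1 10 and an I-immediate of 10. The word 0xFEC6CCE3 (BLT x13, x12, -8) decodes to opcode 0x63, funct3 4, rs1 13, rs2 12 and a B-immediate of -8.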
RV-D4SK3 - RISCV Control Logic
L1 - Lab for Register file read -1
L4 - Lab for Register file write
L5 - Concept of array and Register file details
L6 - Lab for implementing branch Instructions
L7 - Lab for completing branch instruction implementation
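The branch labs come down to two computations: whether the branch condition holds, and where the branch goes. A rough C sketch of both, assuming the standard RV32I funct3 encodings for the six conditional branches (the function names are hypothetical):

```c
#include <stdint.h>

/* funct3 values of the six RV32I conditional branches. */
enum { F3_BEQ = 0, F3_BNE = 1, F3_BLT = 4, F3_BGE = 5, F3_BLTU = 6, F3_BGEU = 7 };

/* Return 1 when the branch selected by funct3 is taken
   for the two source register values, otherwise 0. */
int branch_taken(unsigned funct3, uint32_t src1, uint32_t src2) {
    switch (funct3) {
    case F3_BEQ:  return src1 == src2;
    case F3_BNE:  return src1 != src2;
    case F3_BLT:  return (int32_t)src1 <  (int32_t)src2;  /* signed   */
    case F3_BGE:  return (int32_t)src1 >= (int32_t)src2;  /* signed   */
    case F3_BLTU: return src1 <  src2;                    /* unsigned */
    case F3_BGEU: return src1 >= src2;                    /* unsigned */
    default:      return 0;  /* not a branch encoding */
    }
}

/* Branch target: PC of the branch plus the sign-extended B-type immediate. */
uint32_t branch_target(uint32_t pc, int32_t imm) {
    return pc + (uint32_t)imm;
}
```

The signed and unsigned variants differ only in how the same bits are interpreted: with src1 = 0xFFFFFFFF and src2 = 1, BLT is taken (-1 < 1) while BLTU is not.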
Pipelining the CPU
In this section, we look at pipelining and its benefits, and pipeline the RISC-V CPU design. We also go over the hazards that pipelining introduces and how to avoid them.
Pipelining splits the logic into stages separated by registers, so that several instructions can be in flight at once and the clock can run faster. In TL-Verilog, pipeline stages are expressed in a timing-abstract way, which streamlines retiming and considerably reduces the occurrence of functional errors. The benefits of pipelining include:
- Increased throughput
- Lower average time per instruction (the latency of a single instruction is not reduced)
- Better resource utilization
- Improved parallelism
- Smoother performance
- Scalability
- Faster clock speeds
- Reduced dependencies
- Flexibility
- Efficient resource sharing
As previously explained, establishing the pipeline is a straightforward process of assigning logic to stages labelled @1, @2, and so on. A visual representation of the pipelining setup is provided below. In TL-Verilog, the pipeline stages do not have to be defined in any particular order in the source, which is an added convenience.
The hazards that can arise when pipelining a design are:
- Control flow hazard
- Read after Write hazard
We first look at how to pipeline the design and then tackle the resulting hazards.
Creating 3-Cycle Valid Signal
- We generate a start pulse in the first cycle after reset is released.
- Then we make a 3-cycle loop of valid pulses, so that valid is asserted once every three cycles.
- Schematic diagram for the design.
- Code for Makerchip IDE implementation:
$valid = $reset ? 1'b0 : ($start) ? 1'b1 : (>>3$valid) ;
$start_int = $reset ? 1'b0 : 1'b1;
$start = $reset ? 1'b0 : ($start_int && !>>1$start_int);
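The effect of these three assignments can be checked with a cycle-by-cycle C model. This is a sketch under one simplifying assumption that is not part of the original code: signal values from before cycle 0 are read as 0 (in simulation they would be unknown until reset has been applied). The names `CYCLES` and `simulate_valid` are hypothetical.

```c
#define CYCLES 12

/* Fill valid[] with the 3-cycle valid pattern.
   reset[t] is the reset input in cycle t.
   Assumption: signals before cycle 0 read as 0. */
void simulate_valid(const int reset[CYCLES], int valid[CYCLES]) {
    int start_int[CYCLES];
    for (int t = 0; t < CYCLES; t++) {
        /* $start_int = $reset ? 1'b0 : 1'b1; */
        start_int[t] = reset[t] ? 0 : 1;
        /* $start = $reset ? 1'b0 : ($start_int && !>>1$start_int); */
        int prev_start_int = (t >= 1) ? start_int[t - 1] : 0;
        int start = reset[t] ? 0 : (start_int[t] && !prev_start_int);
        /* $valid = $reset ? 1'b0 : ($start) ? 1'b1 : (>>3$valid); */
        int valid_3_ago = (t >= 3) ? valid[t - 3] : 0;
        valid[t] = reset[t] ? 0 : (start ? 1 : valid_3_ago);
    }
}
```

With reset held high for cycles 0 and 1 and low afterwards, valid is 1 in cycles 2, 5, 8 and 11 and 0 in all other cycles, which is the one-in-three pattern the pipeline needs.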
Invalid Cycles Adjustments
- With the 3-cycle valid signal in place, two out of every three cycles are invalid.
- We have to make sure that invalid instructions do not write to the register file or the PC.
- TL-Verilog code for implementation on Makerchip IDE:
// introducing valid_taken_br
$valid_taken_br = $valid && $taken_branch;
// updating the PC
$pc[31:0] = >>1$reset ? 32'b0 : (>>1$valid_taken_br)? (>>1$br_target_pc) : (>>1$pc + 32'd4);
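The PC selection above is a three-way priority choice, which can be sketched in C. The struct and function names are hypothetical; the struct gathers the values that the TL-Verilog reads from the previous cycle with >>1.

```c
#include <stdint.h>

/* Values from the previous cycle that feed the PC selection. */
struct prev_cycle {
    int      reset;
    int      valid;
    int      taken_branch;
    uint32_t pc;
    uint32_t br_target_pc;
};

/* Next PC: 0 after reset, the branch target after a valid taken branch,
   otherwise the previous PC plus 4 (one 32-bit instruction). */
uint32_t next_pc(const struct prev_cycle *p) {
    int valid_taken_br = p->valid && p->taken_branch;
    if (p->reset)       return 0;
    if (valid_taken_br) return p->br_target_pc;
    return p->pc + 4;
}
```

Gating the branch with valid is what stops an invalid cycle from redirecting the PC: if taken_branch is 1 but valid is 0, the PC simply advances by 4.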
Logic Distribution into 3-Cycles
- In this step, we look at how to update the design so that the logic is distributed across 3 cycles.
- Schematic for the distribution.
- Implementation of the 3-cycle pipeline in Makerchip IDE.
Solutions to Pipelining Hazards
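We now look at how to get past the pipeline hazards.
- One such hazard is the read-after-write hazard: an instruction reads a register that the previous instruction has not yet written back.
- Code introduced into the CPU to handle it (a register bypass):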
$src1_value[31:0] = ((>>1$rf_wr_en) && (>>1$rd == $rs1 )) ? (>>1$result): $rf_rd_data1;
$src2_value[31:0] = ((>>1$rf_wr_en) && (>>1$rd == $rs2 )) ? (>>1$result) : $rf_rd_data2;
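The bypass in the two assignments above is the same selection applied to each source operand. A minimal C sketch of that selection, with a hypothetical function name and the previous-cycle values passed in as arguments:

```c
#include <stdint.h>

/* Pick the operand value: if the previous instruction writes the register
   that this instruction reads, forward its result; otherwise use the
   value read from the register file. */
uint32_t bypass(int prev_rf_wr_en, uint32_t prev_rd, uint32_t prev_result,
                uint32_t rs, uint32_t rf_rd_data) {
    return (prev_rf_wr_en && prev_rd == rs) ? prev_result : rf_rd_data;
}
```

For example, if the previous instruction writes 99 to register 14 and the current one reads register 14, `bypass(1, 14, 99, 14, 7)` returns the forwarded 99 instead of the stale register file value 7.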
- Next, we look at how to correct the branch paths in the CPU core.
- Code introduced:
$pc[31:0] = (>>1$reset) ? 32'b0 : (>>3$valid_taken_br) ? (>>3$br_tgt_pc) : (>>3$int_pc) ;
// the 3-cycle valid line is commented out
//$valid = $reset ? 1'b0 : ($start) ? 1'b1 : (>>3$valid) ;
- Now we decode the remaining RV32I base instruction set. For a detailed description, refer to this page --> LINK
- Once the decoding is complete, we finish the ALU logic for the decoded instructions.
- Complete implementation in Makerchip IDE.
Load/Store Instructions and Completing the CPU
In this section, we look at how to add load and store instructions to the register file logic and the test program, followed by instantiation of the data memory unit. Towards the end, we look at how to generate the control logic for the jump instructions.
- First, we look at the schematic flow for loading data and implement it on Makerchip.
- Next, we create the data memory.
- The block diagram of the memory structure shows the inputs and outputs of the memory block.
- After the memory is instantiated, we load and store using different registers as hands-on practice.
- The final step is integrating the control logic for the jump instructions.
- The schematic diagram shows the implementation of the jump logic.
- Kunal Ghosh, Co-founder, VSD Corp. Pvt. Ltd.
- Steve Hoover, Founder, Redwood EDA
- Shant Rakshit
- Alwin Shaju
- https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
- https://github.com/kunalg123
- https://github.com/stevehoover/RISC-V_MYTH_Workshop
- https://www.vsdiat.com/
- https://redwoodeda.com/
- https://makerchip.com/
- https://riscv.org/
- https://inst.eecs.berkeley.edu/
- https://github.com/riscv/riscv-gnu-toolchain