RISC-V

Day 1: Introduction to RISC-V and GNU compiler toolchain

Introduction to RISC-V Basic keywords

RISC-V (pronounced "risk-five") is an open-source instruction set architecture (ISA) designed for computer processors. It has gained significant attention in recent years due to its simplicity, flexibility, and potential for customization. The basic keywords below will help you understand RISC-V:

RISC-V: Open-source computer architecture with simple instructions.
ISA: Instruction Set Architecture defines processor instructions.
Registers: Temporary data storage in the processor.
Load-Store: Instructions access memory via loads and stores.
Privilege Levels: User, supervisor, and machine modes.
Formats: Instruction types like R, I, S, B, U, J.
Pipeline: Concurrent instruction execution stages.
Branch: Instructions for decision-making and jumping.
Immediate: Constants embedded in instructions.
Opcode: Binary code for specifying operations.
Assembly: Human-readable machine code representation.
Toolchain: Software tools for RISC-V development.
ABI: Interface for binary-level software interaction.
CISC vs. RISC: Complex vs. reduced (simple) instruction set architectures.

Labwork for RISC-V software toolchain

C program to compute Sum from 1 to N.

#include <stdio.h>

int main(void)
{
    int i, sum = 0, n = 100;

    /* Accumulate 1 + 2 + ... + n. */
    for (i = 1; i <= n; ++i) {
        sum += i;
    }
    printf("sum of numbers from 1 to %d is %d\n", n, sum);
    return 0;
}

RISC-V Compile and Disassemble

riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum1ton.o sum1ton.

Spike simulation and Debug for output using spike command

spike -d pk sum1ton.o

Integer number Representation

Unsigned integers cover a larger positive range, while signed integers have both positive and negative values within a more limited range due to the need to represent the sign bit.

64-Bit Number system for Unsigned Number

Range: 0 to 18,446,744,073,709,551,615 (2^64 - 1)
All 64 bits are used to represent the magnitude of the number.
Unsigned integers have no sign bit, so the leftmost (most significant) bit contributes 2^63 to the value like any other bit.

64-Bit Number system for Signed Number

Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
The leftmost (most significant) bit is the sign bit.
0 in the sign bit represents a non-negative number, and 1 represents a negative number.
Values are stored in two's complement representation: the sign bit carries the weight -2^63 and the remaining 63 bits carry their usual positive weights.
To negate a value (for example, to get the bit pattern of -5 from 5), invert all the bits and add 1.

Day 2: Introduction to ABI and Basic verification flow

Application Binary Interface

Introduction to Application Binary Interface

An Application Binary Interface (ABI) is a set of rules and conventions that define how different software components interact with each other at the binary level. It establishes a standard for communication between various parts of a software system, such as libraries, applications, and the operating system. The ABI ensures compatibility and interoperability, allowing separately compiled programs and libraries, even ones built with different compilers, to work together on the same platform.

Memory Allocation for Double Words

In RISC-V, a word is 32 bits (4 bytes) and a doubleword is 64 bits (8 bytes). Allocating memory for doublewords means reserving space in 8-byte units. Memory is byte-addressable, so consecutive doublewords sit at addresses 8 apart. This keeps data naturally aligned, which matters for efficient memory access: a naturally aligned doubleword starts at an address divisible by 8.

Load, Store and Add Instructions with Examples

Load and Store Instructions: Load and store instructions are fundamental in RISC architectures like RISC-V. They handle data movement between memory and registers.

Load: Moves data from memory to registers. Examples are LW (Load Word) and LD (Load Doubleword), which fetch 32 or 64 bits respectively.
Store: Moves data from registers to memory. SW (Store Word) and SD (Store Doubleword) are common examples.

These instructions are essential for accessing data stored in memory and interacting with it.

Add Instructions: Add instructions perform addition operations in processors.

ADD: Adds two values and stores the result in a destination register.
ADDI: Adds an immediate value to the value in a register, storing the result in a destination register.
ADDU: Not a RISC-V instruction (it comes from MIPS). RISC-V's ADD already ignores overflow and wraps around, so no separate unsigned variant is needed.

These instructions are core arithmetic operations and are used for various calculations in programs.

32 Registers and Their Respective ABI Names

RISC-V has 32 integer registers, x0 to x31. Registers are storage locations inside the processor that the CPU accesses directly, and they hold temporary data during program execution. The ABI gives each register a name and a role in function calls, for example a0 to a7 (x10 to x17) for arguments and return values.

Lab work using ABI function calls

Study New Algorithm for Sum 1 to N Using ASM and Simulate New C Program with Function Call

Day 3: Digital Logic with TL-Verilog and Makerchip.

Combinational Logic

Logic Gates

Logic gates are fundamental building blocks of digital circuits and are used to perform logical operations on binary inputs (0s and 1s). These gates are the foundation of digital computing and are used to create more complex functions and operations. There are several types of logic gates, each with its own specific behavior.

Inverter

And

Vector

Mux

Combinational Calculator

Sequential Logic

Fibonacci Series

Counter

Sequential Calculator

Pipelined Logic

Error Conditions within Computation Pipeline

Counter & Calculator

2-Cycle Calculator

Validity

2-Cycle Calculator with Validity

Calculator with Single Value Memory

Day 4: Basic RISC-V CPU micro-architecture.

RISC-V CPU

1. Program Counter (PC) - The program counter is a special register in a CPU that keeps track of the memory address of the next instruction to be fetched and executed. It is incremented as instructions are fetched, and it provides the address to the instruction memory for fetching the next instruction in the program.

2. Instruction Decoder - The instruction decoder is a circuit within the CPU that interprets the machine instructions fetched from memory. It decodes the binary representation of the instruction and generates control signals that govern the operation of other components in the CPU to execute the instruction.

3. Instruction Memory - The instruction memory is a storage component that holds the machine instructions of a program. It is typically read-only and stores the binary instructions that the CPU fetches and decodes. The program counter provides the address to the instruction memory for fetching the next instruction.

4. Data Memory - The data memory is a storage component used to store data that is manipulated by instructions during program execution. Unlike instruction memory, data memory can be both read from and written to. It holds variables, data arrays, and other information that the program uses during its execution.

5. ALU (Arithmetic Logic Unit) - The ALU is a fundamental digital circuit within the CPU that performs arithmetic and logical operations on data. It can perform tasks such as addition, subtraction, multiplication, division, bitwise operations (AND, OR, XOR), and comparisons. The ALU generates results that are used in various computations specified by the instructions.

6. Read Register File - The read register file is a component that stores a set of registers used to hold data during the execution of instructions. Instructions often involve reading data from these registers. The instruction specifies which registers to read, and the data from these registers can be used as operands for operations performed by the ALU or other components.

7. Write Register File - The write register file is responsible for storing the results of operations back into registers. After an instruction is executed, the result is often written back to the register file. This ensures that the updated data is available for subsequent instructions.

These components work together to execute machine instructions in a CPU. The program counter guides the instruction fetch process, the instruction decoder interprets instructions, the ALU performs computations, the register files hold data, and the memory components provide data storage and access. This orchestration allows a CPU to carry out the tasks required by a program's instructions.

Fetch And Decoder

Fetch

Template For Running Viz:

\m4_TLV_version 1d: tl-x.org
\SV
   // This code can be found in: https://github.com/stevehoover/RISC-V_MYTH_Workshop
   
   m4_include_lib(['https://raw.githubusercontent.com/BalaDhinesh/RISC-V_MYTH_Workshop/master/tlv_lib/risc-v_shell_lib.tlv'])

\SV
   m4_makerchip_module   // (Expanded in Nav-TLV pane.)
\TLV

   // /====================\
   // | Sum 1 to 9 Program |
   // \====================/
   //
   // Program for MYTH Workshop to test RV32I
   // Add 1,2,3,...,9 (in that order).
   //
   // Regs:
   //  r10 (a0): In: 0, Out: final sum
   //  r12 (a2): 10
   //  r13 (a3): 1..10
   //  r14 (a4): Sum
   // 
   // External to function:
   m4_asm(ADD, r10, r0, r0)             // Initialize r10 (a0) to 0.
   // Function:
   m4_asm(ADD, r14, r10, r0)            // Initialize sum register a4 with 0x0
   m4_asm(ADDI, r12, r10, 1010)         // Store count of 10 in register a2.
   m4_asm(ADD, r13, r10, r0)            // Initialize intermediate sum register a3 with 0
   // Loop:
   m4_asm(ADD, r14, r13, r14)           // Incremental addition
   m4_asm(ADDI, r13, r13, 1)            // Increment intermediate register by 1
   m4_asm(BLT, r13, r12, 1111111111000) // If a3 is less than a2, branch to label named <loop>
   m4_asm(ADD, r10, r14, r0)            // Store final result to register a0 so that it can be read by main program
   
   // Optional:
   // m4_asm(JAL, r7, 00000000000000000000) // Done. Jump to itself (infinite loop). (Up to 20-bit signed immediate plus implicit 0 bit (unlike JALR) provides byte address; last immediate bit should also be 0)
   m4_define_hier(['M4_IMEM'], M4_NUM_INSTRS)

   |cpu
      @0
         $reset = *reset;



      // YOUR CODE HERE
      // ...

      // Note: Because of the magic we are using for visualisation, if visualisation is enabled below,
      //       be sure to avoid having unassigned signals (which you might be using for random inputs)
      //       other than those specifically expected in the labs. You'll get strange errors for these.

   
   // Assert these to end simulation (before Makerchip cycle limit).
   *passed = *cyc_cnt > 40;
   *failed = 1'b0;
   
   // Macro instantiations for:
   //  o instruction memory
   //  o register file
   //  o data memory
   //  o CPU visualization
   |cpu
      //m4+imem(@1)    // Args: (read stage)
      //m4+rf(@1, @1)  // Args: (read stage, write stage) - if equal, no register bypass is required
      //m4+dmem(@4)    // Args: (read/write stage)
      //m4+myth_fpga(@0)  // Uncomment to run on fpga

   //m4+cpu_viz(@4)    // For visualisation, argument should be at least equal to the last stage of CPU logic. @4 would work for all labs.
\SV
   endmodule

L1 - Implementation Plan and Lab for PC

L2 - Lab for instruction fetch logic

Fetch block diagram

Output: correct fetch, shown in the block diagram and Viz

Decoder:

L3 - Lab for RV instruction types IRSBJU Decode Logic

L4 - Lab for instruction immediate decode logic for RV ISBUJ

L5 - Lab to decode other fields of Instruction of RV ISBUJ

L6 - Lab to decode instruction fields based on Instruction type RV ISBUJ

L7 - Lab to decode individual Instruction

RV-D4SK3 - RISCV Control Logic

L1 - Lab for Register file read -1

L2 - Lab for Register file read -2

L3 - Lab for ALU operations

L4 - Lab for Register file write

L5 - Concept of array and Register file details

L6 - Lab for implementing branch Instructions

L7 - Lab for completing branch instruction implementation

L8 - Lab to create simple testbench

Day 5: Complete Pipelined RISC-V CPU Micro-Architecture

Pipelining the CPU

In this section, we look at pipelining and its benefits, and pipeline the RISC-V CPU design. We also go over the hazards that can arise and how to work around them.

First of all, it is important to understand pipelining. It splits the logic into stages so that several instructions are in flight at once, which enables faster computation. In TL-Verilog it also streamlines retiming and considerably reduces the occurrence of functional errors. The benefits of pipelining are as follows:

Increased throughput
Shorter delay per stage (the latency of a single instruction is not reduced)
Better resource utilization
Improved parallelism
Smoother performance
Scalability
Faster clock speeds
Reduced dependencies
Flexibility
Efficient resource sharing

As previously explained, establishing the pipeline is a straightforward process of adding stages labelled @1, @2, and so on. A visual representation of the pipelining setup is provided below. In TL-Verilog, pipeline stages need not be defined in any particular order, which is an added convenience.

The hazards that can arise when pipelining a design are:

Control flow hazard
Read after Write hazard

First we look at how to pipeline the design, then we tackle the resulting hazards.

Creating 3-Cycle Valid Signal

We make a start pulse in the first cycle after reset is released.
Then we make a 3-cycle loop of valid pulses.
Schematic diagram for the design

Code for Makerchip IDE implementation.

	$valid = $reset ? 1'b0 : ($start) ? 1'b1 : (>>3$valid) ;
	$start_int = $reset ? 1'b0 : 1'b1;
	$start = $reset ? 1'b0 : ($start_int && !>>1$start_int);

Invalid Cycles Adjustments

Once we have the 3-cycle valid signal, two out of every three cycles are invalid.
We have to make sure invalid instructions do not write to the register file or the PC.
Schematic to be implemented
TL-Verilog code for implementation on Makerchip IDE.

// introducing valid_taken_br
$valid_taken_br = $valid && $taken_branch;

// updating the PC
$pc[31:0] = >>1$reset ? 32'b0 : (>>1$valid_taken_br)? (>>1$br_target_pc) : (>>1$pc + 32'd4);

Logic Distribution into 3-Cycles

In this step we look at how to update the design to spread the logic over 3 cycles.
Schematic for distribution
Implementation of the 3-cycle pipeline on Makerchip IDE.

Solutions to Pipelining Hazards

We now look at how to get past the pipeline hazards.

One such hazard is the read-after-write hazard.
Code introduced into the CPU to tackle it:

	$src1_value[31:0] = ((>>1$rf_wr_en) && (>>1$rd == $rs1 )) ? (>>1$result): $rf_rd_data1; 
	$src2_value[31:0] = ((>>1$rf_wr_en) && (>>1$rd == $rs2 )) ? (>>1$result) : $rf_rd_data2;

Now we look at how to rectify the branch paths in the CPU core.
Schematic of the rectified branch path
Code introduced

 	$pc[31:0] = (>>1$reset) ? 32'b0 : (>>3$valid_taken_br) ? (>>3$br_tgt_pc) :  (>>3$int_pc)  ;


	// we will comment out the valid line
	//$valid = $reset ? 1'b0 : ($start) ? 1'b1 : (>>3$valid) ;

Now we decode the remaining RV32I base instruction set. Refer to this page for a detailed description --> LINK
Once the decoding is complete, we finish the ALU logic for the decoded instructions.
Complete implementation on Makerchip IDE.

Load/Store Instructions and Completing the CPU

In this section, we look at how to add load and store instructions to the CPU and the test program, followed by instantiation of the data memory unit. Towards the end we look at how to generate the control logic for jump instructions.

Schematic for how to redirect the load.
Next, we look at the schematic flow for loading data and implement it on Makerchip.
Then we create the data memory.
The block diagram for the memory structure, showing the inputs and outputs of the memory block, is as follows.
After the memory is instantiated, we load and store using different registers for hands-on practice.
The final step is integrating the control logic for jump instructions.
The schematic diagram showing the implementation of the jump logic

Final Implementation on Makerchip IDE

Diagram Generated along with the waveform and visualisation

Acknowledgements

Kunal Ghosh, Co-founder, VSD Corp. Pvt. Ltd.
Steve Hoover, Founder, Redwood EDA
Shant Rakshit
Alwin Shaju
