A repository containing detailed documentation of my progress in the VSD-HDP program.

Program link: VSD-HDP

DAY 0

Tool-1 Yosys

Installation guide

https://github.com/YosysHQ/yosys

Prerequisite dependencies

$ sudo apt-get install build-essential clang bison flex \
  libreadline-dev gawk tcl-dev libffi-dev git \
  graphviz xdot pkg-config python3 libboost-system-dev \
  libboost-python-dev libboost-filesystem-dev zlib1g-dev

Installation Flow

$ mkdir yosys-master
$ cd yosys-master
$ git clone https://github.com/YosysHQ/yosys.git
$ sudo apt install make   (install make if you haven't done so yet)
$ sudo apt-get install build-essential clang bison flex \
    libreadline-dev gawk tcl-dev libffi-dev git \
    graphviz xdot pkg-config python3 libboost-system-dev \
    libboost-python-dev libboost-filesystem-dev zlib1g-dev
$ cd yosys
$ make
$ sudo make install

If it doesn't work (a version mismatch might occur when combining it with other open-source software):
$ sudo apt install yosys
$ sudo apt upgrade

Note: you can choose to run make in a separate build folder (to build Yosys). This should be done after installing the dependencies listed in the installation flow.

$ mkdir build; cd build
$ make -f ../Makefile

The -f option tells make where to find the Makefile (in the parent directory).

Progress image

yosys

Tool-2 OpenSTA

Installation guide

https://github.com/The-OpenROAD-Project/OpenSTA

Prerequisite dependencies

$ sudo apt install swig

Installation flow

$ git clone https://github.com/The-OpenROAD-Project/OpenSTA.git
$ cd OpenSTA
$ mkdir build
$ cd build
$ cmake ..
$ make

If it doesn't work (a version mismatch might occur when combining it with other open-source software):
$ sudo apt install opensta
$ sudo apt upgrade

Progress image

image

Tool-3 ngspice

Installation guide

Download the tarball ngspice-37.tar.gz from the old-releases parent folder at

https://sourceforge.net/projects/ngspice/files/

Installation flow

$ tar -zxvf ngspice-37.tar.gz
$ cd ngspice-37
$ mkdir release
$ cd release
$ ../configure  --with-x --with-readline=yes --disable-debug
$ make
$ sudo make install


If it doesn't work (a version mismatch might occur when combining it with other open-source software):
$ sudo apt install ngspice
$ sudo apt upgrade

Progress image

ngspice

Note: gtkwave and iverilog were also installed

iverilog

$ sudo apt-get install iverilog

gtkwave

$ sudo apt update
$ sudo apt install gtkwave

Magic

$ sudo apt-get install m4
$ sudo apt-get install tcsh
$ sudo apt-get install csh
$ sudo apt-get install libx11-dev
$ sudo apt-get install tcl-dev tk-dev
$ sudo apt-get install libcairo2-dev
$ sudo apt-get install mesa-common-dev libglu1-mesa-dev
$ sudo apt-get install libncurses-dev

DAY 1

Learning how to use tools like iverilog, gtkwave, Yosys, etc.

7x1 MUX using iverilog and gtkwave

Flow for simulation:

iverilog and gtkwave commands:
- iverilog <filetop.v> <file1.v> … <tb_filetop.v>
- ./a.out
Running a.out writes the VCD file named in the testbench's $dumpfile call; open that file with:
- gtkwave dumpfile.vcd

RTL

`timescale 1ns / 1ps

module own_MUX_7x1(

    input [6:0] i,

    input [2:0] s,

    output reg y

    );

	always@(i,s)
	begin
		case(s)
				3'b000: y = i[0];
				3'b001: y = i[1];
				3'b010: y = i[2];
				3'b011: y = i[3];
				3'b100: y = i[4];
				3'b101: y = i[5];
				3'b110: y = i[6];
				3'b111: y = i[6];   // s = 7 has no eighth input, so it also selects i[6]
				default: y = i[0];
		endcase
	end
endmodule

Testbench

`timescale 1ns / 1ps

module tb_own_MUX_7x1();

  reg [6:0]i;

  reg [2:0]s;

  wire y;

  own_MUX_7x1 uut(i,s,y);


initial
    begin
    $dumpfile("own_MUX_7x1.vcd");
    $dumpvars(0,tb_own_MUX_7x1);
    i=0;
    s=0;
    
    #300 $finish;
    end

always #10 i=i+1;
always #25 s=s+1;

endmodule

Simulation Waveform

mux7x1_sim

Synthesis

Flow for synthesis in Yosys:

yosys
read_liberty -lib <relative or abs path to .lib file>
read_verilog <verilog_file.v>
synth -top <top_module_name>
abc -liberty <relative or abs path to .lib file>    # maps the logic to library cells; verify the resulting netlist before continuing
show
write_verilog <file_name>.v    OR    write_verilog -noattr <file_name>.v

mux7x1cells

The standard cells shown above were inferred when the design was mapped to the standard cell library.

The synthesis of the design is as shown below

mux7x1syn

Netlist

/* Generated by Yosys 0.26+4 (git sha1 5ea2c290a, clang 10.0.0-4ubuntu1 -fPIC -Os) */

module own_MUX_7x1(i, s, y);
  wire _00_;
  wire _01_;
  wire _02_;
  wire _03_;
  wire _04_;
  wire _05_;
  wire _06_;
  wire _07_;
  wire _08_;
  wire _09_;
  wire _10_;
  wire _11_;
  wire _12_;
  wire _13_;
  wire _14_;
  wire _15_;
  wire _16_;
  wire _17_;
  wire _18_;
  wire _19_;
  wire _20_;
  wire _21_;
  wire _22_;
  wire _23_;
  wire _24_;
  wire _25_;
  wire _26_;
  wire _27_;
  wire _28_;
  wire _29_;
  wire _30_;
  wire _31_;
  wire _32_;
  wire _33_;
  wire _34_;
  wire _35_;
  wire _36_;
  wire _37_;
  wire _38_;
  wire _39_;
  wire _40_;
  wire _41_;
  wire _42_;
  input [6:0] i;
  wire [6:0] i;
  input [2:0] s;
  wire [2:0] s;
  output y;
  wire y;
  sky130_fd_sc_hd__mux2_1 _43_ (
    .A0(_33_),
    .A1(_34_),
    .S(_39_),
    .X(_36_)
  );
  sky130_fd_sc_hd__mux2_1 _44_ (
    .A0(_29_),
    .A1(_30_),
    .S(_39_),
    .X(_37_)
  );
  sky130_fd_sc_hd__mux2_1 _45_ (
    .A0(_31_),
    .A1(_32_),
    .S(_39_),
    .X(_38_)
  );
  sky130_fd_sc_hd__mux4_2 _46_ (
    .A0(_37_),
    .A1(_38_),
    .A2(_36_),
    .A3(_35_),
    .S0(_40_),
    .S1(_41_),
    .X(_42_)
  );
  assign _41_ = s[2];
  assign _40_ = s[1];
  assign _39_ = s[0];
  assign _35_ = i[6];
  assign _34_ = i[5];
  assign _33_ = i[4];
  assign _32_ = i[3];
  assign _31_ = i[2];
  assign _30_ = i[1];
  assign _29_ = i[0];
  assign y = _42_;
endmodule

DAY 2

Hierarchical and flat synthesis in Yosys; synthesis of a flop

Hierarchy png

mux7x1synhier

Flattened png

mux7x1synflat

The flatten command was used to break the hierarchy and produce a single module.

D flip-flop with asynchronous and synchronous reset

dffsyn

To map the D flip-flops to library cells, the following command needs to be executed before mapping the remaining logic to the standard cell library with abc:

dfflibmap -liberty <relative or abs path to .lib file>

DAY 3

Logic Optimization: combinational and sequential (basic)

Combinational Logic Optimisation

To remove unused cells from the synthesized design, the command opt_clean -purge is used. It removes cells and wires that were created from the RTL code but are redundant in the design.

optcheck4cells

The above diagram shows the cells inferred for the Boolean logic y = a?(b?(a & c):c):(!c);

Optimising this Boolean function results in a single XNOR gate, y = ~(a ^ c), as shown in the image below:

optcheck4syn
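The optimisation can be sanity-checked by brute force. The Python sketch below is an independent check (not something Yosys runs): it evaluates the original nested ternary and the XNOR form for all 8 input combinations and collects any disagreement.

```python
from itertools import product

def original(a: int, b: int, c: int) -> int:
    """RTL expression: y = a ? (b ? (a & c) : c) : (!c)."""
    if a:
        return (a & c) if b else c
    return 1 - c  # logical NOT of a 1-bit value

def optimised(a: int, b: int, c: int) -> int:
    """Optimised form: y = ~(a ^ c), i.e. XNOR; b no longer matters."""
    return 1 - (a ^ c)

# Compare both forms over every combination of a, b, c.
mismatches = [
    (a, b, c)
    for a, b, c in product((0, 1), repeat=3)
    if original(a, b, c) != optimised(a, b, c)
]
print("mismatches:", mismatches)  # mismatches: []
```

An empty list confirms that input b is redundant, which is why Yosys can drop it entirely.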

Here is another example, in which a hierarchy of modules exists.

multmodcells

These cells were inferred on synthesising the RTL file.

multmodsyn

The above image shows the synthesis of the design without removing the hierarchy. Yosys infers only the logic that is linked to the output of the design.

On removal of hierachy and optimising the logic using flatten and opt_clean -purge , the following result was obtained:

multmodpurged

Sequential Logic Optimisation

Two examples were used to show the difference in design. [Sequential Constant - Basics]

seqconst

The above diagram is the optimised design after a sequential constant was identified. Since the values of q1 and q are unaffected by any change in the inputs, the D flip-flops were replaced by simple wires.

seqconstcells

Now consider the second case where a sequential constant was not identified

dffconst5syn

The outputs q1 and q were not constant throughout and could be affected by inputs like reset and clk. Hence flops had to be inferred to complete the design synthesis.

notseqconstcells

A case of optimisation of unused ports which are not linked to the design output

3 bit up counter:

case 1: assign q = count[0];

dffoptimistation

It is clearly visible that Yosys has optimised away the unnecessary logic, i.e. the other two bits of the counter, which are not linked to the output q.
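Why the upper bits can be dropped in case 1 is easy to see with a small behavioural model. This Python sketch is an illustration, not the lab's RTL: bit 0 of an incrementing counter simply toggles on every clock edge, so a single flop with q <= ~q produces exactly the same sequence.

```python
def simulate_counter(cycles: int, width: int = 3) -> list[int]:
    """Value of a width-bit up counter after each clock edge, starting from reset (0)."""
    count, history = 0, []
    for _ in range(cycles):
        count = (count + 1) % (1 << width)  # count <= count + 1, wrapping at 2**width
        history.append(count)
    return history

def simulate_toggle_flop(cycles: int) -> list[int]:
    """A single flop fed back through an inverter: q <= ~q."""
    q, history = 0, []
    for _ in range(cycles):
        q ^= 1
        history.append(q)
    return history

counts = simulate_counter(16)
bit0 = [c & 1 for c in counts]           # case 1: q = count[0]
print(bit0 == simulate_toggle_flop(16))  # True: count[1] and count[2] never affect q
```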

case 2: assign q = (count[2:0] == 3'b100);

dffoptimisation2

In this case all bits of the counter affect q, so three flops are inferred. The remaining circuitry implements the counter increment.

DAY 4

GLS, blocking vs non-blocking and Synthesis-Simulation mismatch

Gate-level simulation (GLS) checks whether the netlist generated by the synthesis tool works with the same testbench as the RTL (it should!). However, cases of RTL-to-GLS mismatch do exist.

To simulate the netlist, you also need to read in the standard cell Verilog models present in verilog_model.

Simulation-Synthesis Match using 2x1 MUX

An RTL design for a 2x1 MUX using the ternary operator was tested and the following waveform was obtained.

d4mux1

The netlist for the same design was generated using Yosys and tested with the same testbench. Since the netlist instantiates standard cell modules, their library Verilog models had to be read in as well.

d4mux2

From the two waveforms, it is clear that simulation and synthesis match.

Simulation-Synthesis Mismatch using 2x1 MUX

The RTL design for a 'bad' 2x1 mux was written with an incomplete sensitivity list and tested.

d4badmux1

The output shown is not correct, as y is updated only when sel changes. The netlist for the same design was generated and tested.

d4badmux2

The difference in outputs is clear, so this is a case of simulation-synthesis mismatch.
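The mismatch can be reproduced with a tiny event-driven model. This Python sketch assumes the bad mux was written as `always @(sel) y = sel ? i1 : i0;` (the exact RTL is not shown here) and uses a hypothetical stimulus. The RTL model only re-evaluates y when sel changes, while the gates respond to every input.

```python
def simulate_mux(events, sensitivity):
    """Replay (signal, value) changes one at a time and re-evaluate
    y = sel ? i1 : i0 only when the changed signal is in `sensitivity`."""
    sig = {"i0": 0, "i1": 0, "sel": 0}
    y = 0
    trace = []
    for name, value in events:
        sig[name] = value
        if name in sensitivity:
            y = sig["i1"] if sig["sel"] else sig["i0"]
        trace.append(y)
    return trace

# Hypothetical stimulus: i0 toggles while sel = 0, then sel = 1 and i1 toggles.
events = [("i0", 1), ("i0", 0), ("i0", 1), ("sel", 1), ("i1", 1), ("i1", 0)]
rtl = simulate_mux(events, sensitivity={"sel"})              # like always @(sel)
gls = simulate_mux(events, sensitivity={"i0", "i1", "sel"})  # gates react to every input
print(rtl)  # [0, 0, 0, 0, 0, 0]
print(gls)  # [1, 0, 1, 0, 1, 0]
```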

Blocking caveat (understanding blocking assignments)

This RTL design of an OR-AND circuit was written to understand blocking assignments in Verilog. Inputs a and b feed an OR gate, and its output together with input c feeds an AND gate. The block was written with blocking assignments, with the AND operation first followed by the OR operation.

d4blocking1

In RTL simulation, output y is incorrect because the AND uses the previous value of the OR output (as if it were a flopped value). The design was then synthesized and the netlist was tested.

d4blocking2

The GLS output does not use the previous value of the OR output, and hence gives the correct result. This is another simulation-synthesis mismatch.

d4blocking3
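The one-evaluation lag seen in the RTL simulation can be modelled in a few lines of Python. This sketch assumes the block is written as `y = x & c; x = a | b;` with x as the OR output (the RTL itself is not shown here) and uses a hypothetical stimulus. Because x is assigned after it is read and the block is not re-triggered by its own assignment, the RTL result trails the gate-level result.

```python
def rtl_step(a, b, c, state):
    """One run of the always block as written (blocking assignments):
        y = x & c;   // reads x left over from the previous run
        x = a | b;   // x updated afterwards; the block is not re-triggered
    """
    state["y"] = state["x"] & c
    state["x"] = a | b
    return state["y"]

def gate_step(a, b, c):
    """What the synthesized gates compute: y = (a | b) & c."""
    return (a | b) & c

state = {"x": 0, "y": 0}
stimulus = [(0, 0, 1), (1, 0, 1), (0, 0, 1), (0, 1, 1)]  # hypothetical (a, b, c)
rtl = [rtl_step(a, b, c, state) for a, b, c in stimulus]
gls = [gate_step(a, b, c) for a, b, c in stimulus]
print(rtl)  # [0, 0, 1, 0]: lags one evaluation behind
print(gls)  # [0, 1, 0, 1]
```

Swapping the two statements (OR first, then AND) removes the lag in the RTL model as well.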

DAY 5

Avoiding latches due to incomplete if case conditions

Eg1: A 2x1 mux with no else branch infers a latch, with i0 as the latch enable signal. This is observed in the RTL simulation below.

incomp if RTL

A latch is inferred in Yosys as well.

latch inferred

show incompif

Eg2: An undefined case will again lead to a latch being inferred. RTL simulation is shown below

incomp_if2RTL

Synthesis results

image

image

Avoiding latches due to incomplete case conditions, partial assignments of case outputs and overlapping case conditions.

1) Incomplete Case Statement

Let's take a 4x1 mux where the conditions for sel = 2 and 3 are not defined and there is no default case. A latch is inferred for sel = 2, 3, with sel1_n (the inverted sel[1]) as its enable.

RTL Simulation

image

Synthesis Output

incomp_case latch

image

2) Partial assignment in a case statement

The partial assignment code is as follows:

module partial_case_assign (input i0 , input i1 , input i2 , input [1:0] sel, output reg y , output reg x);
always @ (*)
begin
	case(sel)
		2'b00 : begin
			y = i0;
			x = i2;
			end
		2'b01 : y = i1;
		default : begin
		           x = i1;
			   y = i2;
			  end
	endcase
end
endmodule

y will have no latch. A latch is inferred for x because it is not assigned a value when sel = 2'b01.

RTL simulation

image

Synthesis Output

image

image
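A Python sketch of the same case statement makes the latch on x visible. The model keeps the previous x and y in a dictionary, mimicking how a reg holds its value when the branch taken does not assign it; it is a behavioural illustration only, not a replacement for the RTL.

```python
def partial_case(sel, i0, i1, i2, state):
    """One evaluation of the case statement; `state` holds the last x and y.
    Any output not assigned in the branch taken keeps its old value (a latch)."""
    if sel == 0b00:
        state["y"], state["x"] = i0, i2
    elif sel == 0b01:
        state["y"] = i1                   # x not assigned -> x holds its previous value
    else:                                 # default branch
        state["x"], state["y"] = i1, i2
    return state["x"], state["y"]

state = {"x": 0, "y": 0}
print(partial_case(0b00, i0=1, i1=0, i2=1, state=state))  # (1, 1)
print(partial_case(0b01, i0=0, i1=0, i2=0, state=state))  # (1, 0): x still 1
```

y is assigned in every branch, so it never needs to remember anything; only x does.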

3) Overlapping case statements

This is a situation where more than one case item can match the same select value (here 2'b10 and 2'b1?). It is bad coding: case items should never overlap. In simulation the case items are tried in order and the first matching item is executed, but the synthesis tool may interpret the overlapping items differently, so you are at the mercy of the tools to see how this ambiguous code behaves.

module bad_case (input i0 , input i1, input i2, input i3 , input [1:0] sel, output reg y);
always @(*)
begin
	case(sel)
		2'b00: y = i0;
		2'b01: y = i1;
		2'b10: y = i2;
		2'b1?: y = i3;
		//2'b11: y = i3;
	endcase
end

endmodule

RTL simulation

image

Synthesis Output

The synthesis tool optimises the code and removes the redundant overlapping case. No latches are inferred.

image

image

There is a simulation-synthesis mismatch in this case, because synthesis optimised the code to remove the ambiguity while the RTL simulation did not.

image
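One way to see how the two tools can disagree on sel = 2'b11 is to model case-item matching in Python. In Verilog, ? in a number literal is another way of writing z, and a plain case compares bits exactly, so 2'b1? only matches a select value whose low bit is z; with no default, y then holds its previous value in RTL simulation. The sketch assumes the synthesis tool instead treats ? as a don't-care (as casez would), which is consistent with no latch being inferred, but that is an assumption about the tool rather than something shown in these results.

```python
def case_match_exact(sel: str, item: str) -> bool:
    """Plain `case`: every bit must match exactly; '?' is the z value,
    so it only matches a z bit in sel."""
    return all(s == i for s, i in zip(sel, item))

def case_match_dont_care(sel: str, item: str) -> bool:
    """Don't-care reading (as in `casez`): '?' in the item matches any bit."""
    return all(i == "?" or s == i for s, i in zip(sel, item))

def first_match(sel, items, match):
    """Case items are tried in order; the first matching item is taken."""
    for item, output in items:
        if match(sel, item):
            return output
    return None  # no match and no default: y keeps its old value

items = [("00", "i0"), ("01", "i1"), ("10", "i2"), ("1?", "i3")]
print(first_match("11", items, case_match_exact))      # None -> y latched in RTL sim
print(first_match("11", items, case_match_dont_care))  # i3
```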

for and for generate

- for: used inside an always block to evaluate multiple expressions, e.g. a large mux or demux.
- generate for: used outside an always block to instantiate multiple hardware units, e.g. a ripple carry adder (RCA).

4x1 mux using for loop

RTL code


module mux_generate (input i0 , input i1, input i2 , input i3 , input [1:0] sel  , output reg y);
wire [3:0] i_int;
assign i_int = {i3,i2,i1,i0};
integer k;
always @ (*)
begin
for(k = 0; k < 4; k=k+1) begin
	if(k == sel)
		y = i_int[k];
end
end
endmodule

RTL Simulation

mux rtl

Synthesis Output

image

GLS Results

image

8x1 demux using for loop

RTL code


module demux_generate (output o0 , output o1, output o2 , output o3, output o4, output o5, output o6 , output o7 , input [2:0] sel  , input i);
reg [7:0]y_int;
assign {o7,o6,o5,o4,o3,o2,o1,o0} = y_int;
integer k;
always @ (*)
begin
y_int = 8'b0;
for(k = 0; k < 8; k=k+1) begin
	if(k == sel)
		y_int[k] = i;
end
end
endmodule

RTL Simulation

image

Synthesis Output

image

GLS Result

image

Ripple Carry Adder using for generate

Rule for addition: the output needs max(N, M) + 1 bits, where N and M are the input widths.

RTL code

RCA

module rca (input [7:0] num1 , input [7:0] num2 , output [8:0] sum);
wire [7:0] int_sum;
wire [7:0]int_co;

genvar i;
generate
	for (i = 1 ; i < 8; i=i+1) begin
		fa u_fa_1 (.a(num1[i]),.b(num2[i]),.c(int_co[i-1]),.co(int_co[i]),.sum(int_sum[i]));
	end

endgenerate
fa u_fa_0 (.a(num1[0]),.b(num2[0]),.c(1'b0),.co(int_co[0]),.sum(int_sum[0]));


assign sum[7:0] = int_sum;
assign sum[8] = int_co[7];
endmodule

FA

module fa (input a , input b , input c, output co , output sum);
	assign {co,sum}  = a + b + c ;
endmodule

RTL Simulation

image

Synthesis Output

image

GLS Result

image
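The for-generate structure maps directly onto a loop over full adders. This Python sketch mirrors the fa and rca modules above (bit 0 gets a carry-in of 0, each later stage takes the previous stage's carry-out) and checks them exhaustively against ordinary addition. It also illustrates the N + 1 bit rule, since 255 + 255 = 510 needs 9 bits.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Same behaviour as the fa module: {co, sum} = a + b + c."""
    total = a + b + c
    return total >> 1, total & 1  # (co, sum)

def rca(num1: int, num2: int, width: int = 8) -> int:
    """Chain `width` full adders; stage i takes the carry-out of stage i-1."""
    carry, result = 0, 0  # stage 0 carry-in is 1'b0
    for i in range(width):
        carry, s = full_adder((num1 >> i) & 1, (num2 >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)  # sum[8] = int_co[7]

# Exhaustive check over all 8-bit operand pairs (65,536 cases).
assert all(rca(x, y) == x + y for x in range(256) for y in range(256))
print(rca(255, 255))  # 510, which needs 8 + 1 = 9 bits
```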

DAY 6

PWM Generator with Variable Duty Cycle

Pulse Width Modulation is a well-known technique used to create pulses of a desired width. The duty cycle is the ratio of the time the PWM signal stays high to the total period.

image

Applications

Pulse Width Modulated Wave Generator can be used to:

- control the brightness of an LED
- drive buzzers at different loudness levels
- control the angle of a servo motor
- encode messages in telecommunication
- control the speed of motors

Block Diagram

This PWM generator generates a 10 MHz signal (the frequency depends on the counter module in the block). The duty cycle can be controlled in steps of 10%, with a default of 50%. Along with the clock signal, two external signals are provided to increase and decrease the duty cycle.

image

In this circuit we mainly require an n-bit counter and a comparator. The duty value given to the comparator is compared with the current value of the counter: if the counter is lower than duty, the comparator output is high; otherwise it is low. As the counter starts at zero, the comparator output is initially high and goes low once the counter crosses duty. Hence, by controlling duty, we can vary the duty cycle.

image
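The counter/comparator behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch, not the project's RTL: it assumes the counter counts 0..9 (10 steps, giving the 10% resolution mentioned above), and the hypothetical adjust_duty helper, including its clamping between 10% and 90%, is an assumption rather than something stated for this design.

```python
def pwm_period(duty: int, steps: int = 10) -> list[int]:
    """One PWM period: the counter runs 0..steps-1 and the comparator drives
    the output high while counter < duty, low otherwise."""
    return [1 if counter < duty else 0 for counter in range(steps)]

def adjust_duty(duty: int, increase: bool, decrease: bool, steps: int = 10) -> int:
    """Hypothetical handling of the increase/decrease inputs: move one step
    (10% when steps = 10), clamped so the duty stays between 1 and steps - 1."""
    if increase and duty < steps - 1:
        return duty + 1
    if decrease and duty > 1:
        return duty - 1
    return duty

duty = 5  # default: 50%
print(pwm_period(duty))  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
duty = adjust_duty(duty, increase=True, decrease=False)
wave = pwm_period(duty)
print(sum(wave) / len(wave))  # 0.6, i.e. a 60% duty cycle
```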

RTL Simulation

RTL SIM  f

Synthesis output

The logic of the code was implemented using the following components

syn f

The connections in the generated gate-level netlist are shown below:

syn imp

GLS Simulation

The functionality of the PWM generator with variable duty cycle is retained post-synthesis, so the design has no simulation-synthesis mismatch.

gls f

The related files are present in the PROJECT folder.

DAY 7

Introduction to STA

  1. Cell delay is a function of the input transition, i.e. the current flowing in (inflow), and of the output load, i.e. the load capacitance (size of the bucket). Delay increases with both.
  2. Timing arc --> delay information from every input to every output. E.g. a 2-input AND gate has 2 timing arcs, a -> q and b -> q, since a change on either input can change the output. A D flip-flop has 3 timing arcs: clk -> Q delay, setup time and hold time. A D latch has 4 timing arcs: D -> Q delay, clk -> Q delay, setup time and hold time.

Note: the setup and hold checks of a DFF and a D latch happen at their sampling points. For a DFF this is the active clock edge (posedge or negedge of clk); for a D latch it is the edge at which the latch closes, i.e. the negedge for a positive-level latch and the posedge for a negative-level latch. (IMPORTANT)

image

  3. Timing path: the path data takes a) from the clk of one flop to the input of the next flop (reg2reg), b) from an input to an output (IO path, usually not present), c) from an input to a flop, and d) from the clk of a flop to an output (c and d are IO timing paths). The path with the largest delay is the critical path; it sets the minimum clock period (and hence the maximum frequency) the design can use.

Note:
a) MAX (setup) constraint: Tclk >= Tcq + Tcombi + Tsetup --> data path (max) <= clock path (min)
b) MIN (hold) constraint: Thold <= Tcq + Tcombi --> data path (min) >= clock path (max)
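Plugging numbers into the two constraints above gives the setup and hold slacks. The Python sketch below uses hypothetical delays in ns and ignores clock skew and uncertainty; positive slack means the constraint is met, negative slack is a violation.

```python
def setup_slack(t_clk: float, t_cq: float, t_comb_max: float, t_setup: float) -> float:
    """MAX (setup) check: slack = Tclk - (Tcq + Tcombi + Tsetup)."""
    return t_clk - (t_cq + t_comb_max + t_setup)

def hold_slack(t_cq: float, t_comb_min: float, t_hold: float) -> float:
    """MIN (hold) check: slack = (Tcq + Tcombi) - Thold."""
    return (t_cq + t_comb_min) - t_hold

# Hypothetical reg2reg path, delays in ns; clock skew and uncertainty ignored.
print(round(setup_slack(t_clk=10.0, t_cq=0.5, t_comb_max=8.0, t_setup=0.3), 3))  # 1.2
print(round(hold_slack(t_cq=0.4, t_comb_min=0.1, t_hold=0.2), 3))                # 0.3
# The smallest clock period that still meets setup is Tcq + Tcombi + Tsetup.
print(round(0.5 + 8.0 + 0.3, 3))  # 8.8 (ns)
```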

Basic STA

explains setup and hold time

Constraints

image

The constraints are applied based on the design specifications

  1. reg2reg --> constrained by the clock; Tcombi will be squeezed to fit.

image

  2. in2reg --> constrained by the clock, the input external delay and the input transition. The input logic will be squeezed to compensate.

input external delay

image

input transition delay --> increases the input logic delay, which must be squeezed further.

image

  3. reg2op --> constrained by the clock, the output external delay and the output load (parasitic capacitance). The output logic will be squeezed to compensate.

output external delay

image

output load (parasitic capacitance) --> increases the output logic delay, which must be squeezed further.

image

  4. reg2op and in2reg are called IO paths, and their delay modelling is called IO delay modelling (based on standard interface specifications, i.e. industry protocols like SPI and I2C).

NOTE:
1) Rule of thumb --> external delay : internal delay is 70:30.
2) IO paths need to be constrained for MAX delay (setup) and MIN delay (hold).
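As a rough illustration of the 70:30 rule of thumb: with a hypothetical 10 ns clock, about 7 ns goes to the external delay (the value one would pass to set_input_delay or set_output_delay) and about 3 ns remains for the input or output logic inside the block. A minimal Python sketch of that split:

```python
def io_budget(t_clk: float, external_share: float = 0.7) -> tuple[float, float]:
    """Split one clock period between the logic outside the block (modelled by
    set_input_delay / set_output_delay) and the logic inside it."""
    external = t_clk * external_share
    internal = t_clk - external
    return external, internal

# Hypothetical 10 ns clock with the 70:30 rule of thumb.
ext, internal = io_budget(10.0)
print(round(ext, 3), round(internal, 3))  # 7.0 3.0
```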

LABS

  • Explored the library file and how the cells are characterised --> area, power, delay, capacitances, input (rise, fall) transitions, pin attributes (direction, function, clock pin, timing sense and type, etc.), power pin connections (since logic gates are nothing but CMOS circuits --> VGND, VPB, VNB, etc.) and other necessary information.

image

image

  • Lookup tables are also present in the lib file so that the tool can select the required output based on 2 indexes, e.g. indexes --> input transition and output capacitance, output --> timing delay. If the specified values do not lie on the index points, the range in which they lie is taken and interpolation is done to obtain the value at the specified point.

image
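The interpolation described above is usually bilinear: first along one index, then along the other. The Python sketch below uses a made-up 2x3 table (not sky130 data); real tools may treat points outside the table differently, and this sketch simply extrapolates linearly from the edge interval.

```python
from bisect import bisect_right

def bilinear_lookup(index_1, index_2, values, x, y):
    """Interpolate a 2-D timing table.

    index_1: input transition breakpoints (table rows)
    index_2: output load breakpoints (table columns)
    values[r][c]: delay at (index_1[r], index_2[c])
    Points outside the table are extrapolated linearly from the edge interval."""
    def lower(axis, v):
        # Index of the lower breakpoint of the interval used for v.
        i = bisect_right(axis, v) - 1
        return min(max(i, 0), len(axis) - 2)

    r, c = lower(index_1, x), lower(index_2, y)
    tx = (x - index_1[r]) / (index_1[r + 1] - index_1[r])
    ty = (y - index_2[c]) / (index_2[c + 1] - index_2[c])
    # Interpolate along the load axis on both bracketing rows ...
    row_lo = values[r][c] + ty * (values[r][c + 1] - values[r][c])
    row_hi = values[r + 1][c] + ty * (values[r + 1][c + 1] - values[r + 1][c])
    # ... then along the transition axis.
    return row_lo + tx * (row_hi - row_lo)

# Hypothetical 2x3 table: transition (ns) x load (pF) -> delay (ns).
index_1 = [0.01, 0.1]
index_2 = [0.001, 0.01, 0.1]
values = [[0.02, 0.05, 0.30],
          [0.04, 0.07, 0.32]]
delay = bilinear_lookup(index_1, index_2, values, x=0.055, y=0.0055)
print(round(delay, 4))  # 0.045 (midpoint of the four surrounding entries)
```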

Similarly, sequential cells have such tables, and there are more dependencies of one pin's output on another pin --> related pins. They also have clock pins, whose timing type is specified based on the type of flop (rising_edge or falling_edge --> posedge or negedge). The setup/hold check must also be specified to the tool as "setup_rising" or "setup_falling", to tell it at which clock edge the setup time is calculated.

LHS - posedge clk , RHS - negedge clk for DFF - defining type of clock

image

LHS - posedge clk , RHS - negedge clk for DFF - defining setup time calc

image

dc_shell commands --> in our case we will be using OpenSTA since it is open source; these commands may not all be available in OpenSTA.

> list_lib
> foreach_in_collection my_lib_cell [get_lib_cells */*<the cell you need>] {
	set my_lib_cell_name [get_object_name $my_lib_cell]
	echo $my_lib_cell_name
	}
> foreach_in_collection my_pins [get_lib_pins <cell name>/*] {   ;# my_pins is the loop variable
	set my_pins_name [get_object_name $my_pins]                  ;# get_object_name returns the name of the object
	set pin_dir [get_lib_attribute $my_pins_name direction]      ;# 'set' assigns a variable; get_lib_attribute <object> <attribute>
	echo $my_pins_name $pin_dir                                  ;# $<var> reads a variable; echo prints it
	}

> source <script filename>.tcl      ;# runs a script file
> list_attributes -app > a          ;# lists all defined attributes; the output is redirected to a file called 'a'

A script file called my_script.tcl image

DAY 8

NOTE: Important constraint commands:
1) get_clocks
2) get_ports
3) get_pins
4) get_nets
5) set_input_transition -min -max
6) set_input_delay -min -max
7) set_clock_latency -source
8) set_clock_latency -
9) set_clock_uncertainty -
10) create_clock -name -per -wave
11) get_attribute
12) create_generated_clock -master -source -div
13) regexp a b

DAY 9

IN2REG constraints and REG2OUT constraints

The three start points are the design inputs: reset, increase_duty (to a flop) and decrease_duty (to a flop). There is no slack violation in any of these cases.

image

image

image

The reports generated by OpenSTA show that the design does not have any violators.

DAY 10

An introduction to SPICE and how it is used to analyse MOSFETs

The graph below is a SPICE simulation of a long-channel NMOS with W = 5 and L = 2.

image

Some of the coordinates as shown in ngspice

image

Long and Short channel ( L > 25nm and L < 25nm )

Due to the velocity saturation effect in short-channel MOSFETs, saturation sets in early. As a result, the peak current is much lower than in a long-channel MOSFET.

In the given case, the long-channel MOSFET has a peak current Idmax = 410 uA, while the short-channel MOSFET has Idmax = 197 uA.

A plot of Id vs Vgs at constant Vds = 1.8 V:

image

The threshold voltage Vt = 0.75V for this NMOS
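The long-channel behaviour described above follows the textbook square-law model. A minimal Python sketch, assuming ideal square-law behaviour with Vt = 0.75 V (from the plot above) and W/L = 2.5 (W = 5, L = 2); the process transconductance k_n = 100 uA/V^2 is a hypothetical placeholder, so the absolute currents will not match the ngspice plots. The model ignores velocity saturation and channel-length modulation, which is exactly what makes short-channel devices deviate from it.

```python
def drain_current(vgs: float, vds: float, vt: float = 0.75,
                  k_n: float = 100e-6, w_over_l: float = 2.5) -> float:
    """Square-law (long-channel) NMOS drain current in amperes.

    vt and w_over_l follow the text; k_n (A/V^2) is a hypothetical placeholder.
    Velocity saturation and channel-length modulation are ignored."""
    vov = vgs - vt  # overdrive voltage
    if vov <= 0:
        return 0.0  # cut-off
    if vds < vov:   # linear (resistive) region
        return k_n * w_over_l * (vov * vds - vds ** 2 / 2)
    return 0.5 * k_n * w_over_l * vov ** 2  # saturation

# Id vs Vgs at Vds = 1.8 V: zero below Vt, then growing quadratically.
for vgs in (0.6, 1.0, 1.4, 1.8):
    print(f"Vgs = {vgs:.1f} V -> Id = {drain_current(vgs, vds=1.8) * 1e6:.1f} uA")
```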

CMOS Voltage Transfer Characteristics

image

The NMOS and PMOS DC characteristics were merged and the voltage transfer characteristic of the CMOS inverter was plotted. The above diagram shows the behaviour of the NMOS and PMOS together for different Vin and Vout in the range 0 to 2 V.

DAY 11

SPICE Simulations for Id vs Vds for W = 0.39u and L = 0.15u image

  • Compared with the previous Id vs Vds graph, the saturation current increases slightly even though the W/L ratio is nearly the same. This is because of short-channel effects.

SPICE Simulations for Id vs Vgs for W = 0.39u and L = 0.15u image

SPICE Simulations for Id vs Vgs for W = 3.12u and L = 1.20u image

  • Even though the W/L ratio is 2.6 for both plots, Id is slightly different. The short-channel transistor's characteristic is more linear than the long-channel one.

Velocity Saturation

  • At high electric fields, the electron velocity saturates (becomes constant).
  • This happens in short-channel devices because the electric field strength increases as the channel length is reduced.

VTC

  • Used to calculate the delay tables for STA.
  • The plot of Vout vs Vin in CMOS inverter.

DAY 12

Spice Netlist for inverter

image

Spice Simulation for VTC of CMOS Inverter image

The switching threshold voltage was found to be around 0.87 V.
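The switching threshold Vm is the point on the VTC where Vout = Vin. A Python sketch of how it could be extracted from sampled sweep data: find the first interval where Vout - Vin changes sign and interpolate linearly. The sample values below are hypothetical and coarse, not the ngspice data from this lab.

```python
def switching_threshold(vin, vout):
    """Return Vm, the input voltage at which Vout = Vin on a sampled VTC.

    Finds the first sample interval where (Vout - Vin) changes sign and
    interpolates linearly inside it."""
    diff = [vo - vi for vi, vo in zip(vin, vout)]
    for k in range(len(diff) - 1):
        if diff[k] == 0:
            return vin[k]
        if diff[k] * diff[k + 1] < 0:
            # Linear interpolation of the zero crossing of Vout - Vin.
            return vin[k] + diff[k] * (vin[k + 1] - vin[k]) / (diff[k] - diff[k + 1])
    if diff[-1] == 0:
        return vin[-1]
    raise ValueError("Vout never crosses Vin in this sweep")

# Hypothetical coarse VTC samples (volts).
vin = [0.0, 0.5, 0.8, 0.9, 1.0, 1.8]
vout = [1.8, 1.75, 1.0, 0.8, 0.2, 0.0]
print(round(switching_threshold(vin, vout), 3))  # 0.867
```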

DAY 13

Spice Simulation of CMOS Inverter ( V vs Time )

image

  • From the above simulation delay values for rise and fall can be observed.
  • Rise time --> 0.33ns
  • Fall time --> 0.28ns
  • The delay values of different inverters with increased pmos W/L ratio can be found.
  • It is seen that the value of rise delay decreases and fall delay increases with increase in W/L of pmos.

DAY 14

NOISE MARGIN

    NMH = VOH-VIH // High noise margin
    NML = VIL-VOL // Low noise margin

SPICE Simulation

image

  • VOH = 1.728 V, VOL = 0.066 V, VIL = 0.754 V, VIH = 1.014 V.
  • NMH = 0.714 V
  • NML = 0.688 V
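The two noise margins follow directly from the four measured voltages. A quick Python check of the arithmetic, using the values listed above:

```python
def noise_margins(voh: float, vol: float, vih: float, vil: float) -> tuple[float, float]:
    """Return (NMH, NML) with NMH = VOH - VIH and NML = VIL - VOL, in volts."""
    return voh - vih, vil - vol

nmh, nml = noise_margins(voh=1.728, vol=0.066, vih=1.014, vil=0.754)
print(f"NMH = {nmh:.3f} V, NML = {nml:.3f} V")  # NMH = 0.714 V, NML = 0.688 V
```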

DAY 15

Spice Simulation of Power Supply Scaling

image

  • From the above values, the gain increases as the supply voltage is reduced down to about 1.2 V, and below 1 V it starts decreasing.

Spice Simulation of Device Variation

image

  • The switching threshold is observed to be around 0.98 V.
  • The PMOS W/L ratio is larger than that of its NMOS counterpart, so the PMOS charges the load capacitor more strongly. In other words, we have a strong PFET and a weak NFET.

DAY 16

STA on PWM_gen.v with different ss, ff and tt PVT corners

image

Plot of TNS,WNS,WHS and Worst Slack with multiple corners

image
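The quantities in the plot can be computed from per-endpoint slacks. This Python sketch assumes the common convention that WNS is the worst setup slack clamped to 0 (so it is 0 when nothing fails) and TNS is the sum of all negative setup slacks; the slack values are hypothetical.

```python
def timing_summary(setup_slacks, hold_slacks):
    """Return (worst slack, WNS, TNS, WHS) from per-endpoint slacks in ns."""
    worst_slack = min(setup_slacks)
    wns = min(worst_slack, 0.0)                         # 0 when no setup path fails
    tns = sum((s for s in setup_slacks if s < 0), 0.0)  # total of negative setup slacks
    whs = min(hold_slacks)                              # worst hold slack
    return worst_slack, wns, tns, whs

# Hypothetical slacks for two corners.
print(timing_summary([0.75, -0.25, -0.5, 1.0], [0.125, 0.5]))  # (-0.5, -0.5, -0.75, 0.125)
print(timing_summary([2.0, 3.5], [0.25]))                      # (2.0, 0.0, 0.0, 0.25)
```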

DAY 17

Software - Hardware (S-H) Communication

image

The above image represents the S-H translation. It starts at the application software level, which takes in an input. This input is processed by the Operating System (OS), which performs low-level system functions, handles IO operations and allocates memory. It then hands over to the Compiler, which converts the high-level abstract code of the software into assembly/low-level instructions. These are further converted into a bit stream by the Assembler, which serves as input to the Hardware.

The Instruction Set Architecture (ISA) refers to the 'architecture' of the computer/processor. For example, if the ISA used is of RISC-V, the code converted by the compiler should give instructions suitable for RISC-V core. Hence, one can say that ISA basically represents the Hardware at an intermediate stage.

image

The Assembler converts the instructions into a bitstream that is fed to the Hardware. To obtain the hardware (final layout), a series of steps must be followed: the core is described in RTL/HDL, synthesized into an optimised netlist, and that netlist is converted into the layout.

Introduction to Physical Design

Consider a chip on an Arduino board; it contains the following components:

image

It has several protocol interfaces, an external memory unit (SDRAM), GPIOs, PWM, etc. All of these, except the external memory chip, are contained in a package as shown below.

image

It is a 7 mm x 7 mm QFN-48 package (Quad Flat No-leads, 48 pins).

image

The chip is connected to the package with the help of wire bonds

image

Signals are sent to and from the pins of the chip through pads. The die is the silicon area on which the pads are present. The core is the region that contains the logic of the designed chip.

image

The core region contains macros and foundry IPs. Foundry IPs are intellectual property (IP) blocks designed specifically by the foundry, and they are unique to each foundry; examples are PLLs, SRAM, DACs and ADCs. Macros are digital designs that form the crux of the core region, in this case the RISC-V SoC, the GPIO bank, etc.

Summary

  • The chip generally contains many blocks known as foundry IPs, such as PLL, SRAM, DAC, etc.
  • The general flow goes from application software down to the hardware, with a system software block in between.
  • System software consists of the OS, the compiler and the assembler.
  • The OS handles I/O operations, memory management, process management and low-level system functions.
  • The compiler converts high-level code into low-level code for the target hardware (ARM, Intel x86, RISC-V, MIPS, etc.).
  • The assembler converts low-level machine instructions (ISA) into binary streams.
  • The hardware implementing the respective ISA is described in HDL and then taken through the PD flow.

Openlane and sky130 pdk inception

What is PDK?

PDK (process design kit) is the interface between the FAB and the designers. It contains a collection of files used to model a fabrication process for the EDA tools used to design an IC.

  • Process Design Rules: DRC,LVS,PEX
  • Device Models
  • Digital Standard Cell Libraries
  • I/O Libraries etc

RTL TO GDSII flow

image

  1. RTL Design

  2. Synthesis

    • Translation of RTL to a gate level netlist using Standard Cell Library (SCL). It is followed by STA to check for initial timing violation.
  3. Floor planning (PD)

    • Initial Layout of the Design.
      • Chip Partitioning --> divide the design into smaller blocks while maintaining functionality.
      • Macro Partitioning --> dividing and placing macros, rows and pins.
      • Power planning --> setting the VDD and GND layers. The top layers are used as they are wide and offer less resistance.

the above steps involve Partitioning, Floor planning and Power planning.

  4. Placement

    • Finalised layout of the modules, macros, pins and pads.
      • Global --> tries to find an optimal position for every cell. These positions are not necessarily legal; cells may overlap.
      • Detailed --> the positions obtained from global placement are minimally altered to make them legal.
  5. Clock Distribution Network

    • Clock Tree Synthesis (CTS) performed to ensure the clk signal reaches all sequential elements in a circuit design with zero to minimal skew.
      • CTS alters the netlist. Functionality check is required before progressing
      • Logical Equivalence Check (LEC) --> formally confirm that the function did not change after modifying netlist

It is imperative to check functionality whenever the netlist is modified.

Note: Fake antenna diode insertion --> Antenna violations can occur when charge accumulates on long metal wires during fabrication and damages the transistor gates connected to them (usually taken care of during routing). There are two ways to approach this issue.

  • Bridging --> routes the wire through a higher metal layer as an intermediary.
  • Antenna diodes --> provide a path to discharge the accumulated charge.

OpenLane adds fake antenna diode cells to all gate inputs after placement; if an antenna violation is detected, the fake cell is replaced with a real one.

  6. Routing (Reference)
    • Implement the interconnect (horizontal and vertical wires) using the available metal layers.
    • The skywater pdk contains all the data ( location, size, thickness, pitch, vias ..etc) about the interconnect/metal layers.
    • Metal tracks form a routing grid.
      • Global Routing --> coarse grained grids used to generate routing guides
      • Detailed Routing --> fine grained grids and routing guides to implement actual wiring.
    • Physical Verification --> DRC and LVS.
    • Timing Verification --> STA

The sky130 PDK has 6 interconnect layers: the lowest, the local interconnect layer (titanium nitride), plus 5 aluminium metal layers above it.

Note: OpenLANE --> produce clean (no DRC, LVS, timing violations) GDSII with no human intervention.

  7. DFT (Design for Testing)
    • Scan insertion
    • Automatic Test Pattern Generation (ATPG)
    • Test Patterns Compaction
    • Fault Coverage
    • Fault Simulation
  8. Physical Implementation (Automatic PNR) --> OpenROAD
    • Floor/Power planning
    • End Decoupling capacitors and tap cell insertion
    • Placement
    • Post Placement Optimization
    • CTS
    • Routing

Installation (Reference)

Docker installation.

sudo apt-get update
sudo apt-get upgrade
sudo apt install -y build-essential python3 python3-venv python3-pip make git
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world
sudo groupadd docker
sudo usermod -aG docker $USER
sudo reboot 

After reboot, check for correct installation.

docker run hello-world

Successful installation

Screenshot from 2023-11-20 11-38-47

Check for the following dependencies.

git --version
docker --version
python3 --version
python3 -m pip --version
make --version
python3 -m venv -h

Download and build OpenLane from github

git clone https://github.com/The-OpenROAD-Project/OpenLane
cd OpenLane
make
make test

Run a basic test.

# Enter a Docker session:
make mount

# Open the spm.gds using KLayout with sky130 PDK
klayout -e -nn $PDK_ROOT/sky130A/libs.tech/klayout/tech/sky130A.lyt \
   -l $PDK_ROOT/sky130A/libs.tech/klayout/tech/sky130A.lyp \
   ./designs/spm/runs/openlane_test/results/final/gds/spm.gds

# Leave the Docker
exit

Successful execution with some warnings.

Screenshot from 2023-11-20 12-07-11

Labs

OpenLane is an automated RTL-to-GDSII flow made up of multiple open-source tools, such as OpenROAD, Yosys, Magic, Netgen, CVC, SPEF-Extractor and KLayout, plus a number of custom scripts for design exploration and optimization. It is built around the "no human in the loop" goal and has two modes: autonomous and interactive. To understand each step of the flow, I will be using the interactive mode.

Before getting into the OpenLane flow, a short introduction to the open-source PDKs used in OpenLane is helpful.

OpenLane is compatible with two PDKs, namely skywater130 and OSU. sky130A is the variant of the skywater-pdk that is compatible with open-source tools. Under this variant, libs.tech --> contains the tool-specific files used in the flow, and libs.ref --> contains the library files for the different SkyWater libraries.

I will be using sky130_fd_sc_hd (the high-density standard cell library) for my design.

The following file types are used to build a PDK:

  • verilog --> Verilog netlists/models of the cells
  • techlef --> metal layer data and design rules (technology LEF files)
  • spice --> SPICE circuit netlists of the cells
  • maglef --> abstract views used for displaying metal layers in the layout tool (Magic)
  • mag --> full cell layouts for display in the layout tool (Magic)
  • lib --> logical libraries: one library file per process corner
  • lef --> physical information for each cell in the design, such as shape, size, orientation, symmetry, and the location and direction of the input and output pins
  • gds --> (GDSII) stores the IC layout information
  • cdl --> similar to SPICE netlists; stores electronic circuit information

Commands to run OpenLane

Start the Docker container:

cd OpenLane
make mount

Inside Docker, start the interactive flow:

./flow.tcl -interactive
package require openlane 0.9
prep -design picorv32a

This prepares the flow for the picorv32a design located in the designs folder.

run_synthesis

The command above runs synthesis and static timing analysis (STA) on the design.

Note: For a custom design, you need to create a config.tcl; the library-specific sky130_fd_sc_hd_config.tcl is optional. The values in config.tcl override OpenLane's default parameters.

So what does the file contain? Here is the config.tcl for picorv32a:

# Design
set ::env(DESIGN_NAME) "picorv32a"

# Design sources: RTL and timing constraints
set ::env(VERILOG_FILES) "./designs/picorv32a/src/picorv32a.v"
set ::env(SDC_FILE) "./designs/picorv32a/src/picorv32a.sdc"

# Clock period (ns) and clock port name
set ::env(CLOCK_PERIOD) "5.000"
set ::env(CLOCK_PORT) "clk"

set ::env(CLOCK_NET) $::env(CLOCK_PORT)

# Source the PDK/library-specific config file if it exists
set filename $::env(OPENLANE_ROOT)/designs/$::env(DESIGN_NAME)/$::env(PDK)_$::env(STD_CELL_LIBRARY)_config.tcl
if { [file exists $filename] == 1 } {
    source $filename
}

As the snippet shows, config.tcl sets the design files and flow parameters as environment variables (::env) and then sources the PDK/library-specific config file, if it exists.

Lab exercise

Flop ratio and chip area

Flop ratio = number of D flip-flops / total number of cells

pattern_1 (before the opt command)
1613 / 18036 = 8.9432%

pattern_2 (after the opt command)
1613 / 14876 = 10.8430%

Chip area for module '\picorv32a': 147712.918400
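The flop ratio above is just the flip-flop count divided by the total cell count from the synthesis statistics. A minimal Python sketch of that arithmetic, with the counts copied from this run (the function name is my own, not part of OpenLane):

```python
def flop_ratio(num_flops, num_cells):
    """Return the flop ratio (flip-flops / total cells) as a percentage."""
    if num_cells <= 0:
        raise ValueError("total cell count must be positive")
    return 100.0 * num_flops / num_cells


# Counts taken from the synthesis statistics above.
before_opt = flop_ratio(1613, 18036)
after_opt = flop_ratio(1613, 14876)
print(f"before opt: {before_opt:.4f}%")  # before opt: 8.9432%
print(f"after opt:  {after_opt:.4f}%")   # after opt:  10.8430%
```

The ratio rises after opt only because optimisation removes cells while the flip-flop count (1613) stays the same.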

DAY 18

Width and Height of cells

Let's consider a basic example: combinational logic between a launch flop and a capture flop. Each cell and each flop has physical dimensions (in this case, assume unit dimensions). To fabricate the components of the netlist (cells and flops) on the silicon wafer, they must be arranged inside the core, which sits on the die. Hence I need to place these components in such a way that they fit in the core.

Utilization factor = area occupied by the netlist / total area of the core (below 1; usually 0.5 to 0.6)
Aspect ratio = height / width of the core (1 --> square core; otherwise --> rectangular core)

The below image explains the factors
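To make the two definitions concrete, here is a small Python sketch (the helper names are hypothetical). The third function works the other way round: given the netlist area, a target utilization and an aspect ratio, it sizes the core. It ignores the margin between core and die and any row/site rounding, so it is only a first approximation; OpenLane takes similar targets through its FP_CORE_UTIL and FP_ASPECT_RATIO variables, but its exact sizing procedure is not reproduced here.

```python
import math


def utilization(netlist_area, core_width, core_height):
    """Utilization factor = area occupied by the netlist / total core area."""
    return netlist_area / (core_width * core_height)


def aspect_ratio(core_width, core_height):
    """Aspect ratio = core height / core width (1.0 means a square core)."""
    return core_height / core_width


def core_dimensions(netlist_area, util, ar):
    """Return (width, height) of a core giving the target utilization and aspect ratio."""
    core_area = netlist_area / util    # utilization < 1, so the core is larger than the netlist
    width = math.sqrt(core_area / ar)  # from core_area = width * (ar * width)
    return width, ar * width


# Four unit-area cells (total netlist area 4) in a 4 x 4 core.
print(utilization(4, 4, 4))           # 0.25
print(aspect_ratio(4, 2))             # 0.5
print(core_dimensions(4, 0.25, 1.0))  # (4.0, 4.0)
```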

Chip Partitioning

Let's consider combinational logic that consists of a massive number of gates (say 50,000). Implementing all of it as one flat block on a single die would increase the utilization factor. Hence the gates are partitioned into smaller blocks, with inputs and outputs between these blocks. These blocks can then be turned into black boxes, which makes their functions reusable within the design.

When hard macros such as memories or comparators are used in a design, their locations are user-defined, and the tools do not touch these IPs during the automated PnR flow.

Note: A macro is a predefined, reusable block of logic that performs a specific task. There are two types of macros, namely:

  • Hard macros --> non-flexible; PPA and timing are fixed; delivered as a finished layout; industry graded.
  • Soft macros --> flexible; PPA and timing depend on synthesis and implementation, so they are less predictable; delivered as synthesizable RTL.

De-coupling Capacitors

Memories are often placed close to the input side, and memory blocks serve as pre-placed cells. Power reaches these blocks through the supply lines (wires) that run across the chip. The resistance of the wire over the physical distance between the supply and the cell causes a voltage drop. If the voltage reaching the cell is too low to meet the noise-margin specifications, the cell's output becomes unpredictable. The solution is to place decoupling capacitors close to the block as a "backup supply" (with minimal voltage drop because of the very short distance).

How does a decoupling capacitor work?
Let's take an AND gate. When it switches from 0 to 1, if the voltage supplied from the power line drops below the required level, the decoupling capacitor Cd discharges and temporarily supplies the current the AND gate needs, keeping the voltage at the correct level. When no switching is taking place, Cd is recharged from the power lines. This ensures that the gate receives a proper supply voltage during switching.
It also bypasses high-frequency noise on the supply coming from other blocks, reducing noise coupling between closely placed cells.
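A rough way to size Cd follows from charge conservation: if the capacitor alone must supply a switching current I for a time Δt while its voltage may drop by at most ΔV, then C ≥ I·Δt/ΔV. The sketch below assumes exactly that first-order model (no help from the power line during the transition, no wire resistance); the current, duration and droop values are hypothetical.

```python
def required_decap(current_a, duration_s, max_droop_v):
    """First-order decap size, assuming the capacitor alone supplies the charge.

    Charge drawn during the transition: Q = I * t
    Voltage drop on the capacitor:      dV = Q / C  ->  C = I * t / dV
    """
    if max_droop_v <= 0:
        raise ValueError("allowed droop must be positive")
    charge = current_a * duration_s
    return charge / max_droop_v


# Hypothetical: 1 mA drawn for 100 ps with at most 50 mV of droop.
c = required_decap(1e-3, 100e-12, 0.05)
print(f"{c * 1e12:.1f} pF")  # 2.0 pF
```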

Power Grids

Decoupling capacitors stabilise the supply locally for a single block. Next, I have to consider the supply fluctuations caused by many such blocks across the chip.

The orange line indicates a 16-bit bus.

It is not feasible to place decoupling capacitors throughout the chip. However, if this is ignored, it leads to voltage droop and ground bounce, which momentarily disturb the working of the chip (especially in large designs). Voltage droop occurs when many loads, for example the load capacitances of the lines of a bus switching from 0 to 1 at the same time, draw current from the same power line, causing the supply voltage to drop below its nominal value. Similarly, ground bounce occurs when many loads (bus lines switching from 1 to 0 at the same time) discharge current into a single ground line, raising the ground level slightly above zero. Both lead to uncertainty in the internal functioning of the chip.



The solution to this problem is to add many more power lines in the form of a grid (mesh), so that each load can tap into the nearest power line as needed. VDD lines run vertically and horizontally on different metal layers, connected through vias. The GND lines are laid out the same way, on the same layers as VDD, while making sure the two nets stay isolated from each other.
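A first-order sketch of why the mesh helps, assuming all switching currents flow through one shared rail resistance and that a grid behaves like several identical rail paths in parallel (so the effective resistance is divided by the number of paths). The same arithmetic applies to ground bounce on a shared GND line. The 16-bit bus matches the example above; the current and resistance values are hypothetical.

```python
def rail_droop(num_switching, current_per_line_a, rail_resistance_ohm, parallel_paths=1):
    """IR drop (or ground bounce) when many lines switch at once on a shared rail.

    All switching currents add up on the rail; with several identical,
    parallel supply paths the effective resistance is divided by their number.
    """
    total_current = num_switching * current_per_line_a
    effective_r = rail_resistance_ohm / parallel_paths
    return total_current * effective_r


# Hypothetical values: 16-bit bus, 1 mA per line, 2 ohm rail resistance.
single_rail = rail_droop(16, 1e-3, 2.0)
grid = rail_droop(16, 1e-3, 2.0, parallel_paths=4)
print(f"single rail: {single_rail * 1e3:.0f} mV")  # single rail: 32 mV
print(f"4-path grid: {grid * 1e3:.0f} mV")         # 4-path grid: 8 mV
```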

Pin placement

A chip has inputs and outputs, and to access them I need to place pins on the chip. The input and output pins are placed in a region reserved specifically for pins. A blockage is added to this area so that the tool does not place cells there; this is called a logical cell placement blockage.


sample design

Pin placement is optimised based on fan-out from a common point, and the pins may be placed in a random order within the reserved area. Many parameters are considered while placing the pins, such as connectivity, proximity and the pin type (e.g. I/O, clock, power/ground). The clock signal drives all the flops and sequential elements in the chip, so the clock pin is made larger than the I/O pins to offer the least resistance on the clock path.


floorplan of sample design

LABS

Change the switches/variables (documented under configuration) in the design's config.tcl to suit your needs. Then run the following command:

run_floorplan

To check the results, go to the results/floorplan folder inside the run directory (runs/RUN_<timestamp>/). To view the floorplan, use Magic:

magic -T /home/pandabro/.volare/volare/sky130/versions/dd7771c384ed36b91a25e9f8b314355fc26561be/sky130A/libs.tech/magic/sky130A.tech lef read /home/pandabro/OpenLane/designs/picorv32a/runs/RUN_2023.11.17_06.16.24/tmp/merged.nom.lef def read picorv32a.def

Floorplan results:

Screenshot from 2023-11-21 17-15-18

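The tool places decap cells and tap cells (tap cells prevent latch-up) on the layout. The physical edge cells at both ends of the rows are numbered by row, with the same number of rows on both sides: on the right edge from 0 (bottom row) to x (top row), and on the left edge from x+1 (bottom row) to 2x+1 (top row). Refer to the diagram below.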


    Left 2x+1  |||||||||||||||||||||||||||||||||||||||||  Right x
               |||||||||||||||||||||||||||||||||||||||||
               |||||||||||||||||||||||||||||||||||||||||
                    .                                .
                    .                                .
    Left x+1   |||||||||||||||||||||||||||||||||||||||||  Right 0
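The numbering in the diagram can be written as a rule: with rows indexed from 0 (bottom) to x (top), the right edge cell of row r is numbered r and the left edge cell x + 1 + r. This minimal Python sketch only encodes the scheme drawn above; the intermediate rows are assumed to follow the same linear pattern, and it is not taken from the OpenLane or OpenROAD source.

```python
def edge_cell_labels(x):
    """Return (left, right) edge-cell numbers for rows 0 (bottom) .. x (top).

    Encodes the diagram: right edges run 0..x, left edges run x+1..2x+1.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return [(x + 1 + row, row) for row in range(x + 1)]


labels = edge_cell_labels(3)  # 4 rows
print(labels[0])   # (4, 0)  bottom row
print(labels[-1])  # (7, 3)  top row
```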

The image for reference.

image

Standard cells that are not yet placed (they are placed during the placement step):

image

Placement (Reference)

Placement puts the standard cells onto the floorplan of the die/core. It occurs in two steps: global placement and detailed placement.
Global placement is a coarse placement of cells that considers the initial timing constraints, congestion and multi-voltage requirements. Its result is not legalised: cells may not sit on the standard-cell rows, cells that should abut each other (e.g. in high-frequency paths) may be apart, and cells may overlap one another; in short, they are not yet placed exactly. Legalisation happens in detailed placement. Because legalisation slightly moves the cells, the wire lengths (and hence the capacitances) change, which can give rise to new timing violations that have to be fixed before moving on.

The above image depicts a physical view of logic cells.

These cells are placed onto the core space in the following manner.

image
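The legalisation step described earlier (moving globally placed cells onto the row sites without overlaps) can be illustrated on a single row. This is a simplified greedy sketch in Python, not the algorithm OpenLane's detailed placer actually uses; the site width, cell names and positions are hypothetical, and cell widths are assumed to be whole multiples of the site width.

```python
def legalize_row(cells, site_width, row_start=0.0):
    """Greedy legalisation of one standard-cell row (simplified sketch).

    cells: (name, x, width) tuples from global placement; x may be off the
    site grid and cells may overlap. Cells are visited left to right, snapped
    to the nearest site, and pushed right just enough to clear the previous cell.
    Returns {name: legal_x}.
    """
    legal = {}
    next_free = row_start
    for name, x, width in sorted(cells, key=lambda cell: cell[1]):
        snapped = row_start + round((x - row_start) / site_width) * site_width
        pos = max(snapped, next_free)  # never overlap the left neighbour
        legal[name] = pos
        next_free = pos + width
    return legal


# Hypothetical global-placement result: u1 and u2 overlap, nothing is on the grid.
global_placement = [("u1", 0.3, 1.0), ("u2", 0.6, 1.0), ("u3", 4.1, 0.5)]
print(legalize_row(global_placement, site_width=0.5))
# {'u1': 0.5, 'u2': 1.5, 'u3': 4.0}
```

Comparing the output with the input positions shows the small displacements (u2 moves from 0.6 to 1.5) that change wire lengths and can create new timing violations.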

To maintain timing, the placement is optimised: related cells are placed as close as possible to each other and to the pins they connect to. If signal integrity fails because of a large distance between cells, repeaters (buffers) are inserted on the path to regenerate the signal and drive it to the destination cell. Area is therefore traded for better timing and performance.
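Why repeaters help can be seen with the Elmore delay of a distributed RC wire, which grows with the square of its length (0.5·r·c·L²). Splitting the wire into k buffered segments makes the wire part grow only linearly, at the cost of k - 1 buffer delays and extra buffer area. The sketch below uses only that first-order model: it ignores each buffer's drive resistance and input capacitance, and the numbers are unit-less and hypothetical.

```python
def wire_delay(length, r_per_unit, c_per_unit):
    """Elmore delay of a distributed RC wire: 0.5 * r * c * L^2."""
    return 0.5 * r_per_unit * c_per_unit * length ** 2


def buffered_wire_delay(length, r_per_unit, c_per_unit, segments, buffer_delay):
    """Split the wire into equal segments with a repeater between each pair."""
    seg = length / segments
    wire_part = segments * wire_delay(seg, r_per_unit, c_per_unit)
    return wire_part + (segments - 1) * buffer_delay


# Hypothetical, unit-less numbers chosen only to show the trend.
print(wire_delay(10, 1, 1))                       # 50.0
print(buffered_wire_delay(10, 1, 1, 4, 2.0))      # 18.5
```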

Labs

OpenLane performs congestion-aware global placement using RePlAce. As the placement iterations progress, the half-perimeter wire length (HPWL) and the overflow (OVFL) should decrease; their reduction indicates that the standard cells are being placed in a more compact, optimal area.
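HPWL estimates a net's wire length as half the perimeter of the bounding box around its pins. A minimal Python sketch, with hypothetical nets and coordinates, of the quantity the placer drives down:

```python
def hpwl(pins):
    """Half-perimeter wire length of one net: (max x - min x) + (max y - min y)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of HPWL over all nets; the placer tries to make this small."""
    return sum(hpwl(pins) for pins in nets.values())


# Hypothetical pin coordinates (placement units).
nets = {
    "n1": [(0, 0), (3, 4)],
    "n2": [(1, 1), (2, 5), (4, 2)],
}
print(total_hpwl(nets))  # 14
```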

run_placement

image

Zoomed-in view of the placement:

image

Cell Design Flow

The library contains information about each cell's functionality, dimensions, capacitances, timing and delay values, and much more. Cells are built, characterised and modelled so that the EDA tools can understand them.

The cell design flow consists of three sections:
Inputs

  1. PDKs --> files that contain information about the technology being used for your design.
  2. DRC & LVS rules --> the physical design rules (and LVS rules) that must be met so that the foundry can fabricate the cell.
  3. SPICE models --> characteristics of the transistors used to build the cell (threshold voltage, aspect ratio, capacitances, etc.).
  4. Library and user-defined specs --> cell height (distance between the VDD and GND rails), cell width (set by delay constraints and drive strength), supply voltage (noise margin), metal layer specs (which metal layers to use) and pin locations (close to VDD or GND).

Design steps

  1. Circuit design --> The circuit is designed using the industry parameters and inputs above. For instance, to obtain a ratio of 2.5, the PMOS (W/L) is made 2.5 times the NMOS (W/L), while the height stays constant as set by the technology file. Similarly, the switching threshold is modelled according to the requirements.
  2. Layout design --> implement the required function with transistors, draw the PMOS and NMOS network graphs, find an Euler path (a path that traverses each edge exactly once) and use it to draw the stick diagram of the circuit topology. Finally, the layout must pass all the DRC & LVS checks set by the foundry.
  3. Characterization --> a dedicated flow that produces timing, power and noise information in the form of .lib files, along with the cell's functionality.
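The 2.5 ratio in the circuit-design step ties into the switching threshold Vm. Using the long-channel square-law model with both transistors saturated at Vm gives Vm = (Vtn + r(VDD - |Vtp|)) / (1 + r), where r = sqrt(kp/kn). The sketch below assumes that textbook model and a hypothetical electron/hole mobility ratio of 2.5; none of the voltages are sky130 values.

```python
import math


def switching_threshold(vdd, vtn, vtp_abs, mobility_ratio, size_ratio):
    """Inverter switching threshold Vm from the long-channel square-law model.

    mobility_ratio: mu_n / mu_p (assumed value, not a sky130 number)
    size_ratio:     (W/L)_p / (W/L)_n chosen by the designer
    With both transistors saturated at Vm:
        Vm = (Vtn + r * (VDD - |Vtp|)) / (1 + r),  r = sqrt(k_p / k_n)
    """
    r = math.sqrt(size_ratio / mobility_ratio)
    return (vtn + r * (vdd - vtp_abs)) / (1 + r)


# Hypothetical device values.
print(round(switching_threshold(1.8, 0.4, 0.4, 2.5, 2.5), 3))  # 0.9
print(round(switching_threshold(1.8, 0.4, 0.4, 2.5, 1.0), 3))  # 0.787
```

With the assumed mobility ratio of 2.5, sizing the PMOS 2.5 times larger puts Vm at VDD/2 (when the two threshold voltages are equal), while an equally sized PMOS pulls Vm lower. Real sky130 devices are short-channel, so the actual Vm has to come from SPICE simulation.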

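The Euler-path step in layout design can be sketched in Python with Hierholzer's algorithm: each transistor is an edge (labelled with its gate input) between two circuit nodes, and the path gives an order in which the gates can be laid out in the stick diagram. The node names are hypothetical, and this sketch only finds one Euler path per network; it does not search for an ordering shared by the pull-up and pull-down networks.

```python
from collections import defaultdict


def euler_path(edges):
    """Return edge labels along an Euler path (every edge used exactly once), or None.

    edges: list of (node_a, node_b, label) for an undirected multigraph,
    e.g. transistors (labelled by gate input) between circuit nodes.
    Uses Hierholzer's algorithm.
    """
    adj = defaultdict(list)
    for i, (a, b, _) in enumerate(edges):
        adj[a].append((b, i))
        adj[b].append((a, i))
    odd = [n for n in adj if len(adj[n]) % 2 == 1]
    if not edges or len(odd) not in (0, 2):
        return None  # an Euler path needs 0 or 2 odd-degree nodes
    used = [False] * len(edges)
    stack = [(odd[0] if odd else edges[0][0], None)]
    order = []
    while stack:
        node, via = stack[-1]
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()  # drop edges already traversed from the other end
        if adj[node]:
            nxt, idx = adj[node].pop()
            used[idx] = True
            stack.append((nxt, idx))
        else:
            stack.pop()
            if via is not None:
                order.append(edges[via][2])
    order.reverse()
    return order if len(order) == len(edges) else None  # None if disconnected


# NAND2: series NMOS pull-down, parallel PMOS pull-up (hypothetical node names).
pull_down = [("out", "n1", "A"), ("n1", "gnd", "B")]
pull_up = [("vdd", "out", "A"), ("vdd", "out", "B")]
print(euler_path(pull_down))  # ['A', 'B']
print(euler_path(pull_up))
```

For the NAND2 example, A then B works for both networks; the pull-up run may return ['B', 'A'], which is equally valid because the two PMOS transistors are in parallel.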

Outputs

  1. Circuit Description Language (CDL)
  2. GDSII, LEF, extracted SPICE netlists (.cir)
  3. Timing, noise, power .libs, function

Characterization Flow (GUNA)

  1. Read the SPICE model file
  2. Read the extracted SPICE netlist
  3. Recognise the behavior of the circuit design
  4. Read the sub-circuit of the design
  5. Set the power supply
  6. Apply the stimulus
  7. Provide the output load capacitance (NLDM --> a range of capacitances)
  8. Provide the simulation constraints

Timing characterisation --> delay between the input and output waveforms (propagation delay), and rise/fall times (transition delay).
Solution --> choose the correct threshold points (a poor choice can even produce a negative delay, which is intolerable) and use proper circuit designs to reduce wire delays.
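The delay definitions above can be measured from simulated waveforms. A minimal Python sketch, assuming the waveforms are sampled on a common time axis, an inverting cell, 50% of VDD as the delay threshold and 20%/80% of VDD as the transition thresholds (common choices, but a characterisation setup may use different ones). The example waveforms are idealised, hypothetical ramps.

```python
def crossing_time(times, values, level, rising=True):
    """First time the sampled waveform crosses `level`, by linear interpolation."""
    for k in range(1, len(times)):
        v0, v1 = values[k - 1], values[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[k - 1], times[k]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t, vin, vout, vdd, in_rising=True):
    """Input 50% point to output 50% point (output assumed to switch the opposite way)."""
    t_in = crossing_time(t, vin, 0.5 * vdd, rising=in_rising)
    t_out = crossing_time(t, vout, 0.5 * vdd, rising=not in_rising)
    return t_out - t_in


def transition_time(t, v, vdd, rising=True, low=0.2, high=0.8):
    """Rise (or fall) time between the low and high threshold fractions of VDD."""
    t_low = crossing_time(t, v, low * vdd, rising)
    t_high = crossing_time(t, v, high * vdd, rising)
    return t_high - t_low if rising else t_low - t_high


t = [0, 1, 2, 3, 4]               # hypothetical time points
vin = [0.0, 0.0, 1.0, 1.0, 1.0]   # input rises between t=1 and t=2
vout = [1.0, 1.0, 1.0, 0.0, 0.0]  # output falls between t=2 and t=3
print(propagation_delay(t, vin, vout, vdd=1.0))                    # 1.0
print(round(transition_time(t, vin, vdd=1.0), 3))                  # 0.6
print(round(transition_time(t, vout, vdd=1.0, rising=False), 3))   # 0.6
```

A negative value from propagation_delay is a sign that the chosen threshold points do not suit the waveforms, which is the case the note above warns against.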

DAY 19

DAY 20

DAY 21