A repository containing detailed documentation of my progress in the VSD-HDP program
Program link: VSD-HDP
- Author: Visruat(Visruat T R), visruattr@gmail.com
Program quick links:-
VSD-HDP Status Quick links:-
Installation guide
https://github.com/YosysHQ/yosys
Prerequisite dependencies
$ sudo apt-get install build-essential clang bison flex \
libreadline-dev gawk tcl-dev libffi-dev git \
graphviz xdot pkg-config python3 libboost-system-dev \
libboost-python-dev libboost-filesystem-dev zlib1g-dev
Installation Flow
$ mkdir yosys-master
$ cd yosys-master
$ git clone https://github.com/YosysHQ/yosys.git
$ sudo apt install make   # install make if you have not done so yet
$ sudo apt-get install build-essential clang bison flex \
libreadline-dev gawk tcl-dev libffi-dev git \
graphviz xdot pkg-config python3 libboost-system-dev \
libboost-python-dev libboost-filesystem-dev zlib1g-dev
$ cd yosys
$ make
$ sudo make install
If the build does not work (a version mismatch might occur when combining it with other open-source software), install the packaged version instead:
$ sudo apt install yosys
$ sudo apt upgrade
Note: one can choose to make a separate build folder for running the make command (building yosys). This should be done after installing the dependencies in the installation flow.
$ mkdir build; cd build
$ make -f ../Makefile
The -f option provides the path to the Makefile.
Progress image
Installation guide
https://github.com/The-OpenROAD-Project/OpenSTA
Prerequisite dependencies
$ sudo apt install swig
Installation flow
$ git clone https://github.com/The-OpenROAD-Project/OpenSTA.git
$ cd OpenSTA
$ mkdir build
$ cd build
$ cmake ..
$ make
If the build does not work (a version mismatch might occur when combining it with other open-source software), install the packaged version instead:
$ sudo apt install opensta
$ sudo apt upgrade
Progress image
Installation guide
Download the tarball ngspice-37.tar.gz
from the old-releases folder at
https://sourceforge.net/projects/ngspice/files/
Installation flow
$ tar -zxvf ngspice-37.tar.gz
$ cd ngspice-37
$ mkdir release
$ cd release
$ ../configure --with-x --with-readline=yes --disable-debug
$ make
$ sudo make install
If the build does not work (a version mismatch might occur when combining it with other open-source software), install the packaged version instead:
$ sudo apt install ngspice
$ sudo apt upgrade
Progress image
Note: gtkwave and iverilog were also installed
iverilog
$ sudo apt-get install iverilog
gtkwave
$ sudo apt update
$ sudo apt install gtkwave
Magic
$ sudo apt-get install m4
$ sudo apt-get install tcsh
$ sudo apt-get install csh
$ sudo apt-get install libx11-dev
$ sudo apt-get install tcl-dev tk-dev
$ sudo apt-get install libcairo2-dev
$ sudo apt-get install mesa-common-dev libglu1-mesa-dev
$ sudo apt-get install libncurses-dev
Learning how to use tools such as iverilog, gtkwave, and yosys
Flow for simulation:
Iverilog and gtkwave commands
- iverilog <filetop.v> <file1.v> …. <tb_filetop.v>
- ./a.out
Take the generated dumpfile.vcd and open it with
- gtkwave dumpfile.vcd
RTL
`timescale 1ns / 1ps
module own_MUX_7x1(
input [6:0] i,
input [2:0] s,
output reg y
);
always@(i,s)
begin
case(s)
3'b000: y = i[0];
3'b001: y = i[1];
3'b010: y = i[2];
3'b011: y = i[3];
3'b100: y = i[4];
3'b101: y = i[5];
3'b110: y = i[6];
3'b111: y = i[6];
default: y = i[0];
endcase
end
endmodule
Testbench
`timescale 1ns / 1ps
module tb_own_MUX_7x1();
reg [6:0]i;
reg [2:0]s;
wire y;
own_MUX_7x1 uut(i,s,y);
initial
begin
$dumpfile("own_MUX_7x1.vcd");
$dumpvars(0,tb_own_MUX_7x1);
i=0;
s=0;
#300 $finish;
end
always #10 i=i+1;
always #25 s=s+1;
endmodule
Simulation Waveform
Synthesis
flow for synthesis under yosys:
yosys
read_liberty -lib <relative or abs path>/<lib file>
read_verilog <verilog_file.v>
synth -top <top_module_name>
abc -liberty <relative or abs path>/<lib file> (maps the design to the library and reports the resulting cells; verify the netlist before continuing)
show
write_verilog <file_name>.v OR write_verilog -noattr <file_name>.v
The following standard cells were invoked when mapped to the standard library file.
The synthesis of the design is as shown below
Netlist
/* Generated by Yosys 0.26+4 (git sha1 5ea2c290a, clang 10.0.0-4ubuntu1 -fPIC -Os) */
module own_MUX_7x1(i, s, y);
wire _00_;
wire _01_;
wire _02_;
wire _03_;
wire _04_;
wire _05_;
wire _06_;
wire _07_;
wire _08_;
wire _09_;
wire _10_;
wire _11_;
wire _12_;
wire _13_;
wire _14_;
wire _15_;
wire _16_;
wire _17_;
wire _18_;
wire _19_;
wire _20_;
wire _21_;
wire _22_;
wire _23_;
wire _24_;
wire _25_;
wire _26_;
wire _27_;
wire _28_;
wire _29_;
wire _30_;
wire _31_;
wire _32_;
wire _33_;
wire _34_;
wire _35_;
wire _36_;
wire _37_;
wire _38_;
wire _39_;
wire _40_;
wire _41_;
wire _42_;
input [6:0] i;
wire [6:0] i;
input [2:0] s;
wire [2:0] s;
output y;
wire y;
sky130_fd_sc_hd__mux2_1 _43_ (
.A0(_33_),
.A1(_34_),
.S(_39_),
.X(_36_)
);
sky130_fd_sc_hd__mux2_1 _44_ (
.A0(_29_),
.A1(_30_),
.S(_39_),
.X(_37_)
);
sky130_fd_sc_hd__mux2_1 _45_ (
.A0(_31_),
.A1(_32_),
.S(_39_),
.X(_38_)
);
sky130_fd_sc_hd__mux4_2 _46_ (
.A0(_37_),
.A1(_38_),
.A2(_36_),
.A3(_35_),
.S0(_40_),
.S1(_41_),
.X(_42_)
);
assign _41_ = s[2];
assign _40_ = s[1];
assign _39_ = s[0];
assign _35_ = i[6];
assign _34_ = i[5];
assign _33_ = i[4];
assign _32_ = i[3];
assign _31_ = i[2];
assign _30_ = i[1];
assign _29_ = i[0];
assign y = _42_;
endmodule
Hierarchy and flat synthesis under Yosys ; Synthesis of a flop
Hierarchy png
Flattened png
The flatten command was used to break the hierarchy and produce a single module.
D flip-flop with asynchronous and synchronous reset
To map the D flip-flop, the following command needs to be executed before mapping to the standard cell library:
dfflibmap -liberty <relative or abs path>/<lib file>
Logic Optimization: combinational and sequential (basic)
Combinational Logic Optimisation
To remove unused cells from the synthesised design, the command opt_clean -purge is used. It removes cells that were instantiated from the RTL code but are redundant to the design.
The above diagram shows the cells invoked for the complex boolean expression y = a?(b?(a & c ):c):(!c);
For a = 1 the expression reduces to y = c, and for a = 0 it reduces to y = !c, so optimising this boolean function results in a single XNOR gate, y = ~(a ^ c),
as shown in the image below:
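The reduction to an XNOR can be checked by hand with a small truth-table sweep. The sketch below is only a sanity check written for this README; the function names `original` and `simplified` are my own and are not part of the lab files.

```python
from itertools import product


def original(a: int, b: int, c: int) -> int:
    """y = a ? (b ? (a & c) : c) : (!c), written the same way as the RTL."""
    if a:
        return (a & c) if b else c
    return 0 if c else 1


def simplified(a: int, c: int) -> int:
    """Two-input XNOR: 1 when a and c are equal, 0 otherwise."""
    return 1 - (a ^ c)


def equivalent() -> bool:
    """Compare both forms over all 8 input combinations."""
    return all(
        original(a, b, c) == simplified(a, c)
        for a, b, c in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print("original matches XNOR:", equivalent())
```

Input b never affects the result, which is why the optimised cell only has a and c as inputs.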
Here is another example where a hierarchy of modules exists.
These cells were inferred on synthesising the RTL file.
The above image shows the synthesis of the design without removal of the hierarchy. Yosys infers only the necessary logic which is linked to the output of the design.
On removing the hierarchy and optimising the logic using flatten and opt_clean -purge, the following result was obtained:
Sequential Logic Optimisation
Two examples were used to show the difference in design. [Sequential Constant - Basics]
The above diagram is the optimised design after identifying a sequential constant. Since the values of q1 and q remain unaffected by any change in the inputs, the combination of D flip-flops was replaced by simple wires.
Now consider the second case where a sequential constant was not identified
The outputs q1 and q were not constant throughout and could be affected by inputs like reset and clk. Hence flops had to be inferred to complete the design synthesis.
A case of optimisation of unused ports which are not linked to the design output
3 bit up counter:
case 1: assign q = count[0];
It is clearly visible that yosys has optimised away the unnecessary logic, i.e. the other two bits of the counter, which are not linked to the output q.
case 2: assign q = (count[2:0] == 3'b100);
In this case all bits of the counter are affiliated with q. Hence three flops are inferred to represent the same. The other circuitry contributes to the incrementation of the counter.
GLS, blocking vs non-blocking and Synthesis-Simulation mismatch
GLS checks whether the netlist generated by the synthesis tool works with your testbench (it definitely should!). But cases of RTL to GLS mismatch do exist.
To simulate the netlist, you will need to include the standard cell library files present in verilog_model.
An RTL design for a 2x1 MUX using the ternary operator was tested and the following waveform was obtained.
The netlist for the same design was generated using yosys and tested with the same testbench. The library files had to be included for this, since the netlist instantiates standard cell modules.
From the two pics, it is clear that simulation and synthesis match.
The RTL design for a 'bad' 2x1 mux was written with an incomplete sensitivity list and was tested.
The output shown is not correct, as y is updated only when there is a change in sel. The netlist for the same design was generated and tested.
The difference in the outputs is clearly visible. Hence a case of Simulation-Synthesis Mismatch is observed.
This RTL design of an OR-AND gate was done to understand blocking assignments in verilog. Inputs a, b were fed to the OR gate, and its o/p and input c were fed to an AND gate. The block was designed with blocking assignments, with the AND operation first followed by the OR operation.
Output y is incorrect, as the previous value of the OR o/p is taken for evaluation (flopped value). The synthesis of this design was done and the netlist was tested.
The netlist output does not use the previous value of the OR o/p, hence it gives the correct result.
Eg1: A 2x1 mux with no else block will lead to a latch, with i0 becoming the enable signal for the latch. It is observed in the RTL simulation below.
A latch is inferred in yosys as well.
Eg2: An undefined case will again lead to a latch being inferred. The RTL simulation is shown below.
Synthesis results
Avoiding latches due to incomplete case conditions, partial assignments of case outputs, and overlapping case conditions.
1) Incomplete Case Statement
Let's take a 4x1 mux where the conditions for select = 2, 3 are not defined and there is no default case. A latch is inferred for select = 2, 3, with the enable being sel1_n.
RTL Simulation
Synthesis Output
2) Partial assignment case statement
The partial assignment code is as follows
module partial_case_assign (input i0 , input i1 , input i2 , input [1:0] sel, output reg y , output reg x);
always @ (*)
begin
case(sel)
2'b00 : begin
y = i0;
x = i2;
end
2'b01 : y = i1;
default : begin
x = i1;
y = i2;
end
endcase
end
endmodule
y will have no latches. A latch will be inferred for x, as it has not been assigned a value when sel = 1.
RTL simulation
Synthesis Output
3) overlapping case statements
A situation where more than one case item is satisfied in a case statement. This constitutes bad coding, as the conditions of a case statement should never overlap. You will be at the mercy of the simulator to see how it will simulate this confused state, since all the cases are checked in a case statement even after one has been satisfied (no priority order).
module bad_case (input i0 , input i1, input i2, input i3 , input [1:0] sel, output reg y);
always @(*)
begin
case(sel)
2'b00: y = i0;
2'b01: y = i1;
2'b10: y = i2;
2'b1?: y = i3;
//2'b11: y = i3;
endcase
end
endmodule
RTL simulation
Synthesis Output
The synthesis tool will optimise the code and remove the redundant parallel case. It is observed that no latches are inferred.
There will be a simulation-synthesis mismatch in this case, as the code was optimised to remove the confusion.
for - used inside an always block for evaluating multiple expressions --> like large mux, demux etc.
generate for - used outside an always block for instantiating/generating multiple hardware units --> like ripple carry adder (rca)
RTL code
module mux_generate (input i0 , input i1, input i2 , input i3 , input [1:0] sel , output reg y);
wire [3:0] i_int;
assign i_int = {i3,i2,i1,i0};
integer k;
always @ (*)
begin
for(k = 0; k < 4; k=k+1) begin
if(k == sel)
y = i_int[k];
end
end
endmodule
RTL Simulation
Synthesis Output
GLS Results
RTL code
module demux_generate (output o0 , output o1, output o2 , output o3, output o4, output o5, output o6 , output o7 , input [2:0] sel , input i);
reg [7:0]y_int;
assign {o7,o6,o5,o4,o3,o2,o1,o0} = y_int;
integer k;
always @ (*)
begin
y_int = 8'b0;
for(k = 0; k < 8; k = k + 1) begin
if(k == sel)
y_int[k] = i;
end
end
endmodule
RTL Simulation
Synthesis Output
GLS Result
Rule for addition: adding an N-bit and an M-bit input needs max(N, M) + 1 bits at the o/p.
RTL code
RCA
module rca (input [7:0] num1 , input [7:0] num2 , output [8:0] sum);
wire [7:0] int_sum;
wire [7:0]int_co;
genvar i;
generate
for (i = 1 ; i < 8; i=i+1) begin
fa u_fa_1 (.a(num1[i]),.b(num2[i]),.c(int_co[i-1]),.co(int_co[i]),.sum(int_sum[i]));
end
endgenerate
fa u_fa_0 (.a(num1[0]),.b(num2[0]),.c(1'b0),.co(int_co[0]),.sum(int_sum[0]));
assign sum[7:0] = int_sum;
assign sum[8] = int_co[7];
endmodule
FA
module fa (input a , input b , input c, output co , output sum);
assign {co,sum} = a + b + c ;
endmodule
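As a quick cross-check of the adder structure and of the extra output bit, here is a small behavioural model. It is only a sketch written for this README: the Python functions mirror the fa and rca modules, but the names and the `width` parameter are my own.

```python
from typing import Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (carry_out, sum) for three 1-bit inputs, like module fa."""
    total = a + b + c
    return total >> 1, total & 1


def rca(num1: int, num2: int, width: int = 8) -> int:
    """Chain `width` full adders; the result is width + 1 bits wide."""
    carry = 0  # bit 0 gets a constant carry-in of 0, as in u_fa_0
    result = 0
    for i in range(width):
        carry, s = full_adder((num1 >> i) & 1, (num2 >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)  # sum[8] is the final carry-out


if __name__ == "__main__":
    worst = rca(255, 255)
    print(worst, "needs", worst.bit_length(), "bits")
```

The largest 8-bit operands need all 9 output bits, which is why sum is declared as [8:0].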
RTL Simulation
Synthesis Output
GLS Result
Pulse Width Modulation is a well-known technique used to create pulses of the desired width. The duty cycle is the ratio of how long that PWM signal stays at the high position to the total time period.
Pulse Width Modulated Wave Generator can be used to:
- control the brightness of the LED
- drive buzzers at different loudness
- control the angle of the servo motor
- encode messages in telecommunication
- control the speed of motors
This PWM generator generates a 10 MHz signal (dependent on the counter module in the block). We can control the duty cycle in steps of 10%. The default duty cycle is 50%. Along with the clock signal, we provide two other external signals to increase and decrease the duty cycle.
In this specific circuit, we mainly require an n-bit counter and a comparator. The duty given to the comparator is compared with the current value of the counter. If the current value of the counter is lower than the duty, the comparator output is high. Similarly, if the current value of the counter is higher than the duty, the comparator output is low. As the counter starts at zero, the comparator initially gives a high output, and when the counter crosses the duty it becomes low. Hence by controlling the duty, we can vary the duty cycle.
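The counter/comparator idea can be sketched in a few lines. This is a behavioural illustration only, not the Verilog in the PROJECT folder. It assumes the counter wraps after 10 counts (inferred from the 10% steps), that `duty` is an integer from 0 to 10, and that the output is low when the counter equals the duty; the function names are hypothetical.

```python
from typing import List


def pwm_period(duty: int, steps: int = 10) -> List[int]:
    """One counter period: output is 1 while counter < duty, else 0."""
    return [1 if counter < duty else 0 for counter in range(steps)]


def duty_cycle(wave: List[int]) -> float:
    """Fraction of the period for which the output is high."""
    return sum(wave) / len(wave)


if __name__ == "__main__":
    for duty in (5, 6, 4):  # default, after one increase, after one decrease
        wave = pwm_period(duty)
        print(duty, wave, duty_cycle(wave))
```

Increasing or decreasing `duty` by one moves the falling edge by one counter step, which is what changes the duty cycle in 10% increments.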
The logic of the code was implemented using the following components
The gate level netlist generated connections were shown as follows
The functionality of the PWM generator with variable duty cycle is retained post-synthesis. Hence the design does not have a Simulation-Synthesis Mismatch.
The related files are present in the PROJECT folder.
- Delay is a function of the input transition, i.e. current (inflow), and the output load, i.e. load capacitance (size of the bucket). [ directly proportional ]
- Timing arc --> delay information from all inputs to all outputs, e.g. a 2 i/p AND gate has 2 timing arcs -> a-q and b-q. Any changes in the inputs will affect the output. For a D FF we have 3 timing arcs -> Clk - Q delay, setup time and hold time. For a D latch we have 4 timing arcs - D - Q delay, Clk - Q delay, setup time and hold time.
Note : sampling for a DFF and a D latch (setup and hold time) occurs at the sampling points. Therefore, for a DFF it will be at the posedge or negedge of Clk, and for a D latch it will be at the negedge or posedge of Clk (pos level Clk or neg level Clk). ------- IMPORTANT.
- Timing path - the path for data to move from a) clk of one flop to the input of the next flop (reg2reg) b) input to output (IO path, usually not present) c) input to flop d) clk of flop to output (c, d are IO Timing Paths). The path that needs the largest Tclk is the critical path of the design, as it sets the smallest clock period that can be used for the design.
Note : a) MAX constraint :- Tclk >= Tcq + Tcombi + Tsetup --> checked with the max data path delay against the min clk path delay
b) MIN constraint :- Thold <= Tcq + Tcombi --> checked with the min data path delay against the max clk path delay
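The two constraints can be turned into slack numbers directly. The sketch below is only an illustration of the inequalities; the delay values are made up (in picoseconds) and the function names are hypothetical, not taken from OpenSTA.

```python
def setup_slack(t_clk: int, t_cq: int, t_combi: int, t_setup: int) -> int:
    """Slack of the MAX constraint Tclk >= Tcq + Tcombi + Tsetup."""
    return t_clk - (t_cq + t_combi + t_setup)


def hold_slack(t_cq: int, t_combi: int, t_hold: int) -> int:
    """Slack of the MIN constraint Thold <= Tcq + Tcombi."""
    return (t_cq + t_combi) - t_hold


def min_clock_period(t_cq: int, t_combi: int, t_setup: int) -> int:
    """Smallest Tclk that still meets the MAX constraint."""
    return t_cq + t_combi + t_setup


if __name__ == "__main__":
    # Made-up reg2reg path: use the slowest Tcombi for setup, the fastest for hold.
    print("setup slack:", setup_slack(t_clk=1000, t_cq=100, t_combi=700, t_setup=50))
    print("hold slack:", hold_slack(t_cq=100, t_combi=30, t_hold=40))
    print("min Tclk:", min_clock_period(t_cq=100, t_combi=700, t_setup=50))
```

A negative slack means the corresponding constraint is violated; for setup, Tcombi is what has to be squeezed.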
The constraints are applied based on the design specifications
- reg2reg --> constrained by clk -> Tcombi will be squeezed to compensate.
- in2reg --> constrained by clk, input external delay and input transition. Input logic will be squeezed to compensate.
input external delay
input transition delay --> increases the input logic delay, which needs to be squeezed further.
- reg2op --> constrained by clk, output external delay and output load (parasitic capacitance). Output logic will be squeezed to compensate.
output external delay
output load (parasitic capacitance) --> increases the output logic delay, which needs to be squeezed further.
- reg2op and in2reg are called IO Paths and the delay modelling is called IO delay Modelling. (standard interface specifications like SPI,I2C --> industry protocols)
NOTE : 1) rule of thumb --> external delay : internal delay is 70:30. 2) IO paths need to be constrained for MAX delay(setup) and MIN delay(hold).
- Exploring the library file and learning how the cells were characterised --> area, power, delay, capacitances, input (rise, fall) transition, pin attributes --> direction, function, clock pin, timing sense and type etc., power pin connections (since logic gates are nothing but CMOS --> VGND, VPBN etc.), and other necessary information.
- Lookup tables are also present in the lib file so that the tool is able to select the necessary o/p based on the 2 indexes. E.g. indexes --> input transition and output capacitance, o/p --> timing delay. In case the specified values do not lie on the indexes, the range in which the values lie is taken and interpolation is done to obtain the value at the specified point.
Similarly, sequential cells will also have such factors, and there exist more dependencies of one pin's o/p on another pin's o/p --> related pins. They will also have clock pins --> their timing type will be specified based on the type of flop it is (rising_edge or falling_edge --> posedge or negedge). The setup/hold time calculation should also be specified to the tool as "setup_rising" or "setup_falling" to let the tool know at which clock edge the setup time must be calculated.
LHS - posedge clk , RHS - negedge clk for DFF - defining type of clock
LHS - posedge clk , RHS - negedge clk for DFF - defining setup time calc
dc shell commands --> in our case we will be using OpenSTA since it is open source // verify if these functions match with OpenSTA
> list_lib
> foreach_in_collection my_lib_cell [get_lib_cells */*<the cell you need>] {
set my_lib_cell_name [get_object_name $my_lib_cell];
echo $my_lib_cell_name;
}
> foreach_in_collection my_pins [get_lib_pins <cell name>/*] { --> my_pins is a loop variable
set my_pins_name [get_object_name $my_pins]; --> 'get_object_name'
set pin_dir [get_lib_attribute $my_pins_name direction]; --> 'set' is for variable assignment ; get_lib_pins/attribute <file_name> <pin/attribute_name>
echo $my_pins_name $pin_dir; --> to use a variable we use $<var>; to print a variable echo $<var>
}
> source <script filename>.tcl --> for calling a script file
> list_attributes -app > a --> for seeing all defined attributes in a library ; it is fed to file called 'a'.
A script file called my_script.tcl
NOTE: Important Constraint Commands 1)get_clocks 2)get_ports 3)get_pins 4)get_nets 5)set_input_transition -min -max 6)set_input_delay -min -max 7)set_clock_latency -source 8)set_clock_latency - 9)set_clock_uncertainty - 10)create_clock -name -per -wave 11)get_attribute 12)create_generated_clock -master -source -div 13) regexp a b
IN2REG constraints and REG2OUT constraints
The three start points are the inputs of the design: reset, increase_duty (Flop), decrease_duty (Flop). There is no slack violation in the mentioned cases.
The reports generated by OpenSTA show that the design does not have any violators // add hyperlink here
The graph below is a spice simulation of a long channel NMOS with W = 5 L = 2
Some of the coordinates as shown in ngspice
Long and Short channel ( L > 25nm and L < 25nm )
Due to velocity saturation effect in short channel MOS, the saturation sets in early. As a result the peak current is much less compared to long channel MOS.
In the given case the long-channel MOS has peak current idmax = 410 uA, while the short-channel MOS has peak current idmax = 197 uA.
a plot between id vs Vgs at constant Vds = 1.8V
The threshold voltage Vt = 0.75V for this NMOS
CMOS Voltage Transfer Characteristics
NMOS and PMOS dc characteristics were merged and the voltage transfer characteristics for CMOS inverter was plotted. The above diagram denotes the behavior exhibited by the NMOS and PMOS clubbed together at different Vin and Vout conditions in the range of 0 to 2 V.
SPICE Simulations for Id vs Vds for W = 0.39u and L = 0.15u
- It can be seen that compared to the previous Id vs Vds graph that the saturation current slightly increases even though W/L ratio remains constant. This is because of short channel effect.
SPICE Simulations for Id vs Vgs for W = 0.39u and L = 0.15u
SPICE Simulations for Id vs Vgs for W = 3.12u and L = 1.20u
- It can be seen that even though the W/L ratio remains 2.6 for both plots, Id is slightly different. The short-channel transistor characteristic is more linear than the long-channel one.
Velocity Saturation
- At higher electric fields, the electron velocity becomes constant.
- This happens in short-channel devices, as the electric field strength increases due to the reduced channel length.
VTC
- Used to calculate the delay tables for STA.
- The plot of Vout vs Vin in CMOS inverter.
Spice Netlist for inverter
Spice Simulation for VTC of CMOS Inverter
The switching threshold voltage was found to be around 0.87 V.
Spice Simulation of CMOS Inverter ( V vs Time )
- From the above simulation delay values for rise and fall can be observed.
- Rise time --> 0.33ns
- Fall time --> 0.28ns
- The delay values of different inverters with increased pmos W/L ratio can be found.
- It is seen that the value of rise delay decreases and fall delay increases with increase in W/L of pmos.
NOISE MARGIN
NMH = VOH-VIH // High noise margin
NML = VIL-VOL // Low noise margin
SPICE Simulation
- VOH = 1.728 V, VOL = 0.066 V, VIL = 0.754 V, VIH = 1.014 V.
- NMH = 0.714 V
- NML = 0.688 V
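The two margins follow directly from the formulas above. The short check below reproduces them from the measured voltages, with VIH taken as 1.014 V; the function name is my own.

```python
from typing import Tuple


def noise_margins(voh: float, vol: float, vih: float, vil: float) -> Tuple[float, float]:
    """Return (NMH, NML) with NMH = VOH - VIH and NML = VIL - VOL."""
    return voh - vih, vil - vol


if __name__ == "__main__":
    nmh, nml = noise_margins(voh=1.728, vol=0.066, vih=1.014, vil=0.754)
    print(f"NMH = {nmh:.3f} V, NML = {nml:.3f} V")
```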
Spice Simulation of Power Supply Scaling
- It can be observed from the above values that the gain increases as the power supply is reduced up to 1.2 V, and after 1 V it starts decreasing.
Spice Simulation of Device Variation
- The switching threshold is observed to be around 0.98 V.
- The pmos W/L ratio is more than its nmos counterpart due to which the pmos charges the load capacitor more. Meaning we have a strong p-fet and a weak n-fet.
Plot of TNS,WNS,WHS and Worst Slack with multiple corners
The above image represents the S-H translation. It starts at the software application level which takes in an input. This input is now processed by the Operating System (OS). An OS performs low level system functions, handles IO operations and allocates memory. It instigates the Compiler to convert high level abstract code of the software to Assembly/Low level code instructions. This is further converted into a bit stream by the Assembler to serve as input to the Hardware.
The Instruction Set Architecture (ISA) refers to the 'architecture' of the computer/processor. For example, if the ISA used is of RISC-V, the code converted by the compiler should give instructions suitable for RISC-V core. Hence, one can say that ISA basically represents the Hardware at an intermediate stage.
The Assembler converts the instructions into a bitstream that is fed to the Hardware. To obtain the hardware or final layout, a certain number of steps need to be followed: the RTL/HDL of the core is written first, followed by an optimised/synthesized netlist, which is then converted into the layout/hardware.
Consider a chip on an arduino board, it would contain the following components:-
It has several protocols, an external memory unit (SDRAM), GPIOS, PWM etc. Now all (except memory external chip) of these are contained in a package as shown below.
It represents a 7x7 [dimensions] QFN-48 [Quad Flat No leads; 48 pins].
The chip is connected to the package with the help of wire bonds
Signals are sent to and fro the pins of the chip via pads. The die is the minimal silicon area on which the pads are present. The core is the region which contains the logic of the designed chip.
The core region contains Macros and Foundry IPs. Foundry IPs are intellectual blocks which are designed specifically by the foundry and it is unique to each foundry. Some examples are PLL, SRAM, DAC, ADC, etc. Macros are Digital designs which form the crux of the core region. In this case, RISC-V SoC, GPIO bank etc.
Summary
- The chip in general contains many cores which are known as foundry IPs. These can be PLL, SRAM, DAC etc.
- The general flow goes from application software down to the hardware, with the system software block in between.
- System software consists of the OS, Compiler and the Assembler.
- OS handles I/O operations, memory management, process management and low level system functions.
- Compiler converts high level code into the respective low level code according to hardware (ARM, Intelx86, RISCV, MIPS etc).
- Assembler converts low level machine instructions (ISA) into binary streams.
- Hardware description is written in HDL for respective ISA to follow PD flow.
What is PDK?
PDK (process design kit) is the interface between the FAB and the designers. It contains a collection of files used to model a fabrication process for the EDA tools used to design an IC.
- Process Design Rules: DRC,LVS,PEX
- Device Models
- Digital Standard Cell Libraries
- I/O Libraries etc
RTL TO GDSII flow
- RTL Design
- Synthesis
  - Translation of RTL to a gate level netlist using the Standard Cell Library (SCL). It is followed by STA to check for initial timing violations.
- Floor planning (PD)
  - Initial layout of the design.
  - Chip Partitioning --> divide the design into smaller blocks while maintaining functionality.
  - Macro Partitioning --> dividing and placing macros, rows and pins.
  - Power planning --> setting the VDD and GND layers. The top layers are used as they are wide and offer less resistance.
  The above steps involve Partitioning, Floor planning and Power planning.
- Placement
  - Finalised layout of the modules, macros, pins and pads.
  - Global --> tries to find the optimal position for all cells. Such placements are not necessarily legal; cells may overlap.
  - Detailed --> placements obtained from global are minimally altered to be legal.
- Clock Distribution Network
  - Clock Tree Synthesis (CTS) is performed to ensure the clk signal reaches all sequential elements in a circuit design with zero to minimal skew.
  - CTS alters the netlist. A functionality check is required before progressing.
  - Logical Equivalence Check (LEC) --> formally confirm that the function did not change after modifying the netlist.
  It is imperative to check functionality when the netlist is modified.
Note: Fake antenna diode insertion --> Antenna violations may occur, which can damage the transistors as charges begin to accumulate (usually taken care of by Routing). There are two methods to approach this issue.
- Bridging --> Attaches higher layer intermediary.
- Antenna diodes --> nullify the accumulated charges.
OpenLane adds fake antenna cells to all gates after placement --> if an antenna violation is detected, it will replace the fake cell with a real one.
- Routing (Reference)
  - Implement the interconnect (horizontal and vertical wires) using the available metal layers.
  - The skywater pdk contains all the data (location, size, thickness, pitch, vias, etc.) about the interconnect/metal layers.
  - Metal tracks form a routing grid.
  - Global Routing --> coarse grained grids used to generate routing guides
  - Detailed Routing --> fine grained grids and routing guides to implement actual wiring.
- Physical Verification --> DRC and LVS.
- Timing Verification --> STA
skywater130 pdk --> (1) lowest layer/local interconnect layer (Titanium Nitride) + 5 layers above (Aluminium) = (6)
Note: OpenLANE --> produce clean (no DRC, LVS, timing violations) GDSII with no human intervention.
- DFT (Design for Testing)
  - Scan insertion
  - Automatic Test Pattern Generation (ATPG)
  - Test Patterns Compaction
  - Fault Coverage
  - Fault Simulation
- Physical Implementation (Automatic PNR) --> OpenROAD
  - Floor/Power planning
  - End Decoupling capacitors and tap cell insertion
  - Placement
  - Post Placement Optimization
  - CTS
  - Routing
Installation (Reference)
Docker installation.
sudo apt-get update
sudo apt-get upgrade
sudo apt install -y build-essential python3 python3-venv python3-pip make git
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world
sudo groupadd docker
sudo usermod -aG docker $USER
sudo reboot
After reboot, check for correct installation.
docker run hello-world
Successful installation
Check for the following dependencies.
git --version
docker --version
python3 --version
python3 -m pip --version
make --version
python3 -m venv -h
Download and build OpenLane from github
git clone https://github.com/The-OpenROAD-Project/OpenLane
cd OpenLane
make
make test
Run a basic test.
# Enter a Docker session:
make mount
# Open the spm.gds using KLayout with sky130 PDK
klayout -e -nn $PDK_ROOT/sky130A/libs.tech/klayout/tech/sky130A.lyt \
-l $PDK_ROOT/sky130A/libs.tech/klayout/tech/sky130A.lyp \
./designs/spm/runs/openlane_test/results/final/gds/spm.gds
# Leave the Docker
exit
Successful execution with some warnings.
OpenLane is an automated RTL to GDSII flow that consists of multiple open-source tools such as OpenROAD, Yosys, Magic, Netgen, CVC, SPEF-Extractor, KLayout and a number of custom scripts for design exploration and optimization. It is built to promote "No human in the flow" and has two modes, autonomous and interactive. For understanding the process of the flow, I will be using the "interactive" method.
Before I get into the OpenLane flow, a small intro about the open-source pdks used in OpenLane would be helpful.
OpenLane is compatible with pdks, namely skywater130 and osu. sky130A is the variant of skywater-pdk which is compatible with open-source tools. Under the variant, we have libs.tech --> contains the files related to the tools used in the flow, and libs.ref --> contains the library files for the different skywater pdks.
I will be using sky130_fd_sc_hd for my design.
A look at the different file types which are used to build a pdk.
- verilog --> netlists
- techlef --> metal layer data and design rules (technology files)
- spice --> circuit netlists of analog devices
- maglef --> used for displaying metal layers in the layout tool
- mag --> used for displaying layout on the layout tool
- lib --> contains the flavours of library files for different process corners. In short logical libraries.
- lef --> contains physical info such as shape, size, direction, symmetry, and input and output pin directions for each cell in the design.
- gds --> (GDSII) used to store IC layout information.
- cdl --> similar to spice netlists; stores electronic circuit information.
Starting up docker
cd OpenLane
make mount
In docker
./flow.tcl -interactive
package require openlane 0.9
prep -design picorv32a
This sets up the tool for running the flow for the design picorv32a under the designs folder.
run_synthesis is used to perform synthesis and STA of your design.
Note: For a custom design, you will need to create a config.tcl. The sky130_fd_sc_hd_config.tcl is not compulsory. Config.tcl overwrites the default parameters.
So the question that arises is what is in the file?
# Design
set ::env(DESIGN_NAME) "picorv32a"
set ::env(VERILOG_FILES) "./designs/picorv32a/src/picorv32a.v"
set ::env(SDC_FILE) "./designs/picorv32a/src/picorv32a.sdc"
set ::env(CLOCK_PERIOD) "5.000"
set ::env(CLOCK_PORT) "clk"
set ::env(CLOCK_NET) $::env(CLOCK_PORT)
set filename $::env(OPENLANE_ROOT)/designs/$::env(DESIGN_NAME)/$::env(PDK)_$::env(STD_CELL_LIBRARY)_config.tcl
if { [file exists $filename] == 1} {
source $filename
}
As shown in the snippet above, config.tcl sets the files and parameters in the flow environment.
Flop ratio and chip area
pattern_1 ( before opt command )
1613/18036 = 8.9432%
pattern_2 ( post opt command )
1613/14876 = 10.8430%
Chip area for module '\picorv32a': 147712.918400
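The flop ratios above are the number of flip-flops divided by the total number of cells reported by synthesis. A minimal Python sketch of that arithmetic follows; the function name `flop_ratio` is only illustrative, and the counts are the ones quoted above.

```python
def flop_ratio(num_flops: int, num_cells: int) -> float:
    """Return the share of flip-flops among all cells, as a percentage."""
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    return 100.0 * num_flops / num_cells


# Counts quoted in the two reports above (before and after the opt command).
before_opt = flop_ratio(1613, 18036)
after_opt = flop_ratio(1613, 14876)

print(f"before opt: {before_opt:.4f}%")
print(f"after opt:  {after_opt:.4f}%")
```

The flop count stays the same while the total cell count shrinks after optimisation, which is why the ratio goes up.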
Let's consider a basic example of combinational logic between a launch flop and a capture flop. Each cell and each flop has dimensions (in this case, let's take unit dimensions). To fabricate the components of the netlist (cells and flops), they need to be laid out on the silicon die. Hence I need to place these components in such a way that they fit in the core, which is placed on the die.
Utilization factor = Area of the netlist / Total area of the core ( < 1 usually 0.5/0.6 )
Aspect ratio = Height / Width of the core ( if 1 --> square core; else --> rectangle core )
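As a small worked example of the two definitions above, here is a Python sketch. The cell count and core dimensions are hypothetical values chosen only to make the arithmetic easy to follow (unit-sized cells, as in the example above).

```python
def utilization_factor(netlist_area: float, core_area: float) -> float:
    """Area occupied by the netlist divided by the total core area."""
    return netlist_area / core_area


def aspect_ratio(core_height: float, core_width: float) -> float:
    """Core height divided by core width (1.0 means a square core)."""
    return core_height / core_width


# Hypothetical design: 4 cells of 1 x 1 unit each, in a core 4 units wide and 2 units high.
cell_count = 4
cell_area = 1.0 * 1.0
core_width, core_height = 4.0, 2.0

util = utilization_factor(cell_count * cell_area, core_width * core_height)
ratio = aspect_ratio(core_height, core_width)

print(f"utilization factor = {util}")  # half of the core is occupied
print(f"aspect ratio       = {ratio}")  # not 1, so the core is a rectangle
```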
The below image explains the factors
Let's consider combinational logic that consists of a massive number of gates (50,000). When implementing this as a single block on the die, the utilization factor will surely increase. Hence the gates are partitioned into smaller blocks, with inputs and outputs between these blocks. These blocks can further be converted to black boxes to aid reusability of the function in the design.
When hard macros such as memories, comparators, etc. are used in the design, their locations are user defined and the tools will not touch these IPs during the automated PnR flow.
Note: A macro is a predefined and reusable block of logic which can perform a specific task. There are two types of macros, namely:
- Hard macros --> non-flexible, PPA and timing is fixed, available as ICs, industry graded.
- Soft macros --> flexible, PPA and timing is unpredictable, synthesizable RTL.
Memories are often placed close to the input side. Memory units serve as pre-placed cells. These units are connected to the supply/power lines of the chip through wires. The physical distance between the source and the cell causes a drop in the voltage. If the voltage reaching the cell is not sufficient to meet the noise margin specifications, the output of the cell becomes unpredictable. The solution is to use de-coupling capacitors to provide a "backup supply" closer to the unit (zero to minimal voltage drop due to the very short distance).
How does a de-coupling capacitor work?
Let's take an AND gate. During switching from the 0 to the 1 state, if the voltage supplied to the gate from the power line drops below the required voltage, the capacitor Cd discharges and temporarily supplies power to the AND gate to ensure the correct voltage is maintained. When no switching is taking place, Cd is charged by the power lines. Hence it ensures that the proper voltage is supplied to the gate during switching operations.
It also bypasses high frequency noise from other units and prevents crosstalk between closely placed cells.
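As a rough first-order illustration (not taken from the program material), the size of Cd can be estimated from the capacitor relation I = C * dV/dt: to supply a peak current I for a switching time dt while letting the local supply dip by at most dV, the capacitor needs at least C = I * dt / dV. The function name and every numeric value below are hypothetical.

```python
def min_decap_farads(peak_current_a: float,
                     switching_time_s: float,
                     allowed_droop_v: float) -> float:
    """First-order decap estimate from I = C * dV/dt, i.e. C = I * dt / dV."""
    if allowed_droop_v <= 0:
        raise ValueError("allowed_droop_v must be positive")
    return peak_current_a * switching_time_s / allowed_droop_v


# Hypothetical numbers: 10 mA peak current, 100 ps switching time, 50 mV allowed droop.
cd = min_decap_farads(peak_current_a=10e-3,
                      switching_time_s=100e-12,
                      allowed_droop_v=50e-3)

print(f"Cd >= {cd * 1e12:.1f} pF")
```

This ignores wire resistance and inductance, so it only gives an order-of-magnitude feel for why a small capacitor placed right next to the cell can hold the voltage up during switching.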
The power fluctuation issue was stabilised for a local module using de-coupling capacitors. Now I will have to consider fluctuations between multiple such modules in the chip.
The orange line indicates a 16-bit bus.
It is not feasible to have capacitors throughout the chip. However, if this is not addressed it will lead to voltage droop and ground bounce, which momentarily affect the working of the chip (this is bad for large designs). Voltage droop is a condition in which multiple capacitors (of a bus) draw current from the same power line, causing the source voltage to drop below its original value. Similarly, ground bounce is a state in which the ground level rises slightly above zero because many capacitors discharge current into a single ground line. These will definitely lead to uncertainty in the internal functioning of the chip.
The solution to this problem is to introduce many more power lines in the form of a grid/mesh, so that each capacitor can tap into whichever power line is closest. VDD power lines are placed in vertical and horizontal layers connected with metal contacts. The GND lines are placed similarly, at the same level as VDD. However, it is ensured that the two sets of lines are isolated from each other.
A chip has inputs as well as outputs, and to access these signals I require pin placement on the chip. Once the design is complete, all the inputs and outputs are placed in a region specifically reserved for pins. This is done by adding a blockage element to that area to restrict the tool from placing cells there. This is called logical cell placement blockage.
The pins are optimized by fanout from a common point and are placed in a random order in the reserved area. Many parameters are considered while placing the pins, such as connectivity, proximity and pin type (e.g. i/o, clk, power/gnd). The clock signal drives all the flops and sequential elements in the chip. Hence, the clk pin is larger than the i/o pins so that it offers the least resistance to the path.
Change the switches/variables (see the information under configuration) in the design config.tcl to suit your needs. Then run the following command.
run_floorplan
To check the results, go to the runs/floorplan folder. To view the floorplan, we make use of magic. The command is as follows:
magic -T /home/pandabro/.volare/volare/sky130/versions/dd7771c384ed36b91a25e9f8b314355fc26561be/sky130A/libs.tech/magic/sky130A.tech lef read /home/pandabro/OpenLane/designs/picorv32a/runs/RUN_2023.11.17_06.16.24/tmp/merged.nom.lef def read picorv32a.def
Floorplan results.
The decoupling capacitors and tap cells (which prevent latch-up) are placed on the layout by the tool. The physical edges are labelled based on the number of rows (the same on both sides) and the edges on the Right and Left. Refer to the diagram below.
Left _2x+1_ ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Right _x_
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
..... |
. |
. |
Left _x+1_ . | Right _0_
The image for reference.
Standard cells to be placed after placement.
Placement (Reference)
Placement involves the placing of standard cells onto the floorplan of the die/core. It occurs in 2 steps, that is, Global Placement and Detailed Placement.
Global Placement is a coarse placement of cells that considers initial timing constraints, congestion and multi-voltage variants. However, the cells are not legalised at this stage (meaning the cells are not yet aligned to the standard cell rows, are not abutted with each other [needed in case of high frequency operation] and may overlap other cells --> in short, they aren't placed perfectly). Legalisation occurs in Detailed Placement. This gives rise to new timing violations, as the positions of the cells change slightly and hence the wire lengths (capacitances) also change. These have to be optimised before moving forward.
The above image depicts a physical view of logic cells.
These cells are placed onto the core space in the following manner.
To ensure that timing is maintained, we optimise the placement. Cells are placed as close as possible to the cells and pins they connect to. In case signal integrity fails due to a large distance between cells, repeaters (buffers) are placed in the path to regenerate the signal and drive it to the respective cell. Hence area is compromised for better timing and performance.
OpenLane has congestion-aware placement using RePlAce. As the placement converges, the Half Perimeter Wire Length (HPWL) and overflow (OVFL) reduce, giving an optimal and compact placement of the standard cells.
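For reference, the half perimeter wire length of a net is half the perimeter of the smallest bounding box that encloses all of its pins, that is, the box width plus the box height. The Python sketch below shows this calculation on its own; it is not how RePlAce is implemented, and the net names and pin coordinates are hypothetical.

```python
def hpwl(pins):
    """Half perimeter of the bounding box around a net's (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Hypothetical nets with pin coordinates in arbitrary placement units.
nets = {
    "n1": [(0, 0), (3, 4), (1, 2)],  # bounding box 3 wide, 4 high
    "n2": [(2, 2), (5, 2)],          # bounding box 3 wide, 0 high
}

total_hpwl = sum(hpwl(pins) for pins in nets.values())
print(f"total HPWL = {total_hpwl}")
```

Moving connected cells closer together shrinks their bounding boxes, so a lower total HPWL is used as a cheap estimate of shorter wires.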
run_placement
Zoomed-in image of the placement.
A library file contains information about the gate functionality, dimensions, capacitance rating, timing and delay values and much more. We build, characterise and model these cells so that the tool can understand them.
It consists of 3 sections:-
Inputs
- PDKs --> files which contain information about the technology being used for your design.
- DRC & LVS --> Physical design rules that need to be met so that the foundry can fabricate the cell.
- SPICE Models --> contains characteristics of the transistors that will be used to build the cell (threshold voltage, aspect ratio, capacitances, etc).
- library and user-defined spec --> cell height (space between Vcc and Gnd rails), cell width (delay constraints, drive strength), supply voltage (noise margin), metal layer specs (specific metal layer to be used), pin location (close to Vcc or Gnd).
Design steps
- Circuit design --> The circuit is designed using the industry parameters and inputs. For instance, to model an aspect ratio of 2.5, the PMOS is sized at 2.5 times the NMOS dimensions while keeping the height constant, based on the technology file. Similarly, the switching threshold is also modelled based on the requirement.
- Layout design --> build the circuit with transistors to meet the required functionality, apply Euler's Path (unidirectional traverse only) and create the respective network graphs, then implement the stick diagram of the circuit topology. Finally, it should pass all the DRC & LVS checks set by the foundry.
- Characterization --> follows a specific flow; gives information on timing, power and noise in the form of .lib files, along with functionality.
Outputs
- Circuit Description Language (CDL)
- GDSII, LEF, extracted SPICE netlists (.cir)
- Timing, noise and power .lib files, function
- Read the SPICE Model file
- Read extracted SPICE netlist
- Recognise the behavior of the circuit design*
- Read sub-circuit of the design
- Set the Power supply
- Apply stimulus
- Provide the load capacitance (NLDM --> range of capacitances)
- Provide simulation constraints
Timing Characterisation --> Delays between the input and output waveforms (propagation delay), and rise time / fall time delays (transition delay).
Solution --> Choose the correct threshold points and use proper circuit designs to reduce the wire delays. Negative delays are intolerable.
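To make the role of the threshold points concrete, here is a minimal Python sketch that measures propagation delay and transition time on piecewise-linear waveforms. The thresholds (50% for delay, 20% and 80% for transition), the 1.8 V supply and the waveform points are all assumptions chosen for illustration; they are not values extracted from a characterisation run.

```python
def crossing_time(times, volts, level):
    """First time a piecewise-linear waveform crosses `level` (linear interpolation)."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 == v1:
            continue  # flat segment, nothing to interpolate
        if min(v0, v1) <= level <= max(v0, v1):
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


VDD = 1.8  # assumed supply voltage in volts

# Hypothetical inverter waveforms, time in ns: input rises, output falls.
in_t, in_v = [0.0, 0.1, 1.0], [0.0, VDD, VDD]
out_t, out_v = [0.0, 0.05, 0.25, 1.0], [VDD, VDD, 0.0, 0.0]

# Propagation delay: output 50% crossing minus input 50% crossing.
prop_delay = crossing_time(out_t, out_v, 0.5 * VDD) - crossing_time(in_t, in_v, 0.5 * VDD)

# Fall transition: time taken by the output to go from 80% to 20% of VDD.
fall_time = crossing_time(out_t, out_v, 0.2 * VDD) - crossing_time(out_t, out_v, 0.8 * VDD)

print(f"propagation delay = {prop_delay:.3f} ns")
print(f"fall transition   = {fall_time:.3f} ns")
```

If the measured output crossing came before the input crossing, `prop_delay` would be negative, which is the situation the threshold choice is meant to avoid.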