- 1. Introduction
- 1.1. Context, History and Outlook
- 1.2. Objectives for this tutorial, and target audience
- 1.3. What is an ISA Formal Specification?
- 1.4. How people use ISA formal specs
- 1.5. About Sail, the language in which the RISC-V ISA Formal Spec is written
- 1.6. RISC-V ISA Formal Spec Credits and Acknowledgements
- 1.7. Status and plans
- 1.8. Links, Downloading and Installing
- 2. Reading the ISA to understand semantics of each RISC-V instruction
- 2.1. A first taste
- 2.2. Spec source files
- 2.3. A word about types and type-checking
- 2.4. Preliminaries; Registers and values; “Scattered” organization of instruction specs
- 2.5. Instruction fields
- 2.6. Some base instructions
- 2.6.1. LUI and AUIPC: decoded view
- 2.6.2. LUI and AUIPC: Decoding
- 2.6.3. LUI and AUIPC: execution
- 2.6.4. LUI and AUIPC: assembly language parsing and printing
- 2.6.5. Conditional branch instructions: ast clause
- 2.6.6. Conditional branch instructions: encoding/decoding
- 2.6.7. Conditional branch instructions: execution
- 2.6.8. Conditional branch instructions: Assembly language parsing and printing
- 2.7. Taking stock
- 2.8. LOAD instructions: execution
- 2.9. Exceptions
- 3. Composing Individual Instruction Semantics into an Executable Model
- 4. Taking stock …
- 5. Executing standard RISC-V ISA Tests
- 6. Executing standard RISC-V Compliance Tests
- 7. Adding new ISA extensions to the Formal Spec
- 8. Concurrency and Connection to RISC-V Weak Memory Model
This is an introductory tutorial for the RISC-V ISA Formal Specification. The Formal Spec is intended to be the official and definitive description of the RISC-V ISA, i.e., the standard against which the functional correctness of all other implementations are measured (including hardware implementations and software simulators).
This document is in the following repository: https://github.com/rsnikhil/RISCV_ISA_Spec_Tour, in the following files:
Tutorial.adoc (Asciidoc source) Tutorial.html (generated HTML)
Please view it either by visiting the GitHub link and clicking on the
.adoc
version (GitHub will render it automatically for your
browser), or by downloading the .html
version and opening it in your
browser.
Formal specifications have a reputation of being intimidating/difficult for most people. However:
Dear Reader, it may previously have seemed abnormal,
To consider using an ISA Spec Formal;But our future RISC-V tasks loom large and can terrify,
As CPUs, compilers and IPs we design and verify;Greek symbol-laden formal specs may make us queasy,
But this one’s different; Sail’s really quite easy:Instead of passing out in shock, wheeled out on a gurney,
This tutorial, we hope, for you, starts a more pleasant journey.
If you wish to dive right into the tutorial you can skip to Section Links, Downloading and Installing.
-
The official “owner” of the RISC-V ISA specification is RISC-V International Association (formerly the RISC-V Foundation). We’ll refer to it as “RVIA” below.
-
In 2017, RVIA created a Technical Committee to develop a formal specification for the RISC-V ISA. In the meanwhile, the “specification” was written in English, and manifested in simulators such as Spike.
-
The Technical Committee met 2017-2019 and considered several approaches, and ultimately selected the Sail spec as the canonical one. This was approved by the RVIA Board in Fall 2019.
-
RVIA has stated that the Formal Spec is intended to be the official and definitive description of the RISC-V ISA, i.e., the standard against which the functional correctness of all other implementations are measured (including hardware implementations and software simulators).
-
The RISC-V ISA is evolving. The base Unprivileged and Privileged specs continue to be clarified, and many new ISA extensions are being developed (e.g., vector). This Sail Formal Spec will be continuously updated accordingly.
-
The current version of the formal spec covers the Unprivileged RV32GC and RV64GC, and Privileged ISAs. It is sufficient for simulators generated automatically from the spec to be able to boot a number of operating systems (Linux, FreeBSD, FreeRTOS, …).
-
Disclaimer: Any errors in this document are solely due to the author; we welcome feedback and suggestions for improvement.
Objectives
-
Primary objective: To give you a basic reading literacy of the RISC-V ISA Formal Specification, i.e., where you are comfortable reading and consulting the formal spec on your own on a daily basis.
-
Secondary objective: To show you how to execute the formal spec on RISC-V binaries, such as standard ISA tests, the Compliance Suite, and your own compiled programs. Typically, you’d do this to compare a RISC-V implementation (simulator, architectural model, actual hardware implementation) to see if the implementation’s behavior matches the formal spec accurately.
-
Tertiary objective: Briefly discuss:
-
how to extend the spec for a newly developed ISA extension
-
concurrency and weak memory models
-
Target audience
We aim this tutorial at working RISC-V engineers—CPU designers, compiler writers, simulator/emulator writers, etc.-- who consult the spec frequently to clarify their understanding of instruction encodings and semantics. We do not assume any prior experience with formal languages or formal methods.
Experts and Afficionados of formal methods may also find this tutorial useful as an initial familiarization with Sail and the RISC-V formal spec.
The formal spec of the RISC-V ISA is intended to be:
-
the authoritative and definitive reference for RISC-V instructions:
-
Encoding (in bits)
-
Execution semantics (what executing each instruction is supposed to do). It is intended to be more authoritative than the English prose spec (completeness, precision, unambiguity) or any other simulator.
-
-
executable: can be run as a simulator executing RISC-V binaries, providing definitive execution behaviors
-
readable and usable by, and useful to, ordinary mortals who don’t do formal stuff for a living.
-
Casual reading, as a reference guide to RISC-V instructions.
-
Executable “golden reference model” to check implementation correctness.
-
-
for those who do formal stuff for a living, usable with formal tools for proofs of correctness of compilers, CPU implementations, automatic generation of tests, test coverage, etc.
People are already using and will use the ISA formal spec in various ways.
-
As a reading reference to clarify the intended semantics of each type of instruction.
(Enabling this is the primary goal of this tutorial).
-
As a “golden reference model” against which to compare functional correctness other implementations (simulators and hardware designs). Specific examples of such usage include:
-
The RVIA official Compliance Test Suite and Compliance testing framework
-
Tandem Verification (which is a kind of dynamic instruction-by-instruction compliance testing).
-
-
In a tool to generate ISA tests automatically.
-
In a tool to measure instruction coverage automatically.
-
In a tool to formally prove a separately-written implementation correct, by directly correlating the ISA formal semantics with the semantics of the language of the implementation:
-
Simulators (written in C, C++, SystemVerilog, …)
-
Actual CPU hardware, designed in SystemVerilog, Bluespec, Chisel, …
-
-
In a tool to formally and systematically derive an implementation from the ISA formal spec using a series of derivations, each formally proved correct (“correct-by-construction implementation”).
-
… and so on.
Note
|
The name “Sail” is not an acronym; it is the same word as in “a ship’s sail”. |
The RISC-V ISA Formal Spec is written in the language Sail, which is a DSL (Domain-Specific Language) designed for purpose, i.e., for writing ISA specs. Sail has also been used to describe other ISAs, including ARMv8 (complete spec!), MIPS, parts of x86 and IBM POWER, and more. It has also been used to formalize ISA extensions for some of these ISAs.
Don’t worry, Sail is easy! The design of the Sail language was heavily informed by the style in which actual specs for many previous ISAs were written (in English, but typically in a very formal style). It is intended to be easily understandable and usable by practicing engineers, not just by experts in formal methods. Sail was created by Peter Sewell and his research group at University of Cambridge, UK.
In this tutorial we won’t study Sail separately; we’ll jump into studying the RISC-V Spec written in Sail, explaining any necessary Sail notation as we go along.
The general Sail repository (not RISC-V specific) is https://github.com/rems-project/sail, where you can find the Sail manual, a compiler to produce executable models from Sail specs, and much more.
The RISC-V spec in Sail has its own repository:
https://github.com/rems-project/sail-riscv. We will be studying the
code in the model/
directory.
The general Sail repository contains a compiler with several back ends to transform an ISA spec written in Sail into a C-based executable simulator, into an OCaml-based executable simulator, into inputs for formal environments such as Coq, Isabelle, HOL4, etc. The compiler itself is written in OCaml, a widely used general-purpose functional programming language.
The following picture (original in https://github.com/rems-project/sail) gives a sense of the Sail world:
The following publication is a standard reference for Sail:
ISA Semantics for ARMv8-A, RISC-V, and Cheri-MIPS, Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Alastair Reid, Kathryn E. Gray, Robert M. Norton, Prashanth Mundkur, Mark Wassell, Jon French, Christopher Pulte, Shaked Flur, Ian Stark, Neel Krishnaswami, Peter Sewell, in Proc. 46th ACM SIGPLAN Symp. on Principles of Programming Languages (POPL), Cascais/Lisbon, Portugal, Jan 13-19, 2019, pp. 71:1—71:31.
Authors (so far) of the RISC-V ISA Formal Spec in the Sail language:
Prashanth Mundkur, Jon French, Brian Campbell, Robert Norton-Wright, Alasdair Armstrong, Thomas Bauereiss, Shaked Flur, Christopher Pulte, Peter Sewell, Rishiyur Nikhil
This list will no doubt grow as the spec evolves, both for clarity and to include new ISA extensions.
Thanks to Alasdair Armstrong, Robert Norton-Wright, Prashant Mundkur and Peter Sewell for guidance and feedback in preparing this tutorial.
The RISC-V ISA Formal Specification in Sail currently covers the RV32GC, RV64GC Unprivileged and Privileged ISAs. The functionality is sufficient to cover many operating systems (Linux, FreeBSD, FreeRTOS, and more). The generated C-based simulators are fast enough to boot those OSs in a few seconds to minutes.
Here are some of the upcoming and future activities:
-
General accessibilty, training, tutorials so that the Formal Spec becomes routinely used by all RISC-V practising engineers.
-
Integration of the Formal Spec into the prose English spec document, so that, for each instruction described in prose, you can also see the corresponding Sail code for that instruction.
-
Formalization of “implementation options”. When comparing the functional correctness of a particular implementation (hardware or simulator) with the ISA Formal Spec, the simulator generated from the Formal Spec should be configured with the same implementation choices as the implementation. Examples of implementation choices include:
-
Whether a misaligned memory access traps or is handled directly.
-
Whether the A (accessed) and D (dirty) bits in a Page Table Entry invoke a trap or are handled directly.
-
How WARL (“Write Any, Read Legal”) bits of a CSR are actually updated.
-
and so on.
-
-
Automation and ease-of-use in use with the Compliance Test suite
-
Concurrency and tight integration with weak memory models
-
Formalization of all new official ISA extensions
This tutorial is in https://github.com/rsnikhil/RISCV_ISA_Spec_Tour.
This document is Tutorial.{adoc,html}
in that repository.
The repository also contains a document Installation.{adoc,html}
,
which describes some installation you can/should perform before
embarking on this tutorial. It describes one necessary and one
optional step:
-
Step A: sufficient for reading the ISA formal spec; just clones the RISC-V ISA Formal Spec repository https://github.com/rems-project/sail-riscv.
-
Step B: needed for creating an executable version of the spec.
Although this document is self-contained, containing code fragments,
we recommend that, in parallel, you view the actual code from the
sail-riscv
repository in a text viewer or editor. The fragments
here are excerpts, contain elisions, and cannot show their larger
context.
Each code fragment in this document shows the file from which it is taken.
Note
|
The descriptions below are based on commit 41a4072fac3e (Nov 30
2020) of the |
The central component of the formal specification of an ISA is the specification of individual instructions:
-
How each instruction is represented in bits
-
In Sail, we also specify how assembly language notation for an instruction is mapped to its bit representation
-
-
How each instruction modifies architectural state when it is executed
Surrounding these individual-instruction specs can be many possible specs for composing them into an execution engine. The simplest of these would be a completely sequential composition, a classical fetch-decode-execute loop. A more complex version would allow for more concurrent composition, for modeling superscalar and out-of-order cores and multicores with weak memory models.
The semantics of each instruction is given by an execute
instruction, a fragment of which is shown below.
sail-riscv/model/riscv_insts_base.sail
function clause execute (RTYPE(rs2, rs1, rd, op)) = { let rs1_val = X(rs1); let rs2_val = X(rs2); let result : xlenbits = match op { RISCV_ADD => rs1_val + rs2_val, RISCV_SLL => if sizeof(xlen) == 32 then rs1_val << (rs2_val[4..0]) else rs1_val << (rs2_val[5..0]), ... }; X(rd) = result; RETIRE_SUCCESS }
The function argument says that it is an “R-format” instruction
(RTYPE
) containing source register fields rs1
and rs2
,
destination register field rd
, and an op
sub-opcode identifying
the specific operation within the group of R-format instructions.
The function body shows that we:
-
read a source register
X(rs1)
, -
read a source register
X(rs2)
, -
perform the operation specified by
op
(this excerpt showing only theADD
andSLL
sub-opcodes), -
and write the result to destination register
X(rd)
.
The Sail language does not have a package/module structure—a full Sail program is just the concatenation of the source files. We organize the spec into separate files just according to our own convenience.
sail-riscv/model/
$ ls main.sail riscv_insts_cfext.sail riscv_step_ext.sail prelude_mapping.sail riscv_insts_dext.sail riscv_step_rvfi.sail prelude_mem_metadata.sail riscv_insts_end.sail riscv_step.sail prelude_mem.sail riscv_insts_fext.sail riscv_sync_exception.sail prelude.sail riscv_insts_hints.sail riscv_sys_control.sail README.md riscv_insts_mext.sail riscv_sys_exceptions.sail riscv_addr_checks_common.sail riscv_insts_next.sail riscv_sys_regs.sail riscv_addr_checks.sail riscv_insts_rmem.sail riscv_termination_common.sail riscv_analysis.sail riscv_insts_zicsr.sail riscv_termination_duo.sail riscv_csr_ext.sail riscv_iris.sail riscv_termination_rv32.sail riscv_csr_map.sail riscv_jalr_rmem.sail riscv_termination_rv64.sail riscv_decode_ext.sail riscv_jalr_seq.sail riscv_types_common.sail riscv_duopod.sail riscv_mem.sail riscv_types_ext.sail riscv_ext_regs.sail riscv_misa_ext.sail riscv_types.sail riscv_fdext_control.sail riscv_next_control.sail riscv_vmem_common.sail riscv_fdext_regs.sail riscv_next_regs.sail riscv_vmem_rv32.sail riscv_fetch_rvfi.sail riscv_pc_access.sail riscv_vmem_rv64.sail riscv_fetch.sail riscv_platform.sail riscv_vmem_sv32.sail riscv_flen_D.sail riscv_pmp_control.sail riscv_vmem_sv39.sail riscv_flen_F.sail riscv_pmp_regs.sail riscv_vmem_sv48.sail riscv_freg_type.sail riscv_pte.sail riscv_vmem_tlb.sail riscv_insts_aext.sail riscv_ptw.sail riscv_vmem_types.sail riscv_insts_base.sail riscv_regs.sail riscv_xlen32.sail riscv_insts_begin.sail riscv_reg_type.sail riscv_xlen64.sail riscv_insts_cdext.sail riscv_softfloat_interface.sail rvfi_dii.sail riscv_insts_cext.sail riscv_step_common.sail
As you can see, there are many source files (and the number will grow as we add formal specs for new standard ISA extensions).
The files named riscv_insts_*.sail
are the central files describing
individual instructions and their semantics: …_base
for the base
instruction set, …_aext
for the A (atomics) extension, …_cext
for the C (compressed) extension, etc.
Other files describe state: riscv_regs
for the integer register
file, riscv_fdext_regs
for the floating point extension (F,D)
register file, etc.
Still other files concern composing individual instruction semantics
into an execution model: riscv_fetch
, riscv_step
(fetch-execute-interrupt), etc.
In this tutorial we will look at excerpts of some of these files.
-
Sail is a strongly-typed language, and does its type-checking statically (i.e., on the source code, without running the code).
-
Many types are familiar from other languages (particularly functional programming languages): vectors, structs, algebraic types/tagged unions, …
-
Perhaps the most unfamiliar for many people will be the use of numbers as types.
-
In ISAs (unlike most software programming languages) we deal with representations (e.g., bit-vectors) of many different sizes, and the precise size is important.
-
Moreover, sizes of various entities are often related. E.g., the shift amount in RV32 should be a 5-bit value and, for RV64, a 6-bit value. In Sail, such relationships can be expressed in types, and are type-checked.
-
-
Sail also statically keeps track of effects (for example, does a certain expression read any registers? Write any registers? …). More about this later.
These two files define XLEN for RV32 and RV64, respectively.
Remember, a complete Sail program is a concatenation of .sail
files,
so we’d use one of these files, depending on whether we are
considering RV32 or RV64.
sail-riscv/model/riscv_xlen32.sail
/* Define the XLEN value for the architecture. */ type xlen : Int = 32 type xlen_bytes : Int = 4 type xlenbits = bits(xlen)
sail-riscv/model/riscv_xlen64.sail
/* Define the XLEN value for the architecture. */ type xlen : Int = 64 type xlen_bytes : Int = 8 type xlenbits = bits(xlen)
In the first two lines of each excerpt, we are defining new types that are numeric.
In the next line we are defining a new type for bit-vectors of size xlen
.
The type bits(
t )
represents the type of
bit-vectors of size t.
Its parameter t must be a numeric type (here, we instantiate it as xlen
).
sail-riscv/model/riscv_reg_type.sail
/* default register type */ type regtype = xlenbits /* default zero register */ let zero_reg : regtype = EXTZ(0x0)
In the first line we’re defining the type of values in registers; it’s the
same type as xlenbits
, which we just saw was defined as bits(xlen)
.
In the second line we’re defining a specific value of this type,
using the library function EXTZ
to zero-extend the constant 0x0
to
the appropriate length. Because of strong type-checking (including
some amount of type inference), Sail knows exactly how much extension
is needed.
Note: the keyword type
introduces a type definition, the keyword
let
introduces a value definition.
sail-riscv/model/riscv_regs.sail
register PC : xlenbits ... register x1 : regtype register x2 : regtype ... register x31 : regtype
In line 1 with keyword register
we declare PC
to be a register,
and we specify the type of values it can contain, xlenbits
. The
remaining lines similarly declare registers x1
…x31
. (There’s no
x0
register because it’s a constant 0.)
sail-riscv/model/riscv_regs.sail
val rX : forall 'n, 0 <= 'n < 32. regno('n) -> xlenbits effect {rreg, escape} function rX r = { let v : regtype = match r { 0 => zero_reg, 1 => x1, ... 31 => x31, _ => {assert(false, "invalid register number"); zero_reg} }; regval_from_reg(v) }
This defines a function rX
that takes a register number r
as
argument and returns the value contained in that register. Line 1,
introduced by the val
keyword, specifies the type of the function.
It can be read as:
For all n in the range 0..31, it takes an argument n that is a
register number, and returns a value of type xlenbits
.
Executing this function can have two possible effects, rreg
(reading
a register) and escape
(abort due to illegal register number).
The next line, introduced by the function
keyword, defines the
function rX
itself, with argument r
. Note that the argument is
not enclosed in parentheses; this is quite common in functional
languages like OCaml, SML, and Haskell.
The let
binding introduces a local variable v
and binds it to the
value of the “pattern-matching” expression in Lines 4-10. This
matches the value r
with each of the subsequent patterns 0, 1, 2,
… 31, returning the value of the right-hand side on first match.
The type of v
is regtype
, i.e., it is a register, and so in Line
11 the regval_from_reg(v)
application reads out the register value,
of type xlenbits
.
In Sail, a block is a series of expressions in in braces, and the value of the last expression is treated as the value of the whole block; here, that is also the result of the function.
Observation: Future improvements in type-checking and pattern analysis
in the Sail compiler should allow us to omit the assert
statement.
This, in turn, should allow us to omit the escape
effect.
sail-riscv/model/riscv_regs.sail
val wX : forall 'n, 0 <= 'n < 32. (regno('n), xlenbits) -> unit effect {wreg, escape} function wX (r, in_v) = { let v = regval_into_reg(in_v); match r { 0 => (), 1 => x1 = v, ... 31 => x31 = v, _ => assert(false, "invalid register number") }; }
This is similar to the rX
read-function. The function
type-declaration in line 1 says its argument is a pair of values, one
a register number and the second a value of type xlenbits
, and its
result type is unit
which is like the “void” type in C, indicating
a value of no particular interest, since this is a pure side effect.
Its effects include wreg
(writing a register) and escape
.
sail-riscv/model/riscv_regs.sail
overload X = {..., rX, wX}
This allows the notation X(r)
to be used to read a register (in
which case it invokes the function rX(r)
), and the notation X(r)=v
to write a register (in which case it invokes the function wX(r,v)
).
In a traditional programming language, we might have:
-
A type definition showing all the different variants of instructions (opcodes, register fields, immediate fields, …).
-
A decode function that describes how to take a 32-bit value into into each of the different instruction variants.
-
An execute function that describes how to execute each variant of instruction.
The problem is that for a given instruction, it would have one clause in the first group (type definitions), one clause in the second group (decode) and one clause in the third group (execute), and these may be quite far apart in the text, possibly in different files. To add a new instruction, one would have to add a clause to each of the groups.
Traditional instruction set manuals, on the other hand “scatter” this same information differently---a page (or a few) per instruction variant, showing:
-
Its fields (opcode, register fields, immediate fields, …).
-
How a 32-bit instruction is decoded/encoded.
-
How it is executed.
Sail supports this more traditional, familiar organization of ISA specs. For each type of instruction, all its relevant information is collected in one place. Or, to view it another way, the type definitions, decode information and execute information are “scattered” across instruction defintions.
We must first introduce the generic information about entities whose
individual definition-clauses will be given later in scattered
fashion. In the concatenation of .sail
files, the following
is given early before any of the scattered definitions:
sail-riscv/model/riscv_insts_begin.sail
scattered union ast /* returns whether an instruction was retired, used for computing minstret */ val execute : ast -> Retired effect {escape, wreg, rreg, wmv, wmvt, eamem, rmem, rmemt, barr, exmem, undef} scattered function execute val encdec : ast <-> bits(32) scattered mapping encdec val assembly : ast <-> string scattered mapping assembly
The first line introduces the type ast
which is a union of all the
different variants of instructions. Each variant will follow later,
in a scattered fashion. Here, ast
stands for Abstract Syntax Tree,
the decoded view of an instruction.
The next line declares the type of the execute
function. It takes
an argument whose type is ast
and returns a value of type Retired
,
which indicates whether it should be counted as a retired instruction
or not. It also specifies all the possible effects of an instruction,
such as aborting (escape
), writing and reading registers (wreg
,
rreg
), reading memory, and so on. The following line indicates that
execute
's definition will be scattered.
The next line declares the type of the encdec
mapping. The <->
notation says it is a mapping, which is a pair of functions
converting from a 32-bit value (instruction) to is decoded view (ast),
and vice versa. When applied to an ast
argument it produces a
bits(32)
result, and when applied to a bits(32)
argument it
produces an ast
result. The following line indicates that its
definition will be scattered.
The next line declares the type of the assembly
mapping that
converts from a string to a decoded instruction and vice versa, and
the following line indicates that its definition will be scattered.
The top of each page in The RISC-V Instruction Set Manual Volume I: Unprivileged ISA, Chapter 25 Instruction Set Listings shows the RISC-V instruction formats:
-
The least-significant 7 bits provide a major opcode.
-
The funct3 and funct7 fields (and sometimes the immediate fields) often specify sub-opcodes.
-
The rs1, rs2 and rd fields are 5-bit values specifying source and destination registers.
-
Immediate values are often composed from non-trivial permutation of imm instruction fields.
We declare convenient types for instruction fields.
sail-riscv/model/riscv_types.sail
type regidx = bits(5) type cregidx = bits(3) /* identifiers in RVC instructions */ type csreg = bits(12) /* CSR addressing */ ... type opcode = bits(7) type imm12 = bits(12) type imm20 = bits(20) ...
These are definitions for register indexes, register indexes in compressed instructions, CSR register addresses, major opcodes, and 12-bit and 20-bit immediates.
Earlier we declared ast
to be a union
type, i.e., a type with
several variants. We also declared that the variants would be
provided later in scattered clauses.
We now provide one of those clauses, for U-format instructions (LUI and AUIPC):
sail-riscv/model/riscv_insts_base.sail
union clause ast = UTYPE : (bits(20), regidx, uop)
This says: one variant of the ast
type is called UTYPE
. It
contains 3 fields (identified positionally, not with keywords) whose
types are, respectively, a bit-vector of 20 bits, a register index,
and a uop
which identifies whether it’s an LUI or AUIPC.
Note: Sail unions are similar to “algebraic types” or “tagged unions” in other programming languages. Each value of a tagged union carries a way (a “tag”) by which we can query which variant this value encodes.
In Sail, as is common in functional programming languages, values of union type are usually analyzed in “pattern-matching” statements, which are like case/switch statements where each clause matches a variant of the union.
Earlier, we declared a scattered mapping (a function and its inverse) encdec
along with its type:
sail-riscv/model/riscv_insts_begin.sail
val encdec : ast <-> bits(32) scattered mapping encdec
We now provide one such clause, showing how to encode/decode LUI and AUIPC instructions.
sail-riscv/model/riscv_insts_base.sail
mapping encdec_uop : uop <-> bits(7) = { RISCV_LUI <-> 0b0110111, RISCV_AUIPC <-> 0b0010111 } mapping clause encdec = UTYPE(imm, rd, op) <-> imm @ rd @ encdec_uop(op)
The first four lines define a new, local mapping between the
bit-encodings of the 7-bit opcode in a U-format instruction to a value
of uop
type, i.e., the symbolic names for the corresponding
instructions.
The last two lines add a scattered clause to the encdec
mapping.
The left-hand-side of <-> shows the decoded view, i.e., a UTYPE
ast
. The right-hand side shows a bit-concatenation. The prior
declarations allow Sail to infer that imm
, rd
, and
encddec_op(op)
are are 20-bit, 5-bit and 7-bit fields, respectively,
and that the concatenation is a 32-bit value,
Earlier, we declared a scattered function execute
and its type:
sail-riscv/model/riscv_insts_begin.sail
val execute : ast -> Retired effect {escape, wreg, rreg, wmv, wmvt, eamem, rmem, rmemt, barr, exmem, undef} scattered function execute
Here is the definition of the Retired
type:
sail-riscv/model/riscv_types.sail
enum Retired = {RETIRE_SUCCESS, RETIRE_FAIL}
Since it is a type with 2 values, we could have used the bool
type
for this, but (a) defining a new type provides more readable names,
and (b) this prevents accidental confusion of random booleans where a
Retired
value is expected.
We now provide one of the clauses for execute
, for LUI and AUIPC
instructions.
sail-riscv/model/riscv_insts_base.sail
function clause execute UTYPE(imm, rd, op) = { let off : xlenbits = EXTS(imm @ 0x000); let ret : xlenbits = match op { RISCV_LUI => off, RISCV_AUIPC => get_arch_pc() + off }; X(rd) = ret; RETIRE_SUCCESS }
In the first line, the argument to the execute
function is given as
a pattern UTYPE(imm, rd, op)
. Remember execute
can be applied
to any value of type ast
. The pattern here ensures that this clause
will only be relevant to those ast
values that are of the UTYPE
variant. On a successful match, it also binds the names imm
, rd
and op
to the three fields of the decoded instruction, so we can use
these variables in the body of the function.
Strong-typing assures us that imm
is of type bits(20)
, i.e., a
bit-vector of length 20. In Line 2, we concatenate this with the
12-bit value 0x000, giving us a 32-bit value. Then, we use EXTS
to
sign-extend it as necessary. This does nothing in RV32, since it’s
already a 32-bit value, and it sign-extends it to 64 bits in RV64.
The result is bound to the local variable off
of type xlenbits
.
The third line binds local variable ret
, of type xlenbits
, to the
right-hand side, which is a pattern-matching expression despatching on
op
. When it matches RISCV_LUI
, the value is just off
. When it
matches RISCV_AUIPC
, the value is added to get_arch_pc()
, which
retrieves the value of the program counter in the current machine
state.
The penultimate line assigns this value to register rd
, using the
overloading of X
we saw earlier.
The final line is the constant expression RETIRE_SUCCESS
. Being the
last expression in the block, and the block being the body of the
function, this is the value returned by the function. It’s type is
Retire
, as given in the execute
function’s type declaration.
We first define a mapping (function and its inverse) to convert the
sub-opcode uop
to a string and back:
sail-riscv/model/riscv_insts_base.sail
mapping utype_mnemonic : uop <-> string = { RISCV_LUI <-> "lui", RISCV_AUIPC <-> "auipc" }
Then, we add a scattered clause to our previously introduced assembly
mapping:
sail-riscv/model/riscv_insts_base.sail
mapping clause assembly = UTYPE(imm, rd, op) <-> utype_mnemonic(op) ^ spc() ^ reg_name(rd) ^ sep() ^ hex_bits_20(imm)
-
the caret operator concatenates strings;
spc()
andsep()
return strings for spaces and commas; -
reg_name(r)
returns the string name for its register-number argument; -
hex_bits_20()
returns a string showing a hex printing of a 20-bit value.
Conditional branch instructions include BEQ, BNE, BLT, BGE, BLTU, and BGEU. We define symbolic names:
sail-riscv/model/riscv_types.sail
enum bop = {RISCV_BEQ, RISCV_BNE, RISCV_BLT, RISCV_BGE, RISCV_BLTU, RISCV_BGEU} /* branch ops */
Branch instructions are encoded in the B-format:
Our abstract (decoded) ast view is:
sail-riscv/model/riscv_insts_base.sail
union clause ast = BTYPE : (bits(13), regidx, regidx, bop)
-
The branch offset immediate value is 13 bits composed from 12 bits in the instruction, with 0 appended as the least-significant bit.
-
The 12 bits come from non-contiguous 7-bit and 5-bit fields in the instruction.
-
Our ast (decoded) view holds the 13-bit offset (computed in the
encdec
function to be shown shortly).
We define a mapping converting the 3-bit funct3
field in the
instruction to its abstract names:
sail-riscv/model/riscv_insts_base.sail
mapping encdec_bop : bop <-> bits(3) = { RISCV_BEQ <-> 0b000, RISCV_BNE <-> 0b001, RISCV_BLT <-> 0b100, RISCV_BGE <-> 0b101, RISCV_BLTU <-> 0b110, RISCV_BGEU <-> 0b111 }
Then, we add a scattered clause to the encdec
mapping:
sail-riscv/model/riscv_insts_base.sail
mapping clause encdec = BTYPE(imm7_6 @ imm5_0 @ imm7_5_0 @ imm5_4_1 @ 0b0, rs2, rs1, op) <-> imm7_6 : bits(1) @ imm7_5_0 : bits(6) @ rs2 @ rs1 @ encdec_bop(op) @ imm5_4_1 : bits(4) @ imm5_0 : bits(1) @ 0b1100011
Observe the 13-bit offset is composed by extracting bits from various places in the instruction.
We add a scattered clause to the execute
function. The first part is straightforward:
sail-riscv/model/riscv_insts_base.sail
function clause execute (BTYPE(imm, rs2, rs1, op)) = { let rs1_val = X(rs1); let rs2_val = X(rs2); let taken : bool = match op { RISCV_BEQ => rs1_val == rs2_val, RISCV_BNE => rs1_val != rs2_val, RISCV_BLT => rs1_val <_s rs2_val, RISCV_BGE => rs1_val >=_s rs2_val, RISCV_BLTU => rs1_val <_u rs2_val, RISCV_BGEU => rs1_val >=_u rs2_val }; let t : xlenbits = PC + EXTS(imm); ... }
-
Line 4 computes
taken
, indicating whether the branch is taken or not. It does a pattern-match on the sub-opcodeop
. Note that BLT and BLTU are supposed to interpret their argument as signed and unsigned values, respectively. This is encoded by using different Sail pre-defined comparison operators<_s
and<_u
, respectively. -
The
let t
line computest
, the branch target of typexlenbits
by adding a sign-extension of the immediate to the PC.
The next section of the execute
function clause performs different
actions depending on whether the branch is taken or not:
sail-riscv/model/riscv_insts_base.sail
function clause execute (BTYPE(imm, rs2, rs1, op)) = { ... if taken then { ... ... } else RETIRE_SUCCESS }
If the branch is not taken, there is no further action and the result
is RETIRE_SUCCESS
.
If the branch is taken, we first check that the branch target PC is valid.
sail-riscv/model/riscv_insts_base.sail
if taken then { ... <some code elided> ... if bit_to_bool(target[1]) & (~ (haveRVC())) then { handle_mem_exception(target, E_Fetch_Addr_Align()); RETIRE_FAIL; } else { set_next_pc(target); RETIRE_SUCCESS }
-
Line 3 checks the requirement that, without the “C” ISA extension (compressed instructions), the branch target must be 4-byte aligned, i.e., bit [1] must be 0.
bit_to_bool
converts a value ofbits(1)
type tobool
type (we could have also used==1
).haveRVC
checks if the C extension is active. If the target is not ok, in the next line we invoke functionhandle_mem_exception
to perform exception actions and return failure. If the target is ok, the next line assigns the target to the next PC and we return success. -
Our
<some code elided>
on Line 2 contains additional checks for target validity that may be required by any other extensions.
We first define a mapping (function and its inverse) to convert the
sub-opcode bop
to a string and back:
sail-riscv/model/riscv_insts_base.sail
mapping btype_mnemonic : bop <-> string = { RISCV_BEQ <-> "beq", RISCV_BNE <-> "bne", RISCV_BLT <-> "blt", RISCV_BGE <-> "bge", RISCV_BLTU <-> "bltu", RISCV_BGEU <-> "bgeu" }
Then, we add a scattered clause to our previously introduced assembly
mapping:
sail-riscv/model/riscv_insts_base.sail
mapping clause assembly = BTYPE(imm, rs2, rs1, op) <-> btype_mnemonic(op) ^ spc() ^ reg_name(rs1) ^ sep() ^ reg_name(rs2) ^ sep() ^ hex_bits_13(imm)
-
the caret operator concatenates strings;
spc()
andsep()
return strings for spaces and commas; -
reg_name(r)
returns the string name for its register-number argument; -
hex_bits_13()
returns a string showing a hex printing of a 13-bit value.
The general scheme for each new instruction, or new class of instructions, should be clear by now:
-
Define an enum and mapping for any sub-opcodes in the class (if the class contains more than one instruction)
-
Augment the
ast
type by adding a scattered clause to describe this new class -
Augment the
encdec
mapping by adding a scattered clause to describe this new class -
Augment the
execute
function by adding a scattered clause to describe this new class -
Augment the
assembly
mapping by adding a scattered clause to describe this new class
It is a stylistic judgement call whether you define a class with
sub-opcodes, or just define a separate clause for each instruction in
the class. E.g., we could have defined separate ast
, encdec
,
execute
and assembly
clauses for BEQ, BNE, BLT, …
A class with sub-opcodes makes sense when the instructions share structure and semantics. For example, BEQ/BNE/BLT/… differ only in the particular comparison operator; using a class with sub-opcodes captures this similarity.
For the remaining examples we’ll focus on the execute
function only
(the encdec
and assembly
clauses are similar to the previous
examples).
Memory-access instructions involve many more steps, since they can involve alignment checks, virtual address-to-physical address translation, physical memory protection checks, ordering relationships with other memory accesses, and so on. Many of these can trap (raise an exception).
The header of the scattered clause of execute
to handle memory-load instructions is:
sail-riscv/model/riscv_insts_base.sail
function clause execute(LOAD(imm, rs1, rd, is_unsigned, width, aq, rl)) = {
The arguments are the
-
the immediate, rs1 and rd fields from the instruction;
-
whether the loaded value is treated as signed or unsigned, i.e., whether the loaded value should be sign-extended or zero-extended to the width of the destination register;
-
the width to be loaded: byte, halfword (2 bytes), word (4 bytes) or double (8 bytes);
-
the acquire/release semantics for memory ordering.
The next step is to compute the actual (virtual) address to be accessed:
sail-riscv/model/riscv_insts_base.sail
let offset : xlenbits = EXTS(imm); match ext_data_get_addr(rs1, offset, Read(Data), width) { Ext_DataAddr_Error(e) => { ext_handle_data_check_error(e); RETIRE_FAIL }, Ext_DataAddr_OK(vaddr) => ...
After computing the offset by sign-extending the immediate value, it
invokes the function ext_data_get_addr
to perform a signed addition
of the offset to the contents of rs1. This function is defined in
riscv_addr_checks.sail
. By encapsulating this addition in a
function, we allow future extensibility to new ISA extensions that may
perform additional checks/transformations on the address.
This function can return an error, but in the normal simple case
without additional ISA extensions it returns Ext_DataAddr_OK(vaddr)
containing the effective virtual address. We use pattern-matching (a
match
expression) to distinguish these two outcomes. Next:
sail-riscv/model/riscv_insts_base.sail
if check_misaligned(vaddr, width) then { handle_mem_exception(vaddr, E_Load_Addr_Align()); RETIRE_FAIL } else match translateAddr(vaddr, Read(Data)) { ...
The function check_misaligned(vaddr, width)
optionally checks if the
access is aligned for the requested width. This function is defined a
little earlier in the file and returns true it is misaligned and if
we’ve configured the model to disallow misaligned accesses. If we’ve
configured the model to allow misaligned accesses, this function will
always return False
.
If ok, it invokes translateAddr(vaddr, Read(Data)
to
optionally translate virtual addresses to physical addresses.
This function is defined in a collection of files:
riscv_vmem_types.sail, riscv_vmem_common.sail riscv_vmem_rv32.sail, riscv_vmem_sv32.sail riscv_vmem_rv64.sail, riscv_vmem_sv39.sail, riscv_vmem_sv48.sail riscv_vmem_tlb.sail
different subsets of which are used depending on whether we’re modeling RV32 or RV64, and the Sv32, Sv39 or Sv48 virtual memory schemes.
The translateAddr
function simply returns the address as-is if not
running with virtual memory.
In the virtual-memory translation functions, you’ll notice that they also model a TLB (Translation Lookaside Buffer). This is because TLBs are visible in the semantics via the SFENCE.VMA instruction.
Finally:
sail-riscv/model/riscv_insts_base.sail
else match translateAddr(vaddr, Read(Data)) { TR_Failure(e, _) => { handle_mem_exception(vaddr, e); RETIRE_FAIL }, TR_Address(addr, _) => match (width, sizeof(xlen)) { (BYTE, _) => process_load(rd, vaddr, mem_read(Read(Data), addr, 1, aq, rl, false), is_unsigned), (HALF, _) => process_load(rd, vaddr, mem_read(Read(Data), addr, 2, aq, rl, false), is_unsigned), (WORD, _) => process_load(rd, vaddr, mem_read(Read(Data), addr, 4, aq, rl, false), is_unsigned), (DOUBLE, 64) => process_load(rd, vaddr, mem_read(Read(Data), addr, 8, aq, rl, false), is_unsigned)
If the virtual-to-physical translation was successful, we invoke
mem_read
to perform the raw memory read, and pass the result to
process_load
to process the result (which could be an exception,
e.g, if there is no memory at that address).
The first three clauses of the match
expression use the wildcard
pattern _
in the second component, since these sizes are valid in
RV32 and RV64. The fourth clause will only match when the second
component is 64, i.e., it restricts it to RV64.
RISC-V has
-
interrupts (asynchronous exceptions, conceptually “between” any two instructions)
-
traps (synchronous exceptions, due to execution of an instruction)
The different kinds of interrupts are Software, Timer and External and are delivered at User, Supervisor or Machine privilege levels:
sail-riscv/model/riscv_types.sail
enum InterruptType = { I_U_Software, I_S_Software, I_M_Software, I_U_Timer, I_S_Timer, I_M_Timer, I_U_External, I_S_External, I_M_External }
This is followed by a function to convert bit-encodings to these symbolic names:
sail-riscv/model/riscv_types.sail
val interruptType_to_bits : InterruptType -> bits (8) function interruptType_to_bits (i) = match (i) { I_U_Software => 0x00, I_S_Software => 0x01, I_M_Software => 0x03, I_U_Timer => 0x04, I_S_Timer => 0x05, I_M_Timer => 0x07, I_U_External => 0x08, I_S_External => 0x09, I_M_External => 0x0b }
A mapping would be more expressive than a function, but since we don’t decode interrupts/exceptions, we don’t need the inverse function.
The different kinds of traps, and converting to bits:
sail-riscv/model/riscv_types.sail
union ExceptionType = { E_Fetch_Addr_Align : unit, E_Fetch_Access_Fault : unit, E_Illegal_Instr : unit, E_Breakpoint : unit, E_Load_Addr_Align : unit, E_Load_Access_Fault : unit, E_SAMO_Addr_Align : unit, E_SAMO_Access_Fault : unit, E_U_EnvCall : unit, E_S_EnvCall : unit, E_Reserved_10 : unit, E_M_EnvCall : unit, E_Fetch_Page_Fault : unit, E_Load_Page_Fault : unit, E_Reserved_14 : unit, E_SAMO_Page_Fault : unit } val exceptionType_to_bits : ExceptionType -> exc_code function exceptionType_to_bits(e) = match (e) { E_Fetch_Addr_Align() => 0x00, E_Fetch_Access_Fault() => 0x01, ... }
Note
|
I think this could also have been written as an enum. The
|
Some traps may carry additional information. In Sail (and OCaml),
optional information is usually expressed using the option
predefined type:
union option ('a : Type) = { Some : 'a, None : unit }
i.e., the Some
variant carries some additional information
(generic/polymorphic type 'a
), and the None
variant carries no
additional information.
sail-riscv/model/riscv_sync_exception.sail
struct sync_exception = { trap : ExceptionType, excinfo : option(xlenbits), ext : option(ext_exception) /* for extensions */ }
The trap
field is necessary information. The other two fields carry
optional information, for standard traps (such as an address that
provoked a trap), and also for future standard or non-standard ISA
extensions.
The handle_mem_exception
action function we saw earlier in
conditional branches with illegal branch targets is:
sail-riscv/model/riscv_sys_control.sail
function handle_mem_exception(addr : xlenbits, e : ExceptionType) -> unit = { let t : sync_exception = struct { trap = e, excinfo = Some(addr), ext = None() } in set_next_pc(exception_handler(cur_privilege, CTL_TRAP(t), PC)) }
The let t
line constructs a sync_exception
value, filling in
the address as optional exception info, and binds it to the local
variable t
.
The next line invokes a more general exception_handler
.
sail-riscv/model/riscv_sys_control.sail
function exception_handler(cur_priv : Privilege, ctl : ctl_result, pc: xlenbits) -> xlenbits = { match (cur_priv, ctl) { (_, CTL_TRAP(e)) => { let del_priv = exception_delegatee(e.trap, cur_priv); ... trap_handler(del_priv, false, exceptionType_to_bits(e.trap), pc, e.excinfo, e.ext) }, (_, CTL_MRET()) => { ... } (_, CTL_SRET()) => { ... } (_, CTL_URET()) => { ... } }
Line 5 checks if the current trap, at the current privilege level, is being delegated to be handled at a different privilege level (returning that privilege level or the current privilege level).
Line 7 invokes an even more general trap handler (below).
Lines 9-11 handle exception returns from the Machine, Supervisor and User privilege levels, respectively.
sail-riscv/model/riscv_sys_control.sail
function trap_handler(del_priv : Privilege, intr : bool, c : exc_code, pc : xlenbits, info : option(xlenbits), ext : option(ext_exception)) -> xlenbits = { cancel_reservation(); /* for LR/SC */ match (del_priv) { Machine => { mcause->IsInterrupt() = bool_to_bits(intr); mcause->Cause() = EXTZ(c); mstatus->MPIE() = mstatus.MIE(); mstatus->MIE() = 0b0; mstatus->MPP() = privLevel_to_bits(cur_privilege); mtval = tval(info); mepc = pc; cur_privilege = del_priv; prepare_trap_vector(del_priv, mcause) }, Supervisor => { ... } User => { ... }
This is an intricate but otherwise unremarkable assignment of certain values to certain CSRs.
The last line of the Machine
case invokes prepare_trap_vector
(in
file riscv_sys_extensions.sail
) which returns the PC that is in
mtvec
, stvec
, or utvec
, as appropriate.
So far, we’ve only talked about the decode and execute function for individual instructions. We’ve said nothing about how and when these get invoked, nor about how instructions are fetched.
This separation is deliberate. We may wish to build several different processor models: pipelined, superscalar, multi-hart, and so on. Each of these would be a different top-level system, with its own system-level semantics, but they can all share the individual instruction semantics discussed so far.
In this section we’ll sketch one such encapsulating model, which is
used in the default simulators built from the model. This model,
shown in files main.sail
and riscv_step.sail
implement a simple,
sequential, unpipelined, one-instruction-at-a-time fetch-execute loop
().
Section Concurrency and Connection to RISC-V Weak Memory Model discusses alternatives, such as concurrent composition of instruction semantics, for more accurate modeling of out-of-order processors and weak memory models.
The top-level function initializes the PC to 0x1000, initializes the model as a whole (including certain CSRs and registers), and then invokes the fetch-execute loop:
sail-riscv/model/main.sail
function main () : unit -> unit = { PC = sail_zero_extend(0x1000, sizeof(xlen)); init_model(); loop() }
Note, the 0x1000 initial value is not part of the spec, it is an
implementation choice by the “platform”. The loop()
function, in
turn, repeatedly performs a fetch-execute step:
sail-riscv/model/riscv_step.sail
function loop () : unit -> unit = { while (...) do { let stepped = step(step_no); ... } }
In each iteration of the loop, we perform a step()
:
sail-riscv/model/riscv_step.sail
function step(step_no : int) -> bool = { let (retired, stepped) : (Retired, bool) = match dispatchInterrupt(cur_privilege) { Some(intr, priv) => { handle_interrupt(intr, priv); (RETIRE_FAIL, false) }, None() => { let f : FetchResult = ext_fetch_hook(fetch()); match f { F_RVC(h) => { let ast = decodeCompressed(h); if haveRVC() then { nextPC = PC + 2; (execute(ext_post_decode_hook(ast)), true) } else { handle_illegal(); (RETIRE_FAIL, true) } }, F_Base(w) => { let ast = decode(w); nextPC = PC + 4; (execute(ext_post_decode_hook(ast)), true) } ...
The step()
function first checks for interrupts and handles it if
there is one. To check this, it will consult various CSRs including
MSTATUS, MIP, MIE.
If there is no interrupt, it fetches an instruction and decides whether its an RVC (compressed) instruction or a base instruction. In each case, it decodes it and executes it.
By this time we hope you’re getting the hang of reading the Sail code that expresses the semantics of RISC-V instructions.
Some observations:
-
In many senses, Sail is “just another” programming language. Many of its notations and features are taken from or inspired by the functional programming language OCaml (whose heritage, in turn, goes back to SML and ML).
-
Expressing the semantics of RISC-V instructions is an exercise in coding in this programming language.
-
In principle, a simulator for RISC-V written in any programming language, including C/C++, is a formalization of the RISC-V ISA. So what makes Sail a preferred “host language” for the Formal Spec?
-
The host language needs to be small, and its semantics needs to be simple, clear, and formalizable. This is true of Sail, but not true of most widely known programming languages (including C and C++).
-
Sail’s features like numeric types with type-checking, scattered definitions, mappings, bit-vectors with type-encoded lengths all make it into a DSL (Domain Specific Language) that is particularly suited for expressing ISAs.
-
The RISC-V ISA has many optional parts; Sail’s powerful parameterization is useful in expressing this cleanly.
-
RISC-V implementations can vary highly in their amount of concurrency and allowed non-determinism, from completely sequential fetch-decode-execute simulators to pipelining, speculation, superscalarity, out-of-order, multi-hart, multi-core, and weak memory models. Sail is able to express all such variants cleanly.
-
Sail’s simple, clean, semantics make it suitable for connecting to well-known formal-method tools (such as Coq, Isabelle, HOL4), for automated reasoning about properties (including correctness) of RISC-V programs and artefacts that manipulate or interpret RISC-V programs, such as hardware implementations, simulators, compilers.
-
The following diagram gives an Overview of module dependencies in the
Sail RISC-V spec. (From: doc/figs/riscvspecdeps.svg
in GitHub
repository https://github.com/rems-project/sail-riscv).
If you have not already done so, please follow both Step A and Step B
described in document Installation.html
in the repository
https://github.com/rsnikhil/RISCV_ISA_Spec_Tour
to download/build/install executable versions of the formal spec.
Step A clones the repository
https://github.com/rems-project/sail-riscv, with the Sail RISC-V
spec in the model/
directory (this is the code we’ve been studying
so far in this tutorial).
Step B takes you through these steps:
-
Install Opam, the package manager for OCaml;
-
Using Opam, install OCaml
-
Using Opam, install Sail
-
Using Ocaml and Sail, build executable versions of the Sail RISC-V spec.
As a result, you should now have the following executables in your sail-riscv repository:
$ pwd; ls c_emulator/riscv_sim_RV* /home/nikhil/git_clones/ISA_Formal_Spec/sail-riscv c_emulator/riscv_sim_RV32* c_emulator/riscv_sim_RV64*
The directory sail-riscv/test/riscv-tests/
has a full suite of
pre-compiled standard RISC-V ISA tests. Each has an ELF file (RISC-V
binary) and a disassembly (text file) of the test. Examples:
Example of ISA test ELF files (RISC-V executables) and disassembly (dump) text files:
sail-riscv/
$ ls -1 test/riscv-tests/rv64ui-p-add* test/riscv-tests/rv64ui-p-add.elf test/riscv-tests/rv64ui-p-add.dump test/riscv-tests/rv64ui-p-addi.elf test/riscv-tests/rv64ui-p-addi.dump test/riscv-tests/rv64ui-p-addiw.elf test/riscv-tests/rv64ui-p-addiw.dump test/riscv-tests/rv64ui-p-addw.elf test/riscv-tests/rv64ui-p-addw.dump
Using the C-based simulator we can execute, for example, the
rv64ui-p-add
ISA test for the ADD instruction:
sail-riscv/
$ ./c_emulator/riscv_sim_RV64 test/riscv-tests/rv64ui-p-add.elf Tue Dec 10 07:37:05 2019 ... Running file test/riscv-tests/rv64ui-p-add.elf. ELF Entry @ 0x80000000 CSR mstatus <- 0x0000000A00000000 (input: 0x0000000000000000) mem[X,0x0000000000001000] -> 0x0297 mem[X,0x0000000000001002] -> 0x0000 [0] [M]: 0x0000000000001000 (0x00000297) auipc t0, 0 ... [1] [M]: 0x0000000000001004 (0x02028593) addi a1, t0, 32 ... [2] [M]: 0x0000000000001008 (0xF1402573) csrrs a0, zero, mhartid ... [477] [M]: 0x0000000080000044 (0xFC3F2023) sw gp, 4032(t5) htif[0x0000000080001000] <- 0x00000001 htif-syscall-proxy cmd: 0x000000000001 SUCCESS
During execution of the RISC-V binary, it prints out a trace of instructions executed (PC, instruction, assembly).
Another example: the rv32um-v-mulhsu
test for the MULHSU instruction
in virtual-memory mode:
sail-riscv/
$ ./c_emulator/riscv_sim_RV32 test/riscv-tests/rv32um-v-mulhsu.elf Tue Dec 10 07:46:33 2019 Running file test/riscv-tests/rv32um-v-mulhsu.elf. ELF Entry @ 0x80000000 [0] [M]: 0x00001000 (0x00000297) auipc t0, 0 ... [20652] [S]: 0xFFC02270 (0x0106A023) sw a6, 0(a3) htif[0x80001000] <- 0x00000001 htif-syscall-proxy cmd: 0x000000000001 SUCCESS
During execution of the RISC-V binary, it prints out a trace of instructions executed (PC, instruction, assembly).
You can execute any of the *.elf
tests in directory
sail-riscv/test/riscv-tests/
in the same way.
FYI, for those who wish to explore the OCaml-based simulators and/or connections to various formal tools.
The make
command in Step B.4 of the Installation.html
document
(without the csim
argument) also makes:
-
OCaml-based executable versions of the spec, in directory
./ocaml_emulator/
. These are run in the same way as the C-based simulators of the previous examples. -
Material to connect to formal tools Coq, Isabelle, HOL4, etc. Please see documentation in the repository about these options.
RISC-V International Association (RVIA) is developing a “Compliance” suite for checking whether an implementation (whether in hardware or simulation) is compliant with the RISC-V Formal Specification. This is a work-in-progress; as of December 2020, compliance suites are ready only for some RV32I subsets of the ISA. RVIA will gradually grow the Compliance Suite to encompass the full ISA.
RVIA’s Compliance Suite is in this repository: https://github.com/riscv/riscv-compliance, where you can find details about the Compliance Suite, how to run it on new implementations, etc.
Each Compliance Test program is an ELF file compiled for a particular RISC-V ISA configuration. When the program is executed on a RISC-V implementation, it produces a certain output called a “signature”. This signature is compared for an exact match with a particular “expected value” in order to decide whether the implementation has passed this test or not. (Since
In this section we show how to execute the tests in the Compliance Suite using the C simulator compiled from the RISC-V Formal Spec in Sail. The process is highly automated, and should automatically encompass new tests as the Compliance suite grows.
First, please ensure you have downloaded/built/installed executable versions of the RISC-V Formal Spec in Sail, as described in Section Reminder: Download and Install.
Clone a copy of the RISC-V International Association’s “Compliance” repository:
$ git clone https://github.com/riscv/riscv-compliance
Set up your environment for RISC-V compiler tools gcc and friends (the Compliance scripts will use this to re-compile compliance tests).
$ export RISCV=<your toolchain_installation_dir>/riscv64 $ export PATH=$RISCV/bin:$PATH
Spot check that we’ve got the toolchain setup:
$ which riscv64-unknown-elf-gcc /home/nikhil/git_clones/RISCV_Gnu_Toolchain/riscv64/bin/riscv64-unknown-elf-gcc $ riscv64-unknown-elf-gcc --version riscv64-unknown-elf-gcc (GCC) 9.2.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Setup up your PATH
environment variable to include your
clone-directory of the sail-riscv
repository, so that the Compliance
scripts know where to find the executable versions of the Sail RISC-V
spec:
$ export SAIL_RISCV=<path to your clone of sail-riscv repository>/sail-riscv $ export PATH=$SAIL_RISCV/c_emulator:${PATH}
Spot check that you have the RISC-V Formal Spec’s C simulators:
$ which riscv_sim_RV32 riscv_sim_RV64 /home/nikhil/git_clones/ISA_Formal_Spec/sail-riscv/c_emulator/riscv_sim_RV32 /home/nikhil/git_clones/ISA_Formal_Spec/sail-riscv/c_emulator/riscv_sim_RV64
Finally, the following will execute all relevant variants of the Compliance test suite:
riscv-compliance/
$ make RISCV_TARGET=sail-riscv-c all_variant
Your terminal output will look something like this:
for isa in rv32i rv32im rv32imc rv32Zicsr rv32Zifencei; do \ ... ... Compile /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-MISALIGN_JMP-01.elf Execute /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-MISALIGN_JMP-01.log Running file /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-MISALIGN_JMP-01.elf. ELF Entry @ 0x80000000 begin_signature: 0x80002000 end_signature: 0x80002090 CSR mstatus <- 0x00000000 (input: 0x00000000) ... SUCCESS ... Compile /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-ADD-01.elf Execute /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-ADD-01.log Running file /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-ADD-01.elf. ELF Entry @ 0x80000000 begin_signature: 0x80002000 end_signature: 0x80002090 CSR mstatus <- 0x00000000 (input: 0x00000000) ... SUCCESS ... riscv-test-env/verify.sh ... Compare to reference files ... Check I-ADD-01 ... OK Check I-ADDI-01 ... OK ... Check I-XORI-01 ... OK -------------------------------- OK: 48/48 RISCV_TARGET=sail-riscv-c RISCV_DEVICE=rv32i RISCV_ISA=rv32i ...
In the transcript, you will see the results for each of the ISA groups
mentioned in the for isa in rv32i, rv32im, …
line at the top. For
each test, it compiles the ELF file and executes it on the C
simulator, and then verifies the output against the expected output.
The general scheme for adding a new instruction, or new class of instructions is (these bullets repeat information from Section Taking stock.):
-
Define an enum and mapping for any sub-opcodes in the class (if the class contains more than one instruction)
-
Augment the
ast
type by adding a scattered clause to describe this new class -
Augment the
encdec
mapping by adding a scattered clause to describe this new class -
Augment the
execute
function by adding a scattered clause to describe this new class -
Augment the
assembly
mapping by adding a scattered clause to describe this new class
It is a stylistic judgement call whether you define a class with
sub-opcodes, or just define a separate clause for each instruction in
the class. E.g., we could have defined separate ast
, encdec
,
execute
and assembly
clauses for BEQ, BNE, BLT, …
A class with sub-opcodes makes sense when the instructions share structure and semantics. For example, BEQ/BNE/BLT/… differ only in the particular comparison operator; using a class with sub-opcodes captures this similarity.
The implementation of F and D extensions (single and double-precision floating point) in the RISC-V ISA Formal Spec in Sail is somewhat atypical in that it is not currently wholly written in Sail.
-
All the RISC-V-specific parts are indeed written in Sail—everything about instruction bit representation, encoding/decoding, assembly-language mapping, floating point registers and CSRs, and opcode-based dispatching is done in the canonical way in Sail.
In this section, we only discuss this part.
-
For the actual IEEE arithmetic, we call out to a collection of standard external C routines that emulate IEEE Floating Point operations. This is the well known “Berkeley SoftFloat” emulation of IEEE FP, from https://github.com/ucb-bar/berkeley-softfloat-3.git. Berkeley Softfloat is widely used across the industry and has been well tested over many years.
In principle, this part could also be coded in Sail, but it is quite a lot of work with marginal immediate benefit, since Berkeley softfloat has been so very well tested. Coding in Sail could be a future project for some interested party.
Since F,D define a new register width FLEN
, we created two new files
by analogy with riscv_xlen32.sail
and riscv_xlen64.sail
. (This
would not be necessary for extensions that don’t define new widths):
sail-riscv/model/
riscv_flen_F.sail riscv_flen_D.sail
Since F,D define new registers, we created two new files by analogy
with riscv_reg_type.sail
and riscv_regs.sail
. (This would not be
necessary for extensions that don’t define new registers):
sail-riscv/model/
riscv_freg_type.sail riscv_fdext_regs.sail
Since F,D define new CSRs, we created a new file by analogy with
riscv_sys_control.sail
. (This would not be necessary for extensions
that don’t define new CSRs):
sail-riscv/model/
riscv_fdext_control.sail
Finally, the semantics of F and D instructions are in two new files:
sail-riscv/model/
riscv_insts_fext.sail riscv_insts_dext.sail
The following two files are for the floating point instructions in the
C
(Compressed instructions) ISA extension:
sail-riscv/model/
riscv_insts_cfext.sail riscv_insts_cdext.sail
If we study riscv_insts_fext.sail
, it starts with a number of
help-functions. Then, it goes through each class of F instruction,
with each class following the standard pattern:
-
Define an enum and mapping for any sub-opcodes in the class (if the class contains more than one instruction)
-
Augment the
ast
type by adding a scattered clause to describe this new class -
Augment the
encdec
mapping by adding a scattered clause to describe this new class -
Augment the
execute
function by adding a scattered clause to describe this new class -
Augment the
assembly
mapping by adding a scattered clause to describe this new class
For example the first such class is for Floating-point load
instructions. It starts with a scattered clause to define the ast
:
sail-riscv/model/riscv_insts_fext.sail
/* FLW and FLD; W/D is encoded in 'word_width' */ union clause ast = LOAD_FP : (bits(12), regidx, regidx, word_width)
Then, the scattered clause for the encdec
mapping (encode/decode):
sail-riscv/model/riscv_insts_fext.sail
mapping clause encdec = LOAD_FP(imm, rs1, rd, WORD) if is_RV32F_or_RV64F() <-> imm @ rs1 @ 0b010 @ rd @ 0b000_0111 if is_RV32F_or_RV64F() mapping clause encdec = LOAD_FP(imm, rs1, rd, DOUBLE) if is_RV32D_or_RV64D() <-> imm @ rs1 @ 0b011 @ rd @ 0b000_0111 if is_RV32D_or_RV64D()
Then, the scattered clause for the execute
function (semantics):
sail-riscv/model/riscv_insts_fext.sail
function clause execute(LOAD_FP(imm, rs1, rd, width)) = { let offset : xlenbits = EXTS(imm); ...
And, finally, the scattered clause for the assembly
notation:
sail-riscv/model/riscv_insts_fext.sail
mapping clause assembly = LOAD_FP(imm, rs1, rd, width) <-> "fl" ...
The Sail files described in the previous sections are enough for reading and understanding the spec.
In order to incorporate it into the generated simulators (C-based and
OCaml based), we also need to update the Makefile. They following display
shows the key lines updated or added to the Makefile (the lines
for F,D are marked here with +
on the left):
sail-riscv/Makefile
+ # Currently, we only have F with RV32, and both F and D with RV64. ifeq ($(ARCH),RV32) SAIL_XLEN := riscv_xlen32.sail + SAIL_FLEN := riscv_flen_F.sail else ifeq ($(ARCH),RV64) SAIL_XLEN := riscv_xlen64.sail + SAIL_FLEN := riscv_flen_D.sail else $(error '$(ARCH)' is not a valid architecture, must be one of: RV32, RV64) endif ... SAIL_DEFAULT_INST += riscv_insts_fext.sail riscv_insts_cfext.sail + ifeq ($(ARCH),RV64) + SAIL_DEFAULT_INST += riscv_insts_dext.sail riscv_insts_cdext.sail + endif ... + SAIL_SYS_SRCS += riscv_softfloat_interface.sail riscv_fdext_regs.sail riscv_fdext_control.sail ... + PRELUDE = prelude.sail prelude_mapping.sail $(SAIL_XLEN) $(SAIL_FLEN) ... ... + SAIL_REGS_SRCS = riscv_reg_type.sail riscv_freg_type.sail riscv_regs.sail ... ... C_INCS = $(addprefix c_emulator/,riscv_prelude.h riscv_platform_impl.h riscv_platform.h riscv_softfloat.h) C_SRCS = $(addprefix c_emulator/,riscv_prelude.c riscv_platform_impl.c riscv_platform.c riscv_softfloat.c riscv_sim.c)
The lines involving softfloat
are the links to the external Berkeley
Softfloat library; such lines will not be necessary for any ISA
extension coded wholly in Sail.
-
In this introductory tutorial we deliberately stayed away from questions of concurrency, which is a more advanced topic.
-
Each instruction semantics can be regarded as a small sequential thread performing that instruction’s semantics. There are various “events” during this thread’s progress.
-
In our simple one-instruction-at-a-time fetch-execute loop model shown in this tutorial (and in the default simulators built), these threads are simply concatenated into an overall single sequential thread.
-
A parallel model can overlap these threads:
-
a pipeline model may launch the next instruction’s thread before the current one has finished; in fact can launch it speculatively based only on PC of previous instruction;
-
a superscalar model may launch two or more of these threads together;
-
an out-of-order model may have many of these threads running concurrently;
-
register read/write events can model renamed registers;
-
memory address/read/write events can interact with a model of weakly ordered memory;
-
and so on.
-
-
The RMEM concurrency tool is meant for these purposes (https://github.com/rems-project/rmem)
-
RISC-V’s Weak Memory Model was developed by a separate RISC-V Foundation Technical Group, chaired by Dan Lustig of NVidia.
-
One of their formalizations was indeed using these Sail and RMEM system and models. As mentioned in the previous slide, this uses a concurrent fetch-execute model where multiple instructions may be in flight concurrently, with concurrent interactions with the weak memory model.
-
(These should be covered in another, more advanced tutorial.)