A Tour of the RISC-V ISA Formal Specification

Contents

1. Introduction
2. Reading the ISA to understand semantics of each RISC-V instruction
3. Composing Individual Instruction Semantics into an Executable Model
4. Taking stock …
- 4.1. Overview of spec module dependencies
5. Executing standard RISC-V ISA Tests
- 5.1. Reminder: Download and Install
- 5.2. Executing standard RISC-V ISA tests using the C simulator
  - 5.2.1. Executing standard RISC-V ISA tests using the OCaml simulator
6. Executing standard RISC-V Compliance Tests
- 6.1. Executing Compliance tests: preparation
- 6.2. Executing Compliance tests
7. Adding new ISA extensions to the Formal Spec
- 7.1. Case study: adding ISA Extensions F and D (single and double-precision floating point)
8. Concurrency and Connection to RISC-V Weak Memory Model
- 8.1. Concurrency
- 8.2. Connection to official RISC-V Memory Model

This is an introductory tutorial for the RISC-V ISA Formal Specification. The Formal Spec is intended to be the official and definitive description of the RISC-V ISA, i.e., the standard against which the functional correctness of all other implementations are measured (including hardware implementations and software simulators).

This document is in the following repository: https://github.com/rsnikhil/RISCV_ISA_Spec_Tour, in the following files:

    Tutorial.adoc    (Asciidoc source)
    Tutorial.html    (generated HTML)

Please view it either by visiting the GitHub link and clicking on the .adoc version (GitHub will render it automatically for your browser), or by downloading the .html version and opening it in your browser.

Formal specifications have a reputation of being intimidating/difficult for most people. However:

Dear Reader, it may previously have seemed abnormal,
To consider using an ISA Spec Formal;

But our future RISC-V tasks loom large and can terrify,
As CPUs, compilers and IPs we design and verify;

Greek symbol-laden formal specs may make us queasy,
But this one’s different; Sail’s really quite easy:

Instead of passing out in shock, wheeled out on a gurney,
This tutorial, we hope, for you, starts a more pleasant journey.

If you wish to dive right into the tutorial you can skip to Section Links, Downloading and Installing.

1. Introduction

1.1. Context, History and Outlook

The official “owner” of the RISC-V ISA specification is RISC-V International Association (formerly the RISC-V Foundation). We’ll refer to it as “RVIA” below.
In 2017, RVIA created a Technical Committee to develop a formal specification for the RISC-V ISA. In the meanwhile, the “specification” was written in English, and manifested in simulators such as Spike.
The Technical Committee met 2017-2019 and considered several approaches, and ultimately selected the Sail spec as the canonical one. This was approved by the RVIA Board in Fall 2019.
RVIA has stated that the Formal Spec is intended to be the official and definitive description of the RISC-V ISA, i.e., the standard against which the functional correctness of all other implementations are measured (including hardware implementations and software simulators).
The RISC-V ISA is evolving. The base Unprivileged and Privileged specs continue to be clarified, and many new ISA extensions are being developed (e.g., vector). This Sail Formal Spec will be continuously updated accordingly.
- The current version of the formal spec covers the Unprivileged RV32GC and RV64GC, and Privileged ISAs. It is sufficient for simulators generated automatically from the spec to be able to boot a number of operating systems (Linux, FreeBSD, FreeRTOS, …).

Disclaimer: Any errors in this document are solely due to the author; we welcome feedback and suggestions for improvement.

1.2. Objectives for this tutorial, and target audience

Objectives

Primary objective: To give you a basic reading literacy of the RISC-V ISA Formal Specification, i.e., where you are comfortable reading and consulting the formal spec on your own on a daily basis.
Secondary objective: To show you how to execute the formal spec on RISC-V binaries, such as standard ISA tests, the Compliance Suite, and your own compiled programs. Typically, you’d do this to compare a RISC-V implementation (simulator, architectural model, actual hardware implementation) to see if the implementation’s behavior matches the formal spec accurately.
Tertiary objective: Briefly discuss:
- how to extend the spec for a newly developed ISA extension
- concurrency and weak memory models

Target audience

We aim this tutorial at working RISC-V engineers—CPU designers, compiler writers, simulator/emulator writers, etc.-- who consult the spec frequently to clarify their understanding of instruction encodings and semantics. We do not assume any prior experience with formal languages or formal methods.

Experts and Afficionados of formal methods may also find this tutorial useful as an initial familiarization with Sail and the RISC-V formal spec.

1.3. What is an ISA Formal Specification?

The formal spec of the RISC-V ISA is intended to be:

the authoritative and definitive reference for RISC-V instructions:
- Encoding (in bits)
- Execution semantics (what executing each instruction is supposed to do). It is intended to be more authoritative than the English prose spec (completeness, precision, unambiguity) or any other simulator.
executable: can be run as a simulator executing RISC-V binaries, providing definitive execution behaviors
readable and usable by, and useful to, ordinary mortals who don’t do formal stuff for a living.
- Casual reading, as a reference guide to RISC-V instructions.
- Executable “golden reference model” to check implementation correctness.
for those who do formal stuff for a living, usable with formal tools for proofs of correctness of compilers, CPU implementations, automatic generation of tests, test coverage, etc.

1.4. How people use ISA formal specs

People are already using and will use the ISA formal spec in various ways.

As a reading reference to clarify the intended semantics of each type of instruction.

(Enabling this is the primary goal of this tutorial).
As a “golden reference model” against which to compare functional correctness other implementations (simulators and hardware designs). Specific examples of such usage include:
- The RVIA official Compliance Test Suite and Compliance testing framework
- Tandem Verification (which is a kind of dynamic instruction-by-instruction compliance testing).
In a tool to generate ISA tests automatically.
In a tool to measure instruction coverage automatically.
In a tool to formally prove a separately-written implementation correct, by directly correlating the ISA formal semantics with the semantics of the language of the implementation:
- Simulators (written in C, C++, SystemVerilog, …)
- Actual CPU hardware, designed in SystemVerilog, Bluespec, Chisel, …
In a tool to formally and systematically derive an implementation from the ISA formal spec using a series of derivations, each formally proved correct (“correct-by-construction implementation”).
… and so on.

1.5. About Sail, the language in which the RISC-V ISA Formal Spec is written

Note	The name “Sail” is not an acronym; it is the same word as in “a ship’s sail”.

The RISC-V ISA Formal Spec is written in the language Sail, which is a DSL (Domain-Specific Language) designed for purpose, i.e., for writing ISA specs. Sail has also been used to describe other ISAs, including ARMv8 (complete spec!), MIPS, parts of x86 and IBM POWER, and more. It has also been used to formalize ISA extensions for some of these ISAs.

Don’t worry, Sail is easy! The design of the Sail language was heavily informed by the style in which actual specs for many previous ISAs were written (in English, but typically in a very formal style). It is intended to be easily understandable and usable by practicing engineers, not just by experts in formal methods. Sail was created by Peter Sewell and his research group at University of Cambridge, UK.

In this tutorial we won’t study Sail separately; we’ll jump into studying the RISC-V Spec written in Sail, explaining any necessary Sail notation as we go along.

The general Sail repository (not RISC-V specific) is https://github.com/rems-project/sail, where you can find the Sail manual, a compiler to produce executable models from Sail specs, and much more.

The RISC-V spec in Sail has its own repository: https://github.com/rems-project/sail-riscv. We will be studying the code in the model/ directory.

The general Sail repository contains a compiler with several back ends to transform an ISA spec written in Sail into a C-based executable simulator, into an OCaml-based executable simulator, into inputs for formal environments such as Coq, Isabelle, HOL4, etc. The compiler itself is written in OCaml, a widely used general-purpose functional programming language.

The following picture (original in https://github.com/rems-project/sail) gives a sense of the Sail world:

The following publication is a standard reference for Sail:

ISA Semantics for ARMv8-A, RISC-V, and Cheri-MIPS, Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Alastair Reid, Kathryn E. Gray, Robert M. Norton, Prashanth Mundkur, Mark Wassell, Jon French, Christopher Pulte, Shaked Flur, Ian Stark, Neel Krishnaswami, Peter Sewell, in Proc. 46th ACM SIGPLAN Symp. on Principles of Programming Languages (POPL), Cascais/Lisbon, Portugal, Jan 13-19, 2019, pp. 71:1—71:31.

1.6. RISC-V ISA Formal Spec Credits and Acknowledgements

Authors (so far) of the RISC-V ISA Formal Spec in the Sail language:

Prashanth Mundkur, Jon French, Brian Campbell, Robert Norton-Wright, Alasdair Armstrong, Thomas Bauereiss, Shaked Flur, Christopher Pulte, Peter Sewell, Rishiyur Nikhil

This list will no doubt grow as the spec evolves, both for clarity and to include new ISA extensions.

Thanks to Alasdair Armstrong, Robert Norton-Wright, Prashant Mundkur and Peter Sewell for guidance and feedback in preparing this tutorial.

1.7. Status and plans

The RISC-V ISA Formal Specification in Sail currently covers the RV32GC, RV64GC Unprivileged and Privileged ISAs. The functionality is sufficient to cover many operating systems (Linux, FreeBSD, FreeRTOS, and more). The generated C-based simulators are fast enough to boot those OSs in a few seconds to minutes.

Here are some of the upcoming and future activities:

General accessibilty, training, tutorials so that the Formal Spec becomes routinely used by all RISC-V practising engineers.
Integration of the Formal Spec into the prose English spec document, so that, for each instruction described in prose, you can also see the corresponding Sail code for that instruction.
Formalization of “implementation options”. When comparing the functional correctness of a particular implementation (hardware or simulator) with the ISA Formal Spec, the simulator generated from the Formal Spec should be configured with the same implementation choices as the implementation. Examples of implementation choices include:
- Whether a misaligned memory access traps or is handled directly.
- Whether the A (accessed) and D (dirty) bits in a Page Table Entry invoke a trap or are handled directly.
- How WARL (“Write Any, Read Legal”) bits of a CSR are actually updated.
- and so on.
Automation and ease-of-use in use with the Compliance Test suite
Concurrency and tight integration with weak memory models
Formalization of all new official ISA extensions

1.8. Links, Downloading and Installing

This tutorial is in https://github.com/rsnikhil/RISCV_ISA_Spec_Tour.

This document is Tutorial.{adoc,html} in that repository.

The repository also contains a document Installation.{adoc,html}, which describes some installation you can/should perform before embarking on this tutorial. It describes one necessary and one optional step:

Step A: sufficient for reading the ISA formal spec; just clones the RISC-V ISA Formal Spec repository https://github.com/rems-project/sail-riscv.
Step B: needed for creating an executable version of the spec.

Although this document is self-contained, containing code fragments, we recommend that, in parallel, you view the actual code from the sail-riscv repository in a text viewer or editor. The fragments here are excerpts, contain elisions, and cannot show their larger context.

Each code fragment in this document shows the file from which it is taken.

2. Reading the ISA to understand semantics of each RISC-V instruction

Note	The descriptions below are based on commit 41a4072fac3e (Nov 30 2020) of the `sail-riscv` repository.

The central component of the formal specification of an ISA is the specification of individual instructions:

How each instruction is represented in bits
- In Sail, we also specify how assembly language notation for an instruction is mapped to its bit representation
How each instruction modifies architectural state when it is executed

Surrounding these individual-instruction specs can be many possible specs for composing them into an execution engine. The simplest of these would be a completely sequential composition, a classical fetch-decode-execute loop. A more complex version would allow for more concurrent composition, for modeling superscalar and out-of-order cores and multicores with weak memory models.

2.1. A first taste

The semantics of each instruction is given by an execute instruction, a fragment of which is shown below.

From file sail-riscv/model/riscv_insts_base.sail

function clause execute (RTYPE(rs2, rs1, rd, op)) = {
  let rs1_val = X(rs1);
  let rs2_val = X(rs2);
  let result : xlenbits = match op {
    RISCV_ADD  => rs1_val + rs2_val,
    RISCV_SLL  => if   sizeof(xlen) == 32
                  then rs1_val << (rs2_val[4..0])
                  else rs1_val << (rs2_val[5..0]),
    ... };
  X(rd) = result;
  RETIRE_SUCCESS
}

The function argument says that it is an “R-format” instruction (RTYPE) containing source register fields rs1 and rs2, destination register field rd, and an op sub-opcode identifying the specific operation within the group of R-format instructions.

The function body shows that we:

read a source register X(rs1),
read a source register X(rs2),
perform the operation specified by op (this excerpt showing only the ADD and SLL sub-opcodes),
and write the result to destination register X(rd).

2.2. Spec source files

The Sail language does not have a package/module structure—a full Sail program is just the concatenation of the source files. We organize the spec into separate files just according to our own convenience.

In sail-riscv/model/

$ ls
main.sail		       riscv_insts_cfext.sail	       riscv_step_ext.sail
prelude_mapping.sail	       riscv_insts_dext.sail	       riscv_step_rvfi.sail
prelude_mem_metadata.sail      riscv_insts_end.sail	       riscv_step.sail
prelude_mem.sail	       riscv_insts_fext.sail	       riscv_sync_exception.sail
prelude.sail		       riscv_insts_hints.sail	       riscv_sys_control.sail
README.md		       riscv_insts_mext.sail	       riscv_sys_exceptions.sail
riscv_addr_checks_common.sail  riscv_insts_next.sail	       riscv_sys_regs.sail
riscv_addr_checks.sail	       riscv_insts_rmem.sail	       riscv_termination_common.sail
riscv_analysis.sail	       riscv_insts_zicsr.sail	       riscv_termination_duo.sail
riscv_csr_ext.sail	       riscv_iris.sail		       riscv_termination_rv32.sail
riscv_csr_map.sail	       riscv_jalr_rmem.sail	       riscv_termination_rv64.sail
riscv_decode_ext.sail	       riscv_jalr_seq.sail	       riscv_types_common.sail
riscv_duopod.sail	       riscv_mem.sail		       riscv_types_ext.sail
riscv_ext_regs.sail	       riscv_misa_ext.sail	       riscv_types.sail
riscv_fdext_control.sail       riscv_next_control.sail	       riscv_vmem_common.sail
riscv_fdext_regs.sail	       riscv_next_regs.sail	       riscv_vmem_rv32.sail
riscv_fetch_rvfi.sail	       riscv_pc_access.sail	       riscv_vmem_rv64.sail
riscv_fetch.sail	       riscv_platform.sail	       riscv_vmem_sv32.sail
riscv_flen_D.sail	       riscv_pmp_control.sail	       riscv_vmem_sv39.sail
riscv_flen_F.sail	       riscv_pmp_regs.sail	       riscv_vmem_sv48.sail
riscv_freg_type.sail	       riscv_pte.sail		       riscv_vmem_tlb.sail
riscv_insts_aext.sail	       riscv_ptw.sail		       riscv_vmem_types.sail
riscv_insts_base.sail	       riscv_regs.sail		       riscv_xlen32.sail
riscv_insts_begin.sail	       riscv_reg_type.sail	       riscv_xlen64.sail
riscv_insts_cdext.sail	       riscv_softfloat_interface.sail  rvfi_dii.sail
riscv_insts_cext.sail	       riscv_step_common.sail

As you can see, there are many source files (and the number will grow as we add formal specs for new standard ISA extensions).

The files named riscv_insts_*.sail are the central files describing individual instructions and their semantics: …_base for the base instruction set, …_aext for the A (atomics) extension, …_cext for the C (compressed) extension, etc.

Other files describe state: riscv_regs for the integer register file, riscv_fdext_regs for the floating point extension (F,D) register file, etc.

Still other files concern composing individual instruction semantics into an execution model: riscv_fetch, riscv_step (fetch-execute-interrupt), etc.

In this tutorial we will look at excerpts of some of these files.

2.3. A word about types and type-checking

Sail is a strongly-typed language, and does its type-checking statically (i.e., on the source code, without running the code).
Many types are familiar from other languages (particularly functional programming languages): vectors, structs, algebraic types/tagged unions, …
Perhaps the most unfamiliar for many people will be the use of numbers as types.
- In ISAs (unlike most software programming languages) we deal with representations (e.g., bit-vectors) of many different sizes, and the precise size is important.
- Moreover, sizes of various entities are often related. E.g., the shift amount in RV32 should be a 5-bit value and, for RV64, a 6-bit value. In Sail, such relationships can be expressed in types, and are type-checked.
Sail also statically keeps track of effects (for example, does a certain expression read any registers? Write any registers? …). More about this later.

2.4. Preliminaries; Registers and values; “Scattered” organization of instruction specs

2.4.1. Let’s start with some small, easy files

These two files define XLEN for RV32 and RV64, respectively. Remember, a complete Sail program is a concatenation of .sail files, so we’d use one of these files, depending on whether we are considering RV32 or RV64.

From file sail-riscv/model/riscv_xlen32.sail

    /* Define the XLEN value for the architecture. */

    type xlen       : Int = 32
    type xlen_bytes : Int = 4
    type xlenbits         = bits(xlen)

From file sail-riscv/model/riscv_xlen64.sail

    /* Define the XLEN value for the architecture. */

    type xlen       : Int = 64
    type xlen_bytes : Int = 8
    type xlenbits         = bits(xlen)

In the first two lines of each excerpt, we are defining new types that are numeric.

In the next line we are defining a new type for bit-vectors of size xlen. The type bits( t ) represents the type of bit-vectors of size t. Its parameter t must be a numeric type (here, we instantiate it as xlen).

2.4.2. Registers and values

Values in integer registers

From file sail-riscv/model/riscv_reg_type.sail

    /* default register type */
    type regtype = xlenbits

    /* default zero register */
    let zero_reg : regtype = EXTZ(0x0)

In the first line we’re defining the type of values in registers; it’s the same type as xlenbits, which we just saw was defined as bits(xlen).

In the second line we’re defining a specific value of this type, using the library function EXTZ to zero-extend the constant 0x0 to the appropriate length. Because of strong type-checking (including some amount of type inference), Sail knows exactly how much extension is needed.

Note: the keyword type introduces a type definition, the keyword let introduces a value definition.

Integer registers

From file sail-riscv/model/riscv_regs.sail

    register PC       : xlenbits
    ...
    register x1  : regtype
    register x2  : regtype
    ...
    register x31 : regtype

In line 1 with keyword register we declare PC to be a register, and we specify the type of values it can contain, xlenbits. The remaining lines similarly declare registers x1…x31. (There’s no x0 register because it’s a constant 0.)

Reading from integer registers

From file sail-riscv/model/riscv_regs.sail

    val rX : forall 'n, 0 <= 'n < 32. regno('n) -> xlenbits effect {rreg, escape}
    function rX r = {
      let v : regtype =
        match r {
          0 => zero_reg,
          1 => x1,
          ...
          31 => x31,
          _  => {assert(false, "invalid register number"); zero_reg}
        };
      regval_from_reg(v)
    }

This defines a function rX that takes a register number r as argument and returns the value contained in that register. Line 1, introduced by the val keyword, specifies the type of the function. It can be read as:

For all n in the range 0..31, it takes an argument n that is a register number, and returns a value of type xlenbits. Executing this function can have two possible effects, rreg (reading a register) and escape (abort due to illegal register number).

The next line, introduced by the function keyword, defines the function rX itself, with argument r. Note that the argument is not enclosed in parentheses; this is quite common in functional languages like OCaml, SML, and Haskell.

The let binding introduces a local variable v and binds it to the value of the “pattern-matching” expression in Lines 4-10. This matches the value r with each of the subsequent patterns 0, 1, 2, … 31, returning the value of the right-hand side on first match.

The type of v is regtype, i.e., it is a register, and so in Line 11 the regval_from_reg(v) application reads out the register value, of type xlenbits.

In Sail, a block is a series of expressions in in braces, and the value of the last expression is treated as the value of the whole block; here, that is also the result of the function.

Observation: Future improvements in type-checking and pattern analysis in the Sail compiler should allow us to omit the assert statement. This, in turn, should allow us to omit the escape effect.

Writing integer registers

From file sail-riscv/model/riscv_regs.sail

    val wX : forall 'n, 0 <= 'n < 32. (regno('n), xlenbits) -> unit effect {wreg, escape}
    function wX (r, in_v) = {
      let v = regval_into_reg(in_v);
      match r {
        0  => (),
        1  => x1 = v,
        ...
        31 => x31 = v,
        _  => assert(false, "invalid register number")
      };
    }

This is similar to the rX read-function. The function type-declaration in line 1 says its argument is a pair of values, one a register number and the second a value of type xlenbits, and its result type is unit which is like the “void” type in C, indicating a value of no particular interest, since this is a pure side effect. Its effects include wreg (writing a register) and escape.

Using Sail overloading to simplify notation

From file sail-riscv/model/riscv_regs.sail

    overload X = {..., rX, wX}

This allows the notation X(r) to be used to read a register (in which case it invokes the function rX(r)), and the notation X(r)=v to write a register (in which case it invokes the function wX(r,v)).

2.4.3. “Scattered” organization of instruction specs

In a traditional programming language, we might have:

A type definition showing all the different variants of instructions (opcodes, register fields, immediate fields, …).
A decode function that describes how to take a 32-bit value into into each of the different instruction variants.
An execute function that describes how to execute each variant of instruction.

The problem is that for a given instruction, it would have one clause in the first group (type definitions), one clause in the second group (decode) and one clause in the third group (execute), and these may be quite far apart in the text, possibly in different files. To add a new instruction, one would have to add a clause to each of the groups.

Traditional instruction set manuals, on the other hand “scatter” this same information differently---a page (or a few) per instruction variant, showing:

Its fields (opcode, register fields, immediate fields, …).
How a 32-bit instruction is decoded/encoded.
How it is executed.

Sail supports this more traditional, familiar organization of ISA specs. For each type of instruction, all its relevant information is collected in one place. Or, to view it another way, the type definitions, decode information and execute information are “scattered” across instruction defintions.

We must first introduce the generic information about entities whose individual definition-clauses will be given later in scattered fashion. In the concatenation of .sail files, the following is given early before any of the scattered definitions:

From file sail-riscv/model/riscv_insts_begin.sail

    scattered union ast

    /* returns whether an instruction was retired, used for computing minstret */
    val execute : ast -> Retired effect {escape, wreg, rreg, wmv, wmvt, eamem,
                                         rmem, rmemt, barr, exmem, undef}
    scattered function execute

    val encdec : ast <-> bits(32)
    scattered mapping encdec

    val assembly : ast <-> string
    scattered mapping assembly

The first line introduces the type ast which is a union of all the different variants of instructions. Each variant will follow later, in a scattered fashion. Here, ast stands for Abstract Syntax Tree, the decoded view of an instruction.

The next line declares the type of the execute function. It takes an argument whose type is ast and returns a value of type Retired, which indicates whether it should be counted as a retired instruction or not. It also specifies all the possible effects of an instruction, such as aborting (escape), writing and reading registers (wreg, rreg), reading memory, and so on. The following line indicates that execute 's definition will be scattered.

The next line declares the type of the encdec mapping. The <-> notation says it is a mapping, which is a pair of functions converting from a 32-bit value (instruction) to is decoded view (ast), and vice versa. When applied to an ast argument it produces a bits(32) result, and when applied to a bits(32) argument it produces an ast result. The following line indicates that its definition will be scattered.

The next line declares the type of the assembly mapping that converts from a string to a decoded instruction and vice versa, and the following line indicates that its definition will be scattered.

2.5. Instruction fields

2.5.1. Type definitions for instruction fields

The top of each page in The RISC-V Instruction Set Manual Volume I: Unprivileged ISA, Chapter 25 Instruction Set Listings shows the RISC-V instruction formats:

The least-significant 7 bits provide a major opcode.
The funct3 and funct7 fields (and sometimes the immediate fields) often specify sub-opcodes.
The rs1, rs2 and rd fields are 5-bit values specifying source and destination registers.
Immediate values are often composed from non-trivial permutation of imm instruction fields.

We declare convenient types for instruction fields.

From file sail-riscv/model/riscv_types.sail

    type regidx  = bits(5)
    type cregidx = bits(3)    /* identifiers in RVC instructions */
    type csreg   = bits(12)   /* CSR addressing */
    ...
    type opcode = bits(7)
    type imm12  = bits(12)
    type imm20  = bits(20)
    ...

These are definitions for register indexes, register indexes in compressed instructions, CSR register addresses, major opcodes, and 12-bit and 20-bit immediates.

2.6. Some base instructions

2.6.1. LUI and AUIPC: decoded view

Earlier we declared ast to be a union type, i.e., a type with several variants. We also declared that the variants would be provided later in scattered clauses.

We now provide one of those clauses, for U-format instructions (LUI and AUIPC):

From file sail-riscv/model/riscv_insts_base.sail

    union clause ast = UTYPE : (bits(20), regidx, uop)

This says: one variant of the ast type is called UTYPE. It contains 3 fields (identified positionally, not with keywords) whose types are, respectively, a bit-vector of 20 bits, a register index, and a uop which identifies whether it’s an LUI or AUIPC.

Note: Sail unions are similar to “algebraic types” or “tagged unions” in other programming languages. Each value of a tagged union carries a way (a “tag”) by which we can query which variant this value encodes.

In Sail, as is common in functional programming languages, values of union type are usually analyzed in “pattern-matching” statements, which are like case/switch statements where each clause matches a variant of the union.

2.6.2. LUI and AUIPC: Decoding

Earlier, we declared a scattered mapping (a function and its inverse) encdec along with its type:

From file sail-riscv/model/riscv_insts_begin.sail

    val encdec : ast <-> bits(32)
    scattered mapping encdec

We now provide one such clause, showing how to encode/decode LUI and AUIPC instructions.

From file sail-riscv/model/riscv_insts_base.sail

    mapping encdec_uop : uop <-> bits(7) = {
      RISCV_LUI   <-> 0b0110111,
      RISCV_AUIPC <-> 0b0010111
    }

    mapping clause encdec = UTYPE(imm, rd, op)
      <-> imm @ rd @ encdec_uop(op)

The first four lines define a new, local mapping between the bit-encodings of the 7-bit opcode in a U-format instruction to a value of uop type, i.e., the symbolic names for the corresponding instructions.

The last two lines add a scattered clause to the encdec mapping. The left-hand-side of <-> shows the decoded view, i.e., a UTYPE ast. The right-hand side shows a bit-concatenation. The prior declarations allow Sail to infer that imm, rd, and encddec_op(op) are are 20-bit, 5-bit and 7-bit fields, respectively, and that the concatenation is a 32-bit value,

2.6.3. LUI and AUIPC: execution

Earlier, we declared a scattered function execute and its type:

From file sail-riscv/model/riscv_insts_begin.sail

    val execute : ast -> Retired effect {escape, wreg, rreg, wmv, wmvt, eamem,
                                         rmem, rmemt, barr, exmem, undef}
    scattered function execute

Here is the definition of the Retired type:

From file sail-riscv/model/riscv_types.sail

    enum Retired = {RETIRE_SUCCESS, RETIRE_FAIL}

Since it is a type with 2 values, we could have used the bool type for this, but (a) defining a new type provides more readable names, and (b) this prevents accidental confusion of random booleans where a Retired value is expected.

We now provide one of the clauses for execute, for LUI and AUIPC instructions.

From file sail-riscv/model/riscv_insts_base.sail

    function clause execute UTYPE(imm, rd, op) = {
      let off : xlenbits = EXTS(imm @ 0x000);
      let ret : xlenbits = match op {
        RISCV_LUI   => off,
        RISCV_AUIPC => get_arch_pc() + off
      };
      X(rd) = ret;
      RETIRE_SUCCESS
    }

In the first line, the argument to the execute function is given as a pattern UTYPE(imm, rd, op). Remember execute can be applied to any value of type ast. The pattern here ensures that this clause will only be relevant to those ast values that are of the UTYPE variant. On a successful match, it also binds the names imm, rd and op to the three fields of the decoded instruction, so we can use these variables in the body of the function.

Strong-typing assures us that imm is of type bits(20), i.e., a bit-vector of length 20. In Line 2, we concatenate this with the 12-bit value 0x000, giving us a 32-bit value. Then, we use EXTS to sign-extend it as necessary. This does nothing in RV32, since it’s already a 32-bit value, and it sign-extends it to 64 bits in RV64. The result is bound to the local variable off of type xlenbits.

The third line binds local variable ret, of type xlenbits, to the right-hand side, which is a pattern-matching expression despatching on op. When it matches RISCV_LUI, the value is just off. When it matches RISCV_AUIPC, the value is added to get_arch_pc(), which retrieves the value of the program counter in the current machine state.

The penultimate line assigns this value to register rd, using the overloading of X we saw earlier.

The final line is the constant expression RETIRE_SUCCESS. Being the last expression in the block, and the block being the body of the function, this is the value returned by the function. It’s type is Retire, as given in the execute function’s type declaration.

2.6.4. LUI and AUIPC: assembly language parsing and printing

We first define a mapping (function and its inverse) to convert the sub-opcode uop to a string and back:

From file sail-riscv/model/riscv_insts_base.sail

    mapping utype_mnemonic : uop <-> string = {
      RISCV_LUI   <-> "lui",
      RISCV_AUIPC <-> "auipc"
    }

Then, we add a scattered clause to our previously introduced assembly mapping:

From file sail-riscv/model/riscv_insts_base.sail

    mapping clause assembly = UTYPE(imm, rd, op)
      <-> utype_mnemonic(op) ^ spc() ^ reg_name(rd) ^ sep() ^ hex_bits_20(imm)

the caret operator concatenates strings; spc() and sep() return strings for spaces and commas;
reg_name(r) returns the string name for its register-number argument;
hex_bits_20() returns a string showing a hex printing of a 20-bit value.

2.6.5. Conditional branch instructions: ast clause

Conditional branch instructions include BEQ, BNE, BLT, BGE, BLTU, and BGEU. We define symbolic names:

From file sail-riscv/model/riscv_types.sail

    enum bop = {RISCV_BEQ, RISCV_BNE, RISCV_BLT,
                RISCV_BGE, RISCV_BLTU, RISCV_BGEU}    /* branch ops */

Branch instructions are encoded in the B-format:

Our abstract (decoded) ast view is:

From file sail-riscv/model/riscv_insts_base.sail

    union clause ast = BTYPE : (bits(13), regidx, regidx, bop)

The branch offset immediate value is 13 bits composed from 12 bits in the instruction, with 0 appended as the least-significant bit.
The 12 bits come from non-contiguous 7-bit and 5-bit fields in the instruction.
Our ast (decoded) view holds the 13-bit offset (computed in the encdec function to be shown shortly).

2.6.6. Conditional branch instructions: encoding/decoding

We define a mapping converting the 3-bit funct3 field in the instruction to its abstract names:

From file sail-riscv/model/riscv_insts_base.sail

    mapping encdec_bop : bop <-> bits(3) = {
      RISCV_BEQ  <-> 0b000,
      RISCV_BNE  <-> 0b001,
      RISCV_BLT  <-> 0b100,
      RISCV_BGE  <-> 0b101,
      RISCV_BLTU <-> 0b110,
      RISCV_BGEU <-> 0b111
    }

Then, we add a scattered clause to the encdec mapping:

From file sail-riscv/model/riscv_insts_base.sail

    mapping clause encdec = BTYPE(imm7_6 @ imm5_0 @ imm7_5_0 @ imm5_4_1 @ 0b0, rs2, rs1, op)
    <-> imm7_6 : bits(1) @ imm7_5_0 : bits(6)
        @ rs2 @ rs1 @ encdec_bop(op)
        @ imm5_4_1 : bits(4) @ imm5_0 : bits(1)
        @ 0b1100011

Observe the 13-bit offset is composed by extracting bits from various places in the instruction.

2.6.7. Conditional branch instructions: execution

We add a scattered clause to the execute function. The first part is straightforward:

From file sail-riscv/model/riscv_insts_base.sail

    function clause execute (BTYPE(imm, rs2, rs1, op)) = {
      let rs1_val = X(rs1);
      let rs2_val = X(rs2);
      let taken : bool = match op {
        RISCV_BEQ  => rs1_val == rs2_val,
        RISCV_BNE  => rs1_val != rs2_val,
        RISCV_BLT  => rs1_val <_s rs2_val,
        RISCV_BGE  => rs1_val >=_s rs2_val,
        RISCV_BLTU => rs1_val <_u rs2_val,
        RISCV_BGEU => rs1_val >=_u rs2_val
      };
      let t : xlenbits = PC + EXTS(imm);
      ...
    }

Line 4 computes taken, indicating whether the branch is taken or not. It does a pattern-match on the sub-opcode op. Note that BLT and BLTU are supposed to interpret their argument as signed and unsigned values, respectively. This is encoded by using different Sail pre-defined comparison operators <_s and <_u, respectively.
The let t line computes t, the branch target of type xlenbits by adding a sign-extension of the immediate to the PC.

The next section of the execute function clause performs different actions depending on whether the branch is taken or not:

From file sail-riscv/model/riscv_insts_base.sail

    function clause execute (BTYPE(imm, rs2, rs1, op)) = {
      ...
      if taken then {
        ...
        ...
      } else RETIRE_SUCCESS
    }

If the branch is not taken, there is no further action and the result is RETIRE_SUCCESS.

If the branch is taken, we first check that the branch target PC is valid.

From file sail-riscv/model/riscv_insts_base.sail

      if taken then {
        ... <some code elided> ...
            if bit_to_bool(target[1]) & (~ (haveRVC())) then {
              handle_mem_exception(target, E_Fetch_Addr_Align());
              RETIRE_FAIL;
            } else {
              set_next_pc(target);
              RETIRE_SUCCESS
            }

Line 3 checks the requirement that, without the “C” ISA extension (compressed instructions), the branch target must be 4-byte aligned, i.e., bit [1] must be 0. bit_to_bool converts a value of bits(1) type to bool type (we could have also used ==1). haveRVC checks if the C extension is active. If the target is not ok, in the next line we invoke function handle_mem_exception to perform exception actions and return failure. If the target is ok, the next line assigns the target to the next PC and we return success.
Our <some code elided> on Line 2 contains additional checks for target validity that may be required by any other extensions.

2.6.8. Conditional branch instructions: Assembly language parsing and printing

We first define a mapping (function and its inverse) to convert the sub-opcode bop to a string and back:

From file sail-riscv/model/riscv_insts_base.sail

    mapping btype_mnemonic : bop <-> string = {
      RISCV_BEQ  <-> "beq",
      RISCV_BNE  <-> "bne",
      RISCV_BLT  <-> "blt",
      RISCV_BGE  <-> "bge",
      RISCV_BLTU <-> "bltu",
      RISCV_BGEU <-> "bgeu"
}

Then, we add a scattered clause to our previously introduced assembly mapping:

From file sail-riscv/model/riscv_insts_base.sail

    mapping clause assembly = BTYPE(imm, rs2, rs1, op)
      <-> btype_mnemonic(op) ^ spc() ^ reg_name(rs1) ^ sep() ^ reg_name(rs2) ^
              sep() ^ hex_bits_13(imm)

the caret operator concatenates strings; spc() and sep() return strings for spaces and commas;
reg_name(r) returns the string name for its register-number argument;
hex_bits_13() returns a string showing a hex printing of a 13-bit value.

2.7. Taking stock

The general scheme for each new instruction, or new class of instructions, should be clear by now:

Define an enum and mapping for any sub-opcodes in the class (if the class contains more than one instruction)
Augment the ast type by adding a scattered clause to describe this new class
Augment the encdec mapping by adding a scattered clause to describe this new class
Augment the execute function by adding a scattered clause to describe this new class
Augment the assembly mapping by adding a scattered clause to describe this new class

It is a stylistic judgement call whether you define a class with sub-opcodes, or just define a separate clause for each instruction in the class. E.g., we could have defined separate ast , encdec, execute and assembly clauses for BEQ, BNE, BLT, …

A class with sub-opcodes makes sense when the instructions share structure and semantics. For example, BEQ/BNE/BLT/… differ only in the particular comparison operator; using a class with sub-opcodes captures this similarity.

For the remaining examples we’ll focus on the execute function only (the encdec and assembly clauses are similar to the previous examples).

2.8. LOAD instructions: execution

Memory-access instructions involve many more steps, since they can involve alignment checks, virtual address-to-physical address translation, physical memory protection checks, ordering relationships with other memory accesses, and so on. Many of these can trap (raise an exception).

The header of the scattered clause of execute to handle memory-load instructions is:

From file sail-riscv/model/riscv_insts_base.sail

    function clause execute(LOAD(imm, rs1, rd, is_unsigned, width, aq, rl)) = {

The arguments are the

the immediate, rs1 and rd fields from the instruction;
whether the loaded value is treated as signed or unsigned, i.e., whether the loaded value should be sign-extended or zero-extended to the width of the destination register;
the width to be loaded: byte, halfword (2 bytes), word (4 bytes) or double (8 bytes);
the acquire/release semantics for memory ordering.

The next step is to compute the actual (virtual) address to be accessed:

From file sail-riscv/model/riscv_insts_base.sail

      let offset : xlenbits = EXTS(imm);
      match ext_data_get_addr(rs1, offset, Read(Data), width) {
        Ext_DataAddr_Error(e)  => { ext_handle_data_check_error(e); RETIRE_FAIL },
        Ext_DataAddr_OK(vaddr) =>
            ...

After computing the offset by sign-extending the immediate value, it invokes the function ext_data_get_addr to perform a signed addition of the offset to the contents of rs1. This function is defined in riscv_addr_checks.sail. By encapsulating this addition in a function, we allow future extensibility to new ISA extensions that may perform additional checks/transformations on the address.

This function can return an error, but in the normal simple case without additional ISA extensions it returns Ext_DataAddr_OK(vaddr) containing the effective virtual address. We use pattern-matching (a match expression) to distinguish these two outcomes. Next:

From file sail-riscv/model/riscv_insts_base.sail

          if   check_misaligned(vaddr, width)
          then { handle_mem_exception(vaddr, E_Load_Addr_Align()); RETIRE_FAIL }
          else match translateAddr(vaddr, Read(Data)) {
              ...

The function check_misaligned(vaddr, width) optionally checks if the access is aligned for the requested width. This function is defined a little earlier in the file and returns true it is misaligned and if we’ve configured the model to disallow misaligned accesses. If we’ve configured the model to allow misaligned accesses, this function will always return False.

If ok, it invokes translateAddr(vaddr, Read(Data) to optionally translate virtual addresses to physical addresses. This function is defined in a collection of files:

    riscv_vmem_types.sail, riscv_vmem_common.sail
    riscv_vmem_rv32.sail, riscv_vmem_sv32.sail
    riscv_vmem_rv64.sail, riscv_vmem_sv39.sail, riscv_vmem_sv48.sail
    riscv_vmem_tlb.sail

different subsets of which are used depending on whether we’re modeling RV32 or RV64, and the Sv32, Sv39 or Sv48 virtual memory schemes.

The translateAddr function simply returns the address as-is if not running with virtual memory.

In the virtual-memory translation functions, you’ll notice that they also model a TLB (Translation Lookaside Buffer). This is because TLBs are visible in the semantics via the SFENCE.VMA instruction.

Finally:

From file sail-riscv/model/riscv_insts_base.sail

          else match translateAddr(vaddr, Read(Data)) {
            TR_Failure(e, _) => { handle_mem_exception(vaddr, e); RETIRE_FAIL },
            TR_Address(addr, _) =>
              match (width, sizeof(xlen)) {
                (BYTE, _)   =>
                   process_load(rd, vaddr,
                                mem_read(Read(Data), addr, 1, aq, rl, false),
                                is_unsigned),
                (HALF, _)   =>
                   process_load(rd, vaddr,
                                mem_read(Read(Data), addr, 2, aq, rl, false),
                                is_unsigned),
                (WORD, _)   =>
                   process_load(rd, vaddr,
                                mem_read(Read(Data), addr, 4, aq, rl, false),
                                is_unsigned),
                (DOUBLE, 64) =>
                   process_load(rd, vaddr,
                                mem_read(Read(Data), addr, 8, aq, rl, false),
                                is_unsigned)

If the virtual-to-physical translation was successful, we invoke mem_read to perform the raw memory read, and pass the result to process_load to process the result (which could be an exception, e.g, if there is no memory at that address).

The first three clauses of the match expression use the wildcard pattern _ in the second component, since these sizes are valid in RV32 and RV64. The fourth clause will only match when the second component is 64, i.e., it restricts it to RV64.

2.9. Exceptions

RISC-V has

interrupts (asynchronous exceptions, conceptually “between” any two instructions)
traps (synchronous exceptions, due to execution of an instruction)

The different kinds of interrupts are Software, Timer and External and are delivered at User, Supervisor or Machine privilege levels:

From file sail-riscv/model/riscv_types.sail

    enum InterruptType = {  I_U_Software,    I_S_Software,    I_M_Software,
                            I_U_Timer,       I_S_Timer,       I_M_Timer,
                            I_U_External,    I_S_External,    I_M_External    }

This is followed by a function to convert bit-encodings to these symbolic names:

From file sail-riscv/model/riscv_types.sail

    val interruptType_to_bits : InterruptType -> bits (8)
    function interruptType_to_bits (i) =
      match (i) {
        I_U_Software => 0x00,  I_S_Software => 0x01,  I_M_Software => 0x03,
        I_U_Timer    => 0x04,  I_S_Timer    => 0x05,  I_M_Timer    => 0x07,
        I_U_External => 0x08,  I_S_External => 0x09,  I_M_External => 0x0b
      }

A mapping would be more expressive than a function, but since we don’t decode interrupts/exceptions, we don’t need the inverse function.

The different kinds of traps, and converting to bits:

From file sail-riscv/model/riscv_types.sail

    union ExceptionType = { E_Fetch_Addr_Align   : unit,     E_Fetch_Access_Fault : unit,
                            E_Illegal_Instr      : unit,     E_Breakpoint         : unit,
                            E_Load_Addr_Align    : unit,     E_Load_Access_Fault  : unit,
                            E_SAMO_Addr_Align    : unit,     E_SAMO_Access_Fault  : unit,
                            E_U_EnvCall          : unit,     E_S_EnvCall          : unit,
                            E_Reserved_10        : unit,     E_M_EnvCall          : unit,
                            E_Fetch_Page_Fault   : unit,     E_Load_Page_Fault    : unit,
                            E_Reserved_14        : unit,     E_SAMO_Page_Fault    : unit }

    val exceptionType_to_bits : ExceptionType -> exc_code
    function exceptionType_to_bits(e) =
      match (e) {
        E_Fetch_Addr_Align()   => 0x00,
        E_Fetch_Access_Fault() => 0x01,
        ...
      }

Note	I think this could also have been written as an enum. The `unit` type is like `void`, so these union variants don’t contain any interesting data with each tag.

Some traps may carry additional information. In Sail (and OCaml), optional information is usually expressed using the option predefined type:

    union option ('a : Type) = { Some : 'a,
                                 None : unit }

i.e., the Some variant carries some additional information (generic/polymorphic type 'a), and the None variant carries no additional information.

From file sail-riscv/model/riscv_sync_exception.sail

    struct sync_exception = {
      trap    : ExceptionType,
      excinfo : option(xlenbits),
      ext     : option(ext_exception)   /* for extensions */
    }

The trap field is necessary information. The other two fields carry optional information, for standard traps (such as an address that provoked a trap), and also for future standard or non-standard ISA extensions.

2.9.1. Handling certain exceptions

The handle_mem_exception action function we saw earlier in conditional branches with illegal branch targets is:

From file sail-riscv/model/riscv_sys_control.sail

    function handle_mem_exception(addr : xlenbits, e : ExceptionType) -> unit = {
      let t : sync_exception = struct { trap    = e,
                                        excinfo = Some(addr),
                                        ext     = None() } in
      set_next_pc(exception_handler(cur_privilege, CTL_TRAP(t), PC))
    }

The let t line constructs a sync_exception value, filling in the address as optional exception info, and binds it to the local variable t.

The next line invokes a more general exception_handler.

From file sail-riscv/model/riscv_sys_control.sail

    function exception_handler(cur_priv : Privilege, ctl : ctl_result,
                               pc: xlenbits) -> xlenbits = {
      match (cur_priv, ctl) {
        (_, CTL_TRAP(e)) => {
          let del_priv = exception_delegatee(e.trap, cur_priv);
          ...
          trap_handler(del_priv, false, exceptionType_to_bits(e.trap), pc, e.excinfo, e.ext)
        },
        (_, CTL_MRET())  => { ... }
        (_, CTL_SRET())  => { ... }
        (_, CTL_URET())  => { ... } }

Line 5 checks if the current trap, at the current privilege level, is being delegated to be handled at a different privilege level (returning that privilege level or the current privilege level).

Line 7 invokes an even more general trap handler (below).

Lines 9-11 handle exception returns from the Machine, Supervisor and User privilege levels, respectively.

From file sail-riscv/model/riscv_sys_control.sail

    function trap_handler(del_priv : Privilege, intr : bool, c : exc_code,
                          pc : xlenbits, info : option(xlenbits),
                          ext : option(ext_exception))
                         -> xlenbits = {
      cancel_reservation();    /* for LR/SC */
      match (del_priv) {
        Machine => { mcause->IsInterrupt() = bool_to_bits(intr);
                     mcause->Cause()       = EXTZ(c);

                     mstatus->MPIE()       = mstatus.MIE();
                     mstatus->MIE()        = 0b0;
                     mstatus->MPP()        = privLevel_to_bits(cur_privilege);
                     mtval                 = tval(info);
                     mepc                  = pc;

                     cur_privilege         = del_priv;
                     prepare_trap_vector(del_priv, mcause)
        },
        Supervisor => { ... }
        User => { ... }

This is an intricate but otherwise unremarkable assignment of certain values to certain CSRs.

The last line of the Machine case invokes prepare_trap_vector (in file riscv_sys_extensions.sail) which returns the PC that is in mtvec, stvec, or utvec, as appropriate.

3. Composing Individual Instruction Semantics into an Executable Model

So far, we’ve only talked about the decode and execute function for individual instructions. We’ve said nothing about how and when these get invoked, nor about how instructions are fetched.

This separation is deliberate. We may wish to build several different processor models: pipelined, superscalar, multi-hart, and so on. Each of these would be a different top-level system, with its own system-level semantics, but they can all share the individual instruction semantics discussed so far.

In this section we’ll sketch one such encapsulating model, which is used in the default simulators built from the model. This model, shown in files main.sail and riscv_step.sail implement a simple, sequential, unpipelined, one-instruction-at-a-time fetch-execute loop ().

Section Concurrency and Connection to RISC-V Weak Memory Model discusses alternatives, such as concurrent composition of instruction semantics, for more accurate modeling of out-of-order processors and weak memory models.

The top-level function initializes the PC to 0x1000, initializes the model as a whole (including certain CSRs and registers), and then invokes the fetch-execute loop:

From file sail-riscv/model/main.sail

    function main () : unit -> unit = {
      PC = sail_zero_extend(0x1000, sizeof(xlen));
      init_model();
      loop()
    }

Note, the 0x1000 initial value is not part of the spec, it is an implementation choice by the “platform”. The loop() function, in turn, repeatedly performs a fetch-execute step:

From file sail-riscv/model/riscv_step.sail

    function loop () : unit -> unit = {
      while (...) do {
        let stepped = step(step_no);
        ...
      }
    }

In each iteration of the loop, we perform a step():

From file sail-riscv/model/riscv_step.sail

    function step(step_no : int) -> bool = {
      let (retired, stepped) : (Retired, bool) =
        match dispatchInterrupt(cur_privilege) {
          Some(intr, priv) => { handle_interrupt(intr, priv); (RETIRE_FAIL, false) },
          None() => {
            let f : FetchResult = ext_fetch_hook(fetch());
            match f {
              F_RVC(h) => { let ast = decodeCompressed(h);
                            if haveRVC() then {
                              nextPC = PC + 2;
                              (execute(ext_post_decode_hook(ast)), true)
                          } else {
                            handle_illegal();
                            (RETIRE_FAIL, true)
                          }
              },
              F_Base(w) => { let ast = decode(w);
                             nextPC = PC + 4;
                             (execute(ext_post_decode_hook(ast)), true)
              }
          ...

The step() function first checks for interrupts and handles it if there is one. To check this, it will consult various CSRs including MSTATUS, MIP, MIE.

If there is no interrupt, it fetches an instruction and decides whether its an RVC (compressed) instruction or a base instruction. In each case, it decodes it and executes it.

4. Taking stock …

By this time we hope you’re getting the hang of reading the Sail code that expresses the semantics of RISC-V instructions.

Some observations:

In many senses, Sail is “just another” programming language. Many of its notations and features are taken from or inspired by the functional programming language OCaml (whose heritage, in turn, goes back to SML and ML).
Expressing the semantics of RISC-V instructions is an exercise in coding in this programming language.
In principle, a simulator for RISC-V written in any programming language, including C/C++, is a formalization of the RISC-V ISA. So what makes Sail a preferred “host language” for the Formal Spec?
- The host language needs to be small, and its semantics needs to be simple, clear, and formalizable. This is true of Sail, but not true of most widely known programming languages (including C and C++).
- Sail’s features like numeric types with type-checking, scattered definitions, mappings, bit-vectors with type-encoded lengths all make it into a DSL (Domain Specific Language) that is particularly suited for expressing ISAs.
- The RISC-V ISA has many optional parts; Sail’s powerful parameterization is useful in expressing this cleanly.
- RISC-V implementations can vary highly in their amount of concurrency and allowed non-determinism, from completely sequential fetch-decode-execute simulators to pipelining, speculation, superscalarity, out-of-order, multi-hart, multi-core, and weak memory models. Sail is able to express all such variants cleanly.
- Sail’s simple, clean, semantics make it suitable for connecting to well-known formal-method tools (such as Coq, Isabelle, HOL4), for automated reasoning about properties (including correctness) of RISC-V programs and artefacts that manipulate or interpret RISC-V programs, such as hardware implementations, simulators, compilers.

4.1. Overview of spec module dependencies

The following diagram gives an Overview of module dependencies in the Sail RISC-V spec. (From: doc/figs/riscvspecdeps.svg in GitHub repository https://github.com/rems-project/sail-riscv).

5. Executing standard RISC-V ISA Tests

5.1. Reminder: Download and Install

If you have not already done so, please follow both Step A and Step B described in document Installation.html in the repository https://github.com/rsnikhil/RISCV_ISA_Spec_Tour to download/build/install executable versions of the formal spec.

Step A clones the repository https://github.com/rems-project/sail-riscv, with the Sail RISC-V spec in the model/ directory (this is the code we’ve been studying so far in this tutorial).

Step B takes you through these steps:

Install Opam, the package manager for OCaml;
Using Opam, install OCaml
Using Opam, install Sail
Using Ocaml and Sail, build executable versions of the Sail RISC-V spec.

As a result, you should now have the following executables in your sail-riscv repository:

$ pwd; ls c_emulator/riscv_sim_RV*
/home/nikhil/git_clones/ISA_Formal_Spec/sail-riscv
c_emulator/riscv_sim_RV32*  c_emulator/riscv_sim_RV64*

5.2. Executing standard RISC-V ISA tests using the C simulator

The directory sail-riscv/test/riscv-tests/ has a full suite of pre-compiled standard RISC-V ISA tests. Each has an ELF file (RISC-V binary) and a disassembly (text file) of the test. Examples:

Example of ISA test ELF files (RISC-V executables) and disassembly (dump) text files:

In sail-riscv/

    $ ls -1 test/riscv-tests/rv64ui-p-add*
    test/riscv-tests/rv64ui-p-add.elf
    test/riscv-tests/rv64ui-p-add.dump
    test/riscv-tests/rv64ui-p-addi.elf
    test/riscv-tests/rv64ui-p-addi.dump
    test/riscv-tests/rv64ui-p-addiw.elf
    test/riscv-tests/rv64ui-p-addiw.dump
    test/riscv-tests/rv64ui-p-addw.elf
    test/riscv-tests/rv64ui-p-addw.dump

Using the C-based simulator we can execute, for example, the rv64ui-p-add ISA test for the ADD instruction:

In sail-riscv/

    $ ./c_emulator/riscv_sim_RV64  test/riscv-tests/rv64ui-p-add.elf
    Tue Dec 10 07:37:05 2019
    ...
    Running file test/riscv-tests/rv64ui-p-add.elf.
    ELF Entry @ 0x80000000
    CSR mstatus <- 0x0000000A00000000 (input: 0x0000000000000000)
    mem[X,0x0000000000001000] -> 0x0297
    mem[X,0x0000000000001002] -> 0x0000
    [0] [M]: 0x0000000000001000 (0x00000297) auipc t0, 0
    ...
    [1] [M]: 0x0000000000001004 (0x02028593) addi a1, t0, 32
    ...
    [2] [M]: 0x0000000000001008 (0xF1402573) csrrs a0, zero, mhartid
    ...
    [477] [M]: 0x0000000080000044 (0xFC3F2023) sw gp, 4032(t5)
    htif[0x0000000080001000] <- 0x00000001
    htif-syscall-proxy cmd: 0x000000000001
    SUCCESS

During execution of the RISC-V binary, it prints out a trace of instructions executed (PC, instruction, assembly).

Another example: the rv32um-v-mulhsu test for the MULHSU instruction in virtual-memory mode:

In sail-riscv/

    $ ./c_emulator/riscv_sim_RV32  test/riscv-tests/rv32um-v-mulhsu.elf
    Tue Dec 10 07:46:33 2019
    Running file test/riscv-tests/rv32um-v-mulhsu.elf.
    ELF Entry @ 0x80000000
    [0] [M]: 0x00001000 (0x00000297) auipc t0, 0
    ...
    [20652] [S]: 0xFFC02270 (0x0106A023) sw a6, 0(a3)
    htif[0x80001000] <- 0x00000001
    htif-syscall-proxy cmd: 0x000000000001

    SUCCESS

During execution of the RISC-V binary, it prints out a trace of instructions executed (PC, instruction, assembly).

You can execute any of the *.elf tests in directory sail-riscv/test/riscv-tests/ in the same way.

5.2.1. Executing standard RISC-V ISA tests using the OCaml simulator

FYI, for those who wish to explore the OCaml-based simulators and/or connections to various formal tools.

The make command in Step B.4 of the Installation.html document (without the csim argument) also makes:

OCaml-based executable versions of the spec, in directory ./ocaml_emulator/. These are run in the same way as the C-based simulators of the previous examples.
Material to connect to formal tools Coq, Isabelle, HOL4, etc. Please see documentation in the repository about these options.

6. Executing standard RISC-V Compliance Tests

RISC-V International Association (RVIA) is developing a “Compliance” suite for checking whether an implementation (whether in hardware or simulation) is compliant with the RISC-V Formal Specification. This is a work-in-progress; as of December 2020, compliance suites are ready only for some RV32I subsets of the ISA. RVIA will gradually grow the Compliance Suite to encompass the full ISA.

RVIA’s Compliance Suite is in this repository: https://github.com/riscv/riscv-compliance, where you can find details about the Compliance Suite, how to run it on new implementations, etc.

Each Compliance Test program is an ELF file compiled for a particular RISC-V ISA configuration. When the program is executed on a RISC-V implementation, it produces a certain output called a “signature”. This signature is compared for an exact match with a particular “expected value” in order to decide whether the implementation has passed this test or not. (Since

In this section we show how to execute the tests in the Compliance Suite using the C simulator compiled from the RISC-V Formal Spec in Sail. The process is highly automated, and should automatically encompass new tests as the Compliance suite grows.

First, please ensure you have downloaded/built/installed executable versions of the RISC-V Formal Spec in Sail, as described in Section Reminder: Download and Install.

6.1. Executing Compliance tests: preparation

Clone a copy of the RISC-V International Association’s “Compliance” repository:

    $ git clone  https://github.com/riscv/riscv-compliance

Set up your environment for RISC-V compiler tools gcc and friends (the Compliance scripts will use this to re-compile compliance tests).

    $ export RISCV=<your toolchain_installation_dir>/riscv64
    $ export PATH=$RISCV/bin:$PATH

Spot check that we’ve got the toolchain setup:

    $ which riscv64-unknown-elf-gcc
    /home/nikhil/git_clones/RISCV_Gnu_Toolchain/riscv64/bin/riscv64-unknown-elf-gcc

    $ riscv64-unknown-elf-gcc  --version
    riscv64-unknown-elf-gcc (GCC) 9.2.0
    Copyright (C) 2019 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Setup up your PATH environment variable to include your clone-directory of the sail-riscv repository, so that the Compliance scripts know where to find the executable versions of the Sail RISC-V spec:

    $ export SAIL_RISCV=<path to your clone of sail-riscv repository>/sail-riscv
    $ export PATH=$SAIL_RISCV/c_emulator:${PATH}

Spot check that you have the RISC-V Formal Spec’s C simulators:

    $ which riscv_sim_RV32  riscv_sim_RV64
    /home/nikhil/git_clones/ISA_Formal_Spec/sail-riscv/c_emulator/riscv_sim_RV32
    /home/nikhil/git_clones/ISA_Formal_Spec/sail-riscv/c_emulator/riscv_sim_RV64

6.2. Executing Compliance tests

Finally, the following will execute all relevant variants of the Compliance test suite:

In riscv-compliance/

    $ make RISCV_TARGET=sail-riscv-c all_variant

Your terminal output will look something like this:

    for isa in rv32i rv32im rv32imc rv32Zicsr rv32Zifencei; do \
        ...
    ...
    Compile /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-MISALIGN_JMP-01.elf
    Execute /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-MISALIGN_JMP-01.log
    Running file /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-MISALIGN_JMP-01.elf.
    ELF Entry @ 0x80000000
    begin_signature: 0x80002000
    end_signature: 0x80002090
    CSR mstatus <- 0x00000000 (input: 0x00000000)
    ...
    SUCCESS
    ...
    Compile /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-ADD-01.elf
    Execute /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-ADD-01.log
    Running file /home/nikhil/git_clones/riscv-compliance/work/rv32i/I-ADD-01.elf.
    ELF Entry @ 0x80000000
    begin_signature: 0x80002000
    end_signature: 0x80002090
    CSR mstatus <- 0x00000000 (input: 0x00000000)
    ...
    SUCCESS
    ...
    riscv-test-env/verify.sh
    ...
    Compare to reference files ...

    Check                 I-ADD-01 ... OK
    Check                I-ADDI-01 ... OK
    ...
    Check                I-XORI-01 ... OK
    --------------------------------
    OK: 48/48 RISCV_TARGET=sail-riscv-c RISCV_DEVICE=rv32i RISCV_ISA=rv32i
    ...

In the transcript, you will see the results for each of the ISA groups mentioned in the for isa in rv32i, rv32im, … line at the top. For each test, it compiles the ELF file and executes it on the C simulator, and then verifies the output against the expected output.

7. Adding new ISA extensions to the Formal Spec

The general scheme for adding a new instruction, or new class of instructions is (these bullets repeat information from Section Taking stock.):

Define an enum and mapping for any sub-opcodes in the class (if the class contains more than one instruction)
Augment the ast type by adding a scattered clause to describe this new class
Augment the encdec mapping by adding a scattered clause to describe this new class
Augment the execute function by adding a scattered clause to describe this new class
Augment the assembly mapping by adding a scattered clause to describe this new class

It is a stylistic judgement call whether you define a class with sub-opcodes, or just define a separate clause for each instruction in the class. E.g., we could have defined separate ast, encdec, execute and assembly clauses for BEQ, BNE, BLT, …

A class with sub-opcodes makes sense when the instructions share structure and semantics. For example, BEQ/BNE/BLT/… differ only in the particular comparison operator; using a class with sub-opcodes captures this similarity.

7.1. Case study: adding ISA Extensions F and D (single and double-precision floating point)

The implementation of F and D extensions (single and double-precision floating point) in the RISC-V ISA Formal Spec in Sail is somewhat atypical in that it is not currently wholly written in Sail.

All the RISC-V-specific parts are indeed written in Sail—everything about instruction bit representation, encoding/decoding, assembly-language mapping, floating point registers and CSRs, and opcode-based dispatching is done in the canonical way in Sail.

In this section, we only discuss this part.
For the actual IEEE arithmetic, we call out to a collection of standard external C routines that emulate IEEE Floating Point operations. This is the well known “Berkeley SoftFloat” emulation of IEEE FP, from https://github.com/ucb-bar/berkeley-softfloat-3.git. Berkeley Softfloat is widely used across the industry and has been well tested over many years.

In principle, this part could also be coded in Sail, but it is quite a lot of work with marginal immediate benefit, since Berkeley softfloat has been so very well tested. Coding in Sail could be a future project for some interested party.

7.1.1. New Registers for the F and D Extensions

Since F,D define a new register width FLEN, we created two new files by analogy with riscv_xlen32.sail and riscv_xlen64.sail. (This would not be necessary for extensions that don’t define new widths):

In sail-riscv/model/

    riscv_flen_F.sail
    riscv_flen_D.sail

Since F,D define new registers, we created two new files by analogy with riscv_reg_type.sail and riscv_regs.sail. (This would not be necessary for extensions that don’t define new registers):

In sail-riscv/model/

    riscv_freg_type.sail
    riscv_fdext_regs.sail

Since F,D define new CSRs, we created a new file by analogy with riscv_sys_control.sail. (This would not be necessary for extensions that don’t define new CSRs):

In sail-riscv/model/

    riscv_fdext_control.sail

7.1.2. Semantics of F and D instructions

Finally, the semantics of F and D instructions are in two new files:

In sail-riscv/model/

    riscv_insts_fext.sail
    riscv_insts_dext.sail

The following two files are for the floating point instructions in the C (Compressed instructions) ISA extension:

In sail-riscv/model/

    riscv_insts_cfext.sail
    riscv_insts_cdext.sail

If we study riscv_insts_fext.sail, it starts with a number of help-functions. Then, it goes through each class of F instruction, with each class following the standard pattern:

Define an enum and mapping for any sub-opcodes in the class (if the class contains more than one instruction)
Augment the ast type by adding a scattered clause to describe this new class
Augment the encdec mapping by adding a scattered clause to describe this new class
Augment the execute function by adding a scattered clause to describe this new class
Augment the assembly mapping by adding a scattered clause to describe this new class

For example the first such class is for Floating-point load instructions. It starts with a scattered clause to define the ast:

In sail-riscv/model/riscv_insts_fext.sail

    /* FLW and FLD; W/D is encoded in 'word_width' */

    union clause ast = LOAD_FP : (bits(12), regidx, regidx, word_width)

Then, the scattered clause for the encdec mapping (encode/decode):

In sail-riscv/model/riscv_insts_fext.sail

    mapping clause encdec = LOAD_FP(imm, rs1, rd, WORD)          if is_RV32F_or_RV64F()
                        <-> imm @ rs1 @ 0b010 @ rd @ 0b000_0111  if is_RV32F_or_RV64F()

    mapping clause encdec = LOAD_FP(imm, rs1, rd, DOUBLE)        if is_RV32D_or_RV64D()
                        <-> imm @ rs1 @ 0b011 @ rd @ 0b000_0111  if is_RV32D_or_RV64D()

Then, the scattered clause for the execute function (semantics):

In sail-riscv/model/riscv_insts_fext.sail

    function clause execute(LOAD_FP(imm, rs1, rd, width)) = {
      let offset : xlenbits = EXTS(imm);
      ...

And, finally, the scattered clause for the assembly notation:

In sail-riscv/model/riscv_insts_fext.sail

    mapping clause assembly = LOAD_FP(imm, rs1, rd, width)
                          <-> "fl" ...

7.1.3. Updating the Makefiles

The Sail files described in the previous sections are enough for reading and understanding the spec.

In order to incorporate it into the generated simulators (C-based and OCaml based), we also need to update the Makefile. They following display shows the key lines updated or added to the Makefile (the lines for F,D are marked here with + on the left):

In sail-riscv/Makefile

+   # Currently, we only have F with RV32, and both F and D with RV64.
    ifeq ($(ARCH),RV32)
      SAIL_XLEN := riscv_xlen32.sail
+     SAIL_FLEN := riscv_flen_F.sail
    else ifeq ($(ARCH),RV64)
      SAIL_XLEN := riscv_xlen64.sail
+     SAIL_FLEN := riscv_flen_D.sail
    else
      $(error '$(ARCH)' is not a valid architecture, must be one of: RV32, RV64)
    endif
    ...
    SAIL_DEFAULT_INST += riscv_insts_fext.sail riscv_insts_cfext.sail
+   ifeq ($(ARCH),RV64)
+   SAIL_DEFAULT_INST += riscv_insts_dext.sail riscv_insts_cdext.sail
+   endif
    ...
+   SAIL_SYS_SRCS += riscv_softfloat_interface.sail riscv_fdext_regs.sail riscv_fdext_control.sail
    ...
+   PRELUDE = prelude.sail prelude_mapping.sail $(SAIL_XLEN) $(SAIL_FLEN) ...
    ...
+   SAIL_REGS_SRCS = riscv_reg_type.sail riscv_freg_type.sail riscv_regs.sail ...
    ...
    C_INCS = $(addprefix c_emulator/,riscv_prelude.h riscv_platform_impl.h riscv_platform.h riscv_softfloat.h)
    C_SRCS = $(addprefix c_emulator/,riscv_prelude.c riscv_platform_impl.c riscv_platform.c riscv_softfloat.c riscv_sim.c)

The lines involving softfloat are the links to the external Berkeley Softfloat library; such lines will not be necessary for any ISA extension coded wholly in Sail.

8. Concurrency and Connection to RISC-V Weak Memory Model

8.1. Concurrency

In this introductory tutorial we deliberately stayed away from questions of concurrency, which is a more advanced topic.
Each instruction semantics can be regarded as a small sequential thread performing that instruction’s semantics. There are various “events” during this thread’s progress.
In our simple one-instruction-at-a-time fetch-execute loop model shown in this tutorial (and in the default simulators built), these threads are simply concatenated into an overall single sequential thread.

A parallel model can overlap these threads:
- a pipeline model may launch the next instruction’s thread before the current one has finished; in fact can launch it speculatively based only on PC of previous instruction;
- a superscalar model may launch two or more of these threads together;
- an out-of-order model may have many of these threads running concurrently;
- register read/write events can model renamed registers;
- memory address/read/write events can interact with a model of weakly ordered memory;
- and so on.
The RMEM concurrency tool is meant for these purposes (https://github.com/rems-project/rmem)

8.2. Connection to official RISC-V Memory Model

RISC-V’s Weak Memory Model was developed by a separate RISC-V Foundation Technical Group, chaired by Dan Lustig of NVidia.
One of their formalizations was indeed using these Sail and RMEM system and models. As mentioned in the previous slide, this uses a concurrent fetch-execute model where multiple instructions may be in flight concurrently, with concurrent interactions with the weak memory model.
(These should be covered in another, more advanced tutorial.)

Files

Tutorial.adoc

Latest commit

History

Tutorial.adoc

File metadata and controls

A Tour of the RISC-V ISA Formal Specification

1. Introduction

1.1. Context, History and Outlook

1.2. Objectives for this tutorial, and target audience

1.3. What is an ISA Formal Specification?

1.4. How people use ISA formal specs

1.5. About Sail, the language in which the RISC-V ISA Formal Spec is written

1.6. RISC-V ISA Formal Spec Credits and Acknowledgements

1.7. Status and plans

1.8. Links, Downloading and Installing

2. Reading the ISA to understand semantics of each RISC-V instruction

2.1. A first taste

2.2. Spec source files

2.3. A word about types and type-checking

2.4. Preliminaries; Registers and values; “Scattered” organization of instruction specs

2.4.1. Let’s start with some small, easy files

2.4.2. Registers and values

Values in integer registers

Integer registers

Reading from integer registers

Writing integer registers

Using Sail overloading to simplify notation

2.4.3. “Scattered” organization of instruction specs

2.5. Instruction fields

2.5.1. Type definitions for instruction fields

2.6. Some base instructions

2.6.1. LUI and AUIPC: decoded view

2.6.2. LUI and AUIPC: Decoding

2.6.3. LUI and AUIPC: execution

2.6.4. LUI and AUIPC: assembly language parsing and printing

2.6.5. Conditional branch instructions: ast clause

2.6.6. Conditional branch instructions: encoding/decoding

2.6.7. Conditional branch instructions: execution

2.6.8. Conditional branch instructions: Assembly language parsing and printing

2.7. Taking stock

2.8. LOAD instructions: execution

2.9. Exceptions

2.9.1. Handling certain exceptions

3. Composing Individual Instruction Semantics into an Executable Model

4. Taking stock …​

4.1. Overview of spec module dependencies

5. Executing standard RISC-V ISA Tests

5.1. Reminder: Download and Install

5.2. Executing standard RISC-V ISA tests using the C simulator

5.2.1. Executing standard RISC-V ISA tests using the OCaml simulator

6. Executing standard RISC-V Compliance Tests

6.1. Executing Compliance tests: preparation

6.2. Executing Compliance tests

7. Adding new ISA extensions to the Formal Spec

7.1. Case study: adding ISA Extensions F and D (single and double-precision floating point)

7.1.1. New Registers for the F and D Extensions

7.1.2. Semantics of F and D instructions

7.1.3. Updating the Makefiles

8. Concurrency and Connection to RISC-V Weak Memory Model

8.1. Concurrency

8.2. Connection to official RISC-V Memory Model

4. Taking stock …