AWSteria_Infra

1. Introduction and Overview

AWSteria_Infra is for apps that run jointly on a standard host computer (“host side”) and an attached FPGA board (“hardware side”). The goal is that an app, when written against this API, will port trivially across a variety of supported platforms (host/FPGA pairings).

The initial set of platforms is (more to come in the future):

  • Simulation in Bluesim or Verilator (infrastructure is in Platform_Sim/).

  • Xilinx VCU118 FPGA board (infrastructure is in Platform_VCU118/).

  • Amazon AWS F1 (infrastructure is in Platform_AWSF1/).

The infrastructure provides:

  • A high-level API for the host side and hardware side to communicate.

  • A high-level API for the hardware side to access FPGA DDR memory.

  • Implementations for a number of platforms (host and hardware combinations) including, for some FPGAs, reclocking facilities for the hardware side to run at slower clock speeds if needed.

We also provide a small Test Application (in TestApp/) to test that the infrastructure is working properly on each platform. TestApp is also useful during development of support for new platforms.

About the name “AWSteria”: pronounced like the Italian word “Osteria” which is a tavern or pub (and rhymes with “pizzeria”). “AWS” of course is from “Amazon AWS”. “AWSteria” can also be read as a new word meaning a workplace or studio where one develops AWS apps.

2. APIs (i.e., interface specs for apps)

The APIs are specified in files in the Include_API/ directory. The following diagram illustrates the architecture of the user application, AWSteria_Infra, and the APIs.

Fig 010 AWSteria Infra Architecture

AWSteria_Infra Implementations are provided for HDL simulation, for Xilinx VCU118 boards, and for Amazon AWS F1.

The API is inspired by Xilinx XDMA facilities (at https://github.com/Xilinx/dma_ip_drivers) and the “shell” in the Amazon AWS F1 HDK/SDK (at https://github.com/aws/aws-fpga.git).

2.1. Hardware side API

AWSteria_Infra defines the top-level Verilog or Bluespec BSV interface for your app hardware. This interface has several ports:

  • Two sets of clock-and-reset (a default and an extra).

  • An AXI4 M interface (manager/master) through which the host communicates with your hardware (wide interface, high bandwidth, typically used for moving data).

  • An AXI4-Lite M interface (manager/master) through which the host communicates with your hardware (32-bit wide, typically used for control/status).

  • One to four AXI4 S interfaces (subordinate/slave) through which your hardware connects to DDR memories on the FPGA board.

  • A few miscellaneous utility ports (environment-ready input, DUT-halted output, 4ns counter input).

If you are coding directly in Verilog, use the following file as a starting point: it is an “empty” module with the required module name and port list; you can populate the interior with your application-specific logic (including instantiating sub-modules, etc.).

    Include_API/mkAWSteria_HW_EMPTY.v

The port list looks like this, in summary:

    module mkAWSteria_HW (CLK,
                          RST_N,
                          CLK_b_CLK,
                          RST_N_b_RST_N,

                          ... AXI4 M interface ports for host communication ...
                          ... AXI4-Lite M interface ports for host communication ...
                          ... AXI4 S interface port(s) for DDR communication ...

                          m_env_ready_env_ready,
                          m_halted,
                          m_glcount_glcount);

Here, CLK and RST_N are the default clock and reset, and CLK_b_CLK and RST_N_b_RST_N are the extra clock-reset pair (your app can ignore the extra pair if they are not needed).

If you are coding in BSV, use the following files as a starting point:

    Include_API/AWSteria_HW_EMPTY.bsv
    Include_API/AWSteria_HW_IFC.bsv

The former defines an “empty” BSV module with the required module name and interface; the latter defines the required interface. When the former is compiled with the Bluespec bsc compiler, it produces a Verilog module with the required module name and port list.

The BSV module header looks like this:

    module mkAWSteria_HW #(Clock b_CLK, Reset b_RST_N)
       (AWSteria_HW_IFC #(AXI4_Slave_IFC #(16, 64, 512, 0),
                          AXI4_Lite_Slave_IFC #(32, 32, 0),
                          AXI4_Master_IFC #(16, 64, 512, 0)));
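
Here, b_CLK and b_RST_N are the extra clock-and-reset pair. Assuming the usual parameter ordering of the Bluespec AXI4 packages, the numeric type parameters give the ID, address, data, and user bit-widths: a 512-bit-data AXI4 channel for host communication and for DDR access, and a 32-bit-address, 32-bit-data AXI4-Lite channel for control/status, matching the port descriptions above.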

If you are coding in some other HDL or using HLS, you can either arrange for your tool to produce a top-level module that matches:

    Include_API/mkAWSteria_HW_EMPTY.v

or manually instantiate your top-level module inside this empty module.

Of course, when targeting an FPGA platform (Amazon AWS F1, Xilinx VCU118, …​) your Verilog RTL should be acceptable to the synthesis tool for that platform.

2.2. Host side API

On the host side, AWSteria_Infra defines a C API through which your host-side application communicates with the hardware via the AXI4 M and AXI4-Lite M ports described above.

    Include_API/AWSteria_Host_lib.h

Briefly, it contains an initialization call and a shutdown call, a pair of read/write functions to communicate via the AXI4 M port, and a pair of read/write functions to communicate via the AXI4-Lite M port.

Host side code can be written in any language environment. To communicate with the hardware side it should invoke the C host-side API. AWSteria_Infra provides C code implementing the API for each platform.
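
For orientation, here is a minimal sketch of the shape of a host-side program written against this API. The function names below (AWSteria_Host_init, AWSteria_AXI4_write, and so on) are illustrative placeholders, not necessarily the actual names; please consult Include_API/AWSteria_Host_lib.h for the real declarations and TestApp/Host/main.c for a complete working example.

    // Sketch only: the function names are placeholders, not the actual API;
    // see Include_API/AWSteria_Host_lib.h for the real declarations.

    #include <stdint.h>
    #include <stdio.h>

    #include "AWSteria_Host_lib.h"

    int main (int argc, char *argv [])
    {
        // Initialization call: set up the connection to the hardware side
        void *state = AWSteria_Host_init ();                  // placeholder name
        if (state == NULL) { fprintf (stderr, "init failed\n"); return 1; }

        // AXI4 port (wide, high bandwidth): write a buffer to FPGA DDR address 0
        uint8_t buf [256];
        for (int j = 0; j < 256; j++) buf [j] = (uint8_t) j;
        AWSteria_AXI4_write (state, buf, sizeof (buf), 0x0);  // placeholder name

        // AXI4-Lite port (32-bit, control/status): read one word
        uint32_t data;
        AWSteria_AXI4L_read (state, 0x0, & data);             // placeholder name
        fprintf (stdout, "AXI4-Lite read => 0x%08x\n", (unsigned) data);

        // Shutdown call: release the connection
        AWSteria_Host_shutdown (state);                       // placeholder name
        return 0;
    }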

3. Platform: Simulation (Bluesim and Verilator)

The Platform_Sim/ directory provides an implementation of AWSteria_Infra for simulation.

  • The host side and hardware side run as two processes on a standard computer.

  • The hardware side runs in Bluesim or Verilator simulation (it can be ported easily to other Verilog simulators).

  • The AWSteria_Infra host-hardware communication is emulated over TCP/IP.

  • The AWSteria_Infra DDR memory interfaces are connected to memory models.

This is illustrated in the following diagram:

Fig 020 AWSteria Infra Simulation

The “Test Application” and “Porting your application” sections below illustrate how to build and run an application on AWSteria_Infra in simulation.

In general, you won’t have to modify or build anything in this directory; it just provides resources for your application-build.

Note: the following source files:

    Platform_Sim/
    ├── Host
    │   ├── Bytevec.c
    │   ├── Bytevec.h
    └── HW
        ├── Bytevec.bsv

are auto-generated by the Python program in:

    Platform_Sim/
    ├── Gen_Bytevec
    └── README.txt

4. Platform: VCU118 board (from Xilinx)

The Platform_VCU118/ directory provides an implementation of AWSteria_Infra for a standard Debian/Ubuntu computer with a Xilinx VCU118 FPGA board attached via PCIe. It uses the “Garnet” repository from the University of Cambridge (https://github.com/CTSRD-CHERI/garnet).

The implementation offers an option where your hardware-side app runs at the Garnet default clock speed of 250 MHz, and an option where your hardware runs at a slower clock speed of 100 MHz. The latter option is achieved through a “reclocking” layer.

These are illustrated in the following diagram:

Fig 030 AWSteria Infra VCU118

The “Test Application” and “Porting your application” sections below illustrate how to build and run an application on AWSteria_Infra and Garnet on VCU118.

In general, you won’t have to modify or build anything in Platform_VCU118/; it just provides resources for your application-build.

4.1. VCU118 Host side

Host/AWSteria_Host_lib.c implements the host-side API, invoking various system calls to interact with the Xilinx XDMA driver, to communicate with the FPGA.

Host/Cmd_Line_Tests.mk shows examples of using command-line tools provided in the Xilinx XDMA driver repo to read and write through the AXI4 and AXI4-Lite buses into the hardware side: dma_to_device, dma_from_device, and reg_rw. The dma_to_device tool optionally takes data from a file, to be written to the FPGA. Host/gen_test_data.c is a small program to generate such a test data file.
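
For illustration, a test-data generator similar in spirit to Host/gen_test_data.c might look like the sketch below (the actual program in the repo may differ in details); it writes a requested number of pseudo-random bytes to a file that can then be supplied to dma_to_device.

    // Sketch of a test-data file generator (cf. Host/gen_test_data.c; the
    // actual program may differ).  Writes <num_bytes> pseudo-random bytes
    // to <filename>, for use with dma_to_device.

    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char *argv [])
    {
        if (argc != 3) {
            fprintf (stderr, "Usage: %s <filename> <num_bytes>\n", argv [0]);
            return 1;
        }
        FILE *fp = fopen (argv [1], "wb");
        if (fp == NULL) { perror ("fopen"); return 1; }

        long num_bytes = atol (argv [2]);
        srandom (0xABCD);                      // fixed seed => reproducible data
        for (long j = 0; j < num_bytes; j++) {
            unsigned char b = (unsigned char) (random () & 0xFF);
            fwrite (& b, 1, 1, fp);
        }
        fclose (fp);
        return 0;
    }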

4.2. VCU118 Hardware side

HW/AWSteria_HW_reclocked/ is a Vivado Block Design project that was used to create the “reclocking layer” for AWSteria_HW_IFC.bsv, which allows the app to run at slower clock speeds than the Garnet-supplied 250 MHz. That is, it creates a “shim” module that:

  • Instantiates an app module (with the AWSteria_HW_IFC.bsv interface), and

  • The shim itself presents the same AWSteria_HW_IFC.bsv interface.

  • Inside the shim, it:

    • Instantiates a clock divider so that the inner module receives two sets of clock-and-reset, at 100 MHz and 50 MHz, respectively,

    • Instantiates clock crossings between the corresponding outer and inner interfaces.

This allows the user’s design (inner app module instance) to run at a slower clock.

In Vivado, the "Generate Block Design" action creates and populates the following directory:

    AWSteria_HW_reclocked/AWSteria_HW_reclocked.srcs/sources_1/bd

which is copied into example_AWSteria_HW_reclocked/src/bd (see below).

The Block Design creation has already been done, in Vivado. Unless you want to change the clock speed configurations, or change the interfaces, this Block Design step does not have to be repeated.

TODO: Instead of copying .bd/ it should be possible to copy just a Tcl script that encodes the Block Design.

HW/example_AWSteria_HW/ and HW/example_AWSteria_HW_reclocked/ are template directories for Garnet, and are copied into the app’s build directories (see VCU118 flow for Test Application below). The former is meant for apps that can run at the full 250 MHz Garnet clock speed (and so do not need the reclocking shim); the latter is meant for apps that must run at slower clock speeds and need the reclocking shim.

HW/synchronizers.v contains small RTL modules used by the reclocking shim for reset synchronization, 1-bit clock-crossing synchronization, and 64-bit clock-crossing synchronization. These instantiate and customize modules from the following IP in the Xilinx IP directories.

    /tools/Xilinx/Vivado/2019.1/data/ip/xpm/xpm_cdc/hdl/xpm_cdc.sv

5. Platform: Amazon AWS F1

The Platform_AWSF1/ directory provides an implementation of AWSteria_Infra for an Amazon AWS F1 instance (i.e., a server in the cloud with an FPGA board attached via PCIe).

This is illustrated in the following diagram:

Fig 040 AWSteria Infra AWSF1

The “Test Application” and “Porting your application” sections below illustrate how to build and run an application on AWSteria_Infra on AWS F1.

In general, you won’t have to modify or build anything in this directory; it just provides resources for your application-build.

5.1. AWS F1 Host side

Host/AWSteria_Host_lib.c implements the host-side API, invoking various functions in AWS' aws-fpga SDK libraries to communicate with the FPGA.

5.2. AWS F1 Hardware side

HW/ contains some SystemVerilog files that form a wrapper around the app RTL and plug into the so-called “shell” in AWS’ aws-fpga HDK. The shell connects the host-communication AXI4 and AXI4-Lite interfaces to the PCIe bus, and the DDR interfaces to DDRs on the FPGA board.

6. Test Application

The TestApp/ directory provides a small and simple test application. When you create a new application, you can use this as a starting template and adapt it to your purpose (see Section “Porting your application” for more details).

6.1. App source code and intended behavior

TestApp/Host/main.c is the host-side source code; it invokes the host side C API Include_API/AWSteria_Host_lib.h.

TestApp/HW/AWSteria_HW.bsv is the hardware-side source code, filling out the “empty” module provided in Include_API/AWSteria_HW_EMPTY.bsv.

The hardware side is simple: it connects the host AXI4-Lite interface to an AXI4-Lite-to-AXI4 adapter which, along with the host AXI4 interface, connects to a 2x2 AXI4 crossbar switch which, in turn, connects to two AXI4 DDR interfaces.

The host side simply writes random data to the hardware-side DDRs and reads it back to verify the data. Writes and reads are performed over both the host AXI4 and AXI4-Lite interfaces, including writing through one and reading through the other. The AXI4 interface is also exercised with large writes and reads, to exercise AXI4 burst transfers.

This is illustrated in the following diagram:

Fig 050 AWSteria Infra TestApp
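
In outline, the host-side check is just a write/read/compare loop over each DDR. A condensed sketch is shown below, using the same placeholder API names as in Section “Host side API” (the actual code is in TestApp/Host/main.c and may differ):

    // Condensed, illustrative sketch of the TestApp host-side DDR check.
    // Function names are placeholders; see TestApp/Host/main.c and
    // Include_API/AWSteria_Host_lib.h for the actual code and API.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #include "AWSteria_Host_lib.h"    // assumed to declare the API used below

    #define BUFSIZE 0x1000            // large enough to exercise AXI4 bursts

    // 'state' is the handle returned by the (placeholder) initialization call
    int check_ddr (void *state, uint64_t ddr_base)
    {
        static uint8_t wbuf [BUFSIZE], rbuf [BUFSIZE];

        // Fill a buffer with random data
        for (int j = 0; j < BUFSIZE; j++) wbuf [j] = (uint8_t) random ();

        // Write it to DDR over the AXI4 port, read it back, and compare
        AWSteria_AXI4_write (state, wbuf, BUFSIZE, ddr_base);    // placeholder
        AWSteria_AXI4_read  (state, rbuf, BUFSIZE, ddr_base);    // placeholder

        if (memcmp (wbuf, rbuf, BUFSIZE) != 0) {
            fprintf (stdout, "DDR check FAILED at base 0x%" PRIx64 "\n", ddr_base);
            return 1;
        }
        fprintf (stdout, "DDR check ok at base 0x%" PRIx64 "\n", ddr_base);
        return 0;
    }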

6.2. Building and running with Bluesim or Verilator simulation

  • In TestApp/Host/build_sim do make exe_Host_sim to create the host-side executable exe_Host_sim.

  • In TestApp/HW/build_Bluesim do make all to create the HW-side simulation executable exe_HW_sim.

    or,

    in TestApp/HW/build_Verilator do make all to create the HW-side simulation executable exe_HW_sim.

  • Run the hardware-side executable in one process (e.g., in one terminal window). It will await a TCP connection from the host side on a TCP port; it will then run the hardware simulation.

  • Run the host-side executable in another process (e.g., in another terminal window). It will connect to the hardware side using TCP and then interact with it, displaying messages about its actions (reading and writing DDRs on the hardware side).

You will have to kill the HW-side process when done (e.g., using ^C). You can restore each build directory to its pristine state with make full_clean.

6.3. Building and running on Xilinx VCU118

AWSteria_Infra support for Xilinx PCIe-based boards (e.g., VCU118) is built on top of the “Garnet” system. Please see Section “Prerequisites for Xilinx PCIe-based FPGA boards” for background and details.

6.3.1. Building the hardware side

You will need to have installed Xilinx’s Vivado tool suite, and have a Vivado license that includes synthesis for the FPGA on the VCU118. Building the hardware side for VCU118 involves some steps locally in the AWSteria_Infra repo, followed by a step in the “Garnet” repo.

An app in AWSteria_Infra can either run at Garnet’s full speed (250 MHz), or it can run at a slower clock speed; AWSteria_Infra provides the slower clock, and suitable clock-crossing logic.

We describe first the flow for a full speed app, and then the slight variation for a slower speed app.

The following steps are performed in the AWSteria_Infra repo (the two make targets can be provided together):

  • In TestApp/HW/build_VCU118 do make compile. This will create a directory RTL/ and populate it with Verilog RTL generated from the BSV source code by the Bluespec bsc compiler.

  • In TestApp/HW/build_VCU118 do make for_garnet. This will create a directory example_TestApp/ that is ready to run through the Garnet flow.

Copy the example_TestApp/ directory into the top-level of the Garnet repo; change to that directory, and make:

    ... copy example_TestApp directory to garnet repo ...
    $ cd garnet/example_TestApp
    $ make

Garnet will run Vivado on TestApp RTL, eventually producing a “partial bitfile”:

    garnet/example_TestApp/build/AWSteria_pblock_partition_partial.bit

This takes about 1 hour on a 12-core, 1.1 GHz, Intel Core i7-10710U CPU.

You should check that your design has met timing:

    $ grep ^Slack  build/timing_summary.rpt

A line like this, showing “negative slack”, indicates the design did not meet timing:

    Slack (VIOLATED) : -0.592ns  (required time - arrival time)

If so, you need to fix your design and repeat the hardware-build steps to this point, until your design meets timing.

Building the hardware side with a slower clock

To build TestApp to run at the slower clock speed (100 MHz), the steps are analogous:

  • In TestApp/HW/build_VCU118 do make for_garnet_reclocked. This will create a directory example_TestApp_reclocked/ that is ready to run through the Garnet flow.

Copy the example_TestApp_reclocked/ directory into the top-level of the Garnet repo; change to that directory, and make:

    ... copy example_TestApp_reclocked directory to garnet repo ...
    $ cd garnet/example_TestApp_reclocked
    $ make

Garnet will run Vivado on TestApp RTL, eventually producing a “partial bitfile”:

    garnet/example_TestApp_reclocked/build/AWSteria_pblock_partition_partial.bit

You should check that your design has met timing:

    $ grep ^Slack  build/timing_summary.rpt

A line like this, showing “negative slack”, indicates the design did not meet timing:

    Slack (VIOLATED) : -0.592ns  (required time - arrival time)

If so, you need to fix your design and repeat the hardware-build steps to this point, until your design meets timing.

6.3.2. Loading the partial bitfile into the VCU118 FPGA

You should already have loaded the Garnet fixed bitfile (the Garnet “shell”). Loading the partial bitfile for TestApp requires the xvsecctl tool and xvsec driver. (See Section “Prerequisites for Xilinx PCIe-based FPGA boards”).

Example Makefile fragment to perform the partial bitfile reconfiguration:

    BUS           = 0x07
    DEVICE_NO     = 0x0
    CAPABILITY_ID = 0x1
    BITFILE       = garnet/example_TestApp/build/AWSteria_pblock_partition_partial.bit

    reconfig:
            sudo xvsecctl -b $(BUS) -F $(DEVICE_NO) -c $(CAPABILITY_ID) -p $(BITFILE)

The file AWSteria_Infra/TestApp/HW/build_VCU118/Reconfig.mk is a makefile showing this command for various partial bitfiles (Garnet example, TestApp 250 MHz and 100 MHz, etc.).

6.3.3. Building the host side on VCU118

In TestApp/Host/build_VCU118 do

    make  exe_Host_VCU118

to create the host-side executable exe_Host_VCU118.

6.3.4. Running the application

Important
Please be familiar with Section “Prerequisites for Xilinx PCIe-based FPGA boards” below and ensure everything is in place.

Then, run the executable. It will interact with the hardware on the FPGA. The console output should be exactly the same as running in simulation (described earlier).

6.4. Building and running on Amazon AWS F1

6.4.1. Prerequisites: aws-fpga (SDK and HDK) and Vivado tool suite

You can perform the builds on your own computers (“on premises”), but you may find it more convenient to build on the Amazon AWS cloud, using an “FPGA Developer” AMI (Amazon Machine Image), because it has the prerequisite tools and licenses already installed.

If you are building on your own computers:

  • Please clone Amazon’s aws-fpga repo, which can be found at https://github.com/aws/aws-fpga.git. Initialize it as described in its README, sourcing hdk_setup.sh and sdk_setup.sh. The former is needed for the hardware build, below, and the latter is needed for the host-side software build.

  • Please install the Amazon AWS Command Line Interface aws as described in https://aws.amazon.com/cli/.

  • You need to have installed Xilinx’s Vivado tool suite and have a Vivado license for synthesis for the FPGA part that is on AWS F1 instances.

6.4.2. Building the hardware side

Generate/prepare your app RTL

The following steps are performed in the AWSteria_Infra repo (these two make commands can be given as one):

  • In TestApp/HW/build_AWSF1 do make compile. This will create a directory RTL/ and populate it with Verilog RTL generated from the BSV source code by the Bluespec bsc compiler.

  • In TestApp/HW/build_AWSF1 do make for_AWSF1_HDK. This will create a directory cl_AWSteria_TestApp/ that is ready to run through the aws-fpga HDK flow.

AWS HDK step to create a DCP

This step is performed on a machine where you have installed the Amazon AWS aws-fpga repo, in particular its HDK (see Prerequisites section above). You should have initialized the HDK by sourcing hdk_setup.sh (which will also define the environment variable HDK_DIR). The repo has more detailed documentation, if you need it.

If you created the cl_AWSteria_TestApp/ directory (previous section) on a different machine, please copy it to the machine where aws-fpga is installed.

Perform the “create DCP” (Design Checkpoint) action:

    $ cd  <wherever>/cl_AWSteria_TestApp/
    $ export CL_DIR=$(pwd)
    $ cd build/scripts
    $ ./aws_build_dcp_from_cl.sh  -ignore_memory_requirement

This will create a background process that runs Vivado on the AWSteria TestApp RTL, eventually producing a “Design Checkpoint” (DCP). The console output of the background process and the Vivado run are continuously logged in files whose names have this pattern, i.e., the prefix is the timestamp of when the command was started:

    21_08_20-020656.nohup.out
    21_08_20-020656.vivado.log

You can monitor Vivado’s progress by watching these log files, e.g.,

    tail -f 21_08_20-020656.vivado.log

The aws_build_dcp_from_cl.sh step optionally can take an aws-fpga “clock recipe” argument. Examples:

    $ ./aws_build_dcp_from_cl.sh  -ignore_memory_requirement  -clock_recipe_a A1
    $ ./aws_build_dcp_from_cl.sh  -ignore_memory_requirement  -clock_recipe_a A2

The default clock recipe is A0, which builds for 125 MHz; A1 is for 250 MHz, and A2 is for 16.67 MHz. Details about clock recipes can be found in the aws-fpga HDK documentation.

The DCP build for the default clock recipe (A0, 125 MHz) takes about 1 hour 40 minutes running in an “FPGA Developer” AMI on an Amazon z1d.2xlarge machine.

You should check that your design has met timing for the selected clock recipe:

    $ cd  <wherever>/cl_AWSteria_TestApp/build/scripts
    $ grep ^Slack ../reports/21_08_20-020656.timing_summary_route_design.rpt

A line like this, showing “negative slack”, indicates the design did not meet timing:

    Slack (VIOLATED) : -0.592ns  (required time - arrival time)

If so, you need to fix your design and repeat the hardware-build steps to this point, until your design meets timing.

Your DCP should be available in a tarfile here (the timestamp will differ):

    <wherever>/cl_AWSteria_TestApp/build/checkpoints/to_aws/21_08_20-020656.Developer_CL.tar

AWS HDK step to create an AFI from the DCP

Once your DCP is ready, you need to upload it into a folder in an Amazon S3 cloud storage “bucket”. If you don’t already have a bucket-and-folder, you can create it and list its contents like this (this is a one-time step; you can reuse this bucket/folder in subsequent builds):

    $ aws s3 mb  s3://my_bucket/my_folder/
    $ aws s3 ls  s3://my_bucket/my_folder/

Copy your DCP tarfile into the S3 folder:

    TO_AWS_DIR  = <wherever>/cl_AWSteria_TestApp/build/checkpoints/to_aws
    DCP_TARFILE = 21_08_20-020656.Developer_CL.tar
    $ aws s3 cp  $(TO_AWS_DIR)/$(DCP_TARFILE)  s3://my_bucket/my_folder/
    $ aws s3 ls  s3://my_bucket/my_folder/

Note: AWS requires you to have “permission” to create folders and upload files; it may complain “Unable to locate credentials”. You’ll need to follow the usual steps for this:

  • Go to your Amazon AWS Management Console in your browser;

  • Select “Command Line or Programmatic Access” which pops up a window “Get credentials for AWSPowerUserAccess”, and

  • Follow one of the options there for establishing your credentials (e.g., copy the environment variable defs to your clipboard and paste them into your command shell).

Once uploaded, you can issue the command to create an AWS AFI (AWS F1 Image). You must provide a name for your AFI and a brief description, and specify the Amazon AWS cloud "region" in which you work. Example:

    $ aws ec2 create-fpga-image \
        --region us-west-2 \
        --name AWSteria_TestApp \
        --description "Testapp for AWSteria Infrastructure on AWS F1" \
        --input-storage-location Bucket=my_bucket,Key=my_folder/$(DCP_TARFILE) \
        --logs-storage-location Bucket=my_bucket,Key=my_folder

This will submit a request (to some mysterious process in the AWS cloud) to create your AFI from your DCP checkpoint, but it will immediately print out two unique IDs for this AFI:

    {
        "FpgaImageId": "afi-0bf39b6143abf492c",
        "FpgaImageGlobalId": "agfi-0a4fd4a251c7e8690"
    }

Please make a careful note of these IDs, as you will need them for subsequent steps!

You can monitor progress of your AFI creation with:

    $ aws ec2 describe-fpga-images --fpga-image-ids  "afi-0bf39b6143abf492c"

whose initial output will look like this (note that State is “pending”, and UpdateTime is the same as CreateTime):

        {
            "UpdateTime": "2021-08-20T17:18:16.000Z",
            "Name": "RSNAWSteriaTestApp",
            "Tags": [],
            "FpgaImageGlobalId": "agfi-0a4fd4a251c7e8690",
            "Public": false,
            "State": {
                "Code": "pending"
            },
            "OwnerId": "845509001885",
            "FpgaImageId": "afi-0bf39b6143abf492c",
            "CreateTime": "2021-08-20T17:18:16.000Z",
            "Description": "ASWteria TestApp 125 MHz"
        }

After about 50-60 minutes, your AFI will be ready, and the output of the command will change to the following. Note that State will be “available” and UpdateTime will have been updated to the AFI creation time.

        {
            "UpdateTime": "2021-08-20T18:10:49.000Z",
            "Name": "RSNAWSteriaTestApp",
            "Tags": [],
            "PciId": {
                "SubsystemVendorId": "0xfedc",
                "VendorId": "0x1d0f",
                "DeviceId": "0xf001",
                "SubsystemId": "0x1d51"
            },
            "FpgaImageGlobalId": "agfi-0a4fd4a251c7e8690",
            "Public": false,
            "State": {
                "Code": "available"
            },
            "ShellVersion": "0x04261818",
            "OwnerId": "845509001885",
            "FpgaImageId": "afi-0bf39b6143abf492c",
            "CreateTime": "2021-08-20T17:18:16.000Z",
            "Description": "ASWteria TestApp 125 MHz"
        }

Note: the aws ec2 create-fpga-image command has options to notify completion by sending you an email, instead of manual monitoring.

Your AFI is now ready to load onto an AWS F1 FPGA and run, interacting with your host-side app software.

6.4.3. Loading the AFI (with bitfile) into the AWS F1 FPGA

On an Amazon AWS F1 instance, load your AFI (your app’s hardware side) into the FPGA as follows:

    $ sudo fpga-load-local-image -S 0 -I "agfi-0a4fd4a251c7e8690"

The fpga-load-local-image program becomes available when you have cloned the aws-fpga repo and sourced sdk_setup.sh (see Prerequisites section above).

You can check on the status of your loaded AFI in the FPGA using either of the following commands (the latter is much more verbose):

    $ sudo fpga-describe-local-image -S 0 -R -H
    $ sudo fpga-describe-local-image -S 0 -R -H -M

6.4.4. Building and running the host side on AWSF1

In TestApp/Host/build_AWSF1 do make to create the host-side executable exe_Host_AWSF1.

Then, run the executable.

    $ sudo ./exe_Host_AWSF1

It will interact with the hardware on the FPGA. The console output should be exactly the same as running in simulation (described earlier).

7. Porting your application to use AWSteria_Infra

The small TestApp example and its build-and-run flows provide a template for coding, building, and running your app. The Include_API/ files provide “empty” Verilog and BSV modules for convenience, which you can use as your starting point.

Create your own app directory as a sibling to TestApp, with the same structure (you can omit any of these platform-directories that you don’t need):

    MyApp/
        Host/
            ... your source files ...
            build_sim/
            build_VCU118/
            build_AWSF1/
        HW/
            ... your source files ...
            build_Bluesim/
            build_Verilator/
            build_VCU118/
            build_AWSF1/

Create Makefiles in each build_xxx directory, using those in the corresponding directories in TestApp as a starting template.

Follow the build-and-run flows described for TestApp.

8. Prerequisites for Xilinx PCIe-based FPGA boards: XDMA and XVSEC drivers and Garnet

8.1. Xilinx XDMA and XVSEC drivers

Please install Xilinx’s XDMA and XVSEC drivers on your host Linux machine, where your Xilinx PCIe-based board (e.g., VCU118) is attached. The drivers and their build and installation instructions are at https://github.com/Xilinx/dma_ip_drivers.git.

XDMA installation will install the xdma driver in your Linux kernel. After installation you’ll see files like /dev/xdma* on your Linux host. See the installation instructions in the dma_ip_drivers repository (URL above).

XVSEC installation will install the xvsecctl tool and driver, which is used for “partial reconfiguration” of the FPGA with a partial bitfile. After installation you’ll see files like /dev/xvsec* on your Linux host, and the following executable tool: /usr/local/sbin/xvsecctl. See the installation instructions in the dma_ip_drivers repository (URL above).

VSEC (Vendor Specific Extended Capability) is a feature of PCIe.

8.1.1. Useful utilities for querying and installing drivers

List PCI devices on system:

    $ lspci                              List all devices on PCI
    $ man lspci                          for help on lspci

    $ lspci -d 10ee:903f                 List Xilinx devices on PCI (vendor, device id)
    07:00.0 Serial controller: Xilinx Corporation Device 903f

Check which drivers are currently loaded in the OS kernel:

    $ lsmod                              List all loaded drivers
    $ lsmod | grep -e xdma -e xvsec      Filter for xdma and xvsec

Loading a driver into the OS kernel (starting):

    $ sudo modprobe xvsec

When xdma and xvsec drivers are loaded, we can see related “devices” like this:

    $ ls /dev/xdma*  /dev/xvsec*

We can see which version of a driver was loaded by examining host system messages, or using modinfo:

    $ sudo dmesg | grep -i xdma
    ...
    [    4.533329] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.1.8
    ...

    $ sudo modinfo xvsec
    ...
    version:        2020.2.0
    ...

Unloading a driver from the OS kernel (removing/stopping):

    $ sudo rmmod -s xvsec

8.2. Garnet

AWSteria_Infra support for Xilinx PCIe-based boards (e.g., VCU118) is built on top of the “Garnet” system, and thus follows Garnet-build flows. The Garnet repo from Cambridge University, UK is at https://github.com/CTSRD-CHERI/garnet.

Once cloned, we need to perform the following one-time step in the top-level directory to prepare some components that are used by AWSteria_Infra apps:

    $ make ip

Garnet provides PCIe and DDR infrastructure for VCU118, and a 250 MHz clock and reset. Please clone Garnet and follow the instructions there to build and run the provided simple example.

The Garnet flow installs two separate bitfiles on the VCU118, in two separate steps. The first bitfile is for a component called the “shell” and contains fixed, unchanging support for PCIe and DDR4s. This component needs to be loaded just once, using standard bitfile-loading mechanisms (USB, JTAG, Flash, …​).

The second bitfile is a “partial bitfile” containing the application logic. This partial bitfile is loaded using Xilinx’s “partial reconfiguration” mechanism, which uses Xilinx’s XVSEC driver.

The second bitfile can be repeatedly re-loaded with different application design iterations, or with different applications, without having to reload the “shell” bitfile.

The flow for AWSteria_Infra results in such a partial bitfile which plugs into the Garnet “shell” environment.

8.3. Steps for running AWSteria_Infra on Garnet on Xilinx PCIe-based boards

One-time steps:

  • Load the Xilinx XDMA driver into the host’s OS kernel if not already running. We recommend configuring the host OS so that XDMA is automatically started on each reboot.

  • Load the Garnet “shell” (fixed bitfile) into the FPGA using normal bitfile-loading procedures. This does not require PCIe or the XDMA or XVSEC drivers; it’s usually done over USB or JTAG. Example:

        $ cd  $(AWSTERIA_INFRA_REPO)/Platform_VCU118/HW
        $ ./program_fpga  $(GARNET_REPO)/shell/prebuilt/empty.bit

    program_fpga is a Tcl script for Vivado to program the FPGA using USB. As a convenience, this command is also in the Load_Shell.mk makefile, so you can just say make load_shell (edit the path to the Garnet shell bitfile, if needed).

  • Reboot the host, so that the host’s PCI probing and XDMA recognize the PCI IP in the just-loaded FPGA bitfile.

  • Load the Xilinx XVSEC driver into the OS kernel if not already running.

Repeated steps for each new application, or each design iteration of an application:

  • Use the Xilinx xvsecctl command-line tool to load the AWSteria_Infra application’s FPGA-side partial bitfile. xvsecctl performs “partial reconfiguration” using the Xilinx XVSEC driver.

Then, the application’s host-side can interact with the FPGA-side. This communication uses the Xilinx XDMA driver.

9. Possible Futures

We may port AWSteria_Infra to more platforms (more host/FPGA board pairings). Note that host-FPGA communication does not have to be over PCIe; it could run over other transports such as Ethernet, USB, JTAG, …​ (albeit with slower performance). Indeed, Platform_Sim described above uses TCP/IP as its transport.

We may augment TestApp for other uses:

  • Measure AWSteria_Infra performance: latencies and bandwidths for host-FPGA communication, for DUT-Memory access, etc.

  • “Unload” DDR after some DUT has run in AWSteria_Infra, e.g., application performance counters stored in DDR (for platforms where DDR contents are preserved across bitfile reloads). This would be a minor change to host side C code.

  • “Preload” DDR before some DUT has run in AWSteria_Infra, e.g., a section of DDR used by the DUT as a ROM, or as initialized memory (for platforms where DDR contents are preserved across bitfile reloads). This would be a minor change to host side C code.