Example: VGA Graphics


This page describes using an FPGA development board and a VGA PMOD to do basic VGA graphics.

This example is from a series of examples designed for dev boards.

The simplest VGA designs output a test pattern to the screen with code like this:

// See top level IO pin config in top.h
#include "top.h"

// VGA pmod part of demo
#define FRAME_WIDTH 640
#define FRAME_HEIGHT 480
#include "vga/vga_timing.h"
#include "vga/test_pattern.h"
// vga_timing() and PIXEL_CLK_MHZ from vga_timing.h in top.h
MAIN_MHZ(vga_pmod_main, PIXEL_CLK_MHZ) 
void vga_pmod_main(){
  // VGA timing for fixed resolution
  vga_signals_t vga_signals = vga_timing();
  
  // Test pattern from Digilent of color bars and bouncing box
  test_pattern_out_t test_pattern_out = test_pattern(vga_signals);
  
  // Drive output signals/registers
  vga_r = test_pattern_out.pixel.r;
  vga_g = test_pattern_out.pixel.g;
  vga_b = test_pattern_out.pixel.b;
  vga_hs = test_pattern_out.vga_signals.hsync;
  vga_vs = test_pattern_out.vga_signals.vsync;
}

VGA Signals and Timing

Typically inside top.h is an include of vga_timing.h that configures VGA video timing signals.

The details of how VGA signals are produced are well documented in places like Digilent's tutorial and the awesome set of Project F tutorials.

project f timings image

In PipelineC vga_timing.h defines a vga_timing() function to produce the counter based signals:

typedef struct vga_pos_t
{
  uint12_t x;
  uint12_t y;
}vga_pos_t;
typedef struct vga_signals_t
{
  vga_pos_t pos;
  uint1_t hsync;
  uint1_t vsync;
  uint1_t active;
}vga_signals_t;
vga_signals_t vga_timing();
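
To make the counter-based idea concrete, here is a minimal plain C software model of such timing logic. It is only a sketch under stated assumptions: the names (sw_vga_timing, sw_vga_signals_t) are hypothetical and not part of vga_timing.h, the porch and sync widths are the standard 640x480@60Hz values, and sync polarity is left out (the flags are simply 1 while inside the sync pulse).

```c
#include <assert.h>
#include <stdint.h>

// Standard 640x480@60Hz timing: pixels horizontally, lines vertically
#define H_ACTIVE 640
#define H_FRONT_PORCH 16
#define H_SYNC 96
#define H_BACK_PORCH 48
#define H_TOTAL (H_ACTIVE + H_FRONT_PORCH + H_SYNC + H_BACK_PORCH) // 800
#define V_ACTIVE 480
#define V_FRONT_PORCH 10
#define V_SYNC 2
#define V_BACK_PORCH 33
#define V_TOTAL (V_ACTIVE + V_FRONT_PORCH + V_SYNC + V_BACK_PORCH) // 525

// Hypothetical software stand-in for vga_signals_t
typedef struct sw_vga_signals_t {
  uint16_t x, y;  // counter position for this pixel clock
  uint8_t hsync;  // 1 while inside the horizontal sync pulse
  uint8_t vsync;  // 1 while inside the vertical sync pulse
  uint8_t active; // 1 while inside the visible area
} sw_vga_signals_t;

// One call models one pixel clock: report the signals for the current
// counter values, then advance the counters (the 'registers').
sw_vga_signals_t sw_vga_timing(uint16_t *x_reg, uint16_t *y_reg) {
  assert(*x_reg < H_TOTAL && *y_reg < V_TOTAL);
  sw_vga_signals_t o;
  o.x = *x_reg;
  o.y = *y_reg;
  o.active = (o.x < H_ACTIVE) && (o.y < V_ACTIVE);
  o.hsync = (o.x >= H_ACTIVE + H_FRONT_PORCH) &&
            (o.x < H_ACTIVE + H_FRONT_PORCH + H_SYNC);
  o.vsync = (o.y >= V_ACTIVE + V_FRONT_PORCH) &&
            (o.y < V_ACTIVE + V_FRONT_PORCH + V_SYNC);
  // Advance: x wraps at the end of a line, y wraps at the end of a frame
  if (*x_reg == H_TOTAL - 1) {
    *x_reg = 0;
    *y_reg = (*y_reg == V_TOTAL - 1) ? 0 : (uint16_t)(*y_reg + 1);
  } else {
    *x_reg = (uint16_t)(*x_reg + 1);
  }
  return o;
}
```

With both counters starting at zero, this model reports hsync for x in 656..751 and vsync for y in 490..491, and the x counter wraps back to 0 (with y advancing to 1) after 800 calls.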

Dev Board Setup

It is recommended to follow the design patterns shown in a series of examples designed for dev boards.

pico-ice version of top.c
Arty version of top.c

Notice the include of top.h in the above code. This is where top level IO is configured. For instance you might configure the VGA signals for specific PMOD pins like so:

// Configure VGA module to use PMOD0 and PMOD1
// PMOD0 = VGA PMOD J1
// PMOD1 = VGA PMOD J2
// 12b color, 4b per RGB channel
#define VGA_R0_WIRE pmod_0a_o1
#define VGA_R1_WIRE pmod_0a_o2
#define VGA_R2_WIRE pmod_0a_o3
#define VGA_R3_WIRE pmod_0a_o4
//
#define VGA_G0_WIRE pmod_1a_o1
#define VGA_G1_WIRE pmod_1a_o2
#define VGA_G2_WIRE pmod_1a_o3
#define VGA_G3_WIRE pmod_1a_o4
//
#define VGA_B0_WIRE pmod_0b_o1
#define VGA_B1_WIRE pmod_0b_o2
#define VGA_B2_WIRE pmod_0b_o3
#define VGA_B3_WIRE pmod_0b_o4
//
#define VGA_HS_WIRE pmod_1b_o1
#define VGA_VS_WIRE pmod_1b_o2
#include "vga/vga_wires_4b.c"

top.h also references a board specific header inside of the board/ directory.

This design pattern of common code in top.c with board specifics configured in top.h allows the final generated VHDL to be dev board specific and easy to instantiate, while the internal PipelineC code can share a common interface with generic names, common helper libraries, etc.

Global Variables Interface

It has proven convenient to expose globally visible wires as one way of composing a design.

From the above include of vga_wires_4b.c and vga_wires.c the following global wires are visible:

uint1_t vga_hs;
uint1_t vga_vs;
uint8_t vga_r;
uint8_t vga_g;
uint8_t vga_b;

Test Pattern

// VGA timing for fixed resolution
vga_signals_t vga_signals = vga_timing();

// Test pattern from Digilent of color bars and bouncing box
test_pattern_out_t test_pattern_out = test_pattern(vga_signals);

// Drive output signals/registers
vga_r = test_pattern_out.pixel.r;
vga_g = test_pattern_out.pixel.g;
vga_b = test_pattern_out.pixel.b;
vga_hs = test_pattern_out.vga_signals.hsync;
vga_vs = test_pattern_out.vga_signals.vsync;

main data flow

test_pattern.h declares test_pattern(), a copy of Digilent's test pattern VHDL. The test_pattern() function itself does not use stateful static local variables and thus may be auto-pipelined.

typedef struct test_pattern_out_t{
  vga_signals_t vga_signals;
  pixel_t pixel;
}test_pattern_out_t;
test_pattern_out_t test_pattern(vga_signals_t vga_signals){
  // Logic to make a box move
  vga_pos_t box_pos = moving_box_logic();
  
  // Color pixel at x,y 
  pixel_t pixel = get_pixel_color(vga_signals.active, vga_signals.pos, box_pos);

  test_pattern_out_t o;
  o.vga_signals = vga_signals;
  o.pixel = pixel;
  return o;
}

test pattern data flow

The moving_box_logic() function is what animates the bouncing box part of the test pattern. Every clock cycle it updates its box position and direction registers in a simple finite state machine (it will not be auto-pipelined):

vga_pos_t moving_box_logic()
{  
  static uint12_t box_x_reg = BOX_X_INIT;
  static uint1_t box_x_dir = 1;
  static uint12_t box_y_reg = BOX_Y_INIT;
  static uint1_t box_y_dir = 1;
  static uint25_t box_cntr_reg;
  
  // ...FSM logic...
  vga_pos_t pos; // .x, .y
  ...
  return pos;
}
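
The elided FSM logic is not reproduced here, but the bouncing behavior along one axis can be sketched in plain C. This is an illustration under assumptions, not the actual Digilent or PipelineC logic: sw_axis_t and sw_axis_step are hypothetical names, the state is passed in and returned instead of living in static registers, and the slow-down counter (box_cntr_reg in the real code) is omitted.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical per-axis state: position plus direction (1 = increasing)
typedef struct sw_axis_t {
  uint16_t pos;
  uint8_t dir;
} sw_axis_t;

// One animation step along one axis, reversing direction at either edge.
// max_pos is the largest allowed position, e.g. frame size minus box size.
sw_axis_t sw_axis_step(sw_axis_t s, uint16_t max_pos) {
  assert(max_pos > 0 && s.pos <= max_pos);
  if (s.dir) {
    if (s.pos == max_pos) {
      s.dir = 0; // hit the far edge: turn around
      s.pos -= 1;
    } else {
      s.pos += 1;
    }
  } else {
    if (s.pos == 0) {
      s.dir = 1; // hit the near edge: turn around
      s.pos += 1;
    } else {
      s.pos -= 1;
    }
  }
  return s;
}
```

Starting at position 0 moving in the increasing direction with max_pos of 2, successive steps give positions 1, 2, 1, 0, 1 and so on. In hardware the same step would be applied to the x and y registers independently.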

get_pixel_color() is a pure function with no state and may be auto-pipelined. It takes as input a screen position pos, and the current position of the box box_pos. Using these x,y positions the test pattern is encoded as a series of if-else checks, i.e. 'if in a certain position, return a certain color'.

// Logic for coloring pixels
pixel_t get_pixel_color(uint1_t active, vga_pos_t pos, vga_pos_t box_pos)
{
  pixel_t p; // .r,.g,.b

  // ... if position is inside box
  //     ... p = box color... etc
  // (a pure function)
  return p;
}
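
As an illustration of the 'if in a certain position, return a certain color' structure, here is a small plain C sketch. The colors, the two-bar layout, and the box size are placeholders chosen for the example (they are assumptions, not Digilent's actual pattern), and the sw_ prefixed names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SW_FRAME_WIDTH 640
#define SW_BOX_SIZE 8 // hypothetical box edge length in pixels

typedef struct sw_pos_t { uint16_t x, y; } sw_pos_t;
typedef struct sw_pixel_t { uint8_t r, g, b; } sw_pixel_t;

// Pure function: the output depends only on the inputs, no stored state
sw_pixel_t sw_get_pixel_color(uint8_t active, sw_pos_t pos, sw_pos_t box_pos) {
  assert(active <= 1);
  sw_pixel_t p = {0, 0, 0}; // black by default
  if (!active) {
    // Outside the visible area: keep the pixel black (blanking)
  } else if (pos.x >= box_pos.x && pos.x < box_pos.x + SW_BOX_SIZE &&
             pos.y >= box_pos.y && pos.y < box_pos.y + SW_BOX_SIZE) {
    // Inside the moving box: placeholder white
    p.r = 255;
    p.g = 255;
    p.b = 255;
  } else if (pos.x < SW_FRAME_WIDTH / 2) {
    p.r = 255; // left half of the screen: placeholder red bar
  } else {
    p.b = 255; // right half of the screen: placeholder blue bar
  }
  return p;
}
```

Because the function has no state, every pixel can be computed independently of the previous one, which is the property that lets a tool spread the logic over pipeline stages.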

Old Examples

Setup
Bouncing Images
Pong
Mandelbrot Viewer

Setup

Digilent provides reference files, including the .xdc file describing the PMOD ports for the VGA adapter. Connecting the internal VGA signals to the external VGA PMOD is handled in vga_pmod.c.

Copying Digilent's VHDL example, a basic VGA test pattern was the first step confirmed working. VGA timing parameters (front porch, back porch, etc.) for a fixed resolution can be found in vga_timing.h. The code for the VGA test pattern, which uses the timing logic and the included PMOD port, can be seen in test_pattern_modular.c.

Bouncing Images

Based off of Digilent's test pattern which includes a bouncing box, the small black box was replaced with a colorful PipelineC logo and several of them were made to bounce around the screen. See bouncing_images.c. The file starts by including the board/pmod/vga things via #include "vga_pmod.c".

The logic for drawing and moving a rectangle filled with an image is also included via #include "image_rect.h". Inside that file is another include, #include "pipelinec_color.h", of a RAM initialization text file generated by the make_image_files.py helper script.

Using the helper functions from image_rect.h, the main function snippet below shows the per-clock iteration of moving the rectangles and getting the pixel color.

// Set design to run at pixel clock
MAIN_MHZ(app, PIXEL_CLK_MHZ)
void app()
{
  // VGA timing for fixed resolution
  vga_signals_t vga_signals = vga_timing();
  
  // N image rectangles all moving in parallel
  // Initial state values
  rect_t start_states[NUM_IMAGES];
  RECT_INIT(start_states) // Constants macro
  
  // Rectangle moving animation func/module outputs current state
  rect_t rects[NUM_IMAGES];
  uint32_t i;
  for(i=0;i<NUM_IMAGES;i+=1)
  {
    // Logic to make a rectangle move
    rects[i] = rect_move(start_states[i]);
  }
  
  // Color pixel at x,y
  color_12b_t color = get_pixel_color(vga_signals.active, vga_signals.pos, rects);
  
  // Drive output signals/registers
  ...
}

The rect_move() function contains static local variables that maintain a single rectangle's position+color state updated with each call from app(). The output from that function, image rectangle states rects, is then passed to get_pixel_color(). Inside get_pixel_color() a few things occur:

  // Func from pipelinec_color.h
  uint32_t pixel_addr = pipelinec_color_pixel_addr(rel_pos);

  // In pipelineable luts, too slow for pycparser (and probably rest of PipelineC too)
  //color_12b_t pipelinec_color[pipelinec_color_W*pipelinec_color_H];
  // return pipelinec_color[pixel_addr]; 
  
  // As synthesis tool inferred (LUT)RAM
  pipelinec_color_DECL // Macro from pipelinec_color.h
  color_12b_t unused_write_data;
  // (LUT)RAM template function
  color_12b_t c = pipelinec_color_RAM_SP_RF_0(pixel_addr, unused_write_data, 0);

A function from the generated image file header is used to get the RAM address holding the single pixel of color data. Another piece of generated code in the form of a macro is used to initialize the RAM variable named pipelinec_color. Then using the pixel address a special PipelineC ROM-inferring function is invoked to retrieve the pixel color values (single/same cycle LUT RAM for simplicity).

Finally, RGB color values are swapped by using a color_mode state variable per stored rectangle. That value is incremented as the images bounce around and collide with the walls.
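
The address-then-lookup step can be sketched in plain C as follows. This is a guess at the general shape only: the generated pipelinec_color.h is not shown on this page, so the row-major address formula, the tiny 4x2 image, and all img_ prefixed names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical tiny image: 4 pixels wide, 2 pixels tall, 12b color per pixel
#define IMG_W 4
#define IMG_H 2
static const uint16_t img_rom[IMG_W * IMG_H] = {
  0x000, 0x111, 0x222, 0x333, // row 0
  0x444, 0x555, 0x666, 0x777  // row 1
};

// Row-major address of the pixel at a position relative to the
// top-left corner of the image rectangle (assumed layout)
uint32_t img_pixel_addr(uint16_t rel_x, uint16_t rel_y) {
  assert(rel_x < IMG_W && rel_y < IMG_H);
  return (uint32_t)rel_y * IMG_W + rel_x;
}

// Read-only lookup, standing in for the inferred (LUT)RAM read
uint16_t img_lookup(uint16_t rel_x, uint16_t rel_y) {
  return img_rom[img_pixel_addr(rel_x, rel_y)];
}
```

For example, the pixel at relative position (1, 1) maps to address 5 and reads back 0x555 from this placeholder image.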

Pong

Similar in spirit to the above bouncing images example, Pong has 3 rectangles, two paddles and one 'ball'. The ball bounces off walls and user paddles. The paddles move from user input button presses on the Arty board.

pong.c starts off with inclusion of PMOD/VGA related things via #include "vga_pmod.c", and rectangle helper functions from rect.h.

Additionally buttons.c is included for access to the button state. Reading the state of the buttons and assigning them to user inputs looks like this:

// User input buttons
typedef struct user_input_t
{
  uint1_t paddle_r_up;
  uint1_t paddle_r_down;
  uint1_t paddle_l_up;
  uint1_t paddle_l_down;
}user_input_t;
user_input_t get_user_input()
{
  // Read buttons wire/board IO port
  uint4_t btns;
  WIRE_READ(uint4_t, btns, buttons)
  user_input_t i;
  // Select which buttons are up and down
  i.paddle_r_up = btns >> 0;
  i.paddle_r_down = btns >> 1;
  i.paddle_l_up = btns >> 2;
  i.paddle_l_down = btns >> 3;
  return i;
}
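
The code above appears to rely on assignment to a uint1_t keeping only the least significant bit of the shifted value. When the same logic is compiled as ordinary C (as in the software prototype described later), there is no 1-bit type, so an explicit mask is needed. A minimal sketch, with hypothetical sw_ prefixed names and the button word passed in as an argument instead of read from a wire:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical plain C counterpart of user_input_t (one byte per flag)
typedef struct sw_user_input_t {
  uint8_t paddle_r_up;
  uint8_t paddle_r_down;
  uint8_t paddle_l_up;
  uint8_t paddle_l_down;
} sw_user_input_t;

// Split a 4-bit button word into individual flags.
// '& 1' does explicitly what a 1-bit destination type does implicitly.
sw_user_input_t sw_get_user_input(uint8_t btns) {
  assert(btns < 16); // only 4 buttons
  sw_user_input_t i;
  i.paddle_r_up = (btns >> 0) & 1;
  i.paddle_r_down = (btns >> 1) & 1;
  i.paddle_l_up = (btns >> 2) & 1;
  i.paddle_l_down = (btns >> 3) & 1;
  return i;
}
```

With btns = 0x5 (binary 0101), the right paddle 'up' and left paddle 'up' flags are set and both 'down' flags are clear.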

Several tiny functions are declared for collision detection (helpers inside rect.h that are not Pong specific are used too). For example:

// Ball hit top of frame?
uint1_t ball_hit_roof(rect_animated_t ball)
{
  return (ball.vel_y_dir==UP) & (ball.rect.pos.y == 0);
}

And some helper functions move the paddles from user input, for example:

// How to move paddle from user input, with screen limits
vga_pos_t move_paddle(vga_pos_t pos, uint1_t paddle_up, uint1_t paddle_down)
{
  if(paddle_up & !paddle_down)
  {
    if(pos.y >= BTN_POS_INC)
    {
      pos.y -= BTN_POS_INC;
    }
  }
  else if(paddle_down & !paddle_up)
  {
    if((pos.y + BTN_POS_INC) <= (FRAME_HEIGHT-PADDLE_HEIGHT))
    {
      pos.y += BTN_POS_INC;
    }
  }
  return pos;
}

Which then leads to the functionality described in the top level app() main function:

// State of objects in the game
typedef struct game_state_t
{
  rect_animated_t ball;
  rect_animated_t lpaddle;
  rect_animated_t rpaddle;
}game_state_t;

// Set design to run at pixel clock
MAIN_MHZ(app, PIXEL_CLK_MHZ)
void app()
{
  // VGA timing for fixed resolution
  vga_signals_t vga_signals = vga_timing();
  
  // Reset register
  static uint1_t reset = 1; // Start in reset
  // State registers
  static game_state_t state;
  // Per clock game logic:
  // Render the pixel at x,y pos given state
  pixel_t color = render_pixel(vga_signals.pos, state);
  // Do animation state update, not every clock, but on every frame
  if(vga_signals.end_of_frame)
  {
    // Read input controls from user
    user_input_t user_input = get_user_input();
    //printf("user input: %d\n", (int) user_input.paddle_r_up);

    state = next_state_func(reset, state, user_input);
    reset = 0; // Out of reset after first frame
  }  
  
  // Drive output signals/registers
  vga_pmod_register_outputs(vga_signals, color);
}

On each clock cycle the current game_state_t object (ball position+velocity, paddle position, etc) is passed to render_pixel() to determine the pixel color at a given VGA (x,y) position. Upon completing each frame, if(vga_signals.end_of_frame), the game is animated by a standard next state = f(current state) function via state = next_state_func(reset, state, user_input);.

render_pixel() is a simple function checking if the pixel position is colored for background, the ball, or one of the user paddles. It uses the rect_contains() helper function, for example:

  if(rect_contains(state.ball.rect, pos))
  {
    c.r = BALL_RED;
    c.g = BALL_GREEN;
    c.b = BALL_BLUE;
  }

next_state_func() is where the majority of the Pong game logic resides. A snippet of such game logic for example:

  // Ball passing goal lines?
  if(ball_in_l_goal(state.ball))
  {
    if(rects_collide(state.ball.rect, state.lpaddle.rect))
    {
      // Bounce off left paddle
      next_state.ball.rect.pos = state.ball.rect.pos;
      next_state.ball.vel_x_dir = RIGHT;
      next_state.ball.vel = ball_paddle_inc_vel(state.ball.vel);
    }
    else
    {
      // Left scored on by right
      reset = 1; // Start over
      // TODO keep+display score
    }
  }
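
The body of rects_collide() is not shown on this page. A common way to write such a check is an axis-aligned bounding box overlap test, sketched below in plain C. The sw_rect_t layout and the function name are hypothetical, and the real helper in rect.h may differ (for instance in how touching edges are treated).

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical rectangle: top-left corner plus width and height
typedef struct sw_rect_t {
  uint16_t x, y, w, h;
} sw_rect_t;

// Two axis-aligned rectangles overlap only if they overlap on BOTH axes.
// Edges that merely touch are not counted as a collision here.
uint8_t sw_rects_collide(sw_rect_t a, sw_rect_t b) {
  assert(a.w > 0 && a.h > 0 && b.w > 0 && b.h > 0);
  // Widen to 32 bits so position + size cannot wrap around
  uint32_t a_right = (uint32_t)a.x + a.w;
  uint32_t a_bottom = (uint32_t)a.y + a.h;
  uint32_t b_right = (uint32_t)b.x + b.w;
  uint32_t b_bottom = (uint32_t)b.y + b.h;
  return (a.x < b_right) && (b.x < a_right) &&
         (a.y < b_bottom) && (b.y < a_bottom);
}
```

A 10x10 rectangle at (0, 0) collides with a 10x10 rectangle at (5, 5), but not with a 5x5 rectangle at (10, 0), which only touches its right edge.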

There are certainly optimizations that could be made for better resource utilization, but I did not feel the need as it's already quite small: resources

Pretty device picture - look at that little chunk of logic, aw :) device

Mandelbrot Viewer

The above examples were quite simple in terms of the computation required to produce the image on screen. In fact, neither above example uses PipelineC's autopipelining capability since all operations can be completed in a single cycle (no pipelining required).

Computing the Mandelbrot set image on the other hand requires use of complex fractional/floating point numbers and requires many multiply and addition operations. The extent of the computation can be scaled to as many loop iterations as you desire (wiki pseudo code):

while (x2 + y2 ≤ 4 and iteration < max_iteration) do
    y := 2 × x × y + y0
    x := x2 - y2 + x0
    x2 := x × x
    y2 := y × y
    iteration := iteration + 1
return iteration

To pipeline a loop it must have a fixed number of iterations (for unrolling). mandelbrot.c is written using a for loop instead:

uint1_t not_found_n = 1;
for(i=0;i<MAX_ITER;i+=1)
{
  // Mimic while loop
  if(not_found_n) 
  {
    if((z_squared.re+z_squared.im) <= (ESCAPE*ESCAPE))
    {
      z.im = ((z.re*z.im)<< 1) + c.im;
      z.re = z_squared.re - z_squared.im + c.re;
      z_squared.re = z.re * z.re;
      z_squared.im = z.im * z.im;
      n += 1;
    }
    else
    {
      not_found_n = 0;
    }
  }
}
return n;
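
For reference, the same fixed-iteration loop can be written as a self-contained plain C function using ordinary 32-bit float. This is a sketch for experimenting on a PC, not the mandelbrot.c source: the sw_ prefixed names are hypothetical, and the floating point shift is replaced by a multiply by 2.0f since standard C has no << for floats.

```c
#include <assert.h>
#include <stdint.h>

#define SW_MAX_ITER 14
#define SW_ESCAPE 2.0f

typedef struct sw_complex_t { float re, im; } sw_complex_t;

// Escape time count for point c, with a fixed trip count so the loop
// could be fully unrolled. Always runs SW_MAX_ITER iterations.
uint32_t sw_mandelbrot(sw_complex_t c) {
  sw_complex_t z = {0.0f, 0.0f};
  sw_complex_t z_squared = {0.0f, 0.0f};
  uint32_t n = 0;
  uint8_t not_found_n = 1;
  for (uint32_t i = 0; i < SW_MAX_ITER; i += 1) {
    // Mimic the while loop: once escaped, later iterations do nothing
    if (not_found_n) {
      if ((z_squared.re + z_squared.im) <= (SW_ESCAPE * SW_ESCAPE)) {
        z.im = (2.0f * z.re * z.im) + c.im; // uses the old z.re
        z.re = z_squared.re - z_squared.im + c.re;
        z_squared.re = z.re * z.re;
        z_squared.im = z.im * z.im;
        n += 1;
      } else {
        not_found_n = 0;
      }
    }
  }
  assert(n <= SW_MAX_ITER);
  return n;
}
```

The point c = 0 never escapes, so the function returns SW_MAX_ITER (14), while c = 2 + 2i exceeds the escape radius after a single update and returns 1.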

Note the 'body of the loop' uses the optimized Mandelbrot escape time calculation and makes use of the built in floating point shift << operation, so that power of two multiplies do not consume an entire multiplier/divider worth of resources. Both the MAX_ITER parameter (~fractalness) and the floating point mantissa size (~screen detail) can be scaled to as many resources as your FPGA allows.

In this demo MAX_ITER=14 iterations is used. C language FP32 float has an 8b exponent and 23b mantissa. An alias for this type in PipelineC is float_8_23_t. In this demo a reduced width floating point format is used:

#define float float_8_11_t // 8b exponent, 11b mantissa
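
To get a feel for what an 11b mantissa costs in precision, a reduced mantissa can be approximated on a PC by clearing the low bits of a regular 32-bit float. The sketch below assumes IEEE-754 single precision and simple truncation; whether PipelineC's float_8_11_t truncates or rounds is not stated here, so treat this as a rough model only. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Keep the sign, the 8b exponent, and only the top 11 of the 23 stored
// mantissa bits of an IEEE-754 float (truncation, no rounding).
float sw_truncate_mantissa_11b(float f) {
  assert(sizeof(float) == sizeof(uint32_t));
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits); // bit-exact copy, avoids aliasing issues
  bits &= ~((1u << 12) - 1u);     // clear the low 23 - 11 = 12 bits
  memcpy(&f, &bits, sizeof f);
  return f;
}
```

With this model, 1.0 + 2^-12 collapses to exactly 1.0, while 1.0 + 2^-11 survives unchanged, so values near 1.0 are resolved in steps of 2^-11.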

There is a state_t struct holding state maintained from frame to frame. In this case, the bounds of the real and imaginary window:

typedef struct state_t
{
  // Plot window
  float re_start;
  float re_width;
  float im_start;
  float im_height;
}state_t;

Rendering a pixel involves using this state to compute the coordinate in the complex plane to run Mandelbrot iterations on.

// Convert pixel coordinate to complex number
complex_t c = {state.re_start + ((float)pos.x * (1.0f/(float)FRAME_WIDTH)) * state.re_width,
              state.im_start + ((float)pos.y * (1.0f/(float)FRAME_HEIGHT)) * state.im_height};
// Compute the number of iterations
uint32_t m = mandelbrot(c);
// The color depends on the number of iterations
uint8_t color = 255 - (int32_t)((float)m *(255.0/(float)MAX_ITER));
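
The coordinate conversion can be tried out in isolation with the plain C sketch below. The 640x480 frame size and the window values in the usage note are assumptions picked for illustration (the demo's actual resolution and reset window are not given in this section), and the sw_ prefixed names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SW_FRAME_WIDTH 640  // assumed frame size for this sketch
#define SW_FRAME_HEIGHT 480

typedef struct sw_complex_t { float re, im; } sw_complex_t;

// Hypothetical counterpart of state_t: the visible plot window
typedef struct sw_window_t {
  float re_start, re_width;
  float im_start, im_height;
} sw_window_t;

// Scale the pixel position to a 0..1 fraction of the frame, then
// stretch and offset that fraction into the plot window.
sw_complex_t sw_pixel_to_complex(uint16_t x, uint16_t y, sw_window_t w) {
  assert(x < SW_FRAME_WIDTH && y < SW_FRAME_HEIGHT);
  sw_complex_t c;
  c.re = w.re_start + ((float)x * (1.0f / (float)SW_FRAME_WIDTH)) * w.re_width;
  c.im = w.im_start + ((float)y * (1.0f / (float)SW_FRAME_HEIGHT)) * w.im_height;
  return c;
}
```

With a window starting at -2 - 1i that is 3 wide and 2 tall, pixel (0, 0) maps to -2 - 1i and the center pixel (320, 240) lands at approximately -0.5 + 0i. Multiplying by the constant reciprocal, as the original does, avoids a per-pixel divide.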

Similar to how Pong used buttons to control the state of paddle positions, this demo uses buttons and switches to move the complex plane window bounds, held in the state, of the displayed image. However, unlike Pong, the state update computation requires pipelining as the floating point add and multiply operations cannot complete in a single pixel clock cycle. The state_t registers below are declared volatile so pipelining can still occur in this non-pure function that maintains state:

// Logic to update the state in a multiple cycle volatile feedback pipeline
inline state_t do_state_update(uint1_t reset, uint1_t end_of_frame)
{
  // Volatile state registers
  volatile static state_t state;
  
  // Use 'slow' end of frame pulse as 'now' valid flag occurring
  // every N cycles > pipeline depth/latency
  uint1_t update_now = end_of_frame | reset;

  // Update state
  if(reset)
  {
    // Reset condition?
    state = reset_values();
  }
  else if(end_of_frame)
  {
    // Normal next state update
    state = next_state_func(reset, state);
  }  
  
  // Buffer/save state as it periodically is updated/output from above
  state_t curr_state = curr_state_buffer(state, update_now);
  
  // Overwrite potentially invalid volatile 'state' circulating in feedback
  // replacing it with always valid buffered curr state. 
  // This way state will be known good when the next frame occurs
  state = curr_state;
         
  return curr_state;
}

The above code uses a volatile static local variable called state. If the above code omitted the volatile keyword while keeping the static state then this function (which includes calling next_state_func(), floating point mults+adds, etc) would not be pipelined. The entire function logic would be squeezed into one long clock cycle - far too long to meet the pixel clock requirement. Instead volatile allows pipelining to occur and the user is responsible for maintaining non-volatile state via the curr_state_buffer() function.
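
The source of curr_state_buffer() is not shown on this page. Conceptually it behaves like a hold register with an enable, which can be sketched in plain C as below. Everything here is an assumption for illustration: the sw_ prefixed names, passing the register by pointer instead of using a static local, and the choice to output the new value in the same call that captures it.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical counterpart of state_t
typedef struct sw_state_t {
  float re_start, re_width;
  float im_start, im_height;
} sw_state_t;

// Hold register: capture 'in' only when update_now is set, otherwise
// keep returning the last captured (known good) value.
sw_state_t sw_curr_state_buffer(sw_state_t *hold_reg, sw_state_t in,
                                uint8_t update_now) {
  assert(hold_reg);
  if (update_now) {
    *hold_reg = in;
  }
  return *hold_reg;
}
```

If a good value is captured with update_now set and a later call presents a meaningless value with update_now clear, the second call still returns the good value. This is the sense in which the buffered state stays valid while the volatile state circulates through the pipeline.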

Results

SDL C Prototype

Prior examples were small enough to need minimal debug if any at all. However, the scope of design iterations and debug needed for more complicated graphics demos is substantial. This work would not have been possible without the help of Victor Suarez Rovere @suarezvictor. Beginning many months ago he was invaluable in working to expand the verification/simulation capabilities of PipelineC and explore reusable hardware architectures focused on graphics - we found many bugs together. His main.cpp PipelineC-as-C OR Verilator code structure is the core of the simulation environment for this work. Thanks Victor!

With some preprocessor use it is possible to compile the PipelineC per pixel mandelbrot.c app() function as regular C code. The resulting pixels are drawn to the screen using the Simple DirectMedia Layer library. This same main.cpp is used for running Verilator based simulations as well. See the top of that .cpp file for how to build and run.

Full 32b floating point software compile of the PipelineC code: ccodesim

Generated VHDL

Verilator

It is possible to set up PipelineC and use --sim --comb --verilator arguments to prepare a Verilator simulation (VHDL->GHDL->Yosys->Verilog->Verilator flow). Similar to above, the SDL library is used in main.cpp to display pixels. See the #define USE_VERILATOR preprocessor directive to switch between compiling PipelineC as C vs running the C++ based Verilator simulation. Build instructions are at the top of the file.

Reduced 11b mantissa floating point format Verilator based simulation of the PipelineC code: ver

Autopipelining

This design is complex enough to require autopipelining from the PipelineC tool to meet timing at the 148.5 MHz target operating frequency. A summary of design autopipelining:

* render_pixel() : 252 stages
  * float_8_11_t adders : 6 stages each
  * float_8_11_t multipliers : 5 stages each
  * mandelbrot() : 222 stages
    * float_8_11_t adders : 6 stages each
    * float_8_11_t multipliers : 5 stages each
* do_state_update() : 15 stages
  * next_state_func() : 13 stages
    * float_8_11_t adders : 6 stages each
    * float_8_11_t multipliers : 4 or 5 stages each

Vivado

Instantiation of the PipelineC entity inside the dev board top level VHDL file board.vhd:

-- The PipelineC generated entity
top_inst : entity work.top port map (   
    -- Main function clocks
    clk_148p5 => vga_pixel_clk,
    
    -- Switches
    switches_module_sw => unsigned(sw),
    
    -- Buttons
    buttons_module_btn => unsigned(btn),

    -- PMODB
    pmod_jb_return_output.jb0(0) => jb(0),
    pmod_jb_return_output.jb1(0) => jb(1),
    pmod_jb_return_output.jb2(0) => jb(2),
    pmod_jb_return_output.jb3(0) => jb(3),
    pmod_jb_return_output.jb4(0) => jb(4),
    pmod_jb_return_output.jb5(0) => jb(5),
    pmod_jb_return_output.jb6(0) => jb(6),
    pmod_jb_return_output.jb7(0) => jb(7),
    -- PMODC
    pmod_jc_return_output.jc0(0) => jc(0),
    pmod_jc_return_output.jc1(0) => jc(1),
    pmod_jc_return_output.jc2(0) => jc(2),
    pmod_jc_return_output.jc3(0) => jc(3),
    pmod_jc_return_output.jc4(0) => jc(4),
    pmod_jc_return_output.jc5(0) => jc(5),
    pmod_jc_return_output.jc6(0) => jc(6),
    pmod_jc_return_output.jc7(0) => jc(7)  
);

Resource utilization: resources device

Autopipelined design critical path meets timing: timereport

Demo

Check out the demo!
