Streams and Handshakes

Julian Kemmerer edited this page Dec 11, 2024 · 11 revisions

Adapted from a conversation, this page introduces what PipelineC currently has to offer in terms of 'abstractions', more so design patterns, for working with streams and handshakes based on valid-ready signalling.

Streams (Data and Valid Signals)

Someone asked what PipelineC has to offer in terms of a stream abstraction and for working with things like AXI4-Stream.

Example code with plenty of comments is in stream_io.c. This section walks through some of it.

Starting off, we can define a C struct that looks like AXIS (you might do something like this with a VHDL record or a SystemVerilog struct):

typedef struct my_axis_32_t{
  uint8_t data[4];
  uint1_t keep[4];
  uint1_t last;
  // TODO user field
}my_axis_32_t;

Immediately you can see that this is not flexible. If you need a different width AXIS, or one with or without tuser of a specific size, then you need to define another struct.
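
One way to cut down the repetition, sketched here in plain C, is a preprocessor macro that stamps out one struct per byte width. DECL_AXIS_TYPE is a hypothetical name, not something PipelineC is known to provide, and uint1_t is modeled as a uint8_t so the snippet compiles with an ordinary C compiler.

```c
#include <stdint.h>

// Plain-C stand-in for PipelineC's 1-bit type (assumption for this sketch)
typedef uint8_t uint1_t;

// Hypothetical helper: declares an AXIS-like struct called `name`
// with `nbytes` data bytes, one keep bit per byte, and a last flag
#define DECL_AXIS_TYPE(name, nbytes) \
typedef struct name{ \
  uint8_t data[nbytes]; \
  uint1_t keep[nbytes]; \
  uint1_t last; \
}name;

DECL_AXIS_TYPE(my_axis_32_t, 4)
DECL_AXIS_TYPE(my_axis_64_t, 8)
```

Each width still gets its own named type, so this only saves typing. It does not make a function like my_func generic over widths.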

The closest thing to a stream abstraction in PipelineC is a helper for the 'data and valid' parts of a valid-ready handshake (like that used in AXIS).

Imagine a struct like:

typedef struct my_axis_32_t_stream_t{
  my_axis_32_t data;
  uint1_t valid;
}my_axis_32_t_stream_t;

That struct is what the DECL_STREAM_TYPE macro declares for you, and the helper stream(type) macro gives you the matching type_stream_t name:

//ex.
DECL_STREAM_TYPE(my_axis_32_t)
...
stream(my_axis_32_t) axis_in;
// axis_in.data is a my_axis_32_t
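
To make the naming concrete, here is a plain-C sketch of how such a pair of macros could be written with token pasting. The macro bodies are an assumption based on the my_axis_32_t_stream_t struct shown above; the real definitions in PipelineC's headers may differ. uint1_t is again modeled as a uint8_t, and make_last_byte is a hypothetical helper.

```c
#include <stdint.h>

typedef uint8_t uint1_t; // plain-C stand-in for the 1-bit type

typedef struct my_axis_32_t{
  uint8_t data[4];
  uint1_t keep[4];
  uint1_t last;
}my_axis_32_t;

// Assumed shape of the helpers: paste _stream_t onto the type name
#define DECL_STREAM_TYPE(type_t) \
typedef struct type_t##_stream_t{ \
  type_t data; \
  uint1_t valid; \
}type_t##_stream_t;
#define stream(type_t) type_t##_stream_t

DECL_STREAM_TYPE(my_axis_32_t)

// Hypothetical helper: build a valid, one-byte, end-of-packet stream word
stream(my_axis_32_t) make_last_byte(uint8_t byte){
  stream(my_axis_32_t) s = {0};
  s.data.data[0] = byte; // payload lives under .data
  s.data.keep[0] = 1;    // only byte 0 is meaningful
  s.data.last = 1;
  s.valid = 1;           // valid sits next to .data, not inside it
  return s;
}
```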

Ready Signal

You may have noticed the missing 'ready' signal for flow control. The ready signal is not mixed into the data and valid stream(my_axis_32_t) because it travels in the opposite direction. PipelineC doesn't have an equivalent of SystemVerilog modports, with in and out directions inside one variable.

So instead you will see separate inputs and outputs for data+valid vs ready when writing a module/function:

typedef struct my_func_out_t{
  // Outputs from module
  //  Output .data and .valid stream
  stream(my_axis_32_t) axis_out;
  //  Output ready for input axis stream
  uint1_t ready_for_axis_in;
}my_func_out_t;
my_func_out_t my_func(
  // Inputs to module
  //  Input .data and .valid stream
  stream(my_axis_32_t) axis_in,
  //  Input ready for output axis stream
  uint1_t ready_for_axis_out
);

That is, a function that takes as inputs

  • data+valid stream incoming AXIS
  • and a flag for outgoing AXIS ready

and outputs

  • data+valid stream of outgoing AXIS
  • and a flag for incoming AXIS ready
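
As a minimal illustration of that interface shape, here is a plain-C sketch of a purely combinational pass-through with the same two inputs and two outputs. The payload is simplified to a single uint32_t, and word_stream_t, passthrough and transfer are hypothetical names for this sketch only.

```c
#include <stdint.h>

typedef uint8_t uint1_t; // plain-C stand-in for the 1-bit type

// Simplified data+valid stream (stand-in for stream(my_axis_32_t))
typedef struct word_stream_t{
  uint32_t data;
  uint1_t valid;
}word_stream_t;

typedef struct passthrough_out_t{
  word_stream_t axis_out;    // data+valid travel forward
  uint1_t ready_for_axis_in; // ready travels backward
}passthrough_out_t;

// Forward data+valid unchanged, hand the downstream ready back upstream
passthrough_out_t passthrough(word_stream_t axis_in, uint1_t ready_for_axis_out){
  passthrough_out_t o;
  o.axis_out = axis_in;
  o.ready_for_axis_in = ready_for_axis_out;
  return o;
}

// A word moves across the interface only in a cycle where both are high
uint1_t transfer(word_stream_t s, uint1_t ready){
  return s.valid & ready;
}
```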

The generated output VHDL for such a module looks like:

entity my_func_0CLK_262c3538 is
port(
 clk : in std_logic;
 CLOCK_ENABLE : in unsigned(0 downto 0);
 axis_in : in my_axis_32_t_stream_t;
 ready_for_axis_out : in unsigned(0 downto 0);
 return_output : out my_func_out_t
);
end my_func_0CLK_262c3538;

Ready Feedback Pragma

Next we will look at the PipelineC code that wires two of those modules together into a chained dataflow: input -> first -> second -> output.

Let's start with just the feed-forward part of things, that is, the data and valid parts of the stream, and not the feedback ready signal.

A sketch of that input axis -> first instance -> second instance -> output axis:

my_func_out_t func0 = my_func(
    input_axis
    // ready input omitted for now
);

my_func_out_t func1 = my_func(
    func0.axis_out
    // ready input omitted for now
);

output_axis = func1.axis_out;

Notice that the output of the first instance is the input to the second, and the output of the second is connected to the final output.

Adding in the ready signals requires the PipelineC-specific #pragma FEEDBACK. It means: the first time you read from this wire, the value comes from feedback, that is, from the last point where the wire was assigned.

In the code it looks like the variable is used before it has been given a value. When 'running this in your head', you can just pretend the variable already holds its correct feedback value.

With the ready signals included:

// Input stream into first instance  
uint1_t ready_for_func0_axis_out;
// Note: FEEDBACK not assigned a value yet
#pragma FEEDBACK ready_for_func0_axis_out
my_func_out_t func0 = my_func(
  input_axis,
  ready_for_func0_axis_out
);
uint1_t ready_for_input_axis = func0.ready_for_axis_in;

// Output of first instance into second
uint1_t ready_for_func1_axis_out = ready_for_output_axis;
my_func_out_t func1 = my_func(
  func0.axis_out,
  ready_for_func1_axis_out
);
// Note: FEEDBACK assigned here
ready_for_func0_axis_out = func1.ready_for_axis_in;

Notice the pairing of #pragma FEEDBACK ready_for_func0_axis_out with the later assignment ready_for_func0_axis_out = func1.ready_for_axis_in: the ready input into the first instance is driven by the ready output of the second instance.
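
Plain C has no feedback wires, so one way to 'run this in your head' in software is to evaluate the combinational logic twice per clock: once with a placeholder for the feedback wire, and once more with the value it settles to. The sketch below does that for a hypothetical one-register stage (not the skid buffer from stream_io.c) with a simplified uint32_t payload. All names are made up for illustration, and this is only a software model, not a description of how the PipelineC tool implements FEEDBACK.

```c
#include <stdint.h>

typedef uint8_t uint1_t;
typedef struct word_stream_t{ uint32_t data; uint1_t valid; }word_stream_t;
typedef struct stage_state_t{ uint32_t data; uint1_t full; }stage_state_t;
typedef struct stage_out_t{
  word_stream_t axis_out;
  uint1_t ready_for_axis_in;
  stage_state_t next; // register value for the next clock
}stage_out_t;

// One-register stage as a pure function of current state and inputs
stage_out_t stage(stage_state_t s, word_stream_t axis_in, uint1_t ready_for_axis_out){
  stage_out_t o;
  o.axis_out.data = s.data;
  o.axis_out.valid = s.full;
  // Can accept a word if empty, or if the held word leaves this cycle
  o.ready_for_axis_in = !s.full | ready_for_axis_out;
  o.next = s;
  if(o.ready_for_axis_in){
    o.next.data = axis_in.data;
    o.next.full = axis_in.valid;
  }
  return o;
}

typedef struct chain_t{ stage_state_t s0, s1; }chain_t;
typedef struct chain_out_t{
  word_stream_t output_axis;
  uint1_t ready_for_input_axis;
}chain_out_t;

// One clock cycle of input -> func0 -> func1 -> output
chain_out_t chain_clock(chain_t* c, word_stream_t input_axis, uint1_t ready_for_output_axis){
  // Pass 1: the feedback wire has no value yet, so use a placeholder.
  // func0.axis_out depends only on func0's register, so it is already right.
  stage_out_t func0 = stage(c->s0, input_axis, 0);
  stage_out_t func1 = stage(c->s1, func0.axis_out, ready_for_output_axis);
  // Pass 2: re-evaluate func0 with the value the feedback wire settles to
  uint1_t ready_for_func0_axis_out = func1.ready_for_axis_in;
  func0 = stage(c->s0, input_axis, ready_for_func0_axis_out);
  // Clock edge: commit both registers
  c->s0 = func0.next;
  c->s1 = func1.next;
  chain_out_t o = { func1.axis_out, func0.ready_for_axis_in };
  return o;
}
```

Two passes are enough here because the only value that depends on the feedback wire is func0's ready output and next state. In hardware no second pass exists; the wire simply carries the later-assigned value in the same cycle.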

This has a block diagram like so: [image: block diagram of the two chained my_func instances with the ready feedback]

Full Example

The final two parts of the example from stream_io.c are

  1. filling in what the my_func block actually does: not super critical, but the demo shows how to make a skid buffer so that ready is not a critical path through the block, and
  2. the final hooks to build / compile this design and produce VHDL to synthesize and use on an FPGA.

Notice earlier how the function returning a struct in C, my_func_out_t my_func(..., turned into a VHDL module outputting a record: return_output : out my_func_out_t.
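
For readers unfamiliar with skid buffers, here is a plain-C software model of the general idea, with a simplified uint32_t payload. It is a sketch of a common two-register skid buffer, not the code from stream_io.c, and skid_t and skid_clock are hypothetical names. The point to notice is that ready_for_axis_in is computed from a register only, so the downstream ready never passes combinationally through the block.

```c
#include <stdint.h>

typedef uint8_t uint1_t;
typedef struct word_stream_t{ uint32_t data; uint1_t valid; }word_stream_t;

typedef struct skid_t{
  word_stream_t out_reg;  // drives axis_out
  word_stream_t skid_reg; // catches one word when the output stalls
}skid_t;

typedef struct skid_out_t{
  word_stream_t axis_out;
  uint1_t ready_for_axis_in;
}skid_out_t;

// One clock cycle: returns this cycle's outputs, then updates the registers
skid_out_t skid_clock(skid_t* s, word_stream_t axis_in, uint1_t ready_for_axis_out){
  skid_out_t o;
  o.axis_out = s->out_reg;
  // Ready comes from register state only, never from ready_for_axis_out
  o.ready_for_axis_in = !s->skid_reg.valid;
  uint1_t incoming = axis_in.valid & o.ready_for_axis_in;
  // Output register can load if it is empty or its word leaves this cycle
  uint1_t out_can_load = !s->out_reg.valid | ready_for_axis_out;
  if(out_can_load){
    if(s->skid_reg.valid){
      s->out_reg = s->skid_reg; // drain the parked word first to keep order
      s->skid_reg.valid = 0;
    }else{
      s->out_reg = axis_in;
      s->out_reg.valid = incoming;
    }
  }else if(incoming){
    s->skid_reg = axis_in; // output stalled: park the word
  }
  return o;
}
```

The cost of breaking the ready path is the extra register: the block has to be able to absorb the one word that arrives in the cycle before upstream sees ready go low.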

Similarly, we could try to use structs as outputs on the top level of the final VHDL but that's not particularly friendly across tools and/or when mixing into a Verilog flow.

So instead you can declare top level ports like so, using Xilinx AXIS style names with simple uint types:

DECL_INPUT(uint32_t, s_axis_tdata)
DECL_INPUT(uint4_t, s_axis_tkeep)
DECL_INPUT(uint1_t, s_axis_tlast)
DECL_INPUT(uint1_t, s_axis_tvalid)
DECL_OUTPUT(uint1_t, s_axis_tready)
DECL_OUTPUT(uint32_t, m_axis_tdata)
DECL_OUTPUT(uint4_t, m_axis_tkeep)
DECL_OUTPUT(uint1_t, m_axis_tlast)
DECL_OUTPUT(uint1_t, m_axis_tvalid)
DECL_INPUT(uint1_t, m_axis_tready)

After those ports are declared we can define a space to wire together the multiple instances:

#pragma PART "xc7a35ticsg324-1l" // Artix 7 35T (Arty)
#pragma MAIN_MHZ top 100.0
void top(){
  // ... code using s_axis_tdata, m_axis_tdata, etc. here
}

Notice that function top has no inputs or outputs since they were separately declared as global variables.

The final rendered VHDL, if run with --top my_two_instances, ends up looking like:

entity my_two_instances is
port(
  clk_100p0 : in std_logic;
  s_axis_tdata_val_input : in unsigned(31 downto 0);
  s_axis_tkeep_val_input : in unsigned(3 downto 0);
  s_axis_tlast_val_input : in unsigned(0 downto 0);
  s_axis_tvalid_val_input : in unsigned(0 downto 0);
  s_axis_tready_return_output : out unsigned(0 downto 0);
  m_axis_tdata_return_output : out unsigned(31 downto 0);
  m_axis_tkeep_return_output : out unsigned(3 downto 0);
  m_axis_tlast_return_output : out unsigned(0 downto 0);
  m_axis_tvalid_return_output : out unsigned(0 downto 0);
  m_axis_tready_val_input : in unsigned(0 downto 0)
);

And you would be free to instantiate that my_two_instances in some external flow.

For more information on running the tool, refer to the getting started and setup pages.

In full, with the connection to the s/m_axis top level global signals, the two instances of the function, and some type massaging from uint to/from arrays, top() looks like so:

void top(){
  // Connect top level input ports to local stream type variables
  //  Input stream data
  stream(my_axis_32_t) input_axis;
  UINT_TO_BYTE_ARRAY(input_axis.data.data, 4, s_axis_tdata)
  UINT_TO_BIT_ARRAY(input_axis.data.keep, 4, s_axis_tkeep)
  input_axis.data.last = s_axis_tlast;
  input_axis.valid = s_axis_tvalid;
  //  Output stream ready
  uint1_t ready_for_output_axis = m_axis_tready;

  // Input stream into first instance  
  uint1_t ready_for_func0_axis_out;
  // Note: FEEDBACK not assigned a value yet
  #pragma FEEDBACK ready_for_func0_axis_out
  my_func_out_t func0 = my_func(
    input_axis,
    ready_for_func0_axis_out
  );
  uint1_t ready_for_input_axis = func0.ready_for_axis_in;

  // Output of first instance into second
  uint1_t ready_for_func1_axis_out = ready_for_output_axis;
  my_func_out_t func1 = my_func(
    func0.axis_out,
    ready_for_func1_axis_out
  );
  // Note: FEEDBACK assigned here
  ready_for_func0_axis_out = func1.ready_for_axis_in; 

  // Connect top level output ports from local stream type variables
  //  Output stream data
  m_axis_tdata = uint8_array4_le(func1.axis_out.data.data); // Array to uint
  m_axis_tkeep = uint1_array4_le(func1.axis_out.data.keep); // Array to uint
  m_axis_tlast = func1.axis_out.data.last;
  m_axis_tvalid = func1.axis_out.valid;
  //  Input stream ready
  s_axis_tready = ready_for_input_axis;
}
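
The 'type massaging' between a packed uint and an array of bytes or bits can be pictured with the plain-C sketch below. The function names are hypothetical, and the little-endian ordering (array index 0 holds the least significant byte or bit) is an assumption suggested by the _le suffix on uint8_array4_le; check the PipelineC headers for the exact behavior of the real macros.

```c
#include <stdint.h>

// uint32 -> 4 bytes, index 0 = least significant byte (assumed ordering)
void u32_to_bytes_le(uint8_t bytes[4], uint32_t x){
  for(int i = 0; i < 4; i++){
    bytes[i] = (uint8_t)(x >> (8*i));
  }
}

// 4 bytes -> uint32, inverse of the function above
uint32_t bytes_to_u32_le(const uint8_t bytes[4]){
  uint32_t x = 0;
  for(int i = 0; i < 4; i++){
    x |= (uint32_t)bytes[i] << (8*i);
  }
  return x;
}

// 4-bit keep value -> 4 single-bit flags, index 0 = bit 0
void u4_to_bits_le(uint8_t bits[4], uint8_t x){
  for(int i = 0; i < 4; i++){
    bits[i] = (x >> i) & 1;
  }
}

// 4 single-bit flags -> 4-bit keep value
uint8_t bits_to_u4_le(const uint8_t bits[4]){
  uint8_t x = 0;
  for(int i = 0; i < 4; i++){
    x |= (uint8_t)((bits[i] & 1) << i);
  }
  return x;
}
```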

In this case, because of the FEEDBACK in the top() function, there isn't room to add autopipelining to this design (e.g. if you needed to do a bunch of math on a stream, such as any crypto).

Connecting an auto pipeline into one of these streams requires a slightly different design style: see GLOBAL_VALID_READY_PIPELINE_INST from global_func_inst.h.

Handshakes (Data, Valid, and Ready Signals)

Recently there have been experiments describing data, valid, ready handshakes in a way that hides the separate FEEDBACK ready signal discussed above.

See example code in handshake_io.c.

To explain what's been done inside that handshake_io.c demo, it's easiest to talk about what has changed from the above stream_io.c example.

In addition to the old DECL_STREAM_TYPE(my_axis_32_t), these were also added:

DECL_HANDSHAKE_TYPE(my_axis_32_t)
DECL_HANDSHAKE_INST_TYPE(my_axis_32_t, my_axis_32_t) // out type, in type
// needed for my_axis_32_t some_func(my_axis_32_t)

That declares types for 'handshakes' that include ready signals.

The second 'INST_TYPE' macro is specific to declaring signals for a module of the form: in type -> module -> out type. That is, one input handshake/stream into the module/function and one output handshake/stream out of it.

handshake.h comes with hs_in/out helper macros to be used like:

hs_out(my_axis_32_t) my_func(
  hs_in(my_axis_32_t) inputs
);

ex.

hs_out(my_axis_32_t) outputs;
outputs.stream_out = ...;
outputs.ready_for_stream_in = ...;

and similarly for the input side.
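
To picture what those two types bundle together, here is a plain-C sketch with a simplified uint32_t payload. The output-side field names are the ones shown above; the input-side names stream_in and ready_for_stream_out are guesses by symmetry, and the struct and function names are hypothetical. The real types come from the macros in handshake.h.

```c
#include <stdint.h>

typedef uint8_t uint1_t;
typedef struct word_stream_t{ uint32_t data; uint1_t valid; }word_stream_t;

// Assumed layout of the input side (field names guessed by symmetry)
typedef struct word_hs_in_t{
  word_stream_t stream_in;      // data+valid coming into the function
  uint1_t ready_for_stream_out; // downstream is ready for our output
}word_hs_in_t;

// Output side, using the field names from the example above
typedef struct word_hs_out_t{
  word_stream_t stream_out;    // data+valid leaving the function
  uint1_t ready_for_stream_in; // we are ready for upstream's data
}word_hs_out_t;

// Combinational example: add one to each word, pass ready straight back
word_hs_out_t add_one(word_hs_in_t inputs){
  word_hs_out_t outputs;
  outputs.stream_out = inputs.stream_in;
  outputs.stream_out.data = inputs.stream_in.data + 1;
  outputs.ready_for_stream_in = inputs.ready_for_stream_out;
  return outputs;
}
```

Compared with the earlier my_func signature, nothing new is carried; the two function arguments and the two struct fields are just grouped into one input bundle and one output bundle.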

Finally getting to the top() function where the main dataflow is specified:

DECL_INPUT(uint32_t, s_axis_tdata)
DECL_INPUT(uint4_t, s_axis_tkeep)
DECL_INPUT(uint1_t, s_axis_tlast)
DECL_INPUT(uint1_t, s_axis_tvalid)
DECL_OUTPUT(uint1_t, s_axis_tready)
DECL_OUTPUT(uint32_t, m_axis_tdata)
DECL_OUTPUT(uint4_t, m_axis_tkeep)
DECL_OUTPUT(uint1_t, m_axis_tlast)
DECL_OUTPUT(uint1_t, m_axis_tvalid)
DECL_INPUT(uint1_t, m_axis_tready)
void top(){
  ...
}

The first thing to see is that making an instance of your my_func is no longer simply 'use the function in code'. Instead, helper macros exist to declare instances that are wired up later:

// func0: my_axis_32_t my_func(my_axis_32_t)
DECL_HANDSHAKE_INST(func0, my_axis_32_t, my_func, my_axis_32_t)
// func1: my_axis_32_t my_func(my_axis_32_t)
DECL_HANDSHAKE_INST(func1, my_axis_32_t, my_func, my_axis_32_t)

Most important is the simplified syntax for connecting the input stream -> func0 -> func1 -> output stream data flow.

// Input stream into first instance
// func0 input handshake = input_axis, s_axis_tready
HANDSHAKE_FROM_STREAM(func0, input_axis, s_axis_tready) 

// Output of first instance into second
// func1 input handshake = func0 output handshake
HANDSHAKE_CONNECT(func1, func0)

// Output stream from second instance
stream(my_axis_32_t) output_axis;
// output_axis, m_axis_tready = func1 output handshake
STREAM_FROM_HANDSHAKE(output_axis, m_axis_tready, func1)


The idea is to easily use HANDSHAKE_CONNECT for wiring up a data flow.

Eventually this can extend to handshakes that are more than one input -> func -> one output, including 'split' and 'join' of multiple streams/handshakes.
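
Conceptually, one connect statement has to make two assignments in opposite directions. The plain-C model below shows that idea with a hypothetical inst_t holding an instance's four wires and a simplified uint32_t payload. It is an assumption about what the connection means, not the actual expansion of HANDSHAKE_CONNECT.

```c
#include <stdint.h>

typedef uint8_t uint1_t;
typedef struct word_stream_t{ uint32_t data; uint1_t valid; }word_stream_t;

// Hypothetical bundle of the wires around one function instance
typedef struct inst_t{
  // wires driven from outside the instance
  word_stream_t stream_in;
  uint1_t ready_for_stream_out;
  // wires driven by the instance
  word_stream_t stream_out;
  uint1_t ready_for_stream_in;
}inst_t;

// dst input handshake = src output handshake
void handshake_connect(inst_t* dst, inst_t* src){
  dst->stream_in = src->stream_out;                     // data+valid go forward
  src->ready_for_stream_out = dst->ready_for_stream_in; // ready goes backward
}
```

The backward assignment is the part that previously needed an explicit #pragma FEEDBACK variable in top().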

More Information

Have questions? Want to chat? Stop by the PipelineC Discord or start a discussion.