Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Required setup to synthesize #9

Open
0xB4demeister opened this issue Jun 10, 2021 · 2 comments
Open

Required setup to synthesize #9

0xB4demeister opened this issue Jun 10, 2021 · 2 comments

Comments

@0xB4demeister
Copy link

I am currently trying to create a hwpe datamover that transforms the hwpe interface into axi4 streams such that i can use different existing ips directly with PULPissimo. This work is part of an university project, therefore i cannot share my files right now. My previous solution did already synthesize, but i missed the hwpe-ctrl (especially the part about the addressgen_ctrl, which then caused the synthesized result to not contain any connections to the tcdm because they were not connected and therefore optimized out. I then added the addressgen part with this configuration

    streamer_ctrl_cfg = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.trans_size  = 1;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.line_stride = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.line_length = 1;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.feat_stride = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.feat_length = 1;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.base_addr = reg_file.hwpe_params[MAC_REG_A_ADDR];
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.feat_roll   = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.loop_outer  = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.realign_type = '0;

    streamer_ctrl_cfg = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.trans_size  = 1;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.line_stride = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.line_length = 1;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.feat_stride = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.feat_length = 1;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.base_addr = reg_file.hwpe_params[MAC_REG_B_ADDR];
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.feat_roll   = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.loop_outer  = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.realign_type = '0;

which closed all the connections but now the pulpissimo_project does not synthesize anymore.
I also played around with adding a simple FSM like for example from https://github.com/pulp-platform/hwpe-datamover-example
which has kind of the same use case, but it currently still fails.

I wanted to ask, which configurations need to be present for the hwpe to synthesize correctly and how to properly set up the FSM, regfiles,.. if they are required - because right now, i think there is no real documentation for that apart the working code for mac and datamover @FrancescoConti

@FrancescoConti
Copy link
Member

Hi, in fact there is no "correct" way to design a controller. The code here provides essentially two things:

  • one possible kind of memory-mapped slave (hwpe_slave) that can be used to program the accelerator, with features like register shadowing and 2-job queues.
  • it also includes a hwpe_uloop which may be used to implement programmable auto-looping of the accelerator.
    There is no FSM "stub" however, because FSM design is too intimately tied with the specifics of a single accelerator.

If you want to better understand how to connect address generator control wires, you can have a look at its documentation if you did not already: https://hwpe-doc.readthedocs.io/en/latest/modules.html#hwpe-stream-addressgen

Finally: as far as I understand, you are trying to synthesize the system for FPGA without simulating it first. In my personal experience, it is extremely difficult to find and debug things in this way. Often synthesis issues are really functional problems that would be identified with proper simulation.
If you want, you can use a standalone testbench ( https://github.com/pulp-platform/hwpe-tb ) to increase confidence in your RTL. The software part is outdated (still relying on the older SDK), but the hardware should be fine and simple enough to be simulated even in free tools such as ModelSim Intel edition or Vivado's simulator, with some effort.

Let me know if you have any further question.

@0xB4demeister
Copy link
Author

thanks for your replies and sorry for answering myself such late. I needed to finish my project and therefore couldn't test further configurations or the hwpe-tb. My last state was kind of a simple combination from datamover and the mac example. My goal was to set up an HWPE that transforms HWPE-Stream data into AXI4-Stream data in the simplest possible way. Therefore I tried to find a minimal working solution with just a regfile to configure the base address for read and write + their respective transfer length. But I couldn't find many documentation regarding if uloop and fsm are even required for a successful synthesize, maybe you could add a little bit more documentation regarding the hwpe-ctrl modules in [https://hwpe-doc.readthedocs.io/en/latest/modules.html](HWPE documentation) - before adding the addressgen configuration my system did successfully finish implementation, but as one would expect, there was no connection regarding the tcdm from the hwpe.

I provided my top-file as a reference, perhaps you can directely spot the issue (reg size still is 16 even though i just need 4 because i wanted to change as few as possible compared to the mac, which does work to remove possible issues) :

import hwpe_to_axi4_stream_converter_package::*;
import hwpe_ctrl_package::*;

module hwpe_to_axi4_stream_converter_top #(parameter int unsigned N_CORES = 2,                  // number of tcdm master ports, just one for read and one for write
                                           parameter int unsigned N_CONTEXT       = 2,
                                           parameter int unsigned N_IO_REGS       = 16,         //number of io register entries
                                           parameter int unsigned ID              = 10,
                                           parameter int unsigned MP = 2,                       // number of tcdm master ports, just one for read and one for write
                                           parameter int unsigned REGFILE_N_EVT = 2
                                           )
                                          (input logic clk_i,
                                           input logic rst_ni,
                                           input logic test_mode_i,
                                           output logic [N_CORES-1:0][REGFILE_N_EVT-1:0] evt_o,
                                           hwpe_stream_intf_tcdm.master tcdm[MP-1:0],
                                           // periph slave port
                                           hwpe_ctrl_intf_periph.slave  periph,
                                           output logic fifo_read_m_tvalid,
                                           input logic fifo_read_m_tready,
                                           output logic [31 : 0] fifo_read_m_tdata,
                                           output logic fifo_read_m_tlast,
                                           input logic fifo_write_s_tvalid,
                                           output logic fifo_write_s_tready,
                                           input logic [31 : 0] fifo_write_s_tdata,
                                           input logic fifo_write_s_tlast,
                                           output logic read_wr_rst_busy,
                                           output logic read_rst_rst_busy,
                                           output logic [10:0] read_axis_data_count,
                                           output logic write_wr_rst_busy,
                                           output logic write_rst_rst_busy,
                                           output logic [10:0] write_axis_data_count);
    
    logic enable, clear;

    // These are the bit fields used to control the streamer.
    ctrl_streamer_t  streamer_ctrl, streamer_ctrl_cfg;
    flags_streamer_t streamer_flags;

    // These are the bit fields used to propagate flags from/to the peripheral
    // interconnect slave interface.
    ctrl_slave_t slave_ctrl;
    flags_slave_t slave_flags;
    ctrl_regfile_t reg_file;

    //state machine variables (cs = current state, ns = next state)
    state_fsm_t cs, ns;
    
    //variable used to indicate that first phase of filling the queue is done, reset to 0 after emptying is done
    logic read_fifo_full;
    
    //tcdm does not support tlast variable by default
    logic read_axi_tlast, write_axi_tlast;
    
    hwpe_stream_intf_stream #(
    .DATA_WIDTH(32)
    ) read (
    .clk (clk_i)
    );
    hwpe_stream_intf_stream #(
    .DATA_WIDTH(32)
    ) write (
    .clk (clk_i)
    );
    [...]

    // The slave module exposes a peripheral interconnect HWPE-Periph plug;
    // in the default configuration, it provides 2 contexts with 13 registers
    // each, which are exposed into `reg_file.hwpe_params`
    hwpe_ctrl_slave #(
    .N_CORES        ( N_CORES  ),
    .N_CONTEXT      ( N_CONTEXT ),
    .N_IO_REGS      ( N_IO_REGS ),
    .N_GENERIC_REGS ( 8  ),
    .ID_WIDTH       ( ID )
    ) i_slave (
    .clk_i   ( clk_i       ),
    .rst_ni  ( rst_ni      ),
    .clear_o ( clear       ),
    .cfg     ( periph      ),
    .ctrl_i  ( slave_ctrl  ),
    .flags_o ( slave_flags ),
    .reg_file( reg_file    )
    );


// Datamover FSM: sequential process.
  always_ff @(posedge clk_i or negedge rst_ni)
  begin : fsm_seq
    if(~rst_ni)
      cs <= FSM_IDLE;
    else if(clear)
      cs <= FSM_IDLE;
    else
      cs <= ns;
  end

  // Datamover FSM: combinational next-state calculation process.
  always_comb
  begin : fsm_ns_comb
    ns = cs;
    if(cs == FSM_IDLE) begin
      if(slave_flags.start)
        ns = FSM_STARTING;
    end
    else if(cs == FSM_STARTING) begin
      ns = FSM_WORKING;
    end
    else if(cs == FSM_WORKING) begin
      if ((streamer_flags.write_sink_flags.done | streamer_flags.write_sink_flags.ready_start) & (streamer_flags.read_source_flags.done | streamer_flags.read_source_flags.ready_start) & streamer_flags.tcdm_fifo_empty)
        ns = FSM_FINISHED;
    end
    else begin
      ns = FSM_IDLE;
    end
  end

  // Datamover FSM: combinational output calculation process.
  always_comb
  begin : fsm_out_comb
    slave_ctrl = '0;
    streamer_ctrl = streamer_ctrl_cfg;
    if(cs == FSM_STARTING) begin
      streamer_ctrl.read_source_ctrl.req_start = 1'b1;
      streamer_ctrl.write_sink_ctrl.req_start = 1'b1;
    end
    else if (cs == FSM_FINISHED) begin
      slave_ctrl.done = 1'b1;
    end
  end

  // Here we bind the register file parameters to the streamer configuration.
  // `streamer_ctrl_cfg` contains the "base" configuration, with null `req_start`.
  // The FSM copies this base configuration into `streamer_ctrl` and sets
  // the `req_start` signals when in state FSM_STARTING.

  // direct mappings - these have to be here due to blocking/non-blocking assignment
    // combination with the same ctrl_engine_o/ctrl_streamer_o variable
    // shift-by-3 due to conversion from bits to bytes
  always_comb
  begin
    streamer_ctrl_cfg = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.trans_size  = reg_file.hwpe_params[CONV_REG_READ_TRANSACTION_LENGTH ];
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.line_stride = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.line_length = 1;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.feat_stride = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.feat_length = 1;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.base_addr = reg_file.hwpe_params[CONV_REG_READ_BASE_ADDR ];
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.feat_roll   = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.loop_outer  = '0;
    streamer_ctrl_cfg.read_source_ctrl.addressgen_ctrl.realign_type = '0;

    streamer_ctrl_cfg = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.trans_size  = reg_file.hwpe_params[CONV_REG_WRITE_TRANSACTION_LENGTH ];
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.line_stride = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.line_length = 1;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.feat_stride = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.feat_length = 1;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.base_addr = reg_file.hwpe_params[CONV_REG_WRITE_BASE_ADDR ];
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.feat_roll   = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.loop_outer  = '0;
    streamer_ctrl_cfg.write_sink_ctrl.addressgen_ctrl.realign_type = '0;
  end

  // Bind the output event, which is propagated to the event unit and used
  // to implement HWPE datamover barriers.
  assign evt_o = slave_flags.evt[7:0];

    
    assign enable         = 1'b1;
    
endmodule  //hwpe_to_axi4_stream_converter_top

with

package hwpe_to_axi4_stream_converter_package;
  parameter int unsigned CONV_REG_READ_BASE_ADDR            = 0;
  parameter int unsigned CONV_REG_READ_TRANSACTION_LENGTH   = 1;
  parameter int unsigned CONV_REG_WRITE_BASE_ADDR           = 2;
  parameter int unsigned CONV_REG_WRITE_TRANSACTION_LENGTH  = 3;
[...]
endmodule 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants