Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Zcb implementation #322

Closed
wants to merge 5 commits into from
Closed

Adding Zcb implementation #322

wants to merge 5 commits into from

Conversation

junambi
Copy link
Collaborator

@junambi junambi commented Oct 18, 2023

Zcb implementation for review and merger.

@Timmmm
Copy link
Collaborator

Timmmm commented Oct 18, 2023

Following the discussion in Zoom I had a closer look at this and almost all of these instructions map very clearly to existing instructions. Only 2 don't map cleanly and would maybe be clearer reimplemented.

  • c.lbu -> lbu
  • c.lhu -> lhu
  • c.lh -> lh
  • c.sb -> sb
  • c.sh -> sh
  • c.zext.b -> reimplement?
  • c.sext.b -> sext.b
  • c.zext.h -> zext.h
  • c.sext.h -> sext.h
  • c.not -> reimplement?
  • c.mul -> mul

For example instead of:

function clause execute (C_MUL(rsd,rs2)) = {
  let rd = creg2reg_idx(rsd);
  let rs = creg2reg_idx(rs2);

  let result_wide = to_bits(2 * sizeof(xlen), signed(X(rd)) * signed(X(rs)));
  X(rd) = result_wide[(sizeof(xlen) - 1) .. 0];
  RETIRE_SUCCESS
}

Use:

function clause execute (C_MUL(rsd,rs2)) = {
  let rd = creg2reg_idx(rsd);
  let rs = creg2reg_idx(rs2);

  execute(MUL((rs, rd, rd, false, true, true))
}

Primitive obsession notwithstanding, I think this is easier to understand, matches the existing C instructions and is less error prone. The intention of this extension is just to have shorter decodings, so the implementation should match.

For example in this case (which I picked randomly!), reimplementing it introduced a bug, whereas just mapping it would not have. I think that's probably a strong enough argument on its own, given the lack of testing.

@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 18, 2023

Following the discussion in Zoom I had a closer look at this and almost all of these instructions map very clearly to existing instructions. Only 2 don't map cleanly and would maybe be clearer reimplemented.

  • c.lbu -> lbu
  • c.lhu -> lhu
  • c.lh -> lh
  • c.sb -> sb
  • c.sh -> sh
  • c.zext.b -> reimplement?
  • c.sext.b -> sext.b
  • c.zext.h -> zext.h
  • c.sext.h -> sext.h
  • c.not -> reimplement?
  • c.mul -> mul

For example instead of:

function clause execute (C_MUL(rsd,rs2)) = {
  let rd = creg2reg_idx(rsd);
  let rs = creg2reg_idx(rs2);

  let result_wide = to_bits(2 * sizeof(xlen), signed(X(rd)) * signed(X(rs)));
  X(rd) = result_wide[(sizeof(xlen) - 1) .. 0];
  RETIRE_SUCCESS
}

Use:

function clause execute (C_MUL(rsd,rs2)) = {
  let rd = creg2reg_idx(rsd);
  let rs = creg2reg_idx(rs2);

  execute(MUL((rs, rd, rd, false, true, true))
}

Primitive obsession notwithstanding, I think this is easier to understand, matches the existing C instructions and is less error prone. The intention of this extension is just to have shorter decodings, so the implementation should match.

For example in this case (which I picked randomly!), reimplementing it introduced a bug, whereas just mapping it would not have. I think that's probably a strong enough argument on its own, given the lack of testing.

Yes. See #255 (comment).

@martinberger
Copy link
Collaborator

martinberger commented Oct 18, 2023

I just asked Tariq Kurd, the original author of the spec, what he thinks we should do:

  1. Simply forward to the corresponding non-compressed instruction (where that is possible).
  2. Implement the spec faithfully to RISC-V Instruction Set Manual (so not forwarding).

Tariq said we should do (1), that was always the intent - and demonstrates that the instructions are trivial. This is also how the other compressed instructions are implemented, e.g. c.addi. So we are all in agreement. @Timmmm @junambi

So in addition to C_MUL which Tim sketched above, here are c.sext.b and c.zext.h and c.sext.h in the format Tariq suggested (pls double check they are correct:

 function clause execute (C_SEXT_B(rsd)) = {
  let rsdc = creg2reg_idx(rsd);
  execute(ZBB_EXTOP(rsdc, rsdc, RISCV_SEXTB))
}

function clause execute (C_ZEXT_H(rsd)) = {
  let rsdc = creg2reg_idx(rsd);
  execute(ZBB_EXTOP(rsdc, rsdc, RISCV_ZEXTH))
}

function clause execute (C_SEXT_H(rsd)) = {
  let rsdc = creg2reg_idx(rsd);
  execute(ZBB_EXTOP(rsdc, rsdc,  RISCV_SEXTH))
}

@abukharmeh
Copy link
Contributor

Same thoughts #255 (comment)

Makefile Outdated
@@ -16,7 +16,6 @@ else
endif

SAIL_FLEN := riscv_flen_D.sail
SAIL_VLEN := riscv_vlen.sail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and this is removing vector, resolve conflicts accept all mine ?

@junambi
Copy link
Collaborator Author

junambi commented Oct 20, 2023

c.lbu -> lbu                             :: (Is implemented as part of LOAD), i do not see a separate LBU implementation
c.lhu -> lhu                             :: (Is implemented as part of LOAD), i do not see a separate LHU implementation
c.lh -> lh                                 :: (Is implemented as part of LOAD), i do not see a separate LH implementation
c.sb -> sb                               :: (Is implemented as part of STORE), i do not see a separate SB implementation
c.sh -> sh                               :: (Is implemented as part of STORE), i do not see a separate SH implementation
c.zext.b -> reimplement?        :: Is the current implementation fine for this
c.sext.b -> sext.b                    :: Will redo as suggested reusing SEXT_B
c.zext.h -> zext.h                    :: Will redo as suggested reusing ZEXT_H
c.sext.h -> sext.h                    :: Will redo as suggested reusing SEXT_H
c.not -> reimplement?        :: Is the current implementation fine for this
c.mul -> mul                           :: Will redo as suggested reusing MUL

Just for my understanding, can one of you please clarify the following,

  1. What are the advantages of using the existing instruction implementation, will it not affect the performance of the sail code
  2. In HW, having to implement other instructions (except the base) to support compressed instructions, will it not affect the area, power and performance of the system as a whole / is it required for HW to implement many other extensions to support only the base and compressed instructions.

@martinberger
Copy link
Collaborator

  1. What are the advantages of using the existing instruction implementation,
  2. In HW, [...] will it not affect the area, power and performance of the system as a whole /

The advantages are clarity (because it communicates the intention of the compressed instructions) and it simplifies us verifying the correctness of the Sail implementation. The performance of the Sail model is currently a secondary concern. A sufficiently smart compiler should be able to inline those function calls.

The effect on actual circuits that are used to implement compressed instructions in physically is not a concern of the Sail model, as long as the have the same observable behaviour.

@junambi
Copy link
Collaborator Author

junambi commented Oct 20, 2023

Thank you Martin,
Are the other listed implementation in my previous comment, good to implement.

Thank you.

@Timmmm
Copy link
Collaborator

Timmmm commented Oct 20, 2023

In HW, having to implement other instructions (except the base) to support compressed instructions, will it not affect the area, power and performance of the system as a whole / is it required for HW to implement many other extensions to support only the base and compressed instructions.

These instructions are only supported if the instructions they map to are are implemented. That was the bug in your c.mul implementation that I mentioned: c.mul is not supported if mul is not implemented.

@junambi
Copy link
Collaborator Author

junambi commented Oct 20, 2023

Yes Tim, i understood that part.

I was trying to understand, does this not put a limitation on the implementation, as someone could choose to implement only the compressed instructions.

Is this even allowed/supported.

Thank you.

@Timmmm
Copy link
Collaborator

Timmmm commented Oct 20, 2023

as someone could choose to implement only the compressed instructions.

They can't. The ones that are mapped to optional instructions require those instructions to be implemented. For example here's what the spec says about c.mul:

image

Your c.mul implementation was missing the check for this, but the mul implementation has it:

function clause execute (MUL(rs2, rs1, rd, high, signed1, signed2)) = {
  if haveMulDiv() | haveZmmul() then {

@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 20, 2023

... and there's no sensible use case for it, either; if you have the hardware to do a c.mul, you might as well use it to do a mul too. Decode is very little cost compared to the actual multiplier.

@jjscheel jjscheel mentioned this pull request Oct 23, 2023
13 tasks
@junambi
Copy link
Collaborator Author

junambi commented Oct 24, 2023

Modified the implementation and pushed the file, please review and comment.

@github-actions
Copy link

github-actions bot commented Oct 24, 2023

Unit Test Results

712 tests  ±0   712 ✔️ ±0   0s ⏱️ ±0s
    6 suites ±0       0 💤 ±0 
    1 files   ±0       0 ±0 

Results for commit c35a554. ± Comparison against base commit c90cf2e.

♻️ This comment has been updated with latest results.

/*=======================================================================================*/

/* ****************************************************************** */
/* This file specifies the instructions in the 'ZCEE' extension. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say Zcb not ZCEE?

let rd = creg2reg_idx(rsd);
let rs = creg2reg_idx(rs2);

execute(MUL((rs, rd, rd, false, true, true)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra brackets here.

<-> 0b100 @ 0b000 @ rs1 : cregidx @ ui65 : bits(2) @ rd : cregidx @ 0b00

function clause execute (C_LBU(uimm, rsc, rdc)) = {
let immext : xlenbits = zero_extend(uimm);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused


function clause execute (C_LBU(uimm, rsc, rdc)) = {
let immext : xlenbits = zero_extend(uimm);
let imm : bits(12) = zero_extend(uimm);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong. The ui65 should be called ui01 and then

let imm : bits(12) = zero_extend(ui01[0] @ ui01[1]);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swizzle the bits in decode not execute

}

mapping clause assembly = C_LBU(uimm, rsc, rdc)
<-> "c.lbu" ^ spc() ^ creg_name(rdc) ^ sep() ^ creg_name(rsc) ^ sep() ^ hex_bits_5(uimm @ 0b000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uimm wrong here too.

<-> 0b100 @ 0b001 @ rs1 : cregidx @ 0b0 @ ui5 : bits(1) @ rd : cregidx @ 0b00

function clause execute (C_LHU(uimm, rsc, rdc)) = {
let immext : xlenbits = zero_extend(uimm);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also unused. Please check the other instructions too.


function clause execute (C_LHU(uimm, rsc, rdc)) = {
let immext : xlenbits = zero_extend(uimm);
let imm : bits(12) = zero_extend(uimm);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wrong - should be zero_extend(ui1 @ 0b0);

Please check the other instructions too.

/* the encode/decode mapping between AST elements and 16-bit half words */

mapping clause encdec_compressed = C_MUL(rsd, rs2)
<-> 0b100 @ 0b111 @ rsd : cregidx @ 0b00 @ rs2 : cregidx @ 0b01
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ 0b00 @ should be @ 0b10 @

@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 24, 2023

Given these review comments, are you testing your code before pushing it for review, and if so how? Because basic things like decode patterns should be caught by even the most basic testing.

Also you've still deleted a bunch of V-related changes from Makefile which causes the build to fail in CI. So I don't think you could possibly have tested this code given it won't build.

@junambi
Copy link
Collaborator Author

junambi commented Oct 25, 2023

Thank you Tim and Jessica for the comments, i have updated the Makefile and the Zcb model.

Makefile Outdated
@@ -30,6 +30,9 @@ SAIL_DEFAULT_INST += riscv_insts_zbc.sail
SAIL_DEFAULT_INST += riscv_insts_zbs.sail

SAIL_DEFAULT_INST += riscv_insts_zfh.sail

SAIL_DEFAULT_INST += riscv_insts_zcb.sail
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to keep this list sorted alphabetically, i.e. move it up one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated.

/* ****************************************************************** */
union clause ast = C_LHU : (bits(1), cregidx, cregidx)

mapping clause encdec_compressed = C_LHU(ui1, rs1, rd)
Copy link
Collaborator

@arichardson arichardson Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mapping clause encdec_compressed = C_LHU(ui1, rs1, rd)
mapping clause encdec_compressed = C_LHU(ui1 @ 0b0, rs1, rd)

Immediate is not scaled in current code. I assume this means there are no tests?

@Timmmm
Copy link
Collaborator

Timmmm commented Oct 25, 2023

Thank you Tim and Jessica for the comments, i have updated the Makefile and the Zcb model.

Looks like you didn't fix the issues with the immediates though?

@jjscheel
Copy link

Given these review comments, are you testing your code before pushing it for review, and if so how? Because basic things like decode patterns should be caught by even the most basic testing.

Hi, @jrtc27, fair question. I asked @junambi to post his current PR work early because @martinberger had interest in the work and we wanted to make sure that we didn't duplicating previous effort.

@jjscheel
Copy link

jjscheel commented Mar 5, 2024

@martinberger, what is the status of your work? I can't tell if a PR was submitted.

@Timmmm
Copy link
Collaborator

Timmmm commented Mar 5, 2024

There isn't a PR. I thought Martin added our implementation to a comment but I can't see it now. I'll make a new PR.

@Timmmm Timmmm mentioned this pull request Mar 6, 2024
@Timmmm
Copy link
Collaborator

Timmmm commented Mar 6, 2024

@jjscheel Martin's code (plus emulator flags) is in #427

@jjscheel
Copy link

jjscheel commented Mar 8, 2024

Thanks, @Timmmm and @martinberger!

@jordancarlin
Copy link
Contributor

@Timmmm Seems like this can be closed now that #427 has been merged

@Timmmm Timmmm closed this May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants