
add support for CMO v1.0 #891

Merged
6 commits merged on Feb 16, 2022

Conversation

liweiwei90
Contributor

Add support for CMO v1.0:

  • Add zicboz, zicbom, and zicbop flags to the ISA string
  • Add a blocksz parameter to specify the cache block size
  • Update the cache access function to support cache-block-sized accesses
  • Add a clean_invalidate function for the cache
  • Add functions for the CMO instructions
  • Add disasm support for the CMO instructions

@aswaterman
Collaborator

I'd like someone from the CMO TG to perform a code review. @dkruckemyer-ventana can you nominate someone?

@a4lg
Contributor

a4lg commented Dec 19, 2021

I was also implementing the RISC-V CMO instructions, but my implementation lacked actual cache invalidation.
I generally support this branch, with a few fixes.

@a4lg a4lg left a comment

This PR is definitely better than my test implementation (not PR-ed).
Still, my work showed me that a few changes are required.

disasm/disasm.cc Outdated
Comment on lines 1830 to 1832
DISASM_INSN("prefetch.i", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.r", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.w", prefetch_i, 0, {&store_address});
Contributor

Disassembly support for prefetch.[irw] instructions must be placed before ori instruction because prefetch hint instructions are special encodings of ori instruction.

Contributor

You can test this file (spike --isa=rv64gc_zicbom_zicbop_zicboz -l test-zicbo.bare.elf): test-zicbo.bare.zip

This is built with GNU Binutils + my CBO support patchset and my toy builder (will be open-sourced later).

    .globl  _start
    .section .entry, "ax", %progbits
    .attribute arch, "rv64gc_zicbom_zicbop_zicboz"
_start:
    prefetch.i 0(x0)
    prefetch.r 32(x1)
    prefetch.w 64(x2)
    cbo.clean x1
    cbo.clean x30
    cbo.inval x1
    cbo.inval x30
    cbo.flush x1
    cbo.flush x30
    cbo.zero x1
    cbo.zero x30
    ret

Contributor Author

That's true. I didn't realize this issue before jrtc27's comment in riscv/sail-riscv#137. It's easy to fix the problem in disasm. However, it seems hard to fix the functional code of prefetch.*. Is there any suggestion?

riscv/encoding.h Outdated
Comment on lines 231 to 233
#define ENVCFG_CBIE 0x30
#define ENVCFG_CBCFE 0x40
#define ENVCFG_CBZE 0x80
@a4lg a4lg Dec 19, 2021

My changes including CBIE, CBCFE and CBZE bits are merged into riscv-opcodes.
riscv/riscv-opcodes#94

Because riscv/encoding.h is intended to be auto-generated by riscv-opcodes, managing our own bits (especially with our own names) is not a good idea.

Even if we change riscv/encoding.h manually, we should use the following changes instead (as upstreamed into master):

#define MENVCFG_CBIE  0x00000030
#define MENVCFG_CBCFE 0x00000040
#define MENVCFG_CBZE  0x00000080

Contributor Author

OK. I'll update encoding.h.

@@ -546,6 +552,10 @@ void state_t::reset(processor_t* const proc, reg_t max_isa)
csrmap[CSR_MIMPID] = std::make_shared<const_csr_t>(proc, CSR_MIMPID, 0);
csrmap[CSR_MVENDORID] = std::make_shared<const_csr_t>(proc, CSR_MVENDORID, 0);
csrmap[CSR_MHARTID] = std::make_shared<const_csr_t>(proc, CSR_MHARTID, proc->get_id());
const reg_t envcfg_mask = ENVCFG_CBIE | ENVCFG_CBCFE | ENVCFG_CBZE;
Contributor

Extension masking is required.

envcfg_mask = (proc->extension_enabled(EXT_ZICBOM) ? MENVCFG_CBCFE | MENVCFG_CBIE : 0)
            | (proc->extension_enabled(EXT_ZICBOZ) ? MENVCFG_CBZE : 0);
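As a self-contained illustration of the gating above (a toy stand-in, not spike's actual masked_csr_t), a write-masked CSR keeps only the bits inside its mask, so building the mask from the enabled extensions is enough to hide the CBO controls when an extension is absent:

```cpp
#include <cassert>
#include <cstdint>

// Bit values as upstreamed into riscv-opcodes (quoted earlier in this review).
constexpr uint64_t MENVCFG_CBIE  = 0x00000030;
constexpr uint64_t MENVCFG_CBCFE = 0x00000040;
constexpr uint64_t MENVCFG_CBZE  = 0x00000080;

// Toy stand-in for a masked CSR: writes only affect bits inside `mask`.
struct masked_csr {
  uint64_t mask;
  uint64_t val = 0;
  void write(uint64_t v) { val = (val & ~mask) | (v & mask); }
};
```

With only Zicbom enabled, the mask would be CBIE | CBCFE, so a write of all-ones leaves only those bits set.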

Contributor Author

OK. I'll update this.

Comment on lines 556 to 558
csrmap[CSR_MENVCFG] = menvcfg = std::make_shared<masked_csr_t>(proc, CSR_MENVCFG, envcfg_mask, 0);
csrmap[CSR_SENVCFG] = senvcfg = std::make_shared<masked_csr_t>(proc, CSR_SENVCFG, envcfg_mask, 0);
csrmap[CSR_HENVCFG] = henvcfg = std::make_shared<masked_csr_t>(proc, CSR_HENVCFG, envcfg_mask, 0);
Contributor

Actually, those CSRs are not that simple.
I'm working on a fix as part of other privileged spec 1.12 CSR changes and will support merging this simple implementation for now (I'm just letting you know there is a to-do list entry here).

Contributor Author

OK.

@@ -0,0 +1,2 @@
require_extension(EXT_ZICBOP);
@a4lg a4lg Dec 19, 2021

Delete this line since this instruction is a hint and must not generate illegal instruction trap.

Contributor

You can optionally stop prefetching if EXT_ZICBOP is not enabled. Even so, we must not generate any trap here (even prefetch.i 0(zero) must be safe).

Contributor Author

OK. I'll fix this.

@@ -0,0 +1,2 @@
require_extension(EXT_ZICBOP);
@a4lg a4lg Dec 19, 2021

Delete this line since this instruction is a hint and must not generate illegal instruction trap.

@@ -0,0 +1,2 @@
require_extension(EXT_ZICBOP);
@a4lg a4lg Dec 19, 2021

Delete this line since this instruction is a hint and must not generate illegal instruction trap.

disasm/disasm.cc Outdated
DISASM_INSN("cbo.inval", cbo_inval, 0, {&xrs1});
DISASM_INSN("cbo.zero", cbo_zero, 0, {&xrs1});
DISASM_INSN("prefetch.i", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.r", prefetch_i, 0, {&store_address});
@a4lg a4lg Dec 19, 2021

DISASM_INSN("prefetch.r", prefetch_r, 0, {&store_address});

Contributor Author

Sorry, I forgot to update the second argument after copying.

disasm/disasm.cc Outdated
DISASM_INSN("cbo.zero", cbo_zero, 0, {&xrs1});
DISASM_INSN("prefetch.i", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.r", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.w", prefetch_i, 0, {&store_address});
@a4lg a4lg Dec 19, 2021

DISASM_INSN("prefetch.w", prefetch_w, 0, {&store_address});

@liweiwei90
Contributor Author

This PR is definitely better than my test implementation (not PR-ed). Still, my work showed me that a few changes are required.

Thanks for your comments.

@a4lg
Contributor

a4lg commented Dec 20, 2021

(Comment by @a4lg)
Disassembly support for prefetch.[irw] instructions must be placed before ori instruction because prefetch hint instructions are special encodings of ori instruction.

(Comment by @liweiwei90)
That's true. I didn't realize this question before jrtc27 comment in riscv/sail-riscv#137. It's easy to fix the problem in disasm. However, It seems hard to fix the function code of prefetch.* . Is there any suggestion?

I looked into the current implementation and... I concluded that ori and the Zicbop instructions will not be decoded incorrectly.

  1. The hash table is not relevant here (because the hash table key is the exact instruction bits); what matters is the linear search over the instruction list.
  2. The instruction list is initially ordered in descending order of (match, mask).
  3. The four related instructions are ordered as follows (which is okay):
    • prefetch.w (match == 0x306013)
    • prefetch.r (match == 0x106013)
    • prefetch.i (match == 0x006013, mask == 0x01f07fff)
    • ori (match == 0x006013, mask == 0x707f)
  4. processor_t::decode_insn moves a matched instruction to the front to reduce the decode-miss penalty, but that behavior is suppressed if another instruction with the same match value exists, and...
  5. prefetch.i and ori share the same match value (0x006013). That means they will not be moved to the front (on the other hand, prefetch.[rw] will be).
  6. prefetch.[irw] are always tested before ori. (Q.E.D.)

There will be a small performance degradation, but nothing more.

Yes, it's not very intuitive behavior, but it's not very urgent to fix either.
What do you think, @aswaterman?
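The decode-order argument above can be checked with a small self-contained sketch. The (match, mask) values are taken from the comment itself, the list order mirrors the descending (match, mask) ordering described there, and classify models the linear scan (the names and helpers are illustrative, not spike's API):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// (match, mask) pairs quoted in the comment above, kept in the
// descending (match, mask) order the instruction list uses.
struct insn_desc { const char* name; uint32_t match, mask; };

static const std::vector<insn_desc> insn_list = {
  {"prefetch.w", 0x00306013, 0x01f07fff},
  {"prefetch.r", 0x00106013, 0x01f07fff},
  {"prefetch.i", 0x00006013, 0x01f07fff},
  {"ori",        0x00006013, 0x0000707f},
};

// Models the linear scan: the first entry with (bits & mask) == match wins,
// so prefetch.[irw] are always tested before ori.
std::string classify(uint32_t bits) {
  for (const auto& d : insn_list)
    if ((bits & d.mask) == d.match)
      return d.name;
  return "unknown";
}
```

Note that ori with rd=x0 and imm[4:0]=0 is exactly the prefetch encoding space, which is why the list order matters; a true ori (nonzero rd) still falls through to the ori entry.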

@a4lg a4lg left a comment

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

@@ -241,6 +242,7 @@ int main(int argc, char** argv)
uint16_t rbb_port = 0;
bool use_rbb = false;
unsigned dmi_rti = 0;
reg_t blocksz = 16;
Contributor

I agree with riscv/sail-riscv#137 (comment); the default block size would be better as 64.

Contributor Author

OK, I'll fix this. There are still some unsolved problems in the sail-riscv review, which may also affect the spike implementation.

@@ -376,6 +378,7 @@ int main(int argc, char** argv)
exit(-1);
}
});
parser.option(0, "blocksz", 1, [&](const char* s){blocksz = strtoull(s, 0, 0);});
Contributor

How about checking whether the block size is a power of 2 somewhere? See processor_t::check_pow2 for argument checking.
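The suggested sanity check is a one-liner; here is a sketch (spike's own processor_t::check_pow2 additionally reports an error and exits, which is omitted here):

```cpp
#include <cassert>
#include <cstdint>

// A value is a power of two iff it is nonzero and clearing its
// lowest set bit leaves zero.
bool is_pow2(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```

The parser callback could reject the --blocksz argument when this check fails.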

Contributor Author

I have sent an email to David to discuss the unsolved problems, including the reasonable range for the cache block size. There has been no reply so far. Anyway, I agree it should be a power of 2. I'll update it later.

@@ -1260,6 +1260,15 @@ riscv_insn_svinval = \
hinval_vvma \
hinval_gvma \

riscv_insn_cmo = \
@a4lg a4lg Dec 20, 2021

This is not sail-riscv, but the actual extension name would be better here, too. How about zicbo (for Zicbo[mpz]) instead of cmo?

Contributor Author

The names in riscv_insn_list all use either the collective name of a set of extensions (k, b, v, ...) or a specific extension name (svinval). I think cmo is more like the collective name for Zicbo[mpz] than zicbo is.

Contributor

Got it. If the maintainer agrees, that's okay.

@liweiwei90
Contributor Author

(Comment by @a4lg)
I looked into the current implementation and... I concluded that ori and Zicbop instructions will not be decoded incorrectly. [...] prefetch.[irw] are always tested before ori. (Q.E.D.)

Thanks for your explanation. I missed the case where an instruction will not be moved to the front if it has the same match value as a previous or later instruction.

@liweiwei90
Contributor Author

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

Sorry, I don't understand the problem with moving the cbo.* instructions earlier.

@a4lg
Contributor

a4lg commented Dec 21, 2021

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

Sorry, I don't understand the problem with moving the cbo.* instructions earlier.

I just felt that it wasn't necessary. The prefetch.* instructions need to be placed before ori, but regular extension instructions are placed after the base instructions, organized by extension (so I thought the cbo.* instructions should follow the regular coding style).
But that's just about cleanliness, not functionality.

@liweiwei90
Contributor Author

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

Sorry, I don't understand the problem with moving the cbo.* instructions earlier.

I just felt that it wasn't necessary. The prefetch.* instructions need to be placed before ori, but regular extension instructions are placed after the base instructions, organized by extension (so I thought the cbo.* instructions should follow the regular coding style). But that's just about cleanliness, not functionality.

OK. I just wanted to keep the CMO-related instructions together.

@aswaterman
Collaborator

The prefetch instructions should be deleted from this PR. Executing them as no-ops is a fully legitimate implementation of the extension. The proposed approach adds complexity but doesn't provide meaningful benefit.

@@ -39,6 +40,11 @@ class memtracer_list_t : public memtracer_t
for (std::vector<memtracer_t*>::iterator it = list.begin(); it != list.end(); ++it)
(*it)->trace(addr, bytes, type);
}
void clean_invalidate(uint64_t addr, size_t bytes, bool clean, bool inval)
{
for (std::vector<memtracer_t*>::iterator it = list.begin(); it != list.end(); ++it)
Contributor

Use modern C++ loop:

for (auto it: list)
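In a self-contained form (a stub stand-in, not spike's memtracer_t API), the suggested range-based loop replaces the verbose iterator loop like so:

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for a tracer in the list: records how many times
// it was notified (illustrative only).
struct tracer_stub {
  int calls = 0;
  void clean_invalidate() { ++calls; }
};

// Range-based for over the pointer list, as suggested above.
void notify_all(std::vector<tracer_stub*>& list) {
  for (auto it : list)
    it->clean_invalidate();
}
```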

Contributor Author

OK.

Contributor Author

The prefetch instructions should be deleted from this PR. Executing them as no-ops is a fully legitimate implementation of the extension. The proposed approach adds complexity but doesn't provide meaningful benefit.

OK. The prefetch instructions have been deleted.

riscv/mmu.cc Outdated
void mmu_t::cbo_zero(reg_t addr) {
addr &= ~(blocksz - 1);
reg_t vpn = addr >> PGSHIFT;
if (likely(tlb_store_tag[vpn % TLB_ENTRIES] == vpn)) {
Contributor

There is a lot of logic here (and in clean_inval() too) to perform what is basically a simple store, no? There ought to be a better way than to have to deal with all these possibilities (TLB hit/miss, tracing, triggers) over and over.

Contributor Author

Yes, this is similar to a simple store; however, the store length is longer, so the effect of TLB hits/misses, tracing, and triggers is a little different from a simple store.

Contributor Author

Sorry, I can't follow your idea of a better way. The current macros for load/store cannot accommodate a block-sized operation.

Contributor

@aswaterman any ideas? cbo_zero() and clean_inval() ought to be much simpler for what they do.

Collaborator

@scottj97 I agree.

cbo.zero was defined in such a way that breaking it into a sequence of smaller stores is legal. IMO, we should take advantage of that property here. We could delete this routine and replace the instruction implementation with something like

auto base = RS1 & -blocksz;
for (size_t offset = 0; offset < blocksz; offset += 4)
  MMU.store_uint32(base + offset, 0);


Better yet, increment "offset" by 1 and simply store bytes, since that's the architectural definition (and it avoids assumptions about the atomicity of the individual operations).

Collaborator

Yeah, that would get the point across better. They should be equivalent as far as Spike is concerned, though, since the difference in atomicity isn't observable: Spike is single-threaded, and 4 bytes is the smallest protection granule.
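The byte-wise variant discussed here can be sketched against a plain memory buffer (a toy model; `mem` stands in for guest memory and the name cbo_zero_model is illustrative, not spike's MMU interface):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of byte-wise cbo.zero: align the effective address down to the
// cache block, then store zero bytes one at a time. blocksz is assumed to
// be a power of two.
void cbo_zero_model(std::vector<uint8_t>& mem, uint64_t addr, uint64_t blocksz) {
  uint64_t base = addr & ~(blocksz - 1);  // same alignment as RS1 & -blocksz
  for (uint64_t offset = 0; offset < blocksz; ++offset)
    mem[base + offset] = 0;               // one byte store per iteration
}
```

An address anywhere inside the block zeroes the whole block and nothing outside it.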

Contributor Author

OK. I still have one question: what about the tracer's cache_misses and write_accesses counters? Do we need to take them into account?

Collaborator

I think this approach is OK with respect to those counters. They're part of the Spike implementation, rather than governed by the ISA, and so we can just say that the current behavior is correct by definition.

Contributor Author

OK.

@scottj97
Contributor

Please squash the fixes back into the original commits. I.e. everything starting with d819da1

@liweiwei90
Contributor Author

Please squash the fixes back into the original commits. I.e. everything starting with d819da1

OK.

@liweiwei90 liweiwei90 force-pushed the plct-cmo-upstream branch 2 times, most recently from 56c55e5 to 0bceee0 on December 30, 2021 at 08:35
@scottj97 scottj97 left a comment

This is much improved, but clean_inval() still smells funny.

riscv/mmu.cc Outdated
Comment on lines 164 to 184
void mmu_t::clean_inval(reg_t addr, bool clean, bool inval)
{
  reg_t paddr = addr;
  bool page = false, guest = false, access = false;
  bool has_gva;
  reg_t gpa;
  access_type type = LOAD;
  bool fault = true;
  try {
    paddr = translate(addr, 1, LOAD, 0);
    type = LOAD;
    fault = false;
  } catch (trap_load_access_fault& t) {
    access = true;
    has_gva = t.has_gva();
  } catch (trap_load_page_fault& t) {
    page = true;
    has_gva = t.has_gva();
  } catch (trap_load_guest_page_fault& t) {
    guest = true;
    gpa = t.get_tval2();
  }

  if (fault) {
    try {
      paddr = translate(addr, 1, STORE, 0);
      type = STORE;
      fault = false;
    } catch (trap_store_access_fault& t) {
      access = true;
      has_gva = t.has_gva();
    } catch (trap_store_page_fault& t) {
      page = true;
      has_gva = t.has_gva();
    } catch (trap_store_guest_page_fault& t) {
      guest = true;
      gpa = t.get_tval2();
    }
  }

  if (fault) {
    try {
      paddr = translate(addr, 1, FETCH, 0);
      type = FETCH;
    } catch (trap_instruction_access_fault& t) {
      access = true;
      has_gva = t.has_gva();
    } catch (trap_instruction_page_fault& t) {
      page = true;
      has_gva = t.has_gva();
    } catch (trap_instruction_guest_page_fault& t) {
      guest = true;
      gpa = t.get_tval2();
    }
  }

  if (access) {
    trap_store_access_fault(has_gva, addr, 0, 0);
  } else if (guest) {
    trap_store_guest_page_fault(addr, gpa, 0);
  } else if (page) {
    trap_store_page_fault(has_gva, addr, 0, 0);
  } else {
    if (auto host_addr = sim->addr_to_mem(paddr)) {
      if (tracer.interested_in_range(paddr, paddr + PGSIZE, type))
        tracer.clean_invalidate(paddr, blocksz, clean, inval);
      else
        refill_tlb(addr, paddr, host_addr, type);
    } else {
      throw trap_store_access_fault((proc) ? proc->state.v : false, addr, 0, 0);
    }

    if (!matched_trigger) {
      matched_trigger = trigger_exception(type == LOAD ? OPERATION_LOAD :
          type == STORE ? OPERATION_STORE : OPERATION_EXECUTE,
          addr, 0);
      if (matched_trigger)
        throw *matched_trigger;
    }
  }
}
Contributor

This seems excessive, especially all the special handling of exceptions.

Contributor Author

In the spec, "A cache-block management instruction is permitted to access the specified cache block whenever a load instruction, store instruction, or instruction fetch is permitted to access the corresponding physical addresses." That means it may perform a LOAD, STORE, or FETCH check. However, the current translate process can only handle one access type at a time, so the implementation above divides the check into three steps, one per access type:

  • No matter which access type passes the translation process, the cbo instruction will execute successfully.
  • If all of them trigger exceptions, it combines the final exception result according to which exception was triggered last (briefly: access fault > guest page fault > page fault).

Collaborator

Since stores don't succeed where loads fail, it should be possible to check only for load or fetch permissions.

Furthermore, the interaction between CBOs and triggers is implementation-defined, so we're free to skip that part.

We're also under no obligation to refill the TLB; for these instructions, simple code is preferable.

And finally, we don't need to be quite so precise about what type gets passed to tracer.interested_in_range. Between all of those, I think this can be simplified substantially. Something like this:

void mmu_t::clean_inval(reg_t addr, bool clean, bool inval)
{ 
  reg_t paddr;
  
  try {
    paddr = translate(addr, 1, LOAD, 0);
  } catch (...) {
    // Try again with fetch permissions 
    try {
      paddr = translate(addr, 1, FETCH, 0);
    } catch (trap_instruction_access_fault& t) {
      trap_store_access_fault(t.has_gva(), addr, 0, 0);
    } catch (trap_instruction_page_fault& t) {
      trap_store_page_fault(t.has_gva(), addr, 0, 0);
    } catch (trap_instruction_guest_page_fault& t) {
      trap_store_guest_page_fault(addr, t.get_tval2(), 0);
    } catch (...) {
      abort();
    }
  }

  if (auto host_addr = sim->addr_to_mem(paddr)) { 
    if (tracer.interested_in_range(paddr, paddr + PGSIZE, LOAD))
      tracer.clean_invalidate(paddr, blocksz, clean, inval);
  } else {
    throw trap_store_access_fault((proc) ? proc->state.v : false, addr, 0, 0);
  }
}

Contributor

Would it make any sense to modify translate() so that we can pass a list of access types? So we could do:

paddr = translate(addr, 1, LOAD | FETCH | STORE, 0);

Collaborator

I'd need to see it both ways to judge. I'm worried that doing so would make the already complex page-table-walking code messier. This approach is certainly ugly, but at least it is contained.

Contributor

In the spec, "A cache-block management instruction is permitted to access the specified cache block whenever a load instruction, store instruction, or instruction fetch is permitted to access the corresponding physical addresses." That means it may perform a LOAD, STORE, or FETCH check. However, the current translate process can only handle one access type at a time, so the implementation above divides the check into three steps, one per access type:

  • No matter which access type passes the translation process, the cbo instruction will execute successfully.
  • If all of them trigger exceptions, it combines the final exception result according to which exception was triggered last (briefly: access fault > guest page fault > page fault).

How does this work with MPRV=1 (which does not affect fetches)? Perhaps the spec is too vague here. As of 83fdb70, clean_inval() tries a load access type and, if that detects a fault, tries a fetch access type. Those could be two completely different paddrs if MPRV=1. That doesn't seem desirable. The spec refers only to "the corresponding physical address", i.e. only a single possible physical address.

@dkruckemyer-ventana dkruckemyer-ventana Jan 5, 2022

You had me stumped there for a moment, but I think everything is OK in the spec....

The translation is based on the effective address of the CBO, which, like other explicit load and store accesses, is affected by MPRV. Once the translation is performed, the permissions are checked against the RWX bits in the various mechanisms. So the translation and permission check are not performed as if the access were an instruction fetch.

But this likely highlights an issue with performing a FETCH translation by itself.... (I was wrong in thinking that was OK.)

@dkruckemyer-ventana dkruckemyer-ventana Jan 5, 2022

Perhaps a way to approach this is to perform two LOAD accesses, one with MXR=0 and one with MXR=1? (Though I haven't quite thought through whether that explodes with virtualization....)

Edit: or just force MXR=1?

Collaborator

MXR doesn't affect PMP, so that wouldn't suffice to allow CBOs to regions that have an R=0 X=1 PMP configuration.

Contributor Author

I have modified the translation process to support the check for cbo ops, if virtualization is not a problem.

@liweiwei90 liweiwei90 force-pushed the plct-cmo-upstream branch 3 times, most recently from 8c6acf7 to e9ffda5 on January 5, 2022 at 02:49
riscv/mmu.cc Outdated
}
}

if (access) {
@scottj97 scottj97 Jan 5, 2022

Two problems I see in this logic:

  1. If LOAD detects an access fault, it tries again with FETCH. But access will still be true, so this will take a store access fault, when it should successfully perform the CBO if the fetch succeeds.
  2. This does not seem to check for STORE permissions at all? If the page is read-only, the operation should not be allowed (right?). Edit: apparently cache ops are allowed on read-only pages, so never mind.

Contributor Author

Yeah, this is a logic error. I should set all of them to false if FETCH succeeds.

riscv/mmu.cc Outdated
@@ -258,7 +275,7 @@ reg_t mmu_t::pmp_homogeneous(reg_t addr, reg_t len)
return true;
}

reg_t mmu_t::s2xlate(reg_t gva, reg_t gpa, access_type type, access_type trap_type, bool virt, bool hlvx)
reg_t mmu_t::s2xlate(reg_t gva, reg_t gpa, int& type, int trap_type, bool virt, bool hlvx)
Contributor

Why use a reference for type?

Contributor

Nevermind, I see why. This smells bad but I can't think of a better way. :(

Collaborator

This is going to lead to a lot of confusion. Return an aggregate (e.g. struct { reg_t addr; int type; };) instead?
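The aggregate-return suggestion can be sketched in a self-contained form. The struct is the one named in the comment; the pass-through translation function and its name are purely illustrative, not spike's s2xlate:

```cpp
#include <cassert>
#include <cstdint>

using reg_t = uint64_t;

// The aggregate suggested above: bundle the translated address with the
// access type that actually succeeded, instead of an in/out reference.
struct xlate_result {
  reg_t addr;
  int type;
};

// Hypothetical identity translation, used only to show the call shape:
// the caller reads both fields from the returned aggregate.
xlate_result s2xlate_sketch(reg_t gpa, int requested_type) {
  return { gpa, requested_type };
}
```

Callers then inspect result.type instead of relying on a reference parameter being mutated, which keeps the data flow visible at the call site.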

Contributor Author

OK. I'll try this way.

@aswaterman
Collaborator

@dkruckemyer-ventana tells me that the perms checking for CBOs might be relaxed so that merely checking for load permissions is a correct implementation. I'll let him confirm or deny. That would greatly simplify things: we could leave the translation functions as-is, and simply check for load perms, catch load exceptions, and rethrow as store exceptions.

@dkruckemyer-ventana

I think that change is highly likely. Let's see what things look like with Andrew's suggestions.

@liweiwei90

I think that change is highly likely. Let's see what things look like with Andrew's suggestions.

Sorry, do you mean the spec will change in this way?
If so, the code can be simplified as follows, without changing the translate function:

void mmu_t::clean_inval(reg_t addr, bool clean, bool inval)
{
  reg_t paddr;

  try {
    paddr = translate(addr, 1, LOAD, 0);
  } catch (trap_load_access_fault& t) {
    throw trap_store_access_fault(t.has_gva(), addr, 0, 0);
  } catch (trap_load_page_fault& t) {
    throw trap_store_page_fault(t.has_gva(), addr, 0, 0);
  } catch (trap_load_guest_page_fault& t) {
    throw trap_store_guest_page_fault(addr, t.get_tval2(), 0);
  } catch (...) {
    abort();
  }

  if (auto host_addr = sim->addr_to_mem(paddr)) {
    if (tracer.interested_in_range(paddr, paddr + PGSIZE, LOAD))
      tracer.clean_invalidate(paddr, blocksz, clean, inval);
  } else {
    throw trap_store_access_fault((proc) ? proc->state.v : false, addr, 0, 0);
  }
}

@dkruckemyer-ventana

Yes, barring any last minute objections, load (read) permission is the only requirement for management CBOs. There's a PR with the updated text:

riscv/riscv-CMOs#46

@liweiwei90

OK. I have reverted the modification to the translate function and updated the clean_inval function.
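As an aside, the cache-block operations act on the whole block containing the effective address, so the translated address is aligned down to a block boundary before it is used. A minimal sketch of that alignment, assuming blocksz is a power of two:

```cpp
#include <cstdint>

// Align a physical address down to the start of its cache block.
// Works for any power-of-two block size: blocksz - 1 is an all-ones
// mask over the offset bits, and ~(blocksz - 1) clears them.
uint64_t block_align(uint64_t paddr, uint64_t blocksz) {
  return paddr & ~(blocksz - 1);
}
```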

@liweiwei90 force-pushed the plct-cmo-upstream branch 2 times, most recently from de8b0dc to 0923f4e on January 22, 2022 13:16
if (likely(hit_way != NULL))
{
  if (*hit_way & DIRTY)
    writebacks++;


I'm not sure what "writebacks" counts, but an actual writeback is only required if a clean is required. For inval only, a proper writeback of data would not need to occur.


OK. I'll try to fix this.
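A minimal sketch of the fix being discussed: count a writeback only when a dirty line is actually cleaned, since an invalidate-only operation drops the dirty data without writing it back. The flag values and names are illustrative, not Spike's actual cache model.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-way state flags.
constexpr uint64_t VALID = 1ULL << 63;
constexpr uint64_t DIRTY = 1ULL << 62;

// Apply a clean and/or invalidate to one hit way; return the number
// of writebacks this operation performed (0 or 1).
size_t clean_invalidate_way(uint64_t* hit_way, bool clean, bool inval) {
  size_t writebacks = 0;
  if (hit_way != nullptr) {
    if ((*hit_way & DIRTY) && clean)
      writebacks++;        // data is written back only on a clean
    if (inval)
      *hit_way = 0;        // invalidate: drop the line entirely
    else
      *hit_way &= ~DIRTY;  // clean only: line stays valid, now clean
  }
  return writebacks;
}
```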

liweiwei added 3 commits January 30, 2022 11:33
prefetch.* instructions are hints that share the encoding of ORI with rd = 0, so they can share the implementation of ORI and execute as no-ops
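As a sketch of why the hints can fall through to ORI, the decoder below distinguishes the variants using the Zicbop encoding this PR targets (OP-IMM opcode, ORI funct3, rd = x0, with instr[24:20] selecting the prefetch type). It is illustrative only, not Spike's decoder.

```cpp
#include <cstdint>
#include <string>

// Classify a 32-bit instruction word: the prefetch.{i,r,w} hints are
// ORI encodings with rd = x0 and instr[24:20] selecting the variant.
std::string decode_prefetch(uint32_t instr) {
  uint32_t opcode = instr & 0x7f;          // bits [6:0]
  uint32_t rd     = (instr >> 7) & 0x1f;   // bits [11:7]
  uint32_t funct3 = (instr >> 12) & 0x7;   // bits [14:12]
  uint32_t sub    = (instr >> 20) & 0x1f;  // bits [24:20]
  if (opcode != 0x13 || funct3 != 0x6)
    return "not-ori";                      // not an OP-IMM ORI at all
  if (rd != 0)
    return "ori";                          // ordinary ORI, not a hint
  switch (sub) {
    case 0x0: return "prefetch.i";
    case 0x1: return "prefetch.r";
    case 0x3: return "prefetch.w";
    default:  return "ori";                // rd = x0 ORI, still a no-op
  }
}
```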
@scottj97

Thank you for your patience with all my nitpicking.

@aswaterman aswaterman merged commit b68b758 into riscv-software-src:master Feb 16, 2022
@aswaterman

Thanks for both your effort and your patience, @liweiwei90!

@@ -72,6 +72,7 @@ static void help(int exit_code = 1)
fprintf(stderr, " --dm-no-abstract-csr Debug module won't support abstract to authenticate\n");
fprintf(stderr, " --dm-no-halt-groups Debug module won't support halt groups\n");
fprintf(stderr, " --dm-no-impebreak Debug module won't support implicit ebreak in program buffer\n");
fprintf(stderr, " --blocksz=<size> Cache block size (B) for CMO operations(powers of 2) [default 16]\n");

default 16 or 64?


I would suggest a default of 64B - based on that being the most common block size in the industry today (in x86 and ARMv8 architectures).


Agreed, @ingallsj please make a PR to fix.
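Separately, the help text above says --blocksz must be a power of two; a minimal sketch of that validation (the function name is hypothetical, not Spike's option parsing):

```cpp
#include <cstdint>

// A nonzero value is a power of two exactly when clearing its lowest
// set bit (x & (x - 1)) leaves zero.
bool is_valid_blocksz(uint64_t blocksz) {
  return blocksz != 0 && (blocksz & (blocksz - 1)) == 0;
}
```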
