
add support for CMO v1.0 #891

Merged
6 commits merged on Feb 16, 2022

Conversation

liweiwei90
Contributor

Add support for CMO v1.0:

  • Add zicboz, zicbom, and zicbop flags to the ISA string
  • Add a blocksz parameter to specify the cache block size
  • Update the cache access function to support cache-block-sized accesses
  • Add a clean_invalidate function for the cache
  • Add functions for the CMO instructions
  • Add disasm support for the CMO instructions

@aswaterman
Collaborator

I'd like someone from the CMO TG to perform a code review. @dkruckemyer-ventana can you nominate someone?

@a4lg
Contributor

a4lg commented Dec 19, 2021

I was also implementing the RISC-V CMO instructions, but my implementation lacked actual cache invalidation.
I generally support this branch, with a few fixes.

@a4lg a4lg left a comment

This PR is definitely better than my test implementation (not PR-ed).
Still, my work showed me that a few changes are required.

disasm/disasm.cc Outdated
Comment on lines 1830 to 1832
DISASM_INSN("prefetch.i", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.r", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.w", prefetch_i, 0, {&store_address});
Contributor

Disassembly support for prefetch.[irw] instructions must be placed before ori instruction because prefetch hint instructions are special encodings of ori instruction.

Contributor

You can test this file (spike --isa=rv64gc_zicbom_zicbop_zicboz -l test-zicbo.bare.elf): test-zicbo.bare.zip

This is built with GNU Binutils + my CBO support patchset and my toy builder (will be open-sourced later).

    .globl  _start
    .section .entry, "ax", %progbits
    .attribute arch, "rv64gc_zicbom_zicbop_zicboz"
_start:
    prefetch.i 0(x0)
    prefetch.r 32(x1)
    prefetch.w 64(x2)
    cbo.clean x1
    cbo.clean x30
    cbo.inval x1
    cbo.inval x30
    cbo.flush x1
    cbo.flush x30
    cbo.zero x1
    cbo.zero x30
    ret

Contributor Author

That's true. I didn't realize this issue before jrtc27's comment in riscv/sail-riscv#137. It's easy to fix the problem in disasm. However, it seems hard to fix the functional code of prefetch.*. Is there any suggestion?

riscv/encoding.h Outdated
Comment on lines 231 to 233
#define ENVCFG_CBIE 0x30
#define ENVCFG_CBCFE 0x40
#define ENVCFG_CBZE 0x80
@a4lg a4lg Dec 19, 2021

My changes including CBIE, CBCFE and CBZE bits are merged into riscv-opcodes.
riscv/riscv-opcodes#94

Because riscv/encoding.h is intended to be auto-generated by riscv-opcodes, managing our own bits (especially with our own names) is not a good idea.

Even if we change riscv/encoding.h manually, we should use the following changes instead (as upstreamed into master):

#define MENVCFG_CBIE  0x00000030
#define MENVCFG_CBCFE 0x00000040
#define MENVCFG_CBZE  0x00000080

Contributor Author

OK. I'll update encoding.h.

@@ -546,6 +552,10 @@ void state_t::reset(processor_t* const proc, reg_t max_isa)
csrmap[CSR_MIMPID] = std::make_shared<const_csr_t>(proc, CSR_MIMPID, 0);
csrmap[CSR_MVENDORID] = std::make_shared<const_csr_t>(proc, CSR_MVENDORID, 0);
csrmap[CSR_MHARTID] = std::make_shared<const_csr_t>(proc, CSR_MHARTID, proc->get_id());
const reg_t envcfg_mask = ENVCFG_CBIE | ENVCFG_CBCFE | ENVCFG_CBZE;
Contributor

Extension masking is required.

envcfg_mask = (proc->extension_enabled(EXT_ZICBOM) ? MENVCFG_CBCFE | MENVCFG_CBIE : 0)
            | (proc->extension_enabled(EXT_ZICBOZ) ? MENVCFG_CBZE : 0);
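As a self-contained illustration of the gating above (a toy stand-in, not spike's actual masked_csr_t), a write-masked CSR keeps only the bits inside its mask, so building the mask from the enabled extensions is enough to hide the CBO controls when an extension is absent:

```cpp
#include <cassert>
#include <cstdint>

// Bit values as upstreamed into riscv-opcodes (quoted earlier in this review).
constexpr uint64_t MENVCFG_CBIE  = 0x00000030;
constexpr uint64_t MENVCFG_CBCFE = 0x00000040;
constexpr uint64_t MENVCFG_CBZE  = 0x00000080;

// Toy stand-in for a masked CSR: writes only affect bits inside `mask`.
struct masked_csr {
  uint64_t mask;
  uint64_t val = 0;
  void write(uint64_t v) { val = (val & ~mask) | (v & mask); }
};
```

With only Zicbom enabled, the mask would be CBIE | CBCFE, so a write of all-ones leaves only those bits set.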

Contributor Author

OK. I'll update this.

Comment on lines 556 to 558
csrmap[CSR_MENVCFG] = menvcfg = std::make_shared<masked_csr_t>(proc, CSR_MENVCFG, envcfg_mask, 0);
csrmap[CSR_SENVCFG] = senvcfg = std::make_shared<masked_csr_t>(proc, CSR_SENVCFG, envcfg_mask, 0);
csrmap[CSR_HENVCFG] = henvcfg = std::make_shared<masked_csr_t>(proc, CSR_HENVCFG, envcfg_mask, 0);
Contributor

Actually, those CSRs are not that simple.
I'm working on a fix as part of other privileged spec 1.12 CSR changes and will support merging this simple implementation for now (I'm just letting you know there is a to-do list entry here).

Contributor Author

OK.

@@ -0,0 +1,2 @@
require_extension(EXT_ZICBOP);
@a4lg a4lg Dec 19, 2021

Delete this line since this instruction is a hint and must not generate illegal instruction trap.

Contributor

You can optionally stop prefetching if EXT_ZICBOP is not enabled. Even so, we must not generate any trap here (even prefetch.i 0(zero) must be safe).

Contributor Author

OK. I'll fix this.

@@ -0,0 +1,2 @@
require_extension(EXT_ZICBOP);
@a4lg a4lg Dec 19, 2021

Delete this line since this instruction is a hint and must not generate illegal instruction trap.

@@ -0,0 +1,2 @@
require_extension(EXT_ZICBOP);
@a4lg a4lg Dec 19, 2021

Delete this line since this instruction is a hint and must not generate illegal instruction trap.

disasm/disasm.cc Outdated
DISASM_INSN("cbo.inval", cbo_inval, 0, {&xrs1});
DISASM_INSN("cbo.zero", cbo_zero, 0, {&xrs1});
DISASM_INSN("prefetch.i", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.r", prefetch_i, 0, {&store_address});
@a4lg a4lg Dec 19, 2021

DISASM_INSN("prefetch.r", prefetch_r, 0, {&store_address});

Contributor Author

Sorry, I forgot to update the second argument after copying.

disasm/disasm.cc Outdated
DISASM_INSN("cbo.zero", cbo_zero, 0, {&xrs1});
DISASM_INSN("prefetch.i", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.r", prefetch_i, 0, {&store_address});
DISASM_INSN("prefetch.w", prefetch_i, 0, {&store_address});
@a4lg a4lg Dec 19, 2021

DISASM_INSN("prefetch.w", prefetch_w, 0, {&store_address});

@liweiwei90
Contributor Author

This PR is definitely better than my test implementation (not PR-ed). Still, my work showed me that a few changes are required.

Thanks for your comments.

@a4lg
Contributor

a4lg commented Dec 20, 2021

(Comment by @a4lg)
Disassembly support for prefetch.[irw] instructions must be placed before ori instruction because prefetch hint instructions are special encodings of ori instruction.

(Comment by @liweiwei90)
That's true. I didn't realize this question before jrtc27 comment in riscv/sail-riscv#137. It's easy to fix the problem in disasm. However, It seems hard to fix the function code of prefetch.* . Is there any suggestion?

I looked into the current implementation and... I concluded that ori and the Zicbop instructions will not be decoded incorrectly.

  1. The hash table is not relevant here (because the hash table key is the exact instruction bits); what matters is the linear search over the instruction list.
  2. The instruction list is initially ordered in descending order of (match, mask).
  3. The four related instructions are ordered as follows (which is okay):
    • prefetch.w (match == 0x306013)
    • prefetch.r (match == 0x106013)
    • prefetch.i (match == 0x006013, mask == 0x01f07fff)
    • ori (match == 0x006013, mask == 0x707f)
  4. processor_t::decode_insn moves a matched instruction to the front to reduce the decode-miss penalty, but that behavior is suppressed if another instruction with the same match value exists, and...
  5. prefetch.i and ori share the same match value (0x006013). That means they will not be moved to the front (on the other hand, prefetch.[rw] will be).
  6. prefetch.[irw] are always tested before ori. (Q.E.D.)

There will be a small performance degradation, but nothing more.

Yes, it's not very intuitive behavior, but it's not very urgent to fix either.
What do you think, @aswaterman?
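The decode-order argument above can be checked with a small self-contained sketch. The (match, mask) values are taken from the comment itself, the list order mirrors the descending (match, mask) ordering described there, and classify models the linear scan (the names and helpers are illustrative, not spike's API):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// (match, mask) pairs quoted in the comment above, kept in the
// descending (match, mask) order the instruction list uses.
struct insn_desc { const char* name; uint32_t match, mask; };

static const std::vector<insn_desc> insn_list = {
  {"prefetch.w", 0x00306013, 0x01f07fff},
  {"prefetch.r", 0x00106013, 0x01f07fff},
  {"prefetch.i", 0x00006013, 0x01f07fff},
  {"ori",        0x00006013, 0x0000707f},
};

// Models the linear scan: the first entry with (bits & mask) == match wins,
// so prefetch.[irw] are always tested before ori.
std::string classify(uint32_t bits) {
  for (const auto& d : insn_list)
    if ((bits & d.mask) == d.match)
      return d.name;
  return "unknown";
}
```

Note that ori with rd=x0 and imm[4:0]=0 is exactly the prefetch encoding space, which is why the list order matters; a true ori (nonzero rd) still falls through to the ori entry.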

@a4lg a4lg left a comment

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

@@ -241,6 +242,7 @@ int main(int argc, char** argv)
uint16_t rbb_port = 0;
bool use_rbb = false;
unsigned dmi_rti = 0;
reg_t blocksz = 16;
Contributor

I agree with riscv/sail-riscv#137 (comment); the default block size would be better as 64.

Contributor Author

OK, I'll fix this. There are still some unsolved problems in the sail-riscv review, which may also affect the spike implementation.

@@ -376,6 +378,7 @@ int main(int argc, char** argv)
exit(-1);
}
});
parser.option(0, "blocksz", 1, [&](const char* s){blocksz = strtoull(s, 0, 0);});
Contributor

How about checking whether the block size is a power of 2 somewhere? See processor_t::check_pow2 for argument checking.
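The suggested sanity check is a one-liner; here is a sketch (spike's own processor_t::check_pow2 additionally reports an error and exits, which is omitted here):

```cpp
#include <cassert>
#include <cstdint>

// A value is a power of two iff it is nonzero and clearing its
// lowest set bit leaves zero.
bool is_pow2(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```

The parser callback could reject the --blocksz argument when this check fails.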

Contributor Author

I have sent an email to David to discuss the unsolved problems, including the reasonable range for the cache block size. There has been no reply so far. Anyway, I agree it should be a power of 2. I'll update it later.

@@ -1260,6 +1260,15 @@ riscv_insn_svinval = \
hinval_vvma \
hinval_gvma \

riscv_insn_cmo = \
@a4lg a4lg Dec 20, 2021

This is not sail-riscv, but the actual extension name would be better here, too. How about zicbo (for Zicbo[mpz]) instead of cmo?

Contributor Author

The names in riscv_insn_list all use either the collective name of a set of extensions (k, b, v, ...) or a specific extension name (svinval). I think cmo is more like the collective name for Zicbo[mpz] than zicbo is.

Contributor

Got it. If the maintainer agrees, that's okay.

@liweiwei90
Contributor Author

(Comment by @a4lg)
I looked into the current implementation and... I concluded that ori and Zicbop instructions will not be decoded incorrectly. [...] prefetch.[irw] are always tested before ori. (Q.E.D.)

Thanks for your explanation. I missed the case where an instruction will not be moved to the front if it has the same match value as a previous or later instruction.

@liweiwei90
Contributor Author

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

Sorry, I don't understand the problem with moving the cbo.* instructions earlier.

@a4lg
Contributor

a4lg commented Dec 21, 2021

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

Sorry, I don't understand the problem with moving the cbo.* instructions earlier.

I just felt that it wasn't necessary. The prefetch.* instructions need to be placed before ori, but regular extension instructions are placed after the base instructions, organized by extension (so I thought the cbo.* instructions should follow the regular coding style).
But that's just about cleanliness, not functionality.

@liweiwei90
Contributor Author

I didn't at first notice that the disasm part was resolved (because the cbo.* instructions were moved too). Anyway, here are my suggestions.

Sorry, I don't understand the problem with moving the cbo.* instructions earlier.

I just felt that it wasn't necessary. The prefetch.* instructions need to be placed before ori, but regular extension instructions are placed after the base instructions, organized by extension (so I thought the cbo.* instructions should follow the regular coding style). But that's just about cleanliness, not functionality.

OK. I just wanted to keep the CMO-related instructions together.

@aswaterman
Collaborator

The prefetch instructions should be deleted from this PR. Executing them as no-ops is a fully legitimate implementation of the extension. The proposed approach adds complexity but doesn't provide meaningful benefit.

@@ -39,6 +40,11 @@ class memtracer_list_t : public memtracer_t
for (std::vector<memtracer_t*>::iterator it = list.begin(); it != list.end(); ++it)
(*it)->trace(addr, bytes, type);
}
void clean_invalidate(uint64_t addr, size_t bytes, bool clean, bool inval)
{
for (std::vector<memtracer_t*>::iterator it = list.begin(); it != list.end(); ++it)
Contributor

Use modern C++ loop:

for (auto it: list)
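In a self-contained form (a stub stand-in, not spike's memtracer_t API), the suggested range-based loop replaces the verbose iterator loop like so:

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for a tracer in the list: records how many times
// it was notified (illustrative only).
struct tracer_stub {
  int calls = 0;
  void clean_invalidate() { ++calls; }
};

// Range-based for over the pointer list, as suggested above.
void notify_all(std::vector<tracer_stub*>& list) {
  for (auto it : list)
    it->clean_invalidate();
}
```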

Contributor Author

OK.

Contributor Author

The prefetch instructions should be deleted from this PR. Executing them as no-ops is a fully legitimate implementation of the extension. The proposed approach adds complexity but doesn't provide meaningful benefit.

OK. The prefetch instructions have been deleted.

riscv/mmu.cc Outdated
void mmu_t::cbo_zero(reg_t addr) {
addr &= ~(blocksz - 1);
reg_t vpn = addr >> PGSHIFT;
if (likely(tlb_store_tag[vpn % TLB_ENTRIES] == vpn)) {
Contributor

There is a lot of logic here (and in clean_inval() too) to perform what is basically a simple store, no? There ought to be a better way than to have to deal with all these possibilities (TLB hit/miss, tracing, triggers) over and over.

Contributor Author

Yes, this is similar to a simple store; however, the store length is longer, so the effect of TLB hits/misses, tracing, and triggers is a little different from a simple store.

Contributor Author

Sorry, I can't follow your idea of a better way. The current macros for load/store cannot accommodate a block-sized operation.

Contributor

@aswaterman any ideas? cbo_zero() and clean_inval() ought to be much simpler for what they do.

Collaborator

@scottj97 I agree.

cbo.zero was defined in such a way that breaking it into a sequence of smaller stores is legal. IMO, we should take advantage of that property here. We could delete this routine and replace the instruction implementation with something like

auto base = RS1 & -blocksz;
for (size_t offset = 0; offset < blocksz; offset += 4)
  MMU.store_uint32(base + offset, 0);


Better yet, increment "offset" by 1 and simply store bytes, since that's the architectural definition (and it avoids assumptions about the atomicity of the individual operations).

Collaborator

Yeah, that would get the point across better. They should be equivalent as far as Spike is concerned, though, since the difference in atomicity isn't observable: Spike is single-threaded, and 4 bytes is the smallest protection granule.
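The byte-wise variant discussed here can be sketched against a plain memory buffer (a toy model; `mem` stands in for guest memory and the name cbo_zero_model is illustrative, not spike's MMU interface):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of byte-wise cbo.zero: align the effective address down to the
// cache block, then store zero bytes one at a time. blocksz is assumed to
// be a power of two.
void cbo_zero_model(std::vector<uint8_t>& mem, uint64_t addr, uint64_t blocksz) {
  uint64_t base = addr & ~(blocksz - 1);  // same alignment as RS1 & -blocksz
  for (uint64_t offset = 0; offset < blocksz; ++offset)
    mem[base + offset] = 0;               // one byte store per iteration
}
```

An address anywhere inside the block zeroes the whole block and nothing outside it.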

Contributor Author

OK. I still have one question: what about the tracer's cache_misses and write_accesses counters? Do we need to take them into account?

Collaborator

I think this approach is OK with respect to those counters. They're part of the Spike implementation, rather than governed by the ISA, and so we can just say that the current behavior is correct by definition.

Contributor Author

OK.

@scottj97
Contributor

Please squash the fixes back into the original commits. I.e. everything starting with d819da1

@liweiwei90
Contributor Author

Please squash the fixes back into the original commits. I.e. everything starting with d819da1

OK.

@liweiwei90 liweiwei90 force-pushed the plct-cmo-upstream branch 2 times, most recently from 56c55e5 to 0bceee0 on December 30, 2021 at 08:35
@scottj97 scottj97 left a comment

This is much improved, but clean_inval() still smells funny.

riscv/mmu.cc Outdated
Comment on lines 164 to 184
void mmu_t::clean_inval(reg_t addr, bool clean, bool inval)
{
  reg_t paddr = addr;
  bool page = false, guest = false, access = false;
  bool has_gva;
  reg_t gpa;
  access_type type = LOAD;
  bool fault = true;
  try {
    paddr = translate(addr, 1, LOAD, 0);
    type = LOAD;
    fault = false;
  } catch (trap_load_access_fault& t) {
    access = true;
    has_gva = t.has_gva();
  } catch (trap_load_page_fault& t) {
    page = true;
    has_gva = t.has_gva();
  } catch (trap_load_guest_page_fault& t) {
    guest = true;
    gpa = t.get_tval2();
  }

  if (fault) {
    try {
      paddr = translate(addr, 1, STORE, 0);
      type = STORE;
      fault = false;
    } catch (trap_store_access_fault& t) {
      access = true;
      has_gva = t.has_gva();
    } catch (trap_store_page_fault& t) {
      page = true;
      has_gva = t.has_gva();
    } catch (trap_store_guest_page_fault& t) {
      guest = true;
      gpa = t.get_tval2();
    }
  }

  if (fault) {
    try {
      paddr = translate(addr, 1, FETCH, 0);
      type = FETCH;
    } catch (trap_instruction_access_fault& t) {
      access = true;
      has_gva = t.has_gva();
    } catch (trap_instruction_page_fault& t) {
      page = true;
      has_gva = t.has_gva();
    } catch (trap_instruction_guest_page_fault& t) {
      guest = true;
      gpa = t.get_tval2();
    }
  }

  if (access) {
    trap_store_access_fault(has_gva, addr, 0, 0);
  } else if (guest) {
    trap_store_guest_page_fault(addr, gpa, 0);
  } else if (page) {
    trap_store_page_fault(has_gva, addr, 0, 0);
  } else {
    if (auto host_addr = sim->addr_to_mem(paddr)) {
      if (tracer.interested_in_range(paddr, paddr + PGSIZE, type))
        tracer.clean_invalidate(paddr, blocksz, clean, inval);
      else
        refill_tlb(addr, paddr, host_addr, type);
    } else {
      throw trap_store_access_fault((proc) ? proc->state.v : false, addr, 0, 0);
    }

    if (!matched_trigger) {
      matched_trigger = trigger_exception(type == LOAD ? OPERATION_LOAD :
          type == STORE ? OPERATION_STORE : OPERATION_EXECUTE,
          addr, 0);
      if (matched_trigger)
        throw *matched_trigger;
    }
  }
}
Contributor

This seems excessive, especially all the special handling of exceptions.

Contributor Author

In the spec, "A cache-block management instruction is permitted to access the specified cache block whenever a load instruction, store instruction, or instruction fetch is permitted to access the corresponding physical addresses." That means it may perform a LOAD, STORE, or FETCH check. However, the current translate process can only handle one access type at a time, so the implementation above divides the check into three steps, one per access type:

  • No matter which access type passes the translation process, the cbo instruction will execute successfully.
  • If all of them trigger exceptions, it combines the final exception result according to which exception was triggered last (briefly: access fault > guest page fault > page fault).

Collaborator

Since stores don't succeed where loads fail, it should be possible to check only for load or fetch permissions.

Furthermore, the interaction between CBOs and triggers is implementation-defined, so we're free to skip that part.

We're also under no obligation to refill the TLB; for these instructions, simple code is preferable.

And finally, we don't need to be quite so precise about what type gets passed to tracer.interested_in_range. Between all of those, I think this can be simplified substantially. Something like this:

void mmu_t::clean_inval(reg_t addr, bool clean, bool inval)
{ 
  reg_t paddr;
  
  try {
    paddr = translate(addr, 1, LOAD, 0);
  } catch (...) {
    // Try again with fetch permissions 
    try {
      paddr = translate(addr, 1, FETCH, 0);
    } catch (trap_instruction_access_fault& t) {
      trap_store_access_fault(t.has_gva(), addr, 0, 0);
    } catch (trap_instruction_page_fault& t) {
      trap_store_page_fault(t.has_gva(), addr, 0, 0);
    } catch (trap_instruction_guest_page_fault& t) {
      trap_store_guest_page_fault(addr, t.get_tval2(), 0);
    } catch (...) {
      abort();
    }
  }

  if (auto host_addr = sim->addr_to_mem(paddr)) { 
    if (tracer.interested_in_range(paddr, paddr + PGSIZE, LOAD))
      tracer.clean_invalidate(paddr, blocksz, clean, inval);
  } else {
    throw trap_store_access_fault((proc) ? proc->state.v : false, addr, 0, 0);
  }
}

Contributor

Would it make any sense to modify translate() so that we can pass a list of access types? So we could do:

paddr = translate(addr, 1, LOAD | FETCH | STORE, 0);

Collaborator

I'd need to see it both ways to judge. I'm worried that doing so would make the already complex page-table-walking code messier. This approach is certainly ugly, but at least it is contained.

Contributor

In the spec, "A cache-block management instruction is permitted to access the specified cache block whenever a load instruction, store instruction, or instruction fetch is permitted to access the corresponding physical addresses." That means it may perform a LOAD, STORE, or FETCH check. However, the current translate process can only handle one access type at a time, so the implementation above divides the check into three steps, one per access type:

  • No matter which access type passes the translation process, the cbo instruction will execute successfully.
  • If all of them trigger exceptions, it combines the final exception result according to which exception was triggered last (briefly: access fault > guest page fault > page fault).

How does this work with MPRV=1 (which does not affect fetches)? Perhaps the spec is too vague here. As of 83fdb70, clean_inval() tries a load access type and, if that detects a fault, tries a fetch access type. Those could be two completely different paddrs if MPRV=1. That doesn't seem desirable. The spec refers only to "the corresponding physical address", i.e. only a single possible physical address.

@dkruckemyer-ventana dkruckemyer-ventana Jan 5, 2022

You had me stumped there for a moment, but I think everything is OK in the spec....

The translation is based on the effective address of the CBO, which, like other explicit load and store accesses, is affected by MPRV. Once the translation is performed, the permissions are checked against the RWX bits in the various mechanisms. So the translation and permission check are not performed as if the access were an instruction fetch.

But this likely highlights an issue with performing a FETCH translation by itself.... (I was wrong in thinking that was OK.)

@dkruckemyer-ventana dkruckemyer-ventana Jan 5, 2022

Perhaps a way to approach this is to perform two LOAD accesses, one with MXR=0 and one with MXR=1? (Though I haven't quite thought through whether that explodes with virtualization....)

Edit: or just force MXR=1?

Collaborator

MXR doesn't affect PMP, so that wouldn't suffice to allow CBOs to regions that have an R=0 X=1 PMP configuration.

Contributor Author

I have modified the translation process to support the check for cbo ops, if virtualization is not a problem.

@liweiwei90 liweiwei90 force-pushed the plct-cmo-upstream branch 3 times, most recently from 8c6acf7 to e9ffda5 on January 5, 2022 at 02:49
riscv/mmu.cc Outdated
}
}

if (access) {
@scottj97 scottj97 Jan 5, 2022

Two problems I see in this logic:

  1. If LOAD detects an access fault, it tries again with FETCH. But access will still be true, so this will take a store access fault, when it should successfully perform the CBO if the fetch succeeds.
  2. This does not seem to check for STORE permissions at all? If the page is read-only, the operation should not be allowed (right?). Edit: apparently cache ops are allowed on read-only pages, so never mind.

Contributor Author

Yeah, this is a logic error. I should set all of them to false if FETCH succeeds.

riscv/mmu.cc Outdated
@@ -258,7 +275,7 @@ reg_t mmu_t::pmp_homogeneous(reg_t addr, reg_t len)
return true;
}

reg_t mmu_t::s2xlate(reg_t gva, reg_t gpa, access_type type, access_type trap_type, bool virt, bool hlvx)
reg_t mmu_t::s2xlate(reg_t gva, reg_t gpa, int& type, int trap_type, bool virt, bool hlvx)
Contributor

Why use a reference for type?

Contributor

Nevermind, I see why. This smells bad but I can't think of a better way. :(

Collaborator

This is going to lead to a lot of confusion. Return an aggregate (e.g. struct { reg_t addr; int type; };) instead?
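The aggregate-return suggestion can be sketched in a self-contained form. The struct is the one named in the comment; the pass-through translation function and its name are purely illustrative, not spike's s2xlate:

```cpp
#include <cassert>
#include <cstdint>

using reg_t = uint64_t;

// The aggregate suggested above: bundle the translated address with the
// access type that actually succeeded, instead of an in/out reference.
struct xlate_result {
  reg_t addr;
  int type;
};

// Hypothetical identity translation, used only to show the call shape:
// the caller reads both fields from the returned aggregate.
xlate_result s2xlate_sketch(reg_t gpa, int requested_type) {
  return { gpa, requested_type };
}
```

Callers then inspect result.type instead of relying on a reference parameter being mutated, which keeps the data flow visible at the call site.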

Contributor Author

OK. I'll try this way.

@aswaterman
Collaborator

@dkruckemyer-ventana tells me that the perms checking for CBOs might be relaxed so that merely checking for load permissions is a correct implementation. I'll let him confirm or deny. That would greatly simplify things: we could leave the translation functions as-is, and simply check for load perms, catch load exceptions, and rethrow as store exceptions.

@dkruckemyer-ventana

I think that change is highly likely. Let's see what things look like with Andrew's suggestions.

@liweiwei90

I think that change is highly likely. Let's see what things look like with Andrew's suggestions.

Sorry, do you mean the spec will change in this way?
If so, the code can be simplified as follows, without changing the translate function:

void mmu_t::clean_inval(reg_t addr, bool clean, bool inval)
{
  reg_t paddr;

  try {
    paddr = translate(addr, 1, LOAD, 0);
  } catch (trap_load_access_fault& t) {
    throw trap_store_access_fault(t.has_gva(), addr, 0, 0);
  } catch (trap_load_page_fault& t) {
    throw trap_store_page_fault(t.has_gva(), addr, 0, 0);
  } catch (trap_load_guest_page_fault& t) {
    throw trap_store_guest_page_fault(addr, t.get_tval2(), 0);
  } catch (...) {
    abort();
  }

  if (auto host_addr = sim->addr_to_mem(paddr)) {
    if (tracer.interested_in_range(paddr, paddr + PGSIZE, LOAD))
      tracer.clean_invalidate(paddr, blocksz, clean, inval);
  } else {
    throw trap_store_access_fault((proc) ? proc->state.v : false, addr, 0, 0);
  }
}

@dkruckemyer-ventana

Yes, barring any last minute objections, load (read) permission is the only requirement for management CBOs. There's a PR with the updated text:

riscv/riscv-CMOs#46

@liweiwei90

OK. I have reverted the modification to the translate function and updated the clean_inval function.
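As an aside, the cache-block operations act on the whole block containing the effective address, so the translated address is aligned down to a block boundary before it is used. A minimal sketch of that alignment, assuming blocksz is a power of two:

```cpp
#include <cstdint>

// Align a physical address down to the start of its cache block.
// Works for any power-of-two block size: blocksz - 1 is an all-ones
// mask over the offset bits, and ~(blocksz - 1) clears them.
uint64_t block_align(uint64_t paddr, uint64_t blocksz) {
  return paddr & ~(blocksz - 1);
}
```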

@liweiwei90 force-pushed the plct-cmo-upstream branch 2 times, most recently from de8b0dc to 0923f4e on January 22, 2022 13:16
if (likely(hit_way != NULL))
{
  if (*hit_way & DIRTY)
    writebacks++;


I'm not sure what "writebacks" counts, but an actual writeback is only required if a clean is required. For inval only, a proper writeback of data would not need to occur.


OK. I'll try to fix this.
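A minimal sketch of the fix being discussed: count a writeback only when a dirty line is actually cleaned, since an invalidate-only operation drops the dirty data without writing it back. The flag values and names are illustrative, not Spike's actual cache model.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-way state flags.
constexpr uint64_t VALID = 1ULL << 63;
constexpr uint64_t DIRTY = 1ULL << 62;

// Apply a clean and/or invalidate to one hit way; return the number
// of writebacks this operation performed (0 or 1).
size_t clean_invalidate_way(uint64_t* hit_way, bool clean, bool inval) {
  size_t writebacks = 0;
  if (hit_way != nullptr) {
    if ((*hit_way & DIRTY) && clean)
      writebacks++;        // data is written back only on a clean
    if (inval)
      *hit_way = 0;        // invalidate: drop the line entirely
    else
      *hit_way &= ~DIRTY;  // clean only: line stays valid, now clean
  }
  return writebacks;
}
```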

liweiwei added 3 commits January 30, 2022 11:33
prefetch.* instructions are hints that share the encoding of ORI with rd = 0, so they can share the implementation of ORI and execute as no-ops
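As a sketch of why the hints can fall through to ORI, the decoder below distinguishes the variants using the Zicbop encoding this PR targets (OP-IMM opcode, ORI funct3, rd = x0, with instr[24:20] selecting the prefetch type). It is illustrative only, not Spike's decoder.

```cpp
#include <cstdint>
#include <string>

// Classify a 32-bit instruction word: the prefetch.{i,r,w} hints are
// ORI encodings with rd = x0 and instr[24:20] selecting the variant.
std::string decode_prefetch(uint32_t instr) {
  uint32_t opcode = instr & 0x7f;          // bits [6:0]
  uint32_t rd     = (instr >> 7) & 0x1f;   // bits [11:7]
  uint32_t funct3 = (instr >> 12) & 0x7;   // bits [14:12]
  uint32_t sub    = (instr >> 20) & 0x1f;  // bits [24:20]
  if (opcode != 0x13 || funct3 != 0x6)
    return "not-ori";                      // not an OP-IMM ORI at all
  if (rd != 0)
    return "ori";                          // ordinary ORI, not a hint
  switch (sub) {
    case 0x0: return "prefetch.i";
    case 0x1: return "prefetch.r";
    case 0x3: return "prefetch.w";
    default:  return "ori";                // rd = x0 ORI, still a no-op
  }
}
```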
@scottj97

Thank you for your patience with all my nitpicking.

@aswaterman aswaterman merged commit b68b758 into riscv-software-src:master Feb 16, 2022
@aswaterman

Thanks for both your effort and your patience, @liweiwei90!

@@ -72,6 +72,7 @@ static void help(int exit_code = 1)
fprintf(stderr, " --dm-no-abstract-csr Debug module won't support abstract to authenticate\n");
fprintf(stderr, " --dm-no-halt-groups Debug module won't support halt groups\n");
fprintf(stderr, " --dm-no-impebreak Debug module won't support implicit ebreak in program buffer\n");
fprintf(stderr, " --blocksz=<size> Cache block size (B) for CMO operations(powers of 2) [default 16]\n");

default 16 or 64?


I would suggest a default of 64B - based on that being the most common block size in the industry today (in x86 and ARMv8 architectures).


Agreed, @ingallsj please make a PR to fix.
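Separately, the help text above says --blocksz must be a power of two; a minimal sketch of that validation (the function name is hypothetical, not Spike's option parsing):

```cpp
#include <cstdint>

// A nonzero value is a power of two exactly when clearing its lowest
// set bit (x & (x - 1)) leaves zero.
bool is_valid_blocksz(uint64_t blocksz) {
  return blocksz != 0 && (blocksz & (blocksz - 1)) == 0;
}
```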
