Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for cbo.zero in cmo extension #226

Merged
merged 1 commit into from
Nov 15, 2023

Conversation

liweiwei90
Copy link

riscv_ctg (riscv-software-src/riscv-ctg#22), Sail model (riscv/sail-riscv#137) and Spike (riscv-software-src/riscv-isa-sim#891) for cmo extension is in review.

@pawks
Copy link
Collaborator

pawks commented Dec 18, 2021

Will need a CHANGELOG update before this can be merged.

@neelgala
Copy link
Collaborator

I would wait for the sail PR to be merged before we merge this to ensure that there is atleast one model from where the references can be replicated.

@liweiwei90
Copy link
Author

Will need a CHANGELOG update before this can be merged.
OK, Maybe this will wait the ctg and sail PR to be merged.

@simon5656
Copy link
Collaborator

I thought tests had to update the signatures? - I do not see that for these new tests... there is no visible behaviour of the test? I guess it is ok for now as somewhere in the quality methodology of accepting new tests somebody will automatically check that the tests propagate behaviours to the output (the signature) - so that errors in DUT can be found with signature compares...

@neelgala
Copy link
Collaborator

the test's default signature is 0xdeadbeef after sucessful run of the test, the signature becomes all zero which I believe is the intended operation of the cbo instruction. The only assumption here by the test is that the signature region is in the cacheable region - which I think is a fair assumption for now since that information needs to be captured from the DUT via riscv-config or someother discovery mechanism.

@simon5656 does that make sense ?

@pdonahue-ventana
Copy link
Contributor

The only assumption here by the test is that the signature region is in the cacheable region

I didn't look at the test in detail but how is the cacheability visible to the test?

Note that https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc says:

Cache-block zero instructions store zeros independently of whether data from the underlying memory locations are cacheable.

@simon5656
Copy link
Collaborator

When I look at the test macros - I thought policy (style?) was that signature areas were written by RVTEST_SIGUPD macro - so each of the test macros only used that to touch the signature region...

@liweiwei90
Copy link
Author

Will need a CHANGELOG update before this can be merged.
OK, Maybe this will wait the ctg and sail PR to be merged.

The only assumption here by the test is that the signature region is in the cacheable region

I didn't look at the test in detail but how is the cacheability visible to the test?

Note that https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc says:

Cache-block zero instructions store zeros independently of whether data from the underlying memory locations are cacheable.

It doesn't take the cacheability into account currently. It just test the function for storing zeros to the address space of cache block.

@liweiwei90
Copy link
Author

When I look at the test macros - I thought policy (style?) was that signature areas were written by RVTEST_SIGUPD macro - so each of the test macros only used that to touch the signature region...

I think the situation for store-related instructions is different. It doesn't need to have an additional RVTEST_SIGUPD progress to update the signature region.

@pdonahue-ventana
Copy link
Contributor

It doesn't take the cacheability into account currently. It just test the function for storing zeros to the address space of cache block.

It doesn't need to take cacheability into account so I don't see any problem. All it does is store zeros to a block of memory. It really should just be named bo.zero since caches have nothing to do with it (at least architecturally).

@allenjbaum
Copy link
Collaborator

allenjbaum commented Dec 20, 2021 via email

@simon5656
Copy link
Collaborator

Where is cache block size defined as the cbo.zero instruction clears a cache block - where the size is implementation defined?

and so maybe it would be better to put some data in a region, use the cbo.zero with a gpr to zero it and then check that the whole memory is zeroed and if it is - then add something to the signature using RVTEST_SIGUPD (ie this operand and memory region worked) and then move to the next operand and region and then update the signature and then repeat.

The current tests seem to zero the signature and then overwrite it with zeros many times... so not sure what the intent is... if the device has large cache block subsequent tests just keep zeroing the same region...

and I feel RVTEST_SIGUPD should always be used (and the signature regions not stored in directly) - as it is then easy to add logging in to the macro while testing the test - and so can see what the test thinks it is actually doing... - but that is just our approach.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Dec 21, 2021 via email

@liweiwei90
Copy link
Author

The CTG paradigm does use SIGUPD, though using the option to explicitly set the offset, and ensure the hidden offset tracks it. In this case it might make sense to store directly into the signature region using the CBO. IT will be a pretty big signature region if every offset less than the cacheblock size is tested, but for our purposes, I don't think we need to do that. Offset of cacheBlkSz+/- 0,1<<N for N=0..log2(CacheBlkSz) should be good enough. The test can be easily parameterized to do avoid offsets that are out of range (or not; it will still work, just be redundant) A CacheBlkSz variable should be set in implementation YAML, and used by the test. We need to add that variable (and many, many more) as parameters in the YAML

On Mon, Dec 20, 2021 at 6:56 AM Simon Davidmann @.> wrote: Where is cache block size defined as the cbo.zero instruction clears a cache block - where the size is implementation defined? and so maybe it would be better to put some data in a region, use the cbo.zero with a gpr to zero it and then check that the whole memory is zeroed and if it is - then add something to the signature using RVTEST_SIGUPD (ie this operand and memory region worked) and then move to the next operand and region and then update the signature and then repeat. The current tests seem to zero the signature and then overwrite it with zeros many times... so not sure what the intent is... if the device has large cache block subsequent tests just keep zeroing the same region... and I feel RVTEST_SIGUPD should always be used (and the signature regions not stored in directly) - as it is then easy to add logging in to the macro while testing the test - and so can see what the test thinks it is actually doing... - but that is just our approach. — Reply to this email directly, view it on GitHub <#226 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHPXVJQLYFNSAGLINVVZIZ3UR47ZXANCNFSM5KKHEOYA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you commented.Message ID: @.>

Maybe we can use a big enough(max(CacheBlkSz)) rvtest_data range to test the cbo.zero. and do the check and RVTEST_SIGUPD as simon5656 suggests. In the test, we can use CacheBlkSz as a macro parameter just like XLEN (we can use the CBO MACRO to ensure that it will not affect the final signature data). Offset of 0,1<<N for N=0..log2(max(CacheBlkSz)) and some random offset will be tested. However, there is a problem: is there any upper limit for CacheBlkSz?

@neelgala
Copy link
Collaborator

do we need to capture the (max-) cache-blk-sz parameter in the riscv-config yaml somewhere ? Then it may be easy for RISCOF to set those parameters at compile time. The test can then decrement down cache block sizes starting from max-cache-block all the way to zero in reasonable decrementing steps.

For the current framework the max-cache-blk-sz can be derived from the target's Makefile.include as a new parameter (similar to how XLEN is set)

@pawks
Copy link
Collaborator

pawks commented Dec 21, 2021

I don't think performing the cbo.zero at some location and then loading the value from there to store it back into the signature region is really an efficient testing strategy. I think a parameter is needed in the yaml to configure the model, but a dynamic configuration of the test is not necessary. The value in rs1 is treated as the effective address which identifies a cache block. Hence the test should ideally test if non-zero bits in the LSB do identify the cache block correctly i.e the block index is the value over which coverpoints should be defined. To do so, we can assume a reasonably high cache block size(say 64 bytes) and write tests accordingly. The test will work without problems for any block size under 64 bytes.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Dec 21, 2021 via email

@pawks
Copy link
Collaborator

pawks commented Dec 22, 2021

I can't tell if we are agreeing or not. The cache block size is entirely defined by the implementation (though it is a power of 2). As such, it should be feined by a YAML entry. We don't have to dynamically update the test, but we do need to "dynamically" configure SAIL to store the correct amount of data, so configuring the test as well shouldn't be difficult. It may not be strictly necessary, but does mean we don't have to worry about setting the wrong max size value. If you don't dynamically configure, then the max should be at least 128 - there are architectures out there that have 128B cachelines, because it was optimal for HPC workloads (if I remember correctly). I have no idea how that changes with AI workloads. One of the things that tests will specifically test is ensuring that the LSBs of rs1 are ignore. The number of ignored bits changes with the cache block size. Testing bits that go beyond the cache block boundary but less than the statically configured max is OK. Zeroing an area of memory, then reading it and writing it into the signature is much less efficient. There was originally a reason for doing that (storing directly into the signature confused the SIGUPD macro) but you probably wouldn't use it for this anyway.

I agree with this here. The tests in this PR do not test is whether the LSB of rs1 is ignored. The LSB of rs1 is always 0. The adjustment in the macro should probably become LI (rs1, offset+rs1_val); add rs1,rs1,swreg;.

I think the coverpoints will be different in case of different cache line sizes and hence CTG should be run again to generate the test for that particular cacheline size (instead of the test being configurable). Incase the coverpoints need not change, this variability can be captured by changing the size of the signature entries.

@liweiwei90
Copy link
Author

In current PR, the generated rs1_val is the LSB of rs1.
I agree it's pretty good to generate the test for particular cacheline size. However, I wonder how do we support this in current riscv-arch-test framework? Do we extend the framework to generate code and related signature file every time we run cmo test with particular cache line size?

@allenjbaum
Copy link
Collaborator

allenjbaum commented Dec 23, 2021 via email

@liweiwei90
Copy link
Author

That's true. Current implementation was generated by the assumption that cache block size is two times of XLEN.

@allenjbaum
Copy link
Collaborator

I'n not quite sure that the TEST_CBO_ZERO does what you want.
You set up a pointer reg, then every time you use it, you pre-increment by (BLOCKSZ*8/XLEN) + (offset & BLOCKSZ-1)

  1. I'm not sure you should be pre-incrementing rather than post incrementing
  2. the increment should NOT be dependent on XLEN - cache blocks are fixed sized and independent of XLEN
  3. if you intend offset to be offset_within_cacheblock, then you're instead making it the sum of all previous offsets,
    The test generator need to keep track of that, or you can instead add (offset-prev_offset) & (BLOCKSZ-1) ,
    then .set prev_offset , offset
  4. BLOCKSZ defaults to 16. IT should simply default to 64 (if not otherwise specified)
    The way it is defined it is either 26 or 32 bytes, both of which are very rare in real life.

@liweiwei90
Copy link
Author

I'n not quite sure that the TEST_CBO_ZERO does what you want. You set up a pointer reg, then every time you use it, you pre-increment by (BLOCKSZ*8/XLEN) + (offset & BLOCKSZ-1)

1. I'm not sure you should be pre-incrementing rather than post incrementing

The first offset is zero, so the address located in 0 ~ BLOCKSZ - 1. I'll change to post incrementing swreg just like RVTEST_SIGUPD later.

2. the increment should NOT be dependent on XLEN - cache blocks are fixed sized and independent of XLEN

The offset generated by riscv_ctg seems XLEN related (be XLEN/8). So I need divide it with XLEN here when I use it to calculate start address for that cache block.

3. if you intend offset to be offset_within_cacheblock, then you're instead making it the sum of all previous offsets,
   The test generator need to keep track of that, or you can instead add (offset-prev_offset) & (BLOCKSZ-1) ,
   then .set prev_offset , offset

Currently, imm_val is the offset_within_cacheblock, and offset is just the index. I'll try to use the way to increace swreg by BLOCKDZ later. And offset is not needed by this way.

4. BLOCKSZ defaults to 16. IT should simply default to 64 (if not otherwise specified)
   The way it is defined it is either 26 or 32 bytes, both of which are very rare in real life.

OK. I'll change this to 64.

@liweiwei90 liweiwei90 force-pushed the plct-cmo-upstream branch 2 times, most recently from 5c2db76 to 5ba40f3 Compare March 30, 2022 09:09
@allenjbaum
Copy link
Collaborator

Regarding BLOCKSZ: this depends on how the signature is written. The SIGUPD macro is defined to work with store instructions, either sw or sd - and which of those is used in tests depends on whether it's an RV32 or RV64, hence the dependency on XLEN (or FLEN for fsw and fsd)

But, cbo.zero doesn't store a XLEN byte signature, it stores a cache block.

So, if tests are written to point cbo.zero into some arbitrary place in memory, then read back and written into the signature area, you can use lots of SIGUPD macros to do those stores, and the offset would get incremented appropriately by those macros.
If you are instead using cbo.zero to store directly into the signature area, so you aren't doing all the extra reads and writes, then you don't use the SIGUPD macros, and you need to update the offset to at least the next cache block boundary (which might be a little more or a little less than a full cache block, depending on the offset within the cache block.

In the latter case - the offset increment is not dependent on XLEN.

What I expect is that your tests store at different offsets within a cache block, so the base address used in each successive op might be a little more or a little less than the previous cbo.zero - but each one will be in a different naturally aligned-cache block-sized region.

B

@liweiwei90
Copy link
Author

Yes, the later way is selected currently.

#define TEST_CBO_ZERO(swreg,rs1,inst,imm_val)   ;\
addi rs1,swreg,(imm_val&(BLOCKSZ-1))                                        ;\
inst rs1                                                                    ;\
nop                                                                         ;\
nop                                                                         ;\
addi swreg, swreg, BLOCKSZ

swreg will update to increment BLOCKSZ for every case to point to next target block start address. imm_val is the offset to the cache block size, generated by cgf:

        'walking_ones("rs1_val", 12, False)': 0
        'walking_zeros("rs1_val", 12, False)': 0
        'uniform_random(10, 100, ["rs1_val"], [12])': 0

@allenjbaum
Copy link
Collaborator

perfect - make sure swreg starts BLKSZ aligned, and you dont do any other SIGUPD that would upset that alignment.

Copy link
Collaborator

@allenjbaum allenjbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the TEST_CBO_ZERO issue

@liweiwei90 liweiwei90 force-pushed the plct-cmo-upstream branch 2 times, most recently from 78fc0ef to de56f1c Compare December 10, 2022 07:51
@allenjbaum
Copy link
Collaborator

Since this was filed, the test macros in arch_test.h have been split into a separate rvtest_macro.h file. Could you pull the latest arch_test.h from the exisiting repo, update the changelog, and try again? Oncr you do that, it should br good to go.

@liweiwei90 liweiwei90 force-pushed the plct-cmo-upstream branch 2 times, most recently from 7b05d71 to 0e7ab01 Compare February 22, 2023 09:24
@liweiwei90
Copy link
Author

Since this was filed, the test macros in arch_test.h have been split into a separate rvtest_macro.h file. Could you pull the latest arch_test.h from the exisiting repo, update the changelog, and try again? Oncr you do that, it should br good to go.

OK. I have updated them.

@allenjbaum
Copy link
Collaborator

Soemthing went wrong in the tests itself. Lines 19 and 23 of [rv32i_m/CMO/src/cbo.zero-01.S have the RVTEST_ISA macro parameters mis-formatted. Instead of Zicboz, there are separate Z,i,c,b,o,z parameters.
Could you make sure that you run this against Sail with riscof and see that it passes?

@allenjbaum
Copy link
Collaborator

And also for both RV32 and RV64

@liweiwei90
Copy link
Author

liweiwei90 commented Feb 23, 2023

Soemthing went wrong in the tests itself. Lines 19 and 23 of [rv32i_m/CMO/src/cbo.zero-01.S have the RVTEST_ISA macro parameters mis-formatted. Instead of Zicboz, there are separate Z,i,c,b,o,z parameters.
Could you make sure that you run this against Sail with riscof and see that it passes?

The lines are generated by riscv-ctg(riscv-software-src/riscv-ctg#22). I'll try to check why this later.
I have ran it agianst sail(riscv/sail-riscv#137) with riscof(riscv-software-src/riscof#46) by the command:
riscof run --config=config.ini --suite=../riscv-arch-test/riscv-test-suite/rv32i_m/CMO --env=../riscv-arch-test/riscv-test-suite/env
and it passed.

@liweiwei90
Copy link
Author

I think it's the wrong format for isa field of cbo.zero in riscv_ctg/data/template.yaml that causes this error. I have fixed it and updated the generated tests.

@allenjbaum
Copy link
Collaborator

allenjbaum commented Feb 24, 2023 via email

@liweiwei90
Copy link
Author

liweiwei90 commented Feb 24, 2023

Is "RVTEST_ISA("RV32IZicbozZicsr")" correct, or should it be "RVTEST_ISA("RV32I_Zicboz_Zicsr")?

Sorry. I didn't find where RVTEST_ISA is really worked. However, there is no '_' in test cases for K such as RVTEST_ISA("RV32IZk,RV32IZkn,RV32IZknd"). And the previous errors are caused by using Zicboz_Zicsr for the isa field in the template yaml of riscv-ctg,

@allenjbaum
Copy link
Collaborator

allenjbaum commented Feb 24, 2023 via email

@allenjbaum
Copy link
Collaborator

This will fail the first time a DUT has a cache block size is not 64 bytes. These tests need to be changed to use an RVMODEL_CACHEBLOCK_SIZE macro, and use it everywhere that you currently use the constant 64. Separately we need to document that users should set this if they don't use the default, and arch_test.h needs to be updated to set the default if it isn't defined. (and, SPIKE and Sail will need to be able to be configured for non-default cacheblock size....)

Copy link
Collaborator

@allenjbaum allenjbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This need to use a model specific macro variable RVMODEL_CACHEBLOCK_SIZE instead of BLOCKSZ, and arch_test.h must default that to 64 bytes

@pdonahue-ventana
Copy link
Contributor

I didn't review all of this but note that a single RVMODEL_CACHEBLOCK_SIZE macro may not be sufficient. The block size for Zicboz is allowed to be different than the block size for Zicbom/Zicbop. There are two independently discoverable values as mentioned in the CMO spec:

The initial set of CMO extensions requires the following information to be discovered by software:
• The size of the cache block for management and prefetch instructions
• The size of the cache block for zero instructions

inst (rs1) ;\
nop ;\
nop ;\
#if (BLOCKSZ<4096) \
Copy link
Collaborator

@allenjbaum allenjbaum Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work; addi has range -2048..2047, so this needs to change to 2048, but there is an ADDI macro already defined that does this and you should use that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll fix this.

@allenjbaum
Copy link
Collaborator

OK, then we need to add this code to arch_test.h: (changing the macro names to be a bit easier to type)
#ifndef RVMODEL_CBZ_BLOCKSIZE
#define RVMODEL_CBZ_BLOCKSIZE 64
#endif
#ifndef RVMODEL_CMO_BLOCKSIZE
#define RVMODEL_CMO_BLOCKSIZE 64
#endif

and replace instances of BLOCKSZ in the cbo.zero tests to use RVMODEL_CBZ_BLOCKSIZE

@davidharrishmc
Copy link
Contributor

Is this test suite still moving? We would use it on CORE-V Wally if it gets committed.

Copy link
Collaborator

@allenjbaum allenjbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect there will be a merge conflict because I got to this so late, but otherwise it looks good

@allenjbaum allenjbaum merged commit 4eea0a0 into riscv-non-isa:main Nov 15, 2023
1 check passed
@davidharrishmc
Copy link
Contributor

As written, rv32i_m/CMO/src/cbo.zero-01.S hangs in Sail for me. At 0x80000128, cbo.zero (t5) causes an illegal-instruction trap. There is no trap handler, so Sail infinite loops.

I hypothesized this might be because the menvcfg.CBZE bit was not set, because the Zicboz extension requires this bit to execute cbo.zero. However, setting that bit doesn't solve the illegal-instruction trap in Sail.

Has anybody been able to run cbo.zero-01.S through Sail?

@allenjbaum
Copy link
Collaborator

allenjbaum commented Dec 21, 2023 via email

@davidharrishmc
Copy link
Contributor

davidharrishmc commented Dec 21, 2023 via email

@davidharrishmc
Copy link
Contributor

And disregard my concern about menvcfg.CBZE. That is not required for machine-mode testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants