From 7a02955bcf5423daf176a91f2bbfee9f29a54839 Mon Sep 17 00:00:00 2001 From: Adrian Palacios <73246657+adpaco-aws@users.noreply.github.com> Date: Tue, 27 Aug 2024 13:24:40 -0400 Subject: [PATCH] RFC: Source-based code coverage (#3143) Upgrades the Kani coverage feature with the source-based code coverage implementation used in the Rust compiler. Rendered version available [here](https://github.com/adpaco-aws/rmc/blob/rfc-region-cov/rfc/src/rfcs/0011-source-coverage.md). Related to #2640 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses. --- rfc/src/SUMMARY.md | 1 + rfc/src/rfcs/0008-line-coverage.md | 2 +- rfc/src/rfcs/0011-source-coverage.md | 647 +++++++++++++++++++++++++++ 3 files changed, 649 insertions(+), 1 deletion(-) create mode 100644 rfc/src/rfcs/0011-source-coverage.md diff --git a/rfc/src/SUMMARY.md b/rfc/src/SUMMARY.md index e0b31761ae62..0b38dac9cbb5 100644 --- a/rfc/src/SUMMARY.md +++ b/rfc/src/SUMMARY.md @@ -16,3 +16,4 @@ - [0008-line-coverage](rfcs/0008-line-coverage.md) - [0009-function-contracts](rfcs/0009-function-contracts.md) - [0010-quantifiers](rfcs/0010-quantifiers.md) +- [0011-source-coverage](rfcs/0011-source-coverage.md) diff --git a/rfc/src/rfcs/0008-line-coverage.md b/rfc/src/rfcs/0008-line-coverage.md index d2bedab8bdef..a0f287bee96e 100644 --- a/rfc/src/rfcs/0008-line-coverage.md +++ b/rfc/src/rfcs/0008-line-coverage.md @@ -1,7 +1,7 @@ - **Feature Name:** Line coverage (`line-coverage`) - **Feature Request Issue:** - **RFC PR:** -- **Status:** Unstable +- **Status:** Cancelled - **Version:** 0 - **Proof-of-concept:** (Kani) + (Kani VS Code Extension) diff --git a/rfc/src/rfcs/0011-source-coverage.md b/rfc/src/rfcs/0011-source-coverage.md new file mode 100644 index 000000000000..9ea23c72a9d1 --- /dev/null +++ b/rfc/src/rfcs/0011-source-coverage.md @@ -0,0 +1,647 @@ +- **Feature Name:** Source-based code coverage (`source-coverage`) +- **Feature Request Issue:** +- **RFC PR:** +- **Status:** Under Review +- **Version:** 1 +- **Proof-of-concept:** (Kani) + (`kani-cov`) + +------------------- + +## Summary + +A source-based code coverage feature for Kani built on top of Rust's coverage instrumentation. + +## User Impact + +In our first attempt to add a coverage feature fully managed by Kani, we introduced and made available a line coverage option +(see [RFC: Line coverage](0008-line-coverage.md) for more details). +This option has since then allowed us to gather more data around the expectations for a coverage feature in Kani. + +For example, the line coverage output we produced was not easy to interpret +without knowing some implementation details. +Aside from that, the feature requested in +[#2795](https://github.com/model-checking/kani/issues/2795) alludes to the need +of providing coverage-specific tooling in Kani. +Nevertheless, as captured in +[#2640](https://github.com/model-checking/kani/issues/2640), source-based +coverage results provide the clearest and most precise coverage information. + +In this RFC, we propose an integration with [Rust's source-based code coverage +instrumentation](https://doc.rust-lang.org/rustc/instrument-coverage.html). +This integration would allow us to report source-based code coverage results from Kani. +Also, we propose adding a new user-facing, coverage-focused tool called `kani-cov`. +The tool would allow users to process coverage results generated by Kani and produce +coverage artifacts such as summaries and reports according to their preferences. +In the [next section](#user-experience), we will explain in more detail how we +expect `kani-cov` to assist with coverage-related tasks. + +With these changes, we expect our coverage options to become more flexible, +precise and efficient. These options are expected to replace the previous +options available through the line coverage feature. +In the [last section](#future-possibilities) of this RFC, we will also discuss +the requirements for a potential integration of this coverage feature with the +LLVM toolchain. + +## User Experience + +The proposed experience is partially inspired by that of the most popular +coverage frameworks. +First, let us delve into the LLVM coverage workflow, followed by an explanation +of our proposal. + +### The LLVM code coverage workflow + +The LLVM project is home to one of the most popular code coverage frameworks. +The workflow associated to the LLVM framework is described in the documentation for +[source-based code coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html)[^note-source], +but we briefly describe it here to better relate it with our proposal. + +In short, the LLVM code coverage workflow follows three steps: + 1. **Compiling with coverage enabled.** This causes the compiler to generate an instrumented program. + 2. **Running the instrumented program.** This generates binary-encoded `.profraw` files. + 3. **Using tools to aggregate and export coverage information into other formats.** + +When working in a `cargo` project, step 1 can be done through this command: + +```sh +RUSTFLAGS='-Cinstrument-coverage' cargo build +``` + +The same flag must to be used for step 2: + +```sh +RUSTFLAGS='-Cinstrument-coverage' cargo run +``` + +This should populate the directory with at least one `.profraw` file. +Each `.profraw` file corresponds to a specific source code file in your project. + +At this point, we will have produced the artifacts that we generally require for +the LLVM tools: + 1. **The instrumented binary** which, in addition to the instrumented program, + contains additional information (e.g., the coverage mappings) required to + interpret the profiling results. + 2. **The `.profraw` files** which essentially includes the profiling results + (e.g., counter values) for each function of the corresponding source code file. + +For step 3, the commands will depend on what kind of results we want. +Most likely we will have to merge the `.profraw` files and produce a `.profdata` +file as follows: + +```sh +llvm-profdata merge -sparse *.profraw -o output.profdata +``` + +The resulting `.profdata` file will contain the aggregated coverage results from +the `.profraw` files passed to the `merge` command. + +Then, we can use a command such as + +```sh +llvm-cov show target/debug/binary —instr-profile=output.profdata -show-line-counts-or-regions +``` + +to visualize the code coverage through the terminal as in the image: + +![Source-based code coverage with `llvm-cov`](https://github.com/model-checking/kani/assets/73246657/4f8a973d-8977-4c0b-822d-e73ed6d223aa) + +or the command + +```sh +llvm-cov report target/debug/binary --instr-profile=output.profdata --show-region-summary +``` + +to produce coverage summaries like this: + +``` +Filename Regions Missed Regions Cover Functions Missed Functions Executed Lines Missed Lines Cover Branches Missed Branches Cover +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +/long/long/path/to/my/project/binary/src/main.rs 9 3 66.67% 3 1 66.67% 14 4 71.43% 0 0 - +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +TOTAL 9 3 66.67% 3 1 66.67% 14 4 71.43% 0 0 - +``` + +[^note-source]: The LLVM project refers to their own coverage feature as + *source-based code coverage*. It is not rare to see the term *region + coverage* being used instead to refer to the same thing. That is because + LLVM's source-based code coverage feature can report coverage for code + regions, but other coverage frameworks do not support the concept of code + regions. + +### The Kani coverage workflow + +The two main components of the Kani coverage workflow that we propose are the +following: + 1. The existing `--coverage` flag that drives the coverage workflow in Kani, + emits raw coverage data (as in the `.profraw` files), and produces basic + coverage results by default. + 2. A new subcommand `cov` that allows users to further process raw coverage + information emitted by Kani to produce customized coverage results (i.e., + different to the ones produced by default with the `--coverage` option). + The `cov` subcommand is an alias for the `kani-cov` tool. + +In contrast to the LLVM workflow, where human-readable coverage results can +be produced only after a sequence of LLVM tool commands, we provide some +coverage results by default. +This aligns better with our UX philosophy, and removes the need for a wrapper +around our coverage features like +[`cargo-llvm-cov`](https://github.com/taiki-e/cargo-llvm-cov). +Alternatively, the `cov` subcommand offers the ability of producing more +specific coverage results if needed. +We anticipate the `cov` subcommand being particularly useful in less standard +project setups, giving the users the flexibility required to produce coverage +results tailored to their specific needs. + +In the following, we describe each one of these components in more detail. + +#### The `--coverage` option + +The default coverage workflow will be kicked off through the unstable +`--coverage` option: + +```sh +cargo kani --coverage -Zsource-coverage +``` + +The main difference with respect to the regular verification workflow is that, +at the end of the verification-based coverage run, Kani will generate two coverage +results: + - A coverage summary corresponding to the coverage achieved by the harnesses + included in the verification run. This summary will be printed after the + verification output. + - A coverage report corresponding to the coverage achieved by the harnesses + included in the verification run. The report will be placed in the same target + directory where the raw coverage files are put. The path to the report will + also be printed after the verification output. + +Therefore, a typical `--coverage` run could look like this: + +``` +VERIFICATION:- SUCCESSFUL + +Coverage Results: + +| Filename | Regions | Missed Regions | Cover | Functions | Missed Functions | Cover | +| -------- | ------- | -------------- | ----- | --------- | ---------------- | ----- | +| main.rs | 9 | 3 | 66.67 | 3 | 1 | 33.33 | +| file.rs | 11 | 5 | 45.45 | 2 | 1 | 50.00 | + +Coverage report available in target/kani/x86_64-unknown-linux-gnu/cov/kani_2024-04-26_15-30-00/report/index.html +``` + +#### The `cov` subcommand + +The `cov` subcommand will be used to process raw coverage information generated +by Kani and produce coverage outputs as indicated by the user. +Hence, the `cov` subcommand corresponds to the set of LLVM tools +(`llvm-profdata`, `llvm-cov`, etc.) that are used to produce coverage outputs +through the LLVM coverage workflow. + +In contrast to LLVM, we will have a single subcommand for all Kani +coverage-related needs. The `cov` subcommand will just call the `kani-cov` tool, +which is expected to be shipped along the rest of Kani binaries. + +We suggest that the subcommand initially offers two options: + 1. An option to merge the coverage results from one or more files and coverage + mappings[^note-snapshot] into a single file. + 2. An option to produce coverage outputs from coverage results, including summaries + or coverage reports in human-readable formats (e.g., HTML). + +Let's assume that we have run `cargo kani --coverage -Zsource-coverage` and +generated coverage files in the `my-coverage` folder. Then, we would use `cargo +kani cov` as follows to combine the coverage results[^note-exclude] for all +harnesses: + +```sh +cargo kani cov --merge my-coverage/*.kaniraw -o my-coverage.kanicov +``` + +Let's say the user is first interested in reading a coverage summary through the +terminal. +They can use the `--summary` option for that: + +```sh +cargo kani cov --summary my-coverage/default.kanimap -instr-profile=my-coverage.kanicov +``` + +The command could print a coverage summary like: + +``` +| Filename | Regions | Missed Regions | Cover | Functions | ... +| -------- | ------- | -------------- | ----- | --------- | ... +| main.rs | 9 | 3 | 66.67 | 3 | ... +[...] +``` + +Now, let's say the user wants to produce an HTML report of the coverage results. +They will have to use the `--report` option for that: + +```sh +cargo kani cov --report my-coverage/default.kanimap -format=html -instr-profile=my-coverage.kanicov -o coverage-report +``` + +This time, the command will generate a `coverage-report` folder including a +browsable HTML webpage that highlights the regions covered in the source +according to the coverage results in `my-coverage.kanicov`. + +[^note-export]: The `llvm-cov` tool includes the option + [`gcov`](https://llvm.org/docs/CommandGuide/llvm-cov.html#llvm-cov-gcov) to + export into GCC's coverage format [Gcov](https://en.wikipedia.org/wiki/Gcov), + and the option + [`export`](https://llvm.org/docs/CommandGuide/llvm-cov.html#llvm-cov-export) + to export into the LCOV format. These may be good options to consider for + `kani-cov` in the future but we should focus on basic formats for now. + +[^note-exclude]: Options to exclude certain coverage results (e.g, from the + standard library) will likely be part of this option. + +[^note-snapshot]: Coverage mappings essentially provide a snapshot of the source + code reports for items that otherwise are unreachable or have been sliced + away during the compilation process. + +#### Integration with the Kani VS Code Extension + +We will update the coverage feature of the +[Kani VS Code Extension](https://github.com/model-checking/kani-vscode-extension) +to follow this new coverage workflow. +In other words, the extension will first run Kani with the `--coverage` option and +use `kani cov` to produce a `.kanicov` file with the coverage results. +The extension will consume the source-based code coverage results and +highlight region coverage in the source code seen from VS Code. + +We could also consider other coverage-related features in order to enhance the +experience through the Kani VS Code Extension. For example, we could +automatically show the percentage of covered regions in the status bar by +additionally extracting a summary of the coverage results. + +Finally, we could also consider an integration with other code coverage tools. +For example, if we wanted to integrate with the VS Code extensions +[Code Coverage](https://marketplace.visualstudio.com/items?itemName=markis.code-coverage) or +[Coverage Gutters](https://marketplace.visualstudio.com/items?itemName=ryanluker.vscode-coverage-gutters), +we would only need to extend `kani-cov` to export coverage results to the LCOV +format or integrate Kani with LLVM tools as discussed in [Integration with LLVM](#integration-with-llvm). + +## Detailed Design + +In this section, we provide more details on: + - The Rust coverage instrumentation and how it can be integrated into + Kani to produce source-based code coverage results. + - The proposed coverage workflow to be run by default in Kani when the + `--coverage` option is used. + +This information is mostly intended as a reference for Kani contributors. +Currently, the Rust coverage instrumentation continues to be developed. Because +of that, Rust toolchain upgrades may result in breaking changes to our own +coverage feature. This section should help developers to understand the general +approach and resolve such issues by themselves. + +### The Rust coverage instrumentation + +The Rust compiler includes two code coverage implementations: + * A source-based coverage implementation which uses LLVM's coverage + instrumentation to generate precise coverage data. This implementation can be + enabled with `-C instrument-coverage`. + * A Gcov-based coverage implementation that derives coverage data based on + DebugInfo. This implementation can be enabled with `-Z profile`. + +The [Instrumentation-based Code Coverage](https://doc.rust-lang.org/rustc/instrument-coverage.html) +chapter from the `rustc` book describes in detail how to enable and use the LLVM +instrumentation-based coverage feature. In contrast, the +[LLVM Source-Based Code Coverage](https://rustc-dev-guide.rust-lang.org/llvm-coverage-instrumentation.html) +chapter from the `rustc` development guide documents how the LLVM +coverage instrumentation is performed in the Rust compiler. + +In this section, we will first summarize some information from the +[LLVM Source-Based Code Coverage](https://rustc-dev-guide.rust-lang.org/llvm-coverage-instrumentation.html) +chapter, limited to details which are relevant to the development of the +source-based coverage feature in Kani. Then, we will explain how Kani taps into +the Rust coverage instrumentation to perform its own coverage instrumentation +and be able to report source-based code coverage results. This will also include +mentions to current issues with this implementation, which we plan to further +discuss in [Future possibilities](#future-possibilities). + +#### Understanding the Rust coverage instrumentation + +The LLVM coverage instrumentation is implemented in the Rust compiler as a +[MIR pass called `InstrumentCoverage`](https://rustc-dev-guide.rust-lang.org/llvm-coverage-instrumentation.html#mir-pass-instrumentcoverage). + +The MIR pass first builds a coverage-specific version of the MIR Control Flow +Graph (CFG) from the MIR. The initial version of this CFG is based on the MIR's +`BasicBlock`s, which then gets refined by combining blocks that can be chained +from a coverage-relevant point of view. The final version of the coverage CFG is +then used to determine where to inject the +[`StatementKind::Coverage`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.StatementKind.html#variant.Coverage) +statements in order to measure coverage for a single region coverage span. + +The injection of `StatementKind::Coverage` statements is the main result we are +interested in for the integration with Kani. Additionally, the instrumentation +will also attach the +[`FunctionCoverageInfo`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/struct.FunctionCoverageInfo.html) +structure to each function's body.[^note-coverage-info] +This result is also needed at the moment because coverage statements do not +include information on the code region they are supposed to cover. +However, `FunctionCoverageInfo` contains the +[coverage mappings](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/struct.Mapping.html), +which represent the relation between +[coverage counters](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CovTerm.html) +and code regions. + +As explained in [MIR Pass: +`InstrumentCoverage`](https://rustc-dev-guide.rust-lang.org/llvm-coverage-instrumentation.html#mir-pass-instrumentcoverage), +many coverage statements will not be converted into a physical +counter[^note-physical-counter]. Instead, they will be converted into a +*coverage-counter expression* that can be calculated based on other coverage +counters. We highly recommend looking at the example in [MIR Pass: +`InstrumentCoverage`](https://rustc-dev-guide.rust-lang.org/llvm-coverage-instrumentation.html#mir-pass-instrumentcoverage) +to better understand how this works. This optimization is mainly done for +performance reasons because incrementing a physical counter causes a +non-negligible overhead, especially within loops. + +The (`StatementKind::`)`Coverage` statements that are injected by the Rust coverage +instrumentation contain a [`CoverageKind`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html) field indicating the type of coverage counter. The variant +[`CounterIncrement`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html#variant.CounterIncrement) +represents physical counters, while +[`ExpressionUsed`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html#variant.ExpressionUsed) +represents the counter expressions that we just discussed. +Other variants such as `SpanMarker` or `BlockMarker` are not relevant to this +work since they should have been erased after the `InstrumentCoverage` pass. + +[^note-coverage-info]: It is important to note that the StableMIR interface does + not include `FunctionCoverageInfo` in function bodies. Because of that, we + need to pull it from the internal `rustc` function bodies. + +[^note-physical-counter]: By *physical counter*, we refer to a global program + variable that is initialized to zero and incremented by one each time that + the execution passes through. + +#### Integrating the instrumentation into Kani + +Now that we have explained what the Rust coverage instrumentation does at a high +level, we should be ready to discuss how it can be used from Kani. Here, we will +follow an approach where, during the codegen stage, we generate a Kani +reachability check for each code region and, after the verification stage, we +postprocess the information in those checks to generate the coverage +information. So this section will essentially be a retelling of the +implementation in [#3119](https://github.com/model-checking/kani/pull/3119), and +we will discuss variations/extensions of this approach in the +[appropriate](#rationale-and-alternatives) [sections](#future-possibilities). + +Clearly, the first step is adding `-C instrument-coverage` to the `rustc` flags +we use when calling the compiler to codegen. This flag enables the Rust coverage +instrumentation that we discussed earlier, resulting in + 1. the injection of + [`Coverage`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.StatementKind.html#variant.Coverage) + statements in the MIR code, and + 2. the inclusion of [`FunctionCoverageInfo`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/struct.FunctionCoverageInfo.html) in function bodies. + +The next step is handling the `Coverage` statements from `codegen_statement`. + +Each `Coverage` statement contains opaque coverage +information[^note-opaque-coverage] of the +[`CoverageKind`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html) +type which can be processed to determine the type of coverage counter +(`CounterIncrement` for physical counters, `ExpressionUsed` for counter +expressions) and the ID number of the counter. These two pieces of information allow us to +uniquely identify the counter within a given function. For example, +`CounterIncrement(0)` would generally refer to the first physical counter in the +function. + +Unfortunately, the `CoverageKind` information does not tell us anything about +the code region that the counter covers. However, data about the code region can +be pulled from the coverage mappings included in the `FunctionCoverageInfo` that +is attached to the (internal) function body. Note that the coverage mappings +includes information about all the coverage counters in a function, even for +counters which have been dropped. Matching the `CoverageKind` information with +that of the counters in the coverage mappings allows us to retrieve the code +region for any counter. + +Using all this data, for each coverage statement[^note-expression-integration] we generate a coverage check +that maintains the essence of the coverage checks described in the [RFC for line +coverage](https://model-checking.github.io/kani/rfc/rfcs/0008-line-coverage.html): + +> Coverage checks are a new class of checks similar to [`cover` checks](https://model-checking.github.io/kani/rfc/rfcs/0003-cover-statement.html). +> The main difference is that users cannot directly interact with coverage checks (i.e., they cannot add or remove them manually). +> Coverage checks are encoded as an `assert(false)` statement (to test reachability) with a fixed description. +> In addition, coverage checks are: +> * Hidden from verification results. +> * Postprocessed to produce coverage results. + +Therefore, the last step is to postprocess the results from coverage checks to +produce coverage results. This is not too complicated to do since the checks +already include the counter information (type + ID) and the function name in +the check's description. If the span of the code region is also included +(this is what [#3119](https://github.com/model-checking/kani/pull/3119) is +currently doing), we can directly generate a primitive output like this: + +``` + () + * - + * ... + * - +``` + +For example, for the test case in +[#3119](https://github.com/model-checking/kani/pull/3119) we report this: + +``` +src/main.rs (main) + * 14:1 - 19:2 COVERED + +src/main.rs (test_cov) + * 5:1 - 6:15 COVERED + * 6:19 - 6:28 UNCOVERED + * 7:9 - 7:13 COVERED + * 9:9 - 9:14 UNCOVERED + * 11:1 - 11:2 COVERED +``` + +> **NOTE: This section has been written according to the implementation in +> [#3119](https://github.com/model-checking/kani/pull/3119), which currently +> produces a text-based output like the one shown above. There is ongoing work to +> store the coverage mappings in a separate file (as described in the next +> section), which would save us the need to attach code region data to the +> coverage checks.** + +[^note-opaque-coverage]: The Rust compiler uses the `Opaque` type to prevent +others from interfacing with unstable types (e.g., the `Coverage` type +[here](https://github.com/rust-lang/rust/blob/f7eefec4e03f5ba723fbc04d94dbc1203b7c9bff/compiler/stable_mir/src/mir/body.rs#L389)). +Nonetheless, this can be worked around by serializing its contents and parsing +it back into an internal data type. + +[^note-expression-integration]: We could follow an alternative approach where we +do not instrument each coverage statement, but only those that correspond to +physical counters. Unfortunately, doing so would lead to incorrect coverage +results due to the arithmetic nature of expression-based counters. We elaborate +on this topic in the later parts of this document. + +### The default coverage workflow in Kani + +In this section, we describe the default `--coverage` workflow from a +developer's point of view. This will hopefully help developers understand how +the different coverage components in Kani are connected. For example, we'll +describe the raw coverage information that gets produced throughout the default +`--coverage` workflow and define the basic `cov` commands that it will execute. + +The main difference with respect to the regular verification workflow is that, +at the end of the verification-based coverage run, Kani will generate two types +of files: + - One single file `.kanimap` file for the project. This file will contain the + coverage mappings for the project's source code. + - One `.kaniraw` file for each harness. This file will contain the + verification-based results for the coverage-oriented properties corresponding + to a given harness. + +Note that `.kaniraw` files correspond to `.profraw` files in the LLVM coverage +workflow. Similarly, the `.kanimap` file corresponds to the coverage-related +information that's embedded into the project's binaries in the LLVM coverage +workflow.[^note-kanimap] + +The files will be written into a new timestamped directory associated with the +coverage run. The path to this directory will be printed to standard output in +by default. For example, the [draft implementation](https://github.com/model-checking/kani/pull/3119) +writes the coverage files into the `target/kani//cov/` directory. + +Users aren't expected to read the information in any of these files. +Therefore, there's no need to restrict their format. +The [draft implementation](https://github.com/model-checking/kani/pull/3119) +uses the JSON format but we might consider using a binary format if it doesn't +scale. + +In addition, Kani will produce two types of coverage results: + 1. A coverage summary with the default options. + 2. A terminal-based coverage report with the default options. However, we will + only do this if the program is composed of a single source + file[^note-conditional-report]. + +[^note-kanimap]: Note that the `.kanimap` generation isn't implemented in + [#3119](https://github.com/model-checking/kani/pull/3119). The [draft + implementation of + `kani-cov`](https://github.com/model-checking/kani/pull/3121) simply reads + the source files referred to by the code coverage checks, but it doesn't get + information about code trimmed out by the MIR linker. + +[^note-conditional-report]: In other words, standalone `kani` would always emit +these terminal-based reports, but `cargo kani` would not unless the project +contains a single Rust file (for example, `src/main.rs`). + +## Rationale and alternatives + +### Other coverage implementations + +In a previous version of this feature, we used an ad-hoc coverage implementation. +In addition to being very inefficient[^note-benchmarks], the line-based coverage +results were not trivial to interpret by users. +At the moment, there's only another unstable, GCC-compatible code coverage implementation +based on the Gcov format. The Gcov format is line-based so it's not able +to report region coverage results. +In other words, it's not as advanced nor precise as the source-based implementation. + +[^note-benchmarks]: Actual performance benchmarks to follow in + [#3119](https://github.com/model-checking/kani/pull/3119). + +## Open questions + + - Do we want to instrument dependencies by default? Preliminary benchmarking results show a slowdown of 100% and greater. + More evaluations are required to determine how we handle instrumentation for dependencies, and what options we might want + to provide to users. + - How do we handle features/options for `kani-cov`? In particular, do we need more details in this RFC? + +## Future possibilities + +### Integration with LLVM + +As part of this work, we explored a potential integration with the LLVM +framework. The idea behind such an integration would essentially involve +producing coverage results in formats compatible with the LLVM framework (e.g., +the `.profraw` format). The main advantage of integrating with the LLVM +framework in this way is that we would not need a tool like `kani-cov` to +aggregate coverage results; we could just use LLVM tools such as `llvm-profdata` +and `llvm-cov` to consume them. + +However, at this time we recommend against integrating with LLVM due to these reasons: + 1. Generating the instrumented binary used in the [LLVM coverage + workflow](#the-llvm-code-coverage-workflow) requires a standard `rustc` + compilation with `--cfg kani` in addition to other flags including `-C + instrument-coverage`. This is likely to result in compilation errors since the + standard `rustc` backend cannot produce code for Kani APIs, for example. + 2. Producing the `.profraw` files requires executing the instrumented binary at + least once. This would be an issue for Rust projects which assume a particular + environment for their execution. + 3. There are no stable interfaces to create or modify files in formats + compatible with the LLVM framework. Even though the documentation for the [LLVM + Code Coverage Mapping Format](https://llvm.org/docs/CoverageMappingFormat.html) + is excellent, the easiest way to interact with files on these format is through + LLVM tools (e.g., + [`llvm-cov`](https://github.com/llvm/llvm-project/tree/main/llvm/tools/llvm-cov)) + which bring in many other LLVM dependencies. During our exploration, we + attempted to decode and re-encode files in the `.profraw` to set the counter + data to the values obtained during verification. To this end, we tried tools + like [`llvm-profparser`](https://github.com/xd009642/llvm-profparser/) which + can be used as a replacement for `llvm-profdata` and `llvm-cov` but failed to + parse coverage files emitted by the Rust compiler (this is also related to the + next point). Another crate that we used is + [`coverage-dump`](https://github.com/rust-lang/rust/tree/master/src/tools/coverage-dump), + a recent tool in the Rust compiler used for testing purposes. `coverage-dump` + extracts coverage mappings from LLVM IR assembly files (i.e., human-readable + `*.ll` files) but does not work with the binary-encoded formats. Finally, we + also built some ad-hoc tooling to perform these modifications but it soon + became evident that we would need to develop it further in order to handle any + program. + 4. LLVM releases a new version approximately every six months. This would + likely result in another "toolchain update" problem for Kani in order to + provide compatibility with newer LLVM versions. Moreover, the Rust compiler + supplies their own version of LLVM tools (`rust-profdata`, `rust-cov`, etc.) + which are fully compatible with coverage-related artifacts produced by `rustc`. + + +### Optimization with coverage-counter expressions + +In the [subsection related to the +integration](#integrating-the-instrumentation-into-kani), we noted that we could +follow an alternative approach where we only instrument coverage statements that +correspond to physical counters. In fact, this would be the logical choice since +the replacement of physical counters by expression-based counters would also be +a performance optimization for us. + +However, the expressions used in expression-based counters are built with the +arithmetic operators `Add` (`+`) and `Sub` (`-`). On the other hand, the +coverage checks performed by Kani have a boolean meaning: you either cover a +region or you do not. Thus, there are many cases where these two notions of +coverage counters are incompatible. For example, let's say we have this +function: + +```rust +fn check_value(val: u32) { + if val == VALUE { + do_this(); + } else { + do_that(); + } + do_something_else(); +} +``` + +One way to optimize the counters in this function is to have two physical +counters for the branches of the `if` statement (`c1` and `c2`), and then an +expression-based counter associated to the `do_something_else()` statement +adding those (i.e., `c3 = c1 + c2`). If we have, for example, two executions for +this program, with each one taking a different branch, then the results for the +coverage counters will be `c1 = 1`, `c2 = 1` and `c3 = c1 + c2 = 2`. + +But what does `c3 = 2` mean in the context of a verification-based coverage +result? That is not clear. For instance, in a Kani trace, you could have a +nondeterministic value for `val` which just happens to be `val == VALUE` and not +at the same time. This would result in the same counters (`c1 = 1`, `c2 = 1` and +`c3 = 2`), but the program is being run only once! + +Note that finding a verification-based replacement for the runtime operators in +counter-based expressions is an interesting research topic. If we could +establish a relation between the runtime and verification expressions, then we +could avoid the instrumentation of coverage checks for expression-based +counters. For example, could we replace the `Add` operator (`+`) with an `Or` +operator (`||`)? Intuitively, this makes sense since verification-based coverage +counters are binary. It also seems to work for our example since covering any of +the branches should result in the `do_something_else()` statement being covered +as well, with the counter values now being `c1 = 1`, `c2 = 1` and `c3 = 1`. +However, it is not clear that this would work for all cases, nor it is clear +that we can replace `Sub` with another verification-based operator.