From 94a4b8daa807b4145c2ab231915f966f7ad67941 Mon Sep 17 00:00:00 2001
From: Beeman Strong <97133824+bcstrongx@users.noreply.github.com>
Date: Mon, 17 Jun 2024 09:46:31 -0700
Subject: [PATCH 1/3] Update charter.adoc

Revise to clarify intent of existing extensions, and add justification for second extension

Signed-off-by: Beeman Strong <97133824+bcstrongx@users.noreply.github.com>
---
 charter.adoc | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/charter.adoc b/charter.adoc
index 4882074..626503e 100644
--- a/charter.adoc
+++ b/charter.adoc
@@ -1,12 +1,15 @@
 = Preliminary Performance Event Sampling TG Charter
 
-RISC-V hardware performance monitoring counters (Zihpm) provide support for counting performance events, and, with Sscofpmf, support for basic, interrupt-based performance event sampling. However, on most implementations sampling interrupts will skid, such that the resulting trap is taken some number of cycles and/or instructions after the instruction that caused the overflow retires. As a result the PC collected by the profiler will rarely match that of the causal instruction, since the PC will typically advance during the skid period. Other state that a profiler may want to collect (registers, call-stack, counter values, etc) is likely to be overwritten or modified as well.
+RISC-V hardware performance monitoring counters (Zihpm) provide support for counting performance events, and, with Sscofpmf, support for basic, interrupt-based performance event sampling. These extensions provide a means for collecting performance event counts across a window of software execution, but do not provide a guaranteed means to associate an event with a specific instruction PC. Without such information it is difficult for a profiler to determine which instructions are experiencing performance events of interest, and hence are high priority targets for tuning.
 
-The Performance Event Sampling TG aims to address these limitations by defining two new ISA extensions:
+A second gap exists within the ISA: there is no capability that allows collecting data on the execution of sampled instructions. With information such as the data virtual address, memory access latency, exposed stall latency, etc, analysis tools can utilize more sophisticated techniques for identifying tuning opportunities.
 
-* An extension that enables precise attribution of samples based on select events (e.g., instruction/uop retirement events) to the instruction that caused the counter overflow, despite implementations where the associated sampling interrupt may skid. This will provide more directly actionable information to the user, by precisely identifying the instructions that are most often experiencing performance events.
-* An extension that enables sampling of instructions and/or uops, with collection of runtime event occurrences and latencies incurred by the instruction/uop. Such samples can be filtered based on instruction/uop type, events incurred, or latencies observed, allowing the user to focus on samples of interest. Further, associated sampling interrupts can be skidless, allowing the user to collect additional sample state (call-stack, register values) reliably.
+The Performance Event Sampling TG aims to fill these gaps by defining two new ISA extensions:
 
-Each extension will be crafted to be implementation-friendly even for high-performance, out-of-order microarchitectures, aiming to require no additional performance overhead beyond that resulting from the handling of sampling interrupts. The extensions will be compatible with the H extension, and support RISC-V security objectives.
+* An extension that enables precise attribution of samples based on select events (e.g., instruction/uop retirement events) to the instruction that caused the counter overflow, despite implementations where the associated sampling interrupt may skid. This will provide more directly actionable information to the user, by precisely identifying the instructions that are most often experiencing performance events.
+* An extension that enables sampling of instructions and/or uops, with collection of runtime metadata for the instruction/uop, including data virtual address, selecct event occurrences, and latencies incurred. Such samples can be filtered based on instruction/uop type, events incurred, or latencies observed, allowing the user to focus on samples of interest. Further, associated sampling interrupts can be skidless, allowing the user to collect additional sample state (call-stack, register values) reliably.
+
+Each extension will be crafted to be implementation-friendly even for high-performance, out-of-order microarchitectures, aiming to require no additional performance overhead beyond that resulting from the handling of sampling interrupts. The extensions will be compatible with the H extension, and support RISC-V security objectives.
 
 The TG will prototype support for the new extensions in Qemu and Linux perf, to demonstrate the usability of the ISA for kernels and tools.
+

From 201587550b27ca48c7b004fda3b2e543e7c919b2 Mon Sep 17 00:00:00 2001
From: Beeman Strong <97133824+bcstrongx@users.noreply.github.com>
Date: Mon, 24 Jun 2024 08:55:58 -0700
Subject: [PATCH 2/3] Update charter.adoc

fix typo, clarify that overhead is only when enabled

Signed-off-by: Beeman Strong <97133824+bcstrongx@users.noreply.github.com>
---
 charter.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/charter.adoc b/charter.adoc
index 626503e..8d56772 100644
--- a/charter.adoc
+++ b/charter.adoc
@@ -7,9 +7,9 @@ A second gap exists within the ISA: there is no capability that allows collectin
 The Performance Event Sampling TG aims to fill these gaps by defining two new ISA extensions:
 
 * An extension that enables precise attribution of samples based on select events (e.g., instruction/uop retirement events) to the instruction that caused the counter overflow, despite implementations where the associated sampling interrupt may skid. This will provide more directly actionable information to the user, by precisely identifying the instructions that are most often experiencing performance events.
-* An extension that enables sampling of instructions and/or uops, with collection of runtime metadata for the instruction/uop, including data virtual address, selecct event occurrences, and latencies incurred. Such samples can be filtered based on instruction/uop type, events incurred, or latencies observed, allowing the user to focus on samples of interest. Further, associated sampling interrupts can be skidless, allowing the user to collect additional sample state (call-stack, register values) reliably.
+* An extension that enables sampling of instructions and/or uops, with collection of runtime metadata for the instruction/uop, including data virtual address, select event occurrences, and latencies incurred. Such samples can be filtered based on instruction/uop type, events incurred, or latencies observed, allowing the user to focus on samples of interest. Further, associated sampling interrupts can be skidless, allowing the user to collect additional sample state (call-stack, register values) reliably.
 
-Each extension will be crafted to be implementation-friendly even for high-performance, out-of-order microarchitectures, aiming to require no additional performance overhead beyond that resulting from the handling of sampling interrupts. The extensions will be compatible with the H extension, and support RISC-V security objectives.
+Each extension will be crafted to be implementation-friendly even for high-performance, out-of-order microarchitectures, aiming to require no performance overhead when enabled beyond that resulting from the handling of sampling interrupts. The extensions will be compatible with the H extension, and support RISC-V security objectives.
 
 The TG will prototype support for the new extensions in Qemu and Linux perf, to demonstrate the usability of the ISA for kernels and tools.
 

From 7c3c6aa0fcd12a9d7c18fc9c3802bbb359fa2f50 Mon Sep 17 00:00:00 2001
From: Beeman Strong <97133824+bcstrongx@users.noreply.github.com>
Date: Mon, 24 Jun 2024 10:08:42 -0700
Subject: [PATCH 3/3] Update charter.adoc

Clarify that no performance overhead is a goal, not a requirement

Signed-off-by: Beeman Strong <97133824+bcstrongx@users.noreply.github.com>
---
 charter.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/charter.adoc b/charter.adoc
index 8d56772..b8ea5ad 100644
--- a/charter.adoc
+++ b/charter.adoc
@@ -9,7 +9,7 @@ The Performance Event Sampling TG aims to fill these gaps by defining two new IS
 * An extension that enables precise attribution of samples based on select events (e.g., instruction/uop retirement events) to the instruction that caused the counter overflow, despite implementations where the associated sampling interrupt may skid. This will provide more directly actionable information to the user, by precisely identifying the instructions that are most often experiencing performance events.
 * An extension that enables sampling of instructions and/or uops, with collection of runtime metadata for the instruction/uop, including data virtual address, select event occurrences, and latencies incurred. Such samples can be filtered based on instruction/uop type, events incurred, or latencies observed, allowing the user to focus on samples of interest. Further, associated sampling interrupts can be skidless, allowing the user to collect additional sample state (call-stack, register values) reliably.
 
-Each extension will be crafted to be implementation-friendly even for high-performance, out-of-order microarchitectures, aiming to require no performance overhead when enabled beyond that resulting from the handling of sampling interrupts. The extensions will be compatible with the H extension, and support RISC-V security objectives.
+Each extension will be crafted to be implementation-friendly even for high-performance, out-of-order microarchitectures, such that an implementation that incurs no performance overhead beyond that resulting from the handling of sampling interrupts can be acheived with reasonable hardware cost and complexity. The extensions will be compatible with the H extension, and support RISC-V security objectives.
 
 The TG will prototype support for the new extensions in Qemu and Linux perf, to demonstrate the usability of the ISA for kernels and tools.
 