diff --git a/RELEASE_NOTES.txt b/RELEASE_NOTES.txt index d7d2879..827b6a6 100644 --- a/RELEASE_NOTES.txt +++ b/RELEASE_NOTES.txt @@ -1,66 +1,69 @@ -Radeon™ GPU Profiler V2.0 12-07-2023 +Radeon™ GPU Profiler V2.1 04-24-2024 ------------------------------------- -V2.0 Changes +V2.1 Changes ------------------------------------- * Radeon GPU Profiler -1) Support for additional AMD RDNA™ 3 hardware -2) Redesigned Wavefront occupancy user interface, allowing for user customization and improved usage of available screen real estate -3) Dark mode user interface support, allowing the user to choose between a light and dark theme (or have RGP follow the OS theme setting) -4) The ray tracing shader table displayed in the Pipeline state pane can now display data on thread divergence for individual shader functions (DirectX® 12 only for now, requires AMD Software: Adrenalin Edition™ 23.12.1 or newer) -5) Allow opening .rgp files which contain a large number of events -6) HIP kernels that contain calls to other functions now support the Call Targets table in the Instruction timing pane's side panel -7) PIX3 marker support updated for latest version of WinPixEventRuntime -8) Bug/stability fixes - -* Radeon Developer Panel - -1) To support collecting data for thread divergence in the ray tracing shader table, the Profiling workflow user interface has a new "Enable shader instrumentation" checkbox (affects DirectX 12 only for now, requires AMD Software: Adrenalin Edition™ 23.12.1 or newer) +1) Interoperability with the Radeon GPU Analyzer: binary pipelines can now be extracted from a loaded profile data set in RGP and automatically loaded into a new instance of RGA for analysis +2) Rows in Wavefront occupancy pane can now be resized, allowing for additional user customization +3) New "Color by limiting factor" coloring mode in the Wavefront occupancy and Event timing panes. 
This will highlight events whose theoretical occupancy is limited by VGPR usage, LDS usage or thread group dimensions +4) New "Color by context rolls" coloring mode in the Wavefront occupancy and Event timing panes. This will highlight events where a context roll occurred since the previous event +5) Latency visualization in the Instruction timing pane will now show which part of the total latency represents a "pre-issue" stall +6) Fixed issue with incorrect LDS usage reported on RDNA™-based GPUs +7) Bug/stability fixes Known Issues ------------------------------------- * All platforms -1) Radeon Developer Panel can only capture a profile on a single AMD GPU at a time. -2) Radeon Developer Panel cannot capture profiles from non-AMD GPUs. -3) Applications that call Present() from the async compute queue are not supported on pre-RDNA hardware. Incomplete profile data may result on RDNA-based hardware. -4) When using RGP with RenderDoc, please make sure that RenderDoc is terminated between RenderDoc capture sessions (generating a RenderDoc capture file or loading a RenderDoc capture file is considered a session for the purpose here). While it is possible to take multiple RGP profiles of a RenderDoc capture file, it is not possible to take RGP profiles between RenderDoc sessions. If this is attempted, RenderDoc will show an error dialog box indicating that an RGP profile can't be taken and to restart RenderDoc -5) If an instance of Radeon GPU Profiler is spawned from RenderDoc, it must be closed before restarting RenderDoc. The menu option to create new RGP profiles will not be enabled otherwise. -6) OpenCL™ captures may include an extra DMA command buffer in the Profile summary. -7) In some rare cases on RDNA 2 hardware, all counter data may be missing from a captured RGP profile. When this happens, Radeon Developer Panel will prompt the user to recapture. -8) In some rare cases, data for one or more cache counters may be missing. 
Usually, recapturing will allow the missing data to show up. -9) It is recommended to use at least 1080p display resolution (1920 x 1080) with the RGP user interface. Some minor user interface issues may appear when using a lower resolution. -10) Cache and ray tracing counter data collection is not currently supported on RDNA based APUs. -11) For systems consisting of an AMD APU and an AMD discrete GPU, capturing profiles should work, but an error may be logged in the Radeon Developer Panel regarding not being able to set peak clock mode. It is recommended that the GPU in the APU be disabled in the BIOS. +1) When using RGP with RenderDoc, please make sure that RenderDoc is terminated between RenderDoc capture sessions (generating a RenderDoc capture file or loading a RenderDoc capture file is considered a session for the purpose here). While it is possible to take multiple RGP profiles of a RenderDoc capture file, it is not possible to take RGP profiles between RenderDoc sessions. If this is attempted, RenderDoc will show an error dialog box indicating that an RGP profile can't be taken and to restart RenderDoc +2) If an instance of Radeon GPU Profiler is spawned from RenderDoc, it must be closed before restarting RenderDoc. The menu option to create new RGP profiles will not be enabled otherwise. +3) OpenCL™ captures may include an extra DMA command buffer in the Profile summary. +4) It is recommended to use at least 1080p display resolution (1920 x 1080) with the RGP user interface. Some minor user interface issues may appear when using a lower resolution. * Windows® 1) D3D12 command list calls of ExecuteIndirect() may show in RGP as multiple compute events. 2) Some Radeon Software hotkeys may conflict with Radeon GPU Profiler shortcut keys. The Radeon Software hotkeys can be reconfigured by opening the Radeon Software panel (from the system tray), selecting the Hotkeys tab under Settings then changing or unbinding any conflicting hotkeys. 
3) If a DirectX® 12 profile is missing GPU synchronization primitive data (i.e. signals and waits) on the Frame summary pane, please try running the included scripts\AddUserToGroup.bat batch file and then recapturing the profile. This batch file must be run as Administrator. -4) Current versions of the Radeon Developer Panel cannot profile Universal Windows Platform (UWP) applications. Please use Radeon Developer Panel v2.8 to profile a UWP application. * Linux® -1) Installations of Ubuntu 20.04 or newer may have the RADV open source Vulkan® driver installed by default on the system. As a result, after an amdgpu-pro driver install, the default Vulkan ICD may be the RADV ICD. In order to capture a profile, Vulkan applications must be using the amdgpu-pro Vulkan ICD. The default Vulkan ICD can be overridden by setting the following environment variable before launching a Vulkan application: VK_ICD_FILENAMES=/etc/vulkan/icd.d/amd_icd64.json -2) After launching RGP from the Developer Panel to view a captured profile, the panel may fail to connect the next time it is launched. The workaround is to close RGP before relaunching the panel. -3) If the Developer Panel or the Developer Service crash while running with the root account, it may be necessary to restart/exit them again with the root account in order to cleanup shared memory. -4) When running with the root account, the Developer Panel may output error or warning messages to the terminal. These should not prevent the panel from functioning properly. -5) If the RadeonDeveloperServiceCLI application crashes, shared memory may need to be cleaned up by running the remove_shared_memory.sh script located in the script folder of the RGP release kit. Run the script with elevated privileges using sudo. -6) The Radeon Developer Panel may fail to start the Radeon Developer Service when the Connect button is clicked. 
If this occurs, manually start the Radeon Developer Service, select localhost from the the Recent connections list and click the Connect button again. -7) On some RDNA 3 hardware, detailed instruction timing data will not be available even when the user asks Radeon Developer Panel to collect it. +1) Installations of Ubuntu 20.04 or newer may have the RADV open source Vulkan driver installed by default on the system. As a result, after an amdgpu-pro driver install, the default Vulkan ICD may be the RADV ICD. In order to capture a profile, Vulkan applications must be using the amdgpu-pro Vulkan ICD. The default Vulkan ICD can be overridden by setting the following environment variable before launching a Vulkan application: VK_ICD_FILENAMES=/etc/vulkan/icd.d/amd_icd64.json * RDNA 1) The Device configuration does not show the correct Work group processor per Shader engine for certain parts with harvested CUs. +* Radeon Developer Panel + +1) See RDP_RELEASE_NOTES.txt for additional items that may affect profiling. 
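The VK_ICD_FILENAMES override from the Linux known issue above can also be applied to a single application rather than to the whole shell session. The following is a minimal Python sketch, assuming the amdgpu-pro ICD manifest lives at the path quoted in that note; the helper names `vulkan_env_for_amdgpu_pro` and `launch_with_amdgpu_pro` are hypothetical and are not part of any AMD tool.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Manifest path quoted in the Linux known issue; adjust if your driver installs it elsewhere.
AMDGPU_PRO_ICD = "/etc/vulkan/icd.d/amd_icd64.json"


def vulkan_env_for_amdgpu_pro(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with VK_ICD_FILENAMES set.

    The caller's environment is not modified; only the returned copy carries
    the override, so other processes keep using the system default ICD.
    """
    env = dict(os.environ if base is None else base)
    env["VK_ICD_FILENAMES"] = AMDGPU_PRO_ICD
    return env


def launch_with_amdgpu_pro(command: Sequence[str]) -> int:
    """Run a Vulkan application with the override applied and return its exit code."""
    result = subprocess.run(list(command), env=vulkan_env_for_amdgpu_pro(), check=False)
    return result.returncode
```

For example, `launch_with_amdgpu_pro(["./my_vulkan_app"])` would start a (hypothetical) application with the amdgpu-pro ICD selected. Copying the environment instead of mutating `os.environ` keeps the override scoped to the launched process.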
+ Release Notes History ------------------------------------- +V2.0 Changes +------------------------------------- + +* Radeon GPU Profiler + +1) Support for additional AMD RDNA 3 hardware +2) Redesigned Wavefront occupancy user interface, allowing for user customization and improved usage of available screen real estate +3) Dark mode user interface support, allowing the user to choose between a light and dark theme (or have RGP follow the OS theme setting) +4) The ray tracing shader table displayed in the Pipeline state pane can now display data on thread divergence for individual shader functions (DirectX 12 only for now, requires AMD Software: Adrenalin Edition™ 23.12.1 or newer) +5) Allow opening .rgp files which contain a large number of events +6) HIP kernels that contain calls to other functions now support the Call Targets table in the Instruction timing pane's side panel +7) PIX3 marker support updated for latest version of WinPixEventRuntime +8) Bug/stability fixes + +* Radeon Developer Panel + +1) To support collecting data for thread divergence in the ray tracing shader table, the Profiling workflow user interface has a new "Enable shader instrumentation" checkbox (affects DirectX 12 only for now, requires AMD Software: Adrenalin Edition 23.12.1 or newer) + V1.16 Changes ------------------------------------- @@ -95,20 +98,21 @@ V1.15 Changes * Radeon GPU Profiler -1) Support for additional AMD RDNA 3 hardware -2) Newly redesigned ISA disassembly views in the Pipeline state and Instruction timing panes +1) Support for additional AMD RDNA 3 hardware +2) Newly redesigned ISA disassembly views in the Pipeline state and Instruction timing panes - Code blocks can now be collapsed/expanded - Selected token highlighting allows you to quickly see other instances of the selected token (instruction opcodes, registers and constants) - One-click navigation between branch instructions and their targets, along with tracked navigation history - Customize the displayed 
columns - Improved search result highlighting -3) Improved performance in the System activity timeline in the Frame summary pane when opening large profiles -4) Instruction timing side panel will now report the total number of WMMA (wave matrix multiply accumulate) instructions executed by a shader when running on RDNA 3 or newer hardware -5) Pipeline state pane will now report when conservative rasterization is enabled -6) Fixed issues with keyboard selection in the tree view in the Event timing and Pipeline state panes -7) DirectX 12 Mesh shader functions and Vulkan Mesh shader extension functions now are identified properly in RGP's event lists -8) Fixed incorrect tree hierarchy in the Event timing and Pipeline state pane when events are grouped by user events and event filtering is used -9) Bug/stability fixes +3) Improved performance in the System activity timeline in the Frame summary pane when opening large profiles +4) Instruction timing side panel will now report the total number of WMMA (wave matrix multiply accumulate) instructions executed by a shader when running on RDNA 3 or newer hardware +5) Pipeline state pane will now report when conservative rasterization is enabled +6) Fixed issues with keyboard selection in the tree view in the Event timing and Pipeline state panes +7) DirectX 12 Mesh shader functions and Vulkan Mesh shader extension functions now are identified properly in RGP's event lists +8) Fixed incorrect tree hierarchy in the Event timing and Pipeline state pane when events are grouped by user events and event filtering is used +9) Initial support for DirectX 12 Work Graphs +10) Bug/stability fixes V1.14.1 Changes ------------------------------------- diff --git a/docs/source/conf.py b/docs/source/conf.py index da82543..90fa827 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -1,5 +1,7 @@ # -*- coding: utf-8 -*- # +# Copyright (c) 2017-2024 Advanced Micro Devices, Inc. All rights reserved. 
+# # Radeon GPU Profiler documentation build configuration file, created by # sphinx-quickstart on Fri Jun 30 12:01:48 2017. # @@ -46,7 +48,7 @@ # General information about the project. project = u'Radeon GPU Profiler' -copyright = u'2017-2023, Advanced Micro Devices, Inc. All rights reserved.' +copyright = u'2017-2024, Advanced Micro Devices, Inc. All rights reserved.' author = u'AMD Developer Tools' # The version info for the project you're documenting, acts as replacement for @@ -54,9 +56,9 @@ # built documents. # # The short X.Y version. -version = u'2.0.0' +version = u'2.1.0' # The full version, including alpha/beta/rc tags. -release = u'2.0.0' +release = u'2.1.0' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/docs/source/index.rst b/docs/source/index.rst index ffab934..4de802f 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -308,6 +308,13 @@ right-click context menu to jump between panes, the option to "View in context rolls" will only be available if the selected event is currently present in the events table on the context rolls pane. +In the event panes, selecting the "Color by context rolls" option from the +"Color by" drop-down box in the Wavefront occupancy event timeline or in the +Event timing pane highlights all events whose context was rolled since the +previous event. + +.. image:: media_rgp/rgp_context_rolls_4.png + Most expensive events --------------------- @@ -566,8 +573,8 @@ supports zooming: .. |ZoomInRef| image:: media_rgp/rgp_zoom_in.png .. |ZoomOutRef| image:: media_rgp/rgp_zoom_out.png -|ZoomSelectionRef| **Zoom to selection** ----------------------------------------- +|ZoomSelectionRef| Zoom to selection +------------------------------------ When **Zoom to selection** is clicked, the zoom level is increased to a selected region or selected event.
A selection region is set by holding down the left mouse button while the mouse is on the graph and dragging the mouse @@ -580,14 +587,14 @@ to a selected event can be accomplished by simply double clicking the event. Pressing the **Z** shortcut key while holding down the **CTRL** key activates **Zoom to selection** as well. -|ZoomResetRef| **Zoom reset** ------------------------------ +|ZoomResetRef| Zoom reset +------------------------- When **Zoom reset** is clicked, the zoom level is returned to the original level to reveal the entire time span on the graph. The zoom level can also be reset using the **H** shortcut key. -|ZoomInRef| **Zoom in** ------------------------ +|ZoomInRef| Zoom in +------------------- Increases the zoom level incrementally to display a shorter time span on the graph. The zoom level is increased each time this icon is clicked until the maximum zoom level is reached. Alternatively, holding down the **CTRL** key @@ -596,8 +603,8 @@ will also zoom in for a more detailed view. Zooming in can be activated with the **A** shortcut key. To zoom in quickly at a 10x rate, press the **S** shortcut key. -|ZoomOutRef| **Zoom out** -------------------------- +|ZoomOutRef| Zoom out +--------------------- Decreases the zoom level incrementally to display a longer time span on the graph. The zoom level is decreased each time this icon is clicked until the minimum zoom level is reached (i.e. the full available time region). @@ -1046,6 +1053,55 @@ contain a user event hierarchy, nothing will be shown. Events enclosed by user markers are colored in the wavefront occupancy view. They are also visible in the side panel. +.. _rga_rgp_interop: + +Radeon GPU Analyzer and Radeon GPU Profiler interop +=================================================== + +The Radeon GPU Analyzer now supports opening pipeline binary files in its binary analysis mode. +Users can create and open these binary files directly from RGP and view them in RGA. 
To export +a pipeline binary for analysis in RGA right-click an event in any RGP pane that contains events +and select "Analyze pipeline in Radeon GPU Analyzer" in the context menu options. + +.. image:: media_rgp/rgp_analyze_pipeline_in_rga.png + +Some events such as indirect raytracing events can have multiple pipeline binaries. To select +which pipeline binary to analyze in RGA right-click a specific binary in the pipeline state +shader table for ray tracing events and select the context menu option to analyze that pipeline +binary. Alternatively, right-click the event anywhere in RGP and select "Analyze pipeline in +Radeon GPU Analyzer" to open a window to pick from the full list of pipeline binaries in that event. +Pushing the "Analyze selected binaries" button will save and open all binaries that were checked in +the list in RGA. Keep in mind opening a large number of pipeline binaries in RGA may take some time. + +.. image:: media_rgp/rgp_select_multiple_pipeline_binaries_for_rga_export.png + +The location of the Radeon GPU Analyzer executable file as well as the location where pipeline +binaries are saved can be changed in the Radeon GPU Analyzer interop section of the general settings. + +.. image:: media_rgp/rgp_rga_interop_settings.png + +If either the executable file or pipeline binary file path cannot be found an error message will +be displayed next to the corresponding setting. + +.. image:: media_rgp/rgp_rga_interop_settings_invalid.png + +Selecting a pipeline binary for analysis while either of these file paths are invalid will open up +a message prompt to select a valid file path. + +.. image:: media_rgp/rgp_rga_executable_not_found.png + +If the pipeline binary being exported already exists on disk a message prompt will appear asking +if the file should be overwritten or directly opened in RGA without being overwritten. Select the +checkbox in the message prompt to save the selected option for the future and not see this message +again. + +.. 
image:: media_rgp/rgp_pipeline_binary_file_already_exists.png + +This setting can be changed at any time from the Radeon GPU Analyzer interop section of the +general settings. + +.. image:: media_rgp/rgp_overwrite_existing_pipeline_binaries_options.png + RenderDoc & Radeon GPU Profiler interop BETA ============================================ @@ -1225,4 +1281,4 @@ Microsoft is a registered trademark of Microsoft Corporation in the US and other Windows is a registered trademark of Microsoft Corporation in the US and other jurisdictions. -© 2016-2023 Advanced Micro Devices, Inc. All rights reserved. \ No newline at end of file +© 2016-2024 Advanced Micro Devices, Inc. All rights reserved. diff --git a/docs/source/instruction_timing.rst b/docs/source/instruction_timing.rst index 4728c74..667dfe5 100644 --- a/docs/source/instruction_timing.rst +++ b/docs/source/instruction_timing.rst @@ -65,17 +65,18 @@ be seen in the image below. Solid green indicates how much of a given instruction's latency was hidden by VALU work. Solid yellow indicates how much latency was hidden by SALU or SMEM work. A diagonal hatch pattern made up of both -green and yellow indicates how much latency was hidden by both VALU and SALU work. Finally, red indicates -how much latency was not hidden by other work being done on the GPU. It is likely that bars -with large red segments indicate a stall occurring while the shader is executing. When the mouse -hovers over a row in the Latency column, a tooltip appears showing the exact breakdown of that +green and yellow indicates how much latency was hidden by both VALU and SALU work. +Sections with a black diagonal hatch pattern show the portion of the latency that is a pre-issue stall. +Finally, solid red indicates how much latency was not hidden by other work being done on the GPU. +Bars with large red segments likely indicate a stall occurring while the shader is executing.
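To make the latency color encoding concrete, the following Python sketch splits one instruction's latency into the bar segments described in this section. It is not RGP's implementation. It assumes that SALU-hidden and VALU-hidden latency are both measured from the start of the instruction's latency, so their overlap equals the smaller of the two, and that any part of the pre-issue stall not covered by that hidden work is drawn as black hatch directly after it. The function name and segment labels are hypothetical. The printed breakdown reproduces the 845-clock example discussed in this section (197 clocks hidden by SALU work, 453 by VALU work).

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (style, start_clock, end_clock)


def latency_segments(total: int, salu_hidden: int, valu_hidden: int,
                     pre_issue: int = 0) -> List[Segment]:
    """Split an instruction's latency into latency-bar segments (hypothetical model).

    Assumes both hidden windows start at clock 0, so the clocks where SALU
    and VALU work overlap equal the smaller of the two hidden amounts.
    """
    both = min(salu_hidden, valu_hidden)
    hidden = max(salu_hidden, valu_hidden)
    # Whichever unit hid more latency colors the segment past the overlap.
    only_style = "green" if valu_hidden >= salu_hidden else "yellow"
    segments: List[Segment] = []
    if both > 0:
        segments.append(("green/yellow hatch", 0, both))
    if hidden > both:
        segments.append((only_style, both, hidden))
    # The part of the pre-issue stall not covered by SALU/VALU work is black hatch.
    stall_end = max(hidden, min(pre_issue, total))
    if stall_end > hidden:
        segments.append(("black hatch", hidden, stall_end))
    # Whatever remains was not hidden by any other work.
    if total > stall_end:
        segments.append(("red", stall_end, total))
    return segments


print(latency_segments(845, salu_hidden=197, valu_hidden=453))
# [('green/yellow hatch', 0, 197), ('green', 197, 453), ('red', 453, 845)]
```

When the hidden latency covers the whole pre-issue stall, no black-hatch segment is emitted, which matches the pane's description of a completely hidden pre-issue stall.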
+When the mouse hovers over a row in the Latency column, a tooltip appears showing the exact breakdown of that +instruction's latency. -In the image above, the total latency of the instruction is 853 clocks. Of those 853 clocks, 209 clocks -worth of latency are hidden by SALU work on other slots and 554 clocks worth of latency are hidden by -VALU work. The 209 clocks where both SALU and VALU work was being done is shown using the hatch pattern. -The segment between 209 and 554 clocks is shown as green since only VALU work is being done. The segment -between 554 and 853 clocks is shown as red since there is no other work being done. Since there is more +In the image above, the total latency of the instruction is 845 clocks. Of those 845 clocks, 197 clocks +worth of latency are hidden by SALU work on other slots and 453 clocks worth of latency are hidden by +VALU work. The 197 clocks where both SALU and VALU work were being done are shown using the hatch pattern. +The segment between 197 and 453 clocks is shown as green since only VALU work is being done. The segment +between 453 and 845 clocks is shown as red since there is no other work being done. Since there is more VALU work being done at the same time, green is more prevalent than yellow in this bar. Contrast this with the image below, where an instruction is shown where more latency is hidden by SALU @@ -83,6 +84,14 @@ work. In this case, yellow is more prevalent than green. .. image:: media_rgp/rgp_instruction_timing_latency_bars_2.png +When the amount of latency hidden by SALU and VALU work is greater than the pre-issue +stall, no black diagonal hatch pattern is displayed, and the tooltip indicates that the pre-issue +stall is completely hidden. If the amount of latency hidden by SALU and VALU work is less than the +pre-issue stall, the part of the pre-issue stall that extends past the SALU and VALU work is drawn +with the black diagonal hatch pattern, as shown in the image below. + +..
image:: media_rgp/rgp_instruction_timing_latency_bars_3.png + A red indicator will be shown in the vertical scroll bar corresponding to the location of the instruction with the highest latency. This allows you to quickly find the hotspot within the shader. @@ -230,6 +239,13 @@ Compute profile are shown below. .. image:: media_rgp/rgp_instruction_timing_3.png +The pipeline binary of an event can also be exported for analysis in the Radeon GPU Analyzer from the +Instruction timing pane. Open the hamburger drop-down menu shown in the image below and select +"Analyze pipeline in Radeon GPU Analyzer". For indirect ray tracing events, this option saves and +opens the pipeline binary for the currently selected export name. + +.. image:: media_rgp/rgp_instruction_timing_rga_interop.png + More information on some of the features available in the Instruction timing pane can be found under the :ref:`ISA View ` section. diff --git a/docs/source/media_rgp/rdp_open_profile.png b/docs/source/media_rgp/rdp_open_profile.png index 7076e0d..1c0bf00 100644 Binary files a/docs/source/media_rgp/rdp_open_profile.png and b/docs/source/media_rgp/rdp_open_profile.png differ diff --git a/docs/source/media_rgp/rgp_analyze_pipeline_in_rga.png b/docs/source/media_rgp/rgp_analyze_pipeline_in_rga.png new file mode 100644 index 0000000..05b32b6 Binary files /dev/null and b/docs/source/media_rgp/rgp_analyze_pipeline_in_rga.png differ diff --git a/docs/source/media_rgp/rgp_color_theme_drop_down.png b/docs/source/media_rgp/rgp_color_theme_drop_down.png index 213f399..5f03297 100644 Binary files a/docs/source/media_rgp/rgp_color_theme_drop_down.png and b/docs/source/media_rgp/rgp_color_theme_drop_down.png differ diff --git a/docs/source/media_rgp/rgp_context_rolls_3.png b/docs/source/media_rgp/rgp_context_rolls_3.png index a538739..8438059 100644 Binary files a/docs/source/media_rgp/rgp_context_rolls_3.png and b/docs/source/media_rgp/rgp_context_rolls_3.png differ diff --git
a/docs/source/media_rgp/rgp_context_rolls_4.png b/docs/source/media_rgp/rgp_context_rolls_4.png new file mode 100644 index 0000000..71d19e5 Binary files /dev/null and b/docs/source/media_rgp/rgp_context_rolls_4.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_1.png b/docs/source/media_rgp/rgp_instruction_timing_1.png index 7f01490..4cdb088 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_1.png and b/docs/source/media_rgp/rgp_instruction_timing_1.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_2.png b/docs/source/media_rgp/rgp_instruction_timing_2.png index f622583..4423a85 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_2.png and b/docs/source/media_rgp/rgp_instruction_timing_2.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_3.png b/docs/source/media_rgp/rgp_instruction_timing_3.png index a7b6fbb..e8b12c6 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_3.png and b/docs/source/media_rgp/rgp_instruction_timing_3.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_exports.png b/docs/source/media_rgp/rgp_instruction_timing_exports.png index 654828c..c9bb8e0 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_exports.png and b/docs/source/media_rgp/rgp_instruction_timing_exports.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_latency_bars.png b/docs/source/media_rgp/rgp_instruction_timing_latency_bars.png index 061872e..58fc76e 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_latency_bars.png and b/docs/source/media_rgp/rgp_instruction_timing_latency_bars.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_latency_bars_2.png b/docs/source/media_rgp/rgp_instruction_timing_latency_bars_2.png index faca615..65ee0ef 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_latency_bars_2.png and b/docs/source/media_rgp/rgp_instruction_timing_latency_bars_2.png 
differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_latency_bars_3.png b/docs/source/media_rgp/rgp_instruction_timing_latency_bars_3.png new file mode 100644 index 0000000..57b0f23 Binary files /dev/null and b/docs/source/media_rgp/rgp_instruction_timing_latency_bars_3.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_normalization_mode.png b/docs/source/media_rgp/rgp_instruction_timing_normalization_mode.png index 14dd3f1..e4e561e 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_normalization_mode.png and b/docs/source/media_rgp/rgp_instruction_timing_normalization_mode.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_rga_interop.png b/docs/source/media_rgp/rgp_instruction_timing_rga_interop.png new file mode 100644 index 0000000..4b04977 Binary files /dev/null and b/docs/source/media_rgp/rgp_instruction_timing_rga_interop.png differ diff --git a/docs/source/media_rgp/rgp_instruction_timing_wavefront_latencies.png b/docs/source/media_rgp/rgp_instruction_timing_wavefront_latencies.png index c4209a7..013c7be 100644 Binary files a/docs/source/media_rgp/rgp_instruction_timing_wavefront_latencies.png and b/docs/source/media_rgp/rgp_instruction_timing_wavefront_latencies.png differ diff --git a/docs/source/media_rgp/rgp_most_expensive_events_2.png b/docs/source/media_rgp/rgp_most_expensive_events_2.png index 9464a69..a0b0a29 100644 Binary files a/docs/source/media_rgp/rgp_most_expensive_events_2.png and b/docs/source/media_rgp/rgp_most_expensive_events_2.png differ diff --git a/docs/source/media_rgp/rgp_overwrite_existing_pipeline_binaries_options.png b/docs/source/media_rgp/rgp_overwrite_existing_pipeline_binaries_options.png new file mode 100644 index 0000000..45b099a Binary files /dev/null and b/docs/source/media_rgp/rgp_overwrite_existing_pipeline_binaries_options.png differ diff --git a/docs/source/media_rgp/rgp_pipeline_binary_file_already_exists.png 
b/docs/source/media_rgp/rgp_pipeline_binary_file_already_exists.png new file mode 100644 index 0000000..ba0f145 Binary files /dev/null and b/docs/source/media_rgp/rgp_pipeline_binary_file_already_exists.png differ diff --git a/docs/source/media_rgp/rgp_pipeline_state_2.png b/docs/source/media_rgp/rgp_pipeline_state_2.png index f4a24aa..f442050 100644 Binary files a/docs/source/media_rgp/rgp_pipeline_state_2.png and b/docs/source/media_rgp/rgp_pipeline_state_2.png differ diff --git a/docs/source/media_rgp/rgp_pipeline_state_raytracing_5.png b/docs/source/media_rgp/rgp_pipeline_state_raytracing_5.png new file mode 100644 index 0000000..07fcabd Binary files /dev/null and b/docs/source/media_rgp/rgp_pipeline_state_raytracing_5.png differ diff --git a/docs/source/media_rgp/rgp_pipeline_summary_5.png b/docs/source/media_rgp/rgp_pipeline_summary_5.png index 608ca26..87c680a 100644 Binary files a/docs/source/media_rgp/rgp_pipeline_summary_5.png and b/docs/source/media_rgp/rgp_pipeline_summary_5.png differ diff --git a/docs/source/media_rgp/rgp_pipeline_summary_6.png b/docs/source/media_rgp/rgp_pipeline_summary_6.png new file mode 100644 index 0000000..8294b37 Binary files /dev/null and b/docs/source/media_rgp/rgp_pipeline_summary_6.png differ diff --git a/docs/source/media_rgp/rgp_resized_occupancy_views.png b/docs/source/media_rgp/rgp_resized_occupancy_views.png new file mode 100644 index 0000000..05bf04c Binary files /dev/null and b/docs/source/media_rgp/rgp_resized_occupancy_views.png differ diff --git a/docs/source/media_rgp/rgp_rga_executable_not_found.png b/docs/source/media_rgp/rgp_rga_executable_not_found.png new file mode 100644 index 0000000..b624a37 Binary files /dev/null and b/docs/source/media_rgp/rgp_rga_executable_not_found.png differ diff --git a/docs/source/media_rgp/rgp_rga_interop_settings.png b/docs/source/media_rgp/rgp_rga_interop_settings.png new file mode 100644 index 0000000..b34ace0 Binary files /dev/null and 
b/docs/source/media_rgp/rgp_rga_interop_settings.png differ diff --git a/docs/source/media_rgp/rgp_rga_interop_settings_invalid.png b/docs/source/media_rgp/rgp_rga_interop_settings_invalid.png new file mode 100644 index 0000000..103fa3b Binary files /dev/null and b/docs/source/media_rgp/rgp_rga_interop_settings_invalid.png differ diff --git a/docs/source/media_rgp/rgp_select_multiple_pipeline_binaries_for_rga_export.png b/docs/source/media_rgp/rgp_select_multiple_pipeline_binaries_for_rga_export.png new file mode 100644 index 0000000..6fcd7f8 Binary files /dev/null and b/docs/source/media_rgp/rgp_select_multiple_pipeline_binaries_for_rga_export.png differ diff --git a/docs/source/media_rgp/rgp_themes_and_colors_settings.png b/docs/source/media_rgp/rgp_themes_and_colors_settings.png index a70ef0e..b067b98 100644 Binary files a/docs/source/media_rgp/rgp_themes_and_colors_settings.png and b/docs/source/media_rgp/rgp_themes_and_colors_settings.png differ diff --git a/docs/source/media_rgp/rgp_wavefront_occupancy_6.png b/docs/source/media_rgp/rgp_wavefront_occupancy_6.png index dc13e0d..d894db1 100644 Binary files a/docs/source/media_rgp/rgp_wavefront_occupancy_6.png and b/docs/source/media_rgp/rgp_wavefront_occupancy_6.png differ diff --git a/docs/source/pipeline_state.rst b/docs/source/pipeline_state.rst index 8ad131b..0a947cc 100644 --- a/docs/source/pipeline_state.rst +++ b/docs/source/pipeline_state.rst @@ -98,7 +98,10 @@ the table by Export name using the **Filter shaders...** field. If you click on any hyperlinked text in the shader table, it will navigate to the ISA tab and show the ISA for the selected shader function. You can also use the right-click context menu to navigate to either the ISA tab or to the Instruction timing -view. +view. The context menu also allows you to analyze the pipeline binary for that +shader function in the Radeon GPU Analyzer. + +.. 
image:: media_rgp/rgp_pipeline_state_raytracing_5.png If the **Enable shader instrumentation** checkbox was checked in Radeon Developer Panel when the profile was captured, the table will also include diff --git a/docs/source/pipelines.rst b/docs/source/pipelines.rst index 19ca286..c6609ae 100644 --- a/docs/source/pipelines.rst +++ b/docs/source/pipelines.rst @@ -62,6 +62,14 @@ Each entry in the table displays the following information: The **Filter pipelines...** field can be used to filter items in the list by the API PSO hash. The Pipelines table can be sorted by clicking on a column header. +Right-clicking a pipeline in the pipeline summary section displays a context menu giving the +option to "Analyze pipeline in Radeon GPU Analyzer." Selecting the option saves the pipeline +in a binary format and opens the binary file in the Radeon GPU Analyzer. See the section +:ref:`Radeon GPU Analyzer and Radeon GPU Profiler interop` for more +information. + +.. image:: media_rgp/rgp_pipeline_summary_6.png + Below the table, the Bucket ID, API PSO hash and Driver internal pipeline hash for the currently-selected pipeline is displayed. There is also a quick link to view the selected pipeline in the Pipeline state view. This will navigate to the diff --git a/docs/source/wavefront_occupancy.rst b/docs/source/wavefront_occupancy.rst index bd46948..82a615d 100644 --- a/docs/source/wavefront_occupancy.rst +++ b/docs/source/wavefront_occupancy.rst @@ -58,6 +58,9 @@ legend to visualize wavefronts in different ways: wavefronts ran on. This can be useful to visualize the amount of context rolls that occurred. +- **Color by limiting factor.** Shows which factor (VGPR usage, LDS usage, or + thread group dimensions) limits the theoretical occupancy of that shader. + - **Color by shader engine.** Shows which shader engine the wavefronts ran on. 
@@ -79,6 +82,10 @@ legend to visualize wavefronts in different ways: well as wavefronts from shaders with inlined ray tracing will be shown using the specified ray tracing color.
All other waves will be shown as grey. +- **Color by indirect command.** Shows which wavefronts correspond to which + indirect commands of the profile. Each indirect command is assigned a unique + color. All other waves will be shown as grey. + Color modes can be synchronized across the Wavefront occupancy and Event timing panes. To do this, simply hold down the Ctrl key when selecting a mode from any Color by combo box. The selected color mode will be used for the Wavefront @@ -228,6 +235,12 @@ events in different ways: context. This can be useful to visualize the amount of context rolls that occurred. +- **Color by context rolls.** Shows which events had their context rolled + since the previous event. + +- **Color by limiting factor.** Shows the largest limiting factor for the + occupancy for any shader in that event. + - **Color by event.** Will show each event in a unique color. - **Color by pass.** Groups events into different passes depending on @@ -253,6 +266,10 @@ events in different ways: - **Color by ray tracing** will only colorize raytracing events. All other events will be greyed out. +- **Color by indirect command.** Will colorize each event based on which + indirect command the event came from. Events launched from the same + indirect command get the same unique color. All other events will be greyed out. + Beneath the **Color by** combo-box is the **Event filter** combo-box. This allows the user to visualize only certain types of events on the timeline. For example, the user can select to see draws, dispatches, clears, barriers, @@ -416,7 +433,7 @@ Below is a screenshot of what the right-click context menu looks like. .. rubric:: Wavefront occupancy customization The Wavefront occupancy section of RGP is customizable. Users can hide -and reorder the vertical position of views. +and reorder the vertical position of views. Users can also resize the height of the views. 
To hide a view, simply press the X button next to the view.
@@ -449,6 +466,10 @@ reflect its new position. .. image:: media_rgp/rgp_occupancy_view_new_position.png +The views can also be resized by clicking and dragging the bottom of the view. + +.. image:: media_rgp/rgp_resized_occupancy_views.png + The customization of the Wavefront occupancy section is treated like a normal RGP setting and persists upon closing and reopening RGP.