Add linting action for documentation #119

Merged · 8 commits · May 23, 2024
18 changes: 18 additions & 0 deletions .github/workflows/linting.yml
@@ -0,0 +1,18 @@
name: Linting

on:
  push:
    branches:
      - develop
      - main
      - 'release/rocm-rel*'
  pull_request:
    branches:
      - develop
      - main
      - 'release/rocm-rel*'

jobs:
  call-workflow-passing-data:
    name: Documentation
    uses: ROCm/rocm-docs-core/.github/workflows/linting.yml@develop
10 changes: 5 additions & 5 deletions .gitlab/issue_templates/example.md
@@ -1,12 +1,12 @@
# Example checklist

- Elaboration
  - [ ] Example concept is described and agreed upon
- Implementation
  - [ ] Example is implemented
- Internal review
  - [ ] Internal code review is done
- External review
  - [ ] Upstreaming PR is opened, external review is done
- Done
  - [ ] Example merged to upstream
12 changes: 7 additions & 5 deletions .gitlab/merge_request_templates/example.md
@@ -1,16 +1,18 @@
## Notes for the reviewer

_The reviewer should acknowledge all these topics._
<insert notes>

## Checklist before merge

- [ ] CMake support is added
  - [ ] Dependencies are copied via `IMPORTED_RUNTIME_ARTIFACTS` if applicable
- [ ] GNU Make support is added (Linux)
- [ ] Visual Studio project is added for VS2017, 2019, 2022 (Windows) (use [the script](https://projects.streamhpc.com/departments/knowledge/employee-handbook/-/wikis/Projects/AMD/Libraries/examples/Adding-Visual-Studio-Projects-to-new-examples#scripts))
  - [ ] DLL dependencies are copied via `<Content Include`
  - [ ] Visual Studio project is added to `ROCm-Examples-vs*.sln` (ROCm)
  - [ ] Visual Studio project is added to `ROCm-Examples-Portable-vs*.sln` (ROCm/CUDA) if applicable
- [ ] Inline code documentation is added
- [ ] README is added according to template
  - [ ] Related READMEs, ToC are updated
- [ ] The CI passes for Linux/ROCm, Linux/CUDA, Windows/ROCm, Windows/CUDA.
10 changes: 10 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,10 @@
MD013: false
MD024:
siblings_only: true
MD026:
punctuation: ".,;:!"
MD029:
style: ordered
MD033: false
MD034: false
MD041: false
26 changes: 15 additions & 11 deletions AI/MIGraphX/Quantization/README.md
@@ -2,6 +2,7 @@
# MIGraphX - Torch Examples

# Summary

The examples in this subdirectory showcase the functionality for executing quantized models using MIGraphX. The Torch-MIGraphX integration library is used to achieve this, where PyTorch is used to quantize models, and MIGraphX is used to execute them on AMD GPUs.

For more information, refer to the [Torch-MIGraphX](https://github.com/ROCmSoftwarePlatform/torch_migraphx/tree/master) library.
@@ -10,41 +11,44 @@

The quantization workflow consists of two main steps:

- Generate quantization parameters

- Convert relevant operations in the model's computational graph to use the quantized datatype

### Generating quantization parameters

There are three main methods for computing quantization parameters:

- Dynamic Quantization:

  - Model weights are pre-quantized; input/activation quantization parameters are computed dynamically at runtime

- Static Post Training Quantization (PTQ):

  - Quantization parameters are computed via calibration. Calibration involves calculating statistical attributes for relevant model nodes using provided sample input data

- Static Quantization Aware Training (QAT):

  - Quantization parameters are calibrated during the training process

**Note**: All three of these techniques are supported by PyTorch (at least in a prototype form), and so the examples leverage PyTorch's quantization APIs to perform this step.
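
As a minimal illustration of the first method, the sketch below applies PyTorch's eager-mode dynamic quantization API to a toy model. This snippet is an illustrative addition, not part of the ROCm examples, which use the pt2e workflow shown in the ResNet50 tutorial:

```python
import torch

# A toy two-layer model, used only to illustrate dynamic quantization.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# Linear weights are quantized ahead of time; input/activation quantization
# parameters are computed dynamically at runtime.
dq_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

out = dq_model(torch.randn(1, 64))
```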

### Converting and executing the quantized model

As of the latest PyTorch release, there is no support for executing quantized models on GPUs directly through the framework. To execute these quantized models, use AMD's graph optimizer, MIGraphX, which is built using the ROCm stack. The [torch_migraphx](https://github.com/ROCmSoftwarePlatform/torch_migraphx) library provides a friendly interface for optimizing PyTorch models using the MIGraphX graph optimizer.

The examples show how to use this library to convert and execute PyTorch quantized models on GPUs using MIGraphX.

## Torch-MIGraphX

Torch-MIGraphX integrates AMD's graph inference engine with the PyTorch ecosystem. It provides a `mgx_module` object that may be invoked in the same manner as any other torch module, but utilizes the MIGraphX inference engine internally.

This library currently supports two paths for lowering:

- FX Tracing: Uses tracing API provided by the `torch.fx` library.

- Dynamo Backend: Importing torch_migraphx automatically registers the "migraphx" backend that can be used with the `torch.compile` API.
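
As a minimal sketch of the Dynamo path (using the torchvision ResNet50 from the quantization tutorial as a stand-in model, and assuming a working ROCm installation):

```python
import torch
from torchvision import models

import torch_migraphx  # importing this registers the "migraphx" backend

model = models.resnet50().eval().cuda()
x = torch.randn(1, 3, 224, 224).cuda()

# torch.compile picks up the backend registered by torch_migraphx above.
mgx_model = torch.compile(model, backend="migraphx")
out = mgx_model(x)
```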

### Installation instructions

Refer to the [Torch_MIGraphX](https://github.com/ROCmSoftwarePlatform/torch_migraphx/blob/master/README.md) page for Docker and source installation instructions.

157 changes: 82 additions & 75 deletions AI/MIGraphX/Quantization/Running-Quantized-ResNet50-via-MIGraphX.md
@@ -1,102 +1,109 @@


# Running quantized ResNet50 via MIGraphX

## Summary

This example walks through the dynamo Post Training Quantization (PTQ) workflow for running a quantized model using torch_migraphx.

## Prerequisites

- You must follow the installation instructions for the torch_migraphx library in [README.md](README.md) before using this example.

## Steps for running a quantized model using torch_migraphx

1. Use the `torch.export` and `quantize_pt2e` APIs to perform quantization.

**Note**: The export API call is considered a prototype feature at the time this tutorial is written. Some call signatures may be modified in the future.

```python
import torch
from torchvision import models
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
```

```python
import torch_migraphx
from torch_migraphx.dynamo.quantization import MGXQuantizer

model_fp32 = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
input_fp32 = torch.randn(2, 3, 28, 28)

torch_fp32_out = model_fp32(input_fp32)
```

```python
model_export = capture_pre_autograd_graph(model_fp32, (input_fp32, ))
```

Use the pt2e API to prepare, calibrate, and convert the model. Torch-MIGraphX provides a custom Quantizer for performing quantization that is compatible with MIGraphX.

```python
quantizer = MGXQuantizer()
m = prepare_pt2e(model_export, quantizer)

# pseudo-calibrate
with torch.no_grad():
    for _ in range(10):
        m(torch.randn(2, 3, 28, 28))

q_m = convert_pt2e(m)
torch_qout = q_m(input_fp32)
```

2. Lower the quantized model to MIGraphX. This step is the same as lowering any other model using `torch.compile`!

```python
mgx_mod = torch.compile(q_m, backend='migraphx').cuda()
mgx_out = mgx_mod(input_fp32.cuda())

print(f"PyTorch FP32 (Gold Value):\n{torch_fp32_out}")
print(f"PyTorch INT8 (Fake Quantized):\n{torch_qout}")
print(f"MIGraphX INT8:\n{mgx_out}")
```

3. Performance

Do a quick test to measure the performance gain from using quantization.

```python
import copy
import torch._dynamo

# We will use this function to benchmark all modules:
def benchmark_module(model, inputs, iterations=100):
    model(*inputs)
    torch.cuda.synchronize()

    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)

    start_event.record()

    for _ in range(iterations):
        model(*inputs)
    end_event.record()
    torch.cuda.synchronize()

    return start_event.elapsed_time(end_event) / iterations

# Benchmark MIGraphX INT8
mgx_int8_time = benchmark_module(mgx_mod, [input_fp32.cuda()])
torch._dynamo.reset()

# Benchmark MIGraphX FP32
mgx_module_fp32 = torch.compile(copy.deepcopy(model_fp32), backend='migraphx').cuda()
mgx_module_fp32(input_fp32.cuda())
mgx_fp32_time = benchmark_module(mgx_module_fp32, [input_fp32.cuda()])
torch._dynamo.reset()

# Benchmark MIGraphX FP16
mgx_module_fp16 = torch.compile(copy.deepcopy(model_fp32).half(), backend='migraphx').cuda()
input_fp16 = input_fp32.cuda().half()
mgx_module_fp16(input_fp16)
mgx_fp16_time = benchmark_module(mgx_module_fp16, [input_fp16])

print(f"{mgx_fp32_time=:0.4f}ms")
print(f"{mgx_fp16_time=:0.4f}ms")
print(f"{mgx_int8_time=:0.4f}ms")
```

Note that these performance gains (or lack of gains) will vary depending on the specific hardware in use.
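
Beyond eyeballing the printed outputs, a rough numerical check that the quantized results track the FP32 reference can be useful. The snippet below is an optional addition to the tutorial, assuming the `torch_fp32_out`, `torch_qout`, and `mgx_out` tensors from the earlier steps are still in scope:

```python
import torch.nn.functional as F

# Cosine similarity near 1.0 indicates the INT8 outputs closely track
# the FP32 reference.
cos_fake_quant = F.cosine_similarity(
    torch_fp32_out.flatten(), torch_qout.flatten(), dim=0
)
cos_mgx = F.cosine_similarity(
    torch_fp32_out.flatten().cuda(), mgx_out.flatten(), dim=0
)

print(f"Fake-quantized vs FP32: {cos_fake_quant.item():0.4f}")
print(f"MIGraphX INT8 vs FP32: {cos_mgx.item():0.4f}")
```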
13 changes: 12 additions & 1 deletion Applications/README.md
@@ -1,43 +1,54 @@
# Applications Examples

## Summary

The examples in this subdirectory showcase GPU implementations of models and algorithms from finance, computer science, physics, and other domains, each additionally offering a command line application. The examples are built on Linux for the ROCm (AMD GPU) backend. Some examples additionally support the CUDA (NVIDIA GPU) backend.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
- OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x)

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
  - The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use one of the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment.

#### Using CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/Applications`
- `$ cmake -S . -B build` (on ROCm) or `$ cmake -S . -B build -D GPU_RUNTIME=CUDA` (on CUDA, when supported)
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/Applications`
- `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA, when supported)

### Windows

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all supported HIP runtime examples, open the top-level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for Applications.

For more detailed build instructions, refer to the top-level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).