Starting to fix linting errors in markdown files.
dgaliffiAMD committed May 16, 2024
1 parent 3a8b0db commit 758ec64
Showing 75 changed files with 1,719 additions and 870 deletions.
10 changes: 5 additions & 5 deletions .gitlab/issue_templates/example.md
@@ -1,12 +1,12 @@
# Example checklist

- Elaboration
  - [ ] Example concept is described and agreed upon
- Implementation
  - [ ] Example is implemented
- Internal review
  - [ ] Internal code review is done
- External review
  - [ ] Upstreaming PR is opened, external review is done
- Done
  - [ ] Example merged to upstream
12 changes: 7 additions & 5 deletions .gitlab/merge_request_templates/example.md
@@ -1,16 +1,18 @@
## Notes for the reviewer

_The reviewer should acknowledge all these topics._
<insert notes>

## Checklist before merge

- [ ] CMake support is added
  - [ ] Dependencies are copied via `IMPORTED_RUNTIME_ARTIFACTS` if applicable
- [ ] GNU Make support is added (Linux)
- [ ] Visual Studio project is added for VS2017, 2019, 2022 (Windows) (use [the script](https://projects.streamhpc.com/departments/knowledge/employee-handbook/-/wikis/Projects/AMD/Libraries/examples/Adding-Visual-Studio-Projects-to-new-examples#scripts))
  - [ ] DLL dependencies are copied via `<Content Include`
  - [ ] Visual Studio project is added to `ROCm-Examples-vs*.sln` (ROCm)
  - [ ] Visual Studio project is added to `ROCm-Examples-Portable-vs*.sln` (ROCm/CUDA) if applicable
- [ ] Inline code documentation is added
- [ ] README is added according to template
- [ ] Related READMEs, ToC are updated
- [ ] The CI passes for Linux/ROCm, Linux/CUDA, Windows/ROCm, Windows/CUDA.
10 changes: 10 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,10 @@
# Rules relaxed or configured for this repository:
MD013: false            # line-length: no limit enforced
MD024:                  # no-duplicate-heading
  siblings_only: true   # only flag duplicate headings under the same parent
MD026:                  # no-trailing-punctuation in headings
  punctuation: ".,;:!"
MD029:                  # ordered-list item prefix
  style: ordered        # require sequential 1. 2. 3. numbering
MD033: false            # allow inline HTML
MD034: false            # allow bare URLs
MD041: false            # first line need not be a top-level heading
26 changes: 15 additions & 11 deletions AI/MIGraphX/Quantization/README.md
@@ -2,6 +2,7 @@
# MIGraphX - Torch Examples

# Summary

The examples in this subdirectory showcase the functionality for executing quantized models using MIGraphX. The Torch-MIGraphX integration library is used to achieve this, where PyTorch is used to quantize models, and MIGraphX is used to execute them on AMD GPUs.

For more information, refer to the [Torch-MIGraphX](https://github.com/ROCmSoftwarePlatform/torch_migraphx/tree/master) library.
@@ -10,41 +11,44 @@

The quantization workflow consists of two main steps:

- Generate quantization parameters

- Convert relevant operations in the model's computational graph to use the quantized datatype

### Generating quantization parameters

There are three main methods for computing quantization parameters:

- Dynamic Quantization:

  - Model weights are pre-quantized; input/activation quantization parameters are computed dynamically at runtime

- Static Post Training Quantization (PTQ):

  - Quantization parameters are computed via calibration. Calibration involves calculating statistical attributes for relevant model nodes using provided sample input data

- Static Quantization Aware Training (QAT):

  - Quantization parameters are calibrated during the training process

**Note**: All three of these techniques are supported by PyTorch (at least in a prototype form), and so the examples leverage PyTorch's quantization APIs to perform this step.
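
As a minimal, self-contained illustration of the first method, the following sketch applies PyTorch's built-in dynamic quantization API to a toy model; the model and layer selection here are illustrative assumptions, not part of this repository's examples.

```python
import torch

# A toy FP32 model; any module containing Linear layers works similarly.
model_fp32 = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# Weights are quantized ahead of time; input/activation quantization
# parameters are computed dynamically at runtime.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32,
    {torch.nn.Linear},  # layer types to quantize
    dtype=torch.qint8,  # quantized weight datatype
)

out = model_int8(torch.randn(1, 64))
```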

### Converting and executing the quantized model

As of the latest PyTorch release, there is no support for executing quantized models on GPUs directly through the framework. To execute these quantized models, use AMD's graph optimizer, MIGraphX, which is built using the ROCm stack. The [torch_migraphx](https://github.com/ROCmSoftwarePlatform/torch_migraphx) library provides a friendly interface for optimizing PyTorch models using the MIGraphX graph optimizer.

The examples show how to use this library to convert and execute PyTorch quantized models on GPUs using MIGraphX.

## Torch-MIGraphX

Torch-MIGraphX integrates AMD's graph inference engine with the PyTorch ecosystem. It provides a `mgx_module` object that may be invoked in the same manner as any other torch module, but utilizes the MIGraphX inference engine internally.

This library currently supports two paths for lowering:

- FX Tracing: Uses the tracing API provided by the `torch.fx` library.

- Dynamo Backend: Importing torch_migraphx automatically registers the "migraphx" backend, which can be used with the `torch.compile` API (see the sketch below).
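
A minimal sketch of the Dynamo path, assuming torch_migraphx is installed and a ROCm-capable GPU is available; the choice of ResNet-50 is illustrative:

```python
import torch
import torch_migraphx  # noqa: F401 -- the import registers the "migraphx" backend
from torchvision import models

model = models.resnet50().eval().cuda()

# Lower and execute through MIGraphX via the registered torch.compile backend.
mgx_model = torch.compile(model, backend="migraphx")
out = mgx_model(torch.randn(1, 3, 224, 224).cuda())
```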

### Installation instructions

Refer to the [Torch_MIGraphX](https://github.com/ROCmSoftwarePlatform/torch_migraphx/blob/master/README.md) page for Docker and source installation instructions.

157 changes: 82 additions & 75 deletions AI/MIGraphX/Quantization/Running-Quantized-ResNet50-via-MIGraphX.md
@@ -1,102 +1,109 @@
# Running quantized ResNet50 via MIGraphX

## Summary

This example walks through the dynamo Post Training Quantization (PTQ) workflow for running a quantized model using torch_migraphx.

## Prerequisites

- You must follow the installation instructions for the torch_migraphx library in [README.md](README.md) before using this example.

## Steps for running a quantized model using torch_migraphx

1. Use torch.export and quantize_pt2e APIs to perform quantization.

   **Note**: The export API call is considered a prototype feature at the time this tutorial is written. Some call signatures may be modified in the future.

   ```python
   import torch
   from torchvision import models
   from torch._export import capture_pre_autograd_graph
   from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
   ```

   ```python
   import torch_migraphx
   from torch_migraphx.dynamo.quantization import MGXQuantizer

   model_fp32 = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
   input_fp32 = torch.randn(2, 3, 28, 28)

   torch_fp32_out = model_fp32(input_fp32)
   ```

   ```python
   model_export = capture_pre_autograd_graph(model_fp32, (input_fp32, ))
   ```

   Use the pt2e API to prepare, calibrate, and convert the model. Torch-MIGraphX provides a custom Quantizer for performing quantization that is compatible with MIGraphX.

   ```python
   quantizer = MGXQuantizer()
   m = prepare_pt2e(model_export, quantizer)

   # pseudo calibrate
   with torch.no_grad():
       for _ in range(10):
           m(torch.randn(2, 3, 28, 28))

   q_m = convert_pt2e(m)
   torch_qout = q_m(input_fp32)
   ```

2. Lower the quantized model to MIGraphX. This step is the same as lowering any other model using `torch.compile`!

   ```python
   mgx_mod = torch.compile(q_m, backend='migraphx').cuda()
   mgx_out = mgx_mod(input_fp32.cuda())

   print(f"PyTorch FP32 (Gold Value):\n{torch_fp32_out}")
   print(f"PyTorch INT8 (Fake Quantized):\n{torch_qout}")
   print(f"MIGraphX INT8:\n{mgx_out}")
   ```

3. Performance

   Do a quick test to measure the performance gain from using quantization.

   ```python
   import copy
   import torch._dynamo

   # We will use this function to benchmark all modules:
   def benchmark_module(model, inputs, iterations=100):
       model(*inputs)
       torch.cuda.synchronize()

       start_event = torch.cuda.Event(enable_timing=True)
       end_event = torch.cuda.Event(enable_timing=True)

       start_event.record()

       for _ in range(iterations):
           model(*inputs)
       end_event.record()
       torch.cuda.synchronize()

       return start_event.elapsed_time(end_event) / iterations

   # Benchmark MIGraphX INT8
   mgx_int8_time = benchmark_module(mgx_mod, [input_fp32.cuda()])
   torch._dynamo.reset()

   # Benchmark MIGraphX FP32
   mgx_module_fp32 = torch.compile(copy.deepcopy(model_fp32), backend='migraphx').cuda()
   mgx_module_fp32(input_fp32.cuda())
   mgx_fp32_time = benchmark_module(mgx_module_fp32, [input_fp32.cuda()])
   torch._dynamo.reset()

   # Benchmark MIGraphX FP16
   mgx_module_fp16 = torch.compile(copy.deepcopy(model_fp32).half(), backend='migraphx').cuda()
   input_fp16 = input_fp32.cuda().half()
   mgx_module_fp16(input_fp16)
   mgx_fp16_time = benchmark_module(mgx_module_fp16, [input_fp16])

   print(f"{mgx_fp32_time=:0.4f}ms")
   print(f"{mgx_fp16_time=:0.4f}ms")
   print(f"{mgx_int8_time=:0.4f}ms")
   ```

   Note that these performance gains (or lack of gains) will vary depending on the specific hardware in use.
13 changes: 12 additions & 1 deletion Applications/README.md
@@ -1,43 +1,54 @@
# Applications Examples

## Summary

The examples in this subdirectory showcase GPU implementations of models and algorithms from finance, computer science, physics, and other domains, each additionally offering a command line application. The examples are built on Linux for the ROCm (AMD GPU) backend. Some examples additionally support the CUDA (NVIDIA GPU) backend.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
- OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x)

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
  - The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use one of the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment.

#### Using CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/Applications`
- `$ cmake -S . -B build` (on ROCm) or `$ cmake -S . -B build -D GPU_RUNTIME=CUDA` (on CUDA, when supported)
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/Applications`
- `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA, when supported)

### Windows

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all supported HIP runtime examples, open the top-level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for Applications.

For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).