Add linting action for documentation #119

Merged · 8 commits · May 23, 2024
18 changes: 18 additions & 0 deletions .github/workflows/linting.yml
@@ -0,0 +1,18 @@
name: Linting

on:
  push:
    branches:
      - develop
      - main
      - 'release/rocm-rel*'
  pull_request:
    branches:
      - develop
      - main
      - 'release/rocm-rel*'

jobs:
  call-workflow-passing-data:
    name: Documentation
    uses: ROCm/rocm-docs-core/.github/workflows/linting.yml@develop
10 changes: 5 additions & 5 deletions .gitlab/issue_templates/example.md
@@ -1,12 +1,12 @@
# Example checklist

- Elaboration
  - [ ] Example concept is described and agreed upon
- Implementation
  - [ ] Example is implemented
- Internal review
  - [ ] Internal code review is done
- External review
  - [ ] Upstreaming PR is opened, external review is done
- Done
  - [ ] Example merged to upstream
12 changes: 7 additions & 5 deletions .gitlab/merge_request_templates/example.md
@@ -1,16 +1,18 @@
## Notes for the reviewer

_The reviewer should acknowledge all these topics._
<insert notes>

## Checklist before merge

- [ ] CMake support is added
  - [ ] Dependencies are copied via `IMPORTED_RUNTIME_ARTIFACTS` if applicable
- [ ] GNU Make support is added (Linux)
- [ ] Visual Studio project is added for VS2017, 2019, 2022 (Windows) (use [the script](https://projects.streamhpc.com/departments/knowledge/employee-handbook/-/wikis/Projects/AMD/Libraries/examples/Adding-Visual-Studio-Projects-to-new-examples#scripts))
  - [ ] DLL dependencies are copied via `<Content Include`
  - [ ] Visual Studio project is added to `ROCm-Examples-vs*.sln` (ROCm)
  - [ ] Visual Studio project is added to `ROCm-Examples-Portable-vs*.sln` (ROCm/CUDA) if applicable
- [ ] Inline code documentation is added
- [ ] README is added according to template
  - [ ] Related READMEs, ToC are updated
- [ ] The CI passes for Linux/ROCm, Linux/CUDA, Windows/ROCm, Windows/CUDA.
10 changes: 10 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,10 @@
MD013: false
MD024:
siblings_only: true
MD026:
punctuation: ".,;:!"
MD029:
style: ordered
MD033: false
MD034: false
MD041: false
26 changes: 15 additions & 11 deletions AI/MIGraphX/Quantization/README.md
@@ -2,6 +2,7 @@
# MIGraphX - Torch Examples

# Summary

The examples in this subdirectory showcase the functionality for executing quantized models using MIGraphX. The Torch-MIGraphX integration library is used to achieve this, where PyTorch is used to quantize models, and MIGraphX is used to execute them on AMD GPUs.

For more information, refer to the [Torch-MIGraphX](https://github.com/ROCmSoftwarePlatform/torch_migraphx/tree/master) library.
@@ -10,41 +11,44 @@

The quantization workflow consists of two main steps:

- Generate quantization parameters

- Convert relevant operations in the model's computational graph to use the quantized datatype

### Generating quantization parameters

There are three main methods for computing quantization parameters:

- Dynamic Quantization:

  - Model weights are pre-quantized; input/activation quantization parameters are computed dynamically at runtime

- Static Post Training Quantization (PTQ):

  - Quantization parameters are computed via calibration. Calibration involves calculating statistical attributes for relevant model nodes using provided sample input data

- Static Quantization Aware Training (QAT):

  - Quantization parameters are calibrated during the training process

**Note**: All three of these techniques are supported by PyTorch (at least in a prototype form), and so the examples leverage PyTorch's quantization APIs to perform this step.
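
As a minimal illustration of the first method, the sketch below applies PyTorch's eager-mode dynamic quantization API to a toy model. This snippet is an illustrative addition, not part of the ROCm examples, which use the pt2e workflow shown in the ResNet50 tutorial:

```python
import torch

# A toy two-layer model, used only to illustrate dynamic quantization.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# Linear weights are quantized ahead of time; input/activation quantization
# parameters are computed dynamically at runtime.
dq_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

out = dq_model(torch.randn(1, 64))
```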

### Converting and executing the quantized model

As of the latest PyTorch release, there is no support for executing quantized models on GPUs directly through the framework. To execute these quantized models, use AMD's graph optimizer, MIGraphX, which is built using the ROCm stack. The [torch_migraphx](https://github.com/ROCmSoftwarePlatform/torch_migraphx) library provides a friendly interface for optimizing PyTorch models using the MIGraphX graph optimizer.

The examples show how to use this library to convert and execute PyTorch quantized models on GPUs using MIGraphX.

## Torch-MIGraphX

Torch-MIGraphX integrates AMD's graph inference engine with the PyTorch ecosystem. It provides a `mgx_module` object that may be invoked in the same manner as any other torch module, but utilizes the MIGraphX inference engine internally.

This library currently supports two paths for lowering:

- FX Tracing: Uses tracing API provided by the `torch.fx` library.

- Dynamo Backend: Importing torch_migraphx automatically registers the "migraphx" backend that can be used with the `torch.compile` API.
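
As a minimal sketch of the Dynamo path (using the torchvision ResNet50 from the quantization tutorial as a stand-in model, and assuming a working ROCm installation):

```python
import torch
from torchvision import models

import torch_migraphx  # importing this registers the "migraphx" backend

model = models.resnet50().eval().cuda()
x = torch.randn(1, 3, 224, 224).cuda()

# torch.compile picks up the backend registered by torch_migraphx above.
mgx_model = torch.compile(model, backend="migraphx")
out = mgx_model(x)
```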

### Installation instructions

Refer to the [Torch_MIGraphX](https://github.com/ROCmSoftwarePlatform/torch_migraphx/blob/master/README.md) page for Docker and source installation instructions.

157 changes: 82 additions & 75 deletions AI/MIGraphX/Quantization/Running-Quantized-ResNet50-via-MIGraphX.md
@@ -1,102 +1,109 @@


# Running quantized ResNet50 via MIGraphX

## Summary

This example walks through the dynamo Post Training Quantization (PTQ) workflow for running a quantized model using torch_migraphx.

## Prerequisites

- You must follow the installation instructions for the torch_migraphx library in [README.md](README.md) before using this example.

## Steps for running a quantized model using torch_migraphx

1. Use the `torch.export` and `quantize_pt2e` APIs to perform quantization.

**Note**: The export API call is considered a prototype feature at the time this tutorial is written. Some call signatures may be modified in the future.

```python
import torch
from torchvision import models
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
```

```python
import torch_migraphx
from torch_migraphx.dynamo.quantization import MGXQuantizer

model_fp32 = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
input_fp32 = torch.randn(2, 3, 28, 28)

torch_fp32_out = model_fp32(input_fp32)
```

```python
model_export = capture_pre_autograd_graph(model_fp32, (input_fp32, ))
```

Use the pt2e API to prepare, calibrate, and convert the model. Torch-MIGraphX provides a custom Quantizer for performing quantization that is compatible with MIGraphX.

```python
quantizer = MGXQuantizer()
m = prepare_pt2e(model_export, quantizer)

# pseudo-calibrate
with torch.no_grad():
    for _ in range(10):
        m(torch.randn(2, 3, 28, 28))

q_m = convert_pt2e(m)
torch_qout = q_m(input_fp32)
```

2. Lower the quantized model to MIGraphX. This step is the same as lowering any other model using `torch.compile`!

```python
mgx_mod = torch.compile(q_m, backend='migraphx').cuda()
mgx_out = mgx_mod(input_fp32.cuda())

print(f"PyTorch FP32 (Gold Value):\n{torch_fp32_out}")
print(f"PyTorch INT8 (Fake Quantized):\n{torch_qout}")
print(f"MIGraphX INT8:\n{mgx_out}")
```

3. Performance

Do a quick test to measure the performance gain from using quantization.

```python
import copy
import torch._dynamo

# We will use this function to benchmark all modules:
def benchmark_module(model, inputs, iterations=100):
    model(*inputs)
    torch.cuda.synchronize()

    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)

    start_event.record()

    for _ in range(iterations):
        model(*inputs)
    end_event.record()
    torch.cuda.synchronize()

    return start_event.elapsed_time(end_event) / iterations

# Benchmark MIGraphX INT8
mgx_int8_time = benchmark_module(mgx_mod, [input_fp32.cuda()])
torch._dynamo.reset()

# Benchmark MIGraphX FP32
mgx_module_fp32 = torch.compile(copy.deepcopy(model_fp32), backend='migraphx').cuda()
mgx_module_fp32(input_fp32.cuda())
mgx_fp32_time = benchmark_module(mgx_module_fp32, [input_fp32.cuda()])
torch._dynamo.reset()

# Benchmark MIGraphX FP16
mgx_module_fp16 = torch.compile(copy.deepcopy(model_fp32).half(), backend='migraphx').cuda()
input_fp16 = input_fp32.cuda().half()
mgx_module_fp16(input_fp16)
mgx_fp16_time = benchmark_module(mgx_module_fp16, [input_fp16])

print(f"{mgx_fp32_time=:0.4f}ms")
print(f"{mgx_fp16_time=:0.4f}ms")
print(f"{mgx_int8_time=:0.4f}ms")
```

Note that these performance gains (or lack of gains) will vary depending on the specific hardware in use.
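
Beyond eyeballing the printed outputs, a rough numerical check that the quantized results track the FP32 reference can be useful. The snippet below is an optional addition to the tutorial, assuming the `torch_fp32_out`, `torch_qout`, and `mgx_out` tensors from the earlier steps are still in scope:

```python
import torch.nn.functional as F

# Cosine similarity near 1.0 indicates the INT8 outputs closely track
# the FP32 reference.
cos_fake_quant = F.cosine_similarity(
    torch_fp32_out.flatten(), torch_qout.flatten(), dim=0
)
cos_mgx = F.cosine_similarity(
    torch_fp32_out.flatten().cuda(), mgx_out.flatten(), dim=0
)

print(f"Fake-quantized vs FP32: {cos_fake_quant.item():0.4f}")
print(f"MIGraphX INT8 vs FP32: {cos_mgx.item():0.4f}")
```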
13 changes: 12 additions & 1 deletion Applications/README.md
@@ -1,43 +1,54 @@
# Applications Examples

## Summary

The examples in this subdirectory showcase GPU implementations of models and algorithms from finance, computer science, physics, and other domains, each additionally offering a command line application. The examples are built on Linux for the ROCm (AMD GPU) backend. Some examples additionally support the CUDA (NVIDIA GPU) backend.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
- OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x)

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
  - The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use one of the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment.

#### Using CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/Applications`
- `$ cmake -S . -B build` (on ROCm) or `$ cmake -S . -B build -D GPU_RUNTIME=CUDA` (on CUDA, when supported)
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/Applications`
- `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA, when supported)

### Windows

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all supported HIP runtime examples, open the top-level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for Applications.

For more detailed build instructions, refer to the top-level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `Applications` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).