Fixed more linting errors.
Mostly in rocPRIM, rocRAND, rocSOLVER, and rocSPARSE.
dgaliffiAMD committed May 16, 2024
1 parent 758ec64 commit f615ce2
Showing 37 changed files with 940 additions and 527 deletions.
2 changes: 1 addition & 1 deletion Applications/prefix_sum/README.md
@@ -18,7 +18,7 @@ The algorithm used has two phases which are repeated:
Below is an example where the threads per block is 2.
In the first iteration ($\text{offset}=1$) we have 4 threads combining 8 items.

![](prefix_sum_diagram.svg)
![prefix_sum_diagram.svg](prefix_sum_diagram.svg)

### Application flow

1 change: 1 addition & 0 deletions HIP-Basic/runtime_compilation/README.md
@@ -9,6 +9,7 @@ This example showcases how to make use of hipRTC to compile in runtime a kernel
### Application flow

The diagram below summarizes the runtime compilation part of the example.

1. A number of variables are declared and defined to configure the program that will be compiled at runtime.
2. The program is created using the above variables as parameters, along with the SAXPY kernel in string form.
3. The properties of the first device (GPU) available are consulted to set the device architecture as (the only) compile option.
19 changes: 15 additions & 4 deletions Libraries/rocPRIM/README.md
@@ -1,47 +1,58 @@
# rocPRIM Examples

## Summary

The examples in this subdirectory showcase the functionality of the [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM) library. The examples build on both Linux and Windows for the ROCm (AMD GPU) backend.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
    - OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x)
- [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM)
    - `rocPRIM-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html).

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
    - The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM)
    - Installed as part of the ROCm SDK on Windows for ROCm platform.
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use one of the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment.

#### Using CMake

All examples in the `rocPRIM` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/rocPRIM`
- `$ cmake -S . -B build`
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/rocPRIM`
- `$ make`

### Windows

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all examples for rocPRIM open the top level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for rocPRIM.

For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `rocPRIM` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).
10 changes: 8 additions & 2 deletions Libraries/rocPRIM/block_sum/README.md
@@ -1,9 +1,11 @@
# rocPRIM Block Sum Example

## Description

This simple program showcases the usage of the `rocprim::block_reduce` block-level function. It also showcases the usage of the `rocprim::block_load` block-level load function. The results from `rocprim::block_load` are eventually used by `rocprim::block_reduce`. The final results of the block-level reductions are written to the standard output.

### Application flow

1. Host side data is instantiated in a `std::vector<int>`.
2. Device storage for input and output data is allocated using `hipMalloc`.
3. Input data is copied from the host to the device using `hipMemcpy`.
@@ -15,18 +17,22 @@ This simple program showcases the usage of the `rocprim::block_reduce` block-lev
6. All device memory is freed using `hipFree`.

## Key APIs and Concepts

- rocPRIM provides HIP parallel primitives on multiple levels of the GPU programming model. This example showcases `rocprim::block_reduce` which is a GPU block-level function.
- The `rocprim::block_reduce` template function performs a reduction, i.e. it combines a vector of values to a single value using the provided binary operator. Since the order of execution is not determined, the provided operator must be associative. In the example, an addition (`rocprim::plus<int>`) is used, which fulfils this property.
- `rocprim::block_reduce` is a collective operation, which means all threads in the block must make a call to `rocprim::block_reduce`.
- In this example `rocprim::block_load` is used to pre-fetch (load) the global input data. It has the potential to increase performance since data is efficiently loaded into per-thread local register space.
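
The associativity requirement above can be illustrated with a minimal host-only sketch. It is not rocPRIM code and does not reflect how `rocprim::block_reduce` is implemented; the names `block_reduce_host` and `block_sum_host` are hypothetical. Each "block" of items is combined pairwise in a tree-shaped order, so the grouping of operations differs from a left-to-right loop, which is only valid for an associative operator.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side illustration (not part of rocPRIM): reduces every
// consecutive group of block_size items to one value, combining neighbours
// pairwise so that the grouping of operations is tree-shaped.
template<typename T, typename BinaryOp>
std::vector<T> block_reduce_host(const std::vector<T>& input, std::size_t block_size, BinaryOp op)
{
    assert(block_size > 0 && input.size() % block_size == 0);
    std::vector<T> results;
    for(std::size_t first = 0; first < input.size(); first += block_size)
    {
        std::vector<T> scratch(input.begin() + first, input.begin() + first + block_size);
        for(std::size_t active = block_size; active > 1; active = (active + 1) / 2)
        {
            // Combine adjacent pairs; operand order is preserved, only grouping changes.
            for(std::size_t i = 0; i < active / 2; ++i)
            {
                scratch[i] = op(scratch[2 * i], scratch[2 * i + 1]);
            }
            // An unpaired last item is carried over to the next round unchanged.
            if(active % 2 == 1)
            {
                scratch[active / 2] = scratch[active - 1];
            }
        }
        results.push_back(scratch[0]);
    }
    return results;
}

// Convenience wrapper: per-block sums using integer addition, which is associative.
inline std::vector<int> block_sum_host(const std::vector<int>& input, std::size_t block_size)
{
    return block_reduce_host(input, block_size, std::plus<int>());
}
```

Calling `block_sum_host` on the eight values 1 to 8 with a block size of 4 yields one sum per block. Replacing the operator with a non-associative one such as subtraction would make the result depend on the grouping, which is why the library requires associativity.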

## Used API surface

### rocPRIM

- `rocprim::block_reduce`
- `rocprim::plus`
- `rocprim::block_load`

### HIP runtime

- `hipGetErrorString`
- `hipMalloc`
- `hipMemcpy`
8 changes: 7 additions & 1 deletion Libraries/rocPRIM/device_sum/README.md
@@ -1,9 +1,11 @@
# rocPRIM Device Sum Example

## Description

This simple program showcases the usage of the device function `rocprim::reduce`.

### Application flow

1. Input data is instantiated in a `std::vector<int>` and the values are printed to the standard output.
2. Device storage for input and output data is allocated using `hipMalloc`.
3. Input data is copied from the host to the device using `hipMemcpy`.
@@ -15,16 +17,20 @@ This simple program showcases the usage of the device function `rocprim::reduce`
9. All device memory is freed using `hipFree`.

## Key APIs and Concepts

- rocPRIM provides HIP parallel primitives on multiple levels of the GPU programming model. This example showcases `rocprim::reduce`, which is a device function and can therefore be called from host code.
- The `rocprim::reduce` template function performs a generalized reduction, i.e. it combines a vector of values to a single value using the provided binary operator. Since the order of execution is not determined, the provided operator must be associative. In the example, an addition (`rocprim::plus<int>`) is used which fulfils this property.
- The device functions of `rocPRIM` require a temporary device memory location to store the results of intermediate calculations. The required amount of temporary storage can be queried by invoking the function with the same arguments, except that the first argument, `temporary_storage`, must be `nullptr`. In this case, the GPU kernel is not launched.
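
The two-call convention described above (first query the size, then run) can be sketched on the host without a GPU. The function `reduce_two_phase` below is a hypothetical stand-in written for illustration only; it is not `rocprim::reduce`, and its chunk size and storage layout are arbitrary assumptions. Only the calling pattern mirrors the description: a `nullptr` storage pointer means "report the required size and do no work".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for a function that needs caller-provided
// temporary storage. With temporary_storage == nullptr it only writes the
// required size to storage_size; otherwise it performs the reduction.
inline void reduce_two_phase(void*        temporary_storage,
                             std::size_t& storage_size,
                             const int*   input,
                             int*         output,
                             std::size_t  size)
{
    constexpr std::size_t chunk        = 4; // arbitrary chunk size for this sketch
    const std::size_t     num_partials = size == 0 ? 1 : (size + chunk - 1) / chunk;

    if(temporary_storage == nullptr)
    {
        // Size query: report the requirement and return without computing anything.
        storage_size = num_partials * sizeof(int);
        return;
    }

    assert(storage_size >= num_partials * sizeof(int));
    int* partials = static_cast<int*>(temporary_storage);

    // Stage 1: one partial sum per chunk, stored in the temporary storage.
    for(std::size_t p = 0; p < num_partials; ++p)
    {
        partials[p] = 0;
        for(std::size_t i = p * chunk; i < std::min(size, (p + 1) * chunk); ++i)
        {
            partials[p] += input[i];
        }
    }

    // Stage 2: combine the partial sums into the final result.
    int total = 0;
    for(std::size_t p = 0; p < num_partials; ++p)
    {
        total += partials[p];
    }
    *output = total;
}

// Typical usage: query the size, allocate, then call again with real storage.
inline int sum_with_two_calls(const std::vector<int>& input)
{
    std::size_t storage_size = 0;
    int         result       = 0;
    reduce_two_phase(nullptr, storage_size, input.data(), &result, input.size());

    std::vector<unsigned char> temporary_storage(storage_size);
    reduce_two_phase(temporary_storage.data(), storage_size, input.data(), &result, input.size());
    return result;
}
```

In the actual example the storage would be allocated with `hipMalloc` and released with `hipFree`; the `std::vector` here only plays that role on the host.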

## Demonstrated API Calls

### rocPRIM

- `rocprim::reduce`
- `rocprim::plus`

### HIP runtime

- `hipMalloc`
- `hipMemcpy`
- `hipFree`
23 changes: 17 additions & 6 deletions Libraries/rocRAND/README.md
@@ -1,49 +1,60 @@
# rocRAND Examples

## Summary

The examples in this subdirectory showcase the functionality of the [rocRAND](https://github.com/rocmSoftwarePlatform/rocRAND) library. The examples build on both Linux and Windows for both the ROCm (AMD GPU) and CUDA (NVIDIA GPU) backend.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
    - OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x) OR the HIP Nvidia runtime (on the CUDA platform)
- [rocRAND](https://github.com/rocmSoftwarePlatform/rocRAND)
    - ROCm platform: `rocrand-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html).
    - CUDA platform: Install rocRAND from source: [instructions](https://github.com/rocmSoftwarePlatform/rocRAND#build-and-install).

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
    - The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [rocRAND](https://github.com/rocmSoftwarePlatform/rocRAND)
    - ROCm platform: Installed as part of the ROCm SDK on Windows.
    - CUDA platform: Install rocRAND from source: [instructions](https://github.com/rocmSoftwarePlatform/rocRAND#build-and-install).
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment set up specifically for the example suite.

#### Using CMake

All examples in the `rocRAND` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/rocRAND`
- `$ cmake -S . -B build` (on ROCm) or `$ cmake -S . -B build -D GPU_RUNTIME=CUDA` (on CUDA)
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/rocRAND`
- `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA)

### Windows

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all examples for rocRAND open the top level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for rocRAND.

For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `rocRAND` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).
12 changes: 11 additions & 1 deletion Libraries/rocRAND/simple_distributions_cpp/README.md
@@ -1,28 +1,36 @@
# rocRAND Simple Distributions Example (C++)

## Description

This sample illustrates the usage of the rocRAND random number generator library via the host-side C++ API. The usage of the random engines and random distributions offered by rocRAND is showcased. The usage, results and execution time of each algorithm provided by rocRAND are compared to the corresponding standard library equivalent.

### Application flow

1. The example command line application takes optional arguments: the used device index, the random distribution type, the element count of the generated random vector and whether the generated vectors should be printed to the standard output.
2. The arguments are parsed in `parse_args` and the result is printed to the standard output. If the parsing fails due to e.g. malformed input, an exception is raised, the correct usage is printed and the program returns with an error code.
3. The utilized device (GPU) is selected in `set_device`. If the selected device does not exist, an error message is printed to the standard error output and the program returns with an error code. Otherwise the name of the selected device is printed to the standard output.
4. The host and device distribution types are selected in `dispatch_distribution_type` based on the provided command line arguments.
5. Two vectors filled with randomly generated values are produced in `compare_device_and_host_random_number_generation`. One is generated on the device using rocRAND (`generate_random_vector_on_device`) and the other is generated on the host using the standard `<random>` library (`generate_random_vector_on_host`). The runtime of the two functions is measured and printed to the standard output.

### Command line interface

The application provides the following optional command line arguments:

- `--device <device ID>`. Controls which device (GPU) the random number generation runs on. Default value is `0`.
- `--distribution uniform_int|uniform_float|normal|poisson`. Controls the type of the random distribution that is used for the random number generation. Default value is `uniform_int`.
- `--size <size>`. Controls the number of random numbers generated.
- `--print`. If specified, the generated random vectors are written to the standard output.

## Key APIs and Concepts

### rocRAND Engines

rocRAND engines define algorithms that generate sequences of random numbers. Typically an engine maintains an internal state that determines the order and value of all subsequent random numbers produced by the engine. In that sense, an engine lacks true randomness, hence the name pseudo-random number generator (or PRNG). Other engines produce quasi-random sequences, which appear to be equidistributed. An engine can be initialized with a seed value that determines the initial state of the engine. Different engine types employ different algorithms to generate the pseudo-random sequence and differ in the mathematical characteristics of the sequence generated. Unless special requirements arise, it is safe to use the `rocrand_cpp::default_random_engine` alias to create an engine. For the full list of implemented engines, refer to the documentation.
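
The role of the seed can be illustrated with the standard `<random>` library, which is the host-side counterpart this example compares against. The sketch below does not use rocRAND; the function name `generate_uniform_host` and the choice of `std::mt19937` are assumptions made for illustration. It shows that an engine seeded with the same value reproduces the same sequence, and that a distribution object maps the raw engine output to the desired range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical host-side helper (standard library only, not rocRAND):
// seeds an engine and draws count samples from a continuous uniform
// distribution over [0, 1).
inline std::vector<double> generate_uniform_host(std::uint32_t seed, std::size_t count)
{
    assert(count > 0);
    std::mt19937                           engine(seed);            // state is fully determined by the seed
    std::uniform_real_distribution<double> distribution(0.0, 1.0);  // transforms raw integers to [0, 1)

    std::vector<double> values(count);
    for(double& value : values)
    {
        value = distribution(engine);
    }
    return values;
}
```

Two calls with the same seed return identical vectors, which is what makes pseudo-random sequences reproducible; changing the seed changes the initial engine state and therefore the sequence.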

### rocRAND Distributions

A PRNG engine typically generates uniformly distributed integral numbers over the full range of the type. In order to transform this output to something more useful, rocRAND provides a set of distributions that transform this raw random sequence to samples of a random distribution. This example showcases the following distributions:

- `rocrand_cpp::uniform_int_distribution` generates unsigned integers sampled from a [discrete uniform distribution](https://en.wikipedia.org/wiki/discrete_uniform_distribution)
- `rocrand_cpp::uniform_real_distribution` generates floating point numbers sampled from a [continuous uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution) over the interval of `[0,1)`
- `rocrand_cpp::normal_distribution` generates floating point numbers sampled from a standard [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution).
@@ -33,13 +41,15 @@ For the full list of implemented distributions, refer to the documentation.
## Demonstrated API Calls

### rocRAND

- `rocrand_cpp::default_random_engine`
- `rocrand_cpp::uniform_int_distribution`
- `rocrand_cpp::uniform_real_distribution`
- `rocrand_cpp::normal_distribution`
- `rocrand_cpp::poisson_distribution`

### HIP runtime

- `hipGetErrorString`
- `hipSetDevice`
- `hipGetDeviceProperties`
10 changes: 10 additions & 0 deletions Libraries/rocSOLVER/getf2/README.md
@@ -1,7 +1,9 @@
# rocSOLVER LU Factorization Example

## Description

This example illustrates the use of the rocSOLVER `getf2` functionality. The rocSOLVER `getf2` computes the [LU decomposition](https://en.wikipedia.org/wiki/LU_decomposition) of an $m \times n$ matrix $A$, with partial pivoting. This factorization is given by $P \cdot A = L \cdot U$, where:

- `getf2()`: This is the unblocked Level-2-BLAS version of the LU factorization algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices.
- $A$ is the $m \times n$ input matrix.
- $P$ is an $m \times m$ [permutation matrix](https://en.wikipedia.org/wiki/Permutation_matrix), in this example stored as an array of row indices `vector<int> Ipiv` of size `min(m, n)`.
@@ -13,6 +15,7 @@ This example illustrates the use of the rocSOLVER `getf2` functionality. The roc
- an $n \times n$ upper triangular matrix, when $m \geq n$
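
To make the factorization $P \cdot A = L \cdot U$ concrete, the following host-only sketch performs an unblocked LU factorization with partial pivoting on a column-major matrix. It is an illustration under stated assumptions, not the rocSOLVER implementation: the name `lu_factor_host` is hypothetical, the leading dimension is assumed to equal $m$, and the pivot indices are zero-based.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side sketch of an unblocked LU factorization with partial
// pivoting. `a` holds an m x n matrix in column-major order (element (i, j) at
// a[i + j * m]) and is overwritten with L (strictly below the diagonal, unit
// diagonal implied) and U (on and above the diagonal). ipiv[k] is the
// zero-based row that was swapped with row k. Returns false if a zero pivot
// was encountered.
inline bool lu_factor_host(std::vector<double>&      a,
                           std::size_t               m,
                           std::size_t               n,
                           std::vector<std::size_t>& ipiv)
{
    assert(a.size() == m * n);
    const std::size_t steps = std::min(m, n);
    ipiv.assign(steps, 0);
    bool nonsingular = true;

    for(std::size_t k = 0; k < steps; ++k)
    {
        // Partial pivoting: pick the entry with the largest magnitude in column k.
        std::size_t pivot = k;
        for(std::size_t i = k + 1; i < m; ++i)
        {
            if(std::fabs(a[i + k * m]) > std::fabs(a[pivot + k * m]))
            {
                pivot = i;
            }
        }
        ipiv[k] = pivot;

        if(a[pivot + k * m] == 0.0)
        {
            nonsingular = false; // the column is already zero below the diagonal
            continue;
        }
        if(pivot != k)
        {
            for(std::size_t j = 0; j < n; ++j)
            {
                std::swap(a[k + j * m], a[pivot + j * m]);
            }
        }

        // Compute the multipliers (column k of L) and update the trailing submatrix.
        for(std::size_t i = k + 1; i < m; ++i)
        {
            a[i + k * m] /= a[k + k * m];
            for(std::size_t j = k + 1; j < n; ++j)
            {
                a[i + j * m] -= a[i + k * m] * a[k + j * m];
            }
        }
    }
    return nonsingular;
}
```

For the $2 \times 2$ matrix with rows $(1, 2)$ and $(3, 4)$, the sketch swaps the two rows and leaves $L$ with the multiplier $1/3$ below the diagonal and $U$ with rows $(3, 4)$ and $(0, 2/3)$. Note that LAPACK-style routines conventionally report one-based pivot indices, whereas this sketch uses zero-based indices for simplicity.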

### Application flow

1. Parse command line arguments for the dimension of the input matrix.
2. Declare and initialize a number of constants for the input and output matrices and vectors.
3. Allocate and initialize the host matrices and vectors.
@@ -25,14 +28,17 @@ This example illustrates the use of the rocSOLVER `getf2` functionality. The roc
10. Free device memory and the rocBLAS handle.

## Key APIs and Concepts

### rocSOLVER

- `rocsolver_[sdcz]getf2` computes the LU factorization of the $m \times n$ input matrix $A$. The correct function signature should be chosen, based on the datatype of the input matrix:
    - `s` (single-precision: `float`)
    - `d` (double-precision: `double`)
    - `c` (single-precision complex: `rocblas_float_complex`)
    - `z` (double-precision complex: `rocblas_double_complex`).

Input parameters for the precision used in this example (double-precision):

- `rocblas_handle handle`
- `const rocblas_int m`: number of rows of $A$
- `const rocblas_int n`: number of columns of $A$
@@ -44,17 +50,21 @@ Input parameters for the precision used in this example (double-precision):
Return type: `rocblas_status`

## Used API surface

### rocSOLVER

- `rocsolver_dgetf2`

### rocBLAS

- `rocblas_create_handle`
- `rocblas_destroy_handle`
- `rocblas_double`
- `rocblas_handle`
- `rocblas_int`

### HIP runtime

- `hipFree`
- `hipMalloc`
- `hipMemcpy`