Fix linting errors in rocThrust
dgaliffiAMD committed May 16, 2024
1 parent f615ce2 commit 5cababb
Showing 8 changed files with 48 additions and 6 deletions.
1 change: 1 addition & 0 deletions Libraries/rocSPARSE/level_2/bsrxmv/README.md
@@ -188,6 +188,7 @@ $$ \mathbf{\bar{m}} = \left(
\right)$$

The BSRX format is the same as BSR, but the `bsr_row_ptr` is separated into starting and ending indices.

- `bsrx_row_ptr`: the first block of each row that is used for the calculation. This block is typically the first nonzero block.
- `bsrx_end_ptr`: the position next to the last block (last + 1) that is used for the calculation. This block is typically the last nonzero block.

15 changes: 13 additions & 2 deletions Libraries/rocThrust/README.md
@@ -1,45 +1,56 @@
# rocThrust Examples

## Summary

The examples in this subdirectory showcase the functionality of the [rocThrust](https://github.com/rocmSoftwarePlatform/rocThrust) library. The examples build on Linux using the ROCm platform and on Windows using the HIP on Windows platform.

## Prerequisites

### Linux

- [CMake](https://cmake.org/download/) (at least version 3.21)
- OR GNU Make - available via the distribution's package manager
- [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x)
- [rocThrust](https://github.com/rocmSoftwarePlatform/rocThrust): `rocthrust-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2/page/How_to_Install_ROCm.html).

### Windows

- [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload
- ROCm toolchain for Windows (No public release yet)
- The Visual Studio ROCm extension needs to be installed to build with the solution files.
- [rocThrust](https://github.com/rocmSoftwarePlatform/rocThrust): installed as part of the ROCm SDK on Windows
- [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21)
- [Ninja](https://ninja-build.org/) (optional, to build with CMake)

## Building

### Linux

Make sure that the dependencies are installed, or use the [provided Dockerfile](../../Dockerfiles/hip-libraries-rocm-ubuntu.Dockerfile) to build and run the examples in a containerized environment that has all prerequisites installed.

#### Using CMake

All examples in the `rocThrust` subdirectory can either be built by a single CMake project or be built independently.

- `$ cd Libraries/rocThrust`
- `$ cmake -S . -B build`
- `$ cmake --build build`

#### Using Make

All examples can be built by a single invocation to Make or be built independently.

- `$ cd Libraries/rocThrust`
- `$ make`

### Windows

#### Visual Studio

Visual Studio solution files are available for the individual examples. To build all examples for rocThrust open the top level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for rocThrust.

For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio).

#### CMake

All examples in the `rocThrust` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2).
7 changes: 6 additions & 1 deletion Libraries/rocThrust/device_ptr/README.md
@@ -1,9 +1,11 @@
# rocThrust Device Pointer Example

## Description

This simple program showcases the usage of the `thrust::device_ptr` template.

### Application flow

1. A `thrust::device_ptr<int>` is instantiated, and memory for ten elements is allocated.
2. Two more `thrust::device_ptr<int>` are instantiated and set to the start- and end-point of the allocated memory region.
3. Normal pointer arithmetic is used on the `thrust::device_ptr<int>`s to calculate the number of elements allocated in step 1.
@@ -15,14 +17,17 @@ This simple program showcases the usage of the `thrust::device_ptr` template.
9. The device memory is freed using `thrust::device_free`.

## Key APIs and Concepts

- Thrust's `device_ptr` is a simple and transparent way of handling device memory the same way one would handle host memory with normal pointers.
- Unlike a normal pointer to device memory, `device_ptr` adds type safety, and the underlying device memory is transparently accessible on the host.
- The `device_ptr` can be used in Thrust algorithms like a normal pointer to device memory.
- The "raw" normal pointer to the device memory for usage in kernels or other APIs can be obtained from a `device_ptr` by using `thrust::raw_pointer_cast`.
- `device_ptr` is not a smart pointer. Allocating and freeing memory is the responsibility of the programmer.

## Demonstrated API Calls

### rocThrust

- `thrust::device_ptr<T>::operator=`
- `thrust::device_ptr<T>::operator[]`
- `thrust::device_malloc<T>`
5 changes: 5 additions & 0 deletions Libraries/rocThrust/norm/README.md
@@ -1,23 +1,28 @@
# rocThrust Norm Example

## Description

An example is presented to compute the Euclidean norm of a `thrust::device_vector`. The result is written to the standard output.

### Application flow

1. Instantiate a host vector.
2. Copy the vector to the device by constructing `thrust::device_vector` from the host vector.
3. Set the initial value for the transformed reduction to 0.
4. Sum the squares of the elements and take the square root of the sum (using `std::sqrt()`), which is the definition of the Euclidean norm. Use the `square` operator to calculate the square of each element and the `thrust::plus` binary operator to sum the results.
5. Print the norm to the standard output.

## Key APIs and Concepts

- `thrust::transform_reduce()` computes a generalized sum (AKA reduction or fold) after transforming each element with a unary function. Both the transformation and the reduction function can be specified. For example, with `thrust::plus` as the binary summation and `f` as the transform function, `transform_reduce` computes the value of `f(a[0]) + f(a[1]) + f(a[2]) + ...`.
- In the example, the operator is the `thrust::plus` function object with doubles. It is a binary operator that returns the arithmetic sum.
- An initial value is required for the summation.
- A `thrust::device_vector` is used to simplify memory management and transfer. See the [vectors example](../vectors) for the usage of `thrust::device_vector` and `thrust::host_vector`.

## Demonstrated API Calls

### rocThrust

- `thrust::device_vector::device_vector`
- `thrust::plus`
- `thrust::reduce()`
5 changes: 5 additions & 0 deletions Libraries/rocThrust/reduce_sum/README.md
@@ -1,22 +1,27 @@
# rocThrust sum (reduce) example

## Description

An example is presented to compute the sum of a `thrust::device_vector` integer vector using the `thrust::reduce()` generalized summation and the `thrust::plus` operator. The result is written to the standard output.

### Application flow

1. Instantiate a `thrust::host_vector` and fill the elements. The values of the elements are printed to the standard output.
2. Copy the vector to the device by constructing a `thrust::device_vector` from it.
3. Set the initial value of the reduction.
4. Use the `thrust::reduce()` generalized summation function with the `thrust::plus` addition operator and return the sum of the vector.
5. Print the sum to the standard output.

## Key APIs and Concepts

- The `thrust::reduce()` function returns a generalized sum. The summation operator has to be provided by the caller.
- In the example, the operator is the `thrust::plus` function object with integers. It is a binary operator that returns the arithmetic sum.
- A `thrust::device_vector` and a `thrust::host_vector` are used to simplify memory management and transfer. For further details, please visit the [vectors example](../vectors/).

## Demonstrated API Calls

### rocThrust

- `thrust::host_vector::host_vector`
- `thrust::host_vector::operator[]`
- `thrust::device_vector::device_vector`
7 changes: 6 additions & 1 deletion Libraries/rocThrust/remove_points/README.md
@@ -1,10 +1,12 @@
# rocThrust Remove Points Example

## Description

This short program demonstrates the usage of the `thrust` random number generation, host vector, generation, tuple, zip iterator, and conditional removal templates.
It generates a number of random points $(x, y)$ in a unit square $x,y\in[0,1)$ and then removes all of them outside the unit circle, i.e. with $x^2 + y^2 > 1$.

## Key APIs and Concepts

- Thrust provides functionality for random number generation similar to [the STL `<random>` header](https://en.cppreference.com/w/cpp/header/random) (from C++11 and above), like `thrust::default_random_engine`, `thrust::uniform_real_distribution` and so on.
- Thrust's vectors implement RAII-style ownership over device and host memory pointers (similarly to `std::vector`). The instances are aware of the requested element count, allocate the required amount of memory, and free it upon destruction. When resized, the memory is reallocated if needed.
- It is suggested that developers use `host_vector` instead of explicit invocations to `malloc` and `free` functions.
@@ -13,14 +15,17 @@ It generates a number of random points $(x, y)$ in a unit square $x,y\in[0,1)$ a
- The zip iterator provides the ability to parallel-iterate over several controlled sequences simultaneously. A zip iterator is constructed from a tuple of iterators. Moving the zip iterator moves all the iterators in parallel. Dereferencing the zip iterator returns a tuple that contains the results of dereferencing the individual iterators.
- `remove_if` "removes" every element on which the predicate evaluates to `true` from the range specified by begin and end iterators. All kept elements are moved to the beginning of the range in the same order as in the original sequence, and the end iterator to the range of kept elements is returned. Idiomatic usage of conditional removal is the so-called _erase–remove idiom_ `S.erase(remove_if(S.begin(), S.end(), pred), S.end())`. This idiom cannot be used here because the `zip_iterator` refers to multiple containers.

### Application flow

1. A `thrust::default_random_engine` is instantiated and values are sampled from a uniform distribution between 0 and 1 using `thrust::uniform_real_distribution<float>`.
2. To hold the coordinates of the points, two `thrust::host_vector<float>`s are constructed. Their elements are set one-by-one from a uniform distribution by `generate` and the points are printed to the standard output.
3. Zip iterators are constructed from `begin` and `end` iterators over the coordinate vectors and then passed to the `thrust::remove_if` operation. The operation uses a test `is_outside_circle<float>` to remove all points outside the unit circle and moves all remaining points to the beginning of the range spanned by the zip iterators. `thrust::remove_if` returns an end iterator to the remaining points. The new size of the vectors is calculated as the distance between the returned iterator and the `begin` iterator, and the vectors are resized accordingly.
4. Finally, the remaining points are printed again.

## Demonstrated API Calls

### rocThrust

- `thrust::default_random_engine::default_random_engine`
- `thrust::uniform_real_distribution<RealType>::uniform_real_distribution(RealType, RealType)`
- `thrust::uniform_real_distribution<RealType>::operator()(UniformRandomNumberGenerator)`
7 changes: 6 additions & 1 deletion Libraries/rocThrust/saxpy/README.md
@@ -1,9 +1,11 @@
# rocThrust Saxpy Example

## Description

This simple program implements the SAXPY operation (`Y[i] = a * X[i] + Y[i]`) using rocThrust and showcases the usage of the vector and functor templates and of `thrust::fill` and `thrust::transform` operations.

### Application flow

1. Two host arrays of floats `x` and `y` are instantiated, and their contents are printed to the standard output.
2. Two `thrust::device_vector<float>`s, `X` and `Y`, are instantiated with the corresponding arrays. The contents are copied to the device.
3. The `saxpy_slow` function is invoked next. It uses the most straightforward implementation using a temporary device vector `temp` and two separate transformations, one with multiplies and one with plus. First, the `temp` vector is filled with `a` values, using `thrust::fill`. Then, it is filled by transformed values of `a * X[i]` by `thrust::transform` using the `thrust::multiplies` functor. Last, the device vector `Y` is filled by `temp[i] + Y[i]` by `thrust::transform` using the `thrust::plus` functor.
@@ -13,6 +15,7 @@ This simple program implements the SAXPY operation (`Y[i] = a * X[i] + Y[i]`) us
7. The values of device vector `Y` are printed to the standard output. The `X` and `Y` vectors are destroyed.

## Key APIs and Concepts

- rocThrust's device and host vectors implement RAII-style ownership over device and host memory pointers (similarly to `std::vector`). The instances are aware of the requested element count, allocate the required amount of memory, and free it upon destruction. When resized, the memory is reallocated if needed.
- Additionally, using `device_vector` and `host_vector` simplifies the transfers between device and host memory to a copy assignment. Note that iterators over device containers can be used everywhere just like host iterators.
- It is suggested that developers use `device_vector` and `host_vector` instead of explicit invocations to `malloc` and `free` functions.
@@ -22,7 +25,9 @@ This simple program implements the SAXPY operation (`Y[i] = a * X[i] + Y[i]`) us
- [Fused Multiply-Add (FMA)](https://en.cppreference.com/w/cpp/numeric/math/fma) operation `fma` multiplies its first two arguments and adds the third one to the product. On hardware that supports such an instruction, it is faster and more accurate than separate multiplication and addition, because it avoids the intermediate rounding error: the addition inside the `fma` operation works on the full, unrounded result of the multiplication, which is twice as wide as the operands.

## Demonstrated API Calls

### rocThrust

- `thrust::host_vector::host_vector`
- `thrust::host_vector::operator[]`
- `thrust::host_vector::begin()`
7 changes: 6 additions & 1 deletion Libraries/rocThrust/vectors/README.md
@@ -1,21 +1,26 @@
# rocThrust Vectors Example

## Description

This simple program showcases the usage of the `thrust::device_vector` and the `thrust::host_vector` templates.

### Application flow

1. A `thrust::host_vector<int>` is instantiated, its elements are set one-by-one, and the vector is printed to the standard output.
2. The `host_vector` is resized and it is printed again to the standard output.
3. A `thrust::device_vector<int>` is instantiated with the aforementioned `host_vector`. The contents are copied to the device.
4. The `device_vector`'s elements are modified from host code, and it is printed to the standard output.

## Key APIs and Concepts

- Thrust's device and host vectors implement RAII-style ownership over device and host memory pointers (similarly to `std::vector`). The instances are aware of the requested element count, allocate the required amount of memory, and free it upon destruction. When resized, the memory is reallocated if needed.
- Additionally, using `device_vector` and `host_vector` simplifies the transfers between device and host memory to a copy assignment.
- It is suggested that developers use `device_vector` and `host_vector` instead of explicit invocations to `malloc` and `free` functions.

## Demonstrated API Calls

### rocThrust

- `thrust::host_vector::host_vector`
- `thrust::host_vector::~host_vector`
- `thrust::host_vector::operator[]`