diff --git a/Applications/prefix_sum/README.md b/Applications/prefix_sum/README.md index ff65d275..49545687 100644 --- a/Applications/prefix_sum/README.md +++ b/Applications/prefix_sum/README.md @@ -18,7 +18,7 @@ The algorithm used has two phases which are repeated: Below is an example where the threads per block is 2. In the first iteration ($\text{offset}=1$) we have 4 threads combining 8 items. -![](prefix_sum_diagram.svg) +![prefix_sum_diagram.svg](prefix_sum_diagram.svg) ### Application flow diff --git a/HIP-Basic/runtime_compilation/README.md b/HIP-Basic/runtime_compilation/README.md index 23467970..9eb87c64 100644 --- a/HIP-Basic/runtime_compilation/README.md +++ b/HIP-Basic/runtime_compilation/README.md @@ -9,6 +9,7 @@ This example showcases how to make use of hipRTC to compile in runtime a kernel ### Application flow The diagram below summarizes the runtime compilation part of the example. + 1. A number of variables are declared and defined to configure the program which will be compiled in runtime. 2. The program is created using the above variables as parameters, along with the SAXPY kernel in string form. 3. The properties of the first device (GPU) available are consulted to set the device architecture as (the only) compile option. diff --git a/Libraries/rocPRIM/README.md b/Libraries/rocPRIM/README.md index 9223a72d..068bbbb4 100644 --- a/Libraries/rocPRIM/README.md +++ b/Libraries/rocPRIM/README.md @@ -1,30 +1,37 @@ # rocPRIM Examples ## Summary + The examples in this subdirectory showcase the functionality of the [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM) library. The examples build on both Linux and Windows for the ROCm (AMD GPU) backend. 
## Prerequisites + ### Linux + - [CMake](https://cmake.org/download/) (at least version 3.21) -- OR GNU Make - available via the distribution's package manager + - OR GNU Make - available via the distribution's package manager - [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x) - [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM) - - `rocPRIM-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html). + - `rocPRIM-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html). ### Windows + - [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload - ROCm toolchain for Windows (No public release yet) - - The Visual Studio ROCm extension needs to be installed to build with the solution files. + - The Visual Studio ROCm extension needs to be installed to build with the solution files. - [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM) - - Installed as part of the ROCm SDK on Windows for ROCm platform. + - Installed as part of the ROCm SDK on Windows for the ROCm platform. - [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21) - [Ninja](https://ninja-build.org/) (optional, to build with CMake) ## Building + ### Linux + Make sure that the dependencies are installed, or use one of the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment.
#### Using CMake + All examples in the `rocPRIM` subdirectory can either be built by a single CMake project or be built independently. - `$ cd Libraries/rocPRIM` @@ -32,16 +39,20 @@ All examples in the `rocPRIM` subdirectory can either be built by a single CMake - `$ cmake --build build` #### Using Make + All examples can be built by a single invocation to Make or be built independently. - `$ cd Libraries/rocPRIM` - `$ make` ### Windows + #### Visual Studio + Visual Studio solution files are available for the individual examples. To build all examples for rocPRIM open the top level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for rocPRIM. For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio). #### CMake + All examples in the `rocPRIM` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2). diff --git a/Libraries/rocPRIM/block_sum/README.md b/Libraries/rocPRIM/block_sum/README.md index a01fb72a..122a3d51 100644 --- a/Libraries/rocPRIM/block_sum/README.md +++ b/Libraries/rocPRIM/block_sum/README.md @@ -1,9 +1,11 @@ # rocPRIM Block Sum Example ## Description + This simple program showcases the usage of the `rocprim::block_reduce` block-level function. It also showcases the usage of `rocprim::block_load` block-level load function. The results from `rocprim::block_load` are eventually used by `rocprim::block_reduce`. The final result of the block-level reductions are written to the standard output. -### Application flow +### Application flow + 1. Host side data is instantiated in a `std::vector`. 2. Device storage for input and output data is allocated using `hipMalloc`. 3. Input data is copied from the host to the device using `hipMemcpy`. @@ -15,18 +17,22 @@ This simple program showcases the usage of the `rocprim::block_reduce` block-lev 6. 
All device memory is freed using `hipFree`. ## Key APIs and Concepts -- rocPRIM provides HIP parallel primitives on multiple levels of the GPU programming model. This example showcases `rocprim::block_reduce` which is a GPU block-level function. + +- rocPRIM provides HIP parallel primitives on multiple levels of the GPU programming model. This example showcases `rocprim::block_reduce` which is a GPU block-level function. - The `rocprim::block_reduce` template function performs a reduction, i.e. it combines a vector of values to a single value using the provided binary operator. Since the order of execution is not determined, the provided operator must be associative. In the example, an addition (`rocprim::plus`) is used, which fulfils this property. - `rocprim::block_reduce` is a collective operation, which means all threads in the block must make a call to `rocprim::block_reduce`. - In this example `rocprim::block_load` is used to pre-fetch (load) the global input data. It has the potential to increase performance since data is effiently loaded into per-thread local register space. ## Used API surface + ### rocPRIM + - `rocprim::block_reduce` - `rocprim::plus` - `rocprim::block_load` ### HIP runtime + - `hipGetErrorString` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocPRIM/device_sum/README.md b/Libraries/rocPRIM/device_sum/README.md index 7feabe53..66c440d4 100644 --- a/Libraries/rocPRIM/device_sum/README.md +++ b/Libraries/rocPRIM/device_sum/README.md @@ -1,9 +1,11 @@ # rocPRIM Device Sum Example ## Description + This simple program showcases the usage of the device function `rocprim::reduce`. -### Application flow +### Application flow + 1. Input data is instantiated in a `std::vector` and the values are printed to the standard output. 2. Device storage for input and output data is allocated using `hipMalloc`. 3. Input data is copied from the host to the device using `hipMemcpy`. 
@@ -15,16 +17,20 @@ This simple program showcases the usage of the device function `rocprim::reduce` 9. All device memory is freed using `hipFree`. ## Key APIs and Concepts + - rocPRIM provides HIP parallel primitives on multiple levels of the GPU programming model. This example showcases `rocprim::reduce` which is a device function, thereby it can be called from host code. - The `rocprim::reduce` template function performs a generalized reduction, i.e. it combines a vector of values to a single value using the provided binary operator. Since the order of execution is not determined, the provided operator must be associative. In the example, an addition (`rocprim::plus`) is used which fulfils this property. - The device functions of `rocPRIM` require a temporary device memory location to store the results of intermediate calculations. The required amount of temporary storage can be calculated by invoking the function with matching argument set, except the first argument `temporary_storage` must be a `nullptr`. In this case, the GPU kernel is not launched. ## Demonstrated API Calls + ### rocPRIM + - `rocprim::reduce` - `rocprim::plus` ### HIP runtime + - `hipMalloc` - `hipMemcpy` - `hipFree` diff --git a/Libraries/rocRAND/README.md b/Libraries/rocRAND/README.md index eb573c5a..b0db0857 100644 --- a/Libraries/rocRAND/README.md +++ b/Libraries/rocRAND/README.md @@ -1,32 +1,39 @@ # rocRAND Examples ## Summary + The examples in this subdirectory showcase the functionality of the [rocRAND](https://github.com/rocmSoftwarePlatform/rocRAND) library. The examples build on both Linux and Windows for both the ROCm (AMD GPU) and CUDA (NVIDIA GPU) backend. 
## Prerequisites + ### Linux + - [CMake](https://cmake.org/download/) (at least version 3.21) -- OR GNU Make - available via the distribution's package manager + - OR GNU Make - available via the distribution's package manager - [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x) OR the HIP Nvidia runtime (on the CUDA platform) - [rocRAND](https://github.com/rocmSoftwarePlatform/rocRAND) - - ROCm platform: `rocrand-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html). - - CUDA platform: Install rocRAND from source: [instructions](https://github.com/rocmSoftwarePlatform/rocRAND#build-and-install). + - ROCm platform: `rocrand-dev` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html). + - CUDA platform: Install rocRAND from source: [instructions](https://github.com/rocmSoftwarePlatform/rocRAND#build-and-install). ### Windows + - [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload - ROCm toolchain for Windows (No public release yet) - - The Visual Studio ROCm extension needs to be installed to build with the solution files. + - The Visual Studio ROCm extension needs to be installed to build with the solution files. - [rocRAND](https://github.com/rocmSoftwarePlatform/rocRAND) - - ROCm platform: Installed as part of the ROCm SDK on Windows. - - CUDA platform: Install rocRAND from source: [instructions](https://github.com/rocmSoftwarePlatform/rocRAND#build-and-install). + - ROCm platform: Installed as part of the ROCm SDK on Windows. 
+ - CUDA platform: Install rocRAND from source: [instructions](https://github.com/rocmSoftwarePlatform/rocRAND#build-and-install). - [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21) - [Ninja](https://ninja-build.org/) (optional, to build with CMake) ## Building + ### Linux + Make sure that the dependencies are installed, or use the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment set up specifically for the example suite. #### Using CMake + All examples in the `rocRAND` subdirectory can either be built by a single CMake project or be built independently. - `$ cd Libraries/rocRAND` @@ -34,16 +41,20 @@ All examples in the `rocRAND` subdirectory can either be built by a single CMake - `$ cmake --build build` #### Using Make + All examples can be built by a single invocation to Make or be built independently. - `$ cd Libraries/rocRAND` - `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA) ### Windows + #### Visual Studio + Visual Studio solution files are available for the individual examples. To build all examples for rocRAND open the top level solution file [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) and filter for rocRAND. For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio). #### CMake + All examples in the `rocRAND` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2). 
diff --git a/Libraries/rocRAND/simple_distributions_cpp/README.md b/Libraries/rocRAND/simple_distributions_cpp/README.md index 47b8188e..76cc9f83 100644 --- a/Libraries/rocRAND/simple_distributions_cpp/README.md +++ b/Libraries/rocRAND/simple_distributions_cpp/README.md @@ -1,9 +1,11 @@ # rocRAND Simple Distributions Example (C++) ## Description + This sample illustrates the usage of the rocRAND random number generator library via the host-side C++ API. The usage of the random engines and random distributions offered by rocRAND is showcased. The usage, results and execution time of each algorithm provided by rocRAND is compared to the corresponding standard library equivalent. -### Application flow +### Application flow + 1. The example command line application takes optional arguments: the used device index, the random distribution type, the element count of the generated random vector and whether the generated vectors should be printed to the standard output. 2. The arguments are parsed in `parse_args` and the result is printed to the standard output. If the parsing fails due to e.g. malformed input, an exception is raised, the correct usage is printed and the program returns with an error code. 3. The utilized device (GPU) is selected in `set_device`. If the selected device does not exist, an error message is printed to the standard error output and the program returns with an error code. Otherwise the name of the selected device is printed to the standard output. @@ -11,18 +13,24 @@ This sample illustrates the usage of the rocRAND random number generator library 5. Two vectors filled with randomly generated values are produced in `compare_device_and_host_random_number_generation`. One is generated on the device using rocRAND (`generate_random_vector_on_device`) and the other is generated on the host using the standard `` library (`generate_random_vector_on_host`). The runtime of the two functions is measured and printed to the standard output. 
### Command line interface + The application provides the following optional command line arguments: + - `--device `. Controls which device (GPU) the random number generation runs on. Default value is `0`. - `--distribution uniform_int|uniform_float|normal|poisson`. Controls the type of the random distribution that is used for the random number generation. Default value is `uniform_int`. - `--size `. Controls the number of random numbers generated. - `--print`. If specified, the generated random vectors are written to the standard output. ## Key APIs and Concepts + ### rocRAND Engines + rocRAND engines define algorithms that generate sequences of random numbers. Typically an engine maintains an internal state that determines the order and value of all subsequent random numbers produced by the engine. In that sense, an engine lacks true randomness, hence the name pseudo-random number generator (or PRNG). Other engines produce quasi-random sequences, which appear to be equidistributed. An engine can be initialized with a seed value that determines the initial state of the engine. Different engine types employ different algorithms to generate the pseudo-random sequence, they differ in the mathematical characteristics of the sequence generated. Unless special requirements arise, it is safe to use the `rocrand_cpp::default_random_engine` alias to create an engine. For the full list of implemented engines, refer to the documentation. ### rocRAND Distributions + A PRNG engine typically generates uniformly distributed integral numbers over the full range of the type. In order to transform this output to something more useful, rocRAND provides a set of distributions that transform this raw random sequence to samples of a random distribution. 
This example showcases the following distributions: + - `rocrand_cpp::uniform_int_distribution` generates unsigned integers sampled from a [discrete uniform distribution](https://en.wikipedia.org/wiki/discrete_uniform_distribution) - `rocrand_cpp::uniform_real_distribution` generates floating point numbers sampled from a [continuous uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution) over the interval of `[0,1)` - `rocrand_cpp::normal_distribution` generates floating point numbers sampled from a standard [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution). @@ -33,6 +41,7 @@ For the full list of implemented distributions, refer to the documentation. ## Demonstrated API Calls ### rocRAND + - `rocrand_cpp::default_random_engine` - `rocrand_cpp::uniform_int_distribution` - `rocrand_cpp::uniform_real_distribution` @@ -40,6 +49,7 @@ For the full list of implemented distributions, refer to the documentation. - `rocrand_cpp::poisson_distribution` ### HIP runtime + - `hipGetErrorString` - `hipSetDevice` - `hipGetDeviceProperties` diff --git a/Libraries/rocSOLVER/getf2/README.md b/Libraries/rocSOLVER/getf2/README.md index 745dfc8c..85d19c67 100644 --- a/Libraries/rocSOLVER/getf2/README.md +++ b/Libraries/rocSOLVER/getf2/README.md @@ -1,7 +1,9 @@ # rocSOLVER LU Factorization Example ## Description + This example illustrates the use of the rocSOLVER `getf2` functionality. The rocSOLVER `getf2` computes the [LU decomposition](https://en.wikipedia.org/wiki/LU_decomposition) of an $m \times n$ matrix $A$, with partial pivoting. This factorization is given by $P \cdot A = L \cdot U$, where: + - `getf2()`: This is the unblocked Level-2-BLAS version of the LU factorization algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices. - $A$ is the $m \times n$ input matrix. 
- $P$ is an $m \times m$ [permutation matrix](https://en.wikipedia.org/wiki/Permutation_matrix), in this example stored as an array of row indices `vector Ipiv` of size `min(m, n)`. @@ -13,6 +15,7 @@ This example illustrates the use of the rocSOLVER `getf2` functionality. The roc - an $n \times n$ upper tridiagonal matrix, when $m \geq n$ ### Application flow + 1. Parse command line arguments for the dimension of the input matrix. 2. Declare and initialize a number of constants for the input and output matrices and vectors. 3. Allocate and initialize the host matrices and vectors. @@ -25,7 +28,9 @@ This example illustrates the use of the rocSOLVER `getf2` functionality. The roc 10. Free device memory and the rocBLAS handle. ## Key APIs and Concepts + ### rocSOLVER + - `rocsolver_[sdcz]getf2` computes the LU factorization of the $m \times n$ input matrix $A$. The correct function signature should be chosen, based on the datatype of the input matrix: - `s` (single-precision: `float`) - `d` (double-precision: `double`) @@ -33,6 +38,7 @@ This example illustrates the use of the rocSOLVER `getf2` functionality. The roc - `z` (double-precision complex: `rocblas_double_complex`). 
Input parameters for the precision used in this example (double-precision): + - `rocblas_handle handle` - `const rocblas_int m`: number of rows of $A$ - `const rocblas_int n`: number of columns of $A$ @@ -44,10 +50,13 @@ Input parameters for the precision used in this example (double-precision): Return type: `rocblas_status` ## Used API surface + ### rocSOLVER + - `rocsolver_dgetf2` ### rocBLAS + - `rocblas_create_handle` - `rocblas_destroy_handle` - `rocblas_double` @@ -55,6 +64,7 @@ Return type: `rocblas_status` - `rocblas_int` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSOLVER/getri/README.md b/Libraries/rocSOLVER/getri/README.md index 5eb82530..46827c47 100644 --- a/Libraries/rocSOLVER/getri/README.md +++ b/Libraries/rocSOLVER/getri/README.md @@ -1,9 +1,11 @@ # rocSOLVER Matrix Inversion Example ## Description + This example showcases computing the inversion $A^{-1}$ of a rectangular matrix $n\times n$ matrix $A$ using rocSOLVER. The inversion operation is divided in two steps: first, an [LU factorization](https://en.wikipedia.org/wiki/LU_decomposition) is computed from the input matrix $A$ by use of the `getrf` operation, which yields the lower triangular matrix $L$, upper triangular matrix $U$, and permutation matrix $P$. The results of this operation satisfy $A = PLU$. Next, the inverted matrix $A^{-1}$ is computed from $L$, $U$, and $P$ by using the `getri` operation. ### Application flow + 1. Parse command line arguments for the dimension of the input matrix. 2. Declare and initialize a number of constants for the input and output matrix. 3. Allocate and initialize the to-be-inverted input matrix. @@ -11,51 +13,73 @@ This example showcases computing the inversion $A^{-1}$ of a rectangular matrix 5. Create a rocBLAS library handle. 6. Invoke the rocSOLVER `getrf` operation with double precision. 7. Copy the `getrf` info output back to the host, and check whether the operation was successful. -9. 
Invoke the rocSOLVER `getri` operation with double precision. -10. Copy the `getri` info output back to the host, and check again whether the operation was a success. -11. Validate the solution by checking if $A\cdot A^{-1} = I$ holds. This is done by computing $1\cdot A \cdot A^{-1} + -1 \cdot I$ using the `gemm` operation from rocBLAS, and comparing the elements of the result to 0. -12. Free device memory, release rocBLAS handle. +8. Invoke the rocSOLVER `getri` operation with double precision. +9. Copy the `getri` info output back to the host, and check again whether the operation was a success. +10. Validate the solution by checking if $A\cdot A^{-1} = I$ holds. This is done by computing $1\cdot A \cdot A^{-1} + -1 \cdot I$ using the `gemm` operation from rocBLAS, and comparing the elements of the result to 0. +11. Free device memory, release rocBLAS handle. ## Key APIs and Concepts + ### rocSOLVER + - `rocsolver_[sdcz]getrf` computes the LU-factorization of an $m\times n$ matrix $A$, and optionally also provides a permutation matrix $P$ when partial pivoting is used. Depending on the character matched in `[sdcz]`, the factorization can be computed with different precision: - - `s` (single-precision: `float`) - - `d` (double-precision: `double`) - - `c` (single-precision complex: `rocblas_float_complex`) - - `z` (double-precision complex: `rocblas_double_complex`). - - Double precision is used in the example. In this case, the function accepts the following parameters: - -`rocblas_handle handle` is a handle to the rocBLAS library, created using `rocblas_create_handle`. - - `rocblas_int m` is the number of rows in $A$. - - `rocblas_int n` is the number of columns in $A$. - - `double* A` is a device-pointer to the memory of matrix $A$. It should hold at least $n\times lda$ elements. The `getrf` operation is performed in-place, and the resulting $L$ and $U$ matrices are stored in the memory of $A$. Diagonal elements of $L$ are not stored. 
- - `rocblas_int lda` is the leading dimension of matrix $A$, which is the stride between the first element of the columns of the matrix. Note that the matrix is laid out in column-major ordering. - - `rocblas_int* ipiv` is a device-pointer to where the permutation matrix used for partial pivoting is written to. Note that the permutation matrix can be represented using a single array, and so this parameter requires only `min(n, m)` ints of memory. If `ipiv[i] = j`, then row `j` was interchanged with row `i`. Note that row indices are 1-indexed. Partial pivoting enables numerical stability for this algorithm. If this is undesired, the fucntion `rocsolver_[sdcz]getrf_npvt` can be used to omit partial pivoting. - - `rocblas_int* info` is a device-pointer to a single integer that describes the result of the operation. If `*info` is `0`, the operation was successful. Otherwise `*info` holds the first non-zero pivot (1-indexed), and means that $A$ was not invertible. - - The function returns a `rocblas_status` value, which indicates whether any errors have occurred during the operation. + + - `s` (single-precision: `float`) + - `d` (double-precision: `double`) + - `c` (single-precision complex: `rocblas_float_complex`) + - `z` (double-precision complex: `rocblas_double_complex`). + + Double precision is used in the example. In this case, the function accepts the following parameters: + + - `rocblas_handle handle` is a handle to the rocBLAS library, created using `rocblas_create_handle`. + + - `rocblas_int m` is the number of rows in $A$. + + - `rocblas_int n` is the number of columns in $A$. + + - `double* A` is a device-pointer to the memory of matrix $A$. It should hold at least $n\times lda$ elements. The `getrf` operation is performed in-place, and the resulting $L$ and $U$ matrices are stored in the memory of $A$. Diagonal elements of $L$ are not stored. 
+ + - `rocblas_int lda` is the leading dimension of matrix $A$, which is the stride between the first elements of consecutive columns of the matrix. Note that the matrix is laid out in column-major ordering. + + - `rocblas_int* ipiv` is a device-pointer to the memory where the permutation matrix used for partial pivoting is written. Note that the permutation matrix can be represented using a single array, and so this parameter requires only `min(n, m)` ints of memory. If `ipiv[i] = j`, then row `j` was interchanged with row `i`. Note that row indices are 1-indexed. Partial pivoting improves the numerical stability of this algorithm. If this is undesired, the function `rocsolver_[sdcz]getrf_npvt` can be used to omit partial pivoting. + + - `rocblas_int* info` is a device-pointer to a single integer that describes the result of the operation. If `*info` is `0`, the operation was successful. Otherwise `*info` holds the (1-indexed) position of the first zero pivot, which means that $A$ is singular and therefore not invertible. + + The function returns a `rocblas_status` value, which indicates whether any errors have occurred during the operation. - `rocblas_[sdcz]getri` inverts a batch of $n\times n$ matrix using the LU-factorization previously obtained using `getrf`. As with the previous operations, different operations for `getri` are available, depending on the character matched in `[sdcz]`. + - `rocsolver_[sdcz]getri` inverts an $n\times n$ matrix using the LU-factorization previously obtained using `getrf`. As with `getrf`, `getri` can be computed with a different precision based on the character matched in `[sdcz]`. - In the case for double precision, the function accepts the following parameters: - - `rocblas_handle handle` is a handle to the rocBLAS library, created using `rocblas_create_handle`. - - `rocblas_int n` is the number of rows and columns of the matrix. - - `double* A` is a device-pointer to an LU-factorized matrix $A$. The matrix should have at least $n\times lda$ elements.
On successful exit, the values in this matrix are overwritten with the inversion $A^{-1}$ of the original matrix $A$. - - `rocblas_int lda` is the leading dimension of $A$, which is the stride between the first element of the columns of the matrix. Note that the matrices are laid out in column-major ordering. - - `rocblas_int* ipiv` is a device-pointer to the permutation matrix of the LU-factorization of $A$. If no permutation matrix is available, the `rocsolver_[sdcz]_getri_npvt` function can be used instead. - - `rocblas_int* info` is a device-pointer to a single integer that describes the result of the inversion operation. If `*info` is non-zero, then the inversion failed, and the value indicates the first zero pivot in $A$. If `*info` is zero, then the operation was successful and `A` holds the inverted matrix $A^{-1}$ of the original matrix $A$. - The function returns a `rocblas_status` value, which indicates whether any errors have occurred during the operation. + For double precision, the function accepts the following parameters: + + - `rocblas_handle handle` is a handle to the rocBLAS library, created using `rocblas_create_handle`. + + - `rocblas_int n` is the number of rows and columns of the matrix. + + - `double* A` is a device-pointer to an LU-factorized matrix $A$. The matrix should have at least $n\times lda$ elements. On successful exit, the values in this matrix are overwritten with the inverse $A^{-1}$ of the original matrix $A$. + + - `rocblas_int lda` is the leading dimension of $A$, which is the stride between the first elements of consecutive columns of the matrix. Note that the matrices are laid out in column-major ordering. + + - `rocblas_int* ipiv` is a device-pointer to the permutation matrix of the LU-factorization of $A$. If $A$ was factorized without pivoting (for example with `rocsolver_[sdcz]getrf_npvt`), the `rocsolver_[sdcz]getri_npvt` function can be used instead.
+ + - `rocblas_int* info` is a device-pointer to a single integer that describes the result of the inversion operation. If `*info` is non-zero, then the inversion failed, and the value indicates the first zero pivot in $A$. If `*info` is zero, then the operation was successful and `A` holds the inverted matrix $A^{-1}$ of the original matrix $A$. + + The function returns a `rocblas_status` value, which indicates whether any errors have occurred during the operation. ### rocBLAS + - `rocblas_[sdcz]gemm` performs a general matrix multiplication in the form of $C = \alpha\cdot op_a(A)\cdot op_b(B) + \beta\cdot C$. This function is showcased in the [rocBLAS level 3 GEMM example](/Libraries/rocBLAS/level_3/gemm/). ## Used API surface + ### rocSOLVER + - `rocsolver_dgetrf` - `rocsolver_dgetri` ### rocBLAS + - `rocblas_create_handle` - `rocblas_destroy_handle` - `rocblas_dgemm` @@ -66,6 +90,7 @@ This example showcases computing the inversion $A^{-1}$ of a rectangular matrix - `rocblas_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSOLVER/syev_batched/README.md b/Libraries/rocSOLVER/syev_batched/README.md index ec7b30df..814a2ea7 100644 --- a/Libraries/rocSOLVER/syev_batched/README.md +++ b/Libraries/rocSOLVER/syev_batched/README.md @@ -29,11 +29,11 @@ and the eigenvalues as a diagonal matrix: $$ W_i = \mathrm{diag}\left(\mathbf{w_i}\right) = \mathrm{diag}\left([\lambda_{i_0}, \dots, \lambda_{i_j}, \dots, \lambda_{i_{n-1}}]\right) = \begin{bmatrix} -\lambda_{i_0} & & & & & \\ - & \lambda_{i_1} & & & & \\ - & & \ddots & & & \\ - & & & \lambda_{i_j} & & \\ - & & & & \ddots & \\ +\lambda_{i_0} & & & & & \\ + & \lambda_{i_1} & & & & \\ + & & \ddots & & & \\ + & & & \lambda_{i_j} & & \\ + & & & & \ddots & \\ & & & & & \lambda_{i_{n-1}} \end{bmatrix} $$ @@ -72,6 +72,7 @@ The application provides the following optional command line arguments: 12. 
Validate the results ## Key APIs and Concepts + - The performance of a numerical multi-linear algebra code can be heavily increased by using tensor contractions [ [Y. Shi et al., HiPC, pp 193, 2016.](https://doi.org/10.1109/HiPC.2016.031) ], thereby similarly to other linear algebra libraries like hipBLAS rocSOLVER also has a `_batched` and a `_strided_batched` [ [C. Jhurani and P. Mullowney, JPDP Vol 75, pp 133, 2015.](https://doi.org/10.1016/j.jpdc.2014.09.003) ] extensions.
We can apply the same operation for several matrices if we combine them into batched matrices. Batched computation has a performance improvement for a large number of small matrices. For a constant stride between matrices, further acceleration is available by strided batched solvers. @@ -103,6 +104,7 @@ We can apply the same operation for several matrices if we combine them into bat - `const rocblas_int batch_count`: Number of matrices in the batch. ### rocBLAS + - rocBLAS is initialized by calling `rocblas_create_handle(rocblas_handle t*)` and it is terminated by calling `rocblas_destroy_handle(t)`. ## Used API surface diff --git a/Libraries/rocSOLVER/syev_strided_batched/README.md b/Libraries/rocSOLVER/syev_strided_batched/README.md index e0c896d0..2cf30ae7 100644 --- a/Libraries/rocSOLVER/syev_strided_batched/README.md +++ b/Libraries/rocSOLVER/syev_strided_batched/README.md @@ -29,11 +29,11 @@ and the eigenvalues as a diagonal matrix: $$ W_i = \mathrm{diag}\left(\mathbf{w_i}\right) = \mathrm{diag}\left([\lambda_{i_0}, \dots, \lambda_{i_j}, \dots, \lambda_{i_{n-1}}]\right) = \begin{bmatrix} -\lambda_{i_0} & & & & & \\ - & \lambda_{i_1} & & & & \\ - & & \ddots & & & \\ - & & & \lambda_{i_j} & & \\ - & & & & \ddots & \\ +\lambda_{i_0} & & & & & \\ + & \lambda_{i_1} & & & & \\ + & & \ddots & & & \\ + & & & \lambda_{i_j} & & \\ + & & & & \ddots & \\ & & & & & \lambda_{i_{n-1}} \end{bmatrix} $$ @@ -57,7 +57,6 @@ The application provides the following optional command line arguments: - `-c ` the size of the batch. Default value is `3`. - `-p ` The size of the padding. This value is used to calculate the stride for the input matrix, eigenvalues and the tridiagonal matrix. - ## Application flow 1. Parse command line arguments for dimensions of the input matrix. @@ -74,6 +73,7 @@ The application provides the following optional command line arguments: 12. Free the memory allocations on device. ## Key APIs and Concepts + - The performance of a numerical multi-linear algebra code can be heavily increased by using tensor contractions [ [Y. Shi et al., HiPC, pp 193, 2016.](https://doi.org/10.1109/HiPC.2016.031) ], thereby similarly to other linear algebra libraries like hipBLAS rocSOLVER also has a `_batched` and a `_strided_batched` [ [C. Jhurani and P. Mullowney, JPDP Vol 75, pp 133, 2015.](https://doi.org/10.1016/j.jpdc.2014.09.003) ] extensions.
We can apply the same operation for several matrices if we combine them into batched matrices. Batched computation has a performance improvement for a large number of small matrices. For a constant stride between matrices, further acceleration is available by strided batched solvers. @@ -105,6 +105,7 @@ We can apply the same operation for several matrices if we combine them into bat - `rocblas_int batch_count`: Number of matrices in the batch. ### rocBLAS + - rocBLAS is initialized by calling `rocblas_create_handle(rocblas_handle t*)` and it is terminated by calling `rocblas_destroy_handle(t)`. ## Used API surface diff --git a/Libraries/rocSPARSE/README.md b/Libraries/rocSPARSE/README.md index 82490053..aeabe845 100644 --- a/Libraries/rocSPARSE/README.md +++ b/Libraries/rocSPARSE/README.md @@ -1,32 +1,50 @@ # rocSPARSE Examples ## Summary + The examples in this subdirectory showcase the functionality of the [rocSPARSE](https://github.com/rocmSoftwarePlatform/rocSPARSE) library. The examples build on both Linux and Windows for both the ROCm (AMD GPU) and CUDA (NVIDIA GPU) backend. ## Prerequisites + ### Linux + - [CMake](https://cmake.org/download/) (at least version 3.21) -- OR GNU Make - available via the distribution's package manager + + - OR GNU Make - available via the distribution's package manager + - [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/Overview_of_ROCm_Installation_Methods.html) (at least version 5.x.x) OR the HIP Nvidia runtime (on the CUDA platform) + - [rocSPARSE](https://github.com/rocmSoftwarePlatform/rocSPARSE) - - ROCm platform: `rocsparse` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html). 
- - CUDA platform: Install rocSPARSE from source: [instructions](https://rocsparse.readthedocs.io/en/rocm-5.5.0/usermanual.html#building-rocsparse-from-source). + + - ROCm platform: `rocsparse` package available from [repo.radeon.com](https://repo.radeon.com/rocm/). The repository is added during the standard ROCm [install procedure](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.1.3/page/How_to_Install_ROCm.html). + + - CUDA platform: Install rocSPARSE from source: [instructions](https://rocsparse.readthedocs.io/en/rocm-5.5.0/usermanual.html#building-rocsparse-from-source). ### Windows + - [Visual Studio](https://visualstudio.microsoft.com/) 2019 or 2022 with the "Desktop Development with C++" workload + - ROCm toolchain for Windows (No public release yet) - - The Visual Studio ROCm extension needs to be installed to build with the solution files. + + - The Visual Studio ROCm extension needs to be installed to build with the solution files. + - [rocSPARSE](https://github.com/rocmSoftwarePlatform/rocSPARSE) - - ROCm platform: Installed as part of the ROCm SDK on Windows. - - CUDA platform: Install rocSPARSE from source: [instructions](https://rocsparse.readthedocs.io/en/rocm-5.5.0/usermanual.html#building-rocsparse-from-source). + + - ROCm platform: Installed as part of the ROCm SDK on Windows. + - CUDA platform: Install rocSPARSE from source: [instructions](https://rocsparse.readthedocs.io/en/rocm-5.5.0/usermanual.html#building-rocsparse-from-source). + - [CMake](https://cmake.org/download/) (optional, to build with CMake. Requires at least version 3.21) + - [Ninja](https://ninja-build.org/) (optional, to build with CMake) ## Building + ### Linux + Make sure that the dependencies are installed, or use the [provided Dockerfiles](../../Dockerfiles/) to build and run the examples in a containerized environment set up specifically for the example suite. 
#### Using CMake + All examples in the `rocSPARSE` subdirectory can either be built by a single CMake project or be built independently. - `$ cd Libraries/rocSPARSE` @@ -34,16 +52,20 @@ All examples in the `rocSPARSE` subdirectory can either be built by a single CMa - `$ cmake --build build` #### Using Make + All examples can be built by a single invocation to Make or be built independently. - `$ cd Libraries/rocSPARSE` - `$ make` (on ROCm) or `$ make GPU_RUNTIME=CUDA` (on CUDA) ### Windows + #### Visual Studio + Visual Studio solution files are available for the individual examples. To build all examples for rocSPARSE open the top level solution file [ROCm-Examples-VS2017.sln](../../ROCm-Examples-VS2017.sln), [ROCm-Examples-VS2019.sln](../../ROCm-Examples-VS2019.sln) or [ROCm-Examples-VS2022.sln](../../ROCm-Examples-VS2022.sln) (for Visual Studio 2017, 2019 or 2022, respectively) and filter for rocSPARSE. For more detailed build instructions refer to the top level [README.md](../../README.md#visual-studio). #### CMake + All examples in the `rocSPARSE` subdirectory can either be built by a single CMake project or be built independently. For build instructions refer to the top-level [README.md](../../README.md#cmake-2). diff --git a/Libraries/rocSPARSE/level_2/bsrmv/README.md b/Libraries/rocSPARSE/level_2/bsrmv/README.md index a17cf7ae..fff0c53b 100644 --- a/Libraries/rocSPARSE/level_2/bsrmv/README.md +++ b/Libraries/rocSPARSE/level_2/bsrmv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 BSR Matrix-Vector Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 2 sparse matrix-vector multiplication using BSR storage format. The operation calculates the following product: @@ -13,6 +15,7 @@ where - $A'$ is a sparse matrix in BSR format with `rocsparse_operation` and described below. ## Application flow + 1. Set up a sparse matrix in BSR format. Allocate an x and a y vector and set up $\alpha$ and $\beta$ scalars. 2. 
Set up a handle, a matrix descriptor and a matrix info. 3. Allocate device memory and copy input matrix and vectors from host to device. @@ -23,23 +26,27 @@ where 8. Print result to the standard output. ## Key APIs and Concepts + ### BSR Matrix Storage Format + The [Block Compressed Sparse Row (BSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#bsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is to split the given sparse matrix into equal sized blocks of dimension `bsr_dim` and store those using the [CSR format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format). Because the CSR format only stores non-zero elements, the BSR format introduces the concept of __non-zero block__: a block that contains at least one non-zero element. Note that all elements of non-zero blocks are stored, even if some of them are equal to zero. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks - `bsr_dim`: dimension of each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_dim` $\cdot$ `bsr_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. - This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. 
The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. + This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix. @@ -137,7 +144,7 @@ $$ Therefore, the BSR representation of $A$, using column-major ordering, is: -``` +```math bsr_val = { 8, 0, 7, 2, 0, 3, 0, 5, 2, 0, 1, 0, 0, 0, 0, 0 // A_{00} 4, 7, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0 // A_{10} 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // A_{12} @@ -150,26 +157,29 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ``` ### rocSPARSE + - `rocsparse_[dscz]bsrmv(...)` performs the sparse matrix-dense vector multiplication $\hat{y}=\alpha \cdot A' x + \beta \cdot y$ using the BSR format. 
The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation trans`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ Currently, only `rocsparse_operation_none` is supported. - `rocsparse_mat_descr`: descriptor of the sparse BSR matrix. 
- `rocsparse_direction` block storage major direction with the following options: - - `rocsparse_direction_column` - - `rocsparse_direction_row` + - `rocsparse_direction_column` + - `rocsparse_direction_row` ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_create_mat_info` @@ -187,6 +197,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/bsrsv/README.md b/Libraries/rocSPARSE/level_2/bsrsv/README.md index 33ce831a..920aefd1 100644 --- a/Libraries/rocSPARSE/level_2/bsrsv/README.md +++ b/Libraries/rocSPARSE/level_2/bsrsv/README.md @@ -1,6 +1,7 @@ # rocSPARSE Level 2 BSR Triangular Solver Example ## Description + This example illustrates the use of the `rocSPARSE` level 2 triangular solver using the BSR storage format. This triangular solver is used to solve a linear system of the form @@ -13,9 +14,9 @@ where - $A$ is a sparse triangular matrix of order $n$ whose elements are the coefficients of the equations, - $A'$ is one of the following: - - $A' = A$ (identity) - - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) - - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$), + - $A' = A$ (identity) + - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) + - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$), - $\alpha$ is a scalar, - $x$ is a dense vector of size $m$ containing the constant terms of the equations, and - $y$ is a dense vector of size $n$ which contains the unknowns of the system. @@ -23,6 +24,7 @@ where Obtaining the solution for such a system consists of finding concrete values of all the unknowns such that the above equality holds. ### Application flow + 1. Setup input data. 2. Allocate device memory and offload input data to device. 3. Initialize rocSPARSE by creating a handle. 
@@ -34,23 +36,27 @@ Obtaining the solution for such a system consists of finding concrete values of 9. Print validation result. ## Key APIs and Concepts + ### BSR Matrix Storage Format + The [Block Compressed Sparse Row (BSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#bsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is to split the given sparse matrix into equal sized blocks of dimension `bsr_dim` and store those using the [CSR format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format). Because the CSR format only stores non-zero elements, the BSR format introduces the concept of __non-zero block__: a block that contains at least one non-zero element. Note that all elements of non-zero blocks are stored, even if some of them are equal to zero. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks - `bsr_dim`: dimension of each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_dim` $\cdot$ `bsr_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. - This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. 
+ This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix. @@ -148,7 +154,7 @@ $$ Therefore, the BSR representation of $A$, using column-major ordering, is: -``` +```math bsr_val = { 8, 0, 7, 2, 0, 3, 0, 5, 2, 0, 1, 0, 0, 0, 0, 0 // A_{00} 4, 7, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0 // A_{10} 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // A_{12} @@ -161,34 +167,46 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. + - `rocsparse_pointer_mode` controls whether scalar parameters must be allocated on the host (`rocsparse_pointer_mode_host`) or on the device (`rocsparse_pointer_mode_device`). It is controlled by `rocsparse_set_pointer_mode`. + - `rocsparse_direction dir`: matrix storage of BSR blocks. The following values are accepted: - - `rocsparse_direction_row`: parse blocks by rows. - - `rocsparse_direction_column`: parse blocks by columns. + - `rocsparse_direction_row`: parse blocks by rows. + - `rocsparse_direction_column`: parse blocks by columns. + - `rocsparse_operation trans`: matrix operation applied to the given input matrix. The following values are accepted: - - `rocsparse_operation_none`: identity operation $A' = A$. - - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. - - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. + - `rocsparse_operation_none`: identity operation $A' = A$. 
+ - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. + - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. + - `rocsparse_mat_descr descr`: holds all properties of a matrix. The properties set in this example are the following: - - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_[sdcz]bsrsv_buffer_size` allows to obtain the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]bsrsv_analysis` and `rocsparse_[sdcz]bsrsv_solve` functions. The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. + - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. + - `rocsparse_[sdcz]bsrsv_solve` solves a sparse triangular linear system $A'y = \alpha x$. 
The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) + - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_[sdcz]bsrsv_analysis` performs the analysis step for `rocsparse_[sdcz]bsrsv_solve`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]bsrsv_solve`. + - `rocsparse_bsrsv_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sdcz]bsrsv_solve(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`. 
## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_bsrsv_zero_pivot` @@ -221,6 +239,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } - `rocsparse_status_zero_pivot` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/bsrxmv/README.md b/Libraries/rocSPARSE/level_2/bsrxmv/README.md index 2d117996..2bc05937 100644 --- a/Libraries/rocSPARSE/level_2/bsrxmv/README.md +++ b/Libraries/rocSPARSE/level_2/bsrxmv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 BSR Matrix-Vector Multiplication with Mask Operation + ## Description + This example illustrates the use of the `rocSPARSE` level 2 sparse matrix-vector multiplication with mask operation using BSR storage format. The function returns the BSR matrix-vector product for the masked blocks @@ -17,6 +19,7 @@ where otherwise it returns the identical $\mathbf{y}$ vector elements. ## Application flow + 1. Set up a sparse matrix in BSR format. Allocate an x and a y vector, set up $\alpha$ and $\beta$ scalars and set up the mask. 2. Set up a handle, a matrix descriptor and a matrix info. 3. Allocate device memory and copy input matrix and vectors, and mask array from host to device. @@ -26,23 +29,27 @@ otherwise it returns the identical $\mathbf{y}$ vector elements. 7. Print result to the standard output. ## Key APIs and Concepts + ### BSR Matrix Storage Format + The [Block Compressed Sparse Row (BSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#bsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is to split the given sparse matrix into equal sized blocks of dimension `bsr_dim` and store those using the [CSR format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format). 
Because the CSR format only stores non-zero elements, the BSR format introduces the concept of __non-zero block__: a block that contains at least one non-zero element. Note that all elements of non-zero blocks are stored, even if some of them are equal to zero. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks - `bsr_dim`: dimension of each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_dim` $\cdot$ `bsr_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. - This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. + This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix. 
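To make the traversal of the three BSR arrays concrete, here is a minimal host-side sketch of $y = \alpha \cdot A x + \beta \cdot y$. It assumes zero-based indexing, square `bsr_dim` x `bsr_dim` blocks, and column-major storage inside each block (`rocsparse_direction_column`). The function name `bsr_spmv_column_major` and the 4x4 test matrix are hypothetical and not part of rocSPARSE. The test matrix uses `bsr_dim = 2` and has non-zero blocks $A_{00}$, $A_{10}$ and $A_{11}$, so `bsr_row_ptr = {0, 1, 3}` and `bsr_col_ind = {0, 0, 1}`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference (not a rocSPARSE API): y = alpha * A * x + beta * y,
// with A in zero-based BSR format and every block stored in column-major order.
std::vector<double> bsr_spmv_column_major(int                        mb,
                                          int                        bsr_dim,
                                          const std::vector<int>&    bsr_row_ptr,
                                          const std::vector<int>&    bsr_col_ind,
                                          const std::vector<double>& bsr_val,
                                          double                     alpha,
                                          const std::vector<double>& x,
                                          double                     beta,
                                          std::vector<double>        y)
{
    assert(static_cast<int>(bsr_row_ptr.size()) == mb + 1);
    const std::size_t block_size = static_cast<std::size_t>(bsr_dim) * bsr_dim;
    for(int block_row = 0; block_row < mb; ++block_row)
    {
        // Each scalar row inside the current block row gets its own dot product.
        for(int r = 0; r < bsr_dim; ++r)
        {
            double sum = 0.0;
            // Non-zero blocks of this block row have indices [row_ptr[i], row_ptr[i+1]).
            for(int b = bsr_row_ptr[block_row]; b < bsr_row_ptr[block_row + 1]; ++b)
            {
                const int block_col = bsr_col_ind[b];
                // The values of block b start at b * bsr_dim * bsr_dim in bsr_val.
                const double* block = bsr_val.data() + b * block_size;
                for(int c = 0; c < bsr_dim; ++c)
                {
                    // Column-major: element (r, c) of a block lives at c * bsr_dim + r.
                    sum += block[c * bsr_dim + r] * x[block_col * bsr_dim + c];
                }
            }
            const int row = block_row * bsr_dim + r;
            y[row]        = alpha * sum + beta * y[row];
        }
    }
    return y;
}
```

For row-major blocks (`rocsparse_direction_row`), only the inner index would change to `r * bsr_dim + c`.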
@@ -140,7 +147,7 @@ $$ Therefore, the BSR representation of $A$, using column-major ordering, is: -``` +```math bsr_val = { 8, 0, 7, 2, 0, 3, 0, 5, 2, 0, 1, 0, 0, 0, 0, 0 // A_{00} 4, 7, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0 // A_{10} 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // A_{12} @@ -185,7 +192,8 @@ The BSRX format is the same as BSR, but the `bsr_row_ptr` is separated into star - `bsrx_end_ptr`: the position next to the last block (last + 1) that is used for the calculation. This block is typically the last nonzero block. Therefore: -``` + +```math bsrx_row_ptr = { 0, 1, 3 } bsrx_end_ptr = { 1, 3, 4 } @@ -194,26 +202,30 @@ bsrx_end_ptr = { 1, 3, 4 } Additionally, `bsrx_end_ptr` can be used for column masking, how it is presented in the example. ### rocSPARSE + - `rocsparse_[dscz]bsrxmv(...)` is the solver with four different function signatures depending on the type of the input matrix: - - `d` double-precision real (`double`) - - `s` single-precision real (`float`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `d` double-precision real (`double`) + - `s` single-precision real (`float`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation trans`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + + Currently, only `rocsparse_operation_none` is supported. - Currently, only `rocsparse_operation_none` is supported. 
- `rocsparse_mat_descr`: descriptor of the sparse BSR matrix. - + - `rocsparse_direction` block storage major direction with the following options: - - `rocsparse_direction_column` - - `rocsparse_direction_row` + - `rocsparse_direction_column` + - `rocsparse_direction_row` ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_dbsrxmv` @@ -228,6 +240,7 @@ Additionally, `bsrx_end_ptr` can be used for column masking, how it is presented - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/coomv/README.md b/Libraries/rocSPARSE/level_2/coomv/README.md index ca6539a1..e8637bd6 100644 --- a/Libraries/rocSPARSE/level_2/coomv/README.md +++ b/Libraries/rocSPARSE/level_2/coomv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 COO Matrix-Vector Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 2 sparse matrix-vector multiplication using COO storage format. The operation calculates the following product: @@ -13,6 +15,7 @@ where - $A'$ is a sparse matrix in COO format with `rocsparse_operation` and described below. ## Application flow + 1. Set up a sparse matrix in COO format. Allocate an x and a y vector and set up $\alpha$ and $\beta$ scalars. 2. Set up a handle, a matrix descriptor and a matrix info. 3. Allocate device memory and copy input matrix and vectors from host to device. @@ -23,8 +26,11 @@ where 8. Print result to the standard output. ## Key APIs and Concepts + ### COO Matrix Storage Format + The coordinate (COO) storage format represents an $m \times n$ matrix by + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements @@ -35,21 +41,24 @@ The coordinate (COO) storage format represents an $m \times n$ matrix by The COO matrix is sorted by row indices, and by column indices in the same row. 
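A minimal host-side sketch of the COO matrix-vector product $y = \alpha \cdot A x + \beta \cdot y$ follows. Each stored triplet contributes exactly one product to one entry of $y$. The array names `coo_row_ind`, `coo_col_ind` and `coo_val` are assumed here, because their definitions fall outside this excerpt. The function `coo_spmv` and the test values are hypothetical. This serial loop does not rely on the sorted order; a parallel implementation can exploit sorted row indices to reduce per-row contributions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference (not a rocSPARSE API): y = alpha * A * x + beta * y,
// with A given as zero-based COO triplets (row index, column index, value).
std::vector<double> coo_spmv(int                        m,
                             const std::vector<int>&    coo_row_ind,
                             const std::vector<int>&    coo_col_ind,
                             const std::vector<double>& coo_val,
                             double                     alpha,
                             const std::vector<double>& x,
                             double                     beta,
                             std::vector<double>        y)
{
    assert(coo_row_ind.size() == coo_val.size() && coo_col_ind.size() == coo_val.size());
    assert(static_cast<int>(y.size()) == m);
    // Scale y by beta once; the products are then accumulated triplet by triplet.
    for(double& yi : y)
        yi *= beta;
    for(std::size_t k = 0; k < coo_val.size(); ++k)
        y[coo_row_ind[k]] += alpha * coo_val[k] * x[coo_col_ind[k]];
    return y;
}
```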
### rocSPARSE + - `rocsparse_[dscz]coomv(...)` is the solver with four different function signatures depending on the type of the input matrix: - - `d` double-precision real (`double`) - - `s` single-precision real (`float`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `d` double-precision real (`double`) + - `s` single-precision real (`float`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ - `rocsparse_mat_descr`: holds all properties of a matrix. ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_dcoomv` @@ -62,6 +71,7 @@ The COO matrix is sorted by row indices, and by column indices in the same row. - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/csrmv/README.md b/Libraries/rocSPARSE/level_2/csrmv/README.md index bee8c394..f82907f7 100644 --- a/Libraries/rocSPARSE/level_2/csrmv/README.md +++ b/Libraries/rocSPARSE/level_2/csrmv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 CSR Matrix-Vector Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 2 sparse matrix-vector multiplication using CSR storage format. 
The operation calculates the following product: @@ -13,6 +15,7 @@ where - $A'$ is a sparse matrix in CSR format with `rocsparse_operation`, which is described below in more detail. ## Application flow + 1. Set up a sparse matrix in CSR format. Allocate an x and a y vector and set up $\alpha$ and $\beta$ scalars. 2. Set up handle, matrix descriptor and matrix info variables. 3. Allocate device memory and copy input matrix and vectors from host to device. @@ -23,21 +26,25 @@ where 8. Print result to the standard output. ## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. 
The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -57,7 +64,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```math m = 3 n = 5 @@ -71,23 +78,25 @@ csr_row_ptr = { 0, 3, 5, 8 } csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` - ### rocSPARSE + - `rocsparse_[dscz]csrmv(...)` is the solver with four different function signatures depending on the type of the input matrix: - - `d` double-precision real (`double`) - - `s` single-precision real (`float`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `d` double-precision real (`double`) + - `s` single-precision real (`float`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ - `rocsparse_mat_descr`: holds all properties of a matrix. 
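The following is a minimal host-side sketch of what `rocsparse_[dscz]csrmv` computes for `rocsparse_operation_none`, i.e. $y = \alpha \cdot A x + \beta \cdot y$. Row $i$ is a dot product over the index range `[csr_row_ptr[i], csr_row_ptr[i+1])`. The test reuses `m = 3`, `csr_row_ptr = { 0, 3, 5, 8 }` and `csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 }` from the CSR example. The values in `csr_val` (1 through 8) are hypothetical, and so is the function name `csr_spmv`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host reference (not a rocSPARSE API): y = alpha * A * x + beta * y,
// with A (no transpose) stored in zero-based CSR format.
std::vector<double> csr_spmv(int                        m,
                             const std::vector<int>&    csr_row_ptr,
                             const std::vector<int>&    csr_col_ind,
                             const std::vector<double>& csr_val,
                             double                     alpha,
                             const std::vector<double>& x,
                             double                     beta,
                             std::vector<double>        y)
{
    assert(static_cast<int>(csr_row_ptr.size()) == m + 1);
    // The last entry of csr_row_ptr equals nnz, the length of csr_col_ind and csr_val.
    assert(static_cast<int>(csr_col_ind.size()) == csr_row_ptr[m]);
    for(int i = 0; i < m; ++i)
    {
        double sum = 0.0;
        // Row i owns the non-zeros with indices [csr_row_ptr[i], csr_row_ptr[i+1]).
        for(int k = csr_row_ptr[i]; k < csr_row_ptr[i + 1]; ++k)
            sum += csr_val[k] * x[csr_col_ind[k]];
        y[i] = alpha * sum + beta * y[i];
    }
    return y;
}
```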
## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_create_mat_info` @@ -103,6 +112,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/csrsv/README.md b/Libraries/rocSPARSE/level_2/csrsv/README.md index 53c22948..b4f6e1d1 100644 --- a/Libraries/rocSPARSE/level_2/csrsv/README.md +++ b/Libraries/rocSPARSE/level_2/csrsv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 CSR Triangular Solver Example + ## Description + This example illustrates the use of the `rocSPARSE` level 2 triangular solver using the CSR storage format. This triangular solver is used to solve a linear system of the form @@ -12,9 +14,9 @@ where - $A$ is a sparse triangular matrix of order $n$ whose elements are the coefficients of the equations, - $A'$ is one of the following: - - $A' = A$ (identity) - - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) - - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$), + - $A' = A$ (identity) + - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) + - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$), - $\alpha$ is a scalar, - $x$ is a dense vector of size $m$ containing the constant terms of the equations, and - $y$ is a dense vector of size $n$ which contains the unknowns of the system. @@ -22,6 +24,7 @@ where Obtaining solution for such a system consists on finding concrete values of all the unknowns such that the above equality holds. ### Application flow + 1. Setup input data. 2. Allocate device memory and offload input data to device. 3. Initialize rocSPARSE by creating a handle. @@ -33,21 +36,25 @@ Obtaining solution for such a system consists on finding concrete values of all 9. Print validation result. 
## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -67,7 +74,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```text m = 3 n = 5 @@ -82,31 +89,42 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`.
+ - `rocsparse_pointer_mode` controls whether scalar parameters must be allocated on the host (`rocsparse_pointer_mode_host`) or on the device (`rocsparse_pointer_mode_device`). It is controlled by `rocsparse_set_pointer_mode`. + - `rocsparse_operation trans`: matrix operation applied to the given input matrix. The following values are accepted: - - `rocsparse_operation_none`: identity operation $A' = A$. - - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. - - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported for `rocsparse_[sdcz]_csrsv_solve`. + - `rocsparse_operation_none`: identity operation $A' = A$. + - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. + - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported for `rocsparse_[sdcz]csrsv_solve`. + - `rocsparse_mat_descr descr`: holds all properties of a matrix. The properties set in this example are the following: - - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_[sdcz]csrsv_buffer_size` allows to obtain the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]csrsv_analysis` and `rocsparse_[sdcz]csrsv_solve` functions.
The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. + - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. + - `rocsparse_[sdcz]csrsv_solve` solves a sparse triangular linear system $A'y = \alpha x$. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) + - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_[sdcz]csrsv_analysis` performs the analysis step for `rocsparse_[sdcz]csrsv_solve`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]csrsv_solve`. + - `rocsparse_csrsv_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sbcz]csrsv_solve(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`. 
## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_create_handle` @@ -137,6 +155,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_status_zero_pivot` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/ellmv/README.md b/Libraries/rocSPARSE/level_2/ellmv/README.md index 0dfd5352..b90a6871 100644 --- a/Libraries/rocSPARSE/level_2/ellmv/README.md +++ b/Libraries/rocSPARSE/level_2/ellmv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 ELL Matrix-Vector Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 2 sparse matrix-vector multiplication using ELL storage format. The operation calculates the following product: @@ -13,6 +15,7 @@ where - $A'$ is a sparse matrix in ELL format with `rocsparse_operation` and described below. ## Application flow + 1. Set up a sparse matrix in ELL format. Allocate an x and a y vector and set up $\alpha$ and $\beta$ scalars. 2. Set up a handle, a matrix descriptor and a matrix info. 3. Allocate device memory and copy input matrix and vectors from host to device. @@ -23,16 +26,19 @@ where 8. Print result to the standard output. ## Key APIs and Concepts + ### ELL Matrix Storage Format + The Ellpack-Itpack (ELL) storage format represents an $m \times n$ matrix in column-major layout, by fixing the number of non-zeros in each column. A matrix is stored using the following arrays: + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements - `ell_width`: maximum number of non-zero elements per row - `ell_val`: array of data with $`m \times \texttt{ell\_with}`$ elements - `ell_col_ind`: array of column indices with $`m \times \texttt{ell\_with}`$ elements. 
Rows with less than `ell_width` non-zero elements are padded: - - `ell_val` with zeroes - - `ell_col_ind` with $-1$ + - `ell_val` with zeroes + - `ell_col_ind` with $-1$ For instance, consider a sparse matrix as @@ -49,37 +55,41 @@ $$ Therefore, the ELL representation of $A$ is: -``` +```text m = 3 n = 5 ell_width = 3 -ell_val = { 1, 4, 6, - 2, 5, 7, +ell_val = { 1, 4, 6, + 2, 5, 7, 3, 0, 8 } -ell_col_ind = { 0, 1, 0, - 1, 2, 3, +ell_col_ind = { 0, 1, 0, + 1, 2, 3, 3, -1, 4 } ``` + ### rocSPARSE + - `rocsparse_[dscz]ellmv(...)` is the solver with four different function signatures depending on the type of the input matrix: - - `d` double-precision real (`double`) - - `s` single-precision real (`float`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `d` double-precision real (`double`) + - `s` single-precision real (`float`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ - `rocsparse_mat_descr`: holds all properties of a matrix.
## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_dellmv` @@ -92,6 +102,7 @@ ell_col_ind = { 0, 1, 0, - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/gebsrmv/README.md b/Libraries/rocSPARSE/level_2/gebsrmv/README.md index 798d6375..0f2d2649 100644 --- a/Libraries/rocSPARSE/level_2/gebsrmv/README.md +++ b/Libraries/rocSPARSE/level_2/gebsrmv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 GEBSR Matrix-Vector Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 2 sparse matrix-vector multiplication using GEBSR storage format. The operation calculates the following product: @@ -12,25 +14,29 @@ where - $x$ and $y$ are dense vectors, - $A$ is an $m\times n$ sparse matrix, and - $A'$ is one of the following: - - $A' = A$ (identity) - - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) - - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$). + - $A' = A$ (identity) + - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) + - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$). ## Application flow + 1. Setup input data. 2. Allocate device memory and offload input data to device. 3. Initialize rocSPARSE by creating a handle. 4. Prepare utility variables for rocSPARSE gebsrmv invocation. -5. Call gebsrmv to perform y = alpha * A * x + beta * y. +5. Call gebsrmv to perform $y = \alpha \cdot A' x + \beta \cdot y$. 6. Copy solution to host from device. 7. Clear rocSPARSE allocations on device and device arrays. 8. Print results to standard output. ## Key APIs and Concepts + ### GEBSR Matrix Storage Format + The [General Block Compressed Sparse Row (GEBSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#gebsr-storage-format) describes a sparse matrix using three arrays.
The idea behind this storage format is the same as for the BSR format, but the blocks in which the sparse matrix is split are not squared. All of them are of `bsr_row_dim` $\times$ `bsr_col_dim` size. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks @@ -38,40 +44,46 @@ Therefore, defining - `bsr_col_dim`: number of columns in each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_row_dim` $\cdot$ `bsr_col_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. - This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_row_dim * bsr_col_dim` to `(bsr_row_ptr[j+1]-1) * bsr_row_dim * bsr_col_dim`. + This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_row_dim * bsr_col_dim` to `bsr_row_ptr[j+1] * bsr_row_dim * bsr_col_dim - 1`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix.
Note that, for a given $m\times n$ matrix, if $m$ is not evenly divisible by the row block dimension or $n$ is not evenly divisible by the column block dimension then zeros are padded to the matrix so that $mb$ and $nb$ are the smallest integers greater than or equal to $`\frac{m}{\texttt{bsr\_row\_dim}}`$ and $`\frac{n}{\texttt{bsr\_col\_dim}}`$, respectively. ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. + - `rocsparse_[dscz]gebsrmv(...)` performs the sparse matrix-dense vector multiplication $\hat{y}=\alpha \cdot A' x + \beta \cdot y$ using the GEBSR format. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation trans`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + + Currently, only `rocsparse_operation_none` is supported for `rocsparse_[dscz]gebsrmv`. - Currently, only `rocsparse_operation_none` is supported for `rocsparse_[dscz]gebsrmv`. - `rocsparse_mat_descr`: descriptor of the sparse GEBSR matrix. 
- `rocsparse_direction` block storage major direction with the following options: - - `rocsparse_direction_column` - - `rocsparse_direction_row` + - `rocsparse_direction_column` + - `rocsparse_direction_row` ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_destroy_handle` @@ -86,6 +98,7 @@ Note that, for a given $m\times n$ matrix, if $m$ is not evenly divisible by the - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/gemvi/README.md b/Libraries/rocSPARSE/level_2/gemvi/README.md index 681aadac..5c20df5a 100644 --- a/Libraries/rocSPARSE/level_2/gemvi/README.md +++ b/Libraries/rocSPARSE/level_2/gemvi/README.md @@ -70,6 +70,7 @@ $$ The sparse vector is stored in the coordinate (COO) storage format. This works by storing the sparse vector $x$ as: + - the values $x_\text{values}$, - the coordinates (indices) of said values into $x_\text{indices}$, and - the amount of non zero values as $x_\text{non\_zero}$. 
@@ -77,22 +78,21 @@ This works by storing the sparse vector $x$ as: ### rocSPARSE - `rocsparse_[dscz]gemvi()`is the solver with four different function signatures depending on the type of the input matrix and vectors: - - `d` double-precision real (`double`) - - `s` single-precision real (`float`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `d` double-precision real (`double`) + - `s` single-precision real (`float`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation trans`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ - Currently, only `rocsparse_operation_none` is supported. + Currently, only `rocsparse_operation_none` is supported. 
- `rocsparse_index_base idx_base`: base of indices - - `rocsparse_index_base_zero`: zero based indexing - - `rocsparse_index_base_one`: one based indexing - + - `rocsparse_index_base_zero`: zero based indexing + - `rocsparse_index_base_one`: one based indexing ## Used API surface diff --git a/Libraries/rocSPARSE/level_2/spmv/README.md b/Libraries/rocSPARSE/level_2/spmv/README.md index 6416d19e..060ceaa9 100644 --- a/Libraries/rocSPARSE/level_2/spmv/README.md +++ b/Libraries/rocSPARSE/level_2/spmv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 Matrix-Vector Multiplication Example + ## Description + This example illustrates the use of the `rocSPARSE` level 2 sparse matrix-vector multiplication with a chosen sparse format (see: `rocsparse_spmv()` in [Key APIs and Concepts/rocSPARSE](#rocsparse)). The operation calculates the following product: @@ -14,6 +16,7 @@ where - $A'$ is the result of applying to matrix $A$ one of the `rocsparse_operation` described below. ### Application flow + 1. Set up input data. 2. Allocate device memory and offload input data to device. 3. Initialize rocSPARSE by creating a handle. @@ -25,80 +28,87 @@ where 9. Print result to the standard output. ## Key APIs and Concepts + ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. - `rocsparse_operation`: matrix operation applied to the given input matrix. The following values are accepted: - - `rocsparse_operation_none`: identity operation $A' = A$. - - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. - - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. + - `rocsparse_operation_none`: identity operation $A' = A$. + - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. 
+ - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. - `rocsparse_spmv()` solves a sparse matrix-vector product in the following formats: BELL, BSR, COO, COO AoS, CSR, CSC and ELL. - `rocsparse_datatype`: data type of rocSPARSE vector and matrix elements. - - `rocsparse_datatype_f32_r`: real 32-bit floating point type - - `rocsparse_datatype_f64_r`: real 64-bit floating point type - - `rocsparse_datatype_f32_c`: complex 32-bit floating point type - - `rocsparse_datatype_f64_c`: complex 64-bit floating point type - - `rocsparse_datatype_i32_r`: real 32-bit signed integer - - Mixed precision is available as: - - | $A$ and $\mathbf{x}$ | $\mathbf{y}$ | `compute_type` | - |---------------------------|----------------------------|----------------------------| - | `rocsparse_datatype_i8_r` | `rocsparse_datatype_i32_r` | `rocsparse_datatype_i32_r` | - | `rocsparse_datatype_i8_r` | `rocsparse_datatype_f32_r` | `rocsparse_datatype_f32_r` | - | `rocsparse_datatype_i8_r` | `rocsparse_datatype_i32_r` | `rocsparse_datatype_i32_r` | - - | $A$ | $\mathbf{x}$ , $\mathbf{y}$ and `compute_type` | - |----------------------------|------------------------------------------------| - | `rocsparse_datatype_f32_r` | `rocsparse_datatype_i32_r` | - | `rocsparse_datatype_f64_r` | `rocsparse_datatype_i64_r` | + - `rocsparse_datatype_f32_r`: real 32-bit floating point type + - `rocsparse_datatype_f64_r`: real 64-bit floating point type + - `rocsparse_datatype_f32_c`: complex 32-bit floating point type + - `rocsparse_datatype_f64_c`: complex 64-bit floating point type + - `rocsparse_datatype_i32_r`: real 32-bit signed integer + + Mixed precision is available as: + + | $A$ and $\mathbf{x}$ | $\mathbf{y}$ | `compute_type` | + |---------------------------|----------------------------|----------------------------| + | `rocsparse_datatype_i8_r` | `rocsparse_datatype_i32_r` | `rocsparse_datatype_i32_r` | + | `rocsparse_datatype_i8_r` | 
`rocsparse_datatype_f32_r` | `rocsparse_datatype_f32_r` | + | `rocsparse_datatype_i8_r` | `rocsparse_datatype_i32_r` | `rocsparse_datatype_i32_r` | + + | $A$ | $\mathbf{x}$ , $\mathbf{y}$ and `compute_type` | + |----------------------------|------------------------------------------------| + | `rocsparse_datatype_f32_r` | `rocsparse_datatype_i32_r` | + | `rocsparse_datatype_f64_r` | `rocsparse_datatype_i64_r` | - `rocsparse_indextype` indicates the index type of a rocSPARSE index vector. - - `rocsparse_indextype_i32`: 32-bit signed integer - - `rocsparse_indextype_i64`: 64-bit signed integer + - `rocsparse_indextype_i32`: 32-bit signed integer + - `rocsparse_indextype_i64`: 64-bit signed integer - `rocsparse_index_base` indicates the index base of indices. - - `rocsparse_index_base_zero`: zero based indexing. - - `rocsparse_index_base_one`: one based indexing. + - `rocsparse_index_base_zero`: zero based indexing. + - `rocsparse_index_base_one`: one based indexing. - `rocsparse_spmv_alg`: list of SpMV algorithms. - - `rocsparse_spmv_alg_default`: default SpMV algorithm for the given format. For default algorithm, analysis step is required. - - `rocsparse_spmv_alg_bell`: algorithm for BELL matrices - - `rocsparse_spmv_alg_bsr`: algorithm for BSR matrices - - `rocsparse_spmv_alg_coo`: segmented algorithm for COO matrices - - `rocsparse_spmv_alg_coo_atomic`: atomic algorithm for COO matrices - - `rocsparse_spmv_alg_csr_adaptive`: adaptive algorithm for CSR and CSC matrices - - `rocsparse_spmv_alg_csr_stream`: stream algorithm for CSR and CSC matrices - - `rocsparse_spmv_alg_ell`: algorithm for ELL matrices - - The default algorithm for CSR and CSC matrices is `rocsparse_spmv_alg_csr_adaptive`. - - The default algorithm for COO and COO AoS is `rocsparse_spmv_alg_coo` (segmented). + - `rocsparse_spmv_alg_default`: default SpMV algorithm for the given format. For the default algorithm, an analysis step is required.
+ - `rocsparse_spmv_alg_bell`: algorithm for BELL matrices + - `rocsparse_spmv_alg_bsr`: algorithm for BSR matrices + - `rocsparse_spmv_alg_coo`: segmented algorithm for COO matrices + - `rocsparse_spmv_alg_coo_atomic`: atomic algorithm for COO matrices + - `rocsparse_spmv_alg_csr_adaptive`: adaptive algorithm for CSR and CSC matrices + - `rocsparse_spmv_alg_csr_stream`: stream algorithm for CSR and CSC matrices + - `rocsparse_spmv_alg_ell`: algorithm for ELL matrices + + The default algorithm for CSR and CSC matrices is `rocsparse_spmv_alg_csr_adaptive`. + + The default algorithm for COO and COO AoS is `rocsparse_spmv_alg_coo` (segmented). + - `rocsparse_spmat_descr`: sparse matrix descriptor. -- `rocsparse_create_[bell|coo|coo_aos|csr|csc|ell]_descr` creates a sparse matrix descriptor in BELL, COO, COO AoS, CSR, CSC or ELL format. +- `rocsparse_create_[bell|coo|coo_aos|csr|csc|ell]_descr` creates a sparse matrix descriptor in BELL, COO, COO AoS, CSR, CSC or ELL format. + + We used COO format in the example. - We used COO format in the example. + The descriptor should be destroyed at the end by `rocsparse_destroy_spmat_descr`. - The descriptor should be destroyed at the end by `rocsparse_destroy_spmat_descr`. - `rocsparse_destroy_spmat_descr`: Destroy a sparse matrix descriptor and release used resources allocated by the descriptor. - `rocsparse_dnvec_descr` is a dense vector descriptor. -- `rocsparse_create_dnvec_descr` creates a dense vector descriptor. +- `rocsparse_create_dnvec_descr` creates a dense vector descriptor. The descriptor should be destroyed at the end by `rocsparse_destroy_dnvec_descr`. + - `rocsparse_destroy_dnvec_descr` destroys a dense vector descriptor. -- `rocsparse_spmv_stage`: list of possible stages during SpMV computation. Typical order is `rocsparse_spmv_buffer_size`, `rocsparse_spmv_preprocess`, `rocsparse_spmv_compute`. - - `rocsparse_spmv_stage_buffer_size` returns the required buffer size. 
- - `rocsparse_spmv_stage_preprocess` preprocesses data. - - `rocsparse_spmv_stage_compute` performs the actual SpMV computation. - - `rocsparse_spmv_stage_auto`: automatic stage detection. - - If `temp_buffer` is equal to `nullptr`, the required buffer size will be returned. - - Otherwise, the SpMV preprocess and the SpMV algorithm will be executed. +- `rocsparse_spmv_stage`: list of possible stages during SpMV computation. Typical order is `rocsparse_spmv_stage_buffer_size`, `rocsparse_spmv_stage_preprocess`, `rocsparse_spmv_stage_compute`. + - `rocsparse_spmv_stage_buffer_size` returns the required buffer size. + - `rocsparse_spmv_stage_preprocess` preprocesses data. + - `rocsparse_spmv_stage_compute` performs the actual SpMV computation. + - `rocsparse_spmv_stage_auto`: automatic stage detection. + - If `temp_buffer` is equal to `nullptr`, the required buffer size will be returned. + - Otherwise, the SpMV preprocess and the SpMV algorithm will be executed. ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_coo_descr` - `rocsparse_create_dnvec_descr` - `rocsparse_create_handle` @@ -125,6 +135,7 @@ where - `rocsparse_spmv_stage_preprocess` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_2/spsv/README.md b/Libraries/rocSPARSE/level_2/spsv/README.md index d3603aba..674e748a 100644 --- a/Libraries/rocSPARSE/level_2/spsv/README.md +++ b/Libraries/rocSPARSE/level_2/spsv/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 2 Triangular Solver Example + ## Description + This example illustrates the use of the `rocSPARSE` level 2 triangular solver with a chosen sparse format.
This triangular solver is used to solve a linear system of the form @@ -12,14 +14,15 @@ where - $A$ is a sparse triangular matrix of order $n$ whose elements are the coefficients of the equations, - $A'$ is one of the following: - - $A' = A$ (identity) - - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) - - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$), + - $A' = A$ (identity) + - $A' = A^T$ (transpose $A$: $A_{ij}^T = A_{ji}$) + - $A' = A^H$ (conjugate transpose/Hermitian $A$: $A_{ij}^H = \bar A_{ji}$), - $\alpha$ is a scalar, - $x$ is a dense vector of size $n$ containing the constant terms of the equations, and - $y$ is a dense vector of size $n$ which contains the unknowns of the system. ### Application flow + 1. Setup input data. 2. Allocate device memory and offload input data to device. 3. Initialize rocSPARSE by creating a handle. @@ -31,63 +34,67 @@ where 9. Print solution vector $y$ to the standard output. ## Key APIs and Concepts + ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. - `rocsparse_pointer_mode` controls whether scalar parameters must be allocated on the host (`rocsparse_pointer_mode_host`) or on the device (`rocsparse_pointer_mode_device`). It is controlled by `rocsparse_set_pointer_mode`. - `rocsparse_operation`: matrix operation applied to the given input matrix. The following values are accepted: - - `rocsparse_operation_none`: identity operation $A' = A$. - - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. - - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. + - `rocsparse_operation_none`: identity operation $A' = A$. + - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. 
+ - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. - `rocsparse_spsv()` solves a sparse triangular linear system of a sparse matrix in CSR or COO format. - `rocsparse_datatype`: data type of rocSPARSE vector and matrix elements. - - `rocsparse_datatype_f32_r`: real 32-bit floating point type - - `rocsparse_datatype_f64_r`: real 64-bit floating point type - - `rocsparse_datatype_f32_c`: complex 32-bit floating point type - - `rocsparse_datatype_f64_c`: complex 64-bit floating point type - - `rocsparse_datatype_i8_r`: real 8-bit signed integer - - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer - - `rocsparse_datatype_i32_r`: real 32-bit signed integer - - `rocsparse_datatype_u32_r` real 32-bit unsigned integer + - `rocsparse_datatype_f32_r`: real 32-bit floating point type + - `rocsparse_datatype_f64_r`: real 64-bit floating point type + - `rocsparse_datatype_f32_c`: complex 32-bit floating point type + - `rocsparse_datatype_f64_c`: complex 64-bit floating point type + - `rocsparse_datatype_i8_r`: real 8-bit signed integer + - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer + - `rocsparse_datatype_i32_r`: real 32-bit signed integer + - `rocsparse_datatype_u32_r`: real 32-bit unsigned integer - `rocsparse_indextype` indicates the index type of a rocSPARSE index vector. - - `rocsparse_indextype_u16`: 16-bit unsigned integer - - `rocsparse_indextype_i32`: 32-bit signed integer - - `rocsparse_indextype_i64`: 64-bit signed integer + - `rocsparse_indextype_u16`: 16-bit unsigned integer + - `rocsparse_indextype_i32`: 32-bit signed integer + - `rocsparse_indextype_i64`: 64-bit signed integer - `rocsparse_index_base` indicates the index base of indices. - - `rocsparse_index_base_zero`: zero based indexing. - - `rocsparse_index_base_one`: one based indexing. + - `rocsparse_index_base_zero`: zero based indexing. + - `rocsparse_index_base_one`: one based indexing.
- `rocsparse_spsv_alg`: list of SpSV algorithms. - - `rocsparse_spsv_alg_default`: default SpSV algorithm for the given format (the only available option) + - `rocsparse_spsv_alg_default`: default SpSV algorithm for the given format (the only available option) - `rocsparse_spmat_descr`: sparse matrix descriptor. -- `rocsparse_create_[coo|csr]_descr` creates a sparse matrix descriptor in COO or CSR format. +- `rocsparse_create_[coo|csr]_descr` creates a sparse matrix descriptor in COO or CSR format. - We used COO format in the example. + We used COO format in the example. - The descriptor should be destroyed at the end by `rocsparse_destroy_spmat_descr`. + The descriptor should be destroyed at the end by `rocsparse_destroy_spmat_descr`. - `rocsparse_destroy_spmat_descr`: Destroy a sparse matrix descriptor and release used resources allocated by the descriptor. - `rocsparse_dnvec_descr` is a dense vector descriptor. -- `rocsparse_create_dnvec_descr` creates a dense vector descriptor. +- `rocsparse_create_dnvec_descr` creates a dense vector descriptor. The descriptor should be destroyed at the end by `rocsparse_destroy_dnvec_descr`. - `rocsparse_destroy_dnvec_descr` destroys a dense vector descriptor. -- `rocsparse_spsv_stage`: list of possible stages during SpSV computation. Typical order is `rocsparse_spsv_buffer_size`, `rocsparse_spsv_preprocess`, `rocsparse_spsv_compute`. - - `rocsparse_spsv_stage_buffer_size` returns the required buffer size. - - `rocsparse_spsv_stage_preprocess` preprocesses data. - - `rocsparse_spsv_stage_compute` performs the actual SpSV computation. - - `rocsparse_spsv_stage_auto`: automatic stage detection. - - If `temp_buffer` is equal to `nullptr`, the required buffer size will be returned. - - If `buffer_size` is equal to `nullptr`, analysis will be performed. - - Otherwise, the SpSV preprocess and the SpSV algorithm will be executed. +- `rocsparse_spsv_stage`: list of possible stages during SpSV computation. 
Typical order is `rocsparse_spsv_stage_buffer_size`, `rocsparse_spsv_stage_preprocess`, `rocsparse_spsv_stage_compute`. + - `rocsparse_spsv_stage_buffer_size` returns the required buffer size. + - `rocsparse_spsv_stage_preprocess` preprocesses data. + - `rocsparse_spsv_stage_compute` performs the actual SpSV computation. + - `rocsparse_spsv_stage_auto`: automatic stage detection. + - If `temp_buffer` is equal to `nullptr`, the required buffer size will be returned. + - If `buffer_size` is equal to `nullptr`, analysis will be performed. + - Otherwise, the SpSV preprocess and the SpSV algorithm will be executed. ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_coo_descr` - `rocsparse_create_dnvec_descr` - `rocsparse_create_handle` @@ -116,6 +123,7 @@ where - `rocsparse_spsv_stage_preprocess` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_3/bsrmm/README.md b/Libraries/rocSPARSE/level_3/bsrmm/README.md index 1952d9b0..5aa277e6 100644 --- a/Libraries/rocSPARSE/level_3/bsrmm/README.md +++ b/Libraries/rocSPARSE/level_3/bsrmm/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level-3 BSR Matrix-Matrix Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 3 sparse matrix-matrix multiplication using BSR storage format. The operation calculates the following product: @@ -14,6 +16,7 @@ where - and $A'$ is the result of applying to matrix $A$ one of the `rocsparse_operation` described below. ## Application flow + 1. Set up a sparse matrix in BSR format. Allocate an $A$ and a $B$ matrix and set up $\alpha$ and $\beta$ scalars. 2. Set up a handle, a matrix descriptor. 3. Allocate device memory and copy input matrices from host to device. @@ -24,23 +27,27 @@ where 8. Print result to the standard output.
## Key APIs and Concepts + ### BSR Matrix Storage Format + The [Block Compressed Sparse Row (BSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#bsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is to split the given sparse matrix into equal sized blocks of dimension `bsr_dim` and store those using the [CSR format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format). Because the CSR format only stores non-zero elements, the BSR format introduces the concept of __non-zero block__: a block that contains at least one non-zero element. Note that all elements of non-zero blocks are stored, even if some of them are equal to zero. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks - `bsr_dim`: dimension of each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_dim` $\cdot$ `bsr_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. - This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. + This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. 
The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `bsr_row_ptr[j+1] * bsr_dim * bsr_dim - 1`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix. @@ -138,7 +145,7 @@ $$ Therefore, the BSR representation of $A$, using column-major ordering, is: -``` +```text bsr_val = { 8, 0, 7, 2, 0, 3, 0, 5, 2, 0, 1, 0, 0, 0, 0, 0 // A_{00} 4, 7, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0 // A_{10} 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // A_{12} @@ -151,26 +158,30 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ``` ### rocSPARSE + - `rocsparse_[sdcz]bsrmm(...)` performs a sparse matrix-dense matrix multiplication. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation`: matrix operation type with the following options: + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + + Currently, only `rocsparse_operation_none` is supported. - Currently, only `rocsparse_operation_none` is supported. - `rocsparse_mat_descr`: descriptor of the sparse BSR matrix.
- + - `rocsparse_direction` block storage major direction with the following options: - - `rocsparse_direction_column` - - `rocsparse_direction_row` + - `rocsparse_direction_column` + - `rocsparse_direction_row` ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_dbsrmm` @@ -185,6 +196,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_3/bsrsm/README.md b/Libraries/rocSPARSE/level_3/bsrsm/README.md index 5cb5b784..2cad5214 100644 --- a/Libraries/rocSPARSE/level_3/bsrsm/README.md +++ b/Libraries/rocSPARSE/level_3/bsrsm/README.md @@ -1,6 +1,7 @@ # rocSPARSE Level 3 BSR Triangular Solver Example ## Description + This example illustrates the use of the `rocSPARSE` level 3 triangular solver using the BSR storage format. This triangular solver is used to solve a linear system of the form @@ -13,9 +14,9 @@ where - $A$ is a sparse triangular matrix of order $n$ whose elements are the coefficients of the equations, - given a matrix $M$, $M'$ denotes one of the following: - - $M' = M$ (identity) - - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) - - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), + - $M' = M$ (identity) + - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) + - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), - $X$ is a dense matrix of size $n \times nrhs$ which contains the unknowns of the system, and - $\alpha$ is a scalar, - $B$ is a dense matrix of size $n \times nrhs$ containing the constant terms of the equations, @@ -26,6 +27,7 @@ Obtaining the solution for such a system consists of finding concrete values of This is the same as solving the classical system of linear equations $A' x_i = \alpha b_i$, where $x_i$ and $b_i$ are the $i$-th rows or columns of $X$ and $B$, depending on the operation performed on $X$ and $B$. 
This is showcased in [level 2 example bsrsv](../../level_2/bsrsv/README.md). ### Application flow + 1. Set up input data. 2. Allocate device memory and copy input data to device. 3. Initialize rocSPARSE by creating a handle. @@ -37,23 +39,27 @@ This is the same as solving the classical system of linear equations $A' x_i = \ 9. Print validation result. ## Key APIs and Concepts + ### BSR Matrix Storage Format + The [Block Compressed Sparse Row (BSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#bsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is to split the given sparse matrix into equal sized blocks of dimension `bsr_dim` and store those using the [CSR format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format). Because the CSR format only stores non-zero elements, the BSR format introduces the concept of __non-zero block__: a block that contains at least one non-zero element. Note that all elements of non-zero blocks are stored, even if some of them are equal to zero. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks - `bsr_dim`: dimension of each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_dim` $\cdot$ `bsr_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. 
- This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. + This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `bsr_row_ptr[j+1] * bsr_dim * bsr_dim - 1`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix. @@ -151,7 +157,7 @@ $$ Therefore, the BSR representation of $A$, using column-major ordering, is: -``` +```text bsr_val = { 8, 0, 7, 2, 0, 3, 0, 5, 2, 0, 1, 0, 0, 0, 0, 0 // A_{00} 4, 7, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0 // A_{10} 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // A_{12} @@ -164,34 +170,51 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. + - `rocsparse_pointer_mode` controls whether scalar parameters must be allocated on the host (`rocsparse_pointer_mode_host`) or on the device (`rocsparse_pointer_mode_device`). It is controlled by `rocsparse_set_pointer_mode`. + - `rocsparse_direction dir`: matrix storage of BSR blocks. The following values are accepted: - - `rocsparse_direction_row`: parse blocks by rows. - - `rocsparse_direction_column`: parse blocks by columns. + + - `rocsparse_direction_row`: parse blocks by rows. + - `rocsparse_direction_column`: parse blocks by columns. + - `rocsparse_operation trans`: matrix operation applied to the given matrix. The following values are accepted: - - `rocsparse_operation_none`: identity operation $A' = A$. - - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$.
- - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. + + - `rocsparse_operation_none`: identity operation $A' = A$. + - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. + - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. + - `rocsparse_mat_descr descr`: holds all properties of a matrix. The properties set in this example are the following: - - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + + - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_[sdcz]bsrsm_buffer_size` returns the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]bsrsm_analysis` and `rocsparse_[sdcz]bsrsm_solve` functions. The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. + - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. + - `rocsparse_[sdcz]bsrsm_solve` solves a sparse triangular linear system $A X = \alpha B$.
The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) + - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_[sdcz]bsrsm_analysis` performs the analysis step for `rocsparse_[sdcz]bsrsm_solve`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]bsrsm_solve`. + - `rocsparse_bsrsm_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sdcz]bsrsm_solve(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`. 
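To make the relationship between `bsr_row_ptr`, `bsr_col_ind`, `bsr_val` and the `rocsparse_direction` setting concrete, the following host-only C++ sketch expands a BSR matrix into a dense row-major array. It is not part of rocSPARSE or of this example: the function `bsr_to_dense` is hypothetical, zero-based indexing is assumed, and its `column_major_blocks` flag only stands in for the choice between `rocsparse_direction_column` and `rocsparse_direction_row`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper, not part of rocSPARSE: expands a BSR matrix with
// zero-based indexing into a dense, row-major (mb * bsr_dim) x (nb * bsr_dim) array.
// column_major_blocks == true mimics rocsparse_direction_column, false mimics rocsparse_direction_row.
std::vector<double> bsr_to_dense(int mb, int nb, int bsr_dim,
                                 const std::vector<int>&    bsr_row_ptr,
                                 const std::vector<int>&    bsr_col_ind,
                                 const std::vector<double>& bsr_val,
                                 bool                       column_major_blocks)
{
    assert(static_cast<int>(bsr_row_ptr.size()) == mb + 1);
    const int nnzb = bsr_row_ptr[mb]; // last entry of bsr_row_ptr is the number of non-zero blocks
    assert(static_cast<int>(bsr_col_ind.size()) == nnzb);
    assert(static_cast<int>(bsr_val.size()) == nnzb * bsr_dim * bsr_dim);

    const std::size_t   n_cols = static_cast<std::size_t>(nb) * bsr_dim;
    std::vector<double> dense(static_cast<std::size_t>(mb) * bsr_dim * n_cols, 0.0);

    for(int bi = 0; bi < mb; ++bi) // block row
    {
        // Non-zero blocks of block row bi have indices bsr_row_ptr[bi] .. bsr_row_ptr[bi + 1] - 1.
        for(int k = bsr_row_ptr[bi]; k < bsr_row_ptr[bi + 1]; ++k)
        {
            const int bj = bsr_col_ind[k]; // block column of the k-th non-zero block
            // Block k occupies bsr_val[k * bsr_dim * bsr_dim] .. bsr_val[(k + 1) * bsr_dim * bsr_dim - 1].
            const std::size_t block_start = static_cast<std::size_t>(k) * bsr_dim * bsr_dim;
            for(int r = 0; r < bsr_dim; ++r)
            {
                for(int c = 0; c < bsr_dim; ++c)
                {
                    // Position of element (r, c) inside the block depends on the storage direction.
                    const std::size_t offset = column_major_blocks ? c * bsr_dim + r : r * bsr_dim + c;
                    const std::size_t row    = static_cast<std::size_t>(bi) * bsr_dim + r;
                    const std::size_t col    = static_cast<std::size_t>(bj) * bsr_dim + c;
                    dense[row * n_cols + col] = bsr_val[block_start + offset];
                }
            }
        }
    }
    return dense;
}
```

Expanding to dense is only meant as a debugging or validation aid for small matrices; it defeats the purpose of a sparse format for large ones.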
## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_bsrsm_zero_pivot` @@ -225,6 +248,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } - `rocsparse_status_success` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_3/csrmm/README.md b/Libraries/rocSPARSE/level_3/csrmm/README.md index 6510ab31..47497557 100644 --- a/Libraries/rocSPARSE/level_3/csrmm/README.md +++ b/Libraries/rocSPARSE/level_3/csrmm/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level-3 CSR Matrix-Matrix Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 3 sparse matrix-matrix multiplication using CSR storage format. The operation calculates the following product: @@ -14,6 +16,7 @@ where - and $A'$ is the result of applying to matrix $A$ one of the `rocsparse_operation` described below. ## Application flow + 1. Set up a sparse matrix in CSR format. Allocate an $A$ and a $B$ matrix and set up $\alpha$ and $\beta$ scalars. 2. Set up a handle, a matrix descriptor. 3. Allocate device memory and copy input matrices from host to device. @@ -24,21 +27,25 @@ where 8. Print result to the standard output. ## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. 
+ - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appears only once. @@ -58,7 +65,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```text m = 3 n = 5 @@ -73,19 +80,22 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - `rocsparse_[sdcz]csrmm(...)` performs a sparse matrix-dense matrix multiplication.
The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_dcsrmm` @@ -98,6 +108,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_3/csrsm/README.md b/Libraries/rocSPARSE/level_3/csrsm/README.md index 3c240899..53f73689 100644 --- a/Libraries/rocSPARSE/level_3/csrsm/README.md +++ b/Libraries/rocSPARSE/level_3/csrsm/README.md @@ -1,6 +1,7 @@ # rocSPARSE Level 3 CSR Triangular Solver Example ## Description + This example illustrates the use of the `rocSPARSE` level 3 triangular solver using the CSR storage format. 
This triangular solver is used to solve a linear system of the form @@ -13,9 +14,9 @@ where - $A$ is a sparse triangular matrix of order $n$ whose elements are the coefficients of the equations, - given a matrix $M$, $M'$ denotes one of the following: - - $M' = M$ (identity) - - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) - - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), + - $M' = M$ (identity) + - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) + - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), - $X$ is a dense matrix of size $n \times nrhs$ containing the unknowns of the systems, - $\alpha$ is a scalar, - $B$ is a dense matrix of size $n \times nrhs$ containing the right hand sides of the equations, @@ -26,6 +27,7 @@ Obtaining the solution for such a system consists of finding concrete values of This is the same as solving the classical system of linear equations $A' x_i = \alpha b_i$ for each $i\in[0, nrhs-1]$, where $x_i$ and $b_i$ are the $i$-th rows or columns of $X$ and $B$, depending on the operation performed on $X$ and $B$. This is showcased in [level 2 example csrsv](../../level_2/csrsv/README.md). ### Application flow + 1. Set up input data. 2. Allocate device memory and copy input data to device. 3. Initialize rocSPARSE by creating a handle. @@ -38,21 +40,27 @@ This is the same as solving the classical system of linear equations $A' x_i = \ 10. Print validation result. ## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. 
- `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appears only once. @@ -72,7 +80,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```text m = 3 n = 5 @@ -87,48 +95,67 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. + - `rocsparse_operation trans`: matrix operation applied to the given matrix. The following values are accepted: - - `rocsparse_operation_none`: identity operation $M' = M$. - - `rocsparse_operation_transpose`: transpose operation $M' = M^\mathrm{T}$. - - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $M' = M^\mathrm{H}$. - Currently, only `rocsparse_operation_none` and `rocsparse_operation_transpose` are supported for both $A$ and $B$ matrices. + - `rocsparse_operation_none`: identity operation $M' = M$.
+ - `rocsparse_operation_transpose`: transpose operation $M' = M^\mathrm{T}$. + - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $M' = M^\mathrm{H}$. + + Currently, only `rocsparse_operation_none` and `rocsparse_operation_transpose` are supported for both $A$ and $B$ matrices. + - `rocsparse_mat_descr descr`: holds all properties of a matrix. The properties set in this example are the following: - - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + + - `rocsparse_diag_type`: indicates whether the diagonal entries of a matrix are unit elements (`rocsparse_diag_type_unit`) or not (`rocsparse_diag_type_non_unit`). + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_index_base idx_base` indicates the index base of the indices. The following values are accepted: - - `rocsparse_index_base_zero`: zero based indexing. - - `rocsparse_index_base_one`: one based indexing. + - `rocsparse_index_base_zero`: zero based indexing. + - `rocsparse_index_base_one`: one based indexing. + - A matrix stored using CSR format is _sorted_ if the order of the elements in the values array `csr_val` is such that the column indexes in `csr_col_ind` are in (strictly) increasing order for each row. Otherwise the matrix is _unsorted_. + - `rocsparse_csrsort` permutes and sorts a matrix in CSR format. A permutation $\sigma$ is applied to the CSR column indices array, such that the sorting is performed based on the permuted array `csr_col_ind_perm` = $\sigma ($ `csr_col_ind` $)$.
In this example, $\sigma$ is set as the identity permutation by calling `rocsparse_create_identity_permutation`. + - `rocsparse_[sdcz]gthr` gathers elements from a dense vector $y$ and stores them into a sparse vector $x$. In this example, we take $x = y =$ `csr_val` and $x[i] = y[\hat\sigma(i)]$ for $i \in [0, \texttt{nnz}-1]$, where $\hat\sigma$ is the composition of the sorting explained above with the permutation $\sigma$. - The correct function signature should be chosen based on the datatype of the input vector: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + The correct function signature should be chosen based on the datatype of the input vector: + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) + - `rocsparse_csrsort_buffer_size` provides the size of the temporary storage buffer required by `rocsparse_csrsort`. + - `rocsparse_create_identity_permutation` initializes a given permutation vector of size $n$ as the identity permutation $`\begin{pmatrix} 0 & 1 & 2 & \cdots & n-1 \end{pmatrix}`$. + - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. + - `rocsparse_[sdcz]csrsm_solve` solves a sparse triangular linear system $A X = \alpha B$. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) - The matrix $A$ must be sorted beforehand. 
+ - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) + + The matrix $A$ must be sorted beforehand. + - `rocsparse_[sdcz]csrsm_buffer_size` returns the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]csrsm_analysis` and `rocsparse_[sdcz]csrsm_solve` functions. The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. + - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_[sdcz]csrsm_analysis` performs the analysis step for `rocsparse_[sdcz]csrsm_solve`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]csrsm_solve`. + - `rocsparse_csrsm_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sdcz]csrsm_solve(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`.
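As a host-side reference for what a triangular solve such as `rocsparse_[sdcz]csrsm_solve` computes in its simplest configuration, the following C++ sketch solves $A X = \alpha B$ by forward substitution and reports the first zero pivot, similar in spirit to what `rocsparse_csrsm_zero_pivot` reports. It is a hypothetical helper, not the rocSPARSE implementation, and it assumes `rocsparse_operation_none` for both $A$ and $B$, a lower triangular $A$ with a non-unit diagonal, zero-based indexing, and $B$ and $X$ stored column-major with leading dimension $n$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference, not the rocSPARSE implementation: solves A * X = alpha * B
// by forward substitution for a lower triangular n x n CSR matrix A (zero-based indexing,
// non-unit diagonal, no transposition). B and X are n x nrhs, column-major, leading dimension n.
// Returns the row i of the first zero pivot A(i, i) (structural or numerical), or -1 if none.
int csr_lower_solve(int n, int nrhs, double alpha,
                    const std::vector<int>&    csr_row_ptr,
                    const std::vector<int>&    csr_col_ind,
                    const std::vector<double>& csr_val,
                    const std::vector<double>& B,
                    std::vector<double>&       X)
{
    assert(static_cast<int>(csr_row_ptr.size()) == n + 1);
    assert(static_cast<int>(B.size()) == n * nrhs);
    X.assign(B.size(), 0.0);

    // Each column of B defines an independent system A * x = alpha * b.
    for(int rhs = 0; rhs < nrhs; ++rhs)
    {
        const double* b = B.data() + static_cast<std::size_t>(rhs) * n;
        double*       x = X.data() + static_cast<std::size_t>(rhs) * n;
        for(int i = 0; i < n; ++i)
        {
            double sum  = alpha * b[i];
            double diag = 0.0; // stays 0 if row i stores no diagonal entry (structural zero)
            for(int k = csr_row_ptr[i]; k < csr_row_ptr[i + 1]; ++k)
            {
                const int j = csr_col_ind[k];
                if(j < i)
                {
                    sum -= csr_val[k] * x[j]; // x[j] was computed in an earlier iteration
                }
                else if(j == i)
                {
                    diag = csr_val[k];
                }
                // Entries above the diagonal (j > i) are ignored in this sketch.
            }
            if(diag == 0.0)
            {
                return i; // first zero pivot A(i, i)
            }
            x[i] = sum / diag;
        }
    }
    return -1;
}
```

Because every right-hand side is solved independently, the sketch simply loops over the columns of $B$; a reference like this can be useful for validating device results on small inputs.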
## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_create_handle` @@ -163,6 +190,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_status_zero_pivot` ### HIP runtime + - `hipDeviceSynchronize` - `hipFree` - `hipMalloc` diff --git a/Libraries/rocSPARSE/level_3/gebsrmm/README.md b/Libraries/rocSPARSE/level_3/gebsrmm/README.md index 5b4abb1d..62ba2fcb 100644 --- a/Libraries/rocSPARSE/level_3/gebsrmm/README.md +++ b/Libraries/rocSPARSE/level_3/gebsrmm/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level-3 GEBSR Matrix-Matrix Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 3 sparse matrix-matrix multiplication using GEBSR storage format. The operation calculates the following product: @@ -14,6 +16,7 @@ where - and $A'$ is the result of applying to matrix $A$ one of the `rocsparse_operation` described below. ## Application flow + 1. Set up a sparse matrix in GEBSR format. Allocate an $A$ and a $B$ matrix and set up $\alpha$ and $\beta$ scalars. 2. Set up a handle, a matrix descriptor. 3. Allocate device memory and copy input matrices from host to device. @@ -24,10 +27,13 @@ where 8. Print result to the standard output. ## Key APIs and Concepts + ### GEBSR Matrix Storage Format + The [General Block Compressed Sparse Row (GEBSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#gebsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is the same as for the BSR format, but the blocks in which the sparse matrix is split are not necessarily square. All of them are of `bsr_row_dim` $\times$ `bsr_col_dim` size.
Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks @@ -35,39 +41,43 @@ Therefore, defining - `bsr_col_dim`: number of columns in each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_row_dim` $\cdot$ `bsr_col_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. - This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_row_dim * bsr_col_dim` to `(bsr_row_ptr[j+1]-1) * bsr_row_dim * bsr_col_dim`. + This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_row_dim * bsr_col_dim` to `bsr_row_ptr[j+1] * bsr_row_dim * bsr_col_dim - 1`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix. Note that, for a given $m\times n$ matrix, if $m$ is not evenly divisible by the row block dimension or $n$ is not evenly divisible by the column block dimension then zeros are padded to the matrix so that $mb$ and $nb$ are the smallest integers greater than or equal to $`\frac{m}{\texttt{bsr\_row\_dim}}`$ and $`\frac{n}{\texttt{bsr\_col\_dim}}`$, respectively.
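The following host-side sketch shows how the three GEBSR arrays and the padding rule fit together. It computes $C = \alpha \cdot A \cdot B + \beta \cdot C$ for the non-transposed case, assuming that is the form of the product computed in this example. It is a reference under assumptions, not the rocSPARSE kernel:

- `gebsrmm_reference` and `ceil_div` are hypothetical names.
- Blocks are assumed to be stored row by row (as with `rocsparse_direction_row`).
- $B$ and $C$ are assumed to be dense row-major matrices. The example itself may use a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of block rows (or columns) after zero padding: the smallest integer >= x / d.
constexpr int ceil_div(int x, int d) { return (x + d - 1) / d; }

// Hypothetical host reference: C = alpha * A * B + beta * C, where A is an
// (mb * row_dim) x (nb * col_dim) GEBSR matrix whose row_dim x col_dim blocks are stored
// row-major, B is a dense (nb * col_dim) x n matrix and C is a dense (mb * row_dim) x n matrix,
// both row-major.
void gebsrmm_reference(int mb, int n, int row_dim, int col_dim, double alpha,
                       const std::vector<int>&    bsr_row_ptr,
                       const std::vector<int>&    bsr_col_ind,
                       const std::vector<double>& bsr_val,
                       const std::vector<double>& B,
                       double                     beta,
                       std::vector<double>&       C)
{
    assert(bsr_row_ptr.size() == static_cast<std::size_t>(mb) + 1);
    assert(C.size() == static_cast<std::size_t>(mb) * row_dim * n);
    for(int bi = 0; bi < mb; ++bi)
    {
        for(int r = 0; r < row_dim; ++r)
        {
            const std::size_t row = static_cast<std::size_t>(bi) * row_dim + r;
            for(int col = 0; col < n; ++col)
            {
                double sum = 0.0;
                // Visit the non-zero blocks of block row bi.
                for(int k = bsr_row_ptr[bi]; k < bsr_row_ptr[bi + 1]; ++k)
                {
                    for(int c = 0; c < col_dim; ++c)
                    {
                        // Element (r, c) of block k multiplies the matching row of B.
                        const double a = bsr_val[(static_cast<std::size_t>(k) * row_dim + r) * col_dim + c];
                        const std::size_t inner = static_cast<std::size_t>(bsr_col_ind[k]) * col_dim + c;
                        sum += a * B[inner * n + col];
                    }
                }
                C[row * n + col] = alpha * sum + beta * C[row * n + col];
            }
        }
    }
}
```

For instance, a $5 \times 7$ matrix split into $2 \times 3$ blocks is padded to `mb = ceil_div(5, 2) = 3` block rows and `nb = ceil_div(7, 3) = 3` block columns.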
### rocSPARSE + - `rocsparse_[sdcz]gebsrmm(...)` is the matrix-matrix multiplication solver with four different function signatures depending on the type of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$ - Currently, only `rocsparse_operation_none` is supported. + Currently, only `rocsparse_operation_none` is supported. - `rocsparse_mat_descr`: descriptor of the sparse BSR matrix. 
- + - `rocsparse_direction` block storage major direction with the following options: - - `rocsparse_direction_column` - - `rocsparse_direction_row` + - `rocsparse_direction_column` + - `rocsparse_direction_row` ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_destroy_handle` @@ -82,6 +92,7 @@ Note that, for a given $m\times n$ matrix, if $m$ is not evenly divisible by the - `rocsparse_operation_none` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/level_3/gemmi/README.md b/Libraries/rocSPARSE/level_3/gemmi/README.md index 920fa8ab..21349301 100644 --- a/Libraries/rocSPARSE/level_3/gemmi/README.md +++ b/Libraries/rocSPARSE/level_3/gemmi/README.md @@ -1,6 +1,7 @@ # rocSPARSE Dense Matrix Sparse Matrix Multiplication Example ## Description + This example illustrates the use of the `rocsparse_gemmi` function, which performs a dense matrix-sparse matrix multiplication and scaling. That is, it does the following operation: @@ -12,16 +13,18 @@ $$ where - given a matrix $M$, $M'$ denotes one of the following: - - $M' = M$ (identity) - - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) - - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), + + - $M' = M$ (identity) + - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) + - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), + - $A$ is a dense matrix $m \times k$, - $B$ is a sparse matrix of size $k \times n$, - $\alpha$ and $\beta$ are scalars, - $C$ is a dense matrix of size $m \times n$. - ### Application flow + 1. Set up input data. 2. Allocate device memory and copy input data to device. 3. Prepare for rocSPARSE function call by creating a handle and matrix descriptor. @@ -30,21 +33,27 @@ where 6. Free rocSPARSE resources and device memory. 
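rocSPARSE currently supports only the identity operation on $A$ and the transpose operation on $B$ for this function. The host-only sketch below spells out that combination to show which entries of the sparse matrix contribute to each entry of $C$. It is an illustration, not the library implementation:

- `gemmi_reference` is a hypothetical name.
- All dense matrices are taken to be row-major.
- The sparse operand is assumed to be stored in zero-based CSR with $n$ rows and $k$ columns, so that its transpose has the $k \times n$ shape the product needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: C = alpha * A * B^T + beta * C.
// A: dense m x k, row-major. B: zero-based CSR with n rows and k columns, so B^T is k x n.
// C: dense m x n, row-major.
void gemmi_reference(int m, int n, int k, double alpha,
                     const std::vector<double>& A,
                     const std::vector<int>&    csr_row_ptr,
                     const std::vector<int>&    csr_col_ind,
                     const std::vector<double>& csr_val,
                     double                     beta,
                     std::vector<double>&       C)
{
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(csr_row_ptr.size() == static_cast<std::size_t>(n) + 1);
    assert(C.size() == static_cast<std::size_t>(m) * n);
    for(int i = 0; i < m; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            // (A * B^T)(i, j) = sum_p A(i, p) * B(j, p): only the non-zeros of row j of B contribute.
            double sum = 0.0;
            for(int idx = csr_row_ptr[j]; idx < csr_row_ptr[j + 1]; ++idx)
            {
                const int p = csr_col_ind[idx];
                assert(p < k);
                sum += A[static_cast<std::size_t>(i) * k + p] * csr_val[idx];
            }
            const std::size_t out = static_cast<std::size_t>(i) * n + j;
            C[out] = alpha * sum + beta * C[out];
        }
    }
}
```

The cost is proportional to $m \cdot \mathrm{nnz}(B)$, because each dense row of $A$ only meets the stored entries of $B$.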
## Key APIs and Concepts -#### CSR Matrix Storage Format + +### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -64,7 +73,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```math m = 3 n = 5 @@ -79,23 +88,25 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. 
- `rocsparse_pointer_mode` controls whether scalar parameters must be allocated on the host (`rocsparse_pointer_mode_host`) or on the device (`rocsparse_pointer_mode_device`). It is controlled by `rocsparse_set_pointer_mode`. - `rocsparse_operation trans`: matrix operation applied to the given matrix. The following values are accepted: - - `rocsparse_operation_none`: identity operation $A' = A$. - - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. - - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. - Currently, operation on $A$ must be `rocsparse_operation_none` and operation on $B$ must be `rocsparse_operation_transpose`. The other options are not yet supported. + - `rocsparse_operation_none`: identity operation $A' = A$. + - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. + - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. + Currently, operation on $A$ must be `rocsparse_operation_none` and operation on $B$ must be `rocsparse_operation_transpose`. The other options are not yet supported. - `rocsparse_mat_descr descr`: holds all properties of a matrix. - `rocsparse_[sdcz]gemmi` performs the operation $C = \alpha \cdot A' \cdot B' + \beta \cdot C$ for $C$. 
The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) ## Demonstrated API Calls ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_mat_descr` - `rocsparse_destroy_handle` @@ -111,6 +122,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_set_pointer_mode` ### HIP runtime + - `hipDeviceSynchronize` - `hipFree` - `hipMalloc` diff --git a/Libraries/rocSPARSE/level_3/sddmm/README.md b/Libraries/rocSPARSE/level_3/sddmm/README.md index 9e5e04bf..50062291 100644 --- a/Libraries/rocSPARSE/level_3/sddmm/README.md +++ b/Libraries/rocSPARSE/level_3/sddmm/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level-3 Sampled Dense-Dense Matrix Multiplication Example + ## Description + This example illustrates the use of the `rocSPARSE` sampled dense-dense matrix multiplication. The operation solves the following equation for C: @@ -11,10 +13,11 @@ where - $\alpha$ and $\beta$ are scalars - $A$ and $B$ are dense matrices - $C$ is a sparse matrix in CSR format, the result will be in this matrix -- $sppat(C)$ is the sparsity pattern of C matrix +- $sppat(C)$ is the sparsity pattern of C matrix - and $A'$ and $B'$ is the result of applying one of the `rocsparse_operation`s to the respective matrices. ## Application flow + 1. Set up a sparse matrix in CSR format. Allocate an $A$ and a $B$ matrices and set up $\alpha$ and $\beta$ scalars. 2. Prepare device for calculation. 3. Allocate device memory and copy input matrices from host to device. @@ -28,21 +31,26 @@ where 11. Print result to the standard output. 
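The host-only sketch below shows what "sampled" means in practice. The dense product is evaluated only at the positions already stored in the CSR matrix $C$, and the sparsity pattern itself is left untouched. It assumes the usual definition $C_{ij} = \alpha \cdot (A' \cdot B')_{ij} + \beta \cdot C_{ij}$ for every $(i, j) \in sppat(C)$, restricted to the non-transposed case ($A' = A$, $B' = B$). It also assumes row-major dense inputs and zero-based indexing. `sddmm_reference` is a hypothetical helper, not a rocSPARSE function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for SDDMM with A' = A and B' = B:
// for every stored entry (i, j) of C, C(i, j) = alpha * (A * B)(i, j) + beta * C(i, j).
// A: dense m x k, B: dense k x n (both row-major). C: zero-based CSR, m x n; only csr_val changes.
void sddmm_reference(int m, int n, int k, double alpha,
                     const std::vector<double>& A,
                     const std::vector<double>& B,
                     double                     beta,
                     const std::vector<int>&    csr_row_ptr,
                     const std::vector<int>&    csr_col_ind,
                     std::vector<double>&       csr_val)
{
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    for(int i = 0; i < m; ++i)
    {
        for(int idx = csr_row_ptr[i]; idx < csr_row_ptr[i + 1]; ++idx)
        {
            const int j = csr_col_ind[idx];
            // Dot product of row i of A and column j of B, evaluated only for this stored entry.
            double dot = 0.0;
            for(int p = 0; p < k; ++p)
                dot += A[static_cast<std::size_t>(i) * k + p] * B[static_cast<std::size_t>(p) * n + j];
            csr_val[idx] = alpha * dot + beta * csr_val[idx];
        }
    }
}
```

Only $\mathrm{nnz}(C)$ dot products are computed, instead of the $m \cdot n$ needed for the full dense product.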
## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -62,7 +70,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```math m = 3 n = 5 @@ -77,20 +85,23 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - `rocsparse_spsm(...)` performs a sparse matrix-dense matrix multiplication. This single function is used to run all three stages of the calculation. 
- - `rocsparse_spsm_stage_buffer_size` will query the size of the temporary buffer, which will hold the data between the preprocess and the compute stages. - - `rocsparse_spsm_stage_preprocess` will preprocess the data and save it in the temporary buffer. - - `rocsparse_spsm_stage_compute` will do the actual spsm calculation. - - `rocsparse_spsm_stage_auto` will figure out the current stage and run it. If `temp_buffer` pointer equals nullptr the stage will be `rocsparse_spsm_stage_buffer_size`, if `buffer_size` the function will perform the `rocsparse_spsm_stage_preprocess` stage, otherwise it will perform the `rocsparse_spsm_stage_compute`. - - The `rocsparse_spsm_stage_buffer_size` and `rocsparse_spsm_stage_compute` stages are asynchronous. It is the callers responsibility to assure that these stages are complete, before using their result. `rocsparse_spsm_stage_preprocess` will run in a blocking manner, no synchronization is necessary. + - `rocsparse_spsm_stage_buffer_size` will query the size of the temporary buffer, which will hold the data between the preprocess and the compute stages. + - `rocsparse_spsm_stage_preprocess` will preprocess the data and save it in the temporary buffer. + - `rocsparse_spsm_stage_compute` will do the actual spsm calculation. + - `rocsparse_spsm_stage_auto` will figure out the current stage and run it. If the `temp_buffer` pointer equals `nullptr`, the stage will be `rocsparse_spsm_stage_buffer_size`, if `buffer_size` the function will perform the `rocsparse_spsm_stage_preprocess` stage, otherwise it will perform the `rocsparse_spsm_stage_compute`. + - The `rocsparse_spsm_stage_buffer_size` and `rocsparse_spsm_stage_compute` stages are asynchronous. It is the caller's responsibility to ensure that these stages are complete before using their result. `rocsparse_spsm_stage_preprocess` runs in a blocking manner, so no synchronization is necessary.
- `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$. This is currently not supported. + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$. This is currently not supported. ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_csr_descr` - `rocsparse_create_dnmat_descr` - `rocsparse_create_handle` @@ -117,6 +128,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_spmat_descr` ### HIP runtime + - `hipDeviceSynchronize` - `hipFree` - `hipMalloc` diff --git a/Libraries/rocSPARSE/level_3/spmm/README.md b/Libraries/rocSPARSE/level_3/spmm/README.md index 5b28dd18..2b5361de 100644 --- a/Libraries/rocSPARSE/level_3/spmm/README.md +++ b/Libraries/rocSPARSE/level_3/spmm/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level-3 Matrix-Matrix Multiplication + ## Description + This example illustrates the use of the `rocSPARSE` level 3 sparse matrix-dense matrix multiplication with a chosen sparse format (see: `rocsparse_spmm()` in [Key APIs and Concepts/rocSPARSE](#rocsparse)). The operation calculates the following product: @@ -14,6 +16,7 @@ where - and $A'$ is the result of applying to matrix $A$ one of the `rocsparse_operation` described below. ## Application flow + 1. Set up a sparse matrix. Allocate an $A$ and a $B$ dense matrices and set up $\alpha$ and $\beta$ scalars. 2. Allocate device memory and copy input matrices from host to device. 3. Set up a handle. @@ -27,54 +30,56 @@ where 11. Print result to the standard output. 
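For the CSR case, a short host-side reference can illustrate how `rocsparse_operation_none` and `rocsparse_operation_transpose` on $A$ differ. The sketch makes these assumptions:

- The product has the form $C = \alpha \cdot A' \cdot B + \beta \cdot C$.
- Indexing is zero-based and the dense matrices are row-major.

`csr_spmm_reference` is a hypothetical helper. It makes no claim about how the `rocsparse_spmm_alg_*` algorithms are implemented. The transposed case needs no explicit transpose: each stored entry $(i, j)$ of $A$ simply contributes to row $j$ of $C$ instead of row $i$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: C = alpha * op(A) * B + beta * C, where A is an m x k zero-based
// CSR matrix, op(A) = A (transpose == false) or A^T (transpose == true), and B, C are dense
// row-major matrices with ncols columns.
void csr_spmm_reference(bool transpose, int m, int k, int ncols, double alpha,
                        const std::vector<int>&    csr_row_ptr,
                        const std::vector<int>&    csr_col_ind,
                        const std::vector<double>& csr_val,
                        const std::vector<double>& B,
                        double                     beta,
                        std::vector<double>&       C)
{
    const int in_rows  = transpose ? m : k; // rows of B
    const int out_rows = transpose ? k : m; // rows of C
    assert(B.size() == static_cast<std::size_t>(in_rows) * ncols);
    assert(C.size() == static_cast<std::size_t>(out_rows) * ncols);
    (void)in_rows;
    (void)out_rows;
    for(double& c : C)
        c *= beta;
    for(int i = 0; i < m; ++i)
    {
        for(int idx = csr_row_ptr[i]; idx < csr_row_ptr[i + 1]; ++idx)
        {
            const int j = csr_col_ind[idx];
            // A(i, j) is op(A)(i, j) without transpose and op(A)(j, i) with transpose.
            const std::size_t out = static_cast<std::size_t>(transpose ? j : i);
            const std::size_t in  = static_cast<std::size_t>(transpose ? i : j);
            for(int c = 0; c < ncols; ++c)
                C[out * ncols + c] += alpha * csr_val[idx] * B[in * ncols + c];
        }
    }
}
```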
## Key APIs and Concepts + ### rocSPARSE + - `rocsparse_spmm(...)` performs a stage of sparse matrix-dense matrix multiplication. The current SpMM stage is defined by `rocsparse_spmm_stage`. The following sparse matrix formats are supported: Blocked ELL, COO and CSR. -- `rocsparse_spmm_stage`: list of possible stages during SpMM computation. Typical order is `rocsparse_spmm_buffer_size`, `rocsparse_spmm_preprocess`, `rocsparse_spmm_compute`. - - `rocsparse_spmm_stage_buffer_size` returns the required buffer size. - - `rocsparse_spmm_stage_preprocess` preprocesses data. - - `rocsparse_spmm_stage_compute` performs the actual SpMM computation. - - `rocsparse_spmm_stage_auto`: automatic stage detection. - - If `temp_buffer` is equal to `nullptr`, the required buffer size will be returned. - - Otherwise, the SpMM preprocess and the SpMM algorithm will be executed. +- `rocsparse_spmm_stage`: list of possible stages during SpMM computation. Typical order is `rocsparse_spmm_stage_buffer_size`, `rocsparse_spmm_stage_preprocess`, `rocsparse_spmm_stage_compute`. + - `rocsparse_spmm_stage_buffer_size` returns the required buffer size. + - `rocsparse_spmm_stage_preprocess` preprocesses data. + - `rocsparse_spmm_stage_compute` performs the actual SpMM computation. + - `rocsparse_spmm_stage_auto`: automatic stage detection. + - If `temp_buffer` is equal to `nullptr`, the required buffer size will be returned. + - Otherwise, the SpMM preprocess and the SpMM algorithm will be executed. - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. - `rocsparse_pointer_mode` controls whether scalar parameters must be allocated on the host (`rocsparse_pointer_mode_host`) or on the device (`rocsparse_pointer_mode_device`). It is controlled by `rocsparse_set_pointer_mode`. - `rocsparse_operation`: matrix operation applied to the given input matrix.
The following values are accepted: - - `rocsparse_operation_none`: identity operation $A' = A$. - - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. - - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. + - `rocsparse_operation_none`: identity operation $A' = A$. + - `rocsparse_operation_transpose`: transpose operation $A' = A^\mathrm{T}$. + - `rocsparse_operation_conjugate_transpose`: conjugate transpose operation (Hermitian matrix) $A' = A^\mathrm{H}$. This operation is not yet supported. - `rocsparse_datatype`: data type of rocSPARSE matrix elements. - - `rocsparse_datatype_f32_r`: real 32-bit floating point type - - `rocsparse_datatype_f64_r`: real 64-bit floating point type - - `rocsparse_datatype_f32_c`: complex 32-bit floating point type - - `rocsparse_datatype_f64_c`: complex 64-bit floating point type - - `rocsparse_datatype_i8_r`: real 8-bit signed integer - - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer - - `rocsparse_datatype_i32_r`: real 32-bit signed integer - - `rocsparse_datatype_u32_r` real 32-bit unsigned integer + - `rocsparse_datatype_f32_r`: real 32-bit floating point type + - `rocsparse_datatype_f64_r`: real 64-bit floating point type + - `rocsparse_datatype_f32_c`: complex 32-bit floating point type + - `rocsparse_datatype_f64_c`: complex 64-bit floating point type + - `rocsparse_datatype_i8_r`: real 8-bit signed integer + - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer + - `rocsparse_datatype_i32_r`: real 32-bit signed integer + - `rocsparse_datatype_u32_r`: real 32-bit unsigned integer - `rocsparse_indextype` indicates the index type of a rocSPARSE index vector.
- - `rocsparse_indextype_u16`: 16-bit unsigned integer - - `rocsparse_indextype_i32`: 32-bit signed integer - - `rocsparse_indextype_i64`: 64-bit signed integer + - `rocsparse_indextype_u16`: 16-bit unsigned integer + - `rocsparse_indextype_i32`: 32-bit signed integer + - `rocsparse_indextype_i64`: 64-bit signed integer - `rocsparse_index_base` indicates the index base of indices. - - `rocsparse_index_base_zero`: zero based indexing. - - `rocsparse_index_base_one`: one based indexing. + - `rocsparse_index_base_zero`: zero based indexing. + - `rocsparse_index_base_one`: one based indexing. - `rocsparse_spmm_alg`: list of SpMM algorithms. - - `rocsparse_spmm_alg_default`: default SpMM algorithm for the given format. For default algorithm, analysis step is required. - - `rocsparse_spmm_alg_bell`: algorithm for Blocked ELL matrices. - - `rocsparse_spmm_alg_coo_atomic`: atomic algorithm for COO matrices. - - `rocsparse_spmm_alg_coo_segmented`: algorithm for COO matrices using segmented scan. - - `rocsparse_spmm_alg_coo_segmented_atomic`: algorithm for COO format using segmented scan and atomics. - - `rocsparse_spmm_alg_csr`: algorithm for CSR format using row split and shared memory. - - `rocsparse_spmm_alg_csr_row_split`: algorithm for CSR format using row split and shfl - - `rocsparse_spmm_alg_csr_merge`: algorithm for CSR format using conversion to COO. + - `rocsparse_spmm_alg_default`: default SpMM algorithm for the given format. For default algorithm, analysis step is required. + - `rocsparse_spmm_alg_bell`: algorithm for Blocked ELL matrices. + - `rocsparse_spmm_alg_coo_atomic`: atomic algorithm for COO matrices. + - `rocsparse_spmm_alg_coo_segmented`: algorithm for COO matrices using segmented scan. + - `rocsparse_spmm_alg_coo_segmented_atomic`: algorithm for COO format using segmented scan and atomics. + - `rocsparse_spmm_alg_csr`: algorithm for CSR format using row split and shared memory. 
+ - `rocsparse_spmm_alg_csr_row_split`: algorithm for CSR format using row split and shuffle (`shfl`) operations. + - `rocsparse_spmm_alg_csr_merge`: algorithm for CSR format using conversion to COO. - `rocsparse_spmat_descr`: sparse matrix descriptor. - `rocsparse_create_[bell|coo|coo_aos|csr|csc|ell]_descr` creates a sparse matrix descriptor in the corresponding format (BELL, COO, COO AoS, CSR, CSC or ELL). @@ -85,14 +90,15 @@ The following sparse matrix formats are supported: Blocked ELL, COO and CSR. - `rocsparse_destroy_spmat_descr`: Destroy a sparse matrix descriptor and release used resources allocated by the descriptor. - `rocsparse_dnmat_descr` is a dense matrix descriptor. -- `rocsparse_create_dnmat_descr` creates a dense matrix descriptor. +- `rocsparse_create_dnmat_descr` creates a dense matrix descriptor. The descriptor should be destroyed at the end by `rocsparse_destroy_dnmat_descr`. - `rocsparse_destroy_dnmat_descr` destroys a dense matrix descriptor. - ## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_coo_descr` - `rocsparse_create_dnmat_descr` - `rocsparse_create_handle` @@ -122,6 +128,7 @@ The following sparse matrix formats are supported: Blocked ELL, COO and CSR. - `rocsparse_spmm_stage_preprocess` ### HIP runtime + - `hipDeviceSynchronize` - `hipFree` - `hipMalloc` diff --git a/Libraries/rocSPARSE/level_3/spsm/README.md b/Libraries/rocSPARSE/level_3/spsm/README.md index b34d24a5..f4366697 100644 --- a/Libraries/rocSPARSE/level_3/spsm/README.md +++ b/Libraries/rocSPARSE/level_3/spsm/README.md @@ -1,5 +1,7 @@ # rocSPARSE Level 3 Triangular Solver Example + ## Description + This example illustrates the use of the `rocSPARSE` level 3 triangular solver with a chosen sparse format.
The operation solves the following equation for $X$: @@ -9,15 +11,16 @@ $A' \cdot X = \alpha \cdot B'$ where - given a matrix $M$, $M'$ denotes one of the following: - - $M' = M$ (identity) - - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) - - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), + - $M' = M$ (identity) + - $M' = M^T$ (transpose $M$: $M_{ij}^T = M_{ji}$) + - $M' = M^H$ (conjugate transpose/Hermitian $M$: $M_{ij}^H = \bar M_{ji}$), - $A$ is a sparse triangular matrix of order $m$ in CSR or COO format, - $X$ is a dense matrix of size $m\times n$ containing the unknowns of the system, - $B$ is a dense matrix of size $m\times n$ containing the right hand side of the equation, - $\alpha$ is a scalar ## Application flow + 1. Set up a sparse matrix in CSR format. Allocate matrices $B$ and $C$ and set up the scalar $\alpha$. 2. Prepare device for calculation. 3. Allocate device memory and copy input matrices from host to device. @@ -29,21 +32,25 @@ where 9. Print result to the standard output. ## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. 
Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -63,7 +70,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```math m = 3 n = 5 @@ -78,42 +85,45 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - `rocsparse_spsm(...)` solves a sparse triangular system with a dense matrix of right-hand sides. This single function is used to run all three stages of the calculation.
+ - `rocsparse_spsm_stage_buffer_size` will query the size of the temporary buffer, which will hold the data between the preprocess and the compute stages. + - `rocsparse_spsm_stage_preprocess` will preprocess the data and save it in the temporary buffer. + - `rocsparse_spsm_stage_compute` will do the actual spsm calculation. + - `rocsparse_spsm_stage_auto` will figure out the current stage and run it. If the `temp_buffer` pointer equals `nullptr`, the stage will be `rocsparse_spsm_stage_buffer_size`, if `buffer_size` the function will perform the `rocsparse_spsm_stage_preprocess` stage, otherwise it will perform the `rocsparse_spsm_stage_compute`. + - The `rocsparse_spsm_stage_buffer_size` and `rocsparse_spsm_stage_compute` stages are asynchronous. It is the caller's responsibility to ensure that these stages are complete before using their result. `rocsparse_spsm_stage_preprocess` runs in a blocking manner, so no synchronization is necessary. - `rocsparse_spsm_alg`: list of SpSM algorithms. - - `rocsparse_spsm_alg_default`: default SpSM algorithm for the given format (the only available option) + - `rocsparse_spsm_alg_default`: default SpSM algorithm for the given format (the only available option) - `rocsparse_operation`: matrix operation type with the following options: - - `rocsparse_operation_none`: identity operation: $A' = A$ - - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ - - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$. This is currently not supported. + - `rocsparse_operation_none`: identity operation: $A' = A$ + - `rocsparse_operation_transpose`: transpose operation: $A' = A^\mathrm{T}$ + - `rocsparse_operation_conjugate_transpose`: Hermitian operation: $A' = A^\mathrm{H}$. This is currently not supported. - `rocsparse_datatype`: data type of rocSPARSE vector and matrix elements.
- - `rocsparse_datatype_f32_r`: real 32-bit floating point type - - `rocsparse_datatype_f64_r`: real 64-bit floating point type - - `rocsparse_datatype_f32_c`: complex 32-bit floating point type - - `rocsparse_datatype_f64_c`: complex 64-bit floating point type - - `rocsparse_datatype_i8_r`: real 8-bit signed integer - - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer - - `rocsparse_datatype_i32_r`: real 32-bit signed integer - - `rocsparse_datatype_u32_r` real 32-bit unsigned integer + - `rocsparse_datatype_f32_r`: real 32-bit floating point type + - `rocsparse_datatype_f64_r`: real 64-bit floating point type + - `rocsparse_datatype_f32_c`: complex 32-bit floating point type + - `rocsparse_datatype_f64_c`: complex 64-bit floating point type + - `rocsparse_datatype_i8_r`: real 8-bit signed integer + - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer + - `rocsparse_datatype_i32_r`: real 32-bit signed integer + - `rocsparse_datatype_u32_r`: real 32-bit unsigned integer - `rocsparse_indextype` indicates the index type of a rocSPARSE index vector. - - `rocsparse_indextype_u16`: 16-bit unsigned integer - - `rocsparse_indextype_i32`: 32-bit signed integer - - `rocsparse_indextype_i64`: 64-bit signed integer + - `rocsparse_indextype_u16`: 16-bit unsigned integer + - `rocsparse_indextype_i32`: 32-bit signed integer + - `rocsparse_indextype_i64`: 64-bit signed integer - `rocsparse_index_base` indicates the index base of indices. - - `rocsparse_index_base_zero`: zero based indexing. - - `rocsparse_index_base_one`: one based indexing. + - `rocsparse_index_base_zero`: zero based indexing. + - `rocsparse_index_base_one`: one based indexing.
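The CSR arrays store $A$ row by row, so applying `rocsparse_operation_transpose` to a triangular solve turns each stored row into a column of $A^\mathrm{T}$. The host-side sketch below shows one way to handle this without forming $A^\mathrm{T}$: a backward, column-oriented substitution for $A^\mathrm{T} \cdot X = \alpha \cdot B$ with a lower-triangular $A$. It is an illustration only:

- `csr_lower_transpose_solve_reference` is a hypothetical name.
- $B$ and $X$ are assumed to be row-major.
- This is not necessarily how `rocsparse_spsm_alg_default` works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: solves A^T * X = alpha * B, where A is an m x m lower-triangular
// zero-based CSR matrix (so A^T is upper triangular) and B, X are dense m x nrhs row-major.
// Returns false if a diagonal entry is missing or zero.
bool csr_lower_transpose_solve_reference(int m, int nrhs, double alpha,
                                         const std::vector<int>&    csr_row_ptr,
                                         const std::vector<int>&    csr_col_ind,
                                         const std::vector<double>& csr_val,
                                         const std::vector<double>& B,
                                         std::vector<double>&       X)
{
    assert(B.size() == static_cast<std::size_t>(m) * nrhs);
    X.resize(B.size());
    for(std::size_t t = 0; t < B.size(); ++t)
        X[t] = alpha * B[t];
    // Row i of A is column i of A^T, so solve from the last unknown to the first.
    for(int i = m - 1; i >= 0; --i)
    {
        double diag = 0.0;
        for(int idx = csr_row_ptr[i]; idx < csr_row_ptr[i + 1]; ++idx)
            if(csr_col_ind[idx] == i)
                diag = csr_val[idx];
        if(diag == 0.0)
            return false;
        const std::size_t xi = static_cast<std::size_t>(i) * nrhs;
        for(int r = 0; r < nrhs; ++r)
            X[xi + r] /= diag;
        // Scatter the now-known x_i into the equations of the earlier unknowns j < i.
        for(int idx = csr_row_ptr[i]; idx < csr_row_ptr[i + 1]; ++idx)
        {
            const int j = csr_col_ind[idx];
            if(j < i)
                for(int r = 0; r < nrhs; ++r)
                    X[static_cast<std::size_t>(j) * nrhs + r] -= csr_val[idx] * X[xi + r];
        }
    }
    return true;
}
```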
## Demonstrated API Calls + ### rocSPARSE + - `rocsparse_create_csr_descr` - `rocsparse_create_dnmat_descr` - `rocsparse_create_handle` @@ -144,6 +154,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_spsm_stage_preprocess` ### HIP runtime + - `hipDeviceSynchronize` - `hipFree` - `hipMalloc` diff --git a/Libraries/rocSPARSE/preconditioner/bsric0/README.md b/Libraries/rocSPARSE/preconditioner/bsric0/README.md index c0dbe34f..97178dcd 100644 --- a/Libraries/rocSPARSE/preconditioner/bsric0/README.md +++ b/Libraries/rocSPARSE/preconditioner/bsric0/README.md @@ -1,6 +1,7 @@ # rocSPARSE Preconditioner BSR Incomplete Cholesky Decomposition Example ## Description + This example illustrates the use of the `rocSPARSE` incomplete Cholesky factorization preconditioner using the BSR storage format. Given a Hermitian and [positive definite](https://en.wikipedia.org/wiki/Definite_matrix) matrix $A$, computing its Cholesky decomposition consists of finding a lower triangular matrix $L$ such that @@ -12,6 +13,7 @@ The _incomplete_ Cholesky decomposition is a sparse approximation of the above-m $$A \approx L \cdot L^H.$$ ### Application flow + 1. Set up input data. 2. Allocate device memory and offload input data to the device. 3. Initialize rocSPARSE by creating a handle. @@ -24,23 +26,27 @@ $$A \approx L \cdot L^H.$$ 10. Print validation result. ## Key APIs and Concepts + ### BSR Matrix Storage Format + The [Block Compressed Sparse Row (BSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#bsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is to split the given sparse matrix into equal sized blocks of dimension `bsr_dim` and store those using the [CSR format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format). 
Because the CSR format only stores non-zero elements, the BSR format introduces the concept of __non-zero block__: a block that contains at least one non-zero element. Note that all elements of non-zero blocks are stored, even if some of them are equal to zero. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks - `bsr_dim`: dimension of each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_dim` $\cdot$ `bsr_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. - This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. + This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `bsr_row_ptr[j+1] * bsr_dim * bsr_dim - 1`. - `bsr_col_ind`: given $i \in [0, nnzb-1]$, `bsr_col_ind[i]` stores the column of the $i^{th}$ non-zero block in the block matrix.
@@ -138,7 +144,7 @@ $$ Therefore, the BSR representation of $A$, using column-major ordering, is: -``` +```text bsr_val = { 8, 0, 7, 2, 0, 3, 0, 5, 2, 0, 1, 0, 0, 0, 0, 0 // A_{00} 4, 7, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0 // A_{10} 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // A_{12} @@ -151,21 +157,22 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. - `rocsparse_direction dir`: matrix storage of BSR blocks. The following values are accepted: - - `rocsparse_direction_row`: parse blocks by rows. - - `rocsparse_direction_column`: parse blocks by columns. + - `rocsparse_direction_row`: parse blocks by rows. + - `rocsparse_direction_column`: parse blocks by columns. - `rocsparse_mat_descr descr`: holds all properties of a matrix. The properties set in this example are the following: - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built.
- `rocsparse_[sdcz]bsric0` computes the incomplete Cholesky factorization of a sparse BSR matrix $A$, such that $A \approx L \cdot L^H$. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_[sdcz]bsric0_analysis` performs the analysis step for `rocsparse_[sdcz]bsric0`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]bsric0`. - `rocsparse_[sdcz]bsric0_buffer_size` allows to obtain the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]bsric0_analysis` and `rocsparse_[sdcz]bsric0` functions. The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. - `rocsparse_bsric0_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sbcz]bsric0(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`. 
@@ -173,6 +180,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_bsric0_zero_pivot` @@ -201,6 +209,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } - `rocsparse_status_zero_pivot` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/preconditioner/bsrilu0/README.md b/Libraries/rocSPARSE/preconditioner/bsrilu0/README.md index 621149e8..663372a6 100644 --- a/Libraries/rocSPARSE/preconditioner/bsrilu0/README.md +++ b/Libraries/rocSPARSE/preconditioner/bsrilu0/README.md @@ -1,6 +1,7 @@ # rocSPARSE Preconditioner BSR Incomplete LU Decomposition Example ## Description + This example illustrates the use of the `rocSPARSE` incomplete LU factorization preconditioner using the BSR storage format. Given an arbitrary matrix $A$ of order $n$, computing its LU decomposition consists of finding a lower triangular matrix $L$ and a upper triangular matrix $U$ such that @@ -10,6 +11,7 @@ The _incomplete_ LU decomposition is a sparse approximation of the above-mention $$A \approx L \cdot U.$$ ### Application flow + 1. Set up input data. 2. Allocate device memory and offload input data to the device. 3. Initialize rocSPARSE by creating a handle. @@ -22,21 +24,25 @@ $$A \approx L \cdot U.$$ 10. Print validation result. ## Key APIs and Concepts + ### BSR Matrix Storage Format + The [Block Compressed Sparse Row (BSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#bsr-storage-format) describes a sparse matrix using three arrays. The idea behind this storage format is to split the given sparse matrix into equal sized blocks of dimension `bsr_dim` and store those using the [CSR format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format). Because the CSR format only stores non-zero elements, the BSR format introduces the concept of __non-zero block__: a block that contains at least one non-zero element. 
Note that all elements of non-zero blocks are stored, even if some of them are equal to zero. Therefore, defining + - `mb`: number of rows of blocks - `nb`: number of columns of blocks - `nnzb`: number of non-zero blocks - `bsr_dim`: dimension of each block we can describe a sparse matrix using the following arrays: + - `bsr_val`: contains the elements of the non-zero blocks of the sparse matrix. The elements are stored block by block in column- or row-major order. That is, it is an array of size `nnzb` $\cdot$ `bsr_dim` $\cdot$ `bsr_dim`. - `bsr_row_ptr`: given $i \in [0, mb]$ - - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix - - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. + - if $` 0 \leq i < mb `$, `bsr_row_ptr[i]` stores the index of the first non-zero block in row $i$ of the block matrix + - if $i = mb$, `bsr_row_ptr[i]` stores `nnzb`. This way, row $j \in [0, mb)$ contains the non-zero blocks of indices from `bsr_row_ptr[j]` to `bsr_row_ptr[j+1]-1`. The corresponding values in `bsr_val` can be accessed from `bsr_row_ptr[j] * bsr_dim * bsr_dim` to `(bsr_row_ptr[j+1]-1) * bsr_dim * bsr_dim`. @@ -136,7 +142,7 @@ $$ Therefore, the BSR representation of $A$, using column-major ordering, is: -``` +```text bsr_val = { 8, 0, 7, 2, 0, 3, 0, 5, 2, 0, 1, 0, 0, 0, 0, 0 // A_{00} 4, 7, 0, 0, 0, 7, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0 // A_{10} 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 // A_{12} @@ -149,21 +155,22 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. - `rocsparse_direction dir`: matrix storage of BSR blocks. The following values are accepted: - - `rocsparse_direction_row`: parse blocks by rows. - - `rocsparse_direction_column`: parse blocks by columns. + - `rocsparse_direction_row`: parse blocks by rows.
+ - `rocsparse_direction_column`: parse blocks by columns. - `rocsparse_mat_descr descr`: holds all properties of a matrix. The properties set in this example are the following: - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built. - `rocsparse_[sdcz]bsrilu0` computes the incomplete LU factorization of a sparse BSR matrix $A$, such that $A \approx L \cdot U$. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_[sdcz]bsrilu0_analysis` performs the analysis step for `rocsparse_[sdcz]bsrilu0`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]bsrilu0`. 
- `rocsparse_[sdcz]bsrilu0_buffer_size` allows to obtain the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]bsrilu0_analysis` and `rocsparse_[sdcz]bsrilu0` functions. The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. - `rocsparse_bsrilu0_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sbcz]bsrilu0(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`. @@ -171,6 +178,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } ## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_bsrilu0_zero_pivot` @@ -199,6 +207,7 @@ bsr_col_ind = { 0, 0, 2, 0, 1 } - `rocsparse_status_zero_pivot` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/preconditioner/csric0/README.md b/Libraries/rocSPARSE/preconditioner/csric0/README.md index 044ec487..3361b30c 100644 --- a/Libraries/rocSPARSE/preconditioner/csric0/README.md +++ b/Libraries/rocSPARSE/preconditioner/csric0/README.md @@ -1,6 +1,7 @@ # rocSPARSE Preconditioner CSR Incomplete Cholesky Decomposition Example ## Description + This example illustrates the use of the `rocSPARSE` incomplete Cholesky factorization preconditioner using the CSR storage format. Given a Hermitian and [positive definite](https://en.wikipedia.org/wiki/Definite_matrix) matrix $A$, computing its Cholesky decomposition consists of finding a lower triangular matrix $L$ such that @@ -12,6 +13,7 @@ The _incomplete_ Cholesky decomposition is a sparse approximation of the above-m $$A \approx L \cdot L^H.$$ ### Application flow + 1. Set up input data. 2. Allocate device memory and offload input data to the device. 3. 
Initialize rocSPARSE by creating a handle. @@ -24,21 +26,25 @@ $$A \approx L \cdot L^H.$$ 10. Print validation result. ## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -58,7 +64,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```text m = 3 n = 5 @@ -73,18 +79,19 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`.
- `rocsparse_mat_descr descr`: holds all properties of a matrix. The properties set in this example are the following: - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built. - `rocsparse_[sdcz]csric0` computes the incomplete Cholesky factorization of a sparse CSR matrix $A$, such that $A \approx L \cdot L^H$. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_[sdcz]csric0_analysis` performs the analysis step for `rocsparse_[sdcz]csric0`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]csric0`. 
- `rocsparse_[sdcz]csric0_buffer_size` allows to obtain the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]csric0_analysis` and `rocsparse_[sdcz]csric0` functions. The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. - `rocsparse_csric0_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sbcz]csric0(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`. @@ -92,6 +99,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_create_handle` @@ -117,6 +125,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_status_zero_pivot` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/preconditioner/csrilu0/README.md b/Libraries/rocSPARSE/preconditioner/csrilu0/README.md index 1a9a4fe7..0e88af3f 100644 --- a/Libraries/rocSPARSE/preconditioner/csrilu0/README.md +++ b/Libraries/rocSPARSE/preconditioner/csrilu0/README.md @@ -1,6 +1,7 @@ # rocSPARSE Preconditioner CSR Incomplete LU Decomposition Example ## Description + This example illustrates the use of the `rocSPARSE` incomplete LU factorization preconditioner using the CSR storage format. Given an arbitrary matrix $A$ of order $n$, computing its LU decomposition consists of finding a lower triangular matrix $L$ and a upper triangular matrix $U$ such that @@ -10,6 +11,7 @@ The _incomplete_ LU decomposition is a sparse approximation of the above-mention $$A \approx L \cdot U.$$ ### Application flow + 1. Set up input data. 2. Allocate device memory and offload input data to the device. 3. Initialize rocSPARSE by creating a handle. 
@@ -22,21 +24,25 @@ $$A \approx L \cdot U.$$ 10. Print validation result. ## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -56,7 +62,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```text m = 3 n = 5 @@ -71,18 +77,19 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. - `rocsparse_mat_descr descr`: holds all properties of a matrix.
The properties set in this example are the following: - - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. + - `rocsparse_fill_mode`: indicates whether a (triangular) matrix is lower (`rocsparse_fill_mode_lower`) or upper (`rocsparse_fill_mode_upper`) triangular. - `rocsparse_solve_policy policy`: specifies the policy to follow for triangular solvers and factorizations. The only value accepted is `rocsparse_solve_policy_auto`. - `rocsparse_analysis_policy analysis`: specifies the policy to follow for analysis data. The following values are accepted: - - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. - - `rocsparse_analysis_policy_force`: the analysis data will be re-built. + - `rocsparse_analysis_policy_reuse`: the analysis data gathered is re-used. + - `rocsparse_analysis_policy_force`: the analysis data will be re-built. - `rocsparse_[sdcz]csrilu0` computes the incomplete LU factorization of a sparse CSR matrix $A$, such that $A \approx L \cdot U$. The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) - `rocsparse_[sdcz]csrilu0_analysis` performs the analysis step for `rocsparse_[sdcz]csrilu0`. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]csrilu0`. - `rocsparse_[sdcz]csrilu0_buffer_size` allows to obtain the size (in bytes) of the temporary storage buffer required for the `rocsparse_[sdcz]csrilu0_analysis` and `rocsparse_[sdcz]csrilu0` functions. 
The character matched in `[sdcz]` coincides with the one matched in any of the mentioned functions. - `rocsparse_csrilu0_zero_pivot(rocsparse_handle, rocsparse_mat_info, rocsparse_int *position)` returns `rocsparse_status_zero_pivot` if either a structural or numerical zero has been found during the execution of `rocsparse_[sbcz]csrilu0(....)` and stores in `position` the index $i$ of the first zero pivot $A_{ii}$ found. If no zero pivot is found it returns `rocsparse_status_success`. @@ -90,6 +97,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ## Demonstrated API Calls ### rocSPARSE + - `rocsparse_analysis_policy` - `rocsparse_analysis_policy_reuse` - `rocsparse_create_handle` @@ -115,6 +123,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_status_zero_pivot` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy` diff --git a/Libraries/rocSPARSE/preconditioner/csritilu0/README.md b/Libraries/rocSPARSE/preconditioner/csritilu0/README.md index ae9f71df..82683b92 100644 --- a/Libraries/rocSPARSE/preconditioner/csritilu0/README.md +++ b/Libraries/rocSPARSE/preconditioner/csritilu0/README.md @@ -1,6 +1,7 @@ # rocSPARSE Preconditioner CSR Iterative Incomplete LU Decomposition Example ## Description + This example illustrates the use of the `rocSPARSE` iterative incomplete LU factorization preconditioner using the CSR storage format. Given an arbitrary matrix $A$ of order $n$, computing its LU decomposition consists of finding a lower triangular matrix $L$ and a upper triangular matrix $U$ such that @@ -10,6 +11,7 @@ The _incomplete_ LU decomposition is a sparse approximation of the above-mention $$A \approx L \cdot U.$$ ### Application flow + 1. Set up input data. 2. Allocate device memory and offload input data to the device. 3. Initialize rocSPARSE and prepare utility variables for csritilu0 invocation. @@ -23,21 +25,25 @@ $$A \approx L \cdot U.$$ 11. Print validation result. 
## Key APIs and Concepts + ### CSR Matrix Storage Format + The [Compressed Sparse Row (CSR) storage format](https://rocsparse.readthedocs.io/en/latest/usermanual.html#csr-storage-format) describes an $m \times n$ sparse matrix with three arrays. Defining + - `m`: number of rows - `n`: number of columns - `nnz`: number of non-zero elements we can describe a sparse matrix using the following arrays: + - `csr_val`: array storing the non-zero elements of the matrix. - `csr_row_ptr`: given $i \in [0, m]$ - - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix - - if $i = m$, `csr_row_ptr[i]` stores `nnz`. + - if $` 0 \leq i < m `$, `csr_row_ptr[i]` stores the index of the first non-zero element in row $i$ of the matrix + - if $i = m$, `csr_row_ptr[i]` stores `nnz`. - This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. + This way, row $j \in [0, m)$ contains the non-zero elements of indices from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. Therefore, the corresponding values in `csr_val` can be accessed from `csr_row_ptr[j]` to `csr_row_ptr[j+1]-1`. - `csr_col_ind`: given $i \in [0, nnz-1]$, `csr_col_ind[i]` stores the column of the $i^{th}$ non-zero element in the matrix. The CSR matrix is sorted by column indices in the same row, and each pair of indices appear only once. @@ -57,7 +63,7 @@ $$ Therefore, the CSR representation of $A$ is: -``` +```text m = 3 n = 5 @@ -72,58 +78,70 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } ``` ### rocSPARSE + - rocSPARSE is initialized by calling `rocsparse_create_handle(rocsparse_handle*)` and is terminated by calling `rocsparse_destroy_handle(rocsparse_handle)`. - `rocsparse_mat_descr descr`: holds all properties of a matrix.
- `rocsparse_itilu0_alg` represents an iterative ILU0 (Zero Fill-in Incomplete LU) algorithm. The following values are accepted: - - `rocsparse_itilu0_alg_async_inplace`: Asynchronous iterative ILU0 algorithm with in-place storage. - - `rocsparse_itilu0_alg_async_split`: Asynchronous iterative ILU0 algorithm with explicit storage splitting. - - `rocsparse_itilu0_alg_sync_split`: Synchronous iterative ILU0 algorithm with explicit storage splitting. - - `rocsparse_itilu0_alg_sync_split_fusion`: Semi-synchronous iterative ILU0 algorithm with explicit storage splitting. - - `rocsparse_itilu0_alg_default`: same as `rocsparse_itilu0_alg_async_inplace`. + - `rocsparse_itilu0_alg_async_inplace`: Asynchronous iterative ILU0 algorithm with in-place storage. + - `rocsparse_itilu0_alg_async_split`: Asynchronous iterative ILU0 algorithm with explicit storage splitting. + - `rocsparse_itilu0_alg_sync_split`: Synchronous iterative ILU0 algorithm with explicit storage splitting. + - `rocsparse_itilu0_alg_sync_split_fusion`: Semi-synchronous iterative ILU0 algorithm with explicit storage splitting. + - `rocsparse_itilu0_alg_default`: same as `rocsparse_itilu0_alg_async_inplace`. - `rocsparse_itilu0_option`: available options to perform the iterative ILU0 algorithm. The following values are accepted: - - `rocsparse_itilu0_option_verbose` - - `rocsparse_itilu0_option_stopping_criteria`: Compute a stopping criteria. - - `rocsparse_itilu0_option_compute_nrm_correction`: Compute correction. - - `rocsparse_itilu0_option_compute_nrm_residual`: Compute residual. - - `rocsparse_itilu0_option_convergence_history`: Log convergence history. - - `rocsparse_itilu0_option_coo_format`: Use internal coordinate format. + - `rocsparse_itilu0_option_verbose` + - `rocsparse_itilu0_option_stopping_criteria`: Compute a stopping criterion. + - `rocsparse_itilu0_option_compute_nrm_correction`: Compute correction. + - `rocsparse_itilu0_option_compute_nrm_residual`: Compute residual.
+ - `rocsparse_itilu0_option_convergence_history`: Log convergence history. + - `rocsparse_itilu0_option_coo_format`: Use internal coordinate format. - `rocsparse_index_base idx_base` indicates the index base of the indices. The following values are accepted: - - `rocsparse_index_base_zero`: zero based indexing. - - `rocsparse_index_base_one`: one based indexing. + - `rocsparse_index_base_zero`: zero-based indexing. + - `rocsparse_index_base_one`: one-based indexing. - `rocsparse_datatype`: data type of rocSPARSE vector and matrix elements. - - `rocsparse_datatype_f32_r`: real 32-bit floating point type - - `rocsparse_datatype_f64_r`: real 64-bit floating point type - - `rocsparse_datatype_f32_c`: complex 32-bit floating point type - - `rocsparse_datatype_f64_c`: complex 64-bit floating point type - - `rocsparse_datatype_i8_r`: real 8-bit signed integer - - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer - - `rocsparse_datatype_i32_r`: real 32-bit signed integer - - `rocsparse_datatype_u32_r` real 32-bit unsigned integer + - `rocsparse_datatype_f32_r`: real 32-bit floating point type + - `rocsparse_datatype_f64_r`: real 64-bit floating point type + - `rocsparse_datatype_f32_c`: complex 32-bit floating point type + - `rocsparse_datatype_f64_c`: complex 64-bit floating point type + - `rocsparse_datatype_i8_r`: real 8-bit signed integer + - `rocsparse_datatype_u8_r`: real 8-bit unsigned integer + - `rocsparse_datatype_i32_r`: real 32-bit signed integer + - `rocsparse_datatype_u32_r`: real 32-bit unsigned integer + - A matrix stored using CSR format is _sorted_ if the order of the elements in the values array `csr_val` is such that the column indexes in `csr_col_ind` are in (strictly) increasing order for each row. Otherwise, the matrix is _unsorted_. + - `rocsparse_csrsort` permutes and sorts a matrix in CSR format.
A permutation $\sigma$ is applied to the CSR column indices array, such that the sorting is performed based on the permuted array `csr_col_ind_perm` = $\sigma ($ `csr_col_ind` $)$. In this example, $\sigma$ is set as the identity permutation by calling `rocsparse_create_identity_permutation`. + - `rocsparse_[sdcz]gthr` gathers elements from a dense vector $y$ and stores them into a sparse vector $x$. In this example, we take $x = y =$ `csr_val` and $x[i] = y[\hat\sigma(i)]$ for $i \in [0, \texttt{nnz}-1]$, where $\hat\sigma$ is the composition of the sorting explained above with the permutation $\sigma$. - The correct function signature should be chosen based on the datatype of the input vector: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + The correct function signature should be chosen based on the datatype of the input vector: + + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) + - `rocsparse_csrsort_buffer_size` provides the size of the temporary storage buffer required by `rocsparse_csrsort`. - `rocsparse_create_identity_permutation` initializes a given permutation vector of size $n$ as the identity permutation $`\begin{pmatrix} 0 & 1 & 2 & \cdots & n-1 \end{pmatrix}`$. + - `rocsparse_[sdcz]csritilu0_compute` iteratively computes the incomplete LU factorization of a sparse CSR matrix $A$, such that $A \approx L \cdot U$.
The correct function signature should be chosen based on the datatype of the input matrix: - - `s` single-precision real (`float`) - - `d` double-precision real (`double`) - - `c` single-precision complex (`rocsparse_float_complex`) - - `z` double-precision complex (`rocsparse_double_complex`) + + - `s` single-precision real (`float`) + - `d` double-precision real (`double`) + - `c` single-precision complex (`rocsparse_float_complex`) + - `z` double-precision complex (`rocsparse_double_complex`) The matrix $A$ must be sorted beforehand. + - `rocsparse_csritilu0_buffer_size` computes the size in bytes of the buffer needed by `rocsparse_csritilu0_preprocess`, `rocsparse_[sdcz]csritilu0_compute` and `rocsparse_csritilu0_history`. The matrix $A$ must be sorted beforehand. + - `rocsparse_csritilu0_preprocess` computes the information required to run `rocsparse_[sdcz]csritilu0_compute` and stores it in the buffer. The matrix $A$ must be sorted beforehand. + - `rocsparse_[sdcz]csritilu0_history` fetches the convergence history data. The character matched in `[sdcz]` coincides with the one matched in `rocsparse_[sdcz]csritilu0_compute`. The matrix $A$ must be sorted beforehand. ## Demonstrated API Calls ### rocSPARSE + - `rocsparse_create_handle` - `rocsparse_create_identity_permutation` - `rocsparse_create_mat_descr` @@ -147,6 +165,7 @@ csr_col_ind = { 0, 1, 3, 1, 2, 0, 3, 4 } - `rocsparse_mat_descr` ### HIP runtime + - `hipFree` - `hipMalloc` - `hipMemcpy`
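
The CSR and BSR indexing rules spelled out in these READMEs can be sanity-checked on the host without calling rocSPARSE. The C++ sketch below is not part of the examples: the 3 by 4 matrix and the helper names `csr_at`, `bsr_val_index` and `bsr_row_value_range` are hypothetical, and the BSR helpers assume square `bsr_dim` by `bsr_dim` blocks stored in column-major order, which is the ordering used for the BSR example in the bsric0 and bsrilu0 READMEs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical 3x4 matrix, chosen only for illustration:
//     | 1 0 2 0 |
// A = | 0 0 3 0 |
//     | 4 5 0 6 |
// Zero-based CSR arrays, column indices sorted within each row.
const std::vector<float> csr_val     = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f};
const std::vector<int>   csr_row_ptr = {0, 2, 3, 6}; // size m + 1, last entry is nnz
const std::vector<int>   csr_col_ind = {0, 2, 2, 0, 1, 3};

// Returns A(row, col). The non-zeros of `row` have indices
// csr_row_ptr[row] .. csr_row_ptr[row + 1] - 1 in csr_val and csr_col_ind.
float csr_at(int row, int col)
{
    for(int k = csr_row_ptr[row]; k < csr_row_ptr[row + 1]; ++k)
    {
        if(csr_col_ind[k] == col)
        {
            return csr_val[k];
        }
    }
    return 0.0f; // entry is not stored, so it is zero
}

// Position in bsr_val of element (r, c) of the non-zero block with index `block`,
// assuming bsr_dim x bsr_dim blocks stored in column-major order.
int bsr_val_index(int block, int r, int c, int bsr_dim)
{
    assert(0 <= r && r < bsr_dim && 0 <= c && c < bsr_dim);
    return block * bsr_dim * bsr_dim + c * bsr_dim + r;
}

// First and last (inclusive) positions in bsr_val that belong to block row j.
std::pair<int, int> bsr_row_value_range(const std::vector<int>& bsr_row_ptr, int j, int bsr_dim)
{
    const int block_size = bsr_dim * bsr_dim;
    return {bsr_row_ptr[j] * block_size, bsr_row_ptr[j + 1] * block_size - 1};
}
```

As a check against the bsric0 example, with `bsr_dim = 4` the element in row 2, column 0 of the first non-zero block maps to position 2 of `bsr_val`, which holds the `7` of block `A_{00}`. The last position that belongs to block row $j$ is `bsr_row_ptr[j+1] * bsr_dim * bsr_dim - 1`, that is, the start of the last block of the row plus `bsr_dim * bsr_dim - 1`.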