diff --git a/HIP-Basic/moving_average/README.md b/HIP-Basic/moving_average/README.md
index 9fc26087..969012b7 100644
--- a/HIP-Basic/moving_average/README.md
+++ b/HIP-Basic/moving_average/README.md
@@ -1,9 +1,11 @@
 # HIP-Basic Moving Average Example
 ## Description
+
 This example shows the use of a kernel that computes a moving average on one-dimensional data. In a sequential program, the moving average of a given input array is found by processing the elements one by one. The average of the previous $n$ elements is called the moving average, where $n$ is called the _window size_. In this example, a kernel is implemented to compute the moving average in parallel, using the shared memory as a cache.
 ### Application flow
+
 1. Define constants to control the problem size and the kernel launch parameters.
 2. Allocate and initialize the input array. This array is initialized as the sequentially increasing sequence $0, 1, 2, \ldots\mod n$.
 3. Allocate the device array and copy the host array to it.
@@ -11,11 +13,15 @@ This example shows the use of a kernel that computes a moving average on one-dim
 5. Copy the result back to the host and validate it. As each average is computed using $n$ consecutive values from the input array, the average is computed over the values $0, 1, 2,\ldots, n - 1 $, the average of which is equal to $(n-1)/2$.
 ## Key APIs and Concepts
+
 Device memory is allocated with `hipMalloc`, deallocated with `hipFree`. Copies to and from the device are made with `hipMemcpy` with options `hipMemcpyHostToDevice` and `hipMemcpyDeviceToHost`, respectively. A kernel is launched with the `myKernel<<<...>>>()`-syntax. Shared memory is allocated in the kernel with the `__shared__` memory space specifier.
 ## Demonstrated API Calls
+
 ### HIP runtime
+
 #### Device symbols
+
 - `__shared__`
 - `__syncthreads`
 - `blockDim`
@@ -23,6 +29,7 @@ Device memory is allocated with `hipMalloc`, deallocated with `hipFree`. Copies
 - `threadIdx`
 #### Host symbols
+
 - `__global__`
 - `hipFree`
 - `hipGetLastError`
diff --git a/HIP-Basic/opengl_interop/README.md b/HIP-Basic/opengl_interop/README.md
index 42f1323c..80d2aee1 100644
--- a/HIP-Basic/opengl_interop/README.md
+++ b/HIP-Basic/opengl_interop/README.md
@@ -1,10 +1,13 @@
 # HIP-Basic OpenGL Interop Example
 ## Description
+
 External device resources and other handles can be shared with HIP in order to provide interoperability between different GPU APIs. This example showcases a HIP program that interacts with OpenGL: a simple HIP kernel is used to simulate a sine wave over a grid of pointers, in a buffer that is shared with OpenGL. The resulting data is then rendered to a window as a grid of triangles using OpenGL.
 ### Application flow
+
 #### Initialization
+
 1. A window is opened using the GLFW library
 2. OpenGL is initialized: the window's context is made active, function pointers are loaded, debug output is enabled if possible.
 3. A HIP device is picked that is OpenGL-interop capable with the current OpenGL context by using `hipGLGetDevices`.
@@ -14,14 +17,18 @@ External device resources and other handles can be shared with HIP in order to p
 7. OpenGL rendering state is bound.
 #### Rendering
+
 1. The sinewave simulation kernel is launched in order to update the OpenGL shared buffer.
 2. The grid is drawn to the window's framebuffer.
 3. The window's framebuffer is presented to the screen.
 ## Dependencies
+
 This example has additional library dependencies besides HIP:
+
 - [GLFW](https://glfw.org). There are three options for getting this dependency satisfied:
 1. Install it through a package manager. Available for Linux, where GLFW can be installed from some of the usual package managers:
+
 - APT: `apt-get install libglfw3-dev`
 - Pacman: `pacman -S glfw-x11` or `pacman -S glfw-wayland`
 - DNF: `dnf install glfw-devel`
@@ -30,15 +37,21 @@ This example has additional library dependencies besides HIP:
 - APT: `apt-get install libxxf86vm-dev libxi-dev`
 - Pacman: `pacman -S libxi libxxf86vm`
 - DNF: `dnf install libXi-devel libXxf86vm-devel`
+
 2. Build from source. GLFW supports compilation on Windows with Visual C++ (2010 and later), MinGW and MinGW-w64 and on Linux and other Unix-like systems with GCC and Clang. Please refer to the [compile guide](https://www.glfw.org/docs/latest/compile.html) for a complete guide on how to do this. Note: not only it should be built as explained in the guide, but it is additionally needed to build with the install target (`cmake --build --target install`).
+
 3. Get the pre-compiled binaries from its [download page](https://www.glfw.org/download). Available for Windows.
- Depending on the build tool used, some extra steps may be needed:
- - If using CMake, the `glfw3Config.cmake` and `glfw3Targets.cmake` files must be in a path that CMake searches by default or must be passed using `-DCMAKE_MODULE_PATH`. The official GLFW3 binaries do not ship these files on Windows, and so GLFW must either be compiled manually or obtained from [vcpkg](https://vcpkg.io/), which does ship the required cmake files.
- - If the former approach is selected, CMake will be able to find GLFW on Windows if the environment variable `GLFW3_DIR` (or the cmake option `-DCMAKE_PREFIX_PATH`) is set to (contain) the folder owning `glfw3Config.cmake` and `glfw3Targets.cmake`. For instance, if GLFW was installed in `C:\Program Files(x86)\GLFW\`, this will most surely be something like `C:\Program Files (x86)\GLFW\lib\cmake\glfw3\`.
- - If the latter, the vcpkg toolchain path should be passed to CMake using `-DCMAKE_TOOLCHAIN_FILE="/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake"`.
- - If using Visual Studio, the easiest way to obtain GLFW is by installing `glfw3` from vcpkg. Alternatively, the appropriate path to the GLFW3 library and header directories can be set in `Properties->Linker->General->Additional Library Directories` and `Properties->C/C++->General->Additional Include Directories`. When using this method, the appropriate name for the GLFW library should also be updated under `Properties->C/C++->Linker->Input->Additional Dependencies`. For instance, if the path to the root folder of the Windows binaries installation was `C:\glfw-3.3.8.bin.WIN64\` and we set `GLFW_DIR` with this path, the project configuration file (`.vcxproj`) should end up containing something similar to the following:
- ```
+ Depending on the build tool used, some extra steps may be needed:
+
+ - If using CMake, the `glfw3Config.cmake` and `glfw3Targets.cmake` files must be in a path that CMake searches by default or must be passed using `-DCMAKE_MODULE_PATH`. The official GLFW3 binaries do not ship these files on Windows, and so GLFW must either be compiled manually or obtained from [vcpkg](https://vcpkg.io/), which does ship the required cmake files.
+
+ - If the former approach is selected, CMake will be able to find GLFW on Windows if the environment variable `GLFW3_DIR` (or the cmake option `-DCMAKE_PREFIX_PATH`) is set to (contain) the folder owning `glfw3Config.cmake` and `glfw3Targets.cmake`. For instance, if GLFW was installed in `C:\Program Files(x86)\GLFW\`, this will most surely be something like `C:\Program Files (x86)\GLFW\lib\cmake\glfw3\`.
+ - If the latter, the vcpkg toolchain path should be passed to CMake using `-DCMAKE_TOOLCHAIN_FILE="/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake"`.
+
+ - If using Visual Studio, the easiest way to obtain GLFW is by installing `glfw3` from vcpkg. Alternatively, the appropriate path to the GLFW3 library and header directories can be set in `Properties->Linker->General->Additional Library Directories` and `Properties->C/C++->General->Additional Include Directories`. When using this method, the appropriate name for the GLFW library should also be updated under `Properties->C/C++->Linker->Input->Additional Dependencies`. For instance, if the path to the root folder of the Windows binaries installation was `C:\glfw-3.3.8.bin.WIN64\` and we set `GLFW_DIR` with this path, the project configuration file (`.vcxproj`) should end up containing something similar to the following:
+
+ ```xml
 ...
@@ -55,31 +68,44 @@ This example has additional library dependencies besides HIP:
 ```
 ## Key APIs and Concepts
+
 - `hipGLGetDevices(unsigned int* pHipDeviceCount, int* pHipDevices, unsigned int hipDeviceCount, hipGLDeviceList deviceList)` can be used to query which HIP devices can be used to share resources with the current OpenGL context. A device returned by this function must be selected using `hipSetDevice` or a stream must be created from such a device before OpenGL interop is possible.
+
 - `hipGraphicsGLRegisterBuffer(hipGraphicsResource_t* resource, GLuint buffer, unsigned int flags)` is used to import an OpenGL buffer into HIP. `flags` affects how the resource is used in HIP. For example:
-| flag | effect |
-| -------------------------------------- | ----------------------------------------------- |
-| `hipGraphicsRegisterFlagsNone` | HIP functions may read and write to the buffer. |
-| `hipGraphicsRegisterFlagsReadOnly` | HIP functions may only read from the buffer. |
-| `hiPGraphicsRegisterFlagsWriteDiscard` | HIP functions may only write to the buffer. |
+
+ | flag | effect |
+ | -------------------------------------- | ----------------------------------------------- |
+ | `hipGraphicsRegisterFlagsNone` | HIP functions may read and write to the buffer. |
+ | `hipGraphicsRegisterFlagsReadOnly` | HIP functions may only read from the buffer. |
+ | `hiPGraphicsRegisterFlagsWriteDiscard` | HIP functions may only write to the buffer. |
+
 - `hipGraphicsMapResources(int count, hipGraphicsResource_t* resources, hipStream_t stream = 0)` is used to make imported OpenGL resources available to a HIP device, either the current device or a device used by a specific stream.
+
 - `hipGraphicsResourceGetMappedPointer(void** pointer, size_t* size, hipGraphicsResource_t resource)` is used to query the device pointer that represents the memory backing the OpenGL resource. The resulting pointer may be used as any other device pointer, like those obtained from `hipMalloc`.
+
 - `hipGraphicsUnmapResources(int count, hipGraphicsResource_t* resources, hipStream_t stream = 0)` is used to unmap an imported resources from a HIP device or stream.
+
 - `hipGraphicsUnregisterResource(hipGraphicsResource_t resource)` is used to unregister a previously imported OpenGL resource, so that it is no longer shared with HIP.
 ## Caveats
+
 ### Multi-GPU systems
+
 When using OpenGL-HIP interop on multi-gpu systems, the OpenGL context must be created with the device that should be used for rendering. This is not done in this example for brevity, but is required in specific scenarios. For example, consider a multi-gpu machine with an AMD and an NVIDIA GPU: when this example is compiled for the HIP runtime, it must be launched such that the AMD GPU is used to render. A simple workaround is to launch the program from the monitor that is physically connected to the GPU to use. For multi-gpu laptops running Linux with an integrated AMD or Intel GPU and an NVIDIA dedicated gpu, the example must be launched with `__GLX_VENDOR_LIBRARY_NAME=nvidia` when compiling for NVIDIA.
 ## Demonstrated API Calls
+
 ### HIP runtime
+
 #### Device symbols
+
 - `threadIdx`
 - `blockIdx`
 - `blockDim`
 - `__global__`
 #### Host symbols
+
 - `hipGetDeviceProperties`
 - `hipGetLastError`
 - `hipGLDeviceListAll`
diff --git a/HIP-Basic/runtime_compilation/README.md b/HIP-Basic/runtime_compilation/README.md
index a1c7e39e..23467970 100644
--- a/HIP-Basic/runtime_compilation/README.md
+++ b/HIP-Basic/runtime_compilation/README.md
@@ -7,6 +7,7 @@ Runtime compilation allows compiling fragments of source code to machine code at
 This example showcases how to make use of hipRTC to compile in runtime a kernel and launch it on a device. This kernel is a simple SAXPY, i.e. a single-precision operation $y_i=ax_i+y_i$.
 ### Application flow
+
 The diagram below summarizes the runtime compilation part of the example.
 1. A number of variables are declared and defined to configure the program which will be compiled in runtime.
 2. The program is created using the above variables as parameters, along with the SAXPY kernel in string form.
@@ -27,33 +28,39 @@ The diagram below summarizes the runtime compilation part of the example.
 17. The first few elements of the result vector $y$ are printed to the standard output.
 ![hiprtc.svg](hiprtc.svg)
+
 ## Key APIs and Concepts
+
 - `hipGetDeviceProperties` extracts the properties of the desired device. In this example it is used to get the GPU architecture.
 - `hipModuleGetFunction` extracts a handle for a function with a certain name from a given module. Note that if no function with that name is present in the module this method will return an error.
 - `hipModuleLaunchKernel` queues the launch of the provided kernel on the device. This function normally presents an asynchronous behaviour (see `HIP_LAUNCH_BLOCKING`), i.e. a call to it may return before the device finishes the execution of the kernel. Its parameters are the following:
- - The kernel to be launched.
- - Number of blocks in the dimension X of kernel grid, i.e. the X component of grid size.
- - Number of blocks in the dimension Y of kernel grid, i.e. the Y component of grid size.
- - Number of blocks in the dimension Z of kernel grid, i.e. the Z component of grid size.
- - Number of threads in the dimension X of each block, i.e. the X component of block size.
- - Number of threads in the dimension Y of each block, i.e. the Y component of block size.
- - Number of threads in the dimension Z of each block, i.e. the Z component of block size.
- - Amount of dynamic shared memory that will be available to each workgroup, in bytes. Not used in this example.
- - The device stream, on which the kernel should be dispatched. If 0 (or NULL), the NULL stream will be used. In this example the latter is used.
- - Pointer to the arguments needed by the kernel. Note that this parameter is not yet implemented, and thus the _extra_ parameter (the last one described in this list) should be used to pass arguments to the kernel.
- - Pointer to all extra arguments passed to the kernel. They must be in the memory layout and alignment expected by the kernel. The list of arguments must end with `HIP_LAUNCH_PARAM_END`.
+
+ - The kernel to be launched.
+ - Number of blocks in the dimension X of kernel grid, i.e. the X component of grid size.
+ - Number of blocks in the dimension Y of kernel grid, i.e. the Y component of grid size.
+ - Number of blocks in the dimension Z of kernel grid, i.e. the Z component of grid size.
+ - Number of threads in the dimension X of each block, i.e. the X component of block size.
+ - Number of threads in the dimension Y of each block, i.e. the Y component of block size.
+ - Number of threads in the dimension Z of each block, i.e. the Z component of block size.
+ - Amount of dynamic shared memory that will be available to each workgroup, in bytes. Not used in this example.
+ - The device stream, on which the kernel should be dispatched. If 0 (or NULL), the NULL stream will be used. In this example the latter is used.
+ - Pointer to the arguments needed by the kernel. Note that this parameter is not yet implemented, and thus the _extra_ parameter (the last one described in this list) should be used to pass arguments to the kernel.
+ - Pointer to all extra arguments passed to the kernel. They must be in the memory layout and alignment expected by the kernel. The list of arguments must end with `HIP_LAUNCH_PARAM_END`.
+
 - `hipModuleLoadData` builds a module from a code (compiled binary) object residing in host memory and loads it into the current context. Note that in this example this function is called right after `hipMalloc`. This is due to the fact that, on CUDA, `hipModuleLoadData` will fail if it is not called after some runtime API call is done (as it will implicitly intialize a current context) or if there is not an explicit creation of a (current) context.
 - `hipModuleUnload` unloads the specified module from the current context and frees it.
 - `hiprtcCompileProgram` compiles the given program in runtime. Some compilation options may be passed as parameters to this function. In this example, the GPU architeture is the only compilation option.
 - `hiprtcCreateProgram` instantiates a runtime compilation program from the given parameters. Those are the following:
- - The runtime compilation program object that will be set with the new instance.
- - A pointer to the program source code.
- - A pointer to the program name.
- - The number of headers to be included.
- - An array of pointers to the headers names.
- - An array of pointers to the names to be included in the source program.
+
+ - The runtime compilation program object that will be set with the new instance.
+ - A pointer to the program source code.
+ - A pointer to the program name.
+ - The number of headers to be included.
+ - An array of pointers to the headers names.
+ - An array of pointers to the names to be included in the source program.
 In this example the program is created including two header files to illustrate how to pass all of the above arguments to this function.
+
 - `hiprtcDestroyProgram` destroys an instance of a given runtime compilation program object.
 - `hiprtcGetProgramLog` extracts the char pointer to the log generated during the compilation of a given runtime compilation program.
 - `hiprtcGetProgramLogSize` returns the compilation log size of a given runtime compilation program, measured as number of characters.
@@ -65,9 +72,11 @@ The diagram below summarizes the runtime compilation part of the example.
 ### HIP runtime
 #### Device symbols
+
 - `threadIdx`, `blockIdx`, `blockDim`
 #### Host symbols
+
 - `hipFree`
 - `hipGetDeviceProperties`
 - `hipGetLastError`