rocFFT and hipFFT examples (part II) #160

Open: wants to merge 9 commits into base: develop

40 changes: 40 additions & 0 deletions Common/example_utils.hpp
@@ -244,6 +244,46 @@ void multiply_matrices(T alpha,
}
}

/// \brief Prints a {1,2,3}-dimensional array. The last dimension (fastest index) specified in
/// \p np will be printed horizontally.
///
/// By default a row-major layout of the data is assumed. When printing data in column-major
/// layout, the \p column_major parameter must be set to \p true for a correct interpretation
/// of the dimensions' sizes.
template<class Tdata, class Tsize>
void print_nd_data(const std::vector<Tdata>& data,
                   std::vector<Tsize> np,
                   const int column_width = 4,
                   const bool column_major = false)
{
    if(column_major)
    {
        std::reverse(np.begin(), np.end());
    }
    const std::vector<Tsize> n(np);
    // Note: we want to print the last dimension horizontally (on the x-axis)!
    int size_x = n[n.size() - 1];
    int size_y = n.size() > 1 ? n[n.size() - 2] : 1;
    int size_z = n.size() > 2 ? n[n.size() - 3] : 1;
    for(int z = 0; z < size_z; ++z)
    {
        for(int y = 0; y < size_y; ++y)
        {
            for(int x = 0; x < size_x; ++x)
            {
                auto index = (z * size_y + y) * size_x + x;
                std::cout << std::setfill(' ') << std::setw(column_width) << data[index] << " ";
            }
            std::cout << "\n";
        }
        if(z != size_z - 1)
        {
            std::cout << "\n";
        }
    }
    std::cout << std::flush;
}

/// \brief Returns a string from the double \p value with specified \p precision .
inline std::string
double_precision(const double value, const int precision, const bool fixed = false)
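
The index arithmetic of `print_nd_data` can be checked on the host alone. The following is a minimal standalone sketch, not part of this change: `format_nd_data` is a hypothetical variant that writes into a string instead of `std::cout`, and the `assert` on an empty size list is an added assumption. It shows how `column_major` reverses the sizes so that the first size becomes the fastest index.

```cpp
#include <algorithm>
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring print_nd_data, but returning the formatted text.
template<class Tdata, class Tsize>
std::string format_nd_data(const std::vector<Tdata>& data,
                           std::vector<Tsize>        n,
                           const int                 column_width = 4,
                           const bool                column_major = false)
{
    // Assumption of this sketch: at least one dimension is given.
    assert(!n.empty());
    // For column-major data the first size is the fastest-varying one, so
    // reversing the sizes makes the last entry the fastest index again.
    if(column_major)
    {
        std::reverse(n.begin(), n.end());
    }
    const int size_x = static_cast<int>(n[n.size() - 1]);
    const int size_y = n.size() > 1 ? static_cast<int>(n[n.size() - 2]) : 1;
    const int size_z = n.size() > 2 ? static_cast<int>(n[n.size() - 3]) : 1;

    std::ostringstream out;
    for(int z = 0; z < size_z; ++z)
    {
        for(int y = 0; y < size_y; ++y)
        {
            for(int x = 0; x < size_x; ++x)
            {
                // Linear offset of element (z, y, x), with x varying fastest.
                const auto index = (z * size_y + y) * size_x + x;
                out << std::setfill(' ') << std::setw(column_width) << data[index] << " ";
            }
            out << "\n";
        }
        // Separate consecutive 2D slices with an empty line.
        if(z != size_z - 1)
        {
            out << "\n";
        }
    }
    return out.str();
}
```

With sizes `{2, 3}` and the data `0..5`, the row-major call produces two rows of three values, while `column_major = true` produces three rows of two, because the first size is then treated as the fastest index.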
29 changes: 0 additions & 29 deletions Common/hipfft_utils.hpp
@@ -25,7 +25,6 @@

#include <hipfft/hipfft.h>

#include <iomanip>
#include <iostream>

/// \brief Converts a \p hipfftResult_t variable to its correspondent string.
@@ -70,32 +69,4 @@ inline const char* hipfftResultToString(hipfftResult_t status)
} \
}

/// \brief Prints an {1,2,3}-dimensional array. The last dimension (fastest-index) specified in
/// \p n will be printed horizontally.
template<class T>
void print_nd_data(const std::vector<T> data, const std::vector<int> n, const int column_width = 4)
{
// Note: we want to print the last dimension horizontally (on the x-axis)!
int size_x = n[n.size() - 1];
int size_y = n.size() > 1 ? n[n.size() - 2] : 1;
int size_z = n.size() > 2 ? n[n.size() - 3] : 1;
for(int z = 0; z < size_z; ++z)
{
for(int y = 0; y < size_y; ++y)
{
for(int x = 0; x < size_x; ++x)
{
auto index = (z * size_y + y) * size_x + x;
std::cout << std::setfill(' ') << std::setw(column_width) << data[index] << " ";
}
std::cout << "\n";
}
if(z != size_z - 1)
{
std::cout << "\n";
}
}
std::cout << std::flush;
}

#endif // COMMON_HIPFFT_UTILS_HPP
29 changes: 29 additions & 0 deletions Common/rocfft_utils.hpp
@@ -25,9 +25,11 @@

#include "example_utils.hpp"

#include <hip/hip_complex.h>
#include <rocfft/rocfft.h>

#include <iostream>
#include <numeric>

/// \brief Converts a \p rocfft_status variable to its correspondent string.
inline const char* rocfftStatusToString(rocfft_status status)
@@ -64,4 +66,31 @@ inline const char* rocfftStatusToString(rocfft_status status)
} \
}

inline std::ostream& operator<<(std::ostream& stream, hipDoubleComplex c)
{
    stream << "(" << c.x << "," << c.y << ")";
    return stream;
}

/// \brief Increments the column-major multi-index \p index for looping over an arbitrary number of
/// dimensions with sizes \p length. Returns \p false once the index has wrapped around to all zeros.
template<class T1, class T2>
bool increment_cm(std::vector<T1>& index, const std::vector<T2>& length)
{
    for(unsigned int idim = 0; idim < length.size(); ++idim)
    {
        if(index[idim] < length[idim])
        {
            if(++index[idim] == length[idim])
            {
                index[idim] = 0;
                continue;
            }
            break;
        }
    }
    // End the loop when we get back to the start:
    return !std::all_of(index.begin(), index.end(), [](const T1 i) { return i == 0; });
}

#endif // COMMON_ROCFFT_UTILS_HPP
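
Since `increment_cm` returns `false` only after the index has wrapped back to all zeros, it fits a `do`/`while` loop that starts at the zero index. The sketch below illustrates this under stated assumptions: it carries an adapted copy of the function so that it compiles on its own, and `visit_all_cm` is a hypothetical helper that only records the visiting order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Adapted copy of increment_cm from the diff above, repeated so the sketch is self-contained.
template<class T1, class T2>
bool increment_cm(std::vector<T1>& index, const std::vector<T2>& length)
{
    for(std::size_t idim = 0; idim < length.size(); ++idim)
    {
        if(index[idim] < length[idim])
        {
            // Carry into the next dimension when this one overflows.
            if(++index[idim] == length[idim])
            {
                index[idim] = 0;
                continue;
            }
            break;
        }
    }
    // False once every component is back at zero, i.e. the loop is complete.
    return !std::all_of(index.begin(), index.end(), [](const T1 i) { return i == 0; });
}

// Hypothetical helper: collects every multi-index in the order it is visited.
inline std::vector<std::vector<int>> visit_all_cm(const std::vector<int>& length)
{
    // Assumption of this sketch: at least one dimension is given.
    assert(!length.empty());
    std::vector<std::vector<int>> visited;
    std::vector<int>              index(length.size(), 0);
    do
    {
        visited.push_back(index);
    } while(increment_cm(index, length));
    return visited;
}
```

For `length = {2, 3}` the first component varies fastest, so the order is `(0,0)`, `(1,0)`, `(0,1)`, `(1,1)`, `(0,2)`, `(1,2)`.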
1 change: 1 addition & 0 deletions Libraries/hipFFT/CMakeLists.txt
@@ -48,5 +48,6 @@ if(NOT hipfft_FOUND)
return()
endif()

add_subdirectory(multi_gpu)
add_subdirectory(plan_d2z)
add_subdirectory(plan_z2z)
1 change: 1 addition & 0 deletions Libraries/hipFFT/Makefile
@@ -21,6 +21,7 @@
# SOFTWARE.

EXAMPLES := \
multi_gpu \
plan_d2z \
plan_z2z

1 change: 1 addition & 0 deletions Libraries/hipFFT/multi_gpu/.gitignore
@@ -0,0 +1 @@
hipfft_multi_gpu
82 changes: 82 additions & 0 deletions Libraries/hipFFT/multi_gpu/CMakeLists.txt
@@ -0,0 +1,82 @@
# MIT License
#
# Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

set(example_name hipfft_multi_gpu)

cmake_minimum_required(VERSION 3.21 FATAL_ERROR)
project(hipfft_multi_gpu LANGUAGES CXX)

set(GPU_RUNTIME "HIP" CACHE STRING "Switches between HIP and CUDA")
set(GPU_RUNTIMES "HIP" "CUDA")
set_property(CACHE GPU_RUNTIME PROPERTY STRINGS ${GPU_RUNTIMES})

if(NOT "${GPU_RUNTIME}" IN_LIST GPU_RUNTIMES)
message(
FATAL_ERROR
"Only the following values are accepted for GPU_RUNTIME: ${GPU_RUNTIMES}"
)
endif()

if(WIN32)
message(
WARNING
"hipFFT multi_gpu example is not supported on Windows. Skipping!"
)
return()
endif()

enable_language(${GPU_RUNTIME})
set(CMAKE_${GPU_RUNTIME}_STANDARD 17)
set(CMAKE_${GPU_RUNTIME}_EXTENSIONS OFF)
set(CMAKE_${GPU_RUNTIME}_STANDARD_REQUIRED ON)

list(APPEND CMAKE_PREFIX_PATH "${ROCM_ROOT}")

# Duplicate 'find_package(hipfft)' calls do not convert to 'nop' properly.
if(NOT hipfft_FOUND)
find_package(hipfft REQUIRED)
endif()

add_executable(${example_name} main.cpp)
# Make example runnable using ctest
add_test(NAME ${example_name} COMMAND ${example_name})

target_link_libraries(${example_name} PRIVATE hip::hipfft)

target_include_directories(${example_name} PRIVATE "../../../Common")
set_source_files_properties(main.cpp PROPERTIES LANGUAGE ${GPU_RUNTIME})

if(WIN32)
target_compile_definitions(${example_name} PRIVATE WIN32)
endif()

install(TARGETS ${example_name})
if(CMAKE_SYSTEM_NAME MATCHES Windows)
install(IMPORTED_RUNTIME_ARTIFACTS hip::hipfft)
if(GPU_RUNTIME STREQUAL "HIP")
find_package(rocfft REQUIRED)
install(IMPORTED_RUNTIME_ARTIFACTS roc::rocfft)
elseif(GPU_RUNTIME STREQUAL "CUDA")
find_package(CUDAToolkit REQUIRED)
install(IMPORTED_RUNTIME_ARTIFACTS CUDA::cufft)
endif()
endif()
67 changes: 67 additions & 0 deletions Libraries/hipFFT/multi_gpu/Makefile
@@ -0,0 +1,67 @@
# MIT License
#
# Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

EXAMPLE := hipfft_multi_gpu
COMMON_INCLUDE_DIR := ../../../Common
GPU_RUNTIME := HIP

# HIP variables
ROCM_INSTALL_DIR := /opt/rocm
CUDA_INSTALL_DIR := /usr/local/cuda

HIP_INCLUDE_DIR := $(ROCM_INSTALL_DIR)/include
HIPCUB_INCLUDE_DIR := $(HIP_INCLUDE_DIR)

HIPCXX ?= $(ROCM_INSTALL_DIR)/bin/hipcc
CUDACXX ?= $(CUDA_INSTALL_DIR)/bin/nvcc

# Common variables and flags
CXX_STD := c++17
ICXXFLAGS := -std=$(CXX_STD)
ICPPFLAGS := -isystem $(HIPCUB_INCLUDE_DIR) -I $(COMMON_INCLUDE_DIR)
ILDFLAGS := -L $(ROCM_INSTALL_DIR)/lib
ILDLIBS := -lhipfft

ifeq ($(GPU_RUNTIME), CUDA)
ICXXFLAGS += -x cu
ICPPFLAGS += -isystem $(HIP_INCLUDE_DIR) -D__HIP_PLATFORM_NVIDIA__
COMPILER := $(CUDACXX)
else ifeq ($(GPU_RUNTIME), HIP)
CXXFLAGS ?= -Wall -Wextra
ICPPFLAGS += -D__HIP_PLATFORM_AMD__
COMPILER := $(HIPCXX)
else
$(error GPU_RUNTIME is set to "$(GPU_RUNTIME)". GPU_RUNTIME must be either CUDA or HIP)
endif

ICXXFLAGS += $(CXXFLAGS)
ICPPFLAGS += $(CPPFLAGS)
ILDFLAGS += $(LDFLAGS)
ILDLIBS += $(LDLIBS)

$(EXAMPLE): main.cpp $(COMMON_INCLUDE_DIR)/example_utils.hpp
	$(COMPILER) $(ICXXFLAGS) $(ICPPFLAGS) $(ILDFLAGS) -o $@ $< $(ILDLIBS)

clean:
	$(RM) $(EXAMPLE)

.PHONY: clean
71 changes: 71 additions & 0 deletions Libraries/hipFFT/multi_gpu/README.md
@@ -0,0 +1,71 @@
# hipFFT Multi-GPU Example

## Description

This example showcases how to execute a 2-dimensional complex-to-complex fast Fourier
transform (FFT) with multiple GPUs. Note that the API used is experimental and requires
at least ROCm 6.0.

### Application flow

1. Define the various input parameters.
2. Generate the input data on the host.
3. Initialize the FFT plan handle.
4. Set up multi-GPU execution.
5. Make the 2D FFT plan.
6. Allocate memory on the devices.
7. Copy data from host to device.
8. Execute the multi-GPU FFT from the plan.
9. Copy data from device to host.
10. Clean up.

### Command line interface

The application provides the following optional command line arguments:

- `-l` or `--length`. The 3-D FFT size separated by spaces. Its default value is `8 8 8`.
- `-d` or `--devices`. The list of devices to use separated by spaces. Its default value is `0 1`.
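
The example's own argument handling is not shown in this excerpt. As a rough illustration of how space-separated values after a flag could be collected, here is a minimal sketch. `parse_int_list` is a hypothetical helper, not the parser used by the example, and its fallback behavior (returning the defaults when the flag is absent or has no values) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical helper: returns the integers that follow `short_flag` or `long_flag`,
// stopping at the first argument that is not an integer. Falls back to `defaults`
// when the flag is absent or is not followed by any integer.
inline std::vector<int> parse_int_list(const std::vector<std::string>& args,
                                       const std::string&              short_flag,
                                       const std::string&              long_flag,
                                       const std::vector<int>&         defaults)
{
    assert(!short_flag.empty() && !long_flag.empty());
    for(std::size_t i = 0; i < args.size(); ++i)
    {
        if(args[i] != short_flag && args[i] != long_flag)
        {
            continue;
        }
        std::vector<int> values;
        for(std::size_t j = i + 1; j < args.size(); ++j)
        {
            std::size_t consumed = 0;
            int         value    = 0;
            try
            {
                value = std::stoi(args[j], &consumed);
            }
            catch(const std::exception&)
            {
                break; // Not a number (e.g. the next flag): the list ends here.
            }
            if(consumed != args[j].size())
            {
                break; // Trailing characters: treat as a non-integer argument.
            }
            values.push_back(value);
        }
        return values.empty() ? defaults : values;
    }
    return defaults;
}
```

A call such as `parse_int_list(args, "-d", "--devices", {0, 1})` would then return the requested device IDs, or `{0, 1}` when the flag is absent.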

## Key APIs and Concepts

- The `hipfftHandle` needs to be created with `hipfftCreate(...)` before use and destroyed with `hipfftDestroy(...)` after use.
  It can be associated with multiple GPUs.
- `hipfftXtSetGPUs` instructs a plan to use multiple GPUs.
- `hipfftMakePlan2d` is used to create a plan for a 2-dimensional FFT.
- `hipfftXtExecDescriptor` can execute a multi-GPU plan.
- Device memory management:
  - `hipfftXtMalloc` allocates device memory for a plan associated with multiple devices.
  - `hipLibXtDesc` holds the handles to device memory on multiple devices.
  - `hipfftXtMemcpy` can copy data between a contiguous host buffer and a `hipLibXtDesc`, or between two `hipLibXtDesc`.
  - The memory allocated on the devices can be freed with `hipfftXtFree`.

## Demonstrated API Calls

### hipFFT

- `HIPFFT_FORWARD`
- `hipfftCreate`
- `hipfftDestroy`
- `hipfftHandle`
- `hipfftMakePlan2d`
- `hipfftSetStream`
- `hipfftType`
  - `HIPFFT_Z2Z`
- `hipfftXtCopyType`
  - `HIPFFT_COPY_DEVICE_TO_HOST`
  - `HIPFFT_COPY_HOST_TO_DEVICE`
- `hipfftXtFree`
- `hipfftXtMalloc`
- `hipfftXtMemcpy`
- `hipfftXtSetGPUs`
- `hipfftXtSubFormat`
  - `HIPFFT_XT_FORMAT_INPUT`
  - `HIPFFT_XT_FORMAT_OUTPUT`
- `hipfftXtExecDescriptor`

### HIP runtime

- `hipGetDeviceCount`
- `hipStream_t`
- `hipStreamCreate`
- `hipStreamDestroy`