rocFFT and hipFFT examples (part I) #141
install(IMPORTED_RUNTIME_ARTIFACTS roc::rocfft)
elseif(GPU_RUNTIME STREQUAL "CUDA")
find_package(CUDAToolkit REQUIRED)
install(IMPORTED_RUNTIME_ARTIFACTS CUDA::cusolver)
Pretty sure you're looking for cufft here?
Yep, fixed this!
Is there any CI that checks that this compiles and that the examples actually run?
What CI is there for these examples?
Hi @malcolmroberts @evetsso, AFAIK the only CI in place for the examples on GitHub is the one for linting (.github/workflows/linting.yml). Additionally, we (StreamHPC) have our own internal CI, where we build and test the examples. I think, but I'm not 100% sure, that there is also an internal CI on AMD's side; perhaps @dgaliffiAMD can provide more details.
Hi @malcolmroberts, it is as @Beanavil says: the external CIs are just for linters. The basic GitHub runners couldn't complete the build; they ran out of disk space trying to install ROCm. Thanks,
install(IMPORTED_RUNTIME_ARTIFACTS roc::rocfft)
elseif(GPU_RUNTIME STREQUAL "CUDA")
find_package(CUDAToolkit REQUIRED)
install(IMPORTED_RUNTIME_ARTIFACTS CUDA::cusolver)
Same thing that Malcolm pointed out - this should be cufft.
Fixed!
Libraries/rocFFT/callback/main.cpp
// Prepare callback
load_callback_data callback_data_host;
callback_data_host.filter = callback_filter_dev;
callback_data_host.scale = 1.0 / static_cast<double>(N);
rocFFT has an explicit API to perform result scaling: https://rocm.docs.amd.com/projects/rocFFT/en/latest/how-to/working-with-rocfft.html#result-scaling
hipFFT also exposes an API to do this.
This API is expected to perform better than callbacks, though I realize that the rocFFT repository does not currently have an example to demonstrate its usage. The explicit API is quicker because the compiler is able to understand and optimize the extra scaling multiplication - with callback functions the runtime only receives an opaque function pointer and the compiler cannot optimize the code as well.
AFAICT result scaling is the most common use case for callback functions in the API. I think it would make sense to have an example demonstrating use of the explicit result scaling API, and then this callback example can be repurposed for a different operation.
I don't know which operation would be best though - aside from result scaling, I'm unaware of a commonly-used operation that people would want to do in a callback function.
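For context on why the snippet under review sets scale to 1.0 / N: an unnormalized forward transform followed by an unnormalized backward transform returns the input multiplied by N, so something has to apply the 1/N factor. The following is a minimal CPU-only sketch of that arithmetic. It does not use rocFFT or hipFFT at all; naive_dft and round_trip are hypothetical helper names introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) unnormalized DFT (illustration only, not a library call).
// sign = -1 gives the forward transform, sign = +1 the backward transform.
std::vector<cplx> naive_dft(const std::vector<cplx>& in, int sign)
{
    assert(sign == 1 || sign == -1);
    const std::size_t n  = in.size();
    const double      pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for(std::size_t k = 0; k < n; ++k)
    {
        cplx sum(0.0, 0.0);
        for(std::size_t j = 0; j < n; ++j)
        {
            // Reduce j * k modulo n to keep the angle small and accurate.
            const double angle = sign * 2.0 * pi * static_cast<double>((j * k) % n)
                                 / static_cast<double>(n);
            sum += in[j] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Forward transform, then backward transform, then multiply every element
// of the result by `scale`. With scale = 1 the result is N times the input;
// with scale = 1 / N the input is recovered.
std::vector<cplx> round_trip(const std::vector<cplx>& in, double scale)
{
    std::vector<cplx> out = naive_dft(naive_dft(in, -1), +1);
    for(cplx& v : out)
    {
        v *= scale;
    }
    return out;
}
```

Calling round_trip(x, 1.0) on a length-4 signal yields 4 * x, while round_trip(x, 1.0 / 4.0) yields x again up to rounding. In the real libraries this multiplication is what the result scaling API or a callback would perform on the device.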
Oh I see, we hadn't noticed this functionality before. Perhaps, given that the callback applies a filter and a scaling factor, we can keep the filtering and just use the scaling-specific API you mentioned for the scaling part. I think that still would make sense as a use-case for callbacks, and then we also get to showcase roc/hipFFT's scaling API.
I guess that's reasonable, since rocFFT has no API for filtering currently. The only issue is that we wouldn't really see the performance benefit of using the result scaling API, since callbacks are still in use. We hope to have a better way to express filtering (and other spectral operations) eventually, but it'll take a while to get there.
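The split discussed above (filter stays in the callback, scaling moves to a separate parameter) could look roughly like the host-side sketch below. Everything here is an assumption made for illustration: filter_callback_data, load_callback_t, filter_on_load and load_filtered_and_scaled are hypothetical names, the code runs on the CPU, and the "library" routine performs no FFT; it only shows where each operation is applied. The callback is invoked through a function pointer, which is the opaque call the earlier comment refers to, while the scale factor is a plain multiplication visible to the compiler.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical user data handed to the callback: only the filter remains,
// the scale factor is no longer part of it.
struct filter_callback_data
{
    const double* filter;
};

// Host-side stand-in for a load callback type: given the input buffer and
// an element offset, return the value the transform should read.
using load_callback_t = cplx (*)(const cplx* input, std::size_t offset, void* cbdata);

// The callback applies the filter while loading each element.
cplx filter_on_load(const cplx* input, std::size_t offset, void* cbdata)
{
    const auto* data = static_cast<const filter_callback_data*>(cbdata);
    return input[offset] * data->filter[offset];
}

// Stand-in for the library side (no FFT is performed here): every element is
// read through the opaque callback, then multiplied by the explicit scale.
std::vector<cplx> load_filtered_and_scaled(const std::vector<cplx>& input,
                                           load_callback_t          load,
                                           void*                    cbdata,
                                           double                   scale)
{
    assert(load != nullptr);
    std::vector<cplx> out(input.size());
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        out[i] = scale * load(input.data(), i, cbdata);
    }
    return out;
}
```

A caller would fill a filter array with one weight per element, wrap its pointer in filter_callback_data, and pass filter_on_load together with the desired scale, for example 1.0 / N.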
Forgot to answer here, the example should be updated already!
Looks OK to me.
One general question I have about these samples - sometimes, the main source code file is called main.hip, and sometimes it's main.cpp. Is there any system behind this choice?
It seems to me that a source file that contains HIP code (i.e. __global__ or __device__) can be justified to be named .hip. Otherwise, .cpp makes more sense. But it doesn't really matter in the end since we're using set_source_file_properties in all the CMakeLists.txt anyway to override the language.
Out of all these examples, really only the callback one contains any HIP code.
If it has to be compiled as HIP code, because it contains device code, then the extension is |
@evetsso sorry for the late answer, I was off these past few days. I hadn't noticed the extension mix-up; it should be fixed in the latest commits. BTW: it looks like the Azure pipeline is failing, but the logs show that the errors are caused by CMake not finding some of the HIP libraries, which I guess is an issue with the pipeline setup (?). On our end the build and test are successful, so I think there should be no problem merging this.
Hi @Beanavil, yes, the Azure pipeline is missing some ROCm components that are required by this PR; I will be adding those to the CI. You can view a successful build (with all dependencies included) of this PR here: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=6032&view=results
This pull request contains the first batch of the new rocFFT and hipFFT examples. Added samples:
- rocFFT
  - callback
  - multi_gpu
- hipFFT
  - plan_d2z
  - plan_z2z