Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocFFT and hipFFT examples (part I) #141

Merged
merged 11 commits into from
Aug 26, 2024

Conversation

Beanavil
Copy link
Collaborator

@Beanavil Beanavil commented Jul 16, 2024

This pull request contains the first batch of the new rocFFT and hipFFT examples. Added samples:

rocFFT

  • callback
  • multi_gpu

hipFFT

  • plan
    • plan_d2z
    • plan_z2z

@Beanavil Beanavil requested review from a team and dgaliffiAMD as code owners July 16, 2024 09:56
@Beanavil Beanavil force-pushed the fft-callback-plan-multigpu branch 2 times, most recently from 2be7d9c to 74845d4 Compare July 16, 2024 10:06
install(IMPORTED_RUNTIME_ARTIFACTS roc::rocfft)
elseif(GPU_RUNTIME STREQUAL "CUDA")
find_package(CUDAToolkit REQUIRED)
install(IMPORTED_RUNTIME_ARTIFACTS CUDA::cusolver)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure you're looking for cufft here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fixed this!

@malcolmroberts
Copy link

Is there any CI about this compiling and the examples actually running?

@evetsso
Copy link

evetsso commented Aug 2, 2024

What CI is there for these examples?

@Beanavil
Copy link
Collaborator Author

Beanavil commented Aug 5, 2024

Hi @malcolmroberts @evetsso, AFAIK the only CI in place for the examples in GitHub is the one for linting (.github/workflows/linting.yml). Additionally, we (StreamHPC) have our own internal CI, where we build and test the examples. I think, but I'm not 100% sure, that there is also an internal CI on AMD's side, perhaps @dgaliffiAMD can provide more details.

@dgaliffiAMD
Copy link
Collaborator

Hi @malcolmroberts ,

It is as @Beanavil says, the external CIs are just for linters. The basic GitHub runners couldn't complete the build; they ran out of disk space trying to install ROCm.
Internally, changes are in code review to add the rocm-examples to the build and test pipelines.
There are efforts to have the entire ROCm stack built through an Azure pipeline. This CI will be available externally and when it is ready rocm-examples will be included.

Thanks,
David

install(IMPORTED_RUNTIME_ARTIFACTS roc::rocfft)
elseif(GPU_RUNTIME STREQUAL "CUDA")
find_package(CUDAToolkit REQUIRED)
install(IMPORTED_RUNTIME_ARTIFACTS CUDA::cusolver)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing that Malcolm pointed out - this should be cufft.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

// Prepare callback
load_callback_data callback_data_host;
callback_data_host.filter = callback_filter_dev;
callback_data_host.scale = 1.0 / static_cast<double>(N);
Copy link

@evetsso evetsso Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rocFFT has an explicit API to perform result scaling: https://rocm.docs.amd.com/projects/rocFFT/en/latest/how-to/working-with-rocfft.html#result-scaling

hipFFT also exposes an API to do this.

This API is expected to perform better than callbacks, though I realize that the rocFFT repository does not currently have an example to demonstrate its usage. The explicit API is quicker because the compiler is able to understand and optimize the extra scaling multiplication - with callback functions the runtime only receives an opaque function pointer and the compiler cannot optimize the code as well.

AFAICT result scaling is the most common use case for callback functions in the API. I think it would make sense to have an example demonstrating use of the explicit result scaling API, and then this callback example can be repurposed for a different operation.

I don't know which operation would be best though - aside from result scaling, I'm unaware of a commonly-used operation that people would want to do in a callback function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, we hadn't noticed this functionality before. Perhaps, given that the callback applies a filter and a scaling factor, we can keep the filtering and just use the scaling-specific API you mentioned for the scaling part. I think that still would make sense as a use-case for callbacks, and then we also get to showcase roc/hipFFT's scaling API.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's reasonable, since rocFFT has no API for filtering currently. The only issue is that we wouldn't really see the performance benefit of using the result scaling API, since callbacks are still in use. We hope to have a better way to express filtering (and other spectral operations) eventually, but it'll take a while to get there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to answer here, the example should be updated already!

@evetsso evetsso self-requested a review August 9, 2024 15:23
Copy link

@evetsso evetsso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

One general question I have about these samples - sometimes, the main source code file is called main.hip, and sometimes it's main.cpp. Is there any system behind this choice?

It seems to me that a source file that contains HIP code (i.e. __global__ or __device__) can be justified to be named .hip. Otherwise, .cpp makes more sense. But it doesn't really matter in the end since we're using set_source_file_properties in all the CMakeLists.txt anyway to override the language.

Out of all these examples, really only the callback one contains any HIP code.

@Snektron
Copy link
Collaborator

It seems to me that a source file that contains HIP code (i.e. global or device) can be justified to be named .hip. Otherwise, .cpp makes more sense.

If it has to be compiled as HIP code, because it contains device code, then the extension is .hip. If it can be compiled using a regular c++ compiler (at least in theory I guess, since you pointed out the language overriding thing), then the file extenion is c++. I'm fairly sure that this is documented somewhere, but I couldn't find it now...

@dgaliffiAMD
Copy link
Collaborator

Hi @evetsso and @Snektron, it looks like everything is approved, but I still see a couple of unresolved conversations. I just wanted to couple check that this PR is okay to merge. Thank you.

@Beanavil
Copy link
Collaborator Author

Beanavil commented Aug 22, 2024

@evetsso sorry for the late answer, I was off these past few days. I hadn't noticed the extensions mixup, it should be fixed in the latest commits.

BTW: looks like the Azure pipeline is failing, but the logs show that the errors are caused by CMake not finding some of the HIP libraries, which I guess is an issue with the pipeline setup (?). On our end the build&test is successful, so I think there should be no problem merging this

@danielsu-amd
Copy link
Contributor

Hi @Beanavil, yes the Azure pipeline is missing some ROCm components that are required by this PR, I will be adding those into the CI.

You can view a successful build (with all dependencies included) of this PR here: https://dev.azure.com/ROCm-CI/ROCm-CI/_build/results?buildId=6032&view=results

@dgaliffiAMD dgaliffiAMD merged commit b6e6ecc into ROCm:develop Aug 26, 2024
5 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants