Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing "__atomic_fetch_add_8" support #698

Closed
Sarbojit2019 opened this issue Nov 22, 2023 · 8 comments
Closed

Missing "__atomic_fetch_add_8" support #698

Sarbojit2019 opened this issue Nov 22, 2023 · 8 comments
Milestone

Comments

@Sarbojit2019
Copy link
Collaborator

While running the QuickSilver app with Levelzero as backed I see a crash. Looks like "__atomic_fetch_add_8" support is missing.

QuickSilver : https://github.com/oneapi-src/Velocity-Bench/tree/main/QuickSilver

Error message:
CHIP warning [TID 1675333] [1700634335.412708246] : Missing definition for '__atomic_fetch_add_8'
CHIP warning [TID 1675333] [1700634335.412844660] : SPIR-V Parser: MemberId 82 not found in type map
CHIP warning [TID 1675333] [1700634335.412847436] : SPIR-V Parser: MemberId 82 not found in type map
CHIP warning [TID 1675333] [1700634335.412849475] : SPIR-V Parser: MemberId 82 not found in type map
CHIP warning [TID 1675333] [1700634335.412873560] : SPIR-V Parser: MemberId 192 not found in type map
CHIP warning [TID 1675333] [1700634335.412879170] : SPIR-V Parser: MemberId 192 not found in type map
CHIP warning [TID 1675333] [1700634335.412882545] : SPIR-V Parser: MemberId 194 not found in type map
CHIP error [TID 1675333] [1700634336.321535904] : hipErrorTbd (ZE_RESULT_ERROR_INVALID_MODULE_UNLINKED ) in /home/sarbojit/src/chipStar/src/backend/Level0/CHIPBackendLevel0.cc:2482:compile

CHIP error [TID 1675333] [1700634336.321705150] : Caught Error: hipErrorTbd
error: #1054 (hipErrorTbd
Aborted (core dumped)

@linehill
Copy link
Collaborator

Could you trace where the __atomic_fetch_add_8 is coming from? The function you see is a compiler built-in and not a HIP language feature. It is possible that there is an user error involved or an unintended device code injection via a constexpr function (constexpr functions are implicitly __host__ __device__ in HIP/CUDA mode).

@linehill
Copy link
Collaborator

Attempted to reproduce the issue with:

#include <hip/hip_runtime.h>
__device__ int x = 0;
__global__ void k() { __atomic_fetch_add(&x, 1, __ATOMIC_RELAXED); }

But this gets blocked by an assertion:

$ ../install/bin/hipcc atomic-builtins.hip -c
atomic-builtins.hip:3:23: warning: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0  bytes) [-Watomic-alignment]
__global__ void k() { __atomic_fetch_add(&x, 1, __ATOMIC_RELAXED); }
                      ^
clang-16: /mnt/md0/linehill/ws-chip-spv-3/llvm-project/llvm/lib/IR/Instructions.cpp:3356: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
...

There is a possibility that the QuickSilver app is compiled with clang without assertions enabled and the compiler produces ill-formed code.

@Sarbojit2019
Copy link
Collaborator Author

@linehill
How do I enable/disable assertion in compiler? I have built clang locally in release mode and using the same for compiling QuickSilver.
Are you suspecting it is bad kernel which is creating the issue? As per the repo looks like same code works for AMD.

@linehill
Copy link
Collaborator

How do I enable/disable assertion in compiler?

The assertions are enabled with -DCMAKE_BUILD_TYPE=Debug.

Are you suspecting it is bad kernel which is creating the issue?

Not sure yet. There might be very unexpected way the __atomic_fetch_add_8 gets introduced into the device code. Tracing the source of it could help here.

@linehill
Copy link
Collaborator

@Sarbojit2019, any success tracing the origin of the __atomic_fetch_add_8? I made a sanity check patch that may help on the tracing. Pull the patch and rebuild chipStar in Debug mode to enable the sanity checker.

@pvelesko
Copy link
Collaborator

@Sarbojit2019 status?

@pvelesko pvelesko added this to the Release 1.2 milestone May 29, 2024
@pvelesko
Copy link
Collaborator

Since no reply from @Sarbojit2019 and this error not appearing anywhere else, let's move to 1.3

@pvelesko pvelesko modified the milestones: Release 1.2, Release 1.3 Aug 12, 2024
@pvelesko
Copy link
Collaborator

Works now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants