-flto -fsanitize=cfi cause issues on the CUDA CI node #2309

Open
RossBrunton opened this issue Nov 11, 2024 · 1 comment
@RossBrunton
Contributor

Enabling CFI sanitisation causes the linker to fail randomly when run on the nodes used for the CUDA build. Note that this is not due to CUDA itself: building the native CPU adapter fails in the same way.

Outcomes I've noticed are:

  • Failing to find ur_getenv, even though it should be available in one of the .o files linked to the binary.
  • Crashing in LLVM when doing LTO.
  • Reporting invalid debug information (seems to be fixed by setting the debug info version to DWARF4).

This should be investigated and fixed.
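
For the missing ur_getenv case, one way to narrow things down is to check whether the archive handed to the linker actually defines the symbol. The following is a minimal sketch, not part of the project: the helper names `defined_symbols` and `archive_defines` are hypothetical, and it assumes the default BSD-style `nm` output (address, type letter, name). Objects built with -flto may contain LLVM bitcode, so a tool that can read bitcode (for example `llvm-nm`) may be needed in place of plain `nm`; that choice is left as a parameter.

```python
import string
import subprocess


def defined_symbols(nm_output):
    """Return the set of symbol names that nm reports as defined.

    Assumes BSD-style nm lines: "<hex address> <type> <name>".
    Undefined references have no address column, so they split into
    only two fields and are skipped, as are "member.o:" header lines.
    """
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        address, _type_letter, name = parts
        # Only accept lines whose first field is a hexadecimal address.
        if all(c in string.hexdigits for c in address):
            symbols.add(name)
    return symbols


def archive_defines(archive_path, symbol, nm_tool="nm"):
    """Run nm on an archive and report whether it defines `symbol`.

    `nm_tool` is an assumption: pass e.g. "llvm-nm" if the members
    are LTO bitcode that plain nm cannot read.
    """
    result = subprocess.run(
        [nm_tool, archive_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return symbol in defined_symbols(result.stdout)
```

Calling `archive_defines(path_to_archive, "ur_getenv")` on the archive that should contain the symbol would show whether the problem is already present before the final link step.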

RossBrunton added a commit to RossBrunton/unified-runtime that referenced this issue Nov 13, 2024
Note that this flagged up a few issues, for which followup tickets
were created:
* oneapi-src#2323
* oneapi-src#2309
* oneapi-src#2324
@RossBrunton RossBrunton self-assigned this Nov 14, 2024
RossBrunton added a commit to RossBrunton/unified-runtime that referenced this issue Nov 14, 2024
Note that this flagged up a few issues, for which followup tickets
were created:
* oneapi-src#2323
* oneapi-src#2309
* oneapi-src#2324
@RossBrunton
Contributor Author

After many, many days of tracking it down, this seems to be a configuration issue on the build nodes.

Some of them have llvm-linker-tools-13 installed as well as llvm-linker-tools-14, which provides the gold plugin used for LTO. Due to the way plugins are loaded, version 13 takes priority over 14, but it doesn't seem to be able to handle LLVM 14 IR in all cases. As a result, ar silently fails to produce an archive containing all the symbols, and an actual error is only raised when the linker finds that it doesn't have all the symbols it needs.
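
A build node could be checked for this kind of conflict by looking for more than one copy of the gold plugin. The sketch below is illustrative only: the function names are hypothetical, it assumes the plugin file is named `LLVMgold*.so`, and the directories to search are supplied by the caller because the actual install locations on the affected nodes are not stated here.

```python
from pathlib import Path


def find_gold_plugins(search_dirs):
    """List files matching LLVMgold*.so directly inside each directory.

    Directories that do not exist are skipped silently.
    """
    found = []
    for d in search_dirs:
        directory = Path(d)
        if not directory.is_dir():
            continue
        found.extend(sorted(directory.glob("LLVMgold*.so")))
    return found


def has_conflicting_plugins(search_dirs):
    """Return True if more than one distinct plugin file is present.

    Symlinks are resolved first so that several links to the same
    file count only once.
    """
    real_files = {p.resolve() for p in find_gold_plugins(search_dirs)}
    return len(real_files) > 1
```

If `has_conflicting_plugins` returns True for the directories that the toolchain loads plugins from, removing or deprioritising the older plugin would be the obvious thing to try, in line with the diagnosis above.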
