LLVM 16 and 17 support #1730

brechtvl · 2023-09-16T22:49:43Z

Description

Bump maximum supported LLVM version to 17.
Rename makeArrayRef to toArrayRef to avoid llvm::makeArrayRef deprecation warnings.
Minor required changes related to llvm::Align.
Disable initializeInstrumentation, which was removed and appears to serve no purpose now.
Add i128 to OptiX data layout. This was already in the NVPTX backend since LLVM 6, the docs this was taken from seem outdated. Unclear why it started failing now, but the data layout seems clearly incomplete without this.
Since llvm/llvm-project@2c3f82b, FRem generates an optix.ptx.testp.infinite.f32 intrinsic that OptiX does not currently implement. Work around with custom implementation.
LLVM 16 was generating global variables with characters not supported by ptx. Add manual renaming matching the NVPTXAssignValidGlobalNames LLVM pass.
These global variables were actually meant to be string hashes instead of strings for performance, so also ensure they are evaluated to hashes at compile time.

This depends on both opaque pointers (#1728) and new pass manager support (#1729), and should not be merged before them.

Tests

CI jobs updated to use LLVM 16 for bleeding edge.

Checklist:

I have read the contribution guidelines.
I have previously submitted a Contributor License Agreement.
I have updated the documentation, if applicable.
I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
My code follows the prevailing code style of this project.

* Bump minimum supported LLVM version to 16. * Rename `makeArrayRef` to `toArrayRef` to avoid `llvm::makeArrayRef` deprecation warnings. * Minor required changes related to `llvm::Align`. * Disable `initializeInstrumentation` and `createPruneEHPass` which were remove and appear to serve no purpose now. * Add assert when using default optimization levels with legacy pass manager, which is no longer supported. This depends on both opaque pointers and new pass manager support. Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

src/cmake/externalpackages.cmake

lgritz · 2023-09-18T04:09:30Z

If it helps you, this little patch will modify the CI to run the bleeding edge test on llvm 16:

--------------------------- .github/workflows/ci.yml ---------------------------
index c9b8470c..be7778c7 100644
@@ -343,7 +343,7 @@ jobs:
             python_ver: "3.10"
             simd: avx2,f16c
             batched: b8_AVX2,b8_AVX512,b16_AVX512
-            setenvs: export LLVM_VERSION=14.0.0
+            setenvs: export LLVM_VERSION=15.0.6
                             LLVM_DISTRO_NAME=ubuntu-18.04
                             OPENCOLORIO_VERSION=v2.2.0
                             PUGIXML_VERSION=v1.13
@@ -359,8 +359,8 @@ jobs:
             python_ver: "3.10"
             simd: avx2,f16c
             batched: b8_AVX2,b8_AVX512,b16_AVX512
-            setenvs: export LLVM_VERSION=15.0.6
-                            LLVM_DISTRO_NAME=ubuntu-18.04
+            setenvs: export LLVM_VERSION=16.0.4
+                            LLVM_DISTRO_NAME=ubuntu-22.04
                             OPENCOLORIO_VERSION=main
                             PUGIXML_VERSION=master
           - desc: clang14/C++17 llvm14 boost1.71 exr3.1 py3.8 avx2 batch-b16avx512

Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl · 2023-09-18T08:44:23Z

If it helps you, this little patch will modify the CI to run the bleeding edge test on llvm 16:

Thanks. It's going to fail to build with LLVM 16 right now, but once the other PRs are merged I can try it.

brechtvl · 2023-09-18T16:37:02Z

LLVM 17 (just released) actually works as well, I needed to do a minor change the new pass manager PR.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

i128:128 was already in the NVPTX backend since LLVM 6, the docs this was taken from seem outdated. Unclear it started failing now, but the data layout seems clearly incomplete without this. Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl · 2023-09-28T02:19:54Z

OptiX almost works with LLVM 16 but there is one problem, optixModuleCreateFn reports the following:

New backend is missing implementation for PTX intrinsic optix.ptx.testp.infinite.f32

What seems to happen is that LLVM is now generating an intrinsic that OptiX does not support. A custom LLVM 16 build with the following commit reverted works: llvm/llvm-project@2c3f82b

@tgrant-nv, what would be the solution here? A fix in the OptiX driver? Somehow patching the generated ptx in OSL to avoid this instruction?

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl · 2023-09-28T03:55:29Z

I found a workaround, though it doesn't seem like a great long term solution.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl · 2023-09-28T11:33:00Z

I encountered a problem with the testoptix-noise.optix:

Message: Creating module for PTX group group: COMPILE ERROR: Invalid PTX input: ptx2llvm-module-001: error: Failed to parse input PTX string
ptx2llvm-module-001, line 46; fatal   : Parsing error near '.str6': syntax error
Cannot parse input PTX string

I couldn't spot anything suspicious relating to str6 in shadeops_cuda.ptx. It's a global string usimplexnoise. Shuffling around the code that generates this worked around the problem.

tgrant-nv · 2023-09-28T15:11:14Z

Thanks, Brecht.

If there is a variable named .str6, that is not a valid identifier in PTX since names cannot contain . (see section 4.4 in the CUDA docs). There is an opt flag to emit valid identifiers (--nvptx-assign-valid-global-names), which we are already using in cuda_macros.cmake. opt is being run on the shadeops IR with that flag when shadeops_cuda.ptx is being generated, so it's not clear to me why it's not having the intended effect.

Regarding the fmod issue, that looks like something that should be handled in the OptiX driver. Unfortunately we do stumble upon these coverage gaps from time to time. We should be able to support these testp instructions in an upcoming driver release, but your workaround will be necessary for existing drivers.

brechtvl · 2023-09-28T15:28:00Z

Ah, it's not actually using that --nvptx-assign-valid-global-names flag with the new pass manager for LLVM 16, so that would explain it. Those changes are in #1729, specifically c83afef.

The opt command with the legacy pass manager no longer supports default -O levels, so I made it use the new pass manager. But I could not find a way to use the --nvptx-assign-valid-global-names pass in either opt or llc then.

My understanding is that this type of target specific codegen pass should be done in llc instead of opt now. And there are some --nvptx flags for llc and various pass flags for other targets, but not this specific one. Maybe this was an oversight and the flag could be implemented in llc upstream, but that would be for LLVM 18 I guess.

brechtvl · 2023-09-28T15:55:05Z

This is what it looks like in PTX, there's not actually a . in the name?

.visible .global .align 1 .b8 _$_str6[14] = {117, 115, 105, 109, 112, 108, 101, 120, 110, 111, 105, 115, 101, 0};

tgrant-nv · 2023-09-28T18:24:38Z

I've built your branch and confirmed that the issue is not in shadeops_cuda.ptx, which contains the valid name as seen in the snippet you posted, but in the JIT'd shader PTX for group2. It contains two declarations for strings with invalid identifiers:

.extern .global .align 1 .b8 .str6[14];
.extern .global .align 1 .b8 .str7[9];

We will need to somehow convince LLVM to generate valid names, or perhaps write a small pass to sanitize invalid names.

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

src/cmake/externalpackages.cmake

src/liboslexec/opnoise.cpp

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

And revert previous workaround. Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl · 2023-09-29T23:47:51Z

It occurred to me that there should be no strings in the ptx code in the first place, there should only be string hashes. And it appears that there is nothing strictly requiring the hash to be evaluated at compile time. So I implemented a fix based on this:
https://artificial-mind.net/blog/2020/11/14/cpp17-consteval

The issue of JIT potentially generating invalid global variable names remains. But I suspect it's not new and we've just been lucky that the optimizer decided to evaluate the hash at compile time in earlier LLVM versions.

With this latest fix I think this PR is ready,

AlexMWells · 2023-10-02T17:20:25Z

@brechtvl , what is the minimum CUDA version required? I thought constexpr might have problems compiling with CUDA < 12?

brechtvl · 2023-10-02T17:51:46Z

This fix should only use C++14, which has been supported since CUDA 9 (released in 2017). I couldn't find a minimum CUDA version in the OSL code, but that seems old enough.

Maybe you are thinking of if constexpr which is a C++17 feature?

I have been testing with CUDA 11.8.

lgritz · 2023-10-02T18:28:16Z

I think we use 11.6 currently at SPI? (I think, I'm not in front of it at the moment.)

The INSTALL.md says that OSL supports Cuda >= 8.0, but (a) we definitely don't test it, so I don't know what the actual oldest version is that works properly, and (b) I have no particular objection to picking a higher minimum, though I don't have a lot of data about how high we can go before people will be inconvenienced. (Anecdotal evidence: according to the vfx industry build matrix, the major studios are all on 9.2 and higher and the DCCs say they want 10 or higher.)

brechtvl · 2023-10-02T20:48:08Z

I ran some tests:

CUDA 8.0: fails to build in main because nvcc --std=c++14 is unsupported.
CUDA 9.2 and 10.2: builds and most test pass, but testoptix-noise.optix.opt fails with illegal memory access was encountered. both in main and this PR.
CUDA 11.8: builds and tests pass.

I'm not sure what is going on with that failing test, if that's a real problem that someone using CUDA 9 or 10 might encounter in production.

But for this PR at least, the usage of C++14 seems ok since it's already using --std=c++14 for nvcc commands.

tgrant-nv · 2023-10-02T21:34:33Z

Regarding the invalid variable name issue, I'm not sure if there is a way to trigger the existing pass (NVPTXAssignValidGlobalNames) with the new pass manager. Perhaps there are some target options that could be set before the PTX is emitted that would do the trick, but I'm not sure. But it would be easy to duplicate the existing pass in liboslexec. It doesn't need to be a full-fledged pass. Something like this would do the trick (cribbed from the LLVM source):

diff --git a/src/liboslexec/llvm_instance.cpp b/src/liboslexec/llvm_instance.cpp
index 532ba40b..82a0857d 100644
--- a/src/liboslexec/llvm_instance.cpp
+++ b/src/liboslexec/llvm_instance.cpp
@@ -1871,12 +1871,28 @@ BackendLLVM::run()
                 fn.addFnAttr("osl-lib-function", "true");
             }
 
+            auto cleanupName = [&](llvm::StringRef Name) {
+                std::string ValidName;
+                llvm::raw_string_ostream ValidNameStream(ValidName);
+                for (char C : Name) {
+                    if (C == '.' || C == '@') {
+                        ValidNameStream << "_$_";
+                    } else {
+                        ValidNameStream << C;
+                    }
+                }
+                return ValidNameStream.str();
+            };
+
             // Mark all global variables extern and discard their initializers.
             // Global variables are defined in the shadeops PTX file.
             for (llvm::GlobalVariable& global : ll.module()->globals()) {
                 global.setLinkage(llvm::GlobalValue::ExternalLinkage);
                 global.setExternallyInitialized(true);
                 global.setInitializer(nullptr);
+                std::string validName = cleanupName(global.getName());
+                if(validName != global.getName())
+                    global.setName(validName);
             }
         }
         OSL_ASSERT(ll.module());

AlexMWells · 2023-10-02T21:53:34Z

CUDA 8.0: fails to build in main because nvcc --std=c++14 is unsupported.
I think the current INSTALL.md states CUDA 8 as minimum, so perhaps it needs to be bumped up.

https://developer.nvidia.com/blog/programming-efficiently-with-the-cuda-11-3-compiler-toolchain/

lgritz · 2023-10-02T22:11:11Z

Thanks for checking on the Cuda versions, Brecht. It sounds like we should change the advertised set of versions for Cuda to be 9.0 minimum and 11.0+ recommended. I'm happy to quickly whip that up and submit it.

Patch by Tim with minor code style changes. Co-authored-by: Tim Grant <tgrant@nvidia.com> Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl · 2023-10-02T22:33:14Z

Regarding the invalid variable name issue, I'm not sure if there is a way to trigger the existing pass (NVPTXAssignValidGlobalNames) with the new pass manager. Perhaps there are some target options that could be set before the PTX is emitted that would do the trick, but I'm not sure. But it would be easy to duplicate the existing pass in liboslexec. It doesn't need to be a full-fledged pass. Something like this would do the trick ...

I can confirm this also fixes the issue. So included it, in case such global variables are generated for a good reason in the future.

brechtvl · 2023-10-02T22:33:35Z

Thanks for checking on the Cuda versions, Brecht. It sounds like we should change the advertised set of versions for Cuda to be 9.0 minimum and 11.0+ recommended. I'm happy to quickly whip that up and submit it.

Sounds good.

lgritz · 2023-10-02T22:33:54Z

#1737

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

lgritz · 2023-10-03T22:28:12Z

Is this ready to merge?

lgritz

LGTM and passes CI except for the Intel compiler tests which are broken for other reasons (on their end).

brechtvl · 2023-10-03T22:46:05Z

Yes, it's ready.

spelufo · 2023-10-16T18:57:11Z

Hi, just wanted to let you know that OSL is currently disabled on Archlinux's blender since it was waiting for LLVM 16 support. This PR seems to indicate that it is now ready, but I don't know if there's some work left on the blender side. I've emailed the arch package mantainers, haven't heard back yet.

@brechtvl If I compile the latest blender from source with the same cmake command as arch's without the disable osl flag, should it work? I was relying on OSL for this.

Thanks

brechtvl · 2023-10-16T19:44:32Z

It does not work currently. The OSL API is still undergoing changes in the main branch, only when there is a stable OSL release can we really ensure it's compatible.

I suggest to use either the Blender binary download or the build instructions from blender.org, instead of the Arch package.

This won't be picked up by the platforms yet, they are still using 1.13.5.x for 2023.10 (as well as still OIIO 2.6.0.x, which is too old for this OSL release). So really the only client for this build is likely Arnold. Highlights: * Raise OpenImageIO to 2.6.1.0. * RS::get_texture_info make robust to empty shaderglobals param (AcademySoftwareFoundation#1731) * LLVM opaque pointers support (AcademySoftwareFoundation#1728) * LLVM new pass manager support (AcademySoftwareFoundation#1729) * LLVM 16 and 17 support (AcademySoftwareFoundation#1730) * fix: SymOverrideInfo bitfields should be the same type (AcademySoftwareFoundation#1745) * fix: batched pointcloud if "index" is not passed (AcademySoftwareFoundation#1753) * api: Ustringhash phase 3 (AcademySoftwareFoundation#1732) * fix: Fix NVPTX TargetMachine leak, etc. (AcademySoftwareFoundation#1763) * feat(api): Add API for building interpolated getter free functions. (AcademySoftwareFoundation#1765) * build: add options to use static Cuda libraries (AcademySoftwareFoundation#1772) * build(deps): Initial support for LLVM-18 (AcademySoftwareFoundation#1773) * fix: Mute partio error prints (AcademySoftwareFoundation#1774) * fix: calculatenormal needs fliphandedness (AcademySoftwareFoundation#1783) * fix: Print closure missing error message at compile time instead of run time. (AcademySoftwareFoundation#1781) * fix: GPU interpolated param initialization (AcademySoftwareFoundation#1791) * oslquery: Simplify include and link needs (AcademySoftwareFoundation#1787) * Make isconnected() work with downstream renderer "connections." (AcademySoftwareFoundation#1782) Also along for the ride: * Remove the `blah-platform/=1.0` preamble, no longer needed. * Stop building for deprecated Nuke 12 and 13 platforms. * Raise cmake build version to 3.24 See merge request spi/dev/3rd-party/osl-feedstock!57

brechtvl marked this pull request as draft September 16, 2023 22:49

brechtvl mentioned this pull request Sep 16, 2023

LLVM 16 and opaque pointers support #1727

Closed

5 tasks

brechtvl marked this pull request as ready for review September 16, 2023 23:57

Merge branch 'main' into llvm16-minimal

fb004c1

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

lgritz reviewed Sep 18, 2023

View reviewed changes

src/cmake/externalpackages.cmake Outdated Show resolved Hide resolved

Keep CMAKE_CXX_STANDARD warning for LLVM 16

e7f2fb3

Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Enable LLVM 17 too, adjust for changes in new pass manager PR

c8e684e

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl changed the title ~~LLVM 16 support~~ LLVM 16 and 17 support Sep 20, 2023

brechtvl force-pushed the llvm16-minimal branch from b01aac8 to dee0540 Compare September 28, 2023 02:05

Work around missing OptiX intrinsic with custom fmod

f38d0d7

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Work around OptiX parse error by shuffling code around

f3ae105

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl mentioned this pull request Sep 28, 2023

LLVM new pass manager support #1729

Merged

5 tasks

brechtvl added 3 commits September 28, 2023 21:10

Merge branch 'main' into llvm16-minimal

5707966

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Modify CI to run the bleeding edge test on LLVM 16

86b6f4b

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Fix deprecation warning

abbd0db

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

lgritz reviewed Sep 29, 2023

View reviewed changes

src/cmake/externalpackages.cmake Outdated Show resolved Hide resolved

lgritz reviewed Sep 29, 2023

View reviewed changes

src/liboslexec/opnoise.cpp Outdated Show resolved Hide resolved

Remove redundant LLVM version check, update CI job names

87524d3

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl added 2 commits September 30, 2023 01:37

Avoid invalid global name by forcing device strings to be hashed

ad79d38

And revert previous workaround. Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Improve comment

d9fe4de

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Manually make global variable names valid for PTX

8e97f2b

Patch by Tim with minor code style changes. Co-authored-by: Tim Grant <tgrant@nvidia.com> Signed-off-by: Brecht Van Lommel <brecht@blender.org>

brechtvl added 2 commits October 3, 2023 21:45

Fix build with older LLVM

55cd92c

Signed-off-by: Brecht Van Lommel <brecht@blender.org>

Merge branch 'main' into llvm16-minimal

e7c701a

lgritz approved these changes Oct 3, 2023

View reviewed changes

lgritz merged commit cdfec4f into AcademySoftwareFoundation:main Oct 4, 2023

ShaddyDC mentioned this pull request Nov 20, 2023

osl: fix build, 1.12.13.0 -> 1.12.14.0 NixOS/nixpkgs#268670

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM 16 and 17 support #1730

LLVM 16 and 17 support #1730

brechtvl commented Sep 16, 2023 •

edited

Loading

lgritz commented Sep 18, 2023

brechtvl commented Sep 18, 2023 •

edited

Loading

brechtvl commented Sep 18, 2023 •

edited

Loading

brechtvl commented Sep 28, 2023

brechtvl commented Sep 28, 2023

brechtvl commented Sep 28, 2023

tgrant-nv commented Sep 28, 2023

brechtvl commented Sep 28, 2023 •

edited

Loading

brechtvl commented Sep 28, 2023

tgrant-nv commented Sep 28, 2023

brechtvl commented Sep 29, 2023

AlexMWells commented Oct 2, 2023

brechtvl commented Oct 2, 2023 •

edited

Loading

lgritz commented Oct 2, 2023

brechtvl commented Oct 2, 2023

tgrant-nv commented Oct 2, 2023 •

edited

Loading

AlexMWells commented Oct 2, 2023

lgritz commented Oct 2, 2023

brechtvl commented Oct 2, 2023

brechtvl commented Oct 2, 2023

lgritz commented Oct 2, 2023

lgritz commented Oct 3, 2023

lgritz left a comment

brechtvl commented Oct 3, 2023

spelufo commented Oct 16, 2023

brechtvl commented Oct 16, 2023

LLVM 16 and 17 support #1730

LLVM 16 and 17 support #1730

Conversation

brechtvl commented Sep 16, 2023 • edited Loading

Description

Tests

Checklist:

lgritz commented Sep 18, 2023

brechtvl commented Sep 18, 2023 • edited Loading

brechtvl commented Sep 18, 2023 • edited Loading

brechtvl commented Sep 28, 2023

brechtvl commented Sep 28, 2023

brechtvl commented Sep 28, 2023

tgrant-nv commented Sep 28, 2023

brechtvl commented Sep 28, 2023 • edited Loading

brechtvl commented Sep 28, 2023

tgrant-nv commented Sep 28, 2023

brechtvl commented Sep 29, 2023

AlexMWells commented Oct 2, 2023

brechtvl commented Oct 2, 2023 • edited Loading

lgritz commented Oct 2, 2023

brechtvl commented Oct 2, 2023

tgrant-nv commented Oct 2, 2023 • edited Loading

AlexMWells commented Oct 2, 2023

lgritz commented Oct 2, 2023

brechtvl commented Oct 2, 2023

brechtvl commented Oct 2, 2023

lgritz commented Oct 2, 2023

lgritz commented Oct 3, 2023

lgritz left a comment

Choose a reason for hiding this comment

brechtvl commented Oct 3, 2023

spelufo commented Oct 16, 2023

brechtvl commented Oct 16, 2023

brechtvl commented Sep 16, 2023 •

edited

Loading

brechtvl commented Sep 18, 2023 •

edited

Loading

brechtvl commented Sep 18, 2023 •

edited

Loading

brechtvl commented Sep 28, 2023 •

edited

Loading

brechtvl commented Oct 2, 2023 •

edited

Loading

tgrant-nv commented Oct 2, 2023 •

edited

Loading