
Unsupported operation when using CUDA #713

Open
salbert83 opened this issue Oct 13, 2024 · 0 comments

I previously raised this issue at FluxML/Zygote.jl#1532, but it was suggested that it would be more appropriate here.

My environment (I have seen the same issue on Linux machines):
Julia Version 1.11.0
Commit 501a4f25c2 (2024-10-07 11:40 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, icelake-client)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Status C:\Users\salbe\OneDrive\Documents\Research\JuliaBugs\Project.toml
[052768ef] CUDA v5.5.2
[7073ff75] IJulia v1.25.0
[e88e6eb3] Zygote v0.6.71

The example (with the required packages loaded; LinearAlgebra provides norm):
using CUDA, LinearAlgebra, Zygote
f₁(x) = sum(abs2, exp.(log.(x) .* (1:length(x))))
f₂(x) = sum(abs2, x.^(1:length(x)))
x = randn(ComplexF64, 5);
z = CuArray{ComplexF64}(x);

Check that the gradient calculations are consistent between the two functions:
test₁ = Zygote.gradient(f₁, x)[1]
test₂ = Zygote.gradient(f₂, x)[1]
norm(test₁ - test₂) / norm(test₁)
Output: 2.2530284453414604e-16 <-- This is reasonable

Check the same calculation using CUDA:
test₃ = Zygote.gradient(f₁, z)[1];
norm(test₁ - Array(test₃)) / norm(test₁)
Output: 2.0454901873585542e-16

However, using f₂ raises an exception. From the trace below, Zygote's broadcast_forward appears to differentiate the broadcast with ForwardDiff dual numbers, and the complex ^ method (complex.jl:886) then calls promote_type, where ForwardDiff's promote_rule compiles to a dynamic function invocation that the GPU compiler rejects.
test₄ = Zygote.gradient(f₂, z)
Output:
InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#34#36")(::CUDA.CuKernelContext, ::CuDeviceVector{GPUArrays.BrokenBroadcast{Union{}}, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Zygote.var"#1409#1410"{typeof(^)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{ComplexF64, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{UnitRange{Int64}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to ≺(a, b) @ ForwardDiff C:\Users\salbe.julia\packages\ForwardDiff\PcZ48\src\dual.jl:54)
Stacktrace:
[1] promote_rule
@ C:\Users\salbe.julia\packages\ForwardDiff\PcZ48\src\dual.jl:407
[2] promote_type
@ .\promotion.jl:318
[3] ^
@ .\complex.jl:886
[4] #1409
@ C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:276
[5] _broadcast_getindex_evalf
@ .\broadcast.jl:673
[6] _broadcast_getindex
@ .\broadcast.jl:646
[7] getindex
@ .\broadcast.jl:605
[8] #34
@ C:\Users\salbe.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:59
Hint: catch this exception as err and call code_typed(err; interactive = true) to introspect the erronous code with Cthulhu.jl

Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\validation.jl:147
[2] macro expansion
@ C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:382 [inlined]
[3] macro expansion
@ C:\Users\salbe.julia\packages\TimerOutputs\NRdsv\src\TimerOutput.jl:253 [inlined]
[4] macro expansion
@ C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:381 [inlined]
[5] emit_llvm(job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, only_entry::Bool)
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\utils.jl:108
[6] emit_llvm
@ C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\utils.jl:106 [inlined]
[7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, strip::Bool, only_entry::Bool, parent_job::Nothing)
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:100
[8] codegen
@ C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:82 [inlined]
[9] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@kwargs{})
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:79
[10] compile
@ C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:74 [inlined]
[11] #1145
@ C:\Users\salbe.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:250 [inlined]
[12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@kwargs{})
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:34
[13] JuliaContext(f::Function)
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\driver.jl:25
[14] compile(job::GPUCompiler.CompilerJob)
@ CUDA C:\Users\salbe.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:249
[15] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\execution.jl:237
[16] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler C:\Users\salbe.julia\packages\GPUCompiler\2CW9L\src\execution.jl:151
[17] macro expansion
@ C:\Users\salbe.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:380 [inlined]
[18] macro expansion
@ .\lock.jl:273 [inlined]
[19] cufunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{GPUArrays.BrokenBroadcast{Union{}}, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Zygote.var"#1409#1410"{typeof(^)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{ComplexF64, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{UnitRange{Int64}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::@kwargs{})
@ CUDA C:\Users\salbe.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:375
[20] cufunction
@ C:\Users\salbe.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:372 [inlined]
[21] macro expansion
@ C:\Users\salbe.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:112 [inlined]
[22] #launch_heuristic#1200
@ C:\Users\salbe.julia\packages\CUDA\2kjXI\src\gpuarrays.jl:17 [inlined]
[23] launch_heuristic
@ C:\Users\salbe.julia\packages\CUDA\2kjXI\src\gpuarrays.jl:15 [inlined]
[24] _copyto!
@ C:\Users\salbe.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:78 [inlined]
[25] copyto!
@ C:\Users\salbe.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:44 [inlined]
[26] copy
@ C:\Users\salbe.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:29 [inlined]
[27] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Nothing, Zygote.var"#1409#1410"{typeof(^)}, Tuple{CuArray{ComplexF64, 1, CUDA.DeviceMemory}, UnitRange{Int64}}})
@ Base.Broadcast .\broadcast.jl:867
[28] broadcast_forward(::Function, ::CuArray{ComplexF64, 1, CUDA.DeviceMemory}, ::UnitRange{Int64})
@ Zygote C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:282
[29] adjoint
@ C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:361 [inlined]
[30] _pullback(::Zygote.Context{false}, ::typeof(Base.Broadcast.broadcasted), ::CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, ::Function, ::CuArray{ComplexF64, 1, CUDA.DeviceMemory}, ::UnitRange{Int64})
@ Zygote C:\Users\salbe.julia\packages\ZygoteRules\M4xmc\src\adjoint.jl:67
[31] _apply(::Function, ::Vararg{Any})
@ Core .\boot.jl:946
[32] adjoint
@ C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\lib\lib.jl:203 [inlined]
[33] _pullback
@ C:\Users\salbe.julia\packages\ZygoteRules\M4xmc\src\adjoint.jl:67 [inlined]
[34] broadcasted
@ .\broadcast.jl:1326 [inlined]
[35] f₂
@ .\In[3]:2 [inlined]
[36] _pullback(ctx::Zygote.Context{false}, f::typeof(f₂), args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
@ Zygote C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\compiler\interface2.jl:0
[37] pullback(f::Function, cx::Zygote.Context{false}, args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
@ Zygote C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:90
[38] pullback
@ C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:88 [inlined]
[39] gradient(f::Function, args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
@ Zygote C:\Users\salbe.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:147
[40] top-level scope
@ In[7]:1
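A possible stopgap, grounded in the example above: f₁ and f₂ compute the same value for x with nonzero entries, so rewriting the integer-exponent broadcast in the exp/log form sidesteps the failing kernel. (f₂_safe is a hypothetical name for illustration; this is a workaround sketch, not a fix for the underlying promotion issue.)

```julia
# Hypothetical workaround: avoid x .^ (1:n) with a complex base under Zygote on CUDA
# by using the identity x^n == exp(n * log(x)) (valid for nonzero complex x),
# which is exactly the formulation f₁ already uses and which differentiated fine above.
f₂_safe(x) = sum(abs2, exp.(log.(x) .* (1:length(x))))
test₄ = Zygote.gradient(f₂_safe, z)[1]  # runs where f₂ throws InvalidIRError
```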

