
Unsupported operation when using CUDA #1532

Closed
salbert83 opened this issue Oct 12, 2024 · 3 comments

Comments

@salbert83

Motivation and description

UnsupportedOpWithCUDA.txt

The attached file is actually a Jupyter notebook, but I saved it with a .txt extension so it could be uploaded.

In summary, I defined two functions:
f₁(x) = sum(abs2, exp.(log.(x) .* (1:length(x))))
f₂(x) = sum(abs2, x.^(1:length(x)))

Their gradients agree on the CPU. When the argument is on the GPU, the first function's gradient is consistent with the CPU calculation, but the latter function throws an exception.
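For reference, a minimal reproduction sketch of the above (assuming CUDA.jl and Zygote.jl are installed and a CUDA-capable GPU is available; the function definitions and ComplexF64 element type come from the notebook, while the test vector below is illustrative):

using CUDA, Zygote

f₁(x) = sum(abs2, exp.(log.(x) .* (1:length(x))))
f₂(x) = sum(abs2, x .^ (1:length(x)))

x_cpu = rand(ComplexF64, 4)
x_gpu = CuArray(x_cpu)

Zygote.gradient(f₁, x_cpu)   # works
Zygote.gradient(f₂, x_cpu)   # works, agrees with the gradient of f₁
Zygote.gradient(f₁, x_gpu)   # works, consistent with the CPU result
Zygote.gradient(f₂, x_gpu)   # throws the InvalidIRError shown below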

Possible Implementation

No response

@ToucheSir
Member

As befits an MWE, it would be better if you could copy the stacktrace from your notebook and paste it in a code block here. Raw Jupyter notebooks are not particularly human-readable.

@ToucheSir
Member

Looking at the stacktrace:

InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#34#36")(::CUDA.CuKernelContext, ::CuDeviceVector{GPUArrays.BrokenBroadcast{Union{}}, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Zygote.var"#1409#1410"{typeof(^)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{ComplexF64, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{UnitRange{Int64}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to ≺(a, b) @ ForwardDiff C:\Users\salbe\.julia\packages\ForwardDiff\PcZ48\src\dual.jl:54)
Stacktrace:
 [1] promote_rule
   @ C:\Users\salbe\.julia\packages\ForwardDiff\PcZ48\src\dual.jl:407
 [2] promote_type
   @ .\promotion.jl:318
 [3] ^
   @ .\complex.jl:886
 [4] #1409
   @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:276
 [5] _broadcast_getindex_evalf
   @ .\broadcast.jl:673
 [6] _broadcast_getindex
   @ .\broadcast.jl:646
 [7] getindex
   @ .\broadcast.jl:605
 [8] #34
   @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:59
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\validation.jl:147
  [2] macro expansion
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:382 [inlined]
  [3] macro expansion
    @ C:\Users\salbe\.julia\packages\TimerOutputs\NRdsv\src\TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:381 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\utils.jl:108
  [6] emit_llvm
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\utils.jl:106 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, strip::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:100
  [8] codegen
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:82 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:79
 [10] compile
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:74 [inlined]
 [11] #1145
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:250 [inlined]
 [12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:34
 [13] JuliaContext(f::Function)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:25
 [14] compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:249
 [15] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\execution.jl:237
 [16] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\execution.jl:151
 [17] macro expansion
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:380 [inlined]
 [18] macro expansion
    @ .\lock.jl:273 [inlined]
 [19] cufunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{GPUArrays.BrokenBroadcast{Union{}}, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Zygote.var"#1409#1410"{typeof(^)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{ComplexF64, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{UnitRange{Int64}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::@Kwargs{})
    @ CUDA C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:375
 [20] cufunction
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:372 [inlined]
 [21] macro expansion
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:112 [inlined]
 [22] #launch_heuristic#1200
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\gpuarrays.jl:17 [inlined]
 [23] launch_heuristic
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\gpuarrays.jl:15 [inlined]
 [24] _copyto!
    @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:78 [inlined]
 [25] copyto!
    @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:44 [inlined]
 [26] copy
    @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:29 [inlined]
 [27] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Nothing, Zygote.var"#1409#1410"{typeof(^)}, Tuple{CuArray{ComplexF64, 1, CUDA.DeviceMemory}, UnitRange{Int64}}})
    @ Base.Broadcast .\broadcast.jl:867
 [28] broadcast_forward(::Function, ::CuArray{ComplexF64, 1, CUDA.DeviceMemory}, ::UnitRange{Int64})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:282
 [29] adjoint
    @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:361 [inlined]
 [30] _pullback(::Zygote.Context{false}, ::typeof(Base.Broadcast.broadcasted), ::CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, ::Function, ::CuArray{ComplexF64, 1, CUDA.DeviceMemory}, ::UnitRange{Int64})
    @ Zygote C:\Users\salbe\.julia\packages\ZygoteRules\M4xmc\src\adjoint.jl:67
 [31] _apply(::Function, ::Vararg{Any})
    @ Core .\boot.jl:946
 [32] adjoint
    @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\lib.jl:203 [inlined]
 [33] _pullback
    @ C:\Users\salbe\.julia\packages\ZygoteRules\M4xmc\src\adjoint.jl:67 [inlined]
 [34] broadcasted
    @ .\broadcast.jl:1326 [inlined]
 [35] f₂
    @ .\In[3]:2 [inlined]
 [36] _pullback(ctx::Zygote.Context{false}, f::typeof(f₂), args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface2.jl:0
 [37] pullback(f::Function, cx::Zygote.Context{false}, args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:90
 [38] pullback
    @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:88 [inlined]
 [39] gradient(f::Function, args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:147
 [40] top-level scope
    @ In[7]:1

I think the problem is in how ForwardDiff.jl handles ^. The function called is https://github.com/JuliaLang/julia/blob/v1.11.0/base/complex.jl#L885, which ends up at https://github.com/JuliaDiff/ForwardDiff.jl/blob/v0.10.36/src/dual.jl#L54 via https://github.com/JuliaDiff/ForwardDiff.jl/blob/v0.10.36/src/dual.jl#L407. Even if that error path isn't being hit, it isn't GPU-friendly.

The good and bad news is that this appears to be an issue with how ForwardDiff handles ^ for complex numbers. My recommendation would be to recreate your MWE using ForwardDiff.gradient instead of Zygote, and then raise an issue there.
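For anyone digging further before re-filing, the hint in the error message above can also be followed directly. A sketch, assuming Cthulhu.jl is installed and reusing f₂ and x_gpu from the MWE:

using CUDA, Zygote, Cthulhu   # Cthulhu provides the interactive view used below

try
    Zygote.gradient(f₂, x_gpu)            # fails during GPU kernel compilation
catch err
    # Per the InvalidIRError hint: introspect the kernel that failed to compile
    code_typed(err; interactive = true)
end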

@salbert83
Author

Thank you. I will raise the issue there.
