Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add TOA and surface radiation diagnostics #2579

Merged
merged 1 commit into from
Feb 2, 2024
Merged

add TOA and surface radiation diagnostics #2579

merged 1 commit into from
Feb 2, 2024

Conversation

szy21
Copy link
Member

@szy21 szy21 commented Jan 26, 2024

Purpose

Closes #2578
Also cleans up diagnostics for slab ocean experiments

To-do

Content


  • I have read and checked the items on the review checklist.

@szy21 szy21 force-pushed the zs/rad_diagnostics branch from e15526c to 11a94d7 Compare January 26, 2024 07:46
@Sbozzolo
Copy link
Member

Sbozzolo commented Feb 1, 2024

The issue is with saving the value on the accumulator. copy doesn't seem to work on this type:

view(reshape(view(::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, 1:11, :), 11, 4, 4, 1, 216), 11, :, :, :, :)

It might be because there's two nested views.

@Sbozzolo
Copy link
Member

Sbozzolo commented Feb 1, 2024

Minumum breaking example:

using CUDA
A = CuArray(reshape(1:16, 4, 4))
copy(view(reshape(view(A, 1:3, :), 3, 4), 3, :, :))

@Sbozzolo
Copy link
Member

Sbozzolo commented Feb 1, 2024

As a workaround, I will change to similar + copyto!. This is not the most optimal solution, but given that this is only done at initialization, we can live with that.

@Sbozzolo Sbozzolo force-pushed the zs/rad_diagnostics branch 2 times, most recently from 5632fbc to f7e9d49 Compare February 1, 2024 19:47
@@ -958,8 +958,14 @@ function get_simulation(config::AtmosConfig)
)
else
# Add to the accumulator

# We use similar + .= instead of copy because of a bug in CUDA.jl 5.2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide a link to the issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no issue yet, I asked on slack #gpu, if someone confirms that this is a problem, I will open the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pointed out to this JuliaGPU/Adapt.jl#21 with the following comments:

because of how Julia works, it's tricky to define GPU-accelerated functions for nested array wrappers, so this is triggering fallback functionality from Base
this would need a change in Base, or a more radical type system change, to make this easier. alternatively, relying more on unified memory would make the fallbacks work better, which is probably what we're going to do instead.

@charleskawczynski
Copy link
Member

Minumum breaking example:

using CUDA
A = CuArray(reshape(1:16, 4, 4))
copy(view(reshape(view(A, 1:3, :), 3, 14), 3, :, :))

That's not the issue:

julia> using CUDA

julia> A = CuArray(reshape(1:16, 4, 4));

julia> copy(view(reshape(view(A, 1:3, :), 3, 14), 3, :, :));
ERROR: DimensionMismatch: parent has 12 elements, which is incompatible with size (3, 14)
Stacktrace:
 [1] _throw_dmrs(n::Int64, str::String, dims::Tuple{Int64, Int64})
   @ Base .\reshapedarray.jl:182
 [2] _reshape
   @ Base .\reshapedarray.jl:177 [inlined]
 [3] reshape
   @ Base .\reshapedarray.jl:112 [inlined]
 [4] reshape(::SubArray{Int64, 2, CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}, Tuple{UnitRange{…}, Base.Slice{…}}, false}, ::Int64, ::Int64)
   @ Base .\reshapedarray.jl:117
 [5] top-level scope
   @ REPL[9]:1
 [6] top-level scope
   @ C:\Users\kawcz\.julia\packages\CUDA\rXson\src\initialization.jl:208
Some type information was truncated. Use `show(err)` to see complete types.

The error in the log is:

caused by: Scalar indexing is disallowed.
--
  | Invocation of getindex resulted in scalar indexing of a GPU array.
  | This is typically caused by calling an iterating implementation of a method.
  | Such implementations *do not* execute on the GPU, but very slowly on the CPU,
  | and therefore are only permitted from the REPL for prototyping purposes.
  | If you did intend to index this array, annotate the caller with @allowscalar.
  | Stacktrace:
  | [1] error(s::String)
  | @ Base ./error.jl:35
  | [2] assertscalar(op::String)
  | @ GPUArraysCore /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
  | [3] getindex(A::CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, I::Int64)
  | @ GPUArrays /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/GPUArrays/dAUOE/src/host/indexing.jl:48
  | [4] scalar_getindex
  | @ /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/GPUArrays/dAUOE/src/host/indexing.jl:34 [inlined]
  | [5] _getindex
  | @ /central/scratch/esm/slurm-buildkite/climaatmos-ci/depot/default/packages/GPUArrays/dAUOE/src/host/indexing.jl:17 [inlined]
  | [6] getindex(A::CUDA.CuArray
...

It's related to scalar indexing, which is not allowed on the gpu.

@Sbozzolo
Copy link
Member

Sbozzolo commented Feb 1, 2024

Ops, I miscopied! The 14 should be a 4

julia> copy(view(reshape(view(A, 1:3, :), 4, 3), 3, :))
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f5e41d44010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:106
3-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
  3
  9
 14

@szy21 szy21 requested a review from Sbozzolo February 2, 2024 01:33
@szy21 szy21 marked this pull request as ready for review February 2, 2024 01:33
@szy21 szy21 force-pushed the zs/rad_diagnostics branch 2 times, most recently from 22353ea to 3c88ba4 Compare February 2, 2024 07:26
@szy21 szy21 force-pushed the zs/rad_diagnostics branch from 3c88ba4 to 492513d Compare February 2, 2024 07:59
@szy21 szy21 enabled auto-merge February 2, 2024 08:06
@szy21 szy21 added this pull request to the merge queue Feb 2, 2024
Merged via the queue into main with commit ea83c95 Feb 2, 2024
10 checks passed
@szy21 szy21 deleted the zs/rad_diagnostics branch February 2, 2024 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add surface and TOA radiative fluxes to diagnostics
3 participants