Add GPU compatibility in ClimaLSM #392

Closed
Tracked by #398
Sbozzolo opened this issue Nov 16, 2023 · 1 comment · Fixed by #560
Labels
GPU SDI Software Design Issue

Sbozzolo commented Nov 16, 2023

GPU compatibility

As of 15 Jan 2024, the key missing elements for making ClimaLSM GPU compatible are file processing and interpolation. This issue has now been turned into a Software Design Issue describing how these will be implemented.

The problem

ClimaLSM reads in and evaluates various data for boundary conditions, forcings, parameters, and so on. As of now, the data is mostly used to prescribe atmosphere, radiation, and snow/precipitation. For a single site, files are read, data is preprocessed (e.g., to deal with missing elements), and 1D splines are produced. Then, ClimaLSM is run assuming that we can simply call atmos_T(t) to obtain the temperature of the atmosphere at time t.
This pattern works fine for single-site CPU runs, but has to be generalized for global/GPU runs. The main reasons for this are:

  • Splines are not transparently GPU-compatible,
  • The amount of data that has to be read for global runs can vastly exceed available memory, so it is not possible to read everything before a run starts.

We want to support (with the same user/developer interface for CPUs and GPUs):

  1. analytic functions (of arbitrary dimensions),
  2. 1D temporal data read from files (e.g., for FLUXNET data),
  3. 2 + 1D spatiotemporal data read from files (e.g., ERA5 data).

We also want to support arbitrary interpolation schemes. In this way, we can start with simple nearest-neighbor interpolation (0th order) for the more difficult cases, later implement linear interpolation (1st order), and leave a pathway to higher-order interpolation schemes in the future (if needed). This might also be the place where we implement more complex behaviors, e.g., reusing the same yearly data for a multi-year run.

The proposed solution

The proposed solution consists of defining a single way to deal with all the aforementioned cases with multiple backends. We propose to define a new abstract type AbstractInterpolator and a new function to evaluate the interpolator. The function might look like

"""
    interpolate(interpolator::AbstractInterpolator, point)

Evaluate `interpolator` on the given `point`. 
"""
function interpolate(interpolator::AbstractInterpolator, point) end

Or (more likely)

"""
    interpolate!(destination, interpolator::AbstractInterpolator, point)

Evaluate `interpolator` on the given `point` and write the output to `destination`. 
"""
function interpolate!(destination, interpolator::AbstractInterpolator, point) end

This second option does not look as clean, but it provides greater flexibility, which will probably be required to avoid scalar indexing (for GPUs). We will explore both options.

interpolate/interpolate! will be the only way to evaluate external data (i.e., data not generated while running the model). ClimaLSM will be updated to use this everywhere.

It would be nice if the 1D interpolators could naturally support being broadcasted over a surface to use the same prescription over the globe. That is, if we are working with 1D data but running on a sphere, the 1D data should be used for all the sites on the sphere. This capability will probably require going with interpolate! as opposed to interpolate.
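As a minimal sketch of the broadcasting idea (all names here are illustrative, not the final API): a time-only interpolator whose value is written to every point of a destination, with a plain array standing in for a ClimaCore Field.

```julia
# Illustrative sketch only: a trivial time-only "interpolator" whose value
# is broadcast over every surface point.
struct ConstantInTimeExample
    value::Float64
end

interpolate(interp::ConstantInTimeExample, time) = interp.value

function interpolate!(destination, interp::ConstantInTimeExample, time)
    # The same 1D value is applied to all points of the surface
    destination .= interpolate(interp, time)
    return nothing
end
```

With interpolate!, the destination's own broadcasting machinery decides how the scalar is spread over the surface, which is why the in-place form is the more natural fit for this use case.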

AbstractInterpolator will have three concrete implementations, corresponding to the three use cases described above. Constructors will automatically identify which of the three implementations to use based on the input data, and the existence of three concrete structs should be considered an implementation detail.
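A hypothetical constructor layer might dispatch on the input type to select the concrete struct. In this sketch, make_interpolator is an illustrative name, and the struct definitions anticipate the sketches below (repeated here so the snippet is self-contained):

```julia
abstract type AbstractInterpolator end

struct AnalyticInterpolator{F <: Function} <: AbstractInterpolator
    func::F
end

struct TemporalInterpolator1D{AA <: AbstractArray} <: AbstractInterpolator
    xs::AA
    ys::AA
end

# Hypothetical single entry point: dispatch on the input picks the concrete
# implementation, so users never need to name the structs directly.
make_interpolator(func::Function) = AnalyticInterpolator(func)
make_interpolator(xs::AbstractVector, ys::AbstractVector) =
    TemporalInterpolator1D(xs, ys)
# A file path would similarly map to the 2 + 1D case, e.g.
# make_interpolator(path::AbstractString) = TemporalInterpolator3D(XFileReader(path))
```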

A few more details / sketch of the implementation

AnalyticInterpolator

AnalyticInterpolator is the simplest. It stores one Julia function that is expected to be evaluated on a point (in space and/or time).

struct AnalyticInterpolator{F <: Function} <: AbstractInterpolator
    func::F
end

As for interpolate, the implementation is simply

interpolate(interpolator::AnalyticInterpolator, point) = interpolator.func(point)

Special care must be taken to check for type consistency (in particular with regard to Float32 and Float64).
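One possible way to enforce this, sketched with hypothetical names: carry the simulation's float type FT as a type parameter and convert the function's output, so that a Float64 literal inside the user's function cannot silently promote a Float32 run.

```julia
# Hypothetical sketch: FT is the float type the simulation runs with.
struct TypedAnalyticExample{FT, F <: Function}
    func::F
end

TypedAnalyticExample{FT}(func::F) where {FT, F <: Function} =
    TypedAnalyticExample{FT, F}(func)

function typed_interpolate(interp::TypedAnalyticExample{FT}, point) where {FT}
    # Convert the output so the simulation stays in FT even if `func`
    # internally promotes to Float64
    return FT(interp.func(point))
end

interp = TypedAnalyticExample{Float32}(t -> 2.0 * t)
```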

TemporalInterpolator1D

With TemporalInterpolator1D, we can leverage the fact that the amount of data is small: we can read everything at the beginning and keep the backing arrays in memory (on GPUs as well).

struct TemporalInterpolator1D{AA <: AbstractArray} <: AbstractInterpolator
    xs::AA
    ys::AA
end

xs and ys can be Arrays or CuArrays. They have to be preprocessed, and xs will probably have to be sorted.

As for interpolate, we can have something like (for nearest-neighbor interpolation)

function interpolate(interpolator::TemporalInterpolator1D, point)
    # Nearest neighbor: clamp to the valid range, then pick the closer sample
    index = clamp(searchsortedfirst(interpolator.xs, point), 1, length(interpolator.xs))
    if index > 1 && point - interpolator.xs[index - 1] < interpolator.xs[index] - point
        index -= 1
    end
    return interpolator.ys[index]
end
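A usage sketch with toy hourly data (the struct and a nearest-neighbor method are repeated here so the snippet is self-contained; the clamp guards against times outside the sampled range):

```julia
struct TemporalInterpolator1D{AA <: AbstractArray}
    xs::AA
    ys::AA
end

function interpolate(itp::TemporalInterpolator1D, point)
    # Nearest neighbor: clamp to the valid range, then pick the closer sample
    index = clamp(searchsortedfirst(itp.xs, point), 1, length(itp.xs))
    if index > 1 && point - itp.xs[index - 1] < itp.xs[index] - point
        index -= 1
    end
    return itp.ys[index]
end

# Toy hourly "temperature" data: times in seconds, values in K
itp = TemporalInterpolator1D([0.0, 3600.0, 7200.0], [280.0, 281.5, 283.0])
interpolate(itp, 100.0)     # → 280.0 (nearest sample is t = 0.0)
interpolate(itp, 10_000.0)  # → 283.0 (clamped to the last sample)
```

With CUDA.jl, xs and ys could be CuArrays, provided the lookup is performed in a kernel-friendly way rather than through scalar indexing from the host.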

TemporalInterpolator3D

TemporalInterpolator3D is the most complex one. For TemporalInterpolator3D, we assume that we can only hold a limited amount of data in memory, so it will require building a flexible online file/data processor. Most of the challenge in implementing TemporalInterpolator3D is developing an improved FileReader object (one that is GPU compatible and works on Fields). This is aligned with the goals in #398 (cc @juliasloan25).

This interpolator hands off the hard work to an XFileReader object (the X at the beginning is just to distinguish it in this SDI from our current implementation).

struct TemporalInterpolator3D <: AbstractInterpolator
     reader::XFileReader
end

From the point of view of TemporalInterpolator3D, XFileReader has to transparently provide the data it is asked for.

A linear interpolation method might look like

function interpolate(interpolator::TemporalInterpolator3D, time)
    data_left, time_left = read_before(interpolator.reader, time)
    data_right, time_right = read_after(interpolator.reader, time)
    return data_left + (data_right - data_left) / (time_right - time_left) * (time - time_left)
end

It is important to note that with this design we have isolated the challenges to the implementation of read_before/read_after (which is tricky both for GPU compatibility and for obtaining reasonable performance). As long as we establish a contract between TemporalInterpolator3D and XFileReader with a clear interface, we can tackle this problem incrementally.

Step 1:
read could just read the data on demand every time it is asked. This means handing control over to the CPU, reading the file from disk, remapping it to the grid (if a remapped version is not already available in a cache), and copying it over to the GPU. XFileReader can store the latest version read (if we start with nearest-neighbor interpolation, this is a good starting point).
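Step 1 might be sketched as follows, where load_and_remap and all other names are placeholders for the real read-from-disk + regrid + GPU-copy pipeline:

```julia
mutable struct OnDemandReaderExample
    cached_time::Float64
    cached_data::Vector{Float64}
end

# Placeholder for: hand control to the CPU, read from disk, remap to the
# grid (or hit the remapping cache), copy to the GPU
load_and_remap(time) = fill(time, 4)

function read_on_demand!(reader::OnDemandReaderExample, time)
    if reader.cached_time != time
        reader.cached_data = load_and_remap(time)
        reader.cached_time = time
    end
    return reader.cached_data
end
```

Repeated queries at the same time (as nearest-neighbor interpolation would produce between data slices) hit the cache instead of the disk.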

Step 2:
read could buffer some data. When asked to provide data, XFileReader checks whether the requested time is in the range that has already been read; if not, it reads and processes the next N data slices. Once again, this means handing control over to the CPU, reading the file from disk, remapping N slices to the grid (if a remapped version is not already available in a cache), and copying them to the GPU.
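Step 2 could be sketched like this, with load_slices again a placeholder for the real I/O and chunk playing the role of N:

```julia
mutable struct BufferedReaderExample
    times::Vector{Float64}          # times of the buffered slices
    slices::Vector{Vector{Float64}}
    chunk::Int                      # N, slices read per refill
end

# Placeholder: read, remap, and upload the next `n` slices starting at `t0`
load_slices(t0, n) = ([t0 + i for i in 0:(n - 1)],
                      [fill(t0 + i, 4) for i in 0:(n - 1)])

function read_buffered!(reader::BufferedReaderExample, time)
    in_range = !isempty(reader.times) &&
               first(reader.times) <= time <= last(reader.times)
    if !in_range
        # Refill: hand control to the CPU for the next N slices
        reader.times, reader.slices = load_slices(floor(time), reader.chunk)
    end
    index = clamp(searchsortedfirst(reader.times, time), 1, length(reader.times))
    return reader.slices[index]
end
```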

Given the cost of all this I/O, this leads to

Step 3:
XFileReader works asynchronously. XFileReader has a buffer of data to fill. This buffer is constantly being replenished by the CPU when it is idle (this is relevant only for GPU runs).
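Step 3 might use a bounded Channel fed by a producer task, as in this sketch (the fill call stands in for read + remap; in the real design the producer would run on the CPU while the GPU computes):

```julia
function make_prefetching_reader(times)
    # Bounded buffer: the producer blocks once 4 slices are waiting
    buffer = Channel{Tuple{Float64, Vector{Float64}}}(4)
    @async begin
        for t in times
            put!(buffer, (t, fill(t, 4)))  # placeholder for read + remap
        end
        close(buffer)
    end
    return buffer
end

# The consumer (the time stepper) takes slices in order while the
# producer keeps replenishing the buffer in the background
for (t, slice) in make_prefetching_reader(0.0:1.0:3.0)
    # use `slice` at time `t`
end
```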

Implementing different interpolation schemes

A simple way to implement different interpolation schemes is to add an additional type layer that we can use to dispatch to different interpolation methods. We can define subtypes of a new abstract type AbstractInterpolationMethod:

struct NearestNeighbor <: AbstractInterpolationMethod end
struct LinearInterpolation <: AbstractInterpolationMethod end

With this approach, the previously defined interpolators would be augmented with a new field, method, which is an instance of one of the AbstractInterpolationMethod subtypes. Then, interpolate would become

interpolate(interpolator, point) = interpolate(interpolator, point, interpolator.method)

function interpolate(interpolator, point, method::NearestNeighbor)
    # implementation
end

function interpolate(interpolator, point, method::LinearInterpolation)
    # implementation
end

This requires implementing the same interpolation method for every AbstractInterpolator and does not allow composing different methods (though it is possible to customize the behavior of one of the methods by adding additional fields, e.g., for what to do with edge values). However, this approach poses no major challenges for GPUs and incurs no additional runtime cost (because everything is inferred from the type). Given that we expect to use linear interpolation for everything, and that supporting different interpolation methods is mostly to ease development and to provide pathways for future expansion, I believe that this is the best way forward.
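A self-contained sketch of the dispatch layer, specialized to the 1D case (the Interp1DExample name is illustrative; the method types repeat the definitions above, and the nearest-neighbor branch is simplified to take the next sample at or after the point):

```julia
abstract type AbstractInterpolationMethod end
struct NearestNeighbor <: AbstractInterpolationMethod end
struct LinearInterpolation <: AbstractInterpolationMethod end

struct Interp1DExample{M <: AbstractInterpolationMethod}
    xs::Vector{Float64}
    ys::Vector{Float64}
    method::M
end

# The method is part of the type, so this extra hop is resolved at
# compile time and costs nothing at runtime
interpolate(itp::Interp1DExample, point) = interpolate(itp, point, itp.method)

function interpolate(itp::Interp1DExample, point, ::NearestNeighbor)
    # Simplified: takes the next sample at or after `point`
    index = clamp(searchsortedfirst(itp.xs, point), 1, length(itp.xs))
    return itp.ys[index]
end

function interpolate(itp::Interp1DExample, point, ::LinearInterpolation)
    i = clamp(searchsortedfirst(itp.xs, point), 2, length(itp.xs))
    x1, x2 = itp.xs[i - 1], itp.xs[i]
    y1, y2 = itp.ys[i - 1], itp.ys[i]
    return y1 + (y2 - y1) / (x2 - x1) * (point - x1)
end

itp = Interp1DExample([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], LinearInterpolation())
interpolate(itp, 0.5)  # → 5.0
```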

juliasloan25 commented:
Thank you for writing this up! It's very nice to have the use cases and implementation ideas in one place.

We should also update our file reading so that we only read in the dates of a file that will be used for the simulation. Currently we read and regrid all available dates in a file, even if they won't be used. It seems like this would be handled by the read_before/read_after implementations, but maybe we want to do this for the 1D case as well.

@juliasloan25 juliasloan25 changed the title Add GPU compatibility Add GPU compatibility in ClimaLSM Jan 18, 2024