Add GPU compatibility in ClimaLSM #392

Closed
Tracked by #398
Sbozzolo opened this issue Nov 16, 2023 · 1 comment · Fixed by #560
Labels
GPU SDI Software Design Issue

Sbozzolo commented Nov 16, 2023

GPU compatibility

As of 15 Jan 2024, the key missing elements for making ClimaLSM GPU compatible are file processing and interpolation. This issue has now been turned into a Software Design Issue describing how these will be implemented.

The problem

ClimaLSM reads in and evaluates various data for boundary conditions, forcings, parameters, and so on. As of now, the data is mostly used to prescribe atmosphere, radiation, and snow/precipitation. For a single site, files are read, data is preprocessed (e.g., to deal with missing elements), and 1D splines are produced. Then, ClimaLSM is run assuming that we can simply call atmos_T(t) to obtain the temperature of the atmosphere at time t.
This pattern works fine for single-site CPU runs, but has to be generalized for global/GPU runs. The main reasons for this are:

  • Splines are not transparently GPU-compatible,
  • The amount of data that has to be read for global runs can vastly exceed available memory, so it is not possible to read everything before a run starts.

We want to support (with the same user/developer interface for CPUs and GPUs):

  1. analytic functions (of arbitrary dimensions),
  2. 1D temporal data read from files (e.g., for FLUXNET data),
  3. 2 + 1D spatiotemporal data read from files (e.g., ERA5 data).

We also want to support arbitrary interpolation schemes. In this way, we can start with simple nearest-neighbor interpolation (0th order) for the more difficult cases, later implement linear interpolation (1st order), and leave a pathway to higher-order interpolation schemes in the future (if needed). This might also be the place where we implement more complex behaviors, e.g., reusing the same yearly data for a multi-year run.

The proposed solution

The proposed solution consists of defining a single way to deal with all the aforementioned cases with multiple backends. We propose to define a new abstract type AbstractInterpolator and a new function to evaluate the interpolator. The function might look like

"""
    interpolate(interpolator::AbstractInterpolator, point)

Evaluate `interpolator` on the given `point`. 
"""
function interpolate(interpolator::AbstractInterpolator, point) end

Or (more likely)

"""
    interpolate!(destination, interpolator::AbstractInterpolator, point)

Evaluate `interpolator` on the given `point` and write the output to `destination`. 
"""
function interpolate!(destination, interpolator::AbstractInterpolator, point) end

This second option does not look as clean, but it provides greater flexibility, which will probably be required to avoid scalar indexing (for GPUs). We will explore both options.

interpolate/interpolate! will be the only way to evaluate external data (i.e., data not generated while running the model). ClimaLSM will be updated to use this everywhere.

It would be nice if the 1D interpolators could naturally support being broadcasted over a surface to use the same prescription over the globe. That is, if we are working with 1D data but running on a sphere, the 1D data should be used for all the sites on the sphere. This capability will probably require going with interpolate! as opposed to interpolate.
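As a minimal sketch of the broadcasting idea (all names here are illustrative, not the final API): a time-only interpolator whose value is written to every point of a destination, with a plain array standing in for a ClimaCore Field.

```julia
# Illustrative sketch only: a trivial time-only "interpolator" whose value
# is broadcast over every surface point.
struct ConstantInTimeExample
    value::Float64
end

interpolate(interp::ConstantInTimeExample, time) = interp.value

function interpolate!(destination, interp::ConstantInTimeExample, time)
    # The same 1D value is applied to all points of the surface
    destination .= interpolate(interp, time)
    return nothing
end
```

With interpolate!, the destination's own broadcasting machinery decides how the scalar is spread over the surface, which is why the in-place form is the more natural fit for this use case.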

AbstractInterpolator will have three concrete implementations, corresponding to the three use cases described above. Constructors will automatically identify which of the three implementations to use based on the input data, and the existence of three concrete structs should be considered an implementation detail.
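A hypothetical constructor layer might dispatch on the input type to select the concrete struct. In this sketch, make_interpolator is an illustrative name, and the struct definitions anticipate the sketches below (repeated here so the snippet is self-contained):

```julia
abstract type AbstractInterpolator end

struct AnalyticInterpolator{F <: Function} <: AbstractInterpolator
    func::F
end

struct TemporalInterpolator1D{AA <: AbstractArray} <: AbstractInterpolator
    xs::AA
    ys::AA
end

# Hypothetical single entry point: dispatch on the input picks the concrete
# implementation, so users never need to name the structs directly.
make_interpolator(func::Function) = AnalyticInterpolator(func)
make_interpolator(xs::AbstractVector, ys::AbstractVector) =
    TemporalInterpolator1D(xs, ys)
# A file path would similarly map to the 2 + 1D case, e.g.
# make_interpolator(path::AbstractString) = TemporalInterpolator3D(XFileReader(path))
```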

A few more details / sketch of the implementation

AnalyticInterpolator

AnalyticInterpolator is the simplest. It stores one Julia function that is expected to be evaluated on a point (in space and/or time).

struct AnalyticInterpolator{F <: Function} <: AbstractInterpolator
    func::F
end

As for interpolate, the implementation is simply

interpolate(interpolator::AnalyticInterpolator, point) = interpolator.func(point)

Special care must be taken to check for type consistency (in particular with regard to Float32 and Float64).
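One possible way to enforce this, sketched with hypothetical names: carry the simulation's float type FT as a type parameter and convert the function's output, so that a Float64 literal inside the user's function cannot silently promote a Float32 run.

```julia
# Hypothetical sketch: FT is the float type the simulation runs with.
struct TypedAnalyticExample{FT, F <: Function}
    func::F
end

TypedAnalyticExample{FT}(func::F) where {FT, F <: Function} =
    TypedAnalyticExample{FT, F}(func)

function typed_interpolate(interp::TypedAnalyticExample{FT}, point) where {FT}
    # Convert the output so the simulation stays in FT even if `func`
    # internally promotes to Float64
    return FT(interp.func(point))
end

interp = TypedAnalyticExample{Float32}(t -> 2.0 * t)
```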

TemporalInterpolator1D

With TemporalInterpolator1D, we can leverage the fact that the amount of data is small: we can read everything at the beginning and keep the backing arrays in memory (on GPUs as well).

struct TemporalInterpolator1D{AA <: AbstractArray} <: AbstractInterpolator
    xs::AA
    ys::AA
end

xs and ys can be Arrays or CuArrays. They have to be preprocessed, and xs will probably have to be sorted.

As for interpolate, we can have something like (for nearest-neighbor interpolation)

function interpolate(interpolator::TemporalInterpolator1D, point)
    # Nearest neighbor: clamp to the valid range, then pick the closer sample
    index = clamp(searchsortedfirst(interpolator.xs, point), 1, length(interpolator.xs))
    if index > 1 && point - interpolator.xs[index - 1] < interpolator.xs[index] - point
        index -= 1
    end
    return interpolator.ys[index]
end
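A usage sketch with toy hourly data (the struct and a nearest-neighbor method are repeated here so the snippet is self-contained; the clamp guards against times outside the sampled range):

```julia
struct TemporalInterpolator1D{AA <: AbstractArray}
    xs::AA
    ys::AA
end

function interpolate(itp::TemporalInterpolator1D, point)
    # Nearest neighbor: clamp to the valid range, then pick the closer sample
    index = clamp(searchsortedfirst(itp.xs, point), 1, length(itp.xs))
    if index > 1 && point - itp.xs[index - 1] < itp.xs[index] - point
        index -= 1
    end
    return itp.ys[index]
end

# Toy hourly "temperature" data: times in seconds, values in K
itp = TemporalInterpolator1D([0.0, 3600.0, 7200.0], [280.0, 281.5, 283.0])
interpolate(itp, 100.0)     # → 280.0 (nearest sample is t = 0.0)
interpolate(itp, 10_000.0)  # → 283.0 (clamped to the last sample)
```

With CUDA.jl, xs and ys could be CuArrays, provided the lookup is performed in a kernel-friendly way rather than through scalar indexing from the host.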

TemporalInterpolator3D

TemporalInterpolator3D is the most complex one. For TemporalInterpolator3D, we assume that we can only hold a limited amount of data in memory, so it will require building a flexible online file/data processor. Most of the challenge in implementing TemporalInterpolator3D is developing an improved FileReader object (one that is GPU compatible and works on Fields). This is aligned with the goals in #398 (cc @juliasloan25).

This interpolator hands off the hard work to an XFileReader object (the X at the beginning is just to distinguish it in this SDI from our current implementation).

struct TemporalInterpolator3D <: AbstractInterpolator
     reader::XFileReader
end

From the point of view of TemporalInterpolator3D, XFileReader has to transparently provide the data it is asked for.

A linear interpolation method might look like

function interpolate(interpolator::TemporalInterpolator3D, time)
    data_left, time_left = read_before(interpolator.reader, time)
    data_right, time_right = read_after(interpolator.reader, time)
    return data_left + (data_right - data_left) / (time_right - time_left) * (time - time_left)
end

It is important to note that with this design we have isolated the challenges to the implementation of read_before/read_after (which is tricky both for GPU compatibility and for obtaining reasonable performance). As long as we establish a contract between TemporalInterpolator3D and XFileReader with a clear interface, we can tackle this problem incrementally.

Step 1:
read could just read the data on demand every time it is asked. This means handing control over to the CPU, reading the file from disk, remapping it to the grid (if a remapped version is not already available in a cache), and copying it over to the GPU. XFileReader can store the latest version read (if we start with nearest-neighbor interpolation, this is a good starting point).
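Step 1 might be sketched as follows, where load_and_remap and all other names are placeholders for the real read-from-disk + regrid + GPU-copy pipeline:

```julia
mutable struct OnDemandReaderExample
    cached_time::Float64
    cached_data::Vector{Float64}
end

# Placeholder for: hand control to the CPU, read from disk, remap to the
# grid (or hit the remapping cache), copy to the GPU
load_and_remap(time) = fill(time, 4)

function read_on_demand!(reader::OnDemandReaderExample, time)
    if reader.cached_time != time
        reader.cached_data = load_and_remap(time)
        reader.cached_time = time
    end
    return reader.cached_data
end
```

Repeated queries at the same time (as nearest-neighbor interpolation would produce between data slices) hit the cache instead of the disk.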

Step 2:
read could buffer some data. When asked to provide data, XFileReader checks whether the requested time is in the range that has already been read; if not, it reads and processes the next N data slices. Once again, this means handing control over to the CPU, reading the file from disk, remapping N slices to the grid (if a remapped version is not already available in a cache), and copying them to the GPU.
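Step 2 could be sketched like this, with load_slices again a placeholder for the real I/O and chunk playing the role of N:

```julia
mutable struct BufferedReaderExample
    times::Vector{Float64}          # times of the buffered slices
    slices::Vector{Vector{Float64}}
    chunk::Int                      # N, slices read per refill
end

# Placeholder: read, remap, and upload the next `n` slices starting at `t0`
load_slices(t0, n) = ([t0 + i for i in 0:(n - 1)],
                      [fill(t0 + i, 4) for i in 0:(n - 1)])

function read_buffered!(reader::BufferedReaderExample, time)
    in_range = !isempty(reader.times) &&
               first(reader.times) <= time <= last(reader.times)
    if !in_range
        # Refill: hand control to the CPU for the next N slices
        reader.times, reader.slices = load_slices(floor(time), reader.chunk)
    end
    index = clamp(searchsortedfirst(reader.times, time), 1, length(reader.times))
    return reader.slices[index]
end
```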

Given the cost of all this I/O, this leads to

Step 3:
XFileReader works asynchronously. XFileReader has a buffer of data to fill. This buffer is constantly being replenished by the CPU when it is idle (this is relevant only for GPU runs).
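Step 3 might use a bounded Channel fed by a producer task, as in this sketch (the fill call stands in for read + remap; in the real design the producer would run on the CPU while the GPU computes):

```julia
function make_prefetching_reader(times)
    # Bounded buffer: the producer blocks once 4 slices are waiting
    buffer = Channel{Tuple{Float64, Vector{Float64}}}(4)
    @async begin
        for t in times
            put!(buffer, (t, fill(t, 4)))  # placeholder for read + remap
        end
        close(buffer)
    end
    return buffer
end

# The consumer (the time stepper) takes slices in order while the
# producer keeps replenishing the buffer in the background
for (t, slice) in make_prefetching_reader(0.0:1.0:3.0)
    # use `slice` at time `t`
end
```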

Implementing different interpolation schemes

A simple way to implement different interpolation schemes is to add an additional type layer that we can use to dispatch to different interpolation methods. We can define subtypes of a new abstract type AbstractInterpolationMethod:

struct NearestNeighbor <: AbstractInterpolationMethod end
struct LinearInterpolation <: AbstractInterpolationMethod end

With this approach, the previously defined interpolators would be augmented with a new field, method, which is an instance of one of the AbstractInterpolationMethod subtypes. Then, interpolate would become

interpolate(interpolator, point) = interpolate(interpolator, point, interpolator.method)

function interpolate(interpolator, point, method::NearestNeighbor)
    # implementation
end

function interpolate(interpolator, point, method::LinearInterpolation)
    # implementation
end

This requires implementing the same interpolation method for every AbstractInterpolator and does not allow composing different methods (though it is possible to customize the behavior of one of the methods by adding additional fields, e.g., for what to do with edge values). However, this approach poses no major challenges for GPUs and incurs no additional runtime cost (because everything is inferred from the type). Given that we expect to use linear interpolation for everything, and that supporting different interpolation methods is mostly to ease development and to provide pathways for future expansion, I believe that this is the best way forward.
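A self-contained sketch of the dispatch layer, specialized to the 1D case (the Interp1DExample name is illustrative; the method types repeat the definitions above, and the nearest-neighbor branch is simplified to take the next sample at or after the point):

```julia
abstract type AbstractInterpolationMethod end
struct NearestNeighbor <: AbstractInterpolationMethod end
struct LinearInterpolation <: AbstractInterpolationMethod end

struct Interp1DExample{M <: AbstractInterpolationMethod}
    xs::Vector{Float64}
    ys::Vector{Float64}
    method::M
end

# The method is part of the type, so this extra hop is resolved at
# compile time and costs nothing at runtime
interpolate(itp::Interp1DExample, point) = interpolate(itp, point, itp.method)

function interpolate(itp::Interp1DExample, point, ::NearestNeighbor)
    # Simplified: takes the next sample at or after `point`
    index = clamp(searchsortedfirst(itp.xs, point), 1, length(itp.xs))
    return itp.ys[index]
end

function interpolate(itp::Interp1DExample, point, ::LinearInterpolation)
    i = clamp(searchsortedfirst(itp.xs, point), 2, length(itp.xs))
    x1, x2 = itp.xs[i - 1], itp.xs[i]
    y1, y2 = itp.ys[i - 1], itp.ys[i]
    return y1 + (y2 - y1) / (x2 - x1) * (point - x1)
end

itp = Interp1DExample([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], LinearInterpolation())
interpolate(itp, 0.5)  # → 5.0
```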

juliasloan25 commented:
Thank you for writing this up! It's very nice to have the use cases and implementation ideas in one place.

We should also update our file reading so that we only read in the dates of a file that will be used for the simulation. Currently we read and regrid all available dates in a file, even if they won't be used. It seems like this would be handled by the read_before/read_after implementations, but maybe we want to do this for the 1D case as well.

@juliasloan25 juliasloan25 changed the title Add GPU compatibility Add GPU compatibility in ClimaLSM Jan 18, 2024