You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As of 15 Jan 2024, the key missing elements to ensure that ClimaLSM is GPU compatible are file processing and interpolations. This issue is now turned into a Software Design Issue to describe how these will be implemented.
The problem
ClimaLSM reads in and evaluates various data for boundary conditions, forcings, parameters, and so on. As of now, the data is mostly used to prescribe atmosphere, radiation, and snow/precipitation. For a single site, files are read, data is preprocessed (e.g., to deal with missing elements), and 1D splines are produced. Then, ClimaLSM is run assuming that we can simply call atmos_T(t) to obtain the temperature of the atmosphere at time t.
This pattern works fine for single-site CPU runs, but has to be generalized for global/GPU runs. The main reasons for this are:
Splines are not transpently GPU-compatible,
The amount of data that has to be read for global runs can vastly exceed the system limit, and it is not possible to read everything before a run starts.
We want to support (with the same user/developer-interface for CPUs and GPUs)
analytic functions (of arbitrary dimensions),
1D temporal data read from files (e.g., for FLUXNET data),
2 + 1D spatiotemporal data read from files (e.g., ERA5 data).
We also want to support arbitrary interpolation schemes. In this way, we can start for the more difficult cases with the simple nearest neighbor interpolation (0th order), and later implement a linear interpolation (1st order), while leaving a pathway to implement higher order interpolation schemes in the future (if needed). This might also be the place where we implement more complex behaviors, e.g., using the same yearly data for a multi-year run.
The proposed solution
The proposed solution consists in defining a single way to deal with all the aformentioned cases with multiple backends. We propose to define a new abstract class AbstractInterpolator and a new function to evaluate the interpolator. The function might look like
""" interpolate(interpolator::AbstractInterpolator, point)Evaluate `interpolator` on the given `point`. """functioninterpolate(interpolator::AbstractInterpolator, point) end
Or (more likely)
""" interpolate!(destination, interpolator::AbstractInterpolator, point)Evaluate `interpolator` on the given `point` and write the output to `destination`. """functioninterpolate!(destination, interpolator::AbstractInterpolator, point) end
This second option does not look as clean, but it provides greater flexibility, which will probably be required to avoid scalar indexing (for GPUs). We will explore both options.
interpolate/interpolate! will be the only way to evaluate external data (ie, not generated in running the model). ClimaLSM will be updated to use this everywhere.
It would be nice if the 1D interpolators could naturally support being broadcasted over a surface to use the same prescription over the globe. That is, if we are working with 1D data but running on a sphere, the 1D data should be used for all the sites on the sphere. This capability will probably require going with interpolate! as opposed to interpolate.
AbstractInterpolator will have three concrete implementation, corresponding to the three use cases described above. Constructors will automatically identify which of the three implementations to use based on the input data, and the existance of three concrete structs should be considered an implementation detail.
A little bit more details/Sketch of implementation
AnalyticInterpolator
AnalyticInterpolator is the simplest. It stores one julia function that is expected to be evaluated on a point (in space and/or time).
Special care has to be put in checking for type consistency (in particular with regards with Float32 and Float64).
TemporalInterpolator1D
With TemporalInterpolator1D, we can leverage the fact that the amount of data is small. With small amounts of data, we can read everything at the beginning and store the backing arrays to memory (on GPUs as well).
xs and ys can be Arrays or CuArrays. They have to be preprocessed, and xs will probably have to be sorted.
As for interpolate, we can have something like (for nearest-neighbor interpolation)
functioninterpolate(interpolator::TemporalInterpolator1D, point)
index =searchsortedfirst(interpolator.xs, point)
return interpolator.ys[index]
end
TemporalInterpolator3D
TemporalInterpolator3D is the most complex one. For TemporalInterpolator3D, we assume that we can only hold in memory a limited amount of data. TemporalInterpolator3D will require building a flexible online file/data processor. Most of challenge in implementing TemporalInterpolator3D is developing an improved FileReader object (that is GPU compatible and works on Fields). This is aligned with the goals in #398 (cc @juliasloan25).
This interpolator hands off the hard work the an XFileReader object (the X at the beginning is just to distinguish it in this SDI from our current implementation).
It is important to note that with this design we isolated the challenges in
the implementation of read_before/after (which are tricky both for GPU compatibility and to obtain reasonable performance). As long as we establish a contract between TemporalInterpolator3D and XFileReader with a clear interface, we can tackle this problem incrementally.
Step 1: read could just read the data on-demand everytime it is asked. This means, handing over the control to the CPU, reading the file from disk, remap it to the grid (if a remapped version is not already available in a cache), and copy it over to the GPU. XFileReader can store the latest version read (if we start with nearest neighbor interpolation, this is a good starting point).
Step 2: read could buffer some data. When asked to provide data, XFileReader checks the time requested is in the range that has already been read, if not read and process the next N data slices. Once again, this means, handing over the control to the CPU, reading the file from disk, remap N slices to the grid (if a remapped version is not already available in a cache), and copy them to the GPU.
Given all the I/O, this leads to
Step 3: XFileReader works asynchronously. XFileReader has a buffer of data to fill. This buffer is constantly being replenished by the CPU when it is idle (this is relevant only for GPU runs).
Implementing different interpolation schemes
A simple way to implement different interpolation schemes is to add an additional type layer that we can use to dispatch different interpolation methods. We can define types AbstractInterpolationMethod:
With this approach, the previously define interpolators would be augmented with a new field method, which is an instance of one of the AbstractInterpolationMethod types. Then, interpolate would become
interpolate(interpolator, point) =interpolate(interpolator, point, interpolator.method)
functioninterpolate(interpolator, point, method::NearestNeighbor)
# implementationendfunctioninterpolate(interpolator, point, method::LinearInterpolation)
# implementationend
This requires implementing the same interpolation method for every AbstractInterpolator and does not allow composing different methods (it is possible to customize the behavior of one of the methods by adding additional fields, e.g., what to do with edge values). However, this method poses no major challenges for GPUs and poses no addition runtime cost (because everything is inferred from the type). Given that we expect to use linear interpolation for everything, and supporting different interpolation methods is mostly to ease development and to provide pathways for future expansions, I believe that this is the best way forward.
The text was updated successfully, but these errors were encountered:
Thank you for writing this up! It's very nice to have the use cases and implementation ideas in one place.
We should also update our file reading so that we only read in the dates of a file that will be used for the simulation. Currently we read and regrid all available dates in a file, even if they won't be used. It seems like this would be handled by the read_before/read_after implementations, but maybe we want to do this for the 1D case as well
juliasloan25
changed the title
Add GPU compatibility
Add GPU compatibility in ClimaLSM
Jan 18, 2024
GPU compatibility
As of 15 Jan 2024, the key missing elements to ensure that ClimaLSM is GPU compatible are file processing and interpolations. This issue is now turned into a Software Design Issue to describe how these will be implemented.
The problem
ClimaLSM
reads in and evaluates various data for boundary conditions, forcings, parameters, and so on. As of now, the data is mostly used to prescribe atmosphere, radiation, and snow/precipitation. For a single site, files are read, data is preprocessed (e.g., to deal with missing elements), and 1D splines are produced. Then,ClimaLSM
is run assuming that we can simply callatmos_T(t)
to obtain the temperature of the atmosphere at timet
.This pattern works fine for single-site CPU runs, but has to be generalized for global/GPU runs. The main reasons for this are:
We want to support (with the same user/developer-interface for CPUs and GPUs)
We also want to support arbitrary interpolation schemes. In this way, we can start for the more difficult cases with the simple nearest neighbor interpolation (0th order), and later implement a linear interpolation (1st order), while leaving a pathway to implement higher order interpolation schemes in the future (if needed). This might also be the place where we implement more complex behaviors, e.g., using the same yearly data for a multi-year run.
The proposed solution
The proposed solution consists in defining a single way to deal with all the aformentioned cases with multiple backends. We propose to define a new abstract class
AbstractInterpolator
and a new function to evaluate the interpolator. The function might look likeOr (more likely)
This second option does not look as clean, but it provides greater flexibility, which will probably be required to avoid scalar indexing (for GPUs). We will explore both options.
interpolate
/interpolate!
will be the only way to evaluate external data (ie, not generated in running the model).ClimaLSM
will be updated to use this everywhere.It would be nice if the 1D interpolators could naturally support being broadcasted over a surface to use the same prescription over the globe. That is, if we are working with 1D data but running on a sphere, the 1D data should be used for all the sites on the sphere. This capability will probably require going with
interpolate!
as opposed tointerpolate
.AbstractInterpolator
will have three concrete implementation, corresponding to the three use cases described above. Constructors will automatically identify which of the three implementations to use based on the input data, and the existance of three concrete structs should be considered an implementation detail.A little bit more details/Sketch of implementation
AnalyticInterpolator
AnalyticInterpolator
is the simplest. It stores one julia function that is expected to be evaluated on a point (in space and/or time).As for
interpolate
, the implementation is simplySpecial care has to be put in checking for type consistency (in particular with regards with Float32 and Float64).
TemporalInterpolator1D
With
TemporalInterpolator1D
, we can leverage the fact that the amount of data is small. With small amounts of data, we can read everything at the beginning and store the backing arrays to memory (on GPUs as well).xs
andys
can beArray
s orCuArray
s. They have to be preprocessed, andxs
will probably have to be sorted.As for
interpolate
, we can have something like (for nearest-neighbor interpolation)TemporalInterpolator3D
TemporalInterpolator3D
is the most complex one. ForTemporalInterpolator3D
, we assume that we can only hold in memory a limited amount of data.TemporalInterpolator3D
will require building a flexible online file/data processor. Most of challenge in implementingTemporalInterpolator3D
is developing an improvedFileReader
object (that is GPU compatible and works onField
s). This is aligned with the goals in #398 (cc @juliasloan25).This interpolator hands off the hard work the an
XFileReader
object (theX
at the beginning is just to distinguish it in this SDI from our current implementation).From the point of view of
TemporalInterpolator3D
,XFileReader
has to transparently provide the data it is asked.A linear interpolation method might look like
It is important to note that with this design we isolated the challenges in
the implementation of
read_before/after
(which are tricky both for GPU compatibility and to obtain reasonable performance). As long as we establish a contract betweenTemporalInterpolator3D
andXFileReader
with a clear interface, we can tackle this problem incrementally.Step 1:
read
could just read the data on-demand everytime it is asked. This means, handing over the control to the CPU, reading the file from disk, remap it to the grid (if a remapped version is not already available in a cache), and copy it over to the GPU.XFileReader
can store the latest version read (if we start with nearest neighbor interpolation, this is a good starting point).Step 2:
read
could buffer some data. When asked to provide data,XFileReader
checks thetime
requested is in the range that has already been read, if not read and process the nextN
data slices. Once again, this means, handing over the control to the CPU, reading the file from disk, remap N slices to the grid (if a remapped version is not already available in a cache), and copy them to the GPU.Given all the I/O, this leads to
Step 3:
XFileReader
works asynchronously.XFileReader
has a buffer of data to fill. This buffer is constantly being replenished by the CPU when it is idle (this is relevant only for GPU runs).Implementing different interpolation schemes
A simple way to implement different interpolation schemes is to add an additional type layer that we can use to dispatch different interpolation methods. We can define types
AbstractInterpolationMethod
:With this approach, the previously define interpolators would be augmented with a new field
method
, which is an instance of one of theAbstractInterpolationMethod
types. Then,interpolate
would becomeThis requires implementing the same interpolation method for every
AbstractInterpolator
and does not allow composing different methods (it is possible to customize the behavior of one of the methods by adding additional fields, e.g., what to do with edge values). However, this method poses no major challenges for GPUs and poses no addition runtime cost (because everything is inferred from the type). Given that we expect to use linear interpolation for everything, and supporting different interpolation methods is mostly to ease development and to provide pathways for future expansions, I believe that this is the best way forward.The text was updated successfully, but these errors were encountered: