-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce TimeVaryingInput2D
#465
Conversation
70d645e
to
e6232a9
Compare
# Delete testing directory and files | ||
rm(regrid_dir_static; recursive = true, force = true) | ||
rm(regrid_dir_temporal; recursive = true, force = true) | ||
# using Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot to uncomment?
@@ -87,17 +87,8 @@ tf = 50 * 86400; | |||
Δt = 3600.0; | |||
|
|||
# Construct albedo parameter object using temporal map | |||
# Use separate regridding directory for CPU and GPU runs to avoid race condition | |||
device_suffix = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise here
src/shared_utilities/DataHandling.jl
Outdated
|
||
""" | ||
|
||
Note: It is best to have one file per variable with all the temporal data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may be a little inconvenient/ add a step - all the reanalysis data is in one file to start with.
import ..FileReader: NCFileReader | ||
import ..Regridder | ||
import ..Regridder: | ||
AbstractRegridder, TempestRegridder, regrid, AVAILABLE_REGRIDDERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont see anything in the style guide for all caps in names - why is AVAILABLE_REGRIDDERS capitalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah - I see some languages use that for constants. it doesnt seem like Julia follows that notation: https://docs.julialang.org/en/v1/base/base/#const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALL_CAPS is mostly a way to tell the reader that the variable is a "global fixed parameter", with the emphasis being in "global". We don't have to do this, but I personally find it an useful additional qualifier.
(const
in Julia doesn't even ensure that the value doesn't change.)
src/shared_utilities/DataHandling.jl
Outdated
Note: It is best to have one file per variable with all the temporal data. | ||
|
||
Assumptions: | ||
- There is only one file with all the data in time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to contradict the above Note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably poorly worded: instead of having multiple files with ranges of dates, it is best to have one single file with alle the dates ("all the data (for a given variable) in time").
TSTART <: AbstractFloat, | ||
DATES <: AbstractArray{<:Dates.DateTime}, | ||
DIMS, | ||
TIMES <: AbstractArray{<:AbstractFloat}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this last time, butTIMES <: AbstractArray{<:AbstractFloat}
is the same as AbstractArray{<:AbstractFloat}
in this context, right?
If so, I dont see that we need to have these types as parametric types.
end | ||
|
||
function next_time(data_handler::DataHandler, date::Dates.DateTime) | ||
if date in data_handler.available_dates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth using the same code as is inside the functionnext_time(data_handler::DataHandler, time::AbstractFloat)
for clarity (otherwise the reader wonders if next_date
does something different than next_time
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second this! I think readability is more important than being concise here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the ClimaUtilities version.
src/shared_utilities/DataHandling.jl
Outdated
|
||
snapshot = get!(data_handler._cached_regridded_fields, date) do | ||
# Add here the arguments when we have more regridders | ||
regrid_args = Dict(:TempestRegridder => (date,)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused why the regridding needs the date.
I am a little confused by the notation also. we get the value of data_handler._cached_regridded_fields
corresponding do the date, but then what is that called in the do
block/where is it used in the do
block? is that what date
then refers to (instead of the actual date)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TempestRegridder regrids everything ahead of time, saves everything into files, and then read the a filename that is constructed with the date. That's why we need the date.
This block does the following: if date
is a key in data_handler._cached_regridded_fields
, then return the associated value. If date
is not yet a key, then execute the do-block and add the return value as key. So, we change data_handler._cached_regridded_fields
to add a new date
with the regridded value.
src/shared_utilities/DataHandling.jl
Outdated
snapshot = get!(data_handler._cached_regridded_fields, date) do | ||
# Add here the arguments when we have more regridders | ||
regrid_args = Dict(:TempestRegridder => (date,)) | ||
regrid(data_handler.regridder, regrid_args[regridder_type]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we need to regrid frame by frame, or is a future option to regrid batch by batch (many frames at once)? If regridding is matrix operations by a static weight matrix, we could process many snapshots at once?
src/shared_utilities/FileReader.jl
Outdated
date_idx0 = | ||
[argmin(abs.(Dates.value(date_start) .- Dates.value.(all_dates[:])))] | ||
# NOTE: We are hardcoding a few things here! | ||
dimensions = dataset["lon"][:], dataset["lat"][:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this not what dimensions::DIMS
is for? if not, they could also pass in the dimension_names. then we dont need to hardcode
src/shared_utilities/FileReader.jl
Outdated
|
||
args = (file_info, file_states, sim_info) | ||
_cached_reads = Dict{Dates.DateTime, Array}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the future, this is where we would pre-allocate the space for the batch of data we would read in?
@@ -81,19 +81,54 @@ When passing single-site data | |||
When a `times` and `vals` are passed, `times` have to be sorted and the two arrays have to | |||
have the same length. | |||
|
|||
======= | |||
When the input is a function, the signature of the function can be `func(time, args...; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!!
|
||
For example: | ||
```julia | ||
CO2fromp(time, Y, p) = p.atmos.co2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but maybe this isnt a good example? is this for the analytic time varying input case?
|
||
Check if the given `time` is in the range of definition for `itp`. | ||
""" | ||
function Base.in(time, itp::InterpolatingTimeVaryingInput0D) | ||
function Base.in(time, itp::InterpolatingTimeVaryingInput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is. It is defined at the beginning of this file:
InterpolatingTimeVaryingInput =
Union{InterpolatingTimeVaryingInput0D, InterpolatingTimeVaryingInput2D}
49698ca
to
f9bdd2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all this work Gabriele! I left many comments, and I have a few more here:
- We're opening Netcdf files and keeping them open throughout the simulation, then closing at the end. What will happen if the simulation crashes?
- There are some things left to add in future PRs that we should open issues for:
- allow regridder-specific args: mono, regrid_dir, dates to regrid
- regrid only simulation dates (related to dates arg in previous point)
- add FileReaders cache
- add DataHandling cache cleanup function/implement LRU cache
- please update the NEWS.md file too!
@@ -1,6 +1,6 @@ | |||
# This file is machine-generated - editing it directly is not advised | |||
|
|||
julia_version = "1.10.1" | |||
julia_version = "1.10.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can just update to 1.10.2 and use that across land/atmos/coupler now
uuid = "d414da3d-4745-48bb-8d80-42e94e092884" | ||
version = "0.13.2" | ||
version = "0.13.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this downgraded?
uuid = "d414da3d-4745-48bb-8d80-42e94e092884" | ||
version = "0.13.2" | ||
version = "0.13.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, packages are downgraded. I just updated everything today so you probably don't need to modify the manifests
- CPU/GPU communication can be a bottleneck | ||
|
||
The `DataHandling` takes the divide and conquer approach: the various core tasks and | ||
features and split into other independent modules (chiefly `FileReaders`, and `Regridders`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "and" -> "are"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the ClimaUtilities version
- `available_times` (`available_dates`): to list all the `times` (`dates`) over which the | ||
data is defined. | ||
- `previous_time(time/date)` (`next_time(time/date)`): to obtain the time of the snapshot | ||
before the given `time` or `date`. This can be used to compute the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could change "before" to "before/after" to explain both cases of previous and next times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the ClimaUtilities version
PATH = temporal_data_path | ||
varname = "sp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these could be moved outside of the for loop for consistency with the previous test (but it doesn't really matter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ClimaUtilities
@test ncreader.dimensions[1] == nc["lon"][:] | ||
@test ncreader.dimensions[2] == nc["lat"][:] | ||
|
||
close(ncreader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could check that OPEN_NCFILES
is updated correctly in the case without time too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't test anything that we are not testing in the case with time
end | ||
|
||
@testset "Temporal TimeVaryingInput 1D" begin | ||
@testset "InteprolatingTimeVaryingInput0D" begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Inteprolating -> Interpolating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ClimaUtilities
@@ -102,3 +112,106 @@ end | |||
end | |||
end | |||
end | |||
|
|||
@testset "InteprolatingTimeVaryingInput2D" begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Inteprolating -> Interpolating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ClimaUtilities
TimeVaryingInputs.evaluate!(dest, input_nearest, target_time) | ||
|
||
@test isequal( | ||
Array(parent(dest)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to use Array here but not in the "on node" case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the Array there too
Okay, this is a massive commit (and I feel bad about it). This commit splits PrescribedDataStatic and PrescibedDataTemporal in multiple independent modules. Before, PrescribedData was: - reading files - regridding - discovering dates available - interpolating - taking care of edge cases (e.g., what to do when reading outside of where the date is outside of the definition range) This commit splits all the different functions in different independent modules so that they can be more easily maintained and upgraded. This commit also introduces InterpolationsRemapper, which uses Interpolations.jl to do online/ and distributed remapping from files. This will soon be needed to efficiently support reading and buffering reads for GPU runs.
Since all the files are opened in read-only mode, it shouldn't be too much of a problem if a simulation crashes, we should start doing error handling and have a cleanup function
Opened an issue in ClimaUtiltiees
This is implemented.
Opened an issue in ClimaUtilities |
Implemented in #560 |
Okay, this is a massive commit (and I feel bad about it).
This commit splits PrescribedDataStatic and PrescibedDataTemporal in multiple independent modules.
Before, PrescribedData was:
This commit splits all the different functions in different independent modules so that they can be more easily maintained and upgraded.
This will soon be needed to efficiently support reading and buffering reads for GPU runs.
In all of this, we are still using ClimaCoreTempestRemap and doing everything ahead of time. At the moment, we are not efficiently reusing any informtion across variables that share the same remapping weights, but this can be added in the future.