Neural snow depth model #345

a-charbon · 2023-10-05T01:49:27Z

Purpose

Opening a request to move neural-network-based snow depth model into the main repository. Developing tests and tutorials for public usage as well as restructuring for resubmission of associated paper.

To-do

Write tests for code
Develop tutorials (paper figures)
Integrate into existing snow model for usage in next-steps model
Check existing guidelines/policies against submitted code.

Content

four tools files (data, model, test/train, display) for usage (will eventually be moved to an appropriate "src" folder)
code for generating the dataset used for the paper
example code for training/validation pipeline (likely to become part of an example/tutorial)

I have read and checked the items on the review checklist.

experiments/standalone/neural_snow/data_tools.jl

experiments/standalone/neural_snow/data_retriever.jl

experiments/standalone/neural_snow/data_tools.jl

experiments/standalone/neural_snow/data_retriever.jl

experiments/standalone/neural_snow/display_tools.jl

experiments/standalone/neural_snow/train_test_tools.jl

experiments/standalone/neural_snow/model_tools.jl

kmdeck

great work, Andy! the code is clean and well documented. I left a number of small comments, but the more major things are:

think about which files should be modules that go into src (DataTools, ModelTools ?), ideally these are fairly agnostic about the specific dataset in hand. Functions in src ideally are tested with unit tests.
It might make more sense for DataTools to be a local file in experiments. but we need the model code in src to evaluate it as part of ClimaLSM.
use existing tools in MLUtils when possible (for train test split, other standard ML tasks)
in experiments or tutorials, can show the results of training (test_run.jl) using functionality from display_tools.jl, analysis_tools.jl.
dont need to run data_retriever in the experiment itself, but can include this file (and DataTools.jl?) saying that this would be run to obtain the preprocessed/transformed data. Then, the completely clean/transformed data is saved in box & used in the training script, which is run in experiments.

What do you think?

experiments/standalone/neural_snow/analysis_tools.jl

experiments/standalone/neural_snow/display_tools.jl

docs/src/APIs/Snow.md

docs/tutorials/Snow/data_tutorial.jl

kmdeck · 2023-10-26T23:54:42Z

src/standalone/Snow/neural_snow/ModelTools.jl

+Return the trainable weights for the developed neural model.
+
+# Arguments
+- `model::Chain`: the neural model to be used.


IIUC, Chain is a Flux type, the neural model is of type Chain. Can the linear model be expressed as a Chain also, but with fewer layers/fixed width of the layer? Does it need its own type?

If so, we wouldnt need two methods for each of these, right? get_model_ps, evaluate, etc.

there's no need to get the model parameters for the linear model, as the linear model as an object is just the coefficient/parameter list (it's trained directly as A\b upon creation and not trained in a trainmodel! call, calling a created linear model just returns the vector of coefficients for the linear model), there is an evaluate call already written - that is, this function is specific to neural model types and not to linear models (it's not like "evaluate" etc, in the same way settimescale! is specific to the neural model)

src/standalone/Snow/neural_snow/ModelTools.jl

kmdeck · 2023-10-27T00:00:10Z

src/standalone/Snow/neural_snow/DataTools.jl

@@ -0,0 +1,549 @@
+module DataTools


I am wondering if this particular module belongs in src vs in the docs/tutorials/Snow folder. The reason why is because this functionality is needed to create the dataset used in your paper, and has some handy function for processing data, but, nothing in here is required for evaluating a trained model to get the snow depth at the next step - which is what we need in the land model simulation (all that is in ModelTools.jl, which definitely belongs in src).

What do you think?

I figured I was helping future work in case CliMA eventually wanted other types of SNOTEL data in training other models (albedo, runoff, etc., the module has all this functionality) and therefore wasn't specifically tied to this particular tutorial/use-case, and therefore should be in a location intended for wider usage. If you disagree I have no issues moving it to the tutorial folder since others can always go grab the tools from there anyway.

kmdeck

Mostly minor comments - thanks for putting this together! The tutorials seem really clear. Let me know if you would like to discuss further.

src/standalone/Snow/neural_snow/ModelTools.jl

test/standalone/Snow/tool_tests.jl

kmdeck

looks good Andy. Once you squash/rebase, it should show as a single commit. Then all the tests should pass, and we can merge it in.

I just added a couple comments re: making sure the float type is maintained - it might be good to add a couple tests for this.

updated manifests new tutorial folder structure format changes changed file referencing

a-charbon requested a review from kmdeck October 5, 2023 01:49