provides an opportunity to analyze large-scale baseflow trends under global change 🔥
To fill the gaps in time-series baseflow datasets, we use long short-term memory (LSTM) networks, a deep learning approach, to develop a monthly baseflow dataset.
To improve training across basins, we compared the standard LSTM with four variant architectures that take additional static catchment properties as input. Three of the variants (Joint, Front, and EA-LSTM) outperform the standard LSTM, with a median Kling-Gupta efficiency across basins greater than 0.85.
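One of the variants, EA-LSTM (entity-aware LSTM), handles static catchment attributes by computing the input gate once from the static features alone, while the forget, cell, and output gates still see the dynamic time series. A minimal sketch of such a cell in PyTorch (layer names and sizes are illustrative, not the repository's implementation):

```python
import torch
import torch.nn as nn

class EALSTMCell(nn.Module):
    """Entity-aware LSTM cell sketch: the input gate depends only on
    static attributes and is computed once per sequence; the remaining
    gates are driven by the dynamic inputs at each time step."""

    def __init__(self, dyn_size: int, static_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # static input gate: one linear layer over catchment attributes
        self.input_gate = nn.Linear(static_size, hidden_size)
        # dynamic gates: forget, cell candidate, output (3 * hidden)
        self.dyn = nn.Linear(dyn_size + hidden_size, 3 * hidden_size)

    def forward(self, x_dyn: torch.Tensor, x_static: torch.Tensor):
        # x_dyn: (seq_len, batch, dyn_size); x_static: (batch, static_size)
        batch = x_dyn.size(1)
        h = x_dyn.new_zeros(batch, self.hidden_size)
        c = x_dyn.new_zeros(batch, self.hidden_size)
        # input gate fixed over the whole sequence
        i = torch.sigmoid(self.input_gate(x_static))
        for x_t in x_dyn:
            f, g, o = self.dyn(torch.cat([x_t, h], dim=-1)).chunk(3, dim=-1)
            f, o = torch.sigmoid(f), torch.sigmoid(o)
            g = torch.tanh(g)
            c = f * c + i * g          # static gate controls what enters the cell
            h = o * torch.tanh(c)
        return h, c
```

Because the static gate is shared across all time steps, the cell learns a per-basin filter on new information, which is one way to condition a single model on many basins.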
Using the Front LSTM, we produced a monthly baseflow dataset at 0.25° spatial resolution across the contiguous United States from 1981 to 2020, which can be downloaded from the release page.
├── configs <- Hydra configuration files
│ ├── constant <- Folder paths and constants
│ ├── dataset <- Configs of Pytorch dataset
│ ├── datasplit <- Split dataset into train and test
│ ├── hydra <- Configs of Hydra logging and launcher
│ ├── loss <- Configs of loss function
│ ├── model <- Configs of Pytorch model architectures
│ ├── optimizer <- Configs of optimizer
│ ├── trainer <- Configs of validation metrics and trainer
│ ├── tuner <- Configs of Optuna hyperparameter search
│ └── config.yaml <- Main project configuration file
│
├── data <- Baseflow, time series, and static properties
│
├── logs <- Logs generated by Hydra and PyTorch loggers
│
├── saved <- Saved evaluation results and model parameters
│
├── src
│ ├── datasets <- PyTorch datasets
│ ├── datasplits <- Dataset splitter for train and test
│ ├── models <- PyTorch model architectures
│ ├── trainer <- Class managing training process
│ ├── utils <- Utility scripts for metric logging
│ ├── evaluate.py <- Model evaluation pipelines
│ ├── prepare.py <- Data preparation pipelines
│ └── simulate.py <- Simulate gridded baseflow
│
├── run.py <- Run pipeline with chosen configuration
│
├── main.py <- Main process for the whole project
│
├── .gitignore <- List of files/folders ignored by git
├── requirements.txt <- File for installing Python dependencies
├── LICENSE
└── README.md
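The `configs` layout above follows Hydra's config-group pattern: each subfolder is a group, and the main file selects one option per group. A hypothetical sketch of what `configs/config.yaml` might contain (group choices and keys are illustrative, not the repository's actual defaults):

```yaml
defaults:
  - constant: default
  - dataset: default
  - datasplit: default
  - loss: default
  - model: lstm
  - optimizer: adam
  - trainer: default
  - _self_

seed: 42
```

Any of these choices can then be overridden from the command line, e.g. `model=front`.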
from hydra import compose, initialize
from src import prepare

# cfg is the Hydra config composed from configs/config.yaml
with initialize(config_path='configs'):
    cfg = compose(config_name='config')

# download data from ERA5 and Google Earth Engine
prepare(cfg['constant'])
# detailed settings are in optuna.yaml
python run.py -m tuner=optuna
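The search itself is driven by Hydra's Optuna sweeper plugin (`hydra-optuna-sweeper`). A hypothetical sketch of what `tuner/optuna.yaml` could look like (parameter names and ranges are illustrative, not the repository's settings):

```yaml
# requires the hydra-optuna-sweeper plugin
defaults:
  - override /hydra/sweeper: optuna

hydra:
  sweeper:
    direction: maximize   # maximize validation KGE
    n_trials: 50
    params:
      model.hidden_size: choice(64, 128, 256)
      optimizer.lr: interval(1e-4, 1e-2)
```

With `-m`, Hydra launches one run per sampled configuration and Optuna picks the next trial from the results so far.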
# evaluate Front LSTM using test_size=0.2
python run.py -m model=front dataset.eco=CPL,NAP,NPL
# train Front LSTM using test_size=0
python run.py -m model=front datasplit=full dataset.eco=CPL,NAP,NPL
from src import simulate
# load the trained model for each ecoregion
checkpoint = 'saved/train/front/CPL/models/model_latest.pth'
simulate(checkpoint)
- Xie, J., Liu, X., Tian, W., Wang, K., Bai, P., & Liu, C. (2022). Estimating Gridded Monthly Baseflow From 1981 to 2020 for the Contiguous US Using Long Short-Term Memory (LSTM) Networks. Water Resources Research, 58(8), e2021WR031663. https://doi.org/10.1029/2021WR031663