This repository contains the code used for the paper ["Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction"](https://arxiv.org/abs/2404.19630).
The code was developed by the authors of the preprint: Jared Willard, Shashank Subramanian, Peter Harrington, Ankur Mahesh, Travis O'Brien, and William Collins
SwinV2_Weather is a global data-driven weather forecasting model that provides accurate short- to medium-range global predictions at 0.25° resolution using a minimally modified SwinV2 transformer. SwinV2_Weather outperforms the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS) deterministic forecast, a state-of-the-art Numerical Weather Prediction (NWP) model, at nearly all lead times for critical large-scale variables such as geopotential height at 500 hPa (z500), 2-meter temperature (t2m), and 10-meter wind speed (u10m).
SwinV2_Weather is based on the original Swin Transformer V2 architecture proposed in Liu et al. [2022], and we adapt the Hugging Face implementation of the model for this repository.
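As a minimal sketch, a SwinV2 backbone with 73 input channels and depth 12 might be instantiated through the Hugging Face `transformers` library as shown below. The hyperparameter values here are illustrative assumptions, not the paper's settings; the actual configuration for each model is recorded in its `hyperparams.yaml` in the model registry described further down.

```python
# Illustrative only: instantiating a SwinV2 backbone via Hugging Face transformers.
# All hyperparameter values below are assumptions; see hyperparams.yaml in
# /swin_model_registry/ for the settings actually used in the preprint.
from transformers import Swinv2Config, Swinv2Model

config = Swinv2Config(
    num_channels=73,   # 73 ERA5 input channels
    image_size=224,    # illustrative; the weather model operates on the 0.25-degree lat/lon grid
    patch_size=4,
    embed_dim=96,      # illustrative
    depths=[12],       # single stage of depth 12 (matching "depth12" in the model names)
    num_heads=[8],     # illustrative
    window_size=7,     # illustrative
)
model = Swinv2Model(config)
```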
ERA5 Dataset Download (NVIDIA)
The model is trained on 73 channels of ERA5 reanalysis data for both single levels [ Hersbach 2018 ] and 13 different pressure levels [ Hersbach 2018 ], pre-processed and stored in HDF5 files. This data can be acquired from one of the previously mentioned links.
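As a quick sanity check after downloading, you can inspect one of the HDF5 files as sketched below. The dataset key `fields` and the `(time, channel, lat, lon)` layout are assumptions based on the common FourCastNet-style HDF5 format; adjust to match your files.

```python
# Sanity-check a downloaded ERA5 HDF5 file. The dataset key "fields" and the
# (time, channel, lat, lon) layout are assumptions; adjust if your files differ.
import h5py

with h5py.File("/path/to/train/1990.h5", "r") as f:
    print(list(f.keys()))          # expect something like ['fields']
    data = f["fields"]
    print(data.shape, data.dtype)  # expect roughly (num_timesteps, 73, 721, 1440)
```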
The `/swin_model_registry/` directory contains the following subdirectory structure for each model used in the preprint:

```
/swin_model_registry/
├── swin_73var_depth12_chweight_invar/
│   ├── global_means.npy   # precomputed stats for normalization
│   ├── global_stds.npy
│   ├── hyperparams.yaml   # model hyperparameter / config information
│   ├── metadata.json      # metadata used only for scoring in Earth2MIP
│   └── weights.tar        # model weights
```
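The precomputed stats are typically applied as a per-channel z-score normalization, as in the sketch below. The array shapes (broadcastable over channel, latitude, and longitude) are an assumption; check the `.npy` files themselves for the exact layout.

```python
# Sketch of z-score normalization using the precomputed registry stats.
# Shapes are assumed to broadcast over (channel, lat, lon); verify with .shape.
import numpy as np

means = np.load("swin_73var_depth12_chweight_invar/global_means.npy")  # e.g. (1, 73, 1, 1)
stds = np.load("swin_73var_depth12_chweight_invar/global_stds.npy")

def normalize(x):
    """Normalize a (batch, 73, lat, lon) array of raw ERA5 fields."""
    return (x - means) / stds
```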
Training configurations are set in `config/swin.yaml`. The following paths must be filled in by the user and should point to the data and stats downloaded in the steps above:
```yaml
swin_73var: &73var
  ...
  orography: !!bool False
  orography_path:        # full path to orography.h5 (used only if orography is True)
  exp_dir:               # directory to store training checkpoints and other output
  train_data_path:       # full path to /train/
  valid_data_path:       # full path to /test/
  inf_data_path:         # full path to /out_of_sample/ (not used during training)
  time_means_path:       # full path to time_means.npy
  global_means_path:     # full path to global_means.npy
  global_stds_path:      # full path to global_stds.npy
  time_diff_means_path:  # full path to time_diff_means.npy
  time_diff_stds_path:   # full path to time_diff_stds.npy
  landmask_path:         # full path to landmask.h5
  project: 'wandb_project'  # fill in your wandb project
  entity: 'wandb_entity'    # fill in your wandb entity
```
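To verify the paths resolve as intended before launching a run, you can load the config outside the training code as sketched below. This uses plain PyYAML, which resolves the `&73var` anchor automatically; the training scripts may use their own config loader, and the top-level key name here is taken from the excerpt above.

```python
# Quick inspection of config/swin.yaml outside the training code; the training
# scripts may use their own loader, so this is just for checking paths.
import yaml

with open("config/swin.yaml") as f:
    configs = yaml.safe_load(f)  # YAML anchors/aliases are resolved automatically

params = configs["swin_73var"]   # key name per the excerpt above; match your config
print(params["train_data_path"], params["global_means_path"])
```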
An example launch script for distributed data-parallel training on the Slurm-based HPC cluster Perlmutter is provided in `submit_batch.sh`.
For inference and scoring, we used Earth2MIP. A fork containing an implementation of our Swin model, along with directions for inference and scoring, can be found at https://github.com/jdwillard19/earth2mip-swin-fork/.
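For orientation, the sketch below shows the shape of an autoregressive rollout independent of Earth2MIP: each forecast step feeds the model's output back in as the next input. The checkpoint key name and model constructor are assumptions (marked hypothetical in the comments); for actual inference and scoring, follow the directions in the fork linked above.

```python
# Illustrative autoregressive rollout, independent of Earth2MIP. The checkpoint
# key name and model constructor below are assumptions, not the repo's API.
import torch

checkpoint = torch.load("swin_73var_depth12_chweight_invar/weights.tar", map_location="cpu")
# model = build_model(...)                          # hypothetical: construct from hyperparams.yaml
# model.load_state_dict(checkpoint["model_state"])  # key name is an assumption

def rollout(model, x, steps):
    """Roll the model forward: each prediction becomes the next input."""
    preds = []
    with torch.no_grad():
        for _ in range(steps):
            x = model(x)
            preds.append(x)
    return preds
```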
If you find this work useful, cite it using:
```bibtex
@misc{willard2024analyzing,
  title={Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction},
  author={Jared D. Willard and Peter Harrington and Shashank Subramanian and Ankur Mahesh and Travis A. O'Brien and William D. Collins},
  year={2024},
  eprint={2404.19630},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```