
Feature: add tests for meps dataset #38

Merged · 31 commits · Jun 4, 2024
Commits (31)
57edd74
added testing of loading data, creating graphs, and training model.
SimonKamuk May 22, 2024
92aa490
Merge branch 'mllam:main' into main
SimonKamuk May 22, 2024
4e17efb
added test to test name
SimonKamuk May 22, 2024
7fa7cdd
linting
SimonKamuk May 22, 2024
569d061
made create_mesh callable as python function with arguments.
SimonKamuk May 23, 2024
1ebe900
added github ci/cd for running tests with pytest
SimonKamuk May 23, 2024
0e96e88
removed coverage from test ci/cd
SimonKamuk May 23, 2024
2339ed0
fixed error in cicd
SimonKamuk May 23, 2024
5d3f834
removed astroid from requirements, causes codespell error, assuming i…
SimonKamuk May 23, 2024
8d733b7
simplified requirements
SimonKamuk May 23, 2024
7ee8821
removed commas in requirements
SimonKamuk May 23, 2024
9a5f83c
added downloading of test data from EWC using pooch
SimonKamuk May 24, 2024
c7d1d08
added pooch to requirements.txt
SimonKamuk May 24, 2024
2667b6c
updated test dataset
SimonKamuk May 24, 2024
0c7edd4
Disabled latex to enable running on github without having to install …
SimonKamuk May 27, 2024
9352949
only use latex if available
SimonKamuk May 27, 2024
4995de0
included change requests from leifdenby:
SimonKamuk May 28, 2024
d33180f
added comment
SimonKamuk May 28, 2024
fb72943
Merge branch 'mllam:main' into main
SimonKamuk May 30, 2024
e6c2c36
minor requested changes
SimonKamuk May 30, 2024
43558dc
Merge branch 'main' of github.com:SimonKamuk/neural-lam into feature_…
SimonKamuk May 30, 2024
3d77ac4
updated changelog, added cicd badges
SimonKamuk May 31, 2024
d390308
moved installation of torch-geometric from requirements to github tes…
SimonKamuk May 31, 2024
de4efba
changed name of unit test badge
SimonKamuk May 31, 2024
b0c4bed
added caching of test data
SimonKamuk Jun 3, 2024
4868db4
Merge branch 'main' of github.com:SimonKamuk/neural-lam into feature_…
SimonKamuk Jun 3, 2024
18e55a4
fix for caching
SimonKamuk Jun 3, 2024
4f75307
tried fix for caching test data
SimonKamuk Jun 3, 2024
aceb47c
updated changelog
SimonKamuk Jun 3, 2024
a6f8089
separated saving and restoring of cache
SimonKamuk Jun 3, 2024
561a26e
Merge branch 'main' of github.com:SimonKamuk/neural-lam into feature_…
SimonKamuk Jun 4, 2024
2 changes: 1 addition & 1 deletion .github/workflows/pre-commit.yml
@@ -1,4 +1,4 @@
-name: lint
+name: Linting

on:
# trigger on pushes to any branch, but not main
40 changes: 40 additions & 0 deletions .github/workflows/run_tests.yml
@@ -0,0 +1,40 @@
name: Unit Tests

on:
# trigger on pushes to any branch, but not main
push:
branches-ignore:
- main
# and also on PRs to main
pull_request:
branches:
- main

jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]

steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
pip install torch-geometric>=2.5.2
- name: Test with pytest
run: |
pytest -v -s
SimonKamuk marked this conversation as resolved.
Show resolved Hide resolved
- name: Cache data
uses: actions/cache@v4
with:
path: data
key: ${{ runner.os }}-meps-reduced-example-data-v0.1.0
restore-keys: |
${{ runner.os }}-meps-reduced-example-data-v0.1.0
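The commit history mentions fetching the reduced MEPS test data with pooch and caching it between CI runs. Conceptually the pattern is download-once-then-reuse, which the `actions/cache` step above extends across workflow runs. A rough sketch of that idea — the URL and file name are placeholders, and pooch additionally verifies a known hash, which this sketch omits:

```python
import os
import urllib.request


def fetch_test_data(url, target="data/meps_example_reduced.zip"):
    # Download only when the file is not already present, either from an
    # earlier local run or restored by the actions/cache step above.
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target
```

With the cache restored, `fetch_test_data` returns immediately without touching the network, which is what makes repeated CI runs fast.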
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [unreleased](https://github.com/joeloskarsson/neural-lam/compare/v0.1.0...HEAD)

### Added
- Added tests for loading the dataset, creating the graph, and training the model based on a reduced MEPS dataset stored on AWS S3, along with automatic running of tests on push/PR to GitHub. Added caching of test data to speed up running tests.
[\#38](https://github.com/mllam/neural-lam/pull/38)
@SimonKamuk

- Replaced `constants.py` with `data_config.yaml` for data configuration management
[\#31](https://github.com/joeloskarsson/neural-lam/pull/31)
5 changes: 5 additions & 0 deletions README.md
@@ -1,3 +1,6 @@
![Linting](https://github.com/mllam/neural-lam/actions/workflows/pre-commit.yml/badge.svg)
![Automatic tests](https://github.com/mllam/neural-lam/actions/workflows/run_tests.yml/badge.svg)

<p align="middle">
<img src="figures/neural_lam_header.png" width="700">
</p>
@@ -279,6 +282,8 @@ pre-commit run --all-files
```
from the root directory of the repository.

Furthermore, all tests in the ```tests``` directory are run by a GitHub Action upon pushing changes. Failure of any test will also block the push/PR.

# Contact
If you are interested in machine learning models for LAM, have questions about our implementation or ideas for extending it, feel free to get in touch.
You can open a github issue on this page, or (if more suitable) send an email to [joel.oskarsson@liu.se](mailto:joel.oskarsson@liu.se).
4 changes: 2 additions & 2 deletions create_mesh.py
@@ -153,7 +153,7 @@ def prepend_node_index(graph, new_index):
return networkx.relabel_nodes(graph, to_mapping, copy=True)


-def main():
+def main(input_args=None):
parser = ArgumentParser(description="Graph generation arguments")
parser.add_argument(
"--data_config",
@@ -186,7 +186,7 @@ def main():
default=0,
help="Generate hierarchical mesh graph (default: 0, no)",
)
-args = parser.parse_args()
+args = parser.parse_args(input_args)

# Load grid positions
config_loader = config.Config.from_file(args.data_config)
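The change above follows a common argparse pattern: `main` accepts an optional argument list so tests can invoke graph generation in-process instead of spawning a subprocess. A minimal sketch of the pattern — the flags below are illustrative, not the full `create_mesh.py` interface:

```python
from argparse import ArgumentParser


def main(input_args=None):
    # parse_args(None) reads sys.argv[1:], so CLI behaviour is unchanged;
    # a test can instead pass an explicit argument list.
    parser = ArgumentParser(description="Graph generation arguments (sketch)")
    parser.add_argument("--data_config", default="neural_lam/data_config.yaml")
    parser.add_argument("--hierarchical", type=int, default=0)
    return parser.parse_args(input_args)


# Callable from a test without touching sys.argv:
args = main(["--data_config", "tests/data_config.yaml"])
```

This is what lets the new pytest suite call `create_mesh.main(...)` directly on the reduced test dataset.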
239 changes: 239 additions & 0 deletions docs/notebooks/create_reduced_meps_dataset.ipynb
@@ -0,0 +1,239 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating meps_example_reduced\n",
"This notebook outlines how the small-size test dataset ```meps_example_reduced``` was created based on the slightly larger dataset ```meps_example```. The zipped up datasets are 263 MB and 2.6 GB, respectively. See [README.md](../../README.md) for info on how to download ```meps_example```.\n",
"\n",
"The dataset was reduced in size by reducing the number of grid points and variables.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Standard library\n",
"import os\n",
"\n",
"# Third-party\n",
"import numpy as np\n",
"import torch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The number of grid points was reduced to 1/4 by halving the number of coordinates in both the x and y direction. This was done by removing a quarter of the grid points along each outer edge, so the center grid points would stay centered in the new set.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load existing grid\n",
"grid_xy = np.load('data/meps_example/static/nwp_xy.npy')\n",
"# Get slices in each dimension by cutting off a quarter along each edge\n",
"num_x, num_y = grid_xy.shape[1:]\n",
"x_slice = slice(num_x//4, 3*num_x//4)\n",
"y_slice = slice(num_y//4, 3*num_y//4)\n",
"# Index and save reduced grid\n",
"grid_xy_reduced = grid_xy[:, x_slice, y_slice]\n",
"np.save('data/meps_example_reduced/static/nwp_xy.npy', grid_xy_reduced)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This cut out the border, so a new perimeter of 10 grid points was established as border (10 was also the border size in the original \"meps_example\").\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Outer 10 grid points are border\n",
"old_border_mask = np.load('data/meps_example/static/border_mask.npy')\n",
"assert np.all(old_border_mask[10:-10, 10:-10] == False)\n",
"assert np.all(old_border_mask[:10, :] == True)\n",
"assert np.all(old_border_mask[:, :10] == True)\n",
"assert np.all(old_border_mask[-10:,:] == True)\n",
"assert np.all(old_border_mask[:,-10:] == True)\n",
"\n",
"# Create new array with False everywhere but the outer 10 grid points\n",
"border_mask = np.zeros_like(grid_xy_reduced[0,:,:], dtype=bool)\n",
"border_mask[:10] = True\n",
"border_mask[:,:10] = True\n",
"border_mask[-10:] = True\n",
"border_mask[:,-10:] = True\n",
"np.save('data/meps_example_reduced/static/border_mask.npy', border_mask)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A few other files also needed to be copied, keeping only values on the new reduced grid."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load surface_geopotential.npy, index only values from the reduced grid, and save to new file\n",
"surface_geopotential = np.load('data/meps_example/static/surface_geopotential.npy')\n",
"surface_geopotential_reduced = surface_geopotential[x_slice, y_slice]\n",
"np.save('data/meps_example_reduced/static/surface_geopotential.npy', surface_geopotential_reduced)\n",
"\n",
"# Load pytorch file grid_features.pt\n",
"grid_features = torch.load('data/meps_example/static/grid_features.pt')\n",
"# Index only values from the reduced grid. \n",
"# First reshape from (num_grid_points_total, 4) to (num_grid_points_x, num_grid_points_y, 4), \n",
"# then index, then reshape back to new total number of grid points\n",
"print(grid_features.shape)\n",
"grid_features_new = grid_features.reshape(num_x, num_y, 4)[x_slice,y_slice,:].reshape((-1, 4))\n",
"# Save to new file\n",
"torch.save(grid_features_new, 'data/meps_example_reduced/static/grid_features.pt')\n",
"\n",
"# flux_stats.pt is just a vector of length 2, so the grid shape and variable changes do not change this file\n",
"torch.save(torch.load('data/meps_example/static/flux_stats.pt'), 'data/meps_example_reduced/static/flux_stats.pt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The number of variables was reduced by truncating the variable list to the first 8."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_vars = 8\n",
"\n",
"# Load parameter_weights.npy, truncate to first 8 variables, and save to new file\n",
"parameter_weights = np.load('data/meps_example/static/parameter_weights.npy')\n",
"parameter_weights_reduced = parameter_weights[:num_vars]\n",
"np.save('data/meps_example_reduced/static/parameter_weights.npy', parameter_weights_reduced)\n",
"\n",
"# Do the same for following 4 pytorch files\n",
"for file in ['diff_mean', 'diff_std', 'parameter_mean', 'parameter_std']:\n",
" old_file = torch.load(f'data/meps_example/static/{file}.pt')\n",
" new_file = old_file[:num_vars]\n",
" torch.save(new_file, f'data/meps_example_reduced/static/{file}.pt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly the files in each of the directories train, test, and val have to be reduced. The folders all have the same structure with files of the following types:\n",
"```\n",
"nwp_YYYYMMDDHH_mbrXXX.npy\n",
"wtr_YYYYMMDDHH.npy\n",
"nwp_toa_downwelling_shortwave_flux_YYYYMMDDHH.npy\n",
"```\n",
"with ```YYYYMMDDHH``` being some date with hours, and ```XXX``` being some 3-digit integer.\n",
"\n",
"The first type of file has x and y in dimensions 1 and 2, and variable index in dimension 3. Dimension 0 is unchanged.\n",
"The second type has x and y in dimensions 1 and 2. Dimension 0 is unchanged.\n",
"The last type has just x and y as the only 2 dimensions.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(65, 268, 238, 18)\n",
"(65, 268, 238)\n"
]
}
],
"source": [
"print(np.load('data/meps_example/samples/train/nwp_2022040100_mbr000.npy').shape)\n",
"print(np.load('data/meps_example/samples/train/nwp_toa_downwelling_shortwave_flux_2022040112.npy').shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following loop goes through each file in each sample folder and indexes them according to the dimensions given by the file name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for sample in ['train', 'test', 'val']:\n",
" files = os.listdir(f'data/meps_example/samples/{sample}')\n",
"\n",
" for f in files:\n",
" data = np.load(f'data/meps_example/samples/{sample}/{f}')\n",
" if 'mbr' in f:\n",
" data = data[:,x_slice,y_slice,:num_vars]\n",
" elif 'wtr' in f:\n",
" data = data[x_slice, y_slice]\n",
" else:\n",
" data = data[:,x_slice,y_slice]\n",
" np.save(f'data/meps_example_reduced/samples/{sample}/{f}', data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, the file ```data_config.yaml``` is modified manually by truncating the variable units, long and short names, and setting the new grid shape. Also, the unit descriptions containing ```^``` were automatically parsed using latex, and to avoid having to install latex in the GitHub CI/CD pipeline, this was changed to ```**```.\n",
"\n",
"This new config file was placed in ```data/meps_example_reduced```, and that directory was then zipped and placed in a European Weather Cloud S3 bucket."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
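Outside the notebook, the core reduction boils down to center-slicing each spatial axis and truncating the variable axis. A self-contained sketch using a dummy array with the shapes printed above (the real data lives in `data/meps_example`):

```python
import numpy as np

# Dummy field standing in for a MEPS sample of shape (time, x, y, vars)
num_x, num_y, num_vars_old, num_vars = 268, 238, 18, 8
data = np.zeros((65, num_x, num_y, num_vars_old))

# Cut off a quarter of the points along each edge, so the remaining
# half of each axis stays centered in the original domain
x_slice = slice(num_x // 4, 3 * num_x // 4)
y_slice = slice(num_y // 4, 3 * num_y // 4)

reduced = data[:, x_slice, y_slice, :num_vars]
print(reduced.shape)  # (65, 134, 119, 8)
```

Halving both spatial axes keeps roughly 1/4 of the grid points, which is what shrinks the zipped dataset from 2.6 GB toward 263 MB.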
7 changes: 6 additions & 1 deletion neural_lam/utils.py
@@ -1,5 +1,6 @@
# Standard library
import os
import shutil

# Third-party
import numpy as np
@@ -250,7 +251,11 @@ def fractional_plot_bundle(fraction):
Get the tueplots bundle, but with figure width as a fraction of
the page width.
"""
-bundle = bundles.neurips2023(usetex=True, family="serif")
+# If latex is not available, some visualizations might not render correctly,
+# but will at least not raise an error.
+# Alternatively, use unicode raised numbers.
+usetex = True if shutil.which("latex") else False
+bundle = bundles.neurips2023(usetex=usetex, family="serif")
bundle.update(figsizes.neurips2023())
original_figsize = bundle["figure.figsize"]
bundle["figure.figsize"] = (
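The latex fallback in `utils.py` can be checked in isolation; a minimal sketch of the detection logic it relies on:

```python
import shutil


def latex_available():
    # shutil.which returns the full path when "latex" is on PATH, else None,
    # so the CI runners (which have no latex install) simply get False.
    return shutil.which("latex") is not None


usetex = latex_available()
```

The result is then passed to `bundles.neurips2023(usetex=...)`, so plots render either way — just without latex typesetting when the binary is absent.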
2 changes: 2 additions & 0 deletions requirements.txt
@@ -13,3 +13,5 @@ plotly>=5.15.0

# for dev
pre-commit>=2.15.0
pytest>=8.1.1
pooch>=1.8.1
Empty file added tests/__init__.py
Empty file.