Time series deep learning and boosted trees models for Cambridge UK temperature forecasts in Python
If you like CambridgeTemperatureNotebooks, give it a star, or fork it and contribute!
Summary of 48 step-ahead predictions using darts lightGBM model with selected lagged features and optimised hyperparameters:
Click on images for larger versions. The mean RMSE across the 48-step (24-hour) forecast horizon is 0.94 and the mean MAE is 0.63.
The darts lightGBM model substantially outperforms the VAR (Vector AutoRegression) baseline.
These predictions are for held out test data from 2022.
Additional model diagnostics are included above the Roadmap section further down this page.
See also, gradient boosting notebook.
It is easiest to open the notebooks in Google Colaboratory.
Try notebook(s) remotely:

- pre-2021 MLP, FCN, ResNet, LSTM for temperature forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2021 baseline forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2021 LSTM forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2021 CNN forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2022 encoder decoder forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2016-2022 updated VAR baseline
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2016-2022 feature engineering
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2016-2022 gradient boosted trees
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
Alternatively, clone the repository and open the notebook(s) in a local installation of Jupyter. It will be necessary to install the required dependencies.
See my time series and other models for Cambridge UK temperature forecasts in R repository for a detailed explanation of the data (including cleaning), baseline models, daily and yearly seasonality descriptions, plus an R prophet model. Assumptions and limitations are covered in that repository and will not be repeated here. Additional exploratory data analysis is available in my Cambridge University Computer Laboratory Weather Station R Shiny repository.
My primary interest is in "now-casting", i.e. forecasts within the next 1 to 2 hours. This is because I live close to the data source and the UK Met Office only updates its public-facing forecasts every 2 hours.
There are 10 years of training data (mid-2008 to 2017 inclusive) plus validation data from 2018 and test data from 2019.
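As a sketch, a year-based split like the one above can be expressed with pandas; the synthetic series and exact date range below are illustrative, not the actual station data:

```python
import numpy as np
import pandas as pd

# synthetic hourly series spanning the split years (illustrative only)
idx = pd.date_range("2008-07-01", "2019-12-31 23:00", freq="h")
df = pd.DataFrame(
    {"temperature": np.random.default_rng(1).normal(10, 5, len(idx))},
    index=idx,
)

# train: mid-2008 to 2017 inclusive; validation: 2018; test: 2019
train = df[df.index.year <= 2017]
val = df[df.index.year == 2018]
test = df[df.index.year == 2019]
```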
I use the following neural network architectures to make temperature forecasts:
- Multi-layer perceptron (MLP)
- Fully convolutional network (FCN)
- Residual network (ResNet)
- Long short term memory (LSTM)
The mixup augmentation method is used to compensate for the categorical nature of the wind bearing measurements.
More details are included in the keras MLP, FCN, ResNet, LSTM time series notebook.
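A minimal numpy sketch of mixup applied to a one-hot wind bearing input; the batch layout, 8 bearing sectors, and beta parameter are illustrative, not the notebook's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x, y, alpha=0.2, rng=rng):
    # Blend each sample with a randomly chosen partner sample.
    # Hard one-hot categories become convex combinations, which
    # gives the network smoother, soft-labelled inputs/targets.
    lam = rng.beta(alpha, alpha, size=(len(x), 1))
    idx = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    return x_mix, y_mix

# toy batch: 4 samples, one-hot wind bearing over 8 sectors
x = np.eye(8)[[0, 2, 4, 6]]
y = np.array([[10.0], [12.0], [9.0], [11.0]])  # toy temperatures
x_mix, y_mix = mixup(x, y)
```

Each mixed row still sums to 1, so it can be read as a soft distribution over bearing sectors.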
There are over 10 years of data (mid-2008 to early-2021 inclusive).
I compare forecasts from univariate and multivariate methods in the statsmodels package to establish reasonable baselines results.
Methods include:
- persistent
- simple exponential smoothing
- Holt-Winters exponential smoothing
- vector autoregression (VAR)
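For illustration, the two simplest of these baselines fit in a few lines of numpy (statsmodels provides equivalent, richer implementations); the series and smoothing parameter below are toy values:

```python
import numpy as np

def persistence_forecast(y, horizon):
    # persistence: repeat the last observed value
    return np.full(horizon, y[-1])

def ses_forecast(y, horizon, alpha=0.3):
    # simple exponential smoothing: level l_t = a*y_t + (1-a)*l_{t-1};
    # the forecast is flat at the final level
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(horizon, level)

y = np.array([10.0, 11.0, 12.0, 11.5])
p = persistence_forecast(y, 3)
s = ses_forecast(y, 3, alpha=0.5)
```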
An updated VAR baseline, based on more and cleaner data, with better model diagnostics and much-improved, faster code, can be found in the gradient boosted trees notebook below. This set of baselines should be considered out of date.
More details are included in the 2021 baseline forecasts notebook.
A more detailed look at LSTM based architectures.
Including:
- some parameter optimisation and comparison
- stacked LSTMs
- bidirectional LSTMs
- ConvLSTM1D
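A minimal Keras sketch of a stacked, bidirectional LSTM with a direct 48-step output head; the layer sizes and the 24-step, 5-feature input shape are illustrative assumptions, not the notebook's tuned architecture:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(24, 5)),                             # 24 past steps, 5 features
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),  # stacked layer 1
    layers.LSTM(16),                                         # stacked layer 2
    layers.Dense(48),                                        # direct 48-step forecast
])
out = model(np.zeros((2, 24, 5), dtype="float32"))           # untrained forward pass
```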
A more detailed look at CNN based architectures.
Including:
- Conv1D
- multi-head Conv1D
- Conv2D
- Inception-style
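A minimal Keras sketch of a causal Conv1D forecaster in the same spirit; filter counts, kernel sizes, and the input shape are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(24, 5)),
    # causal padding: each output step only sees past inputs
    layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
    layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
    layers.GlobalAveragePooling1D(),   # collapse the time axis
    layers.Dense(48),                  # direct 48-step forecast
])
out = model(np.zeros((2, 24, 5), dtype="float32"))
```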
Examining encoder decoder based architectures.
Including:
- autoencoder with attention
- encoder decoder with teacher forcing and autoregressive inference
- transformer encoder decoder with teacher forcing, positional embedding, padding and autoregressive inference
- encoder only transformer with positional embedding
- robust backtesting and experimentation framework
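As a sketch of the encoder decoder idea with teacher forcing (shapes and layer sizes are illustrative): at training time the decoder is fed the shifted target sequence; at inference it would instead consume its own predictions autoregressively:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

enc_in = layers.Input(shape=(24, 5))   # 24 past steps, 5 features
dec_in = layers.Input(shape=(48, 1))   # teacher-forced (shifted) targets

# encoder: keep only the final hidden and cell states
_, h, c = layers.LSTM(32, return_state=True)(enc_in)

# decoder: initialised from the encoder states, one output per step
dec_seq = layers.LSTM(32, return_sequences=True)(dec_in, initial_state=[h, c])
out_seq = layers.TimeDistributed(layers.Dense(1))(dec_seq)

model = keras.Model([enc_in, dec_in], out_seq)
out = model([np.zeros((2, 24, 5), "float32"), np.zeros((2, 48, 1), "float32")])
```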
Create univariate and bivariate meteorological and time series features.
Including:
- missing data annotation
- solar (irradiance etc) and meteorological (absolute humidity, mixing ratio etc) feature calculations
- seasonal decomposition of temperature and `dew.point`
- rolling statistics, tsfeatures, catch22, bivariate features etc
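As an example of the meteorological feature calculations, absolute humidity can be derived from temperature and relative humidity; this sketch uses one common set of Magnus coefficients, which may differ from the notebook's exact parameterisation:

```python
import math

def saturation_vapour_pressure(t_c):
    # Magnus approximation, in hPa (17.62 / 243.12 is one common choice)
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

def absolute_humidity(t_c, rh):
    # grams of water vapour per cubic metre, from the ideal gas law:
    # AH = e / (R_v * T), with e in Pa and R_v = 461.5 J/(kg K)
    e = rh / 100.0 * saturation_vapour_pressure(t_c)  # vapour pressure, hPa
    return 216.7 * e / (273.15 + t_c)

ah = absolute_humidity(20.0, 50.0)  # roughly 8.6 g/m^3
```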
Building gradient boosted tree models.
Including:
- multi-step lightGBM models with darts time series framework and optuna hyperparameter studies
- target, past covariate and future covariate lags selection
- Boruta-style shadow variables for lag and feature selection with lightGBM variable importance
- robust backtesting and experimentation framework
Click on images for larger versions. See also, main diagnostics at top of this page or gradient boosting notebook.
- Improve data cleaning
- Compute missing temperature from relative humidity and dew point
- Compute missing dew point from relative humidity and temperature
- Compute missing relative humidity from temperature, dew point, and pressure
- These calculations would be preferable to imputation, interpolation, substitution with neighboring weather data or historical averages
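The dew point and relative humidity relationships above follow from inverting the Magnus approximation; a sketch, with coefficients that are one common parameterisation (the pressure-dependent variants are omitted):

```python
import math

A, B = 17.62, 243.12  # Magnus coefficients (one common choice)

def dew_point(t_c, rh):
    # invert e(Td) = RH/100 * e_s(T): gamma = ln(RH/100) + A*T/(B+T)
    gamma = math.log(rh / 100.0) + A * t_c / (B + t_c)
    return B * gamma / (A - gamma)

def relative_humidity(t_c, td_c):
    # RH is the ratio of vapour pressure at the dew point to saturation
    e = math.exp(A * td_c / (B + td_c))
    e_s = math.exp(A * t_c / (B + t_c))
    return 100.0 * e / e_s

td = dew_point(20.0, 60.0)
```

At 100% relative humidity the dew point equals the temperature, which gives a quick sanity check on any implementation.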
- Add prediction intervals
- Add standard deviations to MSE, MAE values
- Benchmark against parameterization method
- My ParametricWeatherModel script (requires cloud fraction)
- Examine Global Forecast System (GFS) weather model
- runs four times a day, produces forecasts up to 16 days in advance
- data is available for free in the public domain
- model serves as the basis for the forecasts of numerous services
- potentially use as additional exogenous variables
- See future work sections in each of the notebooks linked above
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.