# CambridgeTemperatureNotebooks


Time series deep learning and boosted tree models for Cambridge UK temperature forecasts in Python

If you like CambridgeTemperatureNotebooks, give it a star, or fork it and contribute!

Summary of 48-step-ahead predictions using the darts lightGBM model with selected lagged features and optimised hyperparameters:

Click on images for larger versions. The mean RMSE across the 48-step (24-hour) forecast horizon is 0.94 and the mean MAE is 0.63.

The darts lightGBM model is greatly superior to the VAR (vector autoregression) baseline. These predictions are for held-out test data from 2022.
Additional model diagnostics are included above the Roadmap section further down this page. See also the gradient boosting notebook.
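
The horizon-averaged RMSE and MAE quoted above can be computed from a matrix of backtest forecasts. This is a minimal sketch (not the notebook's actual code), assuming an array of shape `(n_windows, horizon)` with `horizon=48` half-hourly steps:

```python
import numpy as np

def horizon_metrics(y_true, y_pred):
    """Mean RMSE and MAE across a multi-step forecast horizon.

    y_true, y_pred: arrays of shape (n_windows, horizon), e.g. horizon=48
    for 24 hours of half-hourly temperature forecasts.
    """
    err = y_pred - y_true
    rmse_per_step = np.sqrt(np.mean(err ** 2, axis=0))  # shape (horizon,)
    mae_per_step = np.mean(np.abs(err), axis=0)
    return rmse_per_step.mean(), mae_per_step.mean()
```

Averaging the per-step metrics, rather than pooling all errors, weights every forecast step equally across the horizon.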

## Usage

It is easiest to open the notebooks in Google Colaboratory.

Alternatively, clone the repository and open the notebook(s) in a local Jupyter installation. The required dependencies will need to be installed.

## Details

See my time series and other models for Cambridge UK temperature forecasts in R repository for a detailed explanation of the data (including cleaning), the baseline models, descriptions of daily and yearly seasonality, plus an R prophet model. Assumptions and limitations are covered in that repository and will not be repeated here. Additional exploratory data analysis is available in my Cambridge University Computer Laboratory Weather Station R Shiny repository.

My primary interest is in "now-casting", meaning forecasts within the next 1 to 2 hours. This is because I live close to the data source and the UK Met Office only updates its public-facing forecasts every 2 hours.

## MLP, FCN, ResNet, LSTM for temperature forecasts

There are 10 years of training data (mid-2008 to 2017 inclusive) plus validation data from 2018 and test data from 2019.

I use the following neural network architectures to make temperature forecasts:

  • Multi-layer perceptron (MLP)
  • Fully convolutional network (FCN)
  • Residual network (ResNet)
  • Long short term memory (LSTM)

The mixup method is used to counteract the effect of the categorical wind bearing measurements.
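
Mixup augments training batches with convex combinations of example pairs, which softens hard categorical inputs such as wind bearing. A minimal sketch (the notebook's exact setup may differ):

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Create a mixup-augmented batch: convex combinations of example pairs.

    x: (batch, features) inputs, y: (batch, targets).
    alpha parameterises the Beta distribution; small alpha keeps most
    mixed examples close to one of the two parents.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))        # random pairing of examples
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```

Because both inputs and targets are blended, a categorical feature encoded as one-hot (or as sin/cos of the bearing angle) yields smooth intermediate values instead of abrupt category jumps.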

More details are included in the keras MLP, FCN, ResNet, LSTM time series notebook.

## 2008-2021 baseline forecasts

There are over 10 years of data (mid-2008 to early-2021 inclusive).

I compare forecasts from univariate and multivariate methods in the statsmodels package to establish reasonable baseline results.

Methods include:

  • persistence
  • simple exponential smoothing
  • Holt-Winters' exponential smoothing
  • vector autoregression (VAR)
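
The two simplest of these baselines can be sketched without statsmodels (a rough illustration, not the notebook's code):

```python
import numpy as np

def persistence_forecast(y, horizon):
    """Persistence: repeat the last observed value across the horizon."""
    return np.full(horizon, y[-1], dtype=float)

def ses_forecast(y, horizon, alpha=0.3):
    """Simple exponential smoothing: a flat forecast at the final
    exponentially weighted level of the series."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1.0 - alpha) * level
    return np.full(horizon, level, dtype=float)
```

Persistence is a surprisingly strong short-horizon baseline for temperature, which is why beating it matters for now-casting.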

An updated VAR baseline, based on more and cleaner data, with better model diagnostics and much improved, faster code, can be found in the gradient boosted trees notebook below. This set of baselines should be considered out of date.

More details are included in the 2021 baseline forecasts notebook.

## 2008-2021 LSTM forecasts

A more detailed look at LSTM-based architectures.

Including:

  • some parameter optimisation and comparison
  • stacked LSTMs
  • bidirectional LSTMs
  • ConvLSTM1D
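
All of these variants are built from the same gated cell. As a reference point (a plain numpy sketch, not the Keras code the notebook uses), one LSTM time step computes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    x: (n_in,) input; h, c: (n_hidden,) hidden and cell states.
    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,)
    stacked in [input, forget, cell, output] gate order.
    """
    n = len(h)
    z = W @ x + U @ h + b
    i = sigmoid(z[:n])            # input gate
    f = sigmoid(z[n:2 * n])       # forget gate
    g = np.tanh(z[2 * n:3 * n])   # candidate cell state
    o = sigmoid(z[3 * n:])        # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Stacked LSTMs feed `h_new` into a second cell, while bidirectional LSTMs run a second cell over the sequence in reverse and concatenate the two hidden states.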

## 2008-2021 CNN forecasts

A more detailed look at CNN-based architectures.

Including:

  • Conv1D
  • multi-head Conv1D
  • Conv2D
  • Inception-style
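
At their core, all of these layers slide learned kernels over the time axis. A minimal "valid" 1D convolution in numpy (deep-learning libraries actually compute cross-correlation, as here):

```python
import numpy as np

def conv1d(series, kernel, bias=0.0):
    """'Valid' 1D cross-correlation of a numpy series with a kernel.

    Output length is len(series) - len(kernel) + 1; each output is the
    dot product of the kernel with one window of the series.
    """
    k = len(kernel)
    out = np.array([series[i:i + k] @ kernel
                    for i in range(len(series) - k + 1)])
    return out + bias
```

A multi-head Conv1D applies several kernels of different lengths in parallel, and Inception-style blocks concatenate such parallel branches.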

## 2008-2022 encoder decoder forecasts

Examining encoder-decoder based architectures.

Including:

  • autoencoder with attention
  • encoder decoder with teacher forcing and autoregressive inference
  • transformer encoder decoder with teacher forcing, positional embedding, padding and autoregressive inference
  • encoder only transformer with positional embedding
  • robust backtesting and experimentation framework
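
The positional embedding mentioned above is commonly the sinusoidal encoding from the original transformer paper; a numpy sketch (the notebook may use a learned embedding instead):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding, shape (seq_len, d_model).

    Even columns hold sin terms, odd columns hold cos terms, with
    wavelengths forming a geometric progression up to 10000 * 2*pi.
    Assumes d_model is even.
    """
    pos = np.arange(seq_len)[:, None]         # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]      # (1, d_model / 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

Adding this matrix to the input embeddings gives the otherwise order-blind attention layers information about each step's position in the sequence.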

## 2016-2022 feature engineering

Create univariate and bivariate meteorological and time series features.

Including:

  • missing data annotation
  • solar (irradiance etc) and meteorological (absolute humidity, mixing ratio etc) feature calculations
  • seasonal decomposition of temperature and dew.point
  • rolling statistics, tsfeatures, catch22, bivariate features etc
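
As one example of the meteorological features, absolute humidity can be derived from temperature and relative humidity. This sketch uses the Magnus approximation for saturation vapour pressure plus the ideal gas law for water vapour; the notebook's exact formulation may differ:

```python
import numpy as np

def absolute_humidity(temp_c, rel_humidity):
    """Absolute humidity in g/m^3 from temperature (deg C) and RH (%).

    Saturation vapour pressure (hPa) via the Magnus approximation,
    scaled by RH, then converted to water vapour density.
    """
    svp = 6.112 * np.exp(17.67 * temp_c / (temp_c + 243.5))  # hPa
    vp = svp * rel_humidity / 100.0   # actual vapour pressure, hPa
    return 216.7 * vp / (273.15 + temp_c)
```

At 20 deg C and 50% relative humidity this gives roughly 8.6 g/m^3, in line with standard psychrometric tables.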

## 2016-2022 gradient boosted trees

Building gradient boosted tree models.

Including:

  • multi-step lightGBM models with darts time series framework and optuna hyperparameter studies
  • target, past covariate and future covariate lags selection
  • Boruta-style shadow variables for lag and feature selection with lightGBM variable importance
  • robust backtesting and experimentation framework
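
The Boruta-style idea is to append a permuted "shadow" copy of every candidate lag or feature, then keep only real features whose lightGBM importance beats the best shadow importance. A minimal sketch of the shadow construction (the importance comparison itself, done with a fitted lightGBM model, is omitted):

```python
import numpy as np

def add_shadow_features(X, rng=None):
    """Append a row-permuted shadow copy of every column of a 2D array X.

    Shadow columns keep each feature's marginal distribution but break
    any relationship with the target, so they act as a noise floor for
    variable importance.
    """
    rng = np.random.default_rng() if rng is None else rng
    shadows = np.column_stack([rng.permutation(X[:, j])
                               for j in range(X.shape[1])])
    return np.hstack([X, shadows])
```

Any real feature whose importance does not clearly exceed the strongest shadow column is a candidate for removal.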

Additional model diagnostics:

Click on images for larger versions. See also the main diagnostics at the top of this page or the gradient boosting notebook.

## Roadmap

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## Alternatives

## License

GPL-2