Time series deep learning and boosted trees models for Cambridge UK temperature forecasts in Python
If you like CambridgeTemperatureNotebooks, give it a star, or fork it and contribute!
Summary of 48 step-ahead predictions using darts lightGBM model with selected lagged features and optimised hyperparameters:
Click on images for larger versions. The mean RMSE across the 48-step (24-hour) forecast horizon is 0.94 and the mean MAE is 0.63.
The darts lightGBM model substantially outperforms the VAR (Vector AutoRegression) baseline.
These predictions are for held out test data from 2022.
Additional model diagnostics are included above the Roadmap section further down this page.
See also, gradient boosting notebook.
It is easiest to open the notebooks in Google Colaboratory.
Try notebook(s) remotely:

- pre-2021 MLP, FCN, ResNet, LSTM for temperature forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2021 baseline forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2021 LSTM forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2021 CNN forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2008-2022 encoder decoder forecasts
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2016-2022 updated VAR baseline
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2016-2022 feature engineering
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
- 2016-2022 gradient boosted trees
  - editable
  - editable
  - View on NBViewer
  - View on GitHub
  - View or download pdf
Alternatively, clone the repository and open the notebook(s) in a local installation of Jupyter. It will be necessary to install the required dependencies.
See my time series and other models for Cambridge UK temperature forecasts in R repository for a detailed explanation of the data (including cleaning), baseline models, daily and yearly seasonality descriptions, plus an R prophet model. Assumptions and limitations are covered in that repository and will not be repeated here. Additional exploratory data analysis is available in my Cambridge University Computer Laboratory Weather Station R Shiny repository.
My primary interest is in "now-casting", i.e. forecasts within the next 1 to 2 hours. This is because I live close to the data source and the UK Met Office only updates its public-facing forecasts every 2 hours.
There are 10 years of training data (mid-2008 to 2017 inclusive) plus validation data from 2018 and test data from 2019.
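As a sketch, a year-based split like the one above can be expressed with pandas; the synthetic series and exact date range below are illustrative, not the actual station data:

```python
import numpy as np
import pandas as pd

# synthetic hourly series spanning the split years (illustrative only)
idx = pd.date_range("2008-07-01", "2019-12-31 23:00", freq="h")
df = pd.DataFrame(
    {"temperature": np.random.default_rng(1).normal(10, 5, len(idx))},
    index=idx,
)

# train: mid-2008 to 2017 inclusive; validation: 2018; test: 2019
train = df[df.index.year <= 2017]
val = df[df.index.year == 2018]
test = df[df.index.year == 2019]
```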
I use the following neural network architectures to make temperature forecasts:
- Multi-layer perceptron (MLP)
- Fully convolutional network (FCN)
- Residual network (ResNet)
- Long short term memory (LSTM)
The mixup augmentation method is used to compensate for the categorical nature of the wind bearing measurements.
More details are included in the keras MLP, FCN, ResNet, LSTM time series notebook.
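A minimal numpy sketch of mixup applied to a one-hot wind bearing input; the batch layout, 8 bearing sectors, and beta parameter are illustrative, not the notebook's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x, y, alpha=0.2, rng=rng):
    # Blend each sample with a randomly chosen partner sample.
    # Hard one-hot categories become convex combinations, which
    # gives the network smoother, soft-labelled inputs/targets.
    lam = rng.beta(alpha, alpha, size=(len(x), 1))
    idx = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    return x_mix, y_mix

# toy batch: 4 samples, one-hot wind bearing over 8 sectors
x = np.eye(8)[[0, 2, 4, 6]]
y = np.array([[10.0], [12.0], [9.0], [11.0]])  # toy temperatures
x_mix, y_mix = mixup(x, y)
```

Each mixed row still sums to 1, so it can be read as a soft distribution over bearing sectors.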
There are over 10 years of data (mid-2008 to early-2021 inclusive).
I compare forecasts from univariate and multivariate methods in the statsmodels package to establish reasonable baselines results.
Methods include:
- persistent
- simple exponential smoothing
- Holt-Winters exponential smoothing
- vector autoregression (VAR)
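For illustration, the two simplest of these baselines fit in a few lines of numpy (statsmodels provides equivalent, richer implementations); the series and smoothing parameter below are toy values:

```python
import numpy as np

def persistence_forecast(y, horizon):
    # persistence: repeat the last observed value
    return np.full(horizon, y[-1])

def ses_forecast(y, horizon, alpha=0.3):
    # simple exponential smoothing: level l_t = a*y_t + (1-a)*l_{t-1};
    # the forecast is flat at the final level
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(horizon, level)

y = np.array([10.0, 11.0, 12.0, 11.5])
p = persistence_forecast(y, 3)
s = ses_forecast(y, 3, alpha=0.5)
```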
An updated VAR baseline, based on more and cleaner data, with better model diagnostics and much-improved, faster code, can be found in the gradient boosted trees notebook below. This set of baselines should be considered out of date.
More details are included in the 2021 baseline forecasts notebook.
A more detailed look at LSTM based architectures.
Including:
- some parameter optimisation and comparison
- stacked LSTMs
- bidirectional LSTMs
- ConvLSTM1D
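A minimal Keras sketch of a stacked, bidirectional LSTM with a direct 48-step output head; the layer sizes and the 24-step, 5-feature input shape are illustrative assumptions, not the notebook's tuned architecture:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(24, 5)),                             # 24 past steps, 5 features
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),  # stacked layer 1
    layers.LSTM(16),                                         # stacked layer 2
    layers.Dense(48),                                        # direct 48-step forecast
])
out = model(np.zeros((2, 24, 5), dtype="float32"))           # untrained forward pass
```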
A more detailed look at CNN based architectures.
Including:
- Conv1D
- multi-head Conv1D
- Conv2D
- Inception-style
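A minimal Keras sketch of a causal Conv1D forecaster in the same spirit; filter counts, kernel sizes, and the input shape are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(24, 5)),
    # causal padding: each output step only sees past inputs
    layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
    layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
    layers.GlobalAveragePooling1D(),   # collapse the time axis
    layers.Dense(48),                  # direct 48-step forecast
])
out = model(np.zeros((2, 24, 5), dtype="float32"))
```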
Examining encoder decoder based architectures.
Including:
- autoencoder with attention
- encoder decoder with teacher forcing and autoregressive inference
- transformer encoder decoder with teacher forcing, positional embedding, padding and autoregressive inference
- encoder only transformer with positional embedding
- robust backtesting and experimentation framework
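As a sketch of the encoder decoder idea with teacher forcing (shapes and layer sizes are illustrative): at training time the decoder is fed the shifted target sequence; at inference it would instead consume its own predictions autoregressively:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

enc_in = layers.Input(shape=(24, 5))   # 24 past steps, 5 features
dec_in = layers.Input(shape=(48, 1))   # teacher-forced (shifted) targets

# encoder: keep only the final hidden and cell states
_, h, c = layers.LSTM(32, return_state=True)(enc_in)

# decoder: initialised from the encoder states, one output per step
dec_seq = layers.LSTM(32, return_sequences=True)(dec_in, initial_state=[h, c])
out_seq = layers.TimeDistributed(layers.Dense(1))(dec_seq)

model = keras.Model([enc_in, dec_in], out_seq)
out = model([np.zeros((2, 24, 5), "float32"), np.zeros((2, 48, 1), "float32")])
```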
Create univariate and bivariate meteorological and time series features.
Including:
- missing data annotation
- solar (irradiance etc) and meteorological (absolute humidity, mixing ratio etc) feature calculations
- seasonal decomposition of temperature and `dew.point`
- rolling statistics, tsfeatures, catch22, bivariate features etc
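As an example of the meteorological feature calculations, absolute humidity can be derived from temperature and relative humidity; this sketch uses one common set of Magnus coefficients, which may differ from the notebook's exact parameterisation:

```python
import math

def saturation_vapour_pressure(t_c):
    # Magnus approximation, in hPa (17.62 / 243.12 is one common choice)
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

def absolute_humidity(t_c, rh):
    # grams of water vapour per cubic metre, from the ideal gas law:
    # AH = e / (R_v * T), with e in Pa and R_v = 461.5 J/(kg K)
    e = rh / 100.0 * saturation_vapour_pressure(t_c)  # vapour pressure, hPa
    return 216.7 * e / (273.15 + t_c)

ah = absolute_humidity(20.0, 50.0)  # roughly 8.6 g/m^3
```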
Building gradient boosted tree models.
Including:
- multi-step lightGBM models with darts time series framework and optuna hyperparameter studies
- target, past covariate and future covariate lags selection
- Boruta-style shadow variables for lag and feature selection with lightGBM variable importance
- robust backtesting and experimentation framework
Click on images for larger versions. See also, main diagnostics at top of this page or gradient boosting notebook.
- Improve data cleaning
- Compute missing temperature from relative humidity and dew point
- Compute missing dew point from relative humidity and temperature
- Compute missing relative humidity from temperature, dew point, and pressure
- These calculations would be preferable to imputation, interpolation, substitution with neighboring weather data or historical averages
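The dew point and relative humidity relationships above follow from inverting the Magnus approximation; a sketch, with coefficients that are one common parameterisation (the pressure-dependent variants are omitted):

```python
import math

A, B = 17.62, 243.12  # Magnus coefficients (one common choice)

def dew_point(t_c, rh):
    # invert e(Td) = RH/100 * e_s(T): gamma = ln(RH/100) + A*T/(B+T)
    gamma = math.log(rh / 100.0) + A * t_c / (B + t_c)
    return B * gamma / (A - gamma)

def relative_humidity(t_c, td_c):
    # RH is the ratio of vapour pressure at the dew point to saturation
    e = math.exp(A * td_c / (B + td_c))
    e_s = math.exp(A * t_c / (B + t_c))
    return 100.0 * e / e_s

td = dew_point(20.0, 60.0)
```

At 100% relative humidity the dew point equals the temperature, which gives a quick sanity check on any implementation.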
- Add prediction intervals
- Add standard deviations to MSE, MAE values
- Benchmark against parameterization method
- My ParametricWeatherModel script (requires cloud fraction)
- Examine Global Forecast System (GFS) weather model
- runs four times a day, produces forecasts up to 16 days in advance
- data is available for free in the public domain
- model serves as the basis for the forecasts of numerous services
- potentially use as additional exogenous variables
- See future work sections in each of the notebooks linked above
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.