\chapter{Conclusion}
Four neural-network-based models were evaluated for short-term load forecasting: a sequence to sequence recurrent neural network, a transformer with teacher forcing, a transformer without teacher forcing, and a universal transformer.
The models were applied to the Bruny Island submarine feeder and to the standard ISO New England dataset.
The MAPE above 1 MVA and the MAPE at the first large peak were used to assess the forecasters' performance over anomalous holiday periods.
The transformer with teacher forcing and universal transformer models did not produce results competitive with the sequence to sequence and transformer without teacher forcing models.
The sequence to sequence model achieved good results on the Bruny Island feeder when evaluated on both overall MAPE and MAPE above 1 MVA.
The transformer without teacher forcing achieved the best overall MAPE on the Bruny Island feeder, but was somewhat more volatile than the sequence to sequence model on the metrics above 1 MVA.
The transformer without teacher forcing model also appeared to handle forecasting tasks involving long sequence lengths better than the sequence to sequence model.
Overall, it is not clear which model is superior on this feeder; the choice depends on whether forecasts are hourly or half-hourly, and on whether overall accuracy or accuracy over holiday periods matters most.
A transformer-based model was implemented as part of the Bruny Island CONSORT battery trial and was able to reliably and accurately predict large peaks over anomalous holiday periods.
In one instance this assisted the CONSORT project in coordinating the batteries on Bruny Island to avoid using the diesel generator.
The models were additionally evaluated on a standard ISO New England dataset.
Using models identical to those applied on Bruny Island, the sequence to sequence and transformer without teacher forcing models achieved a MAPE of 2.15\%, compared with results of 1.71\% and 1.31\% reported in other papers.
The presented models -- sequence to sequence and transformer without teacher forcing -- are versatile and reliable when used to forecast load with a 24-hour horizon and 30- or 60-minute resolution.
The models are practical to implement and can be applied to feeders ranging from 1 MVA to 20 GW with no modifications.
The models can reliably predict anomalous holiday periods given simple inputs describing the holidays and optionally given similar load profiles from the past.
\section{Future research}
\subsubsection{Transfer learning}
OpenAI has had success using transfer learning for natural language processing tasks with the transformer model \cite{radford2018improving}.
Transfer learning could be applied to load forecasting, whereby a single neural network model is first trained to forecast many different feeders, and the resulting pre-trained model is then fine-tuned on a single feeder.
This could potentially expand the generalization ability of the model, while also likely reducing the amount of training time required when starting from a pre-trained model.
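A minimal sketch of this recipe follows, assuming a PyTorch workflow; the stand-in LSTM forecaster, checkpoint filename, and learning rate are all illustrative placeholders rather than artefacts of this thesis.
\begin{verbatim}
import torch
import torch.nn as nn

# Stand-in forecaster; the real model would be one of the
# architectures evaluated in this thesis.
model = nn.LSTM(input_size=8, hidden_size=64, batch_first=True)

# Load weights pre-trained on many feeders (hypothetical checkpoint).
model.load_state_dict(torch.load("pretrained_many_feeders.pt"))

# Fine-tune on the single target feeder, typically with a
# smaller learning rate than was used for pre-training.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
\end{verbatim}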
\subsubsection{Time-weighted loss function}
Section \ref{train-req} indicated that the most recent training data may be the most important.
The loss function could be modified to weight more recent samples more heavily.
Whether this would be superior to simply restricting the training set to recent data is unknown.
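One simple scheme, sketched here with the decay rate $\lambda$ as a new hyperparameter, applies exponentially decaying weights to the per-sample loss $\ell$:
\[
\mathcal{L} = \frac{\sum_{t=1}^{T} \lambda^{T-t}\,\ell(\hat{y}_t, y_t)}{\sum_{t=1}^{T} \lambda^{T-t}}, \qquad 0 < \lambda \le 1,
\]
where $t = T$ is the most recent sample; $\lambda = 1$ recovers the unweighted loss, and smaller values of $\lambda$ concentrate training on recent data.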
\subsubsection{Monthly and hourly models}
Similar to \citet{Ceperic2013}, a different model could be trained for every month and every hour.
This introduces $12 \times 24 = 288$ models, making the total training time potentially quite large, so pre-training or transfer learning could be used to make this more approachable.
This may not work well for holiday periods that move from year to year, such as Easter, which may fall in either March or April.
\subsubsection{Multi-task learning}
Multi-task learning is the process of using a single neural network model to perform multiple tasks simultaneously, and has been shown to improve performance on some tasks \cite{Evgeniou2004}.
This could be applied to load forecasting by forecasting multiple feeders simultaneously.
Intuitively, this might be beneficial, as many feeders exhibit similar patterns; Bruny Island and St Helens in Tasmania, for example, both experience similar peaks around holiday periods.
This could also allow the forecasting system to be more resilient against poor quality data, as data from one feeder would be used to help forecast others.
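A minimal sketch of one such architecture follows, assuming a shared LSTM encoder with one output head per feeder; all names and dimensions are illustrative.
\begin{verbatim}
import torch
import torch.nn as nn

class MultiFeederForecaster(nn.Module):
    # Shared LSTM encoder with one linear head per feeder.
    def __init__(self, n_features, hidden, horizon, n_feeders):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, horizon) for _ in range(n_feeders)])

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        _, (h, _) = self.encoder(x)
        shared = h[-1]  # final hidden state, shared across tasks
        # Per-feeder forecasts: (batch, n_feeders, horizon)
        return torch.stack([head(shared) for head in self.heads], dim=1)
\end{verbatim}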
\subsubsection{Generic forecaster}
The transformer, with its multi-head self-attention, is able to process long sequences with no inherent dependence on position or order.
This could be used to create a generic forecaster, where the model is supplied with all past data up to the current time when performing forecasts.
A model like this might be able to look over the past data and produce a single forecast that matches the past patterns.
Using full multi-head self-attention for this would be prohibitively computationally expensive, so localised multi-head attention \cite{Vaswani2017} could be applied, as used by \citet{Song2017}.
If a single model could be made to forecast all feeders, this may ultimately be computationally cheaper than training a separate model for each feeder.
\subsubsection{Sequence to sequence with attention}
Sequence to sequence models can be augmented with an attention mechanism \cite{luong2015effective} to allow the model to consider all elements of the input simultaneously while producing the output sequence.
This may overcome the tendency of LSTM RNN networks to ``forget'' data from early in the input sequence.
Very recently, LSTM RNN models have been combined with transformer models \cite{chen2018best}, producing promising results.
Applying combinations of LSTM and attention to load forecasting may produce results superior to either approach on its own.
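A minimal sketch of the dot-product scoring variant from \citet{luong2015effective} follows; the tensor shapes are assumptions for illustration.
\begin{verbatim}
import torch

def luong_dot_attention(decoder_state, encoder_outputs):
    # decoder_state:   (batch, hidden)
    # encoder_outputs: (batch, src_len, hidden)
    # Alignment scores: dot product of the decoder state
    # with every encoder output.
    scores = torch.bmm(
        encoder_outputs, decoder_state.unsqueeze(2)).squeeze(2)
    weights = torch.softmax(scores, dim=1)  # (batch, src_len)
    # Context vector: attention-weighted sum of encoder outputs.
    context = torch.bmm(
        weights.unsqueeze(1), encoder_outputs).squeeze(1)
    return context, weights
\end{verbatim}
The context vector is combined with the decoder state at each output step, so every input element remains accessible regardless of sequence length.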
\subsubsection{Signal decomposition}
\citet{Chen2010} used wavelet decomposition to decompose input data into low and high frequency components before forecasting the next day's low and high frequency load separately.
\citet{Bedi2018} used empirical mode decomposition to decompose input data into different intrinsic mode functions before applying different LSTM RNNs to each and recombining the results to form a final forecast.
Applying such methods would likely improve forecasting accuracy.
However, it may jeopardize the models' ability to be applied to different feeders with no modification.
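As an illustrative sketch of the wavelet approach, using the PyWavelets library (the choice of wavelet and decomposition level here is arbitrary), a load series could be split into low- and high-frequency components before forecasting each separately:
\begin{verbatim}
import numpy as np
import pywt  # PyWavelets

def split_low_high(load, wavelet="db4", level=2):
    # Discrete wavelet decomposition: [cA_n, cD_n, ..., cD_1]
    coeffs = pywt.wavedec(load, wavelet, level=level)
    # Low-frequency part: keep the approximation coefficients,
    # zero out all detail coefficients, then reconstruct.
    low = pywt.waverec(
        [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]],
        wavelet)[: len(load)]
    # The residual carries the high-frequency content.
    high = load - low
    return low, high
\end{verbatim}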