Hybrid Artificial Neural Networks For Electricity Consumption Prediction


International Journal of Advanced Engineering Research and Science (IJAERS)


Peer-Reviewed Journal
ISSN: 2349-6495(P) | 2456-1908(O)
Vol-9, Issue-8; Aug, 2022
Journal Home Page Available: https://ijaers.com/
Article DOI: https://dx.doi.org/10.22161/ijaers.98.32

Hybrid Artificial Neural Networks for Electricity Consumption Prediction
Ricardo Augusto Manfredini

IFRS - Instituto Federal de Educação, Ciências e Tecnologia do Rio Grande do Sul – Campus Farroupilha, Brazil
Email: [email protected]

Received: 26 Jun 2022,
Received in revised form: 14 Jul 2022,
Accepted: 22 July 2022,
Available online: 19 Aug 2022
©2022 The Author(s). Published by AI Publication. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).

Keywords— Artificial Neural Network, Artificial Intelligence, electricity consumption predictions, time series.

Abstract— We present a comparative study of electricity consumption predictions using the SARIMAX method (Seasonal Auto Regressive Integrated Moving Average with eXogenous variables), the HyFIS2 model (Hybrid neural Fuzzy Inference System) and the LSTNetA model (Long and Short Time series Network Adapted), a hybrid neural network containing GRU (Gated Recurrent Unit), CNN (Convolutional Neural Network) and dense layers, specially adapted for this case study. The comparative experimental study showed a superior result for the LSTNetA model, with consumption predictions much closer to the real consumption. In the case study, the LSTNetA model had an RMSE (root mean squared error) of 198.44, the HyFIS2 model 602.71 and the SARIMAX method 604.58.

I. INTRODUCTION

In recent decades, the world population has been increasing rapidly and, as a result, global energy demand and consumption are also growing [6]. In this context, residential and commercial buildings are identified as major energy consumers worldwide, accounting for about 30% of the global electricity demand related to energy consumption in the residential sector [7]. Buildings are responsible for a significant share of energy waste as well. Energy waste and climate change represent a challenge for sustainability, and it is crucial to make buildings more efficient [11]. Therefore, the development and use of clean products and renewable energy in buildings have gained wide interest [6]. In the residential and commercial sectors, photovoltaic (PV) systems are the most common form of distributed generation, minimizing dependence on traditional power plants and maximizing household self-sufficiency [8].

Due to PV's dependence on weather conditions, the intermittent nature of the power generated brings some uncertainty [24]. Similarly, the electricity consumption of these buildings also has inherent uncertainties due to seasonality. The easiest way to manage the risk of solar power and to harness it is to forecast the amount of power to be generated [15] as well as the consumption. A reliable forecast is key for various smart grid applications such as dispatch, active demand response, grid regulation and smart energy management [12].

The energy consumption of a building and the PV generation can be represented by a time series with trends and seasonality [14]. There are numerous prediction studies on time series, from classical linear regressions to more recent works using machine learning algorithms, which are powerful tools for predicting electricity consumption and PV generation [21]. Recently, many PV power forecasting techniques have been developed, but there is still no complete universal forecasting model and methodology to ensure the accuracy of predictions.

In this regard, Artificial Neural Networks (ANNs) are very popular machine learning algorithms for object prediction and classification and are based on the classical feed-forward neural network approach [23]. ANNs are computing systems inspired by the biological neural

www.ijaers.com Page | 292


Manfredini, R.A. International Journal of Advanced Engineering Research and Science, 9(7)-2022

networks of the brain and by how neurons work, pass and store information [13; 24].

Due to the accelerated development of computing technology, ANNs have provided a powerful framework for supervised learning [5]. Deep learning allows models composed of multiple layers to learn data representations [11]. Deep Neural Networks (DNN¹) are inspired by the structure of mammalian visual systems and are also an important machine learning tool that has been widely used in many fields [25]. A DNN employs an architecture of multiple layers of neurons in an ANN and can represent functions of higher complexity [5].

This work aimed at predicting the electricity consumption of a commercial building using ANNs in their various architectures. Several ANN architectures were used and tested, and a hybrid architecture (Dense, Convolutional and Recurrent), originally described by Lai, G. et al. [4] and adapted for this case study, was selected.

II. FOUNDATION
2.1 Time Series

Time series are sets of observations ordered in time [14]. A time series can be defined as a class of phenomena whose observational process and consequent numerical quantification generate a sequence of observations distributed over time.

Electricity consumption histories over time are univariate time series [20] with trends, cycles, seasonality and randomness. Trends are long-term characteristics related to a time interval. Cycles are long-term oscillations, more or less regular, around a trend line or curve. Seasonalities are regular patterns observed from time to time. Finally, randomness comprises effects that occur randomly and that cannot be captured by cycles, trends and seasonalities.

The time series prediction models most used in the literature are linear and polynomial regressions. Among the regression models, we can mention the SARIMAX method [19]. This statistical model is a variant of the autoregressive moving average model (ARMA), adding differencing to make the series stationary (I), adding seasonality (S) and finally adding the effect of eXogenous (X) or random variables over time. In this work, the SARIMAX model was used as a baseline: its results on the test case are compared with the results obtained from the other prediction models.

2.2 Convolutional Artificial Neural Networks
Convolutional Artificial Neural Networks (CNN²) are a type of DNN that is commonly applied to analyse images. One of the main attributes of a CNN is that it drives different processing layers that generate an effective representation of the features of image edges. The architecture of a CNN allows multiple layers of these processing units to be stacked, so this deep learning model can emphasize the relevance of features at different scales [24].

Fig. 1 shows a typical architecture of a CNN, composed of at least a convolution layer, a pooling layer, a flattening layer and dense layers.
1 DNN - Deep Neural Network
2 CNN - Convolutional Neural Network

Fig. 1. Basic CNN.


Source: The author

In the convolution layer, a filter (kernel, which is also a matrix) is applied to the input matrix, aiming at reducing it while maintaining its most important characteristics. Fig. 2 represents, step by step, the application of the convolution function, where g(x,y) represents an element of the convolution matrix, that is, the sum of the element-wise products of the submatrix highlighted in Fig. 2 and the kernel. At each step the kernel shifts one position to the right until it reaches the last column of the input matrix; it then shifts one line down and continues the process until it has run through the whole input matrix. In the example of Fig. 2, a 7x7 input matrix was reduced to a 5x5 convolution matrix. The whole process represented in Fig. 2 is repeated for each of the kernels used, generating several convolution matrices.

$$g(x, y) = \omega * f(x, y) = \sum_{dx=-a}^{a} \sum_{dy=-b}^{b} \omega(dx, dy)\, f(x + dx, y + dy)$$

For the pooling layer, it is usual to apply the activation function relu, f(x) = max(0, x), for example, generating a new reduced matrix as shown in Fig. 3.

Finally, the flattening layer simply transforms the matrices of the pooling layers into vectors, which will be the inputs of the dense layer.

Fig. 2: Convolution process


Source: The author.
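The sliding-window convolution illustrated in Fig. 2 can be sketched in a few lines of plain Python. This is only an illustration: the 3x3 kernel size is inferred from the 7x7 to 5x5 reduction (stride 1, no padding), the kernel values and the input are made up, and the offsets run from 0 to k-1 instead of from -a to a, which only changes where the window is anchored.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the result."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    result = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Sum of element-wise products between the kernel and the
            # submatrix whose top-left corner is at (y, x).
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += kernel[dy][dx] * image[y + dy][x + dx]
            row.append(acc)
        result.append(row)
    return result


# Hypothetical 7x7 input: cell (r, c) holds the value r * 7 + c.
image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]

# Hypothetical 3x3 kernel that simply picks the centre of each window.
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]

feature_map = convolve2d_valid(image, kernel)
print(len(feature_map), "x", len(feature_map[0]))
```

With a 3x3 kernel the output has 7 - 3 + 1 = 5 rows and columns, which matches the reduction described for Fig. 2.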

Fig. 3. Pooling Process.


Source: The Author.
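The relu, pooling and flattening steps can be sketched as follows. The source does not state which pooling operation or window size was used, so the 2x2 max pooling here is an assumption, and the input matrix is invented for the example.

```python
def relu(matrix):
    """Apply f(x) = max(0, x) to every element."""
    return [[max(0.0, v) for v in row] for row in matrix]


def max_pool(matrix, size=2):
    """Keep the largest value of each non-overlapping size x size window (assumed pooling)."""
    pooled = []
    for y in range(0, len(matrix) - size + 1, size):
        row = []
        for x in range(0, len(matrix[0]) - size + 1, size):
            window = [matrix[y + dy][x + dx]
                      for dy in range(size) for dx in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def flatten(matrix):
    """Turn a matrix into a single vector, row by row."""
    return [v for row in matrix for v in row]


# Hypothetical 4x4 convolution output.
feature_map = [[1.0, -2.0, 3.0, 0.0],
               [-1.0, 5.0, -6.0, 2.0],
               [0.0, -3.0, 4.0, -4.0],
               [7.0, 1.0, -1.0, 8.0]]

dense_input = flatten(max_pool(relu(feature_map)))
print(dense_input)  # [5.0, 3.0, 7.0, 8.0]
```

The resulting vector is what would be fed to the dense layer.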

2.3 Recurrent Artificial Neural Networks
In traditional ANNs, the inputs (and outputs) are independent of each other, which makes them difficult to use, for example, in natural language processing, where a word in a sentence depends on the previous words in the same sentence, or in time series, where we need to know the values over time for better projections.

In contrast, recurrent artificial neural networks (RNN³) [8] store their previous state and also use it as input to the current state when calculating new outputs. Another way of thinking about RNNs is that they have a "memory" that captures information about what has been calculated so far. In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps. Fig. 4 is a typical representation of an RNN.

Fig. 4: Basic RNN.
Source: The Author.

3 RNN - Recurrent Neural Network
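The recurrence at the heart of an RNN, in which each hidden state is computed from the current input and the previous hidden state, can be sketched in plain Python. The matrix names U, W and V, the tanh nonlinearity, the softmax output and all weight values are illustrative choices for this sketch, not parameters from this study.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(values):
    """Turn raw scores into a probability vector."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_forward(inputs, U, W, V):
    """Unfold the network over the whole input sequence."""
    state = [0.0] * len(W)  # the state before the first step starts at zero
    outputs = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, state))]
        state = [math.tanh(p) for p in pre]          # new hidden state
        outputs.append(softmax(matvec(V, state)))    # output for this step
    return outputs, state


# Toy weights and a sequence of three one-hot inputs (all hypothetical).
U = [[0.5, -0.3], [0.1, 0.8]]
W = [[0.2, 0.0], [0.0, 0.2]]
V = [[1.0, -1.0], [-1.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

outputs, final_state = rnn_forward(inputs, U, W, V)
print(len(outputs), final_state)
```

A three-element sequence yields three outputs, one per unfolded step, each a probability vector that sums to 1.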


Fig. 4 shows an RNN being expanded (unfolded) into a complete network. Here $x_t$ is the input at time step $t$; for example, $x_1$ could be a one-hot vector corresponding to the second word of a sentence. $s_t$ is the hidden state at time step $t$. It is the "memory" of the network. $s_t$ is calculated based on the previous hidden state and the input at the current time step:

$$s_t = f(U x_t + W s_{t-1})$$

The function $f$ is usually a nonlinearity, such as tanh or relu. $s_{-1}$, which is needed to compute the first hidden state, is usually initialized with zeros. $o_t$ is the output at step $t$. For example, if we wanted to predict the next word in a sentence, it would be a probability vector over our vocabulary:

$$o_t = \mathrm{softmax}(V s_t)$$

By expanding, we simply mean that we write the network out for the complete sequence. For example, if the sequence we are interested in is a 5-word sentence, the network would be unfolded into a 5-layer neural network, one layer for each word.

III. MATERIAL and METHODS
This work was carried out at the Research Group on Intelligent Engineering and Computing for Advanced Innovation and Development (GECAD⁴), a research centre located at the Instituto Superior de Engenharia do Porto of the Instituto Politécnico do Porto (ISEP/IPP), Porto, Portugal. Similarly to the HyFIS2 model (Jozi et al., 2016), the proposed model uses the actual electrical consumption data of sectors of Building N of ISEP/IPP, where GECAD is located. The building has five energy meters that store the electrical energy consumption data of specific sectors of the building at 10-second intervals. This information, as well as meteorological data, is stored automatically in a SQL server through agents developed in Java.

4 http://www.gecad.isep.ipp.pt/GECAD/Pages/Pubs/PublicationsPES.aspx

To validate the model described below, tests were performed using the same consumption data applied to the SARIMAX and HyFIS2 models. The laboratories sector of Building N was not included, as it has a large variation in consumption due to the experiments conducted there, which generate many outliers in the consumption history. For the experiments, the consumption stored every ten seconds was averaged by hour, since the goal is to predict the next hour of consumption.

3.1 The Long and Short Time series Network Adapted (LSTNetA) Model
The model developed for energy consumption prediction was based on the model proposed by Lai [4], represented in Fig. 5, which consists of a hybrid ANN with three distinct layers. First, a convolutional layer, which takes the time series as input, extracts short-term patterns of the series. The output of this layer is the input of the recurrent layer, which memorizes historical information of the time series, and the output of the recurrent layer is in turn the input of the fully connected dense layer. Finally, the output of the fully connected layer is combined with the output of the autoregressive linear regression (ARMA) [26], ensuring that the output will have the same scale as the input, thus composing the prediction.
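The final combination step can be illustrated with a minimal sketch. It assumes that the two outputs are combined by addition, as in the LSTNet of Lai et al. [4], and that the autoregressive part is a linear function of the last few observations. The function names, the window length, the weights and the `dense_output` value are hypothetical placeholders, not values from this study.

```python
def autoregressive_output(history, weights, bias=0.0):
    """Linear AR term: weighted sum of the most recent len(weights) observations."""
    window = history[-len(weights):]
    return sum(w * y for w, y in zip(weights, window)) + bias


def combine(dense_output, history, ar_weights, ar_bias=0.0):
    """Prediction = non-linear (neural) part + linear (autoregressive) part.

    Addition is an assumption of this sketch.
    """
    return dense_output + autoregressive_output(history, ar_weights, ar_bias)


# Sample hourly consumption values in watts and made-up AR weights.
recent_consumption = [6332.88, 5350.34, 6677.56]
ar_weights = [0.2, 0.3, 0.5]

# Placeholder for what the convolutional, recurrent and dense layers would output.
dense_output = 120.0

prediction = combine(dense_output, recent_consumption, ar_weights)
print(round(prediction, 3))
```

Because the autoregressive term is built directly from recent observations, it stays on the scale of the input series, while the neural part only has to model the remaining non-linear pattern.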

Fig. 5. Architecture of the LSTNetA model.


Source: adapted from Lai [4].

Fig. 6 summarizes the implementation of the LSTNetA network. The convolution layer is represented by the Conv2D class, the recurrent layer by the GRU classes, the dense layer by the Dense classes, and the auto-regression by the PostARTrans class.

It is important to note that the recurrent layer uses one of the RNN variants, the GRU (Gated Recurrent Unit) [1]. This ANN model, like the LSTM (Long Short-Term Memory), aims to solve the short-term memory problem of RNNs, which, in long series, have difficulty carrying the results of earlier steps over to later ones.

Fig. 7. Typical architecture of a GRU.


Source: The Author
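The gating idea behind the GRU can be sketched for a single scalar unit: an update gate decides how much of the previous output is kept, and a reset gate decides how much of it is used when proposing a new value. This follows one common formulation of the GRU; the sign convention of the update gate varies between references, and the parameter names and values below are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One step of a scalar GRU cell with parameters in dict `p`."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # Candidate output: the reset gate scales how much of h_prev is used.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # z near 0 keeps the previous output, z near 1 replaces it.
    return (1.0 - z) * h_prev + z * h_cand


# Hypothetical parameters for illustration only.
params = {"wz": 0.6, "uz": 0.4, "bz": 0.0,
          "wr": 0.5, "ur": 0.3, "br": 0.0,
          "wh": 0.9, "uh": 0.7, "bh": 0.0}

h = 0.0
for x in [0.5, -0.1, 0.8]:
    h = gru_step(x, h, params)
print(h)
```

Pushing the update gate bias `bz` strongly negative makes the cell carry its previous output forward almost unchanged, which is the mechanism that lets a GRU retain information over long sequences.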

Fig. 6: Summary of the LSTNet implementation.
Source: The Author.

In the backpropagation stage, the learning process of ANNs, RNNs suffer from the vanishing gradient problem. Gradients are values used to update the weights of neural networks. The vanishing gradient problem occurs when the gradients propagated backwards during network training are multiplied by values smaller than 1 at each network layer they pass through, arriving at the initial network layers with tiny values. This causes the adjustment of weights, calculated at each iteration of network training, to be too small, and makes network training more expensive.

Thus, in RNNs the layers that receive a small gradient update stop learning. As a result, RNNs can forget what was seen in longer sequences, thus having a short-term memory.

Fig. 7 shows a typical architecture of a GRU. What basically distinguishes it from a standard RNN are the reset gate and the update gate, which, by applying the sigmoid and tanh activation functions, define whether the previous output h_{t-1} will be considered or discarded in the calculation of the new output.

The LSTNetA model was developed in the Python programming language, version 3.7 [17], using TensorFlow version 2.0 [22], the machine learning library developed by Google.

IV. RELATED WORKS

Fig. 8 shows the power consumption time series used to train and test the SARIMAX, LSTNetA and HyFIS2 models. The top graph represents the historical series of consumption in watts, which runs from zero hours on 08/04/2019 to eight hours on 20/12/2019. The middle graph shows the calculated trend of the series and the bottom graph its seasonality.

Fig. 8. Historical series of consumption.
Source: The Author.

4.1 SARIMAX
As seen previously, the SARIMAX method is a statistical method of time series analysis that enables prediction through linear regressions. Thus, it cannot be characterized as a machine learning algorithm. In the scope of this work, it was applied to obtain prediction data from a widely used model, providing results for comparison with the proposed model and with the HyFIS2 model.

To verify the accuracy of all models covered in this work, the last 120 records, corresponding to five days of consumption, were used for comparison between real and predicted consumption, shown in Fig. 9. The error used to verify the results of this work, in all models, was the root mean square error (RMSE - described in chapter 01), shown in Fig. 10. The application of this model resulted in an average RMSE of 604.72, which was taken as the accuracy of this model in this work.
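The evaluation protocol used for all models, holding out the last 120 hourly records and scoring the predictions with the RMSE, can be sketched as follows. The RMSE is the square root of the mean of the squared differences between actual and predicted values. The function names and the toy series are illustrative only; no data from this study is reproduced.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def split_holdout(series, holdout=120):
    """Return (training part, last `holdout` records kept for testing)."""
    return series[:-holdout], series[-holdout:]


# Placeholder series of 500 hourly values.
series = [float(v) for v in range(500)]
train, test = split_holdout(series)

# Stand-in for a model's predictions: every value is off by exactly 3.
forecast = [v + 3.0 for v in test]

error = rmse(test, forecast)
print(len(train), len(test), error)  # 380 120 3.0
```

A forecast that is wrong by a constant 3 units yields an RMSE of 3, which shows that the metric is expressed in the same unit as the series itself (watts in this study).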

Fig. 9. Comparison Real Consumption X Sarimax.


Source: The Author.
Fig. 10. Verified errors of the SARIMAX method (axes: Error versus Hours).
Source: The author.

4.2 Model HyFIS2
The HyFIS2 (Hybrid neural Fuzzy Inference System) model uses a hybrid approach that combines a dense ANN with fuzzy logic. The system includes five layers, as shown in Fig. 11. In the first layer, the nodes are the inputs that transmit signals to the next layer. In the second and fourth layers, the nodes act as membership functions to express the input-output fuzzy linguistic variables. In these layers, the fuzzy sets defined for the input-output variables are represented as large (L), medium (M) and small (S). However, for some applications, these can be more specific and represented as, for example, large positive (LP), small positive (SP), zero (ZE), small negative (SN) and large negative (LN). In the third layer, each node is a rule node and represents a fuzzy rule. The connection weights between the third and the fourth layer represent certainty factors of the associated rules, i.e., each rule is activated and controlled by the weight values. Finally, the fifth layer contains the node that represents the output of the system.

Fig. 11. Neuro-Fuzzy structure of the HyFIS2 model.
Source: Jozi [9]

For the prediction of electricity consumption, as in all models tested, the last 120 historical records were used, corresponding to five days of consumption. The comparison between real and predicted consumption is shown in Fig. 12. Fig. 13 shows the RMSE errors calculated. The application of this model resulted in an average RMSE of 602.71, which was taken as the accuracy of this model in this work.

Fig. 12. Real Consumption Comparison X HyFIS2.
Source: The Author.

Fig. 13. Verified errors of the HyFIS2 model (axes: Error versus Hours).
Source: The Author.
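The role of the membership functions in the second and fourth layers of HyFIS2 can be illustrated by mapping a consumption value to degrees of membership in the small (S), medium (M) and large (L) fuzzy sets. The triangular shape, the 0 to 8000 W range and the function names are assumptions made for this sketch; the source does not specify the membership functions actually used.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(consumption, low=0.0, high=8000.0):
    """Map a crisp value to membership degrees in S, M and L (assumed range)."""
    mid = (low + high) / 2.0
    span = high - low
    return {
        "S": triangular(consumption, low - span / 2.0, low, mid),
        "M": triangular(consumption, low, mid, high),
        "L": triangular(consumption, mid, high, high + span / 2.0),
    }


degrees = fuzzify(5000.0)
print(degrees)  # {'S': 0.0, 'M': 0.75, 'L': 0.25}
```

Under these assumed sets, a reading of 5000 W is mostly "medium" and partly "large", and it is such graded memberships that the rule nodes of the third layer would combine.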
V. APPLICATION OF THE LSTNETA MODEL
The training of the LSTNetA ANN was performed as previously described, using the data of the real electricity consumption of Building N of ISEP/IPP, where GECAD is located, except for the laboratory sector. The historical series analyzed ran from zero hours on 08/04/2019 to eight hours on 20/12/2019, with measurements every ten seconds, totaled every hour, resulting in 4186 records containing time and consumption. The training was performed with a learning rate of 0.0003, using the Adam [10] stochastic gradient descent optimization method for updating the weights in the backpropagation process. For the initial weights of the ANN, the VarianceScaling algorithm [3] was used, which generates initial weights with values on the same scale as the inputs. The convolution kernel used was a 6x6 identity matrix, and a training loop with 1000 epochs was performed. All these parameters were obtained experimentally, and the ones with the best results were selected.

Fig. 14. Comparison Real Consumption X LSTNetA.
Source: The Author.

For the prediction of electricity consumption, as in all models tested, the last 120 historical records were used, corresponding to five days of consumption. The comparison between real and predicted consumption is shown in Fig. 14. Fig. 15 shows the RMSE errors calculated. The application of this model resulted in an average RMSE of 198.44, which was taken as the accuracy of this model in this work.

Fig. 15. Verified errors of the LSTNetA model.
Source: The Author.

VI. RESULTS AND CONCLUSION
Table 1 shows a fragment of the results of the three models. The Date and Time column gives the timestamp; the Actual Consumption column shows the actual electricity consumption in watts at that date and time; the LSTNetA column shows the prediction of this model at that date and time, and the Error - LSTNetA column the absolute error of this prediction; the HyFIS2 column shows the prediction of this model at that date and time, and the Error - HyFIS2 column the absolute error of this prediction; finally, the SARIMAX and Error - SARIMAX columns represent the prediction and the absolute error, respectively, of the SARIMAX model.

Comparing the results of the SARIMAX, HyFIS2 and LSTNetA models, it can be observed, as shown in Fig. 16, that the LSTNetA method, with the data used for testing, was the one whose predictions came closest to the real consumption of electricity: the red line, which represents the predictions of the LSTNetA model, overlaps the blue line, which represents the real consumption, for most of the period. This demonstrates a prediction very close to the real consumption value, with low errors.

Table 1. Fragment of Predictions and Errors of the 3 models

Date and Time     Actual Consumption  LSTNetA  Error - LSTNetA  HyFIS2   Error - HyFIS2  SARIMAX  Error - SARIMAX
19/12/2019 09:00  4759.38             4824.27  64.8900          3427.13  1332.2500       4721.76  37.6190
19/12/2019 10:00  6781.51             6685.28  96.2346          6583.38  198.1300        5516.26  1265.2476
19/12/2019 11:00  7279.1              7194.26  84.8373          5798.56  1480.5400       6124.20  1154.8976
19/12/2019 12:00  6332.88             6247.08  85.8038          5798.38  534.5000        5497.10  835.7849
19/12/2019 13:00  5350.34             5569.95  219.6063         6322.98  972.6400        5653.27  302.9276
19/12/2019 14:00  6677.56             6499.50  178.0639         5798.37  879.1900        5197.56  1479.9983

Fig. 16. Comparison of Real Consumption X Prediction Models.
Source: The Author.

Fig. 17 represents the errors (RMSE) of the three models, allowing a comparison of the accuracy of the predictions of each method and leading to the conclusion that the LSTNetA method performed better in its predictions than the SARIMAX and HyFIS2 methods. This statement is corroborated by the data presented in Table 2, where the total average error of the LSTNetA model is significantly lower than that of the other models.
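The size of the gap between the models can be quantified from the RMSE values reported in Table 2. The short calculation below expresses the LSTNetA error as a relative reduction with respect to each of the other two models; the helper name `relative_reduction` is introduced here only for illustration.

```python
# RMSE values as reported in Table 2.
rmse_by_model = {"LSTNetA": 198.4496, "HyFIS2": 602.7109, "SARIMAX": 604.5810}


def relative_reduction(baseline, candidate):
    """Fraction by which the candidate error is lower than the baseline error."""
    return 1.0 - candidate / baseline


reductions = {
    name: relative_reduction(value, rmse_by_model["LSTNetA"])
    for name, value in rmse_by_model.items()
    if name != "LSTNetA"
}

for name, reduction in reductions.items():
    print(f"LSTNetA RMSE is {reduction:.1%} lower than {name}")
```

On these figures, the LSTNetA error is about 67% lower than that of either HyFIS2 or SARIMAX, that is, roughly one third of their RMSE.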


Fig. 17. Comparisons of errors verified in all models.
Source: The Author.

Table 2. RMSE of the 3 Models Tested

       Error - LSTNetA  Error - HyFIS2  Error - SARIMAX
RMSE   198.4496         602.7109        604.5810

REFERENCES
[1] Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning, December 2014.
[2] Das, U.K., Tey, K.S., Seyedmahmoudian, M., Mekhilef, S., Idris, M.Y.I., Van Deventer, W., Horan, B., and Stojcevski, A. (2018). Forecasting of photovoltaic power generation and model optimization: A review. Renewable and Sustainable Energy Reviews, 81, 912-928.
[3] He, K. et al. (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
[4] Lai, G., Chang, W.C., Yang, Y., and Liu, H. (2018). Modeling long- and short-term temporal patterns with deep neural networks. 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, 95-104.
[5] Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and Alsaadi, F.E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11-26.
[6] Lo Brano, V., Ciulla, G., and Di Falco, M. (2014). Artificial neural networks to predict the power output of a PV panel. International Journal of Photoenergy, 2014.
[7] Lusis, P., Khalilpour, K.R., Andrew, L., and Liebman, A. (2017). Short-term residential load forecasting: Impact of calendar effects and forecast granularity. Applied Energy, 205, 654-669.
[8] Hammer, Barbara (2007). Learning with Recurrent Neural Networks. London: Springer.
[9] Jozi, A. et al. (2016). Energy Consumption Forecasting based on Hybrid Neural Fuzzy Inference System. 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 6-9 Dec. 2016.
[10] Kingma, D. P. and Ba, J. Adam: A Method for Stochastic Optimization. ICLR 2015. Available at: https://arxiv.org/abs/1412.6980.pdf. Accessed on: 01/03/2021.
[11] Marino, D.L., Amarasinghe, K., and Manic, M. (2016). Building energy load forecasting using Deep Neural Networks. IECON Proceedings (Industrial Electronics Conference), 7046-7051.
[12] Massucco, S., Mosaico, G., Saviozzi, M., and Silvestro, F. (2019). A hybrid technique for day-ahead PV generation forecasting using clear-sky models or ensemble of artificial neural networks according to a decision tree approach. Energies, 12(7).
[13] Montavon, G., Samek, W., and Müller, K.R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing: A Review Journal, 73, 1-15.
[14] Morettin, P.A. and Toloi, C. M. C. Análise de séries temporais. 2. ed. São Paulo: Edgard Blucher, 2006.
[15] Mosaico, G. and Saviozzi, M. (2019). A hybrid methodology for the day-ahead PV forecasting exploiting a clear sky model or artificial neural networks. In IEEE EUROCON 2019 - 18th International Conference on Smart Technologies, 1-6.
[16] Pelletier, C. et al. Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series. MDPI - Remote Sensing - Open Access Journal. Available at: https://arxiv.org/pdf/1811.10166.pdf. Accessed on: 01/03/2021.
[17] Python (2021). Python is a programming language that lets you work quickly and integrate systems more effectively. Available at: https://www.python.org. Accessed on: 01/03/2021.
[18] Reddy, K.S. and Ranjan, M. (2003). Solar resource estimation using artificial neural networks and comparison with other correlation models. Energy Conversion and Management, 44(15), 2519-2530.
[19] SARIMAX (2021). SARIMAX: Introduction. Available at: https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html. Accessed on: 01/03/2021.
[20] Spiegel, Murray Ralph (1974). Statistics. Brasília: McGraw-Hill do Brasil, 1974.
[21] Su, D., Batzelis, E., and Pal, B. (2019). Machine learning algorithms in forecasting of photovoltaic power generation. In 2019 International Conference on Smart Energy Systems and Technologies (SEST), 1-6.
[22] TensorFlow (2021). A complete open source platform for machine learning. Available at: https://www.tensorflow.org. Accessed on: 01/03/2021.
[23] Theocharides, S., Makrides, G., Venizelou, V., Kaimakis, P., and Georghiou, G.E. (2017). PV Production Forecasting Model Based on Artificial Neural Networks (ANN). 33rd European Photovoltaic Solar Energy Conference, 1830-1894.
[24] Yang, J. et al. Deep convolutional neural networks on multichannel time series for human activity recognition. In IJCAI, vol. 15, 2015, pp. 3995-4001.
[25] Yi, H., Shiyu, S., Duan, X., and Chen, Z. (2017). A study on Deep Neural Networks framework. Proceedings of 2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference, IMCEC 2016, 1519-1522.
[26] Zhang, G. P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50:159-175, 2003.
