On Short Term Load Forecasting Using Mac
March 1, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3060290
ABSTRACT Since electricity plays a crucial role in countries’ industrial infrastructures, power companies
are trying to monitor and control infrastructures to improve energy management and scheduling. Accurate
forecasting is a critical task for a stable and efficient energy supply, where load and supply are matched. This
article discusses various algorithms and a new hybrid deep learning model, which combines long short-term
memory (LSTM) networks and a convolutional neural network (CNN), and analyzes their performance
for short-term load forecasting. The proposed model is called the parallel LSTM-CNN Network, or PLCNet.
Two real-world data sets, namely ‘‘hourly load consumption of Malaysia’’ as well as ‘‘daily power electric
consumption of Germany’’, are used to test and compare the presented models. To evaluate the tested models’
performance, root mean squared error (RMSE), mean absolute percentage error (MAPE), and R-squared
are used. This article is divided into two parts. In the first part, different machine learning models,
including the PLCNet, predict the next time step load. In the second part, the model that achieved
the most accurate results in the first part is evaluated over different time horizons. The results show
that deep neural network models, especially PLCNet, are good candidates for use as short-term
prediction tools. PLCNet improved the accuracy from 83.17% to 91.18% for the German data and achieved
98.23% accuracy on the Malaysian data, which is an excellent result in load forecasting.
INDEX TERMS Electricity, smart grids, load consumption, short-term load forecasting, deep learning, time
series, regression, convolutional neural networks, long short-term memory.
NOMENCLATURE
ANN : Artificial Neural Network
ARIMA : Autoregressive Integrated Moving Average
CNN : Convolutional Neural Network
DNN : Deep Neural Network
ETS : Exponential Smoothing
LSTM : Long Short-Term Memory
LTLF : Long-Term Load Forecasting
MLR : Multiple Linear Regression
MTLF : Medium-Term Load Forecasting
RNN : Recurrent Neural Network
SARIMA : Seasonal ARIMA
STLF : Short-Term Load Forecasting
SVR : Support Vector Regression

The associate editor coordinating the review of this manuscript and approving it for publication was Li Zhang.

I. INTRODUCTION
According to the IEA report [1], in 2017, world electricity consumption reached 21,372 TWh, which is 2.6% higher than 2016 electricity consumption. Such an annual increase creates a new problem: how to reduce consumption? Nowadays, many companies are working on this problem and trying to solve it. Demand Response Management, which is one of the main features in smart grids [2], helps to control electricity consumption with the focus on the customer side. It is also essential to understand residential and non-residential building demand and the use of electricity. A reduction in load consumption can lead to many economic and environmental benefits. Since experts aim to create automated tools that are able to deliver energy very efficiently, they introduced load forecasting methods as alternative solutions for electricity network augmentation, as they can be useful to manage the electricity demand and
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 31191
B. Farsi et al.: On Short-Term Load Forecasting Using Machine Learning Techniques and a Novel Parallel Deep LSTM-CNN Approach
provide more energy efficiency [3]. In addition, improving power delivery quality and having secure networks is a critical task in smart grids to monitor and support advanced power distribution systems [4] and, in particular, to improve load forecasting. Since load forecasting methods predict future consumption, they can be considered tools to minimize the gap between electricity supply and user consumption. However, an inaccurate prediction may lead to a huge loss. For instance, it was estimated in 1985 that a small percentage increase in forecast error led to a yearly loss of more than 10 million pounds in the UK thermal power system [5]. Thus, many big companies have focused on accurate load forecasting and load management; the Energy Supply Association of Australia, for instance, invested about 80% of its budget in grid upgrades.

Load forecasting approaches are categorized into three groups according to their purposes [7]: Short-term Load Forecasting (STLF) [8], Medium-term Load Forecasting (MTLF), and Long-term Load Forecasting (LTLF) [9]. STLF covers horizons from the next hour to the next week, MTLF covers horizons from more than one week to a few months, and LTLF forecasts the load consumption of the following years. For each of these methods, there are diverse factors that influence the prediction.

STLF approaches are of tremendous importance in energy management. They have been used to provide proper management of electric equipment, and because of this contribution, they are regarded as an indispensable component of Energy Management Systems. An error in STLF can have an immediate impact on electrical equipment. Several factors affect STLF, including the following: (1) Time [6], which is the most crucial factor for STLF because of the existence of patterns, such as daily patterns, in the data. (2) Climate, which includes temperature and humidity [6]. (3) Holidays, which can make considerable changes in electricity demand. However, this article focuses on time as a factor that influences electricity usage and can help achieve accurate predictions.

Besides, diverse approaches can be applied to time-series data to carry out accurate short-term forecasting. These approaches consist of statistical regression models, classic time-series models, and deep learning models. However, in addition to the factors mentioned earlier, there are other factors, such as the size of the house, the age of appliances and equipment [6], and global factors like diseases, which can affect the load prediction for medium-term and long-term forecasting. Still, most approaches have the same attributes with some subtle differences. Load consumption data sets can be viewed as time series. Time series have specific attributes, such as Trend, Seasonality, and Noise, which will be discussed later. Due to numerous challenging problems when dealing with time-series data, researchers deployed Artificial Neural Networks (ANN) [18], whose structure is inspired by the human brain. They can be used in various areas such as Natural Language Processing (NLP) [10], audio recognition [11], medical applications [12], and load forecasting [19], and they have achieved impressive results. One of the challenging problems of ANNs is that they need large-scale data for training. Therefore, in some cases, regression-based approaches can be more useful.

Different algorithms for load forecasting have been studied so far. The authors in [6] prepared an overview of different types of load forecasting methods. They focused on the difference between short-term, medium-term, and long-term load forecasting and the factors which affect them. The authors in [14] presented a review of load forecasting with a focus on regression-based approaches. To forecast day-ahead hourly electricity load, they used two particular data sets from the University of New South Wales, one from the Kensington Campus and the Tyree Energy Technologies Building. The authors in [7] surveyed different deep learning models for short-term load forecasting. They evaluated seven deep learning models on three real-world data sets. They showed that there is no correlation between the complexity of models and the best results. In some cases, simpler models can even achieve better results in short-term load forecasting. The authors in [15] proposed a hybrid deep neural network consisting of a CNN module, an LSTM module, and a feature-fusion module and tested it on the 2-year electric load data sets from the Italy-North area. Their model was compared with some other machine learning models, including Decision Tree, Random Forest, DeepEnergy, LSTM, and CNN, and achieved better results. However, even though they achieved good results with their proposed model, the data set they used was not challenging. The authors did not challenge the model with a more complex data set or with prediction over different time horizons. The author in [16] proposed a parallel CNN-RNN model to predict load consumption one day ahead. Temperatures, holidays, hours of day, and days of the week were used as features alongside the historical load series, and the model achieved better results than regression-based models, DNN, and CNN-RNN. However, due to the vanishing gradient problem in RNNs, they are not suitable enough to be used in load forecasting applications, and LSTM networks should replace them. RNNs and the specific type of their family, LSTM, use control theory in their structure. They can find the dependency between old data and new data and have become an interesting network for load forecasting applications in recent years. RNN models are studied in detail in [17]. The authors in [20] proposed a new DeepEnergy model that combines a 1-D CNN to extract the features and a fully connected network to forecast future load data. To forecast the next three days’ data, they used an hourly electricity consumption data set from the USA. They compared the proposed model’s result with five other machine learning techniques through RMSE and MAPE. The results showed that the DeepEnergy model could carry out accurate short-term load forecasting compared to other models. After the DeepEnergy model, the Random Forest technique [21] had a good performance. However, as there is no LSTM network in this model, it will have some difficulty working with more complex time-series data, and it can be expected that this model
FIGURE 3. ACF plot for an example data set. The X axis shows the number of lags; the Y axis shows the amount of auto-correlation.
FIGURE 4. PACF plot for the same data as in figure 3.
Seasonal ARIMA, or SARIMA, is another statistical model that is widely used for seasonal data. In addition to the ARIMA parameters (P, D, Q), the seasonal part of the model has four further parameters: p, d, q, and m. As in ARIMA, p represents the autoregressive order of the seasonal part, d represents its order of integration, and q represents its moving-average order. Besides, m is the length of the seasonal period. For example, m is 24 for hourly data and 7 for daily data. Therefore, the SARIMA formulation is usually presented as SARIMA(P, D, Q)(p, d, q, m).

2) EXPONENTIAL SMOOTHING
Exponential Smoothing (ETS) is a well-known time-series forecasting model for power systems. It can be used as an alternative to ARIMA models, and it can be applied to STLF, MTLF, and LTLF. It uses a weighted sum of past observations to make the prediction. The difference between ETS and ARIMA models is that ETS uses exponentially decreasing weights for previous observations, so recent observations have a higher weight than older ones. Therefore, the accuracy depends on the chosen coefficients. The authors in [29] studied exponential smoothing for load forecasting using different coefficients. They used six different data sets collected from China to evaluate their model and, as expected, obtained a wide range of MAPE values for different coefficient values. Various types of ETS models are used, depending on the complexity of the data. Equation (3) gives the formula of simple Exponential Smoothing:

$$F_{t+1} = \alpha A_t + (1 - \alpha) F_t \tag{3}$$

where F_t and F_{t+1} are the predicted values at times t and t+1, respectively, A_t is the actual value at time t, and α is the smoothing factor (0 ≤ α ≤ 1).

3) LINEAR REGRESSION
Regression-based approaches are interesting techniques, and among them, linear regression plays an essential role. Some studies have used linear regression for time series or specifically for load forecasting. The author in [30] studied STLF for the RGUKT, R.K. Valley campus and achieved MAPE = 0.029 and RMSE = 2.453. In another study, the authors in [31] applied different linear regression models, including multiple linear regression (MLR), Lasso, and Ridge, to hourly load data.

Linear regression is a statistical method to find the relation among variables. This method is useful to estimate a variable from the parameters that influence it. The most straightforward linear regression equation is as below:

$$Y_i = \beta_0 + \beta_1 X_i + \mu_i \tag{4}$$

where Y is the dependent variable, β_0 is the intercept, β_1 is the slope, X is the independent variable, and μ_i is the residual of the model, which is distributed with zero mean and constant variance. When the number of independent variables is increased, this model is called multiple linear regression (MLR). In order to evaluate this model, the Least Squared Error (LSE) technique is used. The primary goal is to find the coefficients that minimize the LSE. LSE evaluates the model by summing the squared errors between two variables, which in this case are the actual values and the forecasted ones. Equation (5) shows the LSE formula:

$$LSE = \sum_{i=1}^{n} (Y_i - X_i)^2 \tag{5}$$

where, in this equation, X is the predicted value and Y is the actual value.

In order to use linear regression for load forecasting, parameters such as temperature, humidity, and time need to be used as independent variables, and the load consumption data are used as the dependent variable. With this approach, it is possible to use linear regression to forecast future load consumption. However, there are ways to forecast load consumption without using exogenous variables. Lags of the load series can be used as the independent variables instead. Usually, more than one lag is used as an independent variable, so MLR is used instead of simple linear regression. The AC plot is a useful tool for time-series analysis with linear regression. In this approach, those lags whose auto-correlation values are higher than a certain threshold can be used as independent variables in linear regression. For instance, according to figure 3, lags [1, 2, 3, 24, 25] are chosen as independent variables with a threshold of 0.6. In total, in this model, lags are the independent variables, and actual load consumption is the dependent variable. An alternative way to increase the model’s accuracy is to add exogenous variables such as humidity, holiday, and weather to the model. Figure 6 shows the process of preparing data and choosing parameters for linear regression models. This approach is also used for the SVR and fully connected models. According to the diagram, data preparation refers to finding missing values and data normalization. As discussed, after plotting the AC plot, those lags with a higher auto-correlation value than the selected threshold are used as the models’ parameters. After choosing lags and parameters, the models can make predictions; however, if they are not accurate enough, the threshold must be changed and the process is restarted.

4) SUPPORT VECTOR REGRESSION (SVR)
Support vector machine (SVM) is an approach that is used for classification and regression problems. SVM has become an exciting model among machine learning techniques due to its ability in different tasks such as text or image analysis. For instance, the authors in [14] studied SVM for supervised learning methods. The first objective of SVM was classification. Nonetheless, this model was later extended to regression problems, where it is called support vector regression (SVR). SVR has the same procedure as SVM, with some differences. This model’s objective is to find the most appropriate hyperplane with minimum acceptable
FIGURE 11. The workflow of the PLCNet model.
2 to 25, and so on. Likewise, due to the time horizon of the prediction, the target data can be different.

III. EXPERIMENTAL RESULTS
The Malaysian data is divided into two sets: the training set, which contains the year 2009 load consumption, and the test set, which contains the year 2010 load consumption. The German data set is also divided, so that the 2012-2015 data are used as the training set and the 2016-2017 data are used as the test set. All the models are implemented in Python. This article used the Keras library with the TensorFlow back end to implement deep neural networks (DNN). Besides, the Scikit-learn, Statsmodels, and Pmdarima libraries were used for regression and time-series modeling and analysis.

A. CASE STUDIES
Two different data sets are used to carry out STLF. The authors in [25] used load consumption of the city of Johor in Malaysia to predict day-ahead load consumption (hourly prediction) using a model that combines a neural network and fuzzy time series. Their new model was a combination of fuzzy time series and CNN (FTS-CNN). They first created a sparse matrix through fuzzy logic and then, through CNN, extracted features and carried out STLF. They also tried other models, including SARIMA, different LSTM models, different probabilistic weighted fuzzy time series, and weighted fuzzy time series. Their proposed model (FTS-CNN) achieved better results than the other models for two different years of Malaysia data, with RMSE values of 1777.99 and 1702.70, respectively. This data is from a power company in this city for the years 2009 and 2010 and consists of hourly electric consumption in MW. It has 17518 rows, which show the aggregated load consumption of these two years in this city. Figure 12 illustrates part of this hourly data, and figure 13 shows a Boxplot of how the loads of the whole data set are distributed among the days of a week.
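The chronological split described in Section III (one calendar year for training, the following year for testing) can be sketched as follows. This is a minimal illustration only: the hourly series is synthetic, and the helper names `hourly_timestamps` and `split_by_year` are hypothetical and do not come from the article.

```python
from datetime import datetime, timedelta


def hourly_timestamps(start, end):
    """Yield hourly timestamps in the half-open interval [start, end)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)


def split_by_year(rows, train_years, test_years):
    """Split (timestamp, load) rows chronologically by calendar year."""
    train = [row for row in rows if row[0].year in train_years]
    test = [row for row in rows if row[0].year in test_years]
    return train, test


# Synthetic placeholder values standing in for an hourly load series in MW.
records = [
    (ts, 30000.0 + 100.0 * ts.hour)
    for ts in hourly_timestamps(datetime(2009, 1, 1), datetime(2011, 1, 1))
]

# Year 2009 for training, year 2010 for testing, as in the Malaysian case.
train, test = split_by_year(records, {2009}, {2010})
```

Splitting by time rather than at random keeps every test sample strictly later than every training sample, which is the property a forecasting evaluation needs.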
FIGURE 14. The illustration of German load data.
FIGURE 16. Part of the decomposition of Malaysian data.
The second data set is Germany's country-wide daily aggregated electric consumption from 2006 to 2017, in GWh. This data is provided by Open Power System Data (OPSD) and is used to predict day-ahead load consumption. It contains 2186 records of electric consumption in Germany. Figure 14 shows part of the German load data for almost 9 months, and figure 15 shows the Boxplot of this data during a week.

Parts of the Malaysian data and the German data have been decomposed into seasonal, trend, and noise components. Figures 16 and 17 show the original data and their decomposition. The black plots in both figures show the seasonal part of each data set.

B. DATA NORMALIZATION
Practical experiments have shown that data should be prepared well to work with deep learning models [50], and that pre-processing is more significant than the training process. As discussed before, in load forecasting, even though parameters such as holidays, temperature, and humidity affect the model, the article’s goal is to carry out STLF using only previous load consumption data. Therefore, data must be prepared specifically for each model. Data are scaled between 0 and 1 through equation (22), which is written as follows:

$$X_{sc} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{22}$$

C. EVALUATION METRICS
In order to evaluate the models' performance, root mean squared error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R^2) are used.

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (A_i - F_i)^2} \tag{23}$$

$$MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{A_i - F_i}{A_i} \right| \tag{24}$$
FIGURE 18. Actual and predicted results from SARIMA, Malaysian data.
FIGURE 17. Part of the decomposition of German data.
$$R^2 = 1 - \frac{SSR}{SST} \tag{25}$$

$$SSR = \sum_{i=1}^{N} (A_i - F_i)^2 \tag{26}$$

$$SST = \sum_{i=1}^{N} (A_i - \bar{A})^2 \tag{27}$$
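Equations (22) to (27) can be sketched in plain Python as below. This is an illustrative sketch rather than the article's implementation: the function names are hypothetical, `actual` and `forecast` play the roles of A and F, and the sample numbers are made up.

```python
import math


def min_max_scale(values):
    """Equation (22): scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def rmse(actual, forecast):
    """Equation (23): root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Equation (24): mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def r_squared(actual, forecast):
    """Equations (25)-(27): coefficient of determination, 1 - SSR / SST."""
    mean_actual = sum(actual) / len(actual)
    ssr = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ssr / sst


# Made-up example values, not taken from either data set.
example_actual = [100.0, 200.0, 300.0, 400.0]
example_forecast = [110.0, 190.0, 310.0, 390.0]
example_scores = (
    rmse(example_actual, example_forecast),
    mape(example_actual, example_forecast),
    r_squared(example_actual, example_forecast),
)
```

Note that MAPE is undefined when an actual value is zero, so this sketch assumes strictly non-zero loads, and that RMSE depends on whether it is computed on scaled or unscaled values.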
FIGURE 20. Actual and predicted results from ETS, Malaysian data.
FIGURE 22. AC plot of Malaysian data.
FIGURE 21. Actual and predicted results from ETS, German data.
FIGURE 23. Actual and predicted results from linear regression, Malaysian data.
day of 2010 and test set has the shape of (8723,10) from first day of 2010 to the end of this year.

Figure 23 shows the predicted and actual load consumption for the year 2010 in Malaysia. As can be seen, linear regression achieved accurate results for this data set.

However, for the German data set, there are some differences. The first difference is that 0.69 is chosen for the threshold. Figure 24 shows the AC plot of the German data. According to this plot and threshold, lags [7, 14, 21, 28] are used as independent variables for MLR. Historical data from 2012 to the end of 2015 are used as the training set. The shape of the training data is (1456,4), and the test set is from 2016 to the end of 2017 with a shape of (702,4). As this data is daily, the model predicted daily load consumption, but not as well as for the Malaysian data. Figure 25 shows the actual and predicted results from the test set. According to these figures, while linear regression can predict hourly load series accurately, it fails to accurately forecast future load consumption of daily load series. The difference in the number of lags used as variables explains why the results are not similar. The number of variables (lags) for the Malaysian data is 10, while it is 4 for the German data. This point is the main weakness of linear regression. Even though this model is fast, it fails to achieve accurate results if there is not much auto-correlation in the input data.

4) THE EVALUATION OF SVR
As SVR is a regression-based approach, the same training and test sets used for linear regression in the previous section are used to evaluate the model. Various parameters affect how well SVR performs. Among all these parameters, choosing the appropriate kernel is the most important. For the Malaysian data, the ’linear’ kernel had the best performance compared to other kernels, while for the German case the ’radial basis function (rbf)’ kernel is selected because it performs well on this data set. In addition, figures 26 and 27 show the predicted load consumption from SVR for both data sets, respectively.
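The lag-based inputs shared by the linear regression and SVR models above can be sketched as follows: compute the auto-correlation at each lag, keep the lags above the threshold, and build the matrix of lagged values. This is a sketch under stated assumptions: the function names are hypothetical, the biased sample auto-correlation estimator is an assumption (the article does not state which estimator was used), and the weekly-periodic series is synthetic rather than the German data.

```python
def autocorrelation(series, lag):
    """Biased sample auto-correlation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def select_lags(series, max_lag, threshold):
    """Keep every lag whose auto-correlation exceeds the threshold."""
    return [k for k in range(1, max_lag + 1)
            if autocorrelation(series, k) > threshold]


def build_lag_matrix(series, lags):
    """Return (X, y): each row of X holds the lagged values for one target in y."""
    start = max(lags)
    X = [[series[t - k] for k in lags] for t in range(start, len(series))]
    y = [series[t] for t in range(start, len(series))]
    return X, y


# Synthetic daily series with a weekly pattern (placeholder values only).
weekly_pattern = [10.0, 12.0, 13.0, 12.0, 11.0, 5.0, 4.0]
series = weekly_pattern * 60

lags = select_lags(series, max_lag=28, threshold=0.69)
X, y = build_lag_matrix(series, lags)
```

On a series with a strong weekly cycle, only multiples of 7 pass a threshold of this size, which mirrors the lag set reported for the German data. If the resulting model is not accurate enough, the threshold is changed and the selection is repeated, as described for figure 6.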
FIGURE 26. Actual and predicted results from SVR, Malaysian data.
FIGURE 24. AC plot of German data.
FIGURE 27. Actual and predicted results from SVR, German data.
FIGURE 25. Actual and predicted results from linear regression, German data.
As seen from the figures and tables, SVR predicted the future load consumption of Malaysia with less accuracy than linear regression. However, SVR achieved more accurate results than linear regression for the German data. SVR has better results than linear regression on the German data because only 4 lags are used as independent variables, which is not enough for linear regression. Nevertheless, it can be concluded that SVR is not the right candidate for STLF.

5) THE EVALUATION OF FULLY CONNECTED NEURAL NETWORKS
A fully connected neural network has been used with the same training and testing sets used in the two previous sections. This network has 3 hidden layers in addition to the input and output layers. The first layer is the input layer, and the hidden layers are dense layers with 27, 18, and 18 units, respectively. To avoid overfitting, a dropout (20%) layer is used before the output layer. As this network is supposed to forecast load consumption, a single dense unit is used in the output layer. This model learns its parameters through the ADAM optimizer in 20 epochs, and the size of each batch is 1. Besides, ReLU is used as the activation function for all layers. ReLU stands for Rectified Linear Unit, and it works like a linear function except that its output for negative inputs is zero. This attribute helps DNN models to avoid the vanishing gradient problem. The mathematical formula is given below, while figure 28 shows this function:

$$F(x) = \max(0, x) \tag{28}$$

As could be expected, the results from the fully connected neural network with ReLU (see figures 29 and 30) are close to the results from linear regression, while this model is more complex and needs more time to forecast future load consumption, especially since the training and test sets are the same for both models.
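The forward pass of the fully connected network described above (hidden widths 27, 18, and 18, one output unit, ReLU as in equation (28)) can be sketched in plain Python. This is only an illustration of the architecture's shape: the weights are random rather than trained, the input width of 10 is borrowed from the 10 lags of the Malaysian case, applying ReLU to the output unit follows the statement that ReLU is used for all layers, and dropout is omitted because it is only active during training. All function names are hypothetical; the article itself used Keras.

```python
import random


def relu(x):
    """Equation (28): F(x) = max(0, x)."""
    return max(0.0, x)


def init_layer(n_in, n_out, rng):
    """Create small random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(n_inputs, hidden_sizes, rng):
    """Stack dense layers: inputs -> hidden layers -> one output unit."""
    sizes = [n_inputs] + list(hidden_sizes) + [1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(network, inputs):
    """Propagate one sample through every dense layer, applying ReLU each time."""
    activations = list(inputs)
    for weights, biases in network:
        activations = [
            relu(sum(w * x for w, x in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]


rng = random.Random(0)
network = build_network(10, (27, 18, 18), rng)
prediction = forward(network, [0.5] * 10)  # inputs scaled to [0, 1]
```

Because ReLU is piecewise linear, a small network of this kind behaves much like a linear model over the region where its units stay active, which is consistent with the observation that its results are close to those of linear regression.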
FIGURE 28. ReLU illustrative function.
FIGURE 30. Actual and predicted results from fully connected, German data.
FIGURE 32. Actual and predicted load consumption from LSTM, German data.
FIGURE 34. Actual and predicted results from CNN-LSTM, German data.
FIGURE 35. Actual and predicted results from PLCNet, Malaysian data.
FIGURE 33. Actual and predicted results from CNN-LSTM, Malaysian data.
the activation function. As in the other DNN models, the ADAM optimizer is used to compile the model for the Malaysian data, and the RMSprop optimizer is used for the German data. Figures 33 and 34 show the results of the data forecasted with the CNN-LSTM model.

As mentioned before, the PLCNet includes two different parallel paths, a CNN path and an LSTM path, and these two paths are fed simultaneously with historical load data. According to figures 35 and 36, the model has a good performance for both data sets.

E. RESULTS
The detailed experimental results are presented numerically in tables 2 and 3. As shown in these two tables, the PLCNet’s MAPE and RMSE are the smallest, while its R^2 score is the highest. Regarding the largest error value, ETS has the highest MAPE and RMSE in both the German and Malaysian data sets, with 0.36 and 8.81 for RMSE and MAPE on the Malaysian data and 0.316 and 33.63 for RMSE and MAPE on the German data. According to the MAPE and RMSE values, the short-term electric load forecasting accuracy of the tested models in descending order is as follows: the PLCNet, LSTM-CNN, LSTM, ARIMA, linear regression, DNN, SVR, and ETS.

Besides, it can be seen in the figures that the PLCNet has performed far better than the other models, especially on the German data set. Since the Malaysian data is hourly, many samples are available; thus, all the models can be trained well, while the German data is daily, so all the models have been trained with fewer samples. This lets the PLCNet model show its strength more clearly in the German case, with an accuracy of 91.18%. After that, LSTM has performed well, and its accuracy is 83.17%. Likewise, the accuracy of the PLCNet model for the Malaysian data is the highest, too, at 98.23%. However, in this case, there is no remarkable difference between the most accurate model and the second one, LSTM-CNN, whose accuracy is 97.49%.
FIGURE 36. Actual and predicted results from PLCNet, German data.
FIGURE 37. Histogram plot of the PLCNet results for Malaysian data.
TABLE 2. Models Performance for Malaysian Data.
TABLE 3. Models Performance for German Data.
TABLE 4. The Training Time Per Epoch of Deep Learning Models.

Regarding the run time in the tables, linear regression and SVR are the fastest in the German and Malaysian cases. However, they are outperformed by deep learning models. Besides, ARIMA and ETS are two computational techniques that take significant time for training. Even though all deep learning models in the tables took much longer to train than regression-based approaches, their accuracy is acceptable. The difference between the training times of the deep learning models depends on the number of epochs considered. Table 4 indicates the runtime per epoch of each deep learning model for both the Malaysian and German data sets. According to tables 2 and 3, LSTM has the highest runtime, but the main reason is that this model needs more epochs to be trained and predict future load. It can be seen in table 4 that LSTM is faster per epoch than the LSTM-CNN and DNN models on both data sets. However, the PLCNet results show that this model has the highest accuracy and the lowest error. It is also the fastest among the deep learning models, with a runtime per epoch of 4.5 s on the Malaysian data and 0.93 s on the German data.

Therefore, the results indicate that the novel hybrid STLF algorithm proposed in this article is practical and useful. Although LSTM has a good performance when dealing with time series, its accuracy on the German data set, which does not have many samples, is not good enough. Therefore, the LSTM is not suitable for this kind of prediction. Finally, the experimental results show that the PLCNet provides the best electricity load forecasting results.

F. STATISTICAL ANALYSIS
A common approach to comparing the performance of machine learning models is to use statistical methods to select the best one. This section aims to compare the PLCNet and LSTM results, since both achieved acceptable results in both the German and Malaysian cases. To provide a statistical analysis of the PLCNet and LSTM results, these two models were run 10 times. In terms of visualization, figures 37 and 38 show the histograms of the PLCNet and LSTM results for the Malaysian data set, respectively.

In this section, the t-test is used to determine whether the achieved results are merely stochastic or can be trusted. This analysis is based on the null hypothesis. The null hypothesis is that the two models (the PLCNet and LSTM) are similar to each other, that is, there is no difference between them, while the alternative hypothesis is that the two models perform differently. The considered significance level is 5%, so if the acquired P-value is less than
FIGURE 38. Histogram plot of LSTM results for Malaysian data.
FIGURE 40. Histogram plot of LSTM results for German data.
FIGURE 39. Histogram plot of the PLCNet results for German data.

5%, the null hypothesis can be rejected. For the Malaysian data, the obtained P-value is 0.0411 (or 4.11%), which is less than 5%, so it can be concluded that the PLCNet performs better than the LSTM model. Likewise, after applying the statistical analysis to the German data case, it can be concluded that the results from the PLCNet are not stochastic, where the obtained P-value is 0.03019 (or 3.019%). Figures 39 and 40 illustrate the histogram plots of the results.

G. DIFFERENT TIME HORIZONS
Previous sections discussed some machine learning techniques to predict the load at the next time step. In other words, at time t, they predicted the load data at time t + 1. Since the Malaysian data is hourly and the German data is daily, all the models predicted the next-hour load for the Malaysian data and the next-day load for the German data. This section aims to challenge the PLCNet over different horizons. In the Malaysian case, the PLCNet will predict the next 24 hours, next 48 hours, and next 10 days load data,

1) MALAYSIAN CASE
Since the prediction of the next-hour load through the PLCNet was discussed thoroughly in previous sections, this section studies the prediction of the next one day, 2 days, and 10 days of load data.

The Malaysian data is hourly, so to predict one day ahead, the next 24 time steps must be predicted, leading to a subtle modification in the model’s architecture. The last layer of the model, which is a dense layer, will have 24 neurons to provide the next 24 hours prediction. To predict one day of data, the model looks back over the previous 72 hours during training and then predicts the next 24 hours. Figure 41 shows the results of the prediction.

In order to forecast the next 2 days of load data, the next 48 time steps should be predicted, so another modification is needed to make the model able to forecast the next days’ load data. Thus, the model will have 48 neurons in its last layer (a dense layer). In terms of the training procedure, the model looks back over the previous 4 days (4 days × 24 h) to be trained, and then it predicts the next 2 days. Figure 42 illustrates the predicted and actual load data.

The last experiment aims to predict the next 10 days of load data, which is an MTLF task and a big challenge for the model. Since the samples have been recorded hourly in the Malaysian data, the model must predict 240 values (10 days × 24 h), so there are 240 neurons in the last layer. In the training process, the model looks back over the previous 10 days and predicts the load 10 days ahead. According to figure 43, the results show that
FIGURE 41. 1 day ahead results using the PLCNet, Malaysian data. FIGURE 43. 10 days ahead results using the PLCNet, Malaysian data.
FIGURE 42. 2 days ahead results using the PLCNet, Malaysian data.
the PLCNet has an acceptable performance in this task where TABLE 8. The Comparison Table for Malaysian Data in Terms of RMSE.
the results are close to next hour, one day and 2 days ahead
outputs.
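The look-back and horizon settings described above for the Malaysian data can be illustrated with a minimal sketch. It only shows how an hourly series could be sliced into input windows and multi-step targets; the helper name `make_windows`, the stride of one sample, and the toy series are assumptions made for illustration and are not taken from the article, which does not describe its windowing code.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], look_back: int, horizon: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slice a 1-D load series into (input window, target window) pairs.

    Each input holds `look_back` past samples and each target holds the
    `horizon` samples that follow it. A stride of one sample is assumed.
    """
    if look_back <= 0 or horizon <= 0:
        raise ValueError("look_back and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    last_start = len(series) - look_back - horizon
    for start in range(last_start + 1):
        split = start + look_back
        inputs.append(list(series[start:split]))
        targets.append(list(series[split:split + horizon]))
    return inputs, targets


# (look-back, horizon) in hours, as quoted in the text for the Malaysian data.
# The horizon equals the number of neurons in the final dense layer.
MALAYSIAN_SETTINGS = {
    "next 24 hours": (72, 24),
    "next 48 hours": (4 * 24, 48),
    "next 10 days": (10 * 24, 10 * 24),
}

if __name__ == "__main__":
    toy_series = [float(i) for i in range(600)]  # placeholder, not real load data
    for name, (look_back, horizon) in MALAYSIAN_SETTINGS.items():
        x, y = make_windows(toy_series, look_back, horizon)
        print(name, "windows:", len(x), "inputs:", len(x[0]), "outputs:", len(y[0]))
```

Under these assumptions, a longer horizon leaves fewer training windows for a series of the same length, which is one reason longer horizons are harder to learn from a small data set.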
2) RESULTS
Tables 5 and 6 show the results of the next-days predictions. To give a comprehensive view of the model’s performance, the results of the next-hour prediction are included in these tables again. In these tables, A1, A2, A3, and A4 represent the next-hour, next-day, next-2-days, and next-10-days results, respectively.

Even though forecasting load data over longer time horizons is a challenging task, according to these tables there is a difference of almost 4% between the accuracy of the next-hour prediction and that of the next-10-days prediction. This is an acceptable difference, and the model performs well in the Malaysian case.

Likewise, to verify that the PLCNet performs better than the other discussed deep learning models, including DeepEnergy [20], LSTM, and CNN-LSTM, Tables 7 and 8 compare the results of all these models over different time horizons in terms of RMSE and R² score.

3) GERMAN CASE
As with the Malaysian data, the model is challenged on the German data set, but over different time horizons. It predicts the next 7 days, next 10 days, and next 30 days of load data.

The model looks back 7 days to perform the 7-days-ahead prediction, which requires 7 neurons in its last layer. Figure 44 shows the result of using the PLCNet to forecast Germany’s next 7 days of load data.

FIGURE 44. 7 days ahead results using the PLCNet, German data.

FIGURE 45. 10 days ahead results using the PLCNet model, German data.

FIGURE 46. 30 days ahead results using the PLCNet, German data.

TABLE 11. The Comparison Table for German Data in Terms of R² Score.
For the 10-days-ahead prediction, the model must predict the next 10 days, so there are 10 neurons in the last layer of the model. Besides, since forecasting the next 10 days is somewhat harder than forecasting the next 7 days, the model looks back 10 days to better learn the pattern within the load series. The results are shown in Figure 45.

If the PLCNet can carry out an LTLF task, it can be introduced as a high-performing tool in load forecasting applications. To evaluate the model’s performance on an LTLF task, the next 30 days of German load data are predicted, and the results are shown in Figure 46. As in the previous cases, there are 30 neurons in the last layer of the model to predict the future load data.

4) RESULTS
So far, the German data set’s results have been shown graphically; the numerical results follow. The one-day-ahead prediction results are also included in Tables 9 and 10 to allow comparison between different horizons on the same data set. The labels B1, B2, B3, and B4 represent the models for 1, 7, 10, and 30 days ahead, respectively. These two tables indicate that there is little difference between the one-day-ahead prediction and the 10-days-ahead prediction. Still, predicting the next 30 days of the load series is a big challenge for the model: the average accuracy is 91.31% for the next-day prediction but 82.49% for the next-30-days prediction. The problem is that the German data set is not large. It has only 2186 recorded samples, so it is difficult for the model to learn all the parameters well while looking back 30 steps and predicting the next 30 steps.

As with the Malaysian data, comparison tables (see Tables 11 and 12) are provided for the German results to show that the PLCNet not only performs better in one-step-ahead prediction but also outperforms the other deep learning models over different time horizons.
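The comparison tables report RMSE and R² score for each horizon. As a minimal sketch of how these two metrics are conventionally defined, the functions below compute them for paired actual and predicted values. The function names and the choice to pool all predicted steps of a horizon into one flat sequence are assumptions for illustration; the article does not state how the metrics were aggregated across steps, and the sample numbers are placeholders rather than results from the paper.

```python
import math
from typing import Sequence


def _check_lengths(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing a metric."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error over paired samples."""
    _check_lengths(actual, predicted)
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check_lengths(actual, predicted)
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder values standing in for one pooled multi-step forecast.
    actual = [10.0, 12.0, 14.0, 16.0]
    forecast = [11.0, 12.0, 13.0, 16.0]
    print("RMSE:", rmse(actual, forecast))
    print("R2:", r2_score(actual, forecast))
```

Evaluating each horizon (for example B1 to B4) with the same two functions keeps the comparison between horizons and between models on a common footing.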
TABLE 12. The Comparison Table for German Data in Terms of RMSE.

IV. CONCLUSION
With smart grids on the rise, the importance of short-term load forecasting is increasing. Factors such as weather can affect the prediction of future load consumption, and the lack of future weather data is a challenging problem for load forecasting. In this article, the previous consumption was used as a parameter to predict the load one step ahead.

Some non-deep learning approaches like linear regression or ARIMA have proven to be powerful tools for accurate load forecasting. However, regression-based methods come with some disadvantages. To use these models, such as SVR and linear regression, lags are selected as parameters through auto-correlation (AC) values. As the threshold value is subjective, the number of lags used as parameters of regression-based models can differ. Fully connected networks use the same approach as regression-based methods. Because there is no fixed threshold for finding the lags that are suitable as variables, this procedure (finding lags through the AC plot) may lead to higher errors. ARIMA and ETS are two other well-known time-series analysis approaches. However, some parameters need to be tuned to work with these methods, and numerous trials are needed to find the best values for them. Furthermore, in time-series methods, the data must be analyzed to find out whether they are stationary. In contrast, LSTM can achieve good results whether the data is stationary or not. CNN-LSTM is also a hybrid model, which is used in various load forecasting studies.

The PLCNet achieves the best results among all the discussed models, with the accuracy increasing from 83.17% to 91.18% for the load data in the German case study. Likewise, for the Malaysian data set, the model’s accuracy is 98.23%, which is very high for time-series results, and the RMSE is very low at 0.031. In summary, the PLCNet improves the results remarkably for the German data set. Besides, while all the models perform acceptably on the Malaysian data, the most accurate results come from the PLCNet. It is also faster to train than the other deep learning models on both the German and Malaysian data in terms of runtime. This improvement and the highly accurate results, as well as the quick training process, show that this novel hybrid model is a good choice for STLF tasks. The PLCNet model was also evaluated over different horizons, and it performed better than the other deep learning models. The accuracy of the PLCNet in the Malaysian experiments for different horizons is between 94.16% and 98.14%. For the German data, it is between 82.49% and 91.31%, which are acceptable results compared to the other deep learning models’ results.

The interest in using artificial neural networks for electric load forecasting is gaining ground in research and industry, especially when deployed in IoT applications. According to the discussed results, deep learning models can be the right choice for IoT compared to other techniques. Thus, further work could be devoted to using deep learning models such as the PLCNet in this article for online load forecasting tasks.

REFERENCES
[1] World Energy Outlook 2019, IEA, Paris, France, 2019, doi: 10.1787/caf32f3b-en.
[2] H. Daki, A. E. Hannani, A. Aqqal, A. Haidine, and A. Dahbi, ‘‘Big data management in smart grid: Concepts, requirements and implementation,’’ J. Big Data, vol. 4, no. 1, pp. 1–19, Dec. 2017, doi: 10.1186/s40537-017-0070-y.
[3] S. Acharya, Y. Wi, and J. Lee, ‘‘Short-term load forecasting for a single household based on convolution neural networks using data augmentation,’’ Energies, vol. 12, no. 18, p. 3560, Sep. 2019.
[4] X. Fang, S. Misra, G. Xue, and D. Yang, ‘‘Smart grid—The new and improved power grid: A survey,’’ IEEE Commun. Surveys Tuts., vol. 14, no. 4, pp. 944–980, 4th Quart., 2012.
[5] J. Sharp, ‘‘Book reviews,’’ Int. J. Forecasting, vol. 2, no. 2, pp. 241–242, 1986.
[6] V. Gupta, ‘‘An overview of different types of load forecasting methods and the factors affecting the load forecasting,’’ Int. J. Res. Appl. Sci. Eng. Technol., vol. 5, no. 4, pp. 729–733, Apr. 2017.
[7] L. Ekonomou, C. A. Christodoulo, and V. Mladenov, ‘‘A short-term load forecasting method using artificial neural networks and wavelet analysis,’’ Int. J. Power Syst., vol. 1, pp. 64–68, Jul. 2016.
[8] F. Javed, N. Arshad, F. Wallin, I. Vassileva, and E. Dahlquist, ‘‘Forecasting for demand response in smart grids: An analysis on use of anthropologic and structural data and short term multiple loads forecasting,’’ Appl. Energy, vol. 96, pp. 150–160, Aug. 2012, doi: 10.1016/j.apenergy.2012.02.027.
[9] C. Xia, J. Wang, and K. Mcmenemy, ‘‘Short, medium and long term load forecasting model and virtual load forecaster based on radial basis function neural networks,’’ Int. J. Electr. Power Energy Syst., vol. 32, no. 7, pp. 743–750, Sep. 2010.
[10] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, ‘‘Convolutional sequence to sequence learning,’’ 2017, arXiv:1705.03122. [Online]. Available: http://arxiv.org/abs/1705.03122
[11] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, ‘‘WaveNet: A generative model for raw audio,’’ 2016, arXiv:1609.03499. [Online]. Available: https://arxiv.org/abs/1609.03499
[12] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, ‘‘Convolutional neural networks: An overview and application in radiology,’’ Insights Imag., vol. 9, no. 4, pp. 611–629, Aug. 2018, doi: 10.1007/s13244-018-0639-9.
[13] A. Gasparin and C. Alippi, ‘‘Deep learning for time series forecasting: The electric load case,’’ 2019, arXiv:1907.09207. [Online]. Available: https://arxiv.org/abs/1907.09207
[14] B. Yildiz, J. I. Bilbao, and A. B. Sproul, ‘‘A review and analysis of regression and machine learning models on commercial building electricity load forecasting,’’ Renew. Sustain. Energy Rev., vol. 73, pp. 1104–1122, Jun. 2017.
[15] C. Tian, J. Ma, C. Zhang, and P. Zhan, ‘‘A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network,’’ Energies, vol. 11, no. 12, p. 3493, Dec. 2018.
[16] W. He, ‘‘Load forecasting via deep neural networks,’’ Procedia Comput. Sci., vol. 122, pp. 308–314, 2017, doi: 10.1016/j.procs.2017.11.374.
[17] C. Gallicchio and A. Micheli, ‘‘Deep echo state network (DeepESN): A brief survey,’’ 2017, arXiv:1712.04323. [Online]. Available: http://arxiv.org/abs/1712.04323
[18] G. Zhang, B. E. Patuwo, and M. Y. Hu, ‘‘Forecasting with artificial neural networks: The state of the art,’’ Int. J. Forecast., vol. 14, pp. 35–62, Mar. 1998.
[19] M. Q. Raza and A. Khosravi, ‘‘A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings,’’ Renew. Sustain. Energy Rev., vol. 50, pp. 1352–1372, Oct. 2015.
[20] P.-H. Kuo and C.-J. Huang, ‘‘A high precision artificial neural networks model for short-term energy load forecasting,’’ Energies, vol. 11, no. 1, p. 213, Jan. 2018, doi: 10.3390/en11010213.
[21] A. Liaw and M. Wiener, ‘‘Classification and Regression by random Forest,’’ R News, vol. 2, no. 3, pp. 18–22, 2002.
[22] C. Gerwig, ‘‘Short term load forecasting for residential buildings—An extensive literature review,’’ in Intelligent Decision Technologies. IDT (Smart Innovation, Systems and Technologies), vol. 39, R. Neves-Silva, L. Jain, and R. Howlett, Eds. Cham, Switzerland: Springer, 2017.
[23] L. Hu and G. Taylor, ‘‘A novel hybrid technique for short-term electricity price forecasting in UK electricity markets,’’ J. Int. Council Electr. Eng., vol. 4, no. 2, pp. 114–120, Apr. 2014, doi: 10.5370/JICEE.2014.4.2.114.
[24] C.-J. Huang and P.-H. Kuo, ‘‘Multiple-input deep convolutional neural network model for short-term photovoltaic power forecasting,’’ IEEE Access, vol. 7, pp. 74822–74834, 2019, doi: 10.1109/ACCESS.2019.2921238.
[25] H. J. Sadaei, P. C. D. L. E. Silva, F. G. Guimarães, and M. H. Lee, ‘‘Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series,’’ Energy, vol. 175, pp. 365–377, May 2019.
[26] F. M. Bianchi, E. Maiorino, M. C. Kampmeyer, A. Rizzi, and R. Jenssen, Recurrent Neural Networks for Short-Term Load Forecasting (SpringerBriefs in Computer Science). Springer, 2017, doi: 10.1007/978-3-319-70338-1.
[27] R. M. Pratapa and A. Laxmi, ‘‘IOT based online load forecasting using machine learning algorithms,’’ Procedia Comput. Sci., vol. 171, pp. 551–560, 2020, doi: 10.1016/j.procs.2020.04.059.
[28] N. Amjady, ‘‘Short-term hourly load forecasting using time-series modeling with peak load estimation capability,’’ IEEE Trans. Power Syst., vol. 16, no. 4, pp. 798–805, Nov. 2001, doi: 10.1109/59.962429.
[29] P. Ji, D. Xiong, P. Wang, and J. Chen, ‘‘A study on exponential smoothing model for load forecasting,’’ in Proc. Asia–Pacific Power Energy Eng. Conf., Mar. 2012, pp. 1–4.
[30] M. D. Reddy and N. Vishali, ‘‘Load forecasting using linear regression analysis in time series model for RGUKT, RK valley campus HT feeder,’’ Int. J. Eng. Res. Technol., vol. 6, no. 5, pp. 624–625, 2017.
[31] G. Dudek, ‘‘Pattern-based local linear regression models for short-term load forecasting,’’ Electr. Power Syst. Res., vol. 130, pp. 139–147, Jan. 2016.
[32] E. Ceperic, V. Ceperic, and A. Baric, ‘‘A strategy for short-term load forecasting by support vector regression machines,’’ IEEE Trans. Power Syst., vol. 28, no. 4, pp. 4356–4364, Nov. 2013.
[33] B.-J. Chen, M.-W. Chang, and C.-J. Lin, ‘‘Load forecasting using support vector machines: A study on EUNITE competition 2001,’’ IEEE Trans. Power Syst., vol. 19, no. 4, pp. 1821–1830, Nov. 2004.
[34] H. S. Hippert, C. E. Pedreira, and R. C. Souza, ‘‘Neural networks for short-term load forecasting: A review and evaluation,’’ IEEE Trans. Power Syst., vol. 16, no. 1, pp. 44–55, Feb. 2001.
[35] X. Ying, ‘‘An overview of overfitting and its solutions,’’ J. Phys., Conf. Ser., vol. 1168, Feb. 2019, Art. no. 022022, doi: 10.1088/1742-6596/1168/2/022022.
[36] S. Ruder, ‘‘An overview of gradient descent optimization algorithms,’’ 2016, arXiv:1609.04747. [Online]. Available: http://arxiv.org/abs/1609.04747
[37] D. P. Kingma and J. L. Ba, ‘‘Adam: A method for stochastic optimization,’’ in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015. [Online]. Available: http://arxiv.org/abs/1412.6980
[38] S. Hochreiter and J. Schmidhuber, ‘‘Long short-term memory,’’ Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[39] Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu, ‘‘Recurrent neural networks for multivariate time series with missing values,’’ Sci. Rep., vol. 8, no. 1, pp. 1–12, Dec. 2018.
[40] S. Muzaffar and A. Afshari, ‘‘Short-term load forecasts using LSTM networks,’’ Energy Procedia, vol. 158, pp. 2922–2927, Feb. 2019.
[41] B.-S. Kwon, R.-J. Park, and K.-B. Song, ‘‘Short-term load forecasting based on deep neural networks using LSTM layer,’’ J. Electr. Eng. Technol., vol. 15, no. 4, pp. 1501–1509, Jul. 2020.
[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learning applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[43] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Advances in Neural Information Processing Systems 25. Red Hook, NY, USA: Curran Associates, 2012, pp. 1097–1105.
[44] R. Girshick, ‘‘Fast R-CNN,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[45] K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, and T. Ogata, ‘‘Audio-visual speech recognition using deep learning,’’ Appl. Intell., vol. 42, no. 4, pp. 722–737, 2015.
[46] F. Milletari, N. Navab, and S.-A. Ahmadi, ‘‘V-Net: Fully convolutional neural networks for volumetric medical image segmentation,’’ in Proc. 4th Int. Conf. 3D Vis. (3DV), Oct. 2016, pp. 565–571.
[47] B. Stephen, X. Tang, P. R. Harvey, S. Galloway, and K. I. Jennett, ‘‘Incorporating practice theory in sub-profile models for short term aggregated residential load forecasting,’’ IEEE Trans. Smart Grid, vol. 8, no. 4, pp. 1591–1598, Jul. 2017, doi: 10.1109/TSG.2015.2493205.
[48] A. Z. Hinchi and M. Tkiouat, ‘‘Rolling element bearing remaining useful life estimation based on a convolutional long-short-term memory network,’’ Procedia Comput. Sci., vol. 127, 2018, Art. no. 123132.
[49] N. Srivastava, R. Salakhutdinov, A. Krizhevsky, I. Sutskever, and G. Hinton, ‘‘Dropout: A simple way to prevent neural networks from overfitting,’’ J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[50] J. Sola and J. Sevilla, ‘‘Importance of input data normalization for the application of neural networks to complex industrial problems,’’ IEEE Trans. Nucl. Sci., vol. 44, no. 3, pp. 1464–1468, Jun. 1997, doi: 10.1109/23.589532.
[51] J. Li, X. Li, and D. He, ‘‘A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction,’’ IEEE Access, vol. 7, pp. 75464–75475, 2019, doi: 10.1109/ACCESS.2019.2919566.

BEHNAM FARSI received the bachelor’s degree from the Electrical Engineering Department, Ferdowsi University, Iran, in 2019. He is currently pursuing the master’s degree with the Concordia Institute for Information Systems Engineering (CIISE) Department, Concordia University, Canada. His master’s thesis topic was on load forecasting. His research interests include machine learning, deep learning, load forecasting, and remaining useful life prediction.

MANAR AMAYRI received the bachelor’s degree in power engineering from Damascus University, Syria, the master’s degree in electrical power systems from the Power Department, Damascus University, the master’s degree in smart grids and buildings from ENES3, INP-Grenoble (Institute National Polytechnique de Grenoble), in 2014, and the Ph.D. degree in energy smart-buildings from the Grenoble Institute of Technology, in 2017. She is currently an Assistant Professor with ENES3, INP-Grenoble, G-SCOP Lab (Sciences pour la conception, l’Optimisation et la Production).

NIZAR BOUGUILA (Senior Member, IEEE) received the Engineer degree from the University of Tunis, Tunis, Tunisia, in 2000, and the M.Sc. and Ph.D. degrees in computer science from Sherbrooke University, Sherbrooke, QC, Canada, in 2002 and 2006, respectively. He is currently a Professor with the Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, QC, Canada. His research interests include machine learning, data mining, computer vision, and pattern recognition applied to different real-life problems.

URSULA EICKER has held leadership positions as a German Physicist with the Centre for Sustainable Energy Technologies, Stuttgart University of Applied Sciences. Since June 2019, she has been leading an ambitious research program to establish transformation strategies toward zero-carbon cities. She is currently the Canada Excellence Research Chair (CERC) for Next Generation Cities with Concordia University, Montreal. Her current research interests include urban scale modeling, zero carbon buildings, renewable energies, and circular economy strategies.