Article
Building Energy Consumption Prediction:
An Extreme Deep Learning Approach
Chengdong Li 1,*, Zixiang Ding 1, Dongbin Zhao 2, Jianqiang Yi 2 and Guiqing Zhang 1
1 School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China;
[email protected] (Z.D.); [email protected] (G.Z.)
2 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [email protected] (D.Z.);
[email protected] (J.Y.)
* Correspondence: [email protected]; Tel.: +86-188-6641-0727
Abstract: Building energy consumption prediction plays an important role in improving the energy utilization rate by helping building managers make better decisions. However, because of randomness and noisy disturbances, accurate prediction of building energy consumption is not an easy task. To obtain better prediction accuracy, an extreme deep learning approach is presented in this paper. The proposed approach combines
stacked autoencoders (SAEs) with the extreme learning machine (ELM) to take advantage of their
respective characteristics. In this proposed approach, the SAE is used to extract the building energy
consumption features, while the ELM is utilized as a predictor to obtain accurate prediction results.
To determine the input variables of the extreme deep learning model, the partial autocorrelation
analysis method is adopted. Additionally, in order to examine the performances of the proposed
approach, it is compared with some popular machine learning methods, such as the backward
propagation neural network (BPNN), support vector regression (SVR), the generalized radial basis
function neural network (GRBFNN) and multiple linear regression (MLR). Experimental results
demonstrate that the proposed method has the best prediction performance in different cases of the
building energy consumption.
1. Introduction
Nowadays, with economic growth and a rising population, more and more energy is consumed, and building energy consumption accounts for a considerable proportion of this total [1,2]. For example, in China, statistical data show that building energy consumption accounted for 28% of the total energy consumption in 2011 and will reach 35% by 2020 [3]; in the United States, building energy consumption is close to 39% of the total [4]. Therefore, efficient strategies are needed to improve the building energy utilization rate. Building energy consumption prediction can help building managers make better decisions and reasonably control all kinds of equipment. Hence, it is an efficient and helpful way to reduce building energy consumption and to improve the energy utilization rate.
A great number of prediction methods have been proposed in the past several decades for building
energy consumption prediction. The majority of the case studies depend on the historical energy
consumption time series data to construct the prediction models [5]. Generally, the proposed methods for building energy consumption prediction fall into two categories: statistical methods and artificial intelligence methods.
The statistical methods utilize the historical data to construct probabilistic models in order to
estimate and analyze the future energy consumption. In [6], principal component analysis (PCA) was
utilized to select the significant inputs of the energy consumption prediction model. In [7], linear
regression was applied to estimate electricity consumption in an institutional building, and moreover,
fuzzy modeling and neural networks were chosen as two comparative approaches to evaluate the
performance of the linear regression method. In [8], the autoregressive model with extra inputs (ARX)
was utilized to estimate the parameters of building components. In [9], Kimbara et al. developed
an autoregressive integrated moving average (ARIMA) model to implement online building energy
consumption prediction. In [10], the ARIMA with external inputs (ARIMAX) model was applied to
predict the power demand of the buildings. In [11], a regression-based method—conditional demand
analysis (CDA)—was used for predicting the building energy consumption.
Generally speaking, the artificial intelligence methods can obtain more accurate prediction
results in most real-world applications and have been widely applied to the prediction of building
energy consumption. In [12], clusterwise regression, a novel technique that integrates clustering and regression simultaneously, was proposed for forecasting building energy consumption. In [13],
a clustering method was proposed to find the similarity of pattern sequences for electricity prices and
demand prediction. In [14], a k-means method was presented for analyzing the pattern of electricity
consumption in buildings. Additionally, data mining techniques applied to electricity-related time
series forecasting were surveyed in [15]. In [16], a decision tree was used to understand the energy
consumption patterns and to forecast the energy consumption levels. In addition, in [17], a random
forest (RF) was used to help facility managers to improve the energy efficiency in buildings. In [18],
a support vector machine (SVM) was utilized to predict the energy consumption of low-energy
buildings with a relevant data selection method. Artificial neural networks (ANNs) play an important
role in the forecasting of building energy consumption, and different kinds of ANNs have been given
for this application. In [19], a short-term predictive ANN model for electricity demand was developed
for the bioclimatic building. In [20], the Levenberg–Marquardt and Output-Weight-Optimization
(OWO)-Newton algorithm-based ANN was utilized to forecast the residential building energy
consumption. In [21,22], the ANN combined with a fuzzy inference system was examined for building energy consumption prediction. In [23], two adaptive ANNs with accumulative training
and sliding window training were proposed for real-time online building energy prediction. In [24],
an ANN trained by the extreme learning machine (ELM) was proposed to estimate the building
energy consumption and was compared with the genetic algorithm (GA)-based ANN. Furthermore,
a hybrid method, the radial basis function neural network (RBFNN), combined with the particle swarm
optimization (PSO) algorithm was used to improve the building energy efficiency in [25]. Although
the statistical methods and the existing artificial intelligence methods can give satisfactory results, it is
still a challenging task to obtain accurate prediction results because of random characteristics that
can be affected by the weather, the working hours, the human distribution and the equipment in the
buildings. On the other hand, the deep learning techniques that have emerged in recent years provide
us with a powerful tool to achieve better modeling and prediction performance. The deep learning
algorithm uses deep architectures or multiple-layer architectures adopting the layerwise pre-training
method for parameter optimization to obtain great feature learning ability [26]. The inherent features of the data, extracted from the lowest level to the highest level of the deep learning model, are more representative than those learned by traditional shallow neural networks. Hence, deep architectures have
greatly improved performance for the modeling, classification and visualization problems, and they
have found lots of applications. In [27], a single convolutional neural network architecture with a
multitask learning strategy was designed for natural language processing (NLP). In [28], the deep
autoencoder network was utilized to convert high-dimensional data to low-dimensional codes, and
experiments demonstrated that it works much better than PCA for dimensionality reduction. In [29],
a stacked autoencoder (SAE) was applied for organ identification in medical magnetic resonance
images. The deep learning approaches have also been applied to time series prediction problems.
In [30], an ensemble deep learning approach was utilized for time series predictions of seven
small-batch data sets. In [31], a SAE-based deep neural network (DNN) was constructed to approximate
the Q-function in the reinforcement learning of traffic signal timing. In [32], a SAE was utilized to
realize the traffic-flow prediction on the basis of traffic-flow time series. Additionally, in [33], a deep
learning-based approach for time series forecasting with an application to electricity load was given.
In all these applications, the experimental results demonstrated that the deep learning approaches can
outperform the comparative methods.
Compared with the data sets in the research domains of image recognition, speech recognition,
and machine vision, for example, the data sets in the time series prediction applications [30–33] do
not have a large quantity of data. However, in these applications, the deep learning approaches,
including the SAE approach, still performed better than some traditional machine learning methods
because of the relatively deeper architectures and the improved or newly proposed learning strategies
in the deep learning approaches. In this paper, to enhance the prediction performance, we propose an
extreme deep learning approach to estimate building energy consumption. The proposed approach
combines the SAE with the ELM to make full use of their respective advantages. The SAE is used to
extract the building energy consumption features. Additionally, the ELM is utilized as a predictor to
obtain accurate prediction results. In the proposed extreme SAE approach, only the pre-training of the
SAE is needed, while the fine-tuning of the whole network is replaced by least-squares learning of
the parameters in the last fully connected layer. In addition, in order to determine reasonable input
variables for the extreme deep learning model, the partial autocorrelation analysis method is adopted in
this application. Finally, the proposed approach is compared with some popular methods, such as the
backward propagation neural network (BPNN), support vector regression (SVR), the generalized radial
basis function neural network (GRBFNN) and multiple linear regression (MLR). The experimental
results demonstrate that the proposed deep learning model has the best prediction ability for both the
30 and 60 min experiments.
The rest of this paper is organized as follows. In Section 2, the mechanisms of the autoencoder and
the SAE are reviewed, and the extreme deep learning model is presented. In Section 3, the prediction
model for building energy consumption is discussed in detail. Two experiments on the prediction of
the 30 and 60 min building energy consumption have been performed and the experimental results are
given in Section 4. Finally, the conclusions of this paper are drawn in Section 5.
2. Methodology
In this section, the structure and learning mechanism of the SAE are introduced first. Then, the extreme deep architecture is shown, and the parameter learning algorithm is given.
To begin, we assume that there are $N$ input–output training data pairs $\{(x^{(m)}, y^{(m)})\}_{m=1}^{N}$, where $x^{(m)} = [x_1^{(m)}, x_2^{(m)}, \ldots, x_n^{(m)}]^T \in \mathbb{R}^n$ is the input part with $n$ input variables and $y^{(m)}$ is the output part with only one output variable.
2.1.1. Autoencoder
The autoencoder is an unsupervised neural network composed of three layers: the input layer, the hidden layer, and the output layer [28]. It attempts to extract a limited number of representations to reconstruct its input; that is, the target output is equal to the input of the model.
The structure of one autoencoder with L hidden nodes is demonstrated in Figure 1.
In the autoencoder, there are two processes—the encoding process and the decoding process.
In the encoding process, an autoencoder attempts to extract a hidden representation $\sigma_1(x)$, which is computed as

$$\sigma_1(x) = f(w_1 x + b_1) \qquad (1)$$

where $w_1$ is the encoding matrix, $b_1$ is the encoding bias vector, and $f(\cdot)$ is the activation function, which can be chosen as the sigmoid or tanh function.

In the decoding process, a decoding matrix needs to be determined to decode the hidden representation $\sigma_1(x)$ back into a reconstruction $\sigma_2(x)$. The decoded output is computed as

$$\sigma_2(x) = g(w_2 \sigma_1(x) + b_2) \qquad (2)$$

where $w_2$ and $b_2$ are respectively the decoding matrix and the decoding bias vector, and again, $g(\cdot)$ is an activation function that can be chosen as the sigmoid or tanh function.
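To make the encoding and decoding processes concrete, the following minimal NumPy sketch implements Equations (1) and (2), assuming the sigmoid activation for both $f(\cdot)$ and $g(\cdot)$; the dimensions and variable names are illustrative and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, w1, b1):
    # Equation (1): hidden representation sigma_1(x) = f(w1 x + b1)
    return sigmoid(w1 @ x + b1)

def decode(h, w2, b2):
    # Equation (2): reconstruction sigma_2(x) = g(w2 sigma_1(x) + b2)
    return sigmoid(w2 @ h + b2)

# Illustrative dimensions (assumed): n = 4 input variables, L = 3 hidden nodes
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((3, 4)), np.zeros(3)
w2, b2 = rng.standard_normal((4, 3)), np.zeros(4)
x = rng.standard_normal(4)
x_reconstructed = decode(encode(x, w1, b1), w2, b2)
```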
We always expect the error between the input $x$ and the reconstruction $\sigma_2(x)$ to be as small as possible. This can be achieved by minimizing the following loss function [35]:

$$L(x, \sigma_2(x)) = \frac{1}{2} \sum_{m=1}^{N} \left\| x^{(m)} - \sigma_2(x^{(m)}) \right\|^2 \qquad (3)$$

In other words, the optimal parameter set $\theta = \{w_1, b_1, w_2, b_2\}$ of the autoencoder can be determined by solving the following optimization problem:

$$\theta^{*} = \arg\min_{\theta} L(x, \sigma_2(x)) \qquad (4)$$

In the autoencoder, this optimization problem is often solved using a variant of the back-propagation algorithm, such as the conjugate gradient method or the steepest descent method.
Sparsity constraints can be imposed on the hidden units of the autoencoder to force it to learn useful structures from the input data [36–39]. This allows for sparse representations of inputs and is useful for pre-training in many tasks. The autoencoder with sparsity constraints is called the sparse autoencoder. To obtain its optimal parameters, we minimize the following loss function, which imposes a sparsity constraint on the reconstruction error [36–39]:
$$S_L = L(x, \sigma_2(x)) + \lambda \sum_{j=1}^{L} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \qquad (5)$$

where $\lambda$ is the penalty coefficient, $\rho$ is a sparsity parameter that is typically a small value close to zero, $\hat{\rho}_j = (1/N) \sum_{m=1}^{N} \sigma_1(x^{(m)})_j$ is the average activation of the $j$th hidden node with respect to the training set, and $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the Kullback–Leibler divergence, also called the relative entropy, which is defined as follows [36–39]:

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j} \qquad (6)$$
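As a rough sketch, the sparse loss of Equations (5) and (6) could be computed as follows; the function names and the values $\rho = 0.05$ and $\lambda = 10^{-3}$ are assumptions for illustration, since the paper does not report them here.

```python
import numpy as np

def kl_divergence(rho, rho_hat):
    # Equation (6): relative entropy between the target sparsity rho and
    # the average activation rho_hat_j of each hidden node
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

def sparse_loss(X, X_rec, H, rho=0.05, lam=1e-3):  # rho, lam: assumed values
    # Equation (5): reconstruction loss (Equation (3)) plus the KL penalty
    # summed over the L hidden nodes; H holds the hidden activations (N, L).
    recon = 0.5 * np.sum((X - X_rec) ** 2)
    rho_hat = H.mean(axis=0)  # average activation rho_hat_j per hidden node
    return recon + lam * np.sum(kl_divergence(rho, rho_hat))
```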
2.1.2. Stacked Autoencoders

For the SAE, autoencoders are stacked so that the hidden representation of one layer serves as the input of the next; the hidden representation of the $l$th layer is computed as

$$\sigma_1^l(x) = f\big(w_1^l\, \sigma_1^{l-1}(x) + b_1^l\big), \qquad \sigma_1^0(x) = x \qquad (7)$$

where $w_1^l$ and $b_1^l$ $(l = 1, 2, \ldots, k)$ are respectively the encoding matrix and the encoding bias vector of the $l$th autoencoder. Again, the activation function $f(\cdot)$ can be chosen as the sigmoid or tanh function.
Figure 2. The stacked autoencoders with k hidden layers and its layerwise training process.
The parameter learning algorithm of the SAE is not given in this subsection; it is introduced in detail in the next subsection.
2.2. The Extreme Deep Learning Model

To design an extreme SAE that performs well, the optimal parameters, including the parameters in the SAE part and the parameters in the ELM part, should be determined first. In this study, we use two steps to determine these parameters. In the first step, we pre-train the parameters in the SAE part. Then, in the second step, we utilize the least-squares method to find the parameters in the ELM part. The pre-training proceeds as follows:
• Step 1: Train the first layer as an autoencoder by minimizing Equation (3) using the training samples as the input, and let $v = 2$.
• Step 2: Train the $v$th layer by minimizing Equation (3) using $\sigma_1^{v-1}(x)$ as its input.
• Step 3: Let $v = v + 1$, and repeat Step 2 until $v > k$.
Here, $\sigma_1^{v-1}(x)$ is the hidden representation of the $(v-1)$th layer, and $k$ is the desired number of hidden layers.
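A compact illustration of this greedy layerwise procedure is sketched below. The plain gradient-descent trainer stands in for whichever back-propagation variant is actually used to minimize Equation (3); the learning rate, epoch count, and initialization scale are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    # Stand-in trainer: minimizes the reconstruction loss of Equation (3)
    # by plain gradient descent and returns the encoder parameters.
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w1, b1 = 0.1 * rng.standard_normal((n_hidden, n)), np.zeros(n_hidden)
    w2, b2 = 0.1 * rng.standard_normal((n, n_hidden)), np.zeros(n)
    for _ in range(epochs):
        H = sigmoid(X @ w1.T + b1)      # encoding, Equation (1)
        R = sigmoid(H @ w2.T + b2)      # decoding, Equation (2)
        dR = (R - X) * R * (1 - R)      # gradient through the output sigmoid
        dH = (dR @ w2) * H * (1 - H)    # gradient through the hidden sigmoid
        w2 -= lr * (dR.T @ H) / len(X); b2 -= lr * dR.mean(axis=0)
        w1 -= lr * (dH.T @ X) / len(X); b1 -= lr * dH.mean(axis=0)
    return w1, b1

def pretrain_sae(X, layer_sizes):
    # Steps 1-3: train each layer on the hidden representation
    # sigma_1^{v-1}(x) produced by the previously trained layer.
    params, H = [], X
    for n_hidden in layer_sizes:
        w1, b1 = train_autoencoder(H, n_hidden)
        params.append((w1, b1))
        H = sigmoid(H @ w1.T + b1)
    return params, H  # H is sigma_k(x) for the training samples
```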
In the second step, following ELM theory [41–43], the output of the pre-trained SAE is fed to a linear output layer with weight vector $\beta$. Ideally, the network approximates the $N$ training samples with zero error, that is,

$$\sum_{m=1}^{N} \left\| \hat{y}^{(m)} - y^{(m)} \right\| = 0 \qquad (8)$$

which means that there exists a $\beta$ such that

$$\sigma_k(x)\, \beta = y \qquad (9)$$

where $\sigma_k(x)$ is the output matrix of the SAE as well as the input matrix of the ELM and can be expressed as

$$\sigma_k(x) = \big[\sigma_k(x^{(1)}), \sigma_k(x^{(2)}), \ldots, \sigma_k(x^{(N)})\big]^T_{N \times n_k} \qquad (10)$$

with $n_k$ the number of nodes in the $k$th hidden layer, and

$$\beta = [\beta_1, \beta_2, \ldots, \beta_{n_k}]^T_{n_k \times 1} \qquad (11)$$

$$y = [y^{(1)}, y^{(2)}, \ldots, y^{(N)}]^T_{N \times 1} \qquad (12)$$

According to matrix theory, as discussed in the studies on ELM [41–43], the optimal vector $\beta$ in Equation (9) can be derived as

$$\beta = \sigma_k(x)^{\dagger}\, y \qquad (13)$$

where $\dagger$ denotes the Moore–Penrose generalized inverse.
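Because the output weights are obtained in closed form, the second step reduces to a single pseudoinverse computation, as in this sketch (the function name is ours; $\sigma_k$ is assumed to have been produced by the pre-trained SAE):

```python
import numpy as np

def solve_output_weights(sigma_k, y):
    # Equation (13): least-squares solution of sigma_k(x) beta = y
    # via the Moore-Penrose pseudoinverse.
    return np.linalg.pinv(sigma_k) @ y

# Prediction for new samples is then sigma_k_new @ beta.
```

No iterative fine-tuning of the whole network is needed, which is the key difference from the traditional SAE.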
(Figure: the first 500 data pairs of the building energy consumption time series for (a) the 30 min experiment and (b) the 60 min experiment; horizontal axis, sample index; vertical axis, energy consumption.)
$$y(p) = \hat{g}\big(y(p - p_1), y(p - p_2), \ldots, y(p - p_n)\big) \qquad (14)$$

where $\hat{g}(\cdot)$ represents the prediction model that can be realized by the prediction algorithms. To be clearer, we assume that the input variables of the prediction models are $x = (x_1, x_2, \ldots, x_n)^T$, where $x_1 = y(p - p_1)$, $x_2 = y(p - p_2)$, $\ldots$, $x_n = y(p - p_n)$, and the output variable is $y = y(p)$.
To train and test the building energy consumption models, the input–output data pairs should
first be formed. Considering the input and output form of the above prediction model, we can obtain
the input–output data pairs as follows:
$$\big(x^{(m)}, y^{(m)}\big), \quad m = 1, 2, \ldots, N - p_1 \qquad (15)$$

where $x^{(m)} = [y(m), y(m + p_1 - p_2), \ldots, y(m + p_1 - p_n)]^T$, $y^{(m)} = y(m + p_1)$, and $N$ is the number of samples in the building energy consumption time series.
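As a sketch, these pairs can be constructed as follows; the helper name and zero-based indexing are ours, but the slicing mirrors Equation (15):

```python
import numpy as np

def make_pairs(series, lags):
    # Equation (15): form (x^(m), y^(m)) pairs from the time series.
    # lags = [p1, p2, ..., pn] are the selected time lags, with p1 the largest.
    y = np.asarray(series, dtype=float)
    p1, N = max(lags), len(series)
    X = np.column_stack([y[p1 - p : N - p] for p in lags])  # inputs x^(m)
    t = y[p1:N]                                             # outputs y(m + p1)
    return X, t
```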
The numbers of the input–output data pairs for training and testing are determined by the time
lag p1 in different experiments. We give the detailed discussion on this issue below.
$$\mathrm{MAE} = \frac{1}{K} \sum_{m=1}^{K} \left| \hat{y}^{(m)} - y^{(m)} \right| \qquad (16)$$

$$\mathrm{MRE} = \frac{1}{K} \sum_{m=1}^{K} \frac{\left| \hat{y}^{(m)} - y^{(m)} \right|}{y^{(m)}} \qquad (17)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{m=1}^{K} \big( \hat{y}^{(m)} - y^{(m)} \big)^2}{K}} \qquad (18)$$

where $K$ is the number of samples for training or testing, and $\hat{y}^{(m)}$ and $y^{(m)}$ are respectively the predicted value and the target value.
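These three indices are direct transcriptions of Equations (16)-(18):

```python
import numpy as np

def mae(y_hat, y):
    # Equation (16): mean absolute error
    return np.mean(np.abs(y_hat - y))

def mre(y_hat, y):
    # Equation (17): mean relative error
    return np.mean(np.abs(y_hat - y) / y)

def rmse(y_hat, y):
    # Equation (18): root-mean-square error
    return np.sqrt(np.mean((y_hat - y) ** 2))
```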
In order to guarantee the performance of the prediction models, the input–output data pairs
are normalized. In this study, the following equation is used to normalize the input parts of the
input–output data pairs:
$$\hat{x}_q^{(m)} = 2\, \frac{x_q^{(m)} - \min_m x_q^{(m)}}{\max_m x_q^{(m)} - \min_m x_q^{(m)}} - 1 \qquad (19)$$

where $q = 1, 2, \ldots, n$ and $m = 1, 2, \ldots, N - p_1$, so that each input variable is scaled to $[-1, 1]$.
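Equation (19) is a columnwise min-max scaling to $[-1, 1]$; a minimal sketch:

```python
import numpy as np

def normalize_inputs(X):
    # Equation (19): scale each input variable (column) to [-1, 1]
    # over the N - p1 samples; X has shape (N - p1, n).
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - x_min) / (x_max - x_min) - 1.0
```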
4. Experiments
In this section, the 30 and 60 min building energy consumption prediction experiments are
analyzed. For each experiment, we determine the optimal input variables for the models first, and then
make comprehensive assessments of the five prediction models.
Figure 5. The partial autocorrelation function (PACF) of the 30 min experiment with 150 time lags.
To obtain the optimal input variables for predicting building energy consumption, we chose
the time series lags whose absolute value of the partial autocorrelations were greater than or equal
to 0.1. As shown in Figure 5, for the 30 min experiment, there were 22 lags meeting the above
condition. As a result, the determined optimal input variables with respect to y( p) are x1 = y( p − 97),
x2 = y( p − 96), x3 = y( p − 95), x4 = y( p − 51), x5 = y( p − 49), x6 = y( p − 48), x7 = y( p − 47),
x8 = y( p − 46), x9 = y( p − 45), x10 = y( p − 44), x11 = y( p − 43), x12 = y( p − 42),
x13 = y( p − 40), x14 = y( p − 39), x15 = y( p − 37), x16 = y( p − 36), x17 = y( p − 32),
x18 = y( p − 31), x19 = y( p − 19), x20 = y( p − 3), x21 = y( p − 2), and x22 = y( p − 1).
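This lag-selection rule can be reproduced, for example, with the PACF estimator from statsmodels; the threshold of 0.1 comes from the text, while the function name and the use of statsmodels are our assumptions, since the paper does not state its software stack.

```python
from statsmodels.tsa.stattools import pacf

def select_lags(series, n_lags=150, threshold=0.1):
    # Keep every lag whose sample partial autocorrelation has an
    # absolute value >= threshold (0.1 in this paper).
    coeffs = pacf(series, nlags=n_lags)  # coeffs[0] corresponds to lag 0
    return [lag for lag in range(1, n_lags + 1) if abs(coeffs[lag]) >= threshold]
```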
Table 1. Root-mean-square errors (RMSEs) of the 30 min experiment with various numbers of hidden
layers and hidden units.
The parameter configurations of the four comparative approaches are listed in detail as follows.
For the BPNN, the number of hidden layer nodes and the iteration number were respectively set
to be 300 and 15,000. In the hidden layer, the logsig activation function was used. Additionally, in the
training process, a gradient descent-based algorithm was adopted.
For SVR, the radial basis function was chosen as the kernel function and the penalty factor C was
set to be 50. Moreover, we did not use shrinking heuristics in the training process.
For the GRBFNN, 5-fold cross-validation was adopted to determine the optimal center and spread
of the RBF function. In addition, the spread was chosen from 0.01 to 2 with a 0.1 step length.
For MLR, the ordinary least-squares method was adopted to minimize the sum of squared errors (SSE) and thus obtain the optimal regression function.
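For illustration only, the three configurations that map directly onto scikit-learn estimators could be set up as below; this is one possible reading of the reported settings, not the authors' implementation (the software used is not named in the paper, and the GRBFNN has no direct scikit-learn counterpart, so it is omitted).

```python
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

# BPNN: one hidden layer of 300 logistic (logsig) units, gradient descent,
# 15,000 iterations, as reported above
bpnn = MLPRegressor(hidden_layer_sizes=(300,), activation="logistic",
                    solver="sgd", max_iter=15000)

# SVR: RBF kernel, penalty factor C = 50, shrinking heuristics disabled
svr = SVR(kernel="rbf", C=50, shrinking=False)

# MLR: ordinary least squares
mlr = LinearRegression()
```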
4.1.3. Results
For the testing data, the prediction results of the five prediction models in the 30 min prediction
experiment are demonstrated in Figure 6. For better visualization, parts of the results (the values
between 400 and 500) in Figure 6 have been zoomed in and are plotted in Figure 7 to show finer details.
(Figure 6. Prediction results of the five models in the 30 min experiment; horizontal axis, samples; vertical axis, energy consumption; legend: BPNN, SVR, GRBFNN, MLR, ExtremeSAE, Actual.)
Figure 7. Parts of the zoomed-in prediction results: (a) backward propagation neural network (BPNN);
(b) support vector regression (SVR); (c) generalized radial basis function neural network (GRBFNN);
(d) multiple linear regression (MLR); and (e) extreme stacked autoencoder (SAE).
The residual errors of the five prediction models in the 30 min prediction experiment are shown in Figure 8. Similarly, to allow a clear comparison, parts of the results (the values between 400 and 500) in Figure 8 have been zoomed in and are re-plotted in Figure 9.
Figure 8. Residual errors of the five models in the 30 min prediction experiment.
(Figure 9. Zoomed-in residual errors (samples 400–500) of the five models in the 30 min prediction experiment; legend: BPNN, SVR, GRBFNN, MLR, ExtremeSAE.)
To quantitatively analyze the performances of the five prediction models, we consider the MAE,
MRE and RMSE indices for both the training and the testing processes. For the 30 min prediction,
the MAEs, MREs and RMSEs of the five prediction models in the training and testing processes are
listed in Table 2.
Table 2. Comparison results of the five prediction models in the 30 min prediction experiment.
Figure 10. The partial autocorrelation function (PACF) of the 60 min prediction experiment with 80 lags.
Table 3. Root-mean-square errors (RMSEs) of the 60 min prediction experiment with various numbers
of hidden layers and hidden units.
In this case, the configurations of the BPNN, SVR, the GRBFNN and MLR are as follows. For the
BPNN, the number of hidden layer nodes and the iteration number were respectively set to be 200 and
17,000. In the hidden layer, the logsig activation function was used. For SVR, the radial basis function
was chosen as the kernel function and the penalty factor C was set to be 80. Again, we did not use
shrinking heuristics in the training process. The configurations of the GRBFNN and MLR in this case
were the same as those in the 30 min prediction experiment.
4.2.3. Results
For the testing data, the prediction results of the five prediction models in the 60 min prediction
experiment are demonstrated in Figure 11. Again, for better visualization, parts of the results
(the values between 200 and 300) in Figure 11 have been zoomed in and are plotted in Figure 12
to demonstrate finer details.
Figure 11. Prediction results of the five models in the 60 min experiment.
Figure 12. Parts of the zoomed-in prediction results: (a) backward propagation neural network (BPNN);
(b) support vector regression (SVR); (c) generalized radial basis function neural network (GRBFNN);
(d) multiple linear regression (MLR); and (e) extreme stacked autoencoder (SAE).
The residual errors of the five prediction models in the 60 min prediction experiment are
demonstrated in Figure 13. Once more, parts of the results (the values between 200 and 300) in
Figure 13 have been zoomed in and are re-plotted in Figure 14.
Similarly, in the 60 min prediction experiment, the MAEs, MREs and RMSEs of the five prediction
models in the training and testing processes are listed in Table 4.
Figure 13. Residual errors of the five models in the 60 min prediction experiment.
(Figure 14. Zoomed-in residual errors (samples 200–300) of the five models in the 60 min prediction experiment; legend: BPNN, SVR, GRBFNN, MLR, ExtremeSAE.)
Table 4. Comparison results of the five prediction models in the 60 min prediction experiment.
5. Conclusions
Deep learning has shown powerful learning and prediction abilities in time series prediction applications. This study utilized one popular deep learning approach, the SAE method, to improve the prediction results for building energy consumption. Theoretically, this study provides a novel learning method that combines the SAE method and the ELM method. The main difference between the proposed method and the traditional SAE method is that the proposed method does not fine-tune the whole network with the iterative back-propagation algorithm, but directly utilizes the ELM method to find the output weights without iterations. This speeds up learning and strengthens the generalization performance. On the application side, the proposed deep
learning method was applied to the energy consumption prediction of a specific building, whose one
year energy consumption data were collected. The experimental and comparison results demonstrate
that the deep learning method outperforms several popular traditional machine learning methods.
The reason may be that the proposed deep learning method has a deeper architecture and improved learning strategies compared with the other methods. In other words, although
the data set in this application does not have a large quantity of data, the deep learning method can
still extract better building energy consumption features and improve the prediction accuracy.
We are continuing to investigate other schemes to further improve the prediction accuracy. By analyzing the collected building energy consumption data, we found that the building energy consumption changes periodically, and better performance may be expected by exploiting this periodicity. How to simultaneously utilize both the data and this prior knowledge of periodicity to construct the DNN therefore remains to be investigated and will be one of our future research directions.
Acknowledgments: This work is supported by the National Natural Science Foundation of China (61473176,
61105077, and 61573225), and the Natural Science Foundation of Shandong Province for Young Talents in Provincial
Universities (ZR2015JL021).
Author Contributions: Chengdong Li, Dongbin Zhao and Jianqiang Yi have contributed to developing ideas
about energy consumption prediction and collecting the data. Zixiang Ding and Guiqing Zhang programmed the
algorithm and tested it. All the authors were involved in preparing the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Štreimikienė, S. Residential energy consumption trends, main drivers and policies in Lithuania. Renew. Sustain.
Energy Rev. 2014, 35, 285–293.
2. Ugursal, V.I. Energy consumption, associated questions and some answers. Appl. Energy 2014, 130, 783–792.
3. Hua, C.; Lee, W.L.; Wang, X. Energy assessment of office buildings in China using China building energy
codes and LEED 2.2. Energy Build. 2015, 86, 514–524.
4. Zuo, J.; Zhao, Z.Y. Green building research-current status and future agenda: A review. Renew. Sustain.
Energy Rev. 2014, 30, 271–281.
5. Daut, M.A.M.; Hassan, M.Y.; Abdullah, H.; Rahman, H.A.; Abdullah, M.P.; Hussin, F. Building electrical
energy consumption forecasting analysis using conventional and artificial intelligence methods: A review.
Renew. Sustain. Energy Rev. 2017, 70, 1108–1118.
6. Li, K.; Hu, C.; Liu, G.; Xue, W. Building’s electricity consumption prediction using optimized artificial neural
networks and principal component analysis. Energy Build. 2015, 108, 106–113.
7. Pombeiro, H.; Santos, R.; Carreira, P.; Silva, C.; Sousa, J.M.C. Comparative assessment of low-complexity
models to predict electricity consumption in an institutional building: Linear regression vs. fuzzy modeling
vs. neural networks. Energy Build. 2017, 146, 141–151.
8. Jimenez, M.J.; Heras, M.R. Application of multi-output ARX models for estimation of the U and g values of
building components in outdoor testing. Sol. Energy 2005, 79, 302–310.
9. Kimbara, A.; Kurosu, S.; Endo, R.; Kamimura, K.; Matsuba, T.; Yamada, A. On-line prediction for load profile
of an air-conditioning system. Ashrae Trans. 1995, 101, 198–207.
10. Newsham, G.R.; Birt, B.J. Building-level occupancy data to improve ARIMA-based electricity use forecasts.
In Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building,
Zurich, Switzerland, 2 November 2010; pp. 13–18.
11. Aydinalp-Koksal, M.; Ugursal, V.I. Comparison of neural network, conditional demand analysis, and
engineering approaches for modeling end-use energy consumption in the residential sector. Appl. Energy
2008, 85, 271–296.
12. Hsu, D. Comparison of integrated clustering methods for accurate and stable prediction of building energy
consumption data. Appl. Energy 2015, 160, 153–163.
13. Alvarez, F.M.; Troncoso, A.; Riquelme, J.C.; Ruiz, J.S.A. Energy time series forecasting based on pattern
sequence similarity. IEEE Trans. Knowl. Data Eng. 2011, 23, 1230–1243.
14. Pérez-Chacón, R.; Talavera-Llames, R.L.; Martinez-Alvarez, F.; Troncoso, A. Finding electric energy
consumption patterns in big time series data. In Proceedings of the 13th International Conference on Distributed Computing and Artificial Intelligence, Sevilla, Spain, 1–3 June 2016; Springer: Cham, Switzerland, 2016; pp.
231–238.
15. Martínez-Álvarez, F.; Troncoso, A.; Asencio-Cortés, G.; Riquelme, J.C. A survey on data mining techniques
applied to electricity-related time series forecasting. Energies 2015, 8, 13162–13193.
16. Tso, G.K.F.; Yau, K.K.W. Predicting electricity energy consumption: A comparison of regression analysis,
decision tree and neural networks. Energy 2007, 32, 1761–1768.
17. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs. Neurons: Comparison between random forest and ANN
for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89.
18. Paudel, S.; Elmitri, M.; Couturier, S.; Nguyen, P.H.; Kamphuis, R.; Lacarrière, B.; Corre, O.L. A relevant
data selection method for energy consumption prediction of low energy building based on support vector
machine. Energy Build. 2017, 138, 240–256.
19. Mena, R.; Rodríguez, F.; Castilla, M.; Arahal, M.R. A prediction model based on neural networks for the
energy consumption of a bioclimatic building. Energy Build. 2014, 82, 142–155.
20. Biswas, M.A.R.; Robinson, M.D.; Fumo, N. Prediction of residential building energy consumption: A neural
network approach. Energy 2016, 117, 84–92.
21. Naji, S.; Shamshirband, S.; Basser, H.; Keivani, A.; Alengaram, U.J.; Jumaat, M.Z.; Petkovic, D. Application of
adaptive neuro-fuzzy methodology for estimating building energy consumption. Renew. Sustain. Energy Rev.
2016, 53, 1520–1528.
22. Ekici, B.B.; Aksoy, U.T. Prediction of building energy needs in early stage of design by using ANFIS.
Expert Syst. Appl. 2011, 38, 5352–5358.
23. Yang, J.; Rivard, H.; Zmeureanu, R. On-line building energy prediction using adaptive artificial neural
networks. Energy Build. 2005, 37, 1250–1259.
24. Naji, S.; Keivani, A.; Shamshirband, S.; Alengaram, U.J.; Jumaat, M.Z.; Mansor, Z.; Lee, M. Estimating
building energy consumption using extreme learning machine method. Energy 2016, 97, 506–516.
25. Zhang, Y.; Chen, Q. Prediction of building energy consumption based on PSO-RBF neural network.
In Proceedings of the IEEE International Conference on System Science and Engineering, Shanghai, China,
11–13 July 2014; pp. 60–63.
26. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
27. Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with
multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki,
Finland, 5–9 July 2008; pp. 160–167.
28. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006,
313, 504–507.
29. Huval, B.; Coates, A.; Ng, A. Deep learning for class-generic object detection. arXiv 2013, arXiv:1312.6885.
30. Qiu, X.; Zhang, L.; Ren, Y.; Suganthan, P.N.; Amaratunga, G. Ensemble deep learning for regression and time
series forecasting. In Proceedings of the 2014 IEEE Symposium on Computational Intelligence in Ensemble
Learning (CIEL), Orlando, FL, USA, 9–12 December 2014; pp. 1–6.
31. Li, L.; Lv, Y.; Wang, F.Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA J. Autom. Sin. 2016,
3, 247–254.
32. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach.
IEEE Trans. Intell. Transp. Syst. 2015, 16, 865–873.
33. Torres, J.; Fernández, A.; Troncoso, A.; Martínez-Álvarez, F. Deep learning-based approach for time series
forecasting with application to electricity load. In Proceedings of the International Work-Conference on the
Interplay between Natural and Artificial Computation, Corunna, Spain, 19–23 June 2017; Springer: Cham,
Switzerland, 2017; pp. 203–212.
34. Bengio, Y.; Lamblin, P.; Dan, P.; Larochelle, H. Greedy layer-wise training of deep networks. In Proceedings
of the International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7
December 2006; pp. 153–160.
35. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with
denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki,
Finland, 5–9 July 2008; ACM: New York, NY, USA, 2008; pp. 1096–1103.
36. Palm, R.B. Prediction as a Candidate for Learning Deep Hierarchical Models of Data; Technical University of
Denmark: Kongens Lyngby, Denmark, 2012; Volume 5.
37. Hosseiniasl, E.; Zurada, J.M.; Nasraoui, O. Deep learning of part-based representation of data using sparse
autoencoders with nonnegativity constraints. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2486–2498.
38. Xu, J.; Xiang, L.; Liu, Q.; Gilmore, H.; Wu, J.; Tang, J.; Madabhushi, A. Stacked sparse autoencoder (SSAE)
for nuclei detection on breast cancer histopathology images. IEEE Trans. Med. Imaging 2016, 35, 119–130.
39. Tao, C.; Pan, H.; Li, Y.; Zou, Z. Unsupervised spectral-spatial feature learning with stacked sparse
autoencoder for hyperspectral imagery classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2438–2442.
40. Erhan, D.; Bengio, Y.; Courville, A.; Manzagol, P.A.; Vincent, P.; Bengio, S. Why does unsupervised
pre-training help deep learning? J. Mach. Learn. Res. 2010, 11, 625–660.
41. Huang, G.B.; Wang, D.H.; Lan, Y. Extreme learning machines: A survey. Int. J. Mach. Learn. Cybern. 2011,
2, 107–122.
42. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing
2006, 70, 489–501.
43. Li, M.B.; Huang, G.B.; Saratchandran, P.; Sundararajan, N. Letters: Fully complex extreme learning machine.
Neurocomputing 2005, 68, 306–314.
44. Erb, R.J. Introduction to backpropagation neural network computation. Pharm. Res. 1993, 10, 165–170.
45. Awad, M.; Khanna, R. Support vector regression. Neural Inf. Process. Lett. Rev. 2007, 11, 203–224.
46. Friedrichs, F.; Schmitt, M. On the power of Boolean computations in generalized RBF neural networks.
Neurocomputing 2005, 63, 483–498.
47. Preacher, K.J.; Curran, P.J.; Bauer, D.J. Computational tools for probing interactions in multiple linear
regression, multilevel modeling, and latent curve analysis. J. Educ. Behav. Stat. 2006, 31, 437–448.
48. Eberly, L.E. Multiple linear regression. Methods Mol. Biol. 2007, 404, 165–187.
49. Chong, T.L. Estimating the differencing parameter via the partial autocorrelation function. J. Econom. 1998,
97, 365–381.
50. Zhang, Z.; Law, C.L.; Gunawan, E. Multipath mitigation technique based on partial autocorrelation function.
Wirel. Pers. Commun. 2007, 41, 145–154.
51. Alder, B.J.; Wainwright, T.E. Decay of the Velocity Autocorrelation Function. Phys. Rev. A 1970, 1, 18–21.
52. Jiang, X.; Adeli, H. Wavelet Packet-Autocorrelation Function Method for Traffic Flow Pattern Analysis.
Comput. Aided Civ. Infrastruct. Eng. 2010, 19, 324–337.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).