Article
Building Energy Consumption Prediction:
An Extreme Deep Learning Approach
Chengdong Li 1,*, Zixiang Ding 1, Dongbin Zhao 2, Jianqiang Yi 2 and Guiqing Zhang 1
1 School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China;
[email protected] (Z.D.); [email protected] (G.Z.)
2 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [email protected] (D.Z.);
[email protected] (J.Y.)
* Correspondence: [email protected]; Tel.: +86-188-6641-0727
Abstract: Building energy consumption prediction plays an important role in improving the energy utilization rate by helping building managers make better decisions. However, because of randomness and noisy disturbances, accurate prediction of building energy consumption is not an easy task. To obtain better prediction accuracy, an extreme deep learning approach is presented in this paper. The proposed approach combines
stacked autoencoders (SAEs) with the extreme learning machine (ELM) to take advantage of their
respective characteristics. In this proposed approach, the SAE is used to extract the building energy
consumption features, while the ELM is utilized as a predictor to obtain accurate prediction results.
To determine the input variables of the extreme deep learning model, the partial autocorrelation
analysis method is adopted. Additionally, in order to examine the performances of the proposed
approach, it is compared with some popular machine learning methods, such as the backward
propagation neural network (BPNN), support vector regression (SVR), the generalized radial basis
function neural network (GRBFNN) and multiple linear regression (MLR). Experimental results
demonstrate that the proposed method has the best prediction performance in different cases of the
building energy consumption.
1. Introduction
Nowadays, with economic growth and a rising population, more and more energy is consumed, and building energy consumption accounts for a considerable proportion of this total [1,2]. For example, in China, statistical data show that building energy consumption accounted for 28% of the total energy consumption in 2011 and will reach 35% by 2020 [3]; in the United States, building energy consumption is close to 39% of the total [4]. Therefore, efficient strategies are needed to improve the building energy utilization rate. Building energy consumption prediction can help building managers make better decisions and reasonably control all kinds of equipment. Hence, it is an efficient and helpful way to reduce building energy consumption and to improve the energy utilization rate.
A great number of prediction methods have been proposed in the past several decades for building
energy consumption prediction. The majority of the case studies depend on the historical energy
consumption time series data to construct the prediction models [5]. Generally, the proposed methods for building energy consumption prediction fall into two categories: statistical methods and artificial intelligence methods.
The statistical methods utilize the historical data to construct probabilistic models in order to
estimate and analyze the future energy consumption. In [6], principal component analysis (PCA) was
utilized to select the significant inputs of the energy consumption prediction model. In [7], linear
regression was applied to estimate electricity consumption in an institutional building, and moreover,
fuzzy modeling and neural networks were chosen as two comparative approaches to evaluate the
performance of the linear regression method. In [8], the autoregressive model with extra inputs (ARX)
was utilized to estimate the parameters of building components. In [9], Kimbara et al. developed
an autoregressive integrated moving average (ARIMA) model to implement online building energy
consumption prediction. In [10], the ARIMA with external inputs (ARIMAX) model was applied to
predict the power demand of the buildings. In [11], a regression-based method—conditional demand
analysis (CDA)—was used for predicting the building energy consumption.
Generally speaking, the artificial intelligence methods can obtain more accurate prediction
results in most real-world applications and have been widely applied to the prediction of building
energy consumption. In [12], clusterwise regression, a novel technique that integrates clustering and regression simultaneously, was proposed for forecasting building energy consumption. In [13],
a clustering method was proposed to find the similarity of pattern sequences for electricity prices and
demand prediction. In [14], a k-means method was presented for analyzing the pattern of electricity
consumption in buildings. Additionally, data mining techniques applied to electricity-related time
series forecasting were surveyed in [15]. In [16], a decision tree was used to understand the energy
consumption patterns and to forecast the energy consumption levels. In addition, in [17], a random
forest (RF) was used to help facility managers to improve the energy efficiency in buildings. In [18],
a support vector machine (SVM) was utilized to predict the energy consumption of low-energy
buildings with a relevant data selection method. Artificial neural networks (ANNs) play an important
role in the forecasting of building energy consumption, and different kinds of ANNs have been given
for this application. In [19], a short-term predictive ANN model for electricity demand was developed
for the bioclimatic building. In [20], the Levenberg–Marquardt and Output-Weight-Optimization
(OWO)-Newton algorithm-based ANN was utilized to forecast the residential building energy
consumption. In [21,22], the ANN combined with a fuzzy inference system was examined for building energy consumption prediction. In [23], two adaptive ANNs with accumulative training
and sliding window training were proposed for real-time online building energy prediction. In [24],
an ANN trained by the extreme learning machine (ELM) was proposed to estimate the building
energy consumption and was compared with the genetic algorithm (GA)-based ANN. Furthermore,
a hybrid method, the radial basis function neural network (RBFNN), combined with the particle swarm
optimization (PSO) algorithm was used to improve the building energy efficiency in [25]. Although
the statistical methods and the existing artificial intelligence methods can give satisfactory results, it is
still a challenging task to obtain accurate prediction results because of random characteristics that
can be affected by the weather, the working hours, the human distribution and the equipment in the
buildings. On the other hand, the deep learning techniques that have emerged in recent years provide
us with a powerful tool to achieve better modeling and prediction performance. The deep learning
algorithm uses deep architectures or multiple-layer architectures adopting the layerwise pre-training
method for parameter optimization to obtain great feature learning ability [26]. The inherent features of the data, extracted from the lowest level to the highest level of the deep learning model, are more representative than those learned by traditional shallow neural networks. Hence, deep architectures have
greatly improved performance for the modeling, classification and visualization problems, and they
have found lots of applications. In [27], a single convolutional neural network architecture with a
multitask learning strategy was designed for natural language processing (NLP). In [28], the deep
autoencoder network was utilized to convert high-dimensional data to low-dimensional codes, and
experiments demonstrated that it works much better than PCA for dimensionality reduction. In [29],
a stacked autoencoder (SAE) was applied for organ identification in medical magnetic resonance
images. The deep learning approaches have also been applied to time series prediction problems.
In [30], an ensemble deep learning approach was utilized for time series predictions of seven
small-batch data sets. In [31], a SAE-based deep neural network (DNN) was constructed to approximate
the Q-function in the reinforcement learning of traffic signal timing. In [32], a SAE was utilized to
realize the traffic-flow prediction on the basis of traffic-flow time series. Additionally, in [33], a deep
learning-based approach for time series forecasting with an application to electricity load was given.
In all these applications, the experimental results demonstrated that the deep learning approaches can
outperform the comparative methods.
Compared with the data sets in the research domains of image recognition, speech recognition,
and machine vision, for example, the data sets in the time series prediction applications [30–33] do
not have a large quantity of data. However, in these applications, the deep learning approaches,
including the SAE approach, still performed better than some traditional machine learning methods
because of the relatively deeper architectures and the improved or newly proposed learning strategies
in the deep learning approaches. In this paper, to enhance the prediction performance, we propose an
extreme deep learning approach to estimate building energy consumption. The proposed approach
combines the SAE with the ELM to make full use of their respective advantages. The SAE is used to
extract the building energy consumption features. Additionally, the ELM is utilized as a predictor to
obtain accurate prediction results. In the proposed extreme SAE approach, only the pre-training of the
SAE is needed, while the fine-tuning of the whole network is replaced by least-squares learning of
the parameters in the last fully connected layer. In addition, in order to determine reasonable input
variables for the extreme deep learning model, the partial autocorrelation analysis method is adopted in
this application. Finally, the proposed approach is compared with some popular methods, such as the
backward propagation neural network (BPNN), support vector regression (SVR), the generalized radial
basis function neural network (GRBFNN) and multiple linear regression (MLR). The experimental
results demonstrate that the proposed deep learning model has the best prediction ability for both the
30 and 60 min experiments.
The rest of this paper is organized as follows. In Section 2, the mechanisms of the autoencoder and
the SAE are reviewed, and the extreme deep learning model is presented. In Section 3, the prediction
model for building energy consumption is discussed in detail. Two experiments on the prediction of
the 30 and 60 min building energy consumption have been performed and the experimental results are
given in Section 4. Finally, the conclusions of this paper are drawn in Section 5.
2. Methodology
In this section, the structure and learning mechanism of the SAE are introduced first. Then, the extreme deep architecture is shown, and the parameter learning algorithm is given.
To begin, we assume that there are $N$ input–output training data pairs $\{(x^{(m)}, y^{(m)})\}_{m=1}^{N}$, where $x^{(m)} = [x_1^{(m)}, x_2^{(m)}, \ldots, x_n^{(m)}]^T \in \mathbb{R}^n$ is the input part with $n$ input variables and $y^{(m)}$ is the output part with only one output variable.
2.1.1. Autoencoder
The autoencoder is an unsupervised neural network composed of three layers: the input layer, the hidden layer, and the output layer [28]. It attempts to extract a limited number of representations to reconstruct its input; that is, the target output is equal to the input of the model.
The structure of one autoencoder with L hidden nodes is demonstrated in Figure 1.
In the autoencoder, there are two processes—the encoding process and the decoding process.
In the encoding process, an autoencoder attempts to extract a hidden representation $\sigma_1(x)$, which is computed as

$$\sigma_1(x) = f(w_1 x + b_1) \qquad (1)$$

where $w_1$ is the encoding matrix, $b_1$ is the encoding bias vector, and $f(\cdot)$ is the activation function, which can be chosen as the sigmoid or tanh function.

In the decoding process, a decoding matrix needs to be determined to decode the hidden representation $\sigma_1(x)$ back into a reconstruction $\sigma_2(x)$. The decoded output is computed as

$$\sigma_2(x) = g(w_2 \sigma_1(x) + b_2) \qquad (2)$$

where $w_2$ and $b_2$ are respectively the decoding matrix and the decoding bias vector, and again, $g(\cdot)$ is an activation function that can be chosen as the sigmoid or tanh function.
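To make the encoding and decoding processes concrete, the following minimal NumPy sketch implements Equations (1) and (2), assuming the sigmoid activation for both $f(\cdot)$ and $g(\cdot)$; the dimensions and variable names are illustrative and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, w1, b1):
    # Equation (1): hidden representation sigma_1(x) = f(w1 x + b1)
    return sigmoid(w1 @ x + b1)

def decode(h, w2, b2):
    # Equation (2): reconstruction sigma_2(x) = g(w2 sigma_1(x) + b2)
    return sigmoid(w2 @ h + b2)

# Illustrative dimensions (assumed): n = 4 input variables, L = 3 hidden nodes
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((3, 4)), np.zeros(3)
w2, b2 = rng.standard_normal((4, 3)), np.zeros(4)
x = rng.standard_normal(4)
x_reconstructed = decode(encode(x, w1, b1), w2, b2)
```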
We always expect the error between the input $x$ and the reconstruction $\sigma_2(x)$ to be as small as possible. This can be achieved by minimizing the following loss function [35]:

$$L(x, \sigma_2(x)) = \frac{1}{2} \sum_{m=1}^{N} \left\| x^{(m)} - \sigma_2(x^{(m)}) \right\|^2 \qquad (3)$$

In other words, the optimal parameter set $\theta = \{w_1, b_1, w_2, b_2\}$ of the autoencoder can be determined by solving the following optimization problem:

$$\theta^{*} = \arg\min_{\theta} L(x, \sigma_2(x)) \qquad (4)$$

In the autoencoder, this optimization problem is often solved using a variant of the back-propagation algorithm, such as the conjugate gradient method or the steepest descent method.
Sparsity constraints can be imposed on the hidden units of the autoencoder to force it to learn useful structures from the input data [36–39]. This allows for sparse representations of inputs and is useful for pre-training in many tasks. The autoencoder with sparsity constraints is called the sparse autoencoder. To obtain its optimal parameters, we minimize the following loss function, which imposes a sparsity constraint on the reconstruction error [36–39]:
$$S_L = L(x, \sigma_2(x)) + \lambda \sum_{j=1}^{L} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \qquad (5)$$

where $\lambda$ is the penalty coefficient, $\rho$ is a sparsity parameter that is typically a small value close to zero, $\hat{\rho}_j = (1/N) \sum_{m=1}^{N} \sigma_1(x^{(m)})_j$ is the average activation of the $j$th hidden node with respect to the training set, and $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the Kullback–Leibler divergence, also called the relative entropy, which is defined as follows [36–39]:

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j} \qquad (6)$$
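As a rough sketch, the sparse loss of Equations (5) and (6) could be computed as follows; the function names and the values $\rho = 0.05$ and $\lambda = 10^{-3}$ are assumptions for illustration, since the paper does not report them here.

```python
import numpy as np

def kl_divergence(rho, rho_hat):
    # Equation (6): relative entropy between the target sparsity rho and
    # the average activation rho_hat_j of each hidden node
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

def sparse_loss(X, X_rec, H, rho=0.05, lam=1e-3):  # rho, lam: assumed values
    # Equation (5): reconstruction loss (Equation (3)) plus the KL penalty
    # summed over the L hidden nodes; H holds the hidden activations (N, L).
    recon = 0.5 * np.sum((X - X_rec) ** 2)
    rho_hat = H.mean(axis=0)  # average activation rho_hat_j per hidden node
    return recon + lam * np.sum(kl_divergence(rho, rho_hat))
```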
2.1.2. Stacked Autoencoders

For the SAE, autoencoders are stacked so that the hidden representation of one layer serves as the input of the next; the hidden representation of the $l$th layer is computed as

$$\sigma_1^l(x) = f\big(w_1^l\, \sigma_1^{l-1}(x) + b_1^l\big), \qquad \sigma_1^0(x) = x \qquad (7)$$

where $w_1^l$ and $b_1^l$ $(l = 1, 2, \ldots, k)$ are respectively the encoding matrix and the encoding bias vector of the $l$th autoencoder. Again, the activation function $f(\cdot)$ can be chosen as the sigmoid or tanh function.
Figure 2. The stacked autoencoders with k hidden layers and its layerwise training process.
The parameter learning algorithm of the SAE is not given in this subsection; it is introduced in detail in the next subsection.
2.2. The Extreme Deep Learning Model

To design an extreme SAE that performs well, the optimal parameters, including the parameters in the SAE part and the parameters in the ELM part, should be determined first. In this study, we use two steps to determine these parameters. In the first step, we pre-train the parameters in the SAE part. Then, in the second step, we utilize the least-squares method to find the parameters in the ELM part. The pre-training proceeds as follows:
• Step 1: Train the first layer as an autoencoder by minimizing Equation (3) using the training samples as the input, and let $v = 2$.
• Step 2: Train the $v$th layer by minimizing Equation (3) using $\sigma_1^{v-1}(x)$ as its input.
• Step 3: Let $v = v + 1$, and repeat Step 2 until $v > k$.
Here, $\sigma_1^{v-1}(x)$ is the hidden representation of the $(v-1)$th layer, and $k$ is the desired number of hidden layers.
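A compact illustration of this greedy layerwise procedure is sketched below. The plain gradient-descent trainer stands in for whichever back-propagation variant is actually used to minimize Equation (3); the learning rate, epoch count, and initialization scale are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    # Stand-in trainer: minimizes the reconstruction loss of Equation (3)
    # by plain gradient descent and returns the encoder parameters.
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w1, b1 = 0.1 * rng.standard_normal((n_hidden, n)), np.zeros(n_hidden)
    w2, b2 = 0.1 * rng.standard_normal((n, n_hidden)), np.zeros(n)
    for _ in range(epochs):
        H = sigmoid(X @ w1.T + b1)      # encoding, Equation (1)
        R = sigmoid(H @ w2.T + b2)      # decoding, Equation (2)
        dR = (R - X) * R * (1 - R)      # gradient through the output sigmoid
        dH = (dR @ w2) * H * (1 - H)    # gradient through the hidden sigmoid
        w2 -= lr * (dR.T @ H) / len(X); b2 -= lr * dR.mean(axis=0)
        w1 -= lr * (dH.T @ X) / len(X); b1 -= lr * dH.mean(axis=0)
    return w1, b1

def pretrain_sae(X, layer_sizes):
    # Steps 1-3: train each layer on the hidden representation
    # sigma_1^{v-1}(x) produced by the previously trained layer.
    params, H = [], X
    for n_hidden in layer_sizes:
        w1, b1 = train_autoencoder(H, n_hidden)
        params.append((w1, b1))
        H = sigmoid(H @ w1.T + b1)
    return params, H  # H is sigma_k(x) for the training samples
```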
In the second step, following ELM theory [41–43], the output of the pre-trained SAE is fed to a linear output layer with weight vector $\beta$. Ideally, the network approximates the $N$ training samples with zero error, that is,

$$\sum_{m=1}^{N} \left\| \hat{y}^{(m)} - y^{(m)} \right\| = 0 \qquad (8)$$

which means that there exists a $\beta$ such that

$$\sigma_k(x)\, \beta = y \qquad (9)$$

where $\sigma_k(x)$ is the output matrix of the SAE as well as the input matrix of the ELM and can be expressed as

$$\sigma_k(x) = \big[\sigma_k(x^{(1)}), \sigma_k(x^{(2)}), \ldots, \sigma_k(x^{(N)})\big]^T_{N \times n_k} \qquad (10)$$

with $n_k$ the number of nodes in the $k$th hidden layer, and

$$\beta = [\beta_1, \beta_2, \ldots, \beta_{n_k}]^T_{n_k \times 1} \qquad (11)$$

$$y = [y^{(1)}, y^{(2)}, \ldots, y^{(N)}]^T_{N \times 1} \qquad (12)$$

According to matrix theory, as discussed in the studies on ELM [41–43], the optimal vector $\beta$ in Equation (9) can be derived as

$$\beta = \sigma_k(x)^{\dagger}\, y \qquad (13)$$

where $\dagger$ denotes the Moore–Penrose generalized inverse.
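Because the output weights are obtained in closed form, the second step reduces to a single pseudoinverse computation, as in this sketch (the function name is ours; $\sigma_k$ is assumed to have been produced by the pre-trained SAE):

```python
import numpy as np

def solve_output_weights(sigma_k, y):
    # Equation (13): least-squares solution of sigma_k(x) beta = y
    # via the Moore-Penrose pseudoinverse.
    return np.linalg.pinv(sigma_k) @ y

# Prediction for new samples is then sigma_k_new @ beta.
```

No iterative fine-tuning of the whole network is needed, which is the key difference from the traditional SAE.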
(Figure: the first 500 data pairs of the building energy consumption time series for (a) the 30 min experiment and (b) the 60 min experiment; horizontal axis, sample index; vertical axis, energy consumption.)
$$y(p) = \hat{g}\big(y(p - p_1), y(p - p_2), \ldots, y(p - p_n)\big) \qquad (14)$$

where $\hat{g}(\cdot)$ represents the prediction model that can be realized by the prediction algorithms. To be clearer, we assume that the input variables of the prediction models are $x = (x_1, x_2, \ldots, x_n)^T$, where $x_1 = y(p - p_1)$, $x_2 = y(p - p_2)$, $\ldots$, $x_n = y(p - p_n)$, and the output variable is $y = y(p)$.
To train and test the building energy consumption models, the input–output data pairs should
first be formed. Considering the input and output form of the above prediction model, we can obtain
the input–output data pairs as follows:
$$\big(x^{(m)}, y^{(m)}\big), \quad m = 1, 2, \ldots, N - p_1 \qquad (15)$$

where $x^{(m)} = [y(m), y(m + p_1 - p_2), \ldots, y(m + p_1 - p_n)]^T$, $y^{(m)} = y(m + p_1)$, and $N$ is the number of samples in the building energy consumption time series.
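As a sketch, these pairs can be constructed as follows; the helper name and zero-based indexing are ours, but the slicing mirrors Equation (15):

```python
import numpy as np

def make_pairs(series, lags):
    # Equation (15): form (x^(m), y^(m)) pairs from the time series.
    # lags = [p1, p2, ..., pn] are the selected time lags, with p1 the largest.
    y = np.asarray(series, dtype=float)
    p1, N = max(lags), len(series)
    X = np.column_stack([y[p1 - p : N - p] for p in lags])  # inputs x^(m)
    t = y[p1:N]                                             # outputs y(m + p1)
    return X, t
```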
The numbers of the input–output data pairs for training and testing are determined by the time
lag p1 in different experiments. We give the detailed discussion on this issue below.
$$\mathrm{MAE} = \frac{1}{K} \sum_{m=1}^{K} \left| \hat{y}^{(m)} - y^{(m)} \right| \qquad (16)$$

$$\mathrm{MRE} = \frac{1}{K} \sum_{m=1}^{K} \frac{\left| \hat{y}^{(m)} - y^{(m)} \right|}{y^{(m)}} \qquad (17)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{m=1}^{K} \big( \hat{y}^{(m)} - y^{(m)} \big)^2}{K}} \qquad (18)$$

where $K$ is the number of samples for training or testing, and $\hat{y}^{(m)}$ and $y^{(m)}$ are respectively the predicted value and the target value.
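These three indices are direct transcriptions of Equations (16)-(18):

```python
import numpy as np

def mae(y_hat, y):
    # Equation (16): mean absolute error
    return np.mean(np.abs(y_hat - y))

def mre(y_hat, y):
    # Equation (17): mean relative error
    return np.mean(np.abs(y_hat - y) / y)

def rmse(y_hat, y):
    # Equation (18): root-mean-square error
    return np.sqrt(np.mean((y_hat - y) ** 2))
```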
In order to guarantee the performance of the prediction models, the input–output data pairs
are normalized. In this study, the following equation is used to normalize the input parts of the
input–output data pairs:
$$\hat{x}_q^{(m)} = 2\, \frac{x_q^{(m)} - \min_m x_q^{(m)}}{\max_m x_q^{(m)} - \min_m x_q^{(m)}} - 1 \qquad (19)$$

where $q = 1, 2, \ldots, n$ and $m = 1, 2, \ldots, N - p_1$, so that each input variable is scaled to $[-1, 1]$.
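Equation (19) is a columnwise min-max scaling to $[-1, 1]$; a minimal sketch:

```python
import numpy as np

def normalize_inputs(X):
    # Equation (19): scale each input variable (column) to [-1, 1]
    # over the N - p1 samples; X has shape (N - p1, n).
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - x_min) / (x_max - x_min) - 1.0
```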
4. Experiments
In this section, the 30 and 60 min building energy consumption prediction experiments are
analyzed. For each experiment, we determine the optimal input variables for the models first, and then
make comprehensive assessments of the five prediction models.
Figure 5. The partial autocorrelation function (PACF) of the 30 min experiment with 150 time lags.
To obtain the optimal input variables for predicting building energy consumption, we chose
the time series lags whose absolute value of the partial autocorrelations were greater than or equal
to 0.1. As shown in Figure 5, for the 30 min experiment, there were 22 lags meeting the above
condition. As a result, the determined optimal input variables with respect to y( p) are x1 = y( p − 97),
x2 = y( p − 96), x3 = y( p − 95), x4 = y( p − 51), x5 = y( p − 49), x6 = y( p − 48), x7 = y( p − 47),
x8 = y( p − 46), x9 = y( p − 45), x10 = y( p − 44), x11 = y( p − 43), x12 = y( p − 42),
x13 = y( p − 40), x14 = y( p − 39), x15 = y( p − 37), x16 = y( p − 36), x17 = y( p − 32),
x18 = y( p − 31), x19 = y( p − 19), x20 = y( p − 3), x21 = y( p − 2), and x22 = y( p − 1).
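This lag-selection rule can be reproduced, for example, with the PACF estimator from statsmodels; the threshold of 0.1 comes from the text, while the function name and the use of statsmodels are our assumptions, since the paper does not state its software stack.

```python
from statsmodels.tsa.stattools import pacf

def select_lags(series, n_lags=150, threshold=0.1):
    # Keep every lag whose sample partial autocorrelation has an
    # absolute value >= threshold (0.1 in this paper).
    coeffs = pacf(series, nlags=n_lags)  # coeffs[0] corresponds to lag 0
    return [lag for lag in range(1, n_lags + 1) if abs(coeffs[lag]) >= threshold]
```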
Table 1. Root-mean-square errors (RMSEs) of the 30 min experiment with various numbers of hidden
layers and hidden units.
The parameter configurations of the four comparative approaches are listed in detail as follows.
For the BPNN, the number of hidden layer nodes and the iteration number were respectively set
to be 300 and 15,000. In the hidden layer, the logsig activation function was used. Additionally, in the
training process, a gradient descent-based algorithm was adopted.
For SVR, the radial basis function was chosen as the kernel function and the penalty factor C was
set to be 50. Moreover, we did not use shrinking heuristics in the training process.
For the GRBFNN, 5-fold cross-validation was adopted to determine the optimal center and spread
of the RBF function. In addition, the spread was chosen from 0.01 to 2 with a 0.1 step length.
For MLR, the ordinary least-squares method was adopted to minimize the sum of squared errors (SSE) and thus obtain the optimal regression function.
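For illustration only, the three configurations that map directly onto scikit-learn estimators could be set up as below; this is one possible reading of the reported settings, not the authors' implementation (the software used is not named in the paper, and the GRBFNN has no direct scikit-learn counterpart, so it is omitted).

```python
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

# BPNN: one hidden layer of 300 logistic (logsig) units, gradient descent,
# 15,000 iterations, as reported above
bpnn = MLPRegressor(hidden_layer_sizes=(300,), activation="logistic",
                    solver="sgd", max_iter=15000)

# SVR: RBF kernel, penalty factor C = 50, shrinking heuristics disabled
svr = SVR(kernel="rbf", C=50, shrinking=False)

# MLR: ordinary least squares
mlr = LinearRegression()
```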
4.1.3. Results
For the testing data, the prediction results of the five prediction models in the 30 min prediction
experiment are demonstrated in Figure 6. For better visualization, parts of the results (the values
between 400 and 500) in Figure 6 have been zoomed in and are plotted in Figure 7 to show finer details.
(Figure 6. Prediction results of the five models in the 30 min experiment; horizontal axis, samples; vertical axis, energy consumption; legend: BPNN, SVR, GRBFNN, MLR, ExtremeSAE, Actual.)
Figure 7. Parts of the zoomed-in prediction results: (a) backward propagation neural network (BPNN);
(b) support vector regression (SVR); (c) generalized radial basis function neural network (GRBFNN);
(d) multiple linear regression (MLR); and (e) extreme stacked autoencoder (SAE).
The residual errors of the five prediction models in the 30 min prediction experiment are shown in Figure 8. Similarly, to allow a clear comparison, parts of the results (the values between 400 and 500) in Figure 8 have been zoomed in and are re-plotted in Figure 9.
Figure 8. Residual errors of the five models in the 30 min prediction experiment.
(Figure 9. Zoomed-in residual errors (samples 400–500) of the five models in the 30 min prediction experiment; legend: BPNN, SVR, GRBFNN, MLR, ExtremeSAE.)
To quantitatively analyze the performances of the five prediction models, we consider the MAE,
MRE and RMSE indices for both the training and the testing processes. For the 30 min prediction,
the MAEs, MREs and RMSEs of the five prediction models in the training and testing processes are
listed in Table 2.
Table 2. Comparison results of the five prediction models in the 30 min prediction experiment.
Figure 10. The partial autocorrelation function (PACF) of the 60 min prediction experiment with 80 lags.
Table 3. Root-mean-square errors (RMSEs) of the 60 min prediction experiment with various numbers
of hidden layers and hidden units.
In this case, the configurations of the BPNN, SVR, the GRBFNN and MLR are as follows. For the
BPNN, the number of hidden layer nodes and the iteration number were respectively set to be 200 and
17,000. In the hidden layer, the logsig activation function was used. For SVR, the radial basis function
was chosen as the kernel function and the penalty factor C was set to be 80. Again, we did not use
shrinking heuristics in the training process. The configurations of the GRBFNN and MLR in this case
were the same as those in the 30 min prediction experiment.
4.2.3. Results
For the testing data, the prediction results of the five prediction models in the 60 min prediction
experiment are demonstrated in Figure 11. Again, for better visualization, parts of the results
(the values between 200 and 300) in Figure 11 have been zoomed in and are plotted in Figure 12
to demonstrate finer details.
Figure 11. Prediction results of the five models in the 60 min experiment.
Figure 12. Parts of the zoomed-in prediction results: (a) backward propagation neural network (BPNN);
(b) support vector regression (SVR); (c) generalized radial basis function neural network (GRBFNN);
(d) multiple linear regression (MLR); and (e) extreme stacked autoencoder (SAE).
The residual errors of the five prediction models in the 60 min prediction experiment are
demonstrated in Figure 13. Once more, parts of the results (the values between 200 and 300) in
Figure 13 have been zoomed in and are re-plotted in Figure 14.
Similarly, in the 60 min prediction experiment, the MAEs, MREs and RMSEs of the five prediction
models in the training and testing processes are listed in Table 4.
Figure 13. Residual errors of the five models in the 60 min prediction experiment.
(Figure 14. Zoomed-in residual errors (samples 200–300) of the five models in the 60 min prediction experiment; legend: BPNN, SVR, GRBFNN, MLR, ExtremeSAE.)
Table 4. Comparison results of the five prediction models in the 60 min prediction experiment.
5. Conclusions
Deep learning has shown powerful learning and prediction abilities in time series prediction applications. This study utilized one popular deep learning approach, the SAE method, to improve the prediction results for building energy consumption. Theoretically, this study provides a novel learning method that combines the SAE method and the ELM method. The main difference between the proposed method and the traditional SAE method is that the proposed method does not fine-tune the whole network with the iterative back-propagation algorithm, but directly utilizes the ELM method to find the output weights without iterations. This speeds up learning and strengthens the generalization performance. On the application side, the proposed deep
learning method was applied to the energy consumption prediction of a specific building, whose one
year energy consumption data were collected. The experimental and comparison results demonstrate
that the deep learning method outperforms several popular traditional machine learning methods.
The reason may be that the proposed deep learning method has a deeper architecture and improved learning strategies compared with the other methods. In other words, although
the data set in this application does not have a large quantity of data, the deep learning method can
still extract better building energy consumption features and improve the prediction accuracy.
We are continuing to investigate other schemes to further improve the prediction accuracy. By analyzing the collected building energy consumption data, we found that the building energy consumption changes periodically, and better performance may be expected by exploiting this periodicity. How to simultaneously utilize both the data and this prior knowledge of periodicity to construct the DNN therefore remains to be investigated and will be one of our future research directions.
Acknowledgments: This work is supported by the National Natural Science Foundation of China (61473176,
61105077, and 61573225), and the Natural Science Foundation of Shandong Province for Young Talents in Provincial
Universities (ZR2015JL021).
Author Contributions: Chengdong Li, Dongbin Zhao and Jianqiang Yi have contributed to developing ideas
about energy consumption prediction and collecting the data. Zixiang Ding and Guiqing Zhang programmed the
algorithm and tested it. All the authors were involved in preparing the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Štreimikienė, S. Residential energy consumption trends, main drivers and policies in Lithuania. Renew. Sustain.
Energy Rev. 2014, 35, 285–293.
2. Ugursal, V.I. Energy consumption, associated questions and some answers. Appl. Energy 2014, 130, 783–792.
3. Hua, C.; Lee, W.L.; Wang, X. Energy assessment of office buildings in China using China building energy
codes and LEED 2.2. Energy Build. 2015, 86, 514–524.
4. Zuo, J.; Zhao, Z.Y. Green building research-current status and future agenda: A review. Renew. Sustain.
Energy Rev. 2014, 30, 271–281.
5. Daut, M.A.M.; Hassan, M.Y.; Abdullah, H.; Rahman, H.A.; Abdullah, M.P.; Hussin, F. Building electrical
energy consumption forecasting analysis using conventional and artificial intelligence methods: A review.
Renew. Sustain. Energy Rev. 2017, 70, 1108–1118.
6. Li, K.; Hu, C.; Liu, G.; Xue, W. Building’s electricity consumption prediction using optimized artificial neural
networks and principal component analysis. Energy Build. 2015, 108, 106–113.
7. Pombeiro, H.; Santos, R.; Carreira, P.; Silva, C.; Sousa, J.M.C. Comparative assessment of low-complexity
models to predict electricity consumption in an institutional building: Linear regression vs. fuzzy modeling
vs. neural networks. Energy Build. 2017, 146, 141–151.
8. Jimenez, M.J.; Heras, M.R. Application of multi-output ARX models for estimation of the U and g values of
building components in outdoor testing. Sol. Energy 2005, 79, 302–310.
9. Kimbara, A.; Kurosu, S.; Endo, R.; Kamimura, K.; Matsuba, T.; Yamada, A. On-line prediction for load profile
of an air-conditioning system. Ashrae Trans. 1995, 101, 198–207.
10. Newsham, G.R.; Birt, B.J. Building-level occupancy data to improve ARIMA-based electricity use forecasts.
In Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building,
Zurich, Switzerland, 2 November 2010; pp. 13–18.
11. Aydinalp-Koksal, M.; Ugursal, V.I. Comparison of neural network, conditional demand analysis, and
engineering approaches for modeling end-use energy consumption in the residential sector. Appl. Energy
2008, 85, 271–296.
12. Hsu, D. Comparison of integrated clustering methods for accurate and stable prediction of building energy
consumption data. Appl. Energy 2015, 160, 153–163.
13. Alvarez, F.M.; Troncoso, A.; Riquelme, J.C.; Ruiz, J.S.A. Energy time series forecasting based on pattern
sequence similarity. IEEE Trans. Knowl. Data Eng. 2011, 23, 1230–1243.
14. Pérez-Chacón, R.; Talavera-Llames, R.L.; Martinez-Alvarez, F.; Troncoso, A. Finding electric energy
consumption patterns in big time series data. In Proceedings of the 13th International Conference on Distributed Computing and Artificial Intelligence, Sevilla, Spain, 1–3 June 2016; Springer: Cham, Switzerland, 2016; pp.
231–238.
15. Martínez-Álvarez, F.; Troncoso, A.; Asencio-Cortés, G.; Riquelme, J.C. A survey on data mining techniques
applied to electricity-related time series forecasting. Energies 2015, 8, 13162–13193.
16. Tso, G.K.F.; Yau, K.K.W. Predicting electricity energy consumption: A comparison of regression analysis,
decision tree and neural networks. Energy 2007, 32, 1761–1768.
17. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs. Neurons: Comparison between random forest and ANN
for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89.
18. Paudel, S.; Elmitri, M.; Couturier, S.; Nguyen, P.H.; Kamphuis, R.; Lacarrière, B.; Corre, O.L. A relevant
data selection method for energy consumption prediction of low energy building based on support vector
machine. Energy Build. 2017, 138, 240–256.
19. Mena, R.; Rodríguez, F.; Castilla, M.; Arahal, M.R. A prediction model based on neural networks for the
energy consumption of a bioclimatic building. Energy Build. 2014, 82, 142–155.
20. Biswas, M.A.R.; Robinson, M.D.; Fumo, N. Prediction of residential building energy consumption: A neural
network approach. Energy 2016, 117, 84–92.
21. Naji, S.; Shamshirband, S.; Basser, H.; Keivani, A.; Alengaram, U.J.; Jumaat, M.Z.; Petkovic, D. Application of
adaptive neuro-fuzzy methodology for estimating building energy consumption. Renew. Sustain. Energy Rev.
2016, 53, 1520–1528.
22. Ekici, B.B.; Aksoy, U.T. Prediction of building energy needs in early stage of design by using ANFIS.
Expert Syst. Appl. 2011, 38, 5352–5358.
23. Yang, J.; Rivard, H.; Zmeureanu, R. On-line building energy prediction using adaptive artificial neural
networks. Energy Build. 2005, 37, 1250–1259.
24. Naji, S.; Keivani, A.; Shamshirband, S.; Alengaram, U.J.; Jumaat, M.Z.; Mansor, Z.; Lee, M. Estimating
building energy consumption using extreme learning machine method. Energy 2016, 97, 506–516.
25. Zhang, Y.; Chen, Q. Prediction of building energy consumption based on PSO-RBF neural network.
In Proceedings of the IEEE International Conference on System Science and Engineering, Shanghai, China,
11–13 July 2014; pp. 60–63.
26. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
27. Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with
multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki,
Finland, 5–9 July 2008; pp. 160–167.
28. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006,
313, 504–507.
29. Huval, B.; Coates, A.; Ng, A. Deep learning for class-generic object detection. arXiv 2013, arXiv:1312.6885.
30. Qiu, X.; Zhang, L.; Ren, Y.; Suganthan, P.N.; Amaratunga, G. Ensemble deep learning for regression and time
series forecasting. In Proceedings of the 2014 IEEE Symposium on Computational Intelligence in Ensemble
Learning (CIEL), Orlando, FL, USA, 9–12 December 2014; pp. 1–6.
31. Li, L.; Lv, Y.; Wang, F.Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA J. Autom. Sin. 2016,
3, 247–254.
32. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach.
IEEE Trans. Intell. Transp. Syst. 2015, 16, 865–873.
33. Torres, J.; Fernández, A.; Troncoso, A.; Martínez-Álvarez, F. Deep learning-based approach for time series
forecasting with application to electricity load. In Proceedings of the International Work-Conference on the
Interplay between Natural and Artificial Computation, Corunna, Spain, 19–23 June 2017; Springer: Cham,
Switzerland, 2017; pp. 203–212.
34. Bengio, Y.; Lamblin, P.; Dan, P.; Larochelle, H. Greedy layer-wise training of deep networks. In Proceedings
of the International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7
December 2006; pp. 153–160.
35. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with
denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki,
Finland, 5–9 July 2008; ACM: New York, NY, USA, 2008; pp. 1096–1103.
36. Palm, R.B. Prediction as a Candidate for Learning Deep Hierarchical Models of Data; Technical University of
Denmark: Kongens Lyngby, Denmark, 2012; Volume 5.
37. Hosseiniasl, E.; Zurada, J.M.; Nasraoui, O. Deep learning of part-based representation of data using sparse
autoencoders with nonnegativity constraints. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2486–2498.
38. Xu, J.; Xiang, L.; Liu, Q.; Gilmore, H.; Wu, J.; Tang, J.; Madabhushi, A. Stacked sparse autoencoder (SSAE)
for nuclei detection on breast cancer histopathology images. IEEE Trans. Med. Imaging 2016, 35, 119–130.
39. Tao, C.; Pan, H.; Li, Y.; Zou, Z. Unsupervised spectral-spatial feature learning with stacked sparse
autoencoder for hyperspectral imagery classification. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2438–2442.
40. Erhan, D.; Bengio, Y.; Courville, A.; Manzagol, P.A.; Vincent, P.; Bengio, S. Why does unsupervised
pre-training help deep learning? J. Mach. Learn. Res. 2010, 11, 625–660.
41. Huang, G.B.; Wang, D.H.; Lan, Y. Extreme learning machines: A survey. Int. J. Mach. Learn. Cybern. 2011,
2, 107–122.
42. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing
2006, 70, 489–501.
43. Li, M.B.; Huang, G.B.; Saratchandran, P.; Sundararajan, N. Letters: Fully complex extreme learning machine.
Neurocomputing 2005, 68, 306–314.
44. Erb, R.J. Introduction to backpropagation neural network computation. Pharm. Res. 1993, 10, 165–170.
45. Awad, M.; Khanna, R. Support vector regression. Neural Inf. Process. Lett. Rev. 2007, 11, 203–224.
46. Friedrichs, F.; Schmitt, M. On the power of Boolean computations in generalized RBF neural networks.
Neurocomputing 2005, 63, 483–498.
47. Preacher, K.J.; Curran, P.J.; Bauer, D.J. Computational tools for probing interactions in multiple linear
regression, multilevel modeling, and latent curve analysis. J. Educ. Behav. Stat. 2006, 31, 437–448.
48. Eberly, L.E. Multiple linear regression. Methods Mol. Biol. 2007, 404, 165–187.
49. Chong, T.L. Estimating the differencing parameter via the partial autocorrelation function. J. Econom. 1998,
97, 365–381.
50. Zhang, Z.; Law, C.L.; Gunawan, E. Multipath mitigation technique based on partial autocorrelation function.
Wirel. Pers. Commun. 2007, 41, 145–154.
51. Alder, B.J.; Wainwright, T.E. Decay of the Velocity Autocorrelation Function. Phys. Rev. A 1970, 1, 18–21.
52. Jiang, X.; Adeli, H. Wavelet Packet-Autocorrelation Function Method for Traffic Flow Pattern Analysis.
Comput. Aided Civ. Infrastruct. Eng. 2010, 19, 324–337.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).