Short-Term Load Forecasting With Deep Residual Networks
Abstract—We present in this paper a model for forecasting short-term electric load based on deep residual networks. The proposed model is able to integrate domain knowledge and researchers' understanding of the task by virtue of different neural network building blocks. Specifically, a modified deep residual network is formulated to improve the forecast results. Further, a two-stage ensemble strategy is used to enhance the generalization capability of the proposed model. We also apply the proposed model to probabilistic load forecasting using Monte Carlo dropout. Three public datasets are used to prove the effectiveness of the proposed model. Multiple test cases and comparisons with existing models show that the proposed model provides accurate load forecasting results and has high generalization capability.

Index Terms—Short-term load forecasting, deep learning, deep residual network, probabilistic load forecasting.

Manuscript received October 1, 2017; revised February 1, 2018 and March 18, 2018; accepted May 29, 2018. Date of publication June 5, 2018; date of current version June 19, 2019. Paper no. TSG-01422-2017. (Corresponding author: Jinliang He.)
K. Chen, J. Hu, and J. He are with the State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).
K. Chen is with the Department of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China.
Q. Wang is with the Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zürich, Switzerland.
Z. He is with the Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90007 USA.
Digital Object Identifier 10.1109/TSG.2018.2844307

I. INTRODUCTION

THE forecasting of power demand is of crucial importance for the development of modern power systems. The stable and efficient management, scheduling, and dispatch in power systems rely heavily on precise forecasting of future loads on various time horizons. In particular, short-term load forecasting (STLF) focuses on the forecasting of loads from several minutes up to one week into the future [1]. A reliable STLF helps utilities and energy providers deal with the challenges posed by the higher penetration of renewable energies and the development of electricity markets with increasingly complex pricing strategies in future smart grids.

Various STLF methods have been proposed by researchers over the years. Some of the models used for STLF include linear or nonparametric regression [2], [3], support vector regression (SVR) [1], [4], autoregressive models [5], the fuzzy-logic approach [6], etc. Reviews and evaluations of existing methods can be found in [7]–[10]. Building STLF systems with artificial neural networks (ANN) has long been one of the mainstream solutions to this task. As early as 2001, a review paper by Hippert et al. [11] surveyed and examined a collection of papers published between 1991 and 1999 and arrived at the conclusion that most of the proposed models were over-parameterized and the results they had to offer were not convincing enough. In addition to the fact that the size of neural networks grows rapidly with the number of input variables, hidden nodes, or hidden layers, other criticisms mainly focus on the "over-fitting" issue of neural networks [1]. Nevertheless, different types and variants of neural networks have been proposed and applied to STLF, such as radial basis function (RBF) neural networks [12], wavelet neural networks [13], [14], and extreme learning machines (ELM) [15], to name a few.

Recent developments in neural networks, especially deep neural networks, have had great impact in fields including computer vision, natural language processing, and speech recognition [16]. Instead of sticking with fixed shallow network structures with hand-designed features as inputs, researchers are now able to integrate their understanding of different tasks into the network structures. Building blocks including convolutional neural networks (CNN) [17] and long short-term memory (LSTM) [18] have allowed deep neural networks to be highly flexible and effective. Various techniques have also been proposed so that networks with many layers can be trained effectively without vanishing gradients or severe overfitting. Applying deep neural networks to short-term load forecasting is a relatively new topic. Researchers have used restricted Boltzmann machines (RBM) and feed-forward neural networks with multiple layers for forecasting demand-side loads and natural gas loads [19], [20]. However, these models become increasingly hard to train as the number of layers increases, so the number of hidden layers is often kept considerably small (e.g., 2 to 5 layers), which limits the performance of the models.

In this work, we aim at extending existing ANN structures for STLF by adopting state-of-the-art deep neural network structures and implementation techniques. Instead of stacking multiple hidden layers between the input and the output, we learn from the residual network structure proposed in [21] and propose a novel end-to-end neural network model capable of forecasting the loads of the next 24 hours. An ensemble strategy to combine multiple individual networks is also proposed. Further, we extend the model to probabilistic load forecasting by adopting Monte Carlo (MC) dropout (for a comprehensive review of probabilistic electric load forecasting, the reader is referred to [22] and [23]).
The contributions of this work are twofold. First, a fully end-to-end model based on deep residual networks for STLF is proposed. The proposed model does not involve external feature extraction or feature selection algorithms, and only raw data of loads, temperature, and readily available information are used as inputs. The results show that the forecasting performance can be greatly enhanced by improving the structure of the neural networks and adopting the ensemble strategy. As complicated feature engineering techniques and additional information (e.g., humidity, wind speed, cloud cover, etc.) are not involved, we provide a good benchmark that can easily be compared with. In addition, the building blocks of the proposed model can be adapted to existing neural-network-based STLF models. Combining the building blocks with existing feature extraction and feature selection techniques is straightforward and may lead to further improvements in accuracy. Additional data can also be easily incorporated. Second, a new formulation of probabilistic STLF for an ensemble of neural networks is proposed. By using MC dropout, we can directly obtain probabilistic forecasting results with the models trained for the task of point forecasting.

The remainder of the paper is organized as follows. In Section II, we formulate the proposed model based on deep residual networks. The ensemble strategy, the MC dropout method, and the implementation details are also provided. In Section III, the results of STLF by the proposed model are presented. We also discuss the performance of the proposed model and compare it with existing methods. Section IV concludes this paper and proposes future work. The source code for the STLF model proposed in this paper is available at https://github.com/yalickj/load-forecasting-resnet.

II. SHORT-TERM LOAD FORECASTING BASED ON DEEP RESIDUAL NETWORKS

In this paper, we propose a day-ahead load forecasting model based on deep residual networks. We first formulate the low-level basic structure, where the inputs of the model are processed by several fully-connected layers to produce preliminary forecasts of 24 hours. The preliminary forecasts are then passed through a deep residual network. After presenting the structure of the deep residual network, some modifications are made to further enhance its learning capability. An ensemble strategy is designed to enhance the generalization capability of the proposed model. The formulation of MC dropout for probabilistic forecasting is also provided.

A. Model Input and the Basic Structure for Load Forecasting of One Hour

We use the model with the basic structure to give preliminary forecasts for the 24 hours of the next day. Specifically, the inputs used to forecast the load of the hth hour of the next day, L_h, are listed in Table I.

TABLE I. Inputs for the Load Forecast of the hth Hour of the Next Day.

The values of loads and temperatures are normalized by dividing by the maximum value in the training dataset. The selected inputs allow us to capture both short-term closeness and long-term trends in the load and temperature time series [24]. More specifically, we expect that L_h^month, L_h^week, T_h^month, and T_h^week can help the model identify long-term trends in the time series (days with the same day-of-week index as the next day are selected, as they are more likely to have similar load characteristics [13]), while L_h^day and T_h^day provide short-term closeness and characteristics. The input L_h^hour feeds the loads of the most recent 24 hours to the model. Forecast loads are used to replace the values in L_h^hour that are not available at the time of forecasting, which also helps associate the forecasts of the whole day. Note that the sizes of the above-mentioned inputs can be adjusted flexibly. In addition, one-hot codes for season,^1 weekday/weekend distinction, and holiday/non-holiday^2 distinction are added to help the model capture the periodic and irregular temporal characteristics of the load time series.

^1 In this paper, the ranges for Spring, Summer, Autumn, and Winter are March 8th to June 7th, June 8th to September 7th, September 8th to December 7th, and December 8th to March 7th, respectively.
^2 In this paper, we consider three major public holidays, namely Christmas Eve, Thanksgiving Day, and Independence Day, as the activities involved in these holidays have great impacts on the loads. The rest of the holidays are considered as non-holidays for simplicity.

The structure of the neural network model for load forecasting of one hour is illustrated in Fig. 1. For L_h^month, L_h^week, L_h^day, T_h^month, T_h^week, and T_h^day, we first concatenate the pairs [L_h^month, T_h^month], [L_h^week, T_h^week], and [L_h^day, T_h^day], and connect them with three separate fully-connected layers. The three fully-connected layers are then concatenated and connected with another fully-connected layer, denoted as FC2. For L_h^hour, we forward pass it through two fully-connected layers, the second of which is denoted as FC1. S and W are concatenated to produce two fully-connected layers, one used as part of the input of FC1, the other used as part of the input of FC2. H is also connected to FC2. In order to produce the output L_h, we concatenate FC1, FC2, and T_h, and connect them with a fully-connected layer, which is then connected to L_h with another fully-connected layer. All fully-connected layers but the output layer use scaled exponential linear units (SELU) as the activation function.
Fig. 1. The structure of the neural network model for load forecasting of one hour.

Fig. 2. The building block of the deep residual network. SELU is used as the activation function between two linear layers.

The adoption of the rectified linear unit (ReLU) has greatly improved the performance of deep neural networks [25]. Specifically, ReLU has the form

$$\text{ReLU}(y_i) = \max(0, y_i), \tag{1}$$

where y_i is the linear activation of the ith node of a layer. A problem with ReLU is that if a unit cannot be activated by any input in the dataset, the gradient-based optimization algorithm is unable to update the weights of the unit, so the unit will never be activated again. In addition, the network becomes very hard to train if a large proportion of the hidden units produce constant 0 gradients [26]. This problem can be solved by adding a slope to the negative half axis of ReLU. With this simple modification to the formulation of ReLU on the negative half axis, we get PReLU [27]. The activations of a layer with PReLU as the activation function are obtained by

$$\text{PReLU}(y_i) = \begin{cases} y_i & \text{if } y_i > 0 \\ \beta_i y_i & \text{if } y_i \le 0, \end{cases} \tag{2}$$

where β_i is the coefficient controlling the slope of β_i y_i when y_i ≤ 0. A further modification of ReLU that induces self-normalizing properties is provided in [28], where the activation function of SELU is given by

$$\text{SELU}(y_i) = \lambda \begin{cases} y_i & \text{if } y_i > 0 \\ \alpha e^{y_i} - \alpha & \text{if } y_i \le 0, \end{cases} \tag{3}$$

where λ and α are fixed parameters (λ ≈ 1.0507 and α ≈ 1.6733 in [28]) that give the activations the self-normalizing property.

When the load of the hth hour of the next day is being forecast, the values within L_h^hour that are not yet available are replaced by the forecasts {L_1, . . . , L_{h−1}} for h > 1. Instead of simply copying the values, we maintain the neural network connections underneath them. Thus, the gradients of subsequent hours can be propagated backward through time. This would help the model adjust the forecast value of each hour given the inputs and forecast values of the rest of the hours.

We then concatenate {L_1, . . . , L_24} as L, which directly becomes the output of the model with the basic structure. Next, we proceed to formulate the deep residual network and add it on top of L. The output of the deep residual network is denoted as ŷ and has the same size as L.

B. The Deep Residual Network Structure for Day-Ahead Load Forecasting

In [21], an innovative way of constructing deep neural networks for image recognition is proposed. In this paper, the residual block in Fig. 2 is used to build the deep neural network structure. In the residual block, instead of learning a mapping from x to H(x), a mapping from x to F(x, Θ) is learned, where Θ is the set of weights (and biases) associated with the residual block. Thus, the overall representation of the residual block becomes

$$H(x) = F(x, \Theta) + x. \tag{4}$$

A deep residual network can easily be constructed by stacking a number of residual blocks. We illustrate in Fig. 3 the structure of the deep residual network (ResNet) used for the proposed model. More specifically, if K residual blocks are stacked, the forward propagation of such a structure can be represented by

$$x_K = x_0 + \sum_{i=1}^{K} F(x_{i-1}, \Theta_{i-1}), \tag{5}$$

where x_0 is the input of the network and x_i is the output of the ith residual block. Correspondingly, the gradients can be back-propagated as

$$\frac{\partial \mathcal{L}}{\partial x_0} = \frac{\partial \mathcal{L}}{\partial x_K}\left(1 + \frac{\partial}{\partial x_0}\sum_{i=1}^{K} F(x_{i-1}, \Theta_{i-1})\right), \tag{6}$$

where 𝓛 is the overall loss of the neural network. The "1" in the equation indicates that the gradients at the output of the network can be directly back-propagated to the input of the network, so that the vanishing of gradients (which is often observed when the gradients at the output have to go through many layers before reaching the input) is much less likely to occur [29].
Fig. 3. An illustration of the deep residual network (ResNet) structure. More shortcut connections are made in addition to the ones within the blocks. In this figure, every three residual blocks have one shortcut connection, and another shortcut connection is made from the input to the output. Each round node averages all of its inputs.

As a matter of fact, this equation can also be applied to any pair (x_i, x_j) (0 ≤ i < j ≤ K), where x_i and x_j are the output of the ith residual block (or the input of the network when i = 0) and the output of the jth residual block, respectively.

In addition to the stacked residual blocks, extra shortcut connections can be added to the deep residual network, as introduced in [30]. Concretely, two levels of extra shortcut connections are added to the network. The lower-level shortcut connections bypass several adjacent residual blocks, while the higher-level shortcut connection is made between the input and the output. If more than one shortcut connection reaches a residual block or the output of the network, the values from the connections are averaged. Note that after adding the extra shortcut connections, the formulations of the forward propagation of responses and the back-propagation of gradients are slightly different, but the characteristics of the network that we care about remain unchanged.

We can further improve the learning ability of ResNet by modifying its structure. Inspired by the convolutional network structures proposed in [31] and [32], we propose the modified deep residual network (ResNetPlus), whose structure is shown in Fig. 4. First, we add a series of side residual blocks to the model (the residual blocks on the right). Unlike the implementation in [32], the input of each side residual block is the output of the first residual block on the main path (except for the first side residual block, whose input is the input of the network). The output of each main residual block is averaged with the output of the side residual block in the same layer (indicated by the blue dots on the right). Similar to the densely connected network in [31], the outputs of those blue dots are connected to all main residual blocks in subsequent layers. Starting from the second layer, the input of each main residual block is obtained by averaging all connections from the blue dots on the right together with the connection from the input of the network (indicated by the blue dots on the main path). It is expected that the additional side residual blocks and the dense shortcut connections can improve the representation capability and the efficiency of error back-propagation of the network. Later in this paper, we will compare the performance of the basic structure, the basic structure connected with ResNet, and the basic structure connected with ResNetPlus.

C. The Ensemble Strategy of Multiple Models

It is widely acknowledged in the field of machine learning that an ensemble of multiple models has higher generalization capability than individual models [16]. In [33], an analysis of neural network ensembles for STLF of office buildings is provided, and the results show that an ensemble of neural networks reduces the variance of performance. A demonstration of the ensemble strategy used in this paper is shown in Fig. 5. More specifically, the ensemble strategy consists of two stages.

The first stage of the strategy takes several snapshots during the training of a single model. Huang et al. [34] show that setting cyclic learning rate schedules for the stochastic gradient descent (SGD) optimizer greatly improves the performance of existing deep neural network models. In this paper, as we use Adam (short for adaptive moment estimation [35]) as the optimizer, the learning rates for each iteration are decided adaptively; thus, no learning rate schedules are set by ourselves. This scheme is similar to the NoCycle snapshot ensemble method discussed in [34]; that is, we take several snapshots of the same model during its training process (e.g., the 4 snapshots taken along the training process of the model shown in Fig. 5).
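As a rough illustration of this first stage, the sketch below saves snapshots of a single model at fixed epoch milestones during Adam training and averages their forecasts. The milestone epochs, the file names, and the plain averaging rule are assumptions for illustration, not values taken from the paper.

```python
import numpy as np
import tensorflow as tf

class SnapshotSaver(tf.keras.callbacks.Callback):
    """Saves weight snapshots at assumed epoch milestones (NoCycle style [34])."""
    def __init__(self, epochs=(50, 100, 150, 200)):  # hypothetical milestones
        super().__init__()
        self.epochs = set(epochs)

    def on_epoch_end(self, epoch, logs=None):
        if epoch + 1 in self.epochs:
            self.model.save_weights(f'snapshot_{epoch + 1:03d}.h5')

# model.compile(optimizer='adam', loss='mse')  # Adam: no manual LR schedule
# model.fit(x_train, y_train, epochs=200, callbacks=[SnapshotSaver()])

def snapshot_ensemble_forecast(model, snapshot_paths, x):
    """Average the forecasts of all snapshots of one model (first stage)."""
    forecasts = []
    for path in snapshot_paths:
        model.load_weights(path)
        forecasts.append(model.predict(x))
    return np.mean(forecasts, axis=0)
```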
For probabilistic forecasting, the predictive uncertainty is approximated by

$$\mathrm{Var}(y^*) \approx \frac{1}{M}\sum_{m=1}^{M}\left(\hat{y}^*_{(m)} - \bar{\hat{y}}^*\right)^2 + \sigma^2, \tag{8}$$

where ŷ*_(m) is the mth of M sampled forecasts, ȳ̂* is their mean, σ² is the variance of the inherent noise, and X and Y are the observations we use to train f^W(·), a neural network with parameters W. The intractable posterior distribution p(W|X, Y) is often approximated by various inference methods [37]. In this paper, we use MC dropout [38] to obtain the probabilistic forecasting uncertainty, which is easy and computationally efficient to implement. Specifically, dropout refers to the technique of randomly dropping out hidden units in a neural network during the training of the network [39], and a parameter p is used to control the probability that any hidden neuron is dropped out. If we apply dropout at test time as well, each forward pass samples a sub-network, and repeated passes yield the forecast samples used in (8).

Implementation details of the networks are as follows.

1) The Basic Structure: The fully-connected layer for each of [L_h^day, T_h^day], [L_h^week, T_h^week], [L_h^month, T_h^month], and L_h^hour has 10 hidden nodes, while the fully-connected layers for [S, W] have 5 hidden nodes. FC1, FC2, and the fully-connected layer before L_h have 10 hidden nodes. All but the output layer use SELU as the activation function.

2) The Deep Residual Network (ResNet): ResNet is added to the neural network with the basic structure. Each residual block has a hidden layer with 20 hidden nodes and SELU as the activation function. The size of the outputs of the blocks is 24, which is the same as that of the inputs. A total of 30 residual blocks are stacked.
TABLE VI. Comparison of Probabilistic Forecasting Performance Measures for the Year 2011 in the GEFCom2014 Dataset.
Fig. 8. Actual load and 95% prediction intervals for a winter week (left) and a summer week (right) of 1992 for the North-American Utility dataset. The two weeks start with February 3rd, 1992, and July 6th, 1992, respectively.

Given that we obtain the probabilistic forecasting results by sampling the trained neural networks with MC dropout, we can conclude that the proposed model is good at capturing the uncertainty of the task of STLF.

IV. CONCLUSION AND FUTURE WORK

We have proposed an STLF model based on deep residual networks in this paper. The low-level neural network with the basic structure, the ResNetPlus structure, and the two-stage ensemble strategy enable the proposed model to have high accuracy as well as satisfactory generalization capability. Two widely acknowledged public datasets are used to verify the effectiveness of the proposed model with various test cases. Comparisons with existing models have shown that the proposed model is superior in both forecasting accuracy and robustness to temperature variation. We have also shown that the proposed model can be directly used for probabilistic forecasting when MC dropout is adopted.

A number of paths for further work are attractive. As we have only scratched the surface of the state of the art in deep neural networks, we may apply more building blocks of deep neural networks (e.g., CNN or LSTM) to the model to enhance its performance. In addition, we will further investigate the implementation of deep neural networks for probabilistic STLF and make further comparisons with existing methods.

REFERENCES

[1] E. Ceperic, V. Ceperic, and A. Baric, "A strategy for short-term load forecasting by support vector regression machines," IEEE Trans. Power Syst., vol. 28, no. 4, pp. 4356–4364, Nov. 2013.
[2] K.-B. Song, Y.-S. Baek, D. H. Hong, and G. Jang, "Short-term load forecasting for the holidays using fuzzy linear regression method," IEEE Trans. Power Syst., vol. 20, no. 1, pp. 96–101, Feb. 2005.
[3] W. Charytoniuk, M. S. Chen, and P. V. Olinda, "Nonparametric regression based short-term load forecasting," IEEE Trans. Power Syst., vol. 13, no. 3, pp. 725–730, Aug. 1998.
[4] E. E. Elattar, J. Goulermas, and Q. H. Wu, "Electric load forecasting based on locally weighted support vector regression," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 4, pp. 438–447, Jul. 2010.
[5] J. W. Taylor, "Short-term electricity demand forecasting using double seasonal exponential smoothing," J. Oper. Res. Soc., vol. 54, no. 8, pp. 799–805, Aug. 2003.
[6] M. Rejc and M. Pantos, "Short-term transmission-loss forecast for the Slovenian transmission power system based on a fuzzy-logic decision approach," IEEE Trans. Power Syst., vol. 26, no. 3, pp. 1511–1521, Aug. 2011.
[7] E. A. Feinberg and D. Genethliou, "Load forecasting," in Applied Mathematics for Restructured Electric Power Systems, J. H. Chow, F. F. Wu, and J. Momoh, Eds. New York, NY, USA: Springer, 2005, pp. 269–285.
[8] J. W. Taylor, L. M. de Menezes, and P. E. McSharry, "A comparison of univariate methods for forecasting electricity demand up to a day ahead," Int. J. Forecast., vol. 22, no. 1, pp. 1–16, Jan./Mar. 2006.
[9] H. Hahn, S. Meyer-Nieberg, and S. Pickl, "Electric load forecasting methods: Tools for decision making," Eur. J. Oper. Res., vol. 199, no. 3, pp. 902–907, Dec. 2009.
[10] Y. Wang, Q. Chen, T. Hong, and C. Kang, "Review of smart meter data analytics: Applications, methodologies, and challenges," IEEE Trans. Smart Grid, to be published, doi: 10.1109/TSG.2018.2818167.
[11] H. S. Hippert, C. E. Pedreira, and R. C. Souza, "Neural networks for short-term load forecasting: A review and evaluation," IEEE Trans. Power Syst., vol. 16, no. 1, pp. 44–55, Feb. 2001.
[12] C. Cecati, J. Kolbusz, P. Różycki, P. Siano, and B. M. Wilamowski, "A novel RBF training algorithm for short-term electric load forecasting and comparative studies," IEEE Trans. Ind. Electron., vol. 62, no. 10, pp. 6519–6529, Oct. 2015.
[13] Y. Chen et al., "Short-term load forecasting: Similar day-based wavelet neural networks," IEEE Trans. Power Syst., vol. 25, no. 1, pp. 322–330, Feb. 2010.
[14] Y. Zhao, P. B. Luh, C. Bomgardner, and G. H. Beerel, "Short-term load forecasting: Multi-level wavelet neural networks with holiday corrections," in Proc. Power Energy Soc. Gen. Meeting, Calgary, AB, Canada, 2009, pp. 1–7.
[15] R. Zhang, Z. Y. Dong, Y. Xu, K. Meng, and K. P. Wong, "Short-term load forecasting of Australian national electricity market by an ensemble model of extreme learning machine," IET Gener. Transm. Distrib., vol. 7, no. 4, pp. 391–397, Apr. 2013.
[16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[18] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[19] S. Ryu, J. Noh, and H. Kim, "Deep neural network based demand side short term load forecasting," Energies, vol. 10, no. 1, p. 3, 2016.
[20] G. Merkel, R. J. Povinelli, and R. H. Brown, "Deep neural network regression for short-term load forecasting of natural gas," in Proc. 37th Annu. Int. Symp. Forecast., 2017. [Online]. Available: https://isf.forecasters.org/wp-content/uploads/ISF2017-Proceedings.pdf
[21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, 2016, pp. 770–778.
[22] T. Hong and S. Fan, "Probabilistic electric load forecasting: A tutorial review," Int. J. Forecast., vol. 32, no. 3, pp. 914–938, Jul./Sep. 2016.
[23] B. Liu, J. Nowotarski, T. Hong, and R. Weron, "Probabilistic load forecasting via quantile regression averaging on sister forecasts," IEEE Trans. Smart Grid, vol. 8, no. 2, pp. 730–737, Mar. 2017.
[24] J. Zhang, Y. Zheng, D. Qi, R. Li, and X. Yi, "DNN-based prediction model for spatio-temporal data," in Proc. 24th ACM SIGSPATIAL Int. Conf. Adv. Geograph. Inf. Syst., Burlingame, CA, USA, 2016, p. 92.
[25] G. E. Dahl, T. N. Sainath, and G. E. Hinton, "Improving deep neural networks for LVCSR using rectified linear units and dropout," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Vancouver, BC, Canada, 2013, pp. 8609–8613.
[26] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. Int. Conf. Mach. Learn., vol. 30, 2013, p. 3.
[27] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. IEEE Int. Conf. Comput. Vis., Santiago, Chile, 2015, pp. 1026–1034.
[28] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 972–981.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 630–645.
[30] K. Zhang et al., "Residual networks of residual networks: Multilevel residual networks," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 6, pp. 1303–1314, Jun. 2018, doi: 10.1109/TCSVT.2017.2654543.
[31] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 1, Honolulu, HI, USA, 2017, pp. 2261–2269.
[32] L. Zhao et al., "Deep convolutional neural networks with merge-and-run mappings," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), 2018.
[33] M. De Felice and X. Yao, "Short-term load forecasting with neural network ensembles: A comparative study [application notes]," IEEE Comput. Intell. Mag., vol. 6, no. 3, pp. 47–56, Aug. 2011.
[34] G. Huang et al., "Snapshot ensembles: Train 1, get M for free," presented at the 5th Int. Conf. Learn. Represent. (ICLR), 2017.
[35] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," presented at the 3rd Int. Conf. Learn. Represent. (ICLR), 2015.
[36] A. R. Webb, Statistical Pattern Recognition. Chichester, U.K.: Wiley, 2003.
[37] L. Zhu and N. Laptev, "Deep and confident prediction for time series at Uber," in Proc. IEEE Int. Conf. Data Min. Workshops, New Orleans, LA, USA, 2017, pp. 103–110.
[38] Y. Gal and Z. Ghahramani, "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning," in Proc. 33rd Int. Conf. Mach. Learn., New York, NY, USA, 2016, pp. 1050–1059.
[39] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, Jan. 2014.
[40] F. Chollet et al. (2015). Keras. [Online]. Available: https://github.com/fchollet/keras
[41] M. Abadi et al., "TensorFlow: A system for large-scale machine learning," in Proc. 12th USENIX Symp. Oper. Syst. Design Implement., vol. 16, Savannah, GA, USA, 2016, pp. 265–283.
[42] T. Hong et al., "Probabilistic energy forecasting: Global energy forecasting competition 2014 and beyond," Int. J. Forecast., vol. 32, no. 3, pp. 896–913, 2016.
[43] A. J. R. Reis and A. P. A. Da Silva, "Feature extraction via multiresolution analysis for short-term load forecasting," IEEE Trans. Power Syst., vol. 20, no. 1, pp. 189–198, Feb. 2005.
[44] N. Amjady and F. Keynia, "Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary algorithm," Energy, vol. 34, no. 1, pp. 46–57, Jan. 2009.
[45] A. Deihimi and H. Showkati, "Application of echo state networks in short-term electric load forecasting," Energy, vol. 39, no. 1, pp. 327–340, Mar. 2012.
[46] S. Li, P. Wang, and L. Goel, "Short-term load forecasting by wavelet transform and evolutionary extreme learning machine," Elect. Power Syst. Res., vol. 122, pp. 96–103, May 2015.
[47] Z. Hu, Y. Bao, and T. Xiong, "Comprehensive learning particle swarm optimization based memetic algorithm for model selection in short-term load forecasting using support vector regression," Appl. Soft Comput., vol. 25, pp. 15–25, Dec. 2014.
[48] S. Li, P. Wang, and L. Goel, "A novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection," IEEE Trans. Power Syst., vol. 31, no. 3, pp. 1788–1798, May 2016.
[49] S. Li, L. Goel, and P. Wang, "An ensemble approach for short-term load forecasting by extreme learning machine," Appl. Energy, vol. 170, pp. 22–29, May 2016.
[50] H. Yu, P. D. Reiner, T. Xie, T. Bartczak, and B. M. Wilamowski, "An incremental design of radial basis function networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 10, pp. 1793–1803, Oct. 2014.
[51] F. Ziel and B. Liu, "Lasso estimation for GEFCom2014 probabilistic electric load forecasting," Int. J. Forecast., vol. 32, no. 3, pp. 1029–1037, 2016.

Kunjin Chen received the B.Sc. degree in electrical engineering from Tsinghua University, Beijing, China, in 2015, where he is currently pursuing the Ph.D. degree with the Department of Electrical Engineering. His research interests include applications of machine learning and data science in power systems.

Kunlong Chen received the B.Sc. degree in electrical engineering from Beijing Jiaotong University, Beijing, China, in 2015 and the engineering degree from CentraleSupélec, Paris, France, in 2015. He is currently pursuing the M.Sc. degree with the Department of Electrical Engineering, Beijing Jiaotong University. His research interests include applications of statistical learning techniques in the field of electrical engineering.

Qin Wang received the B.Sc. degree in electrical engineering from Tsinghua University in 2015. He is currently pursuing the master's degree with ETH Zürich, Switzerland. His research interests include computer vision and deep learning applications.

Ziyu He received the B.Sc. degree from Zhejiang University, Hangzhou, China, in 2015 and the M.S. degree from Columbia University, NY, USA, in 2017. He is currently pursuing the Ph.D. degree with the Department of Industrial and Systems Engineering, University of Southern California, CA, USA. His research interests are optimization and machine learning and their applications in energy.

Jun Hu (M'10) received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from the Department of Electrical Engineering, Tsinghua University, Beijing, China, in 1998, 2000, and 2008, respectively. He is currently an Associate Professor with the Department of Electrical Engineering, Tsinghua University. His research interests include overvoltage analysis in power systems, sensors and big data, dielectric materials, and surge arrester technology.

Jinliang He (M'02–SM'02–F'08) received the B.Sc. degree from the Wuhan University of Hydraulic and Electrical Engineering, Wuhan, China, in 1988, the M.Sc. degree from Chongqing University, Chongqing, China, in 1991, and the Ph.D. degree from Tsinghua University, Beijing, China, in 1994, all in electrical engineering. He became a Lecturer in 1994 and an Associate Professor in 1996 with the Department of Electrical Engineering, Tsinghua University. From 1997 to 1998, he was a Visiting Scientist with the Korea Electrotechnology Research Institute, Changwon, South Korea. From 2014 to 2015, he was a Visiting Professor with the Department of Electrical Engineering, Stanford University, Palo Alto, CA, USA. In 2001, he was promoted to Professor with Tsinghua University, where he is currently the Chair of the High Voltage Research Institute. He has authored seven books and 600 technical papers. His research interests include advanced power transmission technology, sensing technology and big data mining, and smart nanodielectric materials.