
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.2769687

Brightness-based Convolutional Neural Network for Thermal Image Enhancement
KYUNGJAE LEE, JUNHYEP LEE, JOOSUNG LEE, SANGWON HWANG, AND
SANGYOUN LEE, (Member, IEEE)
Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
Corresponding author: Sangyoun Lee (e-mail: [email protected])
This work was supported by Institute for Information & Communications Technology Promotion (IITP) grants funded by the Korea
government (MSIP) (No. 2016-0-00197).

ABSTRACT In this study, we propose a convolutional neural network for thermal image enhancement that combines the brightness domain with a residual-learning technique, improving both the enhancement quality and the speed of convergence. Typically, the training domain is the same as the domain of the target image; however, we evaluated several domains to determine the most suitable one for the network. In the analyses, we first compared the performance of networks trained on the corresponding regions of color-based and aligned infrared-based images, respectively, including the thermal, far, and near spectra. Then, four RGB-based domains, namely gray, lightness, intensity, and brightness, were evaluated. Finally, the proposed network architecture was determined by considering the residual and brightness domains. The results of the analyses indicated that the brightness domain was the best training domain for enhancing thermal images. The experimental results confirm that the proposed network, which can be trained in approximately one hour, outperforms conventional learning-based approaches to thermal image enhancement in terms of several image quality metrics and a qualitative evaluation. Furthermore, the results demonstrate that the brightness domain is effective as a training domain and can be used to increase the performance of existing networks.

INDEX TERMS Thermal infrared image, image enhancement, convolutional neural networks.

I. INTRODUCTION
Accurate and high-quality (HQ) thermal infrared images are required in a wide variety of applications, including pedestrian detection [1], [2], surveillance [3], military [4], fire detection [5], visual odometry [6], and gas detection [7]. Although various computer vision technologies have been developed based on RGB cameras, they suffer from challenging problems, such as changes in illumination and dark environments. To address these challenges in grayscale or color images, thermal images have been introduced. Thermal images are robust in the presence of illumination changes and can take advantage of the thermal information about the object. This is because thermal cameras are able to capture temperature information using the mid-wavelength infrared (MWIR) (3-8 µm) and long-wavelength infrared (LWIR) (8-15 µm) spectra in indoor and outdoor environments, regardless of the illumination or textural complexity.

Thermal sensors that can be used for capturing HQ thermal images are expensive, whereas low-cost commercial thermal sensors are limited by low signal-to-noise ratios, blurring, and halo effects, and are therefore difficult to use in practical applications.
To overcome these challenges, a number of approaches have recently been proposed to enhance low-quality (LQ) thermal images. Previously, most of the approaches were generalized by using the spatial and frequency domains [8], which include histogram equalization, contrast adjustment, transformation, empirical mode decomposition, etc. Histogram equalization-based methods were introduced to increase the global contrast by distributing the histogram of the thermal image approximately equally [9], [10]. Shao et al. [11] proposed a new approach to enhance the global and local contrast of an infrared image based on a combination of plateau histogram equalization and discrete stationary wavelets. Ni et al. [12] proposed an algorithm to enhance and preserve the edges of infrared images based on wavelet diffusion while reducing the noise. Bai et al. [13] suggested a method of contrast enhancement by adopting a multiscale new top-hat transform. A bi-dimensional empirical mode decomposition method was proposed in [8] that first decomposed the thermal image into several intrinsic mode functions. Then, these functions were expanded and fused with the residue at each decomposed level. A variational infrared enhancement technique [14] was introduced to enhance the edge details and to prevent over-enhancement using a gradient field equalization technique with adaptive dual thresholds, which was obtained using histogram equalization. Yuan et al. [15] presented a multifaceted approach involving the enhancement of both the image contrast and subtle image details by adaptively manipulating the contrast, sharpness, and intensity of the image. It should be noted that these approaches were heuristically designed to adjust the thermal information, and therefore cannot account for various thermal images, restricting their applications as a consequence.

More recently, convolutional neural network (CNN) based methods have achieved record-breaking performance compared to previous hand-crafted feature based methods in various vision tasks, such as object detection [16], [17], image recognition [18], [19], and image super-resolution [20]-[22]. One of the first CNN based approaches for enhancing thermal images was suggested by Choi et al. [23], who designed a relatively shallow CNN inspired by the proposal in [20]. CNNs have been successful not only in enhancing the thermal image quality, but also in verifying the performance improvements in a variety of applications, including pedestrian detection, visual odometry, and image registration, on the basis of enhanced thermal images.

In this paper, we propose a residual learning [21] based thermal image enhancement convolutional neural network (TIECNN)¹ that was motivated by [22]. Since the input LQ and output HQ images are highly correlated, it is sufficient to train only the high-frequency components using residual learning. In addition, this method can solve the vanishing/exploding gradients problem [24]. In the case of a supervised learning based CNN, the choice of training images significantly impacts the performance of the network. For super-resolution of color images, Dong et al. [20] explored the performance on different channels and experimentally showed differences in the network accuracy depending on the training domain. They demonstrated that a network based on the Y channel (luma) alone achieved credible results. However, due to the range difference between luma and thermal images, networks intended for use with thermal images cannot be trained on the luma channel. Choi et al. [23] compared the performance of a network based on gray and MWIR images using different datasets. Although they found that the gray-based network provided better performance than the MWIR-based network, we believe that their comparisons are unfair because they employed datasets that contained completely different scenes and patterns. For example, the dataset [25] they used for their gray-based network has been widely used in various CNN methods [20]-[23], [26], whereas the LWIR dataset [27] has not. This difference can cause biased results due to the learning of poor parameters during training.

Thus, to ensure a fair comparison, we investigated the results of training each network on the corresponding regions of color and aligned IR images in the same dataset. In addition, we evaluated the performance under the same experimental conditions for four domains, which were converted from a color image and then applied to a thermal image. We empirically verified that a brightness domain based network achieves better performance than other domain networks, and that this can also be applied to existing networks.

In summary, in the proposed network, an HQ thermal image is generated by pixel-wise summing the LQ thermal image as the input and the brightness-based residuals as the output. The experimental results show that our proposed network outperforms conventional learning-based approaches, as measured by various image quality metrics: 1) full-reference quality assessment: the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [28], and information fidelity criterion (IFC) [29]; and 2) no-reference quality assessment: the naturalness image quality evaluator (NIQE) [30], no-reference perceptual blur metric (NPBM) [31], contour volume (CV) [32], and uniform intensity-distribution (UID) [32].

The contributions of this paper are as follows. First, to the best of our knowledge, our approach is the first attempt to design a residual learning CNN trained using the brightness domain for enhancing thermal images, which improves the speed of convergence and the performance of enhancement. Second, we explored several public thermal datasets to collect high-quality thermal images with which to evaluate the performance of our network for general purposes, while considering various environments and situations. Third, we compared the performance of various domains as training data and validated that the brightness domain based network achieved the best accuracy by way of two experimental studies, namely: 1) evaluating the networks trained on the corresponding regions of a grayscale and aligned IR (thermal-, far-, or near-) image pair in the same dataset; and 2) conducting experiments with the four training domains that can be converted from an RGB image to the range of the thermal image, i.e., the gray, lightness, intensity, and brightness domains. Finally, we present a comparative study with state-of-the-art methods, based on a number of metrics.

The remainder of this paper is organized as follows. Section II describes the architecture of the proposed network. The experimental results and discussion are presented in Section III. Finally, the conclusions from this research are stated in Section IV.

¹The implementation is available at https://sites.google.com/view/kjaelee/tiecnn.


FIGURE 1. Structure of the proposed network. The network is composed of multiple layers for feature extraction and mapping, and a single layer for image reconstruction. The network is trained on the brightness domain and predicts a residual image. A high-quality thermal image is generated by summing an input low-quality thermal image and the output residuals.

II. PROPOSED NETWORK
In this section, we describe the proposed network and its training strategy for thermal image enhancement. Our network architecture is illustrated in Fig. 1. The network adopts residual learning and only learns the high-frequency information between the LQ and HQ images, and it is trained on the brightness image instead of the thermal image. In other words, the LQ and ground-truth HQ images of the brightness domain are used in the training stage. In the test stage, the residual image is added to the input LQ thermal image to recover the HQ thermal image.

The network consists of three blocks: feature extraction, mapping, and image reconstruction. The convolution layers in each block are denoted n×Conv(s, d), where the variables n, s, and d indicate the number of layers, the size of the filter, and the feature dimensions, respectively. These variables are important factors that determine the network performance. The analysis of each variable for the optimal network is described in Section III, and the details of the network design are explained in the following subsections.

A. ARCHITECTURE
Feature Extraction. To extract the features of the high-frequency information, a set of feature maps is extracted from the input (LQ) image by way of a convolution. A high-dimensional vector is used to represent image patches extracted from the input image. These vectors are composed of a set of feature maps, and through the network, the feature map of the image patch is learned from the training data. The feature extraction block consists of nf convolutional layers with a kernel size sf that output df features, as follows: nf×Conv(sf, df). Each variable should be determined in consideration of the following: nf reflects the fact that low-level features, such as edges or corners, can be extracted at the lower layers, and more complex features, such as textures, can be extracted at the higher layers [33]; sf reflects the fact that a large convolutional kernel can be replaced with multiple stages of a small kernel to reduce both the number of parameters and the computational cost while maintaining the same receptive field [34]; df represents the number of LQ feature dimensions, which is a factor that influences the performance. Therefore, it is important to determine optimal values of all variables.

Mapping. The features extracted from the previous block are non-linearly mapped by this block, which consists of three modules: shrinking, non-linear mapping, and expansion. The authors in [22] show that these modules reduce the number of parameters and achieve better performance than a single convolution layer.
FIGURE 2. Examples of the test dataset (TIRSet26). (a-c): LTIR [38] (street-172, crowd-25, trees-341); (d, e): Morris [41] (drive-Video_3, drive4-Video_50); (f, g): TMD [39] (Scene1-1645, Scene3-876); (h): TID [40] (Sempach-BG2_328).

The shrinking module reduces the number of feature dimensions df to a shallow feature dimension dm (dm < df) by a 1×1 convolution [35] that acts as a linear combination on the LQ features, which can be represented as Conv(1, dm). By applying a 1×1 convolution, features with similar properties from multiple feature maps can be grouped, which affects the following non-linear mapping module. This reduces the computational cost, allows the non-linear mapping module to be deeper, and has the advantage of providing additional non-linearity through the activation functions of the deeper modules.

The non-linear mapping is the most important module in terms of enhancing the LQ thermal image, and it determines the accuracy and complexity. Since the number of layers and feature dimensions affect this performance, the values of these variables should be carefully determined. This module is composed of nm layers with the same kernel size of 3×3 and feature dimensions dm, which can be represented as nm×Conv(3, dm).

In the expansion module, the mapped features are expanded to the same size as the feature dimensions df of the feature extraction block. This layer acts like an inverse operation of the shrinking module. By expanding the feature dimension, the amount of information available for high-quality reconstruction is increased. To maintain consistency with the shrinking module, a convolution layer with a 1×1 filter is employed, which can be expressed as Conv(1, df).

In summary, the mapping block is structured as Conv(1, dm) - nm×Conv(3, dm) - Conv(1, df).

Image Reconstruction. This block aggregates the detailed information and predicts a high-frequency (residual) image. CNNs for image enhancement generally learn using the targeted image domain, which means that the domain of the training image is equal to the domain of the test image. However, the thermal image was not used in the training stage of our network, because we found that reconstruction guided by the brightness domain outperforms networks trained using other domains, as will be discussed in the following section. Therefore, the image reconstruction block predicts residuals trained on the brightness domain. A convolution layer with a 3×3 kernel is employed to reconstruct the detailed information, Conv(3, 1). The network is trained on the brightness domain, and at test time the residuals predicted from the input thermal image are combined element-wise with the input image itself to produce a high-quality output image. Therefore, the image predicted by the network is expected to be similar to the differences (residuals) between the HQ and LQ images.

B. TRAINING
Zero Padding. We employed zero padding on all layers to avoid reducing the output size by the convolution of each layer, so that the input and output feature maps have the same size.

PReLU. Each convolutional layer, except for the last layer (the image reconstruction block), is followed by an activation function. We employed a parametric rectified linear unit (PReLU) [36] instead of the more commonly used rectified linear unit (ReLU) as the activation function. In the ReLU, the negative part is zero, whereas the PReLU has a learnable parameter that adjusts the slope of the negative part during the learning process, which improves the accuracy at a negligible extra computational cost. Therefore, the PReLU is robust to the weakness of the ReLU that may occur when the input value is less than zero.

Residual Learning. The LQ and HQ images are highly correlated except for the image details, and the difference between the details is very small. This means that it is sufficient to predict only the high-frequency components for HQ image generation. Therefore, we designed our network to predict the residuals. In addition, we achieve better performance with faster convergence by residual learning based on the brightness domain rather than other domains, including infrared (thermal-, far-, near-) and color-based (gray, lightness, intensity) images. This is discussed in detail in the next section.
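To make the preceding description concrete, the following is a minimal PyTorch sketch of the blocks described above (zero padding, PReLU activations, and a global residual connection). This is our own illustrative reconstruction, not the released Caffe implementation; the class name is ours, and the default argument values are those eventually selected in Section III.

```python
import torch
import torch.nn as nn

class TIECNN(nn.Module):
    """Feature extraction -> shrinking -> non-linear mapping -> expansion ->
    reconstruction, with zero padding, PReLU activations, and a global
    residual (element-wise sum) connection."""
    def __init__(self, nf=3, sf=3, df=48, nm=5, dm=16):
        super().__init__()
        layers, c_in = [], 1
        for _ in range(nf):                                  # feature extraction
            layers += [nn.Conv2d(c_in, df, sf, padding=sf // 2), nn.PReLU(df)]
            c_in = df
        layers += [nn.Conv2d(df, dm, 1), nn.PReLU(dm)]       # shrinking, Conv(1, dm)
        for _ in range(nm):                                  # mapping, nm x Conv(3, dm)
            layers += [nn.Conv2d(dm, dm, 3, padding=1), nn.PReLU(dm)]
        layers += [nn.Conv2d(dm, df, 1), nn.PReLU(df)]       # expansion, Conv(1, df)
        layers += [nn.Conv2d(df, 1, 3, padding=1)]           # reconstruction, Conv(3, 1)
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # add the predicted residuals to the LQ input to recover the HQ image
        return x + self.body(x)
```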

TABLE 1. The specifications of TIRSet26 used as the test dataset.

Dataset                       | Sensor        | Resolution | Category           | Selected Frame ID
------------------------------|---------------|------------|--------------------|-------------------------
LTIR Dataset v1.0 [38]        | FLIR Tau 320  | 324×256    | garden             | 615
                              |               |            | horse              | 311
                              | AIM QWIP      | 640×480    | saturated          | 195
                              |               |            | street             | 172
                              | FLIR A655SC   | 640×480    | crouching          | 222
                              |               |            | crossing           | 120
                              |               |            | depthwise crossing | 851
                              |               |            | jacket             | 1310
                              |               |            | quadrocopter 2     | 950
                              |               |            | selma              | 186
                              |               |            | trees              | 341
                              | FLIR T640     | 640×480    | birds              | 270
                              | FLIR A65      | 640×512    | crowd              | 17, 25
Trimodal Dataset [39]         | Axis Q1922    | 640×456    | Scene1             | 1200, 1645
                              |               |            | Scene2             | 1580
                              |               |            | Scene3             | 876
Thermal Infrared Dataset [40] | FLIR Tau 320  | 324×256    | Sempach-8          | 264
                              |               |            | Sempach-12         | 215
                              |               |            | Sempach-BG2        | 328
Morris Dataset [41]           | Miricle 110KS | 384×288    | drive              | Video_0003
                              |               |            | drive2             | Video_0029, Video_0034
                              |               |            | drive4             | Video_0050, Video_0052

Loss Function. The training process of the network aims to minimize the loss between the predicted images and the corresponding high-quality images (ground truth). The HQ image is composed of a pixel-wise summation of the LQ and residual images, and thus the parameters learned in the network represent the residuals R between the input LQ image X and the ground truth Y, i.e., R = Y − X. We used the Euclidean loss, which calculates the sum of squared differences between its two inputs, as the objective function:

L(Y, \hat{Y}; \Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| Y_i - \hat{Y}_i \right\|_2^2
                      = \frac{1}{2N} \sum_{i=1}^{N} \left\| (X_i + R_i) - (X_i + \hat{R}_i) \right\|_2^2
                      = \frac{1}{2N} \sum_{i=1}^{N} \left\| R_i - \hat{R}_i \right\|_2^2,    (1)

where Θ is the set of parameters learned from N training samples, and Ŷᵢ and R̂ᵢ are the predicted HQ and predicted residual images, respectively. The LQ image X is generated by downsampling the ground-truth HQ image and then upsampling it to the original size by a scale factor using the bicubic algorithm.

The loss is minimized using a gradient-based optimization method known as adaptive moment estimation (Adam) [37]. We initialized the weights of the convolutional filters activated by PReLU using the method of [36], and the weights of the last layer, which has no activation function, were initialized by randomly selecting values from a Gaussian distribution with zero mean and standard deviation 0.001.

III. EXPERIMENTS
To learn the network structure for enhancing a thermal image, it is necessary to use a large number of high-quality thermal images that have a variety of patterns as the ground truth. In this section, we determine the optimum domain to use for training and experimentally verify that a brightness-based network provides the best performance. Subsequently, we analyze the optimal design of the network using this domain to maximize the performance of our network. We then compare our TIECNN to state-of-the-art algorithms based on quantitative and qualitative evaluations.

We carefully collected a total of 26 high-quality thermal images from various datasets to create a test dataset for thermal image enhancement that considers various situations, environments, and applications. This test set (TIRSet26) was collected from the LTIR Dataset version 1.0 (LTIR) [38], Trimodal Dataset (TMD) [39], Thermal Infrared Dataset (TID) [40], and Morris Dataset (Morris) [41]. We removed the top 24 rows (black regions) of the TID [40] images and the bottom 2 rows (white regions) of the Morris [41] images in our experiments. Detailed information regarding the experiments, such as the sensor and image resolution, is presented in Table 1, and Fig. 2 shows examples of the test set.

For an objective comparison, we conducted all experiments for the analysis of the training domain and network architecture using the TIRSet26 with a scale factor of two and an average PSNR of 40.936. Unless specified otherwise, the experimental conditions were fixed at nf = 2, sf = 3, df = 56, nm = 2, dm = 12 for our model, as listed in Table 2. The size of the sub-images was set to 32×32 with no overlap, the stride to 32, and the batch size to 128. The learning rate was initialized to 1e−3 for all layers (except the last layer) and decreased by a factor of 10 every 50 epochs until 200 epochs had passed, with momentum 0.9 and weight decay 1e−4. Since a smaller learning rate at the last layer is important for network convergence [20], the learning rate of the last layer was initialized to 1e−4, which is 0.1 times smaller than that of the other layers.

FIGURE 3. Examples of the training sample images in the multi-domain datasets: (a) thermal (top) and gray (bottom) images from the Multimodal Stereo Dataset 2 (MSD2); (b) FIR (top) and gray (bottom) images from the Visible-FIR Day-Night Pedestrian Sequence Dataset (VFD); (c) NIR (top) and gray (bottom) images from the IVRL Dataset (IVRLD). Each infrared image was aligned with the corresponding gray image.

TABLE 2. Network configuration for the analysis.

Block                | Layer (Input → Output) | Kernel size | Dimensions
---------------------|------------------------|-------------|----------------
Feature Extraction   | LQ → Conv-E1           | 3×3         | 1 → 56
                     | Conv-E1 → Conv-E2      | 3×3         | 56 → 56
Mapping              | Conv-E2 → Conv-S       | 1×1         | 56 → 12
                     | Conv-S → Conv-M1       | 3×3         | 12 → 12
                     | Conv-M1 → Conv-M2      | 3×3         | 12 → 12
                     | Conv-M2 → Conv-L       | 1×1         | 12 → 56
Image Reconstruction | Conv-L → Conv-R        | 3×3         | 56 → 1
                     | Conv-R & LQ → HQ       | Element-wise sum

The numerical results were evaluated in terms of the peak signal-to-noise ratio (PSNR) in decibels (dB) versus the ground truth.

A. TRAINING DOMAIN
In this section, we compare the performance across domains to determine which domain achieves the highest accuracy. Since CNNs learn optimal parameters through training on the ground truth and the low-quality images generated from it, choosing the pertinent domain is a very important factor in CNNs.

The thermal image datasets [40], [42]-[44] have relatively low quality and limited environments and patterns compared to color image datasets [22], [25], [45]-[47]. This implies that parameters learned from thermal images have limitations in quality enhancement. Thus, it may be reasonable to utilize a sufficient number of color images as a training dataset for thermal image enhancement, as an alternative to thermal images.

To validate this alternative, we evaluated the predicted results learned by each domain using a multi-modal dataset [48], [49] that had similar color quality and aligned thermal images. In this comparison, a grayscale image converted from a color image was used. Furthermore, to identify the possibility of higher performance in other domains, the FIR [50] and NIR [51]-[53] datasets were employed.

After analyzing the results trained on the grayscale and IR images, we investigated the optimal color-based single domain that could be transformed from the color image and was applicable to the thermal image.

1) Infrared vs. Grayscale
The Multimodal Stereo Dataset 2 (MSD2) [48], [49], Visible-FIR Day-Night Pedestrian Sequence Dataset (VFD) [50], and IVRL Dataset (IVRLD) [51]-[53] were used for the comparisons of thermal, FIR, and NIR, respectively. Each dataset consists of color and aligned IR images, and we converted the color images to grayscale according to the following equation:

Gray = 0.299 R + 0.587 G + 0.114 B,    (2)

where R, G, and B are the values of the red, green, and blue channels in a color image, respectively, scaled between 0 and 1.

Instead of using all regions of the image as training data, we cropped regions of interest containing various patterns for the training, as shown in Fig. 3. The samples have corresponding positions and the same size in the gray and IR images, and the numbers of sample images and sub-images were 112 and 4,096 in MSD2, 81 and 1,496 in VFD, and 56 and 3,072 in IVRLD, respectively.

It can be seen that the quality of the output image improved with respect to the quality of the input image in all domains. In detail, in the results of the MSD2 and VFD, the network that learned using the gray domain had better image enhancement performance than the IR (thermal- and far-) domains, as shown in Table 3(a) and (b), respectively. Surprisingly, in the IVRLD, we observed that the output accuracy of the network that learned using the NIR domain was better than that of the gray model, unlike the previous cases, as shown in Table 3(c).

TABLE 3. Performance table (PSNR, dB) for the infrared vs. gray analysis.

(a) Thermal vs. gray — Training Dataset: MSD2 [48], [49]
Trained by | Epoch 10 | Epoch 50 | Epoch 100 | Epoch 150 | Epoch 200
Thermal    | 41.611   | 42.301   | 42.336    | 42.340    | 42.340
Gray       | 41.905   | 42.395   | 42.470    | 42.475    | 42.475
Difference | 0.294    | 0.094    | 0.134     | 0.135     | 0.135

(b) FIR vs. gray — Training Dataset: VFD [50]
FIR        | 41.481   | 42.116   | 42.241    | 42.251    | 42.252
Gray       | 41.697   | 42.453   | 42.487    | 42.491    | 42.491
Difference | 0.216    | 0.337    | 0.246     | 0.240     | 0.239

(c) NIR vs. gray — Training Dataset: IVRLD [51]-[53]
NIR        | 42.197   | 42.367   | 42.402    | 42.400    | 42.400
Gray       | 42.045   | 42.049   | 42.167    | 42.162    | 42.165
Difference | 0.152    | 0.318    | 0.235     | 0.238     | 0.235

2) Domains Converted from RGB Space
In order to find the domain that maximizes the learning effect when using a color image dataset, we compared the domains that can be converted from RGB space to the range of the thermal image without loss of information. Accordingly, to train the network, we converted the RGB space into the gray, lightness (L in HSL), intensity (I in HSI), and brightness (V in HSV) domains. The gray domain was converted by using (2), and the other domains were converted as per the following equations:

Lightness = (1/2) (max(R, G, B) + min(R, G, B)),    (3)
Intensity = (1/3) (R + G + B),    (4)
Brightness = max(R, G, B).    (5)

In the experiments, the 91-image dataset [25] was used, and examples of the converted domains are shown in Fig. 4. A total of 17,152 sub-images were used to train the network.

FIGURE 4. Domain examples using the 91-images: (a) color; (b) gray; (c) lightness; (d) intensity; (e) brightness.

As shown in Fig. 5 and Table 4, the network trained on the gray domain outperformed those trained on the lightness and intensity domains, but the network based on the brightness domain provided better performance than the gray-based network.
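For reference, a small NumPy sketch of the conversions in Eqs. (2)-(5); the function name is our own.

```python
import numpy as np

def to_domains(rgb):
    """Convert an RGB image (H x W x 3 float array scaled to [0, 1]) into the
    gray, lightness, intensity, and brightness domains of Eqs. (2)-(5)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b                 # Eq. (2)
    lightness = 0.5 * (rgb.max(axis=-1) + rgb.min(axis=-1))  # Eq. (3), L in HSL
    intensity = rgb.mean(axis=-1)                            # Eq. (4), I in HSI
    brightness = rgb.max(axis=-1)                            # Eq. (5), V in HSV
    return gray, lightness, intensity, brightness
```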
FIGURE 5. Performance graph of the PSNR (dB) over training epochs for the gray, lightness, intensity, and brightness domain analysis.

FIGURE 6. Performance graph of the PSNR (dB) over training epochs for the comparison of the near-infrared, gray, and brightness domains.

TABLE 4. Performance table of the PSNR (dB) for the networks trained on the gray, lightness, intensity, and brightness domains, respectively. Training Dataset: 91-images [25].

Domain     | Epoch 10 | Epoch 50 | Epoch 100 | Epoch 150 | Epoch 200
Gray       | 42.527   | 42.834   | 42.873    | 42.881    | 42.889
Lightness  | 42.431   | 42.775   | 42.778    | 42.820    | 42.818
Intensity  | 42.352   | 42.798   | 42.804    | 42.809    | 42.818
Brightness | 42.684   | 42.870   | 42.917    | 42.927    | 42.926

We now study the performance of the networks trained on the gray and brightness domains in detail. First, the brightness-based network was more accurate than the gray-based network and showed performance similar to the NIR-based network, as shown in Fig. 6, in which all models were trained using the IVRLD related to Table 3(c).

Second, we compared the performance by applying data augmentation to the two domains, i.e., the brightness and gray domains. We augmented the data in three ways: rotation, scaling, and flipping. Each image was rotated by 90°, 180°, and 270°, downscaled by factors of 0.5 to 1.0 in steps of 0.1, and flipped vertically. A total of 463,744 sub-images were used for the training, and the learning rate of all layers was initialized to 1e−3 (set to 1e−4 for the last layer) and decreased by a factor of 10 every 20 epochs until 80 epochs had passed. In Table 5, it can be seen that the performance of the networks trained on the brightness domain was superior to that of the gray domain, and at the same time, the data augmentation improved the network performance.
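A minimal sketch of this augmentation scheme, assuming OpenCV and NumPy; the helper name is ours, and whether the paper combines all three transformations exhaustively is not stated, so this sketch simply enumerates every combination.

```python
import cv2
import numpy as np

def augment(img):
    """Yield augmented variants: rotations of 0/90/180/270 degrees,
    downscaling by factors 0.5-1.0 in steps of 0.1, and vertical flips."""
    for k in range(4):                                    # 0, 90, 180, 270 degrees
        rotated = np.ascontiguousarray(np.rot90(img, k))
        for s in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):          # scale factors
            scaled = cv2.resize(rotated, None, fx=s, fy=s,
                                interpolation=cv2.INTER_CUBIC)
            yield scaled
            yield np.flipud(scaled)                       # vertical flip
```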
TABLE 5. Performance table of the PSNR (dB) for the analysis of the gray vs. brightness domains. Training Dataset: augmented 91-images [25].

Domain     | Epoch 10 | Epoch 20 | Epoch 40 | Epoch 60 | Epoch 80
Gray       | 43.056   | 43.021   | 43.154   | 43.188   | 43.203
Brightness | 43.167   | 43.196   | 43.253   | 43.245   | 43.249
Difference | 0.111    | 0.175    | 0.098    | 0.057    | 0.046

To summarize, although a one-to-one comparison between the gray-based and IR-based networks shows that the NIR-based model provided slightly higher performance than the gray-based model (see Table 3(c)), we observed that the brightness domain converted from the 91-image dataset was the best (see Table 4). Therefore, we concluded that the best choice was to convert the color images of a public database into brightness images, and then to use this domain when training the network parameters for thermal image enhancement.

B. NETWORK ARCHITECTURE
To determine the optimal structure for the proposed network, we studied the residual learning and the variables of the feature extraction and mapping blocks using the 91-images, converted to the brightness domain in consideration of the above experiments, as the training dataset.

1) Residual Learning
The network architecture and experimental conditions were the same as those described earlier. Fig. 7 shows the convergence curves in terms of the average PSNR on the test dataset. The performance of the non-residual network fluctuated significantly. On the other hand, the residual network was stable, converged rapidly, and outperformed the non-residual network. Therefore, it can be seen that even though the network was trained on a different domain, it can still be used to enhance the thermal image.

FIGURE 7. Performance graph of the PSNR (dB) for the residual and non-residual networks.
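In terms of the sketch from Section II, the non-residual baseline in this comparison amounts to removing the skip connection; as a one-line illustration (our own code, not the released implementation):

```python
class NonResidualTIECNN(TIECNN):
    """Ablation baseline: predict the HQ image directly, without the
    global residual (skip) connection."""
    def forward(self, x):
        return self.body(x)
```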


TABLE 6. Comparison of different filter sizes and different numbers of layers in the feature extraction block (average PSNR, dB).

Feature Extraction | Conv(3) | Conv(5) | Conv(7) | Conv(9) | 2Conv(3) | 2Conv(5) | Conv(3-5) | Conv(5-3) | 3Conv(3) | 4Conv(3)
PSNR               | 42.60   | 42.70   | 42.64   | 42.73   | 42.93    | 42.99    | 42.95     | 42.91     | 43.01    | 43.01

TABLE 7. Comparison of the performance (PSNR, dB) and the number of parameters (in parentheses) for different variables: the filter dimensions df (other layers) and dm (mapping layers), and the mapping depth nm.

df | dm | nm = 2         | nm = 3         | nm = 4         | nm = 5         | nm = 6
48 | 8  | 42.95 (44256)  | 42.99 (44832)  | 43.04 (45408)  | 43.07 (45984)  | 43.03 (46560)
48 | 12 | 42.99 (46080)  | 43.07 (47376)  | 43.06 (48672)  | 43.08 (49968)  | 43.07 (51264)
48 | 16 | 43.04 (48480)  | 43.04 (50784)  | 43.08 (53088)  | 43.14 (55392)  | 43.11 (57696)
56 | 8  | 42.95 (59504)  | 43.05 (60080)  | 43.00 (60656)  | 43.05 (61232)  | 43.07 (61808)
56 | 12 | 43.01 (61392)  | 43.00 (62688)  | 43.02 (63984)  | 43.05 (65280)  | 43.09 (66576)
56 | 16 | 43.04 (63856)  | 43.06 (66160)  | 43.02 (68464)  | 43.09 (70768)  | 43.08 (73072)

2) Feature Extraction Block
The feature extraction block of the network consists of multiple convolutional layers. Considering the number of parameters, the convolutional kernel size and the number of layers of this block were compared experimentally with different combinations of nf = 1, 2, 3, 4 and sf = 3, 5, 7, 9, as shown in Table 6. Depending on these two variables, the receptive field and non-linearity can be increased, but the number of parameters can also increase, and hence these variables should be set carefully.

In Table 6, the experimental results show that the performance depends on the configuration. Three and four layers with 3×3 convolutions provide higher performance in terms of the PSNR compared to the others. Thus, we set nf = 3 and sf = 3, which achieves a comparable result with a smaller number of parameters than nf = 4 and sf = 3.

3) Mapping Block
As in the above experiments, we compared the mapping and remaining blocks of the proposed network with different depths of the mapping block nm = 2, 3, 4, 5, 6, filter dimensions of the mapping layers dm = 8, 12, 16, and filter dimensions of the other layers df = 48, 56. To reflect the above experimental results, we set nf = 3 and sf = 3 in the feature extraction block.

An analysis of the results in Table 7 shows that the number of parameters increased significantly with increasing df, but the performance improvement was insufficient. However, as nm and dm became larger, the parameters increased only slightly, and we can observe that the performance generally improved. Based on these results, the optimal structure was determined to be nm = 5, dm = 16, and df = 48, considering both the performance and the number of parameters.

To summarize, the parameters of the proposed network were set to nf = 3, sf = 3, df = 48, nm = 5, dm = 16 for thermal image enhancement, as described in Table 8.
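The parameter counts in Table 7 can be reproduced by counting convolution weights only (biases and PReLU slopes excluded); a small sketch, with function names of our own choosing:

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution from c_in to c_out channels."""
    return c_in * c_out * k * k

def tiecnn_weights(nf=3, sf=3, df=48, nm=5, dm=16):
    """Total convolution weights for the configuration variables of Table 7."""
    total = conv_weights(1, df, sf) + (nf - 1) * conv_weights(df, df, sf)  # extraction
    total += conv_weights(df, dm, 1)          # shrinking
    total += nm * conv_weights(dm, dm, 3)     # non-linear mapping
    total += conv_weights(dm, df, 1)          # expansion
    total += conv_weights(df, 1, 3)           # reconstruction
    return total

print(tiecnn_weights())                    # 55392, the (df=48, dm=16, nm=5) entry
print(tiecnn_weights(df=56, dm=12, nm=2))  # 61392, another entry in Table 7
```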

TABLE 8. Detailed configuration of the proposed network.

Block                | Layer (Input → Output) | Kernel size | Dimensions
---------------------|------------------------|-------------|----------------
Feature Extraction   | LQ → Conv-E1           | 3×3         | 1 → 48
                     | Conv-E1 → Conv-E2      | 3×3         | 48 → 48
                     | Conv-E2 → Conv-E3      | 3×3         | 48 → 48
Mapping              | Conv-E3 → Conv-S       | 1×1         | 48 → 16
                     | Conv-S → Conv-M1       | 3×3         | 16 → 16
                     | Conv-M1 → Conv-M2      | 3×3         | 16 → 16
                     | Conv-M2 → Conv-M3      | 3×3         | 16 → 16
                     | Conv-M3 → Conv-M4      | 3×3         | 16 → 16
                     | Conv-M4 → Conv-M5      | 3×3         | 16 → 16
                     | Conv-M5 → Conv-L       | 1×1         | 16 → 48
Image Reconstruction | Conv-L → Conv-R        | 3×3         | 48 → 1
                     | Conv-R & LQ → HQ       | Element-wise sum

C. COMPARISONS WITH THE STATE-OF-THE-ART
To validate our proposed network, we compared it with state-of-the-art algorithms: the super-resolution convolutional neural network (SRCNN-Ex) [20] and thermal image enhancement using a convolutional neural network (TEN) [23]. Although SRCNN-Ex was originally proposed for super-resolution, a low-quality image with the same size as the output was used as the input; in other words, it can be utilized for image enhancement.

Our implementation of SRCNN-Ex is based on its released source code, and we implemented TEN ourselves. We trained a specific network for each scaling factor {2, 3, 4} according to [20], [23], [54]. To ensure a fair comparison of the enhancement quality of a thermal image, all methods were implemented in Caffe [55] using a single NVIDIA 1080 Ti graphics card and trained using the augmented 91-image dataset.

FIGURE 8. Visual comparison on TIRSet26 (LTIR [38], saturated-195) with a scale factor of three. Panels (PSNR / SSIM / IFC): Original; Bicubic (41.81 / 0.967 / 3.018); SRCNN-Ex [20] (43.77 / 0.973 / 3.651); TEN [23] (43.93 / 0.973 / 3.584); Ours (44.11 / 0.974 / 3.667).

The strategy of the data augmentation was identical to that described above. During training, 48×48 sub-images and batches of size 128 were used. We trained SRCNN-Ex optimized by two methods, Stochastic Gradient Descent (SGD) [56] and Adam, until 1000 epochs had passed, and TEN optimized by Adam until 100 epochs had passed. The training of SRCNN-Ex, with 57,184 parameters, and TEN, with 63,840 parameters, took approximately 12 and 1.5 h, respectively.

For training the proposed network, the learning rate was initialized to 1e−4 for the last layer and 1e−3 for the other layers, and decreased by a factor of 10 every 20 epochs, with a momentum parameter of 0.9 and a weight decay of 5.0 × 10⁻⁴. Training with 50 epochs was sufficient and took approximately one hour.

To objectively evaluate the thermal image enhancement performance, seven image quality metrics were used: the PSNR, SSIM [28], and IFC [29] for full-reference quality assessment using the TIRSet26; and the NIQE [30], NPBM [31], CV [32], and UID [32] for no-reference quality assessment using the KAIST Dataset [42].
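Of these metrics, the PSNR is simple enough to state inline; a sketch (our own helper, assuming 8-bit images):

```python
import numpy as np

def psnr(reference, enhanced, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image and an
    enhanced result, the full-reference metric used throughout this section."""
    mse = np.mean((reference.astype(np.float64) - enhanced.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```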

TABLE 9. Quantitative evaluation of the full-reference quality assessment on the TIRSet26 dataset: average PSNR, SSIM, and IFC for scale factors 2, 3, and 4, with higher values indicating better performance.

Metric | Scale | Bicubic | SRCNN-Ex [20] (gray, SGD) | SRCNN-Ex [20] (gray, Adam) | TEN [23] (gray) | SRCNN-Ex [20] (brightness) | TEN [23] (brightness) | Ours (Adam)
PSNR   | 2     | 40.94   | 42.96 | 42.99 | 43.07 | 43.04 | 43.16 | 43.41
       | 3     | 37.58   | 39.28 | 39.34 | 39.42 | 39.37 | 39.49 | 39.72
       | 4     | 35.46   | 36.86 | 37.01 | 37.09 | 37.02 | 37.12 | 37.36
SSIM   | 2     | 0.965   | 0.973 | 0.973 | 0.973 | 0.973 | 0.973 | 0.974
       | 3     | 0.936   | 0.947 | 0.948 | 0.948 | 0.948 | 0.948 | 0.950
       | 4     | 0.909   | 0.921 | 0.922 | 0.923 | 0.922 | 0.923 | 0.926
IFC    | 2     | 5.869   | 7.179 | 7.335 | 7.273 | 7.277 | 7.237 | 7.435
       | 3     | 3.400   | 3.899 | 4.037 | 4.046 | 3.965 | 4.021 | 4.122
       | 4     | 2.574   | 2.502 | 2.566 | 2.596 | 2.573 | 2.257 | 2.712


FIGURE 9. Visual comparison on the KAIST Dataset [42] with a scale factor of four (NIQE / NPBM / CV / UID). (top) test-set11-v000-I01537: Original (7.893 / 0.353 / 4.448 / 40.71), SRCNN-Ex [20] (7.394 / 0.308 / 5.601 / 42.18), TEN [23] (7.065 / 0.317 / 4.830 / 42.33), Ours (6.207 / 0.295 / 5.875 / 42.62); (middle) train-set05-v000-I01619: Original (6.907 / 0.494 / 4.891 / 49.39), SRCNN-Ex [20] (6.380 / 0.407 / 6.836 / 53.38), TEN [23] (6.233 / 0.413 / 6.056 / 53.88), Ours (6.232 / 0.390 / 6.977 / 53.93); (bottom) train-set02-v000-I00544: Original (7.567 / 0.470 / 4.962 / 49.10), SRCNN-Ex [20] (6.757 / 0.385 / 6.239 / 50.24), TEN [23] (6.754 / 0.398 / 5.700 / 50.19), Ours (6.357 / 0.375 / 6.606 / 50.36).

Table 9 presents the quantitative results of our full-reference quality assessment using the TIRSet26. Our method outperforms the other methods for all metrics and scale factors. In addition, to verify the effect of the brightness domain, we also conducted an experiment using the gray and brightness domains on SRCNN-Ex and TEN, respectively. The experimental results of the gray- and brightness-based networks show that the performance in terms of the PSNR and SSIM is improved by only changing the training domain from gray to brightness in the compared algorithms. Thus, training based on the brightness domain is an effective means of improving the performance of CNNs for thermal image enhancement.

The following describes the performance evaluation by no-reference quality assessment with a scale factor of four on the KAIST dataset. This dataset provides a total of 95,324 low-quality thermal images, and is composed of 50,184 training and 45,140 test images. We evaluated each of these two sets. The results obtained using the proposed network are compared to those for SRCNN-Ex and TEN, which were trained using the brightness domain and optimized by the Adam algorithm. Table 10 shows that the proposed method outperforms the compared methods for all quality metrics.

Qualitative comparisons are shown in Figs. 8 and 9, and it can be seen that the results of the proposed method are perceptually better. Thus, the proposed method provides the best performance among all compared methods in both the quantitative and qualitative evaluations.

TABLE 10. Quantitative evaluation of the no-reference quality assessment on the training and test sets of the KAIST Dataset [42]: average NIQE, NPBM, CV, and UID with a scale factor of four. Lower values of NIQE and NPBM, and higher values of CV and UID, indicate better performance.

               | KAIST [42]-Train Dataset          | KAIST [42]-Test Dataset
Algorithm      | NIQE  | NPBM  | CV    | UID       | NIQE  | NPBM  | CV    | UID
Original       | 7.604 | 0.369 | 4.591 | 40.77     | 7.442 | 0.370 | 5.070 | 44.41
SRCNN-Ex [20]  | 7.211 | 0.320 | 6.044 | 42.47     | 7.005 | 0.323 | 6.608 | 46.40
TEN [23]       | 7.172 | 0.328 | 5.392 | 42.53     | 7.023 | 0.331 | 5.873 | 46.47
Ours           | 6.356 | 0.307 | 6.318 | 42.65     | 6.060 | 0.310 | 6.933 | 46.66


IV. CONCLUSION
The primary objective of this study was to enhance the quality of thermal images. To achieve this, we explored various domains, including RGB-based and multiple infrared images, and conducted a number of experiments to determine the most relevant training domain and the structure of the proposed network. Through experimental analyses, we determined that training the network based on the brightness domain, which is a transformation of an RGB dataset containing various patterns, was the most effective. The brightness domain was then applied to thermal image enhancement through residual learning. In particular, the use of the brightness domain was an important factor that improved the performance not only of our network but also of the previous method. To validate our proposed method, a test dataset of high-quality thermal images was carefully selected from public datasets while considering various situations, environments, and sensors. The results of comparative experiments demonstrated that our network outperformed all other approaches in terms of both quantitative and qualitative evaluations. We believe that our approach shows good potential in thermal image-based applications. In our future work, we will include a single network to handle multi-scale and multi-spectral images.

REFERENCES
[1] X. Zhao, Z. He, S. Zhang, and D. Liang, "Robust pedestrian detection in thermal infrared imagery using a shape distribution histogram feature and modified sparse representation classification," Pattern Recognition, vol. 48, no. 6, pp. 1947-1960, 2015.
[2] J. Baek, S. Hong, J. Kim, and E. Kim, "Efficient pedestrian detection at nighttime using a thermal camera," Sensors, vol. 17, no. 8, p. 1850, 2017.
[3] W. K. Wong, H. L. Lim, C. K. Loo, and W. S. Lim, "Home alone faint detection surveillance system using thermal camera," in Computer Research and Development, 2010 Second International Conference on. IEEE, 2010, pp. 747-751.
[4] A. C. Goldberg, T. Fischer, and Z. I. Derzko, "Application of dual-band infrared focal plane arrays to tactical and strategic military problems," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 4820, 2003, pp. 500-514.
[5] B. C. Arrue, A. Ollero, and J. M. De Dios, "An intelligent system for false alarm reduction in infrared forest-fire detection," IEEE Intelligent Systems and Their Applications, vol. 15, no. 3, pp. 64-73, 2000.
[6] P. V. K. Borges and S. Vidas, "Practical infrared visual odometry," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 8, pp. 2205-2213, 2016.
[7] A. Prata and C. Bernardo, "Retrieval of volcanic ash particle size, mass and optical depth from a ground-based thermal infrared camera," Journal of Volcanology and Geothermal Research, vol. 186, no. 1, pp. 91-107, 2009.
[8] B.-S. Yang, F. Gu, A. Ball et al., "Thermal image enhancement using bi-dimensional empirical mode decomposition in combination with relevance vector machine for rotating machinery fault diagnosis," Mechanical Systems and Signal Processing, vol. 38, no. 2, pp. 601-614, 2013.
[9] Q. Chen, L.-F. Bai, and B.-M. Zhang, "Histogram double equalization in infrared image," Journal of Infrared and Millimeter Waves, vol. 22, no. 6, pp. 428-430, 2003.
[10] H. Ibrahim and N. S. P. Kong, "Brightness preserving dynamic histogram equalization for image contrast enhancement," IEEE Transactions on Consumer Electronics, vol. 53, no. 4, 2007.
[11] M. Shao, G. Liu, X. Liu, and D. Zhu, "A new approach for infrared image contrast enhancement," in Proc. of SPIE, vol. 6150, no. 1, 2006, p. 615009-1.
[12] C. Ni, Q. Li, and L. Z. Xia, "A novel method of infrared image denoising and edge enhancement," Signal Processing, vol. 88, no. 6, pp. 1606-1614, 2008.
[13] X. Bai, F. Zhou, and B. Xue, "Infrared image enhancement through contrast enhancement by using multiscale new top-hat transform," Infrared Physics & Technology, vol. 54, no. 2, pp. 61-69, 2011.
[14] W. Zhao, Z. Xu, J. Zhao, F. Zhao, and X. Han, "Variational infrared image enhancement based on adaptive dual-threshold gradient field equalization," Infrared Physics & Technology, vol. 66, pp. 152-159, 2014.
[15] L. T. Yuan, S. K. Swee, and T. C. Ping, "Infrared image enhancement using adaptive trilateral contrast enhancement," Pattern Recognition Letters, vol. 54, pp. 103-108, 2015.
[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Neural Information Processing Systems (NIPS), 2015.
[17] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
[18] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[19] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, 2016.
[20] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295-307, 2016.
[21] J. Kim, J. Kwon Lee, and K. Mu Lee, "Accurate image super-resolution using very deep convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646-1654.
[22] C. Dong, C. C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in European Conference on Computer Vision. Springer, 2016, pp. 391-407.
[23] Y. Choi, N. Kim, S. Hwang, and I. S. Kweon, "Thermal image enhancement using convolutional neural network," in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 223-230.
[24] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[25] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861-2873, 2010.
[26] B. Yue, S. Wang, X. Liang, L. Jiao, and C. Xu, "Joint prior learning for visual sensor network noisy image super-resolution," Sensors, vol. 16, no. 3, p. 288, 2016.
[27] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, p. 62, 2016.
[28] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
[29] H. R. Sheikh, A. C. Bovik, and G. De Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics," IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117-2128, 2005.
[30] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a 'completely blind' image quality analyzer," IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209-212, 2013.
[31] F. Crete, T. Dolmiere, P. Ladret, and M. Nicolas, "The blur effect: perception and estimation with a new no-reference perceptual blur metric," in Human Vision and Electronic Imaging, vol. 12, no. 6492, 2007, p. 64920.
[32] H. Yao, M.-Y. Huseh, G. Yao, and Y. Liu, "Image evaluation factors," Image Analysis and Recognition, pp. 255-262, 2005.
[33] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision. Springer, 2014, pp. 818-833.
[34] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.
[35] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.

12 VOLUME 5, 2017

2169-3536 (c) 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2017.2769687, IEEE Access

K. Lee et al.: Brightness-based Convolutional Neural Network for Thermal Image Enhancement


Kyungjae Lee received the M.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2013, and the B.S. degree in Electronics and Radio Engineering from Kyunghee University, Korea, in 2011. He is currently pursuing the Ph.D. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include image enhancement and advanced driver-assistance systems based on mono- and stereo-vision.

Junhyep Lee received the B.S. degree in Electronics and Avionics Engineering from Korea Aerospace University, Korea, in 2015. He is currently working toward the Ph.D. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include 3D surface normal and depth estimation, and odometry estimation for visual SLAM.

Joosung Lee received the B.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2016. He is currently working toward the M.S. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include visual odometry and object detection using deep learning.

Sangwon Hwang received the B.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2016. He is currently working toward the Ph.D. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include odometry estimation and SLAM based on multi-sensor data.

Sangyoun Lee (M'04) received the Ph.D. degree in Electrical and Computer Engineering from the Georgia Institute of Technology, Atlanta, GA, in 1999, and the B.S. and M.S. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 1987 and 1989, respectively. He is currently a Professor and the Head of Electrical and Electronic Engineering at the graduate school, and Head of the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include all aspects of computer vision, with a special focus on pattern recognition for face detection and recognition, advanced driver-assistance systems, and video codecs.
