Brightness-based Convolutional Neural Network for Thermal Image Enhancement

Kyungjae Lee, Junhyep Lee, Joosung Lee, Sangwon Hwang, and Sangyoun Lee

Digital Object Identifier 10.1109/ACCESS.2017.2769687
ABSTRACT In this study, we propose a convolutional neural network for thermal image enhancement that combines the brightness domain with a residual-learning technique, improving both enhancement performance and convergence speed. Typically, a network is trained on the same domain as the target image; however, we evaluated several domains to determine the most suitable one for the network. In our analyses, we first compared the performance of networks trained on corresponding regions of color-based and aligned infrared-based images, covering the thermal, far-infrared, and near-infrared spectra. Then, four RGB-derived domains, namely gray, lightness, intensity, and brightness, were evaluated. Finally, the proposed network architecture was determined by considering the residual and brightness domains. The results of the analyses indicated that the brightness domain was the best training domain for enhancing thermal images. The experimental results confirm that the proposed network, which can be trained in approximately one hour, outperforms conventional learning-based approaches to thermal image enhancement in terms of several image quality metrics and a qualitative evaluation. Furthermore, the results demonstrate that the brightness domain is effective as a training domain and can be used to increase the performance of existing networks.
INDEX TERMS Thermal infrared image, image enhancement, convolutional neural networks.
stationary wavelets. Ni et al. [12] proposed an algorithm to enhance and preserve the edges of infrared images based on wavelet diffusion while reducing the noise. Bai et al. [13] suggested a method of contrast enhancement by adopting a multiscale new top-hat transform. A bi-dimensional empirical mode decomposition method was proposed in [8] that first decomposed the thermal image into several intrinsic mode functions; these functions were then expanded and fused with the residue at each decomposition level. A variational infrared enhancement technique [14] was introduced to enhance edge details and to prevent over-enhancement using a gradient field equalization technique with adaptive dual thresholds obtained by histogram equalization. Yuan et al. [15] presented a multifaceted approach that enhances both the image contrast and subtle image details by adaptively manipulating the contrast, sharpness, and intensity of the image. It should be noted that these approaches were heuristically designed to adjust the thermal information; they therefore cannot account for the variety of thermal images, which restricts their applications.

More recently, convolutional neural network (CNN) based methods have achieved record-breaking performance compared to previous hand-crafted feature based methods in various vision tasks, such as object detection [16], [17], image recognition [18], [19], and image super-resolution [20]-[22]. One of the first CNN based approaches for enhancing thermal images was suggested by Choi et al. [23], who designed a relatively shallow CNN inspired by the proposal in [20]. CNNs have been successful not only in enhancing thermal image quality, but also in yielding verified performance improvements in a variety of applications, including pedestrian detection, visual odometry, and image registration, on the basis of enhanced thermal images.

In this paper, we propose a residual learning [21] based thermal image enhancement convolutional neural network (TIECNN)¹ motivated by [22]. Since the input low-quality (LQ) and output high-quality (HQ) images are highly correlated, it is sufficient to train only the high-frequency components using residual learning; in addition, this method alleviates the vanishing/exploding gradients problem [24]. In a supervised learning based CNN, the choice of training images significantly impacts the performance of the network. For super-resolution of color images, Dong et al. [20] explored the performance on different channels and experimentally showed differences in network accuracy depending on the training domain, demonstrating that a network based on the Y (luma) channel alone achieved credible results. However, due to the range difference between luma and thermal images, networks intended for thermal images cannot be trained on the luma channel. Choi et al. [23] compared the performance of networks based on gray and MWIR images using different datasets. Although they found that the gray-based network provided better performance than the MWIR-based network, we believe that their comparison is unfair because they employed datasets that contained completely different scenes and patterns. For example, the dataset [25] they used for their gray-based network has been widely used in various CNN methods [20]-[23], [26], whereas the LWIR dataset [27] has not. This difference can cause biased results due to the learning of poor parameters during training.

Thus, to ensure a fair comparison, we investigated the results of training each network on the corresponding regions of color and aligned IR images in the same dataset. In addition, we evaluated the performance under the same experimental conditions for four domains that were converted from a color image and then applied to a thermal image. We empirically verified that a brightness domain based network achieves better performance than networks based on the other domains, and that this also applies to existing networks.

In summary, in the proposed network, an HQ thermal image is generated by pixel-wise summing of the LQ thermal image as the input and the brightness-based residuals as the output. The experimental results show that our proposed network outperforms conventional learning-based approaches, as measured by various image quality metrics: 1) full-reference quality assessment: the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [28], and information fidelity criterion (IFC) [29]; and 2) no-reference quality assessment: the naturalness image quality evaluator (NIQE) [30], no-reference perceptual blur metric (NPBM) [31], contour volume (CV) [32], and uniform intensity-distribution (UID) [32].

The contributions of this paper are as follows. First, to the best of our knowledge, our approach is the first attempt to design a residual learning CNN trained using the brightness domain for enhancing thermal images, which improves the speed of convergence and the performance of enhancement. Second, we explored several public thermal datasets to collect high-quality thermal images with which to evaluate the performance of our network for general purposes, while considering various environments and situations. Third, we compared the performance of various domains as training data and validated that the brightness domain based network achieved the best accuracy by way of two experimental studies, namely: 1) evaluating networks trained on the corresponding regions of a grayscale and aligned IR (thermal, far-, or near-infrared) image pair in the same dataset; and 2) conducting experiments with the four training domains that can be converted from an RGB image to the range of the thermal image, i.e., the gray, lightness, intensity, and brightness domains. Finally, we present a comparative study with state-of-the-art methods, based on a number of metrics.

The remainder of this paper is organized as follows. Section II describes the architecture of the proposed network. The experimental results and discussion are presented in Section III. Finally, the conclusions from this research are stated in Section IV.

¹ The implementation is available at https://sites.google.com/view/kjaelee/tiecnn.
FIGURE 1. Structure of the proposed network. The network is composed of multiple layers for feature extraction and mapping, and a single layer for image reconstruction. The network is trained on the brightness domain and predicts a residual image; a high-quality thermal image is generated by summing the input low-quality thermal image and the output residuals.
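To make the structure in Fig. 1 concrete, the following is a minimal PyTorch sketch of a network with this shape. The layer names and channel sizes follow Table 2; the framework choice, the zero padding that preserves spatial size, and the module grouping are our assumptions, and this is not the authors' released implementation (linked in the footnote above).

```python
import torch.nn as nn

class TIECNN(nn.Module):
    """Sketch of Fig. 1 / Table 2: feature extraction, mapping, and a
    single reconstruction layer that predicts a residual image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 56, 3, padding=1),   # Conv-E1
            nn.PReLU(),
            nn.Conv2d(56, 56, 3, padding=1),  # Conv-E2
            nn.PReLU(),
            nn.Conv2d(56, 12, 1),             # Conv-S (1x1 shrinking)
            nn.PReLU(),
        )
        self.mapping = nn.Sequential(
            nn.Conv2d(12, 12, 3, padding=1),  # Conv-M1
            nn.PReLU(),
            nn.Conv2d(12, 12, 3, padding=1),  # Conv-M2
            nn.PReLU(),
            nn.Conv2d(12, 56, 1),             # Conv-L (1x1 expanding)
            nn.PReLU(),
        )
        self.reconstruction = nn.Conv2d(56, 1, 3, padding=1)  # Conv-R, no activation

    def forward(self, lq):
        residual = self.reconstruction(self.mapping(self.features(lq)))
        return lq + residual  # element-wise sum: Conv-R & LQ -> HQ
```

At training time the input and target are brightness images; at implementation time the same network is applied to low-quality thermal images, whose range matches the brightness domain.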
between its two inputs, as an objective function:

L(Ŷ, Y; Θ) = (1/2N) Σ_{i=1}^{N} ||Ŷ_i − Y_i||²_2
           = (1/2N) Σ_{i=1}^{N} ||(X_i + R̂_i) − (X_i + R_i)||²_2
           = (1/2N) Σ_{i=1}^{N} ||R̂_i − R_i||²_2,   (1)

where Θ is the set of parameters learned using N training samples, and Ŷ_i and R̂_i are the predicted HQ and predicted residual images, respectively. The LQ image X is generated by downsampling the ground-truth HQ image and then upsampling it back to the original size by a scale factor using the bicubic algorithm.

The loss is minimized using a gradient-based optimization method known as adaptive moment estimation (Adam) [37]. We initialized the weights of the convolutional filters activated by PReLU using the method in [36], and the weights of the last layer, which has no activation function, were initialized by randomly drawing from a Gaussian distribution with zero mean and standard deviation 0.001.

III. EXPERIMENTS
To learn the network structure for enhancing a thermal image, it is necessary to use a large number of high-quality thermal images that have a variety of patterns as a ground truth. In this section, we determine the optimum domain to use for training and experimentally verify that a brightness-based network provides the best performance. Subsequently, we analyze the optimal design of the network using this domain to maximize the performance of our network. We then compare our TIECNN to state-of-the-art algorithms based on quantitative and qualitative evaluations.

We carefully collected a total of 26 high-quality thermal images from various datasets to create a test dataset for thermal image enhancement that considers various situations, environments, and applications. This test set (TIRSet26) was collected from the LTIR Dataset version 1.0 (LTIR) [38], Trimodal Dataset (TMD) [39], Thermal Infrared Dataset (TID) [40], and Morris Dataset (Morris) [41]. We removed the top 24 rows (black regions) of the TID [40] and the bottom 2 rows (white regions) of the Morris [41] in our experiments. Detailed information regarding the experiments, such as the sensor and image resolution, is presented in Table 1, and Fig. 2 shows examples of the test set.

For an objective comparison, we conducted all experiments for the analysis of the training domain and network architecture using the TIRSet26 with a scale factor of two and an average PSNR of 40.936. Unless specified otherwise, the experimental conditions were fixed at n_f = 2, s_f = 3, d_f = 56, n_m = 2, d_m = 12 for our model, as listed in Table 2. The size of the sub-images was set to 32 × 32 with no overlap, the stride to 32, and the batch size to 128. The learning rate was initialized to 1e-3 for all layers (except the last layer) and decreased by a factor of 10 every 50 epochs until 200 epochs had passed, with momentum 0.9 and weight decay 1e-4. Since a smaller learning rate at the last layer is important for network convergence [20], the learning rate of the last layer was initialized to 1e-4, which is 0.1 times that of the other layers.
TABLE 2. Network configuration for analysis.

Configuration         | Input       | Output  | Kernel size
----------------------+-------------+---------+-----------------
Feature Extraction    | LQ          | Conv-E1 | 3 x 3 x 1 x 56
                      | Conv-E1     | Conv-E2 | 3 x 3 x 56 x 56
                      | Conv-E2     | Conv-S  | 1 x 1 x 56 x 12
Mapping               | Conv-S      | Conv-M1 | 3 x 3 x 12 x 12
                      | Conv-M1     | Conv-M2 | 3 x 3 x 12 x 12
                      | Conv-M2     | Conv-L  | 1 x 1 x 12 x 56
Image Reconstruction  | Conv-L      | Conv-R  | 3 x 3 x 56 x 1
                      | Conv-R & LQ | HQ      | Element-wise sum

The numerical results were evaluated in terms of the peak signal-to-noise ratio (PSNR) in decibels (dB) vs. the ground truth.
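For reference, the PSNR against the ground truth can be computed as follows (a minimal sketch assuming images scaled to [0, 1]):

```python
import numpy as np

def psnr(ground_truth, estimate, peak=1.0):
    # Peak signal-to-noise ratio in dB; `peak` is the maximum pixel value.
    mse = np.mean((ground_truth.astype(np.float64)
                   - estimate.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```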
A. TRAINING DOMAIN
In this section, we compare the performance across domains to determine which domain achieves the highest accuracy. Since CNNs learn their parameters through training on the ground truth and the low-quality images generated from it, choosing the pertinent domain is a very important factor.

The thermal image datasets [40], [42]-[44] have relatively low quality and limited environments and patterns compared to color image datasets [22], [25], [45]-[47]. This implies that parameters learned from thermal images are limited in their capacity for quality enhancement. Thus, it may be reasonable to utilize a sufficient number of color images as a training dataset for thermal image enhancement, as an alternative to thermal images.

To validate this alternative, we evaluated the predicted results learned by each domain using a multi-modal dataset [48], [49] that had color images of similar quality and aligned thermal images. In this comparison, a grayscale image converted from a color image was used. Furthermore, to identify the possibility of higher performance in other domains, the FIR [50] and NIR [51]-[53] datasets were employed.

After analyzing the results trained on the grayscale and IR images, we investigated the optimal color-based single domain that could be transformed from a color image and applied to the thermal image.

1) Infrared vs. Grayscale
The Multimodal Stereo Dataset 2 (MSD2) [48], [49], Visible-FIR Day-Night Pedestrian Sequence Dataset (VFD) [50], and IVRL Dataset (IVRLD) [51]-[53] were used for the comparisons of thermal, FIR, and NIR, respectively. Each dataset consists of color and aligned IR images, and we converted the color image to a grayscale image according to the following equation:

Gray = 0.299 R + 0.587 G + 0.114 B,   (2)

where R, G, and B are the values of the red, green, and blue channels in a color image, respectively, scaled between 0 and 1.

Instead of using all regions of the image as training data, we cropped the regions of interest to contain various patterns for the training, as shown in Fig. 3. The samples have corresponding positions and the same size in the gray and IR images, and the numbers of sample images and sub-images were 112 and 4096 in MSD2, 81 and 1496 in VFD, and 56 and 3072 in IVRLD, respectively.

It can be seen that the quality of the output image improved with respect to the quality of the input image in all domains. In detail, in the results of the MSD2 and VFD, the network trained on the gray domain had better image enhancement performance than the IR (thermal and far-infrared) domains, as shown in Table 3 (a-b), respectively. Surprisingly, in the IVRLD, we observed that the output accuracy of the network trained on the NIR domain was better than that of the gray model, unlike the previous cases, as shown in Table 3 (c).

TABLE 3. Performance table for infrared vs. gray analysis (PSNR in dB).

(a) Thermal vs. gray - Training Dataset: MSD2 [48], [49]
              Epoch
Trained by    10       50       100      150      200
Thermal       41.611   42.301   42.336   42.340   42.340
Gray          41.905   42.395   42.470   42.475   42.475
Difference    0.294    0.094    0.134    0.135    0.135

(b) FIR vs. gray - Training Dataset: VFD [50]
FIR           41.481   42.116   42.241   42.251   42.252
Gray          41.697   42.453   42.487   42.491   42.491
Difference    0.216    0.337    0.246    0.240    0.239

(c) NIR vs. gray - Training Dataset: IVRLD [51]-[53]
NIR           42.197   42.367   42.402   42.400   42.400
Gray          42.045   42.049   42.167   42.162   42.165
Difference    0.152    0.318    0.235    0.238    0.235

2) Domains Converted from RGB Space
In order to find the domain that maximizes the learning effect when using a color image dataset, we compared the domains that can be converted from RGB space to the range of the thermal image without loss of information. Accordingly, to train the network, we converted the RGB space into the gray, lightness (L in HSL), intensity (I in HSI), and brightness (V in HSV) domains. The gray domain was converted using (2), and the other domains were converted as per the following equations:

Lightness = (max(R, G, B) + min(R, G, B)) / 2,   (3)

Intensity = (R + G + B) / 3,   (4)

Brightness = max(R, G, B).   (5)
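For illustration, Eqs. (2)-(5) translate directly into code (a minimal NumPy sketch; the function name and the H x W x 3 float layout are our choices):

```python
import numpy as np

def rgb_to_training_domains(rgb):
    # `rgb` is an H x W x 3 float array with values scaled to [0, 1].
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b                 # Eq. (2)
    lightness = 0.5 * (rgb.max(axis=-1) + rgb.min(axis=-1))  # Eq. (3), L in HSL
    intensity = rgb.mean(axis=-1)                            # Eq. (4), I in HSI
    brightness = rgb.max(axis=-1)                            # Eq. (5), V in HSV
    return gray, lightness, intensity, brightness
```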
In the experiments, the 91-image dataset [25] was used, and examples of the converted domains are shown in Fig. 4. A total of 17,152 sub-images were used to train the network.

As shown in Fig. 5 and Table 4, the network trained on the gray domain outperformed those trained on the lightness and intensity domains, but the network based on the brightness domain provided better performance than the gray-based network.

FIGURE 5. Performance graph of the PSNR for the gray, lightness, intensity, and brightness domain analysis.

FIGURE 6. Performance graph of the PSNR for the comparison of the near-infrared, gray, and brightness domains.
TABLE 4. Performance table of the PSNR for the networks trained on the gray, lightness, intensity, and brightness domains, respectively.

Training Dataset - 91-images [25]
              Epoch
Domain        10       50       100      150      200
Gray          42.527   42.834   42.873   42.881   42.889
Lightness     42.431   42.775   42.778   42.820   42.818
Intensity     42.352   42.798   42.804   42.809   42.818
Brightness    42.684   42.870   42.917   42.927   42.926

We now study the performance of the networks trained on the gray and brightness domains in detail. First, the brightness-based network was more accurate than the gray-based network and showed performance similar to the NIR-based network, as shown in Fig. 6, in which all models were trained using the IVRLD related to Table 3 (c).

Second, we compared the performance when adopting data augmentation in the two domains, i.e., the brightness and gray domains. We augmented the data in three ways: rotation, scaling, and flipping. Each image was rotated by 90°, 180°, and 270°, downscaled by factors of 0.5 to 1.0 in steps of 0.1, and flipped vertically. A total of 463,744 sub-images were used for the training, and the learning rate of all layers was initialized to 1e-3 (set to 1e-4 for the last layer) and decreased by a factor of 10 every 20 epochs until 80 epochs had passed. In Table 5, it can be seen that the performance of the networks trained on the brightness domain was superior to that of the gray domain, and at the same time, the data augmentation improved the network performance.
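A sketch of this augmentation protocol is given below. Nearest-neighbor indexing is used for downscaling to keep the sketch dependency-free, since the paper does not specify the resampling kernel; whether the identity rotation and unflipped variants are kept is also our choice.

```python
import numpy as np

def augment(image, scales=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    # Rotation (0/90/180/270 degrees), scaling (0.5 to 1.0 in steps of 0.1),
    # and vertical flipping, applied in combination to a 2-D image array.
    variants = []
    h, w = image.shape[:2]
    for s in scales:
        ys = (np.arange(int(h * s)) / s).astype(int)
        xs = (np.arange(int(w * s)) / s).astype(int)
        scaled = image[np.ix_(ys, xs)]            # nearest-neighbor downscale
        for k in range(4):
            rotated = np.rot90(scaled, k)
            variants.append(rotated)
            variants.append(np.flipud(rotated))   # vertical flip
    return variants
```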
TABLE 5. Performance table of the PSNR for the analysis of the gray vs. brightness domains with data augmentation.

              Epoch
Domain        10       20       40       60       80
Gray          43.056   43.021   43.154   43.188   43.203
Brightness    43.167   43.196   43.253   43.245   43.249
Difference    0.111    0.175    0.098    0.057    0.046

To summarize, although a one-to-one comparison between the gray-based and IR-based networks shows that the NIR-based model provided slightly higher performance than the gray-based model (see Table 3 (c)), we observed that the brightness domain converted from the 91-image dataset was the best (see Table 4). Therefore, we concluded that the best choice was to convert the color images of a public database into brightness images, and then to use this domain when training the network parameters for thermal image enhancement.

B. NETWORK ARCHITECTURE
To determine the optimal structure for the proposed network, we studied residual learning and the variables of the feature extraction and mapping blocks, using the 91-images dataset converted to the brightness domain in consideration of the above experiments.

1) Residual Learning
The network architecture and experimental conditions were the same as those described earlier. Fig. 7 shows the convergence curves in terms of the average PSNR on the test dataset. The performance of the non-residual network fluctuated significantly. On the other hand, the residual network was stable, converged rapidly, and outperformed the non-residual network. Therefore, it can be seen that even though the network was trained on a different domain, it can still be used to enhance the thermal image.

FIGURE 7. Performance graph of the PSNR for the residual and non-residual networks.
TABLE 6. Comparison with different filter sizes and different numbers of layers in the feature extraction block (average PSNR in dB).

Feature Extraction
Conv(3)   Conv(5)   Conv(7)   Conv(9)   2Conv(3)   2Conv(5)   Conv(3-5)   Conv(5-3)   3Conv(3)   4Conv(3)
42.60     42.70     42.64     42.73     42.93      42.99      42.95       42.91       43.01      43.01

TABLE 7. Comparison of the performance and the number of parameters with different variables.
FIGURE 8. Visual comparison on the TIRset26 (LTIR [38]-saturated-195) with a scale factor of three. SRCNN-Ex [20] (43.77 / 0.973 / 3.651); TEN [23] (43.93 / 0.973 / 3.584); Ours (44.11 / 0.974 / 3.667).
image dataset. The strategy of the data augmentation was identical to that described above. During training, 48 × 48 sub-images and batches of size 128 were applied. We trained SRCNN-Ex optimized by two methods, Stochastic Gradient Descent (SGD) [56] and Adam, until 1000 epochs had passed, and TEN optimized by Adam until 100 epochs had passed. The training of SRCNN-Ex with 57,184 parameters and TEN with 63,840 parameters took approximately 12 and 1.5 h, respectively.

For training the proposed network, the learning rate was initialized to 1e-4 for the last layer and 1e-3 for the other layers, and decreased by a factor of 10 every 20 epochs, with a momentum parameter of 0.9 and a weight decay of 5.0 × 1e-4. Training with 50 epochs was sufficient and took approximately one hour.

To objectively evaluate the thermal image enhancement performance, seven image quality metrics were used: the PSNR, SSIM [28], and IFC [29] for full-reference quality assessment using the TIRset26; and the NIQE [30], NPBM [31], CV [32], and UID [32] for no-reference quality assessment using the KAIST Dataset [42].
TABLE 9. Quantitative evaluation of full-reference quality assessment on the TIRset26 dataset. The average results of PSNR, SSIM, and IFC for scale factors 2, 3, and 4, with higher values indicating better performance.

                                      Gray based                                    Brightness based
Evaluation  Scale   Bicubic   SRCNN-Ex [20]   SRCNN-Ex [20]   TEN [23]   SRCNN-Ex [20]   TEN [23]   Ours
metric      factor            (SGD)           (Adam)                                                (Adam)
PSNR        2       40.94     42.96           42.99           43.07      43.04           43.16      43.41
            3       37.58     39.28           39.34           39.42      39.37           39.49      39.72
            4       35.46     36.86           37.01           37.09      37.02           37.12      37.36
SSIM        2       0.965     0.973           0.973           0.973      0.973           0.973      0.974
            3       0.936     0.947           0.948           0.948      0.948           0.948      0.950
            4       0.909     0.921           0.922           0.923      0.922           0.923      0.926
IFC         2       5.869     7.179           7.335           7.273      7.277           7.237      7.435
            3       3.400     3.899           4.037           4.046      3.965           4.021      4.122
            4       2.574     2.502           2.566           2.596      2.573           2.257      2.712
FIGURE 9. Visual comparison on the KAIST [42] with a scale factor of four (NIQE / NPBM / CV / UID).
(top) test-set11-v000-I01537: Original (7.893 / 0.353 / 4.448 / 40.71); SRCNN-Ex [20] (7.394 / 0.308 / 5.601 / 42.18); TEN [23] (7.065 / 0.317 / 4.830 / 42.33); Ours (6.207 / 0.295 / 5.875 / 42.62).
(middle) train-set05-v000-I01619: Original (6.907 / 0.494 / 4.891 / 49.39); SRCNN-Ex [20] (6.380 / 0.407 / 6.836 / 53.38); TEN [23] (6.233 / 0.413 / 6.056 / 53.88); Ours (6.232 / 0.390 / 6.977 / 53.93).
(bottom) train-set02-v000-I00544: Original (7.567 / 0.470 / 4.962 / 49.10); SRCNN-Ex [20] (6.757 / 0.385 / 6.239 / 50.24); TEN [23] (6.754 / 0.398 / 5.700 / 50.19); Ours (6.357 / 0.375 / 6.606 / 50.36).
Table 9 presents the quantitative results of our full-reference quality assessment using the TIRset26. Our method outperforms the other methods for all metrics and scale factors. In addition, to verify the effect of the brightness domain, we also conducted experiments using the gray and brightness domains on SRCNN-Ex and TEN, respectively. The experimental results of the gray- and brightness-based networks show that the performance in terms of the PSNR and SSIM is improved merely by changing the training domain from gray to brightness in the compared algorithms. Thus, training based on the brightness domain is an effective means of improving the performance of CNNs for thermal image enhancement.

The following describes the performance evaluation by no-reference quality assessment with a scale factor of four on the KAIST dataset. This dataset provides a total of 95,324 low-quality thermal images and is composed of 50,184 training and 45,140 test images. We evaluated each of these two sets. The results obtained using the proposed network are compared to those for SRCNN-Ex and TEN, which were trained using the brightness domain and optimized by the Adam algorithm. Table 10 shows that the proposed method outperforms the compared methods for all quality metrics.

Qualitative comparisons are shown in Figs. 8 and 9, and it can be seen that the results of the proposed method are perceptually better. Thus, the proposed method provides the best performance among all compared methods in both the quantitative and qualitative evaluations.
TABLE 10. Quantitative evaluation of the no-reference quality assessment on the training and test sets of the KAIST Dataset [42]. The average results of NIQE,
NPBM, CV, and UID with a scale factor of four. Lower values of NIQE and NPBM, and higher values of CV and UID indicate better performance. Red text indicates
the best performance, and blue text indicates the second-best performance.
IV. CONCLUSION
The primary objective of this study was to enhance the quality of thermal images. To achieve this, we explored various domains, including RGB-based and multiple infrared images, and conducted a number of experiments to determine the most relevant training domain and the structure of the proposed network. Through experimental analyses, we determined that training the network based on the brightness domain, which is a transformation of the RGB dataset containing various patterns, was the most effective. The brightness domain was then applied to thermal image enhancement through residual learning. In particular, the use of the brightness domain was an important factor that improved the performance not only in our network but also in the previous method. To validate our proposed method, a test dataset based on high-quality thermal images was carefully selected from public datasets while considering various situations, environments, and sensors. The results of comparative experiments demonstrated that our network outperformed all other approaches in terms of both quantitative and qualitative evaluations. We believe that our approach shows good potential in thermal image-based applications. In future work, we will extend our approach to a single network that handles multi-scale and multi-spectral images.

REFERENCES
[1] X. Zhao, Z. He, S. Zhang, and D. Liang, "Robust pedestrian detection in thermal infrared imagery using a shape distribution histogram feature and modified sparse representation classification," Pattern Recognition, vol. 48, no. 6, pp. 1947-1960, 2015.
[2] J. Baek, S. Hong, J. Kim, and E. Kim, "Efficient pedestrian detection at nighttime using a thermal camera," Sensors, vol. 17, no. 8, p. 1850, 2017.
[3] W. K. Wong, H. L. Lim, C. K. Loo, and W. S. Lim, "Home alone faint detection surveillance system using thermal camera," in Computer Research and Development, 2010 Second International Conference on. IEEE, 2010, pp. 747-751.
[4] A. C. Goldberg, T. Fischer, and Z. I. Derzko, "Application of dual-band infrared focal plane arrays to tactical and strategic military problems," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 4820, 2003, pp. 500-514.
[5] B. C. Arrue, A. Ollero, and J. M. De Dios, "An intelligent system for false alarm reduction in infrared forest-fire detection," IEEE Intelligent Systems and Their Applications, vol. 15, no. 3, pp. 64-73, 2000.
[6] P. V. K. Borges and S. Vidas, "Practical infrared visual odometry," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 8, pp. 2205-2213, 2016.
[7] A. Prata and C. Bernardo, "Retrieval of volcanic ash particle size, mass and optical depth from a ground-based thermal infrared camera," Journal of Volcanology and Geothermal Research, vol. 186, no. 1, pp. 91-107, 2009.
[8] B.-S. Yang, F. Gu, A. Ball et al., "Thermal image enhancement using bi-dimensional empirical mode decomposition in combination with relevance vector machine for rotating machinery fault diagnosis," Mechanical Systems and Signal Processing, vol. 38, no. 2, pp. 601-614, 2013.
[9] Q. Chen, L.-F. Bai, and B.-M. Zhang, "Histogram double equalization in infrared image," Journal of Infrared and Millimeter Waves, vol. 22, no. 6, pp. 428-430, 2003.
[10] H. Ibrahim and N. S. P. Kong, "Brightness preserving dynamic histogram equalization for image contrast enhancement," IEEE Transactions on Consumer Electronics, vol. 53, no. 4, 2007.
[11] M. Shao, G. Liu, X. Liu, and D. Zhu, "A new approach for infrared image contrast enhancement," in Proc. of SPIE, vol. 6150, no. 1, 2006, pp. 615009-1.
[12] C. Ni, Q. Li, and L. Z. Xia, "A novel method of infrared image denoising and edge enhancement," Signal Processing, vol. 88, no. 6, pp. 1606-1614, 2008.
[13] X. Bai, F. Zhou, and B. Xue, "Infrared image enhancement through contrast enhancement by using multiscale new top-hat transform," Infrared Physics & Technology, vol. 54, no. 2, pp. 61-69, 2011.
[14] W. Zhao, Z. Xu, J. Zhao, F. Zhao, and X. Han, "Variational infrared image enhancement based on adaptive dual-threshold gradient field equalization," Infrared Physics & Technology, vol. 66, pp. 152-159, 2014.
[15] L. T. Yuan, S. K. Swee, and T. C. Ping, "Infrared image enhancement using adaptive trilateral contrast enhancement," Pattern Recognition Letters, vol. 54, pp. 103-108, 2015.
[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Neural Information Processing Systems (NIPS), 2015.
[17] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
[18] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[19] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, 2016.
[20] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295-307, 2016.
[21] J. Kim, J. Kwon Lee, and K. Mu Lee, "Accurate image super-resolution using very deep convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646-1654.
[22] C. Dong, C. C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in European Conference on Computer Vision. Springer, 2016, pp. 391-407.
[23] Y. Choi, N. Kim, S. Hwang, and I. S. Kweon, "Thermal image enhancement using convolutional neural network," in Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016, pp. 223-230.
[24] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[25] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861-2873, 2010.
[26] B. Yue, S. Wang, X. Liang, L. Jiao, and C. Xu, "Joint prior learning for visual sensor network noisy image super-resolution," Sensors, vol. 16, no. 3, p. 288, 2016.
[27] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, p. 62, 2016.
[28] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
[29] H. R. Sheikh, A. C. Bovik, and G. De Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics," IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117-2128, 2005.
[30] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a 'completely blind' image quality analyzer," IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209-212, 2013.
[31] F. Crete, T. Dolmiere, P. Ladret, and M. Nicolas, "The blur effect: perception and estimation with a new no-reference perceptual blur metric," in Human Vision and Electronic Imaging, vol. 12, no. 6492, 2007, p. 64920.
[32] H. Yao, M.-Y. Huseh, G. Yao, and Y. Liu, "Image evaluation factors," Image Analysis and Recognition, pp. 255-262, 2005.
[33] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision. Springer, 2014, pp. 818-833.
[34] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.
[35] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.
[37] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[38] A. Berg, J. Ahlberg, and M. Felsberg, "A thermal object tracking benchmark," in Advanced Video and Signal Based Surveillance (AVSS), 2015 12th IEEE International Conference on. IEEE, 2015, pp. 1-6.
[39] C. Palmero, A. Clapés, C. Bahnsen, A. Møgelmose, T. B. Moeslund, and S. Escalera, "Multi-modal RGB-depth-thermal human body segmentation," International Journal of Computer Vision, vol. 118, no. 2, pp. 217-239, 2016.
[40] J. Portmann, S. Lynen, M. Chli, and R. Siegwart, "People detection and tracking from aerial thermal views," in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 1794-1800.
[41] N. J. Morris, S. Avidan, W. Matusik, and H. Pfister, "Statistics of infrared images," in Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007, pp. 1-7.
[42] S. Hwang, J. Park, N. Kim, Y. Choi, and I. So Kweon, "Multispectral pedestrian detection: Benchmark dataset and baseline," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1037-1045.
[43] A. Torabi, G. Massé, and G.-A. Bilodeau, "An iterative integrated framework for thermal-visible image registration, sensor fusion, and people tracking for video surveillance applications," Computer Vision and Image Understanding, vol. 116, no. 2, pp. 210-221, 2012.
[44] J. W. Davis and V. Sharma, "Background-subtraction using contour-based fusion of thermal and visible imagery," Computer Vision and Image Understanding, vol. 106, no. 2, pp. 162-182, 2007.
[45] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, "Contour detection and hierarchical image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 898-916, 2011.
[46] R. Timofte, V. De Smet, and L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1920-1927.
[47] S. Schulter, C. Leistner, and H. Bischof, "Fast and accurate image upscaling with super-resolution forests," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3791-3799.
[48] F. Barrera, F. Lumbreras, and A. D. Sappa, "Multispectral piecewise planar stereo using Manhattan-world assumption," Pattern Recognition Letters, vol. 34, no. 1, pp. 52-61, 2013.
[49] C. Aguilera, F. Barrera, F. Lumbreras, A. D. Sappa, and R. Toledo, "Multispectral image feature points," Sensors, vol. 12, no. 9, pp. 12661-12672, 2012.
[50] A. González, Z. Fang, Y. Socarras, J. Serrat, D. Vázquez, J. Xu, and A. M. López, "Pedestrian detection at day/night time with visible and FIR cameras: A comparison," Sensors, vol. 16, no. 6, p. 820, 2016.
[51] C. Fredembach and S. Süsstrunk, "Colouring the near-infrared," in Color and Imaging Conference, vol. 2008, no. 1. Society for Imaging Science and Technology, 2008, pp. 176-182.
[52] L. Schaul, C. Fredembach, and S. Süsstrunk, "Color image dehazing using the near-infrared," in Image Processing (ICIP), 2009 16th IEEE International Conference on. IEEE, 2009, pp. 1629-1632.
[53] Z. Sadeghipoor, Y. M. Lu, and S. Süsstrunk, "Correlation-based joint acquisition and demosaicing of visible and near-infrared images," in Image Processing (ICIP), 2011 18th IEEE International Conference on. IEEE, 2011, pp. 3165-3168.
[54] H. C. Burger, C. J. Schuler, and S. Harmeling, "Image denoising: Can plain neural networks compete with BM3D?" in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 2392-2399.
[55] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.
[56] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

Kyungjae Lee received the M.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2013, and the B.S. degree in Electronics and Radio Engineering from Kyunghee University, Korea, in 2011. He is currently pursuing the Ph.D. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include image enhancement and advanced driver-assistance systems based on mono- and stereo-vision.

Junhyep Lee received the B.S. degree in Electronics and Avionics Engineering from Korea Aerospace University, Korea, in 2015. He is currently working toward the Ph.D. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include 3D surface normal and depth estimation, and odometry estimation for visual SLAM.

Joosung Lee received the B.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2016. He is currently working toward the M.S. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His fields of interest are visual odometry and object detection using deep learning.

Sangwon Hwang received the B.S. degree in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2016. He is currently working toward the Ph.D. degree at the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include odometry estimation and SLAM based on multi-sensor data.

Sangyoun Lee (M'04) received the Ph.D. degree in Electrical and Computer Engineering from the Georgia Institute of Technology, Atlanta, GA, in 1999, and the B.S. and M.S. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 1987 and 1989, respectively. He is currently a Professor and the Head of Electrical and Electronic Engineering at the graduate school, and Head of the Image and Video Pattern Recognition Laboratory at Yonsei University. His research interests include all aspects of computer vision, with a special focus on pattern recognition for face detection and recognition, advanced driver-assistance systems, and video codecs.