Tone mapping is one of the main techniques for converting high-dynamic-range (HDR) images into low-dynamic-range (LDR) images. We propose using a variant of generative adversarial networks to adaptively tone map images. We designed a conditional generative adversarial network composed of a U-Net generator and a PatchGAN discriminator to adaptively convert HDR images into LDR images. We extended previous work to include additional metrics such as the tone-mapped image quality index (TMQI), structural similarity index measure, Fréchet inception distance, and perceptual path length. In addition, we applied face detection on the Kalantari dataset and showed that our proposed adversarial tone mapping operator generates the best LDR images for the detection of faces. One of our training schemes, which uses 256 × 256 resolution HDR–LDR image pairs, results in a model that can generate high-TMQI LDR images at both low (256 × 256) and high (1024 × 2048) resolution. Given 1024 × 2048 resolution HDR images, the TMQI of the generated LDR images reaches 0.90, outperforming all other contemporary tone mapping operators.
1. Introduction

The dynamic range of an image is described as the variation of luminance in different parts of the image.1 The majority of real-life images are of low dynamic range (LDR) and are generally represented in an 8-bit-per-pixel integer format.2 In contrast, high dynamic range (HDR) uses more bits (16/32) to quantify the pixel values. Even though HDR images can better describe a scene, most common 8-bit display methods are not compatible with HDR images. A cost-effective way of displaying HDR images is to convert them into LDR images, as opposed to using a 16-bit display setting. Many tone mapping operators (TMOs) have been proposed and have shown remarkable progress in many scenarios. Even though tone mapping is one of the most common ways to perform HDR-to-LDR conversion, TMOs have many limitations, such as generalization, parameter tuning, expert knowledge, and model instability.

The main research question of this work is: is it possible to propose a TMO that can adaptively tone-map all HDR images with different contents? In this paper, we seek to answer this question by exploring deep learning techniques. We propose a specific deep learning network, a conditional generative adversarial network (cGAN),3 to adaptively convert an HDR image into an LDR image. Our proposed model is trained on HDR–LDR image pairs containing assorted content, including natural scenarios, indoor/outdoor scenes, regular/irregular geometric shapes, colorful/monochrome objects, and drastic luminance changes.

In general, the implementation of any generative adversarial network (GAN) requires an objective loss function. In deep learning networks, the loss function measures the difference between the output and input images. Common loss functions are the absolute error (called L1) and the squared error (called L2). In this work, we implement a unique objective composed of the general cGAN loss, a feature matching loss, and a perceptual loss. Combining these losses allows the proposed adversarial tone mapping operator (adTMO) to learn the distribution of ideally tone-mapped images.

For low-resolution image-to-image translation tasks, cGAN has shown great success in generating high-quality target images.4 However, for high-resolution image-to-image translation tasks, many problems exist. These problems require complex models to combat tiling patterns, local blurring, and saturation artifacts.5,6 One of the main deterrents to using high-resolution images is the amount of resources required for training, specifically the time required for convergence. In our work, we explore the possibility of using low-resolution images to train a cGAN model (a "U-Net" generator and a PatchGAN discriminator). We extended the work on adTMO7 to include additional metrics such as the structural similarity index measure (SSIM), perceptual path length (PPL), Fréchet inception distance (FID), and multi-scale structural similarity index measure (MS-SSIM), as well as performance metrics for face detection. We show that adTMO outperforms most other TMOs when tested on low- and high-resolution HDR images. This paper aims to design a smart TMO that can adaptively convert complex scenic HDR images into LDR images. The main contributions of our work are listed as follows.
This paper is organized as follows: Section 2 provides a literature review related to TMOs, cGAN, and the metrics used for evaluating image-to-image translation tasks. Section 3 describes the architecture of adTMO and the different training/testing schemes we apply. Section 4 details the databases used for training and the preprocessing and postprocessing steps applied to the images. Section 5 summarizes the results of adTMO. Section 6 concludes our paper.

2. Related Work

In this section, we provide a short review of the tone-mapping literature, cGAN, and the metrics used for evaluating image-to-image translation tasks.

2.1. TMOs

Over the past 20 years, different TMOs have been designed to convert HDR images into LDR images. They can be divided into two categories, global TMOs and local TMOs, based on how they operate on image pixels. Global TMOs, such as Larson et al.8 and Drago et al.,9 apply the same function to all pixels of an image. Global TMOs take less time to convert HDR images, but the output LDR images have reduced contrast. Local TMOs, e.g., Chiu et al.10 and Tumblin et al.,11 calculate the output pixel value based on the input pixel and its neighboring pixels. Local TMOs can preserve the local structure and generate good contrast, but at the cost of more computation time. In addition, most TMOs can only deal with specific scenarios and do not generalize well with regard to image content.

2.2. Generative Adversarial Networks

First proposed by Goodfellow in 2014,12 the GAN has shown great success in many fields. A GAN consists of a generator model (G) and a discriminator model (D). The goal of G is to generate fake samples that are real enough to fool D. The goal of D is to distinguish real samples drawn from collected databases from fake samples generated by G. By training G and D simultaneously, they compete with each other and reach an equilibrium, allowing G to implicitly learn the distribution of real samples from the collected databases without the need for complex loss functions. In this paper, we adopt cGAN,3 so that the goal of G changes to generating fake samples under given conditions. Many low-resolution image-to-image translation tasks, such as semantic labels to photos and architectural labels to photos, adopt cGAN to generate target images and achieve satisfactory results.4 Patel et al.13 conducted similar work using cGAN to convert HDR images into LDR images, but they only tested with low-resolution image crops. A complex multi-scale architecture for high-resolution image-to-image tasks was proposed by Wang et al.5 and Rana et al.6 Those networks required high-resolution training images and took many resources, including memory and time, to train. It took a week to train the multi-scale network6 using a 12-GB NVIDIA Titan-X GPU on an Intel Xeon e7 core i7 machine.

Due to the downsampling process in the generator of a cGAN, it is challenging to preserve the fine details of the input images. The bilateral filter is a common method for edge-preserving and noise-reducing operations, which can be adopted to preserve the finer details of an image.14 A method that optimizes bilateral filtering to run in constant time O(1) was proposed by Porikli.15 Other methods proposed to preserve edges in images include global image smoothing based on weighted least squares (WLS)16 and the guided image filter.17 Extended work on WLS was conducted by Min et al.18 to create a fast variant, achieving comparable results while requiring much less computational time.
Optimization of the guided image filtering technique was performed by incorporating an edge-aware weighting into the guided filter, which greatly reduced the halo artifacts in images.19 Zheng et al.20 proposed a hybrid model that combines a model-driven and a data-driven approach to generate a higher quality image. In this paper, we focus mainly on the data-driven approach via the use of cGAN. However, there is immense value in a hybrid model; thus, we plan to create such a hybrid model in future work by integrating the model-driven portion into our data-driven model.

2.3. Evaluation for Image-to-Image Translation Tasks

Evaluation of image-to-image translation tasks remains an open question. SSIM was proposed by Wang et al.21 to compare structural information based on the human visual system. SSIM is commonly used to compare the similarity between the generated images and the ground-truth images. It is defined by Wang et al.21 as

SSIM(x, y) = [(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)] / [(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)],

where \mu_x (\mu_y) is the mean of x (y), \sigma_x^2 (\sigma_y^2) is the variance of x (y), \sigma_{xy} is the covariance of x and y, and C_1 = (k_1 L)^2 and C_2 = (k_2 L)^2 are constants (L is the dynamic range of the pixels). Based on SSIM, a metric called multi-scale structural similarity (MS-SSIM)22 was designed to incorporate variations of viewing conditions.

FID23 was proposed to capture the similarity between the generated and ground-truth images. To compute FID, both the generated and real images are propagated through a pretrained Inception V3 model,24 and the difference of their statistics at the last pooling layer is used. A smaller FID represents higher similarity; given an FID of 0, two images are identical. The FID is defined as

FID = || \mu_r - \mu_g ||^2 + tr( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} ),

where \mu_r and \mu_g represent the means for the real (r) and generated (g) images, \Sigma_r and \Sigma_g represent the covariances for the real (r) and generated (g) images, and tr is the trace linear function. Similar to FID, PPL25 uses the pretrained VGG1626 as an embedding to calculate the perceptual similarity between two images. As with FID, a smaller PPL means that two images have greater perceptual similarity.

Evaluating the performance of TMOs is also an issue for tone mapping operations. One intuitive solution is subjective evaluation, which involves human participants ranking LDR images generated by different TMOs based on their subjective preference. Such subjective evaluation takes a lot of time and energy, and the results are unstable across different participant groups.27 Another solution is objective metrics, e.g., the tone-mapped image quality index (TMQI)28 and TMQI-II,29 widely used in tone-mapping optimization studies.6,30 TMQI is an index that considers the statistical naturalness of the tone-mapped LDR image and the structural fidelity between the HDR and tone-mapped LDR images, expressed as28

TMQI(H, L) = a [S(H, L)]^{\alpha} + (1 - a) [N(L)]^{\beta},

where H and L denote the original HDR image and the tone-mapped LDR image, S and N denote the structural fidelity and statistical naturalness measures, respectively, \alpha and \beta control the sensitivities of S and N, and a adjusts the relative weights between S and N. In this paper, we use the default values of a, \alpha, and \beta recommended by Yeganeh and Wang.28

3. Proposed Method

In this section, we detail our proposed adTMO for converting HDR images into LDR images, the architectures of our G and D, the objective function we use, and the different training/testing schemes we deploy.

3.1. cGAN-Based adTMO

In this paper, we construct adTMO based on the principle of cGAN,3 which can translate HDR images into LDR images. Figure 1 shows the training pipeline of our proposed adTMO.
We train D using (HDR, LDR) pairs, where D tries to predict the (HDR, RealLDR) pair as real and the (HDR, FakeLDR) pair as fake. G tries to generate a FakeLDR that is real enough that D is unable to distinguish FakeLDR from RealLDR. We train G and D simultaneously; specifically, in each iteration, we train D twice with its loss weight set to 0.5 [once using the (HDR, RealLDR) pair, and once using the (HDR, FakeLDR) pair].

3.2. Network Architectures

We adopt the network architectures from Isola et al.,4 where G is a U-Net31 and D is a PatchGAN,32 both built from convolution-BatchNorm-LeakyReLU33 blocks.

3.2.1. Generator architecture

Figure 2 shows the architecture of our G, which is a U-Net consisting of one input block, seven encoding blocks, one bottleneck, seven decoding blocks, and one output block. Each encoding block down-samples the output of the previous block (to 1/2 of the width and 1/2 of the height), and each decoding block up-samples the previous block by 4 times. We added direct connections between the encoding and decoding blocks in order to preserve some of the finer details that may have been lost during the downsampling process. This direct connection, also called a skip connection, allows the gradient of the later layers to propagate back to the earlier layers. Such propagation prompts the model to learn the mapping between the input and output layers more efficiently, allowing the finer details to be recovered from the downsampling process. For each decoding block, we add a direct skip connection from its corresponding encoding block and concatenate the two blocks along the channel dimension before applying the LeakyReLU activation function. The filter size is the same for all blocks. The filter number is set to 64 for the first encoding block and doubles for each of the following encoding blocks until it reaches 512, then remains unchanged. The filter number for each decoding block is the same as that of the encoding block with which it connects. For the bottleneck block, the filter number is set to 512, and the activation function is ReLU. For the output block, the filter number is set to 1 and the activation function is sigmoid. We can feed our G with images of different sizes, given that it is fully convolutional.

3.2.2. Discriminator architecture

Figure 3 shows the architecture of our D. This is a PatchGAN consisting of one input layer, five encoding blocks, and one output block. The input layer concatenates the input HDR and LDR images in the color channel. Each of the first four encoding blocks down-samples the image size of the previous block. For the last encoding block, the stride is set so that the image size remains unchanged. The numbers of filters for the encoding blocks are, in order, 64, 128, 256, 512, and 512. The output block has one filter with a sigmoid activation and outputs a matrix. Each value in the output matrix maps to a receptive field in the input layer, identifying that patch as either real or fake.

3.3. Objective Function

As discussed earlier, the goal of G is to convert an HDR image into its tone-mapped LDR version, and the goal of D is to distinguish the generated LDR image from the ground-truth LDR image. The objective of cGAN3 can therefore be written as

L_{cGAN}(G, D) = E_{x,y}[ \log D(x, y) ] + E_{x}[ \log(1 - D(x, G(x))) ],

where x denotes the input HDR image and y the ground-truth LDR image; G tries to minimize this objective against an adversarial D that tries to maximize it. In addition to the cGAN loss, we incorporated a feature matching loss based on D.
We extract features from multiple layers of D and attempt to match these intermediate representations between the real and generated LDR images, i.e., we minimize the difference between the features via the L1 norm:

L_{FM}(G, D) = E_{x,y} \sum_{i=1}^{T} (1 / N_i) || D^{(i)}(x, y) - D^{(i)}(x, G(x)) ||_1,

where D^{(i)} denotes the i'th layer of D with N_i activations, and T is the number of selected layers of D. In this experiment, we chose the five convolution layers in the five encoding blocks of D.

Additionally, we appended the perceptual loss used by Johnson et al.,34 computed from the features of selected layers of the pretrained Inception V3 network,24 given by

L_{perc}(G) = \sum_{i=1}^{T'} (1 / M_i) || F^{(i)}(y) - F^{(i)}(G(x)) ||_1,

where F^{(i)} denotes the i'th layer of the Inception V3 network with M_i activations, and T' is the selected number of layers in the Inception V3 network. In this experiment, we empirically chose five activation layers of the Inception V3 network to calculate L_{perc}.

With L_{FM} and L_{perc}, we are able to keep both low-level image characteristics and high-level perceptual information. Combining these losses, our final objective is expressed as

min_G max_D L_{cGAN}(G, D) + \lambda_1 L_{FM}(G, D) + \lambda_2 L_{perc}(G),

where \lambda_1 and \lambda_2 control the weights of L_{FM} and L_{perc} with respect to L_{cGAN}. Here we set \lambda_1 and \lambda_2 to the values recommended by Rana et al.6

3.4. Training and Testing

We deploy different training and testing scheme combinations to achieve better performance.

3.4.1. Training

We adopt three training schemes.
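All three schemes share the same per-iteration adversarial update described in Secs. 3.1 and 3.3. The following minimal PyTorch-style sketch illustrates that update under our own assumptions; the names netG, netD, fm_loss, and perc_loss are illustrative placeholders, not the exact implementation:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # D ends with a sigmoid, so binary cross-entropy is used for the cGAN term

def train_step(netG, netD, fm_loss, perc_loss, optG, optD, hdr, real_ldr,
               lambda1=1.0, lambda2=1.0):
    """One adversarial iteration: D is updated on the real and fake pairs (each weighted 0.5),
    then G is updated with the cGAN, feature-matching, and perceptual losses (illustrative sketch)."""
    fake_ldr = netG(hdr)

    # --- Discriminator update: (HDR, RealLDR) labeled real, (HDR, FakeLDR) labeled fake ---
    optD.zero_grad()
    pred_real = netD(torch.cat([hdr, real_ldr], dim=1))
    pred_fake = netD(torch.cat([hdr, fake_ldr.detach()], dim=1))
    loss_D = 0.5 * bce(pred_real, torch.ones_like(pred_real)) \
           + 0.5 * bce(pred_fake, torch.zeros_like(pred_fake))
    loss_D.backward()
    optD.step()

    # --- Generator update: fool D, plus feature-matching and perceptual terms ---
    optG.zero_grad()
    pred_fake = netD(torch.cat([hdr, fake_ldr], dim=1))
    loss_G = bce(pred_fake, torch.ones_like(pred_fake)) \
           + lambda1 * fm_loss(hdr, real_ldr, fake_ldr) \
           + lambda2 * perc_loss(real_ldr, fake_ldr)
    loss_G.backward()
    optG.step()
    return loss_D.item(), loss_G.item()
```

In the actual objective, \lambda_1 and \lambda_2 take the values recommended by Rana et al.6 (Sec. 3.3); the defaults above are placeholders.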
Fig. 4 The purple and blue boxes, respectively, show how we generate training pairs for training schemes A and B.

All training schemes used 256 × 256 resolution images as the training database, so the training process took less time and fewer resources than using high-resolution images. The Adam optimizer35 was used for all three schemes. We set the batch size to 1 and trained until the loss converged. The training process was deployed on an NVIDIA GeForce RTX 2080, and each training process could be finished within 30 h, which is much shorter than the 1-week training time of the multi-scale network proposed by Rana et al.6

3.4.2. Testing

We deploy different testing schemes to evaluate the performance of our proposed adTMO.
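Regardless of the testing scheme, tone mapping a single test image is one forward pass through the trained, fully convolutional G. A minimal sketch under our own assumptions follows (the helper names and the use of PyTorch are illustrative; normalization and color reproduction follow Secs. 4.4 and 4.5):

```python
import torch

@torch.no_grad()
def tone_map_luminance(netG, hdr_luminance):
    """Tone-map one HDR luminance channel with the trained generator (illustrative sketch)."""
    # Min/max normalization of the HDR luminance to [0, 1] (Sec. 4.4)
    h_min, h_max = hdr_luminance.min(), hdr_luminance.max()
    x = (hdr_luminance - h_min) / (h_max - h_min + 1e-8)
    # G is fully convolutional, so 256 x 256 and 1024 x 2048 inputs are both accepted
    y = netG(x.unsqueeze(0).unsqueeze(0))        # add batch and channel dimensions
    ldr_luminance = (y.squeeze() * 255.0).clamp(0, 255)
    # Color reproduction back to RGB (Sec. 4.5) is applied afterwards
    return ldr_luminance
```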
4. Experimental Setup

In this section, we detail the HDR image databases we collected and how we pre- and postprocessed them.

4.1. Databases

From the many open-source HDR image databases accessible online, we selected our databases based on their content diversity, usability, resolution, and quality. Table 1 summarizes the HDR image databases we used, the majority being high-resolution. We used 105 images from Kalantari and Ramamoorthi45 to test adTMO, and 748 images from the other 10 databases in Table 1 to train adTMO.

Table 1 HDR image databases.

4.2. Resizing

We used two collections of 256 × 256 resolution images for training. The first set of images was the original images resized to 256 × 256 resolution (training scheme A), whereas the second set of images was randomly cropped from the resized images (training scheme B). For testing purposes, we resized the HDR images into two resolutions: 256 × 256 and 1024 × 2048.

4.3. Target LDR Image Generation

All the collected HDR images were unlabeled, i.e., the ground-truth LDR images were unknown. To solve this problem, for each HDR image, we applied 30 different TMOs to obtain 30 LDR image candidates using the MATLAB HDR TOOLBOX46 and followed its suggestion to apply GammaTMO after tone-mapping, as some specific TMOs require gamma encoding. From these 30 LDR image candidates, we selected the one with the highest TMQI as the ground-truth LDR image. Table 2 summarizes the performance of each TMO when applied to the resized HDR images. In Table 2, we provide the average TMQI for each TMO applied to the whole training set, and the number of LDR images with the highest TMQI among the 30 candidates. The last row tabulates the average TMQI of the selected 748 target LDR images. Among the TMOs provided by the MATLAB HDR TOOLBOX, WardHistAdjTMO reaches the highest average TMQI and provides the most ground-truth LDR images (124 images). Apart from RamanTMO, which contributed 0 ground-truth images, all other TMOs provide at least one image for the ground-truth set.

Table 2 TMO performance in tone-mapping 256 × 256 HDR images.
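This selection procedure can be sketched as follows; tmo_candidates and tmqi are placeholders for the 30 HDR Toolbox operators (with gamma encoding where required) and a TMQI implementation, respectively (our illustrative names, not toolbox functions):

```python
def select_ground_truth(hdr_image, tmo_candidates, tmqi):
    """Pick the LDR candidate with the highest TMQI as the training target (illustrative sketch)."""
    best_ldr, best_score = None, -1.0
    for tmo in tmo_candidates:            # e.g., the 30 operators of the MATLAB HDR Toolbox
        ldr = tmo(hdr_image)              # gamma encoding is applied after TMOs that require it
        score = tmqi(hdr_image, ldr)      # structural fidelity + statistical naturalness (Sec. 2.3)
        if score > best_score:
            best_ldr, best_score = ldr, score
    return best_ldr, best_score
```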
This approach to generating target LDR images is similar to the one proposed by Cai et al.47 to generate high-contrast images. Both our work and theirs aim to reproduce satisfactory natural LDR images. However, while we focus on keeping the structural similarity to the HDR images and retaining the color naturalness, Cai et al. aimed to produce a high-contrast image from an under-/over-exposed image. A difference also exists in how the "ground-truth" target image is selected. We use an objective metric, TMQI, to select a ground-truth LDR image, whereas Cai et al. used a subjective ranking to select a ground-truth high-contrast image.

4.4. Normalization

We linearly normalized the pixel values of the input HDR and LDR images into [0, 1]. For input HDR images, min/max normalization was applied:

H_{norm} = (H - H_{min}) / (H_{max} - H_{min}),

where H_{max} and H_{min} are the maximum and minimum pixel values of the input HDR image, respectively. For input LDR images, we divided the pixel values by 255 so that the pixel values of the input LDR image are also in the range [0, 1].

4.5. Luminance Extraction and Color Reproduction

When training and testing our proposed adTMO, we used the luminance channel rather than the RGB channels of the input images to ease the computational complexity and reduce the memory requirement. Before training, we calculated the luminance channel L_{in} as a weighted sum of the RGB channels, with the weights taken from Ref. 6. After generating the output luminance channel L_{out} from L_{in}, we used

C_{LDR} = (C_{HDR} / L_{in}) \cdot L_{out}

to reproduce the RGB channels, where L_{in} and L_{out} are the input and output luminance channels, respectively, and C_{HDR} and C_{LDR} are the RGB channels of the original HDR image and of the generated LDR image after color reproduction. After color reproduction, some pixel values would be larger than 255; these were clipped to 255 to maintain the 8-bit RGB range.

5. Results

In this section, we discuss the results of our proposed adTMO in terms of multiple metrics of the generated LDR images in different training/testing schemes. Figure 6 demonstrates one scenario of LDR content in the RGB channels after color reproduction, under the different training/testing schemes. We omit the generated LDR content for testing scheme Y because those images were used for constructing the images of testing scheme Z. LDR images in testing scheme W [(a), (d), and (g)] have higher TMQI, but such conversion is meaningless, as many details are lost in the resizing operation. LDR images of testing schemes X and Z in training scheme A [(b), (c)] have lower TMQI, with shadows around the flowers, because we only trained adTMO with resized images, so many fine details from the original images were lost. After we added cropped images to the training database, adTMO was able to learn how to keep the details of the original images. Therefore, the LDR images of testing scheme X in training schemes B and C [(e) and (h)] look more natural and have higher TMQI. The LDR images of testing scheme Z [(c), (f), and (i)] show "concatenated" edges, because cropping a complete image into pieces and generating their tone-mapped LDR images individually breaks the internal connections between these pieces. Future work is required to generate these individual images and combine them in such a way that these edges are removed while maintaining the high contrast in each individual piece. Some finer details are not kept well by the proposed adTMO. It should be noted that edge-preserving techniques such as bilateral filtering or guided image filtering have shown great promise in alleviating this problem.
Further experimentation is required, and we plan in the future to incorporate these techniques into a deep-learning-based TMO to create a more robust operator.

Fig. 6 The RGB channels of LDR images generated by adTMO after color reproduction. (a)–(c) are based on training scheme A; (d)–(f) are based on training scheme B; (g)–(i) are based on training scheme C. (a), (d), and (g) are based on testing scheme W; (b), (e), and (h) are based on testing scheme X; and (c), (f), and (i) are based on testing scheme Z.

We chose training scheme C to train the proposed adTMO, testing scheme W to tone-map 256 × 256 resolution images, and testing scheme X to tone-map 1024 × 2048 resolution images, given that training scheme C has the larger data set for training and the resulting LDR images [(g) and (h)] have higher TMQI. In Fig. 7, we demonstrate qualitative comparisons of adTMO and the other top-9-ranked TMOs (those producing the highest TMQI) for four different scenarios, in generating high-resolution images. In most scenarios, including indoor/outdoor, irregular geometric shapes, large color ranges, and drastic luminance changes, our adTMO outperforms all other TMOs on the TMQI metric. In addition, the LDR images generated by adTMO do not suffer from contrast problems like the other LDR images.

Tables 3 and 4 list the different metrics mentioned in Sec. 2 for the test dataset tone-mapped by the 30 TMOs and the proposed adTMO. We modify the PPL so that it can be used to evaluate TMOs. Specifically, the PPL is calculated as

PPL = E[ (1 / \epsilon^2) d( g(lerp(f(z_1), f(z_2); t)), g(lerp(f(z_1), f(z_2); t + \epsilon)) ) ],

where f represents the function mapping the latent space to a style vector in adTMO, t is uniformly distributed between 0 and 1, lerp represents linear interpolation, g is the generator function that creates an image, d measures the perceptual distance between the images, and \epsilon is set to a small constant here. In generating 256 × 256 resolution images, our proposed adTMO outperforms all other TMOs with regard to the FID metric and outperforms most of the TMOs with regard to the other metrics. In generating 1024 × 2048 resolution images, our proposed adTMO outperforms all other TMOs with regard to the TMQI, SSIM, and MS-SSIM metrics and outperforms most other TMOs with regard to FID and PPL. We also divided the images into two sets, one for indoor scenes and another for outdoor scenes. Both reach high TMQI (0.89 and 0.90) for 1024 × 2048 resolution images. Our deep learning-based tone mapping algorithm uses a mixture of the best features from other TMOs. In the absence of interactive parameter adjustment, which is not always available, our approach offers the best TMQI.

Fig. 7 Qualitative comparisons of adTMO and the top-9-ranked TMOs for outdoor and indoor scenes on the TMQI metric.

Table 3 Quantitative comparisons of adTMO and all other TMOs for 256 × 256 resolution images on the SSIM, MS-SSIM, FID, and PPL metrics. The bold values indicate the metric where adTMO performs the best among all other TMOs.
Table 4 Quantitative comparisons of adTMO and all other TMOs for 1024 × 2048 resolution images on the SSIM, MS-SSIM, FID, PPL, and face detection accuracy metrics. The bold values indicate the metric where adTMO performs the best among all other TMOs.
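The SSIM and MS-SSIM values reported in Tables 3 and 4 follow the definitions in Sec. 2.3; as an illustration, SSIM between a generated and a ground-truth LDR image can be computed with an off-the-shelf library such as scikit-image (our choice here, assumed for illustration only):

```python
import numpy as np
from skimage.metrics import structural_similarity

def ssim_score(generated_ldr: np.ndarray, target_ldr: np.ndarray) -> float:
    """SSIM between a generated LDR image and its ground-truth LDR image (8-bit range)."""
    return structural_similarity(generated_ldr, target_ldr, data_range=255)
```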
In addition to the above-mentioned metrics, we also applied a face detection technique to the generated LDR images to measure the face detection accuracy, as HDR-to-LDR translation is often used in security and healthcare applications. The face detection accuracy is defined as TP / (TP + FN), where TP and FN represent the number of faces that are detected and not detected, respectively. The face detector used in this paper is the Haar cascades face detector,48 and the test set we used for evaluation is that of Kalantari and Ramamoorthi,45 which consists of HDR images containing human faces. Our proposed adTMO reaches the highest face detection accuracy compared with the other TMOs. The main reason is that we use the pretrained Inception V3 network24 to derive the perceptual loss, so our generated LDR images look more natural, and a face detector trained on natural images can achieve higher accuracy on the LDR images generated by our adTMO. Overall, the adTMO output has the highest quality for high-resolution images and is comparable to the results for low-resolution images.

6. Conclusion

We propose adTMO, which can adaptively generate high-resolution and high-quality LDR images. We explore different training and testing schemes and find the best possible combination to generate the highest quality images. We use multiple metrics, including TMQI, SSIM, MS-SSIM, and face detection accuracy, to measure the performance of the proposed adTMO. When testing on low-resolution LDR images, our adTMO has the highest performance on the FID metric across all other TMOs. When testing on high-resolution LDR images, our adTMO has the highest performance on TMQI, SSIM, MS-SSIM, and face detection accuracy over all other TMOs. Looking specifically at the TMQI metric, the proposed adTMO achieves a TMQI of 0.90, which is superior to that of DeepTMO.6 In addition, we have an advantage in training time: our training takes 30 h, which is much shorter than DeepTMO's 1 week.

Acknowledgments

This project was partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the grant "Biometric-enabled Identity management and Risk Assessment for Smart Cities," and by the Mitacs Globalink Graduate Fellowship, Canada.

References

1. F. McCollough, Complete Guide to High Dynamic Range Digital Photography, Lark Books (2008).
2. E. Reinhard et al., High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting, Morgan Kaufmann (2010).
3. M. Mirza and S. Osindero, "Conditional generative adversarial nets," (2014).
4. P. Isola et al., "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 1125–1134 (2017). https://doi.org/10.1109/CVPR.2017.632
5. T.-C. Wang et al., "High-resolution image synthesis and semantic manipulation with conditional GANs," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 8798–8807 (2018). https://doi.org/10.1109/CVPR.2018.00917
6. A. Rana et al., "Deep tone mapping operator for high dynamic range images," IEEE Trans. Image Process., 29, 1285–1298 (2019). https://doi.org/10.1109/TIP.2019.2936649
7. X. Cao et al., "Adversarial and adaptive tone mapping operator for high dynamic range images," in Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), 1814–1821 (2020). https://doi.org/10.1109/SSCI47803.2020.9308535
8. G. W. Larson, H. Rushmeier, and C. Piatko, "A visibility matching tone reproduction operator for high dynamic range scenes," IEEE Trans. Vis. Comput. Graphics, 3(4), 291–306 (1997). https://doi.org/10.1109/2945.646233
9. F. Drago et al., "Adaptive logarithmic mapping for displaying high contrast scenes," Comput. Graphics Forum, 22(3), 419–426 (2003). https://doi.org/10.1111/1467-8659.00689
10. K. Chiu et al., "Spatially nonuniform scaling functions for high contrast images," in Proc. Graphics Interface, 245–245 (1993).
11. J. Tumblin, J. K. Hodgins, and B. K. Guenter, "Two methods for display of high contrast images," ACM Trans. Graphics, 18(1), 56–94 (1999). https://doi.org/10.1145/300776.300783
12. I. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2672–2680 (2014).
13. V. A. Patel, P. Shah, and S. Raman, "A generative adversarial network for tone mapping HDR images," in Proc. Conf. Comput. Vision, Pattern Recognit., Image Process. and Graphics, 220–231 (2017). https://doi.org/10.1007/978-981-13-0020-2_20
14. C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Sixth Int. Conf. Comput. Vision (IEEE Cat. No. 98CH36271), 839–846 (1998). https://doi.org/10.1109/ICCV.1998.710815
15. F. Porikli, "Constant time O(1) bilateral filtering," in IEEE Conf. Comput. Vision and Pattern Recognit., 1–8 (2008). https://doi.org/10.1109/CVPR.2008.4587843
16. Z. Farbman et al., "Edge-preserving decompositions for multi-scale tone and detail manipulation," ACM Trans. Graphics, 27(3), 1–10 (2008). https://doi.org/10.1145/1360612.1360666
17. K. He, J. Sun, and X. Tang, "Guided image filtering," in Comput. Vision–ECCV 2010, 1–14 (2010).
18. D. Min et al., "Fast global image smoothing based on weighted least squares," IEEE Trans. Image Process., 23(12), 5638–5653 (2014). https://doi.org/10.1109/TIP.2014.2366600
19. Z. Li et al., "Weighted guided image filtering," IEEE Trans. Image Process., 24, 120–129 (2015). https://doi.org/10.1109/TIP.2014.2371234
20. C. Zheng et al., "Single image brightening via multi-scale exposure fusion with hybrid learning," IEEE Trans. Circuits Syst. Video Technol., 31(4), 1425–1435 (2020). https://doi.org/10.1109/TCSVT.2020.3009235
21. Z. Wang et al., "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
22. Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in Proc. Thirty-Seventh Asilomar Conf. Signals, Syst. and Comput., 1398–1402 (2003).
23. M. Heusel et al., "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Proc. Adv. Neural Inf. Process. Syst., 6626–6637 (2017).
24. C. Szegedy et al., "Rethinking the inception architecture for computer vision," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 2818–2826 (2016). https://doi.org/10.1109/CVPR.2016.308
25. T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 4401–4410 (2019). https://doi.org/10.1109/CVPR.2019.00453
26. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," (2014).
27. P. Ledda et al., "Evaluation of tone mapping operators using a high dynamic range display," ACM Trans. Graphics, 24(3), 640–648 (2005). https://doi.org/10.1145/1073204.1073242
28. H. Yeganeh and Z. Wang, "Objective quality assessment of tone-mapped images," IEEE Trans. Image Process., 22(2), 657–667 (2012). https://doi.org/10.1109/TIP.2012.2221725
29. K. Ma et al., "High dynamic range image compression by optimizing tone mapped image quality index," IEEE Trans. Image Process., 24(10), 3086–3097 (2015). https://doi.org/10.1109/TIP.2015.2436340
30. K. Debattista, "Application-specific tone mapping via genetic programming," Comput. Graphics Forum, 37(1), 439–450 (2018). https://doi.org/10.1111/cgf.13307
31. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," Lect. Notes Comput. Sci., 9351, 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28
32. C. Li and M. Wand, "Precomputed real-time texture synthesis with Markovian generative adversarial networks," in Proc. Eur. Conf. Comput. Vision, 702–716 (2016).
33. A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," (2015).
34. J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in Proc. Eur. Conf. Comput. Vision, 694–711 (2016).
35. D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," (2014).
36. F. Xiao et al., "High dynamic range imaging of natural scenes," in Proc. Color and Imaging Conf., 337–342 (2002).
37. M.-A. Gardner et al., "Learning to predict indoor illumination from a single image," (2017).
38. P. Stanczyk and C. Phillips, "OpenEXR images," (2020). https://github.com/AcademySoftwareFoundation/openexr-images
39. B. Funt and L. Shi, "The rehabilitation of maxRGB," in Proc. Color and Imaging Conf., 256–259 (2010).
40. P. Modin, "HDR Vault Image Set (Version 1.0.0). Zenodo," (2018). https://zenodo.org/record/1245790#.YRazFYgzY2w
41. M. D. Fairchild, "The HDR photographic survey," in Proc. Color and Imaging Conf., 233–238 (2007).
42. Pfstools Google Group, "Pfstools HDR image gallery," http://pfstools.sourceforge.net
43. "HDR source image gallery," (2018). http://resources.mpi-inf.mpg.de/hdr/gallery.html
44. W. J. Adams et al., "The Southampton-York natural scenes (SYNS) dataset: statistics of surface attitude," Sci. Rep., 6, 35805 (2016). https://doi.org/10.1038/srep35805
45. N. K. Kalantari and R. Ramamoorthi, "Deep high dynamic range imaging of dynamic scenes," ACM Trans. Graphics, 36(4), 1–12 (2017). https://doi.org/10.1145/3072959.3073609
46. F. Banterle et al., Advanced High Dynamic Range Imaging, CRC Press (2017).
47. J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Trans. Image Process., 27(4), 2049–2062 (2018). https://doi.org/10.1109/TIP.2018.2794218
48. P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recognit., I–I (2001). https://doi.org/10.1109/CVPR.2001.990517
Biography

Xingdong Cao received his BSc degree in electrical engineering from Zhejiang University, Zhejiang, China, in 2019. He is currently an MSc student under the supervision of Professor Svetlana Yanushkevich at the Biometric Technologies Laboratory, Department of Electrical and Software Engineering, University of Calgary, Calgary, Alberta, Canada. His research interests include applying machine learning technologies to the biometrics field.

Kenneth Lai received his BSc and MSc degrees from the University of Calgary, Calgary, Alberta, Canada, in 2012 and 2015, respectively, where he is currently pursuing his PhD in the Department of Electrical and Software Engineering. His areas of interest include biometrics and its application to security and health care systems.

Michael Smith is a professor emeritus in electrical and software engineering at the Schulich School of Engineering, University of Calgary, Calgary, Canada, with research interests in software engineering and customized real-time digital signal processing algorithms in the context of mobile embedded systems and biomedical instrumentation. He is a senior member of IEEE.

Svetlana Yanushkevich received her Dr.Tech.Sc. (Dr. Habilitated) degree from the Warsaw University of Technology in 1999. She is currently a professor in the Department of Electrical and Software Engineering at the University of Calgary. She is directing the Biometric Technologies Laboratory and conducting research in the area of biometric-based authentication technologies. She is a senior member of IEEE.