Learning To Correct Overexposed and Underexposed Photos
1 Introduction
The exposure used at capture time directly affects the overall brightness of the
final rendered photograph. Digital cameras control exposure using three main
factors: (i) capture shutter speed, (ii) f-number, which is the ratio of the focal
length to the camera aperture diameter, and (iii) the ISO value to control the
amplification factor of the received pixel signals. In photography, exposure set-
tings are represented by exposure values (EVs), where each EV refers to different
combinations of camera shutter speeds and f-numbers that result in the same
exposure effect—also referred to as ‘equivalent exposures’ in photography.
Digital cameras can adjust the exposure value of captured images for the pur-
pose of varying the brightness levels. This adjustment can be controlled manually
* This work was done while Mahmoud Afifi was an intern at the SAIC.
Fig. 1: The first column shows photographs with over- and underexposure errors. The remaining columns show results from current commercial software (Google Photo Enhancer, Photoshop HDR, iPhone Photo Enhancer) and our proposed method.
2 Related Work
Fig. 2: Dataset overview. Our dataset contains images with different exposure
error types and their corresponding properly exposed reference images. Shown
is a t-SNE visualization [37] of all images in our dataset and the low-light (LOL)
paired dataset (outlined in red) [5]. Notice that LOL covers a relatively small
fraction of the possible exposure levels, as compared to our introduced dataset.
Our dataset was rendered from linear raw-RGB images taken from the MIT-
Adobe FiveK dataset [38]. Each image was rendered with different relative ex-
posure values (EVs) by an accurate emulation of the camera ISP processes.
Paired Dataset: Paired datasets are crucial for supervised learning of image
enhancement tasks. Existing paired datasets for exposure correction focus only
on low-light underexposed images. Representative examples include Wang et al.’s
dataset [7] and the low-light (LOL) paired dataset [5]. Unlike existing datasets
for exposure correction, we introduce a large image dataset rendered with a wide
range of exposure errors. Fig. 2 shows a comparison between our dataset and
the LOL dataset in terms of the number of images and the variety of exposure
errors in each dataset. The LOL dataset covers a relatively small fraction of the
possible exposure levels, as compared to our introduced dataset. Our dataset
is based on the MIT-Adobe FiveK dataset [38] and is accurately rendered by
adjusting the high tonal values provided in camera sensor raw-RGB images to
realistically emulate camera exposure errors. An alternative worth noting is to
use a large HDR dataset to produce training data—for example, the Google
HDR+ dataset [12]. One drawback, however, is that each scene in this dataset is a composite of a varying number of smartphone-captured raw-RGB images that were first aligned and then merged into a single composite raw-RGB image. The target ground truth image is based on an HDR-to-LDR algorithm applied to this composite raw-RGB image [8, 12]. We opt instead to use the FiveK dataset as it starts with a single high-
quality raw-RGB image and the ground truth result is generated by an expert
photographer.
3 Our Dataset
To train our model, we need a large number of training images rendered with realistic over- and underexposure errors and corresponding properly exposed ground truth images. As discussed in Sec. 2, such datasets are currently not publicly available. For this reason, our first task was to create a new dataset.

Fig. 3: (A) Input image and its Laplacian pyramid. (B) Properly exposed reference image and its Laplacian pyramid. (C) Reconstructed image using the pyramid in (A) after swapping the last level of the pyramid with the corresponding one in (B). (D) Reconstructed image using the pyramid in (A) after swapping the last two levels of the pyramid with the corresponding levels in (B).
Our dataset is rendered from the MIT-Adobe FiveK dataset [38], which has
5,000 raw-RGB images and corresponding sRGB images rendered manually by
five expert photographers [38].
For each raw-RGB image, we use the Adobe Camera Raw SDK [39] to em-
ulate different EVs as would be applied by a camera [40]. Adobe Camera Raw
accurately emulates the nonlinear camera rendering procedures using metadata
embedded in each DNG raw file [40, 41]. We render each raw-RGB image with
different digital EVs to mimic real exposure errors. Specifically, we use the rel-
ative EVs −1.5, −1, +0, +1, and +1.5 to render images with underexposure
errors, a zero gain of the original EV, and overexposure errors, respectively.
The zero-gain relative EV is equivalent to the original exposure settings applied
onboard the camera during capture time.
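To make the relative-EV adjustment concrete, the toy sketch below treats a relative EV as a gain of 2^EV applied in linear raw-RGB space. It only illustrates the idea and is not the Adobe Camera Raw rendering used to build the dataset; the function and variable names (apply_relative_ev, raw) are ours.

```python
import numpy as np

def apply_relative_ev(raw_linear, ev):
    """Toy illustration: a relative EV of `ev` scales linear raw-RGB values by
    2**ev (e.g., +1 EV doubles the exposure, -1 EV halves it). The actual
    dataset was rendered with the Adobe Camera Raw SDK, which additionally
    emulates the camera's nonlinear rendering stages."""
    return np.clip(raw_linear * (2.0 ** ev), 0.0, 1.0)

# Example: emulate an underexposed and an overexposed rendering of `raw`
# (a hypothetical linear raw-RGB image as a float array in [0, 1]).
# under = apply_relative_ev(raw, -1.5)
# over  = apply_relative_ev(raw, +1.5)
```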
For the ground truth, we use the images manually retouched by an expert photographer (referred to as Expert C in [38]) as our target correctly exposed images, rather than our rendered images with a +0 relative EV. The reason behind this choice is that a significant number of images contain backlighting or partial exposure errors in their original capture exposure settings. The expert adjustments were performed in the ProPhoto RGB color space [38] (rather than raw-RGB); we converted the adjusted images to a standard 8-bit sRGB color space encoding.
In total, we generated 24,330 8-bit sRGB images with different digital ex-
posure settings. We discarded a small number of images that had misalignment
with their corresponding ground truth image. These misalignments were due to
different usage of the DNG crop area metadata by Adobe Camera Raw SDK
and the expert. Our dataset is divided into three sets: (i) a training set of 17,675 images, (ii) a validation set of 750 images, and (iii) a testing set of 5,905 images. The three sets are drawn from disjoint subsets of the FiveK source images, so no image is shared across the splits. Fig. 2 shows examples of our generated 8-bit sRGB
images and the corresponding properly exposed 8-bit sRGB reference images.
[Figure: overview of the proposed network. Encoder-decoder sub-networks (e.g., a 3-layer U-Net with 16 output channels in the first encoder level) process Laplacian pyramid levels n to 1 and are connected by 2×2 transposed conv layers with stride 2; the corrected outputs Y(l) are compared against the ground truth Laplacian pyramid levels T(l) through the pyramid, reconstruction, and adversarial losses.]
4 Our Method
Given an 8-bit sRGB input image, I, rendered with the incorrect exposure set-
ting, our method aims to produce an output image, Y, with fewer exposure
errors than those in I. As we target both over- and underexposure errors, our
input image, I, is expected to contain regions of nearly over- or under-saturated
values with corrupted color and detail information. We propose to correct color
and detail errors of I in a sequential manner. Specifically, we propose to pro-
cess a multi-resolution representation of I, rather than directly dealing with the
original form of I. We use the Laplacian pyramid [42] as our multiresolution
decomposition, which is derived from the Gaussian pyramid of I.
Let X represent the Laplacian pyramid of I with n levels, such that X(l) is the
lth level of X. The last level of this pyramid (i.e., X(n) ) captures low-frequency
information of I, while the first level (i.e., X(1) ) captures the high-frequency
information. Such frequency levels can be categorized into: (i) global color infor-
mation of I stored in the low-frequency level and (ii) image coarse-to-fine details
stored in the mid- and high-frequency levels. These levels can be later used to
reconstruct the full-color image I.
Fig. 3 motivates our coarse-to-fine approach to exposure correction. Figs.
3-(A) and (B) show an example overexposed image and its corresponding well-
exposed target, respectively. As observed, a significant exposure correction can
be obtained by using only the low-frequency layer (i.e., the global color infor-
mation) of the target image in the Laplacian pyramid reconstruction process,
as shown in Fig. 3-(C). We can then improve the final image by correcting the details sequentially at each remaining level of the Laplacian pyramid, as shown in Fig. 3-(D). In practice, we do not have access to the properly exposed
image in Fig. 3-(B) at the inference stage, and thus our goal is to predict the
missing color/detail information of each level in the Laplacian pyramid.
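As a concrete illustration of this decomposition and of the level-swapping experiment in Fig. 3, the sketch below builds and inverts a Laplacian pyramid with OpenCV. It is a minimal example under our own naming (laplacian_pyramid, reconstruct, over_img, and ref_img are placeholders) and is not the code used to produce the results in this paper.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, n_levels):
    """Decompose `img` into an n-level Laplacian pyramid. Levels 1..n-1 store
    high- to mid-frequency residuals; level n is the low-frequency base that
    carries the global color information."""
    pyr, current = [], img.astype(np.float32)
    for _ in range(n_levels - 1):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyr.append(current - up)   # band-pass residual at this level
        current = down
    pyr.append(current)            # low-frequency base (level n)
    return pyr

def reconstruct(pyr):
    """Invert the decomposition: upsample the base and add residuals, coarse to fine."""
    img = pyr[-1]
    for level in reversed(pyr[:-1]):
        img = cv2.pyrUp(img, dstsize=(level.shape[1], level.shape[0])) + level
    return img

# Fig. 3-style experiment (illustrative): swapping only the low-frequency level
# of an overexposed image with that of its reference already corrects most of
# the global color/brightness error.
# pyr_in, pyr_ref = laplacian_pyramid(over_img, 4), laplacian_pyramid(ref_img, 4)
# pyr_in[-1] = pyr_ref[-1]
# partially_corrected = reconstruct(pyr_in)
```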
4.3 Losses

We train our network end-to-end to minimize the following loss function:

L = Lrec + Lpyr + Ladv ,    (1)

where Lrec denotes the reconstruction loss, Lpyr the pyramid loss, and Ladv the adversarial loss. The individual losses are described below.

Reconstruction Loss: We use the L1 loss function between the reconstructed and properly exposed reference images. This loss can be expressed as follows:

Lrec = \sum_{p=1}^{3hw} |Y(p) − T(p)| ,    (2)

where h and w denote the height and width of the training image, respectively, and p is the index of each pixel in our corrected image, Y, and the corresponding properly exposed reference image, T.
Fig. 5: Multiscale losses. Shown are an input image and its 4-level Laplacian pyramid, the output of each sub-net trained with and without the pyramid loss (Eq. 3), and the properly exposed reference image.
Pyramid Loss: To guide each sub-network to follow the Laplacian pyramid re-
construction procedure, we introduce dedicated losses at each pyramid level. Let
T(l) denote the lth level of the Gaussian pyramid of our reference image, T,
after upsampling by a factor of two. We use a simple interpolation process for
the upsampling operation [27]. Our pyramid loss is computed as follows:
Lpyr = \sum_{l=2}^{n} 2^{(l−2)} \sum_{p=1}^{3 h_l w_l} |Y_{(l)}(p) − T_{(l)}(p)| ,    (3)
where hl and wl are twice the height and width of the lth level in the Laplacian
pyramid of the training image, respectively, and p is the index of each pixel
in our corrected image at the lth level Y(l) and the properly exposed reference
image at the same level, T(l). The pyramid loss not only gives a principled interpretation of the task of each sub-network but also results in fewer visual artifacts compared to training with only the reconstruction loss (see Fig. 5). Notice that without the intermediate pyramid losses, the multi-scale
reconstructions deviate widely from the intermediate Gaussian targets.
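For clarity, a PyTorch-style sketch of Eqs. 2 and 3 is given below. It assumes that each sub-network's intermediate output for level l has already been upsampled to twice that level's resolution (so it matches T(l)), uses average pooling as a stand-in for the Gaussian pyramid construction, and uses bilinear interpolation for the 2× upsampling; it illustrates the form of the losses rather than reproducing the exact implementation.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(Y, T):
    # Eq. 2: sum of absolute differences over all 3*h*w values.
    return torch.sum(torch.abs(Y - T))

def pyramid_loss(Y_levels, T, n=4):
    """Eq. 3 (sketch). Y_levels[l-1] is the intermediate output for pyramid
    level l (already at twice that level's resolution); T is the full-resolution
    reference of shape (B, 3, H, W). Level 1 is supervised by Eq. 2 only."""
    # Gaussian pyramid of the reference (gauss[l-1] ~ level l); average pooling
    # stands in for proper blur-and-downsample.
    gauss = [T]
    for _ in range(n - 1):
        gauss.append(F.avg_pool2d(gauss[-1], kernel_size=2))
    loss = 0.0
    for l in range(2, n + 1):
        # T(l): the l-th Gaussian level upsampled by a factor of two.
        target = F.interpolate(gauss[l - 1], scale_factor=2, mode='bilinear',
                               align_corners=False)
        loss = loss + (2 ** (l - 2)) * torch.sum(torch.abs(Y_levels[l - 1] - target))
    return loss
```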
[Fig. 6 diagram legend: 3×3 conv layers with stride 1 and padding 1, 2×2 transposed conv layers, skip connections, 2×2 max-pooling layers with stride 2, Leaky ReLU (LReLU) layers, 1×1 conv layers, depth concatenation layers, and batch normalization layers; m and L denote the number of output channels of the first encoder level and the number of levels in the encoder/decoder, and W and H denote the input width and height.]
Fig. 6: Details of the architectures used in our work. (A) Encoder-decoder archi-
tecture [47] used to design our sub-networks in the main network. (B) Discrim-
inator architecture.
4.4.1 Main Network Our main network consists of four sub-networks with
∼7M parameters trained in an end-to-end manner. Each sub-network accepts a
different representation of the input image extracted from the Laplacian pyramid
decomposition. The first sub-network is a four-layer encoder-decoder network
with skip connections (i.e., U-Net-like architecture [47]). The output of the first
convolutional (conv) layer has 24 channels. Our first sub-network has ∼4.4M
learnable parameters and it accepts the low-frequency band level of the Laplacian
pyramid, i.e., X(4) . The result of the first sub-network is then upscaled using a
2×2×3 transposed conv layer with three output channels and a stride of two.
This processed layer is then added to the first mid-frequency band level of the
Laplacian pyramid (i.e., X(3) ) and is fed to the second sub-network.
The second sub-network is a three-layer encoder-decoder network with skip
connections. It has 24 channels in the first conv layer of the encoder, with a total
of ∼1.1M learnable parameters. The second sub-network processes the upscaled input from the first sub-network and outputs a residual layer, which is added back to its input. This sum is then upscaled by another 2×2×3 transposed conv layer with three output channels and a stride of two. The result is added to the second mid-frequency band level of the Laplacian pyramid (i.e., X(2)) and is
fed to the third sub-network, which generates a new residual that is added back
again to the input of this sub-network.
The third sub-network has the same design as the second network. Finally,
the result is added to the high-frequency band level of the Laplacian pyramid
(i.e., X(1) ) and is fed to the fourth sub-network to produce the final processed
image.
The final sub-network is a three-layer encoder-decoder network with skip
connections and has ∼482.2K learnable parameters, where the output of the
first conv layer in its encoder has 16 channels. We provide the details of the
main encoder-decoder architecture of each sub-network in Fig. 6-(A).
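The data flow of this cascade can be summarized with the PyTorch-style sketch below. The sub-network modules (U-Net-like encoder-decoders) are assumed to be given, the class and variable names are ours, and details such as whether the first and last stages are residual are simplified; it is an outline of the forward pass rather than the exact implementation.

```python
import torch
import torch.nn as nn

class CoarseToFineCorrector(nn.Module):
    """Sketch of the cascade: subnets[0] corrects the low-frequency level X(n);
    each later sub-network refines one finer Laplacian level."""
    def __init__(self, subnets):
        super().__init__()
        self.subnets = nn.ModuleList(subnets)
        # A 2x2 transposed conv (3 output channels, stride 2) between stages.
        self.upsamplers = nn.ModuleList(
            [nn.ConvTranspose2d(3, 3, kernel_size=2, stride=2)
             for _ in range(len(subnets) - 1)])

    def forward(self, lap_pyr):
        # lap_pyr = [X(1), ..., X(n)], finest level first.
        y = self.subnets[0](lap_pyr[-1])                  # global color correction
        for i, level in enumerate(reversed(lap_pyr[:-1])):
            x = self.upsamplers[i](y) + level             # upscale, add next detail level
            y = x + self.subnets[i + 1](x)                # sub-network predicts a residual
        return y
```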
4.4.3 Training Details We use He et al.’s method [49] to initialize the weights
of our encoder and decoder conv layers, while the bias terms are initialized to
zero. We minimize our loss functions using the Adam optimizer [50] with a decay
rate β1 = 0.9 for the exponential moving averages of the gradient and a decay
rate β2 = 0.999 for the squared gradient. We use a learning rate of 10−4 to update
the parameters of our main network and a learning rate of 10−5 to update our
discriminator’s parameters.
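These settings correspond to the following PyTorch-style setup, where main_net and discriminator are placeholder modules standing in for our networks:

```python
import torch
import torch.nn as nn

main_net = nn.Conv2d(3, 3, 3)       # placeholder standing in for our main network
discriminator = nn.Conv2d(3, 1, 3)  # placeholder standing in for the discriminator

# Adam with beta1 = 0.9 and beta2 = 0.999; a learning rate of 1e-4 for the
# main network and 1e-5 for the discriminator.
main_optimizer = torch.optim.Adam(main_net.parameters(), lr=1e-4, betas=(0.9, 0.999))
disc_optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-5, betas=(0.9, 0.999))
```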
We train our network on patches with different dimensions. Training begins
without the adversarial loss, Ladv , then Ladv is added to enhance the results
of our initial training [51]. Specifically, we begin our training without Ladv on
176,590 patches with dimensions of 128 × 128 pixels extracted randomly from
our training images for 40 epochs. The mini-batch size is set to 32. The learning
rate is decayed by a factor of 0.5 after the first 20 epochs. Then, we continue
training on another 105,845 patches with dimensions of 256×256 pixels for 30
epochs with a mini-batch size of eight. At this stage, we train our main network
without Ladv for 15 epochs and continue training for another 15 epochs with
Ladv . The learning rates for the main network and the discriminator network
are decayed by a factor of 0.5 every 10 epochs. Finally, we fine-tune the trained
networks on another 69,515 training patches with dimensions of 512×512 pixels
for 20 epochs with a mini-batch size of four and a learning rate decay of 0.5
applied every five epochs.
We discard any training patches that have an average intensity less than 0.02
or higher than 0.98. We also discard homogeneous patches that have a gradient
magnitude less than 0.06. We randomly left-right flip training patches for data
augmentation.
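A sketch of this patch-selection rule is given below; since the exact gradient operator is not specified above, a simple finite-difference gradient magnitude averaged over the patch is assumed, and the function names are ours.

```python
import numpy as np

def keep_patch(patch, low=0.02, high=0.98, grad_thresh=0.06):
    """Return True if `patch` (an H x W x 3 float array in [0, 1]) should be kept:
    discard nearly black/saturated patches and nearly homogeneous ones."""
    mean_intensity = patch.mean()
    if mean_intensity < low or mean_intensity > high:
        return False
    gray = patch.mean(axis=-1)
    gy, gx = np.gradient(gray)                 # finite-difference gradients
    return np.hypot(gx, gy).mean() >= grad_thresh

def random_flip(patch, rng=None):
    """Left-right flip for data augmentation, applied with probability 0.5."""
    rng = np.random.default_rng() if rng is None else rng
    return patch[:, ::-1, :] if rng.random() < 0.5 else patch
```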
In the adversarial training, we optimize both the main network and the dis-
criminator in an iterative manner. At each optimization step, the learnable pa-
rameters of each network are updated to minimize its own loss function. The
discriminator is trained to minimize the following loss function [48]:

LD (T, Y) = r(T) + c(Y) ,    (5)

where r(T) refers to the discriminator loss of recognizing the properly exposed reference image T, while c(Y) refers to the discriminator loss of recognizing our corrected image Y. The r(T) and c(Y) loss functions are given by the following equations:

r(T) = − log (S (D (T))) ,    (6)

c(Y) = − log (1 − S (D (Y))) ,    (7)

where S denotes the sigmoid function and D denotes our discriminator network.
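A PyTorch-style sketch of these terms, together with the corresponding generator-side adversarial term, is given below; any scaling factors are omitted, so it illustrates the form of the losses rather than the exact implementation.

```python
import torch

def discriminator_loss(D, T, Y):
    """Eqs. 5-7 (sketch): push D(T) toward 'properly exposed' and D(Y) away."""
    r = -torch.log(torch.sigmoid(D(T)) + 1e-8).mean()                  # r(T), Eq. 6
    c = -torch.log(1.0 - torch.sigmoid(D(Y.detach())) + 1e-8).mean()   # c(Y), Eq. 7
    return r + c                                                       # Eq. 5

def generator_adversarial_loss(D, Y):
    """Generator-side term: encourage D to classify the corrected image Y as real."""
    return -torch.log(torch.sigmoid(D(Y)) + 1e-8).mean()
```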
Our network is fully convolutional and can process input images with different
resolutions. While our model has a reasonable memory footprint (∼7M parameters), processing high-resolution images requires computational power that may not always be available. Furthermore, processing images with considerably higher resolution (e.g., 16-megapixel) than the range of resolutions used during training can affect our model's robustness in large homogeneous image regions. This issue arises because our network was trained with a certain range of effective receptive fields, which is small compared to the receptive fields required for very high-resolution images. To address this, we use the bilateral guided upsampling method [52] to process high-resolution images. First, we
resize the input test image to have a maximum dimension of 512 pixels. Then, we
process the downsampled version of the input image using our model, followed
by applying the fast upsampling technique [52] with a bilateral grid of 22×22×8
cells. This process allows us to process a 16-megapixel image in ∼4.5 seconds on
average. This time includes ∼0.5 seconds to run our network on an NVIDIA GeForce GTX 1080 GPU and ∼4 seconds on an Intel Xeon E5-1607 @ 3.10 GHz machine for the guided upsampling process. Note that the runtime of the guided upsampling step can be significantly reduced with a Halide implementation [53].
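The high-resolution inference pipeline can be outlined as follows. Here, bilateral_guided_upsample is a hypothetical stand-in for the method of Chen et al. [52] (it is not an OpenCV function), and model is assumed to map an 8-bit sRGB image to its corrected version.

```python
import cv2
import numpy as np

def correct_high_res(model, img, max_dim=512, grid_size=(22, 22, 8)):
    """Sketch: downsample, correct at low resolution, then transfer the edit
    back to full resolution with bilateral guided upsampling [52]."""
    h, w = img.shape[:2]
    scale = max_dim / max(h, w)
    small = cv2.resize(img, (round(w * scale), round(h * scale)),
                       interpolation=cv2.INTER_AREA)
    corrected_small = model(small)   # run the network on the downsampled image
    # Hypothetical helper implementing Chen et al. [52]: fit a bilateral grid of
    # local affine color transforms from (small -> corrected_small) and apply it
    # to the full-resolution input.
    return bilateral_guided_upsample(small, corrected_small, img, grid_size)
```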
5 Empirical Evaluation
We compare our method against several existing methods for exposure correction
and image enhancement. We first present quantitative results and comparisons
in Sec. 5.1, followed by qualitative comparisons in Sec. 5.2. Finally, we present
ablation studies performed to validate our architecture and loss function in Sec.
5.3.
Fig. 7: We evaluate the results of input images against all five expert photogra-
phers’ edits from the FiveK dataset [38].
To evaluate our method, we use our test set, which consists of 5,905 images
rendered with different exposure settings, as described in Sec. 3. Specifically, our
test set includes 3,543 well-exposed/overexposed images rendered with +0, +1,
and +1.5 relative EVs, and 2,362 underexposed images rendered with −1 and
−1.5 relative EVs.
We adopt the following three standard metrics to evaluate the pixel-wise
accuracy and the perceptual quality of our results: (i) peak signal-to-noise ratio
(PSNR), (ii) structural similarity index measure (SSIM) [54], and (iii) perceptual
index (PI) [55]. The PI is given by:

PI = 0.5 ((10 − Ma) + NIQE) ,

where both Ma [56] and NIQE [57] are no-reference image quality metrics.
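For reference, PSNR and SSIM can be computed with scikit-image as in the sketch below (recent releases expose channel_axis; older ones use multichannel=True), and the PI additionally requires Ma [56] and NIQE [57] scores, for which external implementations are assumed.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def pixelwise_metrics(result, reference):
    """PSNR/SSIM between an 8-bit corrected image and its reference."""
    psnr = peak_signal_noise_ratio(reference, result, data_range=255)
    ssim = structural_similarity(reference, result, channel_axis=-1, data_range=255)
    return psnr, ssim

def perceptual_index(ma_score, niqe_score):
    """PI = 0.5 * ((10 - Ma) + NIQE); lower values indicate better perceptual quality."""
    return 0.5 * ((10.0 - ma_score) + niqe_score)
```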
For the pixel-wise error metrics – namely, PSNR and SSIM – we compare the
results not only against the properly exposed rendered images by Expert C but
also with all five expert photographers in the MIT-Adobe FiveK dataset [38].
Though the expert photographers may render the same image in different ways
due to differences in the camera-based rendering settings (e.g., white balance,
tone mapping), a common characteristic over all rendered images by the expert
photographers is that they all have fairly proper exposure settings [38] (see
Fig. 7). For this reason, we evaluate our method against all five expert-rendered images, as they all represent satisfactorily exposed reference images.
We also evaluate a variety of previous non-learning and learning-based meth-
ods on our test set for comparison: histogram equalization (HE) [14], contrast-
limited adaptive histogram equalization (CLAHE) [16], the weighted variational
model (WVM) [59], the low-light image enhancement method (LIME) [3, 60],
HDR CNN [30], DPED models [35], deep photo enhancer (DPE) models [9], the
high-quality exposure correction method (HQEC) [4], RetinexNet [5], and deep
underexposed photo enhancer (UPE) [7]. To render the reconstructed HDR im-
ages generated by the HDR CNN method [30] back into LDR, we tested both the
deep reciprocating HDR transformation method (RHT) [34] and Adobe Photo-
shop’s (PS) HDR tool [58].
Fig. 8: Qualitative results of correcting overexposed images. Shown are the input
images, results from the DPED [35], our results, and the corresponding ground
truth images.
Table 1: Quantitative evaluation on our introduced test set. The best results
are highlighted with green and bold. The second- and third-best re-
sults are highlighted in yellow and red, respectively. We compare each
method with properly exposed reference image sets rendered by five expert pho-
tographers [38]. For each method, we present peak signal-to-noise ratio (PSNR),
structural similarity index measure (SSIM) [54], and perceptual index (PI) [55].
We denote methods designed for underexposure correction in gray. Non-deep
learning methods are marked by ∗. The terms U and S stand for unsupervised
and supervised, respectively. Notice that higher PSNR and SSIM values are
better, while lower PI values indicate better perceptual quality.
Fig. 10: Qualitative comparison with HDR CNN [30] and Zhang et al. [26].
Fig. 11: Qualitative comparison with Adobe Photoshop’s local adaptation HDR
function [58] and DPE [9]. Input images are taken from Flickr.
Fig. 12: Qualitative comparison with several existing methods in correcting par-
tially overexposed regions due to backlighting. Input image is from Flickr.
available; the result is taken directly from the original paper [26]. As shown, our
results are arguably visually superior to the other methods, even when input
images have hard backlight conditions, as shown in the second row in Fig. 9 and
the example in Fig. 12.
We also ran our model on several images from Flickr that are outside our
introduced dataset, as shown in Figs. 1, 11, and 12. As with the images from
our proposed dataset, our results on the Flickr images are arguably superior to
the compared methods.
Image set   | NPE [24]∗ | LIME [3]∗ | WVM [59]∗ | RNet [5] | KinD [6] | EGAN [63] | DBCP [64] | Ours w/o Ladv | Ours w/ Ladv
LIME [3]    | 3.91 | 4.16 | 3.79 | 4.42 | 3.72 | 3.72 | 3.78 | 3.76 | 3.76
NPE [24]    | 3.95 | 4.26 | 3.99 | 4.49 | 3.88 | 4.11 | 3.18 | 3.20 | 3.18
VV set [61] | 2.52 | 2.49 | 2.85 | 2.60 | -    | 2.58 | -    | 2.28 | 2.28
DICM [62]   | 3.76 | 3.85 | 3.90 | 4.20 | -    | -    | 3.57 | 2.55 | 2.50
Avg.        | 3.54 | 3.69 | 3.63 | 3.93 | 3.80 | 3.50 | 3.48 | 2.95 | 2.93
Fig. 13: Failure examples of correcting (top) overexposed and (bottom) under-
exposed images. The input images are taken from Flickr.
5.3.1 Loss Function Our loss function in Eq. 1 includes three main terms.
The first term is the standard reconstruction loss (i.e., L1 loss). The second and
third terms consist of the pyramid and adversarial losses, respectively, which are
introduced to further improve the reconstruction and perceptual quality of the
output images. In the following part of this section, we discuss the effect of these
loss terms.
Pyramid Loss Impact In Fig. 5, we show the output of each sub-network when
we train our model with and without the pyramid loss. We observe that the pyra-
mid loss helps to provide additional supervision to guide each sub-network to
follow a coarse-to-fine reconstruction. In this ablation study, we aim to quanti-
tatively evaluate the effect of the pyramid loss on our final results.
We train two light-weight models of our main network with and without our
pyramid loss term. Each model has four 3-layer U-Nets with a total of ∼4M
learnable parameters, where the number of output channels of the first encoder
in each U-Net is set to 24.
The training is performed on a sub-set of our training data for ∼150,000
iterations on 80,000 128×128 patches, ∼100,000 iterations on 40,000 256×256
patches, and ∼25,000 iterations on 25,000 512×512 patches. Table 3 shows the
results on 500 randomly selected images from our validation set. The results
Fig. 14: Comparison of results by varying the number of Laplacian pyramid lev-
els. Notice that higher PSNR and SSIM values are better, while lower PI values
indicate better perceptual quality.
Table 3: Results of our ablation study on 500 images randomly selected from
our validation set. We show the effects of: (i) the pyramid loss, Lpyr , and (ii)
the number of levels, n, in the main network. The best PSNR/SSIM values are
indicated with bold for each experiment.
show that the pyramid loss not only helps in providing a better interpretation
of the task of each sub-network but also improves the final results.
Fig. 15: Comparisons between our results with (w/) and without (w/o) the ad-
versarial loss for training. Notice that higher PSNR and SSIM values are better,
while lower PI values indicate better perceptual quality.
model in Sec. 5.3.1 to study the pyramid loss impact and the additional two
trained models have approximately the same number of parameters.
Table 3 shows the results obtained by each model on the same random val-
idation image subset used to study the pyramid loss impact in Sec. 5.3.1. Fig.
14 shows a qualitative comparison. As can be seen, the best quantitative and
qualitative results are obtained using the four-sub-net model (i.e., n = 4 levels).
6 Concluding Remarks
recting images rendered with exposure errors. We believe that our dataset will
help future work on improving exposure correction for photographs.
References
1. Peterson, B.: Understanding exposure: How to shoot great photographs with any
camera. AmPhoto Books (2016)
2. Karaimer, H.C., Brown, M.S.: A software platform for manipulating the camera
imaging pipeline. In: ECCV. (2016)
3. Guo, X., Li, Y., Ling, H.: LIME: Low-light image enhancement via illumination
map estimation. IEEE Transactions on Image Processing 26(2) (2017) 982–993
4. Zhang, Q., Yuan, G., Xiao, C., Zhu, L., Zheng, W.S.: High-quality exposure cor-
rection of underexposed photos. In: ACM MM. (2018)
5. Wei, C., Wang, W., Yang, W., Liu, J.: Deep retinex decomposition for low-light
enhancement. In: BMVC. (2018)
6. Zhang, Y., Zhang, J., Guo, X.: Kindling the darkness: A practical low-light image
enhancer. In: ACM International Conference on Multimedia. (2019)
7. Wang, R., Zhang, Q., Fu, C.W., Shen, X., Zheng, W.S., Jia, J.: Underexposed
photo enhancement using deep illumination estimation. In: CVPR. (2019)
8. Gharbi, M., Chen, J., Barron, J.T., Hasinoff, S.W., Durand, F.: Deep bilateral
learning for real-time image enhancement. ACM Transactions on Graphics (TOG)
36(4) (2017) 118:1–118:12
9. Chen, Y.S., Wang, Y.C., Kao, M.H., Chuang, Y.Y.: Deep photo enhancer: Unpaired
learning for image enhancement from photographs with GANs. In: CVPR. (2018)
10. Chen, C., Chen, Q., Xu, J., Koltun, V.: Learning to see in the dark. In: CVPR.
(2018)
11. Hu, Y., He, H., Xu, C., Wang, B., Lin, S.: Exposure: A white-box photo post-
processing framework. ACM Transactions on Graphics (TOG) 37(2) (2018) 26:1–
26:17
12. Hasinoff, S.W., Sharlet, D., Geiss, R., Adams, A., Barron, J.T., Kainz, F., Chen,
J., Levoy, M.: Burst photography for high dynamic range and low-light imaging
on mobile cameras. ACM Transactions on Graphics (TOG) 35(6) (2016) 1–12
13. Liba, O., Murthy, K., Tsai, Y.T., Brooks, T., Xue, T., Karnad, N., He, Q., Barron,
J.T., Sharlet, D., Geiss, R., Hasinoff, S.W., Pritch, Y., Levoy, M.: Handheld mobile
photography in very low light. ACM Transactions on Graphics (TOG) 38(6) (2019)
1–16
14. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Addison-Wesley Longman
Publishing Co., Inc. (2001)
15. Pizer, S.M., Amburn, E.P., Austin, J.D., Cromartie, R., Geselowitz, A., Greer,
T., ter Haar Romeny, B., Zimmerman, J.B., Zuiderveld, K.: Adaptive histogram
equalization and its variations. Computer Vision, Graphics, and Image Processing
39(3) (1987) 355–368
16. Zuiderveld, K.: Contrast limited adaptive histogram equalization. In: Graphics
Gems IV. (1994) 474–485
17. Celik, T., Tjahjadi, T.: Contextual and variational contrast enhancement. IEEE
Transactions on Image Processing 20(12) (2011) 3431–3441
18. Lee, C., Lee, C., Kim, C.S.: Contrast enhancement based on layered difference
representation of 2D histograms. IEEE Transactions on Image Processing 22(12)
(2013) 5372–5384
19. Yuan, L., Sun, J.: Automatic exposure correction of consumer photographs. In:
ECCV. (2012)
20. Yu, R., Liu, W., Zhang, Y., Qu, Z., Zhao, D., Zhang, B.: DeepExposure: Learning
to expose photos with asynchronously reinforced adversarial learning. In: NeurIPS.
(2018)
21. Park, J., Lee, J.Y., Yoo, D., So Kweon, I.: Distort-and-recover: Color enhancement
using deep reinforcement learning. In: CVPR. (2018)
22. Land, E.H.: The retinex theory of color vision. Scientific American 237(6) (1977)
108–129
23. Jobson, D.J., Rahman, Z., Woodell, G.A.: A multiscale retinex for bridging the
gap between color images and the human observation of scenes. IEEE Transactions
on Image Processing 6(7) (1997) 965–976
24. Wang, S., Zheng, J., Hu, H.M., Li, B.: Naturalness preserved enhancement algo-
rithm for non-uniform illumination images. IEEE Transactions on Image Process-
ing 22(9) (2013) 3538–3548
25. Meylan, L., Susstrunk, S.: High dynamic range image rendering with a retinex-
based adaptive filter. IEEE Transactions on Image Processing 15(9) (2006) 2820–
2830
26. Zhang, Q., Nie, Y., Zheng, W.S.: Dual illumination estimation for robust exposure
correction. In: Computer Graphics Forum. (2019)
27. Mertens, T., Kautz, J., Van Reeth, F.: Exposure fusion: A simple and practical
alternative to high dynamic range photography. In: Computer Graphics Forum.
(2009)
28. Kalantari, N.K., Ramamoorthi, R.: Deep high dynamic range imaging of dynamic
scenes. ACM Transactions on Graphics (TOG) 36(4) (2017) 144:1–144:12
29. Endo, Y., Kanamori, Y., Mitani, J.: Deep reverse tone mapping. ACM Transactions
on Graphics (TOG) 36(6) (2017) 177:1–177:10
30. Eilertsen, G., Kronander, J., Denes, G., Mantiuk, R., Unger, J.: HDR image
reconstruction from a single exposure using deep CNNs. ACM Transactions on
Graphics (TOG) 36(6) (2017) 178:1–178:15
31. Moriwaki, K., Yoshihashi, R., Kawakami, R., You, S., Naemura, T.: Hybrid loss for
learning single-image-based HDR reconstruction. arXiv preprint arXiv:1812.07134
(2018)
32. Debevec, P.E., Malik, J.: Recovering high dynamic range radiance maps from
photographs. In: ACM SIGGRAPH. (1997)
33. Cai, J., Gu, S., Zhang, L.: Learning a deep single image contrast enhancer from
multi-exposure images. IEEE Transactions on Image Processing 27(4) (2018)
2049–2062
34. Yang, X., Xu, K., Song, Y., Zhang, Q., Wei, X., Lau, R.W.: Image correction via
deep reciprocating HDR transformation. In: CVPR. (2018)
35. Ignatov, A., Kobyshev, N., Timofte, R., Vanhoey, K., Van Gool, L.: DSLR-quality
photos on mobile devices with deep convolutional networks. In: ICCV. (2017)
36. Ignatov, A., Kobyshev, N., Timofte, R., Vanhoey, K., Van Gool, L.: WESPE:
Weakly supervised photo enhancer for digital cameras. In: CVPR Workshops.
(2018)
37. Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. Journal of Machine
Learning Research 9 (2008) 2579–2605
38. Bychkovsky, V., Paris, S., Chan, E., Durand, F.: Learning photographic global
tonal adjustment with a database of input / output image pairs. In: CVPR. (2011)
39. Adobe: Color and camera raw. https://helpx.adobe.com/ca/photoshop-elements/using/color-camera-raw.html Accessed: 2020-03-05.
40. Schewe, J., Fraser, B.: Real World Camera Raw with Adobe Photoshop CS5.
Pearson Education (2010)
41. Afifi, M., Price, B., Cohen, S., Brown, M.S.: When color constancy goes wrong:
Correcting improperly white-balanced images. In: CVPR. (2019)
42. Burt, P., Adelson, E.: The Laplacian pyramid as a compact image code. IEEE
Transactions on Communications 31(4) (1983) 532–540
43. Denton, E.L., Chintala, S., Szlam, A., Fergus, R.: Deep generative image models
using a Laplacian pyramid of adversarial networks. In: NeurIPS. (2015)
44. Shaham, T.R., Dekel, T., Michaeli, T.: SinGAN: Learning a generative model from
a single natural image. In: ICCV. (2019)
45. Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep Laplacian pyramid networks
for fast and accurate super-resolution. In: CVPR. (2017)
46. Ma, R., Hu, H., Xing, S., Li, Z.: Efficient and fast real-world noisy image denoising
by combining pyramid neural network and two-pathway unscented Kalman filter.
IEEE Transactions on Image Processing 29(1) (2020) 3927–3940
47. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomed-
ical image segmentation. In: MICCAI. (2015)
48. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
Courville, A., Bengio, Y.: Generative adversarial nets. In: NeurIPS. (2014)
49. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-
level performance on ImageNet classification. In: ICCV. (2015)
50. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980 (2014)
51. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided
person image generation. In: NeurIPS. (2017)
52. Chen, J., Adams, A., Wadhwa, N., Hasinoff, S.W.: Bilateral guided upsampling.
ACM Transactions on Graphics (TOG) 35(6) (2016) 1–8
53. Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S.:
Halide: A language and compiler for optimizing parallelism, locality, and recom-
putation in image processing pipelines. In: ACM SIGPLAN Conference on Pro-
gramming Language Design and Implementation. (2013)
54. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assess-
ment: From error visibility to structural similarity. IEEE Transactions on Image
Processing 13(4) (2004) 600–612
55. Blau, Y., Mechrez, R., Timofte, R., Michaeli, T., Zelnik-Manor, L.: The 2018 PIRM
challenge on perceptual image super-resolution. In: ECCV Workshops. (2018)
56. Ma, C., Yang, C.Y., Yang, X., Yang, M.H.: Learning a no-reference quality metric
for single-image super-resolution. Computer Vision and Image Understanding 158
(2017) 1–16
57. Mittal, A., Soundararajan, R., Bovik, A.C.: Making a “completely blind” image
quality analyzer. IEEE Signal Processing Letters 20(3) (2012) 209–212
58. Dayley, L.D., Dayley, B.: Photoshop CS5 Bible. John Wiley & Sons (2010)
59. Fu, X., Zeng, D., Huang, Y., Zhang, X.P., Ding, X.: A weighted variational model
for simultaneous reflectance and illumination estimation. In: CVPR. (2016)
60. Guo, X.: LIME: A method for low-light image enhancement. In: ACM MM. (2016)
61. Vonikakis, V.: Busting image enhancement and tone-mapping algorithms. https://sites.google.com/site/vonikakis/datasets Accessed: 2020-03-05.
62. Lee, C., Lee, C., Kim, C.S.: Contrast enhancement based on layered difference
representation. In: ICIP. (2012)
63. Jiang, Y., Gong, X., Liu, D., Cheng, Y., Fang, C., Shen, X., Yang, J., Zhou, P.,
Wang, Z.: EnlightenGAN: Deep light enhancement without paired supervision.
arXiv preprint arXiv:1906.06972 (2019)
64. Lee, H., Sohn, K., Min, D.: Unsupervised low-light image enhancement using bright
channel prior. IEEE Signal Processing Letters 27 (2020) 251–255