Weber - 16 - Rapid, Detail-Preserving Image Downscaling
Nicolas Weber 1,2    Michael Waechter 1    Sandra C. Amend 1    Stefan Guthe 1    Michael Goesele 1,2
1 TU Darmstadt
[Figure 1 row labels: Bicubic, Bilateral, Subsampling, Gaussian, Lanczos, DPID λ=0.5]
Figure 1: Row 1: Input images with 0.5, 1.9, 2.7, and 4.6 megapixels respectively. Rows 2-5: Downscaled results with 128 pixels width.
Our algorithm (DPID) preserves stars in Example 1, thin lines in Example 2, roof tiles in Example 3, and text, lines and notes in Example 4.
Abstract
Image downscaling is arguably the most frequently used image processing tool. We present an algorithm based on convolutional filters
where input pixels contribute more to the output image the more
their color deviates from their local neighborhood, which preserves
visually important details. In a user study we verify that users prefer our results over related work. Our efficient GPU implementation works in real-time when downscaling images from 24 M to
70 k pixels. Further, we demonstrate empirically that our method
can be successfully applied to videos.
Introduction
Throughout this paper we will focus on downscaling megapixel images to much smaller sizes. As argued above, this is a very important use case. It requires large downscaling factors, which are considerably more challenging than small factors. Both the algorithm design (with respect to runtime) and the evaluation with a user study must keep this scenario in mind.
Ideally, a downscaling algorithm retains the original image's impression, preserves fine details, and is memory and time efficient so that it can run on mobile devices or handle the gigantic amounts of data on photo and video web platforms. Traditionally, downscaling is based on convolutional filtering, but Öztireli and Gross [2015] showed that such approaches lose visually important high-frequency details during downscaling. Kopf et al. proposed a filtering approach that adjusts the filter kernels to the local image content; aggressive filtering avoids aliasing, while blur leads to loss of important details. Öztireli and Gross's experiments [2015, Figure 12] showed that users rank naïve image subsampling as runner-up behind their algorithm. It seems as if humans may have a hard time telling information and noise apart. For the time being we use this as justification for our approach, and later demonstrate in a user study that on average people prefer our results over related work.
Our paper's contributions are as follows: We present an image downscaling algorithm that preserves high-frequency details, even during strong downscaling (i.e., with large scaling factors), and can be applied to videos with hardly any temporal artifacts. It is based on two convolutional filters, but its algorithmic complexity is only linear in the number of input pixels, independent of the filter kernel size. One major benefit of our filter-based algorithm is that it is implementable with fine-grained parallelism on SIMD hardware. We demonstrate this in a GPU version running in real-time even on 24 Mpx images.
Related Work
as possible, Öztireli and Gross [2015] optimize the downscaled image to be close to the original in terms of the structural similarity index (SSIM). However, since this approach handles color channels individually, it may produce wrong colors (see the comic art in Figure 5d). It also has issues when downscaling line art (Figure 5e).
Algorithm
A standard convolutional downscaling filter computes each output pixel p as a normalized sum over the input pixels q in its kernel support \Omega(p):

O(p) = \frac{1}{k_p} \sum_{q \in \Omega(p)} I(q)    (1)

Our filter additionally weights every input pixel by how strongly its color deviates from the guidance image \bar{I} (a smoothed version of the input at the output resolution):

O(p) = \frac{1}{k_p} \sum_{q \in \Omega(p)} I(q) \left( \frac{\lVert I(q) - \bar{I}(p) \rVert_2}{V_{\max}} \right)^{\lambda}    (2)

Here k_p is the sum of the weights within the patch, V_{\max} is the largest possible color distance, and \lambda controls how strongly distinct pixels are emphasized.
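To make the aggregation in Equation 2 concrete, the following is a minimal, unoptimized Python sketch (our own illustration, not the paper's GPU implementation): the guidance image is approximated by a per-patch box average, each input pixel is visited exactly once (linear complexity), and function and parameter names are ours.

```python
import numpy as np

def dpid_downscale(img, out_h, out_w, lam=1.0):
    """Sketch of detail-preserving downscaling (Equation 2).

    img: float array (H, W, 3) with values in [0, 1].
    lam: the lambda exponent of the range kernel (lam = 0 degenerates
    to a plain box filter)."""
    h, w, _ = img.shape
    v_max = np.sqrt(3.0)  # largest L2 distance between two RGB colors in [0, 1]^3

    # Map every input pixel to its output cell (disjoint patches).
    ys = (np.arange(h) * out_h) // h
    xs = (np.arange(w) * out_w) // w

    # Guidance image: box-filtered downscale (per-patch average).
    guide = np.zeros((out_h, out_w, 3))
    counts = np.zeros((out_h, out_w, 1))
    for y in range(h):
        for x in range(w):
            guide[ys[y], xs[x]] += img[y, x]
            counts[ys[y], xs[x]] += 1
    guide /= counts

    # Weighted aggregation: weight = (||I(q) - guide(p)|| / v_max) ** lam.
    acc = np.zeros((out_h, out_w, 3))
    k = np.zeros((out_h, out_w, 1))
    for y in range(h):
        for x in range(w):
            p = (ys[y], xs[x])
            wgt = (np.linalg.norm(img[y, x] - guide[p]) / v_max) ** lam
            acc[p] += wgt * img[y, x]
            k[p] += wgt

    # Where all weights are zero (perfectly uniform patch), fall back to the box value.
    return np.where(k > 0, acc / np.maximum(k, 1e-12), guide)
```

For a 2×2 patch with one white and three black pixels, the box filter yields 0.25 while this weighting yields 0.5, i.e., the distinct bright pixel is emphasized rather than averaged away.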
Figure 2: An input image (not shown) downscaled with a factor d = 42. Top: Guidance image obtained with our approximation (Ī), a 3d × 3d Gaussian (I_Gauss), and a 3d × 3d box filter (I_Box), and the output images (O, O_IGauss, O_IBox) obtained with these three guidance images. Bottom: Differences (L2 norm over 8-bit R, G, and B channels) between the guidance images respectively the output images.
When λ increases, the set of curves splits into three subsets (most easily seen in the green graph): The first subset corresponds to pixels whose distinctness is maximal within their patch. For λ → ∞, such a pixel eventually dominates its patch, its patch's k_p converges to its range kernel value, and its weight thus converges to 1 (or 1/m if there are m maximally distinct pixels within a patch). The second subset are pixels whose distinctness is smaller than the average within their patch. When λ increases, their weight monotonically converges to 0. The third subset are pixels whose distinctness is greater than the average but not the maximum within their patch. Increasing λ first increases their weight up to a turning point. After this, the patch's pixel with the maximal distinctness (first subset) starts dominating, and the weight of the non-maximal pixel monotonically converges to 0.
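The three regimes can be reproduced with a tiny numeric sketch (our own illustration; the distinctness values are hypothetical): normalized weights d^λ / Σ d^λ for one maximal, one above-average, and two below-average pixels.

```python
import numpy as np

# Hypothetical distinctness values of four pixels in one patch:
# one maximal (0.9), one above-average (0.6), two below-average (0.2, 0.1).
d = np.array([0.9, 0.6, 0.2, 0.1])

def normalized_weights(d, lam):
    """Range-kernel weights d**lam, normalized by their sum k_p."""
    w = d ** lam
    return w / w.sum()

for lam in (0.0, 0.5, 2.0, 32.0):
    print(lam, np.round(normalized_weights(d, lam), 3))
```

At λ = 0 every pixel gets weight 1/4 (box filter); as λ grows, the maximal pixel's weight converges to 1, the below-average pixels' weights fall monotonically, and the above-average (but non-maximal) pixel's weight first rises and then falls.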
Figure 4: Left: Three input image patches. Right: The weights for each of these patches' pixels with respect to λ (curves for λ = 0, 0.2, 0.5, 0.7, 1.0, 2.0).
Results
[Results figure column labels: Input image, Kopf et al. [2013], Öztireli and Gross [2015], DPID λ=1.0, DPID λ=0.5]
Öztireli and Gross [2015]. Further, we provide a short sequence using Kopf et al.'s [2013] algorithm. To downscale this short, 15-second-long video, Kopf et al.'s algorithm required more than 16 hours. Although Kopf et al. stated that their algorithm is not suited for videos, it shows very few temporal artifacts.
User Study
Figure 6: User preference for our algorithm vs. related work. Each
data point is an average over 26 study participants and 20 input
images. The error bars indicate the 95 % confidence interval for
both our approach and others.
Runtime
Figure 8: Images where our algorithm has been chosen (a) most and (b) least frequently. On image (a) our algorithm keeps the stars and light rays intact and is superior to the other algorithms (see preferences in (c)). On (b) our algorithm overemphasizes the stars and was chosen about 50 % of the time (see preferences in (d)).
Conclusion

In this paper we presented an algorithm that preserves visually important details in downscaled images and is especially suited for
large downscaling factors. In a user study we showed that in a very
heterogeneous group with varying image processing experience our
algorithm was usually preferred over algorithms from related work.
Further, as shown in our supplemental material, the algorithm can
also be applied to videos with hardly any temporal artifacts.
Our algorithm handles noise well if its distribution around the mean is not skewed. Our algorithm's most striking limitations are the fattening of thin edges/dots and aliasing. This is a consequence of our definition of detail: the range kernel in Equation 2 is not an ideal low-pass filter and does not remove frequency content above the Nyquist limit, which may introduce aliasing. While this may contravene established filtering theory, our study showed that users actually prefer this, at least on still images.
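The sensitivity to skewed noise can be illustrated with a small single-channel sketch (our own illustration, not the paper's code): the range-kernel weight |x − mean|^λ is symmetric, so symmetric noise cancels out in the weighted average, whereas a right-skewed distribution gives its heavy tail both larger deviations and larger weights, biasing the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def dpid_value(patch, lam=1.0):
    """One output intensity from a patch of scalar intensities
    (single-channel sketch of Equation 2; guidance = patch mean)."""
    guide = patch.mean()
    w = np.abs(patch - guide) ** lam
    return (w * patch).sum() / w.sum() if w.sum() > 0 else guide

base = 0.5
symmetric = base + rng.normal(0.0, 0.05, 100_000)        # zero-mean, symmetric noise
skewed = base + rng.exponential(0.05, 100_000) - 0.05    # zero-mean, right-skewed noise

print(dpid_value(symmetric))  # stays close to 0.5
print(dpid_value(skewed))     # biased above 0.5
```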
Figure 9: Runtimes for various input and output image sizes. Each data point is an average of 100 runs on a GTX 680. Top: 126 input images with 0.16 to 24 Mpx, all downscaled to 128 px width. The high time for memcpy is due to PCIe 2.0's bandwidth of 8 GB/s, compared to the 192 GB/s bandwidth of the GPU memory. Bottom: One 24 Mpx input image downscaled to 128 px, . . . , 3 Mpx.
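The memcpy bottleneck follows directly from the bandwidth numbers in the caption; a back-of-the-envelope check (assuming 4 bytes per pixel, which is our assumption, not stated in the caption):

```python
# Rough transfer-time estimate for a 24 Mpx image at 4 bytes per pixel (RGBA).
# Bandwidth figures (8 GB/s PCIe 2.0, 192 GB/s GPU memory) are from Figure 9's caption.
bytes_total = 24e6 * 4

pcie_ms = bytes_total / 8e9 * 1e3    # host -> GPU copy over PCIe 2.0
gpu_ms = bytes_total / 192e9 * 1e3   # one pass over GPU device memory

print(pcie_ms, gpu_ms)  # 12.0 0.5
```

So a single PCIe transfer of the input already costs roughly 12 ms, an order of magnitude more than reading it once from device memory, which matches the large memcpy share in the plot.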
Figure 10: Comparison against Kopf et al. [2013] and Öztireli and Gross [2015] over downscaling factors 20 to 100 (plot data lost in extraction; legend: Kopf et al. [2013], Öztireli and Gross [2015], DPID λ=1.0, DPID λ=0.5; x-axis: downscaling factor). Caption fragment: "... Öztireli and Gross [2015], assuming that all operations are purely memory bandwidth limited."
best for a given image. This requires a user study with thousands of input images and various λs to obtain label annotations.
Figure 11: Input (1st row) and downscaling results (2nd-4th rows) for Images 11, 20, and 2 of our user study. For Image 11 users preferred λ = 1.0 over λ = 0.5, for Image 20 they preferred the opposite, and for Image 2 they had no preference for either.
Acknowledgements
References
Beghdadi, A., Larabi, M.-C., Bouzerdoum, A., and Iftekharuddin, K. 2013. A survey of perceptual image processing methods. Signal Processing: Image Communication 28, 8.

Duchon, C. E. 1979. Lanczos filtering in one and two dimensions. Journal of Applied Meteorology 18, 8.

Eisemann, E., and Durand, F. 2004. Flash photography enhancement via intrinsic relighting. In SIGGRAPH.

Kopf, J., Cohen, M., Lischinski, D., and Uyttendaele, M. 2007. Joint bilateral upsampling. In SIGGRAPH.