2 Digital Image Representation: Objectives For Chapter 2
2.1 Introduction
Digital images are created by three basic methods: bitmapping, vector graphics,
and procedural modeling. Bitmap images (also called pixmaps or raster graphics) are
created with a pixel-by-pixel specification of points of color. Bitmaps are commonly
created by digital cameras, scanners, paint programs like Corel Paint Shop Pro, and
image processing programs like Adobe Photoshop. Vector graphic images – created in
programs such as Adobe Illustrator and Corel Draw – use object specifications and
mathematical equations to describe shapes to which colors are applied. A third way to
create digital images is by means of procedural modeling – also called algorithmic art
because of its aesthetic appeal – where a computer program uses some combination of
mathematics, logic, control structures, and recursion to determine the color of pixels and
thereby the content of the overall picture. Fractals and Fibonacci spirals are examples of
algorithmic art.
Chapter 2: Digital Image Representation, draft, 4/14/2011
Bitmaps are appropriate for photographic images, where colors change subtly and
frequently in small gradations. Vector graphic images are appropriate for cleanly
delineated shapes and colors, like cartoon or poster pictures. Procedurally modeled
images are algorithmically interesting in the way they generate complex patterns, shapes,
and colors in nonintuitive ways. All three methods of digital image creation will be
discussed in this chapter, with the emphasis on how pixels, colors, and shapes are
represented. Techniques for manipulating and compressing digital images are discussed
in Chapter 3.
2.2 Bitmaps

2.2.1 Digitization

Aside: A bitmap is a pixel-by-pixel specification of points of color in an image file. The prefix bit seems to imply that each pixel is represented in only one bit and thus could have only two possible values, 0 or 1, usually corresponding to black and white (though it could be any two colors). It seems that the word pixmap, short for pixel map, would be more descriptive, but this word has never really caught on. Most people use the term bitmap even when referring to images that use more than one bit per pixel.

A bitmap is a two-dimensional array of pixels describing a digital image. Each pixel, short for picture element, is a number representing the color at position (r, c) in the bitmap, where r is the row and c is the column.
There are three main ways to
create a bitmap. One is through software, by means of a paint program. With such a
program, you could paint your picture one pixel at a time, choosing a color for a pixel,
then clicking on that pixel to apply the color. More likely, you would drag the mouse to
paint whole areas with brush strokes, a less tedious process. In this way, you could create
your own artwork – whatever picture you hold in your imagination – as a bitmap image.
More commonly, however, bitmap images are reproductions of scenes and objects
we perceive in the world around us. One way to create a digital image from real world
objects or scenes is to take a snapshot with a traditional analog camera, have the film
developed, and then scan the photograph with a digital scanner – creating a bitmap image
that can be stored on your computer.
A third and more direct route is to shoot the image with a digital camera and
transfer the bitmap to your computer. Digital cameras can have various kinds of memory
cards – sometimes called flash memory – on which the digital images are stored. You
can transfer the image to your computer either by making a physical connection (e.g.,
USB) between the camera and the computer or by inserting the camera's memory card
into a slot on your computer, and then downloading the image file. Our emphasis in this
book will be on bitmap images created from digital photography.
Digital cameras use the same digitization process discussed in Chapter 1. This
digitization process always reduces to two main steps: sampling and quantization.
Sampling rate for digital cameras is a matter of how many points of color are sampled
and recorded in each dimension of the image. You generally have some choices in this
regard. For example, a digital camera might allow you to choose from 1600 × 1200, 1280
× 960, 1024 × 768, and 640 × 480. Some cameras offer no choice.
In digital cameras, quantization is a matter of the color model used and the
corresponding bit depth. We will look at color models more closely later in this chapter.
Suffice it to say for now that digital cameras generally use RGB color, which saves each
pixel in three bytes, one for each of the color channels: red, green, and blue. (A higher
bit depth is possible in RGB, but three bytes per pixel is common.) Since three bytes is 24 bits, this makes it possible for 2^24 = 16,777,216 colors to be represented.
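As a quick check, the number of colors representable at a given bit depth is 2 raised to that depth. A minimal sketch (the function name is our own):

```python
# Number of distinct colors representable at a given bit depth.
def num_colors(bits_per_pixel):
    return 2 ** bits_per_pixel

# 24-bit RGB color: 8 bits each for red, green, and blue.
print(num_colors(24))   # 16777216
```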
Both sampling and quantization can introduce error in the sense that the image
captured does not represent, with perfect fidelity, the original scene or objects that were
photographed. If you don't take enough samples over the area being captured, the image
will lack clarity. The larger the area represented by a pixel, the blurrier the picture
because subtle transitions from one color to the next cannot be captured.
We illustrate this with the figures below. Imagine that you are looking at a
natural scene like the one pictured in Figure 2.1. Suppose that you divide the image into
15 rows and 20 columns. This gives you 15 × 20 rectangular sample areas. You (a
hypothetical camera) sample the image once per rectangle, using as your sample value the average color in that rectangle. If you then create the image using the sample values,
the image (Figure 2.2) obviously lacks detail. It is blocky and unclear.
Figure 2.2 Image, undersampled
Figure 2.3 Image, reduced bit depth
A low bit depth, on the other hand, can result in patchiness of color. The bit depth
used by digital cameras is excellent for color detail. However, after taking a picture and
loading it onto your computer, you can work with it in an image processing program and
from there reduce its bit depth. (You might do this to reduce the file size.) Consider this
exaggerated example, to make the point. If you reduce the number of colors in the boat
picture from a maximum of 16,777,216 to a maximum of 12, you get the image in Figure
2.3. Whole areas of color have become a single color, as you can see in the clouds. This
is the effect of quantization error on a digital image.
The term pixel is used both for a value stored in an image file and for a physical point of light on a display screen, and context must distinguish between the two usages. When you display a bitmap image on your computer, the logical pixel – that is, the number representing a color and stored for a given position in the image file – is mapped to a physical pixel on the computer screen.
For an image file, the pixel dimensions are defined as the number of pixels horizontally (i.e., width, w) and vertically (i.e., height, h), denoted w × h. For example,
your digital camera might take digital images with pixel dimensions of 1600 × 1200.
Similarly, your computer screen has fixed maximum pixel dimensions – e.g., 1024 × 768 or 1400 × 1050.
Digital cameras are advertised as offering a certain number of megapixels. The
megapixel value is derived from the maximum pixel dimensions allowable for pictures
taken with the camera. For example, suppose the largest picture you can take, in pixel
dimensions, for a certain camera is 2048 × 1536. That's a total of 3,145,728 pixels. That
makes this camera a 3 megapixel camera (approximately). (Be careful. Camera makers sometimes exaggerate their cameras' megapixel values by including such things as "digital zoom," a software method for increasing the number of pixels without really improving the clarity.)
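The megapixel arithmetic above can be sketched as follows (a hypothetical helper, not part of any camera's software):

```python
# Approximate megapixel rating from maximum pixel dimensions.
def megapixels(width, height):
    return width * height / 1_000_000

# 2048 x 1536 pixels -> 3,145,728 pixels, marketed as "3 megapixels".
print(megapixels(2048, 1536))   # 3.145728
```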
Resolution is defined as the number of pixels in an image file per unit of spatial
measure. For example, resolution can be measured in pixels per inch, abbreviated ppi. It
is assumed that the same resolution applies in the horizontal and vertical directions, so a 200 ppi image file will print out using 200 pixels to determine the colors for each inch in both the horizontal and vertical directions.
Resolution of a printer is a matter of how many dots of color it can print over an area. A common measurement is dots per inch (DPI). For example, an inkjet printer might be able to print a maximum of 1440 DPI. The printer and its software map the pixels in an image file to the dots of color printed. There may be more or fewer pixels per inch than dots printed. You should take a printer's resolution into consideration when you create an image to be printed. There's no point to having more image resolution than the printer can accommodate.

Aside: Unfortunately, not everyone makes a distinction between pixel dimensions and resolution, the term resolution commonly being used for both. For example, your computer display's pixel dimensions or the pixel dimensions of an image may be called their resolution (with no implication of "per inch"). The confusion goes even further. Resolution is sometimes used to refer to bit depth and is thus related to the number of colors that can be represented in an image file. And the documentation for some cameras uses the term image size where we would say pixel dimensions. Be alert to these differences in usage. It is the concepts that are important.
Image size is defined as the
physical dimensions of an image when it is printed out or displayed on a computer, e.g. in
inches (abbreviated ") or centimeters. By this definition, image size is a function of the
pixel dimensions and resolution, as follows:
image width = w/r and image height = h/r
where w × h are the pixel dimensions of the image and r is the resolution
For example, if you have an image that is 1600 × 1200 and you choose to print it out at
200 ppi, it will be 8" × 6".
You can also speak of the image size as the image appears on a computer display.
For an image with pixel dimensions w × h and resolution r, the displayed image size is,
as before, w/r × h/r. However, in this case r is the display screen's resolution. For
example, if your computer display screen has pixel dimensions of 1400 × 1050 and it is
12" × 9", then the display has a resolution of about 117 ppi. Thus, a 640 × 480 image,
when shown at 100% magnification, will be about 5½"× 4". This is because each logical
pixel is displayed by one physical pixel on the screen.
The original pixel dimensions of the image file depend on how you created the
image. If the image originated as a photograph, its pixel dimensions may have been
constrained by the allowable settings in your digital camera or scanner that captured it.
The greater the pixel dimensions of the image, the more faithful the image will be to the scene captured. A 300 × 400 image will not be as crisp and detailed as a 900 × 1200 image of the same subject.
You can see the results of pixel dimensions in two ways. First, with more pixels to work with, you can make a larger print and still have sufficient resolution for the printed copy. Usually, you'll want your image to be printed at a resolution between 100 and 300 ppi, depending on the type of printer you use. (Check the specifications of your printer to see what is recommended.) The printed size is w/r × h/r, so the bigger the r, the smaller the print. For example, if you print your 300 × 400 image out at 100 ppi, it will be 3" × 4". If you print it out at 200 ppi, it will be 1½" × 2".
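The size calculation in the examples above can be sketched in a few lines (the function name is ours; the same formula gives displayed size if r is the screen's resolution):

```python
# Printed (or displayed) size in inches: pixel dimensions divided by
# resolution in pixels per inch (ppi).
def image_size(width_px, height_px, ppi):
    return (width_px / ppi, height_px / ppi)

print(image_size(1600, 1200, 200))   # (8.0, 6.0)
print(image_size(300, 400, 200))     # (1.5, 2.0)
```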
The second way you see the result of pixel dimensions is in the size of the image
on your computer display. Since logical pixels are mapped to physical pixels, the more
pixels in the image, the larger the image on the display. A 300 × 400 image will be 1/3 the size of a 900 × 1200 image on the display in each dimension. It's true that you can ask the
computer to magnify the image for you, but this won't create detail that wasn't captured in
the original photograph. If you magnify the 300 × 400 image by 300% and compare it to
an identical image that was originally taken with pixel dimensions of 900 × 1200, the
magnified image won't look as clear. It will have jagged edges.
Thus, when you have a choice of the pixel dimensions of your image, you need to
consider how you'll be using the image. Are you going to be viewing it from a computer
or printing it out? How big do you want it to be, either in print size or on the computer
screen? If you're printing it out, what resolution do you want to use for the print? With
the answers to these questions in mind, you can choose appropriate pixel dimensions based on the
choices offered by your camera and the amount of memory you have for storing the
image.
There are times when you can't get exactly the pixel dimensions you want.
Maybe your camera or scanner has limitations, maybe you didn't take the picture
yourself, or maybe you want to crop the picture to cut out just the portion of interest.
(Cropping, in an image processing program, is simply cutting off part of the picture,
discarding the unwanted pixels.) Changing the number of pixels in an image is called
resampling. You can increase the pixel dimensions by upsampling or decrease the
dimensions by downsampling, and you may have valid reasons for doing either. But
keep in mind that resampling always involves some kind of interpolation, averaging, or
estimation, and thus it cannot improve the quality of an image in the sense of making it
any more faithful to the picture being represented. The additional pixels created by
upsampling are just "estimates" of what the original pixel values would have been if you
had originally captured the image at higher pixel dimensions, and pixel values you get
from downsampling are just averages of the information you originally captured. In
Chapter 3, we'll look more closely at how resampling is done.
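The ideas behind downsampling (averaging) and upsampling (interpolation) can be sketched for a single row of pixels. This is a simplified illustration of our own, not the resampling algorithm of any particular image processing program:

```python
# Resampling a single row of grayscale pixel values. Downsampling
# averages groups of samples; upsampling interpolates between existing
# samples. Neither can restore detail that was never captured.

def downsample(row, factor):
    # Average each group of `factor` adjacent pixels.
    return [sum(row[i:i + factor]) / factor
            for i in range(0, len(row) - factor + 1, factor)]

def upsample(row, factor):
    # Linear interpolation between neighboring pixels.
    out = []
    for i in range(len(row) - 1):
        for k in range(factor):
            t = k / factor
            out.append(row[i] * (1 - t) + row[i + 1] * t)
    out.append(row[-1])
    return out

row = [0, 0, 0, 153, 255, 255, 220, 220]
print(downsample(row, 2))     # [0.0, 76.5, 255.0, 220.0]
print(upsample([0, 100], 4))  # [0.0, 25.0, 50.0, 75.0, 100]
```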
The lack of clarity that results from a sampling rate that is too low for the detail of
the image is an example of aliasing. You may recall from Chapter 1 that aliasing is a
phenomenon where one thing "masquerades" as another. In the case of digital imaging, if
the sampling rate is too low, then the image takes on a shape or pattern different from what was actually being photographed – blockiness, blurriness, jagged edges, or moiré patterns
(which, informally defined, are patterns that are created from two other patterns
overlapping each other at an angle). Similarly, a digitized sound might adopt false
frequencies heard as false pitches, or digitized video might demonstrate motion not true
to the original, like spokes in a bicycle wheel rotating backwards. As explained in
Chapter 1, the Nyquist theorem tells us that aliasing results when the sampling rate is not
at least twice the frequency of the highest frequency component of the image (or sound or
video) being digitized. To understand this theorem, we need to be able to think of digital
images in terms of frequencies, which may seem a little counterintuitive. It isn't hard to
understand frequency with regard to sound, since we're accustomed to thinking of sound
as a wave and relating the frequency of the sound wave to the pitch of the sound. But
what is frequency in the realm of digital images?
Figure 2.4 An image in which color varies continuously from light gray to dark gray and back again
Figure 2.5 Graph of function y = f(x) for one line of color across the image (grayscale value vs. pixel position). f is assumed to be continuous and periodic.
Figure 2.6 [Graph of grayscale value vs. pixel position for one row of the image]
Figure 2.9 Graph of one row of sparrow bitmap over the spatial domain (grayscale value vs. pixel position)
[Two surface plots of grayscale value over horizontal and vertical pixel position]
f(x) = ∑_{n=0}^{∞} aₙ cos(nωx)
f (x) is a continuous function over the spatial domain whose graph takes the form of a
complex waveform. The equation says that it is equal to an infinite sum of cosine waves
that make up its frequency components. Our convention will be to use ω to represent
angular frequency, where ω = 2πf and f is the fundamental frequency of the wave. As n
varies, we move through these frequency components, from the fundamental frequency
on through multiples of it. aₙ is the amplitude for the nth cosine frequency component.
We already showed you a simple example of this in Chapter 1. The figure below
shows a sound wave that is the sum of three simple sine waves. (All other frequency
components other than those shown have amplitude 0 at all points.)
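The sum-of-cosines idea can be sketched numerically; the amplitudes and fundamental frequency below are arbitrary examples of our own, not taken from the figure:

```python
import math

# Sum of cosine components: f(x) = sum over n of a_n * cos(n * w * x).
def synthesize(amplitudes, w, x):
    return sum(a * math.cos(n * w * x) for n, a in enumerate(amplitudes))

# A waveform built from a fundamental plus two harmonics.
a = [0.0, 1.0, 0.5, 0.25]   # amplitudes a_0 .. a_3
w = 2 * math.pi             # angular frequency for a 1 Hz fundamental
samples = [synthesize(a, w, t / 100) for t in range(100)]
```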
Let's look at how this translates to the discrete world of digital images. We
continue to restrict our discussion to the one-dimensional case – considering a single line
of pixels across a digital image, like the one pictured in Figure 2.13. Any row of M
pixels can be represented as a sum of the M weighted cosine functions evaluated at
discrete points. This is expressed in the following equation:
f(r) = ∑_{u=0}^{M−1} √(2/M) · C(u) · F(u) · cos((2r+1)uπ / (2M))
where C(δ) = √2/2 if δ = 0; otherwise C(δ) = 1
Equation 2.1
You can understand Equation 2.1 in this way. f(r) is a one-dimensional array of M pixel
values. For Figure 2.13, these values would be [0, 0, 0, 153, 255, 255, 220, 220]. F(u)
is a one-dimensional array of coefficients. Each function cos((2r+1)uπ/(2M)) is called a basis
function. You can also think of each function as a frequency component. The
coefficients in F(u) tell you how much each frequency component is weighted in the sum
that produces the pixel values. You can think of this as "how much" each frequency
component contributes to the image.
Figure 2.13 A one-dimensional image of eight pixels (enlarged). Pixel outlines are not part of image.
For M = 8 , the basis functions are those given below. They are shown as cosine
functions (made continuous) and then as 8 × 8 bitmap images whose color values change
in accordance with the given basis function. As the values of the cosine function
decrease, the pixels get darker because 1 represents white and −1 represents black.
Figure 2.14 Basis Function 0: cos((2r+1)·0·π/16) = cos(0)
Figure 2.15 Basis Function 1: cos((2r+1)·π/16)
Figure 2.16 Basis Function 2: cos((2r+1)·2π/16)
Figure 2.17 Basis Function 3: cos((2r+1)·3π/16)
Figure 2.18 Basis Function 4: cos((2r+1)·4π/16)
Figure 2.19 Basis Function 5: cos((2r+1)·5π/16)
Figure 2.20 Basis Function 6: cos((2r+1)·6π/16)
Figure 2.21 Basis Function 7: cos((2r+1)·7π/16)
The pictures of grayscale bars correspond to the sinusoidal graphs as follows: Evaluate the basis function at r = i for 0 ≤ i ≤ 7, from left to right. Each of the resulting values corresponds to a pixel in the line of pixels to the right of the basis function, where the grayscale values are scaled in the range of −1 (black) to 1 (white). Thus, frequency component cos((2r+1)π/16) (basis function 1) corresponds to a sequence of eight pixels that go from white to black.
Equation 2.1 states only that the coefficients F(u) exist, but it doesn't tell you
how to compute them. This is where the discrete cosine transform (DCT) comes in. In
the one-dimensional case, the discrete cosine transform is stated as follows:
F(u) = √(2/M) · C(u) · ∑_{r=0}^{M−1} f(r) · cos((2r+1)uπ / (2M))
where C(δ) = √2/2 if δ = 0; otherwise C(δ) = 1
Equation 2.2
Equation 2.2 tells how to transform an image from the spatial domain, which gives color
or grayscale values, to the frequency domain, which gives coefficients by which the
frequency components should be multiplied. For example, consider the row of eight
pixels shown in Figure 2.13. The corresponding grayscale values are [0, 0, 0, 153, 255,
255, 220, 220]. This array represents the image in the spatial domain. If you compute a
value F(u) for 0 ≤ u ≤ M − 1 using Equation 2.2, you get the array of values [389.97,
−280.13, −93.54, 83.38, 54.09, −20.51, −19.80, −16.34]. You have applied the DCT,
yielding an array that represents the pixels in the frequency domain.
What this tells you is that the line of pixels is a linear combination of frequency components – that is, the basis functions multiplied by the coefficients in F and a constant and added together, as follows:
f(r) = (√2/2) · √(2/M) · 389.97 · cos(0) + √(2/M) · (−280.13 cos((2r+1)π/(2M)) − 93.54 cos((2r+1)2π/(2M)) + 83.38 cos((2r+1)3π/(2M)) + 54.09 cos((2r+1)4π/(2M)) − 20.51 cos((2r+1)5π/(2M)) − 19.80 cos((2r+1)6π/(2M)) − 16.34 cos((2r+1)7π/(2M)))
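Equations 2.1 and 2.2 can be verified directly in code. The sketch below (function names are ours) implements the one-dimensional DCT and its inverse and reproduces the coefficients computed for the example row:

```python
import math

def C(d):
    # Normalizing constant from Equations 2.1 and 2.2.
    return math.sqrt(2) / 2 if d == 0 else 1.0

def dct_1d(f):
    # Equation 2.2: spatial domain -> frequency domain.
    M = len(f)
    return [math.sqrt(2 / M) * C(u) *
            sum(f[r] * math.cos((2 * r + 1) * u * math.pi / (2 * M))
                for r in range(M))
            for u in range(M)]

def idct_1d(F):
    # Equation 2.1: frequency domain -> spatial domain.
    M = len(F)
    return [sum(math.sqrt(2 / M) * C(u) * F[u] *
                math.cos((2 * r + 1) * u * math.pi / (2 * M))
                for u in range(M))
            for r in range(M)]

f = [0, 0, 0, 153, 255, 255, 220, 220]
F = dct_1d(f)
print([round(x, 2) for x in F])
# [389.97, -280.13, -93.54, 83.38, 54.09, -20.51, -19.8, -16.34]
```

Applying `idct_1d` to `F` recovers the original pixel values, illustrating that the transform is invertible.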
Figure 2.22 [The eight basis functions, weighted by coefficients w0 through w7 and summed to give the resulting pixels]
Having a negative coefficient for a frequency component amounts to adding the inverted
waveform, as shown in Figure 2.23.
As a matter of terminology, you should note that the first element, F(0), is called the DC component. For a periodic function represented in the frequency domain, the DC component is a scaled average value of the waveform. You can see this in the one-dimensional case.

F(0) = √(2/M) · (√2/2) · (cos(0)f(0) + cos(0)f(1) + cos(0)f(2) + cos(0)f(3) + cos(0)f(4) + cos(0)f(5) + cos(0)f(6) + cos(0)f(7)) = (1/√M) · (f(0) + f(1) + f(2) + f(3) + f(4) + f(5) + f(6) + f(7))
All the other components (F(1) through F(M−1)) are called AC components. The names were derived from an analogy with electrical systems, the DC component being analogous to direct current and the AC components to alternating current.
(Matrices are assumed to be treated in row-major order – row by row rather than column
by column.) Equation 2.3 serves as an effective procedure for computing the
coefficients of the frequency components from a bitmap image. (Note that since C(u)
and C(v) do not depend on the indices for the summations, you can move the factor
2C(u)C(v)/√(MN) outside the nested summation. You'll sometimes see the equation written in this alternative form.)
You can think of the DCT in two equivalent ways. The first way is to think of the DCT as taking a function over the spatial domain – function f(r, s) – and returning a function over the frequency domain – function F(u, v). Equivalently, you can think of the DCT as taking a bitmap image in the form of a matrix of color values – f(r, s) – and returning the frequency components of the bitmap in the form of a matrix of coefficients – F(u, v). The coefficients give the amplitudes of the frequency components.
Rather than being applied to a full M × N image, the DCT is generally applied to
8 × 8 pixel subblocks (for example, as a key step in JPEG compression), so our
discussion will be limited to images in these dimensions. An enlarged 8 × 8 pixel image
is shown in Figure 2.24, and its corresponding bitmap values (matrix f) and amplitudes of
the frequency components computed by the DCT (matrix F) are given in Table 2.1 and
Table 2.2, respectively.
Figure 2.24 8 × 8 bitmap image. Pixel outlines are not part of image.
The discrete cosine transform is invertible. By this we mean that given the amplitudes of the frequency components as F, we can get color values for the bitmap image as f. This relationship is described in the following equation:

f(r, s) = ∑_{u=0}^{M−1} ∑_{v=0}^{N−1} (2C(u)C(v)/√(MN)) · F(u, v) · cos((2r+1)uπ/(2M)) · cos((2s+1)vπ/(2N))

In essence, the equation states that the bitmap is equal to a sum of M·N weighted frequency components.
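Because the two-dimensional transform is separable, it can be sketched by applying the one-dimensional transform to every row and then to every column, and the reverse for the inverse; the helper names below are ours:

```python
import math

def C(d):
    return math.sqrt(2) / 2 if d == 0 else 1.0

def dct_1d(f):
    M = len(f)
    return [math.sqrt(2 / M) * C(u) *
            sum(f[r] * math.cos((2 * r + 1) * u * math.pi / (2 * M))
                for r in range(M))
            for u in range(M)]

def idct_1d(F):
    M = len(F)
    return [sum(math.sqrt(2 / M) * C(u) * F[u] *
                math.cos((2 * r + 1) * u * math.pi / (2 * M))
                for u in range(M))
            for r in range(M)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_2d(block):
    # 1D DCT of each row, then of each resulting column; the combined
    # normalization works out to 2C(u)C(v)/sqrt(MN), as in the text.
    rows = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows)])

def idct_2d(coeffs):
    # Inverse: 1D inverse DCT down the columns, then across the rows.
    cols = transpose([idct_1d(col) for col in transpose(coeffs)])
    return [idct_1d(row) for row in cols]

# Round trip on a small gradient block recovers the original values.
block = [[16 * r + 2 * s for s in range(8)] for r in range(8)]
restored = idct_2d(dct_2d(block))
```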
The DCT basis functions for an 8 × 8 image are pictured in Figure 2.25. Each
"block" at position (u,v) of the matrix is in fact a graph of the function
cos((2r+1)uπ/16) · cos((2s+1)vπ/16) at discrete positions r = 0, …, 7 and s = 0, …, 7. F(0,0),
the DC component, is in the upper left corner. The values yielded by the function are
pictured as grayscale values in these positions. The thing to understand is that any 8 × 8
pixel block of grayscale values – like the one pictured in Figure 2.24 – can be recast as a
sum of frequency components pictured in Figure 2.25. You just have to know "how
much" of each frequency component to add in. That is, you need to know the coefficient
by which to multiply each frequency component, precisely what the DCT gives you in
F(u,v) (as in Table 2.2). In the case of color images represented in RGB color mode, the
DCT can be done on each of the color components individually. Thus, the process is the
same as what we described for grayscale images, except that you have three 8 × 8 blocks
of data on which to do the DCT – one for red, one for green, and one for blue.
Let's complete this explanation with one more example, this time a two-
dimensional image. Consider the top leftmost 8 × 8 pixel area of the sparrow picture in
Figure 2.8. Figure 2.26 shows a close-up of this area, part of the sidewalk in the
background. Figure 2.27 shows the graph of the pixel data over the spatial domain.
Figure 2.28 captures the same information about the digital image, but represented in the
frequency domain. This graph was generated by the DCT. The DC component is the
largest. It may be difficult for you to see at this scale, but there are other nonzero
frequency components – for example, at positions (1,1) and (1,3) (with array indices
starting at 0).
[Surface plots of grayscale value over horizontal and vertical pixel position]
2.5 Aliasing
2.5.1 Blurriness and Blockiness
The previous section showed that one way to understand frequency in the context of digital images is to think of the color values in the image bitmap as defining a surface. This surface forms a two-dimensional wave. A complex waveform such as the surface shown in Figure 2.11 can be decomposed into regular sinusoidal waves of various frequencies and amplitudes. In reverse, these sinusoidal waves can be summed to yield the complex waveform. This is the context in which the Nyquist theorem can be understood. Once we have represented a bitmap in the frequency domain, we can determine its highest frequency component. If the sampling rate is not at least twice the frequency of the highest frequency component, aliasing will occur.
Let's try to get an intuitive understanding of the phenomenon of aliasing in digital images. Look again at the picture in Figure 2.1 and think of it as a real-world scene that you're going to photograph. Consider just the horizontal dimension of the image. Imagine that this picture has been divided into sampling areas so that only 15 samples are taken across a row – an unrealistically low sampling rate, but it serves to make the point.
If the color changes even just one time within one of the sample areas, then the two
colors in that area cannot both be represented by the sample. This implies that the image
reconstructed from the sample will not be a perfect reproduction of the original scene, as
you can see in Figure 2.2. Mathematically speaking, the spatial frequencies of the
original scene will be aliased to lower frequencies in the digital photograph. Visually, we
perceive that when all the colors in a sampling area are averaged to one color, the
reconstructed image looks blocky and the edges of objects are jagged. This observation
seems to indicate that you'd need a very high sampling rate – i.e., large pixel dimensions
– to capture a real-world scene with complete fidelity. Hypothetically, that's true for
most scenes. Fortunately, however, the human eye isn't going to notice a little loss of
detail. The pixel dimensions offered by most digital cameras these days provide more
than enough detail for very crisp, clear images.
A moiré pattern can appear when the sampling rate for the digital image is not high enough to capture the frequency of the pattern. If the
pattern is not sampled at a rate that is at least twice the rate of repetition of the pattern,
then a different pattern will result in the reconstructed image. In the image shown in
Figure 2.29, the color changes at a perfectly regular rate, with a pattern that repeats five
times in the horizontal direction. What would happen if we sampled this image five
times, at regularly spaced intervals? Depending on where the sampling started, the
resulting image would be either all black or all white. If we sample more than ten times,
however – more than twice per repetition of the pattern – we will be able to reconstruct
the image faithfully. This is just a simple application of the Nyquist theorem.
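The striped-pattern example can be simulated in a few lines. Below, a pattern repeating five times across the image is sampled five times (every sample lands on the same color, so the stripes alias away to a solid color) and twenty times (more than twice per repetition, so the stripes survive). The setup is our own illustration:

```python
# A 1-D "image": 5 black/white stripe pairs across the interval [0, 1).
def stripe_color(x):
    return "black" if int(x * 10) % 2 == 0 else "white"

def sample(n):
    # Take n evenly spaced samples across the image.
    return [stripe_color(i / n) for i in range(n)]

print(set(sample(5)))   # {'black'} -- the pattern aliases to a solid color
print(sample(20)[:4])   # ['black', 'black', 'white', 'white']
```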
More visually interesting moiré effects can result when the original pattern is more
complex and the pattern is tilted at an angle with respect to the sampling. Imagine the
image that would result from tilting the original striped picture and then sampling in the
horizontal and vertical directions, as shown in Figure 2.30. The red grid shows the
sampling blocks. Assume that if more than half a sampling block is filled with black
from the original striped image, then that block becomes black. Otherwise, it is white.
The pattern in the reconstructed image is distorted in a moiré effect. Once you know
what the moiré effect is, you start seeing it all around you. You can see it any time one
pattern is overlaid on another – like the shimmering effect of a sheer curtain folded back
on itself, or the swirls resulting from looking through a screen at a closely woven wicker
chair.
Moiré patterns can result both when a digital photograph is taken and when a
picture is scanned in to create a digital image, because both these processes involve
choosing a sampling rate. Figure 2.31 shows a digital photograph of a computer bag
where a moiré pattern is evident, resulting from the sampling rate and original pattern
being "out of sync." Figure 2.32 shows a close-up of the moiré pattern. Figure 2.33
shows a close-up of the pattern as it should look. If you get a moiré effect when you take
a digital photograph, you can try tilting the camera at a different angle or changing the
focus slightly to get rid of it. This will change the sampling orientation or sampling
precision with respect to the pattern.
We have seen that one of the primary differences between analog and digital
photography is that analog photography measures the incident light continuously across
the focal plane, while digital photography samples it only at discrete points. Another
difference is that it is more difficult for a digital camera to sense all three color
components – red, green, and blue – at each sample point. These constraints on sampling
color and the use of interpolation to "fill in the blanks" in digital sampling can lead to
color aliasing. Let's look at this more closely.
Many current digital cameras use charge-coupled device (CCD) technology to
sense light and thereby color. (CMOS – complementary metal-oxide semiconductor – is
an alternative technology for digital photography, but we won't discuss that here.) A
CCD consists of a two-dimensional array of photosites. Each photosite corresponds to
one sample (one pixel in the digital image). The number of photosites determines the
limits of a camera's resolution. To sense red, green, or blue at a discrete point, the sensor
at that photosite is covered with a red, green, or blue color filter. But the question is:
Should all three color components be sensed simultaneously at each photosite, should
they be sensed at different moments when the picture is taken, or should only one color
component per photosite be sensed?
There are a variety of CCD designs in current technology, each with its own
advantages and disadvantages. (1) The incident light can be divided into three beams.
Three sensors are used at each photosite, each covered with a filter that allows only red,
green, or blue to be sensed. This is an expensive solution and creates a bulkier camera.
(2) The sensor can be rotated when the picture is taken so that it takes in information
about red, green, and blue light in succession. The disadvantage of this method is that the
three colors are not sensed at precisely the same moment, so the subject being
photographed needs to be still. (3) A more recently developed technology (Foveon X3)
uses silicon for the sensors in a method called vertical stacking. Because different depths
of silicon absorb different wavelengths of light, all three color components can be
detected at one photosite. This technology is gaining popularity. (4) A less expensive
method of color detection uses an array like the one shown in Figure 2.34 to detect only
one color component at each photosite. Interpolation is then used to derive the other two
color components based on information from neighboring sites. It is the
interpolation that can lead to color aliasing.
G R G R
B G B G
G R G R
B G B G
Figure 2.34 Bayer color filter array
In the 4 × 4 array shown in the figure, the letter in each block indicates which
color is to be detected at each site. The pattern shown here is called a Bayer color filter
array, or simply a Bayer filter. (It's also possible to use a cyan-magenta-yellow
combination.) You'll notice that there are twice as many green sensors as blue or red.
This is because the human eye is more sensitive to green and can see more fine-grained
changes in green light. The array shown in the figure is just a small portion of what
would be on a CCD. Each block in the array represents a photosite, and each photosite
has a filter on it that determines which color is sensed at that site.
The interpolation algorithm for deriving the two missing color channels at each
photosite is called demosaicing. A variety of demosaicing algorithms have been
devised. A simple nearest neighbor algorithm determines a missing color c for a
photosite based on the colors of the nearest neighbors that have the color c. For the
algorithm given below, assuming a CCD array of dimensions m × n, the nearest neighbors
of photosite (i, j) are sites (i − 1, j − 1), (i − 1, j), (i − 1, j + 1), (i, j − 1), (i, j + 1),
(i + 1, j − 1), (i + 1, j), and (i + 1, j + 1), where 0 ≤ i < m and 0 ≤ j < n (disregarding
boundary areas where neighbors may not exist).
The nearest neighbor algorithm is given as Algorithm 2.1.
algorithm nearest_neighbor
{
for each photosite (i,j) where the photosite detects color cx {
  for each cy ∈ {red, green, blue} such that cy ≠ cx {
    S = the set of nearest neighbors of site (i,j) that have color cy
    set the color value for cy at site (i,j) equal to the average of the color values of cy at
    the sites in S
  }
}
}
Algorithm 2.1 Nearest neighbor algorithm
With this algorithm, there may be either two or four neighbors involved in the averaging,
as shown in Figure 2.35a and Figure 2.35b.
G R G          B G B
B G B          G R G
G R G          B G B
a. Determining R or B from the center G photosite entails an average of two
neighboring sites.
b. Determining B from the center R photosite entails an average of four neighboring
sites diagonal from R.
Figure 2.35 Photosite interpolation
This nearest neighbor algorithm can be fine-tuned to take into account the rate at which
colors are changing in either the vertical or horizontal direction, giving more weight to
small changes in color as the averaging is done. Other standard interpolation methods –
linear, cubic, cubic spline, etc. – can also be applied, and the region of nearest neighbors
can be larger than 3 × 3 or a shape other than square.
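The scheme of Algorithm 2.1 can be sketched in Python. This is a minimal illustration, not a production demosaicing routine; the helper names are assumptions for the example, and the pattern layout follows Figure 2.34 (green in the top-left corner of an RGGB-style mosaic).

```python
def bayer_color(i, j):
    """Color sensed at photosite (i, j) in the Bayer pattern of Figure 2.34:
    G R G R / B G B G / ..."""
    if (i + j) % 2 == 0:
        return 'G'
    return 'R' if i % 2 == 0 else 'B'

def demosaic(mosaic):
    """mosaic[i][j] holds the single value sensed at photosite (i, j).
    Returns a full RGB triple per pixel: each missing channel is the average
    of the nearest neighbors that sensed it (boundary sites simply have
    fewer neighbors, as the text notes)."""
    m, n = len(mosaic), len(mosaic[0])
    out = [[None] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            pixel = {bayer_color(i, j): mosaic[i][j]}
            for c in 'RGB':
                if c in pixel:
                    continue
                neighbors = [mosaic[ii][jj]
                             for ii in (i - 1, i, i + 1)
                             for jj in (j - 1, j, j + 1)
                             if (ii, jj) != (i, j)
                             and 0 <= ii < m and 0 <= jj < n
                             and bayer_color(ii, jj) == c]
                pixel[c] = sum(neighbors) / len(neighbors)
            out[i][j] = (pixel['R'], pixel['G'], pixel['B'])
    return out
```

On a uniformly lit sensor, every interpolated channel equals the sensed value, so the reconstruction is exact; aliasing arises only where the scene changes faster than the mosaic can follow.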
The result of the interpolation algorithm is that even though only one sensor for
one color channel is used at each photosite, the other two channels can be derived to yield
full RGB color. This method works quite well and is used in many digital cameras with
CCD sensors. However, interpolation by its nature cannot give a perfect reproduction of
the scene being photographed, and occasionally color aliasing results from the process,
detected as moiré patterns, streaks, or spots of color not present in the original scene. A
simple example will show how this can happen. Imagine that you photograph a white
line, and that line goes precisely across the sensors in the CCD as shown in Figure 2.36 If
there is only black on either side of the line, then averaging the neighboring pixels to get
the color channels not sensed at the photosites covered by the line always gives an
average of 0. That is, no other color information is added to what is sensed at the
photosites covered by the line, so each photosite records whatever color is sensed there.
The result is the line shown in Figure 2.36b. Generally speaking, when a small area of
color can be detected by only a few photosites, the neighboring pixels don't provide
enough information for the true color of this area to be determined by interpolation. This
situation can produce spots, streaks, or fringes of aliased color.
Figure 2.36 A situation that can result in color aliasing
When you draw a line in a draw or a paint program using a line tool, you click on
one endpoint and then the other, and the line is drawn between the two points. In order to
draw the line on the computer display, the pixels that are colored to form the line must be
determined. This requires a line-drawing algorithm such as the one given in Algorithm
2.2. Beginning at one of the line's endpoints, this algorithm moves horizontally across
the display device, one pixel at a time. Given column number x0, the algorithm finds the
integer y0 such that (x0, y0) is the point closest to the line. Pixel
(x0, y0) is then colored. The results might look like Figure 2.37. The figure and the
algorithm demonstrate how a line that is one pixel wide would be drawn. It is not the
only algorithm for the purpose, and it does not deal with lines that are two or more pixels
wide.
algorithm draw_line
/* Input: x0, y0, x1, and y1, coordinates of the line's endpoints (all integers); c, color of
the line.
Output: Line is drawn on display. */
{
/* Note: We include data types because they are important to understanding the
algorithm's execution. */
int dx, dy, num_steps, i
float x_increment, y_increment, x, y
dx = x1 − x0
dy = y1 − y0
if absolute_value(dx) > absolute_value(dy) then num_steps = absolute_value(dx)
else num_steps = absolute_value(dy)
x_increment = float(dx) / float(num_steps)
y_increment = float(dy) / float(num_steps)
x = x0
y = y0
/* round(x) rounds to the closest integer. */
draw(round(x), round(y), c)
for i = 0 to num_steps − 1 {
  x = x + x_increment
  y = y + y_increment
  draw(round(x), round(y), c)
}
}
Algorithm 2.2 Algorithm for drawing a line
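Algorithm 2.2 can be transcribed into Python almost line for line. In this sketch the hypothetical draw routine is replaced by collecting the rounded pixel coordinates into a list so the result can be inspected, and a guard for coincident endpoints is added to avoid dividing by zero.

```python
def draw_line(x0, y0, x1, y1):
    """Return the pixels Algorithm 2.2 would color for a one-pixel-wide line."""
    dx, dy = x1 - x0, y1 - y0
    num_steps = max(abs(dx), abs(dy))   # step along the direction of greater change
    if num_steps == 0:                  # degenerate line: a single point
        return [(x0, y0)]
    x_increment = dx / num_steps
    y_increment = dy / num_steps
    x, y = float(x0), float(y0)
    pixels = [(round(x), round(y))]
    for _ in range(num_steps):
        x += x_increment
        y += y_increment
        pixels.append((round(x), round(y)))
    return pixels
```

For example, draw_line(0, 0, 3, 1) steps three times in x, rounding y at each step, which produces the staircase pattern that appears as aliasing on screen.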
To test your understanding, think about how Algorithm 2.2 would be generalized
to lines of any width. Figure 2.38 shows a line that is two pixels wide going from point
(8,1) to point (2,15). The ideal line is drawn as a dashed line between the two endpoints.
To render the line in a black and white bitmap image, the line-drawing algorithm must
determine which pixels are intersected by the two-pixel-wide area. The result is the line
drawn in Figure 2.39. Because it is easy to visualize, we use the assumption that a pixel
is colored black if at least half its area is covered by the two-pixel line. Other line-
drawing algorithms may operate differently.
Figure 2.38 Drawing a line two pixels wide
Figure 2.39 Line two pixels wide, aliased
Bitmaps are just one way to represent digital images. Another way is by means of
vector graphics. Rather than storing an image bit-by-bit, a vector graphic file stores a
description of the geometric shapes and colors in an image. Thus, a line can be described
by means of its endpoints, a square by the length of a side and a registration point, a
circle by its radius and center point, etc. Vector graphics suffer less from aliasing
problems than do bitmap images in that vector graphics images can be resized without
loss of resolution. Let's look at the difference.
When you draw a line with a line tool in a bitmap image, the pixel values are
computed for the line and stored in the image pixel by pixel. Say that the image is later
enlarged by increasing the number of pixels. This process is called upsampling. A
simple algorithm for upsampling a bitmap image, making it twice as big, is to make four
pixels out of each one, duplicating the color of the original pixel. Obviously, such an
algorithm will accentuate any aliasing from the original image. Figure 2.41 shows a one-
pixel-wide line from a black and white bitmap image at three different enlargements, the
second twice as large and the third three times as large as the first. The jagged edges in
the original line look even blockier as the line is enlarged. Other more refined algorithms
for upsampling can soften the blocky edges somewhat, but none completely eliminates
the aliasing. Algorithms for upsampling are discussed in more detail in Chapter 3.
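The simple doubling scheme described above, in which each source pixel becomes a 2 × 2 block of the same color, can be sketched as follows. This is pixel replication only; the more refined resampling methods mentioned above are covered in Chapter 3.

```python
def upsample_2x(image):
    """Pixel-replication upsampling: each pixel of the input 2D array
    becomes a 2x2 block of the same value in the output, so the image
    doubles in both dimensions (and any aliasing doubles with it)."""
    out = []
    for row in image:
        doubled_row = [value for value in row for _ in range(2)]
        out.append(doubled_row)
        out.append(list(doubled_row))   # repeat the row to double vertically
    return out
```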
Aliasing in rendering can occur in both bitmap and vector graphics. In either type
of digital image, the smoothness of lines, edges, and curves is limited by the display
device. There is a difference, however, in the two types of digital imaging, and vector
graphics has an advantage over bitmap images with respect to aliasing. Aliasing in a
bitmap image becomes even worse if the image is resized. Vector graphic images, on the
other hand, have only the degree of aliasing caused by the display device on which the
image is shown, and this is very small and hardly noticeable.
Changing the size of an image is handled differently in the case of vector
graphics. Since vector graphic files are not created from samples and not stored as
individual pixel values, upsampling has no meaning in vector graphics. A vector graphic
image can be resized for display or printing, but the resizing is done by recomputation of
geometric shapes on the basis of their mathematical properties. In a vector graphic
image, a line is stored by its endpoints along with its color and an indication that it is a
line object. When the image is displayed, the line is rendered by a line drawing algorithm
at a size relative to the image dimensions that have been chosen for the image at that
moment. Whenever the user requests that the image be resized, the pixels that are
colored to create the line are recomputed using the line-drawing algorithm. This means
that the jaggedness of the line will never be worse than what results from the resolution
of the display device. In fact, to notice the aliasing at all when you work in a vector
graphic drawing program, you need to turn off the anti-aliasing option when you view the
image. Figure 2.42 shows a vector graphic line at increasing enlargements, with the
"view with anti-aliasing" option turned off. The small amount of aliasing is due entirely
to the resolution of the display device. The important thing to notice is that as the line
gets larger, the aliasing doesn't increase. These observations would hold true for more
complex vector graphic shapes as well. (When the lines are printed out on a printer with
good resolution, the aliasing generally doesn't show up at all because the printer's
resolution is high enough to create a smooth edge.)
Figure 2.42 Aliasing does not increase when a vector graphic line is enlarged
In summary, vector graphics and bitmap imaging each have advantages. Vector
graphics imaging is suitable for pictures that have solid colors and well-defined edges
and shapes. Bitmap imaging is suitable for continuous tone pictures like photographs.
The type of images you work with depends on your purpose.
2.6 Color
2.6.1 Color Perception and Representation
Color is both a physical and psychological phenomenon. Physically, color is
composed of electromagnetic waves. For humans, the wavelengths of visible colors fall
between approximately 370 and 780 nanometers, as shown in Figures 1.19 and 1.20. (A
nanometer, abbreviated nm in the figures, is 10−9 meters.) These waves fall upon the
color receptors of the eyes, and in a way not completely understood, the human brain
translates the interaction between the waves and the eyes as color perception.
Aside: The differences between the power, energy, luminance, and brightness of a light
can be confusing. Two colored lights can be of the same wavelength but of different
power. A light's power, or energy per unit time, is a physical property not defined by
human perception. Power is related to brightness in that if two lights have the same
wavelength but the first has greater power, then the first will appear brighter than the
second. Brightness is a matter of subjective perception and has no precise mathematical
definition. Luminance has a mathematical definition that relates a light's wavelength and
power to how bright it is perceived to be. Interestingly, lights of equal power but
different wavelengths do not appear equally bright. The brightest wavelengths are about
550 nm. Brightness decreases from there as you move to longer or shorter wavelengths.
In general, the more luminant something is, the brighter it appears to be. However, keep
in mind that brightness is in the eye of the beholder – a matter of human perception –
while luminance has a precise definition that factors in power, wavelength, and the
average human observer's sensitivity to that wavelength.
Although it is possible to create pure color composed of a single wavelength – for
example, by means of a laser – the colors we see around us are almost always produced
by a combination of wavelengths. The green of a book cover, for example, may look like
a pure green to you, but a spectrograph will show that it is not. A spectrograph breaks up
a color into its component wavelengths, producing a spectral density function P(λ). A
spectral density function shows the contributions of the wavelengths λ to a given
perceived color as λ varies across the visible spectrum.
Spectral density functions are one mathematical way to represent colors, but not a
very convenient way for computers. One problem is that two colors that are perceived to
be identical may, on analysis, produce different spectral density curves. Said the other
way around, more than one spectral density curve can represent the same perceived color.
If we want to use a spectral density curve to tell a computer to present a particular
shade of green, which “green” spectral density curve is the best one to use?
It is possible to represent a color by means of a simpler spectral density graph.
(This is basically how color representation is done in the HSV and HLS color models, as
will be explained below.) That is, each color in the spectrum can be characterized by a
unique graph that has a simple shape, as illustrated in Figure 2.43. The graph for each
color gives the color’s dominant wavelength, equivalent to the hue; its saturation (i.e.,
color purity); and its luminance. The dominant wavelength is the wavelength at the
spike in the graph. The area beneath the curve indicates the luminance L. (This "curve"
is a rectangular area with a rectangular spike.) Saturation S is the ratio of the area of the
spike to the total area. More precisely with regard to Figure 2.43,
L = (d − a)e + (f − e)(c − b)
S = (f − e)(c − b) / L
Figure 2.43 Spectral density graph showing hue, saturation, and lightness (spectral
density S(λ) plotted against wavelength λ, with points a, b, c, and d marked on the λ axis)
A more convenient representation is suggested by the way computer monitors are
engineered. (We use the terms monitor and display
interchangeably to refer to the viewing screen of a computer.) An alternative way to look
at a color is as a combination of three primaries. Cathode ray tube (CRT) monitors, for
example, display colored light through a combination of red, green and blue phosphors
that light up at varying intensities when excited by an electron beam. Similarly, liquid
crystal display (LCD) panels display color with neighboring pixels of red, green, and blue
that are either lit up or masked by the liquid crystals.
So what is the best way to model color for a computer? There is no simple
answer, since different models have advantages in different situations. In the discussion
that follows, we will look at color models mathematically and find a graphical way to
compare their expressiveness.
In the RGB color model, a color C is represented as a combination of the three primaries:
C = rR + gG + bB
where r, g, and b indicate the relative amounts of red, green, and blue energy
respectively. R, G, and B are constant values based on the wavelengths chosen for the
red, green, and blue components. The values r, g, and b are referred to as the values of
the RGB color components (also called color channels in application programs).
The color space for the RGB color model is easy to depict graphically. Let R, G,
and B correspond to three axes in three-dimensional space. We will normalize the
relative amounts of red, green, and blue in a color so that each value varies between 0 and
1. This color space is shown in Figure 2.44. The origin (0, 0, 0) of the RGB color cube
corresponds to black. White is the value (1, 1, 1). The remaining corners of the cube
correspond to red, green, blue, and their complementary colors – cyan, magenta, and
yellow, respectively. Other colors are created at values between 0 and 1 for each of the
components. For example, (1, 0.65, 0.15) is light orange, and (0.26, 0.37, 0.96) is a shade
of blue. Shades of gray have equal proportions of red, green, and blue and lie along the
line between (0, 0, 0) and (1, 1, 1). Notice that if you decrease each of the values for
light orange in (1, 0.65, 0.15) but keep them in the same proportion to each other, you are
in effect decreasing the luminance of the color, which is like adding in more black. The
color moves from a light orange to a muddy brown. You can’t increase the luminance of
this color and maintain the proportions, because one of the components is already 1, the
maximum value. The color is at 100% luminance. On the other hand, the color (0.32,
0.48, 0.39) is a shade of green not at full brightness. You can multiply each component
by 2 to get (0.64, 0.96, 0.78), a much lighter shade of green.
You may want to note that in mathematical depictions of the RGB color model, it
is convenient to allow the three color components to range between 0 and 1. However,
the corresponding RGB color mode in image processing programs is more likely to have
values ranging between 0 and 255, since each of the three components is captured in
eight bits. What is important is the relative amounts of each component, and the size of
these amounts with respect to the maximum possible values. For example, the light
orange described as (1, 0.65, 0.15) above would become (255, 166, 38) in an RGB mode
with maximum values of 255.
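The conversion between the two scales is a simple rescaling, rounded to the nearest integer; a minimal sketch:

```python
def to_8bit(rgb):
    """Map RGB components in the range [0, 1] to the 0-255 scale
    used by the RGB mode of image processing programs."""
    return tuple(round(component * 255) for component in rgb)
```

Applied to the light orange above, (1, 0.65, 0.15) maps to (255, 166, 38), as in the text.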
It's interesting to note that grayscale values fall along the RGB cube's diagonal
from (0,0,0) to (1,1,1). All grayscale values have equal amounts of R, G, and B. When
an image is converted from RGB color to grayscale in an image processing program, the
following key equation can be used for the conversion of each pixel value. The equation
reflects the fact that the human eye is most sensitive to green and least sensitive to blue.
Let an RGB color pixel be given by (R,G,B), where R, G,
and B are the red, green, and blue color components,
respectively. Then the corresponding grayscale value is
given by (L,L,L), where
L = 0.30R + 0.59G + 0.11B
Since all three color components are equal in a gray pixel, only one of the three values
needs to be stored. Thus a 24-bit RGB pixel can be stored as an 8-bit grayscale pixel.
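The key equation translates directly into code; a minimal sketch:

```python
def rgb_to_gray(r, g, b):
    """Luminance-weighted grayscale value, L = 0.30R + 0.59G + 0.11B.
    Works on either the [0, 1] or the 0-255 scale."""
    return 0.30 * r + 0.59 * g + 0.11 * b
```

Note that pure white keeps its full value (the weights sum to 1), while a pure green pixel maps to 0.59 of the maximum, reflecting the eye's greater sensitivity to green.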
For a pixel represented in the CMY color model, the cyan,
magenta, and yellow color components are, respectively, C,
M, and Y. Let K be the minimum of C, M, and Y. Then the
equivalent color components in the CMYK model,
C_new, M_new, Y_new, and K, are given by the key equations
K = min(C, M, Y)
C_new = C − K
M_new = M − K
Y_new = Y − K
(The definition above theoretically gives the values for CMYK. However, in practice,
other values are used due to the way in which colored inks and paper interact.)
on this circle given in degrees, from 0 to 360, with red conventionally set at 0. As the
hue values increase, you move counterclockwise through yellow, green, cyan, etc.
Saturation is a function of the color’s distance from the central axis (i.e., the value axis).
The farther a color is from this axis, the more saturated the color. The value axis lies
from the black point of the hexacone through the center of the circle, with values ranging
from 0 for black to 1 for white, where 0 is at the tip and 1 is on the surface of the
hexacone. For example, (58°, 0.88, 0.93) is a bright yellow.
The HLS color model is essentially the same. To create the HLS color space from
the HSV space (and hence from RGB), go through the same steps illustrated in Figure
2.45, Figure 2.46, and Figure 2.47. Then take a mirror image of the shape in Figure 2.47
and connect it to the top, as in Figure 2.48. Hue and saturation are given as before, but
now lightness varies from 0 at the black tip to 1 at the white tip of the double cones.
The distortion of the RGB color space to either HSV or HLS is a non-linear
transformation. In other words, to translate from RGB to HSV, you can’t simply
multiply each of the R, G, and B components by some coefficient. Algorithm 2.3 shows
how to translate RGB to HSV. Algorithm 2.4 translates from RGB to HLS. The inverse
algorithms are left as an exercise.
Figure 2.45 RGB color cube viewed from the top Figure 2.46 RGB color cube collapsed to 2D
Figure 2.47 HSV color space (hexacone with hue H and saturation S; white at the top
center, black at the tip)
algorithm RGB_to_HSV
/* Input: r, g, and b, each real numbers in the range [0…1].
Output: h, a real number in the range [0…360), except if s = 0, in which case h is
undefined. s and v are real numbers in the range [0…1].*/
{
max = maximum(r,g,b)
min = minimum(r,g,b)
v = max
if max ≠ 0 then s = (max − min)/max
else s = 0
if s == 0 then h = undefined
else {
  diff = max − min
  if r == max then h = (g − b) / diff
  else if g == max then h = 2 + (b − r) / diff
  else if b == max then h = 4 + (r − g) / diff
  h = h * 60
  if h < 0 then h = h + 360
}
}
Algorithm 2.3 RGB to HSV
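Algorithm 2.3 in runnable form, as a direct transcription; the only liberty taken is that h is returned as None in the undefined (achromatic) case:

```python
def rgb_to_hsv(r, g, b):
    """Convert r, g, b in [0, 1] to (h, s, v) per Algorithm 2.3.
    h is in degrees [0, 360), or None when s == 0 (a gray has no hue)."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx
    s = (mx - mn) / mx if mx != 0 else 0.0
    if s == 0:
        return (None, s, v)
    diff = mx - mn
    if r == mx:
        h = (g - b) / diff          # hue between yellow and magenta
    elif g == mx:
        h = 2 + (b - r) / diff      # hue between cyan and yellow
    else:
        h = 4 + (r - g) / diff      # hue between magenta and cyan
    h *= 60
    if h < 0:
        h += 360
    return (h, s, v)
```

Pure red maps to hue 0, pure blue to 240, and any gray to an undefined hue, matching the conventions stated above.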
algorithm RGB_to_HLS
/* Input r, g, and b, each real numbers in the range [0…1] representing the red, green,
and blue color components, respectively
Output: h, a real number in the range of [0…360), except if s = 0, in which case h is
undefined. L and s are real numbers in the range of [0…1]. h, L, and s represent hue,
lightness, and saturation, respectively.*/
{
max = maximum(r,g,b)
min = minimum(r,g,b)
L = average(max, min)
if max == min then s = 0
else {
sum = max + min
For a pixel represented in RGB color, let the red, green, and
blue color components be, respectively, R, G, and B. Then
the equivalent Y, I, and Q color components in the YIQ
color model are given by the key equation
[Y]   [0.299   0.587   0.114] [R]
[I] = [0.596  −0.275  −0.321] [G]
[Q]   [0.212  −0.523   0.311] [B]
(Note that the values in the transformation matrix depend
upon the particular choice of primaries for the RGB
model.)
Y is the luminance component, and I and Q are chrominance. The inverse of the matrix
above is used to convert from YIQ to RGB. The coefficients in the matrix are based on
primary colors of red, green, and blue that are appropriate for the standard National
Television System Committee (NTSC) RGB phosphor.
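The matrix product above, written out with the coefficients from the key equation; a sketch:

```python
def rgb_to_yiq(r, g, b):
    """NTSC RGB-to-YIQ conversion: Y is luminance, I and Q are chrominance."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.275 * g - 0.321 * b
    q = 0.212 * r - 0.523 * g + 0.311 * b
    return (y, i, q)
```

White, (1, 1, 1), comes out with Y = 1 and zero chrominance, since the I and Q rows of the matrix each sum to zero.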
intensities of the red, green, and blue components in the composite color until the match
is as close as possible. It turns out that there are pure colors in the visible spectrum that
cannot be reproduced by positive amounts of red, green, and blue light. In some cases, it
is necessary to “subtract out” some of the red, green, or blue in the combined beams to
match the pure color. (Effectively, this can be done by adding red, green, or blue to the
pure color until the two light samples match.) This is true no matter what three visible
primary colors are chosen. No three visible primaries can be linearly combined to
produce all colors in the visible spectrum.
The implication of this experiment is that no computer monitor that bases its color
display on combinations of red, green, and blue light can display all visible colors. The
range of colors that a given monitor can display is called its color gamut. Since
computer monitors may vary in their choice of basic red, green, and blue primaries, two
computer monitors based on RGB color can still have different gamuts. By similar
reasoning, the gamut of a color system based on the CMYK model will vary from one
based on RGB. In practical terms, this means that there will be colors that you can
represent on your computer monitor but you cannot print, and vice versa.
It would be useful to have a mathematical model that captures all visible colors.
From this model, we could create a color space in which all other color models could be
compared. The first step in the direction of a standard color model that represents all
visible colors was called CIE XYZ, devised in 1931 by the Commission Internationale de
l’Eclairage. You can understand how the CIE color model was devised by looking
graphically at the results of the color matching experiment. Consider the graph in Figure
2.49. (See the worksheet associated with Exercise 10 at the end of this chapter for an
explanation of how this graph was created.) The x-axis shows the wavelength, λ, ranging
through the colors of the visible spectrum. The y-axis shows the relative amounts of red,
green, and blue light energy that the “average” observer combines to match the pure light
sample. (Units are unimportant. It is the relative values that matter.) Notice that in some
cases, red has to be “subtracted” from the composite light (i.e., added to the pure sample)
in order to achieve a match.
Figure 2.49 Color matching functions r(λ), g(λ), and b(λ) (tristimulus values plotted
against wavelength λ in nanometers, from 300 to 800 nm; note that r(λ) takes on
negative values)
Mathematically, the amount of red light energy needed to create the perceived
pure spectral red at wavelength λ is a function of the wavelength, given by r(λ), and
similarly for green (the function g(λ)) and blue (the function b(λ)). Let C(λ) be the color
the average observer perceives at wavelength λ. Then C(λ) is given by a linear
combination of these three components, that is,
C(λ) = r(λ)R + g(λ)G + b(λ)B
Here, R refers to pure spectral red light at a fixed wavelength, and similarly for G and B.
The CIE model is based on the observation that, although there are no three
visible primary colors that can be combined in positive amounts to create all colors in the
visible spectrum, it is possible to use three “virtual” primaries to do so. These primaries
– called X, Y, and Z – are purely theoretical rather than physical entities. While they do
not correspond to wavelengths of visible light, they provide a mathematical way to
describe colors that exist in the visible spectrum. Expressing the color matching
functions in terms of X, Y, and Z produces the graphs in Figure 2.50. We can see that X,
Y, and Z are chosen so that all three functions remain positive over the wavelengths of the
visible spectrum. We now have the equation
C(λ) = x(λ)X + y(λ)Y + z(λ)Z
to represent all visible colors.
Figure 2.50 CIE XYZ color matching functions x(λ), y(λ), and z(λ) (tristimulus values
plotted against wavelength λ in nanometers, from 350 to 750 nm; all three functions
remain positive)
x′(λ) = x(λ) / (x(λ) + y(λ) + z(λ))
y′(λ) = y(λ) / (x(λ) + y(λ) + z(λ))
z′(λ) = z(λ) / (x(λ) + y(λ) + z(λ))
In this way, any two of the color components give us the third one. For example,
x′(λ) = 1 − y′(λ) − z′(λ)
x′(λ), y′(λ), and z′(λ) are called the chromaticity values. Figure 2.51 shows
where the chromaticity values fall within the CIE three-dimensional space. Let s(λ) be a
parametric function defined as follows:
s(λ) = (x′(λ), y′(λ), z′(λ))
Because we have stipulated that x′(λ) + y′(λ) + z′(λ) = 1, this function must lie on the
X + Y + Z = 1 plane. We also know that the values for X, Y, and Z are always positive
for this function because it was defined that way. Thus, we need only look at the area in
the positive octant where X + Y + Z = 1, represented by the triangle in the figure. The
curve traced on this plane shows the values of s(λ) for the pure spectral colors in the
visible spectrum. These are fully saturated colors at unit energy. The colors in the
interior of this curve on the X + Y + Z = 1 plane are still at unit energy, but not fully
saturated.
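The normalization that produces chromaticity values is a one-line computation per component; a sketch:

```python
def chromaticity(x, y, z):
    """Normalize XYZ tristimulus values so the three components sum to 1,
    projecting the color onto the X + Y + Z = 1 plane (energy is discarded)."""
    total = x + y + z
    return (x / total, y / total, z / total)
```

Since the results always sum to 1, any two chromaticity values determine the third, which is why the chromaticity diagram can be drawn in two dimensions.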
Figure 2.51 Chromaticity values within the CIE three-dimensional space
To picture the curve projected onto the X + Y + Z = 1 plane, imagine standing at
the origin of the X, Y, and Z axes and looking up at the CIE color space, imagining it
casting a shadow (from either side, up or down) onto the X + Y + Z = 1 plane. This
shadow is precisely s(λ). In Figure 2.52, s(λ) is the finely-dotted line, the X + Y + Z = 1
plane is a triangle drawn with solid lines, and the projection of s(λ) onto the X + Y + Z = 1
plane is a horseshoe-shaped coarsely dotted line, which forms the perimeter of the cone
seen in Figure 2.53. The cone shown in Figure 2.53 represents the visible colors in CIE
space. (Actually, this cone extends even beyond the X + Y + Z = 1 plane, as there is no
specific maximum energy.) The horseshoe-shaped outline from Figure 2.52 is then
projected onto the X-Y plane. The 2D projection on the XY plane is called the CIE
Chromaticity Diagram (Figure 2.54). In this two-dimensional diagram, we have a space in which to
compare the gamuts of varying color models. However, we have said that color is
inherently a three-dimensional phenomenon in that it requires three values for its
specification. We must have dropped some information in this two-dimensional
depiction. The information left out here is energy. Recall that we have normalized the
chromaticity functions so that they combine for “unit energy.” Unit energy is just some
fixed level, the only one we consider as we compare gamuts within the CIE diagram, but
it’s sufficient for the comparison.
Figure 2.52 Visible color spectrum projected onto the X+Y+Z=1 plane
[Figure: chromaticity diagram plotted on the x axis (0.0 to 0.8) and y axis (0.0 to 0.9), with 19 labeled regions: 1. Illuminant Area, 2. Green, 3. Blue Green, 4. Green Blue, 5. Blue, 6. Blue Purple, 7. Purple, 8. Purple Red, 9. Red Purple, 10. Red, 11. Purple Pink, 12. Pink, 13. Orange Pink, 14. Red Orange, 15. Orange, 16. Yellow Orange, 17. Yellow, 18. Yellow Green, 19. Green Yellow]
Figure 2.54 CIE chromaticity diagram
Aside: Not all visible colors are represented in the CIE chromaticity diagram. Color perceptions that depend in part on the luminance of a color are absent from the diagram – brown, for example, which is an orange-red of low luminance.
Figure 2.55 shows the gamuts for the RGB vs. the CMYK color space.
Note that for each, we must be assuming particular wavelengths for RGB and
CMY. Thus, the RGB gamut could be the gamut for a specific computer monitor, given the wavelengths that it uses for its pure red,
green, and blue. For any given choice of R, G, and B, these primary colors can be located
in the CIE chromaticity diagram by the x and y values given below. For example, a
reasonable choice of R, G, and B would be located at these positions in the CIE diagram:
      R     G     B
x   0.64  0.30  0.15
y   0.33  0.60  0.06
According to these values, the color red lies at 0.64 along the horizontal axis in the CIE
diagram and 0.33 along the vertical axis.
The gamut for RGB color is larger than the CMYK gamut. However, neither
color space is entirely contained within the other, which means that there are colors that
you can display on the computer monitor that cannot be printed, and vice versa. In
practice this is not a big problem. The range of colors in each color model is large, the
gradations from one color to the next are sufficiently fine, and usually media
creators do not require an exact reproduction of their chosen colors from one display
device to the next. However, where exact fidelity is aesthetically or commercially
desirable, users need to be aware of the limits of color gamuts.
[Figure: chromaticity diagram plotted on the x axis (0.0 to 0.8) and y axis (0.0 to 0.9), with the RGB and CMYK gamuts overlaid on the same 19 labeled regions: 1. Illuminant Area, 2. Green, 3. Blue Green, 4. Green Blue, 5. Blue, 6. Blue Purple, 7. Purple, 8. Purple Red, 9. Red Purple, 10. Red, 11. Purple Pink, 12. Pink, 13. Orange Pink, 14. Red Orange, 15. Orange, 16. Yellow Orange, 17. Yellow, 18. Yellow Green, 19. Green Yellow]
Figure 2.55 RGB vs. CMYK gamuts
saturated the color is. For example, point A is closer to illuminant C than to point B.
Thus, it is a pastel or light green.
For a pixel represented in XYZ color, let the values for the
three color components be X, Y, and Z. Then the equivalent
R, G, and B color components in the RGB color model are
given by
\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} 3.24 & -1.54 & -0.50 \\ -0.97 & 1.88 & 0.04 \\ 0.06 & -0.20 & 1.06 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\]
Conversion from RGB to CIE-XYZ uses the inverse of the above matrix.
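As a sketch of how this conversion might be coded, the following applies the matrix above with NumPy. Note that because the coefficients are rounded to two decimals, the round trip through the inverse is only approximate relative to the exact CIE matrices; the function names are our own:

```python
import numpy as np

# XYZ-to-RGB matrix, with the rounded coefficients given in the text.
M = np.array([[ 3.24, -1.54, -0.50],
              [-0.97,  1.88,  0.04],
              [ 0.06, -0.20,  1.06]])

def xyz_to_rgb(xyz):
    return M @ np.asarray(xyz)

def rgb_to_xyz(rgb):
    # Conversion in the other direction uses the inverse of the matrix.
    return np.linalg.inv(M) @ np.asarray(rgb)

rgb = xyz_to_rgb([0.4, 0.3, 0.2])
xyz = rgb_to_xyz(rgb)  # recovers approximately (0.4, 0.3, 0.2)
```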
complementary primary colors, from which all other colors in their gamuts are derived.
Different computer monitors or printers can use different values for R, G, and B, and thus
their gamuts are not necessarily identical. The RGB and CMYK models are not
comprehensive, either. Regardless of the choice of primary colors in either model, there
will exist colors visible to humans that cannot be represented.
The development of the CIE XYZ color model overcame these disadvantages, but
there was still room for improvement. A remaining disadvantage of the CIE XYZ model
is that it is not perceptually uniform. In a perceptually uniform color space, the distance
between two points is directly proportional to the perceived difference between the two
colors. A color space that is perceptually uniform is easier to work with at an intuitive
level, since colors (as they look to the human eye) change at a rate that is proportional to
changes in the values representing the colors.
It is possible for a color model to be perceptually uniform in one dimension but
not perceptually uniform in its three dimensions taken together. For example, in the HSV
color model, hues change at a steady rate as you rotate from 0 to 360 degrees around the
plane denoting hue. Similarly, brightness varies steadily up and down the brightness
axis. However, equidistant movement through the three combined planes does not result
in equal perceived color changes.
The Commission Internationale de l’Eclairage continued to refine its color model
and by 1976 produced the CIE L*a*b* and CIE L*u*v* models. CIE L*a*b* is a
subtractive color model in which the L* axis gives brightness values varying from 0 to
100, the a* axis moves from red (positive values) to green (negative values), and the b* axis
moves from yellow (positive values) to blue (negative values). CIE L*u*v* is an
additive color model similarly constructed to achieve perceptual uniformity, but it proved
less convenient in practical use.
Because the CIE L*a*b* model can represent all visible colors, it is generally
used as the intermediate color model in conversions between other color models, as
described in Section 2.4.
other colors; another might sacrifice color similarities for the saturation, or vividness, of
the overall image. Dot gain is a matter of the way in which wet ink spreads as it is
applied to paper, and how this may affect the appearance of an image. The color
management policy created for an image can be saved and shared with other images. It is
also embedded in individual image files and is used to communicate color, expressed in
terms of the device-independent CIE color space, from one device and application
program to another.
Most of us who work with digital image processing don’t need to worry very
much about color management systems. Your computer monitor has default settings and
is initially calibrated such that you may not find it necessary to recalibrate it, and the
default color management policy in your image processing program may be sufficient for
your purposes. However, it is good to know that these tools exist so that you know where
to turn if precise color management becomes important.
the vocabulary of draftspersons. Before computers were used for CAD design of
airplanes, automobiles, and the like, drafters used a flexible metal strip called a spline to
trace out smooth curves along designated points on the drafting table. Metal splines
provided smooth, continuous curves that now can be modeled mathematically on
computers. Some sources make a distinction between curves and splines, reserving the
latter term for curves that are interpolated (rather than approximated) through points and
that have a high degree of continuity (as in natural cubic splines and B-splines). Other
sources use the terms curve and spline interchangeably. We will distinguish between
Hermite and Bézier curves on the one hand and natural cubic splines on the other
following the terminology used in most sources on vector graphics. We restrict our
discussion to Hermite and Bézier curves.
Equation 2.6
It is sometimes convenient to represent the parametric equations in matrix form.
This gives us:
\[
P(t) = \begin{bmatrix} x(t) & y(t) \end{bmatrix}
     = \begin{bmatrix} t^3 & t^2 & t & 1 \end{bmatrix}
       \begin{bmatrix} a_x & a_y \\ b_x & b_y \\ c_x & c_y \\ d_x & d_y \end{bmatrix},
\quad 0 \le t \le 1
\]
or, in short,
\[
P = T * C
\]
where
\[
T = \begin{bmatrix} t^3 & t^2 & t & 1 \end{bmatrix}
\quad \text{and} \quad
C = \begin{bmatrix} a_x & a_y \\ b_x & b_y \\ c_x & c_y \\ d_x & d_y \end{bmatrix}
  = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
\]
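Evaluating P = T * C at a given t is then a small dot product per coordinate. A sketch, where the coefficient values below are hypothetical, chosen so that x(t) = t and y(t) = t²:

```python
def curve_point(t, C):
    """Evaluate P(t) = T * C, where T = [t^3, t^2, t, 1] and C is the
    4x2 coefficient matrix [[ax, ay], [bx, by], [cx, cy], [dx, dy]]."""
    T = [t**3, t**2, t, 1]
    x = sum(T[i] * C[i][0] for i in range(4))
    y = sum(T[i] * C[i][1] for i in range(4))
    return (x, y)

# Hypothetical coefficients giving x(t) = t and y(t) = t^2.
C = [[0, 0], [0, 1], [1, 0], [0, 0]]
print(curve_point(0.5, C))  # (0.5, 0.25)
```

To draw the curve, t is stepped through small intervals of [0, 1] and each resulting point is plotted.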
So let's think about where this is leading us. We define control points to be
points that are chosen to define a curve. We want to be able to define a curve by
selecting a number of control points on the computer display. In general, n + 1 control
points make it possible to model a curve with an nth-degree polynomial. Cubic
polynomials have been shown to be good for modeling curves in vector graphics, so our
n will be 3, yielding a 3rd degree polynomial like the one in Equation 2.6. What we need
now is an algorithm that can translate the control points into coefficients a x , bx , c x , d x ,
a y , b y , c y , and d y , which are then encoded in the vector graphics file. That is, we want
to get matrix C. When the curve is drawn, the coefficients are used in the parametric
equation, and t is varied along small discrete intervals to construct the curve.
Given a set of points, what's the right or best way to connect them into a curve?
As you may have guessed, there's no single right way at all. A number of different
methods have been devised for translating points into curves, each with its own
advantages and disadvantages depending on the application or work environment.
Curve-generating algorithms can be divided into two main categories: interpolation
algorithms and approximation algorithms. An interpolation algorithm takes a set of
control points and creates a curve that runs directly through all of the points. An
approximation algorithm creates a curve that does not necessarily pass through all the
control points. Approximation algorithms are sometimes preferable in that they allow the
user to move a single control point and alter just one part of the curve without affecting
the rest of it. This is called local control. Hermite curves and natural cubic splines are
based on interpolation algorithms. Bézier curves are based on an approximation
algorithm. We'll focus on Bézier curves here since they are the basis for curve-drawing
tools in commonly used vector graphics environments.
The general strategy for deriving the coefficients for the parametric equations
from the control points is this: To derive an nth degree parametric equation, you must
find n + 1 coefficients, so you need n + 1 equations in n + 1 unknowns. In the case of a
cubic polynomial, you need four equations in four unknowns. These equations are
formulated from constraints on the control points. For Bézier curves, for example, two of
the control points are constrained to form tangents to the curve. Let's see how this can be
derived.
\[
3a \cdot 0^2 + 2b \cdot 0 + c = (p_1 - p_0)/(1/3 - 0)
\quad \therefore \quad c = 3(p_1 - p_0)
\]
Similarly, P′(1) is the slope of the line segment from p₂ to p₃, yielding
\[
3a \cdot 1^2 + 2b \cdot 1 + c = (p_3 - p_2)/(1 - 2/3)
\quad \therefore \quad 3a + 2b + c = 3(p_3 - p_2)
\]
The constraints that p₀ is the first point on the curve and p₃ is the last are stated as
\[
a \cdot 0^3 + b \cdot 0^2 + c \cdot 0 + d = p_0
\quad \therefore \quad d = p_0
\]
\[
a \cdot 1^3 + b \cdot 1^2 + c \cdot 1 + d = p_3
\quad \therefore \quad a + b + c + d = p_3
\]
These four constraint equations in matrix form are
\[
\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 3 & 2 & 1 & 0 \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} =
\begin{bmatrix} p_0 \\ p_3 \\ 3(p_1 - p_0) \\ 3(p_3 - p_2) \end{bmatrix}
\]
Solving for C, the coefficient vector, we get the form C = A⁻¹ * G:
\[
C = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} =
\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 3 & 2 & 1 & 0 \end{bmatrix}^{-1}
\begin{bmatrix} p_0 \\ p_3 \\ 3(p_1 - p_0) \\ 3(p_3 - p_2) \end{bmatrix}
\]
Thus
\[
C = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} =
\begin{bmatrix} 2 & -2 & 1 & 1 \\ -3 & 3 & -2 & -1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} p_0 \\ p_3 \\ 3(p_1 - p_0) \\ 3(p_3 - p_2) \end{bmatrix}
\]
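The inversion can be checked numerically. A quick NumPy sketch confirming that the inverse of the constraint matrix is the matrix stated in the derivation:

```python
import numpy as np

# The constraint matrix A from the four constraint equations.
A = np.array([[0, 0, 0, 1],
              [1, 1, 1, 1],
              [0, 0, 1, 0],
              [3, 2, 1, 0]], dtype=float)

# The inverse claimed in the derivation.
expected = np.array([[ 2, -2,  1,  1],
                     [-3,  3, -2, -1],
                     [ 0,  0,  1,  0],
                     [ 1,  0,  0,  0]], dtype=float)

A_inv = np.linalg.inv(A)
print(np.allclose(A_inv, expected))  # True
```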
A Bézier curve is defined by control points p₀, p₁, p₂, and p₃ and the equation
C = M * G, where
\[
M = \begin{bmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
G = \begin{bmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \end{bmatrix}
\]
Equation 2.7
M is called the basis matrix. G is called the geometry matrix. The basis matrix and
geometry matrix together characterize a type of curve drawn by means of particular
control points based on constraints on these points – in this case, in the manner used to
define a Bézier curve.
With M and G, we can derive another convenient representation for Bézier
curves, in terms of the curve's blending functions, as they are called. Recall that
T = \begin{bmatrix} t^3 & t^2 & t & 1 \end{bmatrix}. A Bézier curve is then defined by the cubic polynomial equation
P(t) = T * M * G
The blending functions are given by T * M . That is,
\[
P(t) = (T * M) * G = (1-t)^3 p_0 + 3t(1-t)^2 p_1 + 3t^2(1-t) p_2 + t^3 p_3
\]
(You should verify this by doing the multiplication and simplifying T * M.) The
multipliers (1-t)³, 3t(1-t)², 3t²(1-t), and t³ are the blending functions for Bézier
curves and can be viewed as "weights" for each of the four control points. They are also
referred to as Bernstein polynomials.
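One way to do that verification is numerically rather than by hand. The sketch below compares T * M * G against the Bernstein form at several values of t, using one-dimensional control values (the same check applies to the x and y coordinates separately; the control values are arbitrary):

```python
import numpy as np

# Bézier basis matrix M from Equation 2.7.
M = np.array([[-1,  3, -3, 1],
              [ 3, -6,  3, 0],
              [-3,  3,  0, 0],
              [ 1,  0,  0, 0]], dtype=float)

# Arbitrary one-dimensional control values standing in for p0..p3.
G = np.array([0.0, 1.0, 3.0, 2.0])

def via_matrix(t):
    T = np.array([t**3, t**2, t, 1.0])
    return T @ M @ G

def via_bernstein(t):
    return ((1 - t)**3 * G[0] + 3*t*(1 - t)**2 * G[1]
            + 3*t**2*(1 - t) * G[2] + t**3 * G[3])

# The two formulations agree at every sampled t.
print(all(abs(via_matrix(t) - via_bernstein(t)) < 1e-9
          for t in [0.0, 0.25, 0.5, 0.75, 1.0]))  # True
```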
A more general formulation of Bézier curves allows for polynomials of any
degree n and describes the curves in terms of the blending functions. It is as follows:
\[
P(t) = \sum_{k=0}^{n} p_k \, \mathrm{blending}_{k,n}(t) \quad \text{for } 0 \le t \le 1
\]
In the equation, blending_{k,n}(t) refers to the blending functions, where n is the degree of
the polynomial used to define the Bézier curve, and k refers to the "weight" for the kth
term in the polynomial. blending_{k,n}(t) is defined as follows:
\[
\mathrm{blending}_{k,n}(t) = C(n,k) \, t^k (1-t)^{n-k}
\]
and C(n,k) is defined in turn as
\[
C(n,k) = \frac{n!}{k!(n-k)!}
\]
In the case of the cubic polynomials that we have examined above, n = 3, and the
blending functions are
\[
\mathrm{blending}_{0,3} = (1-t)^3
\]
\[
\mathrm{blending}_{1,3} = 3t(1-t)^2
\]
\[
\mathrm{blending}_{2,3} = 3t^2(1-t)
\]
\[
\mathrm{blending}_{3,3} = t^3
\]
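The general formulation translates almost directly into code. A sketch using Python's math.comb for C(n, k); the function names are our own:

```python
from math import comb

def blending(k, n, t):
    """Bernstein blending function: C(n,k) * t^k * (1-t)^(n-k)."""
    return comb(n, k) * t**k * (1 - t)**(n - k)

def bezier_point(t, points):
    """Evaluate a degree-(len(points)-1) Bézier curve at t by summing
    the control points weighted by their blending functions."""
    n = len(points) - 1
    return sum(p * blending(k, n, t) for k, p in enumerate(points))

# The weights always sum to 1, so the curve is a weighted average of the
# control points; at t = 0 and t = 1 it interpolates the two endpoints.
print(bezier_point(0.0, [5.0, 1.0, 2.0, 7.0]))  # 5.0
print(bezier_point(1.0, [5.0, 1.0, 2.0, 7.0]))  # 7.0
```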
One of the easiest ways to picture a cubic Bézier curve is algorithmically. The de
Casteljau algorithm shows how a Bézier curve is constructed recursively from the four
control points. Consider points p₀, p₁, p₂, and p₃ shown in Figure 2.57. Let \overline{p_m p_n}
denote a line segment between p_m and p_n. First we find the midpoint p₀₁ of \overline{p_0 p_1}, the
midpoint p₁₂ of \overline{p_1 p_2}, and the midpoint p₂₃ of \overline{p_2 p_3}. We then find the midpoint p₀₁₂ of
\overline{p_{01} p_{12}} and the midpoint p₁₂₃ of \overline{p_{12} p_{23}}. Finally, we draw \overline{p_{012} p_{123}} and find its
midpoint p₀₁₂₃. This point will be on the Bézier curve. The same procedure is repeated,
based on initial points p₀, p₀₁, p₀₁₂, and p₀₁₂₃ on one side and p₀₁₂₃, p₁₂₃, p₂₃, and p₃ on the
other. This goes on recursively – theoretically down to infinite detail or, more
practically, down to pixel resolution – thereby tracing out the Bézier curve. Analyzing
this recursive procedure mathematically is another method for deriving the Bernstein
polynomials.
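The de Casteljau construction is easy to express recursively. A sketch (the tuple point format and the fixed subdivision depth are our choices; a renderer would instead recurse down to pixel resolution):

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def de_casteljau(p0, p1, p2, p3, depth):
    """Recursively subdivide at midpoints, returning points that lie
    on the Bézier curve (including the two endpoints), in order."""
    if depth == 0:
        return [p0, p3]
    p01, p12, p23 = midpoint(p0, p1), midpoint(p1, p2), midpoint(p2, p3)
    p012, p123 = midpoint(p01, p12), midpoint(p12, p23)
    p0123 = midpoint(p012, p123)        # this midpoint is on the curve
    left = de_casteljau(p0, p01, p012, p0123, depth - 1)
    right = de_casteljau(p0123, p123, p23, p3, depth - 1)
    return left + right[1:]             # drop the duplicated shared point

pts = de_casteljau((0, 0), (0, 1), (1, 1), (1, 0), depth=4)
print(len(pts))  # 17, i.e., 2**4 segments plus one endpoint
```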
[Figure 2.57 Panels A, B, and C: stepwise recursive construction of a Bézier curve from control points p0, p1, p2, and p3, showing midpoints p01, p12, p23, p012, p123, and p0123]
Figure 2.57 shows the stepwise creation of a Bézier curve as it is done using the
pen tool in programs such as Illustrator or GIMP. We'll have to adjust our definition of
control points in order to describe how curves are drawn in these application programs.
The control points defined in the Bézier equation above are p0, p1, p2, and p3. We need
to identify another point that we'll call px. To make a curve defined by the mathematical
control points p0, p1, p2, and p3, you click on four physical control points p0, p1, p3, and
px as follows:
• Click at point p0, hold the mouse down, and drag to point p1.
• Release the mouse button.
• Move the cursor to your desired endpoint p3, click, and pull to create a "handle" ending at px.
[Figure: four example curves drawn this way, each labeled with the points p0, p1, p2, p3, and px]
Figure 2.58 Examples of Bézier curves drawn in, for example, Illustrator or GIMP.
abstraction. However, if you're a computer programmer, you can also generate bitmaps
and graphical images directly by writing programs at a lower level of abstraction. To
"hand-generate" and work with bitmaps, you need to be adept at array manipulation and
bit operations. To work at a low level of abstraction with vector graphics, you need to
understand the mathematics and algorithms for line drawing, shading, and geometric
transformations such as rotation, displacement, and scaling. (These are topics covered in
graphics courses.) In this book, we emphasize your work with high-level tools for
photographic, sound, and video processing; vector drawing; and multimedia
programming. However, we try to give you the knowledge to descend into your image
and sound files at a low level of abstraction when you want to have more creative control.
There is a third type of digital image that we will call algorithmic art. (It is also
referred to as procedural modeling.) In algorithmic art, you create a digital image by
writing a computer program based on some mathematical computation or unique type of
algorithm, but the focus is different from the type of work we've described so far. In
vector graphics (assuming that you're working at a low level of abstraction), you write
programs to generate lines, three-dimensional images, colors, and shading. To do this,
you first picture and then mathematically model the objects that you are trying to create.
In algorithmic art, on the other hand, your focus is the mathematics or algorithm rather
than on some preconceived object, and you create an image by associating pixels with the
results of the calculation. The image emerges naturally as a manifestation of the
mathematical properties of the calculation or algorithm.
One of the best examples of algorithmic art is fractal generation. A fractal is a
graphical image characterized by a recursively repeating structure. This means that if
you look at the image at the macro-level – the entire structure – you'll see a certain
geometric pattern, and then when you zoom in on a smaller scale, you'll see that same
pattern again. For fractals, this self-embedded structure can be repeated infinitely.
Fractals exist in natural phenomena. A fern has the structure of a fractal in that
the shape of one frond of a fern is repeated in each of the small side-leaves of the frond.
Similarly, a cauliflower's shape is repeated in each sub-cluster down to several levels of
detail.
You can create a fractal with a recursive program that draws the same shape down
to some base level. This is an example of algorithmic art because the self-replicating
structure of the image results from the recursive nature of the algorithm. The Koch
snowflake, named for Swedish mathematician Helge von Koch, is an example of a fractal
structure that can be created by means of a recursive program. Here's an explanation of
how you can draw one by hand. First, draw an equilateral triangle. Then draw another
triangle of the same size, rotate it 180°, and align the two triangles at their center points.
You've created a six-pointed star. Now, consider each point of the star as a separate
equilateral triangle, and do the same thing for each of these. That is, create another
triangle of the same size as the point of the star, rotate it, and align it with the first.
You've just created six more stars, one for each point of the original star. For each of
these stars, do the same thing that you did with the first star. You can repeat this to
whatever level of detail you like (or until you can't add any more detail because of the
resolution of your picture). If you fill in all the triangles with a single color, you have a
Koch snowflake.
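The hand construction above overlays rotated triangles; programs typically build the same outline by recursively subdividing each edge instead, erecting a bump on its middle third. A sketch under that assumption (the function names are ours, and the vertex order is assumed counterclockwise so the bumps face outward):

```python
from math import sqrt

def koch_subdivide(points):
    """One refinement step: replace each edge of the closed polygon with
    four edges, erecting an equilateral bump on its middle third.
    Assumes counterclockwise vertex order so bumps face outward."""
    out = []
    n = len(points)
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        a = (x0 + dx / 3, y0 + dy / 3)          # one-third point
        b = (x0 + 2 * dx / 3, y0 + 2 * dy / 3)  # two-thirds point
        # Apex of the bump: edge midpoint pushed outward by sqrt(3)/6 of the edge length.
        apex = (x0 + dx / 2 + dy * sqrt(3) / 6,
                y0 + dy / 2 - dx * sqrt(3) / 6)
        out.extend([(x0, y0), a, apex, b])
    return out

# Start from an equilateral triangle and refine twice.
snowflake = [(0.0, 0.0), (1.0, 0.0), (0.5, sqrt(3) / 2)]
for _ in range(2):
    snowflake = koch_subdivide(snowflake)
print(len(snowflake))  # 48 vertices: 3 * 4**2
```

Each refinement multiplies the number of edges by four, which is why the level of detail grows so quickly.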
Complex numbers have a real and an imaginary number component. For c, we will call
these components cr and ci, and for z the components are zr and zi.
\[
c = c_r + c_i i \quad \text{and} \quad z = z_r + z_i i, \quad \text{where } i = \sqrt{-1}
\]
To create a Mandelbrot fractal, you relate values of c on the complex number plane to
pixel positions on the image bitmap, and you use these as initial values in computations
that determine the color of each pixel. cr corresponds to the horizontal axis on the plane,
and ci corresponds to the vertical axis. The range of −2.0 to 2.0 in the horizontal
direction and −1.5 to 1.5 in the vertical direction works well for the complex number
plane. Let's assume that we're going to create a fractal bitmap that has dimensions of
1024 × 768 pixels. Given these respective dimensions for the two planes, each pixel
position (x,y) is mapped proportionately to a complex number (c r , ci ) .
x ∈ [0, 1023] maps to c_r ∈ [−2.0, 2.0], and y ∈ [0, 767] maps to c_i ∈ [−1.5, 1.5].
After the first computation, the output of the ith computation is the input z to the (i+1)st
computation. If, after some given maximum number of iterations, the computation has
not revealed itself as being unbounded, then the pixel is painted black. Otherwise, the
pixel is painted a color related to how many iterations were performed before it was
discovered that the computation was unbounded. The result is a Mandelbrot fractal like
the ones shown in Figure 2.63.
Algorithm 2.5 describes the Mandelbrot fractal calculation. We have not shown
in this pseudo-code how complex numbers are handled, nor have we been specific about
the termination condition. For details on this, see the related programming assignment.
algorithm mandelbrot_fractal
/* Input: Horizontal and vertical ranges of the complex number plane.
   Resolution of the bitmap for the fractal.
   Color map.
   Output: A bitmap for a fractal. */
{
    /* constant MAX is the maximum number of iterations */
    for each pixel in the bitmap {
        map pixel coordinates (x, y) to complex number plane coordinates (cr, ci); let c = cr + ci*i
        num_iterations = 0
        z = 0
        while num_iterations < MAX and not_unbounded* {
            z = z^2 + c
            num_iterations = num_iterations + 1
        }
        /* map_color uses the programmer's chosen color map to determine the color of each
           pixel based on how many iterations are done before the computation is found to be
           unbounded */
        if num_iterations = MAX then color(x, y) = BLACK
        else color(x, y) = map_color(num_iterations)
    }
}
/* *We have not defined not_unbounded here. For an explanation of the termination
   condition and computations using complex numbers, see the programming assignment
   related to this section. */
Algorithm 2.5
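Algorithm 2.5 can be sketched in Python, whose built-in complex type handles the complex arithmetic. We fill in the not_unbounded test with the standard escape condition |z| > 2 (once |z| exceeds 2, the iteration is guaranteed to diverge); the function names and bitmap size are our own choices:

```python
def mandelbrot_value(cr, ci, max_iter=100):
    """Iterate z = z^2 + c and return the iteration count at which
    |z| first exceeds 2, or max_iter if it never does ('in the set')."""
    c = complex(cr, ci)
    z = 0
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter

def mandelbrot_bitmap(width, height, max_iter=100):
    """Map each pixel to [-2, 2] x [-1.5, 1.5] on the complex plane and
    record its iteration count; max_iter corresponds to painting black."""
    return [[mandelbrot_value(-2.0 + 4.0 * x / (width - 1),
                              -1.5 + 3.0 * y / (height - 1), max_iter)
             for x in range(width)]
            for y in range(height)]

bitmap = mandelbrot_bitmap(64, 48)
print(mandelbrot_value(0.0, 0.0))  # 100: the origin never escapes
```

A color map would then translate each count below max_iter into a pixel color.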
A variation of the Mandelbrot fractal, called the Julia fractal, can be created by
relating z rather than c to the pixel’s position and appropriately selecting a value for c,
which remains constant for all pixels. Values for c that create interesting-looking Julia
fractals can be determined experimentally, by trial and error. Each different constant c
creates a fractal of a different shape.
Figure 2.64 Three Julia fractals using different starting values for c and different color maps
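As a sketch of the Julia variation in Python: the pixel supplies the starting z, and c is held constant. The standard escape test |z| > 2 stands in for the unboundedness check, and the constant below is only an illustrative value of the kind found by trial and error:

```python
def julia_value(zr, zi, c=complex(-0.8, 0.156), max_iter=100):
    """Julia variation of the Mandelbrot iteration: z starts at the
    pixel's position and c is a fixed, experimentally chosen constant
    (this particular c is just an illustrative choice)."""
    z = complex(zr, zi)
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter
```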
2.9 Exercises
1. a. What type of values would you expect for the DCT of the enlarged 8 pixel × 8
pixel image below (i.e., where do you expect nonzero values)? The grayscale values are
given in the matrix. Explain your answer.
Figure 2.65
b. Compute the values. You can use computer help (e.g., write a program, use
MATLAB, etc.)
2. Say that your 1-CCD camera detects the following RGB values in a 3 pixel × 3 pixel
area. What value would it record for the three pixels that are in boldface, assuming the
nearest neighbor algorithm is used? (Give the R, G, and B values for these pixels.)
Figure 2.66
2.10 Applications
1. Examine the specifications on your digital camera (or the one you would like to have),
and answer the following questions:
• Does your camera allow you to choose various pixel dimension settings? What
settings are offered?
• How many megapixels does the camera offer? What does this mean, and how
does it relate to pixel dimensions?
• What type of storage medium does the camera use? How many pictures can you
take on your storage medium? This will depend on the size of the storage
medium and the size of the pictures. Explain.
• How can you get your pictures from the camera to your computer?
• What technology does your camera use for detecting color values?
2. What are the pixel dimensions of your computer display? Can you change them? If
so, change the pixel dimensions and describe what you observe about the images on your
display. Explain.
3. Examine the specifications of your printer (or the one you would like to have). Is it
inkjet? Laser? Some other type? What is its resolution? How does this affect the
choices you make for initial pixel dimensions and final resolution of digital images that
you want to print?
Examine the features of your digital image processing program, vector graphic (i.e.,
draw), and/or paint programs and try the exercises below with features that are
available.
4. Open the applications program or programs and examine the color models that they
offer you. Which of the color models described in this chapter are offered in the
application program? Are there other color models available in the application program
that are not discussed in this chapter? Examine the interface – sometimes called a "color
chooser." Look at the Help of your application program. How is color saved by the
application program? If you specify color in HLS rather than RGB, for example, using
the color chooser, is the internal representation changing, or just the values you use to
specify the color in the interface?
5. Do some experiments with the color chooser of your image processing, paint, or draw
program in RGB mode to see if you can understand what is meant by the statement "the
RGB color space is not perceptually uniform."
2.11 References
2.11.1 Print Publications
Briggs, John. Fractals: The Patterns of Chaos. New York: Simon & Schuster/A
Touchstone Book, 1992.
Foley, James D., Steven K. Feiner, John F. Hughes, and Andries Van Dam. Computer
Graphics: Principles and Practice. 2nd ed. Boston: Addison-Wesley, 1996.
Hearn, Donald, and M. Pauline Baker. Computer Graphics with OpenGL. 3rd ed. Upper
Saddle River, NJ: Pearson/Prentice-Hall, 2003.
Hill, F. S., Jr. Computer Graphics Using OpenGL. 2nd ed. Upper Saddle River, NJ:
Prentice-Hall, 2001.
Hofstadter, Douglas R. Gödel, Escher, Bach: An Eternal Golden Braid. New York:
Basic Books, 1979. (Reprinted with a new preface in 1999.)
Livio, Mario. The Golden Ratio: The Story of Phi, The World's Most Astonishing
Number. New York: Broadway Books, 2002.
Pickover, Clifford. The Pattern Book: Fractals, Art, and Nature. Singapore: World
Scientific, 1995.