Introduction to Computer Graphics: A Practical Learning Approach
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
To Reni
F. Ganovelli
To my parents
S. Pattanaik
To my family
M. Di Benedetto
Contents

Preface xxxi
4 Geometric Transformations 91
  4.1 Geometric Entities 91
  4.2 Basic Geometric Transformations 92
    4.2.1 Translation 92
    4.2.2 Scaling 93
    4.2.3 Rotation 93
    4.2.4 Expressing Transformation with Matrix Notation 94
  4.3 Affine Transformations 96
    4.3.1 Composition of Geometric Transformations 97
    4.3.2 Rotation and Scaling about a Generic Point 98
    4.3.3 Shearing 99
    4.3.4 Inverse Transformations and Commutative Properties 100
  4.4 Frames 101
    4.4.1 General Frames and Affine Transformations 102
    4.4.2 Hierarchies of Frames 102
    4.4.3 The Third Dimension 103
  4.5 Rotations in Three Dimensions 104
    4.5.1 Axis–Angle Rotation 105
      4.5.1.1 Building Orthogonal 3D Frames from a Single Axis 106
      4.5.1.2 Axis–Angle Rotations without Building the 3D Frame 106
    4.5.2 Euler Angles Rotations 108
    4.5.3 Rotations with Quaternions 110
  4.6 Viewing Transformations 111
    4.6.1 Placing the View Reference Frame 111
    4.6.2 Projections 112
      4.6.2.1 Perspective Projection 112
      4.6.2.2 Perspective Division 114
      4.6.2.3 Orthographic Projection 114
    4.6.3 Viewing Volume 115
      4.6.3.1 Canonical Viewing Volume 116
7 Texturing 217
  7.1 Introduction: Do We Need Texture Mapping? 217
  7.2 Basic Concepts 218
    7.2.1 Texturing in the Pipeline 220
  7.3 Texture Filtering: from per-Fragment Texture Coordinates to per-Fragment Color 220
    7.3.1 Magnification 221
    7.3.2 Minification with Mipmapping 222
  7.4 Perspective Correct Interpolation: From per-Vertex to per-Fragment Texture Coordinates 225
  7.5 Upgrade Your Client: Add Textures to the Terrain, Street and Building 227
    7.5.1 Accessing Textures from the Shader Program 229
  7.6 Upgrade Your Client: Add the Rear Mirror 230
    7.6.1 Rendering to Texture (RTT) 232
  7.7 Texture Coordinates Generation and Environment Mapping 234
    7.7.1 Sphere Mapping 235
      7.7.1.1 Computation of Texture Coordinates 236
      7.7.1.2 Limitations 236
    7.7.2 Cube Mapping 236
    7.7.3 Upgrade Your Client: Add a Skybox for the Horizon 238
    7.7.4 Upgrade Your Client: Add Reflections to the Car 239
      7.7.4.1 Computing the Cubemap on-the-fly for More Accurate Reflections 240
    7.7.5 Projective Texture Mapping 241
  7.8 Texture Mapping for Adding Detail to Geometry 242
    7.8.1 Displacement Mapping 243
    7.8.2 Normal Mapping 243
      7.8.2.1 Object Space Normal Mapping 244
    7.8.3 Upgrade Your Client: Add the Asphalt 245
    7.8.4 Tangent Space Normal Mapping 246
      7.8.4.1 Computing the Tangent Frame for Triangulated Meshes 247
  7.9 Notes on Mesh Parametrization 249
    7.9.1 Seams 250
    7.9.2 Quality of a Parametrization 252
  7.10 3D Textures and Their Use 254
  7.11 Self-Exercises 254
    7.11.1 General 254
    7.11.2 Client 255
8 Shadows 257
  8.1 The Shadow Phenomenon 257
  8.2 Shadow Mapping 259
    8.2.1 Modeling Light Sources 260
      8.2.1.1 Directional Light 260
      8.2.1.2 Point Light 260
      8.2.1.3 Spotlights 261
  8.3 Upgrade Your Client: Add Shadows 262
    8.3.1 Encoding the Depth Value in an RGBA Texture 263
  8.4 Shadow Mapping Artifacts and Limitations 266
    8.4.1 Limited Numerical Precision: Surface Acne 266
      8.4.1.1 Avoid Acne in Closed Objects 267
    8.4.2 Limited Shadow Map Resolution: Aliasing 268
      8.4.2.1 Percentage Closer Filtering (PCF) 268
  8.5 Shadow Volumes 269
    8.5.1 Constructing the Shadow Volumes 271
    8.5.2 The Algorithm 272
  8.6 Self-Exercises 273
    8.6.1 General 273
    8.6.2 Client Related 273

Bibliography 363

Index 367

List of Figures
5.14 Stenciling example: (Left) The rendering from inside the car. (Middle) The stencil mask, that is, the portion of screen that does not need to be redrawn. (Right) The portion that is affected by rendering. 156
5.15 Results of back-to-front rendering of four polygons. A and C have α = 0.5, B and D have α = 1, and the order, from the closest to the farthest, is A, B, C, D. 157
5.16 (Top-Left) A detail of a line rasterized with DDA rasterization. (Top-Right) The same line with Average Area antialiasing. (Bottom) Results. 158
5.17 Exemplifying drawings for the cabin. The coordinates are expressed in clip space. 160
5.18 Adding the view from inside. Blending is used for the upper part of the windshield. 162
5.19 Scheme for the Cohen-Sutherland clipping algorithm. 163
5.20 Scheme for the Liang-Barsky clipping algorithm. 164
5.21 Sutherland-Hodgman algorithm. Clipping a polygon against a rectangle is done by clipping on its four edges. 165
5.22 (a) If a normal points toward −z in view space this does not imply that it does the same in clip space. (b) The projection of the vertices on the image plane is counter-clockwise if and only if the triangle is front-facing. 166
5.23 (Left) A bounding sphere for a street lamp: easy to test for intersection but with high chances of false positives. (Right) A bounding box for a street lamp: in this case we have little empty space but we need more operations to test the intersection. 168
5.24 Example of a two-level hierarchy of Axis-Aligned Bounding Boxes for a model of a car, obtained by splitting the bounding box along two axes. 169
B.1 Dot product. (Left) a′ and a″ are built from a by swapping the coordinates and negating one of the two. (Right) Length of the projection of b on the vector a. 360
B.2 Cross product. (Top-Left) The cross product of two vectors is perpendicular to both and its magnitude is equal to the area of the parallelogram built on the two vectors. (Top-Right) The cross product to compute the normal of a triangle. (Bottom) The cross product to find the orientation of three points on the XY plane. 361
List of Listings

Preface
There are plenty of books on computer graphics. Most of them are at the beginner level, where the emphasis is on teaching a graphics API to create pretty pictures. There are also quite a number of higher-level books specializing in narrow areas of computer graphics, for example global illumination, geometric modeling and non-photorealistic rendering. However, there are few books that cover computer graphics fundamentals in detail together with the physical principles behind realistic rendering, so that they are suitable for a broad audience: from beginner- to senior-level computer graphics classes, to those who wish to pursue an ambitious career in a computer graphics-related field and/or to carry out research in computer graphics. Also, there are few books addressing theory and practice as the same body of knowledge. We believe that there is a need for such graphics books, and in this book we have strived to address this need.
The central theme of the book is real-time rendering, that is, the interactive visualization of three-dimensional scenes. Around this theme we progressively cover a wide range of topics, from basic to intermediate level. For each topic, the basic mathematical concepts and/or physical principles are explained, and the relevant methods and algorithms are derived. The book also covers modeling, from polygonal representations to NURBS and subdivision surface representations.
It is almost impossible to teach computer graphics without hands-on examples and interaction. Thus, it is not an accident that many chapters of the book come with examples. What makes our book special is that it follows a teaching-in-context approach: all the examples have been designed for developing a single, large application, providing a context in which to put the theory into practice. The application we have chosen is a car racing game where the driver controls the car moving on the track. The example starts with no graphics at all, and we add a little bit of graphics with each chapter; at the end we will have something close to what one expects from a classic video game.
The book has been designed for a relatively wide audience. We assume a basic knowledge of calculus and some previous skills with a programming language. Even though the book contains a wide range of topics from basic to advanced, the reader will develop the required expertise beyond the basics as he or she progresses through the chapters. Thus, we believe that both beginner- and senior-level computer graphics students will be its primary audience.
Chapter 1
What Computer Graphics Is
when you are writing your business-planning presentation, graphics help you to summarize trends and other information in an easy-to-understand way.
type are: texture synthesis, which deals with the generation of visual patterns of surfaces such as the bricks of a wall, clouds in the sky, skin, facades of buildings, etc.; intelligent cut-and-paste, an image editing operation where the user selects a part of interest of an image, modifies it by interactively moving it, and integrates it into the surroundings of another part of the same or another image; media retargeting, which consists of changing an image so as to optimize its appearance for a specific medium. A classic example is how to crop and/or extend an image to show a movie originally shot in a cinematographic 2.39 : 1 format (the usual notation of the aspect ratio x : y means that the ratio between the width and the height of the image is x/y) to the more TV-like 16 : 9 format.
3D Scanning: The process of converting real world objects into a digital representation that can be used in a CG application. Many devices and algorithms have been developed to acquire the geometry and the visual appearance of a real object.
Geometric Modeling: Geometric modeling concerns the modeling of the
3D object used in the CG application. The 3D models can be generated
manually by an expert user with specific tools or semi-automatically
by specifying a sketch of the 3D object on some photos of it assisted
by a specific drawing application (this process is known as image-based
modeling).
Geometric Processing: Geometric processing deals with all the algorithms used to manipulate the geometry of the 3D object. The 3D object can be simplified, reducing the level of detail of the geometry component; improved, by removing noise from its surface or other topological anomalies; re-shaped to account for certain characteristics; converted into different types of representation, as we will see in Chapter 3; and so on. Many of these techniques are related to the field of computational geometry.
Animation and Simulation: This area concerns all the techniques and al-
gorithms used to animate a static 3D model, ranging from the techniques
to help the artist to define the movement of a character in a movie to
the real-time physical simulation of living organs in a surgery simulator.
Much of the work in this area is rooted in the domain of mechanical en-
gineering, from where complex algorithms have been adapted to run on
low-end computers and in real time, often trading accuracy of physical
simulation for execution speed.
Computational Photography: This area includes all the techniques em-
ployed to improve the potential of digital photography and the quality
of digitally captured images. This CG topic spans optics, image pro-
cessing and computer vision. It is a growing field that has allowed us
to produce low-cost digital photographic devices capable of identifying
faces, refocusing images, automatically creating panoramas, capturing
The light receptors on the retina are of two types: rods and cones. Rods
are capable of detecting very small amounts of light, and produce a signal
that is interpreted as monochromatic. Imagine that one is observing the stars
during the night: rods are in use at that moment. Cones are less sensitive to
light than rods, but they are our color receptors. During the day, light inten-
sities are so high that rods get saturated and become nonfunctional, and that
is when cone receptors come into use. There are three types of cones: They
are termed long (L), medium (M) and short (S) cones depending on the part
of the visible spectrum to which they are sensitive. S cones are sensitive to
the lower part of the visible light wavelengths, M cones are sensitive to the
middle wavelengths of the visible light and L cones are sensitive to the upper
part of the visible spectrum. When the cones receive incident light, they pro-
duce signals according to their sensitivity and the intensity of the light, and
send them to the brain for interpretation. The three different cones produce
three different signals, which gives rise to the trichromacy nature of color (in
this sense human beings are trichromats). Trichromacy is the reason why dif-
ferent color stimuli may be perceived as the same color. This effect is called
metamerism. Metamerism can be distinguished into illumination metamerism
when the same color is perceived differently when the illumination changes,
and observer metamerism when the same color stimulus is perceived differ-
ently by two different observers.
The light receptors (rods, cones) do not have a direct specific individ-
ual connection to the brain but groups of rods and cones are interconnected
to form receptive fields. Signals from these receptive fields reach the brain
through the optic nerve. This interconnection influences the results of the sig-
nals produced by the light receptors. Three types of receptive fields can be distinguished: black-white, red-green and yellow-blue. These three receptive fields are called opponent channels. It is interesting to point out that the black-white channel is the signal that has the highest spatial resolution on the retina; this is the reason why human eyes are more sensitive to brightness changes in an image than to color changes. This property is exploited in image compression, where color information is compressed more aggressively than luminance information.
completely dark room, then the white wall will reflect the additive combination
of the three colors towards our eye. All the colors can be obtained by properly
adjusting the intensity of the red, green and blue light.
With the subtractive color model the stimulus is generated by subtracting
the wavelengths from the light incident on the reflector. The most well known
example of use is the cyan, magenta, yellow and key (black) model for printing.
Assume that we are printing on a white paper and we have a white light. If we
add no inks to the paper, then we will see the paper as white. If we put cyan
ink on the paper then the paper will look cyan because the ink absorbs the
red wavelengths. If we also add yellow onto it then the blue wavelengths will
also be absorbed and the paper will look green. Finally, if we add the magenta
ink we will have a combination of inks that will absorb all the wavelengths so
the paper will look black. By modulating the amount of each primary ink or
color we put on paper in theory we can express every color of the spectrum.
So why the black ink? Since neither the ink nor the paper is ideal, in the
real situation you would not actually obtain black, but just a very dark color
and hence black ink is used instead. So in general, a certain amount of black
is used to absorb all the wavelengths, and this amount is the minimum of
the three primary components of the color. For example, if we want the color
(c, m, y) the printer will combine:
K = min(c, m, y)        (1.1)
C = c − K
M = m − K
Y = y − K               (1.2)
Figure 1.2 (Left) shows an example of additive primaries while Figure 1.2
(Right) shows a set of subtractive primaries.
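A minimal sketch of this conversion in JavaScript (the function name is illustrative):

// Convert a CMY color, with components in [0,1], to CMYK following
// Equations (1.1) and (1.2): K = min(c,m,y), C = c-K, M = m-K, Y = y-K.
function cmyToCmyk(c, m, y) {
  var k = Math.min(c, m, y);
  return { c: c - k, m: m - k, y: y - k, k: k };
}

var ink = cmyToCmyk(0.2, 0.7, 0.9); // roughly { c: 0, m: 0.5, y: 0.7, k: 0.2 }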
FIGURE 1.3: (Top) CIE 1931 RGB color matching functions (r̄(λ), ḡ(λ), b̄(λ)). (Bottom) CIEXYZ color matching functions (x̄(λ), ȳ(λ), z̄(λ)).
The plots on the bottom of Figure 1.3 show the CIEXYZ primaries x̄(λ), ȳ(λ) and z̄(λ). These matching functions are a transformed version of the CIERGB color matching functions, chosen in such a way that they are all positive; this simplifies the design of devices for color reproduction. The equation to transform the CIERGB color space to the CIEXYZ color space is the following:
[ X ]   [ 0.4887180  0.3106803  0.2006017 ] [ R ]
[ Y ] = [ 0.1762044  0.8129847  0.0108109 ] [ G ]        (1.4)
[ Z ]   [ 0.0000000  0.0102048  0.9897952 ] [ B ]
The normalized version of the XYZ color coordinates can be used to define
the so-called chromaticity coordinates:
x = X / (X + Y + Z)    y = Y / (X + Y + Z)    z = Z / (X + Y + Z)        (1.5)
These coordinates do not completely specify a color since they always sum to one; hence only two of them are independent, which means a two-dimensional representation. Typically, Y is added together with x and y to fully define the trichromatic color space called xyY. The chromaticity coordinates
x and y just mentioned are usually employed to visualize a representation
of the colors as in the chromaticities diagram shown in Figure 1.4 (on the
left). Note that even the ink used in professional printing is not capable of
reproducing all the colors of the chromaticities diagram.
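As an illustration, a small JavaScript sketch (function names are illustrative) that applies the matrix of Equation (1.4) and then Equation (1.5):

// CIERGB tristimulus values to CIEXYZ, Equation (1.4).
function rgbToXyz(r, g, b) {
  return {
    x: 0.4887180 * r + 0.3106803 * g + 0.2006017 * b,
    y: 0.1762044 * r + 0.8129847 * g + 0.0108109 * b,
    z: 0.0000000 * r + 0.0102048 * g + 0.9897952 * b
  };
}

// Chromaticity coordinates, Equation (1.5); z is redundant (z = 1 - x - y).
function chromaticity(xyz) {
  var sum = xyz.x + xyz.y + xyz.z;
  return { x: xyz.x / sum, y: xyz.y / sum };
}

For example, chromaticity(rgbToXyz(1, 1, 1)) gives x = y = 1/3, the equal-energy white point mentioned in Section 1.2.3.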
color spaces with the CIERGB color matching functions described in the previous section. CIERGB is a system of color matching functions that can represent any existing color, while a given RGB color space is a color system that uses RGB additive primaries combined in some way to physically reproduce a certain color. Depending on the particular RGB system, different colors can be reproduced according to its gamut. See color insert for the color version.
FIGURE 1.5 (SEE COLOR INSERT): HSL and HSV color space.
1.2.2.4 CIELab
CIELab is another color space defined by the CIE in 1976, with very
interesting characteristics. The color coordinates of such systems are usually
indicated with L∗ , a∗ and b∗ . L∗ stands for lightness and a∗ and b∗ identify
12 Introduction to Computer Graphics: A Practical Learning Approach
the chromaticity of the color. The peculiarity of this color space is that the
distance between colors computed as the Euclidean distance:
ΔLab = √((L*1 − L*2)² + (a*1 − a*2)² + (b*1 − b*2)²)        (1.11)
is correlated very well with human perception. In other words, while the dis-
tance between two colors in other color spaces cannot be perceived propor-
tionally (for example, distant colors can be perceived as similar colors), in the
CIELab color space, near colors are perceived as similar colors and distant
colors are perceived as different colors.
The equations to convert a color in CIEXYZ color space to a color in
CIELab color space are:
L* = 116 f(Y/Yn) − 16
a* = 500 (f(X/Xn) − f(Y/Yn))        (1.12)
b* = 200 (f(Y/Yn) − f(Z/Zn))

where Xn, Yn and Zn are the CIEXYZ tristimulus values of the reference white, which depend on the illuminant assumed (see Section 1.2.3).
We can see that this formulation is quite complex, reflecting the complex
matching between the perception of the stimulus and the color coordinates
expressed in the CIEXYZ system.
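A direct transcription of Equation (1.11) in JavaScript (the color objects and the function name are illustrative):

// Euclidean distance between two CIELab colors, each given as { L, a, b }.
function deltaLab(c1, c2) {
  var dL = c1.L - c2.L;
  var da = c1.a - c2.a;
  var db = c1.b - c2.b;
  return Math.sqrt(dL * dL + da * da + db * db);
}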
1.2.3 Illuminant
In the previous section we have seen that the conversion between CIEXYZ
and CIELab depends on the particular illuminant assumed. This is always true
when we are talking about color conversion between different color spaces.
The different lighting conditions are standardized by the CIE by publishing
the spectrum of the light source assumed. These standard lighting conditions
are known as standard illuminants. For example, Illuminant A corresponds to
an average incandescent light, Illuminant B corresponds to the direct sunlight,
and so on. The color tristimulus values associated with an illuminant are
called white point, that is, the chromaticity coordinates of how a white object
appears under this light source. For example, the conversion (1.4) between
CIERGB and CIEXYZ is defined such that the white point of the CIERGB
is x = y = z = 1/3 (Illuminant E, equal energy illuminant).
This is the reason why color conversion between different color spaces
needs to take into account the illuminant assumed in their definition. It is
also possible to convert between different illuminants. For an extensive list of
formulas to convert between different color spaces under different illumination
conditions, you can take a look at the excellent Web site of Bruce Lindbloom
What Computer Graphics Is 13
1.2.4 Gamma
Concerning the display of colors, once you have chosen a color system
to represent the colors, the values of the primaries have to be converted to
electrical signals according to the specific display characteristics in order to
reproduce the color.
In a CRT monitor, typically, the RGB intensity of the phosphors is a
nonlinear function of the voltage applied. More precisely, the power law for a
CRT monitor is the applied voltage raised to 2.5. More generally,
I = V^γ        (1.14)

where V is the voltage in volts and I is the light intensity. This equation is also valid for other types of displays. The numerical value of the exponent is known as gamma.
This concept is important because this nonlinearity must be taken into account when we want our display to reproduce a certain color with high fidelity; this is the purpose of gamma correction.
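A small sketch of the two directions of this relation (variable names are illustrative):

var gamma = 2.5; // a typical CRT-like exponent, as discussed above

// Intensity actually emitted by the display, Equation (1.14).
function displayedIntensity(v) {
  return Math.pow(v, gamma);
}

// Gamma correction: pre-distort the value so that the display emits i.
function gammaCorrect(i) {
  return Math.pow(i, 1.0 / gamma);
}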
FIGURE 1.6: Example of specification of a vector image (left) and the cor-
responding drawing (right).
for the definition of the image, for example paths (such as curved or straight lines), basic shapes (such as open or closed polylines, rectangles, circles and ellipses), text encoded in Unicode, colors, different gradients/patterns to fill shapes, and so on.
PS stands for PostScript and it is a programming language for printing
illustrations and text. It was originally defined by John Warnock and Charles
Geschke in 1982. In 1984 Adobe released the first laser printer driven by the
first version of PostScript language and nowadays Adobe PostScript 3 is the
de-facto worldwide standard for printing and imaging systems. Since it has
been designed to handle the high quality printing of pages it is more than a
language to define vector graphics; for example, it allows us to control the ink
used to generate the drawing.
FIGURE 1.8: A grayscale image. (Left) The whole picture with a highlighted
area whose detail representation (Right) is shown as a matrix of values.
scalar components (typically three) that identify the color of that image's location.
The length of the vector used to define a pixel defines the number of
image channels of the image. A color image represented by the RGB color
space has three channels; the red channel, the green channel and the blue
channel. This is the most common representation of a color image. An image
can have more than three components; additional components are used to
represent additional information, or for certain types of images like multi-
spectral images where the color representation requires more than three values
to represent multiple wavelength bands. One of the main uses of four-channel
images is to handle transparency. The transparency channel is usually called
alpha channel . See Figure 1.9 for an example of an image with a transparent
background.
In the comparison of raster images with vector images, the resolution plays
an important role. The vector images may be considered to have infinite res-
olution. In fact, a vector image can be enlarged simply by applying a scale
factor to it, without compromising the quality of what it depicts. For exam-
ple, it is possible to make a print of huge size without compromising its final
reproduction quality. Instead, the quality of a raster image depends heavily
on its resolution: as shown in Figure 1.10, a high resolution is required to
draw smooth curves well, which is natural for a vector image. Additionally, if
the resolution is insufficient, the pixels become visible (again see Figure 1.10).
This visual effect is called pixellation. On the other hand, a vector image has
severe limitations to describe a complex image like a natural scene. In this
and similar cases, too many primitives of very small granularity are required
for a good representation and hence a raster image is a more natural represen-
tation of this type of image. This is the reason why vector images are usually
employed to design logos, trademarks, stylized drawings, diagrams, and other
similar things, and not for natural images or images with rich visual content.
FIGURE 1.10: Vector vs raster images. (Left) A circle and a line assem-
bled to form a “9.” (From Left to Right) The corresponding raster images at
increased resolution.
FIGURE 1.11: A schematic concept of ray tracing algorithm. Rays are shot
from the eye through the image plane and intersections with the scene are
found. Each time a ray collides with a surface it bounces off the surface and
may reach a light source (ray r1 after one bounce, ray r2 after two bounces).
where Int(oi) is the cost of testing the intersection of a ray with the object oi. However, it is possible to adopt acceleration data structures that mostly reduce Σ_{i=0}^{m} Int(oi) to O(log(m)) operations.
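To make the linear term concrete, a naive closest-hit query tests the ray against every object; acceleration structures exist precisely to avoid this loop. A sketch, assuming each object exposes a hypothetical intersect(ray) method returning a hit distance or null:

// Naive O(m) closest-hit search: its cost is the sum of Int(o_i) over all
// objects, which acceleration structures reduce to roughly O(log(m)).
function closestHit(ray, objects) {
  var best = null;
  for (var i = 0; i < objects.length; i++) {
    var t = objects[i].intersect(ray);
    if (t !== null && (best === null || t < best.t)) {
      best = { t: t, object: objects[i] };
    }
  }
  return best;
}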
it looks just like it is passing the user input to the next stage. In older schemes
of the rasterization-based pipeline you may actually find it collapsed with the
rasterization stage and/or under the name of primitive assembly. Nowadays,
this stage not only passes the input to the rasterization stage, but it may
also create new primitives from the given input; for example it may take one
triangle and output many triangles obtained by subdividing the original one.
The rasterization stage converts points, lines and triangles to their raster
representation and interpolates the value of the vertex attributes of the prim-
itive being rasterized. The rasterization stage marks the passage from a world
made by points, lines and polygons in 3D space to a 2D world made of pix-
els. However, while a pixel is defined by its coordinates in the image (or in
the screen) and its color, the pixels produced by the rasterization may also
contain a number of interpolated values other than the color. These “more
informative” pixels are called fragments and are the input to the next stage,
the per-fragment computation. Each fragment is processed by the per-fragment
computation stage that calculates the final values of the fragment’s attributes.
Finally, the last stage of the pipeline determines how each fragment is com-
bined with the current value stored in the framebuffer , that is, the data buffer
that stores the image during its formation, to determine the color of the cor-
responding pixel. This combination can be a blending of the two colors, the
choice of one over the other or simply the elimination of the fragment.
The cost for the rasterization pipeline is given by processing (that is, transforming) all the nv vertices and rasterizing all the geometric primitives:

cost(rasterization) = Ktr nv + Σ_{i=0}^{m} Ras(pi)        (1.16)

where Ktr is the cost of transforming one vertex and Ras(pi) is the cost of rasterizing a primitive.
With ray tracing we may use any kind of surface representation, provided that we are able to test its intersection with a ray. With the rasterization pipeline, every surface is ultimately discretized with a number of geometric primitives, and discretization brings approximation. As a simple practical example, consider a sphere. With ray tracing we have a precise intersection (up to the numerical precision of the machine's finite arithmetic) of the view ray with an analytic description of the sphere; with rasterization, the sphere is approximated with a number of polygons.
Ray tracing is simple to implement, although hard to make efficient, while even the most basic rasterization pipeline requires several algorithms to be implemented.

Chapter 2
The First Steps
In this chapter we deal with the first practical steps that will accompany us for the rest of the book.
First we will show how to set up our first working rendering inside an HTML page by using WebGL. More precisely, we will draw a triangle. We warn you that at first it may look like you need to learn an awful lot of information for such a simple task as drawing a triangle, but what we do in this simple example is the same as what we would do for a huge project. Simply put, you need to learn to drive a car whether you are going 10 miles or 100 miles.
Then we will introduce the EnvyMyCar (NVMC) framework that we will use for the rest of this book to put into use the theory we learn along the way. Briefly, NVMC is a simple car racing game where the gamer controls a car moving on a track. What makes NVMC different from a classic video game is that there is no graphics at all, because we are in charge of developing it. Given a complete description of the scene (where the car is, how it is oriented, what its shape is, how the track is made, etc.) we will learn, here and during the rest of the book, how to draw the elements and the effects we need to obtain, at the end, a visually pleasant car racing game.
like the pipeline stage name. A vertex shader takes as input n general attributes, coming from the VP, and gives as output m general attributes, plus a special one, namely the vertex position. The resulting transformed vertex is passed to the next stage (the primitive assembler). For example, to visualize the temperature of the Earth's surface, a vertex shader could receive as input three scalar attributes, representing a temperature, a latitude and a longitude, and output two 3D attributes, representing an RGB color and a position in space.
Primitive Assembler (PA). The WebGL rasterization pipeline can only draw three basic geometric primitives: points, line segments, and triangles. The primitive assembler, as the name suggests, is in charge of collecting the adequate number of vertices coming from the VS, assembling them into a t-uple, and passing it to the next stage (the rasterizer). The number of vertices (t) depends on the primitive being drawn: 1 for points, 2 for line segments, and 3 for triangles. When an API draw command is issued, we specify which primitive we want the pipeline to draw, and thus configure the primitive assembler.
Rasterizer (RS). The rasterizer stage receives as input a primitive consisting of t transformed vertices, and calculates which pixels it covers. It uses a special vertex attribute that represents the vertex position to identify the covered region; then, for each pixel, it interpolates the m attributes of each vertex and creates a packet, called a fragment, containing the associated pixel position and the m interpolated values. Each assembled fragment is then sent to the next stage (the Fragment Shader). For example, if we draw a segment whose vertices have an associated color attribute, e.g., one red and one green, the rasterizer will generate several fragments that, altogether, resemble a line: fragments near the first vertex will be assigned a reddish color that becomes yellow at the segment midpoint and then turns green while approaching the second vertex.
Output Combiner (OC). The last stage of the geometry pipeline is the output combiner. Before the pixels coming out of the fragment shader are written to the framebuffer, the output combiner executes a series of configurable tests that can depend both on the incoming pixel data and on the data already present in the framebuffer at the same pixel location. For example, an incoming pixel could be discarded (e.g., not written) if it is not visible. Moreover, after the tests have been performed, the actual color written can be further modified by blending the incoming value with the existing one at the same location.
Framebuffer Operations (FO). A special component of the rendering ar-
chitecture is dedicated to directly access the framebuffer. The frame-
buffer operations component is not part of the geometry pipeline, and
it is used to clear the framebuffer with a particular color, and to read
back the framebuffer content (e.g., its pixels).
All the stages of the WebGL pipeline can be configured by using the corresponding API functions. Moreover, the VS and FS stages are programmable, i.e., we write programs that they will execute on their inputs. For this reason, such a system is often referred to as a programmable pipeline, in contrast to a fixed-function pipeline that does not allow the execution of custom code.
In this example it is important that the code we write will not be executed
before the page loading has completed. Otherwise, we would not be able to
access the canvas with document.getElementById() simply because the canvas
tag has not been parsed yet and thus could not be queried. For this reason we
must be notified by the browser whenever the page is ready; by exploiting the
native and widely pervasive use of object events in a Web environment, we
accomplish our task by simply registering the helloDraw function as the page
load event handler, as shown on line 17 in Listing 2.2.
1 // global variables
2 var gl = null; // the rendering context
3
4 function setupWebGL() {
5   var canvas = document.getElementById("OUTPUT-CANVAS");
6   gl = canvas.getContext("webgl");
7 }
LISTING 2.3: Setting up WebGL.
The first thing to do is to obtain a reference to the HTMLCanvasElement object: this is done on line 5, where we ask the global document object to retrieve an element whose identifier (its id) is OUTPUT-CANVAS, using the method getElementById. As you can infer, the canvas variable is now referencing the canvas element on line 8 in Listing 2.1. Usually, the canvas will provide the rendering context with a framebuffer that contains a color buffer consisting of four 8-bit channels, namely RGBA, plus a depth buffer whose precision would be 16 to 24 bits, depending on the host device. It is important to say that the alpha channel of the color buffer will be used by the browser as a transparency factor, meaning that the colors written when using WebGL will be overlaid on the page in compliance with the HTML specifications.
Now we are ready to create a WebGLRenderingContext. The method get-
Context of the canvas object, invoked with the string webgl as its single argu-
ment, creates and returns the WebGL context that we will use for rendering.
For some browsers, the string webgl has to be replaced with experimental-
webgl. Note that there is only one context associated with each canvas: the
first invocation of getContext on a canvas causes the context object to be cre-
ated and returned; every other invocation will simply return the same object.
On line 6 we store the created context to the gl variable. Unless otherwise
specified, throughout the code in this book the identifier gl will be always and
only used for a variable referencing a WebGL rendering context.
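A common way to cope with browsers that only expose the experimental name is to try both strings; this variant of setupWebGL is only a sketch:

function setupWebGL() {
  var canvas = document.getElementById("OUTPUT-CANVAS");
  gl = canvas.getContext("webgl") ||
       canvas.getContext("experimental-webgl");
  if (!gl) {
    alert("WebGL is not available on this browser.");
  }
}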
1 var triangle = {
2   vertexPositions: [
3     [0.0, 0.0], // 1st vertex
4     [1.0, 0.0], // 2nd vertex
5     [0.0, 1.0]  // 3rd vertex
6   ]
7 };
LISTING 2.4: A triangle in JavaScript.
The triangle variable refers to an object with a single property named vertexPositions. In turn, vertexPositions refers to an array of three elements, one for each vertex. Each element stores the x and y coordinates of a vertex with an array of two numbers. Although the above representation is clear from a design point of view, it is not very compact in terms of occupied space and data access pattern. To achieve the best performance, we must represent the triangle in a rawer form, as shown in Listing 2.5.

FIGURE 2.2: Illustration of the mirroring of arrays from the system memory, where they can be accessed with JavaScript, to the graphics memory.
1 var positions = [
2   0.0, 0.0, // 1st vertex
3   1.0, 0.0, // 2nd vertex
4   0.0, 1.0  // 3rd vertex
5 ];
LISTING 2.5: A triangle represented with an array of scalars.
As you can notice, the triangle is now represented with a single array of six numbers, where each number pair represents the two-dimensional coordinates of a vertex. Nonetheless, storing the attributes of the vertices that compose a geometric primitive (that is, the positions of the three vertices that form a triangle as in the above example) in a single array of numbers, coordinate after coordinate and vertex after vertex, is exactly the layout WebGL requires whenever geometric data has to be defined.
Now we have to take a further step to convert the data into a lower-level representation. Since JavaScript arrays do not represent a contiguous chunk of memory and, moreover, are not homogeneous (i.e., elements can have different types), they cannot be directly delivered to WebGL, which expects a raw, contiguous region of memory. For this reason, the WebGL specifications lead the way to the definition of new JavaScript objects for representing contiguous and strongly-typed arrays. The typed array specification defines a series of such objects, e.g., Uint8Array (unsigned, 8-bit integers) and Float32Array (32-bit floating points), that we will use for creating the low-level version of our native JavaScript array. The following code constructs a 32-bit floating-point typed array from a native array:
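The statement below is the same that appears at line 12 of Listing 2.6, where positions is the scalar array defined above:

var typedPositions = new Float32Array(positions);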
Alternatively, we could have filled the typed array directly, without passing through a native array:
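One way of doing so, sketched here, is to pass the scalar values straight to the Float32Array constructor:

var typedPositions = new Float32Array([
  0.0, 0.0, // 1st vertex
  1.0, 0.0, // 2nd vertex
  0.0, 1.0  // 3rd vertex
]);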
The data, laid out as above, is now ready to be mirrored (for example by
creating an internal WebGL copy) and encapsulated in a WebGL object. As
mentioned above, WebGL uses its own counterpart of a native data structure:
in this case, a JavaScript typed array containing vertex attributes is mirrored
by a WebGLBuffer object:
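The mirroring step corresponds to lines 14–16 of Listing 2.6: a WebGLBuffer is created, bound to the ARRAY_BUFFER target and filled with the content of the typed array.

var positionsBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, positionsBuffer);
gl.bufferData(gl.ARRAY_BUFFER, typedPositions, gl.STATIC_DRAW);

We then have to tell the pipeline which attribute slot this buffer feeds and how its content is laid out. The three lines discussed below do exactly that (they correspond to lines 3, 18 and 19 of Listing 2.6):

1 var positionAttribIndex = 0;
2 gl.enableVertexAttribArray(positionAttribIndex);
3 gl.vertexAttribPointer(positionAttribIndex, 2, gl.FLOAT, false, 0, 0);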
At line 1, we store the index of the selected slot in a global variable, to be used later. Selecting index zero is completely arbitrary: we could choose whichever indices we want (in case of multiple attributes). At line 2, with the method enableVertexAttribArray, we tell the context that the vertex attribute at slot positionAttribIndex (zero) has to be fetched from an array of values, meaning that we will latch a vertex buffer as the attribute data source. At last, we must specify to the context which is the data type of the attribute and how to fetch it. The method vertexAttribPointer at line 3 serves this purpose; using a C-like syntax, its prototype is:
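The signature below follows the WebGL specification; the parameter names are those used in the discussion that follows:

void vertexAttribPointer(GLuint index, GLint size, GLenum type,
                         GLboolean normalized, GLsizei stride,
                         GLintptr offset);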
The index parameter is the attribute index that is being specified; size represents the dimensionality of the attribute (a two-dimensional vector in our case); type is a symbolic constant indicating the attribute scalar type (gl.FLOAT, i.e., floating-point number); normalized is a flag indicating whether an attribute with an integral scalar type must be normalized (more on this later; at any rate, the value here is ignored because the attribute scalar type is not an integer type); stride is the number of bytes between the beginning of an item in the vertex attribute stream and the beginning of the next one (zero means that there are no gaps, that is, the attribute is tightly packed, two floats one after another); offset is the offset in bytes from the beginning of the WebGLBuffer currently bound to the ARRAY_BUFFER target (in our case, positionsBuffer) to the beginning of the first attribute in the array (zero means that our position data starts immediately at the beginning of the memory buffer).
The complete shape setup code is assembled in Listing 2.6.
1  // global variables
2  // ...
3  var positionAttribIndex = 0;
4
5  function setupWhatToDraw() {
6    var positions = [
7      0.0, 0.0, // 1st vertex
8      1.0, 0.0, // 2nd vertex
9      0.0, 1.0  // 3rd vertex
10   ];
11
12   var typedPositions = new Float32Array(positions);
13
14   var positionsBuffer = gl.createBuffer();
15   gl.bindBuffer(gl.ARRAY_BUFFER, positionsBuffer);
16   gl.bufferData(gl.ARRAY_BUFFER, typedPositions, gl.STATIC_DRAW);
17
18   gl.enableVertexAttribArray(positionAttribIndex);
19   gl.vertexAttribPointer(positionAttribIndex, 2, gl.FLOAT, false, 0, 0);
20 }
LISTING 2.6: Complete code to set up a triangle.
The rendering context is now configured to feed the pipeline with the
stream of vertex attributes we have just set up.
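The next step is to create and compile the shaders. A sketch of the vertex shader creation, with the source string assigned on line 1 so that creation, source upload and compilation fall on lines 2–4 discussed below (the same calls appear at lines 11–13 of Listing 2.7):

1 var vsSource = "..."; // GLSL source, given below
2 var vertexShader = gl.createShader(gl.VERTEX_SHADER);
3 gl.shaderSource(vertexShader, vsSource);
4 gl.compileShader(vertexShader);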
As you can notice, once a WebGLShader object is created (line 2), its source code is simply set by passing a native JavaScript string to the method shaderSource() (line 3). Finally, the shader must be compiled (line 4). The GLSL source code vsSource, for this basic example, is:
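As at lines 3–10 of Listing 2.7, it simply forwards the two-dimensional position attribute, extended to a four-component vector:

attribute vec2 aPosition;

void main(void)
{
  gl_Position = vec4(aPosition, 0.0, 1.0);
}

The fragment shader is created and compiled in the same way; a sketch, numbered as in the previous snippet:

1 var fsSource = "..."; // GLSL source, given below
2 var fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);
3 gl.shaderSource(fragmentShader, fsSource);
4 gl.compileShader(fragmentShader);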
The only change is at line 2, where we pass the FRAGMENT_SHADER symbolic constant to the creation method instead of VERTEX_SHADER.
In our first example the fragment shader simply sets the color of each
fragment to blue, as shown in the following GLSL code:
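A sketch matching the description above (Listing 2.7 below uses a green constant instead, but the structure is identical):

void main(void)
{
  gl_FragColor = vec4(0.0, 0.0, 1.0, 1.0); // constant blue, as RGBA
}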
The built-in vec4 output variable gl_FragColor holds the color of the fragment. The vector components represent the red, green, blue and alpha values (RGBA) of the output color, respectively; each component is expressed as a floating-point number in the range [0.0, 1.0].
As illustrated in Figure 2.3, vertex and fragment shaders must be encap-
sulated and linked into a program, represented by a WebGLProgram object:
1  function setupHowToDraw() {
2    // vertex shader
3    var vsSource = "\
4      attribute vec2 aPosition;                    \n\
5                                                   \n\
6      void main(void)                              \n\
7      {                                            \n\
8        gl_Position = vec4(aPosition, 0.0, 1.0);   \n\
9      }                                            \n\
10   ";
11   var vertexShader = gl.createShader(gl.VERTEX_SHADER);
12   gl.shaderSource(vertexShader, vsSource);
13   gl.compileShader(vertexShader);
14
15   // fragment shader
16   var fsSource = "\
17     void main(void)                              \n\
18     {                                            \n\
19       gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);   \n\
20     }                                            \n\
21   ";
22   var fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);
23   gl.shaderSource(fragmentShader, fsSource);
24   gl.compileShader(fragmentShader);
25
26   // program
27   var program = gl.createProgram();
28   gl.attachShader(program, vertexShader);
29   gl.attachShader(program, fragmentShader);
30   gl.bindAttribLocation(program, positionAttribIndex, "aPosition");
31   gl.linkProgram(program);
32   gl.useProgram(program);
33 }
LISTING 2.7: Complete code to program the vertex and the fragment shader.
Having configured the vertex streams and the program that will process
vertices and fragments, the pipeline is now ready for drawing.
Step 4: Draw
We are now ready to draw our first triangle to the screen. This is done by the
following code:
1 function draw() {
2   gl.clearColor(0.0, 0.0, 0.0, 1.0);
3   gl.clear(gl.COLOR_BUFFER_BIT);
4   gl.drawArrays(gl.TRIANGLES, 0, 3);
5 }
At line 2 we define the RGBA color to use when clearing the color buffer (line
3). The call to drawArrays() at line 4 performs the actual rendering, creating
triangle primitives starting from vertex zero and consuming three vertices.
The resulting JavaScript code of all the parts is shown in Listing 2.8 for
recap.
1  // global variables
2  var gl = null;
3  var positionAttribIndex = 0;
4
5  function setupWebGL() {
6    var canvas = document.getElementById("OUTPUT-CANVAS");
7    gl = canvas.getContext("experimental-webgl");
8  }
9
10 function setupWhatToDraw() {
11   var positions = [
12     0.0, 0.0, // 1st vertex
13     1.0, 0.0, // 2nd vertex
14     0.0, 1.0  // 3rd vertex
15   ];
16
     // ...
74 }
75
76 window.onload = helloDraw;
LISTING 2.8: The first rendering example using WebGL.
condition (dry, wet, dusty etc.), and so on. A client may send commands to the
server to control its car, for example TURN LEFT, TURN RIGHT, PAUSE etc.,
and these messages are input to the simulation. At fixed time steps, the server
broadcasts to all the clients the state of the race so that they can render it.
FIGURE 2.5: The class NVMC incorporates all the knowledge about the
world of the race.
Initializing. The method onInitialize is called once per page loading. Here we
will place all the initialization of our graphics resources and data structures
that need to be done once and for all. Its implementation in this very basic
client is reported in Listing 2.9. The call NVMC.log at line 120 pushes a text
message on the log window appearing below the canvas (see Figure 2.6). We
will use this window to post information about the current version of the
client and as feedback for debugging purposes. Lines 124 to 136 just create a
mapping between the keys W, A, S, D and the action to take when one key is pressed. This mapping will involve more and more keys as we will have more
input to take from the user (for example, switch the headlights on/off). Then
at line 140 we call a function to initialize all the geometric objects we need
in the client, which in this case simply means the example triangle shown in
Listing 2.6 and finally at line 141 we call the function that creates a shader
program.
1 function Triangle() {
2   this.name = "Triangle";
3   this.vertices = new Float32Array([0, 0, 0, 0.5, 0, -1, -0.5, 0, -1]);
4   this.triangleIndices = new Uint16Array([0, 1, 2]);
5   this.numVertices = 3;
6   this.numTriangles = 1;
7 };
LISTING 2.10: The JavaScript object to represent a geometric primitive made of triangles (in this case, a single triangle). (Code snippet from http://envymycarbook.com/chapter2/0/triangle.js.)
Then, we define a function to create the WebGL buffers from these
JavaScript objects, as shown in Listing 2.11.
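Listing 2.11 is the function that creates, for each object, a vertex buffer plus one index buffer for the triangles and one for the edges. A minimal sketch of what such a function needs to do, assuming the member names vertexBuffer, indexBufferTriangles and indexBufferEdges used later in Listing 2.13:

NVMCClient.createObjectBuffers = function (gl, obj) {
  // vertex positions
  obj.vertexBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, obj.vertexBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, obj.vertices, gl.STATIC_DRAW);
  gl.bindBuffer(gl.ARRAY_BUFFER, null);

  // triangle indices
  obj.indexBufferTriangles = gl.createBuffer();
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, obj.indexBufferTriangles);
  gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, obj.triangleIndices, gl.STATIC_DRAW);
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null);

  // edge indices: for every triangle (i, j, k) store the edges (i, j), (j, k), (k, i)
  var edges = new Uint16Array(obj.numTriangles * 3 * 2);
  for (var t = 0; t < obj.numTriangles; ++t) {
    var i = obj.triangleIndices[3 * t + 0];
    var j = obj.triangleIndices[3 * t + 1];
    var k = obj.triangleIndices[3 * t + 2];
    edges.set([i, j, j, k, k, i], 6 * t);
  }
  obj.indexBufferEdges = gl.createBuffer();
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, obj.indexBufferEdges);
  gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, edges, gl.STATIC_DRAW);
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null);
};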
For each triangle (i, j, k) we create the edges (i, j), (j, k) and (k, i). In this implementation we do not care that, if two triangles share an edge, we will end up with two copies of that edge.
In Listing 2.12 we show how the functions above are used for our first client.
63 NVMCClient.initializeObjects = function (gl) {
64   this.triangle = new Triangle();
65   this.createObjectBuffers(gl, this.triangle);
66 };
LISTING 2.12: Creating geometric objects. (Code snippet from http://envymycarbook.com/chapter2/0.js.)
Rendering. In Listing 2.13 we have the function drawObject that actually performs the rendering. The difference from the example in Listing 2.7 is that we render both the triangles and their edges, and we pass the colors to use (as fillColor and lineColor). So far, the only data we passed from our JavaScript code to the shader program were vertex attributes, more specifically the positions of the vertices. This time we also want to pass the color to use. This is global data, meaning that it is the same for all the vertices processed in the vertex shader or for all the fragments of the fragment shader. A variable of this sort must be declared by using the GLSL keyword uniform. Then, when the shader program has been linked, we can query it to know the handle of the variable with the function gl.getUniformLocation (see line 52 in Listing 2.14) and we can use this handle to set its value by using the function gl.uniform (see lines 20 and 25 in Listing 2.13). Note that the gl.uniform function name is followed by a postfix, which indicates the type of the parameters the function takes. For example, 4fv means a vector of 4 floating points, 1i means an integer, and so on.
24
25   gl.uniform4fv(this.uniformShader.uColorLocation, lineColor);
26   gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, obj.indexBufferEdges);
27   gl.drawElements(gl.LINES, obj.numTriangles * 3 * 2, gl.UNSIGNED_SHORT, 0);
28
29   gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null);
30
31   gl.disableVertexAttribArray(this.uniformShader.aPositionIndex);
32   gl.bindBuffer(gl.ARRAY_BUFFER, null);
33 };
LISTING 2.13: Rendering of one geometric object. (Code snippet from http://envymycarbook.com/chapter2/0.js.)
In the NVMC clients the shader programs will be encapsulated in
JavaScript objects, as you can see in Listing 2.14, so that we can exploit
a common interface to access the members (for example, the position of the
vertices will always be called aPositionIndex on every shader we will write).
1  uniformShader = function (gl) {
2    var vertexShaderSource = "\
3      uniform   mat4 uModelViewMatrix;               \n\
4      uniform   mat4 uProjectionMatrix;              \n\
5      attribute vec3 aPosition;                      \n\
6      void main(void)                                \n\
7      {                                              \n\
8        gl_Position = uProjectionMatrix *            \n\
9          uModelViewMatrix * vec4(aPosition, 1.0);   \n\
10     }                                              \n\
11   ";
12
13   var fragmentShaderSource = "\
14     precision highp float;                         \n\
15     uniform vec4 uColor;                           \n\
16     void main(void)                                \n\
17     {                                              \n\
18       gl_FragColor = vec4(uColor);                 \n\
19     }                                              \n\
20   ";
21
22   // create the vertex shader
23   var vertexShader = gl.createShader(gl.VERTEX_SHADER);
24   gl.shaderSource(vertexShader, vertexShaderSource);
25   gl.compileShader(vertexShader);
26
27   // create the fragment shader
28   var fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);
29   gl.shaderSource(fragmentShader, fragmentShaderSource);
30   gl.compileShader(fragmentShader);
31
32   // Create the shader program
33   var aPositionIndex = 0;
34   var shaderProgram = gl.createProgram();
35   gl.attachShader(shaderProgram, vertexShader);
36   gl.attachShader(shaderProgram, fragmentShader);
37   gl.bindAttribLocation(shaderProgram, aPositionIndex, "aPosition");
38   gl.linkProgram(shaderProgram);
39
40   // If creating the shader program failed, alert
41   if (!gl.getProgramParameter(shaderProgram, gl.LINK_STATUS)) {
42     var str = "Unable to initialize the shader program.\n\n";
43     str += "VS:\n" + gl.getShaderInfoLog(vertexShader) + "\n\n";
44     str += "FS:\n" + gl.getShaderInfoLog(fragmentShader) + "\n\n";
45     str += "PROG:\n" + gl.getProgramInfoLog(shaderProgram);
46     alert(str);
47   }
48
49   shaderProgram.aPositionIndex = aPositionIndex;
50   shaderProgram.uModelViewMatrixLocation = gl.getUniformLocation(shaderProgram, "uModelViewMatrix");
51   shaderProgram.uProjectionMatrixLocation = gl.getUniformLocation(shaderProgram, "uProjectionMatrix");
52   shaderProgram.uColorLocation = gl.getUniformLocation(shaderProgram, "uColor");
53
54   return shaderProgram;
55 };
LISTING 2.14: Program shader for rendering. (Code snippet from http://envymycarbook.com/chapter2/0/0.js.)
Interface with the game. The class NVMCClient has a member game that
refers to the class NVMC and hence gives us access to the world both for giving
input to the simulation and for reading information about the scene. In this
particular client we only read the position of the player’s car in the following
line of code (Listing 2.15):
[client number].html, a file named shaders.js for the shaders introduced with the client, one or more JavaScript files containing the code for new geometric primitives introduced within the client, and a file [client number].js containing the implementation of the class NVMCClient.
Note, and this is very important, that each file [client number].js contains only the modifications with respect to the previous client, while in the HTML file we explicitly include the previous versions of the class NVMCClient, so that everything previously defined in each client file will be parsed. This is very handy because it allows us to write only the new parts that enrich our client and/or to redefine existing functions. For example, the function createObjectBuffers will not need to be changed until Chapter 5, and so it will not appear in the code of the clients of Chapter 4. A reader may argue that many functions will be parsed without actually being called because they are overwritten by more recent versions. Even though this is useless processing that slows down the loading time of the Web page, we prefer to proceed this way for didactic purposes. Nothing prevents you from removing overwritten members when a version of the client is finalized.
In contrast, the shader programs are not written incrementally, since
we do not always want to use an improved version of the same shader program;
instead, we will often use several of them in the same client. The same goes
for the geometric objects: in this first client we introduced the Triangle; in the
next chapter we will write the Cube, the Cone and other simple primitives and
we will use them in our future clients.
Chapter 3
How a 3D Model Is Represented
3.1 Introduction
In very general terms, a 3D model is a mathematical representation of a
physical entity that occupies space. In more practical terms, a 3D model is
made of a description of its shape and a description of its color appearance. In
this chapter we focus on geometric data representation, providing an overview
of the most used representations and presenting the peculiarities of each.
One main categorization of 3D object representations considers whether
the surface or the volume of the object is represented:
• Boundary-based: the surface of the 3D object is represented. This
representation is also called b-rep. Polygon meshes, implicit surfaces and
parametric surfaces, which we will describe in the following, are common
representations of this type.
3.1.2 Modeling
An artist or a technician interactively designs the 3D model using geo-
metric modeling software such as Maya® , 3D Studio Max® , Rhinoceros® ,
Blender and others.
3.1.4 Simulation
Numerical simulations are used for several goals. Winds, temperature and
pressure are simulated for weather forecasts; fluid dynamics, that is, the study
of how liquids behave, is used for designing engines, cardiac pumps, vehicles
and so on. Very often the data produced can be naturally mapped onto a
three-dimensional model, either a b-rep, for example the wavefront of a low
pressure area, or a volume, for example the fluid velocity.
In the following we analyze several 3D object representations, underlining
the advantages and the disadvantages of each one.
3.2.2 Manifoldness
A surface is said to be 2-manifold if the neighborhood of each point p
on the surface is homeomorphic to a disk. Simply put, it means that if we
3.2.3 Orientation
Each face is a polygon and hence it has two sides. Let us suppose we
paint one side black and one white. If we can paint every face of the mesh so
that any two faces sharing an edge show the same color on the same side of
the surface, we say the mesh is orientable, and the orientation of a face is how
we assigned black and white to its sides. However, we do not need to actually
paint the faces: we can assign the orientation by the order in which the
vertices are specified. More precisely, if
we look at a face and follow its vertices in the order they are specified in K,
they can describe a clockwise or anti-clockwise movement, like that shown in
Figure 3.4. Obviously if we look at the same faces from behind, that is, from
the back of the page, these orientations will be swapped. We can say that the
black side of a face is the side from which the sequence of its vertices is counter-
clockwise. Note that two faces f1 and f2 have the same orientation if, for each
shared edge, its vertices appear in the opposite order in the description of f1
and f2 .
FIGURE 3.4: Clockwise orientation: f1 = (v1, v2, v3), f2 = (v4, v3, v2). Counter-clockwise orientation: f1 = (v1, v3, v2), f2 = (v4, v2, v3).
Modeling is easier and more natural with other representations; for example,
parametric surfaces such as NURBS (described in Section 3.4) are typically
used to this aim. Finally, there is no obvious parameterization (for meshes take
a look at Section 7.9 to learn more about the concept of parameterization).
Despite these problems the computer graphics community has put a lot of
effort into mesh processing and a huge number of applications use this type of
representation. One of the main reasons for this is that meshes are the com-
mon denominator of other representations, that is, it is easy to convert other
representations to a polygon mesh. Another important motivation is that, as
stated during the description of the rendering pipeline, drawing triangles
on the screen is much simpler and easier to optimize than drawing more complex
shapes, and consequently modern graphics hardware has evolved in this
direction.
where (x, y, z) are Cartesian coordinates. The set S is also known as the zero set
of f(.). For example, a sphere of radius r can be represented by the equation
$x^2 + y^2 + z^2 = r^2$, which becomes $x^2 + y^2 + z^2 - r^2 = 0$ in the canonical form.
A plane in space can be represented by the function $ax + by + cz - d = 0$,
and so on. So, a 3D object more complex than a basic geometric primitive such as a
plane or a sphere can be represented by a set of implicit surface functions,
each one describing a part of its shape.
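As a small illustration (not taken from the book's code base), the sphere equation above can be turned into a function whose sign classifies a point with respect to the surface:

// Implicit sphere of radius r centered at the origin, in canonical form:
// f(x, y, z) = x^2 + y^2 + z^2 - r^2
function sphereImplicit(x, y, z, r) {
  return x * x + y * y + z * z - r * r;
}

// The sign of f classifies a point with respect to the surface.
var f = sphereImplicit(0.5, 0.0, 0.0, 1.0);
if (f === 0.0) console.log("on the surface");
else if (f < 0.0) console.log("inside");    // this case: 0.25 - 1 < 0
else console.log("outside");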
Algebraic surfaces are a particular kind of implicit surface for which f(.)
is a polynomial. The degree of an algebraic surface is the maximum, over all the
terms $a_m x^{i_m} y^{j_m} z^{k_m}$ of the polynomial, of the sum of the exponents $i_m + j_m + k_m$. An algebraic
surface of degree two describes quadratic surfaces, a polynomial of degree
three cubic surfaces, of degree four quartic surfaces and so on. Quadratic
surfaces, also called quadrics, are very important in geometric modeling. This
kind of surface intersects every plane in a proper or degenerate conic, and quadrics
can be classified into 17 standard-form types [40]; among them: parallel planes,
ellipsoid, elliptic cone, elliptic cylinder, parabolic cylinder, hyperboloid of one
sheet and hyperboloid of two sheets.
where t is the curve parameter. Typically t ranges from 0 to 1, with the starting
point of the curve being C(0) and the ending point C(1).
Suppose you want to freely draw a curve and then express it in parametric
form. Aside from trivial cases, finding the formulas for X(t), Y(t) and Z(t)
directly is a prohibitively difficult task. Fortunately, there are ways that allow
us to derive these formulas from an intuitive representation of the curve. For
example we can describe the curve as a sequence of points, called control
points, like those shown in Figure 3.7. We could join these points directly
and obtain a piecewise linear curve, but we can do better and obtain a smooth
curve by introducing a basis of blending functions used to join together the
control points in a smooth way. The blending functions define the properties
of the final curve/surface such as continuity and differentiability, if the curve/
surface is an approximation or an interpolation of the control points, and so
on. We have an interpolation if the curve passes through the control points
(see Figure 3.7 (Left)), and an approximation if the control points guide the
curve but do not necessarily belong to it (see Figure 3.7 (Right)).
The typical formulation of this is:
$$C(t) = \sum_{i=0}^{n} P_i\, B_i(t) \qquad 0 \le t \le 1 \qquad (3.3)$$
where Pi are the control points and {Bi (.)} are the blending functions. The
set of control points is also called the control polygon.
Bézier curves were originally developed for car design by two engineers, both working for French automobile companies:
Pierre Bézier, who was an engineer for Renault, and Paul de Casteljau, who
was an engineer for Citroën. The mathematical definition of a Bézier curve is:
$$P(t) = \sum_{i=0}^{n} P_i\, B_{i,n}(t) \qquad 0 \le t \le 1 \qquad (3.4)$$
where Pi are the control points and Bi,n (.) are Bernstein polynomials of degree
n.
A Bernstein polynomial of degree n is defined as:
$$B_{i,n}(t) = \binom{n}{i}\, t^i (1-t)^{n-i} \qquad i = 0 \ldots n \qquad (3.5)$$
3. They partition unity (their sum is always one), that is, $\sum_{i=0}^{n} B_{i,n}(t) = 1$.
$$P(t) = P_0\,(1 - t) + P_1\, t \qquad 0 \le t \le 1, \qquad (3.7)$$
which corresponds to the linear interpolation between the point P0 and the
point P1 .
Typically, when we have several points to connect, a cubic Bézier curve is
used. Following the definition, this curve is formed by the linear combination
of 4 control points with the Bernstein basis of degree 3. Taking into account
Equation (3.5) the cubic Bézier curve can be written:
FIGURE 3.9: Cubic Bézier curves examples. Note how the order of the
control points influences the final shape of the curve.
In matrix form:
$$P(t) = \begin{bmatrix} 1 & t & t^2 & t^3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ -3 & 3 & 0 & 0 \\ 3 & -6 & 3 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix} \begin{bmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{bmatrix} \qquad (3.9)$$
The main characteristic of this curve is that it starts at the point P0 and
ends at the point P3. The curve at P0 is tangent to the segment P1 − P0 and
the curve at P3 is tangent to the segment P3 − P2. Figure 3.9 shows some
examples of cubic Bézier curves. Two examples of Bézier curves with degree
higher than 3, and so a number of control points higher than 4, are illustrated
in Figure 3.10.
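As an illustration of Equations (3.4)-(3.5), a minimal JavaScript sketch (not part of the book's code base) that evaluates a cubic Bézier curve with 2D control points could look like this:

// Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1) using the
// Bernstein basis of Equation (3.5): B_{i,3}(t) = C(3,i) t^i (1-t)^(3-i).
function cubicBezier(P0, P1, P2, P3, t) {
  var u = 1.0 - t;
  var b0 = u * u * u;
  var b1 = 3.0 * t * u * u;
  var b2 = 3.0 * t * t * u;
  var b3 = t * t * t;
  return [
    b0 * P0[0] + b1 * P1[0] + b2 * P2[0] + b3 * P3[0],
    b0 * P0[1] + b1 * P1[1] + b2 * P2[1] + b3 * P3[1]
  ];
}

// The curve starts at P0 (t = 0) and ends at P3 (t = 1).
console.log(cubicBezier([0, 0], [1, 2], [3, 2], [4, 0], 0.5)); // [2, 1.5]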
FIGURE 3.10: Bézier curves of high degree (degree 5 on the left and degree
7 on the right).
where, as usual, Pi are the control points and Ni,k (t) are the blending functions
defined recursively in the following way:
$$N_{i,k}(t) = \frac{t - t_i}{t_{i+k} - t_i}\, N_{i,k-1}(t) + \frac{t_{i+k+1} - t}{t_{i+k+1} - t_{i+1}}\, N_{i+1,k-1}(t) \qquad (3.11)$$
For these reasons B-splines are, in general, more flexible than Bézier curves.
Some examples of B-splines of different order k defined on the same eight
control points of the Bézier curves of Figure 3.10 are shown in Figure 3.12 for
comparison. Note the approximating character of the B-spline curves and the
fact that the greater k is, the more limited is the support of the curve with re-
spect to the control points. This can be avoided by increasing the multiplicity
of the first and last values of the knots (see [11] for more details).
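The recursion (3.11) translates almost directly into code. The following sketch assumes the standard degree-0 base case and the usual 0/0 = 0 convention, which are not reproduced in the excerpt above:

// Blending function N_{i,k}(t) of Equation (3.11), computed by recursion on
// the knot vector `knots` (t_0, t_1, ...).
function bsplineBasis(i, k, t, knots) {
  if (k === 0)
    return (knots[i] <= t && t < knots[i + 1]) ? 1.0 : 0.0;
  var d1 = knots[i + k] - knots[i];
  var d2 = knots[i + k + 1] - knots[i + 1];
  var a = (d1 === 0.0) ? 0.0 : (t - knots[i]) / d1 * bsplineBasis(i, k - 1, t, knots);
  var b = (d2 === 0.0) ? 0.0 : (knots[i + k + 1] - t) / d2 * bsplineBasis(i + 1, k - 1, t, knots);
  return a + b;
}

// A B-spline curve point is then C(t) = sum_i P_i N_{i,k}(t).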
where u and v are the surface parameters; as in the case of curves, u and v
usually range from 0 to 1.
In the case of parametric curves, we have seen that Equation (3.2) can
be expressed as a linear combination of control points with some blending
functions (3.3). This last equation can be extended to the case of surfaces in
several ways. The most used one is the tensor product surface, defined as:
$$S(u, v) = \sum_{i=0}^{n} \sum_{j=0}^{m} P_{ij}\, B_i(u)\, B_j(v) \qquad (3.14)$$
where Pij are the initial control points and {Bi (.)} and {Bj (.)} are the blend-
ing functions. In this case the control points Pij are referred to as the con-
trol net of the surface S. Tensor product surfaces are also named rectangular
patches, since the domain of the parameter (u, v) is a rectangle (in R2 ). In the
following section we are going to describe two important parametric surfaces:
Bézier patches, which are the extension of Bézier curves, and NURBS, which
are the extensions of B-splines.
FIGURE 3.13: Bicubic Bézier patch example. The control points are shown
as black dots.
where Pij are the points of the control net, {Bi,n (.)} are Bernstein polynomials
of degree n and {Bj,m (.)} are Bernstein polynomials of degree m. Figure 3.13
shows an example of a bi-cubic Bézier patch. In this case the control net of
the patch is formed by 4 × 4 control points.
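As a sketch of the tensor-product construction, a bicubic patch can be evaluated by weighting the 4 × 4 control net with products of Bernstein polynomials; the function names below are illustrative, not from the book:

// Bernstein polynomials of degree 3, as in Equation (3.5).
function bernstein3(i, t) {
  var c = [1, 3, 3, 1][i];
  return c * Math.pow(t, i) * Math.pow(1.0 - t, 3 - i);
}

// Evaluate a bicubic Bezier patch at (u, v); `net` is a 4x4 array of
// 3D control points net[i][j] = [x, y, z] (the control net of Figure 3.13).
function bicubicPatch(net, u, v) {
  var S = [0.0, 0.0, 0.0];
  for (var i = 0; i < 4; i++)
    for (var j = 0; j < 4; j++) {
      var w = bernstein3(i, u) * bernstein3(j, v);
      S[0] += w * net[i][j][0];
      S[1] += w * net[i][j][1];
      S[2] += w * net[i][j][2];
    }
  return S;
}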
The Bézier patches can be assembled together to represent the shape of
complex 3D objects. Figure 3.14 shows an example. The model represented
in this example is the Utah teapot, a model of a teapot realized in 1975 by
Martin Newell, a member of the pioneering graphics program at the University
of Utah. Since then, this simple, round, solid, partially concave mathematical
model has been a reference object (and something of an inside joke) in the
computer graphics community.
where n is the number of control points, Pi are the control points, {Ni,k (t)}
are the same blending functions as for the B-spline curves, and wi are weights
used to alter the shape of the curve. According to the tensor product surface
we can extend the definition of NURBS curves in the same manner as Bézier
patches to obtain NURBS surfaces:
$$S(u, v) = \frac{\displaystyle\sum_{i=0}^{n} \sum_{j=0}^{m} w_{ij}\, P_{ij}\, N_{i,k}(u)\, N_{j,m}(v)}{\displaystyle\sum_{i=0}^{n} \sum_{j=0}^{m} w_{ij}\, N_{i,k}(u)\, N_{j,m}(v)} \qquad (3.17)$$
Thanks to the properties of B-Splines, the local control property still remains
valid for the NURBS, that is, the modification of a control point only affects
the surface shape in its neighborhood. So, it is easy to control the shape of a
large surface. Mainly for this reason NURBS surfaces are the base modeling
tool of powerful and famous geometric modelling software such as Maya® and
Rhino® . Figure 3.15 depicts an example of a 3D model realized using NURBS.
3.5 Voxels
Voxels are commonly used to represent volumetric data. This representa-
tion can be seen as the natural extension of two-dimensional images to the
third dimension. Just as a digital image is represented as a matrix of picture
elements, called pixels, a volumetric image is represented by a set of volume
elements, called voxels, arranged on a regular 3D grid (see Figure 3.16). Each
voxel of this 3D grid provides information about the volume. Depending on
the specific application, many types of data can be stored into each voxel. Such
information could be the density of the volume element, the temperature, the
color, etc.
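A minimal sketch of how such a grid can be stored in practice (illustrative, not from the book's code): one value per voxel in a flat array, indexed by the three integer coordinates.

// A minimal voxel grid: one density value per cell, stored in a flat
// typed array indexed as i + nx * (j + ny * k).
function VoxelGrid(nx, ny, nz) {
  this.nx = nx; this.ny = ny; this.nz = nz;
  this.data = new Float32Array(nx * ny * nz);
  this.index = function (i, j, k) { return i + this.nx * (j + this.ny * k); };
  this.get = function (i, j, k) { return this.data[this.index(i, j, k)]; };
  this.set = function (i, j, k, value) { this.data[this.index(i, j, k)] = value; };
}

var grid = new VoxelGrid(64, 64, 64);
grid.set(10, 20, 30, 0.5);   // e.g., store a density value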
The primitives can be defined, using implicit equations, as the set of points
that satisfy the equation f (x, y, z) < 0, where f (.) = 0 is an implicit surface
function. From this definition the points such that f (x, y, z) > 0 are outside
the volume bounded by the surface defined through f (.). This last fact is
assumed conventionally. Some primitives are defined by more than one implicit
equation (e.g., a cube).
The first equation in (3.19) means that the original points do not change,
as required by an interpolating scheme. The second equation creates
the new points between $p_i^k$ and $p_{i+1}^k$. The weight w enables us to control the
final shape of the curve by increasing or decreasing the “tension” of the curve.
When w = 0 the resulting curve is a linear interpolation of the starting points.
For w = 1/16 a cubic interpolation is achieved. For 0 < w < 1/8 the resulting
curve is always continuous and differentiable (C 1 ).
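Equation (3.19) itself is not reproduced above; assuming the classical four-point rule, which matches the stated properties (w = 0 gives linear interpolation, w = 1/16 the cubic C1 scheme), one subdivision step of a closed curve can be sketched as:

// One step of the interpolating curve subdivision discussed above (assumed
// four-point rule): old points are kept, and a new point is inserted between
// p_i and p_{i+1} as (1/2 + w)(p_i + p_{i+1}) - w(p_{i-1} + p_{i+2}).
function subdivideClosedCurve(points, w) {
  var n = points.length, refined = [];
  for (var i = 0; i < n; i++) {
    var pm1 = points[(i - 1 + n) % n], p0 = points[i],
        p1 = points[(i + 1) % n], p2 = points[(i + 2) % n];
    refined.push(p0);                  // even point: unchanged
    refined.push([                     // odd point: new vertex
      (0.5 + w) * (p0[0] + p1[0]) - w * (pm1[0] + p2[0]),
      (0.5 + w) * (p0[1] + p1[1]) - w * (pm1[1] + p2[1])
    ]);
  }
  return refined;   // with w = 1/16 this is the cubic (C^1) scheme
}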
3.7.4 Classification
In recent years several stationary schemes have been developed. A classification
based on their basic properties is particularly useful, since it
helps us to immediately understand how a certain subdivision method works
depending on its categorization.
FIGURE 3.20: Primal and dual schemes for triangular and quadrilateral
mesh.
the coarse mesh, retaining old vertices, and then connecting the new inserted
vertices together.
The dual schemes work on the dual of the polygon mesh, which is obtained
by taking the centroids of the faces as vertices, and connecting those in adja-
cent faces by an edge. Then for each vertex a new one is created inside each
face adjacent to the vertex, as shown in the top row of Figure 3.20 (third image
from left), and they are connected as shown in the last row of the figure.
Note that for quadrilateral meshes this can be done in such a way that
the refined mesh has only quadrilateral faces, while in the case of triangles,
vertex split (dual) schemes result in non-nesting hexagonal tilings. In this
sense quadrilateral tilings are special: they support both primal and dual
subdivision schemes naturally.
3.7.4.4 Smoothness
The smoothness of the limit surface, that is, of the surface obtained ap-
plying the subdivision scheme an infinite number of times, is measured by its
continuity properties. The limit surface could be continuous and differentiable
(C 1 ), continuous and twice differentiable (C 2 ) and so on.
Many subdivision schemes with different properties have been proposed,
and analyzing them is out of the scope of this book. Just to give you an idea
of how such schemes are designed, in the following we show some examples.
In particular we will describe the Loop scheme, which is an approximating
scheme for triangular meshes, and the (modified) butterfly scheme, which is
an interpolation scheme for triangular meshes.
are called creases (see also Section 6.6.1). Obtaining the subdivision rules for
its corresponding mask is not difficult. The new vertices are shown in the mask
as a black dot. The edges indicate which neighbor vertices have to be taken
into account to calculate the subdivision rules. For example, referring to the
masks of the Loop scheme of Figure 3.21, the interior odd vertex $v^{j+1}$, that is,
the newly inserted vertex, is computed by centering the relative mask over it,
obtaining:
$$v^{j+1} = \frac{3}{8}\, v_1^j + \frac{3}{8}\, v_2^j + \frac{1}{8}\, v_3^j + \frac{1}{8}\, v_4^j \qquad (3.20)$$
where v1 and v2 are the immediate neighbors of the new vertex v, and v3
and v4 are the other two vertices of the triangles that share the edge where
v will be inserted. The masks for the even vertices, which are present only in
the approximating schemes, are used to modify the position of the existing
vertices.
butterfly scheme was proposed by Zorin et al. [47]. This variant guarantees
C 1 -continuous surfaces for arbitrary meshes. The masks of this scheme are
given in Figure 3.22. The coefficients $s_i$ for the extraordinary vertices are
$\{s_0 = \frac{5}{12}, s_1 = -\frac{1}{12}, s_2 = -\frac{1}{12}\}$ for k = 3, $\{s_0 = \frac{3}{8}, s_1 = 0, s_2 = -\frac{1}{8}, s_3 = 0\}$
for k = 4, and the general formula for $k \ge 5$ is:
$$s_i = \frac{1}{k} \left( \frac{1}{4} + \cos\frac{2 i \pi}{k} + \frac{1}{2} \cos\frac{4 i \pi}{k} \right) \qquad (3.21)$$
Since this scheme is interpolating, only the masks for the odd vertices are
given, and the even vertices preserve their positions during each subdivision step.
• Which are the vertices connected with a given vertex v through the set
of its adjacent edges (that is, the 1-ring of v)?
Data structures are often tailored to certain types of queries, which in
turn are application dependent, so we cannot give a unique general criterion
for assessing a data structure. However, we can analyze their performance in
several regards:
• Queries—How efficiently the main queries listed above can be answered;
• Memory footprint—How much memory the data structure takes;
This data structure is trivial to update but that is the only good quality.
No query besides the position of the vertex itself or “which other vertices are in
the same face” can be done naturally because there is no explicit information
on adjacency. The data stored is redundant, because each vertex must be
stored once for each face adjacent to it.
In order to avoid vertex duplication, the vertex data can be stored in two
separate arrays, the vertices array and the faces array.
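A minimal sketch of this two-array (indexed) layout, here for a quad made of two triangles; the field names mirror those used by the primitives of Section 3.9, but the data are purely illustrative:

// Indexed representation of a triangle mesh: geometry is stored once in a
// vertices array and each face stores three indices into it.
var mesh = {
  // x, y, z of vertex i at positions 3*i .. 3*i+2
  vertices: new Float32Array([
    0.0, 0.0, 0.0,    // vertex 0
    1.0, 0.0, 0.0,    // vertex 1
    1.0, 1.0, 0.0,    // vertex 2
    0.0, 1.0, 0.0     // vertex 3
  ]),
  // two triangles sharing the edge (0, 2); no vertex is duplicated
  triangleIndices: new Uint16Array([0, 1, 2,  0, 2, 3])
};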
FIGURE 3.24: Winged-edge data structure. The pointers of the edge e5 are
drawn in cyan.
data structure, accessing all the vertices of a face costs twice as much be-
cause we need to access the edge first, and from the edge the vertices.
3.8.2 Winged-Edge
The winged-edge is an edge-based representation of polygon meshes. It was
introduced by Bruce G. Baumgart [2] in 1975 and it is one of the first data
structures for meshes that allows complex queries.
First of all, this data structure assumes that the mesh is two-manifold,
so each edge is shared by two faces (or one face if it is a boundary edge).
The name comes from the fact that the two faces sharing the edge are its
“wings.” The main element of this data structure is the edge. As Figure 3.24
illustrates, the edge we2 stores, for each of the adjacent faces f1 and f2 , a
pointer to each edge that shares one of its two vertices with we2 . Then it also
stores the pointers to its two adjacent vertices and to the two adjacent faces.
The faces store a single pointer to one (any) of their edges and the vertex
stores a pointer to one of its adjacent edges.
The winged edge allows us to perform queries on the 1-ring in linear time
because it allows us to jump from one edge to the other by pivoting on the
shared vertex. On average, the memory footprint is three times that of
the IDS and updates are more involved, although linear in time.
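A sketch of the records involved (field names are illustrative; the book does not prescribe a specific layout in the excerpt above):

// Winged-edge records, storing indices into the vertex, edge and face arrays.
function WingedEdge(v0, v1, faceLeft, faceRight) {
  this.v0 = v0; this.v1 = v1;          // the two adjacent vertices
  this.faceLeft = faceLeft;            // the two "wings"
  this.faceRight = faceRight;
  // the four edges sharing a vertex with this edge, one per wing and endpoint
  this.prevLeft = -1; this.nextLeft = -1;
  this.prevRight = -1; this.nextRight = -1;
}
function WEFace(edge) { this.edge = edge; }        // any one of its edges
function WEVertex(x, y, z, edge) {
  this.position = [x, y, z];
  this.edge = edge;                                // any one adjacent edge
}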
3.8.3 Half-Edge
The half-edge [28] is another edge-based data structure that attempts to
simplify the winged-edge while maintaining its flexibility. As the name suggests,
array of vertices:
where Nv is the number of vertices of the mesh, and an array of faces, where
each face is defined by the indices of three vertices.
// vertices definition
// / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
// triangles definition
// / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
}
LISTING 3.1: Cube primitive.
3.9.2 Cone
The definition of the cone is a bit more complex than that of the cube. The
cone is defined by fixing an apex at a certain height, creating a set of vertices
posed on a circle that forms the base of the cone, and connecting them to
form the triangle mesh. Our cone is centered at the origin of its coordinate
system and has a height of 2 units and a radius of 1 unit, as you can see in
Listing 3.2 (Figure 3.27 shows the result).
In this case the generation of the geometric primitive includes the concept
of “resolution” of the geometry. By resolution we mean a parameter related
to the number of triangles that will compose the final geometric primitive.
The more triangles are used, the smoother the primitive will look. In this
particular case, a resolution of n gives a cone composed of 2n triangles.
The vertices that form the circular base are generated in the following way:
first, an angle step $\Delta_\alpha$ is computed by dividing 2π by the resolution of
the cone: $\Delta_\alpha = \frac{2\pi}{n}$. With simple trigonometry the base vertices are calculated
as:
$$x = \cos\alpha, \qquad y = 0, \qquad z = \sin\alpha \qquad (3.24)$$
where, in this case, y is the up axis and xz form the base plane; α is the angle
corresponding to the i-th vertex, that is, $\alpha = i\,\Delta_\alpha$. Hence, a resolution of n
generates a base composed of n + 1 vertices: the additional vertex is placed
at the origin of the axes and is used to connect the triangles of the base of
the cone. The final cone is composed of n + 2 vertices, the n + 1 of the base
plus the apex.
The triangles are created by first connecting the base of the cone, and then
the lateral surface of the cone. Even in this case we have to pay attention
to whether the order of the indices guarantees a consistent CW or CCW
orientation. To facilitate the understanding of the connections we point out
that the vertices are stored in the vertices array in the following way:
index 0 → apex
index 1 . . . n → base vertices
index n + 1 → center of the base
/////
///// Resolution is the number of faces used to tessellate the cone.
///// Cone is defined to be centered at the origin of the coordinate
///// reference system, and lying on the XZ plane.
///// Cone height is assumed to be 2.0. Cone radius is assumed to be 1.0.
function Cone (resolution) {
  // vertices definition
  //////////////////////////////////////////////////////////
  this.vertices = new Float32Array(3 * (resolution + 2));
  // apex (index 0), at [0, 2, 0]
  this.vertices[1] = 2.0;
  var step = 2.0 * Math.PI / resolution;
  var vertexoffset = 3;
  for (var i = 0; i < resolution; i++) {
    var angle = step * i;
    // i-th base vertex (indices 1 ... n)
    this.vertices[vertexoffset + 0] = Math.cos(angle);
    this.vertices[vertexoffset + 1] = 0.0;
    this.vertices[vertexoffset + 2] = Math.sin(angle);
    vertexoffset += 3;
  }
  // the center of the base (index n+1) stays at the origin [0, 0, 0]
  // triangles definition
  //////////////////////////////////////////////////////////
  this.triangleIndices = new Uint16Array(3 * 2 * resolution);
  // lateral surface
  var triangleoffset = 0;
  for (var i = 0; i < resolution; i++) {
    this.triangleIndices[triangleoffset + 0] = 0;                       // apex
    this.triangleIndices[triangleoffset + 1] = 1 + i;
    this.triangleIndices[triangleoffset + 2] = 1 + (i + 1) % resolution;
    triangleoffset += 3;
  }
  // the base triangles connect the base center (index n+1) with each
  // pair of consecutive base vertices (omitted here)
}
3.9.3 Cylinder
The generation of the cylinder primitive is similar to that of the cone (List-
ing 3.3). Figure 3.28 shows the resulting primitive. The base of the cylinder
lies on the xz plane and it is centered at the origin. The base of the cylinder
is generated in the same way as the base of the cone; even in this case the
resolution parameter corresponds to the number of vertices of the base. The
upper part of the cylinder is analogous to the bottom part. Hence, the final
cylinder has 2n + 2 vertices and 4n triangles if the value of resolution is
n. After both the vertices of the bottom and upper part are generated, the
triangles are defined, taking care to have a consistent orientation, as usual. In
// vertices definition
// / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
// lower circle
var vertexoffset = 0;
for ( var i = 0; i < resolution ; i ++) {
angle = step * i ;
// upper circle
for ( var i = 0; i < resolution ; i ++) {
angle = step * i ;
vertexoffset += 3;
// triangles definition
// / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
// lateral surface
var triangleoffset = 0;
for ( var i = 0; i < resolution ; i ++)
{
this.triangleIndices[triangleoffset] = i;
this.triangleIndices[triangleoffset + 1] = (i + 1) % resolution;
this.triangleIndices[triangleoffset + 2] = (i % resolution) + resolution;
triangleoffset += 3;
3.10 Self-Exercises
3.10.1 General
1. Which is the most compact surface representation to describe a sphere?
And for the intersection of three spheres? And for a statue?
2. What are the main differences between the Bézier curves and the B-
Splines curves?
3. In what way can we represent volumetric data?
Geometric transformations come into play all the time in computer graphics,
and learning how to manipulate them correctly will save you hours of painful
debugging. In this chapter, we will take an informal approach, starting from
simple intuitive examples and then generalizing.
- scalars: a, b, α, ...
- points: $p = [p_x, p_y, \ldots, p_w]^T$, $q = [q_x, q_y, \ldots, q_w]^T$
- vectors: $v = [v_x, v_y, \ldots, v_w]^T$, $u = [u_x, u_y, \ldots, u_w]^T$
Also, in general we will often use p and q to indicate points. We will use a set
of operations on these entities to express transformation of the objects they
represent. These operations, shown in Figure 4.1, are:
4.2.1 Translation
A translation is a transformation applied to a point defined as:
Tv (p) = p + v
FIGURE 4.2: Examples of translation (a), uniform scaling (b) and non-
uniform scaling (c).
4.2.2 Scaling
A scaling is a transformation applied to a point or a vector defined as:
$$S_{(s_x,s_y)}(p) = \begin{bmatrix} s_x\, p_x \\ s_y\, p_y \end{bmatrix}$$
4.2.3 Rotation
A rotation is a transformation applied to a point or a vector defined as:
$$R_\alpha(p) = \begin{bmatrix} p_x \cos\alpha - p_y \sin\alpha \\ p_x \sin\alpha + p_y \cos\alpha \end{bmatrix}$$
where α is the rotation angle, that is the angle by which the point or vector is
rotated around the origin. This formula for rotation is less trivial and needs a
small proof. Referring to Figure 4.3, consider the point p at distance ρ from
the origin. The vector connecting the point p to the origin makes an angle β
with the X-axis. The coordinates of p can be written as:
$$p = \begin{bmatrix} p_x \\ p_y \end{bmatrix} = \begin{bmatrix} \rho \cos\beta \\ \rho \sin\beta \end{bmatrix}$$
$$p'_x = \rho \cos(\beta + \alpha) = \rho \cos\beta \cos\alpha - \rho \sin\beta \sin\alpha = p_x \cos\alpha - p_y \sin\alpha$$
$$p'_x = a_{xx}\, p_x + a_{xy}\, p_y, \qquad p'_y = a_{yx}\, p_x + a_{yy}\, p_y$$
Note that we indicate with the same symbol of the function the matrix cor-
responding to that geometric transformation. Also note that, this does not
capture translation, which is an addition to the point coordinates and not a
combination of them. So, if we want to write a rotation followed by a trans-
lation, we still have to write Rα p + v.
In order to extend the use of matrix notation to translation we have to
express points and vectors in homogeneous coordinates. Homogeneous co-
ordinates will be explained in more detail in Section 4.6.2.2; here we only need
to know that a point in Cartesian coordinates $p = [p_x, p_y]^T$ can be written
in homogeneous coordinates as $\bar{p} = [p_x, p_y, 1]^T$, while a vector $v = [v_x, v_y]^T$
can be written as $\bar{v} = [v_x, v_y, 0]^T$. The first thing to notice is that now we
can distinguish between points and vectors just by looking at whether the last
coordinate is 1 or 0, respectively. Also note that the sum of two vectors in
homogeneous coordinates gives a vector (0 + 0 = 0), the sum of a point and
a vector gives a point (1 + 0 = 1) and the subtraction of two points gives a
vector (1 − 1 = 0), just like we saw earlier in this section.
In order to simplify notation, we define the following equivalences:
$$\vec{p} = p - \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad p = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + \vec{p}$$
that is, if we have a point p we write as $\vec{p}$ the vector from the origin to p and,
vice versa, if we have a vector $\vec{p}$ we write as p the point obtained by adding the
vector to the origin.
With homogeneous coordinates, we only need one more column and one
more row for matrix vector multiplication. In particular, we will use matrices
of the form:
$$\begin{bmatrix} a_{xx} & a_{xy} & v_x \\ a_{yx} & a_{yy} & v_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (4.1)$$
Notice what the product of such a matrix by the point p̄ looks like:
$$\begin{bmatrix} a_{xx} & a_{xy} & v_x \\ a_{yx} & a_{yy} & v_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_x \\ p_y \\ 1 \end{bmatrix} = \begin{bmatrix} a_{xx} p_x + a_{xy} p_y + v_x \\ a_{yx} p_x + a_{yy} p_y + v_y \\ 1 \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a_{xx} & a_{xy} \\ a_{yx} & a_{yy} \end{bmatrix} p + v \\ 1 \end{bmatrix}$$
In other words, now we are also able to express the translation of a point
in our matrix notation, by arranging the translation vector in the upper part
of the last column of the matrix. Note that if we multiply the matrix with a
vector $[v_x, v_y, 0]^T$ the translation part will have no effect, because its elements
are multiplied by 0. This is coherent with the fact that vectors represent
a direction and a magnitude (the length of the vector, hence a scalar value):
the direction can be changed by a rotation or by a non-uniform scaling, the
magnitude can be changed by a scaling, but neither of them is affected by
translation.
FIGURE 4.4: (Left) Three collinear points. (Right) The same points after
an affine transformation.
so you can apply the rotation R−45◦ to all the points of the car and bring it
to the position marked with A0 , then apply the translation T(3,2) :
$$T_{(3,2)}(R_{-45^\circ}\, p) = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix} p$$
Matrices of this form, containing a rotation and a translation, are often re-
ferred to as roto-translation matrices. They are very common since they are
the tools you have to use to move things around, keeping their shape intact.
In general we can apply any number of affine transformations and obtain
a single corresponding matrix as we did above. We will now see that some of
them are especially useful in practical situations.
the notation):
$$p' = R(p - c) + c = R\,p - R\,c + c = \underbrace{R}_{\text{rotation}}\, p + \underbrace{(I - R)\,c}_{\text{translation}}$$
The very same considerations hold for the scaling, just putting a scaling
matrix S(sx ,sy ) in place of the rotation matrix Rα , thus:
$$S_{(s_x,s_y),c} = \begin{bmatrix} S & (I - S)\,c \\ 0 & 1 \end{bmatrix} \qquad (4.3)$$
4.3.3 Shearing
Shearing is an affine transformation where the point p is scaled along one
dimension by an amount proportional to its coordinate in another dimension
(see the example in Figure 4.7).
$$Sh_{(h,k)}(p) = \begin{bmatrix} 1 & k & 0 \\ h & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} p = \begin{bmatrix} p_x + k\, p_y \\ h\, p_x + p_y \\ 1 \end{bmatrix}$$
Note that shearing can also be obtained as a composition of a rotation and
a non-uniform scaling (see Exercise 5, Section 4.13).
4.4 Frames
A frame is a way to assign values to the component of geometric entities.
In two dimensions, a frame is defined by a point and two non-collinear axes.
Figure 4.8 shows our car and two distinct frames. If you were given only frame
F0 and were asked to write the coordinates of the point p on the front of the
car, it would be obvious to say (2.5, 1.5). The presence of the other frame
implies the relative nature of coordinates of point p, which are (1.4, 2.1) with
respect to frame F1 .
Note that the origin and axis of these frames are themselves expressed in
a frame. The frame we use to express all other frames is called a canonical
frame and has origin $[0, 0, 1]^T$ and axes $u = [1, 0, 0]^T$ and $v = [0, 1, 0]^T$.
To change frame means to express the value of the components of points
and vectors given in a frame into another, which is what we have just done
visually in the previous example. Here we want to show how to compute the
transformation required to change frame from F0 to F1 . Let us start with
changing the coordinates of a point given in one frame F0 into the canonical
frame. We can do this geometrically by starting from the origin of F0 , moving
$p_{0x}$ units along vector $u_0$ and $p_{0y}$ units along vector $v_0$:
$$p = O_0 + p_{0x}\, u_0 + p_{0y}\, v_0$$
Note that we can write this formula in matrix notation by arranging the axes
as column vectors and the origin as the last column of a matrix (that we will
call M0 ):
$$p = \underbrace{\begin{bmatrix} u_{0x} & v_{0x} & O_{0x} \\ u_{0y} & v_{0y} & O_{0y} \\ 0 & 0 & 1 \end{bmatrix}}_{M_0}\, p_0$$
It should not be a surprise that this matrix is just another affine transforma-
tion, because we have just applied two scalings and two point-vector sums. In
fact there is more: the upper left 2 × 2 matrix obtained by the axes of F0 is the
rotation matrix that brings the canonical axes to coincide with the F0 axes
and the last column is the translation that brings the origin of the canonical
frame to coincide with F0 . So the matrix that expresses the coordinates p0 in
the canonical frame is just a rototranslation.
$$p = \begin{bmatrix} R_{uv} & O_0 \\ 0 & 1 \end{bmatrix} p_0$$
therefore:
$$M_1^{-1} M_0\, p_0 = p_1$$
The matrix $M_1^{-1} M_0$ transforms a point (or a vector) given in the frame F0
into the frame F1 . Note that from Section 4.3.4 we know how to invert roto-
translation matrices like M1 without resorting to algorithms for the inversion
of generic matrices.
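Using only the SpiderGL helpers that appear in the listings of this chapter (SglMat4.translation, SglMat4.inverse, SglMat4.mul), the change of frame can be sketched as follows; the two example frames are arbitrary and only serve as an illustration:

// Change of frame from F_0 to F_1 in homogeneous 4x4 form. As an example,
// both frames differ from the canonical one only by a translation.
var M_0 = SglMat4.translation([2.5, 1.5, 0.0]);  // frame F_0 in canonical coordinates
var M_1 = SglMat4.translation([1.0, 0.0, 0.0]);  // frame F_1 in canonical coordinates

// M_1^{-1} M_0 maps coordinates given in F_0 to coordinates in F_1.
var M0toM1 = SglMat4.mul(SglMat4.inverse(M_1), M_0);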
Note that applying a rotation about the z axis only changes the x and y
coordinates. In other words a point rotated around an axis always lies on
the plane passing through the point and orthogonal to that axis. In fact, the
rotation in 3D is defined with respect to an axis, called the axis of rotation,
which is usually specified as $r = o_r + t\,\mathrm{dir}$, $t \in (-\infty, \infty)$. All the techniques to
implement a rotation that we will see consider the case where the rotation axis
passes through the origin, which means $r = [0, 0, 0, 1]^T + t\,\mathrm{dir}$ (see Figure 4.11).
This is not a limitation, since, as we did in Section 4.3.2, we can compose
the transformations to find the rotation around a generic axis. This can be
achieved by applying a translation to make the generic axis pass through the
origin, applying the rotation and then translating it back.
1. Apply the transformation that makes the rotation axis r coincide with
the z axis.
FIGURE 4.12: How to build an orthogonal frame starting with a single axis.
So all we are missing are the x and y axes of the frame $F_r$, which we will
calculate in the next section.
yr = r × xr
It should be clear that the choice of a determines the frame. We must be careful
about choosing a vector a not collinear with vector r, because the vector
product of two collinear vectors gives a null vector. If we pick three random
numbers to build the vector a, the chances of creating a vector collinear to r are
infinitely small, but not zero. Furthermore the finite precision of the computer
representation of real numbers also degrades the quality of the vector product
for quasi collinear vectors. So, instead of taking the chance, we consider the
position of the smallest component of vector r, let us call it i, and define a as
the vector with value 1 for the component i and 0 for the other components.
In other words, we take the canonical axis “most orthogonal” to r.
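The construction just described can be sketched as follows; the exact formula used in the book for $x_r$ is on a page not reproduced here, so this is one common variant based on cross products:

// Build an orthonormal frame [x_r, y_r, r] from a single normalized axis r:
// pick the canonical axis "most orthogonal" to r (the one matching its
// smallest component), then use two cross products.
function cross(a, b) {
  return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]];
}
function normalize(v) {
  var l = Math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
  return [v[0]/l, v[1]/l, v[2]/l];
}
function frameFromAxis(r) {
  // canonical axis corresponding to the smallest component of r
  var i = 0;
  if (Math.abs(r[1]) < Math.abs(r[i])) i = 1;
  if (Math.abs(r[2]) < Math.abs(r[i])) i = 2;
  var a = [0, 0, 0];
  a[i] = 1;
  var xr = normalize(cross(a, r));
  var yr = cross(r, xr);           // y_r = r x x_r, as in the text
  return { x: xr, y: yr, z: r };
}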
$$O_F = (p \cdot r)\, r$$
and thus:
$$x_F = p - O_F = p - (p \cdot r)\, r$$
$$y_F = r \times x_F = r \times (p - O_F) = r \times p - r \times (p \cdot r)\, r = r \times p$$
so we have the formula for rotating the point p around the axis rdir by α:
FIGURE 4.15: Scheme of the relations among the three rings of a gimbal.
Note that rotation $R_{x\beta}$ is specified with respect to its outer frame, that is,
the canonical frame transformed by rotation $R_{y\alpha}$, just as rotation $R_{z\gamma}$ is
expressed in the canonical frame rotated by $R_{x\beta} R_{y\alpha}$. In this case we speak
of intrinsic rotations, while if all the rotations were expressed with respect to
the canonical frame we would have extrinsic rotations.
The mapping between angles and resulting rotations is not bijective; in
fact, the same rotation can be obtained with different combinations of angles.
Figure 4.16 illustrates a very annoying phenomenon: the gimbal lock. If
a rotation of β = π/2 is applied, rings r1 and r3 both make the aircraft roll,
by a total angle α + γ, so there are infinitely many values of α and γ leading to the same
rotation, and one degree of freedom is lost: more precisely, the gimbal is no
longer able to make the aircraft yaw and the rotation is locked in 2 dimensions
(pitch and roll). This means that when we use Euler angles we need to
take these degenerate configurations into account, for example, by making
the discontinuity points correspond to rotations we are not interested in.
FIGURE 4.16: Illustration of gimbal lock: when two rings rotate around the
same axis one degree of freedom is lost.
$$a = (a_w, \mathbf{a})$$
with $\mathbf{a} = (a_x, a_y, a_z)$. So, quaternions can also be seen as points in four di-
mensions.
Quaternions can be summed component by component just like points in
two or three dimensions:
$$i^2 = j^2 = k^2 = ijk = -1$$
which gives:
$$ab = (a_w b_w - \mathbf{a} \cdot \mathbf{b},\; a_w \mathbf{b} + b_w \mathbf{a} + \mathbf{a} \times \mathbf{b})$$
The identity is $1 = (1, 0, 0, 0)$ and the inverse can be calculated as:
$$a^{-1} = \frac{1}{\|a\|^2}\,(a_w, -\mathbf{a})$$
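The product and inverse above map directly to code; in the sketch below a quaternion is stored as the array [w, x, y, z], an illustrative convention rather than the book's:

// Quaternion product (a_w b_w - a.b, a_w b + b_w a + a x b) and inverse.
function quatMul(a, b) {
  return [
    a[0]*b[0] - (a[1]*b[1] + a[2]*b[2] + a[3]*b[3]),
    a[0]*b[1] + b[0]*a[1] + (a[2]*b[3] - a[3]*b[2]),
    a[0]*b[2] + b[0]*a[2] + (a[3]*b[1] - a[1]*b[3]),
    a[0]*b[3] + b[0]*a[3] + (a[1]*b[2] - a[2]*b[1])
  ];
}
function quatInverse(a) {
  var n2 = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + a[3]*a[3];   // ||a||^2
  return [a[0]/n2, -a[1]/n2, -a[2]/n2, -a[3]/n2];
}

// quatMul(a, quatInverse(a)) yields the identity (1, 0, 0, 0).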
4.6.2 Projections
Establishing the projection means describing the type of camera we are
using to observe the scene. In the following we will assume that the scene is
expressed in the view reference frame, therefore the point of view is centered
at the origin, the camera is looking toward the negative z axis and its up
direction is y.
observing that the triangles $Cap'$ and $Cbp$ are similar and hence the ratios
between corresponding sides are equal:
$$p'_y : d = p_y : p_z \;\Rightarrow\; p'_y = \frac{p_y}{p_z/d}$$
the same holds for the x coordinate, therefore
$$p' = \begin{bmatrix} \frac{p_x}{p_z/d} \\ \frac{p_y}{p_z/d} \\ d \\ 1 \end{bmatrix} \qquad (4.9)$$
Note that our example is only an ideal description, since the eye is not a
zero-dimensional point. Another, more common, way to introduce perspective
projection is by means of the pinhole camera, represented in Figure 4.19. The
pinhole camera consists of a light-proof box with a single infinitely small hole
so that each point in front of the camera projects on the side opposite to the
hole. Note that in this way the image formed is inverted (up becomes down,
left becomes right) and this is why we prefer the eye-in-front-of-the-window
example.
Note that the fourth component can be any real number λi 6= 0 but when it
is 1 the coordinates are said to be in the canonical form. When a point is in
canonical form we can represent it in 3D space by simply considering its first
three coordinates. Consider again our point $p'$ in Equation (4.9): if we multiply
all the components by $p_z/d$ we have the following equivalence:
$$p' = \begin{bmatrix} \frac{p_x}{p_z/d} \\ \frac{p_y}{p_z/d} \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} p_x \\ p_y \\ p_z \\ \frac{p_z}{d} \end{bmatrix}$$
Note that the difference is that now no component of $p'$ appears in the denominator and so the perspective projection can be written in matrix form:
$$P_{rsp}\; p = \overbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{1}{d} & 0 \end{bmatrix}}^{\text{perspective projection}} \begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} = \begin{bmatrix} p_x \\ p_y \\ p_z \\ \frac{p_z}{d} \end{bmatrix} \equiv \begin{bmatrix} \frac{p_x}{p_z/d} \\ \frac{p_y}{p_z/d} \\ d \\ 1 \end{bmatrix}$$
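A small numeric sketch of this projection followed by the normalization of the homogeneous coordinates (illustrative, not from the book's code):

// Perspective projection of p = [p_x, p_y, p_z] onto the plane z = d,
// followed by the division by the fourth (homogeneous) component.
function perspectiveProject(p, d) {
  // P_rsp * [p_x, p_y, p_z, 1]^T = [p_x, p_y, p_z, p_z/d]^T
  var clip = [p[0], p[1], p[2], p[2] / d];
  // back to canonical form: divide by the fourth component
  return [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3], 1.0];
}

console.log(perspectiveProject([2.0, 4.0, 10.0], 1.0)); // [0.2, 0.4, 1, 1]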
have orthographic projections (see Figure 4.20). In this case the projection
matrix is trivially obtained by setting the z coordinate to d. To be more
precise, the x and y values of the projected points are independent from d, so
we just consider the projection plane z = 0. In matrix notation:
$$Orth\; p = \overbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}^{\text{orthographic projection}} \begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} = \begin{bmatrix} p_x \\ p_y \\ 0 \\ 1 \end{bmatrix}$$
FIGURE 4.21: All the projections convert the viewing volume in the canon-
ical viewing volume.
In these transformations, n indicates the distance of the near plane, f the
distance of the far plane, and t, b, l, r are the values that define the top,
bottom, left, and right delimitations of the viewing window, respectively,
all expressed with respect to the observer (that is, in view space).
Be aware that a point transformed by the projection matrix is not yet in its
canonical form: it will be so only after its normalization. So, the projection ma-
trix does not bring points from view space to CVV but to a four-dimensional
space called clip space because it is the space where clipping is done. The
Canonical Viewing Volume is also referred to as Normalized Device Coordinates
(NDC) space, to indicate the normalization applied to the homogeneous coordinates.
Coordinates in NDC space are called normalized device coordinates.
$$W = \begin{bmatrix} 1 & 0 & v_x \\ 0 & 1 & v_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{v_X - v_x}{2} & 0 & 0 \\ 0 & \frac{v_Y - v_y}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{v_X - v_x}{2} & 0 & \frac{v_X - v_x}{2} + v_x \\ 0 & \frac{v_Y - v_y}{2} & \frac{v_Y - v_y}{2} + v_y \\ 0 & 0 & 1 \end{bmatrix}$$
$$V_w' = V_w\, k_w, \qquad V_h' = V_h\, k_h$$
so that
$$\frac{V_w'}{V_h'} = \frac{W_w}{W_h}$$
Note that in order to change the aspect ratio just one coefficient would be
enough, but we also want to choose how the width and the height of the
viewing window change. For example, we may want to keep the width fixed, so $k_w = 1$
and $k_h = \frac{W_h}{W_w} \frac{V_w}{V_h}$.
A fairly common choice is not to cut anything from our intended projection
and enlarge one of the two sizes. This is achieved by setting $k_w = 1$ and solving
for $k_h$ if $\frac{V_w}{V_h} > \frac{W_w}{W_h}$; we set $k_h = 1$ and solve for $k_w$ otherwise. You may
easily verify both geometrically and algebraically that the unknown coefficient
is greater than 1, that is, we enlarge the viewing window.
4.6.5 Summing Up
Figure 4.23 summarizes the various properties preserved by all the trans-
formations discussed in this chapter.
The upper part of the tree is also obtained by a non-uniform scaling, this time
of the cone. However, the scaled cone must also be translated along the Y axis
by 0.8 units in order to be put onto the trunk. Therefore:
$$M_0 = T_{(0,0.8,0)}\, S_{(0.6,1.65,0.6)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0.8 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0.6 & 0 & 0 & 0 \\ 0 & 1.65 & 0 & 0 \\ 0 & 0 & 0.6 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Our first car will be a very simple one, made by a box for the car’s body
and four cylinders for the wheels. We want to transform the cube centered in
(0, 0, 0) with side 2 into the box with sizes (2, 1, 4) and with its bottom face
on the plane y = 0.3. We first apply a translation by 1 along the Y axis to
make the bottom face of the cube lie on the plane y = 0. In this way we
can apply the scaling transformation S(1,0.5,2) to have the box with the right
proportions while keeping the bottom face on the plane y = 0. Finally we
translate again along Y by the radius of the wheel. All together:
Let us say that the projection is such that it includes the whole circuit, which
is a square of 200m × 200m. Since we have set the point of view along the y
axis, we will need to include 100m on each side (left, right, top and bottom)
in the viewing volume. So we take the matrix in 4.10 and replace r, l, t and
b with 100. The near plane n may be set to 0 and the far plane f so that it
includes the ground. Since we are looking from [0, 10, 0]T , f may be set to a
value greater than 10 (we will see in Section 5.2 how to choose the far plane
correctly to avoid artifacts due to limited numerical representation).
$$P = \begin{bmatrix} 2/200 & 0 & 0 & 0 \\ 0 & 2/200 & 0 & 0 \\ 0 & 0 & -2/11 & -11/11 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
As explained in Section 4.6.4.1, these values may need to be changed according
to the viewport chosen.
2 var vertexShaderSource = "\
3   uniform mat4 uModelViewMatrix;                       \n\
4   uniform mat4 uProjectionMatrix;                      \n\
5   attribute vec3 aPosition;                            \n\
6   void main(void)                                      \n\
7   {                                                    \n\
8     gl_Position = uProjectionMatrix * uModelViewMatrix \n\
9       * vec4(aPosition, 1.0);                          \n\
10 }";
11
12 var fragmentShaderSource = "\
13   precision highp float;                               \n\
14   uniform vec4 uColor;                                 \n\
15   void main(void)                                      \n\
16   {                                                    \n\
17     gl_FragColor = vec4(uColor);                       \n\
18 }";
LISTING 4.3: A basic shader program.
Listing 4.3 shows the vertex and the fragment shader. With respect to
the shader shown in Listing 2.8 we added a uniform variable for the color, so
that every render call will output fragments of color uColor, and the matrices
uProjectionMatrix (that we set at line 235 of Listing 4.1) and uModelViewMatrix
(that we set to stack.matrix before rendering any shape), which transform the
vertex position from object space to clip space.
Lines 147-148 of Listing 4.4 show the typical pattern for drawing a shape:
we first set uModelViewMatrix to the value of the current matrix in the stack
(stack.matrix) and then make the render call. A snapshot from the resulting
client is shown in Figure 4.27.
FIGURE 4.27: A snapshot from the very first working client. (See client
http://envymycarbook.com/chapter4/0/0.html.)
4.10.1 Upgrade Your Client: Add the View from above and
behind
To make things more interesting, suppose we want to place the view refer-
ence frame as in Figure 4.28, which is the classic view from up-behind where
you see part of your car and the street ahead. This reference frame cannot be a
constant because it is behind the car and so it is expressed in the frame of the
car F0 , which changes as the car moves. In terms of our hierarchy, this view
reference frame is a child of the car’s frame F0 . In terms of transformations,
the view reference frame can be easily expressed in the frame of the car by
rotating the canonical frame around the x axis by −20° and then translating
by (0, 3, 1.5):
$$V_{c0} = T_{(0,3.0,1.5)}\, R_x(-20^\circ)$$
Please note that $V_{c0}$ is expressed in the frame of the car F0, and not (yet)
in the world reference frame. As for the wheels and the car's body, to know
the view reference frame in the world coordinates we just need to multiply it
by F0:
$$V_0 = F_0\, V_{c0}$$
FIGURE 4.28: A view reference frame for implementing the view from be-
hind the car.
Since we will create more view frames and we want to give some structure
to the code we create an object that represents a specific view frame and the
way we manipulate it. In this case we call this object ChaseCamera, and show
its implementation in Listing 4.5.
74 function ChaseCamera() {
75   this.position = [0.0, 0.0, 0.0];
76   this.keyDown = function (keyCode) {};
77   this.keyUp = function (keyCode) {};
78   this.mouseMove = function (event) {};
79   this.mouseButtonDown = function (event) {};
80   this.mouseButtonUp = function () {};
81   this.setView = function (stack, F_0) {
82     var Rx = SglMat4.rotationAngleAxis(sglDegToRad(-20), [1.0, 0.0, 0.0]);
83     var T = SglMat4.translation([0.0, 3.0, 1.5]);
84     var Vc_0 = SglMat4.mul(T, Rx);
85     var V_0 = SglMat4.mul(F_0, Vc_0);
86     this.position = SglMat4.col(V_0, 3);
87     var invV = SglMat4.inverse(V_0);
88     stack.multiply(invV);
89   };
90 };
LISTING 4.5: The ChaseCamera sets the view from above and behind the
car. (Code snippet from http://envymycarbook.com/chapter4/1/1.js.)
The lines from 76 to 80 define functions to handle key and mouse events.
For this specific camera they do nothing, since the camera simply depends
on the car position and orientation, which is passed to the function setView
in line 81. The goal of this function is to update the matrix stack (passed as
parameter stack) with the viewing transformation. In this case the lines from
82 to 88 simply build the frame as defined by the equations above and finally
its inverse is right multiplied to the stack’s current matrix.
$$v_o' = v_o + V_R\, t_V \qquad (4.11)$$
In order to change the orientation of the view reference frame using the
mouse, a common strategy is to map mouse movements into Euler angles and
rotate the frame axis (VR ) accordingly. Typically, the change of orientation
is limited to yaw and pitch, therefore we can store the current direction as α
and β (following notation of Section 4.5.2) and obtain the z axis of the view
reference frame as:
Note that the choice of $[0, 1, 0]^T$ guarantees that the x axis of the new frame will be parallel to the
plane XZ, which is what we want in this kind of interaction (no roll). Also note
that this choice does not work if we are aiming straight up, that is, when
the z axis of the view frame is $[0, 1, 0]^T$. This problem is usually avoided by limiting the pitch to
89.9°.
A more elegant implementation is to always update the original view ref-
erence frame (and not build a new one). Instead of keeping the total rotation
angles α and β, we keep the increments dα and dβ. Then we first rotate $V_R$
around the y axis of the world reference frame:
$$V_R' = R_{y\,d\alpha}\, V_R$$
Note that rotating a frame (or, more generally, applying any affine transfor-
mation to it) is simply done by rotating each of its axes. Then we apply a
rotation to VR0 around its x axis, which is done as shown in Section 4.4:
$$V_R'' = \underbrace{\left( V_R'\, R_{x\,d\beta}\, V_R'^{-1} \right)}_{\text{rotation around } V_R'\text{'s } x \text{ axis}} V_R' = R_{y\,d\alpha}\, V_R\, R_{x\,d\beta} \qquad (4.14)$$
7 function P h o t o g r a p h e r C a m e r a () {
8 this . position = [0 , 0 , 0];
9 this.orientation = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1];
10 this . t_V = [0 , 0 , 0];
11 this . orienting_view = false ;
12 this . lockToCar = false ;
13 this . start_x = 0;
14 this . start_y = 0;
15
16 var me = this ;
17 this . handleKey = {};
18 this . handleKey [ " Q " ] = function () { me . t_V = [0 , 0.1 , 0];};
19 this . handleKey [ " E " ] = function () { me . t_V = [0 , -0.1 , 0];};
20 this . handleKey [ " L " ] = function () { me . lockToCar = true ;};
21 this . handleKey [ " U " ] = function () { me . lockToCar = false ;};
22
23 this . keyDown = function ( keyCode ) {
24 if ( this . handleKey [ keyCode ])
25 this . handleKey [ keyCode ]( true ) ;
26 }
27
28 this . keyUp = function ( keyCode ) {
29 this.t_V = [0, 0, 0];
30 }
31
32 this . mouseMove = function ( event ) {
33 if (! this . orienting_view ) return ;
34
35 var alpha = ( event . offsetX - this . start_x ) /10.0;
36 var beta = ( event . offsetY - this . start_y ) /10.0;
37 this . start_x = event . offsetX ;
38 this . start_y = event . offsetY ;
39
40 var R_alpha = SglMat4.rotationAngleAxis(sglDegToRad(alpha), [0, 1, 0]);
41 var R_beta = SglMat4.rotationAngleAxis(sglDegToRad(beta), [1, 0, 0]);
42 this.orientation = SglMat4.mul(SglMat4.mul(R_alpha, this.orientation), R_beta);
43 };
44
45 this . mouseButtonDown = function ( event ) {
46 if (!this.lockToCar) {
47 this . orienting_view = true ;
48 this . start_x = event . offsetX ;
49 this . start_y = event . offsetY ;
50 }
51 };
52
53 this . mouseButtonUp = function () {
54 this . orienting_view = false ;
55 }
56
57 this . updatePosition = function ( t_V ) {
58 this.position = SglVec3.add(this.position, SglMat4.mul3(this.orientation, t_V));
59 if ( this . position [1] > 1.8) this . position [1] = 1.8;
60 if ( this . position [1] < 0.5) this . position [1] = 0.5;
61 }
62
63 this . setView = function ( stack , carFrame ) {
64 this . updatePosition ( this . t_V )
65 var car_position = SglMat4.col(carFrame, 3);
66 if (this.lockToCar)
67   var invV = SglMat4.lookAt(this.position, car_position, [0, 1, 0]);
68 else
69   var invV = SglMat4.lookAt(this.position, SglVec3.sub(this.position, SglMat4.col(this.orientation, 2)), SglMat4.col(this.orientation, 1));
70 stack . multiply ( invV ) ;
71 };
72 };
LISTING 4.6: Setting the view for the photographer camera. (Code snippet
from http://envymycarbook.com/chapter4/1/1.js.)
Figure 4.29 shows a snapshot from the client while viewing from the pho-
tographer’s point of view.
this is not always possible because the center of the sphere may be too close
to the viewing plane. Note also that the ratio between mouse movement and angle
depends on the sphere radius: if the radius is too big, large mouse movements will result
in small rotation angles and make the trackball nearly useless.
A common and effective solution consists of using as surface a combination
of the sphere and a hyperbolic surface as shown in Figure 4.31:
$$S = \begin{cases} \sqrt{r^2 - (x^2 + y^2)} & x^2 + y^2 < r^2/2 \\[4pt] \dfrac{r^2/2}{\sqrt{x^2 + y^2}} & x^2 + y^2 \ge r^2/2 \end{cases}$$
This surface coincides with the sphere around the center of the trackball and
then smoothly approximates the plane XY . The amount of rotation can still
be determined by Equation (4.15), although the farther the points $p_0'$ and $p_1'$
are from the center, the smaller the angle will be with respect to their distance.
Instead we can use:
$$\theta = \frac{\|p_1' - p_0'\|}{r}$$
that is, the angle covered by the arc length $\|p_1' - p_0'\|$ (see Figure 4.32, left
side).
rotate about the car center, to examine it, and then to switch back to WASD
mode to go to see some other detail of the scene.
We define the object ObserverCamera, which has all the same member func-
tions as the ChaseCamera and the PhotographerCamera. The implementation
of the camera is a mere implementation of the formulas in Sections 4.11.1
and 4.11.3. The only detail left out is what happens when we switch from the
trackball mode to the WASD mode. Note that the WASD mode works by changing
the view frame, while the trackball makes the world rotate about a pivot
point, that is, without changing the view frame. Such rotation is stored in
the matrix tbMatrix, which multiplies the stack matrix right after the view
transformation, so that the whole scene is rotated. If we use the virtual trackball
and then switch to WASD mode we cannot simply ignore the matrix tbMatrix,
because our view frame has not changed since the beginning of the rotation
and it would be as if we had not used the trackball mode at all.
What we do is the following: every time a mouse movement controlling the
trackball has ended, we convert the trackball rotation into a camera transfor-
mation and reset the trackball transformation to the identity. In other words,
we change the view frame to obtain exactly the same transformation as with
the trackball. Let us see how this is done. During the trackball manipulation,
the generic point p of the scene is brought in view space by:
$$p' = V^{-1}\; tbMatrix\; p$$
When we switch to WASD mode we want to forget about tbM atrix, but keep
the same global transformation, so we need a new view reference frame Vwasd
such that:
$$p' = V_{wasd}^{-1}\, p = V^{-1}\; tbMatrix\; p$$
so we have:
$$V_{wasd}^{-1} = V^{-1}\; tbMatrix \;\Rightarrow\; V_{wasd} = tbMatrix^{-1}\, V$$
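A sketch of this switch using the SpiderGL helpers seen in this chapter; the field names viewFrame and tbMatrix, and the helper SglMat4.identity, are assumptions for illustration rather than code quoted from the book:

// When leaving trackball mode, fold the accumulated trackball rotation into
// the view frame, as derived above: V_wasd = tbMatrix^{-1} * V.
function foldTrackballIntoView(camera) {
  camera.viewFrame = SglMat4.mul(SglMat4.inverse(camera.tbMatrix), camera.viewFrame);
  camera.tbMatrix = SglMat4.identity();   // reset the trackball rotation (assumed helper)
}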
FIGURE 4.33: Adding the Observer point of view with WASD and Trackball
Mode. (See client http://envymycarbook.com/chapter4/2/2.html.)
Figure 4.33 shows a snapshot from the client while viewing the scene with
the Observer Camera.
4.13 Self-Exercises
4.13.1 General
1. Find the affine 2D transformation that transforms the square cen-
tered at (0, 0) with sides of length 2 into the square with corners
{(2, 0), (0, 2), (−2, 0), (0, −2)} and find its inverse transformation.
4. Consider the square with unitary sides centered at (0.5, 0.5) and apply
the following transformations:
• rotation by 45◦
• scaling by 0.5 along the y axis
• rotation by −30◦
Write down the shear transformation that gives the same result.
5. Describe what happens to the image formed in a pinhole camera when
the hole is not punctiform.
2. In order to tell the front of the car from its back, apply a transformation
along the z axis so that the front will look half as wide and half as high
as the back. Can you do it with an affine transformation? Hint: draw the
transformed version of the box and answer to the question: do parallel
lines stay parallel after this transformation?
3. Add the front view, which is the view looking forward from the car (so
the car is no longer visible).
4. In Listing 4.1 we show how to set the projection matrix so as to preserve
the aspect ratio. Note that the way we did it works under the assumption
that the width/height ratio was greater for the viewport than for the wanted
viewing window. What happens if we set the viewport to 300 × 400?
Will we still see the whole circuit? Fix the code in Listing 4.1 so that it
works in every case.
Chapter 5
Turning Vertices into Pixels
In the previous chapters we learned the basic algorithms for turning a specifi-
cation of a three-dimensional scene (vertices, polygons, etc.) into a picture on
the screen. In the following sections we show the details of how a geometric
primitive becomes a region of colored pixels and also several optimizations
that are commonly used for making it an efficient operation.
5.1 Rasterization
Rasterization comes into play after the vertex positions are expressed in
viewport space (see Section 4.6.4). Unlike all the previous reference systems
(in order: object space, world space, view space, NDC) this one is discrete
and we have to decide how many fragments belong to the primitive and which
discrete coordinates have to be assigned to the corresponding fragments.
5.1.1 Lines
What is called line in CG is actually a segment specified by the two end-
points (x0 , y0 ) and (x1 , y1 ), and for the sake of simplicity we assume that the
coordinates are integers. The segment can be expressed as:
$$y = y_0 + m\,(x - x_0) \qquad (5.1)$$
$$m = \frac{y_1 - y_0}{x_1 - x_0}$$
From Equation (5.1) we can see that if x is incremented to x + 1, y is in-
cremented to y + m. Note that if −1 ≤ m < 1 and we round to the closest
integer the values of y, for all the values of x between x0 and x1 we will have
a series of pixels such that there is exactly one pixel for each column and two
consecutive pixels are always adjacent, as shown in Figure 5.1. If m > 1 or
m < −1 we can invert the roles of x and y and write the equation as:
$$x = x_0 + \frac{1}{m}\,(y - y_0) \qquad (5.2)$$
and compute the values of x for y going from y0 to y1 . This algorithm is called
discrete difference analyzer (DDA) and is reported in Listing 5.1.
x = x0 ; y = y0 ;
do {
    OutputPixel ( round ( x ) , round ( y ) ) ;
    x = x + ∆x;
    y = y + ∆y;
} while ( x <= x1 ) ;
LISTING 5.1: The DDA rasterization algorithm (for the case −1 ≤ m ≤ 1: ∆x = 1, ∆y = m).
or, stated in other words, if the line passes over (i + 1, j + 1/2) the next pixel
will be (i + 1, j + 1), otherwise it will be (i + 1, j). It is useful to write the line
equation
$$y = \frac{\Delta y}{\Delta x}\, x + c \qquad (5.4)$$
with the implicit formulation:
y∆x − x∆y − c∆x = 0. (5.5)
For compactness we define the function $F(x, y) = -\Delta y\, x + \Delta x\, y - c\,\Delta x$. Note
that
$$F(x, y)\;\begin{cases} = 0 & \text{if } (x, y) \text{ belongs to the line} \\ > 0 & \text{if } (x, y) \text{ is above the line} \\ < 0 & \text{if } (x, y) \text{ is below the line} \end{cases} \qquad (5.6)$$
However, since we only use the sign of F , we simply multiply both F and ∆Di
by 2 and obtain integer-only operations:
$$F(i, j) = 2\Delta x\, j - 2\Delta y\, i - 2\Delta x\, c \qquad (5.10)$$
So, if our line starts from pixel (i0 , j0 ) and ends at pixel (i1 , j1 ) we can compute
the initial value of the decision variable:
$$\begin{aligned} F(i_0 + 1,\; j_0 + \tfrac{1}{2}) &= 2\Delta x\,(j_0 + \tfrac{1}{2}) - 2\Delta y\,(i_0 + 1) - 2\Delta x\, c \\ &= 2\Delta x\, j_0 + \Delta x - 2\Delta y\, i_0 - 2\Delta y - 2\Delta x\, c \\ &= F(i_0, j_0) + \Delta x - 2\Delta y \\ &= 0 + \Delta x - 2\Delta y \end{aligned}$$
Note that, according to Equation (5.6), F (i0 , j0 ) = 0 because (i0 , j0 ) belongs
to the line. This allows us to write a rasterization algorithm that uses integer
coordinates only and simple updates of the decision variable as shown in
Listing 5.2. Since ∆y and ∆x determine the slope of the line, we can simply
set them to the height and the width of the segment, respectively.
BresenhamRasterizer (i0 ,j0 ,i1 ,j1 ) {
  int ∆y = j1 - j0 ;
  int ∆x = i1 - i0 ;
  int D = ∆x - 2 * ∆y;  // F evaluated at the first midpoint (i0 +1, j0 +1/2)
  int i = i0 ;
  int j = j0 ;
  OutputPixel (i , j ) ;
  while ( i < i1 )
  {
    if ( D >= 0) {       // midpoint above the line : keep the same row
      i = i + 1;
      D = D + ( -2 * ∆y ) ;
    }
    else {               // midpoint below the line : move up one row
      i = i + 1;
      j = j + 1;
      D = D + 2 * (∆x - ∆y) ;
    }
    OutputPixel (i , j ) ;
  }
}
LISTING 5.2: Bresenham rasterizer for the case of slope between 0 and 1.
All the other cases can be written taking into account the symmetry of the
problem.
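As an illustration of how this symmetry can be exploited, the following sketch (not taken from the book's code base) reduces an arbitrary segment to the case handled by Listing 5.2; it assumes the routine has been wrapped so that it invokes a per-pixel callback instead of calling OutputPixel directly:

// Rasterize a segment of any slope by remapping it to the first octant,
// running the Bresenham routine of Listing 5.2 on the remapped endpoints,
// and mapping every produced pixel back to the original octant.
function rasterizeAnySlope(i0, j0, i1, j1, outputPixel) {
  var t;
  var steep = Math.abs(j1 - j0) > Math.abs(i1 - i0);
  if (steep) { t = i0; i0 = j0; j0 = t; t = i1; i1 = j1; j1 = t; }   // swap x and y
  if (i0 > i1) { t = i0; i0 = i1; i1 = t; t = j0; j0 = j1; j1 = t; } // left to right
  var mirrored = j1 < j0;             // negative slope: mirror vertically around j0
  if (mirrored) j1 = 2 * j0 - j1;
  bresenhamRasterizer(i0, j0, i1, j1, function (i, j) {
    if (mirrored) j = 2 * j0 - j;     // undo the mirroring
    if (steep) outputPixel(j, i); else outputPixel(i, j); // undo the swap
  });
}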
$x = (y - b)\,\frac{1}{m}$, it means that the intersection point moves by $\frac{1}{m}$ at every
scanline.
5.1.2.2 Triangles
Modern graphics hardware is massively parallel and designed to process
pixels in groups, but the rasterization algorithms seen so far are essentially
sequential: they process one pixel after the other, incrementally updating
quantities. The scanline algorithm for polygon filling
works for every type of polygon (even non-convex or with holes) but proceeds
line after line.
A more parallel approach is to give up on incremental computation
and simply test a group of pixels in parallel to see whether they are inside or outside the
polygon. As will be made clear in a moment, this test can be done efficiently
for convex polygons without holes. The only thing we need is to be able to
know if a point is inside a triangle or not.
A useful characteristic of convex polygons, and hence of triangles, is
that they can be seen as the intersection of a finite number of halfspaces. If,
for each edge of the triangle, we take the line passing through it, one of the
two halfspaces will include the inside of the triangle, as shown in Figure 5.4.
Let us consider again the implicit formulation of the line used in Equa-
tion (5.6). In this case, we can think that the line divides the 2D space in two
halfspaces, one where the function F (.) is positive and one where it is nega-
tive. In principle we could just find the implicit definition for each of the three
lines but then we are left with an ambiguity: how do we know, for a given
line, if the points inside the triangle are those in the positive or in the negative
halfspace? Just to make it clearer, note that if F is the implicit formulation
of a line, −F also is.
What we need is a conventional way to define the implicit formulation that
depends on the order in which we consider the two vertices defining the line.
Consider the edge v0 v1 : the line passing through it can be written as:
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} + t\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} - \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\right) \qquad (5.11)$$
Note that if we consider the vector made of x and y coefficients [∆y, −∆x]T we
will find that it is orthogonal to the line direction [∆x, ∆y]T (their dot product
is 0, see Appendix B) and it indicates the halfspace where the implicit function
is positive. To test that this is true, we can simply write E01 for point p (see
Figure 5.5):
$$E_{01}\!\left(v_0 + \begin{bmatrix} \Delta y \\ -\Delta x \end{bmatrix}\right) = \ldots = \Delta y^2 + \Delta x^2 > 0$$
In other words, if we walk on the line along direction v1 − v0 the positive
halfspace is on our right. Note that if we write the edge equation swapping
the order of the two points we will see that E10 = −E01 and, again, the positive
halfspace will be on our right (walking along v0 − v1 ). So if we specify the
triangle as three vertices v0 v1 v2 counterclockwise, it means that if we walk
from v0 to v1 , and then from v1 to v2 , the space outside the triangle is on
our right all the time. According to this, we can test if a point p is inside the
triangle in this way:
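A minimal sketch of such a test follows, using the convention just derived (edge functions positive on the right of each oriented edge); all names are illustrative and not taken from the book's code:

// Edge function of the oriented edge (a -> b) evaluated at p:
// positive on the right of the edge, negative on the left, zero on the line.
function edgeFunction(a, b, p) {
  return (b.y - a.y) * (p.x - a.x) - (b.x - a.x) * (p.y - a.y);
}
// With v0, v1, v2 given counterclockwise the outside is always on the right,
// so p is inside the triangle when no edge function is positive.
function insideTriangle(v0, v1, v2, p) {
  return edgeFunction(v0, v1, p) <= 0 &&
         edgeFunction(v1, v2, p) <= 0 &&
         edgeFunction(v2, v0, p) <= 0;
}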
where we indicate with c(i, j) the color of pixel (i, j) and with λ0 (i, j) and
λ1 (i, j) the coefficients of the linear combination. Let us assume the position
of pixel (i′, j′) is exactly midway between v0 and v1 . Then we would have
the viewer and then drawn from the farthest to the nearest (which is referred
to as back-to-front drawing), as is done in some painting techniques, where
the painter paints the background and then draws on it the objects starting
from the far ones and proceeding with the ones closer to the viewer. In this
manner, if more primitives overlap on a pixel the nearest will be rasterized as
the last one and will determine the pixel value.
FIGURE 5.9: (a) Depth sort example on four segments and a few examples
of planes separating them. Note that C and D cannot be separated by a plane
aligned with the axis but they are by the plane lying on C. D and E intersect
and cannot be ordered without splitting them. (b) A case where, although no
intersections exist, the primitives cannot be ordered.
tests to look for a separating plane, and we start with planes parallel to the
principal planes because the tests are simply comparisons of one coordinate.
Note that for a pair of non-intersecting convex polygons, one of the planes
containing one of the primitives is a separating plane.
Unfortunately, if the primitives intersect they cannot be ordered and, in
general, it is not possible to draw first one and then the other and get the
correct result. Even if no intersections exist, it may not be possible to find a
correct back-to-front order. Figure 5.9(b) shows a case where this happens: a
part of A is behind B, a part of B is behind C, and a part of C is behind A.
5.2.2 Scanline
Note that the cyclic ordering of Figure 5.9(b) would never happen in the
bidimensional case; that is, for segments in the plane. The scanline algorithm
rasterizes the scene line-by-line along the y axis, therefore each iteration con-
cerns the intersection of a plane y = k with the scene. The result of a scanline
pass produces a set of segments whose vertices are at the intersection of the
plane with the edges of the primitives. In this manner the problem is scaled
down by one dimension and it becomes a matter of correctly rendering a set
of segments.
If we project the vertices on the current scanline, we obtain a series of spans
with the following obvious property: in each span the back-to-front order does
not change (see Figure 5.10). So, for each span, we find out the nearest portion
of segment and rasterize it until the end of the span.
The scanline approach may also be used for intersecting polygons, although defining
the spans requires a more complicated approach, which is beyond our interest
here.
Although both depth sort and scanline approaches are widely known in
the CG community, they are not part of the rasterization-based pipeline, for
FIGURE 5.10: (a) Step of the scanline algorithm for a given plane. (b) The
corresponding spans created.
the simple reason that they do not fit the pipeline architecture well, where
primitives are processed one-by-one through the stages of the pipeline. Instead,
these algorithms need to store all the primitives to be rendered and order them
before drawing. Furthermore, they are not well suited for implementation on
parallel machines, like current GPUs are.
5.2.3 z-Buffer
The z-Buffer is a de facto standard solution for hidden surface removal.
We explained in Section 5.1 that when a primitive is rasterized, a number of
interpolated attributes are written in the output buffers. One of the attributes
is the z value of the vertex coordinate, which is written in the buffer called
z-buffer (or depth buffer ). The algorithm is very simple and it is shown in
Listing 5.3. At the beginning the z-buffer is initialized with the maximum
possible value (line 2), which is the value of the far plane of the view frustum
(note that the z-buffer operates in NDC and hence this value is 1). Then
each primitive is rasterized and for each pixel covered (line 4) its z value is
calculated. If this value is smaller than the value currently written in the
z-buffer, the latter is replaced with the new value (line 6).
1 ZBufferAlgorithm () {
2 forall i , j ZBuffer [i , j ] = 1.0;
3 for each primitive pr
4 for each pixel (i , j ) covered by pr
5 if ( Z [i , j ] < ZBuffer [i , j ])
6 ZBuffer [i , j ] = Z [i , j ];
7 }
FIGURE 5.11: State of the depth buffer during the rasterization of three
triangles (the ones shown in Figure 5.9(b)). On each pixel is indicated the
value of the depth buffer in [0, 1]. The numbers in cyan indicate depth values
that have been updated after the last triangle was drawn.
FIGURE 5.12: Two truncated cones, one white and one cyan, superimposed
with a small translation so that the cyan one is closer to the observer. However,
because of z-buffer numerical approximation, part of the fragments of the cyan
cone are not drawn due to the depth test against those of the white one.
$$f_p(z) = \frac{1}{2}\left(\frac{f+n}{f-n} - \frac{1}{z}\,\frac{2fn}{f-n} + 1\right) = \frac{1}{2}\,\frac{f+n}{f-n} + \frac{1}{2} - \frac{fn}{z(f-n)} \qquad (5.24)$$
Unlike for orthogonal projection, this time the depth value is proportional
to the reciprocal of the z value in view space. Figure 5.13 shows a plot where
the abscissa corresponds to the value of z from n to f and the ordinate to
the value in depth buffer space for fp (z) and fo (z). We may see how with
perspective projection, the first 30% of the interval maps to about 80% of the
interval in depth space, that is, values close to the near plane n take up most
of the depth-buffer range and are therefore represented with finer granularity.
Therefore, the farther away from the near plane, the less the precision.
FIGURE 5.13: A plot showing the mapping between z-values in view space
and depth buffer space.
to implement the view from inside the car: if at each frame we render first
the interior and then the rest of the scene we obtain the same result. The
difference is in efficiency: using stenciling, we only draw the interior once and
avoid executing the z-buffer algorithm (reading from the z-buffer, comparing
and writing) for all those fragments that would never be drawn anyway.
5.3.2 Blending
When a fragment passes the z-test, its color can be written into the color
buffer. Until now, we assumed that the color already present is replaced by
the color of the fragment, but this is not necessarily the case: the color of the
fragment (referred to as source color ) can be blended with that in the color
buffer (referred to as destination color ) to model things such as transparent
surfaces. A key role in blending is played by the alpha component, which is
used to specify the opacity of a surface, that is, how much light passes through
it.
The CG APIs such as WebGL offer the possibility of making a linear
combination of each color component to obtain a wide range of effects. More
precisely, let:
s = [sR , sG , sB , sa ]
d = [dR , dG , dB , da ]
s be the source and d the destination color; the new destination color (d0 ) is
computed as:
$$d' = \begin{bmatrix} b_R\, s_R \\ b_G\, s_G \\ b_B\, s_B \\ b_A\, s_a \end{bmatrix} + \begin{bmatrix} c_R\, d_R \\ c_G\, d_G \\ c_B\, d_B \\ c_A\, d_a \end{bmatrix}, \qquad b = [b_R, b_G, b_B, b_A],\;\; c = [c_R, c_G, c_B, c_A] \qquad (5.25)$$
where b and c are coefficients, referred to as blending factors that can be set to
produce different visual effects. The resulting value is clamped in the interval
in which the color is encoded. The most common uses of blending are handling
transparent surfaces and improving antialiasing.
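For example, the classic setting for semi-transparent surfaces takes b equal to the source alpha and c equal to one minus the source alpha; in WebGL this amounts to the following two calls (a generic sketch, not tied to a specific client):

gl.enable(gl.BLEND);                                  // turn blending on
gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);   // d' = a_s * s + (1 - a_s) * d
// Transparent primitives are usually drawn after the opaque ones, sorted
// back to front, so that the destination color is already meaningful.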
of the line is one pixel, which is the minimum size we can handle with our
raster device. So, the rasterizer computes a set of pixels to represent the line.
However, we can still play with the color of the surrounding pixels to reduce
the aliasing effect. If we consider the segment as a polygon having width 1
pixel and length equal to the segment’s length, we see that such a polygon
partially covers some pixels. The area averaging technique or unweighted area
sampling consists of shading each pixel intersected by this polygon with a
color whose intensity is scaled down by the percentage of the area of the pixel
covered by the polygon, as shown in Figure 5.16. The intuition behind this
technique is that if only half a pixel is inside the line, we halve its color and,
at a proper viewing distance, we will obtain a convincing result. Considering
the color space HSV (hue, saturation and value), what happens is that the
value V is scaled down while H and S remain the same.
However, what about the color of the other half of the pixel? What if more
lines and polygons cover the same pixel? If we use blending, we can composite
our half color by using the alpha channel with the color already in the color
buffer.
To implement this technique we could just compute the area of intersec-
tion between the thick line and the pixels during the rasterization process and
output the color depending on the area value. We can try to do this as an
exercise by passing the segment’s endpoints to the fragment shader and performing
the computation there. However, the exact intersection between a polygon and
a square (the pixel) requires several floating point operations and we want to
avoid that.
FIGURE 5.17: Exemplifying drawings for the cabin. The coordinates are
expressed in clip space. (See code at http://envymycarbook.com/chapter5/0/
cabin.js.)
139 this . drawColoredObject ( gl , this . cabin , [0.4 , 0.8 , 0.9 , 1.0]) ;
140
141 gl . stencilFunc ( gl . EQUAL , 0 , 0xFF ) ;
142 gl . stencilOp ( gl . KEEP , gl . KEEP , gl . KEEP ) ;
143 gl . stencilMask (0) ;
LISTING 5.4: Using stenciling for drawing the cabin.
The changes to our client to include the view from inside the car are very
few. First we add a new camera that we call DriverCamera placed inside the
car, which is easily done as in Section 4.10.1. Then we need to introduce
the code shown in Listing 5.4 at the beginning of the rendering cycle. These
few lines simply prepare the stencil buffer by drawing the polygons shown in
Figure 5.17 (on the left) and thus inhibit the writing in the frame buffer of
the pixels covered by such polygons.
With gl.clearStencil(0) we tell WebGL that the value to use to clear the
stencil buffer is 0 and with gl.stencilMask(˜0) we indicate that all bits of the
stencil are enabled for writing (these are also the default values anyway).
How the stencil test behaves is determined by the calls at lines 133-134.
Function gl.stencilFunc(func,ref,mask) sets the condition for discarding a frag-
ment as the outcome of the comparison between the value already in the
stencil buffer and the reference value ref; func specifies the kind of test and
mask is a bit mask that is applied to both values. In this very simple example
by passing gl.ALWAYS we say that all fragments will pass no matter what the
stencil, the reference and the mask values are.
Function gl.stencilOp(sfail,dpfail,dppass) tells how to update the stencil
buffer depending on the outcome of the stencil test and the depth test. Since
we draw the polygon with the depth buffer cleared and make all the fragments
pass, the only meaningful value is dppass, that is, what to do when both stencil
and depth test pass. By setting this parameter to gl.REPLACE we tell WebGL
to replace the current value on the stencil with the reference value specified
in gl.stencilFunc (1 in this example). After the drawing (lines 136-139) we will
have a stencil buffer that contains 1 in all the positions covered by the drawing
and 0 elsewhere. This is our mask.
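Putting together the calls discussed in this section, the whole preparation of the mask could look like the following sketch (the exact line numbers and variable names of the client may differ):

// Prepare the stencil mask for the cabin: write 1 wherever the cabin
// polygons are rasterized.
gl.enable(gl.STENCIL_TEST);
gl.clearStencil(0);
gl.stencilMask(~0);                         // all stencil bits writable
gl.clear(gl.STENCIL_BUFFER_BIT);
gl.stencilFunc(gl.ALWAYS, 1, 0xFF);         // every fragment passes the test
gl.stencilOp(gl.KEEP, gl.KEEP, gl.REPLACE); // write the reference value 1
// ... draw the cabin polygons here ...
// From now on only fragments whose stencil value is 0 pass, and the
// stencil buffer is no longer modified.
gl.stencilFunc(gl.EQUAL, 0, 0xFF);
gl.stencilOp(gl.KEEP, gl.KEEP, gl.KEEP);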
Then, at lines (141-143) we change the way to do the stencil test so that
only those fragments for which the value in the stencil buffer is 0 pass and that
no modification is done to the stencil buffer in any case. The final result is
that no subsequent fragments will pass the stencil test for the pixels rasterized
by the cockpit.
We also want to add the windshield with a partially opaque upper band
(see Figure 5.17 on the right), that is, we want to see through a dark wind-
shield. We can use blending by adding the code shown in Listing 5.5 at the
end of function drawScene.
The function gl.blendFunc determines how the color of the fragment is
computed, that is, the blending factors introduced in Section 5.3.2 and used
in Formula (5.25). Again, we specified the coordinates of the windshield in
clip space.
5.4 Clipping
A clipping algorithm takes as input a primitive and returns the portion
of it inside the view frustum. Clipping is not strictly necessary to produce a
correct output. We could simply rasterize the whole primitive and ignore the
fragments outside the viewport, but rasterization has a cost.
Clipping is done in clip space (which we introduced in Section 4.6.3.1)
so the problem is posed as finding the portion of a primitive (that is, a line
FIGURE 5.18 (SEE COLOR INSERT): Adding the view from in-
side. Blending is used for the upper part of the windshield. (See client
http://envymycarbook.com/chapter5/0/0.html.)
segment or a polygon) inside the cube [xmin , ymin , zmin ] × [xmax , ymax , zmax ]
with as few operations as possible.
Note that if p0 and p1 lie in the positive halfspace of any of the four planes,
and so such a plane is a separating plane, the corresponding bit will be 1 for
both endpoints. Therefore we have a simple test to determine if one of the
planes is a separating plane, which is:
$$R(p_0)\;\&\;R(p_1) \neq 0$$
where & is the bitwise AND operator. Also, we have another test to check if
the segment is fully inside the view volume
R(p0 ) | R(p1 ) = 0
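A sketch of how the region codes and the two tests can be implemented for the 2D case, with illustrative names:

// Region code of a point: one bit per clipping plane.
function regionCode(p, xmin, ymin, xmax, ymax) {
  var code = 0;
  if (p.x < xmin) code |= 1;   // left
  if (p.x > xmax) code |= 2;   // right
  if (p.y < ymin) code |= 4;   // bottom
  if (p.y > ymax) code |= 8;   // top
  return code;
}
function trivialReject(c0, c1) { return (c0 & c1) !== 0; } // a separating plane exists
function trivialAccept(c0, c1) { return (c0 | c1) === 0; } // both endpoints inside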
Note that if tmin > tmax this means there is no intersection, otherwise the
new extremes are p′0 = s(tmin ) and p′1 = s(tmax ).
5.5 Culling
A primitive is culled when there is no chance that any part of it will be
visible. This may happen in three cases:
• Back-face culling: when it is oriented away from the viewer.
• Frustum culling: when it is entirely outside the view volume.
• Occlusion culling: when it is hidden by other primitives closer to the viewer.
As for clipping, we do not need culling to obtain a correct result; we need it
to eliminate unnecessary computation, that is, to avoid rasterizing
primitives that, in the end, will not contribute to any of the final pixels.
FIGURE 5.22: (a) If a normal points toward −z in view space this does not
imply that it does the same in clip space. (b) The projection of the vertices on
the image plane is counter-clockwise if and only if the triangle is front-facing.
FIGURE 5.23: (Left) A bounding sphere for a street lamp: easy to test for
intersection but with high chances of false positives. (Right) A bounding box
for a street lamp: in this case we have little empty space but we need more
operations to test the intersection.
Let us consider the sphere. The sphere is an easy volume to test for in-
tersection, but if we use it for a street lamp, as shown in Figure 5.23, we
will have a lot of empty space. This means that it will often happen that the
sphere intersects the view frustum but the lamp does not. A box would make
a tighter bounding volume than the sphere, but the intersection test between
a freely oriented box and the frustum requires more operations than for a
sphere. What can we do when the bounding box intersects the view frustum?
Either we go ahead and process all the primitives within, or we use a
divide-and-conquer solution. This solution consists of using smaller bounding volumes
that, all together, enclose all the primitives contained in the original bound-
ing volume. This can be done recursively creating a hierarchy of bounding
volumes. Figure 5.24 shows an example of hierarchy of bounding boxes for a
model of a car.
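A minimal sketch of how such a hierarchy can be visited for frustum culling, assuming a hypothetical node structure with a bounding volume, a list of children and, at the leaves, the enclosed objects, plus a helper volumeOutsideFrustum for the chosen bounding volume:

// Visit the hierarchy and collect the objects whose bounding volumes
// are not entirely outside the view frustum.
function frustumCull(node, frustum, visibleList) {
  if (volumeOutsideFrustum(node.volume, frustum))
    return;                                    // prune the whole subtree
  if (node.children.length === 0) {            // leaf: keep its objects
    for (var k = 0; k < node.objects.length; k++) visibleList.push(node.objects[k]);
    return;
  }
  for (var i = 0; i < node.children.length; i++) // internal node: descend
    frustumCull(node.children[i], frustum, visibleList);
}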
The type of hierarchy and the strategy to traverse it are further choices
that, along with the type of bounding volume, have given rise to hundreds of
algorithms and data structures for this task. This topic is well beyond the
scope of this book. Anyway, we will come back to this concept, providing some
more details, in Chapter 11, when we discuss efficient implementation of ray
tracing.
Note that frustum culling is just one specific use of the hierarchies of
bounding volumes. They are in general used to speed up the process of testing
whether two moving objects collide (collision detection) and in which parts they do
(contact determination), so as to compute the correct (or physically plausible)
behavior (collision handling).
volume and the bounding volume itself is occluded, then none of what is
inside can be visible. We point out that, in general, the difference between
occlusion culling and frustum culling is simply that we replaced is inside the
view frustum with is visible.
In general, an efficient algorithm for occlusion culling may give a dramatic
speedup to the rendering of the whole 3D scene. How large this speedup is
depends mainly on the depth complexity of the scene, which means how many
primitives overlap on the same pixels. Note that with the z-buffer algorithm
each and every primitive in the frustum is rasterized, even if it is completely
occluded from the viewer, because the conflict is resolved at pixel level.
Chapter 6
Lighting and Shading
We see objects in the real world only if light from their surface or volume
reaches our eye. Cameras capture the images of the objects only when light
coming from them falls on their sensors. The appearance of an object in a
scene, as seen by the eye or as captured in photographs, depends on the
amount and the type of light coming out of the object towards the eye or the
camera, respectively. So to create realistic looking images of our virtual scene
we must learn how to compute the amount of light coming out from every 3D
object of our synthetic world.
Light coming from an object may originate from the object itself. In such a
case, light is said to be emitted from the object, the process is called emission,
and the object is called an emitter. An emitter is a primary source of light of
the scene, like a lamp or the sun; an essential component of the visible world.
However, only a few objects in a scene are emitters. Most objects redirect light
reaching them from elsewhere. The most common form of light redirection
is reflection. Light is said to be reflected from the surfaces of objects, and
objects are called reflectors. For an emitter, its shape and its inherent emission
property determine the appearance. But for a reflector, its appearance depends
not only on its shape, but also on the reflection property of its surface, the
amount and the type of light incident on the surface, and the direction from
which the light is incident.
In the following, after a brief introduction about what happens when light
interacts with the surface of an object and is reflected by it, we will de-
scribe in depth the physics concepts that help us to better understand such
interaction from a mathematical point of view. Then, we will show how such
complex mathematical formulation can be simplified in order to implement it
more easily, for example with the local Phong illumination model. After this,
more advanced reflection models will be presented. As usual, after the theo-
retical explanation, we will put into practice all that we explain by putting
our client under a new light, so to speak.
due to the reflection of light reaching from one or more primary sources, and
indirect light, the light due to the reflection of light from one or more secondary
sources.
When light hits the surface of an object, it interacts with the matter and,
depending on the specific properties of the material, different types of inter-
action can take place. Figure 6.1 summarizes most of the light–matter inter-
action effects. A part of the light reflected from a surface point is generally
distributed in all the directions uniformly; this type of reflection is called dif-
fuse reflection and it is typical of the matte and dull materials such as wood,
stone, paper, etc. When the material reflects the light received in a preferred
direction that depends on the direction of the incident light, then it is called
specular reflection. This is the typical reflection behavior of metals. A certain
amount of light may travel into the material; this phenomenon is called trans-
mission. A part of the transmitted light can reach the opposite side of the
object and ultimately leave the material; this is the case with a transparent
object and this light is named refracted light. A part of the transmitted light
can also be scattered in random directions depending on the internal compo-
sition of the material; this is the scattered light. Sometimes a certain amount
of scattered light leaves the material at points away from the point at which
the light enters the surface; this phenomenon is called sub-surface scattering
and makes the object look as if a certain amount of light is suspended inside
the material. Our skin exhibits this phenomenon: if you try to illuminate your
ear at a point from behind you will notice that the area around that point
gets a reddish glow, particularly noticeable at the front. This reddish glow is
due to sub-surface scattering.
A part of the incident light could also be transmitted inside the material;
in this case Equation (6.1) can be written as an equilibrium of energy:
These three components of the reflected light will be treated in the next para-
graphs.
where Lincident is the amount of the incident light, θ is the angle of inclination
of the incident light vector, that is, the angle between the normal N , which
indicates the surface orientation, and the incident light direction Lincident (see
Figure 6.2); kdiffuse is a constant term indicating how much the surface is
diffusive. Using the relation of dot product of two normalized vectors and the
cosine of angle between them we can express the cosine term in Equation (6.3)
as:
cos θ = N · Lincident (6.4)
and this leads to the standard expression for the Lambertian reflection:
incident and on the reflection direction. Figure 6.3 shows the specular reflec-
tion for an ideal specular surface, that is, a mirror. In this case the material
reflects the incident light with exactly the same angle of incidence. In the
non-ideal case the specular reflection is partially diffused around the mirror
direction (see Figure 6.3, on the right). The mirror direction is computed
taking into account the geometry of the normalized vectors forming the
isosceles triangle. Using simple vector algebra, we can express the mirror re-
flection direction R, as:
R = 2N(N · L) − L (6.6)
The vectors used in Equation (6.6) are shown in Figure 6.4. The equation is
easily understood by noting that the direction R can be obtained by adding
to the normalized vector −L two times the vector x = N(N · L), which corre-
sponds to the edge BC of the triangle ABC shown in the figure.
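A small sketch of Equation (6.6) in code, assuming n and l are normalized three-component arrays (names are illustrative):

// Mirror reflection direction R = 2 N (N . L) - L, with N and L normalized.
function mirrorDirection(n, l) {
  var ndotl = n[0] * l[0] + n[1] * l[1] + n[2] * l[2];
  return [2 * n[0] * ndotl - l[0],
          2 * n[1] * ndotl - l[1],
          2 * n[2] * ndotl - l[2]];
}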
6.1.1.3 Refraction
Refraction happens when a part of the light is not reflected but passes
through the material surface. The difference in the material properties causes
the light direction to change when the light crosses from one material to the other.
The amount of refracted light depends strictly on the material properties of
the surface hit; the modification of the light direction depends both on the
material in which the light travels (e.g., the air, the vacuum, the water) and on
the material of the surface hit. Snell’s Law (Figure 6.5), also called the law of
refraction, models this optical phenomenon. The name of the law derives from the
Dutch astronomer Willebrord Snellius (1580–1626), but it was first accurately
described by the Arab scientist Ibn Sahl, who in 984 used it to derive lens
shapes that focus light with no geometric aberrations [36, 43]. Snell’s Law
states that:
η1 sin θ1 = η2 sin θ2 (6.7)
where η1 and η2 are the refractive indices of the material 1 and the material
2, respectively. The refractive index is a number that characterizes the speed
of the light inside a medium. Hence, according to Equation (6.7), it is possible
to evaluate the direction change when a ray of light passes from a medium to
another one. It is possible to see this phenomenon directly by putting a straw
inside a glass of water and taking a look at the glass from the side. We see
the straw appears to have different inclinations in the air and in the water,
as if it were formed by two pieces. This visual effect is caused by the
difference in the refractive indices of the air and the water.
flowing. If the same amount of flux is flowing out from one square millimeter
of area and from one square meter of area, then the smaller area must be
much brighter than the larger area. Similarly, flux is also not indicative of
the direction towards which the light is flowing. Light flux may be flowing
through or from the area uniformly in all directions, or may be preferentially
more in some directions and less in other directions, or may even be flowing
along only one direction. In the case where the light is flowing out of the
surface uniformly in all directions, then the area would look equally bright
from every direction; this is the case with matte or diffuse surfaces. But in
the nonuniform case, the surface will look much brighter from some directions
compared to the other directions. Hence, we have to be more selective in terms
of specifying the distribution of the flow in space and in direction. So we have
to specify the exact area, or the direction, or better define density terms that
specify flux per unit area and/or per unit direction. Here we introduce three
density terms: irradiance, radiance and intensity.
The first density term we introduce is the area density, or flux per unit
area. To distinguish light that is arriving or incident at the surface from the
light that is leaving the surface, two different terms are used to specify area
density. They are: irradiance and exitance. The conventional symbol for either
of the quantities is E. Irradiance represents the amount of flux incident on or
arriving at the surface per unit area, and exitance represents the flux leaving
or passing through per unit area. They are expressed as the ratio dΦ/dA. So
irradiance and exitance are related to flux by the expression
dΦ = E dA, (6.8)
FIGURE 6.7: Radiance incoming from the direction ωi (L(ωi )). Irradiance
(E) is the total radiance arriving from all the possible directions.
where θ is the angle the normal to the surface makes with the direction of
light flow (see Figure 6.7). The unit of radiance is Watt per square meter
per steradian (W·m−2·sr−1). The integral of the incoming radiance on the
hemisphere corresponds to the irradiance:
$$E = \int_\Omega L(\omega)\,\cos\theta\, d\omega. \qquad (6.12)$$
surface. We will use symbol Ω to represent the hemisphere. Where it is not obvious from
the context, we may use subscripting to distinguish the hemisphere of incoming directions
from the hemisphere of outgoing directions, for example: Ωin and Ωout .
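As a quick check of Equation (6.12), consider the case in which the incoming radiance is a constant L over the whole hemisphere. Then
$$E = \int_\Omega L\,\cos\theta\, d\omega = L \int_0^{2\pi}\!\!\int_0^{\pi/2} \cos\theta\,\sin\theta\, d\theta\, d\phi = \pi L,$$
a factor of π (and not 2π) that will reappear when relating the exitance of a Lambertian surface to its radiance.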
or simply,
$$f_r = \frac{\rho}{\pi}. \qquad (6.19)$$
So, for Lambertian surfaces BRDF is only a constant factor of the surface
reflectance and hence is a zero-dimensional function.
Another ideal reflection property is exhibited by optically smooth surfaces.
Such a reflection is also known as mirror reflection. From such surfaces reflec-
tion is governed by Fresnel’s law of reflection:
$$L(\omega_r) = \begin{cases} R(\omega_i)\, L(\omega_i) & \text{if } \theta_i = \theta_r \\ 0 & \text{otherwise} \end{cases} \qquad (6.20)$$
where θi and θr are inclination angles of the incident and reflection vectors,
and the two vectors and normal to the point of the incidence are all in one
plane. R(ωi ) is the Fresnel function of the material. We provide more detail
about the Fresnel function later. Note that the direction of the reflected light
θi = θr is the same as the mirror direction just discussed for the ideal specular
material treated in Section 6.1.1. Hence, BRDF for a perfect mirror is a delta
function of incident directions, and hence is a one-dimensional function.
A number of real world surfaces exhibit invariance with respect to the
rotation of the surface around the normal vector at the incident point. For
such surfaces, BRDFs are dependent only on three parameters θi , θr and
φ = φi − φr , thus they are three-dimensional in nature. To distinguish such
reflectors from general four-dimensional BRDFs we call the former isotropic
BRDFs and the latter as anisotropic BRDFs.
Light is a form of energy. So all reflection-related functions must satisfy
the law of energy conservation. That means, unless a material is an emitting
source, the total reflected flux must be less than or equal to the flux incident
on the surface. As a consequence, ρ ≤ 1 and
$$\int_{\omega_r \in \Omega} f_r(\omega_i, \omega_r)\,\cos\theta_r\, d\omega \le 1. \qquad (6.21)$$
The latter expression uses the relation between BRDF and reflectance. Note
that while reflectance must always be less than one, the above expressions do
not restrict the BRDF values to less than one. Surface BRDF values can be
more than one for certain incident and outgoing directions. In fact, as we just
noted, BRDF for a mirror reflector is a delta function, that is, it is infinity
along the mirror reflection direction and zero otherwise. In addition to the
energy conservation property, BRDF satisfies a reciprocity property according
to which the function value is identical if we interchange the incidence and
reflection directions. This property is known as Helmholtz Reciprocity.
Concluding, we point out that the BRDF function can be generalized to ac-
count for subsurface scattering effects. In this case the light incident from any
direction at a surface point xi gives rise to reflected radiance at another point
of the surface xo along its hemispherical directions. In this case the reflection
property is a function of incident point and direction, and exiting point and
direction, fr (xi , xo , ωi , ωo ), and gets the name of bidirectional scattering sur-
face reflectance distribution function (BSSRDF). The BSSRDF is a function
of eight variables instead of four assuming that the surface is parameterized
and thus that each surface point can be identified with two variables.
In a real-world scene light reaches every surface point of a reflector from all the
directions of the hemisphere around that point. So the total reflected radiance
along ωr is:
$$L(\omega_r) = \int_{\omega_i \in \Omega} dL(\omega_r) = \int_{\omega_i \in \Omega} f_r(\omega_i, \omega_r)\, L(\omega_i)\,\cos\theta_i\, d\omega. \qquad (6.23)$$
Soft shadows: Shadows are global effects since they depend on the position
of the objects with respect to each other. Real shadows are usually “soft”
due do the fact that real light sources have a certain area extent and are
not points.
Color bleeding: This particular effect of the indirect light corresponds to
the fact that the color of an object is influenced by the color of the
neighboring objects. In Figure 6.9 we can see that the sphere is green
because of the neighboring green wall.
Caustics: The caustics are regions of the scene where the reflected light is
concentrated. An example is the light concentrated around the base of
the glass sphere (Bottom-Right of Figure 6.9).
In Section 6.7 we will derive simplified versions of local lighting equations
for a few simple light sources: directional source, point or positional source,
spotlight source, area source, and environment source, whose evaluations will
be mostly straightforward. Then, we will describe some reflection models,
starting from the basic Phong illumination model, and going to more advanced
ones such as the Cook-Torrance model for metallic surfaces, the Oren-Nayar
model for retro-reflective materials and the Minneart model for velvet.
FIGURE 6.10: How to compute vertex normals from the triangle mesh.
example consider vertex v in Figure 6.10. If we move away a little bit from v
we see that we have four different tangent planes. So which is the normal (and
hence the tangent plane) at vertex v? The answer to this question is “there is
no correct normal at vertex v,” so what we do is to find a reasonable vector
to use as normal. Here “reasonable” means two things:
• that it is close to the normal we would have on the continuous surface
we are approximating with the triangle mesh;
• that it is as much as possible independent of the specific triangulation.
The most obvious way to assign the normal at vertex v is by taking the average
value of the normals of all the triangles sharing vertex v:
$$n_v = \frac{1}{|S^*(v)|} \sum_{i \in S^*(v)} n_{f_i} \qquad (6.26)$$
This intuitive solution is widely used but it is easy to see that it is highly
dependent on the specific triangulation. Figure 6.10.(b) shows the very same
surface as Figure 6.10.(a), but normal n2 contributes more than the others
to the average computation and therefore the result consequently changes.
An improvement over Equation (6.26) is to weight the contribution with the
triangle areas:
$$n_v = \frac{1}{\sum_{i \in S^*(v)} Area(f_i)} \sum_{i \in S^*(v)} Area(f_i)\, n_{f_i} \qquad (6.27)$$
This problem is avoided if we weight the normals with the angle formed by
the triangle at v:
$$n_v = \frac{1}{\sum_{i \in S^*(v)} \alpha(f_i, v)} \sum_{i \in S^*(v)} \alpha(f_i, v)\, n_{f_i} \qquad (6.28)$$
That said, please note that in practice we do not create
bad tessellations just for the fun of breaking the algorithms, so even For-
mula (6.26) generally produces good results.
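As an illustration, the following sketch computes area-weighted vertex normals in the spirit of Equation (6.27), for a mesh stored as a flat position array and a triangle index array (names and layout are illustrative, not the book's data structure); the cross product of two triangle edges has length equal to twice the triangle area, so accumulating unnormalized face normals implements the area weighting for free.

// Per-vertex normals obtained by accumulating the (area-weighted) face
// normals of the incident triangles and normalizing at the end.
function computeVertexNormals(positions, triangles) {
  var normals = new Float32Array(positions.length);
  for (var t = 0; t < triangles.length; t += 3) {
    var i0 = 3 * triangles[t], i1 = 3 * triangles[t + 1], i2 = 3 * triangles[t + 2];
    var e1 = [positions[i1] - positions[i0], positions[i1 + 1] - positions[i0 + 1], positions[i1 + 2] - positions[i0 + 2]];
    var e2 = [positions[i2] - positions[i0], positions[i2 + 1] - positions[i0 + 1], positions[i2 + 2] - positions[i0 + 2]];
    var n = [e1[1] * e2[2] - e1[2] * e2[1],       // cross product e1 x e2
             e1[2] * e2[0] - e1[0] * e2[2],
             e1[0] * e2[1] - e1[1] * e2[0]];
    [i0, i1, i2].forEach(function (i) {
      normals[i] += n[0]; normals[i + 1] += n[1]; normals[i + 2] += n[2];
    });
  }
  for (var i = 0; i < normals.length; i += 3) {   // normalize the accumulated sums
    var len = Math.sqrt(normals[i] * normals[i] + normals[i + 1] * normals[i + 1] + normals[i + 2] * normals[i + 2]);
    if (len > 0) { normals[i] /= len; normals[i + 1] /= len; normals[i + 2] /= len; }
  }
  return normals;
}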
Note that if the triangle mesh is created by connecting vertices placed on
a known continuous surface, that is, a surface of which we know the analytic
formulation, we do not need to approximate the normal from the faces, we
may simply use the normal of the continuous surface computed analytically.
For example, consider the cylinder in Figure 6.11. We know the parametric
function for the points on the sides:
$$Cyl(\alpha, r, y) = \begin{bmatrix} r\cos\alpha \\ y \\ r\sin\alpha \end{bmatrix}$$
and the normal to the surface is none other than
$$n(\alpha, r, y) = \frac{Cyl(\alpha, r, y) - [0.0,\; y,\; 0.0]^T}{\|Cyl(\alpha, r, y) - [0.0,\; y,\; 0.0]^T\|}$$
angles where the surface should be smooth. The second question is how to
encode this in the data structure shown in Section 3.9. This is typically done
in two alternative ways: the first way is to simply duplicate the vertices along
a crease, assign them the same position but different normals, and this is what
we will do. The second way is to encode the normal attribute on the face data,
so that each face stores the normal at its vertices. This involves a useless
duplication of all the normal values for all the vertices that are on smooth
points (usually the vast majority) but it does not change the connectivity of
the mesh.
$$n\, M^{-1} M\, u^T = 0 \qquad (6.29)$$
$$(n\, M^{-1})(M\, u^T) = 0 \qquad (6.30)$$
Since we multiply the matrices on the left we will write:
$$(n\, M^{-1})^T = (M^{-1})^T n^T$$
Hence the normal must be transformed by the inverse transpose of the
matrix applied to the positions.
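In practice this means computing a normal matrix from the upper 3×3 block of the matrix applied to positions. A sketch with hypothetical helpers mat3FromMat4, mat3Invert and mat3Transpose (the client shown later simply uses the upper 3×3 block, which is sufficient as long as the model-view transformation is a rigid motion):

// Normal matrix: the inverse transpose of the linear part of the matrix
// applied to positions. For pure rotations and translations it coincides
// with the upper 3x3 block itself.
function normalMatrix(modelViewMatrix) {
  var linearPart = mat3FromMat4(modelViewMatrix);  // drop the translation
  return mat3Transpose(mat3Invert(linearPart));
}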
The relations are also the same for a reflector if the surface is Lambertian,
and the reflected light is spatially uniform over the surface of the reflector.
We will now proceed to derive equations for carrying out the computation
of this exiting radiance from Lambertian reflectors because of the lighting
coming from various light sources. In the derivation we will use subscripts
out and in to distinguish light exiting and light incident on the reflector
surface. Before doing this, we summarize below the property of a Lambertian
reflector:
• The reflected radiance from the surface of the reflector is direction in-
dependent, i.e., Lout is constant along any direction.
• The relation between exitance ($E_{out}$), i.e., the areal density of reflected
flux, and the outgoing radiance $L_{out}$ due to reflection, is $L_{out} = \frac{E_{out}}{\pi}$.
• The relation between irradiance ($E_{in}$), i.e., the area density of incident
flux, and the outgoing radiance $L_{out}$ is $L_{out} = k_D \frac{E_{in}}{\pi}$, where $k_D$ is the
ratio of exiting flux to the incident flux at the surface, also known as
the diffuse surface reflectance.
FIGURE 6.14: (Left) Lighting due to directional light source. (Right) Light-
ing due to point or positional light source.
Ei = E0 (6.33)
Ei = E0 cos θi (6.34)
Substituting the value of Ei in the rendering equation and using the fact that
light is incident from only one direction we get:
25 if ( createNormalBuffer ) {
26 obj . normalBuffer = gl . createBuffer () ;
27 gl . bindBuffer ( gl . ARRAY_BUFFER , obj . normalBuffer ) ;
28 gl . bufferData ( gl . ARRAY_BUFFER , obj . vertex_normal , gl .←-
STATIC_DRAW ) ;
29 gl . bindBuffer ( gl . ARRAY_BUFFER , null ) ;
30 }
LISTING 6.1: Adding a buffer to store normals. (Code snippet from
http://envymycarbook.com/chapter6/0/0.js.)
Similarly, we modify the function drawObject by adding the lines in List-
ing 6.2:
So we can have objects with normal per vertex. Now all we need is a program
shader that uses the normal per vertex to compute lighting, which we will call
lambertianShader. The vertex shader is shown in Listing 6.3.
Note that, with respect to that shader we wrote in Section 5.3.4, we added the
varying vpos and vnormal to have these values interpolated per fragment. See
fragment shader code in Listing 6.4. Both these variables are assigned to coor-
dinates in view space. For the position this means that only uModelViewMatrix
is applied to aPosition, while the normal is transformed by the matrix aViewS-
paceNormalMatrix. This also means that the light direction must be expressed
in the same reference system (that is, in view space). We use the uniform
variable uLightDirection to pass the light direction expressed in view space,
which means that we take the variable this.sunLightDirection and transform it
by the normal matrix.
48 { \n\
49 // normalize interpolated normal \n\
50 vec3 N = normalize ( vnormal ) ; \n\
51 \n\
52 // light vector ( directional light ) \n\
53 vec3 L = normalize ( - uLightDirection . xyz ) ; \n\
54 \n\
55 // diffuse component \n\
56 float NdotL = max (0.0 , dot (N , L ) ) ; \n\
57 vec3 lambert = ( vdiffuse . xyz * uLightColor ) * NdotL ; \n\
58 \n\
59 gl_FragColor = vec4 ( lambert , 1.0) ; \n\
60 } ";
LISTING 6.4: Fragment shader. (Code snippet from http://
envymycarbook.com/chapter6/0/shaders.js.)
SpiderGL also provides a way to specify the shaders to use for rendering a
SglModel and the values that must be passed to such shaders with an object
called SglTechnique. Line 170 in Listing 6.6 shows the creation of a SglTech-
nique for our simple single light shader.
177 },
178 globals : {
179 " uProjectionMatrix " : {
180 semantic : " PROJECTION_MATRIX " ,
181 value : this . projectionMatrix
182 },
183 " uModelViewMatrix " : {
184 semantic : " WORLD_VIEW_MATRIX " ,
185 value : this . stack . matrix
186 },
187 " uViewSpaceNormalMatrix " : {
188 semantic : " VIEW_SPACE_NORMAL_MATRIX " ,
189 value : SglMat4 . to33 ( this . stack . matrix )
190 },
191 " uLightDirection " : {
192 semantic : " LIGHT0_LIGHT_DIRECTION " ,
193 value : this . sunLightDirectionViewSpace
194 },
195 " uLightColor " : {
196 semantic : " LIGHT0_LIGHT_COLOR " ,
197 value : [0.9 , 0.9 , 0.9]
198 }}}) ;};
LISTING 6.6: The SglTechnique. (Code snippet from http://
envymycarbook.com/chapter6/0/0.js.)
At line 144 we assign to the renderer the technique we have defined and
at line 145 we pass the uniform values that we want to update with respect
to their initial assignment in the definition of the technique. In this example
LIGHT0 LIGHT COLOR is not set since it does not change from frame to
frame, while all the other variables do (note the sun direction is updated
by the server to make it change with time). Finally, at line 154 we invoke
this.renderer.renderModel, which performs the rendering.
Note that there is nothing in these SpiderGL functionalities that we do not
already do for the other elements of the scene (the trees, the buildings). These
functions encapsulate all the steps so that it is simpler for us to write the code
for rendering a model. Note that this is complementary but not alternative
to directly using WebGL calls. In fact, we will keep the rest of the code as is
and use SglRenderer only for the models we load from external memory.
Figure 6.15 shows a snapshot from the client with a single directional light.
source is emitting light with uniform intensity I0 in every direction, then the
expression for Ei at the reflector point located at a distance r away from the
light source is:
$$E_i = \frac{I_0 \cos\theta_i}{r^2} \qquad (6.37)$$
This expression is derived as follows: let dA be the differential area around
the reflector point. The solid angle subtended by this differential area from
the location of the point light source is $dA\cos\theta_i / r^2$. Intensity is flux per solid
angle. So the total flux reaching the differential area is $I_0\, dA\cos\theta_i / r^2$. Irradiance
is the flux per unit area. So incident irradiance, $E_i$, on the differential area is
$E_i = I_0\cos\theta_i / r^2$. From the incident irradiance, we can compute exiting radiance
as
$$L(\omega_r) = f_r(\omega_i, \omega_r)\, E_i = f_r(\omega_i, \omega_r)\, \frac{I_0 \cos\theta_i}{r^2} \qquad (6.38)$$
If the reflector is Lambertian then
$$L(\omega_r) = \frac{\rho}{\pi}\, \frac{I_0 \cos\theta}{r^2} \qquad (6.39)$$
Thus the rendering equation for computing direct light from a Lambertian
reflector due to a point light source is
$$L(\omega_r) = \frac{\rho}{\pi}\, \frac{I_0 \cos\theta}{r^2}, \qquad (6.40)$$
that is, the reflected radiance from a perfectly diffuse reflector due to a point
light source is inversely proportional to the square of the distance of the light
source to the reflector and directly proportional to the cosine of the orientation
of the surface with respect to the light direction. This is an important result
and it is the reason why many rendering engines assume that the intensity of
the light decays with the square of the distance.
the function drawLamp, which, just like drawTree, assembles the basic primi-
tives created in Section 3.9 to make a shape that resembles a street lamp (in
this case a thick cylinder with a small cube on the top). The only important
change to our client is the introduction of a new shader, called lambertianMul-
tiLightShader (see Listing 6.9), which is the same as the lambertianShader with
two differences: it takes not one but an array of nLights and it considers both
directional and point lights.
FIGURE 6.16 (SEE COLOR INSERT): Adding point light for the lamps.
(See client http://envymycarbook.com/chapter6/1/1.html.)
textures (see Chapter 7). The number of lights may greatly impact the
performance of the fragment shader, which must loop over the whole array and
perform floating point computations. This is something that you can test on the
device you are using by simply increasing the number of lights and observing
the drop in frames per second.
Again, all the light computation is done in view space so all the lights
geometry will have to be transformed before being passed to the shader. Fig-
ure 6.16 shows a snapshot from the client showing the light of the street lamps.
6.7.5 Spotlights
Spotlights represent a cone of light originating from a point. So they are
basically point light sources with directionally varying intensity. These light
sources are specified by: the position of the origin of the light source, and the
direction of the center axis of the cone. The direction of the axis is also called
the spot direction, and the spot intensity is the maximum along that direction,
and may fall off away from that direction. So additional specifications are: the
intensity fall-off exponent (f ), and intensity cutoff angle (β). The cutoff angle
is the angle around the spot direction beyond which the spotlight intensity is
zero. The exponent determines the factor by which the intensity is reduced
for directions away from the spot direction, and is computed as follows:
$$I(\omega_i) = I_0\, (\cos\alpha)^f \qquad (6.41)$$
where α is the angle between the cone axis and ωi , the direction of incidence
(see Figure 6.17, on the left). Using derivations from the previous paragraph
FIGURE 6.17: (Left) Lighting due to spot light source. (Right) Lighting due
to area light source.
we can write the expression for reflected radiance due to a spotlight as:
$$L(\omega_r) = \begin{cases} f_r(\omega_i, \omega_r)\, \dfrac{I_0\,(\cos\alpha)^f \cos\theta_i}{r^2} & \text{if } \alpha \le \beta \\ 0 & \text{otherwise} \end{cases}$$
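A sketch of this intensity profile in code, with d the normalized spot direction, l the normalized direction from the light position towards the surface point, f the fall-off exponent and beta the cutoff angle in radians (all names illustrative):

// Spotlight intensity along direction l, following Equation (6.41),
// with a hard cutoff at the angle beta around the spot direction d.
function spotIntensity(I0, d, l, f, beta) {
  var cosAlpha = d[0] * l[0] + d[1] * l[1] + d[2] * l[2];
  if (cosAlpha < Math.cos(beta)) return 0.0;   // outside the cone
  return I0 * Math.pow(Math.max(cosAlpha, 0.0), f);
}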
This expression is derived as follows (refer to Figure 6.17). Let dAp be the
differential area around a point p on the light source. The solid angle subtended
by the differential reflector area dA at p is $dA\cos\theta / r_p^2$. The projection of $dA_p$
along the direction towards the receiver is $dA_p \cos\theta_p$. Radiance is the flux per
projected area per solid angle. So the total flux reaching dA at the reflector is
$$L_p(\omega_p)\, \frac{dA_p \cos\theta_p\; dA \cos\theta}{r_p^2} \qquad (6.45)$$
The total reflected radiance due to the whole area light source is the integra-
tion of dLr , where the domain of integration is the whole area of the light
source. So the reflected radiance is:
$$L_r(\omega_r) = \int_{p \in A} dL_r(\omega_r) = \int_{p \in A} f_r(\omega_p, \omega_r)\, L_p(\omega_p)\, \frac{\cos\theta \cos\theta_p}{r_p^2}\, dA_p \qquad (6.48)$$
If we further assume that the radiance is constant over the light source and
the reflecting surface is Lambertian, then the equation simplifies slightly to
$$L_r = \frac{\rho}{\pi} \int_{p \in A} L_p\, \frac{\cos\theta \cos\theta_p}{r_p^2}\, dA_p \qquad (6.49)$$
As we see here, the computation of lighting due to area light requires integra-
tion over area, which means a two-dimensional integration. Except for very
simple area lights such as a uniformly emitting hemisphere, closed-form integra-
tions are difficult, or even impossible to compute. Thus, one must resort to
numerical quadrature techniques in which an integration solution is estimated
as a finite summation. A quadrature technique that extends easily to mul-
tidimensional integration divides the domain into a number of sub-domains,
evaluates the integrand at the center (or a jittered location around the cen-
ter) of the sub-domain, and computes the weighted sum of these evaluated
quantities as the estimate of the integration solution. A simple subdivision
strategy is to convert the area into bi-parametric rectangles, and uniformly
divide each parameter and create equi-area sub-rectangles in the bi-parametric
space. Such conversion may require domain transformation.
An alternative way to deal with area lights is to approximate them with
a set of point lights. This approach is simple, although it can become computationally
expensive; we will use it in the next example.
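For example, a rectangular area light given by a corner and two edge vectors could be approximated as in the following sketch (all names are illustrative; the sampling density n trades quality for performance):

// Approximate a rectangular area light (corner c, edge vectors u and v)
// with an n x n grid of point lights placed at the centers of the
// sub-rectangles, splitting the total intensity evenly among them.
function sampleAreaLight(c, u, v, totalIntensity, n) {
  var pointLights = [];
  for (var i = 0; i < n; i++)
    for (var j = 0; j < n; j++) {
      var s = (i + 0.5) / n, t = (j + 0.5) / n;
      pointLights.push({
        pos: [c[0] + s * u[0] + t * v[0],
              c[1] + s * u[1] + t * v[1],
              c[2] + s * u[2] + t * v[2]],
        intensity: totalIntensity / (n * n)
      });
    }
  return pointLights;
}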
8
9 SpotLight = function () {
10 this . pos = [];
11 this . dir = [];
12 this . posViewSpace = [];
13 this . dirViewSpace = [];
14 this . cutOff = [];
15 this . fallOff = [];
LISTING 6.10: Light object including spotlight. (Code snippet from
http://envymycarbook.com/chapter6/2/2.js.)
We define pos and dir of the headlights in model space, that is, in the
frame where the car is defined, but remember that our shader will per-
form lighting computation in view space; therefore we must express the
headlights’ position and direction in view space before passing them to the
shader. The modifications to the shader are straightforward. We proceed in
the same way as for the point lights by adding arrays of uniform variables
and running over all the spotlights (2 in our specific case, see Listing 6.11).
as shown in Listing 6.12. Second, the point lights are not really point lights
because they illuminate only the −y half space.
Figure 6.18 shows a snapshot of the client with headlights on the car.
The constants kA , kD and kS define the color and the reflection properties of
the material. For example, a material with kA and kS set to zero means it
exhibits purely diffusive reflection; conversely, a perfect mirror is characterized
by kA = 0, kD = 0, kS = 1. The role of the ambient component will be clarified
in the next section, where each component will be described in depth.
We emphasize that the summation of each contribution can be greater
than 1, which means that the energy of the light may not be preserved. This
is another peculiarity of the Phong model, since it is not a physical model but
is based on empirical observations. An advantage of this independence from
physical behavior is that each and every component in the Phong model can
be freely tuned to give to the object the desired appearance. Later on, when
we will see other reflection models, we will analyze models that follow the rule
of conservation of the energy, such as the Cook-Torrance model.
The max function in the equation guarantees that the amount of reflected
light due to any light source is not less than zero. Equation (6.57) is valid for
one light source. In the presence of multiple light sources the reflection due to
each individual light is accumulated to get the total amount of reflected light.
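A per-channel sketch of this accumulation, using the ambient, diffuse and specular terms just described (ns denotes the specular, or shininess, exponent of the Phong model; light directions are assumed normalized and pointing towards the light; all names are illustrative):

function dot3(a, b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Phong lighting for one color channel: ambient term plus, for each light,
// the diffuse and specular contributions, clamped to zero.
function phong(kA, kD, kS, ns, ambient, n, v, lights) {
  var c = kA * ambient;
  for (var i = 0; i < lights.length; i++) {
    var l = lights[i].dir;
    var ndotl = dot3(n, l);
    if (ndotl <= 0.0) continue;                  // light behind the surface
    // mirror direction R = 2 N (N.L) - L, as in Equation (6.6)
    var r = [2 * n[0] * ndotl - l[0], 2 * n[1] * ndotl - l[1], 2 * n[2] * ndotl - l[2]];
    var rdotv = Math.max(0.0, dot3(r, v));
    c += lights[i].intensity * (kD * ndotl + kS * Math.pow(rdotv, ns));
  }
  return Math.min(c, 1.0);                       // the sum may exceed 1
}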
This is a fundamental aspect of light: the effect of lighting in a scene is
the sum of all the light contributions. By accumulating the contributions of
FIGURE 6.21: Flat and Gouraud shading. As it can be seen, the flat shading
emphasizes the perception of the faces that compose the model.
like to visualize clearly the faces of our object; in these cases flat shading is more
useful than Gouraud shading.
previous client is the way the fragment shader computes the contribution of
each light source (see Listing 6.13).
model of the reflection developed by Torrance and Sparrow. The Torrance and
Sparrow reflection model assumes that the surface of an object is composed
by thousands of micro-faces that act as small mirrors, oriented more or less
randomly. Taking a piece of surface, the distribution of the micro-facets in that
part determines, at macroscopic level, the behavior of the specular reflection.
Later, Cook and Torrance [5] extended this model to reproduce the complex
reflection behavior of metals.
The Cook-Torrance model is defined as:
$$L_r = L_p\, \frac{D\, G\, F}{(N \cdot L)(N \cdot V)} \qquad (6.60)$$
$$G_1 = \frac{2(N \cdot H)(N \cdot V)}{(V \cdot H)} \qquad (6.62)$$
$$G_2 = \frac{2(N \cdot H)(N \cdot L)}{(V \cdot H)} \qquad (6.63)$$
$$G = \min\{1, G_1, G_2\} \qquad (6.64)$$
where G1 accounts for the masking effects and G2 accounts for the shadowing
effects.
The term F is the Fresnel term and takes into account the Fresnel law
of reflection. The original work of Cook and Torrance is a valuable source of
information about the Fresnel equation for different types of materials. The
Fresnel effect depends not only on the material but also on the wavelength/
color of the incoming light. Even removing this dependency, its formulation is
quite complex:
$$F = \frac{1}{2}\, \frac{(g-c)^2}{(g+c)^2} \left[ 1 + \frac{\bigl(c(g+c) - 1\bigr)^2}{\bigl(c(g-c) + 1\bigr)^2} \right] \qquad (6.65)$$
where $c = V \cdot H$, $g = \sqrt{c^2 + \eta^2 - 1}$ and η is the refraction index of the
material. Due to the complexity of this formula, when no high degree of realism
is required, a good approximation can be achieved by the following simpler
formulation:
$$F = \rho + (1 - \rho)(1 - N \cdot L)^5 \qquad (6.66)$$
The top-left image in Figure 6.24 gives an idea of how an object rendered with the Cook-Torrance model appears. As we can see, the car effectively looks like it is made of metal.
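As a concrete reference, the specular part of the model could be sketched in GLSL as follows. The geometric term follows Equations (6.62)-(6.64); the distribution term D (Equation (6.61), not reproduced above) is written here in a Beckmann-style form as an assumption, and the Fresnel term uses the simplified form of Equation (6.66). Names such as uRoughness and uFresnel0 are illustrative, not the book's code.

  precision highp float;
  uniform float uRoughness;   // m, surface roughness (assumed name)
  uniform float uFresnel0;    // rho, reflectance at normal incidence (assumed name)
  // Specular term of Eq. (6.60); N, L, V are normalized.
  float cookTorranceSpec(vec3 N, vec3 L, vec3 V) {
    vec3  H = normalize(L + V);
    float NdotL = max(dot(N, L), 0.001);
    float NdotV = max(dot(N, V), 0.001);
    float NdotH = max(dot(N, H), 0.001);
    float VdotH = max(dot(V, H), 0.001);
    // G: geometric attenuation, Eqs. (6.62)-(6.64)
    float G = min(1.0, min(2.0 * NdotH * NdotV / VdotH, 2.0 * NdotH * NdotL / VdotH));
    // D: Beckmann-style micro-facet distribution (assumed form of Eq. (6.61))
    float c2 = NdotH * NdotH;
    float m2 = uRoughness * uRoughness;
    float D = exp((c2 - 1.0) / (c2 * m2)) / (3.14159265 * m2 * c2 * c2);
    // F: simplified Fresnel term in the spirit of Eq. (6.66)
    float F = uFresnel0 + (1.0 - uFresnel0) * pow(1.0 - NdotL, 5.0);
    return (D * G * F) / (NdotL * NdotV);
  }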
It may be noted that this is the first reflection model we describe where the
reflected light depends not only on the angle of incidence of the incoming light
but also on the azimuthal angle.
L_r = k_D\, L_p\, \underbrace{(N \cdot L)}_{\text{diffuse}}\; \underbrace{(N \cdot L)^K (N \cdot V)^{K-1}}_{\text{darkening factor}}    (6.72)
where K is an exponent used to tune the look of the material. For a visual
comparison with the Phong illumination model, take a look at Figure 6.24.
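A minimal GLSL sketch of Equation (6.72) follows; the parameter names are illustrative.

  // Minnaert reflection, Eq. (6.72): kD is the diffuse color, K the tuning exponent.
  vec3 minnaert(vec3 N, vec3 L, vec3 V, vec3 kD, float K, vec3 lightColor) {
    float NdotL = max(dot(N, L), 0.0);
    float NdotV = max(dot(N, V), 0.001);   // avoid pow(0, negative exponent) when K < 1
    return kD * lightColor * NdotL * pow(NdotL, K) * pow(NdotV, K - 1.0);
  }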
6.11 Self-Exercises
6.11.1 General
1. What kind of geometric transformations can also be applied to the normals as they are, without the need to invert and transpose them?
2. If you look at the inside of a metal spoon, you will see your image
inverted. However, if you look at the backside of the spoon it does not
happen. Why? What would happen if the spoon surface was diffusive?
3. Find for which values of V and L the Blinn-Phong model gives exactly
the same result as the Phong model.
4. Consider a cube and suppose the ambient component is 0. How many
point lights do we need to guarantee that all the surface is lit?
5. Often the ambient coefficient is set to the same value for all elements of
the scene. Discuss a way to calculate the ambient coefficient that takes
into account the scene. Hint: for example, should the ambient coefficient
inside a deep tunnel be the same as the ground in an open space?
2. Modify the “Phong model” client in order to add several light sources to it. What happens due to its non-energy-preserving nature? How do you solve this problem?
3. Modify the “Cook-Torrance” client and try to implement a per-vertex
and a per-pixel Minnaert illumination model.
Chapter 7
Texturing
Texture mapping is by far the most used technique to make 3D scenes look
real. In very simple terms, it consists of applying a 2D image to the 3D ge-
ometry as you would do with a sticker on a car. In this chapter we will show
how this operation is done in the virtual world where the sticker is a raster
image and the car is a polygon mesh.
see shortly, to achieve some special behaviors. This mapping is usually re-
ferred to as texture wrapping and it is commonly defined in two alternative
modes: clamp and repeat.
Figure 7.3 shows where texturing related operations take place in the ren-
dering pipeline.
FIGURE 7.6: The simplest mipmapping example: a pixel covers exactly four
texels, so we precompute a single texel texture and assign the average color
to it.
If, for example, ρ = 16, it means that neither the x nor the y side of
the pixel spans more than 16 texels, and therefore the level to use is L =
log2 16 = 4. If ρ = 1 there is one-to-one correspondence between pixels and
texels and the level to use is L = log2 1 = 0. Note that ρ < 1 means that we
have magnification. For example ρ = 0.5 means a texel spans over two pixels
and the mipmap level would be L = log2 0.5 = −1, which means the original
texture with twice the resolution on both sides.
In the general case ρ is not a power of two, therefore log2 ρ will not be an
integer. We can choose to use the nearest level or to interpolate between the
two nearest levels (\lfloor \log_2 \rho \rfloor, \lceil \log_2 \rho \rceil). See Figure 7.9 for an example of mipmap
at work.
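As a small numerical sketch, the level selection just described could be written in JavaScript as follows (the function name and the returned fields are illustrative; clamping to the number of available levels is omitted):

  function mipmapLevels(rho) {
    var L = Math.log(rho) / Math.LN2;     // log2(rho)
    return {
      nearest: Math.round(L),             // nearest level
      lower: Math.floor(L),               // the two levels to interpolate between
      upper: Math.ceil(L),
      blend: L - Math.floor(L)            // interpolation weight between them
    };
  }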
p' = \alpha' a' + \beta' b' =
\begin{pmatrix} \alpha' \frac{a}{a_z/d} + \beta' \frac{b}{b_z/d} \\ 1 \end{pmatrix} =
\begin{pmatrix} \frac{\alpha'}{w_a}\, a + \frac{\beta'}{w_b}\, b \\ 1 \end{pmatrix}

where we applied a_z/d \to w_a and b_z/d \to w_b. We recall that \begin{pmatrix} a \\ 1 \end{pmatrix} = \begin{pmatrix} a\,w \\ w \end{pmatrix}, \forall w \neq 0. Multiplying point p' by any non-zero value, we will always have points along the line passing through the point of view and p', meaning all the points that project to the same point p'. We choose to multiply p' by \frac{1}{\frac{\alpha'}{w_a} + \frac{\beta'}{w_b}}:

\frac{1}{\frac{\alpha'}{w_a} + \frac{\beta'}{w_b}}\, p' =
\begin{pmatrix}
\frac{\frac{\alpha'}{w_a}}{\frac{\alpha'}{w_a} + \frac{\beta'}{w_b}}\, a + \frac{\frac{\beta'}{w_b}}{\frac{\alpha'}{w_a} + \frac{\beta'}{w_b}}\, b \\
\frac{1}{\frac{\alpha'}{w_a} + \frac{\beta'}{w_b}}
\end{pmatrix}
Note that the terms multiplying a and b sum to 1, meaning that the point is
on the segment ab. Since it is also, by construction, on the line L it means
it is the point p, therefore the two terms are the barycentric coordinates of point p, that is, exactly what we need to interpolate the texture coordinates we were looking for:
t_{p'} = \frac{\alpha' \frac{t_a}{w_a} + \beta' \frac{t_b}{w_b}}{\alpha' \frac{1}{w_a} + \beta' \frac{1}{w_b}}
Note that another way to write this expression is:

t_{p'} = \alpha' \begin{pmatrix} t_a \\ 1 \end{pmatrix} + \beta' \begin{pmatrix} t_b \\ 1 \end{pmatrix} = \alpha' \begin{pmatrix} t_a/w_a \\ 1/w_a \end{pmatrix} + \beta' \begin{pmatrix} t_b/w_b \\ 1/w_b \end{pmatrix}
which is called hyperbolic interpolation. The generalization to the interpolation of n values is straightforward, so we have the rule to compute perspectively
FIGURE 7.12: (Left) A tileable image on the left and an arrangement with nine copies. (Right) A non-tileable image. Borders have been highlighted to show the borders' correspondence (or lack of it).
correct attribute interpolation for the triangle. Consider the triangle (a, b, c) with texture coordinates t_a, t_b, t_c and a fragment with position p' = \alpha' a + \beta' b + \gamma' c. The texture coordinates for p' are obtained as:

t_{p'} = \frac{\alpha' \frac{t_a}{w_a} + \beta' \frac{t_b}{w_b} + \gamma' \frac{t_c}{w_c}}{\alpha' \frac{1}{w_a} + \beta' \frac{1}{w_b} + \gamma' \frac{1}{w_c}}
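A minimal JavaScript sketch of this computation for a single scalar attribute (the function name and the argument order are illustrative):

  // Perspective-correct (hyperbolic) interpolation over a triangle.
  // ta, tb, tc: attribute values at the vertices; wa, wb, wc: their w values;
  // alpha, beta, gamma: screen-space barycentric coordinates of the fragment.
  function hyperbolicInterpolation(ta, tb, tc, wa, wb, wc, alpha, beta, gamma) {
    var num = alpha * ta / wa + beta * tb / wb + gamma * tc / wc;
    var den = alpha / wa + beta / wb + gamma / wc;
    return num / den;
  }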
be read. In this case gl.RGBA says that the image has four channels and gl.UNSIGNED_BYTE that each channel is encoded with an unsigned byte (0 . . . 255).
Line 21 indicates how the texture must be sampled when the texture is magnified (linear interpolation) and line 22 when the texture is minified. In the latter case we passed the parameter gl.LINEAR_MIPMAP_LINEAR, which says that the texture value must be interpolated linearly within the same level and also linearly between mipmapping levels. We invite the reader to change this value to gl.LINEAR_MIPMAP_NEAREST and look at the differences. At lines 23-24 we specify the wrapping mode for both coordinates to be gl.REPEAT and at line 26 we make WebGL generate the mipmap pyramid. The call gl.bindTexture(gl.TEXTURE_2D, null) unbinds the texture from the texture target. This is not strictly necessary, but consider that if we do not do this, any subsequent call modifying a texture parameter on gl.TEXTURE_2D will affect our texture. We strongly advise ending each texture parameter setting by unbinding the texture.
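To summarize the steps just described, a minimal texture-creation helper might look like the following sketch (the createTexture function actually used by the book's clients may differ in details such as flipping the image vertically):

  function createTextureFromImage(gl, image) {
    var texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.REPEAT);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.REPEAT);
    gl.generateMipmap(gl.TEXTURE_2D);
    gl.bindTexture(gl.TEXTURE_2D, null);   // unbind, as advised in the text
    return texture;
  }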
We will create 4 textures: one for the ground, one for the street, one for
the building facade and one for the roofs, by adding the lines in Listing 7.2 in
function onInitialize.
Figure 7.13 shows a client with textures applied to the elements of the scene.
V' = M\, Z_m\, M^{-1}\, V
that is, express frame V in frame M coordinates, then mirror the z component,
then again express it in world coordinates.
The next step is to perform the rendering of the scene with V' as the viewing frame and use the result as a texture to be mapped on the polygon representing the rear mirror. Note that we do not want to map the whole image on the polygon, but only the part of the frustum that passes through the mirror.
This is done by assigning the texture coordinates to vertices pi using their
projection on the viewing window:
t_i = T\, P\, V'\, p_i
FIGURE 7.14: Scheme of how the rear mirror is obtained by mirroring the
view frame with respect to the plane where the mirror lies.
where P is the projection matrix and T is the matrix that maps coordinates
from NDC [−1, 1]3 to texture space [0, 1]3 , which you can find explained in
Section 7.7.5. Note that we actually need only the first two components of the
result.
6 NVMCClient.rearMirrorTextureTarget = null;
7 TextureTarget = function () {
8   this.framebuffer = null;
9   this.texture = null;
10 };
11
12 NVMCClient.prepareRenderToTextureFrameBuffer = function (gl, generateMipmap, w, h) {
13   var textureTarget = new TextureTarget();
14   textureTarget.framebuffer = gl.createFramebuffer();
15   gl.bindFramebuffer(gl.FRAMEBUFFER, textureTarget.framebuffer);
16
17   if (w) textureTarget.framebuffer.width = w;
18   else textureTarget.framebuffer.width = 512;
19   if (h) textureTarget.framebuffer.height = h;
20   else textureTarget.framebuffer.height = 512;
21
22   textureTarget.texture = gl.createTexture();
23   gl.bindTexture(gl.TEXTURE_2D, textureTarget.texture);
24   gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
25   gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
26   gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.REPEAT);
27   gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.REPEAT);
28
29   gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, textureTarget.framebuffer.width, textureTarget.framebuffer.height, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
30   if (generateMipmap) gl.generateMipmap(gl.TEXTURE_2D);
31
32   var renderbuffer = gl.createRenderbuffer();
33   gl.bindRenderbuffer(gl.RENDERBUFFER, renderbuffer);
34   gl.renderbufferStorage(gl.RENDERBUFFER, gl.DEPTH_COMPONENT16, textureTarget.framebuffer.width, textureTarget.framebuffer.height);
35
36   gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, textureTarget.texture, 0);
37   gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT, gl.RENDERBUFFER, renderbuffer);
38
39   gl.bindTexture(gl.TEXTURE_2D, null);
FIGURE 7.15 (SEE COLOR INSERT): Using render to texture for im-
plementing the rear mirror. (See client http://envymycarbook.com/chapter7/
1/1.html.)
Figure 7.15 shows a view from the driver's perspective with the rearview appearing in the rear mirror.
FIGURE 7.16: (a) An example of a sphere map. (b) The sphere map is cre-
ated by taking an orthogonal picture of a reflecting sphere. (c) How reflection
rays are mapped to texture space.
t = \frac{R_y}{2\sqrt{R_x^2 + R_y^2 + (R_z + 1)^2}} + \frac{1}{2}
7.7.1.2 Limitations
Sphere mapping is view dependent, meaning that, besides the aforementioned approximation, a sphere map is only valid from the point of view from which it was created. Furthermore, the information contained in the texture is not a uniform sampling of the environment: it is denser in the center and sparser at the boundary. In the practical realization, the tessellation of the object's surface may easily lead to artifacts in the boundary region of the sphere map, because neighboring points on the surface may correspond to faraway texture coordinates, as shown in Figure 7.17.
FIGURE 7.18: (a) Six images are taken from the center of the cube. (b)
The cube map: the cube is unfolded as six square images on the plane. (c)
Mapping from a direction to texture coordinates.
Let us suppose that we place an ideal cube in the middle of the scene and we shoot six photos from the center of the cube along the three axes, in both directions (as shown in Figure 7.18), so that the viewing window of each photo matches exactly the corresponding face of the cube. The result is that our cube contains the whole environment. Figure 7.18(b) shows a development of the cube so that each square is a face of the cube.
Limitations
Cube mapping has several advantages over sphere mapping: a cube map is valid from every view direction, we do not have the artifact due to the singularity along the (0, 0, −1) direction, and the mapping from the cube to the cube map does not introduce distortion. However, like sphere mapping, the method works correctly under the same assumption, that is, that the environment is far, far away.
1. The portion of the cube rendered on the screen depends only on the
view direction. This can be done by rendering the unitary cube (or a
cube of any size) centered in the viewer position.
2. Everything in the scene is closer than the box and therefore every frag-
ment of the rasterization of a polygon of the scene would pass the depth
test. This can be done by rendering the cube first and telling WebGL
not to write on the depth buffer while rendering the cube, so everything
rendered after will pass the depth test.
60 gl . depthMask ( true ) ;
61 gl . bindTexture ( gl . TEXTURE_CUBE_MAP , null ) ;
62 }
LISTING 7.6: Rendering a skybox. (Code snippet from
http://envymycarbook.com/chapter7/3/3.js.)
The shader, shown in Listing 7.7, is very simple, being just a sampling of the cubemap based on the view direction. The only difference with respect to the two-dimensional texture is that we have a dedicated sampler type, samplerCube, and a dedicated function, textureCube, which takes a three-dimensional vector and accesses the cubemap by using Equation (7.2).
2 var vertexShaderSource = "\
3   uniform mat4 uModelViewMatrix;    \n\
4   uniform mat4 uProjectionMatrix;   \n\
5   attribute vec3 aPosition;         \n\
6   varying vec3 vpos;                \n\
7   void main(void)                   \n\
8   {                                 \n\
9     vpos = normalize(aPosition);    \n\
10    gl_Position = uProjectionMatrix * uModelViewMatrix * vec4(aPosition, 1.0);\n\
11  }";
12 var fragmentShaderSource = "\
13   precision highp float;           \n\
14   uniform samplerCube uCubeMap;    \n\
15   varying vec3 vpos;               \n\
16   void main(void)                  \n\
17   {                                \n\
18     gl_FragColor = textureCube(uCubeMap, normalize(vpos));\n\
19   }";
LISTING 7.7: Shader for rendering a skybox. (Code snippet from
http://envymycarbook.com/chapter7/3/shaders.js.)
5 shaderProgram.vertexShaderSource = "\
6   uniform mat4 uModelViewMatrix;               \n\
7   uniform mat4 uProjectionMatrix;              \n\
8   uniform mat3 uViewSpaceNormalMatrix;         \n\
9   attribute vec3 aPosition;                    \n\
10  attribute vec4 aDiffuse;                     \n\
11  attribute vec4 aSpecular;                    \n\
12  attribute vec3 aNormal;                      \n\
13  varying vec3 vPos;                           \n\
14  varying vec3 vNormal;                        \n\
15  varying vec4 vdiffuse;                       \n\
16  varying vec4 vspecular;                      \n\
17  void main(void)                              \n\
18  {                                            \n\
19    vdiffuse = aDiffuse;                       \n\
20    vspecular = aSpecular;                     \n\
21    vPos = vec3(uModelViewMatrix * vec4(aPosition, 1.0));    \n\
22    vNormal = normalize(uViewSpaceNormalMatrix * aNormal);   \n\
23    gl_Position = uProjectionMatrix * uModelViewMatrix * vec4(aPosition, 1.0);\n\
24  }";
25 shaderProgram.fragmentShaderSource = "\
26   precision highp float;                      \n\
27   uniform mat4 uViewToWorldMatrix;            \n\
28   uniform samplerCube uCubeMap;               \n\
29   varying vec3 vPos;                          \n\
30   varying vec4 vdiffuse;                      \n\
31   varying vec3 vNormal;                       \n\
32   varying vec4 vspecular;                     \n\
33   void main(void)                             \n\
34   {                                           \n\
35     vec3 reflected_ray = vec3(uViewToWorldMatrix * vec4(reflect(vPos, vNormal), 0.0));\n\
36     gl_FragColor = textureCube(uCubeMap, reflected_ray) * vspecular + vdiffuse;\n\
37   }";
LISTING 7.8: Shader for reflection mapping. (Code snippet from
http://envymycarbook.com/chapter7/4/shaders.js.)
it in Section 7.6. The only difference is that in this case the texture will be a
face of the cubemap.
The only sensible change to the code is the introduction of the function drawOnReflectionMap. The first lines of the function are shown in Listing 7.9. First of all, at line 62, we set the projection matrix with a 90° angle and aspect ratio 1 and at line 2 we set the viewport to the size of the framebuffer we created. Then we perform a rendering of the whole scene except the car for each of the six axis-aligned directions. This requires setting the view frame (line 65), binding the right framebuffer (line 66) and clearing the used buffers.
Figure 7.19 shows a snapshot from the client with skybox, normal mapping
(discussed later) applied to the street, and reflection mapping applied on the
car.
where P_{proj} and V_{proj} are the projection matrix and the view matrix of the projector, respectively, and T is the transformation that maps the space from the canonical viewing volume [−1, +1]^3 to [0, +1]^3. Note that the final values actually used to access the texture will be the normalized ones, i.e., (s/q, t/q), while coordinate r is unused in this case.
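As a concrete reference, T can be built as a uniform scale by 0.5 followed by a translation by 0.5; a minimal JavaScript sketch of the corresponding column-major matrix (an illustration, not code from the book's clients):

  // Maps the canonical volume [-1,1]^3 to [0,1]^3: p01 = 0.5 * pNDC + 0.5.
  var T = [0.5, 0.0, 0.0, 0.0,
           0.0, 0.5, 0.0, 0.0,
           0.0, 0.0, 0.5, 0.0,
           0.5, 0.5, 0.5, 1.0];   // column-major, as expected by WebGL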
a tree may be seen as a cylinder (large scale) plus its cortex (the finer detail),
the roof of a house may be seen as a rectangle plus the tiles, a street may be
seen as a plane plus the rugosity of the asphalt. In CG, the advantage of this
way of viewing things is that usually the geometric detail can be efficiently
encoded as a texture image and used at the right time (see Figure 7.20). In
the following we will see a few techniques that use texture mapping to add
geometric detail without actually changing the geometry.
FIGURE 7.21: With normal mapping, the texture encodes the normal.
parallax error and depends on two factors: how far the real surface is from the base geometry, and how small the angle formed by the view ray and the base geometry is.

The view ray r2 will entirely miss the object, while it would have hit the real geometry. This is the very same parallax error with more evident consequences, and it means that we will be able to spot the real shape of the geometry on the object's silhouette. In a few words, implementing normal mapping simply means redoing the Phong shading we did in Section 6.9.2, perturbing the interpolated normal with a value encoded in a texture.
15 var fragmentShaderSource = "\
16   precision highp float;                       \n\
17   uniform sampler2D texture;                   \n\
18   uniform sampler2D normalMap;                 \n\
19   uniform vec4 uLightDirection;                \n\
20   uniform vec4 uColor;                         \n\
21   varying vec2 vTextureCoords;                 \n\
22   void main(void)                              \n\
23   {                                            \n\
24     vec4 n = texture2D(normalMap, vTextureCoords);       \n\
25     n.x = n.x * 2.0 - 1.0;                     \n\
26     n.y = n.y * 2.0 - 1.0;                     \n\
27     n.z = n.z * 2.0 - 1.0;                     \n\
28     vec3 N = normalize(vec3(n.x, n.z, n.y));   \n\
29     float shade = dot(-uLightDirection.xyz, N);          \n\
30     vec4 color = texture2D(texture, vTextureCoords);     \n\
31     gl_FragColor = vec4(color.xyz * shade, 1.0);         \n\
32   }";
LISTING 7.10: Fragment shader for object space normal mapping. (Code
snippet from http://envymycarbook.com/chapter7/2/shaders.js.)
In other words, we want to do with normal mapping what we do with color mapping: create the texture without taking care of where it is applied. The main difference between applying the color from the texture to the surface and applying the normal is that the color encoded in a texel does not have to be transformed in any way to be mapped on the surface: a red texel is red whatever geometric transformation we apply to the object. Not so for the normals, which need a 3D reference frame to be defined. In object space normal mapping, such a reference frame is the same reference frame as the object's, and that is why we can directly write the absolute values for the normal.
Let us build a three-dimensional frame Ct centered at a point t in the
texture and with axis u = [1, 0, 0], v = [0, 1, 0] and n = [0, 0, 1]. What we
want is a way to map any value expressed in the frame C_t onto the surface at the point p = M(t) onto which t is projected. In order to do this, we build a frame T_f with origin M(t) and axes
u_{os} = \frac{\partial M}{\partial u}(p) \qquad v_{os} = \frac{\partial M}{\partial v}(p) \qquad n_{os} = u_{os} \times v_{os}
known because we know both vertex position and texture coordinates of the
vertices of the triangle. The other important thing we know is that uos and
u have the same coordinates in the respective frames, therefore we can find u_{os} by finding the coordinates u_I of u = [1, 0, 0]^T in the frame I_f and then u_{os} = u_{I_u} v_1' + u_{I_v} v_2'.
The end point of vector u is t0 + u. We recall from Chapter 4 that we can
express the coordinates of t0 + u in the frame If as:
7.9.1 Seams
The answer to the first question is yes. Consider just taking each individual
triangle of S and placing it onto the plane Ω in such a way that no triangles
overlap, as shown in Figure 7.26. In this simple manner we will have our
injective function g that maps each point of S onto Ω. Note that the adjacent
triangles on the mesh will not, in general, be mapped onto adjacent triangles
on Ω, that is, g is not continuous.
Do we care that g is continuous? Yes, we do. Texture sampling is per-
formed in the assumption that close points on S map to close points on Ω.
Just consider texture access with bilinear interpolation, where the result of a
texture fetch is given by the four texels around the sample point. When the
sample is near the border of a triangle (see Bottom of Figure 7.26), the color
is influenced by the adjacent triangle on Ω. If g is not continuous, the adjacent
texels can correspond to any point on S or just to undefined values, that is, to values that do not belong to g(S).

In parametrization, the discontinuities of g are called seams. The method we used to build g is just the worst case for seams, because it generates one seam for each edge of each triangle.
7.11 Self-Exercises
7.11.1 General
1. Suppose we do not use perspective correct interpolation and we see the
same scene with two identical viewer settings except that they have dif-
ferent near plane distances. For which setting is the error less noticeable?
How much would the error be with an orthogonal projection?
7.11.2 Client
1. Make a running text on a floating polygon over the car. Hint: Make the
u texture coordinate increase with time and use “repeat” as wrapping
mode.
2. Implement a client where the headlights project a texture. Hint: See
Section 7.7.5
Chapter 8
Shadows
2. [Lighting Pass] Draw the scene with the parameters of the viewer’s
camera and, for each generated fragment, project the corresponding
point on the light camera. Then, access the stored depth buffer to test
whether the fragment is in shadow from the light source and compute
lighting accordingly.
Note that in the practical implementation of this algorithm we cannot use
the depth buffer because WebGL does not allow us to bind it for sampling.
To overcome this limitation we will use a texture as explained in Section 8.3.
Also note that in the shadow pass we do not need to write fragment colors to
the color buffer.
FIGURE 8.3: (Left) Light camera for directional light. (Right) Light camera
for point light.
six cameras centered on c and oriented along the main axes (both in positive
and negative directions). Like we did for the directional light, we could set the
far plane to l, but we can do better and compute, for each of the six directions,
the distance from the boundary of the bounding box and set the far plane to
that distance, as shown in Figure 8.3 (Right). In this manner we will have a
tighter enclosure of the bounding box in the viewing volumes and hence more
precise depth values (see Section 5.2.4).
8.2.1.3 Spotlights
A spotlight source is defined as L = (c, d, β, f ), where c is its position in
space, d the spot direction, β the aperture angle and f the intensity fall-off
exponent. Here we are only interested in the geometric part so f is ignored.
The light rays of a spotlight originate from c and propagate towards all the
directions described by the cone with apex in c, symmetry axis d and angle β.
We set the z axis of the light camera’s frame as −d, compute the other two as
shown in Section 4.5.1.1 and set the origin to c (see Figure 8.4). The projection
is in perspective but finding the parameters is slightly more involved.
Let us start with the distance of far plane f ar. We can set it to the
maximum among projections of the vertices of the bounding box on direction
d as shown in Figure 8.4 (Left). Now that we have f ar we can compute the
base of the smallest pyramidal frustum containing the cone as:
b = 2\, far\, \tan\beta

and scale it down to obtain the size of the viewing plane (that is, at the near distance)

b' = b\, \frac{near}{far}
and therefore left = bottom = −b'/2 and top = right = b'/2. So we computed the smallest trunk of the pyramid containing the cone, not the cone itself. The
last part is done in image space: during the shadow pass, we discard all the fragments that are farther away from the image center than b'/2.
that the depth buffer is still there; we are just making a copy of it into a
texture that will be our shadow map. Let us see the detail of how this is done.
Obviously, with this simple encoding we are only approximating the value
of d but this is more than enough for practical applications. Now what we need
is an efficient algorithm to find the coefficients ai . What we want is simply to
take the first decimal digit for a0 , the second for a1 , the third for a2 and the
fourth for a3 . Unfortunately in GLSL we do not have a built-in function for
singling out the ith decimal digit of a number, but we have a function fract(x), which returns the fractional part of x, that is, fract(x) = x − floor(x), so we can
write:
i^{\mathrm{th}}\ \mathrm{digit}(d) = \Big( \mathrm{frac}(d \cdot 10^{i-1}) - \frac{\mathrm{frac}(d \cdot 10^{i})}{10} \Big) \cdot 10    (8.2)
For example, the second digit of 0.9876 is:

2^{\mathrm{nd}}\ \mathrm{digit}(0.9876) = \Big( \underbrace{\mathrm{frac}(0.9876 \cdot 10)}_{\mathrm{shift}} - \underbrace{\mathrm{frac}(0.9876 \cdot 10^2)/10}_{\mathrm{mask}} \Big) \cdot 10 = (0.876 - 0.076) \cdot 10 = 8
This is a very simple mechanism to mask out all the digits except the one we want. We first place the dot on the left of the digit we want: 0.9876 · 10 = 9.876; then we use frac to remove the integer part and remain with 0.876. Then we mask out the other digits by subtracting 0.076. The same result can be obtained
in many other ways, but in this way we can exploit the parallel execution of
component-wise multiplication and subtraction of type vec4.
Now we can comment on the implementation of the function pack_depth in Listing 8.2. First of all, we have eight-bit channels, so the value for B is 2^8 = 256. The vector bit_shift contains the coefficients that multiply d in the expression shift of Equation (8.2), while bit_mask contains the ones in the expression mask. Note
that the values in res are in the interval [0, 1], that is, the last multiplication in
Equation (8.2) is not done. The reason is that the conversion between [0, 1]
and [0, 255] is done at the moment of writing the values in the texture. Getting
a float value previously encoded in the texture is simply a matter of imple-
menting Equation (8.1) and it is done by the function Unpack in Listing 8.4.
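Since Listings 8.2 and 8.4 are not reproduced here, the following GLSL sketch shows what a pack/unpack pair following the scheme above typically looks like; the function names are ours and the book's actual code may differ in details.

  // Pack a depth in [0,1) into an 8-bit-per-channel RGBA value.
  vec4 packDepth(float d) {
    const vec4 bit_shift = vec4(256.0 * 256.0 * 256.0, 256.0 * 256.0, 256.0, 1.0);
    const vec4 bit_mask  = vec4(0.0, 1.0 / 256.0, 1.0 / 256.0, 1.0 / 256.0);
    vec4 res = fract(d * bit_shift);   // "shift": bring the wanted digits after the dot
    res -= res.xxyz * bit_mask;        // "mask": remove digits already stored in higher channels
    return res;
  }
  // Recombine the four channels into a single float (Equation (8.1)).
  float unpackDepth(vec4 rgba) {
    const vec4 bit_scale = vec4(1.0 / (256.0 * 256.0 * 256.0), 1.0 / (256.0 * 256.0), 1.0 / 256.0, 1.0);
    return dot(rgba, bit_scale);
  }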
Once we have filled the shadow map with scene depth as seen from the
light source in the shadow pass, we are ready to render the scene from the
actual observer point of view, and apply lighting and shadowing. This lighting
pass is slightly more complicated with respect to a standard lighting one, but
it is not difficult to implement, as shown by vertex and fragment shaders code
in Listings 8.3 and 8.4, respectively. The vertex shader transforms vertices in
the observer clip space as usual, using the combined model, view, and pro-
jection matrices (uModelViewProjectionMatrix). Simultaneously, it transforms
the same input position as if it were transforming the vertex in light space,
as it has happened in the shadow pass, and makes it available to the frag-
ment shader with the varying vShadowPosition. The fragment shader is now
in charge of completing the transformation pipeline to retrieve the coordinates
needed to access the shadow map (uShadowMap) and compare occluder depth
(Sz) with the occludee depth (Fz) in the shadow test.
(a) No Depth Bias. (b) Correct Depth Bias. (c) Too Much Depth Bias.
because back surfaces are sufficiently distant from front ones, which now
do not self-shadow themselves;
• on the other hand, now back surfaces are self-shadowing, but this time
precision issues will cause light leakage, making them incorrectly classi-
fied as lit.
The net effect of this culling reversal in the shadow pass is to eliminate false
negatives (lit fragments classified as in shadow) but to introduce false positives
(shadowed fragments being lit). However, removing the misclassification in
case of false positive back surfaces is easily accomplished: in fact, observing
that in a closed object a surface that points away from the light source and
thus is back-facing from the light point of view is enough to correctly classify
that surface as not being lit. To detect this condition, in the light pass we must
be able to check if the fragment being shaded represents a part of the surface
that is back facing from the point of view of the light: the fragment shader
must check if the interpolated surface normal N points in the same hemisphere
of the light vector (point and spot lights) or light direction (directional lights)
L, that is, if N · L > 0.
the neighborhood of T , and then averaging the boolean results and obtaining
a value in the interval [0, 1] to use for lighting the fragment. We may com-
pare this technique with the area averaging technique for segment antialiasing
discussed in Section 5.3.3 to find out that it follows essentially the same idea.
It is very important to underline that we do not perform a single shadow
test with the average depth value of the samples (that would mean to test
against a non-existent surface element given by the average depth); instead,
we execute the test for every sample and then average the results. This process,
known as Percentage Closer Filtering (PCF), helps increase the quality of
the shadow rendering at the cost of multiple accesses to the depth map.
Softening the shadow edges can also be used to mimic the visual effect of a
penumbra (only when the penumbra region is small). At any rate, this should
not be confused with a method to calculate the penumbra effect.
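A minimal GLSL sketch of a 3 × 3 PCF loop follows (uShadowMap, uTexelSize, uBias and the depth-unpacking function Unpack are assumed to be available; the names are illustrative):

  uniform sampler2D uShadowMap;
  uniform vec2 uTexelSize;    // 1.0 / shadow map resolution
  uniform float uBias;        // small depth bias
  // shadowPos: fragment position in [0,1]^3 light space.
  float shadowPCF(vec3 shadowPos) {
    float lit = 0.0;
    for (int x = -1; x <= 1; ++x)
      for (int y = -1; y <= 1; ++y) {
        float Sz = Unpack(texture2D(uShadowMap, shadowPos.xy + vec2(x, y) * uTexelSize));
        lit += (shadowPos.z - uBias <= Sz) ? 1.0 : 0.0;   // one shadow test per sample
      }
    return lit / 9.0;   // average of the boolean results
  }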
Figure 8.7 shows a client using PCF to reduce aliasing artifacts.
FIGURE 8.7 (SEE COLOR INSERT): PCF shadow mapping. (See client
http://envymycarbook.com/chapter8/0/0.html.)
FIGURE 8.9: If the viewer is positioned inside the shadow volume the dis-
parity test fails.
the ray after the first hit with a surface: if it exits more times than it enters,
the point hit is in shadow, otherwise it is not.
to the light can be capped by using the front faces of the object, the farthest
one by projecting the same faces onto the same plane on which we projected
the edges. Note that the projected faces will have to be inverted (that is, two
of their indices need to be swapped) so that they are oriented towards the
exterior of the bounding volume.
1. Render the scene and compute the shading as if all the fragments were
in shadow.
2. Disable writes to the depth and color buffers. Note that from now on
the depth buffer will not be changed and it contains the depth of the
scene from the viewer’s camera.
3. Enable the stencil test.
object casting the shadow change so it may easily become too cumbersome. In
fact, the construction of the shadow volume is typically done on the CPU side
and it requires great care to avoid inaccuracies due to numerical precision.
For example, if the normals of two adjacent triangles are very similar the edge
may be misclassified as a silhouette edge and so create a “hole” in the shadow
volume.
8.6 Self-Exercises
8.6.1 General
1. Describe what kind of shadow you would obtain if the viewer camera
and the light camera coincide or if they coincide but the z axes of the
view frame are the opposite of each other.
2. We know that the approximation errors of shadow mapping are due to
numerical precision and texture resolution. Explain how reducing the
light camera viewing volume would affect those errors.
3. If we have n lights, we need n rendering passes to create the shadow
maps, and the fragment shader in the lighting pass will have to make
at least n accesses to texture to determine if the fragment is in shadow
for some of the lights. This will surely impact the frame rate. Can
we use frustum culling to reduce this cost for: directional light sources,
point light sources or spotlights?
Variation 1: The UFOs are not always parallel to the ground, they can
change orientation.
Variation 2: This time the UFOs are not simple disks, they can be
of any shape. However, we know that they fly so high they will always
be closer to the sun than anything else. Think how to optimize the cast
shadows with shadow mapping, reducing the texture for the depth value
and simplifying the shader. Hint: How many bits would be enough to
store the depth in the shadow map?
2. Place a 2 × 2 meter glass panel near a building. On this panel map
a texture with an RGBA image where the value for the α channel
is either 0 or 1 (you can use the one in http://envymycarbook.com/
media/textures/smiley.png). Implement shadow mapping for the car’s
headlights so that when they illuminate the glass panel, the image on
it is mapped on the walls. Hint: account for the transparency in the
shadow pass.
Chapter 9
Image-Based Impostors
9.1 Sprites
A sprite is a two-dimensional image or animation inserted in a scene typ-
ically used to show the action of a character. Figure 9.2 shows sprites from
the well known video game Pac-Man® . On the right part of the image is
the animation of the Pac-Man. Note that since the sprite is overlaid on the
background, pixels that are not part of the drawing are transparent. Knowing
what we now know, sprites may look naive and pointless, but they have been
a breakthrough in the game industry since Atari® introduced them back in
1977. When the refresh rate did not allow the game to show moving characters, hardware sprites (circuitry dedicated to lighting small squares of pixels in a predetermined sequence at any point of the screen) allowed an animation to be shown in overlay mode, without requiring the redraw of the background. As you may note by looking at one of these old video games, there may be aliasing effects in the transition between the sprite and the background
FIGURE 9.2: Examples of sprites. (Left) The main character, the ghost and
the cherry of the famous Pac-Man® game. (Right) Animation of the main
character.
because sprites are prepared beforehand and are the same at every position of the screen. With 3D games, sprites became less central and more a tool for things like lens flares, an effect we will see in Section 9.2.4. However, in recent years there has been a resurgence of 2D games on the Web and for mobile devices, and therefore sprites have become popular again, although they are now implemented as textures on rectangles and no sprite-specialized hardware is involved.
9.2 Billboarding
We anticipated the example of a billboard in the introduction to this chap-
ter. More formally, a billboard consists of a rectangle with a texture, usually
with alpha channel. So, billboards also include sprites, only they live in the
3D scene and may be oriented in order to provide a sense of depth that is
not possible to achieve with sprites. Figure 9.3 shows a representation of the
billboard we will refer to in this section. We assume the rectangle is specified
in an orthogonal frame B. Within this frame, the rectangle is symmetric with
respect to the y axis, lies on the XY plane and in the Y + half space.
The way frame B is determined divides the billboard techniques into the following classes: static, screen-aligned, axis-aligned and spherical.
The only interesting thing to point out is how we build the frame B for
screen aligned impostors and in which point of our rendering we render them.
For things like the speedometer or the image of the driver's avatar that we always want to overlay the rest, we simply repeat what we did in Section 5.3.4, that is, we express the impostors directly in NDC space and draw them after everything else and after disabling the depth test. We may want fancier effects, like making some writing appear as if it were in the middle of the scene, as to say,
FIGURE 9.4 (SEE COLOR INSERT): Client with gadgets added using plane-oriented billboards. (See client http://envymycarbook.com/chapter9/0/0.html.)
15 NVMCClient.initializeScreenAlignedBillboard = function (gl) {
16   var textureSpeedometer = this.createTexture(gl, "../../../media/textures/speedometer.png");
17   var textureNeedle = this.createTexture(gl, "../../../media/textures/needle2.png");
18   this.billboardSpeedometer = new OnScreenBillboard([-0.8, -0.65], 0.15, 0.15, textureSpeedometer, [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]);
19   this.createObjectBuffers(gl, this.billboardSpeedometer.billboard_quad, false, false, true);
20   this.billboardNeedle = new OnScreenBillboard([-0.8, -0.58], 0.09, 0.09, textureNeedle, [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]);
21   this.createObjectBuffers(gl, this.billboardNeedle.billboard_quad);
22
23   var textureNumbers = this.createTexture(gl, "../../../media/textures/numbers.png");
24   this.billboardDigits = [];
25   for (var i = 0; i < 10; ++i) {
For the analogic version, we use two different billboards, one for the plate
and one for the needle. When we render the speedometer, we first render the
plate and then the needle, properly rotated according to the current car's
velocity. For the version with digits, we create 10 billboards, all referring to
the same texture textureNumbers, containing the images of the digits 0 . . . 9,
and making sure that the texture coordinates of the billboard i map to the
rectangle of texture containing the number i.
FIGURE 9.5: Lens flare effect. Light scattered inside the optics of the camera produces flares of light on the final image. Note also the increased diameter of the sun, called the blooming effect.
FIGURE 9.6 (SEE COLOR INSERT): (Left) Positions of the lens flare
in screen space. (Right) Examples of textures used to simulate the effect.
in the image. These are fairly complex effects to model optically in real time
(although there are recent techniques that tackle this problem) but they can be
nicely emulated using screen-aligned impostors, and they have been commonly
found in video games as far back as the late 1990s. Figure 9.6 illustrates how
to determine the position and size of the flares. A flare can be done as a post-processing effect, which means that it happens on the final image after the rendering of the 3D scene has been done. A flare is simply a brighter, colored region. Figure 9.6 (Right) shows what is called a luminance texture, which is simply a single-channel texture. If this image is set as the texture of a, say, red rectangle, we can modulate the color of the textured polygon by multiplying the alpha value by the fragment color in our fragment shader. Therefore the result will be a shade of red, from full red to black. If we draw
this textured rectangle enabling blending and set the blending coefficients to
gl.ONE, gl.ONE the result will simply be the sum of the color in the framebuffer
with the color of the textured polygon, which will cause the red channel to
increase by the value of the luminance texture (please note that black is 0).
This is it. We can combine more of these impostors and we will obtain any
sort of flare we want. For the main flare, that is, the light source, we may
use a few star-shaped textures and some round ones. We will have a white
patch due to the overlapping between impostors with colors on all the three
channels so we also achieve a kind of blooming.
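In WebGL terms, the blending setup just described amounts to the following sketch (to be issued before drawing the flare billboards):

  gl.disable(gl.DEPTH_TEST);      // flares are drawn as an overlay
  gl.enable(gl.BLEND);
  gl.blendFunc(gl.ONE, gl.ONE);   // framebuffer color + flare color (black adds nothing)
  // ... draw the flare billboards here ...
  gl.disable(gl.BLEND);
  gl.enable(gl.DEPTH_TEST);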
Note that if the light source is not visible we do not want to create lens
flares. As we learned in Section 5.2, a point is not visible either because it is
outside the view frustum or because it is hidden by something closer to the
point of view along the same line of sight.
the light source position is smaller than the value in the depth texture.
If it is not, discard the fragment.
Listing 9.4 shows the fragment shader that implements the technique just described. At lines 30-31 the light position is transformed into NDC space and the tests in lines 32-37 check if the position is outside the view frustum, in which case the fragment is discarded. Line 38 reads the depth buffer at the coordinates of the projection of the point light and at line 40 we test whether the point is visible; if it is not, the fragment is discarded.
On the JavaScript side, Listing 9.5 shows the piece of code drawing the lens
flare. Please note that the function drawLensFlares is called after the scene has
been rendered. At lines 65-67 we disable the depth test and enable blending
and at line 73 update the position in NDC space of the billboards. Just like
for shadow mapping, we bind the texture attachment of the framebuffer where
the depth buffer has been stored (this.shadowMapTextureTarget.texture) and,
of course, the texture of the billboard.
FIGURE 9.7 (SEE COLOR INSERT): A client with the lens flare effect. (See client http://envymycarbook.com/chapter7/4/4.html.)
Note that for an orthogonal projection the axes would be the same as for
the screen aligned billboards. Axis aligned billboards are typically used for
objects with a roughly cylindrical symmetry, which means they look roughly
the same from every direction, assuming you are on the same plane (that is,
not above or below). This is why trees are the typical objects replaced with
axis aligned billboards.
FIGURE 9.8: Alpha channel of a texture for showing a tree with a billboard.
(See client http://envymycarbook.com/chapter9/2/2.html.)
a rendering of the real model and so on. On-the-fly billboarding may help us
save a lot of computation but it also requires some criterion to establish when
the billboard is obsolete.
This technique is often referred to as “impostor” (for example by Reference
[1]). We refer to this technique as on-the-fly billboarding, in order to highlight
its main characteristic, and keep the original and general meaning of the term
impostor.
y_B = y_V
z'_B = \frac{o_V - o_B}{\|o_V - o_B\|}
x_B = y_B \times z'_B
z_B = x_B \times y_B
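A minimal JavaScript sketch of these equations (oB, oV and yV are the billboard position, the viewer position and the chosen up axis; the normalization of xB is added for robustness and the helper names are ours):

  function billboardFrame(oB, oV, yV) {
    var d = [oV[0] - oB[0], oV[1] - oB[1], oV[2] - oB[2]];
    var len = Math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    var zP = [d[0] / len, d[1] / len, d[2] / len];   // z'B, pointing to the viewer
    var yB = yV;                                     // yB = yV
    var xB = normalize(cross(yB, zP));               // xB = yB x z'B
    var zB = cross(xB, yB);                          // zB = xB x yB
    return { x: xB, y: yB, z: zB };
  }
  function cross(a, b) {
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]];
  }
  function normalize(v) {
    var l = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    return [v[0] / l, v[1] / l, v[2] / l];
  }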
FIGURE 9.9: (Left) Axis-aligned billboarding. The billboard may only ro-
tate around the y axis of its frame B. (Right) Spherical billboarding: the axis
zB always points to the point of view oV .
Image-Based Impostors 289
boards, not because they have spherical symmetry but because they have no
well-defined shape.
FIGURE 9.12: The way the height field is ray traced by the fragment shader.
9.4 Self-Exercises
9.4.1 General
1. Discuss the following statements:
• If the object to render is completely flat there is no point in creating
a billboard.
• If the object to render is completely flat the billboard is completely
equivalent to the original object.
• The size in pixels of the texture used for a billboard must be higher
than or equal to the viewport size.
2. Which are the factors that influence the distance from which we can
notice that a billboard has replaced the original geometry?
3. Can we apply mipmapping when rendering a billboard?
2. Create a very simple billboard cloud for the car. The billboard cloud
is made of five faces of the car’s bounding box (all except the bottom
one); the textures are found by making one orthogonal rendering for
each face. Change the client to use the billboard cloud instead of the
textured model when the car is far enough from the viewer. Hint: For
finding the distance consider the size of the projection of a segment with
length equal to the car bounding box’s diagonal. To do so proceed as
follows:
(a) Consider a segment defined by positions ([0, 0, −zcar ], [0, diag,
−zcar]) (in view space), where zcar is the z coordinate of the car's
bounding box center and diag is its diagonal.
(b) Compute the length diagss of the segment in screen space (in pix-
els).
(c) Find heuristically a threshold for diagss to switch from original
geometry to billboard cloud.
Note that this is not the same as taking the distance of the car from the
viewer because considering also the size of the bounding box we indi-
rectly estimate at which distance texture magnification would happen.
Chapter 10
Advanced Techniques
In this chapter we will cover some advanced techniques to add fancier effects to our client, mostly using screen space techniques. With this term we refer to those techniques that work by rendering the scene, performing some processing on the generated image and then using it to compose the output result.

As practical examples we will see how to simulate the out-of-focus and motion effects of the photo-camera and how to add some more advanced shadowing effects but, more than that, we will see the basic concepts and tools for implementing this sort of technique.
FIGURE 10.2: A generic filter of 3 × 3 kernel size. As we can see, the mask
of weights of the filter is centered on the pixel to be filtered.
10.1.1 Blurring
Many image filter operations can be expressed as a weighted summation
over a certain region of the input image I. Figure 10.2 shows this process.
Mathematically, the value of the pixel (x_0, y_0) of the filtered image I' can be expressed as:

I'(x_0, y_0) = \frac{1}{T} \sum_{x = x_0 - N}^{x_0 + N}\; \sum_{y = y_0 - M}^{y_0 + M} W(x + N - x_0,\, y + M - y_0)\, I(x, y)    (10.1)
where N and M are the radius of the filtering window, W (x, y) is the matrix
of weights that defines the filter, and T is the sum of the absolute values of
the weights, which acts as a normalization factor. The size of the filtering
window defines the support of the filter and is usually called the filter kernel
size. The total number of pixels involved in the filtering is (2N +1)(2M +1) =
4N M + 2(N + M ) + 1. We underline that, typically, the window is a square
and not a rectangle (that is, N = M ).
In its simpler form, a blurring operation can be obtained simply by aver-
aging the values of the pixels on the support of the filter. Hence, for example,
for N = M = 2, the matrix of weights corresponding to this operation is:
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
W (i, j) = (10.2)
1 1 1 1 1
1 1 1 1 1
image with a box function. This is the reason why this type of blur filter is
usually called a box filter. Obviously, the blur effect increases as the size of
the window increases. In fact, in this way the pixels’ values are averaged on
a wider support. Figure 10.3 shows an example of applying this filter to an
image (the RGB color channels are filtered separately).
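A fragment shader implementing this 5 × 5 box filter (N = M = 2) could be sketched as follows; uTexture, uTexelSize and vTexCoord are illustrative names:

  precision highp float;
  uniform sampler2D uTexture;
  uniform vec2 uTexelSize;   // 1.0 / texture resolution
  varying vec2 vTexCoord;
  void main(void) {
    vec4 sum = vec4(0.0);
    for (int i = -2; i <= 2; ++i)
      for (int j = -2; j <= 2; ++j)
        sum += texture2D(uTexture, vTexCoord + vec2(i, j) * uTexelSize);
    gl_FragColor = sum / 25.0;   // T = 25, all weights equal to 1
  }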
The blur obtained by using the box filter can also be obtained with other
averaging functions. An alternative option can be to consider the pixels closer
to the central pixel (x0 , y0 ) more influencing than the ones distant from it. To
do so, usually a Gaussian function is employed as a weighting function. A 2D
Gaussian is defined as:
g(x, y) = \frac{1}{2 \pi \sigma^2}\, e^{-\frac{x^2 + y^2}{2 \sigma^2}}    (10.3)
The support of this function (that is, the domain over which it is defined) is the whole R^2 plane, but practically it can be limited considering that when the distance from the origin \sqrt{x^2 + y^2} is higher than 3σ, the Gaussian values become very close to zero. So, it is good practice to choose the support of the Gaussian kernel depending on the value of σ.
By plugging the Gaussian function into Equation (10.1) we obtain the so-called Gaussian filter:

I'(x_0, y_0) = \frac{\sum_{x = x_0 - N}^{x_0 + N} \sum_{y = y_0 - N}^{y_0 + N} I(x, y)\, e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}}}{\sum_{x = x_0 - N}^{x_0 + N} \sum_{y = y_0 - N}^{y_0 + N} e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}}}    (10.4)
Concerning the kernel size, given what was just stated, it is good practice to set N equal to 3σ or 2σ. For a Gaussian filter, a weighting matrix of 7 × 7 with
Note that at the borders of the matrix, where the distance becomes 3σ, the
values go quickly to zero. A graphical representation of these weights is shown
in Figure 10.4, while an example of an application of this filter is shown in
Figure 10.5.
FIGURE 10.6: Out-of-focus example. The scene has been captured such
that the car is in focus while the rest of the background is out of focus. The
range of depth where the objects framed are in focus is called depth of field
of the camera. (Courtesy of Francesco Banterle.)
The value of c is clamped in the range [0.0, Rmax ] to prevent increasing the
kernel size too much.
3. Render a quad covering the screen exactly and with texture coordi-
nates equal to (0, 0), (1, 0), (1, 1), (0, 1). Typically this is done by draw-
ing in NDC space and hence the quad has coordinates (−1, −1, −1),
(1, −1, −1), (1, 1, −1) and (−1, 1, −1). This is called the fullscreen quad.
By rendering a fullscreen quad we activate a fragment for each pixel and so
we have access to all the pixels of the scene rendered at step 1.
Listing 10.1 shows the salient part of the JavaScript code. From line 201
to line 213 we render the scene to store the depth buffer, just like we did
in Section 8.3 for shadow mapping. In fact, we reuse the same frame buffer,
variables and shader. From lines 215 to 221 we render the scene again, this
time to store the color buffer.
In principle we would not need to render the scene twice if we had multiple render targets. This functionality, not in the WebGL API at the time of this writing, allows you to output to multiple buffers simultaneously, so that the same shader may write the color on one buffer and some other value on another. The only change in the shader language is that we would have gl_FragColor[i] in the fragment shader, with i the index of the buffer to render to.
Finally, in lines 230-243 we render a full screen quad binding the textures
filled in the two previous renderings and enabling the depthOfFieldShader that
we will comment on next. Note that at line 233 we specify the depth of field
with two values that we mean to be in meters. This is an important bit because
we must take care of the reference systems when we read the depth from the
texture, where we will read the values in the interval [0, 1] and compare them
with values in meters. More specifically, we know that the value of zV (that
is, z in view space) will be transformed by the perspective projection as:
z_{NDC} = \underbrace{\frac{f + n}{f - n}}_{A} + \underbrace{\frac{2 f n}{f - n}}_{B}\, \frac{1}{z_V}

(check by multiplying [x, y, z, 1]^T by the perspective matrix P_{persp} in (4.10)) and then to [0, 1] as:

z_{01} = (z_{NDC} + 1)/2
In the fragment shader, shown in Listing 10.2, we read the depth values
from the texture and they are in the interval [0, 1] (see line 40): we have to
invert the transformation to express them in view space and test them with
the depth of field interval (lines 41-42). This is why we pass to the shader
the values A and B, because they are the entries of the perspective matrix
necessary to invert the transformation from [0, 1] to view space.
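In GLSL the inversion amounts to a couple of lines; the following sketch shows the idea (uA and uB are the uniforms holding A and B; the actual Listing 10.2 may organize it differently):

  uniform float uA;   // (far + near) / (far - near)
  uniform float uB;   // 2 * far * near / (far - near)
  float viewSpaceZ(float z01) {
    float zNDC = z01 * 2.0 - 1.0;   // from [0,1] back to [-1,1]
    return uB / (zNDC - uA);        // invert zNDC = A + B * (1 / zV)
  }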
You may wonder why we don’t make it simpler and pass the depth of field
interval directly in [0, 1]. We could, but if we want to be able to express our
interval in meters, which is something the user expects, we should at least
transform from view space to [0, 1] in the JavaScript side. Consider the func-
tion ComputeRadiusCoC. As it is, the radius would not increase linearly with
Advanced Techniques 303
the distance from the interval extremes but with the distance of their recipro-
cals (you can check this by plugging the above equations into the functions).
This does not mean it would not work, but we would not have implemented
what is described by Equation (10.6).
200 if (this.depth_of_field_enabled) {
201   gl.bindFramebuffer(gl.FRAMEBUFFER, this.shadowMapTextureTarget.framebuffer);
202
203   this.shadowMatrix = SglMat4.mul(this.projectionMatrix, this.stack.matrix);
204   this.stack.push();
205   this.stack.load(this.shadowMatrix);
206
207   gl.clearColor(1.0, 1.0, 1.0, 1.0);
208   gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
209   gl.viewport(0, 0, this.shadowMapTextureTarget.framebuffer.width, this.shadowMapTextureTarget.framebuffer.height);
210   gl.useProgram(this.shadowMapCreateShader);
211   gl.uniformMatrix4fv(this.shadowMapCreateShader.uShadowMatrixLocation, false, this.stack.matrix);
212   this.drawDepthOnly(gl);
213   this.stack.pop();
214
215   gl.bindFramebuffer(gl.FRAMEBUFFER, this.firstPassTextureTarget.framebuffer);
216   gl.clearColor(1.0, 1.0, 1.0, 1.0);
217   gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
218   gl.viewport(0, 0, this.firstPassTextureTarget.framebuffer.width, this.firstPassTextureTarget.framebuffer.height);
219   this.drawSkyBox(gl);
220   this.drawEverything(gl, false, this.firstPassTextureTarget.framebuffer);
221   gl.bindFramebuffer(gl.FRAMEBUFFER, null);
222
223   gl.viewport(0, 0, width, height);
224   gl.disable(gl.DEPTH_TEST);
225   gl.activeTexture(gl.TEXTURE0);
226   gl.bindTexture(gl.TEXTURE_2D, this.firstPassTextureTarget.texture);
227   gl.activeTexture(gl.TEXTURE1);
228   gl.bindTexture(gl.TEXTURE_2D, this.shadowMapTextureTarget.texture);
229
230   gl.useProgram(this.depthOfFieldShader);
231   gl.uniform1i(this.depthOfFieldShader.uTextureLocation, 0);
232   gl.uniform1i(this.depthOfFieldShader.uDepthTextureLocation, 1);
233   var dof = [10.0, 13.0];
234   var A = (far + near) / (far - near);
235   var B = 2 * far * near / (far - near);
236   gl.uniform2fv(this.depthOfFieldShader.uDofLocation, dof);
237   gl.uniform1f(this.depthOfFieldShader.uALocation, A);
238   gl.uniform1f(this.depthOfFieldShader.uBLocation, B);
239
240   var pxs = [1.0 / this.firstPassTextureTarget.framebuffer.
58 gl_FragColor = color ;
LISTING 10.2: Depth of field implementation (shader side). (Code snippet
from http://envymycarbook.com/code/chapter10/0/shaders.js.)
Note that we cannot directly use the value of the radius computed at line 45 in the loop of the filter, because the shader compiler must be able to unroll the loop. Therefore, we place the maximum kernel size as the cycle limit and test the fragment distance from the kernel center to zero the contribution outside the kernel size.
The same operations can be done more efficiently by splitting the computa-
tion of the blurred image in a first “horizontal step,” where we sum the values
only along the x axis, and a “vertical step” where we sum on the result of the
previous step vertically. The final result will be the same, but the rendering
will be faster because now we apply N + M operations per pixel rather than
N × M. Figure 10.8 shows a snapshot from the photographer view with depth
of field. As it is, this solution can create some artifacts, the most noticeable of
which are due to the depth discontinuities. Suppose we have one object close
to the camera and that it is in focus. What happens around the silhouette
of the object is that the parts of the object out of focus are influenced by
those of the background that are in focus, with the final effect that the border
between the two will always look a bit fuzzy. These problems may be partially
overcome by not counting pixels whose depth value is too different from the
one of the pixel being considered. Another improvement may be to sample
more than a single value of the depth map and blur the color accordingly.
In this case \Delta_x and \Delta_y represent the discrete versions of the partial derivatives \partial I(x, y)/\partial x and \partial I(x, y)/\partial y, respectively.
At this point, it is easy to define the "strength" of an edge as the magnitude of the gradient:

E(x, y) = \sqrt{\Delta_x^2(x, y) + \Delta_y^2(x, y)}    (10.13)
We indicate with E the resulting extracted edge image.
So, taking into account Equation (10.13), the edge response at pixel (x0 , y0 )
given an input image I(x, y) can be easily written in matrix form as:
I_h(x_0, y_0) = \sum_{x = x_0 - 1}^{x_0 + 1}\; \sum_{y = y_0 - 1}^{y_0 + 1} W_{\Delta x}(x + 1 - x_0,\, y + 1 - y_0)\, I(x, y)

I_v(x_0, y_0) = \sum_{x = x_0 - 1}^{x_0 + 1}\; \sum_{y = y_0 - 1}^{y_0 + 1} W_{\Delta y}(x + 1 - x_0,\, y + 1 - y_0)\, I(x, y)    (10.14)
where I_h(x, y) is the image of the horizontal derivative, I_v(x, y) is the image of the vertical derivative, and W_{\Delta x}(i, j) and W_{\Delta y}(i, j) are the matrices of weights defined as:

W_{\Delta x} = \begin{pmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \qquad W_{\Delta y} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}    (10.15)
The filter (10.14) is the most basic filter to extract edges based on first-order derivatives.
Two numerical approximations of the first order derivatives that are more
accurate than the one just described are provided, respectively, by exploiting
the Prewitt operator:
W_{\Delta x} = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix} \qquad W_{\Delta y} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix}    (10.16)
and the Sobel operator:
$W_{\Delta x} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \qquad W_{\Delta y} = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$    (10.17)
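As an illustration, here is a minimal GLSL sketch that convolves the image with the Sobel weights of (10.17) and outputs the edge strength E of Equation (10.13); the uniform names (uTexture, uTexelSize) and the luminance weights are assumptions, so this is not the code used by the book's client.

precision highp float;
uniform sampler2D uTexture;
uniform vec2 uTexelSize;       // (1/width, 1/height)
varying vec2 vTexCoord;
float lum(float i, float j) {  // luminance of the neighbor at offset (i, j)
  vec3 c = texture2D(uTexture, vTexCoord + vec2(i, j) * uTexelSize).rgb;
  return dot(c, vec3(0.299, 0.587, 0.114));
}
void main(void) {
  float dx = lum(-1.0, -1.0) + 2.0 * lum(-1.0, 0.0) + lum(-1.0, 1.0)
           - lum( 1.0, -1.0) - 2.0 * lum( 1.0, 0.0) - lum( 1.0, 1.0);
  float dy = lum(-1.0, -1.0) + 2.0 * lum(0.0, -1.0) + lum(1.0, -1.0)
           - lum(-1.0,  1.0) - 2.0 * lum(0.0,  1.0) - lum(1.0,  1.0);
  float E = sqrt(dx * dx + dy * dy);       // Equation (10.13)
  gl_FragColor = vec4(vec3(E), 1.0);
}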
The second trick is to quantize the shading values in order to simulate the
use of a limited set of colors in the scene. In particular, here we use a simple
diffuse model with three levels of quantization of the colors: dark, normal and
light. In this way, a green object will have some parts colored dark green, some
parts colored green and some parts colored light green.
The code that implements this simple quantized lighting model is given in
Listing 10.4.
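As a minimal sketch of what such a quantization looks like in a fragment shader (the actual code used by the client is the one in Listing 10.4; the thresholds and levels below are arbitrary choices):

// Quantize the diffuse term into three levels: dark, normal and light.
float toonLevel(float nDotL) {
  if (nDotL < 0.3) return 0.3;   // dark
  if (nDotL < 0.7) return 0.6;   // normal
  return 1.0;                    // light
}
// in main():
// float nDotL = max(dot(normalize(vNormal), uLightDirection), 0.0);
// gl_FragColor = vec4(uColor.rgb * toonLevel(nDotL), 1.0);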
We follow the same scheme as for the depth-of-field client, but this time we
need to produce only the color buffer and then we can render the full screen
quad. The steps are the following:
1. Render the scene to produce the color buffer
2. Bind the texture produced at step 1 and render the full screen quad. For
each fragment, if it is on a strong edge output black, otherwise output
the color quantized version of the diffuse lighting (see Listing 10.5).
Figure 10.10 shows the final result.
FIGURE 10.10 (SEE COLOR INSERT): Toon shading client. (See client
http://envymycarbook.com/chapter10/1/1.html.)
FIGURE 10.11: Motion blur. Since the car is moving by ∆ during the
exposure, the pixel value in x0 (t + dt) is an accumulation of the pixels ahead
in the interval x0 (t + dt) + ∆.
small, so that every object is perfectly “still” during the shot, no matter how
fast it travels. Now we want to emulate reality by considering that the exposure
time is not infinitely small and the scene changes while the shutter is open.
Figure 10.11 illustrates a situation where, during the exposure time, the car
moves from left to right. What happens in this case is that different points
on the car surface will project into the same pixel, all of them contributing
to the final color. As a result, the image will be blurred in the regions where
the moving objects have been. This type of blur is named motion blur and in
photography it is used to obtain the panning effect, which is when you have
the moving object in-focus and the background blurred. The way it is done is
very simple: the photographer aims at the moving object while the shutter is
open, making it so that the relative motion of the object with respect to the
camera frame is almost 0. In Section 4.11.2 we added a special view mode to
the photographer such that it constantly aims at the car. Now we will emulate
motion blur so that we can reproduce this effect.
The most straightforward way to emulate motion blur is to simply mimic
what happens in reality, that is, taking multiple renderings of the scene within
the exposure interval and averaging the result. The drawback of this solution
is that you need to render the scene several times and it may become a bot-
tleneck. We will implement motion blur in a more efficient way, as a post-
processing step. First we need to calculate the so called velocity buffer, that
is, a buffer where each pixel stores a velocity vector indicating the velocity at
which the point projecting on that pixel is moving in screen space. When we
have the velocity buffer we output the color for a pixel in the final image just
by sampling the current rendering along the velocity vector associated with
the pixel, as shown in Figure 10.12.
is moving and the camera is fixed. Here we deal with a unified version of the
problem where all the motion is considered in the camera reference frame.
Note that the basic procedure that we followed to handle the geometric
transformation of our primitives coordinates is to pass to the shader program
the modelview and the projection matrix. Then in the vertex shader we always
have a line of code that transforms the position from object space to clip space:
gl_Position = uProjectionMatrix * uModelViewMatrix * vec4(aPosition, 1.0);
Whether or not the camera is fixed, and whether or not the scene is static, this
expression always transforms the coordinates from object space to clip space (and
hence to window coordinates). Assuming that the projection matrix does not
change (which is perfectly sound, since you do not zoom during the click of
the camera), if we store the modelview matrix of each vertex at the previous
frame and pass it to the shader along with the one of the current frame,
we will be able to compute, for each vertex, its position on screen space at
the previous and at the current frame, so that their difference is the velocity
vector.
So we have to change our code to keep track, for each frame, of the value
of the modelview matrix at the previous frame (that is, stack.matrix in the
code). Since every element of the scene we draw is a JavaScript object of
our NVMCClient, we simply extend every object with a member to store the
modelview matrix at the previous frame. Listing 10.6 shows the change applied
to the drawing of the trees: at line 89, after the tree trees[i] has been rendered,
we store the modelview matrix in trees[i].previous transform. This is the value
that will be passed to the shader that computes the velocity buffer.
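The vertex shader that fills the velocity buffer can then be sketched as follows. The idea is exactly the one just described: project the vertex with both the current and the previous modelview matrix and take the difference of the two positions after the perspective divide. The names uPreviousModelViewMatrix and vVelocity are assumptions for illustration, not necessarily the ones used in the book's code.

uniform mat4 uProjectionMatrix;
uniform mat4 uModelViewMatrix;
uniform mat4 uPreviousModelViewMatrix;  // stored at the previous frame
attribute vec3 aPosition;
varying vec2 vVelocity;
void main(void) {
  vec4 curr = uProjectionMatrix * uModelViewMatrix * vec4(aPosition, 1.0);
  vec4 prev = uProjectionMatrix * uPreviousModelViewMatrix * vec4(aPosition, 1.0);
  vVelocity = curr.xy / curr.w - prev.xy / prev.w;  // screen-space velocity
  gl_Position = curr;
}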
Listing 10.8 shows the fragment shader to perform the final rendering with the
full screen quad. We have the uVelocityTexture that has been written by the
velocityVectorShader and the uTexture containing the normal rendering of the
scene. For each fragment, we take STEPS samples of the uTexture along the
velocity vector. Since the velocity vector is written with only 8-bit precision, the
value we read and convert with the function Vel(..) at line 19 is not exactly
what we computed with the velocityVectorShader. This is acceptable except
when the scene is static (that is, nothing moves at all): even then, because of
this approximation, we would notice some blurring over the image, so at line 30
we simply set to [0, 0] the velocity vectors that are too small.
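For reference, the sampling loop can be sketched as follows. This is not the book's Listing 10.8: the decoding of the velocity, the STEPS constant and the threshold used to discard too-small vectors are illustrative assumptions.

precision highp float;
uniform sampler2D uTexture;          // normal rendering of the scene
uniform sampler2D uVelocityTexture;  // velocity buffer
varying vec2 vTexCoord;
const int STEPS = 10;
void main(void) {
  // decode from [0,1]; the actual encoding used by the book may differ
  vec2 vel = texture2D(uVelocityTexture, vTexCoord).xy * 2.0 - 1.0;
  if (length(vel) < 0.01) vel = vec2(0.0);  // ignore quantization noise
  vec4 sum = vec4(0.0);
  for (int i = 0; i < STEPS; ++i)
    sum += texture2D(uTexture, vTexCoord + vel * (float(i) / float(STEPS)));
  gl_FragColor = sum / float(STEPS);
}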
where Iunsharp is the output image with the sharpness increased. The param-
eter λ is used to tune the amount of details re-added. High values of λ may
exacerbate the details too much, thus resulting in an unrealistic look for the
image, while low values of λ may produce modifications that are not per-
ceivable. The choice of this value depends on the content of the image and on
the effect that we want to achieve. Figure 10.14 shows an example of details
enhancement using unsharp masking.
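In a post-processing shader the operation amounts to a single expression per fragment. A minimal sketch, assuming the original image and its smoothed version are bound as uTexture and uSmoothTexture and that uLambda holds λ:

precision highp float;
uniform sampler2D uTexture;        // original image I
uniform sampler2D uSmoothTexture;  // smoothed image I_smooth
uniform float uLambda;             // amount of detail re-added
varying vec2 vTexCoord;
void main(void) {
  vec4 I = texture2D(uTexture, vTexCoord);
  vec4 Ismooth = texture2D(uSmoothTexture, vTexCoord);
  gl_FragColor = I + uLambda * (I - Ismooth);  // I_unsharp
}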
FIGURE 10.15: Occlusion examples. (Left) The point p receives only certain
rays of light because it is self-occluded by its surface. (Right) The point p
receives few rays of light because it is occluded by the occluders O.
FIGURE 10.17: The horizon angle h(θ) and the tangent angle t(θ) in a
specific direction θ.
$A(p) = \frac{1}{2\pi} \int_{\theta=-\pi}^{\pi} \overbrace{\int_{\alpha=0}^{\pi/2} V(p, \omega(\theta, \alpha))\, W(\theta)\, d\alpha}^{\text{contribution of section } S_\theta}\; d\theta$    (10.22)
because the contribution of the inner integral is 0 for α > 0. Note that we
also replaced np · ω(θ, α) with a generic weighting function W (θ) (which we
will specify later on) that does not depend on α and so can be taken out of
the integral.
Now the interesting part. Hz is a value expressed in the tangent frame,
but our representation of the surface S is the depth buffer, which means we
can have z values expressed in the frame made by x′ and z. So we find Hz
by subtraction of two angles that we can compute by sampling the depth
buffer: h(θ) and t(θ). h(θ) is the horizon angle over the x′ axis and t(θ) is
the angle formed by the tangent vector xθ and x′. You can easily see that:
Hz = h(θ) − t(θ) and hence Equation (10.23) becomes:
$A(p) = \frac{1}{2\pi} \int_{\theta=-\pi}^{\pi} \left(\sin(h(\theta)) - \sin(t(\theta))\right) W(\theta)\, d\theta$    (10.24)
Given a point p, the knowledge of the horizon angles in several directions
allows us to estimate approximately the region of the hemisphere where
the rays are not self-occluded. The greater this region, the greater the value
of the ambient occlusion term.
Equation (10.24) can be easily calculated at rendering time with a two-pass
algorithm. In the first pass the depth map is generated, like in the depth-of-
field client (see Section 10.1.2), and used during the second pass to determine
the angles h(θ) and t(θ) for each pixel. Obviously, Equation (10.24) is evalu-
ated only for a discrete number Nd of directions (θ0 , θ1 , . . . , θNd −1 ):
$A(p) = \frac{1}{2\pi} \sum_{i=0}^{N_d-1} \left(\sin(h(\theta_i)) - \sin(t(\theta_i))\right) W(\theta_i)$    (10.25)
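Once the horizon and tangent angles have been estimated for the Nd directions, the sum of Equation (10.25) is straightforward to evaluate; a minimal JavaScript sketch (the names are illustrative only):

// h[i], t[i]: horizon and tangent angles (in radians) for direction theta[i];
// W is the weighting function of Equation (10.22).
function ambientOcclusion(h, t, theta, W) {
  var sum = 0.0;
  for (var i = 0; i < h.length; ++i)
    sum += (Math.sin(h[i]) - Math.sin(t[i])) * W(theta[i]);
  return sum / (2.0 * Math.PI);
}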
In this case the i-th particle is also influenced by the position of the nearest
k particles, indicated with $p_i^1(t), p_i^2(t), \ldots, p_i^k(t)$.
The set of particles in a particle system is not fixed. Each particle is cre-
ated by an emitter and inserted in the system with an initial state; its
state is then updated for a certain amount of time and finally the particle is removed. The
lifespan of a particle is not always strictly dependent on time. For example,
when implementing rain, the particles may be created on a plane above the
scene and then removed when they hit the ground. Another example is with
fireworks: particles are all created at the origin of the fire (the launcher of the
fireworks) with an initial velocity and removed from the system when along
their descending parabola. The creation of particles should be randomized to
avoid creating visible patterns that jeopardize the final effect.
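A minimal JavaScript sketch of such a lifecycle, for the rain example (the emitter region, the constant fall velocity and the removal criterion are arbitrary choices, not the book's code):

function ParticleSystem(emitRate) {
  this.particles = [];
  this.emitRate = emitRate;  // particles created at each update
}
ParticleSystem.prototype.update = function (dt) {
  // emit: random position on a plane above the scene, falling downwards
  for (var n = 0; n < this.emitRate; ++n)
    this.particles.push({
      pos: [Math.random() * 100 - 50, 30, Math.random() * 100 - 50],
      vel: [0, -9.8, 0]
    });
  // update the state and remove the particles that hit the ground (y < 0)
  for (var i = this.particles.length - 1; i >= 0; --i) {
    var p = this.particles[i];
    p.pos[0] += p.vel[0] * dt;
    p.pos[1] += p.vel[1] * dt;
    p.pos[2] += p.vel[2] * dt;
    if (p.pos[1] < 0) this.particles.splice(i, 1);
  }
};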
because there is no parallax to see in a single particle, but we can also have
simpler representations like just points or segments. For dense participating
media such as smoke, blending will be enabled and set to accumulate the value
of the alpha channel, that is, the more particles project on the same pixel the
more opaque is the result.
10.5 Self-Exercises
10.5.1 General
1. Imagine that generateMipmap is suddenly removed from the WebGL
specification! How can we create the mipmap levels of a given texture
entirely on the GPU (that is, without readbacks)?
(b) The normal buffer. Hint: You have to pack the normals as we did
for the depth buffer.
5. Improve the toon shading client by also making the black edges bold.
Hint: Add a rendering pass in order to expand all the strong edge pixels
by one pixel in every direction.
6. Improve the implementation of the lens flares effect (see Section 9.2.4).
Hint: Use the fullscreen quad to avoid a rendering pass.
7. Using only the normal map of the street of Section 7.8.3, create an
ambient occlusion map, that is, a texture where each texel stores the
ambient occlusion term. Hint: If the dot product of the normal at texel
x, y and every one of the normals on the neighbor texels is negative we
can put 1 as an ambient occlusion term (that is, not occluded at all).
Chapter 11
Global Illumination
along the ray. If the direction is normalized then t represents the Euclidean
distance of the point from the origin of the ray.
Ray tracing plays a very important role in global illumination computation:
(a) For simulating the propagation of light rays originating at the light source
through a scene (light ray tracing or photon tracing).
(b) For computing the amount of light reaching the camera through a pixel
by tracing a ray from the camera, following the ray through the scene,
and collecting all the lighting information (classical ray tracing, Monte
Carlo ray tracing, etc.).
Tracing a ray means extending the ray from its origin, and collecting some
information along the ray. The exact information collected depends on the ap-
plication and the type of scene. We restrict our discussion to scenes containing
solid opaque objects. Independent of the application, the major computational
effort in tracing the ray is the computation of the ray-scene intersection. As
the scene is assumed to be composed of one or more solid objects, intersecting
the ray with the scene involves computation of ray–object intersection. As
parameter t represents a point along the ray, computing ray–object intersec-
tion can be resolved by computing the ray parameter t. Plenty of research
has been devoted to finding efficient ray–object intersection algorithms for a
large number of object types. We will restrict our discussion on ray–object
intersection to only two classes of objects: algebraic and parametric surfaces.
f (ox + ti dx , oy + ti dy , oz + ti dz ) = 0 (11.2)
ax + by + cz + d = 0 (11.3)
or

$t_i = -\frac{a\, o_x + b\, o_y + c\, o_z + d}{a\, d_x + b\, d_y + c\, d_z}$    (11.5)
Thus the ray–plane intersection computation is the simplest of all ray–object
intersections.
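A direct JavaScript transcription of Equation (11.5) could look like the following sketch (the ray representation, with o and d stored as arrays, is an assumption for illustration):

// Ray: {o: [ox,oy,oz], d: [dx,dy,dz]}; plane: ax + by + cz + d = 0.
// Returns the ray parameter t of the intersection, or null if the ray is
// parallel to the plane or the intersection lies behind the ray origin.
function rayPlaneIntersect(ray, a, b, c, d) {
  var denom = a * ray.d[0] + b * ray.d[1] + c * ray.d[2];
  if (Math.abs(denom) < 1e-8) return null;               // parallel
  var t = -(a * ray.o[0] + b * ray.o[1] + c * ray.o[2] + d) / denom;
  return t >= 0 ? t : null;
}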
$(p - c) \cdot (p - c) - r^2 = 0$    (11.7)
For a ray to intersect a sphere, the point of intersection pi must satisfy the
following equation:
$(o + t_i d - c) \cdot (o + t_i d - c) - r^2 = 0$    (11.9)
o + ti d = pi = g(ui , vi ) (11.11)
must hold. So we get a system of three equations with three unknown param-
eters,
g(ui , vi , ti ) = 0 (11.12)
name suggests, this type of bounding volume is a rectangular box with its
bounding planes aligned to the axis. In other words, the six bounding planes
of an AABB are parallel to the three main axial planes. An oriented bounding
box (OBB) is another type of bounding box that is more compact in enclosing
the volume of the object, but is not constrained to have its planes parallel to
the axial planes. Figure 11.1 shows an example of an AABB and an OBB. In
the following we always refer to AABB because this box type is the one used
by the algorithm we will describe.
The AABB for a scene is computed by finding the minimum and maximum
of the coordinates of the objects in the scene. The minimum and maximum co-
ordinates define the two extreme corner points referred to as cmin = AABB.min
and cmax = AABB.max in the following.
Every pair of the faces of an AABB is parallel to a Cartesian plane. Let us
consider the two planes parallel to the XY-plane. The Z-coordinates of every
point on these planes are equal to cmin,z and cmax,z . So the algebraic equation
of the two planes of the AABB parallel to the XY-plane are
exists, then the nearest of the farthest intersection points will be the far-
thest of the ray–AABB intersection points, and the farthest of the nearest
intersection points will be the nearest ray–AABB intersection point. Thus
the ray parameter for the nearest point of ray–AABB intersection tmin is
max(tmin,x , tmin,y , tmin,z ), and tmax , that of the farthest point of intersection
is min(tmax,x , tmax,y , tmax,z ). If tmin < tmax then the ray does indeed intersect
the AABB and the nearest point of ray–AABB intersection is given by the
ray parameter tmin . The pseudo-code for ray–AABB intersection computation
is given in Listing 11.1.
function ray−AABB()
INPUT ray, AABB
tmin,x = (AABB.min.x−ray.o.x)/ray.d.x
tmax,x = (AABB.max.x−ray.o.x)/ray.d.x
if (tmin,x > tmax,x )
swap(tmin,x ,tmax,x )
endif
tmin,y = (AABB.min.y−ray.o.y)/ray.d.y
tmax,y = (AABB.max.y−ray.o.y)/ray.d.y
if (tmin,y > tmax,y )
swap(tmin,y ,tmax,y )
endif
tmin,z = (AABB.min.z−ray.o.z)/ray.d.z
tmax,z = (AABB.max.z−ray.o.z)/ray.d.z
if (tmin,z > tmax,z )
swap(tmin,z ,tmax,z )
endif
tmin = max(tmin,x , tmin,y , tmin,z )
tmax = min(tmax,x , tmax,y , tmax,z )
if (tmin < tmax ) return tmin
else return NO INTERSECTION
endif
LISTING 11.1: Ray–AABB intersection computation.
FIGURE 11.2: The idea of a uniform subdivision grid shown in 2D. Only the
objects inside the uniform subdivision cells traversed by the ray (highlighted
in light gray) are tested for intersections. A 3D grid of AABBs is used in
practice.
used algorithm for this objects list computation for a triangle-only scene is
given in Listing 11.2.
function USSpreProcess()
INPUT scene: AABB, triangleList
OUTPUT USS: {AABB, N, objectList[N][N][N]}
// assumes N^3 as the grid resolution
{
for every T in triangleList
pmin = min coordinates of the three vertices of T
pmax = max coordinates of the three vertices of T
index_min=ivec3(N∗(pmin − AABB.min)/(AABB.max − AABB.min))
index_max=ivec3(N∗(pmax − AABB.min)/(AABB.max − AABB.min))
for i = index_min.x to index_max.x
for j = index_min.y to index_max.y
for k = index_min.z to index_max.z
append T to USS.objectList[i][j][k]
endfor
endfor
endfor
endfor
}
LISTING 11.2: USS preprocessing algorithm.
For USS-based ray tracing, the ray is first intersected with the AABB as-
sociated with the USS. If there is a valid intersection then the ray is marched
through the 3D grid of the USS, one voxel at a time. For each voxel it tra-
verses through, the ray is intersected with the list of triangles associated with
the voxel objectList. If a valid intersection point is found then that point rep-
resents the nearest point of intersection of the ray with the scene, and the
ray marching is terminated. Otherwise, the ray is marched through the next
voxel, and the process is repeated until the intersection is found or the ray
exits AABB. Two crucial steps of the ray marching operation are: find the
entry voxel index, and then march from the current voxel to the next voxel
along the ray. These operations must be computed accurately and efficiently.
Simple use of ray–AABB intersection tests to find the voxel of entry and then
finding the next voxel along the path will be highly inefficient. However, an
adaptation of that approach has been shown to be very efficient. We describe
it below.
Figure 11.3 shows the intersection points produced by this process in the 2D
case. For the ray whose origin is inside the volume of the USS, the starting
index values must be adjusted to guarantee the ray parameter values are
greater than zero. There are multiple ways of making this adjustment. An
iterative method for this adjustment is given in Listing 11.3.
function USS_ray_traverse()
OUTPUT t
{
while(i ≠ ilimit && j ≠ jlimit && k ≠ klimit )
[t, intersectFlag] = ray_object_list_intersect(ray, objectList[i][j][k]);
if (intersectFlag == TRUE && pointInsideVoxel(ray.o+t∗ray.d,i,j,k)) // intersection found
return TRUE;
endif
if (tx,i < ty,j && tx,i < tz,k )
tx,i += ∆tx
i += ∆i
else if (ty,j < tz,k )
ty,j += ∆ty
j += ∆j
else
tz,k += ∆tz
k += ∆k
endif
endwhile
return FALSE
}
LISTING 11.4: An incremental algorithm for ray–USS traversal.
FIGURE 11.3: Efficient ray traversal in USS (shown in 2D). After computing
the first intersection parameters tx and ty , the ∆x and ∆y values are used to
incrementally compute the next tx and ty values.
The algorithm uses the ray object list intersect function where the ray is
intersected with each of the objects in the list and returns the nearest point of
intersection if there is one or more intersections. In USS an object is likely to
be part of multiple voxels. So the intersection of the ray with the object list
associated with the voxel may generate an intersection point outside the voxel,
and thus could generate an erroneous nearest point detection. This problem is
avoided by checking if the point is inside the current voxel or not. Like AABB,
the USS voxel boundaries are parallel to the axis plane. So the point-inside-
outside test is easily done by checking the coordinates of the point against the
coordinates of voxel bounds.
The resolution of the USS grid could be specified by the user directly,
or indirectly as average triangle density per voxel. The finer the resolution,
the faster the ray–intersection, but this requires longer pre-computation time,
a larger amount of memory for the USS storage, and also adds computa-
tional overhead for ray-tracing empty parts of the scene. The performance
of USS is good in scenes with homogeneous object distribution. However,
USS performance can be poor if the extents of the objects in the scene are
large and overlap. It also performs poorly when tracing scenes of large extent
that contain small but high-triangle-count objects, for example a teapot in a
football field. To avoid this latter problem, nested USS
structures have been proposed. In the next section we describe a hierarchi-
cal structure-based ray tracing acceleration technique that better handles this
problem.
BVHnode
{
AABB
objectList or // No object list for intermediate nodes.
partitionPlane // No Partition plane for leaf node.
}
The partitioning used during BVH creation may be used to sort the object
list (partition sorting), and replace the object list in the leaf node by an index
to the object and the number of objects in the list. In a BVH-based ray
acceleration scheme, ray–scene intersection starts with the intersection of the
ray with the root AABB of the BVH tree. If there is an intersection, the ray is
recursively intersected with the AABBs of the children nodes till the recursion
is complete. The order in which children nodes are intersected depends on
the ray direction and the plane of partition associated with the node. The
recursion is continued even if intersection is found. It is terminated when the
recursion stack is empty. In the case that an intersection is found with the leaf
node, the t of the ray is set and is used in the ray–AABB intersection test to
function rayTraceRendering()
INPUT camera, scene
OUTPUT image
{
for row = 1 to rows
for col = 1 to cols
ray = getRay(row,col,camera)
image[row][col] = getColor(ray, scene)
endfor
endfor
}
LISTING 11.7: Fundamental ray-tracing algorithm.
The first function, getRay, computes the ray through every pixel of the
image array. How the ray is computed depends on the type of camera used; for
the standard pinhole camera used in rasterization-based rendering as discussed
earlier, the process is simple. We first compute the rays in camera space, where
the image window is parallel to the XY plane and located near units away
from the coordinate origin 0. Then we transform the ray from camera space to
world space using the inverse camera matrix ($M^{-1}_{camera}$). For simplicity we assume
that the window is centered around the Z-axis, and the image origin is at the
bottom-left corner of the image window. Then the ray through the center of
the pixel (col, row) is computed as follows:

$ray.o = M^{-1}_{camera}\, (0, 0, 0)^T$

$ray.d = M^{-1}_{camera} \left( w\left(\frac{col + 0.5}{cols} - 0.5\right),\ h\left(\frac{row + 0.5}{rows} - 0.5\right),\ -near \right)^T$

where $M_{camera} = P_{rsp} M$, w and h are the width and height of the image
window, and the image resolution is cols × rows. The next function, getColor,
gets the color of the light coming along the ray towards the camera. This color
is assigned to the pixel. The exact computation carried out in this function
distinguishes one ray-traced rendering method from the other. The getColor
function may simply return the diffuse color of the object, or may evaluate
the direct lighting equation, as explained in Chapter 6, to get the color at the
point and return it. The color may be modulated with the texture queried
from the texture map associated with the surface. This process is called ray-
casting-based rendering. It creates images similar to those created in simple
rasterization based rendering. As the rendering using rasterization hardware
produces images at a much faster rate, it is uncommon to use ray casting
for the same purpose. Most ray-tracing-based rendering methods normally
include some form of global illumination computation to get color in their
getColor method, and a ray-tracing-based rendering method is distinguished
from the others based on what is done in its getColor method. Independent of
the methods, the first step in getColor is computing the nearest visible point
along the ray originating at the camera, and then computing the color at that
visible point. We detail the exact lighting computation technique used in two
function getColor()
INPUT ray, scene
{
(t, object, intersectFlag) = raySceneIntersect(ray,scene)
if (intersectFlag==FALSE) return backgroundColor
color = black
for i=1 to #Lights
shadowRay = computeShadowRay(ray,t,object,scene,lights[i])
if (inShadow(t,ray,scene) == FALSE)
color += computeDirectLight(t,ray,scene.lights[i])
endif
endfor
if (isReflective(object)) // Interreflection support
newRay = reflect(ray,t,object)
color += object.specularReflectance ∗ getColor(newRay,scene)
endif
if (isRefractive(object)) // transparency support
newRay = refract(ray,t,object)
color += object.transparency ∗ getColor(newRay,scene)
endif
return color
}
LISTING 11.8: Algorithm for pixel color computation in classical ray-
tracing.
As we see from the algorithm, shadow, mirror reflection and transparency
are handled naturally by creating additional rays and tracing them. Unlike
in rasterization-based rendering, no complicated algorithmic effort is required
to support these features. The additional rays are termed secondary rays to
distinguish them from the rays originating at the camera, which are called
primary rays. For all the secondary rays, the origin is the point of intersection
of the primary ray with the scene, but the directions differ.
Shadow ray directions for point light sources are computed by taking the
vector difference of the shadow ray origin from the position of the light source.
This approach is also extended to area light sources by dividing the surface
of the area light source into smaller area patches, and computing the shadow
ray directions as the vector differences of the shadow ray origin from the
center of (or from randomly chosen points on) the patches.
FIGURE 11.5: Path tracing. Every time a ray hits a surface a new ray is
shot and a new path is generated.
A surface point is under shadow if an object appears between the point and the light source,
which translates into finding the ti of a shadow ray–object intersection and
checking for the inequality 0 < ti < 1. This shadow-ray–object intersection
test, together with the check on the value of t, is done inside the function inShadow. Note that
the inShadow function does not require finding the nearest point of intersection.
As soon as an intersection satisfying the condition 0 < t < 1 is found, there
is no more need to continue with the intersection of the shadow ray with the
other objects of the scene. That is why sometimes it is preferable to write an
inShadow algorithm separately from the standard nearest-point-finding ray–scene
intersection algorithm. Furthermore, an object shadowing a point is
very likely to shadow the points in the neighborhood. So caching the nearest
shadow object information and first checking for the intersection of the shadow
ray for the neighboring primary rays with the cached object has been shown
to accelerate shadow computation.
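A minimal sketch of an inShadow test written along these lines, in JavaScript; rayObjectIntersect is a hypothetical helper returning the intersection parameter with a single object (or null), and the small epsilon avoids self-intersection with the surface the shadow ray starts from:

// shadowRay.o is the surface point, shadowRay.d the (unnormalized) vector
// from the surface point to the light, so the light is at t = 1.
function inShadow(shadowRay, scene) {
  for (var i = 0; i < scene.objects.length; ++i) {
    var t = rayObjectIntersect(shadowRay, scene.objects[i]);
    if (t !== null && t > 0.0001 && t < 1.0)
      return true;  // any blocker between point and light: stop immediately
  }
  return false;
}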
Reflection and refraction of light at the nearest point along the primary ray
is handled, respectively, by setting the direction of the secondary ray to be the
mirror reflection and the refraction of the primary ray and making recursive
calls to getColor. The recursion stops when the ray does not intersect any
object in the scene, or hits an object that is non-reflective and non-refractive.
In a highly reflective/refractive closed environment the algorithm may get into
an infinite recursive loop. Though such scenes are uncommon, most algorithms
introduce a safety feature by keeping the count of recursive calls, and stopping
recursion after the recursion count reaches a certain predefined maximum
(often set at 5).
Classical ray tracing is capable of creating accurate rendering with global
illumination due to inter-reflection and inter-refraction of light in scenes with
mirror-like reflectors and fully or partly transparent objects. Classical ray
tracing has been extended to support inter-reflection in scenes with diffuse
and translucent surfaces.
function pathTraceRendering()
INPUT camera, scene
OUTPUT image
{
for row = 1 to rows
for col = 1 to cols
image[row][col] = black
for i = 1 to N
ray = getPathRay(row,col,camera)
image[row][col] += getPathColor(ray, scene)/N
endfor
endfor
endfor
}
function getPathColor()
INPUT ray, scene
{
(t, object, intersectFlag) = raySceneIntersect(ray,scene)
if (intersectFlag==FALSE) return backgroundColor
color = black
for i=1 to #Lights
shadowRay = computeShadowRay(ray, t,object,scene,lights[i])
if (inShadow(t,ray,scene) == FALSE)
color += computeDirectLight(t,ray,scene.lights[i])
endif
endfor
if (isReflective(object)) // Interreflection support
(newRay, factor) = sampleHemisphere(ray, t, object)
color += factor ∗ getPathColor(newRay, scene)
endif
return color
}
LISTING 11.9: Path tracing algorithm.
As mentioned in the early part of this section, we can see that path-tracing-
based rendering and classical rendering are very similar. The difference is in
computing the secondary reflection ray direction, which is done by Monte
function uniformHemisphereSample1()
INPUT object, p, ray
OUTPUT direction, factor
{
// Uniform sampling of the canonical hemisphere
θ = arccos(rand())
φ = 2π rand()
sample_d = (sin(θ) cos(φ), sin(θ) sin(φ), cos(θ))^T
// Let T, B, N be the unit tangent, bitangent, and
// normal vectors at the object point p
(direction, factor) = ([T B N] ∗ sample_d, 2π cos(θ) object.brdf(ray.d, sample_d)) // Rotation
}
LISTING 11.10: Algorithm for uniformly sampling an arbitrarily oriented
unit hemisphere.
Another option would be to sample the whole sphere, and discard all those
vectors that make an angle greater than 90 degrees with the normal to the
surface at the point of interest. In fact the angle checking can be done by
checking for the sign of the dot product of the sample vector with the normal,
and accepting only those sample vectors that have a positive dot product.
As half of the directions are discarded this method may not be considered as
function uniformHemisphereSample2()
INPUT object, p, ray
OUTPUT direction, factor
{
Let N be the unit normal vector at the object point p
while(TRUE)
θ = arccos(1−2∗rand())
φ = 2π rand()
sample_d = (sin(θ) cos(φ), sin(θ) sin(φ), cos(θ))^T
// Uniform sampling of the whole sphere
if (dot(N, sample_d) > 0)
(direction, factor) = (sample_d, 2π dot(N, sample_d) object.brdf(ray.d, sample_d))
return // accept the sample and stop
endif
endwhile
}
LISTING 11.11: Rejection-based algorithm for uniformly sampling
an arbitrarily oriented unit hemisphere.
Cosine-Based Importance Sampling of Hemisphere: In this ap-
proach, sampled directions have a cosine θ distribution on the hemisphere
around the normal, which means the density of the samples is maximum
close to the normal and falls off according to the cosine function away
from it. Cosine-sampled directions are preferred over uniformly sampled
directions in Monte Carlo lighting computation, mostly because
the color contributions brought in by the rays traced along directions away
from the normal are reduced by a factor of cos θ. So it is considered better to
sample according to the cosine distribution and skip the explicit factor than to
attenuate the color contribution by a factor of cos θ. As the distribution is very
much dependent on the angle the sampled direction makes with the normal
vector, we cannot use a full sphere sampling method here. Listing 11.12 shows
a popular cosine sampling method.
function cosineHemisphereSample()
INPUT object, p, ray
OUTPUT direction, factor
{
// Cosine-weighted sampling of the canonical hemisphere
θ = arcsin(√rand())
φ = 2π rand()
sample_d = (sin(θ) cos(φ), sin(θ) sin(φ), cos(θ))^T
// Let T, B, N be the unit tangent, bitangent, and
// normal vectors at the object point p
(direction, factor) = ([T B N] ∗ sample_d, π object.brdf(ray.d, sample_d)) // Rotation
}
LISTING 11.12: Algorithm for cosine importance sampling a hemisphere.
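The following JavaScript sketch implements the same cosine sampling, including the construction of the tangent frame [T B N] that the listing assumes to be available (the code is illustrative; brdf is assumed to be a function of the incoming and outgoing directions):

// Returns a cosine-distributed direction around the unit normal N,
// together with the Monte Carlo factor of Listing 11.12.
function cosineHemisphereSample(N, brdf, incomingDir) {
  var theta = Math.asin(Math.sqrt(Math.random()));
  var phi = 2.0 * Math.PI * Math.random();
  var s = [Math.sin(theta) * Math.cos(phi),
           Math.sin(theta) * Math.sin(phi),
           Math.cos(theta)];                      // canonical hemisphere
  // build the frame: any vector not parallel to N works as a starting point
  var a = Math.abs(N[0]) > 0.9 ? [0, 1, 0] : [1, 0, 0];
  var T = normalize(cross(a, N));
  var B = cross(N, T);
  var dir = [T[0]*s[0] + B[0]*s[1] + N[0]*s[2],   // rotation [T B N] * s
             T[1]*s[0] + B[1]*s[1] + N[1]*s[2],
             T[2]*s[0] + B[2]*s[1] + N[2]*s[2]];
  return { direction: dir, factor: Math.PI * brdf(incomingDir, dir) };
}
function cross(a, b) {
  return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]];
}
function normalize(v) {
  var l = Math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
  return [v[0]/l, v[1]/l, v[2]/l];
}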
requires tracing hundreds of millions of photons, and even with such high
numbers it is not uncommon to see noise in the parts that are indirectly visi-
ble to the light source. A lot of recent research has been devoted to increasing
the efficiency and quality of photon-traced rendering.
11.2.2 Radiosity
The radiosity method was introduced in 1984 with the goal to compute
inter-reflection of light in diffuse scenes. We have described the term radiosity
(conventional symbol B) in Section 6.2, as an alternate term for irradiance
and exitance. In the context of global lighting computation the term radiosity
refers to the method used to compute equilibrium irradiance in a scene. In
the radiosity method the propagation of light is modeled as a linear system
of equations relating the radiosity of each surface in the scene to the radiosity
of all other surfaces in the scene. The equilibrium radiosity over the surfaces
are computed by solving this linear system. In the next section we develop
this linear system and describe the various solution methods for solving this
system. Finally we describe methods for realistic rendering of scenes with a
precomputed radiosity solution.
The radiosity method is based on two important assumptions:
1. It is possible to break down the surfaces of any scene into smaller patches
in such a way that radiosity is uniform over the patch. So it is assumed
that surfaces of the scene have been broken down into a number of such
uniform radiosity patches.
2. The surface patches are flat and are diffusely reflecting in nature.
where angle θi (θj ) is the angle formed by the line connecting the two patches
dAi and dAj and the normal at patch dAi (dAj ), VdAi,dAj and RdAi,dAj are
respectively the visibility and distance between the two differential patches.
The visibility term is binary in nature, and takes value one or zero depending
on whether the differential areas dAi , dAj are visible to each other or not.
We assumed that the radiance of the patches is constant over the surface
patch, and surfaces are diffuse. Under these conditions the flux and radiance
are related by expression L = Φ/(πA). So we will use this relation to replace
Li , the radiance of patch i, by Φi . Next we compute an expression for the
flux received by the total surface j due to light emitted from surface i. Here
we use the relation between irradiance and flux: dΦ = EdA and integrate the
differential flux over the whole area to get the following equation.
$\Phi_{i \to j} = \int_{A_j} E_j\, dA_j = \frac{\Phi_i}{\pi A_i} \int_{A_j} \int_{A_i} \cos\theta_i \cos\theta_j\, V_{dA_i,dA_j}\, \frac{dA_i\, dA_j}{R^2_{dA_i,dA_j}}$    (11.18)

Now we can write an expression of the fraction of the flux emitted by patch i
that reached patch j as

$F_{i \to j} = \Phi_{i \to j} / \Phi_i = \frac{1}{\pi A_i} \int_{A_j} \int_{A_i} \cos\theta_i \cos\theta_j\, V_{dA_i,dA_j}\, \frac{dA_i\, dA_j}{R^2_{dA_i,dA_j}}$    (11.19)
The expression of this fraction contains terms that depend on the geome-
try and orientation and is independent of any lighting characteristics of the
We notice that both the fractions are similar except for the area term in
the denominator of the right hand side. That means the fractions are related
to each other and the relation is: Ai Fi→j = Aj Fj→i . This relation becomes
useful in the derivation of the radiosity transport equation, and also is useful
during form factor computation because if we know Fi→j then we can get
Fj→i and vice versa by simply applying the relation.
Next we use the relation between form factors to replace Fj→i Aj /Ai by Fi→j
to get:
$B_i = B_i^e + \rho_i \sum_{j=1}^{N} F_{i \to j} B_j$.    (11.24)
This equation is called the radiosity transport equation. Note that both the
transport equations are very similar. The only difference is the order in which
the patch indices are specified in each equation’s form factor term. In the flux
transport equation the order is from patch j to patch i, and in the radiosity
transport equation it is the other way around.
Both the transport equations are written for a patch i and are valid for
every patch i from 1 to N. So if we write expressions for all the is, then we get
a system of linear equations that expresses flux of every patch in the scene in
terms of flux of all other patches. We noticed earlier that form factor Fi→j (or
Fj→i ) depends on the orientation and location of the patches in the scene, and
hence can be computed independent of the lighting in the scene. So we should
be able to solve this linear system to get the equilibrium flux in the scene
due to inter-reflection of the emitted light. Before we describe the radiosity
solution methods, we describe a couple of methods for computing form factors.
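To anticipate how the solution step looks once the form factors are available, here is a minimal Jacobi-style iteration over Equation (11.24), written in JavaScript as a sketch (it is not the book's code; the gathering and shooting variants discussed later refine this basic scheme):

// B_e: emitted radiosities, rho: reflectances, F[i][j]: form factor F_{i->j}.
// Repeatedly applies B_i = B_e_i + rho_i * sum_j F_{i->j} B_j (Eq. 11.24).
function solveRadiosity(B_e, rho, F, iterations) {
  var N = B_e.length;
  var B = B_e.slice();                 // start from the emitted light only
  for (var it = 0; it < iterations; ++it) {
    var Bnew = new Array(N);
    for (var i = 0; i < N; ++i) {
      var gathered = 0.0;
      for (var j = 0; j < N; ++j) gathered += F[i][j] * B[j];
      Bnew[i] = B_e[i] + rho[i] * gathered;
    }
    B = Bnew;
  }
  return B;
}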
Fi→j = 0
for N times
Let pi be a randomly sampled point on Patch i
Let ni be the normal to the patch at pi
Let pj be a randomly sampled point on Patch j
Let nj be the normal to the patch at pj
d = pj − pi
R = |d|
shadowRay = (pi , d)
V = inShadow(shadowRay, scene) ? 0 : 1
∆F = dot(d/R, ni ) dot(−d/R, nj ) (V /(πR^2 ))
Fi→j += (∆F/N )
endfor
Fj→i = Fi→j (Ai /Aj )
LISTING 11.13: Algorithm for computing form factor between two
patches using Monte Carlo sampling.
As we notice here, we compute visibility between two points using the
inShadow function described in our ray tracing section. So the algorithm is
relatively straightforward. However, we must note that visibility by ray tracing
may require full ray-scene intersection, and the form factor must be computed
between every pair of the patches, so this method can be expensive.
Hemisphere/Hemicube Method: This is a class of methods that com-
putes form factor between a patch and the rest of the patches in the scene.
They are approximate in nature, because the computation is carried out at
only one point (mostly the center point) of the patch. So the form factor of
the patch is approximated to be the same as the form factor of the differential
surface around the point and rest of the scene. As the receiver surface is a dif-
ferential patch the form factor equation is simplified from a double integration
to a single integration.
$F_{j \to i} \approx F_{j \to dA_i} = \int_{A_j} \cos\theta_i \cos\theta_j\, V_{dA_i,dA_j}\, \frac{dA_j}{R^2_{dA_i,dA_j}}$    (11.25)
Furthermore, patches emitting towards another must all lie on the upper hemi-
sphere of the receiving patch. So if we are interested in computing form factor
between all those emitting patches, then we should in principle be integrating
only over the surface of the upper hemisphere. We write here an expression
for such a form factor.
$F_{H_j \to i} \approx \int_{H_j} \cos\theta_i\, V_{dA_i,d\omega_j}\, d\omega_j \approx \sum_k \cos\theta_k\, V_{i,k}\, \Delta\omega_k = \sum_k V_{i,k}\, factor_k$    (11.26)
where Hj represents the area of patch j visible to patch i through the unit hemisphere
over patch i, and dωj replaces $\cos\theta_j\, dA_j / R^2_{dA_i,dA_j}$. For a unit hemisphere,
R = 1 and cos θj = 1 as well. So the differential areas over the hemisphere
itself represent differential solid angles. However, it is not so when we replace
hemisphere by hemicube in the numerical method discussed below. The inter-
pretation of the visibility term is slightly different here: it represents whether
the patch j is visible through the differential solid angle. The summation terms
in the equations approximate the integration over Hj by the summation of
discrete subdivisions of the hemisphere, Vi,k is the visibility of patch j to patch
i through the solid angle k. The term f actork represents the analytical form
factor of the portion of hemispherical surface area subtended by the k-th dis-
crete solid angle. It is ∆ωj times the cosine of the direction through the k-th
solid angle, and can be computed a priori for specific discretizations. This ap-
proach of form factor computation requires that we find the visibility of patch
j through each discrete subdivision of the hemisphere. Instead of finding the
discrete subdivisions through which a patch j is visible, if we find the patch j
visible through each discrete subdivision k of the hemisphere then this latter
becomes the already discussed problem of nearest object finding along a ray.
And we can write an expression for computing only a fraction of the form
factor for a part of the patch j visible through the discrete solid angle as:
$\Delta F_{j \to i} \approx \Delta F_{H_j \to i} = factor_k$    (11.27)
and compute the form-factor for the whole patch j as $F_{j \to i} = \sum_k \Delta F_{j \to i}$. Using
these formulations we can write the algorithm for computing the form factor
between the patches as given in Listing 11.14.
for all j
Fj→i = 0
endfor
method used to be much faster compared to ray tracing, and hence was one
of the earliest acceleration methods proposed for form factor computation.
form factor computation. Note that the form-factor matrix required N 2 stor-
age and a similar order of computation effort. However, we may notice that
the iterative solution proceeds with one patch (the brightest one) at a time.
So one may choose to compute only the form-factors from this patch to all
other patches using a hemisphere or hemicube-based method, and avoid the
N 2 computation and storage. However, this approach requires a small modi-
fication to the ∆B computation step of the shooting algorithm. The modified
step is

$\Delta B = \rho_i \sum_{j=1}^{N} B_j F_{j \to i}\, A_j / A_i$.    (11.28)
One disadvantage to using each and every patch for lighting computation
during the gathering process is that in a complex scene only a small fraction of
the scene patches are visible from any point in the scene. The contribution of
invisible surface patches to the illumination of the point is zero and thus any
computational effort spent in gathering light from such patches is wasteful. A
hemi-cube-based method removes this problem. In this method a virtual unit
hemi-cube is set up over the point of interest. Using the hardware Z-buffer
rendering method, the scene is projected onto the faces of this hemicube. The
Z-buffer algorithm eliminates invisible surfaces. Each pixel on the hemicube
represents a small piece of a visible surface patch and hence can be considered
as a light source illuminating the surface point at the center of the hemi-cube.
If the pixel is sufficiently small then the direct lighting due to the pixel can
be approximated by Bj Fpixel where Bj is the radiosity of the surface patch
j projected on the pixel, $F_{pixel} = \frac{\cos\theta_1 \cos\theta_2}{\pi r_{12}^2} \Delta A$, ∆A is the area of the pixel
and is mostly the same for all hemicube pixels.
The cosine terms and the r term in the Fpixel expression depend on the
hemicube face on which the pixel lies and the coordinates of the pixel. For the
top face, whose pixel coordinate is (x, y, 1), $F_{pixel} = \frac{1}{\pi(x^2+y^2+1)^2} \Delta A$. For the
side faces, the pixel coordinate is (±1, y, z) and $F_{pixel} = \frac{z}{\pi(1+y^2+z^2)^2} \Delta A$. For
both the front and back side the expression with pixel coordinate (x, ±1, z)
is $\Delta F_{pixel} = \frac{z}{\pi(x^2+1+z^2)^2} \Delta A$. In the gathering-based computation, only ra-
We have already seen in Section 2.5.1 how the framework is structured and
showed a few examples of access to the element of the scene. Here we will give
a more detailed description of the framework to serve as reference.
NVMC.Race.prototype = { ...
get bbox : function ...
get track : function ...
get tunnels : function ...
get arealights : function ...
get lamps : function ...
get trees : function ...
get buildings : function ...
get weather : function ...
get startPosition : function ...
get observerPosition : function ...
get photoPosition : function ...
};
The class Race contains all the elements of the scene except the partici-
pants in the race (that is, the cars). These elements are: the bounding box
of the scene, track, tunnels, trees, buildings, area lights, lamps and weather.
Obviously more elements could be considered in the scene (for example, people,
flags, etc.), but our intention was not to develop a video game but to learn
computer graphics, so we only introduced the elements essential to implement the
techniques explained.
Bounding box
race.bbox = [];
bbox is an array of six floating-point values containing the minimum and maximum
corners of the bounding box of the scene: [minx,miny,minz,maxx,maxy,maxz].
It is guaranteed that every element of the scene lies inside this bounding box.
Track
Tunnels
race.tunnels = [];
Tunnel.prototype = {...
get leftSideAt : function ...
get rightSideAt : function ...
get pointsCount : function ...
get height : function ...
};
A tunnel is described just like a track, with one more member value indicating
the height of the entire tunnel.
Buildings
race.buildings = [];
Building.prototype = {...
get positionAt : function ...
get pointsCount : function ...
get heightAt : function ...
};
Trees
race.trees = [];
Tree.prototype = {...
get position : function ...
get height : function ...
};
Lamps
race.lamps = [];
Lamp.prototype = {...
get position : function ...
get height : function ...
};
A streetlamp, just like a tree, is described with its position on the ground and
its height.
Area lights
race.arealights = []
AreaLight.prototype = {...
get frame : function ...
get size : function ...
get color : function ...
};
Weather
Initial Positions
The last three members of the object Race are used for initialization
purposes and are: the car’s starting point position (startPosition), the pho-
tographer position (photoPosition) and the observerCamera starting point
(observerPosition).
A.2 Players
A player corresponds to a car and it has a PhysicsStaticState and a Physics-
DynamicState.
358 Introduction to Computer Graphics: A Practical Learning Approach
PhysicsStaticState.prototype ={
get mass : function ...
get forwardForce : function ...
get backwardForce : function ...
get brakingFriction : function ...
get linearFriction : function ...
};
PhysicsDynamicState.prototype = {
get position : function ...
get orientation : function ...
get frame : function ...
get linearVelocity : function ...
get angularVelocity : function ...
get linearAcceleration : function ...
};
Appendix B
Properties of Vector Products
This appendix covers the basic properties of vector products and their geo-
metric interpretation.
This equation is a pure algebraic definition where the term vector is intended
as a sequence of numbers. When we deal with a geometric interpretation where
the vectors are entities characterized by a magnitude and a direction we can
write:
a · b = kakkbk cos θ (B.2)
where θ is the angle formed by the two vectors. Equation (B.2) tells us a few
important things.
One of these things is that two non-zero-length vectors are perpendicular
to each other if and only if their dot product is 0. This is easy to verify: since
their length is not 0, the only condition for the dot product to be 0 is that
the cosine term is 0, which means θ = ±π/2. This notion also gives us a way
to find a non-zero vector perpendicular to a given one [see Figure B.1 (Left)]:
[. . . , ai , . . . , aj , . . .] · [0, . . . , 0, aj , 0, . . . , 0, −ai , 0, . . .] =
= ai aj − aj ai = 0
FIGURE B.1: Dot product. (Left) a′ and a″ are built from a by swapping
the coordinates and negating one of the two. (Right) Length of the projection
of b on the vector a.
the projection of b on a:

$l = b \cdot \frac{a}{\|a\|}$

as shown in Figure B.1 (Right).
The dot product fulfills the following properties:
• Commutative: a · b = b · a
• Distributive over vector addition: a · (b + c) = a · b + a · c
where i, j and k are interpreted as the three axes of the coordinate frame
where the vectors are expressed.
FIGURE B.2: Cross product. (Top-Left) The cross product of two vectors
is perpendicular to both and its magnitude is equal to the area of the paral-
lelogram built on the two vectors. (Top-Right) The cross product to compute
the normal of a triangle. (Bottom) The cross product to find the orientation
of three points on the XY plane.
Like for the dot product, we have a geometric interpretation of the cross
product:
ka × bk = kakkbk sin θ (B.3)
If we fix the lengths of two vectors, the magnitude of their cross product is maximum
when they are orthogonal, and in that case it equals the product of their lengths.
Two non-zero-length vectors are collinear if and only if their cross product
is 0. This is easy to verify: since their length is not 0, the only condition for
their cross product to be 0 is that the sine term is 0, which means θ = 0 or θ = π.
The cross product is typically used to find the normal of a triangular face.
Given a triangle T = (v0 , v1 , v2 ), we may define:
a = v1 − v0
b = v2 − v0
and hence:
N=a×b
Note that kNk = 2Area(T ) so that the magnitude of the normal corresponds
to the double of the area of the triangle (we used this property in Section 5.1.3
for expressing the barycentric coordinates).
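A small JavaScript example of this use of the cross product:

// Unit face normal and area of the triangle (v0, v1, v2).
function triangleNormalAndArea(v0, v1, v2) {
  var ax = v1[0] - v0[0], ay = v1[1] - v0[1], az = v1[2] - v0[2];  // a = v1 - v0
  var bx = v2[0] - v0[0], by = v2[1] - v0[1], bz = v2[2] - v0[2];  // b = v2 - v0
  var Nx = ay * bz - az * by,                                      // N = a x b
      Ny = az * bx - ax * bz,
      Nz = ax * by - ay * bx;
  var len = Math.sqrt(Nx * Nx + Ny * Ny + Nz * Nz);                // ||N|| = 2 Area(T)
  return { normal: [Nx / len, Ny / len, Nz / len], area: 0.5 * len };
}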
One of the most used properties of cross product is antisymmetry, that is:
a × b = −b × a
[1] Tomas Akenine-Möller, Eric Haines, and Natty Hoffman. Real-Time Ren-
dering, 3rd edition. AK Peters, Ltd., Natick, MA, USA, 2008.
[2] B. G. Baumgart. A polyhedron representation for computer vision. In
Proc. AFIPS National Computer Conference, volume 44, pages 589–596,
1975.
[3] Louis Bavoil, Miguel Sainz, and Rouslan Dimitrov. Image-space horizon-
based ambient occlusion. In ACM SIGGRAPH 2008 talks, SIGGRAPH
’08, pages 22:1–22:1, New York, NY, USA, 2008. ACM.
[4] James F. Blinn. Models of light reflection for computer synthesized pic-
tures. In Proceedings of the 4th annual conference on computer graphics
and interactive techniques, SIGGRAPH ’77, pages 192–198, New York,
NY, USA, 1977. ACM.
[5] R. L. Cook and K. E. Torrance. A reflectance model for computer graph-
ics. ACM Trans. Graph., 1:7–24, January 1982.
[6] Xavier Décoret, Frédo Durand, François X. Sillion, and Julie Dorsey.
Billboard clouds for extreme model simplification. In ACM SIGGRAPH
2003 Papers, SIGGRAPH ’03, pages 689–696, New York, NY, USA, 2003.
ACM.
[7] Rouslan Dimitrov, Louis Bavoil, and Miguel Sainz. Horizon-split ambi-
ent occlusion. In Proceedings of the 2008 symposium on interactive 3D
graphics and games, I3D ’08, pages 5:1–5:1, New York, NY, USA, 2008.
ACM.
[8] Nira Dyn, David Levin, and John A. Gregory. A 4-point interpolatory
subdivision scheme for curve design. Computer Aided Geometric Design,
4(4):257–268, 1987.
[9] Philip Dutre, Kavita Bala, Philippe Bekaert, and Peter Shirley. Advanced
Global Illumination. AK Peters Ltd, Natick, MA, USA, 2006.
[10] N. Dyn, D. Levin, and J. A. Gregory. A butterfly subdivision scheme for
surface interpolation with tension control. ACM Transaction on Graph-
ics, 9(2):160–169, 1990.
363
364 Bibliography
[11] Gerald Farin. Curves and Surfaces for CAGD. A Practical Guide, 5th
edition, AK Peters, Ltd, Natick, MA, USA, 2001.
[12] Michael S. Floater and Kai Hormann. Surface parameterization: a tuto-
rial and survey. In Neil A. Dodgson, Michael S. Floater, and Malcolm
A. Sabin, editors, Advances in Multiresolution for Geometric Modelling,
Mathematics and Visualization, pages 157–186. Springer, Berlin Heidel-
berg, 2005.
[13] Leonidas Guibas and Jorge Stolfi. Primitives for the manipulation of gen-
eral subdivisions and the computation of Voronoi diagrams. ACM Trans. Graph.,
4(2):74–123, 1985.
[14] INRIA Alice: Geometry and Light. Graphite. alice.loria.fr/
software/graphite.
[15] Henrik Wann Jensen. Realistic image synthesis using photon mapping.
A. K. Peters, Ltd., Natick, MA, USA, 2001.
[16] Lutz Kettner. Using generic programming for designing a data structure
for polyhedral surfaces. Comput. Geom. Theory Appl, 13:65–90, 1999.
[17] Khronos Group. OpenGL—The Industry's Foundation for High Perfor-
mance Graphics. https://www.khronos.org/opengl, 2013. [Accessed
July 2013].
[18] Khronos Group. WebGL—OpenGL ES 2.0 for the Web. http://www.
khronos.org/webgl. [Accessed July 2013].
[19] Khronos Group. Khronos Group—Connecting Software to Silicon.
http://www.khronos.org, 2013. [Accessed July 2013].
[20] Leif P. Kobbelt, Mario Botsch, Ulrich Schwanecke, and Hans-Peter Sei-
del. Feature sensitive surface extraction from volume data. In Proceedings
of the 28th annual conference on computer graphics and interactive tech-
niques, pages 57–66. ACM Press, 2001.
[21] M. S. Langer and H. H. Bülthoff. Depth discrimination from shading
under diffuse lighting. Perception, 29:649–660, 2000.
[22] M. Langford. Advanced Photography: A Grammar of Techniques. Focal
Press, 1974.
[23] J. Lengyel. The convergence of graphics and vision. IEEE Computer,
31(7):46–53, 1998.
[24] Duoduo Liao. GPU-Based Real-Time Solid Voxelization for Volume
Graphics. VDM Verlag, 2009.
[25] Charles Loop. Smooth subdivision surfaces based on triangles. Master’s
thesis, University of Utah, Department of Mathematics, 1987.
Bibliography 365
[26] William E. Lorensen and Harvey E. Cline. Marching cubes: A high reso-
lution 3D surface construction algorithm. In Proceedings of the 14th an-
nual conference on computer graphics and interactive techniques, pages
163–169. ACM Press, 1987.
[27] Paulo W. C. Maciel and Peter Shirley. Visual navigation of large envi-
ronments using textured clusters. In 1995 Symposium on Interactive 3D
Graphics, pages 95–102, 1995.
[33] Jingliang Peng, Chang-Su Kim, and C. C. Jay Kuo. Technologies for
3d mesh compression: A survey. J. Vis. Comun. Image Represent.,
16(6):688–733, December 2005.
[34] Matt Pharr and Greg Humphreys. Physically Based Rendering: From
Theory to Implementation, 2nd edition, Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA, 2010.
[35] Bui Tuong Phong. Illumination for computer generated pictures. Com-
mun. ACM, 18:311–317, June 1975.
[36] R. Rashed. A pioneer in anaclastics: Ibn Sahl on burning mirrors and
lenses. Isis, 81:464–491, 1990.
[37] Erik Reinhard, Erum Arif Khan, Ahmet Oguz Akyüz, and Garrett M.
Johnson. Color Imaging: Fundamentals and Applications. AK Peters,
Ltd., Natick, MA, USA, 2008.
[38] Alla Sheffer, Bruno Lévy, Maxim Mogilnitsky, and Alexander Bogomyakov.
ABF++: Fast and robust angle based flattening. ACM
Transactions on Graphics, 2005.
366 Bibliography
[39] Marco Tarini. Improving technology for the acquisition and interactive
rendering of real world objects. Università degli Studi di Pisa, Pisa, Italy,
2003.
[40] Eric W. Weisstein. Quadratic surface from MathWorld –a Wolfram Web
resource. http://mathworld.wolfram.com/QuadraticSurface.html,
2013. [Accessed July 2013].
FIGURE 5.18: Adding the view from inside. Blending is used for the up-
per part of the windshield. (See client http://envymycarbook.com/chapter5/0/
0.html.)
FIGURE 6.9: Global illumination effects. Shadows, caustics and color bleed-
ing. (Courtesy of Francesco Banterle http://www.banterle.com/francesco.)
FIGURE 6.15: Scene illuminated with directional light. (See client http://
envymycarbook.com/chapter6/0/0.html.)
FIGURE 6.16: Adding point light for the lamps. (See client http://
envymycarbook.com/chapter6/1/1.html.)
FIGURE 7.9: Mipmapping at work. In this picture, false colors are used to
show the mipmap level used for each fragment.
FIGURE 7.13: Basic texturing. (See client http://envymycarbook.com/
chapter7/0/0.html.)
FIGURE 7.15: Using render to texture for implementing the rear mirror.
(See client http://envymycarbook.com/chapter7/1/1.html.)
FIGURE 7.19: Adding the reflection mapping. (See client http://
envymycarbook.com/chapter7/4/4.html.)
FIGURE 7.23: An example of how a normal map may appear if opened with
an image viewer.
FIGURE 7.26: (Top) An extremely trivial way to unwrap a mesh: g is
continuous only inside the triangle. (Bottom) Problems with filtering due to
discontinuities.
FIGURE 9.6: (Left) Positions of the lens flare in screen space. (Right) Ex-
amples of textures used to simulate the effect.
FIGURE 9.7: A client with the lens flare in effect. (See client http://
envymycarbook.com/chapter7/4/4.html.)
FIGURE 9.10: Billboard cloud example from the paper [6]. (Courtesy of
the authors). (Left) The original model and a set of polygons resembling it.
(Right) The texture resulting from the projections of the original model on
the billboards.
FIGURE 9.11: Snapshot of the client using billboard clouds for the trees.
(See client http://envymycarbook.com/chapter9/3/3.html.)
FIGURE 10.9: (Left) Original image. (Center) Prewitt filter. (Right) Sobel
filter.
FIGURE 10.10: Toon shading client. (See client http://envymycarbook.com/
chapter10/1/1.html.)
FIGURE 10.14: (Left) Original image. (Right) Image after unsharp mask-
ing. The Ismooth image is the one depicted in Figure 10.5; λ is set to 0.6.
FABIO GANOVELLI
MASSIMILIANO CORSINI
SUMANTA PATTANAIK
MARCO DI BENEDETTO
Features
• Puts computer graphics theory into practice by developing an
interactive video game
• Enables you to experiment with the concepts in a practical setting
• Uses WebGL for code examples, with the code available online
• Requires knowledge of general programming and basic notions of HTML
and JavaScript
ISBN: 978-1-4398-5279-8