New Methods For Surface Reconstruction From Range Images
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
I certify that I have read this dissertation and that in my opinion it is fully adequate,
in scope and in quality, as a dissertation for the degree of Doctor of Philosophy.
I certify that I have read this dissertation and that in my opinion it is fully adequate,
in scope and in quality, as a dissertation for the degree of Doctor of Philosophy.
Pat Hanrahan
I certify that I have read this dissertation and that in my opinion it is fully adequate,
in scope and in quality, as a dissertation for the degree of Doctor of Philosophy.
David Heeger
To Jelena
Abstract
The digitization and reconstruction of 3D shapes has numerous applications in areas that
include manufacturing, virtual simulation, science, medicine, and consumer marketing. In
this thesis, we address the problem of acquiring accurate range data through optical triangulation, and we present a method for reconstructing surfaces from sets of data known as
range images.
The standard methods for extracting range data from optical triangulation scanners are
accurate only for planar objects of uniform reflectance. With these methods, curved surfaces, discontinuous surfaces, and surfaces of varying reflectance cause systematic distortions of the range data. We present a new ranging method based on analysis of the time
evolution of the structured light reflections. Using this spacetime analysis, we can correct
for each of these artifacts, thereby attaining significantly higher accuracy using existing
technology. When using coherent illumination such as lasers, however, we show that laser
speckle places a fundamental limit on accuracy for both traditional and spacetime triangulation.
The range data acquired by 3D digitizers such as optical triangulation scanners commonly consists of depths sampled on a regular grid, a sample set known as a range image. A
number of techniques have been developed for reconstructing surfaces by integrating groups
of aligned range images. A desirable set of properties for such algorithms includes: incremental updating, representation of directional uncertainty, the ability to fill gaps in the reconstruction, and robustness in the presence of outliers and distortions. Prior algorithms
possess subsets of these properties. In this thesis, we present an efficient volumetric method
for merging range images that possesses all of these properties. Using this method, we are
able to merge a large number of range images (as many as 70) yielding seamless, high-detail
models of up to 2.6 million triangles.
Acknowledgments
I would like to thank the many people who contributed in one way or another to this thesis. I owe a great debt to my adviser, Marc Levoy, who inspired me to enter the exciting
arena of computer graphics. The road to a Ph.D. is usually a bumpy one, and mine was no
exception. In the end, Marc's unerring intuitions and infectious enthusiasm sent me down
a very rewarding path. I would also like to thank the members of my reading committee, Pat Hanrahan and David Heeger, for their interest in this work. Early discussions with Pat while
developing the ideas about space carving in Chapter 6 were especially helpful.
Next, I would like to thank my colleagues in the Stanford Computer Graphics Laboratory. As members of the 3D Fax Project, Greg Turk, Venkat Krishnamurthy, and Brian Freyburger have been particularly supportive. Brian developed the ideas for space carving with
triangulation scanners (Section 6.4). In addition, discussions with Phil Lacroute were crucial to making the volumetric algorithm efficient (Section 7.2). Homan Igehy wrote the fast
rasterizer used for range image resampling (Section 7.2), and Matt Pharr wrote the accessibility shader used to create the rendering in Figure 8.6b. Bill Lorensen provided marching
cubes tables and mesh decimation software and managed the production of the 3D hardcopy.
Special thanks go to Charlie Orgish and John Gerth for maintaining sanity among the
machines (and therefore the students) in the graphics laboratory. Sarah Bezier and Ada
Glucksman cheerfully shielded me from the inner workings of University bureaucracy.
Cyberware Laboratories provided the scanner used to collect data in this thesis. I am
particularly grateful to David Addleman and George Dabrowski at Cyberware for their assistance and encouragement. Funding for this work came from the National Science Foundation under contract CCR-9157767 and from Interval Research Corporation.
I would like to thank my parents, Richard and Diane, my brother, Chris, and my sisters,
Ann and Laura. With their support, I knew I would persevere through the years. Finally,
my deepest gratitude goes to my wife, Jelena. She calmed me in moments of panic and had
faith in me during times of doubt. I dedicate this thesis to her.
Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Applications
    1.1.1 Reverse engineering
    1.1.2 Collaborative design
    1.1.3 Inspection
    1.1.6 Medicine
    1.1.7 Home shopping
  1.3.1 Range images
  1.3.2 Surface reconstruction
  1.5 Contributions
  1.6 Organization
2 Optical triangulation: limitations and prior work
  2.1 Triangulation Configurations
    2.1.2 Type of illuminant
    2.1.3 Sensor
    2.1.4 Scanning method
  2.2.1 Geometric intuition
3 Spacetime Analysis
  3.1 Geometric intuition
  3.2 A complete derivation
  3.5.5 Improving resolution
  4.1 Hardware
  4.3 Results
    4.3.1 Reflectance correction
    4.3.2 Shape correction
    4.3.3 Speckle
    4.3.4 Complex objects
  5.3 A probabilistic model
  5.6 Calculus of variations
  5.7 A minimization solution
  5.8 Discussion
  7.4 Asymptotic Complexity
    7.4.1 Storage
    7.4.2 Computation
  8.3 Results
9 Conclusion
B Stereolithography
Bibliography

List of Tables

List of Figures
  4.9 Reflectance card
  6.15 Using the MIN() function to merge opposing distance ramps
  6.16 Limitations due to sharp corners
  6.17 Intersections of viewing shafts for space carving
  6.18 Protrusion generated during space carving
  6.19 Errors in topological type due to space carving
  6.20 Aggressive strategies for space carving using a range sensor with a single line of sight per sample
  6.21 Conservative strategy for space carving with a triangulation range sensor
  6.22 Aggressive strategies for space carving with a triangulation range sensor
  7.1 Overview of range image resampling and scanline order voxel updates
  8.5 From the original to a 3D hardcopy of the "Happy Buddha", Part I of II
  8.6 From the original to a 3D hardcopy of the "Happy Buddha", Part II of II
Chapter 1
Introduction
Methods to digitize and reconstruct the shapes of complex three dimensional objects have
evolved rapidly in recent years. The speed and accuracy of digitizing technologies owe
much to advances in the areas of physics and electrical engineering, including the development of lasers, CCDs, and high speed sampling and timing circuitry. Such technologies
allow us to take detailed shape measurements with precision better than 1 part per 1000 at
rates exceeding 10,000 samples per second. To capture the complete shape of an object,
many thousands, sometimes millions of samples must be acquired. The resulting mass of
data requires algorithms that can efficiently and reliably generate computer models from
these samples.
In this thesis, we address methods of both digitizing and reconstructing the shapes of
complex objects. The first part of the thesis is concerned with a popular range scanning
method known as optical triangulation. We show that traditional approaches to optical triangulation have fundamental limitations that can be overcome with a novel method called
spacetime analysis. The second part of this thesis concerns reconstruction of surfaces from
range data. Many rangefinders, including optical triangulation scanners, can acquire regular, dense samplings called range images. We describe a new method for building complex
models from range images in a way that satisfies a number of desirable properties.
In this chapter, we describe some of the applications of 3D shape acquisition (Section 1.1) followed by a description of the variety of acquisition methods (Section 1.2). In
Section 1.3, we pose the problem of surface reconstruction from range images. In Section 1.4, we place the goals of this thesis in the context of Stanford's 3D Fax Project. In
Section 1.5, we describe the contributions of this thesis, and in the last section we outline
the remainder of the thesis.
1.1 Applications
The applications of 3D shape digitization and reconstruction are wide-ranging and include
manufacturing, virtual simulation, scientific exploration, medicine, and consumer marketing.
1.1.3 Inspection
After a manufacturer has created a computer model for a part, either by shape digitization of
a physical model or through interactive CAD design, there are a variety of options for creating the part, both as a working prototype and as a starting point for mass manufacture. Ultimately, the dimensions of the final manufactured part must fall within some tolerances of the
original computer model. In this case, shape digitization can aid in determining where and
to what extent the computer model and the shape of the actual part differ. These differences
can serve as a guide for modifying the manufacturing process until the part is acceptable.
1.1.6 Medicine
Applications of shape digitization in medicine are wide ranging as well. Prosthetics can be
custom designed when the dimensions of the patient are known to high precision. Plastic
surgeons can use the shape of an individual's face to model tissue scarring processes and
visualize the outcomes of surgery. When performing radiation treatment, a model of the
patient's shape can help guide the doctor in directing the radiation accurately.
[Taxonomy diagram: shape acquisition methods divide into contact (e.g., CMMs) and non-contact; non-contact methods into transmissive (e.g., industrial CT) and reflective; reflective methods into non-optical (microwave radar, sonar) and optical; and optical methods into imaging radar, triangulation, active stereo, interferometry (moiré, holography), and active depth from defocus.]
Coordinate Measuring Machines (CMMs) are extremely precise (and costly), and they are currently the standard
tool for making precision shape measurements in industrial manufacturing. The main drawbacks of touch probes are:

- They are slow.
- They can be clumsy to manipulate.
- They usually require a human operator.
- They must make contact with the surface, which may be undesirable for fragile objects.
Active, non-contact methods generally operate by projecting energy waves onto an object and recording the transmitted or reflected energy. A powerful transmissive
approach for shape capture is industrial computed tomography (CT). Industrial CT entails
bombarding an object with high-energy x-rays and measuring the amount of radiation that
passes through the object along various lines of sight. After back projection or Fourier projection-slice reconstruction, the result is a high-resolution volumetric description of the density of space in and around the object. This volume is suitable for direct visualization or surface reconstruction. The principal advantages of this method over reflective methods are that it
is largely insensitive to the reflective properties of the surface, and that it can capture internal cavities that are not visible from the outside. The principal disadvantages
of industrial CT scanners are:

- They are very expensive.
- Large variations in material densities (e.g., wood glued to metal) can degrade accuracy.
- They are potentially hazardous due to the high-energy radiation involved.
Among active, reflection methods for shape acquisition, we subdivide into two more
categories: non-optical and optical approaches. Non-optical approaches include sonar and
microwave radar (RAdio Detection And Ranging), which typically measure distances to objects by measuring the time required for a pulse of sound or microwave energy to bounce
back from an object. Amplitude or frequency modulated continuous energy waves can also
be used in conjunction with phase or frequency shift detectors. Sonar range sensors are
typically inexpensive, but they are also not very accurate and do not have high acquisition speeds. Microwave radar is typically intended for use with long range remote sensing,
though close range optical radar is feasible, as described below.
The last category in our taxonomy consists of active, optical reflection methods. For
these methods, light is projected onto an object in a structured manner, and by measuring the
reflections from the object, we can determine shape. In contrast to passive and non-optical
methods, many active optical rangefinders can rapidly acquire dense, highly accurate range
samplings. In addition, they are safer and less expensive than industrial CT, with the limitation that they can only acquire the optically visible portions of the surface. Several surveys
of optical rangefinding methods have appeared in the literature; the survey in [Besl 1989]
is especially comprehensive. These optical methods include imaging radar, interferometry,
active depth-from-defocus, active stereo, and triangulation.
Imaging radar is essentially microwave radar operating at optical frequencies. For large
objects, a variety of imaging radars have been demonstrated to give excellent results. For
smaller objects, on the order of one meter in size, attaining 1 part per 1000 accuracy with
time-of-flight radar requires very high speed timing circuitry, because the time differences
to be detected are in the picosecond (10^-12 second) range. A few amplitude and frequency
modulated radars have shown promise for close range distance measurements.
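The timing requirement above can be verified with a quick calculation. A back-of-the-envelope sketch (the 1 m object size and 1 part per 1000 accuracy come from the text; the speed of light is rounded):

```python
# Round-trip timing resolution needed for time-of-flight ranging.
# Resolving a depth difference dz requires resolving a time difference
# of 2*dz/c, since the pulse travels to the surface and back.
C = 3.0e8  # speed of light, m/s (rounded)

def timing_resolution(object_size_m, parts):
    """Time difference corresponding to 1 part in `parts` of the object size."""
    dz = object_size_m / parts   # required depth resolution
    return 2.0 * dz / C          # round-trip time difference

dt = timing_resolution(1.0, 1000)   # about 6.7e-12 s for a 1 m object
```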
Interferometric methods operate by projecting a spatially or temporally varying periodic
pattern onto a surface, followed by mixing the reflected light with a reference pattern. The
reference pattern demodulates the signal to reveal the variations in surface geometry. Moiré
interferometry involves the projection of coarse, spatially varying light patterns onto the
object, whereas holographic interferometry typically relies on mixing coherent illumination
with different wave vectors. Moiré methods can have phase discrimination problems when
the surface does not exhibit smooth shape variations. This difficulty usually places a limit
on the maximum slope the surface can have to avoid ranging errors. Holographic methods
typically yield range accuracy of a fraction of the light wavelength over microscopic fields
of view.
Active depth from defocus operates on the principle that the image of an object is blurred
by an amount proportional to the distance between points on the object and the in-focus
object plane. The amount of blur varies across the image plane in relation to the depths of
the imaged points. This method has evolved as both a passive and an active sensing strategy.
In the passive case, variations in surface reflectance (also called surface texture) are used to
determine the amount of blurring. Thus, the object must have surface texture covering the
whole surface in order to extract shape. Further, the quality of the shape extraction depends
on the sharpness of surface texture. Active methods avoid these limitations by projecting a
pattern of light (e.g., a checkerboard grid) onto the object. Most prior work in active depth
from defocus has yielded moderate accuracy (up to one part per 400 over the field of view
[Nayar et al. 1995]).
Active stereo uses two or more cameras to observe features on an object. If the same
feature is observed by two cameras, then the two lines of sight passing through the feature
point on each camera's image plane will intersect at a point on the object. As in the depth
from defocus method, this approach has been explored as both a passive and an active sensing strategy. Again, the active method operates by projecting light onto the object to avoid
difficulties in discerning surface texture.
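In practice, noisy rays rarely intersect exactly, so the intersection described above is usually computed as the point of closest approach of the two lines of sight. A sketch of a least-squares "midpoint" solver (illustrative only; not taken from any particular system):

```python
# Least-squares "midpoint" triangulation for active stereo: each camera
# contributes a ray (origin + t * direction); the reconstructed feature
# point is the midpoint of the shortest segment between the two rays.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate(p1, d1, p2, d2):
    """Return the point closest to both rays p1 + t1*d1 and p2 + t2*d2."""
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b            # zero only if the rays are parallel
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    q1 = [p + t1 * s for p, s in zip(p1, d1)]
    q2 = [p + t2 * s for p, s in zip(p2, d2)]
    return [(u + v) / 2.0 for u, v in zip(q1, q2)]  # midpoint of closest approach

# Two rays that actually intersect recover the feature point exactly:
pt = triangulate([0, 0, 0], [1, 1, 0], [2, 0, 0], [-1, 1, 0])  # -> (1, 1, 0)
```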
Optical triangulation is one of the most popular optical rangefinding approaches. Figure 1.2a shows a typical system configuration in two dimensions. The location of the center
of the reflected light pulse imaged on the sensor corresponds to a line of sight that intersects
the illuminant in exactly one point, yielding a depth value. The shape of the object is acquired by translating or rotating the object through the beam or by scanning the beam across
the object. Due to the finite width of the light beam, inaccuracies arise when the surface exhibits significant changes in reflectance and shape. The first part of this thesis is concerned
with describing the range accuracy limitations inherent in traditional methods, followed by
the introduction of a new method of optical triangulation, called spacetime analysis, which
substantially removes these limitations.
Figure 1.2: Optical triangulation and range imaging. (a) In 2D, a narrow laser beam illuminates a
surface, and a linear sensor images the reflection from an object. The center of the image pulse maps
to the center of the laser, yielding a range value. (b) In 3D, a laser stripe triangulation scanner first
spreads the laser beam into a sheet of light with a cylindrical lens. The CCD observes the reflected
stripe from which a depth profile is computed. The object sweeps through the field of view, yielding
a range image. Other scanner configurations rotate the object to obtain a cylindrical scan or sweep
a laser beam or stripe over a stationary object. (c) A range image obtained from the scanner in (b) is
a collection of points with regular spacing.
Each sample of a range image is the observed distance to the surface as seen along the line of sight indexed by (j, k).
The goal of surface reconstruction is to find a surface that most closely approximates the points contained in the range images.
Note that in order to merge a set of range images into a single description of an object,
it is necessary to place them all in the same coordinate system; i.e., they must be registered
or aligned with respect to each other. The alignment may arise from prior knowledge of the
pose of the rangefinder when acquiring the range image, or it may be computed using one
of a number of algorithms. We assume in this thesis that the range images are aligned to a
high degree of accuracy before we attempt to merge them.

[Footnote: Indeed, for some scanning technologies, points on the visible surface of an object may be unacquirable due to the geometry of the sensor. Optical triangulation scanners, for example, cannot acquire concavities inaccessible to a triangular probe with probe angle defined by the optical triangulation angle.]
A number of algorithms have been proposed for surface reconstruction from range images. Our experience with these algorithms has led us to develop a list of desirable properties:

- Representation of range uncertainty. The data in range images typically have asymmetric error distributions with primary directions along sensor lines of sight. The method of surface reconstruction should reflect this fact.

- Utilization of all range data, including redundant observations of each object surface. If properly used, this redundancy can reduce sensor noise.

- Incremental and order independent updating. Incremental updates allow us to obtain a reconstruction after each scan or small set of scans and to choose the next best orientation for scanning. Order independence is desirable to ensure that results are not biased by earlier scans. Together, incremental and order independent updating allows for straightforward parallelization.

- Time and space efficiency. Complex objects may require many range images in order to build a detailed model. The range images and the model must be represented efficiently and processed quickly to make the algorithm practical.

- Robustness. Outliers and systematic range distortions can create challenging situations for reconstruction algorithms. A robust algorithm needs to handle these situations without catastrophic failures such as holes in surfaces and self-intersecting surfaces.

- No restrictions on topological type. The algorithm should not assume that the object is of a particular genus. Simplifying assumptions such as "the object is homeomorphic to a sphere" yield useful results in only a restricted class of problems.

- Ability to fill holes in the reconstruction. Given a set of range images that do not completely cover the object, the surface reconstruction will necessarily be incomplete. For some objects, no amount of scanning would completely cover the object, because some surfaces may be inaccessible to the sensor. In these cases, we desire an algorithm that can automatically fill these holes with plausible surfaces, yielding a model that is both watertight and aesthetically pleasing.

Prior algorithms possess subsets of these properties. In this thesis, we present an efficient volumetric method for merging range images that possesses all of these properties.
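At its smallest scale, the volumetric method maintains at each voxel a cumulative signed distance D and weight W, folded together as a weighted running average. A 1-D sketch in the spirit of the method developed later in the thesis (weight functions and grid traversal are omitted; the surface is extracted where D crosses zero):

```python
# Incremental update of the cumulative signed distance D and weight W
# at a single voxel.  Each range observation contributes a signed
# distance d to the observed surface and a confidence weight w.
def update_voxel(D, W, d, w):
    """Fold one observation (d, w) into the running average (D, W)."""
    D_new = (W * D + w * d) / (W + w)   # weighted running average
    return D_new, W + w

# Two observations of the same voxel from different scans:
D, W = 0.0, 0.0
D, W = update_voxel(D, W, 0.4, 1.0)     # first scan
D, W = update_voxel(D, W, -0.2, 2.0)    # second scan, higher confidence
```

Because the update is a running weighted average, observations can be folded in one scan at a time (incremental) and in any order (order independent), two of the properties listed above.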
[Pipeline diagram: Cyberware scanner -> Cyberware analysis or spacetime analysis -> alignment -> aligned meshes -> zippering or volumetric grid -> seamless mesh -> mesh simplification / surface fitting -> fabrication.]
Figure 1.5: A photograph of the Cyberware Model MS 3030 optical triangulation scanner used for
acquiring range images in this thesis.
The input device for this project is an optical triangulation scanner manufactured by Cyberware Laboratories (pictured in Figure 1.5). The scanner performs triangulation in hardware using a traditional method of analysis which is prone to range errors. Alternatively,
we can apply a triangulation method called spacetime analysis to minimize these range
artifacts, as described in this thesis.
The range data obtained from the scanner is in the form of a dense range image. By
repeatedly scanning an object from different points of view, we obtain a set of range images that are not necessarily aligned with respect to one another. Using an iterative closest
point (ICP) algorithm developed in [Besl & McKay 1992] and modified for aligning partial shape
digitizations in [Turk & Levoy 1994], we transform the scans into a common coordinate
system.
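Once correspondences are fixed, the inner step of an iterative closest point alignment is a closed-form rigid fit. A 2-D sketch of that inner step (the actual algorithm of [Besl & McKay 1992] operates in 3-D and re-estimates closest-point correspondences on every iteration; the function name is illustrative):

```python
import math

# Closed-form least-squares rigid fit between corresponding 2-D point
# pairs: the core step inside each ICP iteration once correspondences
# are fixed.
def rigid_fit_2d(src, dst):
    """Rotation angle and translation mapping src onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    s = d = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - csx, y - csy, u - cdx, v - cdy
        d += x * u + y * v          # sum of dot products (cos term)
        s += x * v - y * u          # sum of cross products (sin term)
    theta = math.atan2(s, d)
    ct, st = math.cos(theta), math.sin(theta)
    tx = cdx - (ct * csx - st * csy)
    ty = cdy - (st * csx + ct * csy)
    return theta, (tx, ty)

# A scan rotated by 90 degrees and shifted is recovered exactly:
src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (2, 4), (1, 3)]   # src rotated +90 deg, then translated by (2, 3)
theta, t = rigid_fit_2d(src, dst)
```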
After alignment, the range images provide the starting point for performing a surface
reconstruction. One method for merging the range images into a single surface is called
zippering. This method operates directly on triangle meshes, and while it is well-behaved
for relatively smooth surfaces, it has been shown to fail in regions of high curvature. An
1.5 Contributions
The contributions of this thesis are two-fold. First, we describe and demonstrate a new
method for optical triangulation, known as spacetime analysis. We show that this new
method yields:
- Immunity to reflectance variations
- Immunity to shape variations
We also show that the influence of laser speckle remains a fundamental limit to the accuracy of both traditional and spacetime triangulation methods.
Second, we describe a new, volumetric algorithm for building complex models from
range images. We demonstrate the following:
- The method is incremental, order independent, represents sensor uncertainty, and behaves robustly.
- The method is optimal under a set of reasonable assumptions.
- Extending the method to qualify the emptiness of space around an object permits us to construct hole-free models.
- Through careful choice of data structures and resampling methods, the method can be made time and space efficient.
Using the volumetric algorithm to combine the range images generated with our optical
triangulation method, we have constructed the most complex computer models published
to date. In addition, because these models are hole-free, we are able to demonstrate their manufacturability using a layered manufacturing method called stereolithography.
1.6 Organization
Chapter 2 begins with a more detailed description of optical triangulation methods. We then
characterize the primary sources of error in traditional optical triangulation and conclude
with a discussion of some previous work.
Next, in Chapter 3, we develop a new method for optical triangulation called spacetime
analysis. We show how this method corrects for errors due to changes in reflectance and
shape, but is still limited in accuracy by laser speckle.
In Chapter 4, we describe an efficient implementation of the spacetime method, and we
demonstrate results using an existing optical triangulation scanner modified to allow digitization of the imaged light reflections. Portions of Chapters 2-4 are described in [Curless &
Levoy 1995].
The next four chapters concern the problem of surface reconstruction from range images. In Chapter 5, we describe some previous work in the area, and then provide a mathematical framework that motivates a new method for surface reconstruction. This new
method shows that the maximum likelihood surface is an isosurface of a function in 3-space.
In Chapter 6, we develop a new volumetric algorithm for surface reconstruction from
range images. This algorithm utilizes the mathematical framework described in the previous chapter and extends it to allow description of the occupancy of space around the surface.
This extension leads to a method for generating models without boundaries. In addition, we
discuss some of the limitations of the volumetric method and suggest some possible solutions.
The potential storage and computational costs of the volumetric method require implementation of an efficient algorithm. In Chapter 7, we describe one such algorithm and then
analyze its asymptotic complexity.
Next, in Chapter 8, we demonstrate the results of reconstructing range surfaces using the
volumetric method and the data acquired with the scanning system described in Chapter 4.
Portions of Chapters 5-8 are also described in [Curless & Levoy 1996].
Finally, in Chapter 9, we summarize the contributions of this thesis and describe some
areas of future work.
Chapter 2
Optical triangulation: limitations and
prior work
Active optical triangulation is one of the most common methods for acquiring range data.
Although this technology has been in use for over two decades, its speed and accuracy have
increased dramatically in recent years with the development of geometrically stable imaging
sensors such as CCDs and lateral effect photodiodes.
Researchers and manufacturers have used optical triangulation scanning in a variety of
applications. In medicine, optical triangulation has provided range data for plastic surgery
simulation, offering safer, cheaper, and faster shape acquisition than conventional volumetric scanning technologies [Pieper et al. 1992]. In industry, engineers have used triangulation
scanners for applications that include postal package processing [Garcia 1989] and printed
circuit board inspection [Juha & Souder 1987]. Triangulation scanners also provide data to
drive computer graphics applications, such as digital film-making [Duncan 1993].
Figure 2.1 shows a typical system configuration in two dimensions. The location of the
center of the reflected light pulse imaged on the sensor corresponds to a line of sight that
intersects the illuminant in exactly one point, yielding a depth value. The shape of the object
is acquired by translating or rotating the object through the beam or by scanning the beam
across the object.
In this chapter, we begin with an overview of optical triangulation configurations (Section 2.1). Next, we discuss the limitations of traditional methods in both qualitative and
Figure 2.1: Optical triangulation geometry. The illuminant reflects off of the surface and forms an
image on the sensor. The center of the illuminant maps to a unique position on the sensor based on
the depth of the range point. In order for all points along the center of the laser sheet to be in focus,
the angles θ and θ′ are related to one another by Equation 2.1.
quantitative contexts (Section 2.2). Finally, we describe previous efforts to evaluate and
correct for the errors inherent in traditional triangulation methods (Section 2.3).
By spreading the beam with a cylindrical lens, a light stripe can be projected onto an object to collect a range profile (Figure 1.2b).
Researchers have also tried projecting multiple spots or multiple stripes onto an object for
more parallelized shape acquisition, though multiple steps are usually required to disambiguate the light reflections. When imaging the reflected light onto a sensor with lenses, the
single point and stripe illuminants offer the advantage that, at any instant, all intersections
of the light with the object must lie in a plane. Since lenses image points in a plane to points
in another plane, the sensor can be oriented to keep the beam or sheet of light in focus, thus
reducing depth of field problems. When the focal plane is tilted, the image plane must also
be tilted so as to satisfy the Scheimpflug condition [Slevogt 1974]:
tan θ' = M tan θ        (2.1)

where θ and θ' are the tilt angles of the focal and image planes, respectively, as shown in Figure 2.1, and M is the magnification on the optical axis. The resulting triangulation geometry has the property that the focal, image, and lens planes all intersect in a single line.
Note that multi-point and multi-stripe systems generally cannot take advantage of this optimal configuration, because the illumination usually does not lie in a plane.
2.1.3 Sensor
Sensors for optical triangulation systems also come in a variety of forms. For narrow
beam illumination, point (zero-dimensional) sensors such as photodiodes or line (one-dimensional) sensors such as lateral effect photodiodes and linear array CCDs are sufficient, though point sensors must be scanned to provide another dimension. For light stripe,
multi-point, and multi-stripe systems, a two-dimensional sensor is necessary and typically
comes in the form of a CCD array, though point and line sensors can also be scanned to
provide the required dimensions.
position on the sensor which hopefully maps to the center of the illuminant. Typically, researchers have opted for a statistic such as the mean, median, or peak of the imaged light as representative of the center. Each of these statistics gives the correct answer when the surface is perfectly planar. In this case, the sensor records a compressed or expanded image of the illuminant's shape, depending on the orientation of the surface with respect to the illuminant and sensor. The location of the center of the imaged pulse is not altered under these
circumstances.
Figure 2.2: Range errors due to reflectance discontinuities. (a) The change in reflectance from light
gray to dark gray distorts the image of the illuminant, perturbing the mean and resulting in an erroneous range point. (b) Same as (a), but the reflectance step is larger, causing a larger range error.
Figure 2.3: Range errors due to shape variations. (a) When reflecting off of the surfaces at the corner
shown, the left half of the Gaussian is much more compressed than the right half. The result is a shift
in the mean and an erroneous range point. (b) When the illuminant falls off the edge of an object,
the sensor images some light reflection. In this case, a range point is found where there is no surface
at all along the center of the illuminant.
Figure 2.4: Range error due to sensor occlusion. A portion of the light reflecting from the object is
blocked before reaching the sensor. The mean yields an erroneous range point, even when the center
of the illuminant is not visible.
occluding the line of sight between the illuminated surface and the sensor. This range error
is very similar to the error encountered in Figure 2.3b.
The fourth source of range error is laser speckle, which arises when coherent laser illumination bounces off of a surface that is rough compared to a wavelength [Goodman 1984].
The surface roughness introduces random variations in optical path lengths, causing a random interference pattern throughout space and at the sensor. Figure 2.5 shows a photograph
of laser speckle arising from a rough surface.1 For triangulation systems, the result is an
imaged pulse with a noise component that affects the mean pulse detection, causing range
errors even from a planar target (see Figure 2.6).
1 Although optically smooth surfaces (i.e., mirrors) do not introduce laser speckle, they are also extremely
hard to measure. Mirrored surfaces only reflect light to the sensor when the surface is properly oriented. Further, mirrored surfaces generally result in interreflections that significantly complicate range analysis. As a
result, diffusely reflecting surfaces are desirable for range scanning. Diffuseness arises from surface roughness, which in turn leads to laser speckle.
Figure 2.5: Image of a rough object formed with coherent light. Source: [Goodman 1984].
Figure 2.6: Influence of laser speckle on traditional triangulation. The image of the Gaussian is noisy, causing a random shift in the position of the mean, ci. The uncertainty in the mean's position, δci, maps to an uncertainty in the observed range, δr. Note that the direction of range uncertainty follows the center of the laser beam.
[Plot: maximum deviation from planarity (in beam widths) versus reflectance ratio, for triangulation angles θ = 10, 20, 30, and 40 degrees.]
Figure 2.7: Plot of errors due to reflectance discontinuities. As the size of the reflectance step (represented as the ratio of reflectances on either side of the step) increases, the deviation from planarity increases accordingly. Smaller triangulation angles (θ) exhibit greater errors.
EL(x) = IL exp(−2x²/w²)        (2.2)
where IL is the power of the beam at its center, and w is a measure of beam width, taken to
be the distance between the beam center and the e−2 point. This measure of beam width is
common in the optics literature. We present the range errors in a scale invariant form by dividing all distances by the beam width. Figure 2.7 illustrates the maximum deviation from
planarity introduced by scanning a reflectance discontinuity of varying step magnitudes for
four different triangulation angles. As the size of the step increases, the error increases correspondingly. In addition, smaller triangulation angles, which are desirable for reducing the
likelihood of missing data due to sensor occlusions, actually result in larger range errors.
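The mechanism behind these errors is easy to reproduce numerically. The sketch below (illustrative only, not the code used to generate Figure 2.7) models the imaged pulse as a unit-amplitude Gaussian modulated by a reflectance step, slides the step across the beam, and converts the worst-case shift of the mean into a range error by dividing by sin θ; the function name and grid resolution are arbitrary choices:

```python
import numpy as np

def max_planarity_error(ratio, theta_deg, w=1.0):
    """Worst-case range error (in beam widths) when a Gaussian beam
    scans across a reflectance step with the given reflectance ratio.
    Illustrative model: ideal sensor, mean-based pulse detection."""
    x = np.linspace(-4 * w, 4 * w, 2001)
    beam = np.exp(-2 * x**2 / w**2)              # Gaussian beam, Equation 2.2
    worst = 0.0
    for step in np.linspace(-2 * w, 2 * w, 81):  # step position within the beam
        refl = np.where(x < step, 1.0, 1.0 / ratio)
        pulse = beam * refl                      # imaged pulse from a planar surface
        mean = np.sum(x * pulse) / np.sum(pulse) # mean-based center detection
        worst = max(worst, abs(mean))
    # sensor mean shifts map to depth errors through division by sin(theta)
    return worst / np.sin(np.radians(theta_deg))

e_small = max_planarity_error(ratio=2.0, theta_deg=30.0)
e_big = max_planarity_error(ratio=10.0, theta_deg=30.0)
```

As in Figure 2.7, the computed error grows with the reflectance ratio and shrinks as the triangulation angle increases.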
[Plot: closest distance between range data and true corner (in beam widths) versus corner angle (60 to 180 degrees).]
Figure 2.8: Plot of errors due to corners. The y-axis indicates the closest distance between the range
data and the actual corner, while the x-axis measures the angle of the corner. Tighter corners result
in greater range errors.
This result is not surprising, as sensor mean positions are converted to depths through a division by sin θ, where θ is the triangulation angle, so that errors in mean detection translate
to larger range errors for smaller triangulation angles.
Figure 2.8 shows the effects of a corner on range error, where the error is taken to be the
shortest distance between the computed range data and the exact corner point. The corner is
oriented so that the illumination direction bisects the corner's angle, as shown in Figure 2.3a.
As we might expect, a sharper corner results in greater compression of the left side of the
imaged Gaussian relative to the right side, pushing the mean further to the right on the sensor
and pushing the triangulated point further behind the corner. In this case, the triangulation
angle has little effect as the greater mean error to depth error amplification is offset almost
exactly by the smaller observed left/right pulse compression imbalance.
We can estimate what the absolute values of these errors are for a typical triangulation
system. For example, the triangulation angle of our scanner hardware is 30°, and the laser width is about 1 mm across a depth of 0.3 m. For this configuration, we would expect errors of 0.8 mm for a 10:1 reflectance step and 0.5 mm for a corner angle of 110°.
Figure 2.9: Gaussian beam optics. A collimated beam is focused by the lens to a width, wo, and spans a desired depth of field, zD. Bringing the beam into tight focus results in a rapidly expanding beam (wnar(z)). A wide beam (wwide(z)) expands slowly, but may be too wide over the desired depth of field. The optimal beam (wopt(z)) expands only by a factor of √2 over the depth of field. For this optimal configuration, the distance between the beam waist and the √2·wo point is called the Rayleigh range, zR.
The width of a focused Gaussian beam varies with distance z from its waist as:

w(z) = wo √(1 + (λ z / (π wo²))²)

where wo is the beam waist or width of the beam at its narrowest point, z is the distance from this point, and λ is the wavelength of the laser. Figure 2.9 shows how the beam expands for
various beam waists. Note that as the waist becomes narrower, the beam expands faster. For
a given depth of field, zD , we want the beam to be as tight as possible to limit the effects
of triangulation errors. If the waist is too narrow, then the beam expands too rapidly and
becomes too large at the edges of the field of view. If the waist is too large, it expands slowly,
but is still too large at the edges of the field of view. The optimum beam width is attained
when the waist obeys the relation:
wo = √(λ zD / 2π)

where zD/2 corresponds to what is known as the Rayleigh range of the beam. The variation in width is thus:

√(λ zD / 2π) ≤ w(z) ≤ √(λ zD / π)

The best width to range ratio at the edges of the field of view is:

w / zD = √(λ / (π zD))        (2.3)
The discussion of range errors due to reflectance and shape variations in the previous
section shows that the errors of an optical triangulation system are on the order of the width
of the illuminant. Thus, the beam width to range ratio is a measure of the relative accuracy
of the system. Equation 2.3 tells us that once we have selected a range of interest, diffraction limits impose a bound on the relative accuracy in the presence of reflectance and shape
variations.
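These diffraction limits are easy to evaluate. The helper below (a hypothetical name; it simply applies the standard Gaussian beam formulas for the optimal waist) computes the optimal waist, the width at the edges of the field, and the resulting width-to-range ratio:

```python
import math

def gaussian_beam_limits(wavelength, depth_of_field):
    """Optimal Gaussian beam focus over a depth of field zD.
    Returns (waist, width at field edges, width-to-range ratio);
    all distances in meters."""
    w0 = math.sqrt(wavelength * depth_of_field / (2 * math.pi))  # optimal waist
    w_edge = math.sqrt(2) * w0         # beam expands by sqrt(2) at +/- zD/2
    return w0, w_edge, w_edge / depth_of_field

# HeNe laser (632.8 nm) over a 0.3 m depth of field
w0, w_edge, ratio = gaussian_beam_limits(632.8e-9, 0.3)
```

For these values the optimal waist comes out to roughly 0.17 mm, expanding to about 0.25 mm at the edges of the field.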
Note that Bickel et al. [1985] have explored the use of axicons (e.g., glass cones and
other surfaces of revolution) to attain tighter focus of a Gaussian beam. The refracted beam,
however, has a zeroth order Bessel function cross-section; i.e., it has numerous side-lobes
of non-negligible irradiance. The influence of these side-lobes is not well-documented and
would seem to complicate triangulation. In addition, once the sensor resolution is fixed,
then arbitrarily narrowing the beam actually becomes undesirable. If the image of the beam
spans only one or two pixels, then the computed mean will not attain sub-pixel accuracy.
See Section 3.5 for a discussion of the beam width in a signal processing context.
where U(x) and φ(x) are the amplitude and phase of the wave at position x, ν is the frequency of the light, and Re[·] is the real part of its argument. When solving problems in scalar diffraction theory, researchers typically work directly with the complex field U(x), because the time dependence is the same everywhere and is known a priori. Researchers sometimes refer to U(x) as the field amplitude, though it should be remembered that it is a complex quantity. Note that when taking physical measurements, we evaluate the intensity of a field, given by:

I(x) = |U(x)|²
[Figure 2.10: imaging geometry. The field in the object plane (xo, yo) passes through a lens and aperture and forms an image in the image plane (xi, yi); do and di are the object and image distances along the optical axis z.]
The field in the image plane is related to the field in the object plane through a convolution against the amplitude spread function of the imaging system:

Ui(xi, yi) = ∫∫ h(xi − M xo, yi − M yo) Uo(xo, yo) dxo dyo        (2.5)

where Uo is the field in the object plane, and M is the magnification factor. The amplitude spread function is given by:

h(xi, yi) = ∫∫ P(x, y) exp[−j (2π/(λ di)) (x xi + y yi)] dx dy        (2.6)

where P(x, y) describes the shape of the pupil (usually a circular disc), di is the distance to the image, and λ is the optical wavelength. For a circular lens aperture, the pupil function is
unity with a radius of a and zero outside. The amplitude spread function for such an aperture
Figure 2.11: Speckle due to diffraction through a lens. The image of the surface is filtered by the amplitude spread function. Because the surface is rough compared to a wavelength, the image contains
noisy phase terms that influence nearby points on the surface through the amplitude spread function.
The result is laser speckle. Source: [Goodman 1984].
resembles a circularly symmetric sinc function,2 where the first zero crossing is at:

r ≈ 0.61 λ di / a
Note that the imaging equations (2.4 - 2.6) relate object and image fields, not intensities.
The intensities must be computed by finding the square magnitude of the fields.
Speckle arises in an image when the surface is rough compared to the optical wavelength. The result is random variations in the relative phases of imaged points. If the imaging system were to exhibit no diffraction effects (i.e., if the amplitude spread function were
an impulse function), then these phases would have no impact, because they would vanish
when computing the intensities. However, real imaging systems do exhibit diffraction effects, and the amplitude spread function serves to distribute the random variations across
the image plane. The result is laser speckle. Figure 2.11 illustrates this phenomenon.
In the remainder of this section, we attempt to quantify the effect of laser speckle on
2 sinc(x) = sin(πx)/(πx)
range accuracy. To do so, we consider the case of a two-dimensional world, which permits
us to remove references to the y coordinate in the equations above. This simplification will
aid in the discussion of the issues of laser speckle, but will not significantly alter the basic results. Note that in this two-dimensional world, the amplitude spread function for an
imaging aperture is simply the sinc function. In addition, we will treat the triangulation
imaging geometry as though it were an axially aligned imaging geometry. This approximation introduces some error, because some of the imaged light originates from regions outside
of the plane in focus (the center of the illuminant), but for narrow illuminants, this error is
negligible [Francon 1979].
For a Gaussian illuminant incident on the surface of an object, the field at the surface of the object takes the form:

Uo(xo) = Go(xo) exp[j φo(xo)]        (2.7)

where Go(xo) is the Gaussian illuminant and φo(xo) represents the random phase variations introduced by the roughness of the surface. We can now write the field at the image plane as:

Ui(xi) = ∫ h(xi − M xo) Uo(xo) dxo        (2.8)
Equations 2.7 - 2.8 describe the physical image of a Gaussian illuminant after experiencing random phase variations and diffraction filtering. Errors in determining the mean
of this image result in range errors in triangulation systems using traditional analysis. The
mean is defined by:
ci = ∫ xi Ii(xi) dxi / ∫ Ii(xi) dxi        (2.9)
For a centered Gaussian, this mean should be exactly zero; however, the random phase variations introduce statistical variations in the mean that we can quantify as:

σci = √(E[ci²])        (2.10)

where the expected value of the mean is assumed to be zero.
where the expected value of the mean is assumed to be zero. Figure 2.6 illustrates the influence of laser speckle in a triangulation system. The error in the mean maps to a range error
through the relation:

σr = (M / sin θ) σci
In other words, once we have computed the variation in detecting the mean at the sensor, we must map it into a range error through multiplication by the imaging system's magnification factor, M, and division by the triangulation factor, sin θ.
Computing the standard deviation described by Equations 2.9 and 2.10 analytically has
proven to be extremely difficult. Instead, we have performed numerical simulations. These
simulations require some characterization of the phase function, φo(xo), which describes the roughness of the surface. Researchers typically model this function to be proportional to the height profile, ζ(xo), of the surface:

φo(xo) = (4π/λ) ζ(xo)

This height profile is commonly modeled as a random process with Gaussian statistics at each point on the surface and a Gaussian autocorrelation spectrum. The pointwise statistics are thus parameterized by the surface height variance, and the autocorrelation by the correlation length. The parameter space for the simulations is therefore four dimensional: a function of the width of the laser, the width of the point spread function, the surface roughness, and the correlation length. To date, we have thoroughly studied the
deviation of the mean for surfaces rough compared to a wavelength with zero correlation
length, and we have arrived at a qualitative understanding of the behavior when the surface
roughness parameters are varied. For uncorrelated rough surfaces, the variation of the mean
follows the relation:

σci = (1/2) √(λ wi di / 2a)
where wi is the width of the image of the Gaussian illuminant, di is the distance from the lens to the image plane, and a is the aperture radius. Mapping this into a range error we obtain:

σr = (M / (2 sin θ)) √(λ wi di / 2a)

Noting that w = M wi is the width of the illuminant in object space, and do = M di is the distance from the lens to the object, we can rewrite this as:

σr = (1 / (2 sin θ)) √(λ w do / 2a)        (2.11)
This relation yields several insights. First of all, the range error is fundamentally independent of the resolution of the sensor; it is an inescapable consequence of the speckle phenomenon. Secondly, we expect that narrowing the aperture will increase the speckle noise
and thereby increase the range error. Finally, the range error varies with the square root of
the width of the laser as seen from the sensor. Narrowing the laser ought to decrease range
noise.
Numerical experiments with various surface roughness and correlation length parameters have yielded some quantitative results that follow qualitative expectations. In summary,
as the correlation length grows, we observe a decrease in the computed range error. Similarly, as the surface roughness decreases, the range error also decreases. This result is not
surprising; the limiting case of a perfectly smooth planar surface should result in no laser
speckle and thus no range error. Increasing the correlation length and decreasing the roughness bring the results closer to the perfectly smooth case.
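A minimal version of such a simulation can be sketched as follows; the parameters are illustrative and not those of the actual simulations. For an uncorrelated, perfectly rough surface, draw an independent uniform random phase at each sample, filter the phase-corrupted Gaussian field with a sinc amplitude spread function (the aperture of the two-dimensional world discussed earlier), and accumulate the statistics of the intensity mean over many trials:

```python
import numpy as np

rng = np.random.default_rng(0)

def speckled_mean(n=512, w=40.0, asf_zero=4.0):
    """One trial: mean of the speckle-corrupted image of a Gaussian.
    n samples; beam width w and ASF first zero in sample units."""
    x = np.arange(n) - n / 2
    phase = rng.uniform(0.0, 2 * np.pi, n)                 # uncorrelated rough surface
    field = np.exp(-2 * x**2 / w**2) * np.exp(1j * phase)  # Gaussian with random phase
    asf = np.sinc(x / asf_zero)                            # 2-D world: sinc spread function
    image = np.convolve(field, asf, mode="same")           # diffraction filtering
    intensity = np.abs(image) ** 2                         # sensors measure intensity
    return np.sum(x * intensity) / np.sum(intensity)       # mean-based center detection

means = np.array([speckled_mean() for _ in range(200)])
sigma_mean = means.std()   # statistical variation of the detected mean
```

The standard deviation of the detected mean plays the role of σci; repeating the experiment while varying the beam width, aperture, and surface statistics traces out the dependences described in the text.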
To give a sense for the size of the range errors, we can substitute some typical values
into Equation 2.11. The triangulation system used for experiments in Chapter 4 has a 30
degree triangulation angle, an aperture radius of about 5 mm, a stand-off of 1 meter and
uses a HeNe laser source ( = 632.8 nm). We would therefore expect the range error due
to reflections from a perfectly rough surface to be approximately 0.25 mm. This error
corresponds to roughly half the resolution of the sensor being used. Real surfaces, however,
are not perfectly rough, so we would expect smaller range errors in practice.
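This estimate follows by direct substitution; a quick numerical check, assuming the form σr = (1/(2 sin θ)) √(λ w do / 2a) for Equation 2.11:

```python
import math

def speckle_range_error(wavelength, w, d_o, a, theta_deg):
    """Speckle-limited range error for a perfectly rough, uncorrelated
    surface (Equation 2.11). All distances in meters."""
    theta = math.radians(theta_deg)
    return math.sqrt(wavelength * w * d_o / (2 * a)) / (2 * math.sin(theta))

# 30 degree triangulation angle, 1 mm illuminant width, 1 m stand-off,
# 5 mm aperture radius, HeNe laser
sigma_r = speckle_range_error(632.8e-9, 1e-3, 1.0, 5e-3, 30.0)
# approximately 2.5e-4 m, i.e. about 0.25 mm
```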
Chapter 3
Spacetime Analysis
Many optical triangulation scanners operate by sweeping the illumination over an object
while imaging the reflected light over time. The previous chapter clearly demonstrates that
analyzing the imaged pulses at each time step leads to systematic range errors. In this chapter, we describe a new method for optical triangulation called spacetime analysis. We show
that by analyzing the time evolution of the imaged pulses, this new method can reduce or
eliminate range errors associated with traditional methods.
We begin with an intuitive description of how the spacetime analysis works (Section 3.1), followed by a rigorous derivation of its properties under a set of assumptions such
as linear scanning and orthographic sensing (Section 3.2). We then describe how the method
generalizes when weakening or removing these assumptions (Section 3.3). In Section 3.4,
we consider the role of laser speckle in spacetime analysis and show that it still imposes a
limit on triangulation accuracy. Finally, we put optical triangulation scanners into a signal
processing context and draw conclusions about how to improve them (Section 3.5).
Figure 3.1: Spacetime mapping of a Gaussian illuminant. The Gaussians at left represent the profile
of the illuminant at times t1 through t4 . The solid Gaussians at right represent the images of the
illuminant at these same times. The dashed Gaussian represents the time evolution of these images
for a single line of sight, corresponding to a single point on the targeted surface. Although the solid
Gaussians at right may be distorted or clipped by depth discontinuities or reflectance changes in the
surface, the dashed Gaussian is always the same shape (unless it is missing entirely), and it is centered
at a location that corresponds exactly to a range point. These properties of the dashed Gaussian are
the keys to the spacetime analysis.
conventional methods of range estimation fail. However, if we look along the lines of sight
from the corner to the laser and from the corner to the sensor, we see that the profile of the
laser is being imaged over time onto the sensor (indicated by the dotted Gaussian envelope).
Thus, we can find the coordinates of the corner point (xc, zc) by searching for the mean of
a Gaussian along a constant line of sight through the sensor images. We can express the
coordinates of this mean as a time and a position on the sensor, where the time is in general
between sensor frames and the position is between sensor pixels. The position on the sensor
indicates a depth, and the time indicates the lateral position of the center of the illuminant.
In the example of Figure 3.1, we find that the spacetime Gaussian corresponding to the exact
corner has its mean at position sc on the sensor at a time tc between t2 and t3 during the scan.
We establish the corner's position by triangulating the center of the illuminant at time tc
[Figure 3.2: Geometry for the spacetime analysis. A surface element at position p with normal n translates with velocity v; ωL and ωS are the directions to the illuminant and sensor.]
with the sensor line of sight corresponding to the sensor coordinate sc and the lateral sensor
position at tc .
See Figure 3.2 and Table 3.1 for a description of coordinate systems and symbols. Note that in contrast to the previous section, the surface element is translating instead of the illuminant-sensor assembly. The element has a normal n and moves along the linear path:

p(t) = po + t v
Our objective is to compute the coordinates po = (xo, zo) given the temporal irradiance variations on the sensor. For simplicity, we assume that v = (−v, 0). The illuminant we consider is a unidirectional Gaussian beam:

LL(x, ω) = IL exp(−2x²/w²) δ(ω − ωL)        (3.1)
where IL is the power of the beam at its center, w is the e−2 half width, and the delta function indicates the directionality of the beam. In accordance with Figure 3.2, we will assume that ωL = (0, 1).
The differential reflected radiance from the element in the direction of the sensor is simply:

dL(p(t), ωS) = fr(ω, ωS) |n · ω| LL(xo − vt, ω) dω        (3.2)
We assume that the BRDF is defined in a global sense for the surface element, rather than relative
to its normal. Further, we treat incoming directions as pointing toward the surface rather
than away from the surface, which is consistent with the description of the radiance field in
terms of position and direction.
Substituting Equation 3.1 into Equation 3.2 and integrating over all incident directions,
we find the total radiance reflected from the element to the sensor to be:
L(p(t), ωS) = fr(ωL, ωS) |n · ωL| IL exp(−2(xo − vt)²/w²)        (3.3)
Under orthographic projection, the element at p(t) images to the sensor coordinate:

s = (xo − vt) cos θ + zo sin θ        (3.4)
Figure 3.3: Spacetime image of the Gaussian illuminant. After scaling the sensor scanlines to map
to depth and lateral displacement, we see that a point is imaged to a tilted Gaussian in the spacetime
image. The center of the Gaussian maps to the coordinates of the point, the amplitude depends on
the reflectance coefficient and beam strength, and the width is fixed by the scanner geometry.
where s is the position on the sensor and θ is the angle between the sensor and laser directions. We combine Equations 3.3-3.4 to give us an equation for the irradiance observed at the sensor as a function of time and position on the sensor:

ES(t, s) = fr(ωL, ωS) |n · ωL| IL exp(−2(xo − vt)²/w²) δ(s − (xo − vt) cos θ − zo sin θ)
To simplify this expression, we condense the light reflection terms into one measure:

αLS = fr(ωL, ωS) |n · ωL|

which we will refer to as the reflectance coefficient of the point p. We also note that x = vt is a measure of the relative x-displacement of the point during a scan, and z = s / sin θ is the relation between sensor coordinates and depth values along the center of the illuminant. Making these substitutions we have:

ES(x, z) = αLS IL exp(−2(x − xo)²/w²) δ((x − xo) cos θ + (z − zo) sin θ)        (3.5)
Figure 3.4: From geometry to spacetime image to range data. (a) The original geometry. (b) The
resulting spacetime image. TA indicates the direction of traditional analysis, while SA is the direction of the spacetime analysis. The dotted line corresponds to the scanline generated at the instant
shown in (a). (c) Range data after traditional mean analysis. (d) Range data after spacetime analysis.
This equation describes a Gaussian running along a tilted line through the spacetime
sensor plane or spacetime image as depicted in Figure 3.3. We define the spacetime image to
be the image whose columns are filled with sensor scanlines that evolve over time. Through
the substitutions above, position within a column of this image represents displacement in
depth, and position within a row represents time or displacement in lateral position. From
Figure 3.3, we see that the tilt angle is fixed by the triangulation angle θ, and that the width of the Gaussian measured along the tilted line is:

w' = w / sin θ
The peak value of the Gaussian is αLS IL, and its mean along the line is located at (xo, zo),
the exact location of the range point. Note that the angle of the line and the width of the
Gaussian are solely determined by the fixed parameters of the scanner, not the position, orientation, or BRDF of the surface element. Thus, extraction of range points should proceed
by computing low order statistics along tilted lines through the spacetime image, rather than
along columns (scanlines) as in the conventional method. Figure 3.4 illustrates this distinction. Further, we establish the position of the surface element independent of the orientation and BRDF of the element and, assuming no interreflections, independent of any other
nearby surface elements. The decoupling of range determination from local shape and reflectance is complete. As a side effect, the peak of the Gaussian yields the irradiance at the
sensor due to the point. Thus, we automatically obtain an intensity image precisely registered to the range image, information which can assist in a task such as object segmentation.
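The extraction procedure lends itself to a compact sketch (an illustrative implementation, not the software used for this work): shear the spacetime image so the tilted Gaussians become rows, take the mean along each occupied row, and map the result back to (xo, zo). The grid spacings, the synthetic point location, and the 45-degree triangulation angle below are arbitrary demonstration values:

```python
import numpy as np

def spacetime_extract(st, dx, dz, theta):
    """Extract range points from a spacetime image st[iz, ix]
    (rows: depth z = iz*dz, cols: lateral position x = ix*dx).
    Shear so the tilted spacetime Gaussians become rows, then take
    a mean along each occupied row (illustrative sketch)."""
    nz, nx = st.shape
    cot = np.cos(theta) / np.sin(theta)
    max_shift = int(round((nx - 1) * dx * cot / dz))
    sheared = np.zeros((nz + max_shift, nx))
    for ix in range(nx):                       # shear: z' = z + x*cot(theta)
        shift = int(round(ix * dx * cot / dz))
        sheared[shift:shift + nz, ix] = st[:, ix]
    x = np.arange(nx) * dx
    points = []                                # (x_o, z_o, weight) per row
    for iz2, row in enumerate(sheared):
        wgt = row.sum()
        if wgt > 1e-6:
            xo = (x * row).sum() / wgt         # mean along the tilted line
            zo = iz2 * dz - xo * cot           # map back to depth
            points.append((xo, zo, wgt))
    return points

# synthetic spacetime Gaussian for a point at (5.0, 3.0), theta = 45 degrees
theta, dx, dz, w = np.pi / 4, 0.05, 0.05, 1.0
st = np.zeros((200, 400))
for ix in range(400):
    x = ix * dx
    iz = int(round((3.0 - (x - 5.0) / np.tan(theta)) / dz))
    if 0 <= iz < 200:
        st[iz, ix] = np.exp(-2 * (x - 5.0) ** 2 / w**2)  # Equation-3.5 profile

best = max(spacetime_extract(st, dx, dz, theta), key=lambda p: p[2])
```

For this synthetic input, the heaviest row recovers the point (5.0, 3.0) to within the grid resolution.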
The first three conditions ensure that the reflectance coefficient, αLS = fr(ωL, ωS) |n · ωL|, is constant, the fourth condition guarantees that actual triangulation can occur, and the
fifth condition ensures that the illumination is scanning across points in the scene. Note that
the illumination need only be directional; coherent or incoherent light of any pattern is acceptable, though a narrow pattern will avoid potential depth of field problems for the sensor
optics. In addition, the translational motion need not be of constant speed, only constant
direction. We can correct for known variations in speed by applying a suitable warp to the
spacetime image.
We can weaken the need for orthography if we assume that the BRDF does not vary appreciably over the sensor's viewing directions. The range of viewing directions depends on
the focal length of the lens system and the range of positions a single point can occupy while
being illuminated during a scan, which in turn depends on the width of the illuminant. In the
general perspective case, the spacetime analysis proceeds along curves rather than straight
lines. If the image of a point traverses a small displacement on the sensor relative to the
focal length, then the assumption of local orthography is reasonable. In addition, general perspective can lead to changes in occlusion relationships during a scan that cause an illuminated point to be visible for a finite time and then become invisible while still illuminated. This effect results in a partial image of the occluded point's spacetime Gaussian and will
result in a range error for this point. These points will be easy to identify: they will be near
step discontinuities in the resulting range image. When acquiring multiple range images,
these points can be eliminated in favor of unoccluded acquisitions of the same portions of
the surface. Thus, while perspective sensors can create some complications in applying the
spacetime analysis, typical lens configurations will yield reasonable results.
We can also weaken the requirement for pure translation, as long as the rotational component of motion represents a small change in direction with respect to the fr(ωL, ωS) |n · ωL| product. This constraint is reasonable for motion trajectories with large radii of curvature relative to the beam width. As in the perspective case, the spacetime analysis proceeds
in general along curves, and changes in occlusion relationships can lead to erroneous range
data. Nonetheless, for moderate rotational trajectories, the spacetime analysis will lead to
improved range data.
Following the speckle analysis of the previous chapter, the field at the image plane as the coherent illuminant sweeps across a rough surface is:

Ui(xi, t) = ∫ h(xi − x) Gg(x − vi t) exp[jφ(x)] dx

where Gg is the image of the Gaussian illuminant, vi is the speed of the illuminant's image on the sensor, and φ is the random phase introduced by the surface roughness. The spacetime analysis tracks a single line of sight, which we take to be xi = 0, so that:

Ui(0, t) = ∫ h(−x) Gg(x − vi t) exp[jφ(x)] dx

For a symmetric amplitude spread function, and after some straightforward manipulations, this relation becomes:

Ui(0, t) = ∫ Gg(−vi t − x) h(x) exp[jφ(−x)] dx        (3.6)
As described in the previous chapter, the intensity is the square magnitude of the field,
but in this case, the mean we seek is with respect to time:

ct = ∫ t Ii(0, t) dt / ∫ Ii(0, t) dt        (3.7)

The error can be computed as the standard deviation of the mean as described in the previous chapter:

σct = √(E[ct²])        (3.8)
Figure 3.5 illustrates the influence of laser speckle when using spacetime triangulation. For traditional analysis, the uncertainty in the position of the mean on a sensor scanline corresponds directly to a depth error along the center of the illuminant. For spacetime triangulation, the error in the mean maps to an error in determining when the laser sheet's center intersects a line of sight at a range point. The range error is distributed along the sensor's line of sight, and obeys the relation:

σr = (v / sin θ) σct        (3.9)
Figure 3.5: Influence of laser speckle on spacetime triangulation. When performing the spacetime analysis, the line of sight to the point is fixed, but the time at which the center of the laser passes over the range point varies with range. Laser speckle introduces uncertainty, δct, as to when the laser is centered over the range point. The effect is range uncertainty, δr, along the line of sight from the sensor.
As in the case for traditional analysis, computing an analytical solution using Equations 3.6-3.9 has proven to be extremely difficult. Instead, we have performed numerical simulations that lead us to the same result as we found for traditional analysis (Equation 2.11):
σr = (1 / (2 sin θ)) √(λ w do / 2a)        (3.10)
Figure 3.6: Speckle contributions at the sensor due to a moving coherent illuminant with square
cross-section. The sensor and surface are stationary while the illuminant sweeps across the surface.
A single pixel maps to a single point on the surface, but a neighborhood of points on the surface influence the pixel intensity as a result of diffraction at the aperture. This neighborhood is indicated by
the region between the lines drawn from the sensor to the surface. (a) As the illumination translates
to the right (shown here at time t1 ), only part of the neighboring points, mostly to the left of the range
point, are illuminated. Only these points will contribute to the observed, speckle-corrupted intensity
at the sensor. (b) At time t2 , the illumination uniformly covers the neighborhood and continues to do
so until time t3 . During this time, the pixel intensity will not vary as the illumination moves. (c) As
the left edge of the illumination crosses into the neighborhood near the range point (shown here at
time t4), then only the points mostly to the right of the range point will affect the pixel intensity.
Thus, we expect the speckle noise in (a) and (c) to be uncorrelated.
The impact of laser speckle on traditional triangulation is not surprising, given the obvious corruption of the imaged illuminant. The effect of speckle on spacetime triangulation,
however, is harder to visualize. To gain a more intuitive grasp, consider the example in Figure 3.6, which illustrates the didactic example of a stationary object and sensor and a moving
illuminant with square irradiance profile. A single pixel on the sensor maps to a single point
on the surface, but due to diffraction at the aperture, neighboring points on the surface influence the intensity measured at this pixel. When the illumination approaches the range point from the left, surface points to the left of the range point affect the pixel intensity. When the
illumination is completely covering the neighborhood of the range point, then translations
of the illuminant do not affect the measured intensity. As the illumination passes the range
point to the right, the surface points to the right of the range point affect the pixel intensity.
Thus, when the illuminant is mostly to the left or right of the range point, the pixel intensity
is determined by different points on the surface.
Figure 3.7: Spacetime speckle error for a coherent illuminant with square cross-section. (a) As the
sensor and illuminant sweep over the surface, the sensor images noisy square waves. For regions
near the center of the square wave, the speckle behaves as though it adheres to the surface as the
scanning progresses. Near the edges, however, the speckle influence reveals itself. For the range
point being tracked, the amplitude in the first image depends on the properties of the left portion of
the surface, while the amplitude in the last image depends on the right portion of the surface. In (b)
we see the resulting spacetime image of the range point. The central regions of the pulse are very
flat, unlike the individual images of the square pulse at each time step in (a). However, the speckle
influences the edges of the pulse, resulting in uncertainty about the mean of the pulse.
Figure 3.7 illustrates how this distinction leads to errors in spacetime triangulation, this
time including the motion of the sensor along with the illuminant. As the center of the square
illuminant moves over the object, the speckled image of the surface does indeed move with
the surface, because the illuminant is fairly constant near the center. This explains why the
spacetime pulse is very flat in the center (Figure 3.7b). However, near the edges of the pulse,
the illumination is not at all constant, and speckle noise appears. The result is an uncertainty
in determining the location of these edges. The noise at the edges has different properties
for the left edge versus the right edge as described above, meaning that the uncertainty in
determining the position of the two edges is fairly uncorrelated. Since resolving the edges
is the only way of determining the center of the spacetime pulse, spacetime triangulation is
limited in accuracy by laser speckle.
E_S(x, z) = f(x, z) ∗ h_ST(x, z)   (3.11)

where the spacetime impulse response is

h_ST(x, z) = I_L exp(−2x²/w²) δ(z sin θ − x cos θ)   (3.12)

and f(x, z) is a point in space,

f(x, z) = α_LS(x_o) δ(x − x_o) δ(z − z_o)   (3.13)

In general, f(x, z) is non-zero only at points on the object that are visible to both the laser and the CCD. This set of points can be represented as a function, z = r(x), with reflectance α_LS(x), so that we arrive at:

f(x, z) = α_LS(x) δ(z − r(x))   (3.14)
Figure 3.8 illustrates how the spacetime image is derived from a two dimensional scene.
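To make Equations 3.11-3.14 concrete, the following sketch renders an ideal spacetime image by summing, for each surface point (x_o, r(x_o)), the tilted Gaussian streak prescribed by h_ST. All parameter values, and the approximation of the delta function by a narrow band, are illustrative assumptions:

```python
import numpy as np

def spacetime_image(xs, zs, r, alpha, w=1.0, theta=np.radians(30), I_L=1.0):
    """Render E_S(x, z) on the grid (xs, zs) by accumulating the contribution
    of each surface point (x_o, r(x_o)) along the tilted impulse response
    h_ST(x, z) = I_L exp(-2x^2/w^2) delta(z sin(theta) - x cos(theta))."""
    X, Z = np.meshgrid(xs, zs)
    E = np.zeros_like(X)
    dx = xs[1] - xs[0]
    for xo in xs:
        u, v = X - xo, Z - r(xo)          # offsets from the surface point
        # delta(v sin(theta) - u cos(theta)) approximated by a band ~dx wide
        on_line = np.abs(v * np.sin(theta) - u * np.cos(theta)) < dx / 2
        E += alpha(xo) * I_L * np.exp(-2 * u**2 / w**2) * on_line * dx
    return E
```

For a flat surface (r constant, uniform reflectance), the result is a uniform tilted band, which is the ideal spacetime image the later sections filter, corrupt with noise, and sample.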
Figure 3.8: Formation of the spacetime image. The scene in (a) is being scanned horizontally. First
we determine the portions of the surface visible to the laser (b) and to the sensor (c). The laser and
sensor lines of sight that clip the surface are shown dashed, while the removed portions of the surface
are shown dotted. The resulting surface (d) is convolved with the spacetime Gaussian (e) to create
the spacetime image (f). For the purposes of this illustration, the contribution of α_LS(x) has been
omitted. The result of α_LS(x) would be to modulate the amplitude of the spacetime Gaussian as it
is being convolved across the surface.
fectly, then we would be able to compute the exact shape of the object. In the real world,
however, this image is filtered, corrupted by noise, sampled, and ultimately reconstructed
before being analyzed for shape extraction.
The filtering can be decomposed into two components: spatial and temporal. Spatial
blurring occurs at the sensor due to the imaging optics and the fact that each sensor cell
has a finite area over which it gathers photons and accumulates a charge. We call this filter
hpix (z), because it spans sensor pixels. The other filtering component operates temporally;
as the object moves relative to the scanner, the sensor captures photons over a fixed interval
of time before sampling the result and starting over. The motion is effectively blurred during
this interval. We call this temporal filter hframe (x), where "frame" refers to the frame interval
and x stands in for the time coordinate (recall that x = vt).
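Because the two filtering components act on different axes, they can be applied separably. The sketch below uses arbitrary box kernels, not the scanner's actual responses, which are an assumption for illustration:

```python
import numpy as np

def filter_spacetime(E, h_pix, h_frame):
    """Apply the spatial filter h_pix along z (axis 0) and the temporal
    filter h_frame along x (axis 1) of a spacetime image E."""
    E = np.apply_along_axis(lambda c: np.convolve(c, h_pix, mode="same"), 0, E)
    E = np.apply_along_axis(lambda r: np.convolve(r, h_frame, mode="same"), 1, E)
    return E
```

With normalized kernels, the total image energy is preserved; only high frequencies of the spacetime image are attenuated.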
Several noise sources also affect the spacetime image. As described in the previous section, coherent illumination leads to speckle noise that corrupts the spacetime Gaussian. In
addition, the sensor will have some inherent noisiness caused by a variety of factors (see,
e.g., [Janesick et al. 1987]). Finally, during the sampling step, the pixel values will be quantized for representation in the computer, introducing quantization noise. We will represent
all of these noise sources with one function, n(x, z).
Putting the filtering and noise sources together, we arrive at an expression for the spacetime image prior to sampling and reconstruction:

E_S^lt(x, z) = [E_S(x, z) + n(x, z)] ∗ h_pix(z) ∗ h_frame(x)   (3.15)
This image is then sampled according to the spatial resolution of the sensor and the frame
rate. We represent this sampling process as an impulse train, i(x/Δx, z/Δz), where Δz is the sensor
pixel separation projected along the laser sheet and Δx is the distance the object moves relative to the scanner between frames. Once we have the sampled, filtered spacetime image,
we can apply a reconstruction filter, h_rec(x, z), to obtain a new
continuous image. This final image is then:

Ẽ_S(x, z) = [E_S^lt(x, z) i(x/Δx, z/Δz)] ∗ h_rec(x, z)   (3.16)
One of the first observations that arises from these equations is the fact that the reconstructed spacetime image is a continuous function. Thus, this reconstruction yields a (piecewise) continuous surface, rather than a set of range samples. In the next chapter, we exploit
this observation to extract as much range data as possible.
Taking the Fourier transform of Equations 3.11, 3.15, and 3.16 yields:

F(E_S) = F H_ST   (3.17)

F(E_S^lt) = [F(E_S) + N] H_pix H_frame   (3.18)

F(Ẽ_S) = [F(E_S^lt) ∗ Δx Δz i(Δx s_x, Δz s_z)] H_rec(s_x, s_z)   (3.19)

where F() is the Fourier transform operator, s_x and s_z are the spatial frequencies in x and
z, and F, N, H_ST, H_pix, H_frame, and H_rec are the Fourier transforms of f, n, h_ST, h_pix, h_frame,
and h_rec, respectively. Figure 3.9 shows the steps in creating, filtering, and sampling the
spacetime image.
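These frequency-domain relations rest on the convolution theorem: convolution in the spatial domain becomes multiplication in the frequency domain. A quick numerical check, using arbitrary random signals, is:

```python
import numpy as np

# Verify that the DFT of a circular convolution equals the product of DFTs.
rng = np.random.default_rng(0)
f = rng.standard_normal(64)
h = rng.standard_normal(64)

# convolution via the frequency domain
conv = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))

# direct circular convolution for comparison
direct = np.array([sum(f[k] * h[(n - k) % 64] for k in range(64))
                   for n in range(64)])
assert np.allclose(conv, direct)
```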
Several observations arise from thinking about the Fourier spectrum of the spacetime
image. The least surprising is the fact that higher sampling rates permit (and are usually
accompanied by) wider Hpix and Hframe filters, allowing for the acquisition of more of the
spacetime spectrum, i.e., higher frequencies. In other words, a higher sampling rate leads
to more captured detail.
Figure 3.9: The spacetime spectrum: a visualization for Equations 3.17 - 3.19. The Fourier transform
of the shape function (a) is multiplied by the spectrum of the spacetime Gaussian (b) to give the
spectrum of the ideal spacetime image (c). Before sampling, this spectrum is multiplied by the pixel
and frame filters (shown as idealized low-pass filters in (d)) to give the filtered spacetime spectrum
without noise (e). The sampling process creates replicas of this filtered spectrum (f), spaced 1/Δx and
1/Δz apart in s_x and s_z, respectively. The reconstruction step entails extracting the central replica of
the spectrum after multiplying by H_rec (not shown).
Figure 3.10: Spacetime spectrum for a line. (a) A line at an angle to the laser is scanned. (b) The
function to be convolved with the spacetime Gaussian is a delta function line. (c) Its Fourier transform is another delta function line oriented perpendicular to the original. (d) After filtering with
the spacetime Gaussian, the spacetime spectrum is essentially a line segment in the Fourier plane.
The temporal filter, h_frame(x), is approximately a box function in x with
width proportional to the exposure time. In addition, sensors generally have a time interval
when they stop capturing light while shifting out the recorded signals. The spectral bandwidth of the temporal filter is thus wider than the sample spacing can accommodate. In both the
spatial and temporal cases, some aliasing artifacts are bound to occur. This aliasing generally leads to reduction in image quality; however, it also means that multiple spacetime
images may be combined to improve the accuracy of the range data. Techniques such as the
one described in [Irani & Peleg 1991] could be used to combine these spacetime images. In
order for such an approach to work, a sequence of spacetime images must be acquired after
translating the object (or scanner) between scans. Rotating the object (or scanner) between
scans generates an entirely new spacetime image that may not be combined with prior acquisitions. Increased resolution through combination of multiple spacetime images remains
an area for future work.
Chapter 4
Spacetime analysis: implementation and
results
The previous chapter provides a theoretical foundation for accurate optical triangulation
through spacetime analysis. In this chapter, we demonstrate that the method can be adapted
to work with an existing commercial laser triangulation scanner (Section 4.1). We develop
an efficient algorithm for performing the spacetime analysis and show how increased resolution can be obtained by interpolating the imaged light reflections (Section 4.2). Finally,
we show the results of applying the spacetime analysis to some real objects (Section 4.3).
4.1 Hardware
We have implemented the spacetime analysis presented in the previous chapter using a commercial laser triangulation scanner and a real-time digital video recorder. The optical triangulation system we use is a Cyberware MS 3030 platform scanner (shown in Figure 1.5).
This scanner collects range data by casting a laser stripe onto the object and by observing
reflections with a CCD camera positioned at an angle of 30° with respect to the plane of the
laser. The platform can either translate or rotate an object through the field of view of the
triangulation optics. Figure 1.2b illustrates the principle of operation. The laser width (the
distance between the beam center and the e^(-2) point) varies from 0.8 mm to 1.0 mm over the
field of view which is approximately 30 cm in depth and 30 cm in height. Each CCD pixel
[Figure 4.1 diagram: Cyberware video camera → Abekas digitizer → transfer to host and RLE compress → spacetime analysis → range image]
Figure 4.1: From range video to range image. To perform the spacetime analysis, we digitize the Cyberware video with an Abekas digitizer. We can then either store the digitized video to disk and copy
it to the host computer, or perform a real-time JPEG compression and fast decompression through
an SGI Cosmo board. The JPEG compression approach has proven to be more time efficient, but the
results in this chapter were originally obtained with the video disk option. In either case, the frames
are run-length encoded and then converted to a range image through the spacetime analysis.
images a portion of the laser plane about 0.5 mm square. Although the Cyberware scanner
performs a form of peak detection in real time, we require the actual video frames of the
camera for our analysis. Figure 4.1 illustrates the possible routes for processing the range
video. We can digitize these frames with an Abekas A20 video digitizer, which can acquire
486 by 720 pixel frames at 30 Hz. The captured frames have approximately the same resolution as the Cyberware CCD camera, though the scanlines are first converted to analog
video before being resampled and digitized. The digitized frames can then be stored in real
time with an Abekas A60 digital video disk. Alternatively, we can run the digitized signal
through a commercially available JPEG compression board. Because each frame is an image of the narrow laser sheet intersecting the surface, it is mostly blank. As a result, we
can JPEG compress the frame with a high quality factor, but still obtain high compression.
The JPEG compression efficiently encodes the blocks of empty space without sacrificing the
quality of the imaged light reflections. The key advantage of this approach is that the range
video compresses well enough to be stored in the host computer's main memory, as opposed to a video disk, which limits the speed of recovery of the recorded frames. The results
in this chapter were obtained with the Abekas digital video disk, but the results described
in Chapter 8 are based on range data acquired with the help of JPEG compression.
Figure 4.2: Method for optical triangulation range imaging with spacetime analysis. (a) The original
geometry. (b) The resulting spacetime image when scanning with the illuminant in the z direction.
(c) Rotation of the spacetime image followed by evaluation of Gaussian statistics. (d) The means of
the Gaussians. (e) Rotation back to world coordinates.
systems discard each CCD image after using it (e.g. to compute a stripe of the range map).
As described in the previous section, we have assembled the necessary hardware to record
the CCD frames. In the previous chapter, we discussed a one dimensional sensor scenario
and indicated that perspective imaging could be treated as locally orthographic. For a two
dimensional sensor, we can imagine the horizontal scanlines as separate one dimensional
sensors with varying vertical (y ) offsets. Each scanline generates a spacetime image, and
by stacking the spacetime images one atop another, we define a spacetime volume. Complications arise when we have perspective projection in the vertical direction, as the image of a
point sweeping through the Gaussian illuminant can cross scanlines (i.e., a point is not constrained to lie in a single spacetime image plane). We have not yet implemented the general
spacetime ranging method that accounts for this effect, so we restrict most of our analysis
to the spacetime volume near the vertical centerline of the CCD where range points will not
cross scanlines while passing through the illuminant (i.e., we can visualize the spacetime
volume as a stack of spacetime images). Nevertheless, toward the end of this chapter, we
demonstrate some results for larger objects that show improvement even though they are
not restricted to the center of the field of view in the vertical direction. This improvement is
probably due to the fact that even at the extreme positions of the field of view a point does
not traverse more than half a scanline while passing through the illuminant.
In step 2, we rotate the spacetime images so that Gaussians are vertically aligned. In a
practical system with different sampling rates in x and z , the correct rotation angle can be
computed as:
tan θ_ST = (Δz/Δx) tan θ   (4.1)

where θ_ST is the new rotation angle, Δx and Δz are the sample spacings in x and z, respectively, and θ is the triangulation angle. To determine the rotation angle, θ_ST, for a
given scanning rate and region of the field of view of our Cyberware scanner, we first determined the local triangulation angle and the sample spacings in depth, Δz, and lateral position, Δx. Equation 4.1 then yields the desired angle. When performing the rotation, we avoid
aliasing artifacts, which would lead to range errors, by employing high-quality filters. For
this purpose, we use a bicubic interpolant.
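A one-line sketch of this computation, taking Equation 4.1 in the form tan θ_ST = (Δz/Δx) tan θ as reconstructed here:

```python
import numpy as np

def spacetime_rotation_angle(dz, dx, theta):
    """Rotation angle that vertically aligns the spacetime Gaussians,
    per Equation 4.1: tan(theta_ST) = (dz / dx) * tan(theta)."""
    return np.arctan((dz / dx) * np.tan(theta))
```

When the sample spacings are equal, the rotation angle reduces to the triangulation angle itself; unequal spacings stretch the spacetime image and tilt the Gaussians accordingly.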
After performing the rotation of each spacetime image, we compute the statistics of the
Gaussians along each rotated spacetime image raster.¹ Our method of choice for computing
these statistics is a least squares fit of a parabola to the log of the data; i.e., given a Gaussian
of the form:

G(x) = a exp(−2(x − b)²/w²)

where a is the amplitude and b is the desired center, we can take the log of the Gaussian:

log[G(x)] = log[a] − 2(x − b)²/w² = −(2/w²) x² + (4b/w²) x + log[a] − 2b²/w²

and fit a parabola to the result. The linear coefficient contains the center, b.
We have also experimented with fitting the data directly to Gaussians using the
Levenberg-Marquardt non-linear least squares algorithm [Press et al. 1986], but the results
have been substantially the same as the log-parabola fits.
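A minimal sketch of the log-parabola fit, using synthetic data and a simplified noise-floor clamp:

```python
import numpy as np

def gaussian_center(x, y, floor=1e-12):
    """Estimate the center b of y ~ a*exp(-2(x-b)^2/w^2) by fitting a
    parabola to log(y) with least squares."""
    logy = np.log(np.maximum(y, floor))        # clamp to avoid log(0)
    c2, c1, c0 = np.polyfit(x, logy, 2)        # logy ~ c2*x^2 + c1*x + c0
    # From the expansion: c2 = -2/w^2 and c1 = 4b/w^2, so b = -c1 / (2*c2).
    return -c1 / (2.0 * c2)
```

In practice the fit would be restricted to samples well above the sensor's noise floor, since the log transform amplifies noise in low-intensity samples.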
The Gaussian statistics consist of a mean, which corresponds to a range point, as well as
a width and a peak amplitude, both of which indicate the reliability of the data. Widths that
are far from the expected width and peak amplitudes near the noise floor of the sensor imply unreliable data which may be down-weighted or discarded during later processing (e.g.,
¹ Actually, we do not expect to find perfect Gaussians on the rotated image rasters. The filtering and sampling processes described in Section 3.5 alter the ideal spacetime image. Nevertheless, fitting Gaussians is a reasonable approximation that has yielded excellent results in practice.

[Figure 4.3 diagram: Type: zeroes, Length: 17 | Type: varying, Length: 11, Data: 4, 18, 65, ... | Type: zeroes, Length: 10]

Figure 4.3: Run-length encoded spacetime scanline. Each scanline is composed of runs of zeroes
and runs of varying intensity data.
when combining multiple range meshes). For the purposes of this thesis, we discard unreliable data and rely on future scans to supply reliable data for the surface reconstruction
process.
Finally, in step 4, we rotate the range points back into the global coordinate system.
Figure 4.4: Fast RLE rotation. The naive approach to rotating an image would be to expand the
RLE source image, and then traverse all of the pixels of the target image while gathering data from
(resampling) the source image. A much more efficient approach is to traverse the RLE structure in
the source image and send only non-zero data to the target image.
Figure 4.5: Reconstruction: send vs. gather. (a) The target image is rotated with respect to the source
image. (b) We can find the values at the target pixels by interpolating among the source pixels. (c)
Gather interpolation for a single target image pixel, A. (d) Send interpolation for a single source
pixel. The example in this figure illustrates the case for a bilinear filter with a 2x2 support; extension
to filters with larger supports is straightforward.
next step of the spacetime algorithm. Ideally, we would like to create an RLE target image
directly. We can achieve this goal with the following procedure:
1. Allocate space for each scanline in the target image.
2. Stream through each source scanline, ignoring runs of zeroes. When sending varying data:
   - If the data being sent is not within the current target run being constructed:
     - Mark the current target run as finished.
     - Declare a run of zeroes between the finished run and the new target position.
     - Start a new run of varying pixels.
   - Else, deposit the pixel value into the existing run, increasing its length as needed.
3. Mark the last run on each scanline as being a run of zeroes.
Figure 4.6 shows a scanline as it is being constructed.
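The procedure above can be sketched as follows; the run representation, (type, length, data) triples, is an assumption for illustration:

```python
class RLEScanline:
    """Build one RLE target scanline as non-zero pixels arrive in order."""

    def __init__(self):
        self.runs = []      # list of [type, length, data], type "zero"/"vary"
        self.length = 0     # number of pixels covered so far

    def send(self, pos, value):
        """Deposit a non-zero pixel at position pos (pos >= self.length)."""
        if pos > self.length or not self.runs or self.runs[-1][0] != "vary":
            if pos > self.length:   # gap: declare a run of zeroes first
                self.runs.append(["zero", pos - self.length, None])
                self.length = pos
            self.runs.append(["vary", 0, []])   # start a new varying run
        run = self.runs[-1]
        run[1] += 1
        run[2].append(value)
        self.length += 1

    def finish(self, width):
        """Step 3: declare the remainder of the scanline to be zeroes."""
        if width > self.length:
            self.runs.append(["zero", width - self.length, None])
            self.length = width
```

Because only non-zero source pixels are ever visited, the cost is proportional to the occupied part of the image rather than to its full area.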
This RLE rotation algorithm delivers a dramatic performance improvement. If we consider a spacetime volume of n voxels on a side, then the brute force rotation would execute
in time proportional to the size of the volume, i.e., O(n³). On the other hand, the fast algorithm described here requires time proportional to the part of the volume with non-zero data.
The sensor samples the surface onto a grid of size roughly n x n, which is then blurred
into the spacetime volume by the spacetime impulse response (Equation 3.11). The width
of this impulse response function is constant, so the overall size of the spacetime surface
embedded in the volume is O(n²). The fast rotation therefore runs in O(n²) time,
substantially faster than the brute force method. In practice, we observe a typical speed-up
of 50:1 over the brute force approach.
[Figure 4.6 diagram: the target scanline's run list (types and lengths) at four successive stages, A-D]
Figure 4.6: Building a rotated RLE scanline. (a) The source image is traversed in scanline order
(dotted white lines) from top to bottom (solid black arrow to the left), and the non-zero values are
deposited in the target image. (b) The scanline indicated by the solid white arrow in the target image
is constructed in steps. T is the data type, and L is the run length. In A, the scanline is initialized
to have no data or length. In B, when the first source scanline intersects the target scanline, the first
run is declared to be zeroes (T = Zero or Z) and its length is computed. In addition, a varying run
(T = Varying or V) is begun. As successive source scanlines intersect the target scanline, more
varying data is added to the current run, which grows as needed. At C, more varying data is laid
down, but it is offset from the first varying run. Thus, a new run of zeroes is declared with its length,
and a new varying run is begun. Eventually, the source scanline sweep finishes, and the remainder
of the scanline is declared to be zeroes, as shown in D.
Figure 4.7: Interpolation of spacetime image data. (a) Extraction of range data from a spacetime
image followed by linear interpolation of the range points. (b) Same as (a), but with interpolation in
the spacetime image rather than among the range points.
Traditionally, researchers have extracted range data at sampling rates corresponding to one
range point per sensor scanline per unit time. Interpolation of shape between range points
has consisted of fitting primitives (e.g., linear interpolants like triangles) to the range points.
Instead, we can regard the spacetime volume as the primary source of information we have
about an object. After performing a scan, we have a sampled representation of the spacetime
volume, which we can then reconstruct to generate a continuous function. In the previous
chapter, we described this reconstruction for a spacetime image; for the spacetime volume,
we have added a vertical dimension, but the principle remains the same. This reconstructed
function then acts as our range oracle, which we can query for range data at a resampling
rate of our choosing. Figure 4.7 illustrates this idea for a single spacetime image. In practice,
we can magnify the sampled spacetime volume prior to applying the range imaging steps
described above. The result is a range grid with a higher sampling density directly derived
from the imaged light reflections.
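The magnification step can be sketched as below; linear interpolation is used for brevity, whereas in practice a higher-quality kernel such as the bicubic interpolant mentioned earlier would be preferred:

```python
import numpy as np

def magnify_rows(image, factor=3):
    """Resample each row of a spacetime image at `factor` times the
    original sampling density using linear interpolation."""
    n = image.shape[1]
    x_old = np.arange(n)
    x_new = np.linspace(0, n - 1, factor * (n - 1) + 1)
    return np.stack([np.interp(x_new, x_old, row) for row in image])
```

The upsampled image is then fed to the same rotation and Gaussian-fitting steps, yielding a denser range grid derived directly from the imaged light reflections.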
Figure 4.8: Measured error due to reflectance step. (a) Printed cards with reflectance discontinuities
were carefully taped to a machined planar surface. The black dots were used to distinguish the taped
boundary from the target surface. (b) A plot of range errors shows that the spacetime method yields
a 60-80% reduction in range error for steps varying from 1:1 to 12:1.
4.3 Results
We have performed a variety of tests to evaluate the performance of traditional triangulation
and spacetime analysis. These tests include experiments with reflectance variations, simple
shape variations, errors due to speckle, and scanning complex objects.
Figure 4.9: Reflectance card. (a) Photograph of a planar card with the word "Reflectance" printed
on it. (b) and (c) Shaded renderings of the range data generated by traditional mean analysis and
spacetime analysis, respectively. With traditional mean analysis, the planar card appears embossed
with the lettering, indicating a confusion between reflectance and shape. The spacetime analysis
yields a nearly planar surface, showing a disambiguation of reflectance and shape.
Figure 4.8 shows a plot of maximum deviations from planarity. The spacetime method has
clearly improved over the old method. The reduction in range errors varies from 65% for a
reflectance change of 2:1 up to 85% for a reflectance change of 12:1.
For qualitative comparison, we produced a planar sheet with the word "Reflectance"
printed on it. Figure 4.9 shows the results. The old method yields a surface with the characters well-embossed into the geometry, whereas the spacetime method yields a much more
planar surface, indicating successful decoupling of geometry and reflectance.
Figure 4.10: Measured error due to corner. (a) Two machined wedges with razor sharp corners were
placed at varying angles to each other to study the effect of corner angle on range accuracy. (b) A
plot of range errors shows that the spacetime method yields a 35-50% reduction in range error for
corners of angles varying from 110° to 150°.
Figure 4.11: Depth discontinuities and edge curl. (a) Photograph of two strips of paper. (b) and (c)
Shaded renderings of the range data generated by traditional mean analysis and spacetime analysis,
respectively. With traditional mean analysis, the edges of the strips exhibit large edge curl artifacts
(1.1 mm between the hash marks in (b)). The curl is nearly eliminated when using the spacetime
analysis.
Finally, we impressed the word "SHAPE" onto a plastic ribbon using a commonly available label maker. In Figure 4.9, we wanted the word "Reflectance" to disappear because it
represented changes in reflectance rather than in geometry. In Figure 4.12, we want the word
"SHAPE" to stay because it represents real geometry. Furthermore, we wish to resolve it as
finely as possible. Figure 4.12 shows the result. Using traditional mean analysis, the word
is barely visible. Using spacetime analysis, the word becomes legible.
Figure 4.12: Shape ribbon. (a) Photograph of a surface with raised lettering (letters are approx. 0.3
mm high). (b) and (c) Shaded renderings of the range data generated by traditional mean analysis
and spacetime analysis, respectively. Using mean pulse analysis, the lettering is hardly discernible,
while the spacetime analysis yields a legible copy of the original shape.
4.3.3 Speckle
To evaluate the influence of laser speckle, we scanned planar surfaces of varying roughness
under coherent and incoherent light. First, we verified the existence of the laser speckle.
Recording the light reflections from the surfaces under incoherent (single filament, non-
diffuse) illumination showed negligible variation (< 5%) in image intensity, while the images of laser reflections showed significant variations (> 20%) in peak intensity. These
facts, coupled with the observation that surfaces with roughness finer than the resolution
of the pixels yielded variations at a scale larger than a pixel, lead us to believe that we are
observing laser speckle.
Next, we performed range scans on the planar surfaces and generated range points using
the traditional and spacetime methods. After fitting planes to range points, we found that
both methods resulted in range errors with a standard deviation of about 0.06 mm and a
distribution that is very nearly Gaussian (see Figure 4.13). As expected, both traditional
and spacetime triangulation are susceptible to errors due to laser speckle. In Section 2.2.4,
we computed the range error for a perfectly rough surface to be 0.25 mm. Since the surface
being measured is unlikely to be perfectly rough, it is not surprising that the measured error
is less.
Figure 4.13: Distribution of range errors over a planar target. We extracted the range data using
spacetime analysis and fit a plane through the resulting data. The bar graph represents a histogram
of distances from the planar fit, while the curve corresponds to a Gaussian approximation to the histogram.
hardware and is particularly noisy. This added noisiness results from the method of pulse
analysis performed by the hardware, a method similar to peak detection. Peak detection is
especially susceptible to speckle noise, because it extracts a range point based on a single
value or small neighborhood of values on a noisy curve. Mean analysis tends to average out
the speckle noise, resulting in smoother range data as shown in Figure 4.14c. Figure 4.14d
shows our spacetime results and Figure 4.14e shows the spacetime results with 3X interpolation and resampling of the spacetime volume as described in Section 4.2. Note the sharper
definition of features on the body of the tractor.
Figure 4.15 shows the result for a model of an alien creature. As with the tractor image,
the Cyberware output is noisier than the traditional mean analysis results. Note, however,
that the traditional mean analysis appears to smooth the geometric detail more than the Cyberware method. Both the Cyberware and the mean analysis suffer from edge curl. By contrast, the spacetime method reduces edge curl, exhibits less noise, and attains comparable
or greater detail than either of the other methods.
Figure 4.14: Model tractor. (a) Photograph of original model and shaded renderings of range data
generated by (b) the Cyberware scanner hardware, (c) mean pulse analysis, (d) our spacetime analysis, and (e) the spacetime analysis with 3X interpolation of the spacetime volume before fitting the
Gaussians. Below each of the renderings is a blow-up of one section of the tractor body (indicated by
rectangle on rendering) with a plot of one row of pixel intensities. Because of the scanner hardware's
peak detection method, it is the most susceptible to speckle noise, as is evident in the rendering and in
the bumpy intensity plot. Mean analysis yields a decrease in noise and spacetime analysis yields
even less noise and more detail. The 3X interpolated spacetime analysis reveals the most detail of
all.
Figure 4.15: Model alien. (a) Photograph of original model and shaded renderings of range data
generated by (b) the Cyberware scanner hardware, (c) mean pulse analysis, and (d) our spacetime
analysis. The highlighted box in (c) and (d) shows the reduction of edge curl along the depth discontinuity at the alien's torso, where the CCD line of sight was occluded. Note that only the spacetime
method resolves the ridges in the right leg.
Chapter 5
Surface estimation from range images
The second part of this thesis is concerned with building computer models from range images such as might be generated using the method described in the first part of the thesis.
As stated in Chapter 1, the problem of reconstructing a surface from range images can
be stated as follows:
Given a set of p aligned, noisy range images, f̂_1, ..., f̂_p, find the manifold
that most closely approximates the points contained in the range images,

where each function f̂_i is a sampling of the actual surface f, and each sample, f̂_i(j, k), is
the observed distance to the surface as seen along the line of sight indexed by (j, k). The
problem of finding this manifold hinges on defining how well any manifold approximates
the range images, and then solving for the best manifold. In this chapter, we begin with a
discussion of prior work in reconstructing surfaces from range data. Next, we describe range
imaging in more detail and then construct a probabilistic model relating possible manifolds
to range image samples. In Section 5.4, we define the best surface as the Maximum Likelihood surface, and we define the integral that this surface must minimize. In Section 5.5,
we show how to bring the integral under a unified domain using a vector field analogy for
range image sampling. In Section 5.6, we derive some results from the calculus of variations needed for the solution of this minimization problem. Finally, in Section 5.7, we solve
the minimization equation for surface reconstruction from range images. As a special case,
we relate the least squares minimization (corresponding to Gaussian statistics) to the search
for a zero crossing of weighted sums of signed distance functions.
Discrete-state voxels
Among the discrete-state volumetric algorithms, Connolly [1984] casts rays from a range
image accessed as a quad-tree into a voxel grid stored as an octree, and generates results for
synthetic data. Chien et al. [1988] efficiently generate octree models under the severe assumption that all views are taken from the directions corresponding to the 6 faces of a cube.
Li & Crebbin [1994] and Tarbox & Gottschlich [1994] also describe methods for generating binary voxel grids from range images. None of these methods has been used to generate surfaces. Further, without an underlying continuous function, there is no mechanism for
representing range uncertainty in the volumes.
Continuous-valued voxels
The last category of our taxonomy is implicit function methods that use samples of a continuous function to combine structured data. Our method falls into this category. Previous
efforts in this area include the work of Grosso et al. [1988], who generate depth maps from
stereo and average them into a volume with occupancy ramps of varying slopes corresponding to uncertainty measures; they do not, however, perform a final surface extraction. Succi
et al. [1990] create depth maps from stereo and optical flow and merge them volumetrically
using a straight average of estimated voxel occupancies. The reconstruction is an isosurface
extracted at an arbitrary threshold. In both the Grosso and Succi papers, the range maps are
sparse, the directions of range uncertainty are not characterized, they use no time or space
optimizations, and the final models are of low resolution. Recently, Hilton et al. [1996] have
developed a method similar to ours in that it uses weighted signed distance functions for
merging range images, but it does not address directions of sensor uncertainty, incremental
updating, space efficiency, and characterization of the whole space for potential hole filling,
all of which we believe are crucial for the success of this approach.
Figure 5.1: From range image to range surface. The range image, $\hat{f}(i,j)$, in (a) is a set of points in
3D acquired on a regular sampling lattice. (b) shows the reconstruction of a range surface, $\tilde{f}(x,y)$,
using triangular elements. (c) is a shaded rendering of the range surface.

Each sample in a range image corresponds to a distance from the sensor along a single line of sight. A range surface, $\tilde{f}$, is a
piecewise continuous function reconstructed from a range image. Figure 5.1 shows a range
surface reconstructed using triangular elements to connect nearest neighbors in the range
image. Such a range surface is a piecewise linear reconstruction, and corresponds to a partial estimate of the shape and topology of the object. To avoid making topological errors
such as bridging depth discontinuities, researchers typically set an edge length or surface
orientation threshold when establishing connectivity of samples.
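To make this tessellation concrete, the following sketch (in Python with NumPy; the function name and array layout are our own illustration, not tied to any particular scanner's software) connects nearest neighbors on the sampling lattice into triangles and applies the edge-length threshold described above to avoid bridging depth discontinuities:

```python
import numpy as np

def range_surface(points, max_edge):
    """Tessellate a range image into triangles over its sampling lattice.

    points:   (H, W, 3) array of 3D range samples; NaNs mark missing data.
    max_edge: triangles with any edge longer than this are discarded,
              so that depth discontinuities are not bridged.
    Returns a list of vertex-index triples into the flattened point array.
    """
    H, W, _ = points.shape
    flat = lambda r, c: r * W + c
    triangles = []
    for r in range(H - 1):
        for c in range(W - 1):
            quad = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            for corner_ids in ((0, 1, 2), (1, 3, 2)):   # split quad in two
                corners = [quad[i] for i in corner_ids]
                p = [points[rr, cc] for rr, cc in corners]
                if any(np.isnan(q).any() for q in p):
                    continue                             # skip missing samples
                edges = [np.linalg.norm(p[i] - p[(i + 1) % 3]) for i in range(3)]
                if max(edges) <= max_edge:
                    triangles.append(tuple(flat(rr, cc) for rr, cc in corners))
    return triangles
```

With a large depth jump between adjacent samples, the triangles spanning the jump fail the edge test and are dropped, leaving a hole rather than a spurious bridge.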
Rangefinders might have lines of sight distributed across a range image in any one of a
number of different configurations. Figure 5.2 depicts several range scanning geometries
and their corresponding viewing frustums. An orthographic imaging frustum may arise
from a 2-axis translational scanning configuration as depicted in Figures 5.2a and 5.2b,
though no such commercial system exists to our knowledge. Figure 5.2c shows a time of
flight scanner with two rotating mirrors; the lines of sight for such a configuration are nearly
perspective, assuming the separation between the mirrors is small. Rangefinders are in general neither precisely orthographic nor precisely perspective. Optical distortions in real
range imaging systems typically lead to violations of paraxial assumptions about the imaging optics. In addition, some scanners deviate significantly from orthographic or perspective. For instance, some light stripe triangulation scanners sample an object with a perspective projection within the plane of light, but translate horizontally to fill out the range image.
Such a projection is a cylindrical or line perspective projection; i.e., it is orthographic in
the direction of scanning, and perspective within the scanning plane. This last scanning geometry corresponds to the Cyberware scanner used for this thesis (see Figures 5.2e and 5.2f).
When applying traditional methods of optical triangulation, samples lie along rays that correspond to the projection of each CCD scanline onto the laser sheet. These sampling lines of
sight need not coincide with the lines of sight of the laser (though they do happen to coincide
in the Cyberware scanner). Figure 5.3 illustrates this point. When applying the spacetime
analysis for triangulation, however, the sampling lines of sight follow the lines of sight of
the CCD.
While their range imaging frustums are diverse, most rangefinders do have in common
that the lines of sight of the sensor are well calibrated and the range uncertainty lies predominantly in determining depth along these lines of sight. For a time of flight rangefinder, for
example, the primary source of uncertainty is in determining the time of arrival of a light
pulse relative to its time of emission, resulting in depth errors along the direction of the emitted pulse. In optical triangulation, traditional analysis of imaged laser reflections leads to
uncertainties in depth along the laser beam or within the laser sheet, as shown in Figure 2.6.
Using the spacetime analysis for optical triangulation, the errors in determining the location of a peak in time lead to range errors along the camera's lines of sight, as shown in
Figure 3.5. Other researchers have also characterized range errors for optical triangulation,
and have found the error to be ellipsoidal, i.e., non-isotropic, about the range points [Hebert
et al. 1993] [Rutishauser et al. 1994].
Figure 5.2: Range imaging frustums. (a) An orthographic triangulation scanner using a beam of
light and a linear array sensor and (b) its orthographic viewing frustum. The sensor and light move
together, stepping in fixed increments horizontally and vertically to traverse its field of view. (c) A
time-of-flight scanner with rotating mirrors to scan the beam and sensor line of sight over the object
and (d) its approximately perspective viewing frustum (for small mirror separation, d). (e) A translating laser stripe triangulation scanner with an area sensor and (f) its cylindrical projection viewing
frustum when using traditional (not spacetime) triangulation analysis.
Figure 5.3: Range errors for traditional triangulation with two-dimensional imaging. A typical optical triangulation system uses a laser beam spread into a diverging sheet of light with a cylindrical
lens and a CCD that images reflections from within this sheet. Range is usually determined by computing the centers of the imaged laser reflections on each scanline. Thus, the errors in determining
the centers correspond to errors in range that vary along the projections of the scanlines onto the
laser sheet. Notice that the divergence of the laser sheet need not coincide with the divergence of the
projected scanlines.
To relate possible surfaces to the observed data, we consider the conditional probability

$$pdf(f \mid \hat{f}^1, \ldots, \hat{f}^p)$$

where $f$ is a possible manifold approximating the range images $\{\hat{f}^i\}$. In order to make the
analysis feasible, we make some assumptions about the nature of this pdf . At the end of
this chapter, we will discuss the validity of these assumptions. First, we assume that the
uncertainties in the range images are independent of one another, giving us:
$$pdf(f \mid \hat{f}^1, \ldots, \hat{f}^p) = \prod_{i=1}^{p} pdf(f \mid \hat{f}^i) \qquad (5.1)$$
Next, we assume that sample errors within a range image are independently distributed:
$$pdf(f \mid \hat{f}^i) = \prod_{j=1}^{m}\prod_{k=1}^{m} pdf(f \mid \hat{f}^i(j,k))$$

This assumption requires that the registration of the range images is very accurate. Further, if we consider the
sampling errors to be distributed along the lines of sight of the sensor, then:

$$pdf(f \mid \hat{f}^i) = \prod_{j=1}^{m}\prod_{k=1}^{m} pdf(f_i(j,k) \mid \hat{f}^i(j,k)) \qquad (5.2)$$

where $f_i(j,k)$ is the reparameterization of $f$ over the domain of the $i$th range image, resampled along the sensor direction at $(j,k)$. Combining Equations 5.1 and 5.2, we have:

$$pdf(f \mid \hat{f}^1, \ldots, \hat{f}^p) = \prod_{i=1}^{p}\prod_{j=1}^{m}\prod_{k=1}^{m} pdf(f_i(j,k) \mid \hat{f}^i(j,k)) \qquad (5.3)$$
Choosing a single function that best approximates the range surfaces takes us from
the realm of probability into the realm of statistics. A common approach is Maximum
Likelihood (ML), which chooses the parameter values that maximize a pdf [Papoulis 1991].
In our case, we seek the surface function, $f$, that maximizes the pdf
described in Equation 5.3. Note that maximizing the pdf is equivalent to minimizing
$-\log(pdf)$, due to the strictly monotonic nature of the logarithm function. Thus, we can
transform our problem to solving for the $f$ that minimizes:

$$E(f) = -\sum_{i=1}^{p}\sum_{j=1}^{m}\sum_{k=1}^{m} \log\, pdf(f_i(j,k) \mid \hat{f}^i(j,k))$$
Figure 5.4: Two range surfaces, $f_1$ and $f_2$, are tessellated range images acquired from directions
$\mathbf{v}_1$ and $\mathbf{v}_2$. In the case of Gaussian statistics, the possible range surface, $z = f(x,y)$, is evaluated
in terms of the weighted squared distances to points on the range surfaces taken along the lines of
sight to the sensor. A point, $(x,y,z)$, is shown here being evaluated to find its corresponding signed
distances, $d_1$ and $d_2$, and weights, $w_1$ and $w_2$.
A difficulty with this formulation, however, is the fact that the function f need only min-
imize the function E (f ) for the points that project onto the surface for each sampled range
image. In between these points, the surface need not even be continuous. In fact, the ML
estimate of f would simply be the original points in the range images. Even if we required
continuity, the function f could simply interpolate the points in the range images and be
almost arbitrarily ill-behaved between points. To remedy this problem, we could impose
some continuity and smoothness constraints on the function f . Alternatively, we can interpret the range images as range surfaces; i.e., by suitable interpolation of the range images,
we can generate projected surface estimates of f . Figure 5.4 illustrates this principle. In
this case, the function to minimize becomes:
$$E(f) = \sum_{i=1}^{p} \iint_{A_i} g\!\left(f_i(u,v),\, \tilde{f}_i(u,v)\right) du\, dv \qquad (5.4)$$

where $\tilde{f}_i$ is the interpolated range surface corresponding to $\hat{f}^i$, we have replaced $-\log[pdf(\cdot)]$
with $g(\cdot)$, and we have taken the limit of the summation to be an integral. Further, the $(u,v)$
notation is a parameterization of the sensor's sampling rays. This parameterization could
take different forms depending on the scanning geometry.
If we assume the range errors obey Gaussian statistics, the pdf for each resampled point takes the form:

$$pdf(f_i(u,v) \mid \tilde{f}_i(u,v)) = c_i(u,v)\, \exp\!\left[-\frac{1}{2}\left(\frac{f_i(u,v) - \tilde{f}_i(u,v)}{\sigma_i(u,v)}\right)^2\right]$$

where $c_i$ is the normalization factor and $\sigma_i$ is the standard deviation. The numerator in the
squared term is simply the distance between reconstructed range surface points and points
on the candidate surface, where the distance is taken along the sensing direction:

$$d_i(u,v) = f_i(u,v) - \tilde{f}_i(u,v)$$

Dropping the constant terms, the minimization equation becomes:

$$E(f) = \frac{1}{2} \sum_{i=1}^{p} \iint_{A_i} w_i(u,v)\, d_i(u,v)^2\, du\, dv$$

where $w_i(u,v) = 1/\sigma_i(u,v)^2$.
Thus, the assumption of Gaussian statistics leads to a weighted least squares minimization, where distances are taken between range surfaces and the minimizing function along
range image viewing directions.
Note that the domain of integration in the previous equations depends on the viewpoint
and ray distribution of the sensor. By choosing a common domain of integration, we can
bring the summation under the integral and attempt to minimize E (f ) by applying the calculus of variations.
$$E(f) = \iint_{A} \sum_{i=1}^{p} e_i(x,y,f)\, dx\, dy$$

where we have chosen the domain of integration to be the $x$-$y$ plane over which our candidate surface, $f(x,y)$, is a function. The function $e_i(x,y,f)$ is the error associated with the
point, $(x,y,f)$, on the surface as measured with respect to the $i$th range image.
To help see the connections we need to make, we can rewrite Equation 5.4 as:

$$E(f) = \sum_{i=1}^{p} \iint_{A_i} g_i(u,v,f_i)\, du\, dv$$

where $g_i(u,v,f_i)$ absorbs the dependence on the range surface $\tilde{f}_i(u,v)$. Relating each per-sensor domain $(u,v)$ to the common domain $(x,y)$ amounts to a change of variables, which would ordinarily be carried out by computing a Jacobian. However, rather than compute the Jacobian directly, we prefer to derive the relationship between the differentials explicitly using geometric arguments. The intuitions
gained help in generalizing from the simplest case, orthographic projection, to more arbitrary viewing frustums such as line perspective. Thus, we will first consider the case of an
orthographic sensor.
Figure 5.5: Differential area relations for reparameterization. Orthographic range images are acquired from different viewpoints, yielding surface parameterizations over domains such as $(x_i, y_i)$
and $(x_j, y_j)$. Instead, we want to estimate a surface over a canonical domain $(x, y)$. When integrating over this canonical domain, we must relate differential areas in this domain to areas in the
domains of the original range images.
Consider a differential area element on the candidate surface $z = f(x,y)$, which we re-project onto the $i$th range image's viewing plane, as indicated in Figure 5.5. We use Cartesian $(x_i, y_i, z_i)$ coordinates to represent $(u, v, f_i)$. First, we note that
moving a unit distance in the $x$-direction in the $(x,y)$ domain corresponds to moving along
the vector $\mathbf{b}_x$ on the surface $f$, and similarly for the $y$-direction, where:

$$\mathbf{b}_x = \left(1,\; 0,\; \frac{\partial z}{\partial x}\right) \qquad \mathbf{b}_y = \left(0,\; 1,\; \frac{\partial z}{\partial y}\right)$$

The cross product of these vectors,

$$\mathbf{a} = \mathbf{b}_y \times \mathbf{b}_x = \left(\frac{\partial z}{\partial x},\; \frac{\partial z}{\partial y},\; -1\right)$$

relates the differential area on the surface to the differential area in the canonical domain,

$$dA_f = |\mathbf{a}|\, dA$$

and the normal, $\mathbf{n}$, to the surface is:

$$\mathbf{n} = \frac{\mathbf{a}}{|\mathbf{a}|}$$

The projected area, $dA_i$, seen by the $i$th sensor along its viewing direction $\mathbf{v}_i$ is then $-(\mathbf{n}\cdot\mathbf{v}_i)\, dA_f$, so the error integral becomes:

$$E(f) = \iint_{A} \sum_{i=1}^{p} e_i(x,y,z)\left[\left(-\frac{\partial z}{\partial x},\; -\frac{\partial z}{\partial y},\; 1\right)\cdot \mathbf{v}_i\right] dx\, dy \qquad (5.5)$$

where we have replaced $f(x,y)$ with $z$ on the right hand side. This equation is now ready for
minimization under the calculus of variations, but it applies only to orthographic projections
in this form.
How does this equation change when we handle more complex projections, such as perspective or line perspective? In these cases, we would find that the direction of projection is
not constant as in the orthographic case, but varies over space. Further, the contribution of
a surface element varies not only with the orientation of the element, but also with distance
from the center of projection. For example, as an element moves away from the center of
projection for the perspective case, its apparent size falls off with $1/r^2$, where $r$ is the distance from the center.
One approach to the problem might be to define a suitable projection surface that captures the variation in orientation of projection rays and onto which we can project a surface
element to compute its contribution to the error integral. For example, we could define a
unit sphere about the center of projection for the perspective case. The directions of projection would run radially, normal to the surface of the sphere, and we could project a surface
element $dA_f$ onto the sphere and use this area as the measure of the contribution of the element. This definition leads to measuring $dA_f$ in terms of the solid angle it subtends.
This is exactly what we would expect, and it will yield the desired $1/r^2$ fall-off. For the
line perspective, the projection surface would be a cylinder of unit radius, yielding projection rays that emanate radially from the center line, and the contribution of a surface element
would fall off as $1/r$, where $r$ is the distance from the center line.
While the use of these projection surfaces will work in a number of cases, it is too restrictive a definition and will not work in all cases. Instead, we propose an alternative and
more general way of looking at these projections. Consider the two-dimensional case of a
perspective projection as shown in Figure 5.6. The projection rays radiate from the center
of projection. For every point in space (except the center of projection), a single ray passes
through that point in a particular direction. The directions evaluated throughout space define a vector field as shown in Figure 5.6b. We could then compute the projected length of
a unit line element by dotting the normal of the element with the projection direction at the
center of the element.
The direction field alone, however, does not tell us the contribution the element would
have to an integral over all directions in the projection. This contribution should be a measure of how many rays cross a unit line segment in the direction of projection. In our
example, this ray density should fall off with the inverse distance from the center of projection. By weighting our direction field by the ray density measure, we arrive at a new
Figure 5.6: Ray density and vector fields in 2D. (a) For a perspective projection in 2D, the sensor is
effectively a point source casting rays to sample space. (b) We can view this set of rays as defining a
vector field, where each point in space has an associated ray direction. The contribution of elements
in the field for the purposes of integration is defined by how many rays pass through the elements;
element $L_1$ receives more rays than element $L_2$, because it is closer to the sensor. It also receives
more rays than $L_3$, because it is more normal to the ray direction. (c) To encode the variations in ray
density, we multiply the direction field in (b) by the ray density. Now the contribution of an element
is $|L|\, \mathbf{n} \cdot \mathbf{v}(x,y)$.
vector field, depicted in Figure 5.6c. For a unit line segment, dotting the normal of the segment with the strength of this vector field will yield the total contribution of the element.
This ray density field has a familiar physical interpretation. If we think of the center of
projection as being a point source sending equal amounts of particles in all directions, then
the ray density field is the vector flow field describing the direction and rate of particle
flow across unit projected lengths. If we think of these particles as photons, we arrive at a
description of the light field about the point source, and we can measure the contribution of
a surface element as the amount of light that flows through it. We assume the projections
are one-to-one, so the light fields will have a single flow direction at every point.

One of the most important elements of this physical analogy is the fact that the projection rays emanate from the sensor but flow forever without being extinguished or
created anew in space. In other words, this flow field is conservative at all points outside
of the regions where the rays originate. Further, the projection rays are fixed for each range
scan; i.e., the flow field is static. These conservative and static properties will prove very
useful in the next section.
In general, any range imaging frustum will define a vector field whose directions are
consistent with the projection rays and whose magnitude is equal to the flux of rays crossing
a unit projected area. For the case of an orthographic projection, this vector field is simply
a unit vector in the direction of projection:

$$\mathbf{v}(x,y,z) = \hat{\mathbf{v}}$$

A perspective projection yields a radial field with $1/r^2$ fall-off:

$$\mathbf{v}(\mathbf{r}) = \frac{\hat{\mathbf{r}}}{r^2}$$

where $\hat{\mathbf{r}} = \mathbf{r}/|\mathbf{r}|$. The line perspective results in a cylindrically radial field with $1/r$ fall-off:

$$\mathbf{v}(r, x) = \frac{\hat{\mathbf{r}}}{r}$$
In the latter two cases, we have expressed the vector fields in spherical and cylindrical
coordinates, but they are all equivalently expressible in Cartesian coordinates.
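As a quick sanity check on the claim that these fields are source-free away from the sensor, one can estimate the divergence of the perspective field $\hat{\mathbf{r}}/r^2$ numerically (a small NumPy sketch; the finite-difference step and the test point are arbitrary choices of ours):

```python
import numpy as np

def perspective_field(p):
    """Ray-density field of a perspective sensor at the origin: v = r_hat / r^2."""
    r = np.linalg.norm(p)
    return p / r**3          # r_hat / r^2 == p / |p|^3

def divergence(field, p, h=1e-4):
    """Central-difference estimate of div(field) at the point p."""
    total = 0.0
    for k in range(3):
        step = np.zeros(3)
        step[k] = h
        total += (field(p + step)[k] - field(p - step)[k]) / (2.0 * h)
    return total

# Away from the center of projection, the flow has no sources or sinks,
# so the estimated divergence should be numerically zero.
print(divergence(perspective_field, np.array([1.0, 2.0, -0.5])))
```
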
With this new vector flow field analogy, we can now express the relation between $dA_i$ and $dA$:

$$dA_i = \left[\left(-\frac{\partial z}{\partial x},\; -\frac{\partial z}{\partial y},\; 1\right)\cdot \mathbf{v}_i(x,y,z)\right] dA$$

so that the error integral becomes:

$$E(f) = \iint_{A} \sum_{i=1}^{p} e_i(x,y,z)\left[\left(-\frac{\partial z}{\partial x},\; -\frac{\partial z}{\partial y},\; 1\right)\cdot \mathbf{v}_i(x,y,z)\right] dx\, dy \qquad (5.6)$$

This equation is almost identical to Equation 5.5, except that the constant vector $\mathbf{v}_i$ has been
replaced with a vector field $\mathbf{v}_i(x,y,z)$. For the special case of Gaussian statistics, this integral becomes:

$$E(f) = \iint_{A} \sum_{i=1}^{p} w_i(x,y,z)\, d_i(x,y,z)^2 \left[\left(-\frac{\partial z}{\partial x},\; -\frac{\partial z}{\partial y},\; 1\right)\cdot \mathbf{v}_i(x,y,z)\right] dx\, dy \qquad (5.7)$$
Note how the integration over a common domain brings the summation under the integral. This new formulation allows us to apply the calculus of variations to find the $z = f(x,y)$ that minimizes integrals of the form:

$$I = \iint h\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) dx\, dy$$

A central result of the calculus of variations states that the minimizing function must obey a partial differential equation known as the Euler-Lagrange equation [Weinstock 1974]:

$$\frac{\partial h}{\partial z} - \frac{\partial}{\partial x}\frac{\partial h}{\partial (\partial z/\partial x)} - \frac{\partial}{\partial y}\frac{\partial h}{\partial (\partial z/\partial y)} = 0$$
We can easily extend this result to the case of minimizing integrals of sums of functions:

$$I = \iint \sum_{i=1}^{p} h_i\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) dx\, dy$$

for which the minimizing function must satisfy:

$$\sum_{i=1}^{p}\frac{\partial h_i}{\partial z} - \frac{\partial}{\partial x}\frac{\partial \sum_{i=1}^{p} h_i}{\partial (\partial z/\partial x)} - \frac{\partial}{\partial y}\frac{\partial \sum_{i=1}^{p} h_i}{\partial (\partial z/\partial y)} = 0$$
In order to solve an equation such as Equation 5.6 described in the previous section, we
must first propose the following theorem.

Theorem 5.1 Given the integral:

$$I = \iint h\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) dx\, dy$$

where:

$$z = f(x, y)$$

and the function $h$ is of the form:

$$h\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) = e(x,y,z)\left[\left(-\frac{\partial z}{\partial x},\; -\frac{\partial z}{\partial y},\; 1\right)\cdot \mathbf{v}(x,y,z)\right]$$

then the function $z$ that minimizes the integral satisfies the relation:

$$\mathbf{v}\cdot\nabla e + e\,\nabla\!\cdot\mathbf{v} = 0 \qquad (5.8)$$

See Appendix A for the proof. Under the conditions that $\mathbf{v}(x,y,z)$ corresponds to a
conservative and static vector flow field, this result simplifies:

Corollary 5.1 Given the integral:

$$I = \iint h\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) dx\, dy$$

where:

$$z = f(x, y)$$

and the function $h$ is of the form:

$$h\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) = e(x,y,z)\left[\left(-\frac{\partial z}{\partial x},\; -\frac{\partial z}{\partial y},\; 1\right)\cdot \mathbf{v}(x,y,z)\right]$$

where $\mathbf{v}(x,y,z)$ corresponds to a conservative and static vector flow field, then the function
$z$ that minimizes the integral satisfies the relation:

$$\mathbf{v}\cdot\nabla e = 0 \qquad (5.9)$$

This simplification follows because the divergence of the flow field ($\nabla\!\cdot\mathbf{v}$)
must be exactly zero if the field is not changing with time [Schey 1992]. Thus, the result of
Theorem 5.1 is simplified by removing the divergence term.
Corollary 5.1 has an intuitive interpretation. Consider minimizing the same integral for
a constant vector field (corresponding to an orthographic projection) oriented in the $+z$ direction. The integrand would become $h = e(x, y, z)$, and the minimizing function would satisfy:

$$\frac{\partial e}{\partial z} = 0$$

The only reason $h$ becomes dependent on the partials in the theorem is so that we can solve
the problem over a new domain, rotated with respect to the original domain (for the orthographic case). The solution in this new domain is actually the same; we take the derivative
along the direction in which the function can vary. We can see this by noting that the dot
product in Equation 5.9 is equivalent to a directional derivative taken along the line of sight
for the original domain.
In order to solve the minimization equation posed in the previous section (Equation 5.6),
we need to extend Theorem 5.1 to minimization over sums of functions. This leads to the
following corollary:

Corollary 5.2 Given the integral:

$$I = \iint h\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) dx\, dy$$

where:

$$z = f(x, y)$$

and the function $h$ is of the form:

$$h\!\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) = \sum_{i=1}^{n} e_i(x,y,z)\left[\left(-\frac{\partial z}{\partial x},\; -\frac{\partial z}{\partial y},\; 1\right)\cdot \mathbf{v}_i(x,y,z)\right]$$

where the $\mathbf{v}_i(x,y,z)$ correspond to conservative and static vector flow fields, then the function $z$ that minimizes the integral satisfies the relation:

$$\sum_{i=1}^{n} \mathbf{v}_i\cdot\nabla e_i = 0$$

Applying this corollary to the minimization of Equation 5.6, the optimal surface must satisfy:

$$\sum_{i=1}^{p} \mathbf{v}_i(x,y,z)\cdot\nabla e_i(x,y,z) = 0$$

Noting that a unit vector dotted with the gradient of a function is the same as taking the
derivative in the direction of that vector, we can re-write the equation as:

$$\sum_{i=1}^{p} |\mathbf{v}_i|\, D_{\hat{\mathbf{v}}_i}[e_i] = 0$$

where $D_{\hat{\mathbf{v}}_i}$ represents the derivative in the direction of the sensor line of sight, $\hat{\mathbf{v}}_i$, and we
have dropped the $(x,y,z)$ notation.
Consider the case where $e_i$ corresponds to the $-\log(pdf)$ with Gaussian statistics. The
minimization equation becomes:

$$\sum_{i=1}^{p} |\mathbf{v}_i|\, D_{\hat{\mathbf{v}}_i}[w_i d_i^2] = 0$$

To simplify this relation, we note that the weighting function varies across the range
image, but not along the sensor lines of sight. Accordingly:

$$D_{\hat{\mathbf{v}}_i}[w_i d_i^2] = 2\, w_i\, d_i\, D_{\hat{\mathbf{v}}_i}[d_i]$$

Next, we note that the distance from the range surface is a linear function of constant
slope along the viewing direction:

$$D_{\hat{\mathbf{v}}_i}[d_i] = 1$$

Substituting and dropping the constant factor, we have:

$$\sum_{i=1}^{p} |\mathbf{v}_i|\, w_i\, d_i = 0 \qquad (5.10)$$

Thus, we arrive at a surprisingly simple result: the weighted least squares surface is determined by the zero-isosurface of the sums of the weighted signed distance functions.

In the orthographic case, the value of $|\mathbf{v}_i|$ is unity, so the minimization equation reduces
to:

$$\sum_{i=1}^{p} w_i\, d_i = 0$$

In the perspective case, $|\mathbf{v}_i| = 1/r_i^2$, and the equation becomes:

$$\sum_{i=1}^{p} \frac{w_i\, d_i}{r_i^2} = 0$$

where the center of projection of the $i$th range image is at $(x_i^c, y_i^c, z_i^c)$, and:

$$r_i = \sqrt{(x - x_i^c)^2 + (y - y_i^c)^2 + (z - z_i^c)^2}$$
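The zero-crossing result can be illustrated with a minimal one-dimensional sketch: along a single orthographic line of sight, the zero of the sum of weighted signed distance functions is just the weighted average of the observed depths (the depths and weights below are made-up numbers for illustration):

```python
import numpy as np

# Two observations of the same surface point along one orthographic line of
# sight: observed depths z_i with confidence weights w_i (made-up numbers).
z_obs = np.array([1.0, 1.4])
w = np.array([3.0, 1.0])

def D(z):
    """Sum of weighted signed distances: each observation contributes
    d_i(z) = z - z_i, a unit-slope signed distance along the line of sight."""
    return np.sum(w * (z - z_obs))

# The weighted least squares depth is the zero crossing of D, which here is
# simply the weighted average of the observations: (3*1.0 + 1*1.4)/4 = 1.1.
zs = np.linspace(0.0, 2.0, 2001)
zero = zs[np.argmin(np.abs([D(z) for z in zs]))]
print(zero)
```
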
5.8 Discussion
We have shown that, under a set of assumptions, the surface of Maximum Likelihood derived from a set of range images can be found as a zero isosurface of a function over 3-space.
The function defined in this domain is a sum of functions generated by each range image,
and thus may be updated one range image at a time. For the case of a weighted least squares
minimization, the optimal surface is the zero isosurface of the sum of the weighted signed
distances along the lines of sight of the range images. Here we review the assumptions and
evaluate the validity of each.
All range image samples are statistically independent. Each range image is likely to
have little dependence on the other range images, since they are taken as separate events.
However, whether or not samples within a range image exhibit independence depends on
the range scanning device. Consider a scanner that projects an illuminant on one part of the
surface, records the range, and moves the illuminant to the next sample position. If the illuminant projected onto the surface does not overlap with adjacent samples, then we would
expect there to be no statistical correlation among samples. On the other hand, if the illuminant does overlap at different positions, then we might expect some correlations. For a
well designed scanner, the illuminant is usually quite narrow, and/or the sensor lines of sight
have little overlap (held tightly in focus), so that any dependence among samples is highly
local, restricted to nearby samples. In this case, we can regard the assumption of statistical
independence as an approximation to the more exact solution that accounts for local sample interdependence. Still, scanners are likely to have some element of uncertainty that is
independent among samples. Thermal noise in electronic circuitry, for example, will exist
regardless of the proximity of samples, leading to some independent range errors.
Range uncertainties are purely directional and are aligned along the lines of sight of
the sensor which are known to high accuracy. To some extent, this statement may seem to
be a tautology; we have defined a range imaging sensor as a device that returns depth values
along specified lines of sight. When the sensor errs, it returns an erroneous depth, which
must be along the line of sight of the sensor. In the previous chapters, we discussed the
errors in optical triangulation, and these errors manifested themselves along either the lines
of sight of the illuminant or the imaging device, depending on what kind of analysis was used in
determining depth. However, a sensor may also have some uncertainty in its lines of sight.
For example, when performing traditional optical triangulation, if a sweeping illuminant has
some wobble in its path, then its lines of sight may not be consistent. This inconsistency
would lead to errors in depth along the line of sight, as well as errors that vary side to side,
perpendicular to the line of sight. The error might then take an elliptical form, rather than a
purely directional form. Nonetheless, the devices that control motion can be very precise,
and sensors such as CCDs have been shown to exhibit excellent geometric stability. As
a result, range imaging sensors can be designed to have excellent accuracy in the lines of
sight, leaving depth uncertainty as the dominant error in the system.
Errors in alignment are much smaller than errors in range samples. The problem
of aligning range images has received a fair amount of attention, and the results indicate
that a number of algorithms can achieve accurate alignment among pairs of range images
[Gagnon et al. 1994] [Turk & Levoy 1994]. Figure 5.7 shows the results of performing
alignment with one of these pairwise methods. The problem of aligning a set of range images simultaneously, however, requires new alignment algorithms. Consider a set of range
images taken from 8 points on the compass around an object. By aligning the range images in a pairwise fashion, we can walk around the object. But by the time we have come
full circle, we find that the alignment errors have accumulated, and the first and last range
images are now significantly misaligned. A better approach is to define an error function
that relates all range images to one another, and then minimize this function all at once.
Only recently are solutions to this total alignment problem emerging [Bergevin et al. 1996]
[Stoddart & Hilton 1996]. Nonetheless, through a combination of good initial alignment
and some heuristics for choosing how to perform the pairwise alignments, researchers have
demonstrated that it is still possible to obtain reasonably accurate results. Our results, which
we discuss in Chapter 8, show that the average distance between range samples and their
projections onto the reconstructed surface is roughly the same as the uncertainty in the individual samples. This fact indicates that range errors dominate the alignment errors.
The range images yield viable range surfaces. While many range scanners generate
a dense grid of depth samples, the analysis in this chapter requires the use of (piecewise)
continuous surfaces. As described earlier, we can use the depth samples to reconstruct the
desired range surfaces. In the case of the spacetime analysis described in the previous chapters, continuous reconstruction of the spacetime volume can indeed lead directly to range
surfaces². If the surface being sampled is smooth relative to the sampling rate, then the
resulting range surface will be a reasonable estimate of the object's shape. However, the
apparent smoothness of a surface changes with viewing direction. A surface viewed at a
grazing angle will be sampled far less than a surface that is facing the sensor. The grazing
angle surface may even be undersampled, yielding a questionable, possibly aliased reconstruction. To ameliorate this problem, researchers avoid reconstructing the range surface
² In practice, we perform the reconstruction at a fixed resolution, leading to range images that still must be
converted to range surfaces.
Figure 5.7: Aligned range images. Two range images were taken 30 degrees apart with an optical
triangulation scanner. (a) and (b) show the corresponding range surfaces. (c) shows a rendering of
the two range surfaces after alignment; the surface from (b) is rendered with a darker (red) color.
(d) shows a blow-up of the head of the model. Note that the range surfaces are so close as to be
inter-penetrating across the surface. The RMS error in distances between nearest points on the two
surfaces is on the order of the error in individual range samples, indicating that the majority of the
uncertainty remains in sampling error rather than alignment. Note that in some areas of overlap in
(c) and (d), one surface is in front of the other in large regions, suggesting correlated range errors.
However, these regions are significantly larger than the neighborhoods spanned by the illumination
of the range scanner, making it unlikely that correlation in range errors is the cause. We suspect these
coherent overlap areas result from small miscalibrations in the range scanner.
in areas that are known to be at sufficiently grazing angles and downweight contributions
from surfaces that are at moderately grazing angles. For example, when using the Gaussian
model with weighted signed distances, we can lower the weights in accordance with the orientation of the surface with respect to the sensor. While this is not strictly consistent with
the notion of variance in Gaussian statistics, it yields excellent results in practice, as we will
show in the following chapters.
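A minimal sketch of such an orientation-dependent weight is shown below (the cosine form and the clamp threshold are illustrative choices of ours, not values prescribed by this chapter):

```python
import numpy as np

def orientation_weight(normal, view_dir, min_dot=0.15):
    """Downweight range samples seen at grazing angles.

    normal:   unit surface normal at the sample.
    view_dir: unit vector from the surface toward the sensor.
    The weight is the cosine of the angle between them, clamped to zero
    below min_dot so that severely grazing (likely undersampled) samples
    are discarded outright.  min_dot is an arbitrary choice here.
    """
    d = float(np.dot(normal, view_dir))
    return d if d >= min_dot else 0.0

print(orientation_weight(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])))  # 1.0
```

A sample facing the sensor head-on keeps full weight, while one viewed edge-on contributes nothing to the weighted signed distance sum.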
In addition, points on the range surface that do not coincide with range samples will not
appear to be statistically independent, even if the range samples are. Consider a 2D world
where our sensor has acquired two neighboring range samples. We can reconstruct the range
surface by linearly interpolating between the two samples, i.e., by connecting them with a
line segment. We may regard the two samples as statistically independent, but the points
on the line segment adjoining them clearly depend on the original samples. While this is
not consistent with the assumption that points on the range surfaces are statistically independent, the results do indicate that regarding them as independent is a reasonable approximation.
The range images are functions over the domain where a surface is sought. When
computing the ML optimal surface, we assume that each range image is sampling the same
surface from different points of view. However, an object is generally not a single-valued
function with respect to any single point of view. If we trace a ray from the sensor and
intersect it with the object, we will see the nearest point on the front side of the object,
but if we imagine continuing the ray beyond the first intersection, it will eventually intersect
the back side of the object. Two different range images will sample the front and back
sides of the object, and clearly they are not seeing the same points. Care must be taken so
that when determining the optimal surface, we do not presume that these two range images
are measuring the same surface. In the next chapter, we discuss efforts to make sure that
opposing surfaces are not treated as the same surface.
The least squares formulation assumes that the range uncertainties obey Gaussian statistics. The development in the previous sections is suited to any statistical model for directional range uncertainty; nonetheless, Gaussian statistics are appealing because of their simplicity, and in our case they lead to signed distance functionals. When might this apply? In some scanners, individual samples are too noisy, so the sampling is repeated along a single
line of sight, and the results are averaged. Almost regardless of the statistics of the errors in the individual samples, averaging them together tends to give the combined error a Gaussian character; indeed, the central limit theorem tells us that the distribution becomes more nearly Gaussian with each additional sample included in the average. In addition, our studies of errors due to laser speckle in optical triangulation indicate that the error distributions do have a Gaussian appearance, as shown in Figure 4.13.
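To make the central limit behavior concrete, the following Python sketch (purely illustrative; the sample counts and seed are arbitrary choices, not part of the scanner pipeline) averages uniform errors and measures the excess kurtosis, which is -1.2 for a uniform distribution and 0 for a Gaussian:

```python
import random

# Average n_avg uniform samples per trial; by the central limit theorem,
# the distribution of the averages approaches a Gaussian as n_avg grows.
def averaged_samples(n_avg, trials=20000, seed=1):
    rng = random.Random(seed)
    return [sum(rng.uniform(-1.0, 1.0) for _ in range(n_avg)) / n_avg
            for _ in range(trials)]

def excess_kurtosis(xs):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / var ** 2 - 3.0   # zero for a Gaussian, -1.2 for uniform

k1 = excess_kurtosis(averaged_samples(1))    # raw uniform errors
k16 = excess_kurtosis(averaged_samples(16))  # averages of 16 samples
```

Averaging 16 samples pulls the kurtosis from roughly -1.2 toward the Gaussian value of zero, mirroring the behavior described above.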
Chapter 6
A New Volumetric Approach
The discussion in the previous chapter provides a mathematical framework for merging
range images. In this chapter, we begin with a description of how we can implement this
framework on a volumetric grid (Section 6.1). Then, we extend the framework to represent
knowledge about the emptiness of space around an object (Section 6.2). This additional
knowledge leads to a simple algorithm for filling gaps in the reconstruction where no surfaces have been observed. In Section 6.3, we analyze sampling and filtering issues that arise
from working with a sampled representation. Finally, we discuss the limitations of the volumetric method described in this chapter and offer some possible solutions (Section 6.4).
surface along the line of sight to the sensor. We construct this function by combining signed distance functions d1(x), d2(x), ..., dn(x) and weight functions w1(x), w2(x), ..., wn(x) obtained from range images 1 ... n. Our combining rules give us for each voxel a cumulative signed distance function, D(x), and a cumulative weight, W(x), and the surface we seek is the isosurface D(x) = 0. As shown in the previous chapter, this isosurface is optimal in the least squares sense.
Figure 6.1 illustrates the principle of combining unweighted signed distances for the
Figure 6.1: Unweighted signed distance functions in 3D. (a) A range sensor looking down the x-axis observes a range image, shown here as a reconstructed range surface. Following one line of
sight down the x-axis, we can generate a signed distance function as shown. The zero crossing of
this function is a point on the range surface. (b) The range sensor repeats the measurement, but noise
in the range sensing process results in a slightly different range surface. In general, the second surface
would interpenetrate the first, but we have shown it as an offset from the first surface for purposes of
illustration. Following the same line of sight as before, we obtain another signed distance function.
By summing these functions, we arrive at a cumulative function with a new zero crossing positioned
midway between the original range measurements.
Figure 6.2: Signed distance and weight functions in one dimension. (a) The sensor looks down the x-axis and takes two measurements, r1 and r2. d1(x) and d2(x) are the signed distance profiles, and w1(x) and w2(x) are the weight functions. In 1D, we might expect two sensor measurements to have the same weight magnitudes, but we have shown them to be of different magnitude here to illustrate how the profiles combine in the general case. (b) D(x) is a weighted combination of d1(x) and d2(x), and W(x) is the sum of the weight functions. Given this formulation, the zero-crossing, R, becomes the weighted combination of r1 and r2 and represents our best guess of the location of the surface. In practice, we truncate the distance ramps and weights to the vicinity of the range points.
simple case of two range surfaces sampled from the same direction. Note that the resulting isosurface would be the surface created by averaging the two range surfaces along the sensor's lines of sight. In general, however, weights will vary across the range surfaces.
Figure 6.2a shows a sensor that looks down the x-axis and has taken two measurements, r1 and r2. The signed distance profiles, d1(x) and d2(x), may extend indefinitely in either direction, but the weight functions, w1(x) and w2(x), taper off behind the range points for reasons discussed below.
Figure 6.2b is the weighted combination of the two profiles. The combination rules are
straightforward:
D(x) = \frac{\sum_i w_i(x) \, d_i(x)}{\sum_i w_i(x)}    (6.1)
W(x) = \sum_i w_i(x)    (6.2)
where di(x) and wi(x) are the signed distance and weight functions from the i-th range image. Note that setting D(x) = 0 and solving for x is equivalent to the least squares solution described in the previous chapter. Equation 6.1 differs from Equation 5.10 in that the former
is normalized by the sum of the weights. Since the sum of the weights is always positive, the
equations yield the same isosurface. In practice, we have observed that the normalization
serves to equalize the gradient of the sampled field, D(x), and thereby reduces artifacts in
the isosurface algorithms, which are sensitive to sudden changes in the gradient. We discuss
the benefits of using weighted averages in greater detail in Section 6.3.2.
As each new range image arrives, these rules can be applied incrementally:

D_{i+1}(x) = \frac{W_i(x) D_i(x) + w_{i+1}(x) d_{i+1}(x)}{W_i(x) + w_{i+1}(x)}    (6.3)

W_{i+1}(x) = W_i(x) + w_{i+1}(x)    (6.4)

where D_i(x) and W_i(x) are the cumulative signed distance and weight functions after integrating the i-th range image.
In the special case of one dimension, the zero-crossing of the cumulative function is at
a range, R, given by:

R = \frac{\sum_i w_i r_i}{\sum_i w_i}    (6.5)
i.e., a weighted combination of the acquired range values, which is what one would expect
for a least squares minimization.
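As a quick illustration of Equation 6.5, the following Python sketch (not part of the system described in this thesis; the ranges, weights, and the sign convention d_i(x) = x - r_i are arbitrary choices for this example) sums two weighted 1D distance ramps and confirms that the cumulative function crosses zero at the weighted average of the ranges:

```python
# Cumulative weighted signed distance in 1D, using the convention
# d_i(x) = x - r_i (negative in front of the range point, positive behind).
def cumulative_D(x, ranges, weights):
    return sum(w * (x - r) for r, w in zip(ranges, weights)) / sum(weights)

ranges, weights = [1.0, 1.4], [1.0, 3.0]

# Zero crossing predicted by Equation 6.5: the weighted average of ranges.
R = sum(w * r for r, w in zip(ranges, weights)) / sum(weights)
```

Here R = (1.0 + 3 * 1.4) / 4 = 1.3, and evaluating the cumulative function at R yields zero, as the least squares argument predicts.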
Figure 6.3: Combination of signed distance and weight functions in two dimensions. (a) and (d) are the signed distance and weight functions, respectively, generated for a range image viewed from the sensor line of sight shown in (d). The signed distance functions are chosen to vary between Dmin and Dmax, as shown in (a). The weighting falls off with increasing obliquity to the sensor and at the boundaries of the meshes as indicated by the darker regions in (d). The normals, n1 and n2 shown in (d), are oriented at a grazing angle and facing the sensor, respectively. Note how the weighting is lower (darker) for the grazing normal. (b) and (e) are the signed distance and weight functions for a range image of the same object taken at a 60 degree rotation. (c) is the signed distance function D(x) corresponding to the per-voxel weighted combination of (a) and (b) constructed using Equations 6.3 and 6.4. (f) is the sum of the weights at each voxel, W(x). The dotted green curve in (c) is the isosurface that represents our current estimate of the shape of the object.
The values Dmin and Dmax must be negative and positive, respectively, as they are on opposite sides of a signed distance zero-crossing.
For three dimensions, we can summarize the whole algorithm as shown in Figure 6.4.
First, we set all voxel weights to zero, so that new data will overwrite the initial grid values.
/* Initialization */
For each voxel {
Set weight = 0
}
/* Merging range images */
For each range image {
/* Prepare range image */
Tessellate range image
Compute vertex weights
/* Update voxels */
For each voxel near the range surface {
Find point on range surface
Compute signed distance to point
Interpolate weight from neighboring vertices
Update the voxel's signed distance and weight
}
}
/* Surface extraction */
Extract an isosurface at the zero crossing
Next, we tessellate each range image by constructing triangles from nearest neighbors on
the sampled lattice. We avoid tessellating over step discontinuities (cliffs in the range map)
by discarding triangles with edge lengths that exceed a threshold. We must also compute a
weight at each vertex as described below.
Once a range image has been converted to a triangle mesh with a weight at each vertex,
we can update the voxel grid. The signed distance contribution is computed by casting a ray
from the sensor through each voxel near the range surface and then intersecting it with the
triangle mesh, as shown in Figure 6.5. The weight is computed by linearly interpolating the weights stored at the intersected triangle's vertices. Having determined the signed distance and weight, we can apply the update formulae described in Equations 6.3 and 6.4.
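The per-voxel update can be sketched as follows (an illustrative Python fragment, not the implementation used in this thesis; ray casting and weight interpolation are assumed to have already produced the new signed distance d with weight w):

```python
# Illustrative sketch of the running update rules of Equations 6.3 and 6.4
# applied at a single voxel.
def update_voxel(D, W, d, w):
    """D, W: cumulative signed distance and weight; d, w: new sample."""
    W_new = W + w
    D_new = (W * D + w * d) / W_new if W_new > 0.0 else D
    return D_new, W_new

# Two observations of the same voxel: distance 0.2 with weight 1,
# then distance -0.2 with weight 3.
D, W = 0.0, 0.0
D, W = update_voxel(D, W, 0.2, 1.0)
D, W = update_voxel(D, W, -0.2, 3.0)
```

The cumulative distance settles at the weighted average (-0.1 here) while the weight accumulates, exactly as the incremental rules require.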
At any point during the merging of the range images, we can extract the zero-crossing
isosurface from the volumetric grid. Isosurface extraction algorithms have been well explored, and a number of approaches have been demonstrated to produce tessellations without consistency artifacts [Ning & Bloomenthal 1993]. These algorithms typically decompose the volume into cubes or tetrahedra with sample values stored at the vertices, followed by interpolation of these samples to estimate locations of zero-crossings, as shown in
Figure 6.5: Sampling the range surface to update the volume. We compute the weight, w, and signed
distance, d, needed to update the voxel by casting a ray from the sensor, through the voxel onto the
range surface. We obtain the weight, w, by linearly interpolating the weights (wa , wb, and wc ) stored
at neighboring range vertices. Note that for a translating sensor (like our Cyberware scanner), the
sensor point is different for each column of range points.
Figure 6.6: Discrete isosurface extraction. (a) In two dimensions, a typical isocontour extraction
algorithm interpolates grid values to estimate isovalue crossings along lattice lines. By connecting
the crossings with line segments, an isocontour is obtained. (b) In three dimensions, the crossings
are connected with triangles. This method corresponds to the marching cubes algorithm described
in [Lorensen & Cline 1987] with corrections in [Montani et al. 1994].
Figure 6.6. The gradients at the extracted zero-crossings provide an estimate of the surface
normals.¹ We restrict the extraction procedure to skip samples with zero weight, generating triangles only in the regions of observed data. We will relax this restriction in the next
section.
¹If the points x_s comprise the implicit surface defined by F(x) = const, then the gradient of F(x) at the surface, \nabla F(x_s), corresponds exactly to the normals over the surface [Edwards & Penney 1982].
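The interpolation step at the heart of these extraction algorithms can be sketched as follows (illustrative Python; the sample values are arbitrary). Given field samples of opposite sign at the two ends of a lattice edge, the zero crossing is located by linear interpolation:

```python
# Linear interpolation of a zero crossing along one lattice edge: v0 and
# v1 are the sampled field values at positions x0 and x1 (opposite signs).
def edge_crossing(x0, x1, v0, v1, iso=0.0):
    t = (iso - v0) / (v1 - v0)   # fraction of the edge to the isovalue
    return x0 + t * (x1 - x0)

# A sample of -0.25 at x = 0 and +0.75 at x = 1 place the crossing at 0.25.
x = edge_crossing(0.0, 1.0, -0.25, 0.75)
```

Connecting such crossings with segments (in 2D) or triangles (in 3D) yields the isocontour or isosurface of Figure 6.6.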
Figure 6.7: Dependence of surface sampling rate on view direction. When a surface is viewed head
on as in (a), the sampling rate is on average significantly higher than when the surface is viewed
from a grazing angle, as in (b). Note the greater detail in the estimated range surfaces, (c) versus (d).
The variations in weights can also be motivated by practical considerations. If a surface is roughly planar with some moderate surface detail, then we would expect that a direct view of this surface in line with the predominant surface normal would give a higher
quality range surface than a grazing angle view of the same surface. Figure 6.7 illustrates
this idea. In general, separation between samples on the surface gives an indication of the comparative sampling rates among different range images. This sample separation is well approximated by 1/cos θ, where θ is the angle between the sensor's viewing direction and the surface normal. By choosing a weight that falls off with this angle (e.g., cos²θ), the contributions from the range surfaces can be biased in favor of range views with higher surface sampling rates.
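Such an obliquity weight can be sketched as follows (illustrative Python; the cos² falloff follows the example above, and the unit-length vectors and the 80 degree test angle are arbitrary choices for this example):

```python
import math

# Hypothetical obliquity weight: cos(theta) is the dot product of the unit
# surface normal and the unit direction toward the sensor; the cos^2
# falloff is one of several reasonable choices.
def obliquity_weight(normal, to_sensor):
    cos_theta = sum(n * v for n, v in zip(normal, to_sensor))
    return max(0.0, cos_theta) ** 2   # back-facing samples get zero weight

w_headon = obliquity_weight((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
theta = math.radians(80.0)            # strongly grazing view
w_grazing = obliquity_weight((math.sin(theta), 0.0, math.cos(theta)),
                             (0.0, 0.0, 1.0))
```

A head-on view receives full weight, while the grazing view is suppressed by more than an order of magnitude.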
Reducing weights near surface boundaries serves a practical purpose as well: smooth
blending of range surfaces. Consider the example in Figure 6.8. If the vertex weights are
not tapered near the boundary of a range surface, then when it is merged with another range
surface, there will be an abrupt jump from the average of two surfaces to a single surface.
By tapering the vertex weights to zero in the vicinity of the boundary of a range surface, this
surface will blend smoothly with other range surfaces.
Figure 6.8: Tapering vertex weights for surface blending. (a) Two surfaces overlap, but one surface
(shown dotted) has a boundary. (b) If the surface weights are not tapered near the boundary, then an
abrupt transition appears when merging the surfaces, as indicated by the merged surface (solid line).
(c) By tapering the surface weights near the boundary, the surfaces blend smoothly together.
Figure 6.9: Volumetric grid with space carving and hole filling. (a) The regions in front of the surface
are seen as empty, regions in the vicinity of the surface ramp through the zero-crossing, while regions
behind remain unseen. The green (dashed) segments are the isosurfaces generated near the observed
surface, while the red (dotted) segments are hole fillers, generated by tessellating over the transition
from empty to unseen. In (b), we identify the three extremal voxel states with their corresponding
function values.
stored on the voxel lattice. We represent the unseen state with the function values D(x) = Dmax, W(x) = 0, and the empty state with the function values D(x) = Dmin, W(x) = 0, as shown in Figure 6.9b. The key advantage of this representation is that we can use the same isosurface extraction algorithm we used in the previous section, this time lifting the prohibition on interpolating voxels of zero weight. This extraction finds both the signed
distance and hole fill isosurfaces and connects them naturally where they meet, i.e., at the
corners in Figure 6.9a where the dotted red line meets the dashed green line. Note that the
triangles that arise from interpolations across voxels of zero weight are distinct from the
others: they are hole fillers. We take advantage of this distinction when smoothing surfaces
as described below.
Figure 6.9 illustrates the method for a single range image, and provides a diagram for the
three-state classification scheme. Figure 6.10 illustrates how the hole filler surfaces connect
to the observed surfaces. The hole filler isosurfaces are false in that they are not representative of the observed surface, but they do derive from observed data. In particular, they
correspond to a boundary that confines where the surface could plausibly exist. Thus, the
combination of the observed surfaces and the hole fill surfaces represents the object of maximum volume that is consistent with all of the observations. In practice, we find that many
of the hole fill surfaces are generated in crevices that are hard for the sensor to reach.
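The three-state classification along a single line of sight can be sketched as follows (illustrative Python; the ramp width and the particular values of Dmin and Dmax are arbitrary choices for this example, not values prescribed by the method):

```python
# Sketch of the three-state classification along a single line of sight
# (after Figure 6.9).
def classify(z, r, ramp=0.5, Dmin=-1.0, Dmax=1.0):
    """Sensor at z = 0 looking down +z; range sample at depth r."""
    if z < r - ramp:          # carved out in front of the surface
        return Dmin, 0.0      # empty
    if z <= r + ramp:         # within the signed distance ramp
        return z - r, 1.0     # near the surface: observed
    return Dmax, 0.0          # behind the surface: unseen

# Sample 21 voxels between z = 0 and z = 2 with the surface at r = 1.
states = [classify(z / 10.0, 1.0) for z in range(0, 21)]
```

Voxels in front of the surface come out empty, voxels within the ramp carry a signed distance with positive weight (crossing zero at the surface), and voxels behind remain unseen.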
Figure 6.10: A hole-filling visualization. The images represent a slice through the volumetric grid depicted in Figure 6.9. The height of the ramps corresponds to the signed distance field, D(x, y), and the shading corresponds to the weight, W(x, y) (blue: W(x, y) = 0; white: W(x, y) > 0). The isosurface is extracted at the level shown, where the dotted green lines correspond to observed surfaces and the dashed red lines correspond to hole-fill surfaces. Note how the observed and hole-fill surfaces connect seamlessly, with only a change in weights (going to zero at the hole-fill regions) to indicate the transitions.
Figure 6.11: Carving from backdrops. (a) An orthographic sensor captures a range image of two
cylinders. (b) A slice through the volumetric grid after laying down signed distance ramps near the
observed surface and carving out the empty regions in front of the surfaces. It will be difficult to scan
the portions of the cylinders that are close together. (c) The same scene as in (a) with the addition of a backdrop. (d) The volumetric slice after scanning the cylinders with a backdrop placed behind them. Much more of the space is carved as empty, and the cylinders are clearly segregated by the empty region between them.
the signed distance representation for a range surface. We define the symbols s_x, s_y, and s_z to correspond to the spatial frequencies in x, y, and z. The radial spatial frequency is then:

s_{xyz} = \sqrt{s_x^2 + s_y^2 + s_z^2}

and the square of the radial bandlimit is:

(s_{xyz}^{max})^2 = (s_x^{max})^2 + (s_y^{max})^2 + (s_z^{max})^2

where s_{xyz}^{max} is the radial bandlimit.
Consider the case of an orthographic sensor looking down the z-axis having acquired a range surface, z = f(x, y). The volumetric signed distance function is then D(x, y, z) = d(z - f(x, y)), where d(z) is the signed distance ramp along the line of sight, and its Fourier transform is:

\mathcal{F}\{D\} = \iiint d(z - f(x, y)) \exp[-i 2\pi (s_x x + s_y y + s_z z)] \, dx \, dy \, dz

= \iint \exp[-i 2\pi (s_x x + s_y y)] \, \mathcal{F}_z\{d(z)\} \exp[-i 2\pi s_z f(x, y)] \, dx \, dy

= \mathcal{F}_z\{d(z)\} \, \mathcal{F}_{xy}\{\exp[-i 2\pi s_z f(x, y)]\}
This equation reveals that the Fourier transform of the signed distance is partially sepa-
rable; i.e., a product of the Fourier transform in z and the transform in xy . The z transform
pertains to a known function, the signed distance ramp, and is thus easily computed. The
xy transform has a significantly more complicated argument that includes the z component
of frequency, sz . Ultimately, we would like to derive the bandwidth of FfDg to help decide the necessary sampling rate for accurate shape capture without aliasing. To do so, we
need to relate some property of the function f(x, y) to the bandwidth of this transform.
One revealing measure that corresponds to bandwidth is the 2D variance of the squared magnitude of the spectrum P(s_x, s_y) = \mathcal{F}_{xy}\{\exp[-i 2\pi s_z f(x, y)]\} [Bracewell 1986]:

(\sigma_{s_x s_y})^2 = \frac{\iint (s_x^2 + s_y^2) \, |P(s_x, s_y)|^2 \, ds_x \, ds_y}{\iint |P(s_x, s_y)|^2 \, ds_x \, ds_y}    (6.6)

where, for simplicity, we take the first
moment to be zero. A more convenient definition of this variance arises after the application
of several relations. The first is Rayleigh's theorem:

\iint |G(s_x, s_y)|^2 \, ds_x \, ds_y = \iint |g(x, y)|^2 \, dx \, dy    (6.7)

where G(s_x, s_y) is the Fourier transform of g(x, y). The second and third are the derivative theorems:

s_x G(s_x, s_y) = -\frac{i}{2\pi} \mathcal{F}\!\left\{\frac{\partial}{\partial x} g(x, y)\right\}    (6.8)

s_y G(s_x, s_y) = -\frac{i}{2\pi} \mathcal{F}\!\left\{\frac{\partial}{\partial y} g(x, y)\right\}    (6.9)
Applying these relations to Equation 6.6 gives:

(\sigma_{s_x s_y})^2 = \frac{\iint |s_x P(s_x, s_y)|^2 \, ds_x \, ds_y + \iint |s_y P(s_x, s_y)|^2 \, ds_x \, ds_y}{\iint |P(s_x, s_y)|^2 \, ds_x \, ds_y}    (6.10)

= \frac{1}{4\pi^2} \cdot \frac{\iint \left[ \left| \frac{\partial}{\partial x} g(x, y) \right|^2 + \left| \frac{\partial}{\partial y} g(x, y) \right|^2 \right] dx \, dy}{\iint |g(x, y)|^2 \, dx \, dy}    (6.11)
Substituting g(x, y) = \exp[-i 2\pi s_z f(x, y)] into Equation 6.11, and noting that |g|^2 = 1 while |\partial g / \partial x|^2 = 4\pi^2 s_z^2 (\partial f / \partial x)^2 (and similarly in y), we obtain:

(\sigma_{s_x s_y})^2 = s_z^2 \cdot \frac{\iint_A \left[ \left( \frac{\partial}{\partial x} f(x, y) \right)^2 + \left( \frac{\partial}{\partial y} f(x, y) \right)^2 \right] dx \, dy}{\iint_A dx \, dy}    (6.12)

where A is the area over which the range surface is defined. At the bandlimit, s_z attains its maximum value s_z^{max} as set by the signed distance function. We can approximate the square of the overall bandwidth to be:
(s_{xyz}^{max})^2 \approx (s_z^{max})^2 + (\sigma_{s_x s_y}^{max})^2    (6.13)

= (s_z^{max})^2 \cdot \frac{1}{A} \iint_A \left[ 1 + \left( \frac{\partial}{\partial x} f(x, y) \right)^2 + \left( \frac{\partial}{\partial y} f(x, y) \right)^2 \right] dx \, dy    (6.14)
The z component of the surface normal is related to the slopes of f(x, y) by:

n_z(x, y) = \frac{-1}{\sqrt{1 + \left( \frac{\partial}{\partial x} f(x, y) \right)^2 + \left( \frac{\partial}{\partial y} f(x, y) \right)^2}}    (6.16)
Combining Equations 6.14-6.16, we arrive at the following relation for the bandwidth of the
surface convolved with the signed distance function:
s_{xyz}^{max} = s_z^{max} \sqrt{\left\langle \frac{1}{|n_z(x, y)|^2} \right\rangle}

where \langle \cdot \rangle denotes the average over the area A.
In other words, the bandwidth is set by the average square value of the reciprocal of the
component of the surface normal along the viewing direction. It is therefore very sensitive to
surfaces at grazing angles to the sensor; i.e., as nz approaches zero, the bandwidth becomes
very large, requiring high sampling rates.
We can derive a simple rule of thumb for the sampling rate if we consider a single planar
surface. In this case, the bandwidth expression simplifies to:
s_{xyz}^{max} = \frac{s_z^{max}}{n_z}

The value of s_z^{max} is set by the shape of the ramp. Consider a signed distance ramp of the form:

d(z) = z \, \mathrm{rect}(z/b)

where b is the width of the distance ramp. The Fourier transform of this function is:
bsz ))
Ffdg = 2ib d(sinc(
ds
z
where sinc(x)
= sin( x)= x. We can define the first zero crossing of this function to be
the bandwidth of the signed distance function. This zero crossing occurs at smax
z = 1=b,
leading to the relation:

s_{xyz}^{max} \approx \frac{1}{b \, n_z}

Thus, the spacing between voxels, \Delta, should be small enough that the sampling frequency, 1/\Delta, is at least twice this bandwidth:

\Delta \leq \frac{b \, n_z}{2}    (6.17)
This relationship provides several insights. First, the voxel spacing must be small enough to
sample the distance ramp adequately, as indicated by the proportionality to the ramp width
b. Less noisy range data requires a narrower distance ramp and a smaller voxel spacing. In
addition, the dependence on nz is closely related to range image tessellation. When a range
image is converted to a range surface, a tessellation criterion determines whether neighboring range samples should be connected. This criterion typically takes the form of a minimum threshold on the normal component in the direction of the sensor or a maximum permissible edge length; the edge length criterion is very similar to the normal component criterion. Thus, the range surface tessellation criterion enforces a bound on the maximum voxel spacing.
Equation 6.17 can also be seen as a guideline for modifying the distance ramps and tessellation criteria to satisfy a desired sampling rate. In other words, the distance ramp can
be seen as a bandlimiting filter; widening the ramp decreases the required sampling rate.
Applying a more restrictive tessellation criterion has a similar effect.
While Equation 6.17 is based on the unweighted signed distance for a single range image, our experience has shown that it serves as an excellent guide for avoiding aliasing artifacts even when using weighted averages of signed distances for multiple range images.
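As a worked example of Equation 6.17 (illustrative Python; the ramp width and tessellation threshold are hypothetical numbers, not values prescribed by this chapter):

```python
import math

# Voxel spacing bound from Equation 6.17: spacing <= b * n_z / 2, where
# b is the distance ramp width and n_z is the smallest normal component
# toward the sensor admitted by the tessellation criterion.
def max_voxel_spacing(ramp_width_b, min_nz):
    return ramp_width_b * min_nz / 2.0

# A 4 mm ramp with tessellation admitting surfaces up to 60 degrees of
# obliquity (n_z = cos 60 deg = 0.5) calls for voxels no coarser than 1 mm.
spacing = max_voxel_spacing(4.0, math.cos(math.radians(60.0)))
```

Narrowing the ramp or admitting more oblique triangles tightens the bound, which is the trade-off discussed above.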
The least squares isosurface of the previous chapter satisfies:

\sum_{i=1}^{n} w_i(x, y, z) \, d_i(x, y, z) = 0

Without altering the position of the least squares isosurface, we can multiply this equation by a strictly positive conditioning function, \nu(x, y, z):

\nu(x, y, z) \sum_{i=1}^{n} w_i(x, y, z) \, d_i(x, y, z) = 0
One such conditioning function is the reciprocal of the sum of the weights:
Figure 6.12: Tessellation artifacts near a planar edge. (a) A range sensor scans a planar surface up
to an edge. (b) The product of the signed distance and the weights are laid down into the volume.
The weights taper near an edge causing the value-weight product to shrink. The square markers are
voxels, the dotted line is the actual surface, and the solid line is the extracted isosurface. Note the
tessellation artifacts that result from the variations in weight.
\nu(x, y, z) = \frac{1}{\sum_i w_i(x, y, z)}    (6.18)
which corresponds to computing a weighted average of the signed distances at each voxel.
But what is the utility of a conditioning function? To answer this question, we consider
the impact of tapering weights near mesh boundaries. Figure 6.12 depicts a planar surface
with a boundary edge after being merged into the volumetric grid. Tapering the weights near
the boundary is intended to make for a smoother blend when merging overlapping surfaces,
but the decreasing weights, when multiplied by the signed distance, alter the gradient of
the implicit function. Isosurface extraction methods such as the Marching Cubes algorithm
are very sensitive to variations in the gradient, and the resulting isosurface has artifacts as
depicted in the figure. On the other hand, after dividing by the sum of the weights (for a
single scan, this amounts to unweighted signed distance), the gradient is constant and no
artifacts appear.
Figure 6.13 shows the effect of using sums of weighted distances versus the use of
weighted averages of signed distances when merging range images. Clearly, the weighted
average conditioning smooths the gradient and avoids artifacts associated with the unconditioned isosurface.
The examples in Figure 6.13 show how the sum of weights conditioning function (Equation 6.18) can help reduce variations in the gradient to produce better isosurfaces. To see
how we might design different conditioning functions, we can derive an expression for the
gradient magnitude of the conditioned signed distance function:
|\nabla D| = \left| \nabla \left[ \nu \sum_{i=1}^{n} w_i d_i \right] \right|

= \left| (\nabla \nu) \sum_{i=1}^{n} w_i d_i + \nu \sum_{i=1}^{n} \nabla (w_i d_i) \right|
Since we are only interested in the gradient in the immediate vicinity of the isosurface, the
first term on the right hand side of the last equation vanishes to give:
|\nabla D|_{iso} = \nu \left| \sum_{i=1}^{n} \nabla (w_i d_i) \right|
where |\nabla D|_{iso} is the gradient magnitude at the isosurface. Thus, we can ensure a uniform gradient magnitude in the vicinity of the isosurface by setting the conditioning function to be:
\nu = \frac{1}{\left| \sum_{i=1}^{n} \nabla (w_i d_i) \right|}
This observation leads to a uniform gradient algorithm. At each voxel we would store a
scalar and a vector. The scalar would be the cumulative weighted signed distance, and the
Figure 6.13: Isosurface artifacts and conditioning functions. Noiseless synthetic range images were
constructed from a perfect sphere scanned orthographically from two directions spaced 45 degrees
apart. The scans were merged using two techniques, one with and one without a conditioning function. (a) and (c) are faceted renderings of the resulting isosurfaces, lit to emphasize artifacts. (b) and
(d) are the gradient magnitudes across the isosurface. Red areas have small gradients while white
areas have large gradients. In (a) and (b), no conditioning function is used, and ripples appear in
the rendering (a) due to variations in the gradient evident in (b). In (c) and (d), a weighted average
conditioning function is used, and the result is a smooth surface (c) with little variation in the gradient (d). The holes at the tops and the jagged boundaries of the reconstructions are artifacts of the
marching cubes algorithm operating on the discrete voxel grid used in this illustration.
vector would be the sum of the gradients computed at each voxel for each scan. After merging a set of scans, we would divide the scalar by the magnitude of the vector at each voxel to
yield a volume with gradients normalized in the vicinity of the isosurface. While we have
yet to implement such an algorithm, it holds promise for minimizing isosurface artifacts due
to gradient magnitude variations.
Figure 6.14: Limitation on thin surfaces. (a) A surface is scanned from two sides. (b) The volumetric grid shows that the distance ramps overlap as the opposing surfaces come close together. (c)
The resulting isosurface shows that the interference of signed distance functions results in a thicker
surface.
Figure 6.15: Signed distance functions D1(x) and D2(x) from opposing surfaces combined with a MIN() function into MIN(D1, D2).
voxel within the distance ramp of the surface. This second approach assumes that errors associated with thin surfaces are fairly isotropic; i.e., the erroneous thickening of the surface
is roughly a simple dilation of the surface in the areas affected. Once approximate normals are established, we can modify the method for merging range surfaces into the volume.
A simple approach would restrict a range surface to influence only voxels whose normals
are within some threshold of what could possibly be visible to the sensor.
These normal techniques can be extended to store more than one estimated surface normal and distance function per voxel. For example, if a voxel is between opposing surfaces,
then the algorithm could store the estimated orientations for both of the opposing surfaces
and update separate signed distances for range estimates coming from opposite sides. In
general, storage requirements for this method would vary with the geometry of the object:
in particularly complex regions, storage of multiple estimated normals may be necessary,
whereas smoother regions without opposing surfaces would require only a single normal
and signed distance. To reconstruct the final range surface, it might be possible to follow
the contours of the two signed distances separately, and then merge the resulting geometry. Alternatively, these distances could be blended together using a MIN() function before
reconstruction as indicated in Figure 6.15.
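The MIN() blending can be sketched in one dimension as follows (illustrative Python; the surface positions and sign conventions are arbitrary choices for this example): for a thin wall scanned from both sides, summing the opposing ramps cancels them, while keeping the minimum preserves both zero crossings:

```python
# D1 is the ramp from the front scan, D2 from the back scan (opposite
# sign convention), for a thin wall between x = 0.445 and x = 0.555.
xs = [i / 100.0 for i in range(0, 101)]
D1 = [x - 0.445 for x in xs]       # front surface near x = 0.445
D2 = [0.555 - x for x in xs]       # back surface near x = 0.555

summed = [a + b for a, b in zip(D1, D2)]       # ramps cancel to a constant
blended = [min(a, b) for a, b in zip(D1, D2)]  # keeps both crossings

def sign_changes(ys):
    return sum(1 for a, b in zip(ys, ys[1:]) if (a < 0.0) != (b < 0.0))
```

The summed field is a positive constant with no zero crossing at all, whereas the MIN() blend crosses zero twice, once per wall, so an extraction pass recovers both sides.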
Figure 6.16: Limitations due to sharp corners. (a) A corner is scanned from two sides. (b) The volumetric grid shows the distance ramps overlapping opposite sides near the apex. (c) The isosurface
shows a thickening of the corner and a gap at the apex. (d) Samples on the volumetric grid show that
even ideal range data, merged into the volume in a manner that avoids surface thickening, will lead to a gap in the reconstruction.
Other potential strategies include modifications to the weight functions. For example, if
the weight functions begin to taper off along the sensor's line of sight for voxels behind the
range surface, then the interference with the opposing surface will be reduced. However,
when combining surface estimates on the same side of the surface, the tapered weights will
introduce some bias into the reconstruction. Such an approach would affect the accuracy
of the whole reconstruction, but selective application of this strategy might still be of use.
By adjusting the weight profiles for range samples near boundaries, it is possible to reduce
the surface thickening selectively in at least one of the more troublesome areas, i.e., near
corners. Early experiments in this area have yielded promising results.
Figure 6.17: Intersections of viewing shafts for space carving. Three scans are taken from different
points of view. The solid green curve is the actual surface. Due to occlusions (not shown), the bound
on the object is determined primarily by space carving. Notice that in this two-dimensional example,
the resulting boundary is an intersection of empty half-spaces and thus consists of large line segments
with sizes that do not depend on the voxel resolution.
will declare the region near the apex to be empty. In this case, the hole filling algorithm
will bridge the gap.
In the absence of an additional scan or the use of space carving, however, bridging the
gap is still desirable. The range samples acquired at the apex from two viewpoints may be
arbitrarily close together, even more closely spaced than the samples in individual range images, so the reconstruction algorithm ought to be able to bridge the gap. One possible
solution would be a modified marching cubes algorithm that assumes regions of the volume near the surface are empty and performs a local hole-filling only to bridge small gaps.
Alternatively, a different hole-filling algorithm could be applied as a post-process, directly
filling small gaps in the polygonal reconstruction.
Figure 6.18: Protrusion generated during space carving. (a) Four slices through a volumetric grid
show the signed distance ramps due to visible surfaces as well as the resultant space carving. (b) Taking an isosurface that includes hole-filling yields a surface with a protrusion. (c) Red surfaces indicate the hole fill regions of the isosurface.
Figure 6.19: Errors in topological type due to space carving. (a) Five slices through a volumetric
grid show the signed distance ramps due to visible surfaces as well as the resultant space carving.
(b) The resulting isosurface has a handle. (c) Red surfaces indicate the hole fill regions of the
isosurface.
Figure 6.20: Aggressive strategies for space carving with a range sensor using a single line of sight
per pixel. (a) A surface is scanned with a range sensor that has a single line of sight per sample.
There is a gap in the recorded range data, which could correspond to the dotted surface. (b) A conservative strategy would carve space only from the observed surfaces. (c) A very aggressive strategy
would incorporate the notion that missing data implies no surface along the sensor's line of sight. In this case, the carving extends to the limit of the sensor's field of view. (d) Due to sensitivity limitations, surfaces at grazing angles to the sensor may not be detected. This places a limit on how much
space may be declared empty and still be consistent with a surface that just evades detection due to
oblique orientation. (e) A still less aggressive strategy would simply fill the gap by bounding the
emptiness at the line segment connecting the edges of the missing data. Such a strategy might lead
to less objectionable hole fill regions such as would be generated by the indentation shown in (d).
These undesirable hole fillers might be removed at the volumetric level or by operating
on the reconstructed surface. One volumetric approach might be to apply image processing operations such as erosions and dilations to modify the hole fill regions and possibly
collapse thin unknown regions that lead to handles such as the one shown in Figure 6.19
[Jain 1989]. In addition, more aggressive space carving strategies could be employed to
empty out more of the space as illustrated in Figure 6.20. Consider a very simple range sensor with a single line of sight. If such a sensor were to return no measurement, we would
normally draw no conclusions. In fact, barring a surface that is invisible to the sensor,
one can argue that there is no surface along the sensor's line of sight, or else it would have
been detected. The space must therefore be empty in this direction. For a full range imaging
sensor, we could follow all lines of sight that returned no range data and declare all these
regions of the volume to be empty. In practice, this approach could not be applied indiscriminately. For instance, no data is returned from very dark regions of a surface or shiny
portions that deflect the line of sight. Even if the surface were known to have a uniform
diffuse reflectance, regions that are oblique to the sensor may not return enough light to be
detected. In this case, there would be a limit to how far the space carving would proceed
while being consistent with the possibility that the surface is receding with steep slope. Figure 6.20 shows how this more aggressive space carving might proceed.
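The conservative and aggressive carving policies for a single-line-of-sight sensor can be sketched in a few lines. This is a 2D illustration only; the grid codes and function names are mine, not from the thesis.

```python
# A 2D sketch of conservative vs. aggressive space carving for a sensor
# with one horizontal line of sight per row. Grid codes: 'U' unseen,
# 'E' empty. All names here are illustrative, not from the thesis.

UNSEEN, EMPTY = "U", "E"

def carve(depths, width, aggressive=False):
    """depths[row] is the observed hit distance, or None if no return."""
    grid = [[UNSEEN] * width for _ in depths]
    for row, d in enumerate(depths):
        if d is not None:
            # Conservative: space in front of an observed surface is empty.
            for x in range(min(d, width)):
                grid[row][x] = EMPTY
        elif aggressive:
            # Aggressive: no return implies no surface along this line of
            # sight, so carve to the limit of the field of view.
            for x in range(width):
                grid[row][x] = EMPTY
    return grid

depths = [3, None, 4]            # middle row returned no measurement
conservative = carve(depths, 6)
aggressive = carve(depths, 6, aggressive=True)
print("".join(conservative[1]))  # UUUUUU: conservative leaves the gap unseen
print("".join(aggressive[1]))    # EEEEEE: aggressive carves the whole ray
```

A less aggressive variant, as in Figure 6.20e, would carve the missing-data ray only up to the segment bridging the edges of the gap rather than to the field-of-view limit.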
In the case of a triangulation scanner, space carving is complicated by the requirement
that each point must have a clear line of sight to both the laser and the camera. The conservative approach consists of following the lines of sight from the surface to both the laser
and the camera and declaring the voxels to be empty as shown in Figure 6.21. For the case
of aggressive space carving, we refer to Figure 6.22. If we hypothesize that point A in Figure 6.22a is not empty, then there must be a surface that occludes the line of sight of the laser
or the sensor, e.g., point B. But in order for point B to be an occluder, there must be yet
another surface that blocks its visibility. Following this line of reasoning, we eventually determine that a surface must exist in the region of space already conservatively declared to be
empty in order to prevent the detection of point A. This contradiction leads to the conclusion
that point A must be empty. By induction, we can then argue that all points that are accessible to both the laser and the sensor must be empty [Freyburger 1994]. As with the single
line of sight rangefinder, less aggressive strategies are also possible (Figure 6.22e and f).
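The set operations behind Figures 6.21 and 6.22 can be made concrete with a toy example. The regions below are placeholder voxel sets of my own invention; only the union/intersection logic reflects the text.

```python
# Combining per-view emptiness for a triangulation sensor, following the
# set operations in Figures 6.21 and 6.22. The regions themselves are
# made-up placeholders; the union/intersection logic is the point.

laser_must_be_empty   = {(0, 0), (0, 1), (1, 0)}
camera_must_be_empty  = {(0, 0), (1, 1)}
laser_could_be_empty  = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)}
camera_could_be_empty = {(0, 0), (0, 1), (1, 1), (2, 0), (2, 1)}

# Conservative (Figure 6.21d): a voxel provably in front of the surface
# for either view is empty, so take the union of must-be-empty regions.
conservative = laser_must_be_empty | camera_must_be_empty

# Aggressive (Figure 6.22d): any voxel accessible to *both* the laser and
# the camera must be empty (otherwise an occluder chain leads to a
# contradiction), so add the intersection of the could-be-empty regions.
aggressive = conservative | (laser_could_be_empty & camera_could_be_empty)

print(sorted(conservative))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(aggressive))    # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
```

The aggressive estimate is always a superset of the conservative one; here it additionally empties the voxel at (2, 0), which both sensors could reach.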
Another approach to improving the hole fill regions would be to operate directly on the
polygonal reconstruction. For example, the tangent planes to the surface at the boundary
between the hole fill and the observed surface could be used to coerce the hole fill to relax
into a less protruding configuration. Ideally, such a relaxation procedure could be extended
to modify the topology and collapse undesirable handles in the hole fill regions.
Figure 6.21: Conservative strategy for space carving with a triangulation range sensor. (a) A surface
is scanned with a triangulation range sensor that has two lines of sight per sample. For the purpose of
illustration, the camera is assumed orthographic and the scanning sweep to be linear. (b) The region
of space that must be empty from the point of view of the laser. (c) The region of space that must be
empty from the point of view of the camera. (d) The union of the empty spaces yields a conservative
estimate of the emptiness of space around the observed surfaces.
Figure 6.22: Aggressive strategies for space carving with a triangulation range sensor. The sensor
geometry is assumed the same as in Figure 6.21a. (a) Magnified view of the hole fill region established by conservative space carving in Figure 6.21d. Following the arguments in the text, all points
that could be visible to both the laser and the sensor must be empty. (b) The region of space that
could be empty from the point of view of the laser. (c) The region of space that could be empty from
the point of view of the camera. (d) The intersection of the empty regions in (b) and (c) yields an
aggressive estimate about the emptiness of space around the observed surface. (e) Surfaces at grazing angle to the laser or camera might not be detected, so a less aggressive strategy would taper the
boundaries of the space carving region. (f) As in Figure 6.20, a less aggressive strategy would simply fill the gap by bounding the emptiness at the line segment connecting the edges of the missing
data.
Chapter 7
Fast algorithms for the volumetric
method
The creation of detailed, complex models requires a large amount of input data to be merged
into high resolution voxel grids. The examples in the next chapter include models generated
from as many as 70 scans containing up to 12 million input vertices with volumetric grids
ranging in size up to 160 million voxels. Clearly, time and space optimizations are critical
for merging this data and managing these grids. In this chapter, we begin by describing a
run-length encoding scheme for efficient storage of the volumetric data (Section 7.1). Next,
we develop a number of optimizations designed to improve execution time when merging
range data into the volume (Section 7.2). In Section 7.3, we summarize an efficient isosurface extraction method. Finally, we show that the storage and execution optimizations lead
to significant improvements in asymptotic complexity (Section 7.4).
Figure 7.1: Overview of range image resampling and scanline order voxel updates. (a) Casting rays
from the pixels on the range image means cutting across scanlines of the voxel grid, resulting in poor
memory performance. (b) Instead, we run along scanlines of voxels, mapping them to the correct positions on the resampled range image. (c) Range image scanlines are not in general oriented to allow
for coherently streaming through voxel and range scanlines. (d) By resampling the range image, we
can obtain the desired range scanline orientation.
Figure 7.2: Orthographic range image resampling. As shown in (a), the voxel scanlines run in the
direction v_vox. The projections of these scanlines onto the range image plane run along the direction
v_proj, while the range image scanlines run in the direction v_im. By rotating the range image plane
as shown in (b), we can align the directions of the projected voxel scanlines and the image scanlines.
This affords coherent memory access when streaming through the voxel grid and the range image.
Later in this section, we describe a transpose method for ensuring that voxel scanlines run parallel to image scanlines.
Figure 7.3: Perspective range image resampling. The notation follows that of Figure 7.2. In (a), the
range image plane is at an angle to the front face of the voxel grid. As a result, the projected voxel
scanlines are not parallel. By rotating the image plane so that it is parallel to the voxel
scanlines (b), we force the projected voxel scanlines to be parallel. As in the orthographic case, we
must additionally rotate the image plane about the viewing axis to complete the mutual alignment.
For the perspective case, a similar strategy applies. When coupled with rotating the image about the viewing axis, the resulting image scanlines, viewed as line segments in three dimensions, now run exactly parallel to
the voxel scanlines. An additional restriction applies to the direction of the voxel scanlines.
Consider the case shown in Figure 7.4. As we follow the voxel scanlines away from the
viewpoint, the projections of these scanlines progress toward the center of the image. Thus,
no rotation of the image plane is suitable to align voxel and image scanlines. On the other
hand, by reorganizing the voxel scanlines so that they run parallel to the image plane (i.e.,
by transposing the data structure), we obtain the desired alignment.
More complex viewing frustums can make it impossible to assure precise alignment of
voxel and image scanlines, as is the case with the line perspective of the Cyberware scanner
for traditional triangulation. In this case, we can choose an image orientation that is aligned
with scanlines for one voxel slice, but the scanlines at different slices will project to curves
that cross image scanlines. Nevertheless, if the projection is nearly orthographic, i.e., the
rays are not severely divergent, then an image orientation can be chosen to minimize the
amount that projected scanlines cross range image scanlines. If rays are highly divergent,
then other methods, such as partitioning the range image resampling into manageable sections, may be applicable.
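For the orthographic case of Figure 7.2, the required in-plane rotation is simply the angle between the projected voxel-scanline direction and the image-scanline direction. A minimal sketch, with vectors expressed in the 2D image plane (the function name is mine):

```python
# Computing the in-plane rotation that aligns range image scanlines with
# the projections of voxel scanlines (orthographic case of Figure 7.2).
import math

def alignment_angle(v_proj, v_im):
    """Angle (radians) to rotate the image so v_im lines up with v_proj."""
    a = math.atan2(v_proj[1], v_proj[0])
    b = math.atan2(v_im[1], v_im[0])
    return a - b

# Projected voxel scanlines run diagonally; image scanlines run along x.
theta = alignment_angle((1.0, 1.0), (1.0, 0.0))
print(round(math.degrees(theta), 6))  # 45.0
```

After this rotation, streaming through one voxel scanline touches exactly one resampled image scanline, which is what makes the memory access coherent.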
Figure 7.4: Transposing the volume for scanline alignment. The notation follows that of Figure 7.2.
The projected voxel scanlines in (a) converge to the perspective vanishing point on the image plane.
In this case, it is not possible to rotate the image plane to align with projected voxel scanlines. By
transposing the volume (b), we see that the projected scanlines are now parallel.
The signed distance can then be established by comparing a scanline's constant z value against the resampled z values from the range image. This method of establishing signed distance will be exploited for more efficient voxel
traversal using the binary depth trees described in the next section.
Figure 7.5 illustrates the shear warp procedure for an orthographic projection in two dimensions. We first transpose the voxel data structure so that the scanlines run as close to
perpendicular to the viewing direction as possible. Then we shear both the range surface and
voxel grid so that the viewing rays run perpendicular to the voxel slices. Note that shearing
the voxel grid simply amounts to changing the origins of the voxel slices; no resampling of
the voxels is ever performed. Next, we resample the range surface with respect to an image
plane parallel to the voxel slices. As before, we stream through voxel scanlines and compare against depths stored in the resampled range image. The difference between the z-depth of each
voxel (constant within a scanline) and the resampled range depth corresponds to the signed
Figure 7.5: A shear warp method for parallel projection. (a) The range image and sensor lines of
sight are not initially aligned with the voxel grid. (b) By shearing the grid and range image, we
straighten the sensor rays with respect to the voxel grid. This shear performs the same function
as the rotation in Figure 7.3. By resampling the range image, we align its pixels with voxels on the
sheared grid. We then lay down the signed distance and weight functions. (c) After updating the
grid, we shear it back to the original space. Note that shearing the voxel grid is simply a matter of
offsetting the origin of each slice; no voxel resampling is performed.
Figure 7.6: Correction factor for sheared distances. The signed distance (d_r) to a voxel should
be taken along the line of sight from the sensor as shown in (a). After shearing the voxel grid and
range image, the signed distance corresponds to the difference in z-depth (d_z) for the voxel and range
sample as shown in (b). The error may be corrected by a constant factor. In two dimensions, (c)
indicates the correction factor would be d_r/d_z = 1/sin θ, where θ is the angle between the original
view direction and the sheared view direction.
Figure 7.7: Shear-warp for perspective projection. (a) The range surface and sensor lines of sight are
not initially aligned with the voxel grid. By shearing the voxel grid and range surface (b), followed
by a depth dependent scale (c), we straighten the sensor rays with respect to the voxel grid. (d)
After updating the grid, we unscale and unshear it back to the original space.
distance between the voxel and the range surface. Due to the shear, this distance must be
corrected by a multiplicative factor as depicted in Figure 7.6.
Shear warp factorizations may be adapted to other range imaging frustums as well. Figure 7.7 shows how a perspective transformation can be decomposed into a shear and a scale.
Again, the voxel lattice is never resampled; the shear and scale serve only to indicate how
voxels map onto the resampled range image plane. The differences between voxel depths
and resampled range depths again correspond to signed distances between the voxel and
the range surface. As with the orthographic case, a correction factor is necessary due to the
differences between the projection direction and viewing rays. This correction factor is not
constant over all viewing rays, but it is constant along each viewing ray. Thus, the correction
factor may be stored at each resampled range image pixel.
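The per-ray correction reduces to simple geometry: the z-difference between voxel and surface, divided by the z-component of the unit viewing direction, gives the distance along the ray. A minimal 2D sketch (the function and parameter names are mine, not the thesis's):

```python
# After the shear, the voxel-to-surface offset is measured as a pure
# z-difference; the true signed distance runs along the original line of
# sight, so a constant per-ray factor converts one to the other.
import math

def signed_distance(voxel_z, surface_z, view_dir):
    """Distance along the (unit) viewing ray, computed from z-depths alone."""
    vx, vz = view_dir
    d_z = surface_z - voxel_z
    return d_z / abs(vz)      # correction factor 1/|v_z| is constant per ray

# A ray tilted 30 degrees from the z axis: unit direction (sin 30, cos 30).
v = (math.sin(math.radians(30)), math.cos(math.radians(30)))
d = signed_distance(voxel_z=1.0, surface_z=2.0, view_dir=v)
print(round(d, 4))  # 1.1547, i.e. 1 / cos(30 degrees)
```

For an orthographic projection the factor is the same for every ray; in the perspective case it varies between rays but not along a ray, which is why it can be cached in each resampled range image pixel.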
More complex imaging frustums such as the line perspective of Figure 5.2f will require
more complex transformations, but the principle of doing modified z comparisons is still
valid and leads to an efficient algorithm with the help of the binary depth tree described in
the next section.
Figure 7.8: Binary depth tree. Each pair of resampled range image scanlines is entered into a binary
depth tree. (a) Level 0: a bound is set on the whole range surface. Scanline A is well in front of the
bounds and is not processed. Scanline B intersects and forces further traversal of the tree. (b) Level
1: the surface is split in two and yields tighter bounds. Still, both sides must be traversed further. (c)
Level 2: the surface is quartered and scanline B is found to intersect the first and third quarters.
The binary depth tree thus organizes the resampled range data in a manner that rapidly indicates what sections of a scanline are likely to be empty. The resulting speed-ups from the binary tree are typically a factor of 15 without carving, and a factor
of 5 with carving.
When using the more complex viewing frustums such as line perspective, the voxel
scanlines may map onto more than two range image scanlines. In this case, the domain of
the binary tree is widened to span as many scanlines as necessary.
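A small sketch of the binary depth tree of Figure 7.8: each node stores min/max depth bounds over a span of resampled range samples, so a voxel scanline with constant z can skip any span whose bounds it cannot intersect. The dictionary-based node layout and function names are mine.

```python
# A sketch of the binary depth tree: each node bounds the depths of a
# span of resampled range samples.

def build(depths, lo=None, hi=None):
    if lo is None:
        lo, hi = 0, len(depths)
    node = {"lo": lo, "hi": hi,
            "zmin": min(depths[lo:hi]), "zmax": max(depths[lo:hi])}
    if hi - lo > 1:
        mid = (lo + hi) // 2
        node["children"] = (build(depths, lo, mid), build(depths, mid, hi))
    return node

def query(node, z, eps, out):
    """Collect sample index ranges whose depth bounds contain z +/- eps."""
    if z + eps < node["zmin"] or z - eps > node["zmax"]:
        return out                      # scanline is clear of this span
    if "children" not in node:
        out.append((node["lo"], node["hi"]))
        return out
    for child in node["children"]:
        query(child, z, eps, out)
    return out

tree = build([5.0, 5.2, 9.0, 9.1])
print(query(tree, 5.1, 0.2, []))   # [(0, 1), (1, 2)]: only the near samples
```

Traversal down to one leaf costs log n, and consecutive leaves in a run amortize that cost over the interval, as analyzed in Section 7.4.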
To avoid this O(n³) cost, the solution is to work primarily on the varying voxels and to create the transposed volume directly in RLE form, without ever expanding it in an intermediate
step. This objective is very similar to the one for rotation of the RLE spacetime images described in Chapter 4. Indeed, the transpose is very much like a rotation by 90 degrees. The
key differences are: (1) no reconstruction and resampling is required, and (2) there are two
values for constant runs (unseen and empty) instead of one (zeroes). The first difference
simplifies the task, while the second complicates it. In the RLE rotation algorithm, when a
run of varying type began where there had been none before on a target scanline, the
prior run on the target scanline was deduced to be a constant run of zeroes and could be updated accordingly. As a result, we could stream through only the varying runs in the source
image, and ignore the constant runs. However, when more than one constant run type is
possible, then the constant runs require further attention.
Fortunately, by tracking two source scanlines simultaneously, we can overcome this difficulty. As before, we construct the target scanlines as we stream through the source scanlines. This time, however, each target scanline must keep track of values (empty or unseen)
when building constant runs. As long as the runs in the source scanline are the same as the
current runs in the target scanlines, we would like to skip to the interesting voxels. When
the type of run in the source scanline does differ from the type in the target scanline, then
the target needs to be updated and a new run begun. By simultaneously streaming through
149
Target scanline
direction
Source scanline
direction
Previous
Empty
Current
Empty
skip
Unseen
Unseen
copy
voxels
skip
Empty
Empty
skip
Figure 7.9: Fast transpose of a multi-valued RLE image. The source scanline direction runs horizontally and the target direction runs vertically. We stream through the current and previous source
scanlines simultaneously. We can skip over intervals where scanlines have constant runs of the same
value, as indicated with the first empty run. When the runs differ, then the target scanline must be
finished with its run, so we compute the length of the concluded run, and begin a new one. When
the runs are both of varying type, then the varying voxels are simply copied into the target runs.
the current and the previous source scanline, we can deduce when the source is different
from the target, requiring some work to be done. Figure 7.9 illustrates the fast transpose
algorithm.
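A simplified transpose of a multi-valued RLE grid can be sketched as below. This version maintains one run builder per target scanline and appends to it run by run; the thesis algorithm goes further, streaming the current and previous source scanlines simultaneously so that spans whose constant runs have not changed can be skipped outright. The run encoding here is my own illustration.

```python
# Transposing a run-length-encoded grid with two constant run types
# (empty, unseen) plus varying runs, without expanding to a dense array.

EMPTY, UNSEEN, VARYING = "empty", "unseen", "varying"

def transpose_rle(scanlines, width):
    """scanlines: list of [(type, payload)] runs; payload is a run length
    for constant runs, or a list of voxel values for varying runs."""
    targets = [[] for _ in range(width)]      # one run list per column

    def emit(col, kind, value=None):
        runs = targets[col]
        if runs and runs[-1][0] == kind:
            if kind == VARYING:
                runs[-1][1].append(value)     # extend a varying run
            else:
                runs[-1][1] += 1              # lengthen a constant run
        else:
            runs.append([kind, [value] if kind == VARYING else 1])

    for line in scanlines:
        col = 0
        for kind, payload in line:
            if kind == VARYING:
                for v in payload:
                    emit(col, VARYING, v)
                    col += 1
            else:
                for _ in range(payload):      # thesis version skips these
                    emit(col, kind)
                    col += 1
    return targets

src = [[(EMPTY, 2), (VARYING, [7, 8])],
       [(EMPTY, 2), (UNSEEN, 2)]]
print(transpose_rle(src, 4)[2])   # [['varying', [7]], ['unseen', 1]]
```

The inner loop over constant payloads is exactly the work the two-scanline streaming trick eliminates: when source and target agree on the constant run type, the whole interval can be skipped.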
In the following analysis, m is the resolution of each range image (m² samples), p is the number of range images, n is the resolution of the voxel grid (n³ voxels), b is the width of the signed distance ramp, b̂ = b/Δ is the ramp width measured in voxels (where Δ is the voxel spacing), and A is the area of the surface being reconstructed.
Each range image contains m² samples, yielding no more than 2m² triangles after tessellation to create the range surface, so the overall storage is
O(pm²) for the range data. Without compression, another O(n³) storage is required to represent the volume, giving an overall storage cost of O(pm² + n³). As for computational
complexity, a brute force algorithm without any spatial data structure for visiting relevant
voxels would execute in time proportional to the number of voxels for each range image
plus the number of samples in each range image, i.e., O(m² + n³), leading to a total time
complexity of O(pm² + pn³).
In the remainder of this section, we show that the fast algorithm does significantly better than the brute force approach. We begin with the storage complexity of the algorithm
followed by an analysis of the computational complexity.
7.4.1 Storage
To analyze storage complexity, we first consider the costs of storing the original range images as well as the resampled range images and their depth tree data structures. Next we examine the storage requirements for the RLE volume with and without space carving. Finally,
we compute the storage costs of the reconstructed surface and compare it to the volumetric
storage costs.
As described above, each range image contains m² samples resulting in no more than
2m² triangles after tessellation. Because the volumetric algorithm is incremental, we need
not hold all range images in memory at once, so the storage cost for manipulating the range
images is O(m²). Before merging the range surface into the volume, it is resampled at voxel
resolution. This resampled range image requires O(n²) storage. In addition, a binary tree is
associated with each scanline of the resampled range image. After summing over the nodes
of one tree, we find that it holds no more than 2n nodes, leading to a total of 2n² nodes over
the whole resampled range image. Thus, the storage cost for the resampled range image and
its binary trees is O(n²).
When reconstructing the observed portions of the surface without space carving, the surface is effectively blurred into the volume by the signed distance ramps. This blurred surface occupies a volume of roughly V ≈ bA, or b̂A/Δ² voxels,
that must be stored. The RLE representation requires additional storage for the run lengths.
If the average number of intersections of the surface with each voxel scanline is c_surf, then
(1 + 2c_surf)n² run lengths are necessary. The overall storage cost for the volumetric data
structure is then O(b̂A/Δ² + (1 + 2c_surf)n²). For objects that do not have an unusually
high surface area to volume ratio (i.e., objects without many folds or spiny projections),
then A/Δ² ≈ n² and the number of voxels required is proportional to b̂n². Thus, for non-pathological surfaces, the storage cost for regions near the observed surface is O(n²).
The addition of space carving to the algorithm has the potential to greatly increase storage costs. When a sequence of voxels is all labeled empty, it is compactly stored as a
single run of empty voxels; however, the examples in Figure 7.10 illustrate some pathological cases that defy efficient storage. The objects in this figure have highly complex geometry
(7.10a) or extreme variations in surface reflectance (7.10b) that can lead to the creation of
Figure 7.10: Carving from difficult surfaces. (a) The scene contains multiple objects shown in
gray. Black regions are seen to be empty. After scanning from two viewpoints, many unseen islands form. (b) The same as (a), but now there is a single object with regions of negligible reflectance, shown in dark gray. The sensor cannot digitize these dark portions of the surface, and
space carving from observed surfaces again leads to a fragmented volume.
many thin shafts of emptiness when carving away from the observed surface. The intersections of these shafts can lead to many separate unknown regions. This high degree of
inhomogeneity leads to a worst case storage requirement of O(n³).
Figure 7.11: Storage complexity with and without backdrops in 2D. Three circles, indicated as the
gray regions, are being scanned. (a) After taking 8 scans without a backdrop, a large number of unseen
islands appear. These will result in high storage costs. In fact, more scans can actually increase
the fragmentation. (b) Using a backdrop, only two scans are necessary to declare most of the space
around the object to be empty.
In practice, we have found that the storage required for the RLE volume is comparable to the storage
required for the extracted isosurface.
In summary, for non-pathological cases, the storage requirements for the fast volumetric
algorithm are O(m² + n²). This complexity is clearly superior to the storage requirement
of O(pm² + n³) for a non-incremental, uncompressed algorithm.
7.4.2 Computation
The key operations that influence the computational complexity of the volumetric method
are:
- tessellating and resampling the range images
- building the binary depth trees
- performing volume transposes
- finding relevant voxels to update
When the voxel scanlines are not oriented in a direction that is desirable for efficient
memory traversal, it is necessary to transpose the volume. The fast transpose algorithm
performs work in traversing the RLE data structures in both the current and the transposed
directions, in addition to the work of copying over voxels. Managing the data structures
takes O(2(1 + c_surf)n²) time, while copying over the voxels requires O(b̂A/Δ²) time.
Once the volume is transposed for optimal scanline traversal, we query the binary depth
trees to decide which voxels require updating. In the case of reconstructing the observed
portions of the surface, this amounts to deciding which voxels are near the range surface being merged. We can think of the surface as being sampled at voxel resolution and blurred
along the sensor lines of sight by the signed distance ramps. Thus, as many as b̂n² voxels
may require updating. The cost of finding that a single voxel requires updating is log n, i.e.,
the cost of traversing the binary tree down to a single leaf node. For a run of voxels spanning an interval of leaf nodes in the tree, the cost of finding the consecutive leaf nodes is
amortized over the interval, since the tree is not re-traversed for each new voxel. Still, in
the worst case, each voxel near the range surface requires log n time to be found, yielding
an overall complexity of O(n² log n).
When performing space carving, the cost of locating which voxels should be marked
as empty is a function of the complexity of the emptiness shafts. We define c_shaft to be the
average number of shaft intersections per scanline per range image. The cost of finding
the interval of emptiness per scanline is then O(c_shaft log n), leading to an overall complexity of O(c_shaft n² log n). The constant c_shaft can actually be quite large as shown in Figure 7.10, but as indicated in the previous section, the shafts are usually small in number,
leading to small values of c_shaft for non-pathological objects and an effective complexity of
O(n² log n).
Once the relevant voxels have been identified with the binary depth tree, the runs of
signed distances and emptiness must be merged with the existing runs in the volume. The
work required for this step is proportional to the number of runs and the number of non-constant voxels that are already in the volume plus the number that require updating. If
the average number of intersections of a range surface with each scanline is c_range, then the
amount of work required to merge runs near the observed surface is O(b̂A/Δ² + (2 + 2c_shaft +
2c_range)n² + b̂n²). Note that because we are using a double-buffered volume, i.e., we copy
over all untouched voxels to maintain memory locality, the merge step always requires at
least O(b̂A/Δ²) time to execute. For a non-pathological surface, the overall time complexity for merging runs near the observed surface is O(n²).
When merging the runs of emptiness into the volume, the argument still holds that the
work required is proportional to the number of runs and the number of non-constant voxels already in the volume and about to be merged. We define c_unseen(i) to be the average
number of intersections of unseen regions with each scanline after merging i scans. Merging the runs of emptiness derived from the i-th range scan will then require O((2c_unseen(i) +
2c_shaft(i) + 1)n² + b̂A/Δ²) for each range image. If we let c_carve = max_i (2c_unseen(i) +
2c_shaft(i) + 1), then the total complexity of merging the empty regions is no worse than
O(c_carve n² + b̂A/Δ²). As indicated in the previous section, c_carve can be very large, but for
non-pathological objects and suitable use of backdrops, c_carve is typically small, leading to
an overall time complexity of O(n²) for non-pathological objects when merging runs during
the carving step.
All of the discussion of time complexity up to this point has been based on merging a
single range image into the volume. For p range images, each of the complexities should be
multiplied by p, except for the number of transposes which depends on the order of merging
scans. If the object is scanned from all sides, then by selecting the order of merging the
scans, the number of transposes need not exceed two: one transpose from the first scanline
direction to the second plus one transpose from the second direction to the third. In the worst
case, a transpose is required for every range image, which is still no more than p. Thus,
the overall complexity for merging range images into the volume is O(pm² + pn² log n +
pb̂A/Δ²). For non-pathological objects, this becomes O(pm² + pn² log n).
The final step of the surface reconstruction algorithm is to extract an isosurface from
the volume. By visiting only the voxels that are near the surface, this algorithm operates in time proportional to the number of near-surface voxels, i.e., O(n²) for non-pathological surfaces.
In summary, for non-pathological objects the overall time complexity for merging the
range images and extracting the isosurface is O(pm² + pn² log n). In practice, we find
that the logarithmic term is typically overwhelmed by other factors, leading to an observed
complexity of O(pm² + pn²). This complexity is a significant improvement over the worst
case complexity of O(pm² + pn³) for the brute force algorithm. At the time of publication
of this thesis, the computational optimizations for the space carving algorithm have not yet
been implemented. However, as long as the volume does not become heavily fragmented,
the optimized approach is expected to behave asymptotically as well as the algorithm that
does not employ space carving.
Chapter 8
Results of the volumetric method
In this chapter, we describe the hardware used to acquire the range data and how we treat
the range scanner's lines of sight (Section 8.1). In Section 8.2, we explain our method for
addressing the problem of aligning range images. Finally, we demonstrate the effectiveness
of the volumetric method on several models (Section 8.3).
merely a matter of convenience for the implementation. To maximize the amount of possible space carving, it remains an area for future work to follow the lines of sight of the CCD
camera.
[Plot: standard deviation from planarity (0.015 to 0.060) versus number of scans merged (1 to 6).]
Figure 8.1: Noise reduction by merging multiple scans. A planar target was scanned 6 times with
a 15° rotation of the viewpoint after each scan. Standard deviation from planarity for the reconstruction was determined after merging each scan volumetrically. Note how each scan improves the
estimate of the shape of the target.
8.3 Results
We show results for a number of objects designed to explore noise reduction, robustness of
our algorithm, its ability to fill gaps in the reconstruction, and its attainable level of detail.
To explore the noise reduction properties of our algorithm, we scanned a planar target
from 6 different viewpoints. After merging each scan into the volume, we extracted an isosurface, fit a plane through this surface, and computed the standard deviation of the reconstructed vertices from the planar fit. Figure 8.1 shows the results of this procedure. Clearly,
after each scan is merged, the planar fit improves, indicating a reduction in noise.
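The planarity measurement above amounts to a least-squares plane fit followed by a residual standard deviation. The following is a sketch of that computation, not the actual code used for Figure 8.1; a real pipeline would use a numerical library rather than hand-rolled normal equations.

```python
# Fit a plane z = a*x + b*y + c to points by least squares, then report
# the standard deviation of the residuals (deviation from planarity).
import math

def fit_plane(points):
    # Accumulate the 3x3 normal equations for [a, b, c].
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0, 0.0, 0.0]
    for x, y, z in points:
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                A[i][j] += row[i] * row[j]
            rhs[i] += row[i] * z
    # Gaussian elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        rhs[i], rhs[p] = rhs[p], rhs[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 3):
                A[r][c] -= f * A[i][c]
            rhs[r] -= f * rhs[i]
    coeffs = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        coeffs[i] = (rhs[i] - sum(A[i][j] * coeffs[j]
                                  for j in range(i + 1, 3))) / A[i][i]
    return coeffs

def deviation_from_planarity(points):
    a, b, c = fit_plane(points)
    res = [z - (a * x + b * y + c) for x, y, z in points]
    return math.sqrt(sum(r * r for r in res) / len(res))

pts = [(0, 0, 0.01), (1, 0, -0.01), (0, 1, -0.01), (1, 1, 0.01)]
print(round(deviation_from_planarity(pts), 6))  # 0.01
```

Note this fit parameterizes the plane as a height field z(x, y), which is adequate for a nearly fronto-parallel target; a tilted target would call for a total least-squares (eigenvector) fit instead.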
To explore robustness, we scanned a thin drill bit (about the thickness of the laser sheet)
from 12 orientations at 30 degree spacings using the traditional method of optical triangulation. Due to the false edge extensions inherent in data from triangulation scanners using
traditional analysis (see Figure 4.11b), this particular object poses a formidable challenge.
The rendering of the drill bit reconstructed by zippering range images [Turk & Levoy 1994]
shows the catastrophic outcome typical of using a polygon-based approach. The model generated by the volumetric method, on the other hand, is without holes and preserves some of
the helical structure of the original object, even though this structure is near the resolution
Figure 8.2: Merging range images of a drill bit. We scanned a 1.6 mm drill bit from 12 orientations at
a 30 degree spacing using traditional optical triangulation methods. Illustrations (a) - (d) each show
a plan (top) view of a slice taken through the range data and two reconstructions. (a) The range data
shown as unorganized points: algorithms that operate on this form of data would likely have difficulty deriving the correct surface. (b) The range data shown as a set of wire frame tessellations
of the range data: the false edge extensions pose a challenge to both polygon and volumetric methods. (c) A slice through the reconstructed surface generated by a polygon method: the zippering
algorithm of [Turk & Levoy 1994]. (d) A slice through the reconstructed surface generated by the
volumetric method described in this thesis. (e) A rendering of the zippered surface. (f) A rendering of the volumetrically generated surface. Note the catastrophic failure of the zippering algorithm.
The volumetric method, however, produces a watertight model. (g) A photograph of the original
drill bit. The drill bit was painted white for scanning.
Figure 8.3: Reconstruction of a dragon Part I of II. Illustrations (a) and (d) are full views of the
dragon. Illustrations (b) and (e) are magnified views of the section highlighted by the green box in (a).
Regions shown in red correspond to hole fill triangles. Illustrations (c) and (f) are slices through the
corresponding volumetric grids at the level indicated by the green line in (b). (a)-(c) Reconstruction
from 61 range images without space carving and hole filling. The magnified rendering highlights
the holes in the belly. The slice through the volumetric grid shows how the signed distance ramps
are maintained close to the surface. The gap in the ramps leads to a hole in the reconstruction. (d)-(f) Reconstruction with space carving and hole filling using the same data as in (a). While some
holes are filled in a reasonable manner, some large regions of space are left untouched and create
extraneous tessellations. The slice through the volumetric grid reveals that the isosurface between
the unseen (brown) and empty (black) regions will be connected to the isosurface extracted from the
distance ramps, making it part of the connected component of the dragon body and leaving us with
a substantial number of false surfaces.
Figure 8.4: Reconstruction of a dragon Part II of II. Following Figure 8.3, (a) and (d) are full views
of the dragon, (b) and (e) are magnified views of the belly, and (c) is a slice through the volumetric grid. (a)-(c) Reconstruction with 10 additional range images using backdrop surfaces to effect more carving. Notice how the extraneous hole fill triangles nearly vanish. The volumetric slice
shows how we have managed to empty out the space near the belly. The bumpiness along the hole
fill regions of the belly in (b) corresponds to aliasing artifacts from tessellating over the discontinuous transition between unseen and empty regions. (d) and (e) Reconstruction as in (a) and (b) with
filtering of the hole fill portions of the mesh. The filtering operation blurs out the aliasing artifacts in
the hole fill regions while preserving the detail in the rest of the model. Careful examination of (e)
reveals a faint ridge in the vicinity of the smoothed hole fill. This ridge is actual geometry present in
all of the shaded renderings, in this and the previous figure. The final model contains 1.8 million polygons
and is watertight.
Figure 8.5: From the original to a 3D hardcopy of the Happy Buddha Part I of II. (a) The original
is a plastic and rosewood statuette that stands 20 cm tall. (b) Photograph of the original after spray
painting it matte gray to simplify scanning. (c) Gouraud-shaded rendering of one range image of the
statuette. Scans were acquired using a Cyberware scanner, modified to permit spacetime triangulation. This figure illustrates the limited and fragmentary nature of the information available from a
single range image.
Figure 8.6: From the original to a 3D hardcopy of the Happy Buddha Part II of II. (a) Gouraud-shaded rendering of the 2.4 million polygon mesh after merging 48 scans, but before hole-filling.
Notice that the reconstructed mesh has at least as much detail as the single range image, but is less
noisy; this is most apparent around the belly. The hole in the base of the model corresponds to regions
that were not observed directly by the range sensor. (b) RenderMan rendering of an 800,000 polygon
decimated version of the hole-filled and filtered mesh built from 58 scans. By placing a backdrop
behind the model and taking 10 additional scans, we were able to see through the space between the
base and the Buddha's garments, allowing us to carve space and fill the holes in the base. (c) Photograph of a hardcopy of the 3D model, manufactured by 3D Systems, Inc., using stereolithography.
The computer model was sliced into 500 layers, 150 microns apart, and the hardcopy was built up
layer by layer by selectively hardening a liquid resin. The process took about 10 hours. Afterwards,
the model was sanded and bead-blasted to remove the stair-step artifacts that arise during layered
manufacturing.
Figure 8.7: Wireframe and shaded renderings of the Happy Buddha model. (a) and (b) Wireframe
and shaded rendering of a single range surface. (c) and (d) Wireframe and shaded rendering of the
2.6 million triangle reconstruction. The contours in the wireframe correspond to small triangles created when the isosurface clips the edges and corners of voxel cubes. The shaded rendering demonstrates noise reduction and increased detail after multiple scans are combined. (e) and (f) Wireframe and shaded rendering of the 800,000 triangle decimated mesh. The decimation process effectively collapses nearly coplanar triangles into larger triangles following the method of [Schroeder &
Lorensen 1992].
Model           Scans   Input       Voxel size   Volume         Exec. time   Output      Holes
                        triangles   (mm)         dimensions     (min.)       triangles
Dragon          61      15 M        0.35         712x501x322    56           1.7 M       324
Dragon + fill   71      24 M        0.35         712x501x322    257          1.8 M       0
Buddha          48      5 M         0.25         407x957x407    47           2.4 M       670
Buddha + fill   58      9 M         0.25         407x957x407    197          2.6 M       0

Table 8.1: Statistics for the reconstruction of the dragon and Buddha models, with and without space
carving.
Statistics for the reconstruction of the dragon and Buddha models appear in Table 8.1.
With the optimizations described in the previous section, we were able to reconstruct the
observed portions of the surfaces in under an hour on a 250 MHz MIPS R4400 processor.
The space carving and hole filling algorithm is not completely optimized, but the execution
times are still in the range of 3-5 hours, less than the time spent acquiring and registering
the range images. For both models, the RMS distance between points in the original range
images and points on the reconstructed surfaces is approximately 0.1 mm. This figure is
roughly the same as the accuracy of the scanning technology, indicating excellent alignment
and a nearly optimal surface reconstruction.
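The RMS figure reported above can be estimated with a simple procedure. The sketch below is illustrative, not the thesis implementation: it approximates point-to-surface distance by nearest-vertex distance, which is reasonable only when the mesh is dense relative to the error being measured; all names are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def rms_distance(range_points, mesh_vertices):
    """Approximate RMS distance from range samples to a reconstructed
    surface, using nearest mesh vertices as a stand-in for the true
    point-to-surface distance (valid for dense meshes)."""
    tree = cKDTree(mesh_vertices)
    d, _ = tree.query(range_points)      # nearest-vertex distances
    return np.sqrt(np.mean(d**2))

# toy check: points displaced 0.1 mm off a dense planar mesh (units: meters)
rng = np.random.default_rng(0)
grid = np.stack(np.meshgrid(np.linspace(0, 1, 200),
                            np.linspace(0, 1, 200)), -1).reshape(-1, 2)
verts = np.column_stack([grid, np.zeros(len(grid))])      # z = 0 plane
pts = verts[rng.choice(len(verts), 1000)] + [0, 0, 1e-4]
print(rms_distance(pts, verts))   # 0.0001
```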
Chapter 9
Conclusion
In this thesis, we have worked from the most basic issues of a prevalent range scanning
technology and progressed to a method that reconstructs complex models from vast amounts
of range data. We review these contributions in Sections 9.1 and 9.2. In Section 9.3, we
describe several areas for future work.
practice, we have demonstrated that we can significantly reduce range distortions with existing hardware that uses a perspective sensor of finite resolution. Although our implementation of the spacetime method does not completely eliminate range artifacts, it has proven
to reduce the artifacts in all shape and reflectance experiments we have conducted. Further,
increases in sensor resolution and reduction of filtering artifacts will undoubtedly improve
the accuracy for spacetime analysis, while the same cannot be said for traditional optical
triangulation methods. The influence of laser speckle, however, continues to limit triangulation accuracy.
the sensor, higher pixel densities mean more samples in z in the spacetime images (as well
as more y samples for the spacetime volume). These improved pixel densities should be
accompanied by either slower scanning or higher frame rates for increased resolution in
x.
In addition, greater dynamic range at the sensor will allow for acquisition of surfaces
with widely varying reflectances even when oriented at grazing angles to the illumination.
Video digitizers with more bits per pixel will also lead to more precise representations of
the spacetime images. Experimenting with different illuminants can also lead to greater accuracy. In Chapter 3, we argue that widening the laser sheet may improve results, because
the spacetime impulse response acts as a bandlimiting filter. Further, the limitations due to
laser speckle can be reduced with partially coherent, perhaps even incoherent illumination.
In the presence of spatiotemporal aliasing, methods for registering and deblurring multiple
spacetime images should also lead to improved resolution.
When a surface is very bright or shiny, surface interreflections will corrupt the spacetime
image. Under these circumstances, recovering accurate range is likely to be very challenging. Research in shape from shading which accounts for surface interreflections may offer
some promising avenues toward a solution [Nayar et al. 1990] [Wada et al. 1995]. In the
case of shiny surfaces, the errors will depend heavily on the orientation of the surface with
respect to the illumination. Acquiring multiple range images can help identify some of these
errors as outliers, because they will not be corroborated by the other range images. In addition, the interreflections will tend to distort the shape of the spacetime Gaussians so that
they will not behave ideally; this deviation from ideal will be detected, and the samples will
be discarded or downweighted in favor of data taken from a different orientation that yields
better spacetime Gaussians.
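One way to realize this downweighting, sketched below under stated assumptions (the Gaussian model of the time-intensity profile follows Chapter 4, but the fitting routine and the particular confidence formula are illustrative, not the thesis implementation): fit a Gaussian to each pixel's time profile and reduce confidence as the fit residual grows.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(t, a, mu, sigma):
    return a * np.exp(-(t - mu)**2 / (2.0 * sigma**2))

def spacetime_confidence(t, intensity):
    """Fit a Gaussian to one pixel's time-intensity profile and return
    (peak time, confidence); confidence falls toward 0 as the profile
    departs from the ideal Gaussian shape."""
    a0, mu0 = intensity.max(), t[np.argmax(intensity)]
    (a, mu, sigma), _ = curve_fit(gaussian, t, intensity, p0=(a0, mu0, 1.0))
    residual = np.linalg.norm(intensity - gaussian(t, a, mu, sigma))
    return mu, 1.0 / (1.0 + residual / np.linalg.norm(intensity))

t = np.linspace(-5, 5, 101)
clean = gaussian(t, 1.0, 0.3, 1.2)
corrupt = clean + 0.4 * gaussian(t, 1.0, 2.5, 0.4)   # secondary lobe, e.g.
_, c1 = spacetime_confidence(t, clean)               # from interreflection
_, c2 = spacetime_confidence(t, corrupt)
print(c1 > c2)   # True: the corrupted profile gets lower confidence
```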
The method of acquiring and analyzing the spacetime images in Chapter 4 requires all
frames to be captured and then post-processed. Ideally, we would like to perform the spacetime analysis in hardware, in realtime. Such a system could be implemented as a ring of
frame buffers that store the most recent N frames, where N is determined by the time it takes
171
for the illuminant to traverse a point on the surface. To reduce storage costs and improve
performance, the frames could be run-length encoded, and the spacetime rotation could be
implemented as a shear. In the case of a scanning laser beam with a linear array sensor, the
frame buffer would be small (a single array of pixels), and a hardware implementation
using a high speed digital signal processing chip could be quite practical.
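The ring of frame buffers can be prototyped in software before committing to hardware. The class below is a minimal sketch of that structure (the name, sizes, and interface are illustrative); it keeps only the most recent N frames and hands back a time-ordered window for spacetime analysis.

```python
import numpy as np

class SpacetimeRingBuffer:
    """Circular buffer holding the most recent N frames, modeling the
    ring of frame buffers proposed for a realtime implementation."""

    def __init__(self, n_frames, frame_shape):
        self.buf = np.zeros((n_frames,) + frame_shape)
        self.head = 0          # next slot to overwrite
        self.count = 0         # frames stored so far (saturates at N)

    def push(self, frame):
        self.buf[self.head] = frame
        self.head = (self.head + 1) % len(self.buf)
        self.count = min(self.count + 1, len(self.buf))

    def window(self):
        """Frames ordered oldest to newest, ready for spacetime analysis."""
        idx = (self.head - self.count + np.arange(self.count)) % len(self.buf)
        return self.buf[idx]

rb = SpacetimeRingBuffer(n_frames=4, frame_shape=(2, 2))
for i in range(6):                      # frames 0..5; buffer keeps 2..5
    rb.push(np.full((2, 2), float(i)))
print(rb.window()[:, 0, 0])             # [2. 3. 4. 5.]
```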
object being scanned, then the distance to the backdrop is immaterial, and it can be placed
conveniently outside of the working volume of the range scanner.
The hole filling algorithm can be improved by making the surface properties of filler
polygons more consistent with the remainder of the surface. When scanning a surface that
has a discernible texture (such as the scales of a lizard), there will be a distinct difference
between the smoothed hole fill regions and the remainder of the surface. Ideally, we would
like these hole fill regions to adapt to the surrounding texture, either automatically or with
some user guidance. Recent work on analyzing and synthesizing image textures suggests
one approach for propagating surface properties [Heeger & Bergen 1995]. If color information were also available in regions around the hole fillers, then color texture could be grafted
as well.
As range scanners become faster, some approaching realtime, a reconstruction algorithm
that is fast enough to keep pace also becomes very attractive. One can imagine a motorized
acquisition gantry or a hand-held scanner with orientation and position sensors delivering
registered range images to a system that could create a detailed model in a very short period of time. Though we have developed a number of optimizations for the volumetric algorithm to ensure good performance, we have yet to explore an increasingly important direction: parallelization. [Lacroute 1996] has parallelized the shear warp volume rendering
algorithm and obtained excellent results on a shared memory multiprocessor architecture.
We would expect to attain similar improvements with the fast volumetric method described
in Chapter 7. One factor that should make implementation easier is the lack of front to back
ordering that volume rendering requires. In the case of volumetric surface reconstruction,
the voxels are updated in a manner that is completely independent of order.
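This order-independence can be demonstrated directly with the cumulative weighted-average update of the signed distance grid. The sketch below is a minimal numpy illustration (dense arrays and random data; the thesis implementation operates on run-length-encoded grids): merging the same scans in two different orders yields identical grids.

```python
import numpy as np

def update_voxels(D, W, d_new, w_new):
    """Fold one range image's signed distances into the cumulative grid.
    The update is a running weighted average, so voxels (and scans) can
    be processed in any order, or in parallel, with the same result."""
    m = w_new > 0
    D[m] = (W[m] * D[m] + w_new[m] * d_new[m]) / (W[m] + w_new[m])
    W[m] += w_new[m]

# same scans, opposite orders -> identical grids
shape = (8, 8, 8)
rng = np.random.default_rng(1)
scans = [(rng.normal(size=shape), rng.random(shape)) for _ in range(3)]

D1, W1 = np.zeros(shape), np.zeros(shape)
D2, W2 = np.zeros(shape), np.zeros(shape)
for d, w in scans:
    update_voxels(D1, W1, d, w)
for d, w in reversed(scans):
    update_voxels(D2, W2, d, w)
print(np.allclose(D1, D2) and np.allclose(W1, W2))   # True
```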
maximize the knowledge about the object and the space around it. Of course, without knowing the shape of the object a priori, it is impossible to determine the next best view in terms
of maximizing the visible surface area of the object. Alternative measures are necessary,
and one promising measure is the surface area of the boundary surrounding the unknown
regions of the space in and around the object. This area is in fact defined by the hole fill
polygons computed in the volumetric algorithm described in this thesis. Initial results in
solving for next best view are promising [Pito 1996], but more research is warranted.
For faithful reproduction of the appearance of objects, the surface reflectance properties must be acquired. A simple strategy would be to reconstruct the shape of an object,
take a few color images, and then paste the color onto the object by following the lines of
sight from the color camera. For overlapping views, the colors could simply be averaged
together. This strategy is flawed, however, because the results are dependent on the lighting and the viewpoint. What we really want are the underlying reflectance properties over
the surface of the object so that we can render the computer model under arbitrary lighting and viewing conditions. These reflectance properties can be very complex. For general
anisotropic surfaces, the bi-directional reflectance distribution function (BRDF) at a point is
five dimensional: 2 dimensions for each of the incoming and outgoing light directions, and
one dimension for wavelength (two dimensions for wavelength if fluorescence is possible).
Accounting for variations over the surface of the object, the space is seven (possibly eight)
dimensional. With controlled lighting and many color images of the object, we can begin
to fill this large space with data. The problem is further complicated, however, by the fact
that light reflecting from the surface toward the sensor arrives at the surface not only from
the light source, but from other points on the surface. Accounting for the effects of these
interreflections is crucial to accurately measuring surface BRDF, which in turn requires a
very accurate description of the shape of the object. Some results have been obtained for
simplified BRDFs and moderate variations across the surface [Nayar et al. 1990] [Sato &
Ikeuchi 1996], but the general problem remains open.
Appendix A
Proof of Theorem 5.1
Theorem 5.1

Given the integral:

    I = \iiint h\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) \, dx \, dy \, dz

where:

    z = f(x, y)

and the function h is of the form:

    h\left(x, y, z, \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right) = e(x, y, z) \left[ -\frac{\partial z}{\partial x}, \; -\frac{\partial z}{\partial y}, \; 1 \right] \cdot \mathbf{v}(x, y, z)

then the function f that extremizes I satisfies:

    \mathbf{v} \cdot \nabla e + e \, \nabla \cdot \mathbf{v} = 0

Proof:
The solution lies in applying the calculus of variations, which has the fundamental result
known as the Euler-Lagrange equation:

    \frac{\partial h}{\partial z} - \frac{\partial}{\partial x} \frac{\partial h}{\partial (\partial z / \partial x)} - \frac{\partial}{\partial y} \frac{\partial h}{\partial (\partial z / \partial y)} = 0        (A.1)

We begin with the first term, \partial h / \partial z. We can rewrite h as:

    h = e \left[ -\frac{\partial z}{\partial x}, \; -\frac{\partial z}{\partial y}, \; 1 \right] \cdot \mathbf{v} = e \left( -v_x \frac{\partial z}{\partial x} - v_y \frac{\partial z}{\partial y} + v_z \right)

so that:

    \frac{\partial h}{\partial z} = \frac{\partial e}{\partial z} \left( -v_x \frac{\partial z}{\partial x} - v_y \frac{\partial z}{\partial y} + v_z \right) + e \left( -\frac{\partial v_x}{\partial z} \frac{\partial z}{\partial x} - \frac{\partial v_y}{\partial z} \frac{\partial z}{\partial y} + \frac{\partial v_z}{\partial z} \right)        (A.2)

The partial derivatives with respect to the surface slopes are:

    \frac{\partial h}{\partial (\partial z / \partial x)} = -e v_x

and:

    \frac{\partial h}{\partial (\partial z / \partial y)} = -e v_y

The remaining partials are then:

    \frac{\partial}{\partial x} \frac{\partial h}{\partial (\partial z / \partial x)} = -\frac{\partial (e v_x)}{\partial x} - \frac{\partial z}{\partial x} \frac{\partial (e v_x)}{\partial z} = -v_x \frac{\partial e}{\partial x} - e \frac{\partial v_x}{\partial x} - \frac{\partial z}{\partial x} \left( v_x \frac{\partial e}{\partial z} + e \frac{\partial v_x}{\partial z} \right)        (A.3)

and:

    \frac{\partial}{\partial y} \frac{\partial h}{\partial (\partial z / \partial y)} = -\frac{\partial (e v_y)}{\partial y} - \frac{\partial z}{\partial y} \frac{\partial (e v_y)}{\partial z} = -v_y \frac{\partial e}{\partial y} - e \frac{\partial v_y}{\partial y} - \frac{\partial z}{\partial y} \left( v_y \frac{\partial e}{\partial z} + e \frac{\partial v_y}{\partial z} \right)        (A.4)

Substituting Equations A.2, A.3, and A.4 into the Euler-Lagrange equation (Equation A.1),
the terms involving \partial z / \partial x and \partial z / \partial y cancel, and we arrive at:

    \mathbf{v} \cdot \nabla e + e \, \nabla \cdot \mathbf{v} = 0

∎
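As an independent sanity check of Theorem 5.1 (not part of the original proof), the Euler-Lagrange computation can be replayed symbolically for one concrete, arbitrarily chosen e and v, with f left abstract; the particular e and v below are illustrative assumptions.

```python
import sympy as sp

x, y, z, p, q = sp.symbols('x y z p q')      # p, q stand for dz/dx, dz/dy
f = sp.Function('f')(x, y)

# sample (but arbitrary smooth) weight e and vector field v = (v1, v2, v3)
e  = sp.exp(x*y + z)
v1 = sp.sin(y*z)
v2 = x*z**2
v3 = sp.cos(x*y) + z

# integrand of Theorem 5.1:  h = e [-p, -q, 1] . v
h = e*(-v1*p - v2*q + v3)

surf = {z: f, p: f.diff(x), q: f.diff(y)}    # restrict to z = f(x, y)

# Euler-Lagrange (Equation A.1):  h_z - d/dx h_p - d/dy h_q
el = (h.diff(z).subs(surf)
      - sp.diff(h.diff(p).subs(surf), x)
      - sp.diff(h.diff(q).subs(surf), y))

# claimed extremal condition:  v . grad(e) + e div(v), on the surface
claim = (v1*e.diff(x) + v2*e.diff(y) + v3*e.diff(z)
         + e*(v1.diff(x) + v2.diff(y) + v3.diff(z))).subs(surf)

assert sp.simplify(el - claim) == 0          # the two expressions agree
```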
Appendix B
Stereolithography
The field of rapid prototyping, also known as agile manufacturing, is growing quickly to fill
the need for visualizing and testing complex solid parts without the use of time consuming
milling processes. These rapid prototyping technologies typically use a layered manufacturing strategy, where the object is built in horizontal sections, one atop another. This strategy
allows for complex parts to be built without restrictions on internal cavities and accessibility issues that confront conventional milling methods. [Dolenc 1993] gives an overview of
some of the rapid prototyping methods in use today.
One of the more successful layered manufacturing strategies has been the method of
stereolithography, the method used for constructing the hardcopy of the Happy Buddha
model described in Chapter 8. Figure B.1 shows how the process works. The computer
model is first sliced horizontally, yielding a stack of polygonal outlines: well-defined regions that separate the inside of the model from the outside. Next, a support platform is
raised to the surface in a vat containing a liquid photopolymer that hardens when exposed
to ultraviolet light. A sweeper spreads a thin layer of the photopolymer over the surface of
the platform, and an overhead UV laser scans and hardens the interior of the polygons in the
first slice. The platform then lowers one slice thickness, the sweeper distributes a new layer
of photopolymer, and the process repeats for the next slice. In this way, the object is constructed slice by slice. The process for building the Happy Buddha model required about
10 hours. Note how important it is that the model have no holes; such holes would lead to
open polygons in the slicing step, and the definition of inside and outside the object would
Figure B.1: The stereolithography process. (a) First the computer model is cut into horizontal slices.
(b) The interior of each slice is hardened by a laser scanning over a photopolymer. The sweeper
delivers even layers across the surface, while the platform lowers after each slice is hardened.
no longer be meaningful. Thus, the contribution of being able to make watertight models
as described in this thesis is essential to the manufacture of three dimensional replicas of
the original model using stereolithography.
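The slicing step amounts to intersecting each triangle with a horizontal plane. The sketch below is a minimal illustration, not a production slicer (a real one must also chain the segments into closed contours and handle triangles lying in the plane); on a watertight mesh, the segments are guaranteed to chain into closed polygons.

```python
import numpy as np

def slice_triangles(tris, z0):
    """Intersect each triangle (a 3x3 array of xyz vertices) with the
    plane z = z0, returning the line segments of the slice contour."""
    segments = []
    for tri in tris:
        pts = []
        for i in range(3):
            a, b = tri[i], tri[(i + 1) % 3]
            if (a[2] - z0) * (b[2] - z0) < 0:          # edge crosses plane
                t = (z0 - a[2]) / (b[2] - a[2])
                pts.append(a[:2] + t * (b[:2] - a[:2]))
        if len(pts) == 2:                              # triangle contributes
            segments.append(pts)                       # one segment
    return segments

# a tetrahedron sliced halfway up yields a closed triangle of 3 segments
v = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
tris = v[[[0, 1, 2], [0, 1, 3], [1, 2, 3], [0, 2, 3]]]
print(len(slice_triangles(tris, 0.5)))   # 3
```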
When the layering process is finished, the solid object sits immersed in the vat of liquid. The object is then lifted out of the vat, and excess liquid is removed. To save time, the
laser may only sweep the interior of the object enough to make it structurally stable, but not
completely hard. In this case, the object is baked in an ultraviolet oven to complete the
hardening process.
The thickness of the stereolithography slices in the Happy Buddha model is about 150
microns. This may seem very accurate, but it still leaves noticeable contouring artifacts
between slices. Figure B.2 shows some of these contours. To remove the contours, a postprocessing step of bead-blasting and manual sanding is typically required.
One difficulty in constructing models with stereolithography is the problem of overhanging surfaces. If a portion of the surface is not supported from below or from the side, then,
when it is hardened, it will sink into the vat of liquid. As a result, a support must be inserted
into the computer model before slicing and manufacture. This support is typically very narrow so that it requires little time to create as the model is being built. Further, it comes to a
Figure B.2: Contouring due to layered manufacture. (a) The Happy Buddha stereolithography hardcopy. The green rectangle is viewed from a different angle and magnified in (b). Notice the contouring due to the layered manufacturing method. These contours are typically smoothed out with a
post-process of bead blasting and hand sanding.
sharp point so that it may be removed easily after the model is finished. In general, a complex model will require a network of supports as shown in Figure B.3.
Figure B.3: Supports for stereolithography manufacturing. (a) Shaded rendering of the Happy Buddha model and the supports (shown in green) that were needed to handle overhanging surfaces. These
supports come to narrow points and break off easily after the model construction is finished. (b) A
plan view of the supports by themselves.
Bibliography
Bajaj, C., Bernardini, F. & Xu, G. [1995]. Automatic reconstruction of surfaces and scalar
fields from 3D scans, Proceedings of SIGGRAPH 95 (Los Angeles, CA, Aug. 6-11,
1995), ACM Press, pp. 109–118.
Baribeau, R. & Rioux, M. [1991]. Influence of speckle on laser range finders, Applied Optics
30(20): 2873–2878.
Bergevin, R., Soucy, M., Gagnon, H. & Laurendeau, D. [1996]. Towards a general multi-view registration technique, IEEE Transactions on Pattern Analysis and Machine Intelligence 18(5): 540–547.
Besl, P. [1989]. Advances in Machine Vision, Springer-Verlag, chapter 1 - Active optical
range imaging sensors, pp. 1–63.
Besl, P. & McKay, H. [1992]. A method for registration of 3-d shapes, IEEE Transactions
on Pattern Analysis and Machine Intelligence 14(2): 239–256.
Bickel, G., Haulser, G. & Maul, M. [1985]. Triangulation with expanded range of depth,
Optical Engineering 24(6): 975–977.
Boissonnat, J.-D. [1984]. Geometric structures for three-dimensional shape representation,
ACM Transactions on Graphics 3(4): 266–286.
Bracewell, R. [1986]. The Fourier Transform and Its Applications, second ed., McGraw-Hill.
Buzinski, M., Levine, A. & Stevenson, W. [1992]. Performance characteristics of range sensors utilizing optical triangulation, Proceedings of the IEEE 1992 National Aerospace
and Electronics Conference, NAECON 1992, pp. 1230–1236.
Chien, C., Sim, Y. & Aggarwal, J. [1988]. Generation of volume/surface octree from range
data, The Computer Society Conference on Computer Vision and Pattern Recognition,
pp. 254–260.
Connolly, C. I. [1974]. Cumulative Generation of Octree Models from Range Data, PhD
thesis, Stanford University.
Connolly, C. I. [1984]. Cumulative generation of octree models from range data, Proceedings, Intl. Conf. Robotics, pp. 25–32.
Curless, B. & Levoy, M. [1995]. Better optical triangulation through spacetime analysis,
Proceedings of IEEE International Conference on Computer Vision, pp. 987–994.
Curless, B. & Levoy, M. [1996]. A volumetric method for building complex models from
range images, Proceedings of SIGGRAPH 96 (New Orleans, LA, August 5-9 1996),
ACM Press, pp. 303–312.
Dolenc, A. [1993]. Software tools for rapid prototyping technologies in manufacturing, Acta
Polytechnica Scandinavica: Mathematics and Computer Science Series Ma62: 1–111.
Dorsch, R., Hausler, G. & Herrmann, J. [1994]. Laser triangulation: fundamental uncertainty in distance measurement, Applied Optics 33(7): 1306–1314.
Duncan, J. [1993]. The beauty in the beasts, Cinefex 55: 55–95.
Eberly, D., Gardner, R., Morse, B., Pizer, S. & Scharlach, C. [1994]. Ridges for image
analysis, Journal of Mathematical Imaging and Vision 4(4): 353–373.
Edelsbrunner, H. & Mucke, E. [1992]. Three-dimensional alpha shapes, Workshop on Volume Visualization, pp. 75–105.
Edwards, C. & Penney, D. [1982]. Calculus and Analytic Geometry, Prentice Hall, Inc.
Elfes, A. & Matthies, L. [1987]. Sensor integration for robot navigation: combining sonar
and range data in a grid-based representation, Proceedings of the 26th IEEE Conference on Decision and Control, pp. 1802–1807.
Foley, J., van Dam, A., Feiner, S. & Hughes, J. [1992]. Computer Graphics: Principles and
Practice, Addison-Wesley Publishing Company.
Francon, M. [1979]. Laser Speckle and Applications in Optics, Academic Press.
Freyburger, B. [1994]. Personal communication. Stanford University.
Fujii, H., Uozumi, J. & Asakura, T. [1976]. Computer simulation study of image speckle
patterns with relation to object surface profile, J. Opt. Soc. Am. 66(11): 1222–1217.
Gagnon, H., Soucy, M., Bergevin, R. & Laurendeau, D. [1994]. Registration of multiple
range views for automatic 3-D model building, Proceedings 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 581–586.
Garcia, G. [1989]. Development of 3-d imaging systems for postal automation, CAD/CAM
Robotics and Factories of the Future. 3rd International Conference (CARS and FOF
88) Proceedings, pp. 209–216.
Goodman, J. [1984]. Laser Speckle and Related Phenomena, Springer-Verlag, chapter 1 - Statistical properties of laser speckle patterns, pp. 9–76.
Goodman, J. W. [1968]. Introduction to Fourier optics, McGraw-Hill.
Grosso, E., Sandini, G. & Frigato, C. [1988]. Extraction of 3D information and volumetric
uncertainty from multiple stereo images, Proceedings of the 8th European Conference
on Artificial Intelligence, pp. 683–688.
Hausler, G. & Heckel, W. [1988]. Light sectioning with large depth and high resolution,
Applied Optics 27(24): 5165–5169.
Hebert, P., Laurendeau, D. & Poussart, D. [1993]. Scene reconstruction and description:
geometric primitive extraction from multiple viewed scattered data, Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition, pp. 286–292.
Krishnamurthy, V. & Levoy, M. [1996]. Fitting smooth surfaces to dense polygon meshes,
Proceedings of SIGGRAPH 96 (New Orleans, LA, August 5-9 1996), ACM Press,
pp. 313–324.
Lacroute, P. [1995]. Personal communication. Stanford University.
Lacroute, P. [1996]. Analysis of a parallel volume rendering system based on the shear-warp
factorization, IEEE Transactions on Visualization and Computer Graphics 2(3): 218–231.
Lacroute, P. & Levoy, M. [1994]. Fast volume rendering using a shear-warp factorization of
the viewing transformation, Proceedings of SIGGRAPH 94 (Orlando, FL, July 24–29, 1994), ACM Press, pp. 451–458.
Li, A. & Crebbin, G. [1994]. Octree encoding of objects from range images, Pattern Recognition 27(5): 727–739.
Lorensen, W. & Cline, H. E. [1987]. Marching cubes: A high resolution 3D surface construction algorithm, Computer Graphics (SIGGRAPH 87 Proceedings), Vol. 21, pp. 163–169.
Ning, P. & Bloomenthal, J. [1993]. An evaluation of implicit surface tilers, IEEE Computer
Graphics and Applications 13(6): 33–41.
Papoulis, A. [1991]. Probability, Random Variables, and Stochastic Processes, third ed.,
McGraw-Hill.
Pieper, S., Rosen, J. & Zeltzer, D. [1992]. Interactive graphics for plastic surgery: A
task-level analysis and implementation, 1992 Symposium on Interactive 3D Graphics, pp. 127–134.
Pito, R. [1996]. A sensor based solution to the next best view problem, Proceedings of the
13th International Conference on Pattern Recognition, pp. 941–945.
Potmesil, M. [1987]. Generating octree models of 3D objects from their silhouettes in a
sequence of images, Computer Vision, Graphics, and Image Processing 40(1): 1–29.
Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. [1986]. Numerical
Recipes in C, Cambridge University Press.
Rioux, M., Bechthold, G., Taylor, D. & Duggan, M. [1987]. Design of a large depth of view
three-dimensional camera for robot vision, Optical Engineering 26(12): 1245–1250.
Rutishauser, M., Stricker, M. & Trobina, M. [1994]. Merging range images of arbitrarily
shaped objects, Proceedings 1994 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pp. 573–580.
Sato, Y. & Ikeuchi, K. [1996]. Recovering shape and reflectance properties from a sequence
of range and color images, 1996 IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 493–500.
Schey, H. [1992]. Div, Grad, Curl, and All That, W.W. Norton and Company.
Schroeder, W. & Lorensen, W. [1992]. Decimation of triangle meshes, Computer Graphics
(SIGGRAPH 92 Proceedings), Vol. 26, pp. 65–70.
Siegman, A. [1986]. Lasers, University Science Books.
Wada, T., Hiroyuki, H. & Matsuyama, T. [1995]. Shape from shading with interreflections
under proximal light source, 1995 IEEE International Conference on Computer Vision, pp. 66–71.
Weinstock, R. [1974]. The Calculus of Variations, with Applications to Physics and Engineering, Dover Publications.