Sasha Mirpour 3D Camera Tracking

The aim of the study is to compare different 3D camera tracking software packages, focusing on workflow, user-friendliness, and quality of production.

The software packages compared in the study are Maya Live, SynthEyes, and Boujou.



Department of Mathematics, Natural and Computer Science

A Comparison of 3D Camera Tracking Software

Sasha Mirpour
September 2008

Thesis, 10 points, C level

Computer Science

Creative Programming
Supervisor/Examiner: Sharon A Lazenby
Co-examiner: Ann-Sofie Östberg


University of Gävle
S-801 76 Gävle, Sweden
[email protected]

In the past decade computer generated images have become widely used in the
visual effects industry. One of the main reasons is the ability to blend three
dimensional (3D) animation seamlessly with live-action footage. In this study, different
3D camera tracking software (also referred to as matchmoving software) is compared,
focusing on workflow, user-friendliness, and quality of production.
Keywords: matchmoving, solving, 2D tracking, 3D calibration, parallax,
photogrammetry, computer vision.

Table of Contents
1 Introduction
1.1 Problem definition
1.2 Aim
1.3 Question at issue

2 Research
2.1 Past Research
2.2 Current Research

3 Preparations
3.1 Cameras
3.1.1 Film back/focal length
3.1.2 Image/pixel aspect ratio
3.1.3 Frame rate
3.1.4 Lens distortion
3.2 Planning

4 Discussion Tracking and Calibration
4.1 2D Tracking
4.2 3D Calibration
4.3 Automatic Tracking

5 Set Fitting

6 Applications
6.1 SynthEyes
6.2 2d3 boujou
6.3 PFTrack
6.4 Voodoo Camera Tracker
6.5 Maya Live

7 Comparisons and Results
7.1 SynthEyes
7.2 2d3 boujou
7.3 PFTrack
7.4 Voodoo Camera Tracker
7.5 Maya Live

8 Discussion and Conclusions

References

1 Introduction
Matchmoving is a crucial element of many visual effects shots whenever computer
generated elements need to be placed into live-action footage. The three dimensional
(3D) objects should be inserted in such a way that they appear to move as if they were
part of the real footage, with the correct position, scale, and orientation in relation to
the photographed objects in the scene.
Imagine a computer generated robot walking across a street or standing on a building
while the camera is moving: such a shot requires accurate 3D camera tracking
software. It is difficult to match the movements of a real camera seamlessly and to keep
the result stable over extended sequences, free of the jitter and drift that would
otherwise ruin the appearance of objects that are placed correctly in the footage. In the
best case, matchmoving gives directors more freedom in terms of location, saving time
and money by not having to set up extensive bluescreen sets or motion control rigs in a
limited
In the November 2001 issue of Millimeter, author Steven D. Katz explains in simple terms
how 3D tracking works. Matchmoving, optics, photogrammetry, and perspective
drawing all belong to an area of mathematics called projective geometry (Figure 1).
Applied to various spatial problems, it can provide solutions for measuring objects at a
distance, locating objects in space, and extracting 3D models from photographs. He
explains that "camera matching software utilizes a subset of projective geometry
called epipolar geometry (geometry of stereo vision) (Figure 2).
This branch of mathematics is used to describe the geometric relationship between two
optical systems viewing the same subject and can be used to locate points in space.
Because a moving camera offers a new view every frame, epipolar geometry works
for a single moving camera as well, and each new view is understood as a separate
optical system." In short, projective geometry describes how 3D space is projected
onto 2D images, while photogrammetry shows how 2D images can be used to
calculate 3D space. [1]
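As an illustration of the epipolar relationship Katz describes, the following minimal Python sketch (not taken from Katz or from any tracking package) projects one 3D point into two views of a pinhole camera and evaluates the epipolar constraint x2^T E x1 = 0, where E = [t]x R is the essential matrix. It assumes normalized image coordinates (focal length 1, no lens distortion), camera poses written as X_cam = R·X + t, and made-up coordinates; all function names are hypothetical.

```python
# Sketch: pinhole projection and the epipolar constraint for two camera views.
# Assumptions: normalized image coordinates (focal length 1, no distortion) and
# camera poses of the form X_cam = R @ X_world + t.

def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def project(point, R, t):
    """Transform a world point into camera coordinates and divide by depth,
    returning the homogeneous image point (x, y, 1)."""
    xc = [a + b for a, b in zip(mat_vec(R, point), t)]
    return [xc[0] / xc[2], xc[1] / xc[2], 1.0]

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w equals v x w."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def epipolar_residual(x1, x2, E):
    """x2^T E x1: zero (up to rounding) when x1 and x2 are images of the same 3D point."""
    return sum(a * b for a, b in zip(x2, mat_vec(E, x1)))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# View 1 sits at the world origin; view 2 is the same camera moved 0.5 units
# to the right (camera centre C = (0.5, 0, 0), so t = -C), with no rotation.
R2, t2 = IDENTITY, [-0.5, 0.0, 0.0]
E = mat_mul(skew(t2), R2)  # essential matrix E = [t]x R

P = [1.0, 0.5, 4.0]                        # a 3D feature in the scene
x1 = project(P, IDENTITY, [0.0, 0.0, 0.0])
x2 = project(P, R2, t2)
print(x1, x2)                              # the feature shifts between the views
print(epipolar_residual(x1, x2, E))        # ~0: consistent with one 3D point

Q = [-1.0, 1.0, 2.0]                       # a different 3D point
print(epipolar_residual(x1, project(Q, R2, t2), E))  # clearly non-zero
```

Each new frame of a moving camera plays the role of the second view here, which is why the same constraint applies to a single camera in motion.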

Figure 1: Albrecht Dürer, A Man Drawing a Lute, 1525. The idea is that we can get a correct image of
some object seen through a veil or a window by tracing the outline of the object.

Figure 2: Two cameras take a picture of the same scene from different points of view. The epipolar
geometry then describes the relation between the two resulting views.

A well-executed matchmove is never noticed by the viewer. In the words of Tim
Dobbert, the author of Matchmoving: The Invisible Art of Camera Tracking: "If you've
done your job right, no one should ever know you've done your job at all." Despite its
importance, matchmoving is completely invisible in the final shot when it is done
right. [2]

1.1 Problem definition

In the past eight to nine years, various software has emerged that allows matchmovers
to track cameras semi-automatically using a sophisticated technology known as
photogrammetry. These applications usually have similar workflows; however, they
come in many different forms. Even though prices have fallen considerably over the
last few years, there are still large differences in pricing.

Choosing the right application can be difficult, but the tool that provides the most
techniques for solving a matchmove is generally the preferred one.

1.2 Aim

The aim of this thesis is to compare different camera tracking software (also referred
to as matchmoving software). The chosen applications range from professional
packages to freeware and tools built into 3D software.
I will also insert 3D objects into different video sequences of exteriors and interiors
and, based on the results, determine which application gives the best results
in the various situations.
I plan to use this research to advance my knowledge and skills in
creating a complete and correctly matched environment in 3D, which would help
in obtaining more accurate results.

1.3 Question at issue

When the end results of each solution are compared in terms of speed of workflow,
ease of use, and quality, which matchmoving software is the best choice?
Cameras (lenses, formats, distortion etc.)
Understanding how cameras work can help a great deal in putting all the pieces
together in matchmoving. What do I need to find out about the cameras, and how do I
find that information?
2D Tracking (interactive/manual tracking)
Markers need to be added to important features in the image so that the software can
calculate the camera's motion and re-create the three dimensional layout of the
scene. What features in an image should I look for when placing the markers? And
how do I deal with image sequences that suffer from motion blur, occlusion, and noise?
3D Calibration (solving the camera's attributes)
Calibrating, or solving, the camera means finding the position and motion of the
camera in the 3D scene, and it is probably the hardest part of matchmoving. How do I
find out what is happening in the scene, and how do I evaluate the solution in order to
obtain the best results?
Automatic Tracking (editing, masking)
In the beginning, 2D tracking was always done manually. Since automatic 2D
tracking appeared, more shots have become solvable, and more easily than before.
What are the technical differences between manual and automatic tracking, and how
do their benefits compare?
Reconstruction (fitting everything in a 3D environment)
This is the last step in the matchmoving process. The goal is to make the camera fit the
digital set or 3D scene. Therefore, how do I accurately fit a matchmoving camera into
a computer graphic environment?

2 Research
2.1 Past Research
The science of extracting 3D information from 2D photographs predates Computer
Vision; it has a century-old history in the field of photogrammetry, which means
measuring from photos. [3] Photogrammetry was not in widespread use, except in the
field of surveying, until computers became fast enough to perform the complex
calculations. This made research in the fields of Image Processing and Computer
Vision applicable to the matchmoving problem. Initially, scientists built vision systems
that let robots navigate autonomously by building a full 3D representation of their
environment using only standard 2D cameras. Successful examples of this technology
can be seen in the NASA rovers Spirit and Opportunity, navigating their way around
the rocky surface of Mars (Figure 3). [2] [4]

Figure 3: NASA's Mars Rover Spirit, which uses photogrammetry to help navigate its way around the rocky
surface of Mars with a minimal amount of human intervention.

In the early 1990s, matchmoving had to be done manually through a trial-and-error process
in which animators manipulated parameter curves in animation software to obtain perfect
visual alignment between a computer graphic object and the background footage. A
skilled matchmover would process about 50 frames a day on reasonably complex
shots, which meant a great deal of time and effort had to be put into these types of
visual effects shots. Before digital tracking, such shots relied heavily on Motion
Control rigs or fixed camera positions. [4] [5]
Motion Control allows precise replication of pre-programmed camera moves. It is
essentially an electrically driven crane with a camera attached to it that can be
controlled either by a computer or entirely manually. The camera's position, rotation, and
focal length are recorded on a computer so that the exact same motion can be replayed as
many times as needed, and the footage and effects can then be composited seamlessly in
postproduction. Despite these features, motion control has some limitations in accuracy,
range, and flexibility, and it is also somewhat costly.
There are, however, other ways of using motion control by combining it with camera
tracking. To accomplish this, a scene is captured without motion control, motion
tracking is used to obtain the camera trajectory, and the trajectory is then brought into a
motion control system. The motion control rig can then mimic the exact movement of the
original footage, which allows adding another live-action element that otherwise would
have been impossible. [6]
In an interview, Steve Sullivan, director of the research and development group at
Industrial Light and Magic (ILM), said that in the past, making matchmoving work
and look realistic was a challenge involving a lot of guesswork. Sullivan, who
received his Ph.D. in computer engineering from the University of Illinois, also
mentions that he watched a show on the making of Jurassic Park detailing the
difficulty the filmmakers had creating special effects in moving shots: "They were
doing it all very laboriously, very primitively." Still, one of the first, and probably one
of the best, examples of matchmoving was used in Jurassic Park (Figure 4). [7]
The 3D tracking system that ILM used for Jurassic Park was one of the earliest of its
kind. It was based on an older 2D tracking software system called MM2, which was not an
automated tool but a manual 2D nudge tool with which the artist would keyframe position
changes by hand. They would also place colored tennis balls in the sets and
environment as reference marks and use them to track the motion of the
camera through the scene. Sullivan knew more automatic approaches were possible.
Together with ILM technician Eric Schafer, he would later develop MARS, ILM's
in-house matchmoving and tracking software, which received an Academy Award. [7]

Figure 4: Jurassic Park, 1993.

As computer graphics became widely accepted and were used successfully and more
frequently, directors and cinematographers demanded more freedom in
camera movement. This resulted in the range of automated and assisted camera tracking
software solutions that we see today.
1998 was the year when 3D tracking software was recognized with industry
awards. Doug Roble of Digital Domain and Thaddeus Beier of Hammerhead
Productions both received Technical Achievement Awards for the design and
implementation of their respective programs, TRACK (Figure 5) and ras_track.
Digital Domain's TRACK, which had been developed in 1993 mainly as a 2D tracker,
had moved toward 3D and was completely rewritten as an integrated tool
for camera position calculation and scene reconstruction, using Computer Vision
techniques to extract critical 2D and 3D information about a scene and the camera
used to film it.

Figure 5: 3D tracking software developed at Digital Domain which was used on nearly every shot of the
movie Titanic.

Hammerhead's ras_track system was used for 2D tracking, stabilization, and 3D camera
and object tracking. One year earlier, in 1997, Science-D-Visions became the first
software vendor to release a reliable survey-free 3D camera tracking application,
3D-Equalizer. Since then, it has become one of the key matchmoving systems of choice
for high-end feature film production studios. [8]

2.2 Current Research

Numerous advancements have been made since the first survey-free 3D camera
tracking software was released in 1997. Improvements in algorithms and software
that speed up the matchmoving process are constantly being pushed forward, both in
updates of existing products and in newly released ones such as VooCAT by Scenespector
Systems, an improved commercial camera tracker requiring minimal user interaction,
based on the free Voodoo Camera Tracker technology. Prices have fallen while quality
and features have improved. Russ Andersson, who founded Andersson Technologies
LLC in 2003, commercialized SynthEyes the same year; it had already been used
by beta testers on feature films such as Bad Boys 2, Charlie's Angels 2, and Master and
Commander. Besides being one of the most affordable matchmoving solutions, it is
known for its speed. With a background in robotics and real-time
Computer Vision, Andersson has focused on solution speed as a key enabling aspect
of tracking, and SynthEyes has been designed from the ground up to be fast. He also
remarks in an interview with fxguide that "Sometimes a bit of planning can save quite a
lot of time, even for a computer!" [8] [9]
Hollywood special effects are not the only area that has benefited from
advancements in matchmoving. It is also used in real-world cases where accidents cause
real damage. In accident reconstruction, it has proven to be a valuable
tool for accurately reproducing missing environmental factors in order to analyze and
study the scene long after the accident took place (Figure 6). In the March 2008 issue
of Plaintiff Magazine, Jorge Mendoza explains the entire camera matching
procedure thoroughly. The accident scene is first surveyed in 3D and photographed from
the air; cameras are then placed at eyewitness locations and mounted on
vehicles. The matchmoving software creates a 3D link between the 3D survey of
the accident scene and the 2D video of the scene, accurately positioning the missing
objects. [10]

Figure 6: Animation of a garbage truck about to run over a 10-year-old boy.

Augmented reality is another field of research in which matchmoving techniques can
be applied. In an interview with fxguide, Ed Bolton, a former boujou technical
specialist at 2d3 Ltd, was asked whether 2d3 would explore real-time tracking. He
replied that they had looked into it and already had a project called
Lifeplus exploring it. Lifeplus was started in March 2002 by MIRALab at the University
of Geneva as a joint venture of 11 leading industrial and research partners, including
2d3, which developed the real-time software, to explore augmented reality technologies.
The objective was to recreate ancient fresco paintings and sites, in this case
ancient Pompeii, by inserting digital animated characters into their original setting,
going about their daily lives. A head-mounted display connected to a mobile computer
showed the animated characters according to the visitor's viewpoint, creating an
interactive attraction that combined real and virtual elements (Figure 7). [11]
[12] [13]

Figure 7: A head-mounted display is worn, showing computer generated characters.

Another interesting use of augmented reality with Computer Vision can be seen in the
memory game levelHead, created by Julian Oliver. A plastic 5x5x5 cm cube with
unique markers on each face is held in front of a camera (Figure 8). The camera
captures every movement of the cube and sends the image to a screen, where tiny
rooms, all connected by doors, are displayed on each face of the cube. One of the
rooms contains the character. By tilting the cube, the player directs the character from
room to room in an effort to find the exit. The software recognizes each marker and
overlays 3D content on a per-face basis, giving a convincing impression that each room
is somehow inside the cube. [14]

Figure 8: The character walks in the direction of the tilting.

I asked Russ Andersson where matchmoving is heading in the coming years and how
the software will change, and he answered with two main points. The
first was simply being able to extract more information from images. The
second was software being able to analyze more kinds of shots automatically,
with less human input. If software can handle the more difficult shots, less time
and less skilled operators will be required; those difficult shots will then be considered
simple, and filmmakers will hand over even more outrageous, extremely difficult
shots instead. [15]

3 Preparations
Knowing where the effects or 3D objects will be inserted, planning ahead, and
gathering as much information as possible all help a great deal in tracking and solving
the scene.

3.1 Cameras
For this research project, I used a regular Digital 8 camera, the Sony DCR-TRV520
(NTSC). Digital cameras use a small chip called a charge-coupled device (CCD) to
record light coming through the lens, and the shape of the CCD defines the
shape of the image. Several camera properties are important to take into account,
because, depending on how they are set, they can affect the matchmove in many
ways. Many cameras feature built-in stabilization, based on a variety of operating
principles, which can alter the image in ways that disrupt the tracking and solving
process. When shooting for visual effects, it is recommended, and probably safest, to
turn off the camera's stabilization feature when possible (SteadyShot in my case).

3.1.1 Film back/focal length

The size of the film back (or CCD) is important to a matchmover because the film
back and focal length together define the field of view (FOV). The FOV is an angular
measurement of how much of the scene the camera can actually see. It is needed in
order to match a virtual camera inside a 3D animation program with the original camera
that has been used to record the scene. The film back of my NTSC Digital Video (DV)
was set as a Inch CCD.
Note that most 3D animation programs refer to the film back as the aperture. The
focal length is usually much easier to find out than the film back size. If these values
are not known, the matchmoving software can be allowed to solve for the focal length
and make the necessary adjustments. Searching online for the make and model of the
camera is the best way to find the technical specifications that are needed.
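Because the film back and focal length together determine the field of view, the relationship is easy to check with a short calculation. This is a minimal Python sketch assuming an ideal rectilinear lens without distortion; the CCD width and focal length below are placeholder values, not specifications of the camera used in this study, and the function names are my own. Some 3D programs, Maya among them, expect the film aperture in inches, hence the conversion.

```python
import math

MM_PER_INCH = 25.4

def field_of_view_deg(film_back_mm, focal_length_mm):
    """Angular field of view (degrees) across one film-back dimension:
    FOV = 2 * atan(film_back / (2 * focal_length))."""
    return math.degrees(2.0 * math.atan(film_back_mm / (2.0 * focal_length_mm)))

def focal_length_mm(film_back_mm, fov_deg):
    """Inverse: the focal length that gives a desired field of view."""
    return film_back_mm / (2.0 * math.tan(math.radians(fov_deg) / 2.0))

# Hypothetical values for illustration only (not the DCR-TRV520's specifications):
film_back_width_mm = 4.8   # assumed horizontal CCD width
lens_mm = 4.0              # assumed focal length

fov = field_of_view_deg(film_back_width_mm, lens_mm)
print(f"horizontal FOV: {fov:.1f} degrees")
print(f"film back in inches: {film_back_width_mm / MM_PER_INCH:.4f}")
print(f"focal length back from FOV: {focal_length_mm(film_back_width_mm, fov):.3f} mm")
```

The inverse function reflects what a matchmoving program does when it solves for the focal length: if the film back is known and the FOV is recovered from the footage, the focal length follows.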

3.1.2 Image/pixel aspect ratio

The image aspect ratio is the ratio of the image's width to its height, and it
should be the same as that of the source footage. The pixel aspect ratio is usually 1.0
(square pixels) for film; in my case, working with NTSC video footage, this value is
set to 0.9 (non-square).
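The two ratios are linked: the displayed image aspect ratio is the stored width times the pixel aspect ratio, divided by the height. A minimal Python sketch, assuming the standard 720 x 480 NTSC DV frame and the 0.9 pixel aspect ratio given above; the helper names are hypothetical.

```python
def display_aspect(width_px, height_px, pixel_aspect):
    """Image (display) aspect ratio = stored width * pixel aspect / height."""
    return width_px * pixel_aspect / height_px

def square_pixel_width(width_px, pixel_aspect):
    """Width in square pixels that shows the frame with the correct proportions."""
    return round(width_px * pixel_aspect)

# Assumed frame: NTSC DV stores 720 x 480 pixels; pixel aspect 0.9 as in the text.
print(display_aspect(720, 480, 0.9))     # 1.35
print(square_pixel_width(720, 0.9))      # 648
```

If the 3D program were left at square pixels (1.0) for this footage, the virtual camera's frame would be about 11% too wide relative to the plate, which is why the value has to match the source.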

3.1.3 Frame rate

Matchmoving software will solve a scene perfectly well with the wrong frame rate.
The problem appears when trying to sync the footage inside 3D or video editing
software that uses the correct frame rate. Making sure the settings are right from the
beginning in both the matchmoving and the 3D software will save a great deal of
work. All my project settings were set to record and play at 29.97 frames per second (fps).
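The NTSC rate quoted as 29.97 fps is exactly 30000/1001 fps, so small mismatches accumulate over a shot. The following hypothetical Python sketch uses a simplified model (it ignores drop-frame timecode, and the function names are my own) to estimate how far a 3D scene set to a different rate drifts from the footage.

```python
from fractions import Fraction

# "29.97 fps" NTSC video actually runs at exactly 30000/1001 frames per second.
NTSC_FPS = Fraction(30000, 1001)

def frame_time(frame, fps):
    """Time in seconds at which a (zero-based) frame is displayed."""
    return Fraction(frame) / fps

def sync_error_frames(frame, footage_fps, scene_fps):
    """How many frames a 3D scene running at scene_fps is ahead (+) or
    behind (-) footage interpreted at footage_fps, at the given footage frame."""
    return float(frame_time(frame, footage_fps) * scene_fps - frame)

# Footage shot at 29.97 fps but the 3D scene set to 30 fps: after roughly one
# minute of footage (1800 frames) the scene has drifted almost two frames.
print(sync_error_frames(1800, NTSC_FPS, 30))         # 1.8
print(sync_error_frames(1800, NTSC_FPS, NTSC_FPS))   # 0.0 with matching settings
```

Exact fractions are used so that the drift is not confused with floating-point rounding.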

3.1.4 Lens distortion

Wide-angle and zoom lenses are the ones most affected by lens distortion.
It is a matter of evaluating the shot to see whether the distortion will cause any
problems. Lens distortion can be removed by most compositing software, and even by
some matchmoving software, with utilities that undistort the footage before
matchmoving. The distortion should then be reapplied to the computer graphic elements
in the final composite, or they will appear to tear away at the edges of the frame.
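One common way to model radial lens distortion is a polynomial in the squared distance from the image centre (the radial part of the Brown-Conrady model). The sketch below assumes that model with two made-up coefficients, k1 and k2; real matchmoving and compositing tools each use their own distortion models and parameter names, so this only illustrates the idea of undistorting for tracking and re-distorting for the composite.

```python
# Simple polynomial radial distortion model (Brown-Conrady style, radial terms only).
# Coordinates are normalized and centred on the lens centre; k1, k2 are hypothetical.

def distort(x, y, k1, k2):
    """Map an undistorted point to where the lens actually images it."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

def undistort(xd, yd, k1, k2, iterations=20):
    """Invert distort() by fixed-point iteration (adequate for moderate distortion)."""
    x, y = xd, yd                      # start from the distorted position
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale  # refine the estimate of the undistorted point
    return x, y

K1, K2 = -0.12, 0.01  # barrel distortion, as from a wide-angle lens (made-up values)

corner = (0.8, 0.6)                    # a point near the edge of the frame
seen = distort(*corner, K1, K2)        # where the feature appears in the footage
recovered = undistort(*seen, K1, K2)   # straightened for tracking and solving
print(seen, recovered)
```

In this model, distort() is the step that would be applied to the rendered computer graphic elements so that they bend the same way as the plate near the frame edges.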

3.2 Planning
When it comes to preparing and planning how to insert effects or 3D objects, Russ
Andersson illustrates on his website which camera path is best suited for 3D
tracking. [16]
The red arrow (to the left) in Figure 9 shows an example of how not to shoot when
moving through a scene: the camera does not focus on anything long enough to obtain a
decent 3D track. By focusing on a subject or area while slightly rotating around it, the
perspective changes, and this change is called parallax shift. This is important because
matchmoving software analyzes parallax in the image sequence and uses that
information to generate a 3D camera and scene.

Figure 9: The green arrow (to the right) shows the proper path to take in order to create enough parallax
shift for a successful solve.
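The effect can be checked with a small pinhole-camera calculation. In this hypothetical Python sketch (names and coordinates are made up, focal length 1, no distortion), two features lie on the same line of sight at different depths. Rotating the camera in place moves both image points identically, while moving the camera sideways separates them; that separation is the parallax from which depth can be recovered.

```python
import math

def image_x(point, cam_x=0.0, pan=0.0):
    """Horizontal normalized image coordinate of `point` (X, Y, Z) for a camera
    located at (cam_x, 0, 0), looking down +Z and rotated by `pan` radians
    about the vertical axis (pinhole model, focal length 1)."""
    X, _, Z = point
    X -= cam_x                      # move into camera-centred coordinates
    c, s = math.cos(pan), math.sin(pan)
    xc = c * X - s * Z              # rotate about the vertical axis
    zc = s * X + c * Z
    return xc / zc

near = (0.5, 0.0, 2.0)   # two features on the same line of sight,
far = (1.0, 0.0, 4.0)    # one twice as far from the camera as the other

def parallax(**camera):
    """Separation between the two features in the image for a given camera."""
    return image_x(far, **camera) - image_x(near, **camera)

print(parallax())              # starting position: the features overlap
print(parallax(pan=0.1))       # pure pan: still overlapping, no depth information
print(parallax(cam_x=0.2))     # sideways move: they separate (parallax shift)
```

This is why a camera that only pans gives the solver little to work with, while the orbiting path in Figure 9 produces usable parallax.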

When recording, it is always recommended to keep the camera as steady as possible,
since this helps a great deal when it is time to do the 2D tracking.
Steadicams are the best solution; however, they do not come cheap. A tutorial by
Johnny Chung Lee (Figure 10) shows how to build your own camera stabilizer that
obtains similar results. [17]


Figure 10: Modifications can be made very easily.

Measurements of the camera height and of the distance to the subject or other objects
in the scene are also important, as they can be of significant assistance during
computer graphic set fitting. They help a great deal in getting the scale of everything
correct. Adding markers to the set or scene during recording can help in finding the
correct scale and improve the tracking solution. An example of what a tracking marker
could look like is shown in Figure 11. Markers are recommended in
environments with very few trackable features. They should be placed at
regular intervals rather than randomly spread around the set, preferably with
known distances between them, and the spacing will depend on how tightly the
shot is framed. Only the areas that are tracked will be known in 3D space; therefore, if
no tracking markers are placed in a particular area, the software will have no means of
accurately reproducing that portion of the 3D scene. [18]

Figure 11: A tracking marker, with the text "This is a tracking marker. This will save you money." written inside the triangle.
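Markers placed at known distances are useful because a camera solved from a single moving camera is only determined up to an overall scale, and one measured distance is enough to fix it. A minimal Python sketch, assuming the solve is available as plain 3D coordinates; the marker positions, camera path, and 1.2 m distance below are invented for illustration, and the helper names are hypothetical.

```python
import math

def scale_factor(solved_a, solved_b, measured_distance):
    """Uniform scale that makes the solved distance between two markers match
    the distance measured on set (in whatever unit was used on set)."""
    return measured_distance / math.dist(solved_a, solved_b)

def rescale(points, factor, origin=(0.0, 0.0, 0.0)):
    """Scale solved points (markers or camera positions) about `origin`."""
    return [tuple(o + factor * (c - o) for c, o in zip(p, origin)) for p in points]

# Hypothetical solve: two markers that were placed 1.2 m apart on set.
marker_a, marker_b = (0.0, 0.0, 0.0), (0.5, 0.0, 0.0)
camera_path = [(0.1, 0.3, -2.0), (0.2, 0.3, -1.9)]

k = scale_factor(marker_a, marker_b, 1.2)
print(k)
print(rescale([marker_b] + camera_path, k))
```

The same factor must be applied to the markers and the camera path together; scaling only one of them would break the alignment seen through the camera.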


4 Discussion Tracking and Calibration

4.1 2D Tracking
When 2D images are transformed into 3D scenes, accurate information has to be
gathered by 2D tracking. The tracks serve as two dimensional clues for calculating the
camera's motion and re-creating the three dimensional layout of the scene.
The only difference between laying out 2D tracks in matchmoving and in compositing
software is that in matchmoving the tracking is used to help determine the 3D space of
the scene. It is preferable to find points that remain clear and visible for as much of the
image sequence as possible; otherwise, more tracks must be added as old ones disappear.
Good features to look for initially are the corners of square objects on
buildings, vehicles, and signs, where a clear contrast between foreground and
background makes them good tracking points.
Two-dimensional track information is sometimes difficult to acquire accurately
because points are obscured or go out of frame in part of the sequence, because
lighting changes dramatically, or because the camera moves so much that the marked
points change their appearance. Whenever 2D tracking fails, keyframing can guide
the software by identifying where the tracker is located on specific frames.
The tracking points should be spread throughout the scene at various depths, heights,
and widths as much as possible in order to obtain a complete picture of the 3D layout
of the scene. This is important because the software will have no means of accurately
reproducing the entire 3D scene if there are no markers in a particular area.
A minimum number of tracks is needed in order to provide a solution. Depending on
the software, the minimum can be anywhere from seven to twelve
tracks per frame. Tracks must be maintained throughout the shot and
replaced in a staggered manner in order to prevent bumps in the camera
movement (Figure 12).


Figure 12: Multiple tracks that end or start on the same frame cause breaks in the camera's motion path.
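A simple bookkeeping check can catch both problems described above: too few tracks on a frame, and many tracks starting or ending on the same frame. This hypothetical Python helper is not part of any of the compared applications; the thresholds (eight tracks per frame, at most two changes per frame boundary) are assumptions chosen from within the seven-to-twelve range mentioned above.

```python
from collections import Counter

def coverage_report(tracks, first, last, min_tracks=8, max_changes=2):
    """tracks: list of (start_frame, end_frame) ranges, inclusive.
    Returns (frames with fewer than min_tracks active tracks,
             frame boundaries where more than max_changes tracks start or end)."""
    active = Counter()
    changes = Counter()
    for start, end in tracks:
        for frame in range(start, end + 1):
            active[frame] += 1
        changes[start] += 1      # track appears between frames start-1 and start
        changes[end + 1] += 1    # track disappears between frames end and end+1
    too_few = [f for f in range(first, last + 1) if active[f] < min_tracks]
    # ignore the shot's own first frame and the boundary after its last frame
    abrupt = sorted(b for b, n in changes.items() if n > max_changes and first < b <= last)
    return too_few, abrupt

# Hypothetical 100-frame shot: four long tracks plus two batches of five tracks
# that are all swapped out on the same frame, the situation Figure 12 warns about.
tracks = [(1, 100)] * 4 + [(1, 50)] * 5 + [(51, 100)] * 5
print(coverage_report(tracks, 1, 100))      # ([], [51])

# Staggering the replacements spreads the changes over several frames.
staggered = [(1, 100)] * 4 + [(1, 46 + i) for i in range(5)] + [(45 + i, 100) for i in range(5)]
print(coverage_report(staggered, 1, 100))   # ([], [])
```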

Only stationary objects should be tracked. Objects such as trees, although rich in
detail, may move subtly in the wind and should not be tracked, since matchmoving
software is extremely sensitive to even the smallest movement. Features
such as lens flares, reflections, and specular highlights should also be avoided, as they
create moving artifacts in the footage.
When heavy motion blur causes the tracker to fail, expanding the search area helps, as
it allows the tracking engine to look further when searching for the pattern. Expanding
the pattern size might also help, but it will slow down the tracking. As a last resort,
the track can be keyframed by hand through the motion blur; this can be very difficult
and inaccurate but will at least provide a solution. Temporary occlusion of objects that
are being tracked can also be handled by keyframing the track across a gap of a few
frames. Heavy noise from film grain or compression can make the resulting camera
motion jittery; removing the grain before tracking with compositing software, or using
uncompressed images, makes for smoother motion. Another method is to track only
one channel, for instance red or green, since film grain is often heaviest in the
blue channel.
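The search area and pattern size mentioned above can be illustrated with a brute-force pattern matcher. This is a minimal Python sketch under simplifying assumptions (a single channel, whole-pixel positions, a plain sum of squared differences); commercial trackers use more elaborate, sub-pixel matching, and the function names and test frame here are hypothetical. It also shows tracking on just one channel, here green.

```python
def channel(rgb_frame, index):
    """Extract one colour channel (0 = red, 1 = green, 2 = blue) as a 2D list."""
    return [[pixel[index] for pixel in row] for row in rgb_frame]

def track_pattern(frame, pattern, prev_xy, search_radius):
    """Search a square area around prev_xy (x, y of the pattern's top-left corner)
    for the position where `pattern` fits best, using the sum of squared
    differences (SSD). Returns ((x, y), ssd) for the best match."""
    ph, pw = len(pattern), len(pattern[0])
    height, width = len(frame), len(frame[0])
    px, py = prev_xy
    best_xy, best_ssd = None, None
    for y in range(max(0, py - search_radius), min(height - ph, py + search_radius) + 1):
        for x in range(max(0, px - search_radius), min(width - pw, px + search_radius) + 1):
            ssd = sum((frame[y + j][x + i] - pattern[j][i]) ** 2
                      for j in range(ph) for i in range(pw))
            if best_ssd is None or ssd < best_ssd:
                best_xy, best_ssd = (x, y), ssd
    return best_xy, best_ssd

# Hypothetical 20x20 RGB frame: a 3x3 feature drawn into the green channel at (12, 7).
pattern = [[10 * (3 * j + i + 1) for i in range(3)] for j in range(3)]
frame = [[(0, 0, 0) for _ in range(20)] for _ in range(20)]
for j in range(3):
    for i in range(3):
        frame[7 + j][12 + i] = (0, pattern[j][i], 0)

green = channel(frame, 1)
# The feature moved from (10, 5) in the last frame to (12, 7): a small search area misses it...
print(track_pattern(green, pattern, (10, 5), search_radius=1))
# ...while a larger search area finds it exactly (SSD of 0).
print(track_pattern(green, pattern, (10, 5), search_radius=4))  # ((12, 7), 0)
```

The cost grows with both the search area and the pattern size, which matches the observation above that expanding them helps with motion blur but slows the tracking down.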

4.2 3D Calibration
In most matchmoving software, calibration is a one-button operation. The user
simply instructs the program to solve the shot. The program analyzes the 2D tracking
that the user has provided and generates a camera that matches the real-world camera,
along with markers that represent the 3D locations of the features tracked in 2D.
If the calibration is correct, the 3D markers should line up with the features that they
represent in the image. The easiest way to verify this is to view the footage through the
3D camera created by the calibration process and compare it with the real-world
camera's view. A first-rate calibration shows perfect or near-perfect alignment between
the 2D tracks and the 3D markers when viewed through the 3D camera.
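The alignment check described above can also be expressed as a number: the reprojection error, that is, the pixel distance between each 2D track and its 3D marker projected through the solved camera. A minimal Python sketch under assumed conditions (ideal pinhole camera, focal length given in pixels, no lens distortion); the camera, markers, and tracks are invented, and the helper names are hypothetical.

```python
import math

def project_px(point, R, t, focal_px, cx, cy):
    """Project a 3D marker through a pinhole camera (X_cam = R @ X + t)
    to pixel coordinates; focal_px is the focal length in pixels."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return focal_px * xc[0] / xc[2] + cx, focal_px * xc[1] / xc[2] + cy

def reprojection_errors(markers_3d, tracks_2d, R, t, focal_px, cx, cy):
    """Pixel distance between each 2D track and its projected 3D marker, plus the RMS."""
    errors = []
    for point, (u, v) in zip(markers_3d, tracks_2d):
        pu, pv = project_px(point, R, t, focal_px, cx, cy)
        errors.append(math.hypot(pu - u, pv - v))
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return errors, rms

# Hypothetical solved camera for one frame of a 720x480 image.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
markers = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
tracks = [(360.0, 240.0), (463.0, 244.0)]   # the second track is off by (3, 4) pixels

errors, rms = reprojection_errors(markers, tracks, R, t, focal_px=1000.0, cx=360.0, cy=240.0)
print(errors, rms)   # [0.0, 5.0] and an RMS of about 3.54 pixels
```

Looking at the per-track errors, not only the RMS, helps locate the part of the shot where troubleshooting should start.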
Working with animation is usually an iterative process: a solution must be zeroed in
on rather than found on the first try, and the same applies to matchmoving. When
dealing with an inaccurate solution caused by bad 2D tracking, one needs to go back
and adjust the tracks. Matchmove programs use an iterative process to determine the
camera location, which is a good reason to approach calibration in the same manner.
If, for example, problems occur in a certain part of the shot, troubleshooting should
start in that part.
In his book Matchmoving: The Invisible Art of Camera Tracking, Tim Dobbert describes three simple steps for evaluating a solution: checking the 3D nulls, the 3D space, and a rendered movie. The first step is to compare the 2D and 3D markers (nulls) simply by looking through the 3D camera. The nulls should appear to line up with the tracked features, and their apparent size should correspond to whether they are close to or far from the camera. The 3D space is checked by looking at the scene in a perspective view to see whether the camera moves as one would expect, without major spikes or jitter along its path, and whether the 3D nulls are in the right positions. The final check is to render a test movie to see whether the markers stick throughout the sequence without signs of noise or subtle drift. [2]
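The second check, looking for spikes or jitter along the camera path, can also be approximated numerically. The sketch below is an illustrative assumption rather than a feature of any of the tested packages: it takes one solved camera position per frame, estimates the frame-to-frame acceleration as the second difference of position, and flags frames whose acceleration is much larger than the median for the shot. The threshold `factor` is a hypothetical tuning value.

```python
import math
from statistics import median


def find_spikes(positions, factor=5.0):
    """Return frame indices where the camera's acceleration is unusually large.

    positions: one solved (x, y, z) camera position per frame.
    Acceleration is estimated by the second difference p[i-1] - 2*p[i] + p[i+1].
    """
    accels = []
    for i in range(1, len(positions) - 1):
        a = [positions[i - 1][k] - 2 * positions[i][k] + positions[i + 1][k] for k in range(3)]
        accels.append((i, math.sqrt(sum(c * c for c in a))))
    if not accels:
        return []
    typical = median(m for _, m in accels)
    # Guard against a perfectly smooth (or linear) path, where the median is zero.
    threshold = factor * max(typical, 1e-9)
    return [i for i, m in accels if m > threshold]


# A dolly move of 0.1 units per frame with a one-frame jump at frame 5.
path = [(t * 0.1, 0.0, 0.0) for t in range(10)]
path[5] = (path[5][0] + 1.0, 0.0, 0.0)
print(find_spikes(path))  # [4, 5, 6]
```

A single bad frame also flags its immediate neighbours, because the second difference spreads a jump over three frames.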

4.3 Automatic Tracking

2D tracking can be a very tedious and difficult task, and this is where automatic tracking is beneficial. Depending on the complexity of the scene and the length of the shot, automatic trackers can produce hundreds or even thousands of tracks, which is the most obvious difference between automatic and manual tracking.
During automatic tracking, the program identifies tracks that move significantly differently from most of the tracks around them and removes them, which should eliminate tracks on moving objects, specular highlights, reflections, etc. Even though the process is entirely automatic, there will be occasions where intervention is needed to guide the program. When the automatic tracker does not remove all unwanted tracks, e.g. tracks on moving objects, the solution will be imperfect or fail altogether. A set of manual tracks is relatively easy to troubleshoot, but with automatic tracking it takes only a few bad tracks to spoil the solution. Apart from the number of tracks produced, troubleshooting also works differently: unnecessary tracks are edited out rather than new tracks added in order to obtain the right solution. Editing out these tracks is a matter of deleting them from the scene or of using keyframed masks, just as in compositing, so that unwanted areas are excluded from the start and fewer tracks need to be deleted afterwards. Manual tracks can also be added, or existing good-quality automatic tracks given more weight, to turn the solution into a solid calibration. [2]


One benefit of automatic tracking is a less noisy, less jittery camera motion when tracking a shot with a lot of noise or film grain. With only a few manual tracks, small deviations in each 2D track cause similar small-scale deviations in the solved camera. Automatic tracking suffers from this problem less often, because the deviations of the large number of tracks involved in the solution tend to average out. It can therefore be helpful to run automatic tracking on top of manual tracking, as it has an overall dampening effect on noisy camera motion.
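The dampening effect of many tracks can be illustrated with a small simulation. As a deliberate simplification, the sketch treats the solved camera shift for each frame as the average of the per-track measurements, each disturbed by random grain-like noise; a real solver is far more involved, but the averaging behaviour is similar. `noise_px` and the other parameters are assumptions.

```python
import random
import statistics


def solved_shift_jitter(num_tracks, noise_px=0.5, frames=500, seed=1):
    """Jitter (standard deviation, in pixels) of a simulated per-frame camera shift.

    The true shift is zero on every frame; each of num_tracks tracks measures it with
    independent Gaussian noise (standing in for grain), and the "solve" is simply
    the mean of those measurements.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(frames):
        samples = [rng.gauss(0.0, noise_px) for _ in range(num_tracks)]
        estimates.append(statistics.fmean(samples))
    return statistics.pstdev(estimates)


for n in (8, 50, 200):
    print(n, round(solved_shift_jitter(n), 3))
```

With independent noise, the jitter shrinks roughly with the square root of the number of tracks, so going from 8 to 200 tracks reduces it by a factor of about five.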
While automatic tracking greatly simplifies the 2D tracking process, it will not always track the desired features. As with any automatic process, it needs to be monitored and in some cases manipulated to produce the right results.

5 Set Fitting
Fitting the camera into the scene has two purposes. First, it is a way of checking the final quality of the matchmove in the 3D animation software. Even though the matchmove can be tested inside the matchmoving software, that preview does not represent exactly how the shot will look at the final rendering stage. Second, it provides a reference in which a simplified version of the full scene, called a proxy set, is laid out. A proxy set is used to determine whether the various areas of the computer-generated scene match up with the corresponding features of the real set.
For example, if a computer-generated robot is supposed to jump between two buildings and then down to the ground, the matchmover should provide the animator with two simple boxes representing the buildings and a plane showing where the ground the robot lands on is located. The proxy set does not have to be detailed; it only has to show the spatial relationship of the key objects in the scene.
A problem when first building a proxy set is working out the scale of objects and other dimensions in the scene when no measurements are available. Studying the footage for known dimensions, such as standard window heights, floor tiles, or cars on the street, is a good place to start. Once the proxy geometry is in place, applying transparent checkerboard textures to it shows whether the entire scene is properly matched; enabling simple wireframes on the proxy geometry works just as well. Checkerboards and wireframes reveal how well the 3D objects stick to the original footage even where the geometry itself lacks detail. [2]
To put the camera in the right position within the set, it has to be moved into its starting position. The easiest way to do this is to create a simple camera rig by parenting the camera under another object, e.g. a null. With this approach, the camera keeps its original animated path intact when the rig is moved.
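Why parenting preserves the camera path can be seen from the transform arithmetic: every keyframe of the camera passes through the same parent transform, so the distances between keyframes, and therefore the shape of the move, are unchanged. The sketch assumes a Y-up scene and a parent null that is only rotated about the vertical axis and translated; `apply_parent` is a hypothetical helper, not a Maya or 3ds Max function.

```python
import math


def apply_parent(local_path, yaw_deg, offset):
    """World positions of a camera parented under a null (Y-up scene).

    The null is rotated yaw_deg degrees about the vertical Y axis and moved by offset;
    every camera keyframe gets the same transform: world = R_y(yaw) * local + offset.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    ox, oy, oz = offset
    return [(c * x + s * z + ox, y + oy, -s * x + c * z + oz) for x, y, z in local_path]


# Solved camera path as it comes out of the matchmoving software.
local = [(0.0, 1.6, 0.0), (1.0, 1.6, 0.0), (1.0, 1.6, 2.0)]
world = apply_parent(local, 90.0, (10.0, 0.0, 5.0))
# Distances between keyframes, i.e. the shape of the move, are unchanged.
print([round(math.dist(a, b), 6) for a, b in zip(world, world[1:])])  # [1.0, 2.0]
```

Only the null is keyed or moved; the camera's own keyframes are never edited, which is what keeps the solved path from being broken.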


The scale of the camera move is also important, since one needs to know not only where the camera started but also how far it travels during the shot. Defining the scale of the camera move determines the camera's starting and ending positions. This is prepared within the matchmoving software by setting up a coordinate system, which includes a field for the scale of the scene. The scale is set by defining the distance between two or more features in the scene. For example, if a window is two meters high, one can track its corners and enter that distance as two meters in the coordinate system; the camera will then move relative to that scale.
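The scale arithmetic is a single ratio: the known real-world distance divided by the distance between the same two features in the solved scene. The sketch below assumes the two features have already been solved as 3D points; the helper names are hypothetical, and the path is scaled about the scene origin so that the origin set in the coordinate system stays in place.

```python
import math


def scale_factor(p1, p2, real_distance):
    """Factor converting solver units to real units from two solved 3D points
    whose real-world separation is known (e.g. the corners of a window)."""
    solved = math.dist(p1, p2)
    if solved == 0:
        raise ValueError("the two points coincide")
    return real_distance / solved


def rescale_path(path, factor, origin=(0.0, 0.0, 0.0)):
    """Scale camera positions about the scene origin so the origin stays fixed."""
    return [tuple(o + factor * (c - o) for c, o in zip(p, origin)) for p in path]


f = scale_factor((0.0, 0.5, 0.0), (0.0, 0.0, 0.0), 2.0)  # window known to be 2 m high
print(f)  # 4.0
print(rescale_path([(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)], f))  # [(0.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
```

For a window whose corners were solved 0.5 units apart but which is known to be two meters high, the factor is 4, so a camera move of one solver unit becomes four meters.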

6 Applications
I have tried to select a broad range of matchmoving software in order to get a sense of which system might work best in a given situation. The packages vary in price and are very different in how their user interfaces are laid out.

2d3 boujou
Voodoo Camera Tracker
Maya (Unlimited) Live

0.9.4 beta


$ 10,000
$ 5,250
$ 4,995

6.1 SynthEyes

SynthEyes is feature-packed, comes with a very low price tag, and is used by many big names in the industry. It can handle shots of any resolution, including DV, HD, film, and IMAX. It exports to a whole range of software and offers a scripting language, SIZZLE, which makes it easy to modify the exported files or even add new export types. Russ Andersson is the creator and owner of the software and maintains it himself, providing everything from support to tutorials. [19]

6.2 2d3 boujou

2d3 boujou is a popular industry-standard camera and object tracking solution, primarily known for its automatic tracking abilities and ease of use. Launched in 2001, boujou was the first fully automated camera calibration and tracking system, using advanced adaptive algorithms developed from vision science research. [20]


6.3 PFTrack

The Pixel Farm's PFTrack was launched in 2003 and brought many new tools to the tracking scene, such as integrated optical flow, geometry tracking, and multiple motion solving. It is used by a growing number of renowned visual effects companies and is already considered an industry standard; version 5.0 added stereo camera functionality among numerous other improvements. PFTrack evolved from, and was developed using, technology licensed from the Icarus program, which is still available free of charge for non-commercial use. PFTrack is very similar to Icarus in UI layout and workflow, and although Icarus has not been developed in recent years, it still stands out as a solid matchmoving program. [21] [22]

6.4 Voodoo Camera Tracker

Voodoo Camera Tracker is non-commercial software developed for research purposes at the Laboratorium für Informationstechnologie, University of Hannover. It can export data to the most widely used 3D animation packages, so it will be interesting to see how it stands against the more popular choices. [23]

6.5 Maya Live

Maya Live, which is available with Autodesk Maya Unlimited, has not been developed a great deal in recent years and lags behind stand-alone matchmoving software in features such as automatic tracking. However, it serves as an addition to an already strong 3D modeling package. Autodesk recently acquired Realviz, which makes a range of graphics software from image-based modeling to motion capture, as well as a well-known matchmoving program called Matchmover. Matchmover is currently discontinued, but it could well be rebranded to replace Maya Live. [24]

7 Comparisons and Results

For my comparisons, I used two different pieces of footage: a slightly shaky outdoor shot and a more stable indoor shot. For the indoor shot, I used a regular tripod to minimize camera shake and obtain smoother results, which worked well enough. The two shots also differ in lighting conditions and in the number of trackable features.
Autodesk Maya Unlimited (Maya) and Autodesk 3D Studio Max (3ds Max), two well-known 3D modeling and animation packages, were used to import the solved cameras and examine the results, but the final renderings were done mainly in 3ds Max, except when Maya Live was used.
For practical reasons, I converted the imported DV footage to the TARGA (.tga) file format, mainly because the Windows version of Voodoo Camera Tracker only imports footage in .tga format. Maya also requires image files, with the filename and frame number separated by a period, when previewing a sequence through the 3D camera. Using an image sequence makes it easier to work across software packages, since there is never any doubt about the exact frame count in, e.g., Maya or 3ds Max. This matters because the original footage and the tracked camera animation have to match up: a slip of even one frame will break the entire sequence.
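Because a one-frame slip breaks the sequence, it can be worth verifying an image sequence before tracking. The sketch below assumes the name.####.tga naming convention with a four-digit, zero-padded frame number (the padding is an assumption; any number between two periods matches) and reports the first and last frame plus any missing frame numbers. The function names are hypothetical.

```python
import re

# Assumed naming convention: name, a period, a zero-padded frame number, ".tga".
FRAME_RE = re.compile(r"^(?P<name>.+)\.(?P<frame>\d+)\.tga$", re.IGNORECASE)


def frame_name(name, frame, pad=4):
    """Build a file name such as 'indoor.0001.tga'."""
    return f"{name}.{frame:0{pad}d}.tga"


def check_sequence(filenames):
    """Return (first, last, missing) frame numbers for a .tga image sequence."""
    frames = []
    for filename in filenames:
        match = FRAME_RE.match(filename)
        if match:
            frames.append(int(match.group("frame")))
    if not frames:
        raise ValueError("no files match the name.####.tga pattern")
    frames.sort()
    present = set(frames)
    missing = [n for n in range(frames[0], frames[-1] + 1) if n not in present]
    return frames[0], frames[-1], missing


files = [frame_name("indoor", n) for n in (1, 2, 3, 5)]
print(files[0])               # indoor.0001.tga
print(check_sequence(files))  # (1, 5, [4])
```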
When comparing the software, five steps were chosen to test each package; they are listed below.
Preparing the footage
Tracking and solving the camera
Improving the tracks
Setting up a coordinate system
Inserting 3D objects

I was not able to explore every feature of each package beyond finding the most basic methods of achieving a solution. I planned on finding features that are consistent and trackable throughout the image sequence, which could be anything from a rock on the ground to an indicator on a car. Such features should have high contrast relative to their immediate surroundings and should not change with camera perspective. Features to avoid include, for example, an intersection where one object crosses in front of another, a highlight that moves across glass, or an object that moves, such as leaves on a tree. All of these produce markers that do not represent any static, locked position in the scene.
7.1 SynthEyes
If I were to describe SynthEyes in one word, it would be speed. It is by far the fastest tracker and solver of the list, which is evident from the moment the program starts, as it opens instantly. The first impression of the user interface is that very few menu items are present. Its layout is similar to 3D modeling software, with four windows showing the top, front, left, and perspective/camera views, as shown in Figure 13. Importing the footage brings up a settings window where the start/end frames, frame rate, interlacing, and presets for aspect ratio and back plate can be set. There is also a button for image preprocessing, which can crop and stabilize the image or even blur the footage if there is too much noise throughout the sequence, and it is possible to scrub back and forth through the footage at the same time. In my case, I did not use any of the preprocessing. When importing is completed, SynthEyes starts loading the footage into memory (caching), which is shown by the timeline just below the menu changing from red to white. Caching is effective and makes scrubbing through the timeline very fast. Before starting the tracking process, choosing a motion profile that fits the camera movement helps the tracker work even faster; most importantly, there are options for indicating whether the camera was zooming during the shot or mounted on a tripod. SynthEyes also assumes by default that the shot is smooth, as from a Steadicam, dolly, or crane; selecting Hand-held: Predict/Sticky under the Tracking menu adjusts the tracking engine for hand-held footage.

Figure 13: SynthEyes' four viewports.

Going through the tracking and solving process in SynthEyes is aided by the error value it reports for each solve. Values above 1.0 are deficient and values close to 0 are best, although a value slightly below 1.0 provides a workable solution. I used the automatic tracker, which supplied a fair number of tracks without cluttering the scene too much. The first solve gave an error just above 1.0 hpix. I identified tracks caught on reflections and moving parts of the scene and deleted them, which gave a much better error and an efficient solution. While this solution almost achieved the desired result, it was not entirely perfect, so I improved it by combining supervised trackers with the automatic trackers, placing them on long-lasting features spread throughout the footage.
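The cleanup step of deleting the worst tracks until the error falls below 1.0 can be sketched as a simple loop. It is an illustration under stated assumptions rather than SynthEyes' own procedure: the per-frame errors of each track are treated as fixed, whereas a real solver would re-solve and recompute all errors after every deletion, and `min_tracks` is a hypothetical safety floor so that the solve is not starved of tracks.

```python
import math


def _rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def clean_tracks(track_errors, target=1.0, min_tracks=8):
    """Drop the worst tracks until the overall RMS error is at most target.

    track_errors: dict name -> list of per-frame errors in pixels.
    Returns (kept_tracks, overall_rms). Errors are treated as fixed; a real solver
    would re-solve and recompute them after every deletion.
    """
    kept = dict(track_errors)

    def overall():
        return _rms([e for errs in kept.values() for e in errs])

    rms = overall()
    while rms > target and len(kept) > min_tracks:
        # Remove the single track with the highest RMS error, then re-evaluate.
        worst = max(kept, key=lambda name: _rms(kept[name]))
        del kept[worst]
        rms = overall()
    return kept, rms


errors = {
    "a": [0.3, 0.4],
    "b": [0.5, 0.5],
    "c": [3.0, 2.5],
    "reflection": [6.0, 5.0],
}
kept, rms = clean_tracks(errors, min_tracks=2)
print(sorted(kept), round(rms, 3))  # ['a', 'b'] 0.433
```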
The coordinate system is set up by choosing three tracks on the ground: the first defines the origin of the scene, the second the X axis, and the third where the ground plane lies. 3D test objects are easy to add using the viewport controls, which feel natural from the first try and are very easy to navigate with the mouse alone. Playing through the sequence shows how well the 3D objects stick to the ground. The scene is exported by choosing one of the many 3D animation packages in the extensive list of supported software. Final results are shown in Figure 14.


Figure 14: syntheyes+3dsmax.

7.2 2d3 boujou

Boujou's interface consisted of three panels around the main viewport and timeline, with several tabs in each panel. It appeared extremely cluttered at first, but the whole interface could be customized by moving, resizing, or removing the tabs entirely. When importing the footage, the type of camera movement was chosen and the interlacing method set. Scrubbing the footage was somewhat slow, and the footage had to be played through once to be fully cached into memory; however, boujou showed exactly how much memory was used and available.
One distinctive feature of boujou was the wizard that guided the user through the different parts of the matchmoving process, together with help files in the right-side panel, as shown in Figure 15. Every step was presented with its options and related information. Additional steps for improving the 2D tracking were optional and guided by detailed questions, leading to actions such as creating masks for unwanted features and adding target tracks to guide the tracking engine and improve the solution.
Tracking and solving was very time-consuming, as boujou placed numerous tracking points filling large parts of the image; the result was often a detailed point cloud in 3D space. The coordinate system was called scene geometry and was set up using one point for the origin of the scene and three points for the x-y plane if the scene was going to be imported into Maya. Test objects were easily added to the scene and could be edited or swapped for other objects at any time. Final results are shown in Figure 16.

Figure 15: The wizard below the main window with the help documentation on the right side.

Figure 16: boujou+3dsmax.

7.3 PFTrack
Importing footage did not bring up any settings; instead, they were found under the camera parameters. Even that dialog was fairly limited, as PFTrack seemed to interpret the footage correctly, leaving the focal length as the only obvious parameter to change. Since there was no zooming in the footage, it was left at its default. The automatic tracking was fairly time-consuming, as it tracked forward and then once more backward. As in boujou, the tracking points covered almost the entire image.

There were a couple of useful tools beneath the main viewport. One was the tracking quality graph (Track-E), which showed the error in green, yellow, and red, with red being the worst (Figure 17). By choosing Clean Tracks under the Camera menu, the tracking threshold could be adjusted; it was shown as a dashed line on the colored graph, and everything above it, i.e. the portions with the most error, was excluded. However, taking out all the bad tracks and leaving only the green ones would not produce a more accurate camera, because tracks with errors in some places can still be good along most of the timeline. The feature projection graph (Proj-E) and the camera error graph (Cam-E) were used to find and edit tracks that had large errors during only a couple of frames; they measure how well the 3D feature points match their 2D track positions when viewed through the camera. Even green tracks could interfere with the solve, for example when they smoothly followed a moving object; this was the same in all the tracking software, and such tracks simply had to be deleted. It was a matter of working toward a solution that eventually worked.
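Rather than deleting a whole track because of a few bad frames, only the bad frames can be excluded. The following sketch is loosely modelled on that idea and is not PFTrack's implementation: given one track's per-frame error, it keeps runs of consecutive frames whose error is at most `max_err_px` and drops runs shorter than `min_len` frames, since very short segments add little to a solve. The function name and both thresholds are hypothetical.

```python
def good_segments(frame_errors, start_frame=0, max_err_px=2.0, min_len=10):
    """Split one 2D track into usable segments.

    frame_errors: per-frame error in pixels, starting at start_frame.
    Returns a list of (first_frame, last_frame) pairs (inclusive) covering runs of
    at least min_len consecutive frames whose error is at most max_err_px.
    """
    segments, run_start = [], None
    for i, err in enumerate(frame_errors):
        frame = start_frame + i
        if err <= max_err_px:
            if run_start is None:
                run_start = frame  # a good run begins
        elif run_start is not None:
            if frame - run_start >= min_len:
                segments.append((run_start, frame - 1))
            run_start = None  # the bad frame ends the run
    if run_start is not None:
        end = start_frame + len(frame_errors)  # one past the last frame
        if end - run_start >= min_len:
            segments.append((run_start, end - 1))
    return segments


errs = [0.5] * 20 + [8.0, 9.0] + [0.4] * 15 + [5.0] + [0.3] * 3
print(good_segments(errs, start_frame=100))  # [(100, 119), (122, 136)]
```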

Figure 17: Shows the Track-E feature under the main window.

The automatic solutions did not provide the best results, so manual tracks had to be added to the automatic ones. Manual tracking was straightforward, though not as simple as setting up the coordinate system, which was easily created by laying out two lines parallel to each other, preferably where walls and straight edges clearly mark each of the three axes X, Y, and Z.
Managing viewports and adding 3D objects in PFTrack was more complicated than necessary compared with some of the other matchmoving software, so I ended up exporting the camera directly into Maya. Adding objects was a matter of snapping them to points and then adjusting them, but I found the process rather cumbersome while navigating the viewports at the same time. The memory cache was off by default and could be set to any amount desired; there was also a preview option that lowered the image resolution, although the footage had to be played through once for smooth playback. Final results are shown in Figure 18.

Figure 18: pftrack+3dsmax.

7.4 Voodoo Camera Tracker

The user interface was very simple and basically consisted of two floating windows, as shown in Figure 19: a terminal displaying information and warnings about what the software was doing during tracking, and a main window displaying the footage with controls beneath it. Importing the image sequence brought up three options: the start and end frame, interlacing, and move type, which was either free move or rotation (camera on a tripod). Once the sequence was loaded, there was no practical way of scrubbing through the footage other than playing it through or stepping frame by frame.


Figure 19: Voodoo Camera Tracker's minimal interface.

The second thing to prepare was the initial camera, whose settings ranged from focal length and film back to aspect ratio. There was also a list of presets for various camera types, which helped when no information was available about the camera used for the footage.
The tracking process was automatic and solved the camera when finished. The main settings guiding how Voodoo estimates the camera were found under View Controls, in the Detection parameter, where one of three algorithms could be chosen for detecting corners in the footage. It was not clear exactly how the algorithms, named Harris, Foerstner, and Susan, differed, other than by trying them out to see whether the results improved; otherwise, it was advised to keep the default configuration.
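Of the three detectors, the Harris detector is the simplest to illustrate. The sketch below computes the textbook Harris corner response, R = det(M) - k * trace(M)^2, where M sums products of image gradients over a small window; R is large and positive at corners, negative along edges, and zero in flat areas. It works on a plain list of grey values and makes no claim about Voodoo's implementation; `k = 0.04` is a commonly used default, and the Foerstner and Susan detectors are not shown.

```python
def harris_response(img, x, y, k=0.04, radius=1):
    """Harris corner response at pixel (x, y).

    img is a list of rows of grey values (img[y][x]). M is the structure tensor
    summed over a (2*radius+1)^2 window of central-difference gradients, so (x, y)
    must be at least radius+1 pixels away from the image border.
    """
    sxx = syy = sxy = 0.0
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            ix = (img[j][i + 1] - img[j][i - 1]) / 2.0  # horizontal gradient
            iy = (img[j + 1][i] - img[j - 1][i]) / 2.0  # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2


# 12x12 test image: a bright square whose top-left corner is at pixel (5, 5).
img = [[255.0 if (i >= 5 and j >= 5) else 0.0 for i in range(12)] for j in range(12)]
print(harris_response(img, 5, 5) > 0)  # corner -> True
print(harris_response(img, 5, 9) < 0)  # vertical edge -> True
print(harris_response(img, 2, 2))      # flat area -> 0.0
```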
My first try with the interior shot did not provide any accurate results and produced points behind the camera when viewed in the 3D scene viewer, which was a very bad sign. The process took about one hour and finished with an error (RMSE) of 10.670, whereas it should usually be around or under 1.0 for acceptable results. Every time tracking and solving finished with a high error, Voodoo suggested a bundle adjustment to refine the estimate; this could take up to several hours with no significant improvement, especially given the size of the errors I obtained with the interior shot. I then tried the Foerstner algorithm, which gave better but still insufficient results, with an RMSE of 5.16. As a last resort, I chose rotation instead of free move with the default settings, which produced a 2D point cloud and an RMSE of 5.15. This actually worked, keeping the camera steady without jumping around, until the camera started to turn in the sequence, which broke the illusion entirely.


The recommendations to avoid motion blur and long sequences (no more than 200-400 frames), together with the note that Voodoo was more robust at estimating the camera parameters for a camera mounted on a tripod, gave an idea of how far the Voodoo tracker could be taken. When I began with the exterior shot, the tracker did a much better job of solving the scene, producing one of the best 3D point clouds of all the trackers with an RMSE of 2.23. I exported the scene to 3ds Max, where its position was adjusted simply by moving the null (dummy) already created by Voodoo. The image plane, or background plate, of the sequence had to be set manually in both 3ds Max and Maya. There was some sliding in the camera path, but overall it worked well enough for inserting 3D objects. Final results are shown in Figure 20.

Figure 20: voodoo+3dsmax.

7.5 Maya Live

Maya Live had no automatic tracking, so the tracks were placed manually; best results required around ten tracks present in the footage at any given time. The plug-in also had to be loaded first through the plug-in manager in Maya. The user interface changed only by the addition of a window just above the timeline, where four main modes could be selected: Setup, Track, Solve, and Fine-Tune. In Setup mode, the footage was chosen and the camera parameters set. If the film aperture (film back) and device aspect ratio were correct, a square tracker button turned green and had to be pushed in order to continue. These values could be copied directly to the render settings. Switching to Track mode brought up three windows: the main window displaying the entire image, a smaller window with a zoomed-in view of the trackers, and a third showing the quality of the tracks from red through yellow to green. Adding tracks was very easy and worked just as in any other matchmoving software. The tracking direction could be set to forward, backward, or bidirectional, which was very practical when a track needed to be placed in the middle of a clip.


Playback of the footage was not smooth, but scrubbing forward and backward through it made it work much better. When track points were added and tracking started, Maya Live brought up a window with a video showing the resulting motion of the track. The tracks should be mostly green, indicating a good, solid track. Repeating the procedure with around ten tracks was sufficient and should give an overall green progress bar. The tracking points should be spread throughout the image and last as long as possible. At this point, a ground plane should be defined. To define the ground level, I switched to the Solve settings, selected Survey in the menu with all the points on the ground selected, changed the constraint type to Plane, made sure registration only was checked, and finally clicked the Create button.
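Defining a ground plane from many selected points can be thought of as a least-squares plane fit. The sketch below fits y = a*x + b*z + c to the solved ground points, assuming a Y-up scene whose ground is not close to vertical, and solves the 3x3 normal equations with Cramer's rule. It illustrates the geometry only and is not how Maya Live's Survey constraint is implemented; the function names are hypothetical.

```python
def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(points):
    """Least-squares plane y = a*x + b*z + c through solved ground points (Y up)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    n = float(len(points))
    sx, sy, sz = sum(xs), sum(ys), sum(zs)
    sxx = sum(x * x for x in xs)
    szz = sum(z * z for z in zs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    sxy = sum(x * y for x, y in zip(xs, ys))
    szy = sum(z * y for z, y in zip(zs, ys))
    # Normal equations A @ (a, b, c) = rhs of the least-squares problem.
    A = [[sxx, sxz, sx], [sxz, szz, sz], [sx, sz, n]]
    rhs = [sxy, szy, sy]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("need at least three ground points that are not collinear")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of A with the right-hand side.
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return tuple(coeffs)


ground = [(0.0, 1.5, 0.0), (1.0, 1.6, 0.0), (0.0, 1.3, 1.0), (2.0, 1.1, 3.0), (-1.0, 1.0, 2.0)]
a, b, c = fit_ground_plane(ground)
print(round(a, 6), round(b, 6), round(c, 6))  # 0.1 -0.2 1.5
```

The residual of each point from the fitted plane also shows whether a selected point is not really on the ground.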
The scale and origin of the scene were set in much the same way, the difference being that only two points were selected for a Distance constraint, which determines the scale, and one point for the origin, using a Point constraint, with registration only unchecked in both cases. After switching back to Solve in the menu and starting the solving process, the scene quickly changed, was centered around the origin, and everything lined up properly. With an overall pixel slip of 0.661, the solution was solid enough to work. 3D objects were simply created directly in any viewport within Maya, as shown in Figure 21.

Figure 21: Maya Live Solve menu below the four viewports.

Initially, I had some issues with the tracking. After I had successfully added a couple of tracking points, a newly added one would suddenly fail to track in any direction. Sometimes tracking could continue after deleting the last tracker and placing it again in an adjusted position; otherwise, an error was produced in the command feedback. The worst issue was Maya Live suddenly crashing, although this was resolved by re-installing Maya entirely. Final results are shown in Figure 22.

Figure 22: live+maya.

8 Discussion and Conclusions

Visual effects have gone to great lengths to push what is physically possible, and a great deal of the credit is due to research in computer vision and photogrammetry. In the eighties, it was always easy to spot the effects shots in a movie: the camera would usually stop moving, with the actors on one side of the image and the effects on the other. Today, cameras are constantly moving and the effects interact with the actors in the scene.
In my thesis, I have explained the fundamental process of matchmoving, covered how to find and set up proper camera values, and described the preparations needed before recording a scene. I have also discussed how 2D tracking works, explored the benefits of automatic tracking, and explained how to evaluate the 3D calibration. Finally, I explained how to accurately fit a solved camera to an existing 3D set.
Finding out as much information as possible about the cameras used to acquire the footage was of great importance, as it helped in deciphering the clues needed to reveal how the camera moved during a shot. During 2D tracking, the best points to look for were features with clear contrast, often found on the edges of stationary objects. With very bad footage, the solution was to enhance the sequence in compositing software, e.g. by removing noise. Most calibration problems could be traced back to the 2D tracking, so accurate tracking was the key. When the calibration is correct, the 3D markers match up with the tracked 2D features.


Automatic tracking was simply the best way of speeding up the matchmoving process, and the ability to add one's own tracking points alongside the automatic ones created an even more solid solution. How well the camera motion was recovered in the tracking process determined how far the entire scene could be recreated, and checkerboard textures or wireframes on the geometry revealed how well the scene was reproduced.
Having acquired the basic knowledge needed to begin my research, I tested and compared five different matchmoving solutions. They all performed sufficiently in most cases, enabling me to insert animated 3D objects. One important lesson from the entire process was that matchmoving software cannot be expected to solve any scene thrown at it. Even when, with some luck, satisfactory results are achieved on the first try, getting everything right and working towards a solution is still generally an art form, which comes with a lot of practice and a sheer amount of trial and error.
PFTrack provided the best end results, but it also demanded more work to achieve them. With a great set of tools and unique features not found in any other application, it appeared to be an impressive piece of software.
In boujou, the help wizard did not result in better solving of the camera, but it did provide workable results. boujou was the most expensive package on the list, but 2d3 also offers a stripped-down version named bullet SD at a much lower price, with a simplified interface and the same tracking and solving features. I believe I should have chosen this lighter solution from 2d3 to diversify the list further.
Voodoo Camera Tracker, which performed very poorly on one of the test shots, did surprisingly well on the other. Since Voodoo is one of the few free camera trackers, I did not expect it to be a complete solution, although I did have high hopes for it. I also tested the ICARUS tracker; even though I have not included it in this report, I would highly recommend it as an alternative for anyone wanting a capable matchmoving tool.
Maya Live had intuitive 2D tracking and an easy solving procedure. Being the only fully manual tracking software, it was rather time-consuming, but it still proved very robust in solving the camera path, with good results.
Ultimately, I found that SynthEyes performed best overall in terms of speed, ease of use, and results. Since it is also the most affordable and most intuitive of the commercial matchmoving packages tested, the combination is truly hard to beat.
All the videos of the final rendered matchmoving results can be found online at or on a CD-ROM disk from the Department of Mathematics, Natural and Computer Science at the University of Gävle.


[1] Katz, Steven D. The New Cinematography. Millimeter, November 2001. July 2008.
[2] Dobbert, Tim. Matchmoving: The Invisible Art of Camera Tracking. SYBEX Inc., 2005, pp. 1-273.
[3] Debevec, Paul E. Image-Based Modeling and Lighting. ACM SIGGRAPH 1999. July 2008.
[4] Robert, Luc. IBE - International Broadcast Engineer. July 2008.
[5] Roble, Doug. Vision in Film and Special Effects. ACM SIGGRAPH 1999. July 2008.
[6] Roberts, Mark. Motion Control. The 7 Uses of Motion Control. July 2008.
[7] Interview by Mary Helen Stoltz with Steve Sullivan. July 2008.
[8] Mseymor7, fxguide. Art of Tracking Part 1: History of Tracking. July 2008.
[9] Johnmont, fxguide. Art of Tracking Part 2: Tips & Apps Overview. July 2008.
[10] Mendoza, Jorge. Accident reconstruction and visibility studies are enhanced with camera-matched 3D video. Plaintiff, March 2008. July 2008.
[11] Mseymor7, fxguide. Test driving boujou bullet. July 2008.
[12] MIRALab, University of Geneva. Lifeplus. July 2008.
[13] BBC. Pompeii gets digital make-over. July 2008.
[14] Oliver, Julian. levelHead. July 2008.
[15] Andersson, Russ. E-mail, September 2008.
[16] Andersson, Russ. Camera Tracking Path-ology. July 2008.
[17] Chung Lee, Johnny. $14 Steadycam. July 2008.
[18] Dobbert, Tim. The rules of camera tracking. Computer Arts. July 2008.
[19] Andersson Technologies LLC. SynthEyes. July 2008.
[20] 2d3 Ltd. boujou. July 2008.
[21] The Pixel Farm. PFTrack. July 2008.
[22] ICARUS. July 2008.
[23] digi-lab. Voodoo Camera Tracker. July 2008.
[24] Autodesk. Maya Live. July 2008.


