Vision-Based Mobile Robot Localization and Mapping Using Scale-Invariant Features




Stephen Se, David Lowe, Jim Little
Department of Computer Science
University of British Columbia
Vancouver, B.C. V6T 1Z4, Canada
{se,lowe,little}@cs.ubc.ca

Abstract

A key component of a mobile robot system is the ability to localize itself accurately and build a map of the environment simultaneously. In this paper, a vision-based mobile robot localization and mapping algorithm is described which uses scale-invariant image features as landmarks in unmodified dynamic environments. These 3D landmarks are localized and robot ego-motion is estimated by matching them, taking into account the feature viewpoint variation. With our Triclops stereo vision system, experiments show that these features are robustly matched between views, 3D landmarks are tracked, robot pose is estimated and a 3D map is built.

1 Introduction

Mobile robot localization and mapping, the process of simultaneously tracking the position of a mobile robot relative to its environment and building a map of the environment, has been a central research topic for the past few years. Accurate localization is a prerequisite for building a good map, and having an accurate map is essential for good localization. Therefore, Simultaneous Localization And Map Building (SLAMB) is a critical underlying factor for successful mobile robot navigation in a large environment, irrespective of the higher-level goals or applications.

To achieve SLAMB, there are different types of sensor modalities such as sonar, laser range finders and vision. Many early successful approaches [2] utilize artificial landmarks, such as bar-code reflectors, ultrasonic beacons, visual patterns, etc., and therefore do not function properly in beacon-free environments. Vision-based approaches using stable natural landmarks in unmodified environments are highly desirable for a wide range of applications.

Harris’s 3D vision system DROID [8] uses the visual motion of image corner features for 3D reconstruction. Kalman filters are used for tracking features from which it determines both the camera motion and the 3D positions of the features. It is accurate in the short to medium term, but long-term drifts can occur. The ego-motion and the perceived 3D structure can be self-consistently in error. It is an incremental algorithm and it runs at near real-time.

A stereo vision algorithm for mobile robot mapping and navigation is proposed in [13], where a 2D occupancy grid map is built from the stereo data. However, since the robot does not localize itself using the map, odometry error is not corrected and hence the map may drift over time. [10] proposed combining this 2D occupancy map with sparse 3D landmarks for robot localization, and corners on planar objects are used as stable landmarks. However, landmarks are used for matching only in the next frame but not kept for matching subsequent frames.

Markov localization was employed by various teams with success [15, 17]. For example, the Deutsches Museum Bonn tour-guide robot RHINO [3, 6] utilizes a metric version of this approach with laser sensors. However, it needs to be supplied with a manually derived map, and cannot learn maps from scratch.

Thrun et al. [19] proposed a probabilistic approach for map building using the Expectation-Maximization (EM) algorithm. The E-step estimates robot locations at various points based on the currently best available map and the M-step estimates a maximum likelihood map based on the locations computed in the E-step. It searches for the most likely map by simultaneously considering the locations of all past sonar scans. After traversing a cyclic environment, the algorithm revises estimates backward in time. It is a batch algorithm and cannot be run in real-time.

Unlike RHINO, the latest museum tour-guide robot MINERVA [18] learns its map and uses camera mosaics of the ceiling for localization in addition to the laser scan occupancy map. It uses the EM algorithm in [19] to learn the occupancy map and the novelty filter in [6] for localization.
The Monte Carlo Localization method was proposed in [5] based on the CONDENSATION algorithm [9]. This vision-based Bayesian filtering method uses a sampling-based density representation. Unlike the Kalman filter based approaches, it can represent multi-modal probability distributions. Given a visual map of the ceiling obtained by mosaicing, it localizes the robot using a scalar brightness measurement.

Sim and Dudek [16] proposed learning natural visual features for pose estimation. Landmark matching is achieved using principal components analysis and a tracked landmark is a set of image thumbnails detected in the learning phase, for each grid position in pose space.

Using global registration and correlation techniques, [7] proposed a method to reconstruct consistent global maps from laser range data reliably.

Recently, Thrun et al. [20] proposed a novel real-time algorithm combining the strengths of EM algorithms and incremental algorithms. Their approach computes the full posterior over robot poses to determine the most likely pose, instead of just using the most recent laser scan as in incremental mapping. When closing cycles, backwards correction can be computed from the difference between the incremental guess and the full posterior guess.

Most existing mobile robot localization and mapping algorithms are based on laser or sonar sensors, as vision is more processor intensive and stable visual features are more difficult to extract. In this paper, we propose a vision-based SLAMB algorithm by tracking SIFT features. As our robot is equipped with Triclops [14], a trinocular stereo system, the estimated 3D position of the landmarks can be obtained and hence a 3D map can be built and the robot can be localized simultaneously. The 3D map, represented as a SIFT landmark database, is incrementally updated over time and adaptive to dynamic environments.

Section 2 explains the SIFT features and the stereo matching process. Ego-motion estimation by matching features across frames is described in Section 3 and SIFT database landmark tracking is presented in Section 4. Experimental results are shown in Section 5, where our 10m by 10m lab environment is mapped with more than 3000 SIFT landmarks. Section 6 describes some enhancements to the SIFT database. Finally, we conclude and discuss some future work in Section 7.

2 SIFT Stereo

SIFT (Scale Invariant Feature Transform) was developed by Lowe [12] for image feature generation in object recognition applications. The features are invariant to image translation, scaling, rotation, and partially invariant to illumination changes and affine or 3D projection. These characteristics make them suitable landmarks for robust SLAMB, since when mobile robots are moving around in an environment, landmarks are observed over time, but from different angles, distances or under different illumination.

At each frame, we extract SIFT features in each of the three images, and stereo match them among the images. Matched SIFT features are stable and will serve as landmarks for the environment.

2.1 Generating SIFT Features

Key locations are selected at maxima and minima of a difference of Gaussian function applied in scale space. They can be computed by building an image pyramid with resampling between each level. Furthermore, SIFT locates key points at regions and scales of high variation, making these locations particularly stable for characterizing the image. [12] demonstrated the stability of SIFT keys to image transformations.

Figure 1 shows the SIFT features found on the top, left and right images. The resolution is 320x240 and 8 levels of scales are used. A subpixel image location, scale and orientation are associated with each SIFT feature. The size of the square surrounding each feature in the images is proportional to the scale at which the feature is found, and the orientation of the squares corresponds to the orientation of the SIFT features.

Figure 1: SIFT features found, with scale and orientation indicated by the size and orientation of the squares. (a) Top image. (b) Left image. (c) Right image.
Image Number of SIFT features found
Top 193
Left 166
Right 189
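
The selection of key locations in Section 2.1 can be illustrated with a minimal sketch. It is not the implementation of [12]: it blurs the full-resolution image at a hypothetical list of scales instead of building a resampled pyramid, it omits subpixel localization and orientation assignment, and the function names, the contrast threshold and the scale values are assumptions made for illustration.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with zero padding at the borders.
    Assumes the image is larger than the kernel (2 * ceil(3 * sigma) + 1)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    blur_1d = lambda v: np.convolve(v, kernel, mode="same")
    rows = np.apply_along_axis(blur_1d, 1, image)
    return np.apply_along_axis(blur_1d, 0, rows)

def dog_extrema(image, sigmas, contrast=0.01):
    """Return (level, row, col) for strict maxima and minima of the
    difference-of-Gaussian stack over a 3x3x3 neighbourhood."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    dog = np.stack([b - a for a, b in zip(blurred, blurred[1:])])
    keys = []
    for k in range(1, dog.shape[0] - 1):          # interior scale levels only
        for r in range(1, dog.shape[1] - 1):
            for c in range(1, dog.shape[2] - 1):
                value = dog[k, r, c]
                if abs(value) < contrast:          # skip low-contrast responses
                    continue
                cube = dog[k - 1:k + 2, r - 1:r + 2, c - 1:c + 2].copy()
                cube[1, 1, 1] = np.nan             # exclude the centre itself
                if value > np.nanmax(cube) or value < np.nanmin(cube):
                    keys.append((k, r, c))
    return keys
```

For a single bright blob, the centre of the blob is reported as a minimum at the level whose scale best matches the blob size; a flat image yields no key locations.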
2.2 Stereo Matching

The right camera in the Triclops serves as the reference camera, as the left camera is at 10cm right beside it and the top camera is at 10cm directly above it.

In addition to the epipolar constraint and disparity constraint, we also employ the SIFT scale and orientation constraints for matching the right and left images. Subpixel horizontal disparity is obtained for each match. These resulting matches are then matched with the top image similarly, with an extra constraint for agreement between the horizontal and vertical disparities. If a feature has more than one match satisfying these criteria, it is ambiguous and discarded so that the resulting matches are more consistent and reliable.

From the positions of the matches and knowing the camera intrinsic parameters, we can compute the 3D world coordinates (X, Y, Z) relative to the robot for each feature in this final set. They can subsequently serve as landmarks for map building and tracking. The disparity is taken as the average of the horizontal disparity and the vertical disparity.

The orientation and scale of each matched SIFT feature are taken as the average of the orientation and scale among the corresponding SIFT feature in the left, right and top images.

2.3 Results

There are 106 matches between the right and left images shown in Figure 1. After matching with the top image, the final number of matches is 59. The result is shown in Figure 2(a), where each matched SIFT feature is marked; the length of the horizontal line indicates the horizontal disparity and the vertical line indicates the vertical disparity for each feature. Figures 2(b) and (c) show more SIFT stereo results for slightly different views when the robot makes some small motion.

Figure 2: Stereo matching results for slightly different views. Horizontal line indicates its horizontal disparity and vertical line indicates its vertical disparity. Line lengths are proportional to the corresponding disparities. Closer objects will have larger disparities.

Figure        Number of final matches
Figure 2(a)   59
Figure 2(b)   66
Figure 2(c)   60

Relaxing some of the constraints above does not necessarily increase the number of final matches because some SIFT features will then have multiple potential matches and therefore be discarded.

3 Ego-motion Estimation

We obtain [r, c, s, o, d, X, Y, Z] for each stereo matched SIFT feature, where (r, c) is the measured image coordinates in the reference camera, (s, o, d) are the scale, orientation and disparity associated with each feature, (X, Y, Z) are its 3D coordinates relative to the camera.

To build a map, we need to know how the robot has moved between frames in order to put the landmarks together coherently. The robot odometry data can only give a rough estimate and it is prone to error such as drifting, slipping, etc. To find matches in the second view, the odometry allows us to predict the region to search for each match more efficiently.

Once the SIFT features are matched, we can use the matches in a least-squares procedure to compute a more accurate camera ego-motion and hence better localization. This will also help adjust the 3D coordinates of the SIFT landmarks for map building.
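
As a rough illustration of the feature record [r, c, s, o, d, X, Y, Z], the following sketch fills in (X, Y, Z) from the averaged disparity described in Section 2.2 using a standard pinhole stereo model. The 10cm baseline comes from the text; the focal length `f`, the principal point `(r0, c0)`, the upward-pointing Y axis and all names are assumptions for illustration, not the Triclops calibration.

```python
from dataclasses import dataclass

@dataclass
class StereoFeature:
    r: float  # image row in the reference (right) camera
    c: float  # image column in the reference camera
    s: float  # SIFT scale
    o: float  # SIFT orientation
    d: float  # disparity in pixels
    X: float  # 3D coordinates relative to the camera, in cm
    Y: float
    Z: float

def triangulate(r, c, d_horizontal, d_vertical, s, o,
                f=200.0, baseline=10.0, r0=120.0, c0=160.0):
    """Build a feature record from a right/left/top match.
    f (pixels) and (r0, c0) are hypothetical intrinsics."""
    d = 0.5 * (d_horizontal + d_vertical)  # average of the two disparities
    if d <= 0:
        raise ValueError("disparity must be positive")
    Z = f * baseline / d        # depth is inversely proportional to disparity
    X = (c - c0) * Z / f        # right of the optical axis is positive
    Y = (r0 - r) * Z / f        # assumed convention: Y points up
    return StereoFeature(r, c, s, o, d, X, Y, Z)
```

With these assumed intrinsics, a feature 20 pixels right of the image centre with an averaged disparity of 20 pixels lies 100cm ahead and 10cm to the right of the camera.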
3.1 Predicting Feature Characteristics

From the 3D coordinates of a SIFT landmark and the odometry data, we can compute the expected 3D relative position and hence the expected image coordinates and disparity in the new view. The expected scale is computed accordingly as it is inversely related to the distance.

We can search for the appropriate SIFT feature match within a region (currently 5 by 5 pixels) in the next frame, using the disparity constraint together with the SIFT scale and orientation constraints.
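
The prediction step can be sketched as follows, under several assumptions that are not taken from the paper: the odometry is reduced to a planar motion (sideways shift `tx`, forward shift `tz`, yaw), the camera follows a pinhole model with hypothetical intrinsics `f`, `r0`, `c0`, and the expected scale is rescaled by the ratio of Euclidean distances. The 10cm baseline and the 5 by 5 pixel search region come from the text.

```python
import math

def predict_feature(landmark, scale, motion,
                    f=200.0, baseline=10.0, r0=120.0, c0=160.0):
    """Predict (row, col, disparity, scale) of a landmark after the camera
    moves by motion = (tx, tz, yaw). Returns None if it ends up behind
    the camera. Intrinsics are hypothetical placeholder values."""
    X, Y, Z = landmark
    tx, tz, yaw = motion
    dx, dz = X - tx, Z - tz
    # Express the landmark in the rotated camera frame (yaw about the Y axis).
    Xn = math.cos(yaw) * dx - math.sin(yaw) * dz
    Zn = math.sin(yaw) * dx + math.cos(yaw) * dz
    if Zn <= 0:
        return None
    row = r0 - f * Y / Zn
    col = c0 + f * Xn / Zn
    disparity = f * baseline / Zn
    old_dist = math.sqrt(X * X + Y * Y + Z * Z)
    new_dist = math.sqrt(Xn * Xn + Y * Y + Zn * Zn)
    new_scale = scale * old_dist / new_dist  # scale is inversely related to distance
    return row, col, disparity, new_scale

def in_search_window(predicted, observed, half_width=2):
    """True if the observed (row, col) lies in the 5 by 5 pixel region
    centred on the predicted (row, col)."""
    return (abs(observed[0] - predicted[0]) <= half_width and
            abs(observed[1] - predicted[1]) <= half_width)
```

For example, a landmark 100cm straight ahead is expected at the same image position after a 10cm forward move, with both its disparity and its scale increased by the factor 100/90.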
3.2 Matching Results

For the images shown in Figure 2, the rough camera movement from the odometry is:

Figure        Movement
Figure 2(a)   Initial position
Figure 2(b)   Forward 10cm
Figure 2(c)   Rotate clockwise 5°

The frames are then matched:

Figures to match      No. of matches   % of matches
Figure 2(a) and (b)   43               73%
Figure 2(b) and (c)   41               68%

Figure 3 shows the match results visually where the shift in image coordinates of each feature is marked. The white dot indicates the current position and the white cross indicates the new position; the line shows how each matched SIFT feature moves from one frame to the next, analogous to sparse optic flow. Figure 3(a) is for a forward motion of 10cm and Figure 3(b) is for a clockwise rotation of 5°. It can be seen that the matches found are very consistent.

Figure 3: The SIFT feature matches between consecutive frames: (a) Between Figure 2(a) and (b) for a 10cm forward movement. (b) Between Figure 2(b) and (c) for a 5° clockwise rotation.

3.3 Least-Squares Minimization

Once the matches are obtained, the ego-motion is determined by finding the camera movement that would bring each projected SIFT landmark into the best alignment with its matching observed feature. To minimize the errors between the projected image coordinates and the observed image coordinates, we employ a least-squares minimization [11] to compute this camera ego-motion. Although our robot can only move forward and rotate, we use a full 6 degrees of freedom for the general motion.

Newton’s method computes a correction term to be subtracted from the initial estimate, using the error measurements between the expected projection of the SIFT landmarks and the image position observed for the matching feature.

The Jacobian matrix is estimated numerically and Gaussian elimination with pivoting is employed to solve the linear system. The good feature matching quality implies a very high percentage of inliers, and therefore, outliers are simply eliminated by discarding features with significant residual errors E (currently 3 pixels). Minimization is repeated with the remaining matches to obtain the new correction term.

3.4 Results

We pass the SIFT feature matches in Figure 3 to the least-squares procedure with the odometry as the initial estimate of ego-motion. For between-frame movement over a smooth floor, odometry is quite accurate and can be used to judge the accuracy of the solution. The following results are obtained, where the least-squares estimate [X, Y, Z, θ, α, β] corresponds to the translations in X, Y, Z directions, yaw, pitch and roll respectively:

Fig    Odometry   Mean E (pixels)   Least-Squares Estimate
3(a)   Z=10cm     1.328             [1.353cm, -0.534cm, 11.136cm, 0.059°, −0.055°, −0.029°]
3(b)   θ=5°       1.693             [0.711cm, 0.008cm, -0.9890cm, 4.706°, 0.059°, −0.132°]
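
The minimization of Section 3.3 can be sketched in a simplified form. The sketch below is an assumption-laden illustration rather than the formulation of [11]: it estimates only three parameters (sideways shift, forward shift, yaw) instead of six, uses hypothetical intrinsics `F`, `R0`, `C0`, and solves the normal equations with `np.linalg.solve` in place of a hand-written Gaussian elimination. The numerically estimated Jacobian, the subtracted correction term and the 3 pixel rejection threshold follow the text.

```python
import numpy as np

F, R0, C0 = 200.0, 120.0, 160.0  # hypothetical focal length and principal point

def project(points, pose):
    """Project Nx3 landmarks into the image for pose = (tx, tz, yaw)."""
    tx, tz, yaw = pose
    dx, dz = points[:, 0] - tx, points[:, 2] - tz
    xn = np.cos(yaw) * dx - np.sin(yaw) * dz
    zn = np.sin(yaw) * dx + np.cos(yaw) * dz
    return np.column_stack([R0 - F * points[:, 1] / zn, C0 + F * xn / zn])

def residuals(points, observed, pose):
    """Projected minus observed image coordinates, flattened."""
    return (project(points, pose) - observed).ravel()

def numerical_jacobian(points, observed, pose, eps=1e-6):
    """Forward-difference estimate of d(residuals)/d(pose)."""
    base = residuals(points, observed, pose)
    J = np.empty((base.size, pose.size))
    for j in range(pose.size):
        step = pose.copy()
        step[j] += eps
        J[:, j] = (residuals(points, observed, step) - base) / eps
    return J

def newton_fit(points, observed, pose0, iters=10):
    """Iteratively subtract the least-squares correction from the estimate."""
    pose = np.asarray(pose0, dtype=float).copy()
    for _ in range(iters):
        e = residuals(points, observed, pose)
        J = numerical_jacobian(points, observed, pose)
        correction = np.linalg.solve(J.T @ J, J.T @ e)
        pose -= correction
    return pose

def estimate_ego_motion(points, observed, odometry, max_error=3.0):
    """Fit, discard matches with residual above max_error pixels, refit once."""
    pose = newton_fit(points, observed, odometry)
    err = np.linalg.norm(project(points, pose) - observed, axis=1)
    keep = err <= max_error
    if keep.sum() >= 3 and not keep.all():
        pose = newton_fit(points[keep], observed[keep], pose)
    return pose, keep
```

Starting from an odometry-like initial estimate, the fit recovers the motion used to generate synthetic, noise-free observations; with real matches the residuals would of course not vanish.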
4 Landmark Tracking

After matching SIFT features between frames, we would like to maintain a database containing the SIFT landmarks observed and use it to match features found in subsequent views.

Each SIFT feature has been stereo matched and localized in 3D coordinates. Its entry in the database:

[X, Y, Z, s, o, l]

where (X, Y, Z) is the current 3D position of the SIFT landmark relative to the camera, (s, o) are its scale and orientation, and l is a count to indicate how many consecutive frames this landmark has been missed.

Over subsequent frames, we would like to maintain this database, add new entries to it, track features and prune entries when appropriate, to cater for dynamic environments and occlusions.

4.1 Track Maintenance

Between frames, we obtain a rough estimate of camera ego-motion from robot odometry to predict the feature characteristics for each database landmark in the next frame. There are the following types of landmarks to consider:

Type I. This landmark is not expected to be within view in the next frame. Therefore, it is not being matched and its miss count remains unchanged.
Type II. This landmark is expected to be within view, but no matches can be found in the next frame. Its miss count is incremented by 1.
Type III. This landmark is within view and a match is found according to the position, scale, orientation and disparity criteria described before. Its miss count is reset to 0.
Type IV. This is a new landmark corresponding to a SIFT feature in the new view which does not match any existing landmarks in the database.

All the Type III landmarks matched are then used in the least-squares minimization to obtain a better estimate for the camera ego-motion. The landmarks in the database are currently updated by averaging. This update can be replaced by some data fusion methods such as the Kalman filter [1] (Section 6.4).

If there are insufficient Type III matches due to occlusion for instance, the odometry will be used as the ego-motion for the current frame.

4.2 Track Initiation

Initially the database is empty. When SIFT features from the first frame arrive, we start a new track for each of the features initializing their miss count l to 0. In subsequent frames, a new track is initiated for each of the Type IV landmarks.

4.3 Track Termination

If the miss count l of any landmark in the database reaches a predefined limit N (20 was used in experiments), i.e., it has not been observed at its predicted position for N consecutive frames, this landmark track is terminated and pruned from the database. This removes features belonging to objects that moved in a dynamic environment.

4.4 Field of View

Firstly, we compute the expected 3D coordinates (X', Y', Z') from the current coordinates and the odometry. For a database landmark to be within the field of view in the next frame, we check Z' > 0 (not behind the camera), tan⁻¹(|X'|/Z') < 30° and tan⁻¹(|Y'|/Z') < 30°, as the Triclops camera lens field of view is around 60° wide.

4.5 Reference Coordinate Frame

We use the initial camera coordinate frame as the reference and make all the landmarks relative to this fixed frame. Therefore, Type I and Type II landmarks do not need to be transformed using the camera ego-motion estimate at each frame. Matching the SIFT landmarks referenced to the initial frame with features observed in the current frame helps avoid error accumulation.

5 Experimental Results

SIFT feature detection, stereo matching, ego-motion estimation and tracking algorithms have been implemented in our robot system. A SIFT database is kept to track the features over frames.

As the robot camera Y location does not change much over flat ground, we reduce the estimation from 6 d.o.f. to 5, forcing the height change parameter to 0. Depending on the distribution of features in the scene, there is ambiguity between a yaw rotation and a sideways movement, which is a well-known problem.

Moreover, we set a limit to the correction terms allowed for the least-squares minimization as the odometry for between-frame movement should be quite good. This will safeguard frames which have erroneous matches that may lead to excessive correction terms and mess up the subsequent estimation.

As our ego-motion estimation determines the movement of the camera which is not placed in the centre of the robot, we need to adjust the odometry information to get the camera motion.

The following experiment was carried out on the fly while the robot is moving around. We manually drive the robot to go around a chair in the lab for one loop and come back. At each frame, it keeps track of the SIFT landmarks in the database, adds new ones and updates existing ones if matched.

Figure 4 shows some frames of the 320x240 image sequence (249 frames in total) captured while the robot is moving around. The white markers indicate the SIFT features found. At the end, a total of 3590 SIFT landmarks, with 3D positions relative to the initial robot position, are gathered in the SIFT database.

Figure 4: Frames of an image sequence with SIFT features marked. (a) 1st frame. (b) 60th frame. (c) 120th frame. (d) 180th frame.

Figure 5 shows the bird’s eye view of these features. Consistent clusters are observed corresponding to chairs, shelves, posters, computers etc. in the scene. The robot has traversed forward more than 4 metres and then has come back with its trajectory shown in Figure 5. The maximum robot translation and rotation speeds are set to around 40cm/sec and 10°/sec respectively such that there are sufficiently many matches between consecutive frames.

Figure 5: Bird’s eye view of the SIFT landmarks (including ceiling features) in the database after 249 frames. The cross at (0,0) indicates the initial robot position and the dashed line indicates the robot path.

The accuracy of the ego-motion estimation depends on the SIFT landmarks and their distribution, the number of matches, etc. In this experiment, there are sufficiently many matches at each frame, ranging mostly between 40 and 60, depending on the particular part of the lab and the viewing direction.

At the end when the robot comes back to the original position (0,0,0) judged visually:

SIFT estimate: X: -2.09cm, Y: 0cm, Z: -3.91cm, θ: 0.30°, α: 2.10°, β: −2.02°

6 SIFT Database

Our basic approach has been described above, but there are various enhancements dealing with the SIFT database that can help our tracking to be more robust and our map-building to be more stable.

6.1 Database Entry

In order to assess the reliability of a certain SIFT feature in the database, we need some information regarding how many times this feature has been matched and has not been matched so far. The new database entry is [X, Y, Z, s, o, m, n, l] where l is still the count for the number of times being missed consecutively, which is used to decide whether or not the feature should be pruned from tracking. m is a count for the number of times it has been missed so far and n is a count for the number of times it has been seen so far.

Each feature has to appear at least 3 times (n ≥ 3) to be considered as a valid feature. This is to eliminate false alarms and noise, as it is highly unlikely that some noise will cause a feature to match in the right, left & top images for 3 times (a total of 9 camera views).

In this experiment, we move the robot around the lab environment without the chair in the middle. In order to demonstrate visually that the SIFT database map is three-dimensional, we use a visualization package Geomview. Figure 6 shows several views of the 3D SIFT map from different angles in Geomview. We can see that the centre region is clear, as false alarms and noise features are discarded. Visual judgement indicates that the SIFT landmarks correspond well to actual objects in the lab.

6.2 Permanent Landmarks

In a scene where there could be many volatile features, e.g., when someone blocks the camera view for a while, stable features observed earlier are not matched for a number of consecutive frames, and will be discarded.
Therefore, when the environment is clear, we can build a SIFT database beforehand and mark them as permanent landmarks, if they are valid (having appeared in at least 3 frames) and if the percentage of their occurrence, given by n/(n+m), is above a certain threshold. Afterwards, this set of reliable landmarks will not be wiped out even if they are being missed for many consecutive frames. They are important for subsequent localization after the view is unblocked.

Figure 6: 3D SIFT database map viewed from different angles in Geomview. Each feature has appeared consistently in at least 9 camera views. (a) From top. (b) From left. (c) From right.

6.3 Viewpoint Variation

Although SIFT features are invariant in image orientation and scale, they are image projections of 3D landmarks and hence vary with large changes of viewpoints and as different parts of the object are observed or part of the object is occluded.

For example, when the front of an object is seen first, after the robot moves around and views the object from the back, the image feature is in general completely different. As the original feature may not be observable from this viewpoint, or observable but appear different, its miss count will increase gradually and it will be pruned even though it is still there.

Therefore, we allow each SIFT landmark to have more than one SIFT characteristic, where each SIFT characteristic (scale and orientation) is associated with a view vector keeping track of the viewpoint from which the feature is observed. Subsequently, if the new view direction differs from the original view direction by more than a threshold (currently set to 20°), its miss count will not be incremented even if it does not match. This way we can avoid corrupting the feature information gathered earlier by the current partial view of the world.

If a feature matches from a direction that differs by more than the threshold, we add a new view vector with the associated SIFT characteristic to the existing landmark. Therefore, a database landmark can have multiple SIFT characteristics (s_i, o_i, v_i) where s_i and o_i are the scale and orientation for the view direction v_i. Over time, if a landmark is observed from various directions, much richer SIFT information is gathered.

The matching procedure is as follows:
• compute view vector v between the database landmark and the current robot position
• find the existing view direction v_i associated with the database landmark which is closest to v, i.e., with minimal angle φ between the two vectors
• check whether φ is less than 20°:
  – if so, update the existing s and o if feature matching succeeds, or increment miss count if feature matching fails
  – else, add a new entry of SIFT characteristics (s, o, v) to the existing landmark if feature matching succeeds

The 3D positions of the landmarks are updated accordingly if matched.

6.4 Error Modeling

There are various errors such as noise and quantization associated with the images and the features found. They introduce inaccuracy in both the landmarks’ position as well as the least-squares estimation of the robot position. In stochastic mapping, a single filter is used to maintain estimates of landmark positions, the robot position and the covariances between them [4], with high computational complexity.

In more recent work, we have employed a Kalman Filter [1] for each database SIFT landmark which now has a 3x3 covariance matrix for its position, assuming the independence of landmarks. When a match is found in the current frame, the covariance matrix in the current frame will be combined with the covariance matrix in the database so far, and its 3D position will be updated accordingly.

An ellipsoidal uncertainty based on its covariance is associated with each landmark position. The ellipses shrink when the landmarks are matched over frames, indicating they are localized better. On the other hand, the ellipses expand when the landmarks are missed, indicating higher positional uncertainty.
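
A per-landmark database entry combining the counters of Section 6.1 with the covariance update of Section 6.4 might look like the sketch below. It uses the standard Kalman update for a static 3D position with a direct position measurement; the exact filter used in the experiments is not given in the text. The class and method names, the initial value n = 1 and the `growth` factor that inflates the covariance on a miss are assumptions for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Landmark:
    position: np.ndarray     # (X, Y, Z) in the reference frame
    covariance: np.ndarray   # 3x3 position covariance
    scale: float
    orientation: float
    m: int = 0               # total number of misses so far
    n: int = 1               # total number of sightings (assumed 1 at creation)
    l: int = 0               # consecutive misses

    def matched(self, obs_position, obs_covariance):
        """Fuse a new observation and reset the consecutive miss count."""
        P = self.covariance
        gain = P @ np.linalg.inv(P + obs_covariance)
        self.position = self.position + gain @ (obs_position - self.position)
        self.covariance = (np.eye(3) - gain) @ P   # uncertainty shrinks
        self.n += 1
        self.l = 0

    def missed(self, growth=1.1):
        """Record a miss; growth is a hypothetical inflation factor."""
        self.covariance = self.covariance * growth  # uncertainty expands
        self.m += 1
        self.l += 1

    def is_valid(self):
        return self.n >= 3                          # seen at least 3 times

    def occurrence(self):
        return self.n / (self.n + self.m)           # n / (n + m)

    def should_prune(self, limit=20, permanent=False):
        return (not permanent) and self.l >= limit
```

When the database and observation covariances are equal, the fused position is the midpoint of the two estimates and the covariance is halved, which reduces to the averaging update mentioned in Section 4.1.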
7 Conclusion

In this paper, we proposed a vision-based SLAMB algorithm based on the SIFT features. Being scale and orientation invariant, SIFT features are good natural visual landmarks for tracking over long periods of time from different views. These tracked landmarks are used for concurrent robot pose estimation and 3D map building, with promising results shown.

The algorithm currently runs at 2Hz for 320x240 images on our mobile robot with a Pentium III 700MHz processor. As the majority of the processing time is spent on SIFT feature extraction, MMX optimization is being investigated.

At present, the map is re-used only if the robot starts up again at the last stop position or if the robot starts at the position of the initial reference frame. Preliminary work on the ‘kidnapped robot’ problem, i.e., initializing localization, has been positive. This will allow the robot to re-use the map at any arbitrary robot position by matching the rich SIFT database.

We are currently looking into recognizing the return to a previously mapped area and detecting the occurrences of drift and to correct for it.

Acknowledgements

Our work has been supported by the Institute for Robotics and Intelligent System (IRIS III), a Canadian Network of Centres of Excellence.

References

[1] Y. Bar-Shalom and T.E. Fortmann. Tracking and Data Association. Academic Press, Boston, 1988.
[2] J. Borenstein, B. Everett, and L. Feng. Navigating Mobile Robots: Systems and Techniques. A. K. Peters, Ltd, Wellesley, MA, 1996.
[3] W. Burgard, A.B. Cremers, D. Fox, D. Hahnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. The interactive museum tour-guide robot. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, July 1998.
[4] A.J. Davison and D.W. Murray. Mobile robot localisation using active vision. In Proceedings of Fifth European Conference on Computer Vision (ECCV’98) Volume II, pages 809–825, Freiburg, Germany, June 1998.
[5] F. Dellaert, W. Burgard, D. Fox, and S. Thrun. Using the condensation algorithm for robust, vision-based mobile robot localization. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’99), Fort Collins, CO, June 1999.
[6] D. Fox, W. Burgard, S. Thrun, and A.B. Cremers. Position estimation for mobile robots in dynamic environments. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, July 1998.
[7] J. Gutmann and K. Konolige. Incremental mapping of large cyclic environments. In Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), California, November 1999.
[8] C. Harris. Geometry from visual motion. In A. Blake and A. Yuille, editors, Active Vision, pages 264–284. MIT Press, 1992.
[9] M. Isard and A. Blake. Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1):5–28, 1998.
[10] J.J. Little, J. Lu, and D.R. Murray. Selecting stable image features for robot localization using stereo. In Proceedings of IEEE/RSJ International Conference on Intelligent Robotic Systems (IROS’98), Victoria, B.C., Canada, October 1998.
[11] D.G. Lowe. Fitting parameterized three-dimensional models to images. IEEE Trans. Pattern Analysis Mach. Intell. (PAMI), 13(5):441–450, May 1991.
[12] D.G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh International Conference on Computer Vision (ICCV’99), pages 1150–1157, Kerkyra, Greece, September 1999.
[13] D. Murray and C. Jennings. Stereo vision based mapping and navigation for mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’97), pages 1694–1699, New Mexico, April 1998.
[14] D. Murray and J. Little. Using real-time stereo vision for mobile robot navigation. In Proceedings of the IEEE Workshop on Perception for Mobile Agents, Santa Barbara, CA, June 1998.
[15] I. Nourbakhsh, R. Powers, and S. Birchfield. Dervish: An office-navigating robot. AI Magazine, 16:53–60, 1995.
[16] R. Sim and G. Dudek. Learning and evaluating visual features for pose estimation. In Proceedings of the Seventh International Conference on Computer Vision (ICCV’99), Kerkyra, Greece, September 1999.
[17] R. Simmons and S. Koenig. Probabilistic robot navigation in partially observable environments. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), pages 1080–1087, San Mateo, CA, 1995. Morgan Kaufmann.
[18] S. Thrun, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Hahnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. Minerva: A second-generation museum tour-guide robot. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA’99), Detroit, Michigan, May 1999.
[19] S. Thrun, W. Burgard, and D. Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning and Autonomous Robots (joint issue), 31(5):1–25, 1998.
[20] S. Thrun, W. Burgard, and D. Fox. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3d mapping. In IEEE International Conference on Robotics and Automation (ICRA 2000), San Francisco, CA, April 2000.
