Vision-Based Mobile Robot Localization and Mapping Using Scale-Invariant Features

… the SIFT features found. At the end, a total of 3590 SIFT landmarks, with 3D positions relative to the initial robot position, are gathered in the SIFT database.

Figure 5 shows the bird’s eye view of these features. Consistent clusters are observed corresponding to chairs, shelves, posters, computers, etc. in the scene. The robot has traversed forward more than …

Figure 4: Frames of an image sequence with SIFT features marked. (a) 1st frame. (b) 60th frame. (c) 120th frame. (d) 180th frame.

Figure 5: Bird’s eye view of the SIFT landmarks (including ceiling features) in the database after 249 frames. The cross at (0,0) indicates the initial robot position and the dashed line indicates the robot path.

6 SIFT Database

Our basic approach has been described above, but there are various enhancements dealing with the SIFT database that can help our tracking to be more robust and our map-building to be more stable.
6.1 Database Entry
In order to assess the reliability of a certain SIFT feature in the database, we need some information regarding how many times this feature has been matched and has not been matched so far. The new database entry is [X, Y, Z, s, o, m, n, l], where l is still the count of the number of times the feature has been missed consecutively, which is used to decide whether or not it should be pruned from tracking; m is a count of the number of times it has been missed so far, and n is a count of the number of times it has been seen so far.

Each feature has to appear at least 3 times (n ≥ 3) to be considered a valid feature. This eliminates false alarms and noise, as it is highly unlikely that noise will cause a feature to match in the right, left and top images 3 times (a total of 9 camera views).
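As an illustration, the extended entry and its bookkeeping might be represented as follows. This is a minimal sketch in Python; the class name, method names and field defaults are our own, not taken from the paper.

```python
from dataclasses import dataclass

MIN_SIGHTINGS = 3  # a feature must be seen at least 3 times (n >= 3)

@dataclass
class SIFTLandmark:
    X: float  # 3D position relative to the initial robot position
    Y: float
    Z: float
    s: float  # SIFT scale
    o: float  # SIFT orientation
    m: int = 0  # times missed so far
    n: int = 0  # times seen so far
    l: int = 0  # times missed consecutively (drives pruning)

    def record_match(self):
        """A successful match: bump the seen count, reset the run of misses."""
        self.n += 1
        self.l = 0

    def record_miss(self):
        """An expected-but-unmatched frame: bump both miss counters."""
        self.m += 1
        self.l += 1

    def is_valid(self):
        """Valid features have appeared in at least 3 frames (9 camera views)."""
        return self.n >= MIN_SIGHTINGS
```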
In this experiment, we move the robot around the lab environment without the chair in the middle. In order to demonstrate visually that the SIFT database map is three-dimensional, we use the visualization package Geomview. Figure 6 shows several views of the 3D SIFT map from different angles in Geomview. We can see that the centre region is clear, as false alarms and noise features are discarded. Visual judgement indicates that the SIFT landmarks correspond well to actual objects in the lab.

Figure 6: 3D SIFT database map viewed from different angles in Geomview. Each feature has appeared consistently in at least 9 camera views. (a) From top. (b) From left. (c) From right.

6.2 Permanent Landmarks

In a scene there can be many volatile features; for example, when someone blocks the camera view for a while, stable features observed earlier are not matched for a number of consecutive frames and will be discarded.
Therefore, when the environment is clear, we can build a SIFT database beforehand and mark its entries as permanent landmarks, provided they are valid (having appeared in at least 3 frames) and the percentage of their occurrence, given by n/(n+m), is above a certain threshold. Afterwards, this set of reliable landmarks will not be wiped out even if they are missed for many consecutive frames. They are important for subsequent localization after the view is unblocked.
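A minimal sketch of this permanence test, reusing the n and m fields of the database entry above; the occurrence threshold value is a placeholder, since the paper only says it must be "above a certain threshold":

```python
MIN_SIGHTINGS = 3
MIN_OCCURRENCE = 0.5  # placeholder: the paper does not give a value

def is_permanent(n: int, m: int) -> bool:
    """n = times seen, m = times missed (fields of the database entry)."""
    if n < MIN_SIGHTINGS:                # must be a valid feature first
        return False
    return n / (n + m) > MIN_OCCURRENCE  # occurrence percentage n/(n+m)
```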
6.3 Viewpoint Variation

Although SIFT features are invariant to image orientation and scale, they are image projections of 3D landmarks and hence vary with large changes of viewpoint, as different parts of the object are observed or part of the object is occluded.

For example, when the front of an object is seen first and the robot later moves around and views the object from the back, the image feature is in general completely different. As the original feature may not be observable from this viewpoint, or may be observable but appear different, its miss count will increase gradually and it will be pruned even though it is still there.

Therefore, we allow each SIFT landmark to have more than one SIFT characteristic, where each SIFT characteristic (scale and orientation) is associated with a view vector keeping track of the viewpoint from which the feature is observed. Subsequently, if the new view direction differs from the original view direction by more than a threshold (currently set to 20°), its miss count will not be incremented even if it does not match. This way we avoid corrupting the feature information gathered earlier by the current partial view of the world.

If a feature matches from a direction larger than the threshold, we add a new view vector with the associated SIFT characteristic to the existing landmark. Therefore, a database landmark can have multiple SIFT characteristics (s_i, o_i, v_i), where s_i and o_i are the scale and orientation for the view direction v_i. Over time, if a landmark is observed from various directions, much richer SIFT information is gathered.
The matching procedure is as follows (a code sketch is given after the list):

• compute the view vector v between the database landmark and the current robot position
• find the existing view direction v_i associated with the database landmark which is closest to v, i.e., with minimal angle φ between the two vectors
• check whether φ is less than 20°:
  – if so, update the existing s and o if feature matching succeeds, or increment the miss count if feature matching fails
  – else, add a new entry of SIFT characteristics (s, o, v) to the existing landmark if feature matching succeeds

The 3D positions of the landmarks are updated accordingly if matched.
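A minimal sketch of this per-frame procedure, assuming each landmark stores a list of view entries {s, o, v} with unit view vectors; the function signature and data layout are our own illustration, not the authors' implementation:

```python
import numpy as np

VIEW_ANGLE_THRESHOLD = np.deg2rad(20.0)  # 20 degrees, as in the paper

def update_landmark_views(landmark_pos, views, robot_pos, matched, s=None, o=None):
    """One matching step for a landmark with multiple SIFT characteristics.

    landmark_pos : (3,) array, landmark position in the map frame
    views        : list of dicts {'s': scale, 'o': orientation, 'v': unit vector}
    robot_pos    : (3,) array, current robot position
    matched      : True if SIFT feature matching succeeded this frame
    Returns 1 if the miss count should be incremented, else 0.
    """
    # view vector v between the landmark and the current robot position
    v = landmark_pos - robot_pos
    v = v / np.linalg.norm(v)

    # existing view direction v_i closest to v (minimal angle phi)
    angles = [np.arccos(np.clip(np.dot(v, view['v']), -1.0, 1.0))
              for view in views]
    i = int(np.argmin(angles))
    phi = angles[i]

    if phi < VIEW_ANGLE_THRESHOLD:
        if matched:
            views[i]['s'], views[i]['o'] = s, o  # refresh existing characteristic
            return 0
        return 1                                 # miss from a known direction
    if matched:
        views.append({'s': s, 'o': o, 'v': v})   # new view direction learned
    return 0  # outside the threshold, a miss is not counted against the landmark
```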
6.4 Error Modeling

There are various errors, such as noise and quantization, associated with the images and the features found. They introduce inaccuracy in both the landmark positions and the least-squares estimation of the robot position. In stochastic mapping, a single filter is used to maintain estimates of the landmark positions, the robot position and the covariances between them [4], with high computational complexity.

In more recent work, we have employed a Kalman Filter [1] for each database SIFT landmark, which now has a 3×3 covariance matrix for its position, assuming the independence of landmarks. When a match is found in the current frame, the covariance matrix in the current frame is combined with the covariance matrix in the database so far, and the 3D position is updated accordingly.
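The paper does not spell out the combination step; a standard per-landmark Kalman update consistent with this description, treating the current-frame estimate as a direct measurement of the landmark position, would be:

```python
import numpy as np

def kalman_update(x_db, P_db, x_obs, P_obs):
    """Fuse a current-frame landmark estimate with its database estimate.

    x_db, x_obs : (3,) position estimates (database, current frame)
    P_db, P_obs : (3, 3) covariance matrices
    Returns the updated position and (smaller) covariance.
    """
    # Kalman gain for a direct position measurement
    K = P_db @ np.linalg.inv(P_db + P_obs)
    x_new = x_db + K @ (x_obs - x_db)   # updated 3D position
    P_new = (np.eye(3) - K) @ P_db      # covariance shrinks on a match
    return x_new, P_new
```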
An ellipsoidal uncertainty based on its covariance is associated with each landmark position. The ellipsoids shrink when the landmarks are matched over frames, indicating that they are localized better. On the other hand, the ellipsoids expand when the landmarks are missed, indicating higher positional uncertainty.
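The ellipsoid axes follow directly from the eigendecomposition of the covariance matrix; a minimal sketch, where the confidence scale factor is an assumption depending on the desired confidence level:

```python
import numpy as np

def uncertainty_ellipsoid(P, confidence_scale=1.0):
    """Axes of the uncertainty ellipsoid for a 3x3 position covariance.

    Returns (semi_axes, directions): semi-axis lengths are scaled square
    roots of the eigenvalues; directions are the unit eigenvectors.
    """
    eigvals, eigvecs = np.linalg.eigh(P)  # P is symmetric positive semidefinite
    semi_axes = confidence_scale * np.sqrt(eigvals)
    return semi_axes, eigvecs
```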
7 Conclusion

In this paper, we proposed a vision-based SLAM algorithm based on SIFT features. Being scale and orientation invariant, SIFT features are good natural visual landmarks for tracking over long periods of time from different views. These tracked landmarks are used for concurrent robot pose estimation and 3D map building, with promising results shown.

The algorithm currently runs at 2 Hz on 320×240 images on our mobile robot with a Pentium III 700 MHz processor. As the majority of the processing time is spent on SIFT feature extraction, MMX optimization is being investigated.

At present, the map is re-used only if the robot starts up again at the last stop position or at the position of the initial reference frame. Preliminary work on the ‘kidnapped robot’ problem, i.e., initializing localization, has been positive. This will allow the robot to re-use the map from any arbitrary robot position by matching against the rich SIFT database.

We are currently looking into recognizing the return to a previously mapped area, and into detecting occurrences of drift and correcting for them.

Acknowledgements

Our work has been supported by the Institute for Robotics and Intelligent Systems (IRIS III), a Canadian Network of Centres of Excellence.

References

[1] Y. Bar-Shalom and T.E. Fortmann. Tracking and Data Association. Academic Press, Boston, 1988.
[2] J. Borenstein, B. Everett, and L. Feng. Navigating Mobile Robots: Systems and Techniques. A. K. Peters, Ltd., Wellesley, MA, 1996.
[3] W. Burgard, A.B. Cremers, D. Fox, D. Hahnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. The interactive museum tour-guide robot. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, July 1998.
[4] A.J. Davison and D.W. Murray. Mobile robot localisation using active vision. In Proceedings of the Fifth European Conference on Computer Vision (ECCV’98), Volume II, pages 809–825, Freiburg, Germany, June 1998.
[5] F. Dellaert, W. Burgard, D. Fox, and S. Thrun. Using the condensation algorithm for robust, vision-based mobile robot localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’99), Fort Collins, CO, June 1999.
[6] D. Fox, W. Burgard, S. Thrun, and A.B. Cremers. Position estimation for mobile robots in dynamic environments. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, July 1998.
[7] J. Gutmann and K. Konolige. Incremental mapping of large cyclic environments. In Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), California, November 1999.
[8] C. Harris. Geometry from visual motion. In A. Blake and A. Yuille, editors, Active Vision, pages 264–284. MIT Press, 1992.
[9] M. Isard and A. Blake. Condensation - conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1):5–28, 1998.
[10] J.J. Little, J. Lu, and D.R. Murray. Selecting stable image features for robot localization using stereo. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robotic Systems (IROS’98), Victoria, B.C., Canada, October 1998.
[11] D.G. Lowe. Fitting parameterized three-dimensional models to images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(5):441–450, May 1991.
[12] D.G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh International Conference on Computer Vision (ICCV’99), pages 1150–1157, Kerkyra, Greece, September 1999.
[13] D. Murray and C. Jennings. Stereo vision based mapping and navigation for mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’97), pages 1694–1699, New Mexico, April 1997.
[14] D. Murray and J. Little. Using real-time stereo vision for mobile robot navigation. In Proceedings of the IEEE Workshop on Perception for Mobile Agents, Santa Barbara, CA, June 1998.
[15] I. Nourbakhsh, R. Powers, and S. Birchfield. Dervish: An office-navigating robot. AI Magazine, 16:53–60, 1995.
[16] R. Sim and G. Dudek. Learning and evaluating visual features for pose estimation. In Proceedings of the Seventh International Conference on Computer Vision (ICCV’99), Kerkyra, Greece, September 1999.
[17] R. Simmons and S. Koenig. Probabilistic robot navigation in partially observable environments. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), pages 1080–1087, San Mateo, CA, 1995. Morgan Kaufmann.
[18] S. Thrun, M. Bennewitz, W. Burgard, A.B. Cremers, F. Dellaert, D. Fox, D. Hahnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. Minerva: A second-generation museum tour-guide robot. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’99), Detroit, Michigan, May 1999.
[19] S. Thrun, W. Burgard, and D. Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning and Autonomous Robots (joint issue), 31(5):1–25, 1998.
[20] S. Thrun, W. Burgard, and D. Fox. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2000), San Francisco, CA, April 2000.