OverlapNet: Loop Closing for LiDAR-based SLAM
Fig. 3: Pipeline overview of our proposed approach. The left-hand side shows the preprocessing of the input data, which exploits multiple cues generated from a single LiDAR scan, including range R, normal N, intensity I, and semantic class probability S information. The right-hand side shows the proposed OverlapNet, which consists of two legs sharing weights and two heads that use the same pair of feature volumes generated by the two legs. The outputs are the overlap and the relative yaw angle between two LiDAR scans.
Legs (architecture excerpt):

  Layer    Stride   Filters   Kernel size   Output size
  Conv2D   (2, 1)   32        (3, 15)       14 x 429 x 32
  Conv2D   (2, 1)   64        (3, 15)       6 x 415 x 64
  Conv2D   (2, 1)   128       (2, 9)        1 x 396 x 128
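To sanity-check these shapes, a small PyTorch sketch of the three listed layers; note that the input sizes and channel counts not given above are assumptions, and the second and third rows are not consecutive layers of the network:

```python
import torch
import torch.nn as nn

# The three layers listed above, rebuilt individually. Only stride,
# kernel size, filter count, and output shape are given; the input
# channel counts and input spatial sizes are assumptions.
conv_a = nn.Conv2d(16, 32, kernel_size=(3, 15), stride=(2, 1))
conv_b = nn.Conv2d(32, 64, kernel_size=(3, 15), stride=(2, 1))
conv_c = nn.Conv2d(64, 128, kernel_size=(2, 9), stride=(2, 1))

# Valid (padding-free) convolutions reproduce the listed output sizes:
print(conv_a(torch.zeros(1, 16, 29, 443)).shape)  # (1, 32, 14, 429)
print(conv_b(torch.zeros(1, 32, 14, 429)).shape)  # (1, 64, 6, 415)
print(conv_c(torch.zeros(1, 64, 3, 404)).shape)   # (1, 128, 1, 396)
```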
To evaluate the generalization ability of our method, we also test it on the Ford campus dataset [25], which was recorded on the Ford research campus and in downtown Dearborn, Michigan, using a different version of the Velodyne HDL-64E. On the Ford campus dataset, we test our method on sequence 00, which contains several large loops. Note that we never trained our approach on the Ford campus dataset.

For generating the overlap ground truth, we only use points within a distance of 75 m to the sensor. For the overlap computation, see Eq. (3), we use ε = 1 m. We use a learning rate of 10^-3 with a decay of 0.99 every epoch and train for at most 100 epochs. For the combined loss, Eq. (6), we set α = 5. For the overlap loss, Eq. (7), we use a = 0.25, b = 12, and a scale factor s = 24.
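Eq. (3) itself is not reproduced in this excerpt; as a minimal sketch of an overlap computation in this spirit, one can count the range-image pixels whose range difference stays below ε after rendering one scan into the other's frame (function and variable names are ours, and the reprojection step is left abstract):

```python
import numpy as np

def overlap(range_img_1, range_img_2_reproj, eps=1.0):
    """Sketch of an overlap estimate between two range images.

    range_img_1: range image of scan 1; invalid pixels set to -1.
    range_img_2_reproj: scan 2 rendered into the frame of scan 1
        (reprojection not shown here), same convention.
    eps: range-difference threshold, 1 m as in the text above.
    """
    valid_1 = range_img_1 > 0
    valid_2 = range_img_2_reproj > 0
    both = valid_1 & valid_2
    close = both & (np.abs(range_img_1 - range_img_2_reproj) <= eps)
    # Normalize by the smaller number of valid pixels so that a full
    # revisit yields an overlap close to 1.
    return close.sum() / max(min(valid_1.sum(), valid_2.sum()), 1)
```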
A. Loop Closure Detection

In our first experiment, we investigate the loop closure performance of our approach and compare it to existing methods. Loop closure detection typically assumes that robots revisit places during mapping while moving with uncertain odometry. Therefore, prior information about the robot poses, extracted from the pose graph, is available for loop closure detection. The following criteria are used in these experiments (a sketch of the resulting evaluation protocol is given after the list):

• To avoid detecting a loop closure among the most recent scans, we do not search for candidates in the latest 100 scans.
• For each query scan, only the best candidate is considered throughout this evaluation.
• Most SLAM systems search for potential closures only within the 3σ area around the current pose estimate. We do the same, using either the Euclidean or the Mahalanobis distance, depending on the approach.
• We use a relatively low threshold of 30% on the overlap to decide whether a candidate is a true positive. We aim to find loops even in challenging situations with low overlaps, e.g., when the car drives back to an intersection from the opposite direction (as highlighted in the supplementary video¹). Furthermore, ICP can find correct poses if the overlap between pairs of scans is around 30%, as illustrated in the experimental evaluation.
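A compact sketch of this evaluation protocol (the function name and data layout are ours, not from the paper):

```python
import numpy as np

def evaluate_query(query_idx, pred_overlap, gt_overlap,
                   recent=100, min_overlap=0.3):
    """Sketch of the evaluation criteria listed above.

    pred_overlap / gt_overlap: predicted and ground-truth overlaps between
    the query scan and every earlier scan, assumed to be already restricted
    to the 3-sigma search area around the current pose estimate.
    """
    # Do not search for candidates among the latest `recent` scans.
    limit = max(query_idx - recent, 0)
    if limit == 0:
        return None
    # Only the single best candidate is considered per query.
    best = int(np.argmax(pred_overlap[:limit]))
    # The detection counts as a true positive if the ground-truth
    # overlap with the chosen candidate is at least 30 %.
    return best, bool(gt_overlap[best] >= min_overlap)
```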
Fig. 5: Precision-recall curves of different approaches. (a) KITTI sequence 00, (b) Ford campus sequence 00; recall [%] on the x-axis.

Fig. 5 shows the precision-recall curves of our approach compared to M2DP [15], the histogram-based approach [26], and the original SuMa [3]. Since SuMa always uses the nearest frame as the candidate for loop closure detection, we can only obtain a single pair of precision and recall values, resulting in a single point. We also show the result of our method using prior information, named Ours (CovNearestOfTop10), which uses covariance propagation (Sec. III-F) to define the search space with the Mahalanobis distance and selects, from the top 10 overlap predictions of OverlapNet, the candidate nearest in Mahalanobis distance as the loop closure candidate.
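A sketch of our reading of this CovNearestOfTop10 strategy, assuming a 2D pose estimate with a covariance obtained from the propagation in Sec. III-F (the names and the 2D simplification are ours):

```python
import numpy as np

def cov_nearest_of_top10(query_pose, cand_poses, cand_overlaps, cov):
    """Pick the loop closure candidate as described in the text.

    query_pose: current pose estimate (x, y).
    cand_poses: (N, 2) poses of the candidate scans.
    cand_overlaps: (N,) overlaps predicted by OverlapNet.
    cov: 2x2 covariance of the query pose from covariance propagation.
    """
    # Take the ten candidates with the highest predicted overlap ...
    top10 = np.argsort(cand_overlaps)[-10:]
    # ... and among them pick the one nearest in Mahalanobis distance.
    diff = cand_poses[top10] - query_pose
    cov_inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance; the minimizer is the same.
    maha_sq = np.einsum('ni,ij,nj->n', diff, cov_inv, diff)
    return top10[int(np.argmin(maha_sq))]
```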
Tab. II shows the comparison between our approach and the state of the art using the F1 score and the area under the curve (AUC) on both the KITTI and the Ford campus dataset. For the KITTI dataset, our approach uses the model trained with all cues, including depth, normals, intensity, and a probability distribution over semantic classes. For the Ford campus dataset, our approach uses the model trained with geometric information only, named Ours (GeoOnly), since the other cues are not available in this dataset. We can see that our method outperforms the other methods on the KITTI dataset and attains a similar performance on the Ford campus dataset. There are two reasons for the worse performance on the Ford campus dataset: first, we never trained our network on the Ford campus dataset or even on US roads, and second, only geometric information is available in this dataset. However, our method outperforms all baseline methods on both the KITTI and the Ford campus dataset if we integrate prior information.

We also compare variants of our method in Tab. III. We compare our best model, AllChannel, which uses two heads and all available cues, to a variant that only uses a basic multilayer perceptron as head, named MLPOnly, consisting of two hidden fully connected layers and a final fully connected layer with two neurons (one for the overlap, one for the yaw angle). The substantial difference in AUC and F1 scores shows that such a simple network structure is not sufficient to obtain good results. Training the network with only one head (only the delta head for overlap estimation, named DeltaOnly) does not have a significant influence on the performance. A large gain can be observed when using the frame nearest in Mahalanobis distance among the top 10 candidates in overlap percentage (CovNearestOfTop10).
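For reference, a minimal sketch of how the F1 score and the AUC are typically obtained from a precision-recall evaluation, here with scikit-learn; this reflects standard practice, not code from the paper:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def pr_summary(labels, scores):
    """F1 score and area under the precision-recall curve.

    labels: 1 if the retrieved candidate is a true loop closure, else 0.
    scores: confidence of each detection, e.g., the predicted overlap.
    """
    precision, recall, _ = precision_recall_curve(labels, scores)
    area = auc(recall, precision)
    # Report the best F1 over all thresholds on the curve; the small
    # constant guards against division by zero.
    f1 = np.max(2 * precision * recall /
                np.maximum(precision + recall, 1e-12))
    return f1, area
```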
B. Qualitative Results

The second experiment is designed to support the claim that our method is able to improve the overall mapping result. Fig. 6 shows the odometry results on KITTI sequence 02. The color in Fig. 6 encodes the 3D translation error (including height). The left plot shows SuMa and the right plot shows Ours (CovNearestOfTop10), which uses the proposed OverlapNet to detect loop closures. We can see that, after integrating our method, the overall odometry is much more accurate.

¹ https://youtu.be/YTfliBco6aw
TABLE II: Comparison with the state of the art.

  Dataset       Approach                       AUC    F1 score
  KITTI         SuMa [3]                       -      0.85
                Ours (AllChannel, TwoHeads)    0.87   0.88
                Histogram [26]                 0.84   0.83
                M2DP [15]                      0.84   0.85
  Ford Campus   SuMa [3]                       -      0.33
                Ours (GeoOnly)                 0.85   0.84

Fig. 6: Estimated trajectories plotted against the ground truth (axes: x [m], y [m]).

TABLE III: Comparison with our variants.

  Approach                 AUC    F1 score
  GeoCovNearestOfTop10     0.85   0.88
Fig. 8: Overlap and yaw estimation relationship. (a) KITTI sequence 00, (b) Ford campus sequence 00; yaw error [deg] plotted over the overlap threshold [%].

Fig. 9: ICP using OverlapNet predictions as initial guess. (a) With identity as initial guess, (b) with yaw angle as initial guess; registration error plotted over the overlap percentage [%]. The error of the ICP registration here is the Euclidean distance between the estimated translation and the ground-truth translation.
A reason could be that adding semantic information makes the input data more distinguishable when the car drives in symmetrical environments. We also notice that semantic information increases the computation time, see Sec. IV-G. However, the ablation study also shows that the proposed method achieves good performance when employing only geometric information (depth and normals).

F. Using OverlapNet Predictions as Initial Guesses for ICP

We aim at supporting the claim that our network provides good initializations for ICP on 3D laser scans collected by autonomous cars. Fig. 9 shows the relation between the overlap and the ICP registration error with and without using OverlapNet predictions as initial guesses. The ICP registration error is here measured as the Euclidean distance between the estimated relative translation and the ground-truth translation. As can be seen, the yaw angle prediction of OverlapNet increases the chance of obtaining a good ICP result even if the two frames are relatively far away from each other (with low overlap). Therefore, in some challenging cases, e.g., when the car drives back into an intersection from a different street, our approach can still find loop closures (see the supplementary video¹). The results also show that the overlap estimates measure the quality of the found loop closures: larger overlap values result in better registration results of the involved ICP.
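A sketch of how such a yaw-only initialization can be passed to a standard point-to-point ICP, here with Open3D; this is our illustration, the paper does not prescribe a particular ICP implementation:

```python
import numpy as np
import open3d as o3d

def icp_with_yaw_guess(source_pts, target_pts, yaw_rad, max_dist=2.0):
    """Register source to target, seeded with a predicted yaw angle."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_pts))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_pts))
    # Initial guess: pure rotation about the vertical axis by the
    # predicted yaw angle, with zero translation.
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    init = np.array([[c,  -s,  0.0, 0.0],
                     [s,   c,  0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```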
G. Runtime

We tested our method on a system equipped with an Intel i7-8700 CPU at 3.2 GHz and an Nvidia GeForce GTX 1080 Ti with 11 GB of memory.

For KITTI sequence 00, we can exploit all input cues, including the semantic classes provided by RangeNet++ [23]. We need on average 75 ms per frame for the input data preprocessing, 6 ms per frame for the feature extraction of the legs, and 27 ms per frame for the head matching. In the worst case, the head matching takes 630 ms for all candidates in the search space.

For the Ford campus dataset, we use only geometric information, which can be generated in 10 ms on average per frame, with 2 ms for the feature extraction and 24 ms for the matching, and a worst case of 550 ms. In real SLAM operation, we only search for loop closure candidates inside a certain search space given by the pose uncertainty using the Mahalanobis distance (see Sec. III-F). Therefore, our method can achieve online operation in long-term tasks, since we usually only have to evaluate a small number of candidate poses.

V. CONCLUSION

In this paper, we presented a novel approach for LiDAR-based loop closure detection. It is based on the overlap between LiDAR scan range images and provides a measure for the quality of the loop closure. Our approach uses a siamese network structure to leverage multiple cues and allows us to estimate both the overlap and the relative yaw angle between scans. The experiments on two different datasets suggest that, when combined with odometry information, our method outperforms other state-of-the-art methods and that it generalizes well to environments never seen during training.

Despite these encouraging results, there are several avenues for future research. First, we want to investigate the integration of other input modalities, such as vision and radar information. We furthermore plan to test our approach on other datasets collected in different seasons.

ACKNOWLEDGMENTS

This work has been supported in part by the German Research Foundation (DFG) under Germany's Excellence Strategy, EXC-2070 - 390732324 (PhenoRob), and under grant number BE 5996/1-1, as well as by the Chinese Scholarship Committee.
REFERENCES

[1] Tim Bailey and Hugh Durrant-Whyte. Simultaneous localisation and mapping (SLAM): Part I. IEEE Robotics and Automation Magazine (RAM), 13(2):99–110, 2006.
[2] Ioan Andrei Barsan, Shenlong Wang, Andrei Pokrovsky, and Raquel Urtasun. Learning to Localize Using a LiDAR Intensity Map. In Proc. of the Second Conference on Robot Learning (CoRL), pages 605–616, 2018.
[3] Jens Behley and Cyrill Stachniss. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments. In Proc. of Robotics: Science and Systems (RSS), 2018.
[4] Paul J. Besl and Neil D. McKay. A Method for Registration of 3D Shapes. IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), 14(2):239–256, 1992.
[5] Igor Bogoslavskyi and Cyrill Stachniss. Fast range image-based segmentation of sparse 3D laser scans for online operation. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2016.
[6] Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature Verification using a "Siamese" Time Delayed Neural Network. Intl. Journal of Pattern Recognition and Artificial Intelligence, 7(4):669–688, 1993.
[7] Andrea Censi and Stefano Carpin. HSM3D: Feature-less global 6DOF scan-matching in the Hough/Radon domain. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), pages 3899–3906, 2009.
[8] Xieyuanli Chen, Andres Milioto, Emanuele Palazzolo, Philippe Giguère, Jens Behley, and Cyrill Stachniss. SuMa++: Efficient LiDAR-based Semantic SLAM. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2019.
[9] Konrad P. Cop, Paulo V.K. Borges, and Renaud Dubé. Delight: An efficient descriptor for global localisation using LiDAR intensities. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2018.
[10] Andrei Cramariuc, Renaud Dubé, Hannes Sommer, Roland Siegwart, and Igor Gilitschenski. Learning 3D Segment Descriptors for Place Recognition. arXiv preprint, 2018.
[11] Mark Cummins and Paul Newman. Highly scalable appearance-only SLAM – FAB-MAP 2.0. In Proc. of Robotics: Science and Systems (RSS), 2009.
[12] Renaud Dubé, Daniel Dugas, Elena Stumm, Juan Nieto, Roland Siegwart, and Cesar Cadena. SegMatch: Segment Based Place Recognition in 3D Point Clouds. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2017.
[13] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 3354–3361, 2012.
[14] Jiadong Guo, Paulo V.K. Borges, Chanoh Park, and Abel Gawel. Local descriptor for robust place recognition using LiDAR intensity. IEEE Robotics and Automation Letters (RA-L), 4(2):1470–1477, 2019.
[15] Li He, Xiaolong Wang, and Hong Zhang. M2DP: A Novel 3D Point Cloud Descriptor and Its Application in Loop Closure Detection. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2016.
[16] Wolfgang Hess, Damon Kohler, Holger Rapp, and Daniel Andor. Real-Time Loop Closure in 2D LIDAR SLAM. In Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2016.
[17] Peter J. Huber. Robust Statistics. Wiley, 1981.
[18] Mushtaq Hussain and James Bethel. Project and mission planning. In Chris McGlone, Edward Mikhail, James Bethel, and Roy Mullen, editors, Manual of Photogrammetry, chapter 15.1.2.6, pages 1109–1111. American Society for Photogrammetry and Remote Sensing, 2004.
[19] Giseop Kim, Byungjae Park, and Ayoung Kim. 1-day learning, 1-year localization: Long-term LiDAR localization using scan context image. IEEE Robotics and Automation Letters (RA-L), 4(2):1948–1955, 2019.
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[21] Stephanie Lowry, Niko Sünderhauf, Paul Newman, John J. Leonard, David Cox, Peter Corke, and Michael J. Milford. Visual place recognition: A survey. IEEE Trans. on Robotics (TRO), 32(1):1–19, 2016.
[22] Weixin Lu, Yao Zhou, Guowei Wan, Shenhua Hou, and Shiyu Song. L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[23] Andres Milioto, Ignacio Vizzo, Jens Behley, and Cyrill Stachniss. RangeNet++: Fast and Accurate LiDAR Semantic Segmentation. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2019.
[24] Sei Nagashima, Koichi Ito, Takafumi Aoki, Hideaki Ishii, and Koji Kobayashi. A high-accuracy rotation estimation algorithm based on 1D phase-only correlation. In Proc. of the Intl. Conf. on Image Analysis and Recognition, pages 210–221, 2007.
[25] Gaurav Pandey, James R. McBride, and Ryan M. Eustice. Ford campus vision and lidar data set. Intl. Journal of Robotics Research (IJRR), 30(13):1543–1552, 2011.
[26] Timo Röhling, Jennifer Mack, and Dirk Schulz. A Fast Histogram-Based Similarity Measure for Detecting Loop Closures in 3-D LIDAR Data. In Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), pages 736–741, 2015.
[27] Lukas Schaupp, Mathias Bürki, Renaud Dubé, Roland
Siegwart, and Cesar Cadena. OREOS: Oriented Recog-
nition of 3D Point Clouds in Outdoor Scenarios. In Proc. of
the IEEE/RSJ Intl. Conf. on Intelligent Robots and Sys-
tems (IROS), 2019.
[28] Cyrill Stachniss, Dirk Hähnel, Wolfram Burgard, and
Giorgio Grisetti. On Actively Closing Loops in Grid-
based FastSLAM. Advanced Robotics, 19(10):1059–
1080, 2005.
[29] Cyrill Stachniss, John J. Leonard, and Sebastian Thrun.
Springer Handbook of Robotics, 2nd edition, chapter
Chapt. 46: Simultaneous Localization and Mapping.
Springer Verlag, 2016.
[30] Bastian Steder, Radu B. Rusu, Kurt Konolige, and Wol-
fram Burgard. NARF: 3D range image features for
object recognition. In Workshop on Defining and Solving
Realistic Perception Problems in Personal Robotics at the
IEEE/RSJ Int. Conf. on Intelligent Robots and Systems
(IROS), 2010.
[31] Bastian Steder, Michael Ruhnke, Slawomir Grzonka, and
Wolfram Burgard. Place Recognition in 3D Scans Using
a Combination of Bag of Words and Point Feature Based
Relative Pose Estimation. In Proc. of the IEEE/RSJ
Intl. Conf. on Intelligent Robots and Systems (IROS),
2011.
[32] Li Sun, Daniel Adolfsson, Martin Magnusson, Henrik
Andreasson, Ingmar Posner, and Tom Duckett. Lo-
calising Faster: Efficient and precise lidar-based robot
localisation in large-scale environments. In Proc. of
the IEEE Intl. Conf. on Robotics & Automation (ICRA),
2020.
[33] Mikaela A. Uy and Gim H. Lee. PointNetVLAD:
Deep point cloud based retrieval for large-scale place
recognition. In Proc. of the IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), pages 4470–
4479, 2018.
[34] Heng Yang, Jingnan Shi, and Luca Carlone. TEASER:
Fast and Certifiable Point Cloud Registration. arXiv
preprint, 2020.
[35] Huan Yin, Yue Wang, Xiaqing Ding, Li Tang, Shoudong
Huang, and Rong Xiong. 3D LiDAR-Based Global
Localization Using Siamese Neural Network. IEEE
Trans. on Intelligent Transportation Systems (TITS),
2019.
[36] Anestis Zaganidis, Alexandros Zerntev, Tom Duckett,
and Grzegorz Cielniak. Semantically Assisted Loop Clo-
sure in SLAM Using NDT Histograms. In Proc. of the
IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems
(IROS), 2019.
[37] Qian-Yi Zhou, Jaesik Park, and Vladlen Koltun. Fast
global registration. In Proc. of the Europ. Conf. on
Computer Vision (ECCV), pages 766–782, 2016.