1. Introduction
As a key technology for autonomous navigation of mobile robots, simultaneous localization and mapping (SLAM) focuses on the problem of acquiring a spatial map of an environment while simultaneously localizing the robot using this map [
1]. SLAM methods can be divided into two categories: filter-based and graph-based. Filter-based SLAM is mainly developed from the principle of recursive Bayesian estimation and is a problem of incremental, real-time data processing and robot pose correction. For example, the extended Kalman filter (EKF) can be used to estimate the robot location and geometric features in the environment [
2]. However, due to its computational complexity and linearization treatment, the application of EKF SLAM has been limited by its scaling limitation and mapping inconsistence. To overcome some of these issues, particle filters have been proposed for SLAM by sampling from robot pose data associations, but the number of particles can grow large and the estimate can become inconsistent when mapping nested loops especially in large-scale environments [
3,
4,
5]. On the other hand, graph-based SLAM models the map as a sparse graph with constraints corresponding to the relation between robot motion and environment measurement. By solving the sparse nonlinear optimization problem, graph-based methods can address the full SLAM problem of large-scale environments [
6]. Kohlbrecher et al. [
7] developed the Hector-SLAM scheme, which uses the Gauss-Newton equation to solve the nonlinear least square problem formulated for the front-end scan matching between the laser data and the map. Lu and Milios [
8] proposed a graph-based optimization mathematical framework and Gutmann and Konolige [
9] developed an efficient loop closure detection method to minimize the accumulated errors during mapping for the back-end process. With the development of optimization tools, open source implementation of graph-based SLAM frameworks is available nowadays. Konolige et al. [
10] proposed Karto SLAM, the first open source graph-based framework accounting for the system sparsity. Also, Google’s open source solution Cartographer [
11] introduces local submaps that integrate multi-sensor data and the matching strategy for loop closure detection. More literature on filter- and graph-based SLAM can be found in [
12,
13].
Compared with filter-based methods, graph-based SLAM can significantly reduce the accumulated error through back-end optimization and thus provide more consistent maps for large-scale environments, even ones with multiple nested loops. Loop closure detection is a most critical process in graph-based SLAM, which triggers the global nonlinear optimization by recognizing the revisited locations of the robot. Different approaches have been proposed for loop closure detection depending on the environments and application scenarios. The detection can be performed by processing the laser signals, for example, by matching the extracted features from single scan or submap [
14,
15,
16,
17,
18,
19] or by the matching of scan-to-scan or scan-to-submap [
11,
20]. The feature-based loop closure detection methods are suitable for large-scale environment, with efficient matching of the feature similarity, but these methods are limited to structured environments with obvious geometric characteristics. Compared with feature matching, scan-to-scan or scan-to-submap matching methods are more accurate and not affected by environmental features, but these methods usually have high computational cost. When processing the laser signals cannot provide sufficient positioning accuracy at certain locations in the map, auxiliary devices such as vision assisted sensors, RFID and magnetometer can be incorporated for loop closure detection [
21,
22,
23,
24,
25].
The loop closure detection is essentially a process of matching similar scenes and involves a series of parameters. For example, in Karto SLAM, the search radius and the quantity of consecutive scans to be detected must be set for loop closure detection. These parameters should be adapted to specific applications and directly affect whether the global nonlinear optimization should be carried out at certain locations. The importance of parameter optimization or estimation related to scan matching or data sampling in SLAM algorithms has been demonstrated in literature. For instance, the intrinsic parameters of the inertial measurement unit (IMU) have been estimated online to improve the accuracy of robot trajectory [
26]. In an Augmented Unscented Kalman Filter (AUKF)- based SLAM, the map quality can be improved by estimating the effective kinematic parameters such as wheel diameter, tread, and sensor mounting offset [
27]. Also, it has been found that the flight parameters of an unmanned aerial vehicle (UAV), such as flight pattern, altitude, and ground speed, can significantly affect the point cloud formation during its SLAM process [
28]. Several parameters of the particle filter-based Gmapping framework, such as the number of particles, the displacement update, and the resampling threshold, have been tested to determine the best configuration of these parameters, and it has been shown that these parameters have a strong impact on the mapping accuracy, CPU load and memory consumption in Gmapping [
29,
30]. All these works have shed some light on the necessity and difficulty of parameter optimization or tuning in SLAM either based on experience [
29,
30] or by establishing mathematical models [
26,
27,
28]. However, few works can be found on how the parameters related to loop closure detection can affect the map quality and how to optimize these parameters for a given environment. This can become an issue in actual applications even with open source graph-based SLAM schemes. For example, in an industrial production scenario, the deployment of automatic guided vehicles (AGV) is usually done by operators without much SLAM experience and thus usually requires the assistance of algorithm engineer for manual tuning of these parameters, which accordingly influences the production cost, time and efficiency. To overcome this issue, a first question to answer is how to efficiently measure the mapping quality with different parameters in the SLAM algorithm for a specific environment.
Several approaches for map evaluation have been published in the literature. The estimated map can be evaluated by comparing with a ground truth map, for example, by calculating the error metric according to the k-nearest neighbor concept [
31], comparing the extracted features like SURF features and rooms [
32], using image similarity methods [
33,
34] and topology graphs [
35,
36], or using artificial landmarks placed in the environment as ground truth positions [
37]. In addition, since the ground truth trajectory is easier to obtain than the ground truth map, the errors between the ground truth trajectory and the estimated robot positions have been proposed to evaluate mapping accuracy as well [
38,
39,
40]. Rather than relying on a global reference frame, relative poses can also be used for comparing the maps with different SLAM frameworks [
41,
42]. All these approaches for map evaluation are based on the comparison with ground truth data, which however are difficult to obtain in most applications. In the absence of ground truth, the general conditional random field [
43] and several metrics including the proportion of occupied and free cells, the number of corners and the amount of enclosed areas of the estimated map [
44] have been proposed to measure the mapping quality. These approaches provide efficient evaluation of SLAM quality since the measurement of map dimensions or robot poses is not required. The above map quality evaluation methods are summarized in
Table 1.
Aiming at addressing the issue due to the complexity and difficulty in the tuning of the loop closure detection parameters that are critical for the back-end process of indoor 2D graph-based SLAM, this article presents a method for the multi-objective optimization of these parameters. The proposed method unifies Karto SLAM algorithm, an efficient evaluation approach for map quality with three quantitative metrics, and a multi-objective optimization algorithm. The main contribution of the article is to use three evaluation metrics, i.e., the proportion of occupied grids, the number of corners and the amount of enclosed areas, to optimize the SLAM parameters. These metrics can reflect the errors such as overlaps, blurring and misalignment when mapping nested loops, even in the absence of ground truth. Compared with exiting graph-based SLAM methods, the proposed method provides a means to further improve the mapping quality besides the back-end optimization process, by optimizing the loop closure detection parameters. For this purpose, we first summarize the general formulation of Karto SLAM and the detailed procedure and parameters for the loop closure detection. Then, quantitative metrics are presented for map evaluation without ground truth. With these metrics objectives, the Nondominated Sorting Genetic Algorithm-III (NSGA-III) multi-objective optimization algorithm is adopted to determine the parameters for loop closure detection. Finally, the proposed method is validated on four datasets and in two real indoor environments, respectively.
2. Problem Statement
The full SLAM problem is to estimate the environment map
and the motion trajectories of the robot
in which
is the time index and
are the robot poses at time
with
, through a series of motion control variables
in which
are odometry measurements at time
with
and the perceptions of the environment
in which
are the measurements at time
with
. Then, we can describe the SLAM problem as a process of estimating the posteriori as follows:
where
represents the initial pose of the robot. By applying Bayes’ rule and following the Gaussian assumption, the full SLAM based on maximum likelihood estimation can be transformed into a constrained optimization problem.
The framework of graph-based SLAM is illustrated in
Figure 1, which can be divided into the front-end and back-end [
10]. The front-end includes scan matching to provide the estimated poses and to construct the map as a graph. The back-end is the process of nonlinear global optimization to eliminate the accumulated errors when the robot revisits the same location. Loop closure detection is a process of finding the robot itself back to the scene where it has been by scan matching and provides constraints for the back-end optimization. In Karto SLAM [
10], the Sparse Pose Adjustment (SPA) is used as the back-end framework, which uses a graph to represent the history of robot measurements and each node in the graph represents the pose computed by the correlative scan matching (CSM) [
45]. The edge between two nodes can be considered as a space constraint which is the measurement of node
from another node
. This measured offset between
and
, in the frame of
, is denoted as
. The variables
are parameterized by a translational vector
and a rotational angle
, in which
represent the robot location and
represent its rotational angle at time
with i=1,2,3,…,T. For any two nodes
and
, their offset can be calculated as:
where
represents the predicted observation,
is a
rotation matrix in terms of
as follows:
Then the error function associated with a constraint can be written as:
and the total error is computed as:
where the information matrix
is the inverse of covariance matrix between node
and node
. Then the Levenberg-Marquardt method can be used to solve this problem by minimizing the total error in Equation (5) to determine the robot poses.
For loop closure detection, Karto SLAM algorithm generates a data chain of a certain length as a storage container for the nodes within a certain search range with respect to the current pose that are not adjacent in time, and then matches the current node with the nodes in the data chain. If the probability of matching is greater than a predefined threshold, loop closure detection is successful and the global optimization is conducted. In this process, satisfying the search range and the length of the data chain are the prerequisite for the success of loop closure detection, as well as the condition for whether the back-end optimization should be carried out. Accordingly, the parameters describing the search range and the data chain length can strongly affect the SLAM results. Usually these parameters need to be tuned by experienced SLAM engineers to improve the map quality for a specific environment. Therefore, the problem under study in this article can be stated as to determining the loop closure detection parameters, i.e., the search radius and the number of consecutive nodes in the data chain, to optimize the map quality for indoor 2D SLAM.
3. Methodology
3.1. Loop Closure Detection
The flow chart of the loop closure detection in Karto SLAM is illustrated in
Figure 2, based on the Karto SLAM codes available on the GitHub website (
https://github.com/ros-perception/slam_karto). First, several consecutive nodes of data in a certain search range are found, and then coarse matching and fine matching are performed. Once loop closure detection is successful, the corresponding relative pose is used as the constraint in the global optimization to minimize the accumulative errors in robot poses. More details on Karto SLAM can also be found in Ref. [
10].
The detailed steps of loop closure detection are described as follows:
- 1)
Take the current node as the center and find all the nodes which are connected to the current node by edges within a certain distance
using the Breadth-First Search algorithm. These nodes are called as near linked nodes that represent the latest nodes connected to the current node, as plotted in
Figure 3. The red dot represents the current node, the solid green dots represent the previous nodes, and the near linked nodes are framed with an orange dotted line.
- 2)
Calculate the distance between the current node and all the previous nodes.
- 3)
Based on the Step 2, select all the nodes close to the current node within distance
. The loop closure detection process focuses on finding the nodes established when the robot passes through this position previously. Therefore, all the near linked nodes are excluded and the remaining nodes are added to the data chain (see the dotted blue line in
Figure 3). Once the length of the data chain is larger than a predefined threshold constant
, we proceed with the matching in Step 4.
- 4)
Perform coarse matching between the current node and the nodes in the data chain. If the resulting response value is larger than a threshold value and the diagonal terms in the position covariance matrix are both smaller than a certain value , then the coarse matching is successful and perform the fine matching. Here the response value represents the environmental similarity between the current node and the nodes in the data chain found in Step 3. If the fine matching is also successful, we proceed with the global optimization in Step 5.
- 5)
Once the loop closure detection is successful, we add an edge between the current node and the node in the data chain which is nearest to the current node (see
Figure 3). Then the global optimization is performed to reduce the cumulative error.
From
Figure 3, we can see that distance
represents the size of the search area for loop closure detection and
represents the quantity of consecutive nodes that have been constructed previously. Both parameters are significant in Karto SLAM because they represent the condition for whether the loop closure detection is successful to trigger the back-end optimization.
3.2. Map Evaluation Metrics without Ground Truth
Even without the comparison with the ground truth, one can still evaluate a grip map by observing its image quality, for example, whether the map is skewed and whether the walls are overlapped. In this article, we evaluate the map quantitatively using the three metrics presented in [
44] as the multiple objectives for the optimization of the search area
r and the amount of nodes
for Karto SLAM. These metrics can overcome the difficulty when evaluating map quality in the absence of the ground truth, by analyzing the defects in the map such as blurring, overlaps and misalignment, due to error accumulation and lack of global optimization. It should be noted that these metrics should not be used independently to achieve more accurate results because of the contingency of a single feature.
The proportion of occupied grids
is defined as the number of occupied grids divided by the total number of grids in the map, i.e.:
where
represents the number of occupied grids, and
represents the number of all grids in the map. When some walls or obstacles become blurred or occurred twice on the map due to the error accumulation and the loop closure detection is not satisfied,
increases. A blurry wall is shown in
Figure 4a [
44]; and in
Figure 4b, the red wall and the blue wall represent the multiple occurrences of the same wall in the environment on the map without back-end optimization [
31]. Both cases lead to an increase of the proportion of occupied grids. This indicates that a lower value of
represents a more accurate map.
For two maps generated by different combinations of loop closure detection parameters
r and
g, the map with fewer corners is more likely to be optimized successfully. As shown in
Figure 5, when the robot moves through the corner, the corner is plotted in the map in red [
31]. When it revisits the same corner, the map can produce another redundant corner (in blue) because of the error accumulation if the back-end optimization is not performed. Therefore, as long as no information is lost and the corners in the real world are all reflected on the grid map, the more corners the map has, the higher possibility that this map has a low quality.
To obtain the number of corners denoted by
from the map in our approach, a Gaussian filter is first applied to reduce the noise due to the environment using the
GaussianBlur function in OpenCV. Then, the Harris corner detector [
46] based on the concept of gray difference between adjacent pixel points is applied to acquire the corners on the map. To do this, a fixed window is used to slide in any direction on the image, and if there is a large gray level change in the pixel in the window after sliding, then there are corners in the window. In this article, we use the
cornerHarris function in OpenCV to detect corners in the map.
The third metric for map evaluation is the number of enclosed areas on the map denoted by
. An enclosed area is a certain area completely surrounded by occupied grids on the map. This often happens in the case of error accumulation while the back-end optimization is not carried out. For example, when the same room is scanned by the robot multiple times but the loop closure detection fails, the map can contain rooms that are slightly rotated and offset to each other, resulting in multiple closed polygons on the map. An example of extracted enclosed areas is shown in
Figure 6. To obtain the number of enclosed areas on the map, we first convert the occupied grid map using the
threshold function in OpenCV to a binary image. Then, the enclosed areas can be extracted through the topological structural analysis of the binary image by border following which can be implemented through the
findContours function in OpenCV [
47].
3.3. NSGA-III Multi-Objective Genetic Algorithm
Evolutionary algorithms simulate the natural selection and evolution of biological organisms and have been widely used for solving complex nonlinear optimization problems. In recent years, a few evolutionary algorithms have been developed for multi-objective optimization [
48,
49,
50,
51]. Unlike many other multi-objective solving algorithms that have problems to maintain the balance of convergence and diversity, NSGA-III algorithm [
48] can provide high efficiency and good performance and is suitable for solving multi-objective optimization of three dimensions and above. Therefore, NSGA-III is adopted in this article to optimize the loop closure detection parameters
and
in Karto SLAM. The multi-objective parameter optimization problem in this article can be described as:
where
, and
,
,
=
.
Similar to the Genetic Algorithm (GA), the parent population in NSGA-III undergoes initialization, selection, crossover, and mutation to obtain the offspring generation. Moreover, NSGA-III is a population-based heuristic algorithm, which uses a large number of well-spread reference points to maintain the diversity of the population to find a set of Pareto optimal solutions that are superior to the rest of the solutions in the search space when the multiple objectives are considered. The flow chart of NSGA-III is illustrated in
Figure 7 and the detailed steps are described in the following steps [
52]:
Step 1: Initialize parameters for the multi-objective optimization, for example, the maximum number of iterations , the population size of each generation and the ranges of the - optimization parameters.
Step 2: Randomly generate individuals to form the initial population .
Step 3: Produce the offspring population by the evaluation, selection, crossover and mutation of the population , then merge the parent population and the offspring population to form a temporary population of which the size is
Step 4: Carry out the non-dominant sorting of and divide to different nondomination levels with . All the individuals from level to are first included in a set .
Step 5: Select the first individuals to form a new population . According to Step 4, if , no other operations are needed and the next generation is started with . If , individuals from level to are selected in , i.e., , and the remaining () individuals are selected from the last front .
Step 6: Determine whether is reached. If yes, output as the final population; otherwise, continue with Step 3.
The selection of the individuals to form a new population in
Step 5 is a most critical process in NSGA-III. By using the reference point strategy, the new population guarantees the individual diversity and the solution convergence during evolution. To do this, the reference points are first located on a normalized hyper-plane. This hyper-plane is an (
M-1)-dimensional unit simplex equally inclined to all objective axes and it has an intercept of one on each axis, where
M is the number of objectives. According to the normal-boundary intersection (NBI) method [
51], a uniformly distributed reference point set is generated on the hyper-plane as shown in
Figure 8a, with the amount of reference points can be calculated as:
where
D denotes the number of divisions along each axis. Then, the ideal point of
is determined by identifying the minimum value of each objective, and each objective value of
is adaptively normalized relative to the ideal value. Thereafter, the extreme point in each objective axis is determined by finding the individual corresponding to the minimum value of the achievement scalarizing function. A hyper-plane through these extreme points is then created. Now, a reference line is drawn by connecting the reference point and the origin point of the above hyper-plane. Each individual of
is associated with the reference point, of which the reference line has the shortest perpendicular distance to that individual, as illustrated in
Figure 8b. Finally, the niche-preservation operation is used to choose
individuals one at a time from
to construct the population
. It should be noted that since the above-mentioned adaptive normalization of population members and the creation of the hyper-plane are performed at each generation using the extreme points, NSGA-III algorithm adaptively maintains the diversity in the space spanned by the members of
at every generation. This enables NSGA-III to solve the multi-objective optimization problems with differently scaled objectives. The readers are referred to [
48] for more information about NSGA-III.
5. Conclusions
A multi-objective optimization method is proposed for the loop closure detection parameters for indoor 2D graph-based SLAM. The method integrates the Karto SLAM algorithm, an evaluation approach for map quality with three metrics in the absence of ground truth, and the NSGA-III multi-objective optimization algorithm. The two loop closure detection parameters under study represent the condition for whether the SLAM algorithm should trigger the back-end optimization of graph-based SLAM. The three optimization objectives, i.e., the occupation rate, the number of corners, and the number of enclosed areas, reflect the deviation errors such as overlaps, blurring and misalignment when mapping multiple nested loops. The proposed method has been implemented and validated by testing on four datasets and two real-world environments.
It has been demonstrated that wall overlaps, redundant corners, more enclosed areas, and even large deviation errors can occur in the maps if the parameters are not properly tuned for Karto SLAM. For all the tests, the map quality can be improved by optimizing the loop closure detection parameters with the three evaluation metrics as the multiple objectives of NSGA-III, which can be proved by the iterative results of the optimization objectives or by the observation on the images. The proposed optimization method for loop closure detection parameters and the corresponding map improvement can be completed by running NSGA-III algorithm without human intervention, which provides potential applications for automatic tuning of these parameters for indoor 2D SLAM to obtain a high-quality map without ground truth.
In this work, we find that the NSGA-III algorithm requires a considerable computational time for the SLAM parameter optimization with the three objectives. Therefore, in future research, we will investigate how to improve the efficiency of the proposed method and whether it is possible to reduce the number optimization objectives. We will also further analyze the influence of other parameters in SLAM algorithms on the mapping quality and the possibility of optimizing other parameters using the proposed multi-objective optimization method.