Article

A Fast Obstacle Detection Algorithm Based on 3D LiDAR and Multiple Depth Cameras for Unmanned Ground Vehicles

1 College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213200, China
2 Jiangsu Jitri Intelligent Manufacturing Technology Institute Co., Ltd., Nanjing 211899, China
3 School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
4 College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Drones 2024, 8(11), 676; https://doi.org/10.3390/drones8110676
Submission received: 28 October 2024 / Revised: 12 November 2024 / Accepted: 14 November 2024 / Published: 15 November 2024

Abstract

With the advancement of technology, unmanned ground vehicles (UGVs) have shown increasing application value in various tasks, such as food delivery and cleaning. A key capability of UGVs is obstacle detection, which is essential for avoiding collisions during movement. Current mainstream methods use point cloud information from onboard sensors, such as light detection and ranging (LiDAR) and depth cameras, for obstacle perception. However, the substantial volume of point clouds generated by these sensors, coupled with the presence of noise, poses significant challenges for efficient obstacle detection. Therefore, this paper presents a fast obstacle detection algorithm designed to ensure the safe operation of UGVs. Building on multi-sensor point cloud fusion, an efficient ground segmentation algorithm based on multi-plane fitting and plane combination is proposed to prevent ground points from being misidentified as obstacles. Additionally, instead of point cloud clustering, a vertical projection method is used to count the distribution of potential obstacle points by converting the point cloud to a 2D polar coordinate system; points in fan-shaped areas whose density falls below a threshold are treated as noise. To verify the effectiveness of the proposed algorithm, a cleaning UGV equipped with one LiDAR sensor and four depth cameras is used to test obstacle detection performance in various environments. Several experiments demonstrate the effectiveness and real-time capability of the proposed algorithm: it achieves an over 90% detection rate within a 20 m sensing area with an average processing time of just 14.1 ms per frame.

1. Introduction

In recent years, with advances in robotics, a growing number of unmanned ground vehicles (UGVs) have been developed to replace humans in tedious or dangerous tasks. Most application scenarios, such as cleaning, food delivery, personnel transportation, and rescue and disaster relief, take place in crowded spaces containing a variety of static and dynamic objects, which requires UGVs to move autonomously in these complex environments. Using onboard sensors to detect obstacles and avoid collisions during movement is therefore crucial for UGVs [1,2].
In order to perceive the environment stably, reliably, and in real time to avoid collisions, the vehicle must use a variety of sensors to accurately perceive its surroundings. Three-dimensional light detection and ranging (LiDAR) provides accurate range information and can operate in low-light conditions [3]. However, although 3D LiDAR has a wide horizontal field of view (FOV), its vertical FOV is relatively small. For example, the Velodyne HDL-32 LiDAR has a horizontal FOV of 360° but a vertical FOV of only 40° (−25° to +15°), so it cannot sense low obstacles close to the unmanned vehicle. In addition, the LiDAR sensor is usually installed at the center of the UGV. If the UGV is too tall, or the LiDAR sensor cannot achieve 360° horizontal coverage because of the vehicle body, additional sensors must be added to cover the LiDAR's blind spots. Therefore, to ensure the safety and reliability of UGVs during operation, multiple sensors are usually integrated to collect point clouds and realize seamless environment sensing. The combination of 3D LiDAR and multiple depth cameras is widely used as the sensing source for various types of UGVs [4]. However, the large amount of data collected by 3D LiDAR and multiple depth cameras can be overwhelming: one 32-beam LiDAR sensor and four depth cameras generate more than one million points per frame. These data usually include not only information about surrounding objects but also ground points, which are redundant for object detection, classification, and tracking [5,6]. On the one hand, such large point clouds prevent obstacles from being detected in real time, which increases the risk of collision. On the other hand, the point cloud contains non-negligible noise, which may lead to incorrect identification of obstacles. Existing methods struggle to achieve real-time performance and accuracy simultaneously, which makes rapid and accurate obstacle detection from dense point clouds a significant challenge [7].
To solve this problem, this paper proposes a fast obstacle detection algorithm that utilizes one 3D LiDAR sensor and multiple depth cameras mounted on the UGV. The experimental results show that the algorithm can effectively detect various obstacles within a 20 m radius in under 15 ms per frame, which meets the safety and real-time requirements for UGV operation. The main contributions of the proposed algorithm are as follows:
  • An efficient ground segmentation algorithm based on multi-plane fitting and plane combination is proposed to prevent ground points from being misidentified as obstacles.
  • Instead of point cloud clustering, a vertical projection method is used to count the distribution of potential obstacle points by converting the point cloud to a 2D polar coordinate system. Points in fan-shaped areas whose density falls below a threshold are treated as noise.
  • To verify the effectiveness of the proposed algorithm, a cleaning UGV equipped with one LiDAR sensor and four depth cameras is used to test the performance of obstacle detection in various environments.
The paper is structured as follows: The related work of ground segmentation and obstacle detection is presented in Section 2. The details of the proposed algorithm are illustrated in Section 3. Experimental results on several datasets are shown in Section 4. Conclusions and future work are discussed in Section 5.

2. Related Works

This section will mainly focus on obstacle detection algorithms based on point clouds and the sources of point clouds, including LiDAR sensors and depth cameras. In general, these obstacle detection algorithms can be divided into two categories [8]: data-driven algorithms based on deep learning and statistical algorithms based on hand-crafted features. Each of them will be discussed below.
Statistical algorithms for obstacle detection do not require training datasets and have good generalization ability. Overall, these algorithms involve two main processes, namely ground segmentation and point cloud clustering. Ground segmentation reduces the point cloud size and narrows the query scope for obstacles. Douillard et al. [9] projected the original point cloud into a predefined planar grid and extracted feature information from each cell for ground point classification; when the LiDAR has relatively few beams, missed or false detections may occur. Himmelsbach et al. [10] proposed a fast ground segmentation algorithm that applies a two-dimensional line extraction algorithm to the unorganized three-dimensional point domain, quickly estimates the local ground plane, and compares each point with the fitted ground line; this separates ground from non-ground regions quickly, but does not always distinguish the ground from obstacles well. Asvadi et al. [11] designed a piecewise approach that uses the Random Sample Consensus (RANSAC) algorithm to fit the ground estimate segment by segment. Narksri et al. [12] proposed an environment-aware algorithm that filters out the majority of non-ground points using the sensor geometry and the distance between consecutive rings in the scan; in a second step, multi-region RANSAC plane fitting separates the remaining non-ground points from ground points. Zermas et al. [13] used the lowest points of each LiDAR frame to fit the ground equation and iteratively optimized the normal vector of the ground plane through principal component analysis (PCA); this algorithm is only applicable to flat ground. Building on that work, Lim et al. proposed a series of methods including Patchwork [14] and Patchwork++ [15], which model point clouds with a concentric-zone model and combine ground normal direction, height, and flatness information to perform ground segmentation. Wen et al. [16] abandoned ground fitting on 3D point clouds and instead projected the points onto two cylindrical images, using a pixel-based image method for segmentation.
After ground segmentation, the remaining point cloud needs to be clustered to distinguish real obstacles from noise points. Point clouds belonging to obstacles are dense, while noise points are sparse. Because the number of obstacles is unknown, partition-based clustering methods such as k-means have difficulty detecting obstacles effectively. Gao et al. [17] proposed an improved density-based spatial clustering that dynamically adjusts the neighborhood radius based on an elliptic model of the local environment. In contrast, Jiang et al. [18] proposed an adaptive neighborhood search radius clustering algorithm that uses the horizontal and pitch resolutions of the LiDAR to determine the clustering distance threshold. Using Euclidean distance clustering, Peng et al. [19] introduced two additional constraints to identify target obstacles: the number of points within each cluster and the volume of the bounding box containing the cluster. Nan et al. [20] proposed a tunable-threshold Euclidean clustering method by designing a modulation function related to the cluster center.
Although these clustering algorithms can effectively partition point clouds, they require a significant amount of time when dealing with dense point clouds, which makes it difficult to meet real-time requirements. Oh et al. [21] proposed TRAVEL, which performs traversable ground detection and object clustering of 3D point clouds simultaneously using a graph representation, taking advantage of the node-edge structure and ground segmentation to reduce over-segmentation. However, it still requires a relatively large number of calculations, which limits its real-time performance.
With the enhancement of computing power and the surge of annotated data, deep learning has enabled tremendous progress in 3D point cloud-based object detection [22,23,24]. According to the input format of the point cloud, these data-driven algorithms can be divided into two categories: voxel-based methods and point-based methods. Voxel-based methods convert the input point cloud into a unified feature representation, such as 3D voxels [25] or 2D bird's-eye views (BEVs) [26], on which convolutional neural networks (CNNs) are then designed to learn valid point cloud features. This type of method greatly improves computational efficiency, but sparse parts of the point cloud may be lost during data conversion, leading to detection errors. Focal voxel R-CNN [27] replaced the original 3D sparse convolution with focal sparse convolution and added a branch of submanifold sparse convolution upstream of the backbone network, which increased the weight of foreground points, retained more valuable information, and provided more reliable obstacle detection results. Vote3Deep [28] used CNNs to locally detect objects in 3D point clouds by leveraging a feature-centric voting scheme to implement novel convolutional layers. In contrast, point-based methods learn feature representations of point clouds directly without any intermediate representation, as exemplified by PointNet [29] and PointNet++ [30]. PointNet takes the point cloud directly as input and outputs a label for each point individually, which addresses the disorder problem of point clouds. Because PointNet is limited in extracting fine-grained features, PointNet++ extracts local features at different scales through a hierarchical set abstraction module. Lee et al. [31] proposed the Simultaneous Segmentation and Detection Network (SSADNet), which performs segmentation and detection simultaneously with a single neural network to recognize drivable areas and obstacles; its structure consists of a segmentation branch, a detection branch, and a bridge connecting the two parts. Yin et al. [32] put forward a CNN model named CenterPoint that does not require manually set anchors and achieves higher accuracy in scenes with objects of varying sizes. Objectively speaking, deep learning-based methods perform well in scenarios similar to their training data; however, they require high computing power, generalize poorly, and have difficulty meeting real-time requirements.

3. Method

3.1. Algorithm Overview

The overall flow of the proposed fast obstacle detection algorithm based on point clouds is shown in Figure 1. It contains three modules: multi-sensor point cloud fusion, ground point cloud segmentation, and point cloud projection and denoising based on fan-shaped areas. In the first module, the point clouds from the depth cameras are transformed into the LiDAR coordinate system under the constraint of time consistency. In the second module, in order to remove the ground point cloud that is not related to obstacles, an efficient ground segmentation algorithm is designed to adaptively fit multiple ground planes. In the third module, the potential obstacle points are converted into a 2D polar coordinate system, and points in fan-shaped areas whose density falls below a threshold are treated as noise.

3.2. Multi-Sensor Point Cloud Fusion

For the depth camera, since each pixel in the depth image corresponds to a detection distance, the depth information must be converted from the pixel coordinate system to the depth camera coordinate system. The conversion formulas are shown in Equations (1) and (2), where $d$ is the detection distance of the point with pixel coordinates $(u, v)$ in the depth image, $K$ is the intrinsic matrix of the depth camera, $c_x$ and $c_y$ are the offsets of the camera's optical axis in the pixel coordinate system, and $f_x$ and $f_y$ are the focal lengths of the depth camera in the width and height directions, respectively.
$$P_C = K^{-1} \begin{bmatrix} u \times d \\ v \times d \\ d \end{bmatrix} \tag{1}$$

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{2}$$
Since the point cloud data come from multiple sensors, including the LiDAR and several depth cameras, a multi-sensor fusion operation is required to obtain a fused point cloud in the same coordinate system at approximately the same time. The relative pose relationships between these sensors are known. For convenience, the LiDAR coordinate system is set as the main coordinate system, while each depth camera coordinate system is set as a sub-coordinate system. Point clouds obtained from the depth cameras are converted to the main coordinate system. The definitions of these coordinate systems are shown in Figure 2, where $C_i$ and $L$ represent the coordinate systems of the $i$-th depth camera and the LiDAR sensor, respectively. To ensure that the fused point cloud is consistent in time, the timestamp of the current LiDAR frame is used as the reference, and each depth camera frame is fused only if it falls within a certain time threshold of that timestamp.
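As an illustration of Equations (1) and (2) and the fusion step described above, the following minimal Python sketch back-projects a depth image into 3D and transforms the points into the LiDAR (main) coordinate system; the function name, array conventions, and the extrinsic transform T_cam_to_lidar are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def depth_to_lidar_frame(depth, K, T_cam_to_lidar):
    """Back-project a depth image into 3D (Equation (1)) and transform the
    points into the LiDAR (main) coordinate system.

    depth:           (H, W) array of metric depths d for each pixel (u, v)
    K:               3x3 intrinsic matrix as in Equation (2)
    T_cam_to_lidar:  4x4 homogeneous transform from this camera to the LiDAR frame
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    d = depth.reshape(-1)
    valid = d > 0  # skip pixels without a depth return

    # Equation (1): P_C = K^{-1} [u*d, v*d, d]^T
    pix = np.stack([u.reshape(-1) * d, v.reshape(-1) * d, d], axis=0)
    P_C = (np.linalg.inv(K) @ pix).T[valid]                 # (N, 3), camera frame

    # Transform the points into the LiDAR coordinate system
    P_C_h = np.hstack([P_C, np.ones((P_C.shape[0], 1))])    # homogeneous coordinates
    P_L = (T_cam_to_lidar @ P_C_h.T).T[:, :3]
    return P_L
```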

3.3. Ground Point Cloud Segmentation

The spatial characteristics of the ground point cloud are related to the environment in which the UGVs are located. For most service robots, their main operation scenarios include office buildings, shopping malls, underground parking lots, etc., which usually comprise only one ground plane.
Therefore, based on the ground plane fitting proposed in the literature [13], the ground point cloud is extracted using a single-sensor multi-plane combination fitting method, which follows these principles (details are shown in Figure 3). Firstly, the ground point cloud has distinct planar features, which can be fitted by a plane equation. Secondly, for UGVs, the points with the lowest height values are most likely to belong to the ground, so there is no need to introduce a RANSAC process. Thirdly, fitting a plane for each individual sensor and then combining the plane information into a new plane enables parallel computing and improves algorithmic efficiency. The fused point cloud $P$ consists of the LiDAR point cloud $P_L$ and the depth camera point clouds $P_{C_1}$ to $P_{C_n}$. Each single-sensor point cloud $P_L$ or $P_{C_i}$ is sorted in ascending order according to the height value. Taking $P_L$ as an example, its sorted version is denoted $P_L^*$. A set of seed points with low height values is selected to estimate the initial ground plane, denoted $P_{L,seed}^*$.
Next, the mean point $p_{L,seed}^c$ of $P_{L,seed}^*$ and the corresponding covariance matrix $C_{L,seed}$ are calculated to obtain a simple linear ground estimation model. Performing singular value decomposition (SVD) on $C_{L,seed}$ yields the three main directions of $P_{L,seed}^*$. The singular vector corresponding to the minimum singular value is the normal vector of the ground plane. The initial ground plane estimation of $P_{L,seed}^*$ is described in Equations (3)–(7):
$$a_L x + b_L y + c_L z + d_L = 0 \tag{3}$$

$$C_{L,seed} = \sum_{j=1}^{N} \left( p_{L,seed}^j - p_{L,seed}^c \right) \left( p_{L,seed}^j - p_{L,seed}^c \right)^T \tag{4}$$

$$C_{L,seed} = U S V^T, \quad U = \left[ u_1, u_2, u_3 \right] \tag{5}$$

$$\left[ a_L, b_L, c_L \right]^T = u_3 \tag{6}$$

$$d_L = -u_3^T \cdot p_{L,seed}^c \tag{7}$$
where $a_L$, $b_L$, $c_L$, and $d_L$ are the parameters of the ground plane model, and $p_{L,seed}^j$ is the $j$-th point of $P_{L,seed}^*$.
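A minimal sketch of this SVD-based plane estimation, assuming the seed points are available as a NumPy array, is given below; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def fit_plane_svd(seed_points):
    """Estimate a ground plane a*x + b*y + c*z + d = 0 from seed points,
    following Equations (3)-(7): the normal is the singular vector of the
    seed covariance matrix associated with the smallest singular value.

    seed_points: (N, 3) array of low-height seed points (e.g. P*_{L,seed})
    returns: (normal, d) with unit normal [a, b, c] and offset d
    """
    mean = seed_points.mean(axis=0)                    # p^c_{L,seed}
    centered = seed_points - mean
    cov = centered.T @ centered                        # Equation (4)
    U, S, Vt = np.linalg.svd(cov)                      # Equation (5)
    normal = U[:, -1]                                  # u_3, Equation (6)
    d = -normal @ mean                                 # Equation (7)
    return normal, d

def point_plane_distance(points, normal, d):
    """Unsigned point-to-plane distance, compared against the threshold Th_d."""
    return np.abs(points @ normal + d) / np.linalg.norm(normal)
```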
After obtaining the initial ground plane estimation, the next step is to calculate the distance from each point in $P_L$ to the ground plane. This distance is compared with a threshold $Th_d$ to determine whether the point belongs to the ground plane. The points belonging to the ground plane serve as the new seed points for updating the plane estimation. After repeating the above process $IterNum$ times, the ground plane estimate for the LiDAR becomes stable. The same procedure is applied to each depth camera. The mean points $p_{L,seed}^c, p_{C_1,seed}^c, \ldots, p_{C_n,seed}^c$ are then combined to fit a new plane $a_F x + b_F y + c_F z + 1 = 0$ using the Gauss–Newton method. Since the ground plane extracted by each sensor has a different error, it is not appropriate to give these mean points the same weight. Inspired by work on weighted plane fitting [33,34], the indicator used to evaluate the quality of a ground point cloud is its thickness, that is, its standard deviation along the z-axis: the smaller the standard deviation, the higher the quality of the ground point cloud extracted by the sensor. The standard deviation $\sigma_{L,seed}^z$ of $P_{L,seed}^*$ along the z-axis can be obtained directly from $C_{L,seed}$. The corresponding weight $w_{L,seed}^c$ is calculated in Equation (8), which ensures that the weight coefficients are normalized.
$$w_{L,seed}^c = \frac{e^{-\sigma_{L,seed}^z}}{e^{-\sigma_{L,seed}^z} + e^{-\sigma_{C_1,seed}^z} + \cdots + e^{-\sigma_{C_n,seed}^z}} \tag{8}$$
The new plane estimation is then obtained by minimizing the weighted distances from the mean points to the new plane, as shown in Equations (9) and (10):
$$d_{L,seed} = w_{L,seed}^c \times \left( p_{L,seed}^{c\,T} \begin{bmatrix} a_F \\ b_F \\ c_F \end{bmatrix} + 1 \right)^2 \tag{9}$$

$$\min_{a_F, b_F, c_F} \sum_{X \in \{L, C_1, \ldots, C_n\}} d_{X,seed} \tag{10}$$
A point in the fused point cloud $P$ whose distance to the new plane does not exceed the threshold $Th_d$ is considered a ground point; otherwise, it is a potential obstacle point. Finally, the ground point cloud $P_{ground}$ and the potential obstacle point cloud $\hat{P}_{obstacle}$ are obtained. The ground point cloud segmentation procedure is summarized in Algorithm 1.
Algorithm 1 Ground Point Cloud Segmentation
Input: $P$ (the fused point cloud), consisting of $P_L, P_{C_1}, \ldots, P_{C_n}$
Output: $P_{ground}$ (ground point cloud), $\hat{P}_{obstacle}$ (potential obstacle point cloud)
1: Sort each of $P_L, P_{C_1}, \ldots, P_{C_n}$ from low to high along the height value;
2: Select sets of seed points with low height values as $P_{L,seed}^*, P_{C_1,seed}^*, \ldots, P_{C_n,seed}^*$;
3: foreach $P_{X,seed}^* \in \{P_{L,seed}^*, P_{C_1,seed}^*, \ldots, P_{C_n,seed}^*\}$ do
4:   Fit a plane and obtain the ground plane model $a_X x + b_X y + c_X z + d_X = 0$;
5: end
6: for $i = 1$ : $IterNum$ do (parallel computing for each $P_X$ in $P_L, P_{C_1}, \ldots, P_{C_n}$)
7:   foreach $p \in P_X$ do
8:     Calculate the distance $d$ to the corresponding plane model;
9:     if $d < Th_d$ then
10:      Put $p$ into $P_{X,seed}^*$;
11:    else
12:      Remove $p$ from $P_{X,seed}^*$;
13:  end
14:  Fit a plane to $P_{X,seed}^*$ and obtain the updated ground plane model;
15: end
16: Fit the combined plane $a_F x + b_F y + c_F z + 1 = 0$;
17: foreach $p \in P$ do
18:   Calculate the distance $d$ to the new plane model;
19:   if $d < Th_d$ then
20:     Put $p$ into $P_{ground}$;
21:   else
22:     Put $p$ into $\hat{P}_{obstacle}$;
23: end
24: return $P_{ground}$ and $\hat{P}_{obstacle}$
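The plane-combination step (Equations (8)–(10) and line 16 of Algorithm 1) can be illustrated with the short sketch below. The paper solves the weighted problem with the Gauss–Newton method; because the residuals are linear in $(a_F, b_F, c_F)$, this sketch instead uses an equivalent closed-form weighted least-squares solution, and all function and variable names are assumptions for illustration.

```python
import numpy as np

def combine_sensor_planes(mean_points, z_stddevs):
    """Fuse per-sensor ground estimates into one plane a*x + b*y + c*z + 1 = 0.

    mean_points: (n_sensors, 3) mean seed points p^c_{X,seed}
    z_stddevs:   (n_sensors,) standard deviations sigma^z_{X,seed}
    Requires at least three non-collinear mean points (e.g. 1 LiDAR + 4 cameras).
    """
    # Equation (8): softmax-style weights, smaller sigma -> larger weight
    w = np.exp(-np.asarray(z_stddevs))
    w = w / w.sum()

    # Equations (9)-(10): minimize sum_i w_i * (p_i^T [a,b,c]^T + 1)^2,
    # i.e. weighted least squares for p_i^T [a, b, c]^T = -1
    A = np.asarray(mean_points)           # (n, 3)
    b = -np.ones(len(A))
    W = np.diag(w)
    abc = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
    return abc                            # (a_F, b_F, c_F)
```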

3.4. Point Cloud Projection and Denoising Based on Fan-Shaped Area

After obtaining the potential obstacle point cloud $\hat{P}_{obstacle}$, it is vital to analyze its distribution in order to distinguish noise points. Obviously, regions with dense points are more likely to contain obstacles, while regions with only sporadic points are likely to be noise. Clustering is an unsupervised learning method mainly used in data mining and data analysis; its purpose is to automatically group a set of objects into categories with high similarity within each class and low similarity between classes. The existing mainstream approach is to cluster $\hat{P}_{obstacle}$ to determine the size and distribution of each cluster. However, most clustering algorithms require hyperparameters such as the number of cluster centers or a boundary distance to be set in advance, and they are very time-consuming when the point cloud is dense. Therefore, a new statistical method for describing the distribution of $\hat{P}_{obstacle}$ is proposed.
Firstly, the potential obstacle point cloud $\hat{P}_{obstacle}$ is projected vertically, transforming it from 3D to 2D. Within a certain horizontal range, a real obstacle is continuous in the vertical direction, so its points become dense after projection into 2D space. For each point $p(x, y, z)$ in $\hat{P}_{obstacle}$, the corresponding polar coordinates $(r, \theta)$ are calculated by Equations (11) and (12):
$$r = \sqrt{x^2 + y^2} \tag{11}$$

$$\theta = \arctan\left( \frac{y}{x} \right) \tag{12}$$
Next, a grid system based on fan-shaped areas is established with $r \in [r_{min}, r_{max}]$ and $\theta \in [\theta_{min}, \theta_{max}]$. The polar radius is discretized with step size $\Delta r$ and the polar angle with step size $\Delta \theta$, so that a grid cell in this system can be indexed by $(r_i, \theta_j)$, as described in Equations (13) and (14).
$$r_i = r_{min} + i \times \Delta r, \quad i = 0, 1, 2, \ldots, N_r, \quad N_r = \frac{r_{max} - r_{min}}{\Delta r} \tag{13}$$

$$\theta_j = \theta_{min} + j \times \Delta \theta, \quad j = 0, 1, 2, \ldots, N_\theta, \quad N_\theta = \frac{\theta_{max} - \theta_{min}}{\Delta \theta} \tag{14}$$
Every point in $\hat{P}_{obstacle}$ is then assigned to a grid cell: for a point with polar coordinates $(r, \theta)$, the corresponding grid index is $\left( \lfloor (r - r_{min})/\Delta r \rfloor, \lfloor (\theta - \theta_{min})/\Delta \theta \rfloor \right)$. The grid system counts the number of points that fall into each cell. To accelerate retrieval, a hash table is used to store the point count of each cell.
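A minimal sketch of the polar projection and hash-table counting (Equations (11)–(14)) is shown below; it uses atan2 instead of the plain arctangent in Equation (12) to keep the correct quadrant, and all names and parameters are illustrative assumptions.

```python
import math
from collections import defaultdict

def build_polar_grid(points, r_min, r_max, dr, dtheta,
                     theta_min=-math.pi, theta_max=math.pi):
    """Project potential obstacle points onto a 2D polar grid and count
    how many points fall into each fan-shaped cell.

    points: iterable of (x, y, z) tuples from the potential obstacle cloud
    returns: ({(i, j): count}, {(i, j): [points]}) for the later density test
    """
    counts = defaultdict(int)
    members = defaultdict(list)
    for (x, y, z) in points:
        r = math.hypot(x, y)                      # Equation (11)
        theta = math.atan2(y, x)                  # Equation (12), quadrant-aware
        if not (r_min <= r < r_max):
            continue
        i = int((r - r_min) // dr)                # radial grid index
        j = int((theta - theta_min) // dtheta)    # angular grid index
        counts[(i, j)] += 1
        members[(i, j)].append((x, y, z))
    return counts, members
```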
Because the grid division is fine, it is not reliable to judge whether a cell contains an obstacle from its own point count alone; including neighboring cells in the statistics reflects the true condition of the current cell. Specifically, with the current cell as the center, the neighborhood is expanded by $N_{neibo,r}$ steps forward and backward along the polar radius and $N_{neibo,\theta}$ steps forward and backward along the polar angle. The number of points falling into the expanded cells is denoted $N_f$. A schematic diagram of this process is shown in Figure 4, where, for ease of explanation, the parameters $N_{neibo,r}$ and $N_{neibo,\theta}$ are both set to 1.
According to the analyses of LiDAR and depth camera measurement noise in [33,34], the uncertainty of a point is positively correlated with its detection distance, so it is not reasonable to use a single uniform count threshold to decide whether the points within a cell belong to a real obstacle. Setting a dynamic, distance-related threshold is a feasible approach: the closer a cell is to the UGV, the smaller its area, and the smaller the corresponding count threshold should be. It is therefore necessary to judge the condition of a cell according to the density of the points falling into it. The area of the expanded neighborhood defined by $N_{neibo,r}$ and $N_{neibo,\theta}$ is calculated in Equation (15), and the density value in Equation (16). If the density value is greater than the threshold $Th_p$, the points in the current cell are treated as obstacle points.
$$t_{area} = \frac{\pi \left( r + N_{neibo,r} \times \Delta r \right)^2}{2\pi} \times 2 N_{neibo,\theta} \times \Delta \theta - \frac{\pi \left( r - N_{neibo,r} \times \Delta r \right)^2}{2\pi} \times 2 N_{neibo,\theta} \times \Delta \theta \tag{15}$$

$$Th_{density} = \frac{N_f}{t_{area}} \tag{16}$$
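The neighborhood density test of Equations (15) and (16) can be sketched as follows, using as defaults the hyperparameter values reported later in Section 4.3 ($N_{neibo,r} = 2$, $N_{neibo,\theta} = 10$, $Th_p = 80$); the function name and interface are assumptions for illustration.

```python
def is_obstacle_cell(counts, i, j, r_min, dr, dtheta,
                     n_r=2, n_theta=10, th_p=80.0):
    """Decide whether the points in grid cell (i, j) belong to a real obstacle.

    counts: hash table {(i, j): point count} built from the polar grid
    """
    # Count points in the expanded neighbourhood (N_f)
    n_f = 0
    for di in range(-n_r, n_r + 1):
        for dj in range(-n_theta, n_theta + 1):
            n_f += counts.get((i + di, j + dj), 0)

    # Equation (15): area of the annular sector covered by the neighbourhood
    # (here r is taken at the centre of the current cell, an assumption)
    r = r_min + (i + 0.5) * dr
    r_out = r + n_r * dr
    r_in = max(r - n_r * dr, 0.0)
    t_area = 0.5 * (r_out ** 2 - r_in ** 2) * (2 * n_theta * dtheta)

    # Equation (16): density compared against the threshold Th_p
    return (n_f / t_area) > th_p
```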
Combining the above with the previous section, the noise filtering procedure is summarized in Algorithm 2. Through this series of operations, the real obstacle point cloud is obtained.
Algorithm 2 Noise Filtering
Input: $\hat{P}_{obstacle}$ (the potential obstacle point cloud)
Output: $P_{obstacle}$ (real obstacle point cloud)
1: Initialize the hash table $H_t$ and $P_{obstacle}$;
2: foreach $p \in \hat{P}_{obstacle}$ do
3:   Calculate its polar coordinates $(r, \theta)$;
4:   Calculate its grid index $\left( \lfloor (r - r_{min})/\Delta r \rfloor, \lfloor (\theta - \theta_{min})/\Delta \theta \rfloor \right)$;
5:   Add 1 to the corresponding entry of $H_t$;
6: end
7: for $i = 0$ : $N_r$ do
8:   for $j = 0$ : $N_\theta$ do
9:     if $H_t(i, j) > 0$ then
10:      Determine the expanding grids based on $N_{neibo,r}$ and $N_{neibo,\theta}$;
11:      Calculate the number of points $N_f$ falling into the expanding grids;
12:      Calculate the density $Th_{density}$ of the expanding grids;
13:      if $Th_{density} > Th_p$ then
14:        Put the corresponding points into $P_{obstacle}$;
15:      end
16:    end
17:  end
18: end
19: return $P_{obstacle}$

4. Experiments and Results

4.1. Experiment Overview

In order to verify the effectiveness of the proposed method, the authors selected a cleaning UGV to test the effectiveness and real-time capability of the proposed algorithm. This UGV is equipped with one Leishen C32W LiDAR at its top center and four depth cameras at its front, rear, left, and right. Specifically, the camera model installed at the front and rear is the Orbbec DaBai DCW2, which has a detection range of 0.2 m to 3 m and a depth resolution of 640 × 400. The camera model installed on the left and right is the Orbbec Dabai MAX (Orbbec, Shenzhen, China), which has a detection range of 0.2 m to 2.5 m and a depth resolution of 640 × 320. The frequency of all sensors is set to 10 Hz to maintain time consistency. The loaded sensors are shown in Figure 5 and the cleaning UGV is shown in Figure 6.
The UGV is equipped with a 3.1 GHz Intel Core i7 processor (Intel, Santa Clara, CA, USA) and 16 GB of RAM, which provides sufficient computing power for point cloud processing. Sensor data are transmitted through the Robot Operating System (ROS) (Stanford, CA, USA) framework [35]. The UGV was tested in two environments: a warehouse and a parking scene. The total acquisition time of the warehouse scene is 431 s; its main static obstacles include boxes and walls, and people walking in the warehouse during acquisition are its main dynamic obstacles. The noise mainly comes from unstable point cloud measurements obtained by the depth cameras. The acquisition time of the parking scene is 1204 s; its main static obstacles include pillars, walls, and stationary vehicles. Since drivers get out of cars and vehicles start moving during acquisition, its main dynamic obstacles are pedestrians and moving vehicles. Similarly, its noise mainly stems from unstable depth camera observations. After multi-sensor point cloud fusion, the vertical views of the fused point clouds in the main coordinate system are shown in Figure 7 and Figure 8. The red, orange, blue, green, and purple point clouds come from the front-view depth camera, left-view depth camera, rear-view depth camera, right-view depth camera, and LiDAR sensor, respectively. The UGV is located in the middle of the area wrapped by the four depth camera point clouds. It can be clearly observed that the depth cameras have a shorter detection range but denser point clouds, whereas the LiDAR has a longer detection range but a relatively sparse point cloud.
Ground point cloud segmentation is a classic binary classification problem that divides the point cloud into a ground portion and a non-ground portion. Similarly, obstacle detection determines whether a point is a noise point or an obstacle point. Therefore, the precision, recall, and F1-score are used to evaluate classification performance. In this task, correct matches are true positives (TPs), incorrect matches are false positives (FPs), wrongly discarded matches are false negatives (FNs), and correctly discarded matches are true negatives (TNs). Precision measures the proportion of samples predicted as positive that are actually positive; high precision means that the model rarely misclassifies negative samples as positive. Recall measures the proportion of actual positive samples that are correctly predicted as positive; high recall means that the model finds most of the positive samples. These two indicators often need to be balanced in practice, because improving one may sacrifice the other. The F1-score is the harmonic mean of precision and recall. The precision, recall, and F1-score are calculated in Equations (17)–(19).
$$f_{precision} = \frac{TP}{TP + FP} \tag{17}$$

$$f_{recall} = \frac{TP}{TP + FN} \tag{18}$$

$$F1_{score} = \frac{2 \times f_{precision} \times f_{recall}}{f_{precision} + f_{recall}} \tag{19}$$
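For completeness, Equations (17)–(19) correspond to the small helper below; the guards against division by zero are an added assumption.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall and F1-score from Equations (17)-(19)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```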

4.2. Experimental Results in Ground Point Cloud Segmentation

Because the proposed algorithm is a statistical algorithm based on hand-crafted features, its ground point cloud segmentation is compared with other advanced algorithms, namely Patchwork++ [15] and DipG-Seg [16]. In this part, the performance of the algorithms is tested in the single-ground-plane case. Before the comparative experiments, the configuration of the relevant hyperparameters needs to be explained. For the proposed algorithm, the threshold $Th_d$ that determines whether a point belongs to the ground plane is set to 0.1 m, and the number of optimization iterations $IterNum$ is set to 10. In the initial stage, the lowest 10% of points are selected as seed points. To ensure good ground segmentation performance, these hyperparameter settings follow reference [13].
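For reference, these settings can be collected in a small configuration block; the dictionary form and key names below are illustrative, while the values are those reported in the text.

```python
# Hyperparameters used for ground segmentation in Section 4.2
GROUND_SEG_PARAMS = {
    "th_d": 0.1,          # distance threshold Th_d to the plane, in metres
    "iter_num": 10,       # number of plane-refinement iterations IterNum
    "seed_fraction": 0.1, # lowest 10% of points used as initial seeds
}
```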
The ground point cloud segmentation effect is tested in both the warehouse and parking environments. After inputting the fused point cloud, the ground segmentation results of the compared algorithms are shown in Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14. It is obvious that the proposed method achieves the best ground segmentation effect, which essentially reflects the spatial structure of the ground point cloud. In sharp contrast, the performance of Patchwork++ and DipG-Seg is not satisfactory: the ground point clouds they extract differ considerably from the real scenes, and they do not distinguish well between ground points and non-ground obstacles. As shown in Table 1 and Table 2, the proposed algorithm achieves the highest scores on the evaluation metrics.
The performance differences in ground point cloud segmentation can be largely attributed to the inclusion of the point clouds generated by the depth cameras. A depth camera's point cloud is affected by factors such as ambient lighting, texture contrast, and camera hardware performance [36]; if the ambient light changes greatly or the object surface is smooth, the depth estimation may be inaccurate. In contrast, the point cloud data provided by LiDAR are more stable in most cases [37], especially in complex lighting and dynamic environments. The proposed ground segmentation method is able to tolerate these unstable point cloud fluctuations, which reflects its robustness to fused point cloud data. Since Patchwork++ and DipG-Seg were originally designed for ground segmentation of LiDAR point clouds, they are sensitive to the unstable points from the depth cameras. Patchwork++ uses region-wise vertical plane fitting (R-VPF) to divide the fused point cloud into several subsets and estimates the plane based on the principle of concentric circles; because the concentric-circle assumption does not hold for this fused point cloud, the approach is not applicable in this scenario. As for DipG-Seg, which is based on image processing, it extracts the ground point cloud by filtering the images obtained by projecting the point cloud data; in this projection process, the noise from the depth cameras has a non-negligible effect on image quality, and pixel filling and mean filtering have only a limited ability to improve it. Ultimately, ground segmentation based on pixel features struggles to remain effective under these conditions.
However, although the proposed algorithm performs better in ground segmentation, it still has some shortcomings, as shown in Figure 15. Because a height threshold is applied relative to the fitted plane, points of obstacles that lie close to the ground plane may be incorrectly classified as ground points; this problem is more obvious in scenes with many small objects. Meanwhile, some remote points generated by the depth cameras are mistakenly identified as obstacle points.
The ground segmentation efficiency of these algorithms is compared on the same platform without using a GPU. For fairness, the average running time over a period of time is used for comparison. Table 1 and Table 2 show that the runtimes of Patchwork++ and DipG-Seg are slightly greater than that of the proposed algorithm. In general, all of these methods basically meet the real-time requirement when the point cloud is received at 10 Hz. On the basis of Patchwork, Patchwork++ divides the point cloud into several regions for ground plane fitting and adds outlier suppression modules to optimize the segmentation effect, so it is relatively more time-consuming. DipG-Seg does not use point cloud-based fitting operations; its processing flow is based on image filtering and segmentation, which reduces the time required for ground segmentation. The proposed method performs ground plane estimation in parallel across the sensors and requires only a small number of iterations to reach a stable segmentation result. Overall, the proposed method is slightly more efficient.

4.3. Experimental Results in Obstacle Detection

Next, on the basis of the ground point cloud segmentation by the proposed algorithm, the real obstacle point cloud is obtained using either a point cloud clustering-based method or a point cloud denoising-based method. In this part, a typical Euclidean clustering algorithm [19] is used for comparison with the proposed fan-shaped area retrieval. In addition, the deep learning-based CenterPoint model [32] is included as a further comparison. For the proposed algorithm, the radial step size $\Delta r$ and the angular step size $\Delta \theta$ are set to 0.1 m and 0.01 rad, respectively. The neighborhood parameters $N_{neibo,r}$ and $N_{neibo,\theta}$ are set to 2 and 10. This smaller hyperparameter setting prevents erroneous association with adjacent noise points. To verify this effect, a larger hyperparameter setting with $N_{neibo,r} = 5$ and $N_{neibo,\theta} = 20$ is also compared. The density threshold $Th_p$ for determining an obstacle is set to 80, mainly based on the work in reference [18]. The obstacle avoidance function of this kind of low-speed UGV only requires the point cloud within a certain range, so the obstacle detection radius centered on the vehicle body is set to 20 m.
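Similarly, the detection-stage settings reported above can be summarized as follows; again, the dictionary form and key names are illustrative, while the values are those reported in the text.

```python
# Hyperparameters used for fan-shaped area retrieval in Section 4.3
DETECTION_PARAMS = {
    "dr": 0.1,            # radial step size (m)
    "dtheta": 0.01,       # angular step size (rad)
    "n_neibo_r": 2,       # radial neighbourhood expansion (smaller setting)
    "n_neibo_theta": 10,  # angular neighbourhood expansion (smaller setting)
    "th_p": 80.0,         # density threshold Th_p
    "max_range": 20.0,    # obstacle detection radius (m)
}
```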
The obstacle detection effect is tested in the same scenes as the previous part. After inputting the potential obstacle point cloud, the detection results of the compared methods are shown in Figure 16, Figure 17, Figure 18, Figure 19, Figure 20, Figure 21, Figure 22 and Figure 23. The Euclidean clustering method calculates the Euclidean distance between points to determine whether they belong to the same cluster, and then eliminates noise based on the number of points in each cluster and the size of the cluster's bounding box. This clustering method can combine points with similar spatial distances and filter out some sparse points. However, when facing relatively dense noise, it clusters the noise points together and mistakenly treats them as obstacles; Figure 16 and Figure 20 confirm this phenomenon. In fact, the point cloud data in the edge areas detected by the depth cameras are relatively unstable, which leads to a relatively uniform and dense distribution of noise. Because of their large quantity and relatively large spatial volume, clustering-based methods have difficulty determining whether such points belong to obstacles or noise. CenterPoint recognizes obstacles that are labeled in its training set well, but its performance on unlabeled obstacles is not satisfactory. Admittedly, this avoids many false detections of noise points as obstacles, but it causes many real obstacles to be ignored by the UGV; in the above experiments, it treated all identified obstacles as cars. The proposed fan-shaped area retrieval also detects various static and dynamic obstacles and, compared with the Euclidean clustering method, essentially overcomes the noise interference problem. Because the point cloud is divided into small fan-shaped areas, the real obstacle points appear very dense while there are obvious gaps between noise points, which provides a good pattern for distinguishing real obstacle points from noise: if the point density in an area does not reach a certain level, it is not considered an obstacle. As shown in Figure 18, Figure 19, Figure 22 and Figure 23, owing to the density threshold mechanism, the points belonging to the persons near the UGV exceed the smaller threshold and are retained, whereas some noise points at longer range, although relatively dense, do not reach the threshold requirement. The comparisons between Figure 18 and Figure 19 and between Figure 22 and Figure 23 show that setting smaller neighborhood parameters avoids associating irrelevant points and prevents noise points from being mistakenly grouped with real obstacles. Objectively speaking, the proposed algorithm filters out most of the noise effectively, but it can still produce missed or false detections: some smaller boxes at long distances are overlooked, and the trailing points of the LiDAR close to the UGV may be misidentified as obstacles to some extent.
The obstacle detection efficiency of the algorithms is compared on the same platform, using the average running time over a period of time. Table 3 and Table 4 show that the runtime of the Euclidean clustering method is significantly greater than that of the proposed algorithm. Because of the need to control costs, the computing performance of the cleaning UGV is limited; since CenterPoint is based on deep learning, it is difficult to run on computing devices without a GPU and therefore takes the longest time. The proposed algorithm is able to meet the real-time requirement. According to the time complexity analysis of clustering algorithms [38], the time complexity of Euclidean clustering is $O(lnkd)$, where $n$ is the number of points, $k$ is the number of clusters, $d$ is the feature dimension, and $l$ is the number of iterations; when the number of points is particularly large, it consumes a lot of time. By abandoning the clustering operation, the proposed fan-shaped area retrieval greatly improves the efficiency of point cloud denoising, and its time complexity is only $O(n)$.

5. Conclusions

In this paper, a fast obstacle detection algorithm based on 3D LiDAR and multiple depth cameras is proposed to improve the effectiveness and real-time capability of obstacle detection for UGVs. The fused point cloud is first filtered by an efficient ground point cloud segmentation method based on multi-plane fitting and plane combination, which ensures that ground points are not misidentified as obstacles. Subsequently, the distribution of the remaining points is counted by vertical projection, and the fan-shaped area retrieval finds the points belonging to real obstacles based on an adaptive threshold function. Experimental results demonstrate the effectiveness and real-time capability of the proposed algorithm. Compared with several state-of-the-art algorithms, the proposed algorithm can efficiently detect the ground point cloud and complete the separation operation based on multi-plane fitting and combination. Owing to the efficient point cloud analysis method, the proposed algorithm can effectively avoid false detections caused by noise and detect obstacles of different sizes at various distances. Specifically, the proposed algorithm achieves an over 90% detection rate within a 20 m sensing area, with an average processing time of just 14.1 ms per frame.
However, the proposed algorithm still has some shortcomings. When faced with an unstructured multi-plane environment, it has difficulty filtering out all the ground points and is likely to regard some of them as obstacles. Furthermore, because of the projection-based retrieval mechanism, thin and hollow objects are likely to be overlooked. In future work, the authors will address these issues, for example by adding height information to the projection statistics and designing a more robust method for detecting the number of ground planes.

Author Contributions

Conceptualization, F.P. and Y.C.; data curation, Y.L. and Z.L.; formal analysis, X.S. and X.X.; funding acquisition, Y.L. and M.L.; investigation, Z.L. and X.X.; methodology, F.P.; project administration, M.L.; resources, Z.L. and X.S.; software, F.P., Y.C. and X.S.; supervision, M.L.; validation, Y.C.; writing—original draft, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Jiangsu Provincial Department of Science and Technology’s Frontier-Leading Technology Basic Research (BK20232031).

Data Availability Statement

The data are only available upon request to the corresponding author.

Conflicts of Interest

Author Yibo Wu was employed by the company Jiangsu Jitri Intelligent Manufacturing Technology Institute Co., Ltd., Nanjing 211899, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Xiang, C.; Feng, C.; Xie, X.; Shi, B.; Lu, H.; Lv, Y.; Yang, M.; Niu, Z. Multi-Sensor Fusion and Cooperative Perception for Autonomous Driving: A Review. IEEE Intell. Transp. Syst. Mag. 2023, 15, 36–58. [Google Scholar] [CrossRef]
  2. Pang, F.; Luo, M.; Xu, X.; Tan, Z. Path tracking control of an omni-directional service robot based on model predictive control of adaptive neural-fuzzy inference system. Appl. Sci. 2021, 11, 838. [Google Scholar] [CrossRef]
  3. Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review. Sensors 2021, 21, 2140. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, R.; Yang, S.; Tang, W.; Yuan, J.; Chan, Q.; Yang, Y. Multi-Task Environmental Perception Methods for Autonomous Driving. Sensors 2024, 24, 5552. [Google Scholar] [CrossRef]
  5. Nowakowski, M.; Kurylo, J. Usability of Perception Sensors to Determine the Obstacles of Unmanned Ground Vehicles Operating in Off-Road Environments. Appl. Sci. 2023, 13, 4892. [Google Scholar] [CrossRef]
  6. Sánchez, C.M.; Zella, M.; Capitán, J.; Marrón, P.J. From Perception to Navigation in Environments with Persons: An Indoor Evaluation of the State of the Art. Sensors 2022, 22, 1191. [Google Scholar] [CrossRef] [PubMed]
  7. Cao, M.; Wang, J. Obstacle Detection for Autonomous Driving Vehicles with Multi-Lidar Sensor Fusion. J. Dyn. Syst. Meas. Control. 2020, 142, 021007. [Google Scholar] [CrossRef]
  8. Yu, X.; Marinov, M. A Study on Recent Developments and Issues with Obstacle Detection Systems for Automated Vehicles. Sustainability 2020, 12, 3281. [Google Scholar] [CrossRef]
  9. Douillard, B.; Underwood, J.; Melkumyan, N.; Singh, S.; Vasudevan, S.; Brunner, C.; Quadros, A. Hybrid elevation maps: 3D surface models for segmentation. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; pp. 1532–1538. [Google Scholar]
  10. Himmelsbach, M.; Hundelshausen, F.V.; Wuensche, H.J. Fast Segmentation of 3D Point Clouds for Ground Vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium, San Diego, CA, USA, 21–24 June 2010. [Google Scholar]
  11. Asvadi, A.; Peixoto, P.; Nunes, U. Detection and Tracking of Moving Objects Using 2.5D Motion Grids. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Gran Canaria, Spain, 1–15 September 2015; pp. 788–793. [Google Scholar]
  12. Narksri, P.; Takeuchi, E.; Ninomiya, Y.; Morales, Y.; Akai, N.; Kawaguchi, N. A Slope-robust Cascaded Ground Segmentation in 3D Point Cloud for Autonomous Vehicles. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 497–504. [Google Scholar]
  13. Zermas, D.; Izzat, I.; Papanikolopoulos, N. Fast Segmentation of 3D Point Clouds: A Paradigm on Lidar Data for Autonomous Vehicle Applications. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–7 June 2017; pp. 5067–5073. [Google Scholar]
  14. Lim, H.; Oh, M.; Myung, H. Patchwork: Concentric Zone-Based Region-Wise Ground Segmentation with Ground Likelihood Estimation Using a 3D LiDAR Sensor. IEEE Robot. Autom. Lett. 2021, 6, 6458–6465. [Google Scholar] [CrossRef]
  15. Lee, S.; Lim, H.; Myung, H. Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 13276–13283. [Google Scholar]
  16. Wen, H.; Liu, S.; Liu, Y.; Liu, C. DipG-Seg: Fast and Accurate Double Image-Based Pixel-Wise Ground Segmentation. IEEE Trans. Intell. Transport. Syst. 2024, 25, 5189–5200. [Google Scholar] [CrossRef]
  17. Gao, F.; Li, C.; Zhang, B. A Dynamic Clustering Algorithm for Lidar Obstacle Detection of Autonomous Driving System. IEEE Sens. J. 2021, 21, 25922–25930. [Google Scholar] [CrossRef]
  18. Jiang, W.; Song, C.; Wang, H.; Yu, M.; Yan, Y. Obstacle Detection by Autonomous Vehicles: An Adaptive Neighborhood Search Radius Clustering Approach. Machines 2023, 11, 54. [Google Scholar] [CrossRef]
  19. Peng, P.; Pan, J.; Zhao, Z.; Xi, M.; Chen, L. A Novel Obstacle Detection Method in Underground Mines Based on 3D LiDAR. IEEE Access 2024, 12, 106685–106694. [Google Scholar] [CrossRef]
  20. Nan, Z.; Zhu, G.; Zhang, X.; Lin, X.; Yang, Y. A Novel High-Precision Railway Obstacle Detection Algorithm Based on 3D LiDAR. Sensors 2024, 24, 3148. [Google Scholar] [CrossRef]
  21. Oh, M.; Jung, E.; Lim, H.; Song, W.; Hu, S.; Lee, E.M.; Park, J.; Kim, J.; Lee, J.; Myung, H. TRAVEL: Traversable ground and above-ground object segmentation using graph representation of 3D LiDAR scans. IEEE Robot. Autom. Lett. 2022, 7, 7255–7262. [Google Scholar] [CrossRef]
  22. Zamanakos, G.; Tsochatzidis, L.; Amanatiadis, A.; Pratikakis, I. A comprehensive survey of Lidar-based 3D object detection methods with deep learning for autonomous driving. Comput. Graph. 2021, 99, 153–181. [Google Scholar] [CrossRef]
  23. Alaba, S.Y.; Ball, J.E. A survey on deep-learning-based Lidar 3d object detection for autonomous driving. Sensors 2022, 22, 9577. [Google Scholar] [CrossRef]
  24. Fraga-Lamas, P.; Ramos, L.; Mondéjar-Guerra, V.; Fernández-Caramés, T.M. A Review on IoT Deep Learning UAV Systems for Autonomous Obstacle Detection and Collision Avoidance. Remote. Sens. 2019, 11, 2144. [Google Scholar] [CrossRef]
  25. Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499. [Google Scholar]
  26. Zhang, Y.; Zhou, Z.; David, P.; Yue, X.; Xi, Z.; Gong, B.; Foroosh, H. PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9598–9607. [Google Scholar]
  27. Qin, J.; Sun, R.; Zhou, K.; Xu, Y.; Lin, B.; Yang, L.; Chen, Z.; Wen, L.; Wu, C. Lidar-Based 3D Obstacle Detection Using Focal Voxel R-CNN for Farmland Environment. Agronomy 2023, 13, 650. [Google Scholar] [CrossRef]
  28. Engelcke, M.; Rao, D.; Wang, D.Z.; Tong, C.H.; Posner, I. Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1355–1361. [Google Scholar] [CrossRef]
  29. Qi, C.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  30. Qi, C.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in A Metric Space. Adv. Neural Inf. Process. Syst. 2017, 30, 5099–5108. [Google Scholar]
  31. Lee, Y.; Park, S. A deep learning-based perception algorithm using 3d Lidar for autonomous driving: Simultaneous segmentation and detection network (ssadnet). Appl. Sci. 2020, 10, 4486. [Google Scholar] [CrossRef]
  32. Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3D object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11784–11793. [Google Scholar]
  33. Li, Y.; Li, W.; Darwish, W.; Tang, S.; Hu, Y.; Chen, W. Improving Plane Fitting Accuracy with Rigorous Error Models of Structured Light-Based RGB-D Sensors. Remote. Sens. 2020, 12, 320. [Google Scholar] [CrossRef]
  34. Jian, Z.; Lu, Z.; Zhou, X.; Lan, B.; Xiao, A.; Wang, X.; Liang, B. Putn: A plane-fitting based uneven terrain navigation framework. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 7160–7166. [Google Scholar]
  35. Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An open-source Robot Operating System. In Proceedings of the ICRA Workshop on Open Source Software, Kobe, Japan, 12–17 May 2009; Volume 3, p. 5. [Google Scholar]
  36. Li, J.; Gao, W.; Wu, Y.; Liu, Y.; Shen, Y. High-quality indoor scene 3D reconstruction with RGB-D cameras: A brief review. Comput. Vis. Media 2022, 8, 369–393. [Google Scholar] [CrossRef]
  37. Babbel, B.J.; Olsen, M.J.; Che, E.; Leshchinsky, B.A.; Simpson, C.; Dafni, J. Evaluation of uncrewed aircraft systems’ LiDAR data quality. ISPRS Int. J. Geo-Inf. 2019, 8, 532. [Google Scholar] [CrossRef]
  38. Xu, D.; Tian, Y. A comprehensive survey of clustering algorithms. Ann. Data Sci. 2015, 2, 165–193. [Google Scholar] [CrossRef]
Figure 1. Overall process of the proposed algorithm.
Figure 1. Overall process of the proposed algorithm.
Drones 08 00676 g001
Figure 2. The schematic diagram of coordinate transformation. T C i L represents the relative pose relationship from the i-th depth camera to the LiDAR sensor.
Figure 2. The schematic diagram of coordinate transformation. T C i L represents the relative pose relationship from the i-th depth camera to the LiDAR sensor.
Drones 08 00676 g002
Figure 3. The schematic diagram of ground point cloud segmentation.
Figure 3. The schematic diagram of ground point cloud segmentation.
Drones 08 00676 g003
Figure 4. Schematic diagram of fan-shaped area retrieval.
Figure 4. Schematic diagram of fan-shaped area retrieval.
Drones 08 00676 g004
Figure 5. The onboard sensors. (a) Leishen C32W LiDAR; (b) Orbbec DaBai DCW2; (c) Orbbec Dabai MAX.
Figure 6. The cleaning UGV equipped with these sensors.
Figure 7. The vertical view of the fused point cloud in the main coordinate system (warehouse).
Figure 8. The vertical view of the fused point cloud in the main coordinate system (parking).
Figure 9. Ground segmentation results of Patchwork++ in the warehouse environment. Red represents the ground point cloud and green represents the non-ground point cloud.
Figure 10. Ground segmentation results of DipG-Seg in the warehouse environment. Red represents the ground point cloud and green represents the non-ground point cloud.
Figure 11. Ground segmentation results of the proposed method in the warehouse environment. Red represents the ground point cloud and green represents the non-ground point cloud.
Figure 12. Ground segmentation results of Patchwork++ in the parking environment. Red represents the ground point cloud and green represents the non-ground point cloud.
Figure 13. Ground segmentation results of DipG-Seg in the parking environment. Red represents the ground point cloud and green represents the non-ground point cloud.
Figure 14. Ground segmentation results of the proposed method in the parking environment. Red represents the ground point cloud and green represents the non-ground point cloud.
Figure 15. Detailed views of the ground segmentation results of the proposed algorithm. (a) Warehouse; (b) parking.
Figure 16. The vertical view of the obstacle detection results using Euclidean clustering (warehouse).
Figure 17. The vertical view of the obstacle detection results using CenterPoint (warehouse).
Figure 18. The vertical view of the obstacle detection results using the proposed algorithm with the smaller hyperparameter settings (warehouse).
Figure 19. The vertical view of the obstacle detection results using the proposed algorithm with the larger hyperparameter settings (warehouse).
Figure 20. The vertical view of the obstacle detection results using Euclidean clustering (parking).
Figure 21. The vertical view of the obstacle detection results using CenterPoint (parking).
Figure 22. The vertical view of the obstacle detection results using the proposed algorithm with the smaller hyperparameter settings (parking).
Figure 23. The vertical view of the obstacle detection results using the proposed algorithm with the larger hyperparameter settings (parking).
Table 1. Comparison of evaluation indicators for different algorithms in ground segmentation (warehouse).

Algorithm      Precision (%)   Recall (%)   F1 Score (%)   Runtime (ms)
Patchwork++    75.3            30.6         43.5           22.4
DipG-Seg       48.7            19.3         27.6           17.7
Proposed       88.2            92.5         90.3           11.3
Table 2. Comparison of evaluation indicators for different algorithms in ground segmentation (parking).

Algorithm      Precision (%)   Recall (%)   F1 Score (%)   Runtime (ms)
Patchwork++    63.4            28.1         38.9           23.0
DipG-Seg       52.7            26.2         35.0           18.1
Proposed       94.4            95.3         94.8           11.6
Table 3. Comparison of evaluation indicators in different algorithms for smaller objects at close range (warehouse).

Algorithm                            Precision (%)   Recall (%)   F1 Score (%)   Runtime (ms)
Euclidean Clustering                 75.3            95.1         84.0           124.2
CenterPoint                          94.4            12.3         21.8           1263.7
Proposed (Smaller Hyperparameter)    91.2            93.7         92.4           2.8
Proposed (Larger Hyperparameter)     80.9            94.1         87.0           2.9
Table 4. Comparison of evaluation indicators in different algorithms for smaller objects at close range (parking).

Algorithm                            Precision (%)   Recall (%)   F1 Score (%)   Runtime (ms)
Euclidean Clustering                 87.1            96.4         91.5           119.7
CenterPoint                          96.8            16.7         28.5           1425.9
Proposed (Smaller Hyperparameter)    92.2            95.5         93.8           2.6
Proposed (Larger Hyperparameter)     88.4            95.9         92.0           2.8
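For reference, the F1 scores in Tables 1–4 follow the standard definition F1 = 2 · Precision · Recall / (Precision + Recall). For example, CenterPoint's combination of high precision and low recall in Table 3 (94.4% and 12.3%) yields 2 × 94.4 × 12.3 / (94.4 + 12.3) ≈ 21.8%, which explains its low F1 score despite the accuracy of the detections it does make.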
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
