3.1. Experiment on Equidistant Outward-Expanded Polygon Post-Processing Algorithm
The effectiveness of the fusion recognition and ranging algorithm depends on the recognition enhancement provided by the equidistant expansion polygons of the leaves, the ranging accuracy of the optimal confidence point cloud, and the ranging enhancement provided by the 2D projection clustering of the stems. This section uses a corn physical model to verify the recognition enhancement of the equidistant expansion polygon. First, the YOLOv8 model is used to train a corn leaf and stalk recognition model that can recognize corn leaves and stalks in real time.
During training, the YOLOv8l model is used. This is one of the larger models in the YOLO series, offering relatively high accuracy at the cost of lower detection speed. Since the emasculating robot operates at low speed in the field and the industrial computer is equipped with a GPU, the l model is a suitable choice for more accurate recognition. The training parameters strongly affect the recognition accuracy of the resulting model; after tuning, epochs is set to 300 and batch is set to 8. The training results are shown in
Figure 8. The model's precision, recall, and average precision are 92.1%, 91.4%, and 94.9%, respectively, which is sufficient to support effective recognition of corn leaves and stalks.
To simulate, in the depth camera's view, the different growth positions and morphologies of real corn leaves, a robotic arm is used to grasp the tip of a leaf, as shown in
Figure 9. The robotic arm’s gripper faces the base of the leaf, and its initial position depends on the natural initial position of the tip of the leaf. The gripper is controlled to rotate around the leaf’s main vein by 15° per rotation and to translate radially along the stem by 1 cm per movement.
When the shape of the leaves in the image is relatively uniform and the change of the projection area in the X and Y directions is relatively gentle, YOLOv8 can obtain a high recognition rate, as shown in
Figure 10a. However, when the projection area in the X and Y directions changes greatly, that is, when the leaves are curved, the recognition rate of YOLOv8 will decrease, and a single leaf may be mistaken for multiple leaves, as shown in
Figure 10b. The use of equidistant polygon expansion for post-processing will cause the contours to overlap, and the overlapping parts of the contours can be used to cluster the same leaf, as shown in
Figure 10c. According to the above post-processing principle, three groups of tests were set up using three leaves, and the rotation angle and translation distance of the manipulator gripper of each test group are shown in
Table 1. The results of the three groups of tests are shown in
Figure 11, where the recognition completeness of YOLOv8 and of the equidistant expansion polygon post-processing are expressed as the identification completeness percentage (ICP): the proportion of frames, among the 500 frames before and after the current moment, in which the leaf is completely recognized. In the three groups of tests, the highest recognition completeness reached 95.89%, and the average recognition completeness was 95.61%. It can be seen that the equidistant polygon expansion post-processing algorithm enhances the integrity of the recognition results and plays a vital role in improving measurement accuracy.
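The clustering step above can be sketched as follows, using axis-aligned bounding boxes as a stand-in for the mask contour polygons. Each detected fragment is expanded outward by a distance d; fragments whose expanded outlines overlap are merged into one leaf via union-find. The box format and the expansion distance are illustrative assumptions, not the exact contour representation used in the paper.

```python
def expand(box, d):
    """Expand an axis-aligned box (x0, y0, x1, y1) outward by d on all sides."""
    x0, y0, x1, y1 = box
    return (x0 - d, y0 - d, x1 + d, y1 + d)

def overlaps(a, b):
    """Axis-aligned overlap test between two boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cluster_fragments(boxes, d):
    """Group fragment boxes whose d-expanded outlines overlap (union-find)."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    grown = [expand(b, d) for b in boxes]
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(grown[i], grown[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two nearby fragments of one curved leaf plus one distant, separate leaf:
fragments = [(0, 0, 10, 10), (12, 0, 20, 10), (60, 60, 70, 70)]
print(cluster_fragments(fragments, d=3))  # fragments 0 and 1 merge into one leaf
```

Fragments 0 and 1 are 2 pixels apart, so a 3-pixel expansion makes them overlap and they cluster as one leaf, while the distant fragment stays separate.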
When corn leaves are relatively sparse, meaning the leaves do not overlap, the equidistant expansion polygon post-processing algorithm effectively re-identifies discontinuous leaf segments caused by factors such as lighting or curling as a single leaf. However, when the corn leaves are dense and overlapping occurs, this algorithm may mistakenly cluster overlapping leaves in the 2D projection as a single leaf. To address this issue, depth camera point cloud data is utilized, and the Euclidean distance clustering algorithm is applied to determine whether the point clouds within the leaf contour generated by the post-processing algorithm belong to the same leaf. If the point clouds within the contour meet the Euclidean distance clustering criteria, meaning that the variation in distance between adjacent point clouds is within a certain range, the point clouds can be considered to belong to the same leaf, indicating that the contour encloses a single leaf. Conversely, if a sufficient number of point clouds exhibit a sudden change in distance, failing to meet the Euclidean distance clustering criteria, it can be determined that the point clouds within the contour generated by the post-processing algorithm do not belong to the same leaf. In such cases, the non-primary point clouds and the feature points of the contour polygon are removed, retaining only the contour of the primary identified leaf.
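The single-leaf check described above can be sketched as a greedy Euclidean clustering over the 3D points inside a post-processed contour: if all points fall into one cluster under a distance tolerance, the contour is treated as a single leaf; otherwise only the primary cluster would be kept. The tolerance value and point format are illustrative assumptions.

```python
from math import dist

def euclidean_clusters(points, tol):
    """Region-growing Euclidean clustering: a point joins a cluster if it
    lies within `tol` of any point already in that cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited if dist(points[i], points[j]) <= tol]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

def is_single_leaf(points, tol=0.05):
    """True if every point inside the contour belongs to one cluster."""
    return len(euclidean_clusters(points, tol)) == 1

# One smooth leaf surface vs. two overlapping leaves at different depths (m):
near_leaf = [(0.0, 0.0, 1.00), (0.01, 0.0, 1.01), (0.02, 0.0, 1.02)]
far_patch = [(0.0, 0.0, 1.60), (0.01, 0.0, 1.61)]
print(is_single_leaf(near_leaf))              # True
print(is_single_leaf(near_leaf + far_patch))  # False: depth jump splits clusters
```

The sudden depth jump between the two leaves violates the adjacency criterion, so the contour is flagged as enclosing more than one leaf.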
3.2. Experiment on Optimal Confidence Point Cloud Distance Measurement
Since the ranging accuracy of the LiDAR and the depth camera is stable at the centimeter level, while that of a laser displacement sensor is stable at the millimeter level, we first use a BLF-200NM-485 laser displacement sensor to measure the relative distances of corn leaf and stalk features. The sensor has a detection range of 100 mm–2000 mm, a resolution of 1 mm, and a ranging accuracy of 2 mm, higher than that of the LiDAR and the depth camera. Since the laser displacement sensor can only perform high-precision measurements of single-point distances, a dual-axis X-Z slide table was constructed as shown in
Figure 12a,b. The laser displacement sensor was fixed on the slider of the
Z-axis slide table, enabling high-precision measurements of discrete curved surfaces within the plane.
Using the corn model for experimentation, as shown in
Figure 13, the setup involves controlling
X- and
Z-axis slide table motors via RS485. The
X-axis motor drives the screw and slider at a speed of 5 mm/s. When reaching the limit position or zero position of the
Z-axis slide table, the
Z-axis motor moves the slider upward by 2 mm, enabling discrete scanning with the laser displacement sensor. After scanning, a set of points including corn features is obtained. Points outside the corn features are filtered out for downsampling, and the point cloud data is imported into MATLAB to obtain results as shown in
Figure 14a, primarily illustrating the scan results of the leaves. Due to significant variations in leaf morphology, individual distance points cannot fully reflect the overall characteristics of the corn leaves. Therefore, the mean of effective distance points in the depth direction is used as the distance value of the leaves, and the α-shape is employed to describe the leaf shape, as depicted in
Figure 14b.
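The leaf-distance estimate described above can be sketched as filtering the depth readings inside the leaf region to the sensor's effective range (discarding dropout and background returns) and averaging them. The range limits follow the BLF-200NM-485 detection range quoted above; the invalid-return values in the example are illustrative assumptions.

```python
def leaf_distance(depths, lo=100.0, hi=2000.0):
    """Mean of effective depth readings (mm) within the sensor's range.
    Readings outside [lo, hi] are treated as invalid returns."""
    effective = [d for d in depths if lo <= d <= hi]
    if not effective:
        raise ValueError("no effective points in range")
    return sum(effective) / len(effective)

# A discrete scan line over a leaf; 0.0 and 2450.0 are invalid returns:
scan = [412.0, 415.0, 418.0, 0.0, 2450.0, 414.0]
print(leaf_distance(scan))  # 414.75
```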
The α-shape algorithm is used to extract boundary shapes (generalized convex hulls) from point cloud data [
23]. Based on the convex hull generation method of Delaunay triangulation, this algorithm effectively extracts shapes with convex hull characteristics from discrete point cloud data. The basic idea of the α-shape algorithm is to control a parameter α to determine the shape of the convex hull. Different values of α yield different shapes of the convex hull. Specifically, when α is small, the shape of the convex hull closely resembles the original point cloud data, whereas a larger α results in a smoother shape of the convex hull [
24].
In the α-shape algorithm, Delaunay triangulation is first performed. Delaunay triangulation divides a set of points into non-overlapping triangles and has favorable properties for describing adjacency relationships between points. After obtaining the Delaunay triangulation, the boundary edges can be filtered by the following condition: for each edge, if there exists a circle of radius α that passes through both endpoints of the edge and contains no other points of the set, then the edge belongs to the α-shape boundary.
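One common formulation of this boundary test, in 2D, checks the two circles of radius α passing through an edge's endpoints for emptiness. The sketch below implements only that per-edge test (the Delaunay step is omitted); the point set and α value in the example are illustrative assumptions.

```python
from math import dist, sqrt

def alpha_boundary_edge(p, q, points, alpha):
    """α-shape edge test: (p, q) is a boundary edge if some circle of
    radius `alpha` through p and q contains no other point of the set."""
    d = dist(p, q)
    if d > 2 * alpha:
        return False  # no radius-alpha circle can pass through both points
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    # Unit vector perpendicular to the edge:
    ux, uy = -(q[1] - p[1]) / d, (q[0] - p[0]) / d
    h = sqrt(alpha * alpha - (d / 2) ** 2)  # center offset from the midpoint
    for cx, cy in ((mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)):
        if all(dist((cx, cy), r) >= alpha - 1e-9
               for r in points if r not in (p, q)):
            return True  # this circle is empty of other points
    return False

# Square with a center point: the side is a boundary edge, the spoke is not.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
print(alpha_boundary_edge((0.0, 0.0), (1.0, 0.0), square, alpha=0.6))  # True
print(alpha_boundary_edge((0.0, 0.0), (0.5, 0.5), square, alpha=0.6))  # False
```

As the text notes, a small α keeps the boundary close to the raw points, while a large α empties more circles and yields a smoother hull-like boundary.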
There are libraries in MATLAB and the Point Cloud Library (PCL) that implement the α-shape algorithm, facilitating its fast invocation across different platforms. After obtaining the convex hull shape of the leaves, the leaves are projected onto a plane perpendicular to the line connecting the maize axis and the robot axis. The extreme points of the projected leaves in the vertical and horizontal directions relative to this plane are collected as feature points Q for subsequent tests.
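The feature-point extraction just described can be sketched as projecting leaf points onto the plane perpendicular to the maize-to-robot line and taking the extreme points in that plane's horizontal and vertical directions as Q. The coordinate convention (x, y in the ground plane, z up) and the viewing direction are illustrative assumptions.

```python
from math import hypot

def feature_points(points, view_dir):
    """Project 3D points onto the plane perpendicular to `view_dir`
    (a 2D ground-plane direction) and return the 4 extreme points Q."""
    dx, dy = view_dir
    n = hypot(dx, dy)
    dx, dy = dx / n, dy / n

    def u(p):  # signed horizontal coordinate in the projection plane
        return -dy * p[0] + dx * p[1]

    def v(p):  # vertical coordinate (height)
        return p[2]

    return {
        "left":   min(points, key=u),
        "right":  max(points, key=u),
        "bottom": min(points, key=v),
        "top":    max(points, key=v),
    }

# Leaf points (m), viewed along the +x axis from the robot toward the maize:
leaf = [(1.0, 0.2, 1.1), (1.0, -0.3, 1.3), (1.05, 0.0, 1.6), (0.95, 0.1, 0.9)]
Q = feature_points(leaf, view_dir=(1.0, 0.0))
print(Q["top"], Q["bottom"])
```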
Based on the above data processing workflow, to further test the effectiveness of the algorithm when the robot operates in a maize field, a plane Cartesian coordinate system is established with the maize simulation model as the origin. Considering that the typical row spacing in real maize fields is often 60 cm, four measurement points are set at 30 cm horizontal distance from the maize, spaced 40 cm apart, as shown in
Figure 15. Using the laser displacement sensor, relative distances of maize leaf features are measured. The measurement results are presented in
Table 2.
Using the laser displacement sensor, accurate distances of maize leaves relative to the measurement points can be measured. To quantitatively analyze the actual ranging accuracy of the optimal confidence point cloud, an experimental prototype was constructed as shown in
Figure 16. The prototype has dimensions of 48 cm in length, 45 cm in width, and 42 cm in height. It uses a tracked chassis as the motion unit, with DC brushed motors driving the tracks on both sides. After reduction by a gear reducer, the maximum travel speed is 0.3 m/s and the minimum turning radius is 24 cm. The chassis motor driver, lithium battery, and transformer are placed on top of the robot chassis. The industrial computer, GNSS module, IMU, UWB positioning module, etc., are housed in the electronic control cabinet above the robot. The TM16 LiDAR is mounted above the electronic control cabinet. It is a 16-line multi-line LiDAR with a measurement range of 0.2~150 m; its ranging accuracy is ±5 cm within the range of 0.2–0.5 m and ±2 cm above 0.5 m. The Realsense D435 depth camera is mounted above the electronic control cabinet, below the LiDAR, and aligned symmetrically with it. The ideal ranging range of the depth camera is 0.3~3 m, the depth field of view is 87° × 58°, and the ranging error within 2 m is less than 2%. With UWB feedback, the chassis is controlled to move at a speed of 0.1 m/s along the dashed line in the diagram, and distance values obtained by the optimal confidence point cloud measurement are recorded at the four measurement points.
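The paper's exact "optimal confidence" weighting is defined elsewhere; as a generic sketch, the two sensors' ranges can be fused by inverse-variance weighting using the accuracies stated above (LiDAR ±5 cm below 0.5 m and ±2 cm beyond; depth camera error under 2% of range within 2 m). The sigma models below are therefore assumptions, not the authors' exact formulation.

```python
def lidar_sigma(r):
    """Assumed 1-sigma LiDAR error (m) from the datasheet figures above."""
    return 0.05 if r < 0.5 else 0.02

def camera_sigma(r):
    """Assumed 1-sigma depth camera error (m): 2% of range."""
    return 0.02 * r

def fuse(r_lidar, r_camera):
    """Inverse-variance weighted range estimate from both sensors."""
    w_l = 1.0 / lidar_sigma(r_lidar) ** 2
    w_c = 1.0 / camera_sigma(r_camera) ** 2
    return (w_l * r_lidar + w_c * r_camera) / (w_l + w_c)

# At short range the camera (smaller sigma) dominates the fused estimate;
# farther out the LiDAR (fixed ±2 cm) dominates instead.
print(fuse(0.40, 0.42))
print(fuse(1.60, 1.65))
```

This matches the qualitative behavior reported in the experiments: fused estimates track whichever sensor is more trustworthy at the current range.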
Distance values collected from point clouds on the leaves are converted into projection distances to obtain the point cloud average and the extreme values of the four convex hulls. The experimental setup is depicted in
Figure 17, and the measurement scenario is shown in
Figure 18. The experimental results are presented in
Table 3.
Comparing
Table 2 and
Table 3 reveals that the distance accuracy of the optimal confidence point cloud is lower when the distance is less than 30 cm, with an average ranging error of 3.1 cm and a maximum convex hull feature point error of 3.4 cm. Beyond 30 cm, accuracy improves significantly, with an average measurement error of 0.51 cm and a maximum convex hull feature point error of 0.25 cm. To assess the ranging accuracy of the optimal confidence point cloud algorithm, single sensors were used for comparison. With only the LiDAR, the mean ranging error was 3.5 cm and the maximum convex hull feature point error was 6.9 cm; with only the depth camera, the mean ranging error was 4.1 cm and the maximum convex hull feature point error was 4.7 cm. It can be seen that the ranging accuracy of the optimal confidence point cloud is superior to that of a single sensor, which verifies the effectiveness of the multi-sensor fusion solution.
The lower accuracy within 30 cm stems from the measurement characteristics of the sensors: both the depth camera and the LiDAR suffer a sharp drop in precision at close range, so even with multi-sensor fusion, distance measurement accuracy is difficult to guarantee. Beyond 30 cm, both sensors perform normally, and accuracy improves notably.
For the convex hull feature points at the same measurement location, the coarse scanning step of the BLF-200NM-485 laser displacement sensor means edge points may be missed, and the depth camera and LiDAR have poor edge segmentation capability, which often causes point cloud jitter at the edges. Natural wind can likewise cause the sensors' edge readings to jitter. The convex hull feature points therefore carry a certain error, but for corn field navigation operations, this centimeter-level error remains within the allowable range.
By utilizing a high-precision laser displacement sensor to scan and measure corn features, particularly leaves, high-accuracy distance measurements are obtained. This validates the effectiveness of the proposed fusion algorithm in distance measurement tests at the same test points.
3.3. FILL-DBSCAN Post-Processing Algorithm Test
The equidistant expansion polygon post-processing algorithm addresses the misidentification of leaves caused by leaf shape and lighting variations: it exploits the fact that the multiple masks of one misidentified leaf lie close together, using equidistant expansion polygons to connect and cluster them into a single, correctly recognized leaf. Unlike leaves, maize stalks are often split into multiple masks by leaf occlusion, and the distances between these masks within the same plant are much larger, so equidistant expansion polygons cannot connect them. Yet after such misidentification the apparent number of stalks increases significantly, complicating navigation operations. Therefore, the FILL-DBSCAN post-processing algorithm is used to cluster the multiple misidentified stalk masks.
Simulating the scenario where maize stalks are obstructed using a maize model, as shown in
Figure 19a, the stalk is divided into three segments. After recognition by YOLOv8, these segments are similarly identified as three segments, with substantial distances between the point clouds of these segments, as depicted in
Figure 19b. Projecting these three segments onto a plane and utilizing the DBSCAN algorithm achieves plane clustering, as shown in
Figure 19c, thereby enabling three-dimensional clustering of multiple segments of stalk point clouds from the same maize plant, as shown in
Figure 19d. The corn model was placed on a rotating platform and rotated in 5° increments, and clustering tests were conducted on the stalks at each angle. Of the 72 experimental groups, 69 were successfully identified and clustered; in three groups, clustering stability was low due to a large occlusion area and low identification accuracy. The FILL-DBSCAN post-processing algorithm achieved a 95.8% success rate in recognizing the same corn stalk under leaf occlusion, which is sufficient to meet actual navigation needs.
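The plane clustering step above can be sketched with a minimal DBSCAN over the stalk-segment points projected to the ground plane, so that segments split by leaf occlusion but belonging to one stalk fall into a single cluster. The eps and min_pts values, and the projected coordinates, are illustrative assumptions.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one cluster label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points))
                     if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise (may become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: claimed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = [k for k in range(len(points))
                    if dist(points[j], points[k]) <= eps]
            if len(nbrs) >= min_pts:  # core point: expand the cluster
                queue.extend(nbrs)
    return labels

# Projected segments of one occluded stalk (m), plus one distant outlier:
pts = [(0.00, 0.00), (0.01, 0.01), (0.02, 0.00), (0.01, -0.01), (1.50, 1.50)]
print(dbscan(pts, eps=0.05, min_pts=3))  # [0, 0, 0, 0, -1]
```

The four occlusion-split segments project to nearly the same ground position and receive one label, which is what allows the 3D stalk point clouds to be clustered back into a single plant.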
3.4. Corn Field Validation Test
To further verify the effect of the proposed algorithm in actual corn fields, a corn field in Zhangye, Gansu was selected for field testing. The actual test scene is shown in
Figure 20. Different from laboratory tests, the morphology of corn leaves in the natural state will change greatly within a day, and the natural wind will also affect the leaf morphology, resulting in obvious errors in the leaf measurement values before and after the test. The change in illumination will directly affect the recognition effect of YOLOv8 and the post-processing algorithm. Therefore, the test time needs to be strictly controlled to reduce the test error. A total of 500 corn leaves in different areas of the field were selected for equidistant expansion polygon post-processing tests. Similarly, the proportion of frames in which leaves were completely recognized in the 500 frames before and after post-processing was used to represent the recognition completeness value of YOLOv8 and the recognition completeness value of equidistant polygon expansion post-processing. The maximum recognition completeness of YOLOv8 was 95.3%, and the average was 80.4%. After post-processing, the maximum recognition completeness was 97.1%, and the average was 86.4%. It can be seen that in the real field environment, the equidistant expansion polygon post-processing algorithm effectively improves the integrity and accuracy of leaf recognition.
In the indoor test, it was clear that measurements within 30 cm were limited by the sensors' measurement characteristics and carried large errors. Therefore, in the corn field test, only leaves more than 30 cm from the test prototype were measured. Corn planting rows with flat terrain were selected, and the laser displacement sensor was used to perform discrete measurements on well-grown, measurement-relevant leaves on both sides of the rows. The laser displacement sensor measurements also serve as the reference for the optimal confidence point cloud measurement error. In total, 8 consecutive corn plants were selected between the rows, and 10 measurement sites were used. The test schematic is shown in
Figure 21, where the red dot is measurement site 1, and the 10 measurement sites are evenly distributed on the
y-axis with a spacing of 40 cm. For measurement site 1, plants 1 to 8 are the main measurement objects, and for measurement site 2, plants 3 to 10 are the main measurement objects. The measurement objects of measurement sites 3 to 10 are similar. During the test, UWB equipment is used to ensure the accuracy of the measurement site position.
The number of effective leaves in the 10 measurements was 308. The interface of the host computer Rviz during the test is shown in
Figure 22. To enhance clarity, the depth camera point cloud is not shown in the figure. The red outline is the post-processing result of the equidistant expansion polygon, i.e., the continuously identified leaves; the yellow outline is the stalk clustering result of the FILL-DBSCAN algorithm. After screening out objects beyond the measurement range and removing abnormal points, the laser displacement sensor results and the optimal confidence point cloud results are compared in terms of mean ranging error and convex hull feature point error. The optimal confidence point cloud mean ranging error is 2.9 cm, and the maximum convex hull feature point error is 6.3 cm. When only the LiDAR or the depth camera is used to measure the leaves, the mean ranging error is 8.8 cm and the convex hull feature point error is 9.1 cm, with the depth camera error growing as the measurement distance increases. It can be seen that the optimal confidence point cloud effectively improves the ranging accuracy for corn leaves. The main reason the accuracy of the optimal confidence point cloud in the corn field is lower than in the laboratory is that uneven ground reduces the coincidence between the measurement benchmarks of the laser displacement sensor and the test prototype; the field is also more exposed to natural wind and varying light, which further increase the measurement error.
Clustering was achieved in all 26 plants with the support of the FILL-DBSCAN algorithm. Unlike the laboratory tests, the test prototype measures each corn plant from different positions as it drives between the rows. Even if, at a given position, a stalk cannot be clustered because leaves occlude it, it can still be clustered once the prototype reaches a position with less occlusion. Measuring the same corn stalk from different positions effectively reduces the impact of leaf occlusion on corn stalk identification.