Height-Variable Monocular Vision Ranging Technology For Smart Agriculture


Received 11 June 2023, accepted 14 August 2023, date of publication 17 August 2023, date of current version 1 September 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3305964

Height-Variable Monocular Vision Ranging Technology for Smart Agriculture
TIAN GAO 1, MEIAN LI 1, LIXIA XUE 2, JINGWEN BAO 1, HAO LIAN 1, TIN LI 1, AND YANYU SHI 1
1 Computer and Information Engineering College, Inner Mongolia Agricultural University, Hohhot 010000, China
2 Inner Mongolia Power Group Mengdian Information and Communication Industry Company, Hohhot 010000, China

Corresponding author: Meian Li ([email protected])


This work was supported in part by the Inner Mongolia Natural Science Foundation ‘‘Agricultural machinery obstacle avoidance
algorithms based on CNN and vehicle monocular vision dynamic ranging model’’ under Grant 2019MS06016, in part by the Research and
Innovation Funding Projects for Postgraduates in Inner Mongolia ‘‘Vehicle camera self-calibration technology based on target recognition
and geometric ranging model’’ under Grant SZ20200077, and in part by the Inner Mongolia Natural Science Foundation ‘‘Spatial
invariance method of feature point for multi-objective individual identification of livestock’’ under Grant 2023LHMS06012.

ABSTRACT Smart agriculture utilizes a variety of advanced technologies to promote sustainable agriculture and provide solutions for intelligent, automated and unmanned farming. Agricultural robots and related technologies are an important part of smart agriculture, and autonomous navigation is a core function of autonomous agricultural robots, which rely on information about the distance of obstacles in a scene to support decision making. In this paper, we propose a ground point geometric ranging model that can be used in scenarios where the camera height changes dynamically; the method is validated by model derivation and hypothesis testing. The model combines ranging and camera calibration, compensating the distortion and defocus caused by the camera's nonlinear imaging into the focal length, and completes the parameter calibration using a small amount of real ground-point distance data. In this paper, the YOLOv8 model is used to identify and range outdoor cattle, and the experimental results show that the lowest ranging accuracy of this method reaches 95%. The method eliminates the dependence of focal length calibration on camera height in ranging models and in practice requires only one focal length calibration for permanent use, achieving a significant reduction in the complexity of focal length calibration and making the model transferable to scenarios where the camera height changes.

INDEX TERMS Monocular vision, distance measurement, focal length calibration, smart agriculture,
YOLOv8.

I. INTRODUCTION
Agriculture is the source of human food and clothing, the basis of survival, and the primary condition for all production. It provides food, industrial raw materials, capital and export goods for the other sectors of the national economy. Agriculture is an important industrial sector of the national economy, supporting its construction and development with land resources as the object of production. In the development of agriculture, trends toward intellectualization, socialization, internationalization, commercialization, capitalization, scaling-up, specialization, regionalization, factory-style production and other positive factors are intertwined and integrated. ''Intelligent Agriculture'' can significantly improve the efficiency of agricultural production and management: it is based on accurate modern sensor technology for real-time monitoring, uses cloud computing, data mining and other technologies for multi-level analysis, and links the resulting instructions with various control equipment to complete agricultural production and management. Computer vision collects the necessary visual data about crops, livestock or farms, enabling us to identify, detect and track specific objects using visual elements, and to understand complex visual data for automated tasks. Over the past decades, expert and intelligent systems based on computer vision technology have been widely used in agricultural operations such as seed quality analysis [1], soil analysis [2], plant health analysis [3] and yield estimation [4].

The associate editor coordinating the review of this manuscript and approving it for publication was Md. Moinul Hossain.


In smart agriculture, distance measurement also has important research value in areas such as automatic driving of agricultural vehicles and terrain mapping of farmland.

The purpose of this paper is to investigate ranging techniques that can be applied in low-cost agriculture, achieving the ranging of objects that have contact points with the ground, such as livestock and crops. Based on the team's previous research results, we propose a ground point geometric ranging model that can be used in scenarios where the camera height changes dynamically; the method is validated by model derivation and hypothesis testing. The model combines ranging and camera calibration, compensating the distortion and defocus caused by the camera's nonlinear imaging into the focal length, and completes the parameter calibration using a small amount of real ground-point distance data. In this paper, the YOLOv8 model is used to identify and range outdoor cattle, and the experimental results show that the lowest ranging accuracy of this method reaches 95%.

II. LITERATURE SURVEY
Commonly used distance measurement methods include radar distance measurement [5], laser distance measurement [6], and visual distance measurement. The radar ranging system has high accuracy, but the equipment is costly and vulnerable to weather; the laser ranging system is low cost, but requires a clean environment free of foreign objects such as dust; the visual ranging system is low cost, has no special requirements for the use environment, and is more suitable than the other methods for the complex outdoor environment of farms.

Vision ranging systems can be classified into monocular systems and stereo systems [7]. In a stereo system it is necessary to match the pictures captured by multiple cameras. When the distance from the camera to the scene is much greater than the baseline of the stereo camera (the distance between the two cameras), stereo vision degrades to monocular vision. The monocular vision system does not need to match images in the data preprocessing stage and its hardware cost is low [8], so it is a good choice for obtaining the depth information of an image target. At present, the commonly used monocular vision ranging methods include deep learning-based ranging methods and geometric model-based ranging methods. Among the deep learning-based methods, Qi et al. [9] proposed a supervised learning approach that utilizes two networks to estimate the depth map and surface normals from single images. The two networks perform depth-to-normal and normal-to-depth conversion and collaboratively increase the accuracy of the depth map and surface normals. Although their neural network can increase the accuracy of depth maps, training requires ground truth that includes surface normals, which are hard to obtain. Luo et al. [10] proposed a semi-supervised learning approach, showing that the monocular depth estimation problem can be reformulated as two sub-problems: a view synthesis procedure followed by stereo matching. They corroborated that applying geometric limitations during inference may significantly increase efficiency and performance while decreasing the network's requirement for labeled ground truth depth data. Masoumian et al. [11] proposed an unsupervised learning approach: a multi-scale monocular depth estimation based on a graph convolutional network. Their network consists of two parallel autoencoder networks, DepthNet and PoseNet. DepthNet is an autoencoder composed of an encoder and a decoder; the CNN encoder extracts features from the input image, and a multi-scale GCN decoder estimates the depth map. PoseNet is used to estimate the ego-motion vector between two consecutive frames. The estimated 3D pose and depth map are used to construct a target image. This approach achieves a high prediction accuracy of 89% on the public KITTI and Make3D datasets, with a 40% reduction in the number of trainable parameters. Compared with deep learning-based ranging methods, geometric model-based ranging methods require fewer training parameters and less ground truth depth data, and they transfer more easily across scenes.

Camera calibration is the basis of visual ranging, and geometric model-based ranging methods need to obtain the calibration parameters of the camera first, so the main camera calibration methods are introduced here before the geometry-based ranging models. The main purpose of camera calibration is to determine the relationship between the 3D information of points on an object's surface and the corresponding image points by constructing a linear imaging model of the camera. There are three classes of camera calibration methods: the traditional camera calibration method, the active vision camera calibration method, and the camera self-calibration method [12], [13]. The traditional camera calibration method is applicable to any camera and has high accuracy, but requires a calibration object and a complex algorithm. The traditional calibration method proposed by Zhang [14] uses a calibration plate composed of two-dimensional squares: it acquires pictures of the calibration plate in different poses, extracts the pixel coordinates of corner points in the pictures, calculates the initial values of the internal and external parameters of the camera through the homography matrix, estimates the distortion coefficients using nonlinear least squares, and finally optimizes the parameters using maximum likelihood estimation. Li et al. [15] presented a calibration method that uses feature extraction techniques to encode feature points containing features at many different scales; it can be used for internal and external calibration of multi-camera systems, as well as for internal calibration of individual cameras. The active vision-based camera calibration method [16] relies on an active system that controls the camera to make specific motions and take multiple sets of images, and solves the internal and external parameters of the camera from the image information and the known displacement changes. This calibration method does not require a calibration object, and the algorithm is simple and robust, but it needs to be equipped with an accurate control platform, so the cost is high.


The camera self-calibration method based on Kruppa [17] establishes constraint equations on the camera's internal parameter matrix through a quadratic curve and uses at least 3 pairs of images to calibrate the camera. The length of the image sequence affects the stability of the calibration algorithm, and the plane at infinity in the projective space cannot be guaranteed. The camera self-calibration method does not need a calibration object, has strong flexibility and can be calibrated online, but its accuracy is low and its robustness is poor.

Among the geometry-based ranging models, the method proposed by Mao et al. [18] measures the distance of an object that has a contact point with the ground and whose height is known. The exact distance can be obtained from similar triangles when the vertex is imaged at the center of the image plane; when the vertex is not imaged at the center of the image plane, an approximate solution of the distance is obtained. The disadvantage of this method is that the object needs to be perpendicular to the ground, and the approximate solution only shows higher accuracy when the object distance is much larger than the camera height, or the camera height is much larger than the object height. The ranging model proposed by Martínez-Díaz [19] does not need to consider the camera pose, but needs to know the relative distances between three points in the image, including the measured point, so its application scenarios are limited. Most current ranging methods based on geometric ranging models do not take into account the impact of the camera's nonlinear imaging on the model, and they require known spatial information, such as the height of the measured object or the distances between feature points, during the ranging process.

The calibration method proposed by Xue et al. [20] links the geometric ranging model and the focal length calibration together, and differs from the previous calibration idea of correcting the pixel coordinates for distortion. In this method, the distortion and defocusing phenomena caused by the nonlinear imaging of the camera are innovatively compensated into the focal length of each pixel point, so the focal length of the camera is not considered a constant value; instead, the expression of the focal length of a pixel point in terms of its physical coordinates is obtained by polynomial regression. That paper proposes a ranging model for a camera tilted in three dimensions, and experiments prove that the ranging accuracy under this calibration method reaches more than 97%. In the discussion part of the article, Xue uses experimental data to point out that the focal length regression vector v calculated at different camera heights is not the same, but does not further investigate the effect of camera height changes on the focal length of pixel points. Therefore, the method needs to be calibrated separately for each installation height, which cannot meet the needs of height-change scenarios.

In a word, this paper is a follow-up study to the calibration method of Xue, and aims to illustrate the effect of camera installation height on the focal length calibration results of pixel points in this method. Since any measured point on the same ray passing through the camera optical center has the same pixel coordinates, it is conjectured that the focal length value of a pixel point is independent of the camera height. In this paper, we will demonstrate through theoretical deduction and hypothesis testing that, under the calibration method proposed by Xue, a change of camera height does not affect the focal length of a pixel point; that is, the focal length calibration result can be applied at any camera height with a fixed camera tilt angle, thus reducing the number of focal length calibrations and expanding the use scenarios of the ranging method. The main innovations of this study are as follows: 1) A geometric ranging model combined with camera calibration is presented, and it is demonstrated through model derivation and hypothesis testing that the method remains valid permanently after a single calibration. This allows the model to be used in ranging scenarios where the camera height varies arbitrarily. 2) Combining the latest YOLOv8 target detection model with the ranging model proposed in this paper, a lowest ranging accuracy of more than 95% is achieved in cattle ranging on outdoor farms.

III. MATERIALS AND METHODS
A. ESTABLISH GEOMETRIC MODEL AND INFERENCE PROOF
In the geometric model proposed by Xue, the distance from a point on the ground plane to the camera is:

d = \frac{H\,(x^2 + f^2 - y f \tan\alpha)}{\sqrt{x^2 + f^2}\,(f \tan\alpha + y)}    (1)

As shown in (1), H is the height of the camera, x and y are the physical coordinates of the measured point, and α is the downward tilt of the camera. The focal length f of the pixel point corresponding to the measured point in the image can be found when d is known. This calibration method compensates the distortion and defocusing caused by the nonlinear imaging of the camera into the focal length, so the focal length of each pixel point is not the same. This can be explained as follows: this aberration-reduction method considers that the light reflected from the measured point remains linearly imaged after passing through the optical centre, but different imaging points correspond to different positions on the imaging plane, and therefore different imaging points have different focal length values f. The focal length at point P1 is noted f1 and the focal length at point P2 is noted f2.

In Fig.1, point O is the location of the camera, and the distances of point O from plane 1 and plane 2 are H1 and H2. P1 and P2 are the intersections of a ray passing through point O with plane 1 and plane 2. Since P1 and P2 lie on the same ray passing through the optical center, the projection points of P1 and P2 on the image plane are the same, both being P′(P′X, P′Y). d1 and d2 are the distances of points P1 and P2 from the camera, respectively.
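To make (1) concrete, here is a minimal Python sketch of the ranging model and of the per-pixel focal-length calibration it implies. The function names, the numeric values and the use of scipy.optimize.brentq are our own illustrative choices (the paper's experiments use MATLAB), and the root-finding bracket must be chosen to straddle the solution:

```python
import numpy as np
from scipy.optimize import brentq

def ground_distance(H, x, y, f, alpha):
    """Distance d from the camera to a ground point, Eq. (1).
    H: camera height (m); (x, y): physical image-plane coordinates of the
    pixel (m); f: focal length assigned to that pixel (m); alpha: downward
    tilt of the camera (rad)."""
    t = np.tan(alpha)
    return H * (x**2 + f**2 - y * f * t) / (np.sqrt(x**2 + f**2) * (f * t + y))

def calibrate_focal_length(H, x, y, d_true, alpha, f_lo=0.002, f_hi=0.02):
    """Given the true distance d_true of one ground point (e.g. from a laser
    rangefinder), solve Eq. (1) for the focal length of its pixel."""
    return brentq(lambda f: ground_distance(H, x, y, f, alpha) - d_true,
                  f_lo, f_hi)  # bracket [f_lo, f_hi] must contain the root

# Hypothetical example: one point observed at H = 0.646 m with a 35° tilt.
f_pix = calibrate_focal_length(H=0.646, x=0.001, y=-0.001,
                               d_true=1.72, alpha=np.radians(35))
print(f"calibrated per-pixel focal length: {f_pix:.4f} m")  # about 0.004
```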


FIGURE 1. Ranging models at different camera heights.

Since d1 ∥ d2 and △OO1P1 ∼ △OO2P2, the following is obtained from the properties of similar triangles:

\frac{d_1}{d_2} = \frac{OO_1}{OO_2} = \frac{H_1}{H_2} = \frac{H_1 \cdot \dfrac{P_X'^2 + f_1^2 - P_Y' f_1 \tan\alpha}{\sqrt{P_X'^2 + f_1^2}\,(f_1 \tan\alpha + P_Y')}}{H_2 \cdot \dfrac{P_X'^2 + f_2^2 - P_Y' f_2 \tan\alpha}{\sqrt{P_X'^2 + f_2^2}\,(f_2 \tan\alpha + P_Y')}}    (2)

After (2) is simplified, f1 = f2 is obtained, so the focal lengths of P1 and P2 are equal.
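The reduction step can also be checked numerically. Writing d = H · g(f), where g(f) is the height-independent factor of (1), the equality d1/d2 = H1/H2 in (2) requires g(f1) = g(f2); since g is strictly monotonic over the physically relevant focal-length range, this forces f1 = f2. The following sketch (our own check with hypothetical pixel coordinates, not code from the paper) verifies the monotonicity:

```python
import numpy as np

def g(f, px, py, alpha):
    # Height-independent factor of Eq. (1)/(2): d = H * g(f).
    t = np.tan(alpha)
    return (px**2 + f**2 - py * f * t) / (np.sqrt(px**2 + f**2) * (f * t + py))

px, py, alpha = 0.001, -0.001, np.radians(35)   # assumed physical coordinates
fs = np.linspace(0.002, 0.02, 2000)             # plausible focal-length range
vals = g(fs, px, py, alpha)
# g is strictly decreasing on this interval, so g(f1) = g(f2) only if f1 = f2.
assert np.all(np.diff(vals) < 0)
```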
According to the above proof, the focal length values and physical coordinates of P1 and P2 are equal. Since light travels in a straight line, when a ray intersects a plane in space, it must also intersect any other plane parallel to that plane. So for any point P1 on plane 1 we can find a point P2 on plane 2 with the same focal length value and physical coordinates as P1.

In order to obtain the focal length values of different pixel points, Xue uses polynomial regression to learn the relationship between the focal length f of a pixel point and its physical coordinates (x, y). The physical coordinates are the independent variables and the focal length is the dependent variable; (3) and the regression vector V are obtained through polynomial regression:

f(x, y) = v_{00} + v_{10}\,x + v_{01}\,y + v_{11}\,xy + v_{02}\,y^2 + \cdots    (3)

And because the regression vector V in this calibration method is related only to the physical coordinates and focal length values of the pixel points, and any parallel plane contains a point with the same physical coordinates and focal length value, the same set of regression vectors can be applied to parallel planes. It can thus be proved that in Xue's calibration method the calibration result of the focal length is not affected by a height transformation of the camera.

B. HYPOTHESIS TESTING
Hypothesis testing, also known as statistical hypothesis testing, is a statistical inference method used to judge whether differences between samples, or between samples and the population, are caused by sampling error or by essential differences. Significance testing is one of the most commonly used methods in hypothesis testing, and it is also the most basic form of statistical inference. The basic principle is to first make an assumption about a characteristic of the population, and then infer whether this assumption should be rejected or accepted through statistical inference on samples. Commonly used hypothesis testing methods include the Z-Test, T-Test, Chi-square Test, F-Test, etc.

In this paper, the focal length values of multiple pixel points at different heights are obtained from the shooting information, and the relationship between the focal length values of the same pixel points at different camera heights is investigated by a Paired-Samples T Test. The basic steps are as follows: (a) The original hypothesis is proposed: the population means of the focal lengths of the pixel points at different heights have no significant difference, expressed as H0: µ1 = µ2, where µ1 and µ2 are the population means of the two paired samples. (b) Construct the statistic:

t = \frac{\bar{d} - (\mu_1 - \mu_2)}{s / \sqrt{n}}    (4)

where d̄ is the mean of the differences between the two paired samples and µ1 − µ2 is the difference between the two population means. (c) Calculate the observed value of the test statistic and the corresponding probability P. (d) Given the significance level α, compare it with the probability P of the test statistic. If P is less than the significance level α, the original hypothesis should be rejected and the means of the two populations are significantly different; that is, the transformation of the camera height has a significant impact on the focal length of the pixel points. On the contrary, if P is greater than the significance level α, the original hypothesis should not be rejected and there is no significant difference between the two population means; that is, the transformation of the camera height has no significant effect on the focal length of the pixel points.

In order to reflect the reliability of the conclusion from a statistical point of view, the experiment uses SPSS software to perform the Paired-Samples T Test on multiple sets of focal length data. The original hypothesis is that there is no significant difference between two sets of focal length data at different heights; the confidence level is set at 95% (significance level α = 0.05). This experiment uses a 1920×1080 pixel zoom camera; the camera attitude is set to a downward inclination α = 35° and a left inclination β = 0°; and five camera heights are randomly selected as H1 = 0.515m, H2 = 0.568m, H3 = 0.646m, H4 = 0.756m, and H5 = 0.809m. The experiment obtains the shooting information of 30 obstacles at the different camera heights; in each shot, the pixel coordinates of the 30 obstacles are kept unchanged. The distribution of the randomly selected 30 pixel points is shown in Fig.2. The distance d from each obstacle to the camera is obtained by laser rangefinder, and the focal length f of each pixel point at the five heights is then calculated by (1). Since H3 is the middle value of the five heights, the experiment explores the relationship between the focal length value of each pixel point when the camera height is H3 and the focal length value of the same pixel point at the other camera heights. The experiment uses MATLAB2022 to calculate the focal length values and the focal length regression vectors.
FIGURE 2. Distribution of pixel points.
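The SPSS procedure described above can be reproduced with any statistics package; the snippet below is a minimal sketch of the same Paired-Samples T Test using scipy, with synthetic focal-length arrays standing in for the measured data of Fig.5 and Table 1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
f_H3 = 0.0040 + 2e-4 * rng.standard_normal(30)   # 30 pixel points at height H3
f_H1 = f_H3 + 1e-5 * rng.standard_normal(30)     # same pixels at another height

t_stat, p_value = stats.ttest_rel(f_H3, f_H1)    # Paired-Samples T Test
alpha_sig = 0.05
if p_value < alpha_sig:
    print(f"p = {p_value:.3f} < {alpha_sig}: reject H0, heights differ")
else:
    print(f"p = {p_value:.3f} >= {alpha_sig}: keep H0, no significant difference")
```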

C. RANGING VERIFICATION
In order to reflect the reliability of the conclusion in terms of distance measurement accuracy, the regression vectors V at the different heights H1-H5 are first obtained by polynomial regression and noted V1-V5; then the obstacle distances obtained using the regression vector corresponding to each height are compared with the obstacle distances obtained using the single regression vector V3; finally the distance errors of the two methods are evaluated. The experiment uses MATLAB2022 to calculate the focal length values, the focal length regression vectors and the distances to the objects. This experiment uses the error metric function

\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{\mathrm{Real}(t) - \mathrm{Pred}(t)}{\mathrm{Real}(t)}\right|    (5)

as the error evaluation index, because MAPE considers not only the error between the predicted value and the true value but also the ratio between the error and the true value, which makes it suitable for the evaluation of ranging accuracy.
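A compact Python equivalent of this verification step is sketched below (the paper itself uses MATLAB2022): the regression of (3) is fitted by ordinary least squares and the error metric of (5) is applied to the predicted distances. The second-order polynomial basis is an assumption, since the term list in (3) is truncated in the text:

```python
import numpy as np

def design(x, y):
    # Polynomial basis of Eq. (3), truncated at second order (assumed).
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_focal_regression(x, y, f):
    """Least-squares fit of Eq. (3); returns the regression vector V."""
    V, *_ = np.linalg.lstsq(design(x, y), f, rcond=None)
    return V

def mape(real, pred):
    """Eq. (5): mean absolute percentage error of the ranged distances."""
    real, pred = np.asarray(real), np.asarray(pred)
    return np.mean(np.abs((real - pred) / real))

# Usage: V_h = fit_focal_regression(x_h, y_h, f_h) for each height h, then
# compare mape(d_true, d_from_V_h) against mape(d_true, d_from_V3).
```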
Our method is intended for low-cost agricultural application scenarios, aiming to measure the distance of objects such as livestock and crops that have a point of contact with the ground, enabling intelligent grazing by unmanned farming machines. In order to verify the transferability of the ranging model in a real-world scenario, we conducted experiments on an outdoor farm.

In practice, we need to identify the object to be measured and then complete the distance measurement based on its output position information. This paper uses the official YOLOv8 open source code (https://github.com/ultralytics/ultralytics) for target detection. YOLOv8, a major update published by Ultralytics on January 10, 2023, supports image classification, object detection and real-time segmentation tasks.

The core features and changes of the YOLOv8 algorithm are summarized as follows:
1. A new SOTA model is provided, including P5 640 and P6 1280 resolution target detection networks and YOLACT-based instance segmentation models; models of different sizes based on scaling factors are also provided for different scenarios at the N/S/M/L/X scales.
2. Backbone: the Backbone and Neck sections replace the C3 structure of YOLOv5 with the more gradient-rich C2f structure and adjust the number of channels for the different scale models to improve performance.
3. Head: the head is replaced by the now-mainstream decoupled-head structure, which separates the classification and detection heads; the design also changed from anchor-based to anchor-free.
4. Loss: YOLOv8 abandons the previous IoU matching and unilateral proportional distribution methods in favor of the Task-Aligned Assigner for positive/negative sample matching, and introduces Distribution Focal Loss (DFL).
5. Train: the data augmentation part of training adopts the practice from YOLOX of disabling Mosaic augmentation during the last 10 epochs, which effectively improves accuracy.

Fig.3 shows a schematic of the network structure of YOLOv8. Fig.4 shows the complete ranging process and calibration method, including a comparison before and after the improvement.

A zoom camera with 1920×1080 pixels was used for the experiment, and the camera pose was set to a downward tilt. The experiments used a laser rangefinder to obtain the actual distance of the measured point, which was used to verify the accuracy of the ranging results. The experimental steps are as follows: 1) calibrate the focal length of the image plane at a camera height of 1.40m, and obtain the focal length regression vector over the physical coordinates of all imaging points; 2) adjust the camera height to 1.50m and acquire the images of the measured points; 3) output the distance measurement results, compare them with the distances acquired by the laser rangefinder, and output the distance measurement accuracy. The experiment uses MATLAB2022 to calculate the focal length value of the measured point, and Python 3.9 for cattle identification and ranging.

The pre-trained YOLOv8 model downloaded from the official website already includes the recognition of cattle, so we directly input the subject images into the pre-trained YOLOv8 model for target detection and position extraction.
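The detection-and-ranging step can be sketched with the official ultralytics package as follows. The image name is hypothetical, and the conversion of the pixel point to a physical coordinate and then to a distance (via the calibrated per-pixel focal length and the ranging model (1)) is only indicated in comments:

```python
from ultralytics import YOLO  # official YOLOv8 implementation

model = YOLO("yolov8n.pt")            # pre-trained COCO weights include 'cow'
results = model("farm_image.jpg")     # hypothetical farm photograph

for box in results[0].boxes:
    if model.names[int(box.cls)] != "cow":
        continue
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    # Ground contact point: middle of the bottom edge of the bounding box
    # (the middle value of the two coordinates in contact with the ground).
    u, v = (x1 + x2) / 2.0, y2
    # (u, v) is then mapped to physical image-plane coordinates, the per-pixel
    # focal length is read from the regression of Eq. (3), and Eq. (1) yields
    # the distance from the camera to the cow.
    print(f"cow ground point at pixel ({u:.0f}, {v:.0f})")
```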
IV. RESULTS
A. FOCAL LENGTH VALUE HYPOTHESIS TESTING RESULTS
F(H1)-F(H5) in Fig.5 are the focal lengths of the 30 pixel points obtained using (1) at camera heights H1-H5, and Fig.6 shows the regression models at the five heights. The Paired-Samples T Test results are shown in Table 1.


Since the probability P of every paired test is greater than 0.05, the mean difference between the pixel-point focal lengths obtained at height H3 and those obtained at the other camera heights is not statistically significantly different from 0.

B. RANGING VERIFICATION RESULTS
Table 2 shows the shooting information of two randomly selected pixel points, where (x, y) are the pixel coordinates of the obstacle, d is the actual distance between the obstacle and the camera, DCG is the distance from the obstacle to the camera calculated using the regression vector corresponding to the camera height, and DEG is the distance from the obstacle to the camera calculated using the regression vector V3. The calculation gives MAPE(DCG) = 0.025 and MAPE(DEG) = 0.035; the experimental data show that the additional error caused by using the regression vector V3 for ranging at different heights is only 1%.

FIGURE 3. Structure of YOLOv8.

FIGURE 4. Ranging process and calibration method.

Fig.7 shows the result of the subject recognition. From the four position coordinates extracted for each identified object, we take the middle value of the two coordinates in contact with the ground as the position coordinate of the measured object and pass it to the distance measurement model to calculate the final ranging result. Since the position of the front hoof of each cow was chosen as the measurement point when measuring the distances of the actual cattle with the laser rangefinder, the position output by the YOLOv8 model has a drift from the actual measurement point, which can cause some ranging error. Table 3 shows the ranging accuracy of the two positioning methods: manually extracting the front hooves of the cows, and using YOLOv8 to output the positions of the cows. According to these results, the ranging accuracy of positioning with YOLOv8 is slightly lower than that of manual extraction. Among the cows, measured object No. 4 decreased the most; the reason is that the positions of the other three measured objects output by YOLOv8 basically overlapped with the positions of the cows' front hooves, while the position of No. 4 output by YOLOv8 differed greatly from the position of the cow's front hoof, reducing the ranging accuracy.

The experimental results show that our ranging method still has high ranging accuracy in outdoor agricultural environments.

V. DISCUSSION
A. PIXEL POINTS OF MULTIPLE CAMERA HEIGHTS JOINTLY CALIBRATE THE FOCAL LENGTH
In Sec. III-A, it is demonstrated that the calibration results of the pixel-point focal lengths are not affected by a camera height transformation. This not only means that the calibration of the pixel-point focal lengths can be done at a single camera height when the camera tilt angle is constant, but also implies that the calibration results are unaffected when pixel points from multiple camera heights are used together to calibrate the focal length. In order to verify the correctness of this conclusion, the focal length is first calibrated using the shooting information of the 30 pixel points at the single camera height of 0.646m, and the calibration result is recorded as F(fixH); the focal length is then calibrated using the shooting information of the same 30 pixel points at the five camera heights together, and the result is recorded as F(mixH).


TABLE 1. Paired samples test of pixel focal length values at different camera heights.

TABLE 2. Shooting information of pixel points (448,417) and (880,453) at five camera heights.

FIGURE 5. Focal length values of 30 pixel points at different camera heights.

FIGURE 6. Pictures a-e are the regression models when the camera height is H1-H5.

A Paired-Samples T Test was conducted on the two groups of focal length calibration results, with the original hypothesis that there was no significant difference between the two groups of focal length data (α = 0.05); the test results are shown in Table 4. The two sets of calibration results are then used to measure the distances of the 30 obstacles captured at the camera height of 0.646m, and the related data and distance measurement results are shown in Table 5, where (x, y) are the pixel coordinates of the obstacles, d is the actual distance between the obstacles and the camera, D1 is the distance obtained when the focal length of the pixel point is F(fixH), and D2 is the distance obtained when the focal length of the pixel point is F(mixH).
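Joint calibration over several heights amounts to pooling the (x, y, f) samples from all heights into a single least-squares fit of (3). The sketch below contrasts the two calibrations with synthetic, height-independent data (our own illustration; since Sec. III-A shows the focal-length field does not depend on H, pooling should leave the regression essentially unchanged):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-0.002, 0.002, size=(5, 30))   # physical coords, 5 heights
y = rng.uniform(-0.002, 0.002, size=(5, 30))
f = 0.004 + 0.1 * x**2 + 0.2 * y**2            # synthetic focal-length field

def design(x, y):
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

# F(fixH): regression from the single middle height H3 only.
V_fix, *_ = np.linalg.lstsq(design(x[2], y[2]), f[2], rcond=None)
# F(mixH): one regression over the pooled samples of all five heights.
V_mix, *_ = np.linalg.lstsq(design(x.ravel(), y.ravel()), f.ravel(), rcond=None)

print(np.allclose(V_fix, V_mix, atol=1e-9))    # True: heights add no shift
```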

B. RANGING THE SAME OBSTACLE AT MULTIPLE CAMERA HEIGHTS
The research in the Results part and Sec. V-A of this paper is based on the analysis of the shooting information of fixed pixels. In the actual ranging environment, most application scenarios involve ranging specific obstacles in the field of view, so experiments are still needed to verify whether the conclusions of this study hold when ranging the same obstacle at multiple camera heights.

FIGURE 7. Pixel coordinates of the measured object in the farm.

In the experiment, 30 fixed obstacles are distributed in the range of 40-50m from the camera; the camera height is varied between 1.39m and 1.41m in steps of 2mm, and the camera downward tilt angle is set to 7 degrees. The regression vectors obtained by (3) at the 11 camera heights are denoted V1-V11. Fig.8 shows the range accuracy for the obstacles at the 11 camera heights calculated using the regression vector corresponding to each camera height: the average range accuracy of this method is 98.9% and the lowest is 96%. Fig.9 shows the range accuracy for the obstacles at the 11 camera heights calculated using the single regression vector V6: the average range accuracy is 98.6% and the lowest is 95.8%. The experimental results show that there is no significant difference between the two focal length calibration methods in the actual ranging environment, which verifies that the calibration method proposed by Xue can achieve high-accuracy ranging in height-transformed scenes.
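For completeness, a hypothetical evaluation loop for this experiment, comparing per-height regression vectors V1-V11 against the single middle vector V6 (the distances here are synthetic; the real reference values come from the laser rangefinder):

```python
import numpy as np

def accuracy(real, pred):
    # Per-height range accuracy, taken here as 1 - MAPE over the obstacles.
    return 1.0 - np.mean(np.abs(real - pred) / real, axis=-1)

rng = np.random.default_rng(2)
d_true = rng.uniform(40.0, 50.0, size=(11, 30))  # 11 heights x 30 obstacles
d_own = d_true * (1 + 0.011 * rng.standard_normal((11, 30)))  # own vector Vh
d_v6 = d_true * (1 + 0.014 * rng.standard_normal((11, 30)))   # fixed vector V6

print(accuracy(d_true, d_own))   # per-height curve, as in Fig.8
print(accuracy(d_true, d_v6))    # per-height curve, as in Fig.9
```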


TABLE 3. Range results for the four cows (labeled from left to right).

TABLE 4. Paired samples test of focal length calibration results at single and multiple camera heights.

TABLE 5. Focal length calibration and ranging at single camera height and multiple camera heights.

FIGURE 8. Range accuracy calculated using the regression vectors corresponding to the camera heights for 30 obstacles at 11 camera heights.

C. COMPARE WITH OTHER METHODS
This section compares the method proposed in this paper with the following three ranging methods. 1) The calibration method proposed by Zhang [14]: the experiment first calibrates the camera to obtain the focal length and the radial aberration parameters; the distance to the measured point is then obtained by substituting the distortion-corrected physical coordinates and a fixed focal length value into the distance measurement model. 2) The distance measurement method proposed by Martínez-Díaz [19]: since this method requires as input the distances between any three feature points on the surface of the target, the experiment first marks the feature points on the target and measures the distances between them; and because the method proposed in this paper ranges ground points, a point in contact with the ground is included among the selected feature points and its distance to the camera is calculated. 3) The ranging method proposed by Xue et al. [20]: this paper is an in-depth study based on Xue's method, aiming to clarify the effect of camera height on camera calibration and ranging results, so a comparison with Xue's method is also needed.

The experiments used a 1920×1080 pixel zoom camera, with the camera pose set to a downward tilt angle of 7 degrees. A laser rangefinder was used to obtain the actual distance to the measured point, which was used to verify the accuracy of the distance measurement results. The camera was first calibrated at a height of 1.4m, then adjusted to a height of 1.5m, and the targets were photographed and ranged.


In this experiment, 30 fixed obstacles were placed within a range of 5-20m from the camera, and the accuracy on the 30 obstacles was compared across the four ranging methods, including the method studied in this paper. Fig.10 shows the ranging accuracy of the 30 measured points.

TABLE 6. Comparison of ranging methods.

FIGURE 9. Range accuracy calculated using regression vector V6 for 30 obstacles at 11 camera heights.

FIGURE 10. Accuracy comparison of four ranging methods.

Table 6 shows the results of the multi-faceted comparison of the four ranging methods. It can be seen that Xue's method has the highest lowest range accuracy, but only her method needs to be recalibrated when the camera height changes. Saúl Martínez-Díaz's method can only be used to range objects with known distances between surface feature points, so its usage scenarios are more limited. The method of Zhang requires only a one-time calibration and places no limitation on the usage scenarios, but its calibration part is not closely combined with the ranging model, so its ranging accuracy is not high. The improved ranging method in this paper reduces the lowest range accuracy by 0.31% compared to Xue, but it eliminates the dependence of focal length calibration on camera height and in practice requires only one focal length calibration for permanent use, achieving a significant reduction in the complexity of focal length calibration and making the model transferable to scenarios where the camera height changes.

VI. CONCLUSION
In this paper, we propose a ground point geometric ranging model that can be used in scenarios where the camera height changes dynamically; the method is validated by model derivation and hypothesis testing. The model combines ranging and camera calibration, compensating the distortion and defocus caused by the camera's nonlinear imaging into the focal length, and completes the parameter calibration using a small amount of real ground-point distance data. In this paper, the YOLOv8 model is used to identify and range outdoor cattle, and the experimental results show that the lowest ranging accuracy of this method reaches 95%. Compared with the previously proposed ranging method, the lowest ranging accuracy is reduced by only 0.31%, trading a very small loss of ranging accuracy for a reduction in calibration complexity and for transferability of the model to scenarios where the camera height changes. In the future, we will apply this conclusion to dynamic scene ranging to provide an effective ground data acquisition method for farm data collection and the establishment of ''Intelligent Agriculture''. Applying this method to data collection from unmanned patrol vehicles on farms can avoid the need to establish fixed ground monitoring systems, increase the flexibility and diversity of data collection areas, and achieve automatic, accurate and real-time ground data collection.

REFERENCES
[1] S. Zhu, L. Zhou, P. Gao, Y. Bao, Y. He, and L. Feng, ''Near-infrared hyperspectral imaging combined with deep learning to identify cotton seed varieties,'' Molecules, vol. 24, no. 18, p. 3268, Sep. 2019.
[2] R. Azadnia, A. Jahanbakhshi, S. Rashidi, M. Khajehzadeh, and P. Bazyar, ''Developing an automated monitoring system for fast and accurate prediction of soil texture using an image-based deep learning network and machine vision system,'' Measurement, vol. 190, Feb. 2022, Art. no. 110669.


[3] A. Abdalla, H. Cen, L. Wan, K. Mehmood, and Y. He, ''Nutrient status diagnosis of infield oilseed rape via deep learning-enabled dynamic model,'' IEEE Trans. Ind. Informat., vol. 17, no. 6, pp. 4379–4389, Jun. 2021.
[4] O. E. Apolo-Apolo, J. Martínez-Guanter, G. Egea, P. Raja, and M. Pérez-Ruiz, ''Deep learning techniques for estimation of the yield and size of citrus fruits using a UAV,'' Eur. J. Agronomy, vol. 115, Apr. 2020, Art. no. 126030.
[5] L. Piotrowsky, S. Kueppers, T. Jaeschke, and N. Pohl, ''Distance measurement using mmWave radar: Micron accuracy at medium range,'' IEEE Trans. Microw. Theory Techn., vol. 70, no. 11, pp. 5259–5270, Nov. 2022.
[6] L. Lombardi, V. Annovazzi-Lodi, G. Aromataris, and A. Scirè, ''Distance measurement by delayed optical feedback in a ring laser,'' Opt. Quantum Electron., vol. 54, no. 5, p. 270, May 2022.
[7] A. P. Nugroho, M. A. N. Fadilah, A. Wiratmoko, Y. A. Azis, A. W. Efendi, L. Sutiarso, and T. Okayasu, ''Implementation of crop growth monitoring system based on depth perception using stereo camera in plant factory,'' IOP Conf. Ser., Earth Environ. Sci., vol. 542, no. 1, Jul. 2020, Art. no. 012068.
[8] P. Ferrara, A. Piva, F. Argenti, J. Kusuno, M. Niccolini, M. Ragaglia, and F. Uccheddu, ''Wide-angle and long-range real time pose estimation: A comparison between monocular and stereo vision systems,'' J. Vis. Commun. Image Represent., vol. 48, pp. 159–168, Oct. 2017.
[9] X. Qi, R. Liao, Z. Liu, R. Urtasun, and J. Jia, ''GeoNet: Geometric neural network for joint depth and surface normal estimation,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 283–291.
[10] Y. Luo, J. Ren, M. Lin, J. Pang, W. Sun, H. Li, and L. Lin, ''Single view stereo matching,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 155–163.
[11] A. Masoumian, H. A. Rashwan, S. Abdulwahab, J. Cristiano, M. S. Asif, and D. Puig, ''GCNDepth: Self-supervised monocular depth estimation based on graph convolutional network,'' Neurocomputing, vol. 517, pp. 81–92, Jan. 2023, doi: 10.1016/j.neucom.2022.10.073.
[12] Q.-T. Luong and O. D. Faugeras, ''Self-calibration of a moving camera from point correspondences and fundamental matrices,'' Int. J. Comput. Vis., vol. 22, no. 3, pp. 261–289, Mar. 1997.
[13] W. Dong and V. Isler, ''A novel method for the extrinsic calibration of a 2D laser rangefinder and a camera,'' IEEE Sensors J., vol. 18, no. 10, pp. 4200–4211, May 2018.
[14] Z. Zhang, ''A flexible new technique for camera calibration,'' IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, Nov. 2000.
[15] B. Li, L. Heng, K. Koser, and M. Pollefeys, ''A multiple-camera system calibration toolbox using a feature descriptor-based calibration pattern,'' in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Nov. 2013, pp. 1301–1307.
[16] Y. Xu, F. Gao, Z. Zhang, and X. Jiang, ''A calibration method for non-overlapping cameras based on mirrored absolute phase target,'' Int. J. Adv. Manuf. Technol., vol. 104, nos. 1–4, pp. 9–15, Sep. 2019.
[17] J. Li, Y. Yang, and G. Fu, ''Camera self-calibration method based on GA-PSO algorithm,'' in Proc. IEEE Int. Conf. Cloud Comput. Intell. Syst., Sep. 2011, pp. 149–152.
[18] M. Jiafa, H. Wei, and S. Weiguo, ''Target distance measurement method using monocular vision,'' IET Image Process., vol. 14, no. 13, pp. 3181–3187, Nov. 2020.
[19] S. Martínez-Díaz, ''3D distance measurement from a camera to a mobile vehicle, using monocular vision,'' J. Sensors, vol. 2021, pp. 1–8, Apr. 2021.
[20] L. Xue, M. Li, L. Fan, A. Sun, and T. Gao, ''Monocular vision ranging and camera focal length calibration,'' Sci. Program., vol. 2021, pp. 1–15, Jul. 2021.

TIAN GAO was born in Ulanqab, China, in 1997. She received the B.S. degree in information and computing science from Inner Mongolia Normal University in 2019 and pursued the M.S. degree in computer application technology from 2020 to 2023. Her research interests include computer vision ranging and smart agriculture.

MEIAN LI was born in Dazhu, Sichuan, China, in 1973. He received the Ph.D. degree in computer systems architecture from the University of Electronic Science and Technology, in 2007. He has presided over one emergency project of the National Natural Science Foundation of China and one project of the Inner Mongolia Natural Science Foundation, and has participated in two projects of the National Natural Science Foundation of China and two projects of the Inner Mongolia Natural Science Foundation. He has published more than 30 articles, including nearly 20 SCI- and EI-indexed articles. His current research interests include data intelligence and software engineering, particularly theoretical methods and applications of vehicle monocular vision.

LIXIA XUE was born in Ulanqab, China, in 1997. She received the bachelor's degree in information management and information systems and the master's degree in computer application technology from Inner Mongolia Agricultural University in 2019 and 2022, respectively. She is currently working as a Project Manager at Inner Mongolia Power Group Mengdian Information and Communication Industry Company. Her research interests include computer vision and unmanned driving.

JINGWEN BAO was born in China, in 1998. She majored in information management and information systems at Inner Mongolia Agricultural University, receiving the management degree in 2021, and received the master's degree in electronic information in 2023. Her research focuses on computer vision.

HAO LIAN was born in Ulanqab, China, in 2000. He received the Bachelor of Engineering degree in 2022 and is currently pursuing the master's degree at Inner Mongolia Agricultural University. His research interests include computer vision and deep learning.

TIN LI was born in China, in 1999. She holds the bachelor's degree in information management and information systems (big data) from Inner Mongolia Agricultural University, where she is currently pursuing the master's degree in computer technology. Her research focuses on the field of computer vision.

YANYU SHI was born in Ulanqab, China, in 1999. He received the bachelor's degree in computer science and technology from Inner Mongolia Agricultural University in 2022, where he is currently pursuing the master's degree in computer technology. His research interests include intelligent animal husbandry and computer vision.
