Article

Robust Object Detection Under Smooth Perturbations in Precision Agriculture

Nesma Talaat Abbas Mahmoud, Indrek Virro, A. G. M. Zaman, Tormi Lillerand, Wai Tik Chan, Olga Liivapuu, Kallol Roy and Jüri Olt

1 Institute of Computer Science, Faculty of Science and Technology, University of Tartu, 50090 Tartu, Estonia
2 Chair of Biosystems Engineering, Institute of Forestry and Engineering, Estonian University of Life Sciences, 51006 Tartu, Estonia
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
AgriEngineering 2024, 6(4), 4570–4584; https://doi.org/10.3390/agriengineering6040261
Submission received: 25 October 2024 / Revised: 23 November 2024 / Accepted: 27 November 2024 / Published: 29 November 2024
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)

Abstract

Machine learning algorithms are increasingly used to enhance agricultural productivity cost-effectively. A critical task in precision agriculture is locating a plant's root collar, which is required for the site-specific fertilization of the plants. Though state-of-the-art machine learning models achieve stellar performance in object detection, they are often sensitive to noisy inputs and variation in environment settings. In this paper, we propose an innovative smooth perturbation technique to improve the robustness of root collar detection using the YOLOv5 neural network model. We train a YOLOv5 model on blueberry image data for root collar detection. A small amount of noise is added as a smooth perturbation within a 50 × 50 region of the bounding box, and the perturbed image is used for training. Furthermore, we introduce an additional test set that represents the out-of-distribution (O.O.D.) case by applying Gaussian blur to the test images to simulate out-of-focus conditions. We use three different image datasets to train our model: (i) an Estonian blueberry dataset, (ii) a Serbian blueberry dataset, and (iii) a public dataset sourced from Roboflow, with sample sizes of 118, 2779, and 2993 images, respectively. We achieve an overall precision of 0.886 on perturbed blueberry images compared to 0.871 on original (unperturbed) images for the O.O.D. test set. Similarly, our smooth perturbation training achieves an mAP50 of 0.828, a significant increase over the 0.794 reached by normal training. These results show that the proposed smooth perturbation is an effective method for increasing the robustness and generalizability of object detection.

1. Introduction

Detecting the plant root collar position is an essential task for precision fertilization, pesticide spraying, watering, and similar operations. Precision agriculture relies on advancements in computer vision to enable tasks such as weed detection, crop monitoring, and autonomous navigation. Accurately locating a plant's roots and stems enables optimal usage of water, fertilizers, and pesticides. Initial research efforts used geometric methods to locate plant root collars [1]. Precision fertilization is a critical component of precision agriculture (smart farming) that aims to increase overall efficiency, sustainability, and productivity by utilizing the latest technology and data-driven decisions. This approach not only benefits farmers but also contributes to environmental sustainability. Machine learning models identify and locate the root collar position in plant images. After training, these models are deployed and integrated with agricultural robots to build an automated fertilization system. Object detection is a vital component of precision agriculture systems, serving as the artificial eyes that locate root collars in plant images [2]. Object detection identifies and localizes objects of interest in an image even against noisy, cluttered backgrounds; a machine learning model is trained on input images to solve this task. In this paper, our object of interest is the root collar. Locating the precise position of the root collar is critical, as misplaced fertilizer may inadvertently favor weed growth over crops. Weeds often exhibit faster and more efficient nutrient uptake than crop plants [3]. Precision fertilization is thus of utmost importance for favoring the growth of the plants of interest (blueberries) over unwanted weeds [4]. Preliminary results hint at a significant increase in productivity (approximately 2.25 times) and a substantial reduction in specific fertilizer costs (approximately 8.4 times) compared to portable spot fertilization. Similar work along this line combines a neural network model with K-means clustering and a Lagrange multiplier to estimate the fertilization rate [5].
This paper proposes a novel smooth perturbation method to increase the robustness and stability of blueberry root collar detection. The bounding box of the blueberry root collar is first perturbed with smooth, bounded noise at varying locations, and the perturbed image is then fed to the YOLOv5 machine learning model for training. The trained YOLOv5 model is used for the root collar detection task. The smooth noise added within the bounding box acts as a degree of anti-concentration for all possible input image data distributions. This low level of anti-concentration aids the search for YOLOv5's optimal neural weights during gradient descent optimization: the loss landscape of YOLOv5 has multiple optima, and our smooth perturbation noise helps escape local optima during gradient descent. From the computational complexity perspective of YOLOv5's learning algorithm, smooth perturbations push the guarantees closer to average-case than to worst-case [6]. We train our YOLOv5 model to predict the four bounding box coordinates of the blueberry root collar together with a confidence score for its prediction. The overall machine learning pipeline is shown in Figure 1. We evaluate the performance of our trained YOLOv5 model on custom and public blueberry image datasets under different noisy (perturbed) input distributions.
Ensuring the robustness of root collar detection under noisy and adverse conditions is important from both a theoretical and an implementation perspective. Perturbations such as noise, blur, and lighting variability inevitably enter agricultural image acquisition. Several camera factors, such as poor lighting conditions, high ISO settings, long exposure times, and heat, contribute to the noise. Good generalization in detecting root collar bounding boxes in unpredictable agricultural environments is necessary for the scalability of precision agriculture systems. Blueberry root collars are not standalone objects but are surrounded by weeds, grasses, and other vegetation, so fast and accurate detection of blueberry root collars in a heterogeneous environment is necessary. The machine learning model detects the bounding box over the root collar from the pixel space representation. From the theoretical perspective, the convergence and stability of the gradient descent algorithm on noisy input images is an active area of research. The robustness of the object detection algorithm is probed through our proposed smooth perturbations in the bounding box. Recent research findings surprisingly show that the optimal solution remains invariant to sufficiently small perturbations in the input [7].
The main contributions of our paper are as follows:
  • Adding smooth noise in the bounding box to improve the robustness and stability of YOLOv5 in the blueberry root collar detection task.
  • Robustness and stability inference on test data.
  • Building a blueberry image data acquisition system using a camera and mobile platform.
The dataset, code, and results are publicly accessible at the following URL: https://zenodo.org/records/14205417 (accessed on 22 November 2024).

2. Blueberry Plantations

Blueberries (Vaccinium angustifolium Ait.), perennially grown in the Nordic regions, are typically cultivated in the spring and summer. Northern and Eastern Europe are typically cold areas that historically have been the most active in blueberry plantations. Popular varieties such as 'Bluecrop' were the choice of blueberry farmers, though they are not ideal for export markets due to their shelf life and firmness. Newer cultivars such as 'Cargo' are now replacing the older varieties, offering better products for consumers and improved financial revenues for growers. In southwest Finland [8,9], blueberry cultivation has been notably successful. The most prolific blueberry varieties are 'June' and 'Rancocas', although the region's severe winter conditions impact their productivity. The Finnish cultivar 'Aron' ['Rancocas' × (V. uliginosum × 'Rancocas')] is known for its resilience to cold temperatures and resistance to fungal infections. The 'Aron' variety reaches a height of up to 3 feet and produces a moderate amount of medium-sized, high-quality fruit. In Atlantic Canada, the majority of blueberry fields can be found at coordinates 47°02′ N, 64°02′ W, with additional significant sites at Big Pond, Hermanville (46°28′ N, 62°16′ W), PEI, and Fox Point (45°24′ N, 64°27′ W), NS [10]. The PEI locations face the environmental influences of the Gulf of St. Lawrence, whereas the NS sites are affected by the salt spray emanating from the Bay of Fundy. The experimental data from the PEI region show less severe effects than those observed at Big Pond. Poland plays a significant role in producing blueberries on a medium scale, largely due to numerous domestic plantings. The harvesting season begins with the 'Earliblue' variety, succeeded by 'Weymouth', 'Ivanhoe', 'Herbert', and 'Jersey'. In Germany, the blueberry harvesting period starts with the 'Earliblue' variety in mid-July and concludes with 'Elliott' in mid-September in the southern regions. The cultivar 'Darrow' is unsuitable for the northern regions, and the 'Weymouth' variety is progressively being replaced by 'Bluetta'. For automated harvesting purposes, German blueberry varieties such as 'Ama' and 'Heerma' are increasingly being used. These plants typically receive minimal fertilizer and are seldom sprayed. The majority of these blueberry farms use pine sawdust as mulch and delay pruning until at least six years post-planting. Southern highbush blueberries (SHB), a hybrid variety created by crossing Vaccinium corymbosum L., V. darrowi Camp, and other native Vaccinium species from the southeastern United States, are well suited for regions with gentle winters such as Florida, North Carolina, Georgia, and California [11]. The cultivation of SHB blueberries in tropical and subtropical areas is also vital for agricultural ventures in Mexico, Peru, Australia, northern Chile, and North Africa. Most SHB blueberries are used for market consumption and are gathered through a mechanical harvesting system. In Estonia, while blueberry cultivation in fields is widespread, it is not very profitable due to the absence of mechanization and specialized technology. The key to advancing blueberry farming in Estonia lies in the mechanization and automation of the cultivation process. This includes the creation of a robotic arm for efficiently spraying granular fertilizers. The robotic arm needs to be augmented with machine vision to identify blueberry root collars for full automation.
A major issue in Estonian wild blueberry fields and other Nordic countries is the presence of common perennial grass weeds, which, if not controlled properly, can significantly reduce yields. Uncontrolled weeds form a dense cover that reduces blueberry yield significantly [12,13]. Typically, these grass weeds are managed by spraying costly herbicides such as hexazinone and terbacil, which is an expensive method for small-scale blueberry farmers. Current research in blueberry cultivation is concentrated on developing a precision-controlled granular fertilizer spray system equipped with nozzle control [14,15,16]. The system is designed to nourish the blueberry plants while starving out and eliminating the weeds by spraying precisely at the blueberry roots. Locating the blueberry root collar, which is typically surrounded by grass weeds, is the main prerequisite for such precision spraying [17]. The creation of computer vision algorithms to identify weed-infested areas, generate maps, and finally integrate them into agricultural robotics systems is an active area of research. The similarity between weeds and blueberry plants makes detection non-trivial. Added to the complexity are the irregular patterns of the weeds around the blueberry bushes, especially in the wild. Mature blueberry fruits can also be classified by their spectral properties for selective harvesting [18,19].

3. Related Work

Maria-Florina Balcan et al. investigated computational and statistical aspects of learning linear thresholds in the presence of noise for classification tasks [20]. Without any noise, several algorithms exist that can efficiently learn near-optimal linear thresholds even from a small amount of data. Konstantin Makarychev et al. studied perturbation resilience (Bilu–Linial stability), defined as follows: an input instance is perturbation resilient if the optimal solution remains the same after the instance is perturbed [7]. They presented algorithmic and hardness results for different perturbation-resilient input instances. Perturbation-resilient instances of clustering problems were investigated by Awasthi et al. [21], who presented an algorithm that finds the optimal clusters for 3-perturbation-resilient instances. In the groundbreaking study of Zeiler and Fergus [22], the properties of black-box models were investigated by perturbing the model's input through image masking and observing the effects. Balcan et al. proposed an algorithm for (1 + √2)-perturbation-resilient input instances [23]. Angelidakis et al. investigated the Maximum Independent Set problem under the notion of stability introduced by Bilu and Linial [24]. Ben-David et al. proved that finding the optimal clustering is NP-hard for instances of k-median satisfying the (2 + ε)-center proximity condition [25]. Cohen-Addad and Schwiegelshohn (2017) observed that the local search algorithm finds the optimal clustering for (3 + ε)-perturbation-resilient instances [26].
Recently, there has been considerable research on visual object detection and classification in precision agriculture. The research varies widely, covering object detection in agricultural contexts, benchmarking its performance, and comparing its accuracy to human experts [27,28]. Real-time computer vision systems are used for various agronomic tasks, including the following: locating crop stems [27,29], large-scale image recognition [30], detecting the crop line for autonomous weeding [31], weed identification in vegetable plantations [32], automatic grapevine phenotyping [33], and detection of sheep sorrel and hair fescue in field images [34]. Ni et al. [11] developed a deep learning pipeline using Mask R-CNN to segment individual blueberries and extract traits such as maturity, compactness, and berry count. Their study achieved a mean average precision (mAP) of 78.3% on the validation dataset and 71.6% on the test dataset under a 0.5 intersection-over-union (IoU) threshold, demonstrating the model's efficacy in detecting and analyzing individual blueberries. Additionally, they introduced innovative metrics for compactness and maturity that are critical for mechanical harvesting and yield assessment. While Ni et al.'s work focuses on using image segmentation for trait extraction under controlled lighting and environmental conditions, our study addresses the critical challenge of robustness to perturbations such as noise and blur. Unlike their approach, which assumes ideal imaging setups, we emphasize the need for models capable of maintaining high performance in noisy, real-world agricultural environments. Our results extend the utility of deep learning in agricultural applications by improving detection accuracy and generalization under varying environmental conditions. Rakhmatulin et al. [35] conducted a comprehensive review of DL-based approaches for real-time weed detection in agricultural fields. The study highlights the superiority of modern architectures such as YOLO, Faster R-CNN, and Mask R-CNN over traditional machine vision methods, particularly in handling tasks like weed detection under variable lighting, dense vegetation, and occlusions. Despite significant progress, the authors emphasize that challenges such as dataset limitations, variability in natural environments, and computational constraints persist. They further recommend employing ensemble techniques, transfer learning, and synthetic data generation to address these issues. Additionally, García-Navarrete et al.'s [36] review highlights the strengths of CNNs in achieving high accuracy for weed segmentation and detection tasks.
In recent years, the development of large, annotated datasets has significantly enhanced the application of deep learning in agricultural tasks. Wang et al. [37] introduced Weed25, a dataset comprising 14,035 images of 25 weed species annotated for weed identification tasks. This dataset includes images captured under diverse environmental conditions, such as varying light intensities and growth stages, making it a robust resource for training deep learning models. Using Weed25, the authors evaluated several state-of-the-art object detection algorithms, including YOLOv3, YOLOv5, and Faster R-CNN, achieving mean average precision (mAP) scores of 91.8%, 92.4%, and 92.15%, respectively. The study demonstrates the utility of large-scale datasets in improving weed identification accuracy and robustness under real-world conditions. Similarly, platforms such as Roboflow [38] offer public datasets and pre-trained models for agricultural applications, enabling researchers to annotate, augment, and deploy their models across diverse agricultural domains. Filipović et al. used the same dataset we employed in our experiments and also utilized the YOLOv5 model for the detection of blueberry bushes [28]. The authors evaluated several YOLOv5 variants (nano, small, and medium) as baseline models and reported mAP50 scores of 0.859, 0.873, and 0.872, respectively. While YOLOv5 achieved high precision and recall on their dataset, the study primarily focused on structured agricultural environments with minimal perturbations, such as consistent lighting and controlled occlusions. This study addresses that gap by exploring perturbation resilience of object detection models in the agricultural field. By evaluating the performance of YOLOv5 on custom and public datasets under various perturbation scenarios, we provide novel insights into the robustness of the YOLOv5 detection algorithm. This work contributes to developing reliable, real-time vision systems for dynamic agricultural environments.

4. Theoretical Foundations

4.1. Perturbation Resilience

In this section, we give a theoretical foundation for the convergence of the root collar detection algorithm in the presence of noise, and discuss the fast convergence of the YOLOv5 model's gradient descent under noise. We first formally define a smooth perturbation [24,39]:
Definition 1. A γ-perturbation of an image instance $I$ is an instance $\tilde{I}$ produced by adding noise with variance between 0 and γ within the bounding box $B$.
Let the perturbation noise random variables $Z_{ij}$ (added to the bounding boxes of the images) be drawn from a uniform distribution. The loss function of our YOLOv5 model is denoted $L(W, Z, D)$, where $W$ denotes the weights, $Z$ the noise, and $D$ the image data distribution. Gradient descent updates the weights $W$ as follows:
$$W_{i+1} = W_i - \gamma \nabla L(W_i) \qquad (1)$$
where γ denotes the learning rate (not the perturbation parameter of Definition 1). We now define the difference between the neural weights of two consecutive epochs, $W_i$ and $W_{i+1}$, as a new random variable that depends on the noise:
$$g(W(Z)) = \lVert W_{i+1}(Z) - W_i(Z) \rVert \qquad (2)$$
We modify a Bernstein-style uniform convergence bound, and we show empirically that our smooth noise aids convergence by regularizing the weight updates and enhancing generalization [40]:
$$\mathbb{E}[g(W(Z))] \le \epsilon = 120 \sqrt{\frac{d \log(4e^2/\epsilon)}{T \, k(Z)}} \qquad (3)$$
where $k$ is a function of the perturbation noise, $d$ is the VC dimension, $T$ is the time horizon, and $\epsilon$ is the upper bound. The faster convergence of the root collar detection algorithm expressed in Equation (3) comes from adding smooth noise, represented by the factor $1/k(Z)$. In the original formulation of Bernstein-style uniform convergence, $k$ is taken as a constant. Adding different noise perturbations makes $1/k(Z)$ a generative parameter that can be tuned for faster convergence.
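To make the mechanism concrete, the following toy sketch (not the paper's training code) runs the gradient descent update of Equation (1) on a one-dimensional non-convex loss while perturbing the input with bounded uniform noise in the spirit of Definition 1, and tracks the consecutive-weight difference $g$ of Equation (2). The loss function, step size, and noise variance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, x):
    """Gradient of the toy non-convex loss L(w) = sin(3w) + 0.5*(w - x)^2,
    evaluated on a (possibly noise-perturbed) input x."""
    return 3 * np.cos(3 * w) + (w - x)

def perturbed_gd(w0, x, gamma=0.05, noise_var=0.0, steps=200):
    """Run W_{i+1} = W_i - gamma * grad L(W_i), perturbing the input x each
    step with bounded uniform noise of variance <= noise_var (a 1-D stand-in
    for the gamma-perturbation of Definition 1)."""
    w, g_trace = w0, []
    half_width = np.sqrt(3 * noise_var)   # Uniform(-a, a) has variance a^2 / 3
    for _ in range(steps):
        z = rng.uniform(-half_width, half_width)
        w_new = w - gamma * loss_grad(w, x + z)
        g_trace.append(abs(w_new - w))     # g(W(Z)) = |W_{i+1} - W_i|, Eq. (2)
        w = w_new
    return w, g_trace

w_plain, _ = perturbed_gd(w0=-2.0, x=1.0, noise_var=0.0)
w_noisy, _ = perturbed_gd(w0=-2.0, x=1.0, noise_var=0.1)
print(f"final weight without noise: {w_plain:.3f}, with smooth noise: {w_noisy:.3f}")
```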

4.2. Model Architecture

The YOLOv5 neural network model from the You Only Look Once (YOLO) series is trained for blueberry root collar detection [41,42]. Root collar detection is modeled as a regression that spatially locates bounding boxes. YOLOv5, a convolutional neural network (CNN)-based model, predicts bounding boxes and class probabilities from input images. YOLOv5 learns general representations of images and detects the statistical patterns in pixel space needed for root collar detection. The YOLOv5 architecture comprises a backbone, a bottleneck (neck), and a head, as shown in Figure 2 [43].
The backbone forms the core of the object detection algorithm and employs an efficient feature fusion method. The input image is initially processed in an input layer and relayed to the backbone for feature extraction. The backbone captures feature maps at various scales, which are subsequently merged by the feature fusion network, commonly referred to as the neck (bottleneck). The bottleneck up-samples the output feature maps, which are generated by multiple convolutional down-sampling operations in the feature extraction network, for detecting targets at different scales. The head of YOLOv5 carries out bounding box regression on every pixel of these maps using pre-established anchor priors. The final output is a multi-dimensional array containing the type of object, the confidence in the object's class, and the bounding box coordinates along with its width and height.
YOLOv5 starts object detection by dividing the input image into an $S \times S$ grid. A grid cell detects an object only if the object's center falls within it. Each grid cell predicts $B$ bounding boxes together with confidence scores $P(\mathrm{Object}) \times \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}}$, where the intersection over union (IoU) is
$$\mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}} = \frac{\mathrm{area}(B_{\mathrm{truth}} \cap B_{\mathrm{pred}})}{\mathrm{area}(B_{\mathrm{truth}} \cup B_{\mathrm{pred}})} \qquad (4)$$
Each grid cell also computes the conditional class probabilities $P(\mathrm{Class}_i \mid \mathrm{Object})$, which, when multiplied by the box confidence scores, give the class-specific confidence scores:
$$P(\mathrm{Class}_i \mid \mathrm{Object}) \times P(\mathrm{Object}) \times \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}} = P(\mathrm{Class}_i) \times \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}} \qquad (5)$$
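As a concrete illustration of Equations (4) and (5), the sketch below computes the IoU of two axis-aligned boxes and the resulting class-specific confidence score; the box coordinates and probability values are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Class-specific confidence of Equation (5): the conditional class probability
# times the box confidence P(Object) * IoU. Both probabilities are made up.
p_class_given_object = 0.9           # hypothetical P(Class_i | Object)
p_object = 0.8                       # hypothetical objectness score
box_truth, box_pred = (10, 10, 60, 60), (20, 15, 70, 65)
score = p_class_given_object * p_object * iou(box_truth, box_pred)
print(f"IoU = {iou(box_truth, box_pred):.4f}, confidence = {score:.3f}")
```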
The YOLOv5 model used here is pre-trained on COCO [44] and comprises 214 layers with about 7 million parameters.
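As a sanity check on these figures, a COCO-pretrained YOLOv5s checkpoint can be loaded through the official torch.hub entry point and inspected as follows; the image path is a placeholder, not a file from our datasets.

```python
import torch

# Load the COCO-pretrained YOLOv5s model from the official Ultralytics repo.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Count parameters to compare against the figure quoted above.
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f} M")

# Run inference; the returned object holds boxes, confidences, and classes.
results = model("blueberry.jpg")   # hypothetical image path
results.print()
```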

5. Design of Experiment, Results, and Discussion

5.1. Data Acquisition: Hardware and Software

In this paper, we use three different image datasets to train our model. The first dataset was collected by the authors in an Estonian blueberry field. This dataset is small, with a total of 118 blueberry images: 82 used for training, 23 for validation, and 13 for testing. The images were reframed from a 360-degree view. The blueberry image dataset was first collected, then cleaned and pre-processed to train the YOLOv5 model for the root collar detection task. Roboflow software (version 1.0, Roboflow Inc., Des Moines, IA, USA) was used to label the training images. The images were acquired from a cultivated berry plantation in Vehendi village, Elva Municipality, Tartu County (GPS 58.20, 26.13). This plantation covers 28 ha on an exhausted milled peat field surveyed by researchers from Estonia [45]. The Insta360 One X2 camera (Insta360, Shenzhen, China) was used for image acquisition in September 2021. The camera was mounted on a mobile platform with the lens at a height of 1.6 m; the platform navigates along the blueberry field rows and photographs the plants. The mobile platform for data acquisition consists of the following main parts: (1) wheels, (2) base frame, (3) tripod, (4) extension arm, and (5) the Insta360 One X2 camera. The platform's main parts (a) and the camera view with the reframed area (b) are shown in Figure 3.
The second dataset was collected with a Luxonis OAK-D RGB camera (Luxonis, Boulder, CO, USA) mounted at a height of 0.5 m during March, May, and August 2022 by researchers at the BioSense Institute, Novi Sad, Serbia [28]. A total of 2779 image instances were selected from the dataset, in which bush and pole objects are annotated. The training set comprises 1910 image instances, with 573 instances allocated for validation and 296 for testing.
The third dataset is a public dataset sourced from Roboflow datasets [46]. It comprises 2993 images, divided into three subsets: a training set containing 70% (2096 images), a validation set with 20% (598 images), and a test set with 10% (299 images). The dataset has three classes: pepper, tomato, and weed.

5.2. Design of Experiment

To simulate smooth perturbations, Gaussian noise with a predefined mean and standard deviation was added to the training dataset to enable robust training. Each noisy square, measuring 50 × 50 pixels, was applied within the bounding box of the target objects, ensuring that the localized noise perturbations directly affect the regions of interest. This process, illustrated in Figure 4, allows controlled noise application, aiding the evaluation of the model's ability to generalize under noisy conditions. The perturbed images are then used to train the model for the blueberry object detection task. Two sets of experiments are conducted on two different targets: (1) bush and pole and (2) blueberry root collars. Each set of experiments runs in two settings: (i) the base case and (ii) the perturbation case. The training samples used for each set of experiments are the same, except that smooth noise is added inside the bounding box area in the perturbation studies, as shown in Figure 4. The experimental setups are summarized in Table 1. YOLOv5's generalization capacity in root collar detection is measured on out-of-distribution (O.O.D.) test samples, created by applying Gaussian blur to the raw test images. Gaussian blur simulates the out-of-focus situation commonly occurring in wild environments. The O.O.D. test set represents a broader notion of generalization, since neither model has seen any training samples with this feature. The generalization performance of our model demonstrates the perturbation resilience explained in Section 4.1. A sketch of the perturbation pipeline is given below.
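The following sketch illustrates the two image operations described above: a 50 × 50 Gaussian-noise square placed at a random position inside a YOLO-format bounding box, and the Gaussian blur used to build the O.O.D. test set. The noise standard deviation, blur kernel size, file names, and label values are assumptions for illustration, not the exact settings used in our experiments.

```python
import cv2
import numpy as np

rng = np.random.default_rng(42)

def add_smooth_noise(image, yolo_box, patch=50, sigma=25.0):
    """Add a Gaussian-noise square of size patch x patch at a random position
    inside the bounding box (YOLO format: class, x_center, y_center, w, h,
    all normalized to [0, 1]). sigma is an assumed noise level."""
    h, w = image.shape[:2]
    _, xc, yc, bw, bh = yolo_box
    x1 = int((xc - bw / 2) * w); y1 = int((yc - bh / 2) * h)
    x2 = int((xc + bw / 2) * w); y2 = int((yc + bh / 2) * h)
    # Random top-left corner for the noise square, kept inside the box.
    px = rng.integers(x1, max(x1 + 1, x2 - patch))
    py = rng.integers(y1, max(y1 + 1, y2 - patch))
    noise = rng.normal(0.0, sigma, (patch, patch, 3))
    region = image[py:py + patch, px:px + patch].astype(np.float32)
    image[py:py + patch, px:px + patch] = np.clip(
        region + noise[:region.shape[0], :region.shape[1]], 0, 255
    ).astype(np.uint8)
    return image

def make_ood_test_image(image, ksize=11):
    """Gaussian blur used to build the O.O.D. test set; ksize is assumed."""
    return cv2.GaussianBlur(image, (ksize, ksize), 0)

img = cv2.imread("blueberry.jpg")                      # hypothetical file name
img = add_smooth_noise(img, (0, 0.5, 0.6, 0.3, 0.4))   # hypothetical label
cv2.imwrite("blueberry_perturbed.jpg", img)
cv2.imwrite("blueberry_blurred.jpg", make_ood_test_image(img))
```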
Our experiments use three datasets: the Estonian blueberry dataset (118 images), the Serbian dataset (2779 images), and the Roboflow public dataset (2993 images). We chose a relatively small model, YOLOv5, to reduce the chance of overfitting on the smaller image datasets. YOLOv5 strikes an optimal balance, offering robust performance and generalization to out-of-distribution data while ensuring faster inference times and lower computational demands, making it well suited for precision agriculture applications that require real-time decisions. We conducted six studies of the detection tasks. Experiment 1 targets identifying bush and pole features with the raw (unperturbed) dataset. In experiment 2, we train the model on data perturbed with noise to perform the same task as experiment 1. Experiment 3 focuses on detecting blueberry root collars with the raw (unperturbed) blueberry image dataset. Experiment 4 replicates the task of experiment 3 on the perturbed blueberry images. Experiment 5 is applied to the Roboflow public weeds dataset and uses the original images without any perturbation, while experiment 6 uses the same dataset with smooth noise added inside the bounding boxes of the training images. A summary of the experimental settings can be found in Table 1; a sketch of how the six studies can be driven with the standard YOLOv5 scripts follows.
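For reproducibility, the six studies can be driven by the standard YOLOv5 training and validation scripts along the following lines. The YAML file names and hyperparameters (image size, batch size, epochs) are placeholders rather than the exact values used here; which test split is evaluated (I.I.D. or Gaussian-blurred O.O.D.) depends on the images the dataset YAML points to.

```python
import subprocess

# Hypothetical dataset configs, one per study; names are assumptions.
EXPERIMENTS = {
    "exp1_bush_raw":         "serbia_bush_raw.yaml",
    "exp2_bush_perturbed":   "serbia_bush_perturbed.yaml",
    "exp3_collar_raw":       "estonia_collar_raw.yaml",
    "exp4_collar_perturbed": "estonia_collar_perturbed.yaml",
    "exp5_weeds_raw":        "roboflow_weeds_raw.yaml",
    "exp6_weeds_perturbed":  "roboflow_weeds_perturbed.yaml",
}

for name, data_yaml in EXPERIMENTS.items():
    # Fine-tune the COCO-pretrained small model on each dataset variant.
    subprocess.run(
        ["python", "train.py", "--img", "640", "--batch", "16",
         "--epochs", "100", "--weights", "yolov5s.pt",
         "--data", data_yaml, "--name", name],
        check=True,
    )
    # Evaluate the best checkpoint on the held-out test split.
    subprocess.run(
        ["python", "val.py", "--weights", f"runs/train/{name}/weights/best.pt",
         "--data", data_yaml, "--task", "test"],
        check=True,
    )
```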

5.3. Results

In this subsection, we present our results on the test sets, which constitute our main findings, followed by a discussion of the training and validation phase.
Test set results: The results of the experiments on the original test set (I.I.D.), without perturbation or modification, are shown in Table 2, while the results on the Gaussian-blurred test set are shown in Table 3. Our experimental results speak to the model's adaptability and validate perturbation resilience under different scenarios in the studied detection tasks.
In Table 3, the test results of experiments 1, 2, 5, and 6 support the proposed hypothesis of perturbation resilience. The perturbation-trained model generalizes better to the Gaussian-blurred test set (out-of-distribution, O.O.D.) than the base model: comparing experiments 2 and 1, the mAP50 of 0.828 exceeds 0.794. Comparing experiments 1, 3, and 5, which were trained on unperturbed images, their results in Table 3 are lower than in Table 2; this indicates that models trained without perturbation struggle to generalize to O.O.D. test images. These observations support the hypothesis that smooth perturbation during training enhances the model's generalization capacity. However, for experiment 3, which involves a small dataset, the results show a significant drop in performance with perturbation (experiment 4). This suggests that the dataset size limits the effectiveness of the perturbation strategy, likely due to insufficient training data to balance the added variability.
Furthermore, the smooth perturbation provides evidence that the assumption of large sample requirements for generalization can be relaxed. In our experiments, both models are trained with the same amount of information (identical training datasets, apart from the added noise). Still, the perturbed model shows greater stability on the Gaussian-blur test set than the normally trained one. This contrasts with the usual data augmentation techniques, which require additional training data to capture specific settings.
Class-specific performance: In experiment 2, which adds noise inside the bounding boxes, the specific classes "Bush" and "Pole" yield better precision and mAP50 scores on out-of-distribution samples. These improvements emphasize the effectiveness of introducing noise in bounding boxes for enhancing the detection of objects in these classes.
Training and validation results: The first experiment uses the raw blueberry bush data, while the second introduces smooth noise into every bounding box. The training and validation performance metrics for both experiments are shown in Figure 5 and Figure 6. Experiment 1, trained on original images without any perturbation in the training or validation set, demonstrates notable improvements over experiment 2, with 0.93 precision, 0.82 recall, and 0.90 mAP50. This is expected, because the training dataset of experiment 2 is perturbed while the validation dataset is not. Yet the performance of experiment 2 is still acceptable, with a precision of 0.91, recall of around 0.70, and mAP50 of around 0.81. The validation set visualization details are shown in Figure 7.

5.4. Discussion

Regarding the test outcomes, consider Figure 8, which illustrates the second experiment's results using the perturbed training dataset. Our model successfully identified both bush objects; however, it displayed lower confidence for the bounding boxes in the unblurred image (right) than in the Gaussian-blurred image (middle), even though both detections were accurate. The test results in Table 3 show that experiment 1 (0.871) exhibits a slightly lower overall precision than experiment 2 (0.886). However, experiment 2 shows lower overall recall (0.735) than experiment 1 (0.761). This suggests that the perturbation in experiment 2 introduces a trade-off between precision and recall on the test set. Further, the mean average precision for experiment 1 (mAP50: 0.794) is lower than for experiment 2 (mAP50: 0.828). In Table 3, experiment 2 shows more robust object detection performance on the O.O.D. test set than experiment 1, which was trained without noise on the input blueberry images. Our experimental findings confirm our hypothesis that smooth perturbation improves robustness and accuracy in object detection tasks, thus improving precision. The training convergence (within finitely many epochs) of noisy object detection points to the theoretical foundations of beyond-worst-case analysis [7]. On the practical side, noisy object detection could be used in real-world agricultural applications, where measurement noise is unavoidable.
Moreover, the results presented in Figure 9 demonstrate the comparative performance of models trained with and without perturbation. In the case of the model trained on the unperturbed training set in Figure 9a, predictions for the blurred test set show a noticeable drop in confidence scores compared to the original test set. This reduction highlights the model’s sensitivity to distribution shifts, specifically Gaussian blur, which can significantly degrade its ability to generalize effectively to out-of-distribution (O.O.D.) scenarios.
Conversely, the model trained with perturbation-based augmentation in Figure 9b demonstrates a much smaller performance gap between the original and blurred test sets. The confidence scores remain relatively consistent, indicating that the model has developed improved robustness to visual distortions introduced during testing. This improvement can be attributed to the inclusion of perturbation in the training pipeline, which appears to enhance the model’s adaptability to noisy or altered images.
Introducing smooth perturbations, represented by noisy squares within bounding boxes during training, revealed their impact on the model's robustness. Moreover, dataset size (sample complexity) affects generalization capacity: models trained on larger datasets are more robust to smooth perturbations, while those trained on smaller datasets are more sensitive to them. This is shown in our experimental results on the smaller blueberry root collar dataset (Estonia) versus the larger bush and pole dataset (Serbia). The pre-trained YOLOv5 model's transfer learning capabilities were evident in the initial successes but also exposed its limitations in handling perturbed or smaller datasets, indicating the necessity of dataset-specific fine-tuning or augmentation strategies.

6. Conclusions

Our experiments underscore the trade-offs between model accuracy, robustness, and adaptability on perturbed or varied datasets. This paper proposes an innovative method, smooth perturbations, for root collar detection of plants. Our method trains the YOLOv5 model on input images injected with smooth noise. Root collar detection under environmental perturbations has implications in both theory and practice. On the theoretical side, our results show the convergence of the object detection algorithm under noise. The holy grail in smooth perturbations is to prove guarantees on algorithm convergence under the assumption of a low level of anti-concentration in the possible input distributions. On the practical side, smooth perturbations provide a path toward scalable precision agriculture. Our study addresses a critical gap by proposing a robust object detection method specifically designed to handle environmental perturbations, such as noise and blur, which are common in precision agriculture scenarios.

7. Future Work

In the future, one could develop a new perturbation policy that exploits the geometry of the root collar detection task. The perturbation policy will be adversarial and adaptive, leveraging principles and techniques from game theory. Different pre-trained neural network models will be trained on smoothly perturbed images, and the dependence of root collar detection accuracy on the neural architecture will be studied. The smooth perturbations will also be connected with explainable machine learning. These steps could significantly extend this research's relevance and impact in both agricultural technology and computer vision. The practical deployment of our noisy root collar detection on robots and drones in different environments will be tested. Noisy object detection algorithms in memory-constrained environments need model pruning; pruning YOLOv5 weights is an added perturbation besides the perturbations on the input image space. The correlation between perturbations in the neural weight space and in the input image space will be studied with respect to generalizability.

Author Contributions

Conceptualization, N.T.A.M., W.T.C. and K.R.; Data curation, N.T.A.M. and I.V.; Formal analysis, W.T.C., O.L. and K.R.; Funding acquisition, K.R. and J.O.; Investigation, I.V., A.G.M.Z., T.L. and J.O.; Methodology, N.T.A.M., I.V., A.G.M.Z., T.L., K.R. and J.O.; Project administration, K.R. and J.O.; Software, N.T.A.M., I.V. and K.R.; Supervision, K.R. and J.O.; Validation, A.G.M.Z., T.L., W.T.C. and O.L.; Visualization, N.T.A.M., I.V. and T.L.; Writing—original draft, N.T.A.M., I.V., W.T.C. and K.R.; Writing—review and editing, N.T.A.M., I.V., A.G.M.Z., W.T.C., O.L., K.R. and J.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the IT Academy Research Programme, Estonia, development fund PM210001TIBT from the Estonian University of Life Sciences, and proof-of-concept grant EAG304 from the Estonian Research Council.

Data Availability Statement

The dataset, code, and results are publicly accessible at the following URL: https://zenodo.org/records/14205417 (accessed on 20 October 2024).
Acknowledgments

This work has been supported in part by the IT Academy Research Programme, Estonia, development fund PM210001TIBT from the Estonian University of Life Sciences, and proof-of-concept grant EAG304 from the Estonian Research Council. We are grateful to all who provided feedback on this work.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Langer, F.; Mandtler, L.; Milioto, A.; Palazzolo, E.; Stachniss, C. Geometrical Stem Detection from Image Data for Precision Agriculture. arXiv 2018, arXiv:1812.05415.
  2. Yu, W.; Luo, M.; Zhou, P.; Si, C.; Zhou, Y.; Wang, X.; Feng, J.; Yan, S. MetaFormer is actually what you need for vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10819–10829.
  3. Balasubramaniyan, P.; Palaniappan, S.P. Principles and Practices of Agronomy; Agrobios: Jodhpur, India, 2001.
  4. Virro, I.; Arak, M.; Maksarov, V.; Olt, J. Precision fertilisation technologies for berry plantation. Agron. Res. 2020, 18, 2797–2810.
  5. Yu, H.; Liu, D.; Chen, G.; Wan, B.; Wang, S.; Yang, B. A neural network ensemble method for precision fertilization modeling. Math. Comput. Model. 2010, 51, 1375–1382.
  6. Haghtalab, N.; Roughgarden, T.; Shetty, A. Smoothed Analysis with Adaptive Adversaries. J. ACM 2024, 71, 1–34.
  7. Angelidakis, H.; Makarychev, K.; Makarychev, Y. Algorithms for Stable and Perturbation-Resilient Problems. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2017), Montreal, QC, Canada, 19–23 June 2017; pp. 438–451.
  8. Çelik, H. The performance of some northern highbush blueberry (Vaccinium corymbosum L.) varieties in North eastern part of Anatolia. Anadolu Tarım Bilim. Derg. 2009, 24, 141–146.
  9. Gough, R. The Highbush Blueberry and Its Management; Taylor & Francis: Abingdon, UK, 1993.
  10. Eaton, L.J.; Sanderson, K.R.; Hoyle, J. Effects of salt deposition from salt water spray on lowbush blueberry shoots. Small Fruits Rev. 2004, 3, 95–103.
  11. Ni, X.; Li, C.; Jiang, H.; Takeda, F. Deep learning image segmentation and extraction of blueberry fruit traits associated with harvestability and yield. Hortic. Res. 2020, 7, 110.
  12. Lyu, H.; McLean, N.; McKenzie-Gopsill, A.; White, S.N. Weed survey of Nova Scotia lowbush blueberry (Vaccinium angustifolium Ait.) fields. Int. J. Fruit Sci. 2021, 21, 359–378.
  13. White, S.N.; Zhang, L. Evaluation of terbacil-based herbicide treatments for hair fescue (Festuca filiformis) management in lowbush blueberry. Weed Technol. 2021, 35, 485–491.
  14. Bilodeau, M.F.; Esau, T.J.; MacEachern, C.B.; Farooque, A.A.; White, S.N.; Zaman, Q.U. Identifying hair fescue in wild blueberry fields using drone images for precise application of granular herbicide. Smart Agric. Technol. 2023, 3, 100127.
  15. Rai, N.; Zhang, Y.; Ram, B.G.; Schumacher, L.; Yellavajjala, R.K.; Bajwa, S.; Sun, X. Applications of deep learning in precision weed management: A review. Comput. Electron. Agric. 2023, 206, 107698.
  16. Ahmad, A.; Saraswat, D.; Aggarwal, V.; Etienne, A.; Hancock, B. Performance of deep learning models for classifying and detecting common weeds in corn and soybean production systems. Comput. Electron. Agric. 2021, 184, 106081.
  17. Akumu, C.E.; Dennis, S. Effect of the Red-Edge Band from Drone Altum Multispectral Camera in Mapping the Canopy Cover of Winter Wheat, Chickweed, and Hairy Buttercup. Drones 2023, 7, 277.
  18. Yang, C.; Lee, W.S.; Gader, P. Hyperspectral band selection for detecting different blueberry fruit maturity stages. Comput. Electron. Agric. 2014, 109, 23–31.
  19. Yang, C.; Lee, W.S.; Williamson, J.G. Classification of blueberry fruit and leaves based on spectral signatures. Biosyst. Eng. 2012, 113, 351–362.
  20. Balcan, M.F.; Haghtalab, N. Noise in Classification. In Beyond the Worst-Case Analysis of Algorithms; Roughgarden, T., Ed.; Cambridge University Press: Cambridge, UK, 2020; Chapter 16.
  21. Awasthi, P.; Blum, A.; Sheffet, O. Center-based clustering under perturbation stability. Inf. Process. Lett. 2012, 112, 49–54.
  22. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 818–833.
  23. Balcan, M.F.; Liang, Y. Clustering under Perturbation Resilience. In Proceedings of the 39th International Colloquium on Automata, Languages, and Programming (ICALP 2012), Part I, Warwick, UK, 9–13 July 2012; pp. 63–74.
  24. Angelidakis, H.; Awasthi, P.; Blum, A.; Chatziafratis, V.; Dan, C. Bilu–Linial Stability, Certified Algorithms and the Independent Set Problem. In Proceedings of the 27th Annual European Symposium on Algorithms (ESA 2019), Munich/Garching, Germany, 9–11 September 2019; Bender, M.A., Svensson, O., Herman, G., Eds.; Dagstuhl Publishing: Wadern, Germany, 2019; Volume 144, pp. 7:1–7:16.
  25. Reyzin, L. Data Stability in Clustering: A Closer Look. arXiv 2011, arXiv:1107.2379.
  26. Cohen-Addad, V.; Schwiegelshohn, C. On the Local Structure of Stable Clustering Instances. In Proceedings of the 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, 15–17 October 2017; pp. 49–60.
  27. Wosner, O.; Farjon, G.; Bar-Hillel, A. Object detection in agricultural contexts: A multiple resolution benchmark and comparison to human. Comput. Electron. Agric. 2021, 189, 106404.
  28. Filipović, V.; Stefanović, D.; Pajević, N.; Grbović, Ž.; Djuric, N.; Panić, M. Bush Detection for Vision-Based UGV Guidance in Blueberry Orchards: Data Set and Methods. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 3645–3654.
  29. Lac, L.; Costa, J.P.D.; Donias, M.; Keresztes, B.; Bardet, A. Crop stem detection and tracking for precision hoeing using deep learning. Comput. Electron. Agric. 2022, 192, 106606.
  30. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  31. Adhikari, S.P.; Yang, H.; Kim, H. Learning semantic graphics using convolutional encoder–decoder network for autonomous weeding in paddy. Front. Plant Sci. 2019, 10, 1404.
  32. Jin, X.; Che, J.; Chen, Y. Weed identification using deep learning and image processing in vegetable plantation. IEEE Access 2021, 9, 10940–10950.
  33. Milella, A.; Marani, R.; Petitti, A.; Reina, G. In-field high throughput grapevine phenotyping with a consumer-grade depth camera. Comput. Electron. Agric. 2019, 156, 293–306.
  34. Hennessy, P.J.; Esau, T.J.; Schumann, A.W.; Zaman, Q.U.; Corscadden, K.W.; Farooque, A.A. Evaluation of cameras and image distance for CNN-based weed detection in wild blueberry. Smart Agric. Technol. 2022, 2, 100030.
  35. Rakhmatulin, I.; Kamilaris, A.; Andreasen, C. Deep neural networks to detect weeds from crops in agricultural environments in real-time: A review. Remote Sens. 2021, 13, 4486.
  36. García-Navarrete, O.L.; Correa-Guimaraes, A.; Navas-Gracia, L.M. Application of Convolutional Neural Networks in Weed Detection and Identification: A Systematic Review. Agriculture 2024, 14, 568.
  37. Wang, P.; Tang, Y.; Luo, F.; Wang, L.; Li, C.; Niu, Q.; Li, H. Weed25: A deep learning dataset for weed identification. Front. Plant Sci. 2022, 13, 1053329.
  38. Dwyer, B.; Nelson, J.; Hansen, T. Roboflow (Version 1.0) [Software]. 2024. Available online: https://roboflow.com (accessed on 20 October 2024).
  39. Spielman, D.A.; Teng, S. Smoothed Analysis of Algorithms: Why the Simplex Algorithm Usually Takes Polynomial Time. J. ACM 2004, 51, 385–463.
  40. Howard, S.R.; Ramdas, A.; McAuliffe, J.; Sekhon, J. Time-uniform Chernoff bounds via nonnegative supermartingales. Probab. Surv. 2020, 17, 257–317.
  41. Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Washington, DC, USA, 2016; pp. 779–788.
  42. Jocher, G. YOLOv5 by Ultralytics, Version v7.0; Ultralytics LLC: Arlington, VA, USA, 2020.
  43. Liu, H.; Sun, F.; Gu, J.; Deng, L. SF-YOLOv5: A Lightweight Small Object Detection Algorithm Based on Improved Feature Fusion Mode. Sensors 2022, 22, 5817.
  44. Lin, T.Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. In Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; Part V; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. Available online: https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48 (accessed on 20 October 2024).
  45. Soots, K.; Lillerand, T.; Jogi, E.; Virro, I.; Olt, J. Feasibility analysis of cultivated berry field layout for automated cultivation. Eng. Rural. Dev. 2021, 20, 1003–1008.
  46. Plants. Weeds Dataset. 2024. Available online: https://universe.roboflow.com/plants-e5wq4/weeds-sovlx (accessed on 20 October 2024).
Figure 1. Object detection on perturbed noise.
Figure 2. YOLOv5 architecture.
Figure 3. Camera mounted on mobile platform. (a) Mobile platform main parts: 1 wheels, 2 base frame, 3 tripod, 4 extension arm, and 5 Insta360 One X2 camera; (b) the 360 degree camera view reframed area marked in red.
Figure 4. Example of the smooth noise added within the bounding box of the training datasets for some experiments. (a) Noise added to Serbian dataset. (b) Noise added to Roboflow public dataset.
Figure 5. Bush dataset training/validation results for experiment 1.
Figure 6. Perturbed bush dataset training/validation results for experiment 2.
Figure 7. Validation batch result.
Figure 8. The image on the left represents an image from the test set with Gaussian blur, the image in the middle is the prediction for the blurred image, while the image on the right represents the model prediction for the same test image but without blurring.
Figure 9. Model predictions on augmented and non-augmented test images. (a) Predictions from the model trained on the unperturbed training set. (b) Predictions from the model trained on the perturbed training set.
Table 1. Experiment setting summary; the perturbation on training is the smooth noise square added in the bounding box.

Detection Task             Experiment   Perturbation on Training   Augmentation on Testing
Bush and pole              Exp. 1       No
                           Exp. 2       Yes                        Gaussian blur
Blueberry root collar      Exp. 3       No
                           Exp. 4       Yes
Pepper, tomato, and weed   Exp. 5       No
                           Exp. 6       Yes
Table 2. Comparison of test results using original test images (I.I.D.).

Experiment   Precision   Recall   mAP50
Exp. 1       0.910       0.859    0.899
Exp. 2       0.880       0.711    0.818
Exp. 3       0.744       0.672    0.635
Exp. 4       0.130       0.385    0.119
Exp. 5       0.842       0.812    0.875
Exp. 6       0.943       0.536    0.617
Table 3. Comparison of test results using the Gaussian blurred test images (out-of-distribution, O.O.D.).

Experiment   Precision   Recall   mAP50
Exp. 1       0.871       0.761    0.794
Exp. 2       0.886       0.735    0.828
Exp. 3       0.572       0.462    0.468
Exp. 4       0.121       0.385    0.109
Exp. 5       0.881       0.756    0.847
Exp. 6       0.891       0.523    0.624
