1. Introduction
Unmanned aerial vehicles (UAVs), or drones, are successfully used in several industries, with a wide range of applications such as surveillance, aerial photography, infrastructure inspection, and rescue operations. These applications require that the onboard system can sense the environment, parse it, and react according to the parsing results. Scene parsing enables the system to understand the visual environment, such as recognizing the types of objects, their locations, and the regions of object instances in a scene. These problems correspond to the main topics in computer vision: classification, object detection, and object segmentation. Object detection is a common topic and has attracted the most interest in recent studies. In object detection, traditional handcrafted feature-based methods have shown limited performance [1,2,3,4,5,6,7,8,9,10,11,12]. A competitive approach is to apply deep-learning-based methods, which have gained popularity in recent years [13,14,15,16]. However, deploying deep learning models to a UAV onboard system raises new challenges: (1) scene parsing with low-resolution or motion-blurred input, (2) deploying the model to an embedded system with limited memory and computation power, and (3) balancing model accuracy against execution time.
Autonomous landing is a core function of an autonomous drone, and it has become an urgent problem to be solved in autonomous drone applications. Recently, deploying deep learning models to UAV systems has become more feasible, owing to both the growth in computing power and extensive studies of deep neural networks, which have achieved significant results in scene parsing tasks such as object detection (e.g., the faster region-based convolutional neural network (R-CNN) [17] and the single-shot multibox detector (SSD) [18]). Therefore, the topic of autonomous drone landing has attracted much research interest, and the trend is toward autonomous landing using deep-learning-based methods to track a guiding marker. Several state-of-the-art (SOTA) object detectors based on convolutional neural networks (CNNs) have been proposed and deployed successfully for marker detection in marker tracking tasks. You only look once (YOLO) models are arguably the most popular deep object detectors in practical applications, because they balance detection accuracy and execution time well. Nevertheless, such systems have low robustness and are prone to failure when dealing with low-resolution [16] or motion-blurred images [19]. Such inputs need to be preprocessed before being fed to the detector; thus, using a combination of a few networks as a pipeline is a promising approach to achieve this goal. In addition, drone landing causes motion of the attached camera. Even if a drone has an antivibration damper gimbal, the recorded frames are affected by motion blurring, especially in the case of high-speed landing [20]. For this reason, marker detection with motion-blurred input is a critical problem that needs to be addressed.
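As a rough illustration of the cascade idea above (restore the blurred frame with a preprocessing network, then feed the restored frame to a detector), the following is a minimal PyTorch sketch. The DeblurNet and MarkerDetector classes are simplified placeholders introduced here for illustration only; they are not the actual SlimDeblurGAN or YOLO architectures used in this work.

```python
# Minimal sketch of a deblur-then-detect cascade (illustrative placeholders only).
import torch
import torch.nn as nn

class DeblurNet(nn.Module):
    """Placeholder restoration network: maps a blurred RGB frame to a sharper one."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        # Residual restoration: predict a correction and add it to the input.
        return torch.clamp(x + self.body(x), 0.0, 1.0)

class MarkerDetector(nn.Module):
    """Placeholder detector head: predicts a coarse grid of box/confidence values."""
    def __init__(self, grid=13, preds=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(grid),
            nn.Conv2d(16, preds, 1),
        )

    def forward(self, x):
        return self.features(x)  # (N, 5, grid, grid): x, y, w, h, confidence

def parse_frame(frame, deblur_net, detector):
    """Two-stage pipeline: restore the frame first, then detect the marker."""
    with torch.no_grad():
        sharp = deblur_net(frame)
        return detector(sharp)

if __name__ == "__main__":
    blurred = torch.rand(1, 3, 416, 416)  # stand-in for a captured frame
    out = parse_frame(blurred, DeblurNet().eval(), MarkerDetector().eval())
    print(out.shape)  # torch.Size([1, 5, 13, 13])
```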
Therefore, we propose an efficient motion deblurring and marker detection method for autonomous drone landing, through a combination of motion deblurring and object detection, and apply a slimmed deblurring model to balance system speed and accuracy on embedded edge devices. To this end, we trained the DeblurGAN network on our synthesized dataset and then pruned the model to obtain the slimmed version, SlimDeblurGAN. Moreover, we trained a variant of the YOLO detector on our synthesized dataset. Finally, we stacked SlimDeblurGAN and the detector, and evaluated the system on a desktop PC and an NVIDIA Jetson TX2 board. This research is novel compared to previous studies in the following four ways:
This is one of the first studies on simultaneous deep-learning-based motion deblurring and marker detection for autonomous drone landing.
The balance of accuracy and processing speed is critical when deploying a marker tracking algorithm on an embedded system with limited memory and computation power. By proposing a dedicated framework for pruning the motion deblurring model, our proposed SlimDeblurGAN achieves real-time speed on embedded edge devices, with high detection accuracy.
Through iterative channel pruning and fine-tuning, our proposed SlimDeblurGAN shows lower computational complexity but higher marker detection accuracy compared to the state-of-the-art methods, including the original DeblurGAN. The SlimDeblurGAN generator uses batch normalization instead of instance normalization and imposes sparsity regularization. By performing channel pruning on the convolutional layers of the generator, SlimDeblurGAN has a more compact and effective channel configuration of the convolutional layers (an illustrative sketch of this pruning scheme is given after this list). Furthermore, it has a smaller number of trainable parameters than DeblurGAN. Thus, its inference time is shorter than that of the original DeblurGAN, with only a small degradation in accuracy.
The code of the pruning framework for slimming DeblurGAN, SlimDeblurGAN, and YOLOv2, the two synthesized motion-blurred datasets, and the trained models are available to other researchers through our website (Dongguk drone motion blur datasets and the pretrained models, http://dm.dgu.edu/link.html), for fair comparisons.
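As referenced in the list above, the pruning framework relies on sparsity regularization of batch-normalization scale factors followed by removal of channels with small scale factors and subsequent fine-tuning. The following is a minimal, hypothetical sketch of that idea; the penalty weight, pruning ratio, and module structure are illustrative assumptions, not the settings used for SlimDeblurGAN, and rebuilding the narrower network and fine-tuning are omitted.

```python
# Illustrative sketch of BN-scale-based channel pruning (network-slimming style):
# an L1 penalty on BatchNorm gammas during training, then channels whose |gamma|
# falls below a global threshold are marked for removal.
import torch
import torch.nn as nn

def bn_sparsity_penalty(model, weight=1e-4):
    """L1 penalty on BatchNorm scale factors, added to the task loss during training."""
    penalty = sum(m.weight.abs().sum() for m in model.modules()
                  if isinstance(m, nn.BatchNorm2d))
    return weight * penalty

def select_channels_to_keep(model, prune_ratio=0.5):
    """Per BN layer, return a boolean mask of channels whose |gamma| survives
    a global threshold determined by the requested pruning ratio."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)
    return {name: (m.weight.detach().abs() > threshold)
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

if __name__ == "__main__":
    # Toy conv-BN block standing in for one generator layer.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                          nn.BatchNorm2d(16), nn.ReLU(inplace=True))
    loss = bn_sparsity_penalty(model)  # would be added to the full training loss
    masks = select_channels_to_keep(model, prune_ratio=0.5)
    print(loss.item(), {k: int(v.sum()) for k, v in masks.items()})
```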
2. Related Works
There are numerous studies on autonomous drone landing, which can be classified into two types—those not considering motion blurring and those considering motion blurring.
Not considering motion blurring: In the initial stages, researchers considered objects on the runway with a lamp to guide the UAV to determine a proper landing area. Gui et al. [1] proposed a vision-based navigation method for UAV landing, by setting up a system in which a near-infrared (NIR) light camera was integrated with a digital signal processor, and a 940-nm optical filter was used to detect NIR light-emitting diode (LED) lamps on the runway. Their method had a significant advantage in that it could work well not only in the daytime but also at nighttime. However, it required a complicated setup of four LEDs on the runway and could only be performed in a wide area; therefore, it failed to operate in narrow urban landing areas. Forster et al. [2] proposed a landing method that generates a 3D terrain depth map from the images captured by a downward-facing camera and determines a secure area for landing. This method was shown to work well in both indoor and outdoor environments. Nevertheless, the depth estimation algorithm was only tested at a maximum range of 5 m, and the method exhibited a slow processing speed. Two limitations of markerless methods are the difficulty of spotting a proper area for landing and the requirement of complicated setups for the landing area.
To solve these problems, marker-based methods were proposed. According to the type of features used, marker-based methods can be categorized into two kinds: handcrafted feature-based and deep feature-based methods. One handcrafted feature-based approach that was robust to low-light conditions adopted a thermal camera. Such methods perform well even in nighttime scenarios, by using the emission of infrared light from a target on the ground. However, they require the drone to carry an additional thermal camera, as thermal cameras are not available in conventional drone systems. Other handcrafted marker-based approaches are based on visible-light cameras. Lin et al. [4] proposed a method to track the relative position of the landing area using a single visible-light camera. They used an international H-pattern marker to guide drone landing in a cluttered shipboard environment. The characteristic of this method was that it could restore the marker from partial occlusion and correctly detect the marker against complicated backgrounds. Moreover, they adopted a Kalman filter to fuse the vision measurement with the inertial measurement unit (IMU) sensor outputs, to obtain a more accurate estimate. Following that approach, Lange et al. [5] introduced a method to control the landing position of autonomous multirotor UAVs. They also proposed a new hexagonal landing-pad pattern, consisting of concentric white rings on a black background, together with an algorithm to detect the contour rings of the landing pad. In addition, they used auxiliary sensors such as the SRF10 sonar sensor (Robot Electronics, Norfolk, UK), which accurately measured the current altitude above the ground, and the Avago ADNS-3080 optical flow sensor (Broadcom Inc., San Jose, CA, USA), which output the UAV's current velocity. These methods share the disadvantage of the previous one, namely the mandatory carrying of additional hardware, such as IMU, sonar, and optical flow sensors. Some previous studies investigated UAV landing on a moving platform [6,20]. These studies take into account the six-degrees-of-freedom (6-DOF) pose of the marker, by using special landing pads such as fiducial markers. They also investigated a landing scenario in which the markers were positioned on the deck of a ship or placed on a moving platform. Beyond landing on a fixed area, these methods not only solved the marker-tracking problem but also tackled the more challenging problem of landing on a moving target. However, this requires more calculation and the estimation of the relative position between the UAV and the moving target. Hence, these studies used SOTA computer vision methods, including multisensor fusion, tracking, and motion prediction of the landing target on the moving platform. Consequently, the limitation of such methods is the short working range, owing to the limited working range of the employed hardware. In particular, a previous study adopted the fiducial AprilTag [21] marker as the landing pad, owing to its robustness in difficult situations such as severe rotation, heavy occlusion, light variation, and low image resolution. Although this study successfully tracked the marker in daytime conditions, the maximum distance between the landing target and the UAV was only approximately 7 m.
Araar et al. [7] proposed a new solution for multirotor UAV landing, using a new landing pad and a relative-pose-estimation algorithm. In addition, they adopted two filters (an extended Kalman filter and an extended H∞ filter) to fuse the estimated pose and the inertial measurements. Although their method was highly accurate, it required information on the inertial measurements. Additionally, only indoor experiments were conducted, and the maximum working range was limited, owing to the drawback of the employed AprilTag marker. A novel idea was adopted in another study, which took advantage of cloud computing to overcome the limitations of the onboard hardware [11]. Specifically, the heavy computer vision computations were transferred to a cloud-based system, and the onboard system of the UAV only handled the returned results. Barták et al. [8] introduced a handcrafted marker-based method for drone landing. Handcrafted feature-based techniques, such as blob pattern recognition, were adopted to identify and recognize the landing target, and control algorithms were employed to navigate the drone to the appropriate target area. In this way, the method worked well in real-world environments. Nevertheless, their experiments were conducted only during daytime, and the maximum detection range was limited to 2 m. In an attempt to address autonomous UAV landing on a marine vehicle, Venugopalan et al. [9] proposed a method that adopted handcrafted feature-based techniques, such as color detection, shape detection, pattern recognition, and image recognition, to track the landing target. Additionally, a searching and landing algorithm and a state-machine-based method were proposed. Their method worked well, with a success rate of over 75%, even in difficult environmental conditions such as oscillatory motion of the landing target or wind disturbance. However, the testing distance between the landing target and the UAV in their experiments was short. Wubben et al. [10] proposed a method for accurate landing of UAVs based on ground pattern recognition. In their method, a UAV equipped with a low-cost camera could detect ArUco markers sized 56 × 56 cm from an altitude of up to 30 m. When the marker was detected, the UAV changed its flight behavior in order to land accurately on the position where the marker was located. Through experiments, they confirmed an average offset of only 11 cm from the target position, which vastly enhanced the landing accuracy compared to conventional global positioning system (GPS)-based landing, which typically deviates from the intended target by 1 to 3 m. Some researchers studied the autonomous landing of micro aerial vehicles (MAVs) using two visible-light cameras [12]. They performed a contour-based ellipse detection algorithm to track a circular landing-pad marker in the images obtained from a forward-facing camera. When the MAV was close to the target position, a downward-facing camera was used, because the view of the fixed forward-facing camera was limited. By using two cameras to extend the field of view of the MAV, the system could search for the landing pad even when it was not directly below the MAV. However, this method was only tested in an indoor scenario, which limited the working range.
In order to overcome the performance limitations of handcrafted feature-based methods, deep feature-based methods were introduced, which exhibited higher accuracy and an increased detection range. Nguyen et al. [13] proposed a marker tracking method for autonomous drone landing based on a visible-light camera on a drone. They proposed a variant of YOLOv2, named lightDenseYOLO, to predict the marker location, including its center and direction. In addition, they introduced Profile Checker V2 to improve accuracy. As a result, their method could operate with a maximum range of 50 m. Similarly, Yu et al. [14] introduced a deep-learning-based method for MAV autonomous landing systems, adopting a variant of the YOLO detector to detect landmarks. The system achieved high marker detection accuracy and exhibited robustness to various conditions, such as variations in landmarks under different lighting conditions and backgrounds. Despite achieving high performance in terms of detection range and accuracy, these methods did not consider low-resolution or motion-blurred input images. In another study, Polvara et al. [15] proposed a method based on deep reinforcement learning to solve the autonomous landing problem. Specifically, they adopted a hierarchy of double deep Q-networks used as high-level control policies to reach the landing target. Their experiments, however, were only conducted in indoor environments.
Recently, Truong et al. [16] proposed a super-resolution reconstruction (SR) marker detection method for autonomous drone landing, using a combination of SR and marker-detection deep CNNs to track the marker location. Their method successfully handled the obstacle of low-resolution input. Moreover, they introduced a cost-effective solution for autonomous drone landing, as their system required only a low-cost, low-resolution camera sensor instead of an expensive, high-resolution camera. Furthermore, their system could operate on an embedded system at real-time speed. However, they did not consider the case of motion blurring in the captured image. A low-resolution image results from a camera sensor with a small number of pixels, whereas motion blurring is determined by the f-number of the camera lens and the camera exposure time; a small f-number and a long exposure time cause a large amount of motion blurring in the captured image. Motion blurring frequently occurs in images captured by a drone camera, because the images are captured while the drone is moving or landing. Therefore, we propose a new method of motion deblurring and marker detection for drone landing, which is completely different from the previous work [16] that considered only SR of the low-resolution image from the drone camera, without motion deblurring. In addition, we propose a new network, SlimDeblurGAN, for motion deblurring, which differs from the previous work [16] that used a deep CNN with a residual net, skip connection, and network-in-network (DCSCN) for SR.
Considering motion blurring: All of the previous methods exhibited promising solutions for autonomous landing. They conducted experiments in various scenarios, such as indoor, outdoor, daytime, and nighttime, as well as in difficult conditions such as low light and low-resolution input. However, input images affected by motion blur, which frequently occurs owing to the movement of the drone, were not considered in those studies. Therefore, we propose a deep-learning-based motion deblurring and marker detection method for drone landing. The studies in [13,14] addressed marker detection by a drone camera but did not consider motion blurring in the captured image, which differs from our research considering motion deblurring. The research in [20] dealt with motion blurring in images captured by a UAV, but did not measure the accuracy of marker detection or the processing speed on an actual embedded system for a drone. In contrast, we measured the marker detection accuracy of our method and compared it with the state-of-the-art methods; in addition, we measured the processing speed of marker detection by our method on an actual embedded system suitable for onboard drone processing and compared it with the state-of-the-art methods. The research in [19] studied the detection of motion-blurred vehicle logos; however, its target was only logo detection, which differs from our research on marker detection by a drone camera. Although the methods in [13,14,21] achieved 99% accuracy for landmark or marker detection in field experiments, they assumed only slow movement or landing of the drone, which did not generate motion blurring. However, in the actual case of drone movement or landing at normal speed, motion blurring occurs frequently, as mentioned in [20].
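For reference, the following hypothetical sketch makes the distinction discussed above concrete: low resolution is simulated by downsampling (fewer effective sensor pixels), whereas motion blur is simulated by convolving a frame with a line kernel that approximates camera motion integrated over the exposure time. The scale factor and kernel length are arbitrary illustrative values, not parameters of our synthesized datasets.

```python
# Contrasting the two degradations: low resolution versus linear motion blur.
import numpy as np
import cv2

def degrade_low_resolution(image, scale=4):
    """Downsample then upsample, losing high-frequency detail."""
    h, w = image.shape[:2]
    small = cv2.resize(image, (w // scale, h // scale), interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)

def degrade_motion_blur(image, length=15):
    """Convolve with a horizontal line kernel, approximating linear camera motion."""
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0 / length
    return cv2.filter2D(image, -1, kernel)

if __name__ == "__main__":
    frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # stand-in frame
    low_res = degrade_low_resolution(frame)
    blurred = degrade_motion_blur(frame)
    print(low_res.shape, blurred.shape)
```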
Table 1 presents a comparison of the proposed and previous methods.