1. Introduction
Historically, Unmanned Aerial Vehicles (UAVs) have primarily been used for military applications. More recently, the use of UAVs in the civilian domain as remote sensing tools presents new and exciting opportunities. Improvements in the availability of accurate and miniature Global Positioning Systems (GPS) and Inertial Measurement Units (IMUs), along with the availability of quality off-the-shelf consumer grade digital cameras and other miniature sensors, have resulted in an increased use of civilian UAVs [1]. The highest spatial resolution data available from conventional platforms, such as satellites and manned aircraft, is typically in the range of 20–50 cm/pixel. UAVs are capable of flying much lower and hence can collect imagery at a much higher resolution [2,3], often at a sub-decimetre resolution, even as detailed as 1 cm/pixel. The temporal resolution of conventional systems is limited by the availability of aircraft platforms and the orbit characteristics of satellites. For the purpose of monitoring highly dynamic vegetation, satellite sensors are often limited due to unfavourable re-visit times [4].
Many studies have successfully used UAVs to map and monitor areas of vegetation that are of agricultural and/or environmental interest, see for example [5–8]. Johnson et al. [6] used a small fixed wing UAV to collect imagery over a commercial vineyard in California. The imagery had a spatial resolution of 20 cm/pixel and was processed to segment the scenes into vegetation and soil areas and to subsequently calculate percentage vegetation cover. Monitoring of small plots within wheat crops in southwest France [7] is another example of UAVs assisting with agricultural processes. Lelong et al. [7] used a modified digital camera to collect imagery in four bands (red, green, blue and near-infrared) to enable the calculation of vegetation indices such as the Normalized Difference Vegetation Index (NDVI).
In an environmental monitoring context, Rango et al. [8] deployed a fixed wing UAV in the rangelands of southern New Mexico, acquiring imagery at a 5–6 cm/pixel resolution. Laliberte [9] also collected imagery of the New Mexico rangelands, additionally using a six band multispectral camera to capture high resolution data in the near infrared. Imagery of such high spatial resolution can provide a lot of information, such as detailed area of vegetation and bare soil coverage, composition by functional or structural group, spatial distribution of plants, inter canopy gaps and, in some cases, vegetation type [10]. In another study, Dunford et al. [5] used a paraglider type UAV to acquire imagery with a spatial resolution of 6–21 cm/pixel over 179 ha of riparian forest in France. An object-based classification approach was then found to be the most accurate classifier for the detection of dead wood within the forested area [5].
Despite significant evidence highlighting the value of UAVs in the fields of precision agriculture and environmental monitoring, the collection of ultra-high resolution UAV imagery presents a number of challenges. Due to the relatively low flying height (e.g., 50–120 m) of micro-UAVs (<5 kg), the images have a small footprint (e.g., 50 × 40 m when flying at 50 m above ground level with a typical camera and lens configuration). This necessitates the capture of a large number of images to achieve the spatial coverage required for many applications. For example, a single flight covering approximately 2 ha can yield around 150–200 images. To maximise the potential of the UAV technology for environmental and agricultural applications, it is essential that an automated, efficient, and accurate technique be developed to rectify and mosaic the large volume of images generated.
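To put the footprint figures in context, the ground sample distance (GSD) and image footprint follow directly from flying height, focal length, pixel pitch and sensor dimensions. The sketch below illustrates the relationship; the camera parameters used are assumed values typical of a consumer grade camera, not necessarily those of the camera used in this study.

```python
def ground_footprint(height_m, focal_mm, pixel_pitch_um, width_px, height_px):
    """Approximate GSD (m/pixel) and ground footprint (m) for a nadir image."""
    gsd = (pixel_pitch_um * 1e-6) * height_m / (focal_mm * 1e-3)
    return gsd, gsd * width_px, gsd * height_px

# Assumed example: 50 m flying height, 8 mm lens, 2 um pixel pitch, 4000 x 3000 sensor
gsd, along, across = ground_footprint(50.0, 8.0, 2.0, 4000, 3000)
print(f"GSD ~ {gsd * 100:.1f} cm/pixel, footprint ~ {along:.0f} m x {across:.0f} m")
# -> GSD ~ 1.2 cm/pixel, footprint ~ 50 m x 38 m
```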
There are fundamental differences between imagery collected by a UAV flying at low altitude compared to that collected by a traditional aerial platform flying at higher altitudes. UAV imagery is often collected in a haphazard manner (i.e., flight lines with variable overlap and cross-over points); it has large rotational and angular variations between images [11]; the altitude of the platform is low in relation to the height variation within the scene, causing large perspective distortions [11]; and the exterior orientation (EO) parameters are either unknown or, if measured, they are likely to be inaccurate. UAV imagery often has high variability in illumination, occlusions and variations in resolution [12], which are characteristics more typical of those usually presented in close-range photogrammetry applications [13]. Hence, UAV photography has characteristics of both traditional aerial photography and terrestrial photography, and there are opportunities to use image processing algorithms that are applicable to both types of imagery, as suggested by Barazzetti et al. [12].
Recently there have been advances in the realm of Computer Vision (CV), resulting in new algorithms for processing terrestrial photography. Examples are the powerful Scale Invariant Feature Transform (SIFT) [14] feature detector, and the Structure from Motion (SfM) algorithms that make use of SIFT features to create 3D models from a series of overlapping photos [15]. SIFT is a region detector, rather than an interest point extractor that would typically be used by traditional photogrammetric software [16]. As a region detector, it has been demonstrated that SIFT is applicable to UAV imagery due to its robustness against changes in rotation, scale, and translation between images [16].
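As a concrete illustration of this step (a sketch only, not the specific implementation used in this study), SIFT keypoints can be extracted and matched between two overlapping UAV frames with a recent OpenCV build; the file names and ratio test threshold below are assumptions.

```python
import cv2

# Two overlapping UAV frames (file names are placeholders)
img1 = cv2.imread("frame_001.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_002.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                       # SIFT detector/descriptor
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Keep only unambiguous correspondences (Lowe's ratio test)
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

print(f"{len(good)} candidate tie point correspondences found")
```

The surviving correspondences are exactly the kind of features that SfM pipelines use to recover relative camera geometry and a sparse 3D point cloud.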
The standard approach in modern photogrammetry is to employ a Bundle Block Adjustment (BBA) to solve for the exterior orientation of each photograph and, if required and provided the geometry of the block of photographs allows it, to solve for additional parameters such as the interior orientation (IO). An introduction to the BBA is provided by, e.g., Wolf and Dewitt [17]. Most commonly, metric mapping cameras are used for aerial photography, for which the IO parameters are known. UAV imagery is typically collected with consumer grade cameras for which the IO parameters are neither known nor stable. Measured values for EO parameters, typically captured at relatively low accuracy in the case of UAV photography, can be included in the BBA and provide approximate measurements for the bundle adjustment [18].
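For context, the BBA is built around the collinearity condition, which relates an image measurement to the corresponding ground point through the camera's IO and EO parameters. In the textbook notation of, e.g., Wolf and Dewitt [17] (the symbols below follow that convention and are not notation introduced by this paper):

\[
x_a = x_0 - f\,\frac{m_{11}(X_A - X_L) + m_{12}(Y_A - Y_L) + m_{13}(Z_A - Z_L)}{m_{31}(X_A - X_L) + m_{32}(Y_A - Y_L) + m_{33}(Z_A - Z_L)},
\qquad
y_a = y_0 - f\,\frac{m_{21}(X_A - X_L) + m_{22}(Y_A - Y_L) + m_{23}(Z_A - Z_L)}{m_{31}(X_A - X_L) + m_{32}(Y_A - Y_L) + m_{33}(Z_A - Z_L)},
\]

where (x_a, y_a) is the measured image coordinate of point A, (x_0, y_0, f) are the IO parameters (principal point and focal length), (X_L, Y_L, Z_L) is the camera position, (X_A, Y_A, Z_A) is the ground point, and m_ij are the elements of the rotation matrix formed from the camera's angular EO parameters. The BBA linearizes these equations and solves for the unknown EO (and, where the block geometry allows, IO) parameters by least squares over all tie point and control point observations.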
Increasingly, in the case of traditional aerial photogrammetry, the position and orientation of the camera can be derived from GPS and IMU data with sufficient accuracy to allow direct georeferencing without the need for Ground Control Points (GCPs). Often if ground control is available it is primarily used to ensure a reliable transformation from the GPS based coordinate system into the required map coordinate system. This is not the case for UAV photography because of the lower accuracy of the GPS/IMU data and because of the very large scale of the imagery and map products.
Tie/pass points are required to complete a BBA and, in the case of traditional aerial photography, are typically generated automatically by an interest point extractor algorithm. For UAV imagery, a SIFT algorithm can be used and has the potential to generate a large number of features that can be used as tie/pass points, supplying more redundant observations for a BBA and thus improving the accuracy of the results [11].
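To illustrate how pairwise feature matches become tie/pass points, correspondences can be chained across the image set into multi-image tracks, each of which contributes one object point with several image observations to the BBA. The union-find sketch below is illustrative only; the input match structure is an assumption rather than the output of any particular package.

```python
from collections import defaultdict

def build_tracks(pairwise_matches):
    """Chain pairwise feature matches into multi-image tie point tracks.

    pairwise_matches: iterable of ((img_i, kp_i), (img_j, kp_j)) tuples, each
    identifying two matched keypoints in two different images.
    Returns a list of tracks, each a set of (image_id, keypoint_id) observations.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for obs_a, obs_b in pairwise_matches:
        union(obs_a, obs_b)

    groups = defaultdict(set)
    for obs in parent:
        groups[find(obs)].add(obs)

    # A usable tie point must be observed in at least two different images
    return [t for t in groups.values() if len({img for img, _ in t}) >= 2]
```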
Table 1 clearly demonstrates that with UAV imagery, the IO and EO parameters are often not well known, making the use of a traditional BBA problematic or, at least, more similar to terrestrial or close-range photogrammetry. Attempts have been made to overcome these limitations by developing techniques that work specifically with UAV imagery. Berni et al. [4] used onboard IMU and GPS data to estimate the camera's approximate EO parameters, which were then imported into traditional photogrammetric software along with calibrated images to create a mosaic. The images collected had a high level of overlap, allowing only the central part of each image to be used, thus avoiding the extremities where the view angle caused perspective distortions [4]. A minimum number of GCPs were then manually measured and an aerotriangulation performed. Berni et al. [4] were then able to use an existing Digital Terrain Model (DTM) to generate an orthomosaic; however, no overall spatial accuracy for this method was reported.
Laliberte et al. [19] developed a method that relied on an existing underlying orthorectified photo and DTM. They initially estimated camera EO parameters from onboard sensors and then iteratively matched each individual image with the existing orthophoto to improve the accuracy of the EO parameters and provide GCPs based on matched features between images. After many iterations of this process, photogrammetric software used the EO parameters and GCPs to orthorectify the images and generate a seamless mosaic. Laliberte et al. [19] identified that their methodology has a number of limitations: it requires pre-existing orthophotos that can quickly become out of date; the 10 m DEMs used for orthorectification were not detailed enough compared to the resolution of the UAV imagery; it suffered from problems finding accurate EO parameters; and the accuracy of the automatically generated tie points was variable. The overall accuracy of the method was reported as an RMS error of 0.48 m (corresponding to ∼10 pixels); however, it was acknowledged that the method had only been tested over relatively flat terrain and algorithm performance in areas with higher vertical variability had not been confirmed [19].
Bryson et al. [31] presented a georectification and mosaicking technique that used onboard IMU/GPS data to initially estimate camera pose; image features were then matched across the image dataset. A bundle adjustment used the initial camera pose estimates and the matched features to refine the camera poses; the images were subsequently rectified and mosaicked using these poses. The method described by Bryson et al. [31] is similar to the method that we propose in that it uses similar processes (e.g., bundle adjustment, feature matching). However, there are significant differences in the platform used (rotary wing versus fixed wing) and the resolution of the imagery collected. Also, in this study we do not use onboard IMU data; we can automatically identify GCPs, and we integrate the use of multiview stereopsis algorithms into the solution.
These techniques performed well, but many are based on traditional photogrammetric software designed to process imagery collected from conventional platforms. Some of these techniques have key disadvantages: they use existing underlying DTMs and base orthophotos, they rely on complex workflows to estimate camera EO parameters, and, in some cases, they require human intervention to identify GCPs.
In this study, we describe a methodology for geometric image correction that uses new CV and SfM algorithms that are more applicable to UAV photography. The technique is fully automated and can directly georeference and rectify the imagery with only low accuracy camera positions, resulting in UAV image mosaics in real-world coordinates. Alternatively, GCPs can be automatically identified to improve the spatial accuracy of the final product. The automation and simplicity of our technique is ideally suited to UAV operations that generate large image data sets that require rectification and mosaicking prior to subsequent analysis.
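As an illustration of how low accuracy camera positions alone can place an SfM result into real-world coordinates, one common approach is to estimate a seven-parameter similarity (Helmert) transform between the camera positions recovered by SfM in arbitrary model coordinates and the corresponding GPS-logged positions. The sketch below uses the SVD-based Umeyama estimator and is indicative only; it is not necessarily the exact formulation adopted in our workflow.

```python
import numpy as np

def similarity_transform(model_pts, world_pts):
    """Estimate scale s, rotation R and translation t such that
    world ~= s * R @ model + t (seven-parameter Helmert transform).

    model_pts, world_pts: (N, 3) arrays of corresponding points, e.g., SfM
    camera positions (model space) and GPS-logged camera positions (world space).
    """
    model_pts = np.asarray(model_pts, dtype=float)
    world_pts = np.asarray(world_pts, dtype=float)
    n = len(model_pts)

    mu_m, mu_w = model_pts.mean(axis=0), world_pts.mean(axis=0)
    Mc, Wc = model_pts - mu_m, world_pts - mu_w

    # Cross-covariance and its SVD give the least-squares rotation
    U, D, Vt = np.linalg.svd(Wc.T @ Mc / n)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) * n / (Mc ** 2).sum()
    t = mu_w - s * R @ mu_m
    return s, R, t
```

Once estimated, the same transform can be applied to the dense point cloud and refined camera poses so that all derived products are expressed in the map coordinate system; absolute accuracy is then governed largely by the quality of the camera position measurements unless GCPs are introduced.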