1. Introduction
Wetlands have been identified as valuable resources that provide a variety of ecological and socioeconomic benefits [
1], but they are also threatened due to human activities, such as agricultural intensification and climate change [
2]. These threats and others make monitoring the spatiotemporal variation of wetlands’ hydrological processes crucial to their effective management. Here, hydrological processes refer to the highly variable wetland environments characterized by hydric soils that are temporarily or permanently flooded. When dry, wetlands resemble surrounding uplands, whereas when inundated, they can have either moist soils or surface water that ranges from centimeters to meters deep. Wetland cover classes are also highly diverse: some inundated wetlands are filled with emergent or submerged vegetation, whereas others lack vegetation entirely.
Though the dynamic nature of wetlands makes them ecologically valuable to numerous flora and fauna, this also makes them difficult to monitor [
3,
4]. Monitoring depressional wetlands can also be challenging because these highly dynamic systems are primarily dependent on climate and local weather systems for ponding, and can often be relatively small (<40 ha) [
5,
6]. The interplay among water, vegetation, and soil results in wetlands that share spectral reflectance characteristics of both aquatic and terrestrial environments. Accurate and unbiased estimates of wetland surface water across the range of natural conditions have therefore eluded scientists.
The Prairie Pothole Region (PPR) is one example of a high-risk, dynamic wetland system composed of millions of temporary, seasonal, and semi-permanent depressional wetlands, called potholes. These potholes are known for their cycles of drought and deluge, which drive important ecosystem functions, such as the abundance of aquatic invertebrates [
5]. The PPR covers an extensive area of approximately 750,000 km², including parts of five US states and three Canadian provinces (
Figure 1), and provides habitat for over 50% of North America’s migratory waterfowl [
7,
8]. Hydroperiods in the potholes vary from days to years, but seasonal wetlands that maintain water for less than four months are common [
9,
10]. Reduced surface water area and altered hydrology are common in PPR wetlands. Causes include tile drainage installed to increase agricultural production [
11] and upland sediment erosion into wetlands, a natural process that is often accelerated by agricultural activity and that fills potholes and reduces their volume [
12]. The total wetland loss in the PPR caused by climate change and human activity was estimated to be 30,000 ha between 1997 and 2009 [
10]. A resulting shift towards smaller wetlands and shortened hydroperiods [
13,
14,
15] has underscored a need to understand how these altered hydrological conditions affect ecosystem services and habitat provisioning at broad spatial scales, which starts with an accurate and repeatable estimate of spatial variation in wetland surface water.
Remote sensing analysis can provide broad-scale spatial and temporal information about wetland surface water [
16,
17]. Previous studies utilized various remote sensing technologies to monitor wetlands across the PPR [
8,
18]. For example, [
8] used high-resolution National Agriculture Imagery Program (NAIP) data and LiDAR-derived digital elevation models (DEMs) to map PPR wetland inundation, and tested the results against the US Fish and Wildlife Service National Wetlands Inventory (NWI). However, although NAIP and LiDAR DEMs can provide fine spatial resolution data (<1 m), these methods cannot capture temporal variation within a season, because NAIP and LiDAR data are not collected intra-annually. Optical sensors, such as Sentinel-2 and Landsat, can detect surface water, and have often been used with success for deep, permanent, large water bodies [
19,
20]. For example, the Joint Research Centre (JRC) provided Landsat-derived surface water products useful for capturing large wetlands. However, the JRC and other products that rely on moderate resolution spectral data often underperform in detecting water in small potholes with dense vegetation canopies and mixed pixels. Others have used Sentinel-1 synthetic aperture radar (SAR) data (spatial resolution: 10 m) to map water extent in the PPR with reasonable success [
21,
22], as SAR data is robust to cloud cover, and 10 m data provide reasonable spatial resolution. However, no study has solved all of the challenges for mapping the spatial and temporal variation of surface water in the PPR, and made their algorithm available for long-term monitoring by the research and conservation community. There is a need for open-science algorithms that capture the variation of surface water, can map water even below emergent vegetation, and still represent surface water in smaller potholes.
This study relies on geospatial informatics, an expanding field that includes remote sensing of landscape-scale big data, the development of machine learning tools, and integration with high-performance computing (HPC) cloud resources. Geospatial informatics enables fast processing of broad-scale remote sensing data, supports a more comprehensive set of applications, and addresses the limitations of traditional methods [
23,
24]. The Google Earth Engine (GEE) cloud geospatial computing platform provides a web-based interface to fast parallel processing on Google HPCs with planetary-scale analysis capabilities. The GEE provides a multi-petabyte catalog of global satellite and geospatial datasets [
25], such as Landsat, MODIS, and Sentinels. It also gives users the ability to analyze, manipulate, and map the results, and create web-based applications to repeat the analysis [
26]. As part of our work, we utilized the capabilities of GEE to create an open-source algorithm for mapping wetlands that can readily be shared with conservation managers and the science community for continued use and development.
To help solve the historical problems of surface water mapping in the PPR, this paper presents a multi-sensor fusion approach that integrates selected fine-resolution (10-m) bands of Sentinel-2 with 10-m Sentinel-1 SAR data, allowing an estimate of both large and small inundated areas. The integration of SAR with optical data also offers complementary information, and can significantly improve the interpretation and classification of results [
27,
28], for example, by allowing surface water estimates beneath closed-canopy herbaceous vegetation. Altogether, this study aims to provide scalable surface water estimates that can assist with habitat models for wetland-dependent organisms, such as waterbirds or aquatic invertebrates. We provide our algorithm in a format that can be freely shared and readily implemented by those with minimal coding and modeling experience, such as conservation managers. We pursued this aim through the following objectives: (1) we developed an open-source framework to map the spatial variation in wetland surface inundation and vegetation based on Sentinel-1 SAR data and Sentinel-2 high-resolution bands within the GEE platform; (2) we deployed this algorithm over a high-priority conservation area within the PPR; (3) we analyzed the accuracy of this algorithm for generating the information needed for setting conservation targets.
3. Data
The data include a set of aerial imagery that serves as ground truth, the high-resolution bands (bands 2, 3, 4, and 8) of Sentinel-2, and C-band SAR data from the Sentinel-1 sensor. We describe each dataset below.
3.1. Ground Truth
Researchers from Ducks Unlimited Inc., a non-profit conservation organization, provided the ground truth data. These data include georeferenced aerial photographs of the PPR wetlands in North Dakota collected through a partnership with the United States Fish and Wildlife Service (USFWS). The USFWS used a fixed-wing aircraft to collect imagery at a 1.5 m spatial resolution. Where necessary, technicians or research scientists orthorectified the images, which were then used to estimate wet areas during spring and summer for the research projects. We used the summer data of two years (2016 and 2017). These datasets were provided in shapefile format, and showed wetland boundaries, delineating dry and inundated wetland areas. Some of these wetlands also contained emergent vegetation cover, as identified by field observers (range: 0–80% vegetation cover).
We examined the spectral reflectance of wetland and non-wetland classes, which differed substantially, as indicated by a plot generated for a portion of the study area (
Figure 2). The spectral characteristics of wetlands and open water especially differ due to mixed pixels, differences in water depth, the potential presence of vegetation, and variation in water turbidity. Compared to forest and agriculture, deep open water exhibited lower spectral reflectance, as water rapidly absorbs electromagnetic radiation, especially longer wavelengths, and attenuation increases with water depth. The spectral reflectance of wetlands is intermediate between that of upland vegetation and open water, making wetlands a distinct and highly variable land cover type. Wetlands and moist soils show a dampened near-infrared (NIR) and shortwave infrared (SWIR) reflectance compared with upland vegetation, but are too shallow to attenuate all electromagnetic radiation, as often occurs in deep open water. The spectral characteristics of wetlands also change rapidly with inundation and vegetation status. To account for this in our ground truth point selection, we selected random points within the digitized wetland surface water polygons to provide the ground truth pixels in GEE. We also included non-wetland training data that represented agriculture, forest, and urban areas, and collected those points by visual inspection of high-resolution Google Earth images. The total numbers of points (including wetland and non-wetland classes) for 2016 and 2017 were 895 and 2231, respectively.
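The random selection of ground truth points inside digitized surface water polygons can be illustrated with a minimal sketch. This is not the study's GEE implementation: the polygon coordinates, the function names, and the use of rejection sampling with a ray-casting test are all assumptions made for illustration.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_points_in_polygon(polygon, n_points, seed=0):
    """Rejection sampling: draw uniform points in the bounding box, keep those inside."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n_points:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points

# Hypothetical L-shaped surface water polygon in projected meters
wetland = [(0, 0), (40, 0), (40, 20), (20, 20), (20, 40), (0, 40)]
training_points = sample_points_in_polygon(wetland, n_points=50, seed=42)
```

Fixing the seed keeps the sample reproducible, which helps when the same training points must be reused across classifiers.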
Additionally, we provided two out-of-sample subsets for small vegetated wetlands (1440 points) and small non-vegetated wetlands (1680 points). Ducks Unlimited provided the vegetation data within the surface water polygons. We used those additional points in a separate accuracy assessment to evaluate the performance of our method for the smallest wetlands, which are the most challenging to classify because they contain the highest proportion of mixed pixels. The average time difference between ground truth data (wetlands and non-wetlands) and satellite data acquisition was one month.
3.2. Sentinel-1
Sentinel-1 obtains C-band synthetic aperture radar (SAR) images at various polarizations and resolutions. C-band Level-1 Ground Range Detected (GRD) data were obtained through GEE. These data were collected in the Interferometric Wide (IW) swath mode with a spatial resolution of 10 m, a swath width of 250 km, and a repeat cycle of 12 days. These data are available in GEE as preprocessed datasets that express each pixel’s backscatter coefficient (σ°) in decibels (dB). The preprocessing steps include applying orbit files, thermal noise removal, radiometric calibration, and orthorectification (terrain correction). This study used two polarization modes: co-polarization with vertical transmit and vertical receive (VV), and cross-polarization with vertical transmit and horizontal receive (VH). A total of 20 ascending orbit Sentinel-1 SAR scenes spanning two months were collected over the study area. We used median values of the S1 time series in the multisensor band composite. A median composite can provide a cleaner image with reduced speckle noise [
29]. These data were acquired from July to September 2016. The descending orbit data were excluded from the study because they lacked sufficient orbital coverage over the study area (
Table 1). Unlike optical sensors, SAR data can be acquired day and night and during cloudy conditions, completely independent of solar radiation, which is particularly important in high latitudes, and increases the availability of multi-temporal observations for assessing wetland hydroperiods. Moreover, SAR data is sensitive to both open water and below-canopy inundation, making it advantageous to identify inundation in vegetated wetlands [
30]. The C-band SAR data of Sentinel-1 is also known to be useful for the discrimination of water and non-water classes in non-forested wetlands with short herbaceous vegetation (e.g., bog and fen) [
31]. This is in contrast to the longer wavelengths, such as L-band SAR data, that are preferred to detect inundation areas in forests due to higher penetration depth [
32].
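The median compositing of the Sentinel-1 time series described above can be sketched as a per-pixel median over a stack of images. The backscatter values and the function name below are hypothetical; in GEE the same operation would typically be delegated to the platform's own collection reducers rather than written by hand.

```python
from statistics import median

def median_composite(stack):
    """Per-pixel median over a time series of equally sized 2-D images (lists of rows)."""
    n_rows = len(stack[0])
    n_cols = len(stack[0][0])
    return [
        [median(image[r][c] for image in stack) for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical VV backscatter (dB) for a 2 x 2 window on three dates.
# The -5.0 on the second date mimics a speckle-like outlier over open water.
vv_stack = [
    [[-21.0, -20.5], [-9.0, -8.5]],
    [[-5.0, -20.0], [-9.5, -8.0]],
    [[-20.0, -21.5], [-8.0, -9.0]],
]
vv_median = median_composite(vv_stack)
```

Because the median ignores extreme values, the single bright outlier in the top-left pixel does not propagate into the composite, which is the property that makes the median attractive for speckle reduction.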
3.3. Sentinel-2
We used a total of 118 Sentinel-2 (S2) Level-1C images as part of this study. S2 is a wide-swath multi-spectral earth observation mission with spatial resolution varying from 10 to 60 m. The multi-spectral data include 13 bands in the visible, near-infrared (NIR), and shortwave infrared (SWIR) spectra, with a revisit time of 10 days under the same viewing angle. The Level-1C products within GEE are orthorectified and radiometrically corrected, providing top-of-atmosphere (TOA) reflectance values. We adopted an automatic cloud masking procedure using the QA60 band of the S2 Level-1C product to mask opaque and cirrus clouds. We also limited the S2 scenes to those with a maximum cloud coverage of 10 percent over the time of data acquisition. Due to frequent cloud coverage over the study area, we used a median of 5 months (May to October 2016) of the reflectance values. We used four bands of S2 (blue, green, red, and near-infrared) with a spatial resolution of 10 m to create the band composites for supervised classification using machine learning algorithms. The median values of the S2 time series were used in the multisensor band composite. Additionally, we calculated the normalized difference vegetation index (NDVI) [
33] and normalized difference water index (NDWI) [
34] using the four bands of S2, and used them as predictors in the classification process (
Figure 3).
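The QA60-based cloud masking step can be sketched with a simple bit test. The sketch assumes the commonly documented QA60 layout, in which bit 10 flags opaque clouds and bit 11 flags cirrus clouds; the reflectance values and function names are hypothetical and this is not the study's GEE code.

```python
OPAQUE_CLOUD_BIT = 1 << 10  # assumed QA60 bit 10: opaque clouds
CIRRUS_BIT = 1 << 11        # assumed QA60 bit 11: cirrus clouds
CLOUD_BITS = OPAQUE_CLOUD_BIT | CIRRUS_BIT

def is_clear(qa60_value):
    """True when neither the opaque nor the cirrus cloud bit is set."""
    return (qa60_value & CLOUD_BITS) == 0

def mask_clouds(reflectance, qa60):
    """Return the reflectance grid with cloudy pixels replaced by None (masked)."""
    return [
        [value if is_clear(flag) else None for value, flag in zip(row, qa_row)]
        for row, qa_row in zip(reflectance, qa60)
    ]

# Hypothetical 2 x 2 tile of NIR reflectance and its QA60 values
nir = [[0.31, 0.28], [0.05, 0.33]]
qa60 = [[0, 1024], [0, 2048]]
nir_clear = mask_clouds(nir, qa60)
```

Masked pixels would then be skipped when the temporal median is computed, so that only cloud-free observations contribute to the composite.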
Figure 4 shows the variation of NDWI over two potholes in the study area, showing periods of inundation and drought. Typically, NDWI values above and below 0.3 indicate the presence and absence of detectable surface water, respectively [
35].
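The two spectral indices and the 0.3 threshold can be written out as a short sketch. It assumes the widely used definitions NDVI = (NIR − Red) / (NIR + Red) and NDWI = (Green − NIR) / (Green + NIR), which may differ in detail from the formulations in the cited references; the reflectance values are hypothetical.

```python
WATER_THRESHOLD = 0.3  # NDWI threshold quoted in the text

def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when both inputs are zero."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total

def ndvi(nir, red):
    """Normalized difference vegetation index (assumed NIR/red form)."""
    return normalized_difference(nir, red)

def ndwi(green, nir):
    """Normalized difference water index (assumed green/NIR form)."""
    return normalized_difference(green, nir)

def has_surface_water(green, nir, threshold=WATER_THRESHOLD):
    """True when NDWI exceeds the threshold."""
    return ndwi(green, nir) > threshold

# Hypothetical TOA reflectances for Sentinel-2 bands 3 (green), 4 (red), 8 (NIR)
water_ndwi = ndwi(green=0.08, nir=0.02)      # open water: strongly positive
crop_ndwi = ndwi(green=0.10, nir=0.40)       # dense vegetation: negative
crop_ndvi = ndvi(nir=0.40, red=0.05)         # dense vegetation: high NDVI
```

In this toy case the open water pixel passes the threshold and the vegetated pixel does not, which also illustrates why NDWI alone can miss water beneath a vegetation canopy.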
3.4. JRC Global Surface Water Products
This study focused on depressional wetlands that, by definition, are not permanent water, and often change inundation status quickly due to climate variability. We used the JRC product to differentiate wetlands from permanent water bodies across the entire study area. The Joint Research Centre’s Global Surface Water (JRC GSW) product contains the surface water’s spatial and temporal distribution at 30 m resolution. The product provides different characteristics of surface water, including occurrence, intensity, seasonality, recurrence, transitions, and maximum water extent [
36]. The JRC GSW data were generated using more than 3 million scenes from various Landsat missions (Landsat 5, 7, and 8) between 1984 and 2019. The pixels were classified into water and non-water classes using an expert system. JRC GSW presents results each month for the entire period (1984–2019) for change detection. We defined permanent water bodies as those classified as water in >90% of the observations within the period (1984–2019). These permanent water pixels were excluded from the final results so that the mapped surface water belongs only to wetlands.
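The permanent water rule (water in more than 90% of observations) can be expressed as a small sketch. The observation series below are invented for illustration, and treating missing observations as excluded from the denominator is an assumption rather than a documented detail of the study.

```python
def water_occurrence(observations):
    """Fraction of valid observations classed as water.

    Each observation is 1 (water), 0 (not water), or None (no valid data).
    Returns None when there are no valid observations.
    """
    valid = [obs for obs in observations if obs is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)

def is_permanent_water(observations, threshold=0.9):
    """True when the pixel is water in more than `threshold` of valid observations."""
    occurrence = water_occurrence(observations)
    return occurrence is not None and occurrence > threshold

# Hypothetical monthly series for two pixels
lake_pixel = [1] * 19 + [0]                              # water in 95% of observations
pothole_pixel = [1, 1, 0, 0, None, 1, 0, 0, 0, 0]        # water in 3 of 9 valid observations

lake_is_permanent = is_permanent_water(lake_pixel)        # excluded from the wetland map
pothole_is_permanent = is_permanent_water(pothole_pixel)  # retained as candidate wetland
```

The strict inequality mirrors the ">90%" wording, so a pixel that is wet in exactly 90% of observations is retained.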
5. Results
The trained SVM and RF algorithms were used to classify multisensor composites for the years 2016 and 2017. The accuracy assessment showed that both models yielded favorable results on the testing data, but the RF outperformed the SVM in both years. Therefore, the RF model was selected as the optimum model for wetland inundation mapping. The overall testing accuracies of the SVM and RF models were 0.88 and 0.95 for 2016 (
Table 4), and 0.88 and 0.94 for 2017 (
Table 5). A summary of the accuracy assessment, including overall accuracy, Kappa, sensitivity, and specificity, for the years 2016 and 2017 is shown in
Table 4 and
Table 5, respectively.
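For readers unfamiliar with the reported statistics, the following sketch shows how overall accuracy, Kappa, sensitivity, and specificity follow from a binary confusion matrix. The counts are hypothetical and are not the study's results; the function name is likewise illustrative.

```python
def binary_accuracy_metrics(tp, fn, fp, tn):
    """Accuracy statistics for a two-class (wetland vs. non-wetland) confusion matrix.

    tp: reference wetland predicted wetland      fn: reference wetland predicted non-wetland
    fp: reference non-wetland predicted wetland  tn: reference non-wetland predicted non-wetland
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # share of reference wetland points detected
    specificity = tn / (tn + fp)   # share of reference non-wetland points rejected
    # Agreement expected by chance, from the row and column totals
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }

# Hypothetical test set of 200 points
metrics = binary_accuracy_metrics(tp=90, fn=10, fp=5, tn=95)
```

With these counts the overall accuracy is 0.925 while Kappa is 0.85, which shows how Kappa discounts the agreement that a random labeling with the same class proportions would achieve.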
We mapped wetland surface water across the study area for the years 2016 and 2017.
Figure 6 and
Figure 7 show the identified wetlands across the study area for the years 2016 and 2017, respectively. The spatial resolution of the final maps is 10 m. The local wetland inundation in the study area can also be extracted based on the results. For example, a portion of the study area is magnified in
Figure 8. A visual comparison between the aerial survey (ground truth data) and wetland surface water map (based on RF classifier) in
Figure 9 shows that surface water in wetlands was mapped with acceptable accuracy (overall accuracy: 0.95; Kappa: 0.9). As noted in Section 3.1, we also tested how well our algorithm identifies surface water in small vegetated and small non-vegetated wetlands, and compared the results with NDWI and Landsat-derived JRC surface water products (
Table 6 and
Table 7). The RF model achieved higher accuracy (overall accuracy 0.76) than JRC (overall accuracy 0.60) and NDWI (overall accuracy 0.62) in detecting surface water in small and highly vegetated wetlands (
Table 6). The RF (overall accuracy 0.81) also outperformed the NDWI (overall accuracy 0.44) and JRC (overall accuracy 0.41) in small non-vegetated wetlands (
Table 7).
Figure 6 and
Figure 7 show the identified wetlands after excluding the permanent wet pixels for the years 2016 and 2017, respectively. The presence of emergent vegetation within the identified wetlands, as indicated by NDVI, for both years is also shown in
Figure 6 and
Figure 7.
6. Discussion
This study developed an automated workflow within the GEE platform for mapping wetland surface water for 2016 and 2017 by applying the RF classifier to a combination of Sentinel-1 data, Sentinel-2 band data, and spectral reflectance indices derived from Sentinel-2. The results were evaluated using statistical coefficients and visual comparison with ground truth data, as well as results from Landsat-derived surface water products. The inundation of relatively large and deep water bodies can be identified in most existing remote sensing products. However, mapping wetland surface water in the PPR is challenging for two main reasons: (1) most PPR wetlands are very small and highly sensitive to climate variability; and (2) the wetlands can be dry or wet, and they can contain different species of vegetation that can mask surface water. These wetlands therefore have complex spectral characteristics that complicate the detection of surface water extent from satellite sensors. Our approach also provides information regarding emergent vegetation within those wetlands. This is important because emergent vegetation provides shelter and food for aquatic vertebrates, such as waterfowl communities [
46]. Our method can also detect water below those vegetation canopies; water that would otherwise be excluded from habitat maps. We also provide an open-science algorithm in GEE for repeating these estimates, which can form the basis of long-term wetland surface water monitoring in the PPR.
A typical approach for mapping wetlands uses passive remote sensing that relies on water’s optical properties, which differ from other land use types [
47,
48,
49]. For instance, water quickly absorbs electromagnetic radiation, and more rapidly attenuates longer wavelengths than shorter ones [
50,
51,
52]. However, the application of optical sensors in identifying PPR wetlands is limited, since both water depth and mixed pixels can change the water spectral signature [
50,
51,
52]. Moreover, organic carbon compounds, water turbidity, chlorophyll content, and suspended materials can also add variation to water spectral properties. We addressed this issue by integrating the high-resolution bands of optical and radar sensors.
Figure 9 shows a visual comparison of surface water derived from different remote sensing data. The figure shows that many small wetlands were not captured in the Landsat-derived surface water products, since the spatial resolution of Landsat products (30 m) is too coarse to capture those wetlands. This is typical of many surface water classifiers that are focused on deep open water, as they misclassify the highly variable spectral signatures of inundated wetlands [
53]. Moreover, optical sensors struggled to capture wetlands covered by emergent vegetation. This study integrated Sentinel-1 SAR data into the high resolution (10 m) optical bands of Sentinel-2 to create a more robust classifier (
Figure 9A). We also used a wider temporal window for the optical bands, which increased the number of observations over the study area. This allows our algorithm to minimize the effects of cloud cover, and to identify small wetlands by detecting frequently wet pixels. We performed an independent accuracy assessment on small, highly vegetated wetlands and on small non-vegetated wetlands. The results showed acceptable accuracy for both types of wetlands. We also compared the results with surface water maps derived from optical sensors (
Table 6). Our algorithm performs better in identifying both large and small wetland water bodies than the Landsat-derived JRC and Sentinel-2-derived NDWI algorithms (
Table 4,
Table 5 and
Table 6).
The wetland surface water was also evaluated in vegetated and non-vegetated wetlands. Visual observation shows that the small inundated wetlands contain more vegetation compared to larger and deeper water bodies. Comparing 2016 and 2017 wetland surface water maps reveals abrupt changes in emergent vegetation in small wetlands. These results agree with the findings of [
54]. They reported that the small, ephemeral wetlands in the PPR experienced more vegetation change variability than larger, semi-permanent wetlands [
54]. Large and deep water bodies can be easily detected by various remote sensing data. For example, [
55] used Landsat time-series to create a global map of inland water dynamics. However, identifying small water bodies in the PPR is challenging due to the wetlands’ size and strong potential for dense vegetation cover. This matters because the majority of wetlands in the PPR are small, which makes the surface water in potholes highly dynamic. The total surface water area calculated from the JRC product and our classification method was 294 km² and 376 km², respectively. Algorithms that miss surface water in these small wetlands will be biased, and misrepresent the hydrologic variability on the landscape. For example, small wetlands provide more foraging habitats for organisms that rely on shallow water.
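The area comparison above can be made concrete with a short calculation. The sketch assumes that total area is obtained by multiplying the count of water pixels by the 10 m × 10 m pixel footprint; the function names are illustrative, and only the 294 km² and 376 km² totals come from the text.

```python
def pixels_to_km2(n_pixels, pixel_size_m=10.0):
    """Area in km² covered by n square pixels of the given edge length in meters."""
    return n_pixels * pixel_size_m ** 2 / 1_000_000

def relative_increase(reference, value):
    """Fractional increase of value over reference."""
    return (value - reference) / reference

jrc_area_km2 = 294.0   # total surface water from the JRC product (from the text)
rf_area_km2 = 376.0    # total surface water from the RF classification (from the text)

extra_area_km2 = rf_area_km2 - jrc_area_km2
extra_fraction = relative_increase(jrc_area_km2, rf_area_km2)

# Number of 10 m pixels that would correspond to the RF total
rf_pixel_count = round(rf_area_km2 * 1_000_000 / 10.0 ** 2)
```

Under these assumptions the RF map contains 82 km² more surface water than the JRC product, an increase of roughly 28%.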
Cloud computing and the advent of multisensor remote sensing data in the GEE have several advantages for large-scale and time-series analysis, such as monitoring wetland dynamics [
56]. The use of the GEE cloud computing platform is more convenient than traditional methods, considering its processing speed and ease of use [
57]. As more machine learning algorithms and remote sensing data become available within the GEE platform, we expect remote sensing data processing to be simplified even further. Additionally, and unlike most supercomputing centers, GEE is also designed to help researchers quickly disseminate their results to other researchers and interested parties. Once an algorithm has been developed on the GEE, users can generate systematic data products or deploy interactive applications aided by the GEE’s resources [
25]. The fully automated workflow developed for this study allows us to refine the existing data and method, and rapidly apply it to a broad geographical scale to generate estimates in new years. One of the disadvantages of using the GEE cloud computing platform is that it limits the number of field samples and input features. This is especially challenging when the analysis is applied to a large domain, which may reduce the efficiency of the implemented method.