1. Introduction
Carbon peaking and carbon neutrality goals are currently topics of worldwide interest [1]. Forests account for a high proportion of global above-ground carbon stocks [2]. Accurate estimation of forest above-ground biomass (AGB) provides the data basis for studying forest carbon stocks and investigating their impact on climate change and ecosystem functions. However, traditional forest inventory for biomass estimation has inevitable drawbacks for large-scale measurements [3], such as high time consumption and cost, which limit its use over large areas. Remote sensing can provide spatially contiguous, high-precision data [4,5]. The cost of the data and the performance of the resulting AGB estimation model are the main factors determining the choice of remote sensing data source.
The observational advantages of remote sensing, such as wide coverage and fast acquisition, compensate for the shortcomings of traditional methods of measuring forest biomass. Current remote sensing methods for estimating AGB mainly use optical remote sensing and light detection and ranging (LiDAR). Optical images contain spectral information about the forest over large areas. The spectral information obtained from satellite images correlates well with AGB, which indicates its potential as a source of predictors in AGB regression. However, where tree density is high, it can be difficult to establish a reliable statistical relationship between spectral information and biomass [6]. Unlike optical imaging systems, LiDAR systems measure the range between the sensor and the object, and the 3D vegetation structure data generated by LiDAR have been adopted in many applications [7]. Synthetic aperture radar (SAR) at lower frequencies (L and P bands) can penetrate deep into the forest and interact with the trees, providing a better correlation with AGB [8]. However, SAR is more strongly affected by ground effects than LiDAR. Owing to their distinct features, optical images and laser scanning point clouds can be combined to monitor changes in forest AGB.
The synthesis of multi-source features has been widely used for AGB estimation [9], for example by combining optical image data with airborne laser scanning (ALS) data [10]. de Almeida et al. [4] explored the combination of ALS and hyperspectral data, which significantly improved the performance of AGB estimation in the Amazon. It has been shown that the remote sensing data source (ALS, airborne hyperspectral imagery, or their combination) has a greater impact on modeling outcomes than the regression method. In addition, selecting suitable metrics and choosing among the many available models remain challenging [9]. In contrast to deep learning methods, various machine learning approaches have been used to investigate the importance of different variables in AGB estimation. The choice between parametric and non-parametric statistical models is crucial for AGB estimation, as model performance varies significantly across modeling methods [3]. Previous studies have reported that their AGB estimation models achieved satisfactory results using variable selection, different regression models, and multi-source and multi-resolution data. However, to the best of our knowledge, there has been little prior research on fusing satellite imagery of different resolutions with ALS data in AGB modeling.
To address this knowledge gap, our work combines medium- and high-resolution satellite data with airborne point clouds and investigates the impact of the spatial resolution of satellite imagery on modeling performance. Satellite data have large coverage and are publicly available, whereas ALS data can capture 3D tree morphology but usually cover smaller areas owing to budgetary limitations. This study proposes a framework that leverages the complementary advantages of satellite imagery and ALS data in estimating AGB. Specifically, instead of developing a new algorithm, we assess several regression models using dual-source data. This assessment helps us to identify the best regression model in terms of performance and cost, which is then used to estimate AGB over large areas. AGB modeling is explored in this paper to answer the following questions: (1) What impact does integrating ALS variables with satellite imagery variables have on the precision and bias of AGB estimates? (2) To what extent do the different regression methods affect the accuracy and bias of AGB estimation? and (3) How does the spatial resolution of satellite imagery influence AGB estimation?
The remainder of this paper is organized as follows.
Section 2 describes the multi-source data, the extracted variables, and the methods used to explore the AGB regression modeling for forested areas. First, the vertical structure of the forest sample plots is obtained by applying terrestrial laser scanning (TLS) and unmanned aerial vehicle laser scanning (ULS) data on field plots. The extracted diameter at breast height (DBH) and tree height of individual trees are then used to estimate plot-level AGB through allometric model equations to obtain the AGB as a dependent variable. Independent variables are extracted from the ALS data and images with different resolutions in order to explore the correlation between the different variables and forest AGB.
Section 3 describes the experimental results and their evaluation. Section 4 presents a discussion in which the AGB models built with the different methods and data sources are compared and analyzed. Finally, Section 5 summarizes the paper. This study explores the performance of AGB modeling with fused data sources and investigates the impact of adding spectral information from imagery of different costs to high-cost ALS data.
2. Materials and Methods
As illustrated in Figure 1, the proposed method includes the following main steps: (1) calculation of tree parameters and AGB at the sample plot level; (2) extraction and selection of variables from satellite images and ALS data; and (3) construction and evaluation of AGB regression models using four regression methods and multi-source data.
2.1. Study Areas and Datasets
Guangxi Province, China, has a subtropical monsoon climate with abundant rainfall and heat, and most of the forest plots are plantations. In this study, four forested areas in Guangxi Province were selected, as listed in Table 1 and shown in Figure 2. A total of 68 plots of 15 × 15 m were set up. The experimental data included ULS, TLS, and ALS data, Landsat 8 (LS8) data, and Gaofen-2 (GF2) image data. The ULS and TLS data were collected in June 2020, and the ALS data and satellite images were collected in 2019 and 2020, respectively.
Together, the ULS, airborne, and terrestrial LiDAR data allowed complete scanning of the internal and upper structure of the forest. An Austrian -400 3D laser scanning system, which acquires 122,000 points per second, was used for TLS data acquisition. A ULS system from the Netherlands, flown at about 50–70 m above the ground, was used for unmanned aerial vehicle (UAV) data acquisition. The ALS data were acquired using a fixed-wing manned aircraft equipped with an Austrian -- airborne LiDAR scanning system, flown at an average altitude of 2500 m above the ground, with a point density of over three points/m² across the whole area. The ALS data were preprocessed to produce a digital terrain model (DTM) and a normalized point cloud.
For the sample plots, GF2 satellite images were downloaded from the China Center for Resources Satellite Data and Application, which provides cloud-free images at the sub-meter level. The GF2 images have four multispectral bands with a spatial resolution of 4 m (wavelength 0.45–0.9 µm) and one panchromatic band with a spatial resolution of 1 m. The LS8 Operational Land Imager (OLI) has nine spectral bands: eight with a spatial resolution of 30 m and one panchromatic band with a resolution of 15 m. Pansharpening was performed on the GF2 and LS8 data using Esri software (Davis, CA, USA). After pansharpening, the images were clipped and mosaicked to obtain the preprocessed satellite images of the study areas.
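The text does not state which pansharpening algorithm was applied. As an illustration only, the sketch below implements the Brovey transform, one widely used pansharpening method, with NumPy. The function names, the nearest-neighbour upsampling, and the use of the band mean as the intensity image are assumptions, not a description of the authors' processing chain.

```python
import numpy as np


def upsample_nearest(ms, factor):
    """Nearest-neighbour upsampling of a (bands, rows, cols) array by an integer factor."""
    return ms.repeat(factor, axis=1).repeat(factor, axis=2)


def brovey_pansharpen(ms_up, pan, eps=1e-12):
    """Brovey transform: scale each band by pan / intensity.

    ms_up: multispectral bands already resampled to the pan grid, shape (bands, H, W).
    pan:   panchromatic band, shape (H, W).
    The intensity image is the mean of the multispectral bands (an assumption;
    other variants use the band sum or a weighted sum).
    """
    ms_up = ms_up.astype(float)
    intensity = ms_up.mean(axis=0)
    ratio = pan / (intensity + eps)  # eps avoids division by zero
    return ms_up * ratio[np.newaxis, :, :]


# Example: a hypothetical 4-band GF2-like tile at 4 m and a 1 m pan band (factor 4).
rng = np.random.default_rng(42)
ms = rng.uniform(0.02, 0.4, size=(4, 16, 16))
pan = rng.uniform(0.02, 0.4, size=(64, 64))
sharpened = brovey_pansharpen(upsample_nearest(ms, 4), pan)
print(sharpened.shape)  # (4, 64, 64)
```

If the pan band equals the intensity of the upsampled bands, the transform returns the upsampled bands unchanged; spatial detail enters only through the pan-to-intensity ratio.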
2.2. Construction of Field Plots
TLS and ULS data were collected in all four forests, which are located in three cities (Table 1). The distribution of the sample plots is shown in Figure 3. The collected point clouds can be used to segment individual trees and calculate their parameters; the accuracy of this point cloud approach for obtaining tree parameters has been verified in [11]. The AGB of an individual tree is the sum of its components, namely the stem, branches, and leaves, and plot-level AGB was estimated using existing allometric equations [12], as shown in Table 2. TLS captures rich information on the lower part of the tree, from which the diameter at breast height (DBH) can be accurately calculated. ULS captures information on the top of the tree, the ground height, and the highest point of the tree, which are important for correctly calculating tree height (H). Combining TLS and ULS data therefore provides complete information on the sample plots. First, the multi-station TLS point clouds were co-registered into the same coordinate system, and then the TLS and ULS data were co-registered using the Random Sample Consensus (RANSAC) method [13]. The average registration residual was 0.049 m for the TLS-to-TLS scenario and 0.299 m for the ULS-to-ALS scenario. The fused point cloud was then used for automatic, accurate individual tree extraction [14]. In addition, we used the TLS-to-ALS registration method of [15]; the average registration residual for the TLS-to-ALS scenario was 0.049 m. Converting the ALS data to 1 m resolution raster imagery allowed the ALS data and GF2 imagery to be georegistered using the registration tool in ENVI 5.3. Satellite image plots with the same geographic extent as the TLS point cloud plots were then extracted.
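Because the allometric equations of Table 2 are not reproduced here, the following is only a sketch of how per-component equations can be summed into tree AGB and scaled to a per-hectare value for the 15 × 15 m plots. It assumes a common power-law form W = a(D²H)^b with D (DBH) in cm, H in m, and W in kg; the coefficient values and function names are hypothetical placeholders, not the values in Table 2.

```python
# Hypothetical coefficients (a, b) for W = a * (D**2 * H) ** b per component,
# with D = DBH in cm, H = tree height in m, W = dry biomass in kg.
# Placeholder values only; the real equations are those in Table 2.
COMPONENT_COEFFS = {
    "stem": (0.030, 0.90),
    "branch": (0.010, 0.85),
    "leaf": (0.005, 0.80),
}

PLOT_AREA_M2 = 15.0 * 15.0  # 15 x 15 m field plots


def tree_agb_kg(dbh_cm, height_m, coeffs=COMPONENT_COEFFS):
    """Tree AGB as the sum of stem, branch, and leaf biomass."""
    d2h = dbh_cm ** 2 * height_m
    return sum(a * d2h ** b for a, b in coeffs.values())


def plot_agb_mg_per_ha(trees, area_m2=PLOT_AREA_M2, coeffs=COMPONENT_COEFFS):
    """Plot-level AGB in Mg/ha from (DBH, H) pairs of the segmented trees."""
    total_kg = sum(tree_agb_kg(d, h, coeffs) for d, h in trees)
    # kg per m^2 -> Mg per ha: 1 kg/m^2 = 10 000 kg/ha = 10 Mg/ha
    return total_kg / area_m2 * 10.0


# Example with three hypothetical trees extracted from the fused TLS/ULS cloud.
trees = [(18.5, 14.2), (22.0, 16.8), (15.3, 12.1)]
print(round(plot_agb_mg_per_ha(trees), 2))
```

The plot value obtained this way is the dependent variable that the satellite and ALS predictors are later regressed against.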
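The RANSAC idea behind the co-registration step can be illustrated with a generic, simplified sketch; the method actually used is the one described in [13]. The sketch assumes that putative point correspondences between two clouds (for example, matched tree positions) are already available. It estimates a rigid transform from random three-point samples with the Kabsch (SVD) method, keeps the model with the most inliers, refits on those inliers, and reports their mean residual, which is analogous to the average registration residuals quoted above. The correspondence input, the 0.1 m inlier threshold, and the function names are hypothetical.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    h = (src - c_src).T @ (dst - c_dst)           # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = c_dst - r @ c_src
    return r, t


def ransac_rigid(src, dst, n_iter=500, threshold=0.1, seed=0):
    """RANSAC over putative correspondences src[i] <-> dst[i] (both N x 3, metres).

    Returns the refined rotation, translation, inlier mask, and the mean
    residual of the inliers after refinement.
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(src), size=3, replace=False)  # minimal sample
        r, t = kabsch(src[sample], dst[sample])
        residuals = np.linalg.norm(src @ r.T + t - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("too few inliers for a rigid fit")
    r, t = kabsch(src[best], dst[best])                # refit on all inliers
    residuals = np.linalg.norm(src[best] @ r.T + t - dst[best], axis=1)
    return r, t, best, float(residuals.mean())
```

In practice the correspondences would come from a feature-matching step, and the RANSAC estimate is typically refined further (for example, with ICP) on the full clouds.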
2.3. Variable Extraction and Selection
2.3.1. Satellite Image Processing and Metric Extraction
Several spectral variables and transformation results derived from imagery have been proposed as potential AGB predictors. From the LS8 and GF2 images, some of the optical image variables were obtained by principal component analysis (PCA) transformation, minimum noise fraction (MNF) transformation, and the calculation of various vegetation indices [5]. The variables and their calculation equations are described in Table 3. All the satellite image variables were obtained at the plot level.
For both the GF2 and LS8 data, the raw bands have different spectral characteristics in terms of vegetation absorption and reflectance, and they are used as candidate variables, as shown in Table 3. Here, the four raw-band variables correspond to Bands 1 to 4, respectively. PCA extracts the most important components of an image and is commonly used to remove noise from satellite data. After the transformation, the first three principal components are uncorrelated with each other; the first principal component (PC1) has the largest data variance, followed in turn by PC2 and PC3. The three retained MNF components are ordered from the largest to the smallest signal-to-noise ratio. In addition, the raw bands can be combined into vegetation indices that highlight vegetation. For example, NDVI provides an indication of vegetation health and growth, while EVI improves on NDVI by decoupling the canopy background signal and reducing atmospheric influences. RVI and DVI reflect the difference between vegetation reflectance in the visible and near-infrared bands. SAVI and MSAVI are used to reduce the effect of the soil. ARVI is insensitive to aerosols and is therefore particularly suitable for monitoring areas with high atmospheric aerosol levels [16].
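The PCA step can be sketched as follows with NumPy, assuming the image is a (bands, rows, cols) array of pansharpened reflectances; the function name is hypothetical. The sketch illustrates the properties noted above: the output components are mutually uncorrelated and ordered by decreasing variance. The MNF transformation is not shown, since it additionally requires an estimate of the noise covariance.

```python
import numpy as np


def pca_components(image, n_components=3):
    """PCA of a (bands, rows, cols) image.

    Returns the first n_components principal-component images, ordered by
    decreasing variance, and the corresponding variances (eigenvalues).
    """
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, -1).T.astype(float)   # (n_pixels, bands), a copy
    pixels -= pixels.mean(axis=0)                        # centre each band
    cov = np.cov(pixels, rowvar=False)                   # (bands, bands)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]     # largest variance first
    scores = pixels @ eigvecs[:, order]                  # project onto top PCs
    return scores.T.reshape(n_components, rows, cols), eigvals[order]
```

Plot-level PC1 to PC3 variables would then be obtained by averaging each component image over the pixels that fall inside a plot.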
Table 3.
Metrics calculated from the satellite image data.
Abbr. | Description | Equation | Reference |
--- | --- | --- | --- |
Bands 1–4 | Raw optical image bands | | |
PC1, PC2, PC3 | First three components of the PCA transformation | | |
MNF1, MNF2, MNF3 | First three components of the MNF transformation | | |
EVI | Enhanced vegetation index | | [17] |
NDVI | Normalized difference vegetation index | | [18] |
RVI | Ratio vegetation index | | [19] |
DVI | Difference vegetation index | | [20] |
SAVI | Soil-adjusted vegetation index | | [21] |
MSAVI | Modified soil-adjusted vegetation index | | [22] |
ARVI | Atmospherically resistant vegetation index | | [23] |
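Because the index equations are not spelled out in the text, the sketch below computes the vegetation indices of Table 3 using their standard published forms. This is an assumption about the exact variants used: SAVI with a soil factor L = 0.5, EVI with G = 2.5, C1 = 6, C2 = 7.5, and L = 1, and ARVI with γ = 1. Inputs are assumed to be plot-level surface reflectances (0 to 1) of the blue, red, and near-infrared bands; the function name is hypothetical.

```python
import math


def vegetation_indices(blue, red, nir, soil_l=0.5, gamma=1.0):
    """Standard forms of the Table 3 indices from surface reflectance (0-1).

    soil_l: SAVI soil-adjustment factor (assumed 0.5).
    gamma:  ARVI aerosol weighting (assumed 1.0).
    """
    rb = red - gamma * (blue - red)  # ARVI red-blue combination
    return {
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        "RVI": nir / red,
        "DVI": nir - red,
        "SAVI": (1.0 + soil_l) * (nir - red) / (nir + red + soil_l),
        "MSAVI": (2.0 * nir + 1.0
                  - math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0,
        "ARVI": (nir - rb) / (nir + rb),
    }


# Example with hypothetical plot-mean reflectances of a dense canopy.
for name, value in vegetation_indices(blue=0.04, red=0.05, nir=0.42).items():
    print(f"{name}: {value:.3f}")
```

The same function applies to GF2 and LS8 as long as the correct blue, red, and near-infrared bands of each sensor are passed in.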
2.3.2. ALS Data Processing and Variable Extraction
A range of ALS variables is used to characterize the forest canopy and vertical structure at the plot level, including height percentile variables, canopy cover variables, and tree height variables. To construct the height percentile variables for a sample plot, the height at a given percentile is extracted from the normalized point cloud; these variables describe the vertical stratification of the forest. All the extracted ALS variables are listed in Table 4.
In general, the first returns of ALS point clouds come from the forest canopy, while the last returns come from the ground. The percentage of first returns among all returns therefore describes the proportion of canopy points among all points in the plot. Further canopy height variables describe the vertical distribution and variability of the forest canopy. Canopy cover is defined as the percentage of vegetation returns among all returns, and it describes tree planting density in the horizontal plane.
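A minimal NumPy sketch of the plot-level ALS metrics described above follows. Table 4 is not reproduced here, so the metric names (H25, H50, Hmax, and so on), the chosen percentiles, and the 2 m height threshold used to label a return as vegetation are hypothetical choices for illustration.

```python
import numpy as np


def als_plot_metrics(heights, return_numbers, percentiles=(25, 50, 75, 95),
                     veg_threshold=2.0):
    """Plot-level ALS metrics from a normalized point cloud.

    heights:        normalized point heights above ground (m).
    return_numbers: return number of each point (1 = first return).
    veg_threshold:  height (m) above which a return counts as vegetation (assumed).
    """
    h = np.asarray(heights, dtype=float)
    rn = np.asarray(return_numbers)
    # Height percentile variables: vertical stratification of the canopy.
    metrics = {f"H{p}": float(np.percentile(h, p)) for p in percentiles}
    metrics["Hmax"] = float(h.max())
    metrics["Hmean"] = float(h.mean())
    metrics["Hstd"] = float(h.std())
    # Percentage of first returns among all returns.
    metrics["first_return_pct"] = 100.0 * float(np.mean(rn == 1))
    # Canopy cover: percentage of vegetation returns among all returns.
    metrics["canopy_cover_pct"] = 100.0 * float(np.mean(h > veg_threshold))
    return metrics
```

For example, eight returns with heights [0, 0, 1, 3, 5, 7, 9, 11] m, of which six are first returns, give a first-return percentage of 75% and a canopy cover of 62.5% under the 2 m threshold.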
2.3.3. Selection of Satellite Imagery and ALS Data Variables
As the extracted variables contain redundant information, variable selection is a key issue. The variables were filtered using importance ranking and Pearson correlation analysis, and the suitable optical image and ALS data variables, along with their combinations, were then added to the regression models.
After extracting variables from the preprocessed GF2, LS8, and ALS data, the random forest (RF) importance of each variable was calculated. The percentage increase in mean squared error (%IncMSE) measures the contribution of a variable to the AGB prediction: it quantifies how much the prediction error increases when the values of that variable are randomly permuted, so a higher %IncMSE indicates a more important variable. Variables with a %IncMSE higher than 4% were retained. Meanwhile, the correlations between the extracted variables were calculated using Pearson correlation analysis, with 0.7 taken as the threshold for a high correlation coefficient: if the correlation coefficient between two variables exceeded 0.7, only the more important one was kept. In this way, variables with high importance and low mutual correlation were selected for modeling.
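The two filtering rules can be combined into a short greedy procedure, sketched below with NumPy. It assumes the %IncMSE values have already been computed by a random forest (the R randomForest package, for instance, reports a `%IncMSE` column for regression forests); the function name and the greedy order (most important variable first) are illustrative assumptions.

```python
import numpy as np


def select_variables(X, names, inc_mse, imp_threshold=4.0, r_threshold=0.7):
    """Keep important, weakly correlated variables.

    X:       (n_plots, n_vars) matrix of candidate variables.
    inc_mse: %IncMSE of each variable, e.g. from a random forest.
    A variable is kept if its %IncMSE exceeds imp_threshold and its absolute
    Pearson correlation with every already-kept (more important) variable
    is at most r_threshold.
    """
    corr = np.corrcoef(X, rowvar=False)
    order = sorted(range(len(names)), key=lambda j: inc_mse[j], reverse=True)
    kept = []
    for j in order:
        if inc_mse[j] <= imp_threshold:
            break  # remaining variables are even less important
        if all(abs(corr[j, k]) <= r_threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

Processing variables in descending order of importance guarantees that, within any highly correlated pair, the more important variable is the one that survives.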
2.4. Regression Models
The regression methods considered in this study included both parametric and non-parametric methods, namely the stepwise regression method (SRM), support vector machine (SVM), boosting tree, and bagging tree approaches. Ten-fold cross-validation [24] using all 68 plots was employed to evaluate the models and guard against overfitting.
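Ten-fold cross-validation can be sketched as below: every plot receives an out-of-fold prediction from a model trained on the remaining nine folds, and the accuracy indicators are computed from those predictions. Ordinary least squares stands in for the regression models here; the function names, the random fold assignment, and the seed are assumptions, and any fit/predict pair (SRM, SVM, bagging, or boosting) could be plugged in.

```python
import numpy as np


def kfold_predictions(X, y, fit, predict, k=10, seed=0):
    """Out-of-fold predictions from k-fold cross-validation.

    Each plot is predicted by a model trained on the other k-1 folds.
    """
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    y_pred = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False          # hold out the current fold
        model = fit(X[train], y[train])
        y_pred[test_idx] = predict(model, X[test_idx])
    return y_pred


def ols_fit(X, y):
    """Ordinary least squares with an intercept (illustrative base model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]
```

With 68 plots, `np.array_split` produces eight folds of 7 plots and two folds of 6 plots, so every plot is held out exactly once.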
Multiple variable regression analysis is a commonly used method in biomass modeling [25]. In this study, the SRM method was used to describe the relationship between the independent variables and the dependent variable (AGB). Each time a new independent variable was introduced, the variables already in the model were re-tested one at a time, and each variable was retained (p < 0.05) or removed according to its significance level. The independent variables from Table 3 and Table 4 were filtered via SRM to build the final model, as shown in Equation (1):
$$\mathrm{AGB} = b + \sum_{i=1}^{m} a_i x_i \qquad (1)$$
where $b$ is the intercept and $a_i$ is the fitted SRM coefficient of the $i$-th metric $x_i$. Each metric $x_i$ is one of the image or ALS variables in Table 3 or Table 4 that was retained after feature selection and the SRM, and $m$ is the number of retained metrics.
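A minimal sketch of bidirectional stepwise selection for a linear model of the form of Equation (1) follows, assuming NumPy and SciPy are available. Variables enter when their p-value is below 0.05, as stated above; the removal threshold of 0.10 and the function names are assumptions, since the text gives only the 0.05 level, and the exact stepwise variant used in the study may differ.

```python
import numpy as np
from scipy import stats


def ols_with_pvalues(X, y):
    """OLS with intercept; returns coefficients [b, a_1, ..., a_m] and two-sided p-values."""
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - m - 1
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    pvals = 2.0 * stats.t.sf(np.abs(coef / se), dof)   # t-test per coefficient
    return coef, pvals


def stepwise(X, y, p_enter=0.05, p_remove=0.10, max_steps=100):
    """Bidirectional stepwise selection; returns kept column indices and coefficients."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the smallest p-value if p < p_enter.
        best_p, best_j = 1.0, None
        for j in remaining:
            _, pv = ols_with_pvalues(X[:, selected + [j]], y)
            if pv[-1] < best_p:
                best_p, best_j = pv[-1], j
        if best_j is not None and best_p < p_enter:
            selected.append(best_j)
            remaining.remove(best_j)
            changed = True
        # Backward step: re-test the variables already in the model.
        if selected:
            _, pv = ols_with_pvalues(X[:, selected], y)
            worst = int(np.argmax(pv[1:]))
            if pv[1 + worst] > p_remove:
                remaining.append(selected.pop(worst))
                changed = True
        if not changed:
            break
    coef, _ = ols_with_pvalues(X[:, selected], y)
    return selected, coef
```

In the returned coefficients, `coef[0]` is the intercept $b$ and `coef[1:]` are the $a_i$ in the order given by `selected`.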
The other three methods (SVM, boosting tree, and bagging tree) are non-parametric and were run with default parameters. The bagging tree and boosting tree methods use the same type of base learner; they differ in sample selection, sample weighting, the prediction function, and whether the learners can be trained in parallel. SVM was originally developed as a binary classifier that separates two classes with the decision boundary of maximum margin; for regression, its support vector regression (SVR) variant fits a function that keeps most training residuals within an ε-insensitive margin while remaining as flat as possible.
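The contrast between the two tree ensembles can be illustrated with a small NumPy sketch that uses one-feature regression stumps as the shared base learner. Bagging fits stumps independently to bootstrap samples and averages them; the boosting variant shown is gradient boosting with squared loss, in which each stump is fitted sequentially to the current residuals. This is only one of several boosting schemes (AdaBoost-style methods reweight samples instead), and the study does not state which implementation or base learner it used; all names and the `n_estimators` and `learning_rate` values are illustrative.

```python
import numpy as np


def fit_stump(x, y):
    """One-split regression tree on a single feature (least squares)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, xs[-1], ys.mean(), ys.mean())    # fallback: no split
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                                 # no threshold between equal values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, 0.5 * (xs[i - 1] + xs[i]), left.mean(), right.mean())
    return best[1:]                                  # (threshold, left value, right value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_bagging(x, y, n_estimators=50, seed=0):
    """Bagging: stumps fitted independently (parallelisable) to bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)             # sample with replacement, equal weights
        stumps.append(fit_stump(x[idx], y[idx]))
    return stumps


def predict_bagging(stumps, x):
    return np.mean([predict_stump(s, x) for s in stumps], axis=0)  # simple average


def fit_boosting(x, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting (squared loss): each stump fits the current residuals."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_estimators):                    # inherently sequential
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * np.sum([predict_stump(s, x) for s in stumps], axis=0)
```

The sketch makes the listed differences concrete: bagging draws bootstrap samples with equal weights and averages its learners, whereas boosting adds shrunken learners one after another, so it cannot be trained in parallel.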
The precision and bias of the estimates vary between regression models. To assess the different regression models, the coefficient of determination ($R^2$), the root-mean-square error (RMSE), and a further accuracy indicator were calculated. The first two indicators are defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $n$ is the number of samples, $y_i$ is the ground truth of the $i$-th sample plot, $\hat{y}_i$ is the predicted value for the $i$-th sample plot, and $\bar{y}$ is the mean of the ground truth of the sample plots.
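A short sketch that computes the coefficient of determination and the RMSE from observed and predicted plot AGB, using the standard definitions with n plots, ground truth y_i, predictions ŷ_i, and mean ȳ. It is written in plain Python; the function name is hypothetical.

```python
import math


def r2_rmse(y_true, y_pred):
    """Coefficient of determination and RMSE for observed vs. predicted plot AGB."""
    n = len(y_true)
    mean_obs = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_obs) ** 2 for y in y_true)           # total sum of squares
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


r2, rmse = r2_rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(round(r2, 3), round(rmse, 3))  # 0.8 0.5
```

When used with cross-validation, `y_pred` should be the out-of-fold predictions, so that the indicators reflect performance on plots not used for fitting.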
5. Conclusions
In this study, medium- and high-resolution satellite imagery were combined with ALS data to estimate AGB and to evaluate model performance against data costs. First, TLS and ULS data were used to acquire individual tree parameters and plot-level AGB, replacing traditional ground-measured data. ALS point clouds and satellite images were then combined to obtain structural and spectral information about the forested areas. Satellite imagery, ALS, and fused features were extracted and filtered for AGB modeling using four regression methods. The main contribution of this paper is the evaluation of the different data sources, especially the performance of AGB estimation that combines medium- or high-resolution imagery with ALS data. The ALS model performed best, followed by the GF2 model, while the LS8 model exhibited the poorest performance. Among the different methods, the GF2-ALS model developed using SRM performed best, with an $R^2$ of 0.82. Cost also needs to be considered during data selection; therefore, remote sensing data with different costs were used to explore the potential of AGB estimation, including free and low-cost satellite data as well as more expensive airborne point cloud data. Using imagery alone as the data source, the GF2 model increased $R^2$ by 0.1 compared with the LS8 model, at an additional cost of USD 1 per square kilometer. When fused data were used, the $R^2$ of the GF2-ALS model increased by 0.2, while the cost increased by USD 351 per square kilometer. Combining imagery of high spatial resolution with ALS data significantly improved the performance of the AGB model. Overall, both cost and model performance need to be considered together for large-scale AGB estimation.