1. Introduction
A land-use map is a thematic map that shows the current status, geographical differences, and classification of land resources. These maps are essential for the dynamic prediction of land-use time series [
1], land-use planning [
2], natural disaster prevention and control, land management [
3], and ecological and environmental protection [
4]. For example, land-use mapping can provide data support for local cropland use and conservation management, the assessment of local food cultivation areas, and the prevention of a lack of grain growth in croplands [
5]. Therefore, the accuracy of land-use maps significantly affects many aspects of land-use planning, including spatial planning.
Many researchers have recently released a number of well-recognized land-use products, such as the 10 m ESA World Cover 2020 (ESA 2020) [
6], the ESRI Global LULC 2020 (ESRI 2020) (with 10 m resolution) [
7], and the Google Dynamic World LULC (10 m) products [
8], but the scales of these products are global and thus offer relatively poor mapping accuracy and contrasting results that are not specific to locally distinctive land-use types. Meanwhile, the accuracy in highly heterogeneous areas of global-scale products tends to be insufficient for satisfying the needs of local governments and society. For example, the Xinjiang Uygur Autonomous Region is the largest provincial-level administrative region in China in terms of land area. One of the focal spots in this region is the southern region of Xinjiang, near Bayingoleng Mongol Autonomous Prefecture, which is bordered by the Tianshan Mountains to the north. This region has high mountains and abundant biodiversity; the central part is a plain oasis, encompassing a large cotton-growing area and the famous Kolar pear plantations [
9]. It contains the largest inland lake in China, Lake Bosten, while the southern part hosts the Taklamakan Desert, the second largest desert in the world, and the Tarim River, the longest inland river in China, which flows from west to east through the desert. Thus, this landscape is spatially and temporally heterogeneous, which helps explain the relatively poor accuracy of current land-use classification products [
10]. At present, aside from the works that describe the region using global-scale land-use products, many researchers have depicted the spatial distribution of specific land-use classes in local regions, such as the spatial distribution of crops in the Baja Oasis agricultural area, the most characteristic of which is the spatial distribution of pear orchards [
9]; however, the distribution is incomplete. As a result, reliable, targeted, large-scale land-use products are lacking in the Bayingoleng Mongol Autonomous Region.
At present, although satellite remote-sensing imaging technology is extensively used in land-use mapping on different scales [
11,
12], it remains challenging to use this technology for highly heterogeneous land-use mapping [
10]. However, the availability and free use of Sentinel imagery have enabled the present research because Sentinel imagery not only actively acquires radar features (Sentinel 1, operating in the C-band) but also passively acquires optical features (Sentinel 2, from visible to short-wave infrared bands). Sentinel imagery has already been used for the fine mapping of crops [
13,
14], yield prediction [
15], flood hazard monitoring [
16,
17], ecological monitoring and protection [
18], etc. With larger study scales and the emergence of high spatial- and temporal-resolution images, the work related to remote-sensing images has become more demanding in terms of data pre-processing and computer capabilities [
19,
20]. Many previous studies have been limited by computational power and usually focus on small areas to reduce image-processing time [
21,
22]. The Google Earth Engine (GEE) is a cloud-based, planetary-level geospatial analysis platform that brings Google’s enormous computing power to bear on a variety of complex and far-reaching problems [
23], including forest monitoring [
24], surface water extraction [
25], the fine mapping of crops [
26], yield prediction [
27], disaster monitoring [
28], and urban-distribution mapping [
29]. By using the GEE platform, a large number of publicly available remote-sensing image datasets can be accessed and processed, providing a stable data source for research that involves remote-sensing imagery [
23].
The spectral reflectance of different growth stages is particularly useful for identifying heterogeneous vegetation, which has been well documented [
30,
31]. The spectral reflectance of overwintered crops in the winter differs from that of non-overwintered crops, and similar patterns exist between evergreen and deciduous trees [
32]. At the same time, overwintered crops break dormancy and grow rapidly in the spring, whereas spring-sown crops are in the seedling stage, so the spectral reflectance of the two differs significantly [
33,
34]. The spectral reflectance of crops also varies within the same season due to differences in crop growth (e.g., phenological periods and growth rates) [
34]. Also, the radar features of crops, such as those delineated in Sentinel 1 image features, can be used to differentiate garlic from winter wheat [
35]. Nevertheless, compared with traditional mapping based on full-fertility imagery, it is unclear how different features at different periods affect mapping, and knowledge in this area would help users to understand how missing imagery affects the accuracy of land-use mapping, which would ameliorate land-use decisions. In addition, full-fertility spectral images are sometimes difficult to obtain due to weather conditions (e.g., clouds, rain, etc.).
Several machine-learning classifiers have successfully been used for land-use classification, including maximum likelihood [
10], classification and regression tree (CART) [
36], support vector machines (SVM) [
11], random forest (RF) [
10], artificial neural networks [
37], and convolutional neural networks (CNNs) [
38]. Abdi et al. [
11] evaluated the performance of multiple machine-learning algorithms when applied to complex northern landscapes and found that the overall accuracy of SVM (75.8%) exceeded that of RF (73.9%). However, a study of land-use classification in the Sahel obtained more accurate results using RF (73.3%) than an integrated classifier (72.0%) or SVM (60.8%) [
10]. RF is an integrated learning model based on the idea of “bagging” to integrate decision trees. Another common and efficient strategy pertaining to integrated learning models is “boosting,” whereby each tree learns from the residuals of all previous trees to improve the classification or regression results. Gradient tree boosting (GTB) is a commonly used boosting algorithm, and its performance in classification and regression was proven long ago [
39]. These classifiers perform differently due to differences in image features, target classes, and classifier parameters; the most suitable classifiers, and even the best parameters, vary across geographic regions.
In conclusion, the spectral reflectance of different periods and radar features play a crucial role in identifying heterogeneous vegetation and distinguishing various land-use classes. However, the impact of the spectral and radar features of different periods on the accuracy of land-use mapping and decision making remains unclear. To address this issue, the present study aims to utilize multi-temporal Sentinel imagery data to achieve the following: (1) Assess the importance, contribution, and effect of the classification model of spectral and radar features in different periods and determine the optimal time to classify land use. Furthermore, to determine the optimal method and parameters suitable for local land-use classification research, we plan to (2) evaluate and compare the performance of three common classifiers (CART, RF, and GTB) related to decision trees for land-use classification. Finally, we (3) generate a reliable and targeted land-use product that is suitable for the highly heterogeneous environment in the southern Xinjiang Uygur Autonomous Region. The product not only depicts common land use categories but also enables the targeted mapping of pear orchards that affect the local economy, thereby providing data support for agricultural planning conducted by local governments.
3. Methods
Figure 2 shows a technical flowchart of this work with the following detailed steps:
Step 1: Determination of the time window: Based on the measured sample points (
Section 2.2.2) and MODIS images (
Section 2.2.1), the time-series SG-NDVI change curve is constructed to determine the time window that can be used for land-use classification (
Section 3.2.1).
Step 2: Aggregation of multi-temporal images: The identified time windows are combined with the sentinel image data, the spectral and radar features that can be used for classification are analyzed, seven schemes of integrating the multi-temporal features for classification are proposed (
Section 3.2.2), and, finally, the multi-temporal multi-spectral imagery are aggregated.
Step 3: Classification of Multiple Schemes: Based on the seven aggregated multi-temporal images, three commonly used classification algorithms (CART, RF, and GTB) are used to perform the classification task (
Section 3.3), and a total of 21 classification scenarios are generated.
Step 4: Validation and comparison: Multiple classification scenarios are validated by comparing the measured and classified classes of sample points (
Section 3.4). In addition, the best classification results are compared with existing land-use products (
Section 3.5).
3.1. Land-Use Classes
Our land-use mapping products were developed to ascertain the local characteristics of arid biota in Northwest China and can assist decision makers in spatial planning, such as in the identification of vegetation spatial structures (crops, grasslands, and forests with economic value and poplar trees and shrubs with ecological value), water management, and desertification management. Based on field visits and the opinions of local experts, we classified land use into the following nine classes (
Figure 3):
Cropland: This class consists of land covered with cropland used mainly for maize, cotton, wheat, pepper, and beet farming.
Forest: This class consists of land planted with trees for ecological conservation, mainly consisting of evergreen coniferous forests in the northern part of the study area, Populus euphratica trees near the Tarim River in the northern part, and drought-tolerant shrubs such as Tamarix ramosissima Lcdcb and Haloxylon ammodendron.
Korla pear: This class includes the geographic areas of Kolar pear plantations for economic purposes.
Grassland: This class mainly includes geographic areas dominated by natural forbs (plants without stems or branches on the ground and lacking a solid structure), mainly encompassing alpine grasslands in the northern part of the study area, with the dominant grass species being Carex stipitiutriculata, Stipa mpurpurea, and Kobresia capillifolia.
Water: This class mainly includes geographic areas covered by a water body, including lakes such as Bosten Lake, the largest inland freshwater lake in China; rivers such as the Tarim River, the longest inland river in China’s Bayingoleng Mongol Autonomous Region section; and reservoirs.
Snow and ice: This class includes areas that are permanently covered by snow or glaciers.
Wetland: This class includes geographic areas dominated by natural herbaceous vegetation that is permanently or periodically inundated by water bodies, which mainly includes the reed wetlands around Bosten Lake in the eastern part of the study area and the grassland wetlands in the northern part of the study area.
Impervious area and bare soil: This class consists of land covered by buildings, roads, and bare sandy soil. Sandy soils are mainly distributed along the margins of the Taklamakan Desert in the southern part of the study area, and urban buildings are mainly distributed in the central part of the study area.
Rocky: This class refers to geographical areas covered by rocks, where very few grasses are present.
3.2. Image-Aggregation Scheme
3.2.1. Determination of the Time Window
To understand the phenological characteristics of the different types of vegetation in the study area, MODIS curves for 2021 were constructed by using MODIS products with a resolution of 250 m. MODIS products have been widely used for phenological window determination [
35]. The MODIS-NDVI images were calculated on the GEE platform using the following formula:
To eliminate noise from the MODIS-NDVI time series curves while ensuring that the curves characterize the vegetation growth process, we used a Savitzky–Golay filter (SG) to smooth the MODIS-NDVI time series curves; this process effectively eliminates the effect of noise in NDVI time series data [
48]. A sample curve of 572 pixels comprising cropland (183), forest (155), Kolar pear plantations (91), and grassland (143) pixels was selected to construct an NDVI mean curve, which is shown in
Figure 4. The curve provides the mean values at different times. The different temporal reflectance patterns resulting from the phenological periods of different vegetation types contain important classification information and possess a high capacity for discrimination.
As seen in
Figure 4, the NDVI time series curve is almost a bell-shaped curve, with the NDVI increasing from low to high and then gradually decreasing with time, constituting a typical vegetation time-series spectral feature. The importance of using temporal phenological information to distinguish vegetation types is illustrated by the different trends in the NDVI values for multiple vegetation classes at different time points during the annual growing season (from May to October) of the major vegetation types in the study area.
From 5 May to 10 September, the NDVI values of pear orchards exceeded those of grasslands, and the NDVI values of grasslands exceeded those of forested lands because the poplar forests with low NDVI values reduced the overall NDVI in the forested land [
49]. However, the NDVI values in the cropland varied in different periods, namely, from 5 May to 28 May, when the crops were all in the seedling stage with low cover and the spectral response of the bare soil produced the lowest NDVI values for the crops. From 28 May to 20 June, as the crops grew (and corn entered the plucking stage and cotton entered the bud stage), the crop volume and the cover gradually increased, and the influence of the soil gradually decreased, causing the NDVI of crops at this time to exceed that of forested land, although it remained less than that of orchards and grasslands. From 28 June to 10 September, the crops continued to grow rapidly (maize entered the stalking stage and cotton entered the boll stage), chlorophyll accumulated rapidly, and plant height, volume, and cover reached a maximum, so the NDVI value reached a maximum on 28 July. Thereafter, the crops gradually entered maturity (corn entered the milking stage and cotton entered the spitting stage), chlorophyll levels began to decrease, lutein levels increased, and leaves even began to fall off. From 25 September to 5 November, the vegetation NDVI decreased rapidly, the crops entered the harvesting period, the orchards presented the largest NDVI because the fruit trees still had some greenness in their leaves after harvesting, the grasses entered the wilting period, and the leaves turned yellow, so the NDVI values were at a minimum. The influence of evergreen coniferous forests kept the forested land NDVI at a higher level.
This scenario illustrates the importance of using temporal information to distinguish vegetation types. Therefore, in conjunction with the above analysis, we first identified four time windows that can be used for land-use classification: (1) the first window covers the period from 5 May to 27 May (Time1), (2) the second window covers the period from 28 May to 20 June (Time2), (3) the third window covers the period from 28 June to 10 September (Time3), and (4) the fourth window covers the period from 25 September to 5 November (Time4). Then, the nearest neighbor interpolation method was used to resample the three image datasets (Sentinel-1/-2 and elevation data) into images with a resolution of 10 m. Finally, multi-temporal cube datasets were generated using the median synthesis method based on the time windows (Time1~Time4) determined from the NDVI time series. To investigate the contribution and importance of the temporal phenological features to the classification model, seven experimental schemes (
Table 2) were tested: experimental Schemes 1–4 compare the abilities of each temporal feature, and Schemes 5–7 analyze the contribution and importance of different phenological periods.
3.2.2. Temporal Differences in Characteristic Variables
A series of features was computed from the remote-sensing image data for classification, including radar features for Sentinel-1, spectral features for Sentinel-2, and elevation features, as detailed in
Table 3. These features were integrated in a median way during the given periods (Time1~Time4). A concern faced in this regard was that the median synthetic image had a localized but extremely small area missing (0.07% and 0.25%) due to the short duration of Time1 and Time2. Therefore, to avoid affecting the overall results, the composite image corresponding to the years 2019 and 2020 was used for filling.
To illustrate the capability of these features to classify land use, we also compared the differences between these features across phenological periods.
Figure 5 shows the time series of the average reflectance bands and the vegetation indices for all sample points, illustrating that the NDWI discriminates between water bodies and other classes, with multi-period NDWI values greater than 0.2 for water bodies, while snow and ice classes are close to 0 and all other classes have NDWI values less than 0. The values of TC_Wetness differ significantly from those of snow and ice and other classes, with the multi-period TC_Wetness values being greater than 0.5, whereas all the other classes presented values less than 0.1. The B11 and B12 bands from both the impervious surface and bare soil classes are higher than those of other classes, with the difference in the B12 band being more significant. The B12 band is greater than 0.3 over multiple periods for both the impervious surface and bare soil classes but is less than 0.3 for all other classes. In addition, the BRISI values at time3 and time4 have the advantage of distinguishing impervious surface and bare soil from other classes, with the difference exceeding 0.4 at time3 and 0.8 at time4. The SD_V and SD_VH of the rocky terrain and snow and ice classes are significantly different from those of the other classes (
Figure 5f,g); thus, these radar features can be used as classification features to accomplish land-use classification.
3.3. Classification Algorithm
To assess the importance of different temporal features for classification and determine the optimal time for land-use classification, we used aggregated multi-temporal multispectral images as an input, and three classifiers were used to perform the land-use classification task, which generated a total of 21 classification scenarios (7 schemes × 3 classifier). These classifiers were selected because although they have some performance differences, in general, they are efficient and stable, often used in classification tasks [
10,
36,
39], and suitable for studies such as those evaluating the importance of different temporal features for land-use classification.
3.3.1. Classification and Regression Trees
The CART model is a classification or regression algorithm for generating decision trees based on fuzzy mathematical ideas and is one of the most important and popular tools in the field of modern data mining. It has a wide range of applications [
52], such as the fine mapping of crops [
53], disease diagnosis [
54], digital soil mapping [
55], solar radiation prediction [
56], and suspended sediment load prediction [
57]. This algorithm uses the dichotomous recursive concept to divide a sample into two subsamples layer by layer so that each non-leaf node has two branches; subsequently, it employs the affiliation as a classification indicator taking values from 0 to 1 to yield the probability that a leaf node belongs to a given class. Finally, the classification is determined according to the leaf node’s affiliation. The CART model has two important parameters on the GEE platform: Max Nodes and Min Leaf Population. The maximum number of leaf nodes in each tree is unlimited by default due to the small number of features in this study. The minimum number of leaves was obtained from control-variable experiments.
3.3.2. Random Forest
RF is a machine-learning algorithm that integrates multiple decision trees to train, classify, and regress samples [
58]. It is integrated by bagging; that is, the classification or regression results are voted on by multiple trees (for regression, the classification is averaged). The model has successfully been used in crop distribution mapping [
59], land-use production [
10], PM2.5 prediction [
60], groundwater-level mapping [
61], etc. The advantages of this model are its ability to be simply implemented and quickly trained, good generalizability, superior ability to learn the interactions between features compared with a single decision tree, reduced tendency to fall victim to overfitting, and ability to handle high-dimensional data and unbalanced datasets [
62]. The main parameters of RF include Number of Trees and the Min Leaf Population, and the optimal parameters are obtained from subsequent experiments.
3.3.3. Gradient Tree Boost
GTB is an iterative classification or regression algorithm incorporating a decision tree as the base learner, allowing it to boost integrated learning [
39]. The main idea behind the GTB is that all decision trees in the training set learn first from the residuals of all previous trees and use the negative gradient of the loss function in the current model to approximate the residuals in the boosting tree algorithm and thereby obtain a new regression or classification tree [
63]. Such a serial decision tree approach reduces overfitting, allows for the handling of different types of data, and makes accurate predictions. This method has been applied in many scenarios, including the safety evaluation of steel trusses [
64], document classification [
65], and the classification of algal bloom species [
66]. The main parameters of the GTB are (i) Number of Trees and (ii) Procedural Learning Rate. The optimal parameters are obtained from subsequent experiments.
3.4. Accuracy Assessment
The confusion matrix in the GEE serves to assess and validate the accuracy of land-use classifications, specifically by statistically comparing the classification results at the validation points against the true survey results. For each class, the confusion matrix yields the number of correctly classified classes and the number of misclassified classes (commission errors and omission errors are quantified via Producer Accuracy (PA) and User Accuracy (UA), respectively) [
10]. As a result, the Overall Accuracy (OA), Kappa Coefficient (Kappa), and F
1 score of the classification results can be calculated, among which Overall Accuracy is the ratio of the number of correctly classified sample points to the number of overall sample points, indicating how close the results are to being accepted as true. The kappa coefficient is a ratio that represents the fraction of error reduction generated by a classification versus a completely random classification. The F
1 score is calculated to evaluate the recognition accuracy and recall of each class and the whole model [
59].
3.5. Comparison with Existing Land-Use Products
To emphasize the importance and uniqueness of our products, three existing global-scale land-use products were collected for comparison with our products. These include ESA 2020, ESRI 2020, and Google Dynamic World LULC products generated from 1 January to 31 December 2021 (Google 2021). Although the validity and land-use classes of these products differ from those of our product, some of the land-use classes are comparable with each other because they do not change significantly during this period and some of their definitions are similar. First, the spatial distribution of land use is visually compared for several products. Second, we also compare the discrimination accuracy (F1 score) of similarly defined partial classes to evaluate the similarities and differences between different products.
6. Conclusions
To develop reliable land-use products for the highly heterogeneous environment in the southern Xinjiang Uygur Autonomous Region, this study not only assessed and compared the performance of multiple classifiers for highly heterogeneous land-use classification using multi-temporal Sentinel images but also explored how multi-temporal features affect land-use classification models. The following conclusions can be drawn from the results of this study: (1) The GTB classifier demonstrates significant potential for land-use classification, exhibiting superior mapping accuracy compared to RF and CART in various scenarios. Specifically, in scenario 7, the GTB model achieved the highest overall accuracy (95%), Kappa coefficient (95), and F1 score (98%). This performance highlights the model’s exceptional ability to accurately classify and map land use, underscoring its superiority over alternative methodologies. (2) The autumnal image features exhibit strong discriminability for land-use types. When using the autumn features (Time 4), all three classifiers yielded their highest accuracies (OA values of 86%, 93%, and 94% using CART, RF, and GTB, respectively, and kappa values of 84%, 91%, and 92%, respectively). This finding underscores the significance of autumnal imagery in accurately differentiating land-use types. (3) The inclusion of multi-temporal image features consistently improved classification performance. With the integration of multi-temporal features, all the classifiers achieved an average increase of 4% in overall accuracy, with a minimum increase of 2% and a maximum increase of 6%. In terms of the kappa coefficient, an average improvement of 5% was observed, with the maximum improvement reaching 7%. These findings highlight the significant positive impact of incorporating multi-temporal information on the accuracy and reliability of land-use classification. The analysis of the impact of these classifiers and image features on land-use maps offers valuable insights for conducting similar land-use classifications in highly heterogeneous regions. Moreover, our findings contribute to the production of precise land-use maps in diverse environments within the study area. In addition, we achieved the accurate mapping of significant features such as Korla pear plantations, which exert a substantial influence on local economic development, as well as the alpine wetlands located in the northwest region. These outcomes have significant implications for understanding and managing complex landscapes with diverse land-use patterns.