1. Introduction
Debris flows involve gravity-driven motion of solid-fluid mixtures with abrupt surge fronts, free upper surfaces, variably erodible basal surfaces, and compositions that may change with position and time [
1]. They can cause great damage to human lives and property, public facilities, and the ecological environment. Owing to its harsh natural environment and deforestation caused by human over-exploitation, Shigatse is a typical area of active debris flows in the Tibet Autonomous Region. Because the study area is densely populated, debris flows there can be highly destructive, so mitigating and reducing debris flow disasters is critical for the local authorities. Most of the Shigatse mountainous area is inaccessible and characterized by very steep slopes, which makes field surveys difficult; installing and maintaining sufficient monitoring facilities in these areas is also very challenging. Therefore, debris flow susceptibility (DFS) maps zoned from spatial data can help prevent and mitigate the casualties and economic losses caused by debris flow events.
Susceptibility mapping of debris flows is essential for early warning and treatment of regional debris flows. DFS assessment is based on the spatial characteristics of debris flow events and relevant factors (topography, soil, vegetation, human activities, and climate). It aims to estimate the spatial distribution of future debris flow probability in a given area [
2]. Some studies have discussed and analyzed debris flows in the study area [
3,
4], focusing on residential settlements and the vicinity of roads. Assessing the susceptibility of debris flows over the whole study area is difficult due to its vast size (exceeding 180,000 square kilometers), and detailed spatial information on the debris flow triggering factors is also quite limited. In this case, satellite remote sensing has good application prospects because it can characterize a large area, including the terrain, vegetation, and climate of the places where debris flow events occur. Therefore, compared with traditional field geological surveys, which require considerable work and resources, remote sensing data represented in a GIS environment can fill the gap left by missing on-site monitoring data; that is, it can support DFS research in a more effective and economical way.
In recent years, GIS and remote sensing data have been used in many studies of mountain disasters. Researchers built their methodologies by analyzing data on known debris flows and tested them against withheld debris flow events. Gregoretti et al. [
5] proposed a GIS-based model tested against field measurements for a rapid hazard mapping. Kim et al. [
6] used a high-resolution light detection and ranging (LiDAR) digital elevation model to calculate the volume of debris flows. Kim et al. [
7] developed a GIS-based real-time debris flow susceptibility assessment framework for highway sections. Alharbi et al. [
8] presented a GIS-based methodology for determining the initiation areas and characteristics of debris flows using remote sensing data. At present, DFS assessment methods can be divided into two main categories: qualitative and quantitative models. Qualitative models assign a weight (0–1) to each debris flow triggering factor based on expert experience, knowledge, or heuristics to assess the DFS [
9]. Common qualitative analysis methods include fuzzy logic [
10], analytic hierarchy [
11] and network analysis [
12], among others. While these models have contributed much to debris flow studies, they still suffer from shortcomings such as a high degree of subjectivity and applicability limited to specific areas [
13].
Quantitative methods usually include two types: deterministic models based on physical mechanisms and statistical models. Deterministic methods study the physical laws of debris flows and establish corresponding models to simulate the DFS [
14]. Their disadvantage is that they require detailed inspection data for each slope, so they are only suitable for small areas. Statistical models are data-driven: they assess the DFS by combining past debris flow events with environmental characteristics, under the assumption that the environmental conditions of past debris flow events will lead to debris flows in the future. Models for quantitative DFS assessment include the information model [
15], evidence weight method [
16], frequency ratio [
17] and so on.
In recent years, data mining and machine learning techniques have also received extensive attention because they can more accurately describe the nonlinear relationship between DFS and triggering factors [
18], and there is no special requirement for the distribution of triggering factors. Machine learning algorithms are often superior to traditional statistical models [
19] for the following reasons. First, machine learning can adapt to larger datasets, while traditional statistical learning methods are more suitable for small datasets. Second, machine learning offers better controllability and extensibility than traditional statistical models. Moreover, traditional statistical models are generally limited by certain requirements and assumptions on the data, whereas machine learning methods are not. In the past three decades, common machine learning methods used for DFS mapping include the back propagation neural network (BPNN) [
20], decision tree (DT) [
21], Bayesian network [
22], and support vector machine [
23]. As research advances, more and more models with better fitting performance have been developed. Under such circumstances, continuous verification and evaluation are still necessary when constructing and selecting a DFS evaluation model, so comparisons among various DFS models have become a hot topic in academia. Since information about debris flow occurrence is limited and heterogeneous, stability and accurate predictive power are the primary requirements when selecting an appropriate method.
Among machine learning methods, BPNN is widely used because it carries the excellent nonlinear fitting and complex learning abilities to extract the complex relationship between debris flow triggering factors and DFS [
24]. The convolutional neural network, a classical deep learning method, has developed rapidly in the past decade and is widely used in pattern recognition and medicine. It is generally applied to the classification and recognition of two-dimensional images. In recent years, artificial intelligence researchers have adapted the convolutional neural network to one-dimensional inputs for speech recognition [
25], fault diagnosis [
26] and data classification [
27]. As an end-to-end model, the one-dimensional convolutional neural network can extract and classify different characteristics of debris flows directly from raw data without expert guidance. DT is another powerful prediction model with three major advantages: the model is easy to build; the final model is easy to interpret; and the model provides clear information about the relative importance of input factors [
28]. These advantages have motivated researchers to develop new DT models to better utilize debris flow information. At the same time, ensemble learning algorithms based on decision trees have also received wide attention; the most representative are bagging and boosting. Kadavi et al. [
29] used four ensemble algorithms (AdaBoost, Bagging, LogitBoost, and the Multiclass classifier) to calculate and plot DFS maps, and showed that the Multiclass classifier performed best according to the AUC on the test set.
Due to the complex terrain, geology, and other mountain conditions in the study area, multi-source data are used as much as possible to characterize the terrain and geological conditions of debris flows. Although machine learning methods have been shown to achieve reasonably satisfactory results, this paper further investigates whether they can be applied to examine the DFS. Its main contributions are as follows. (a) We collected debris flow event data and a variety of original remote sensing data related to topographic, soil, human, and vegetation factors, and performed pre-processing operations, including projection, registration, and sampling, based on remote sensing and GIS technology (ArcGIS v10.2 software). (b) We obtained the characteristics of the locations in the study area where debris flows occurred and used a data generation algorithm to augment the collected debris flow event data. (c) Based on the Python language, using the Keras framework and the scikit-learn module, five DFS models (BPNN, 1D-CNN, DT, RF, and XGBoost) were constructed on the training set, and their applicability was examined for the Shigatse region. Notably, this is the first comparative experiment with XGBoost and 1D-CNN in the study of DFS. (d) Cross-validation was used to compare the performance of the artificial neural networks and the tree-based models while reducing bias and variance. (e) Statistical analyses of the compared algorithms were conducted to verify whether their performance differs significantly. (f) The test set was used to evaluate the models' predictive ability with five classical evaluation measures: Recall, Precision, F1 score, Accuracy, and AUC [
30]. (g) At the end of the study, the tree-based “feature importance” ranking was used to evaluate the main characteristic factors affecting the DFS.
2. Material and Methods
2.1. Study Area
The study area covers 182,000 square kilometers in southwestern China, in the southwest of the Tibet Autonomous Region (27°23′–31°49′N, 82°00′–90°20′E). As shown in
Figure 1, the Shigatse area lies mainly between the central Himalayas and the central part of the Gangdise-Nyqinqin Tanggula Mountains. Its elevation is high in both the northern and southern parts, encompassing the southern Tibetan Plateau and the Yarlung Zangbo River basin. The overall terrain of the Shigatse region is complex and diverse, mainly consisting of mountains, wide valleys, and lake basins, with a maximum elevation of over 8700 m. The study area belongs to the semi-arid monsoon climate zone of the inland plateau, featuring a dry climate, low precipitation, a rainy season that coincides with the hot season, and an annual average of 3240 sunshine hours.
Transportation mainly comprises three main lines passing through the study area: the China-Nepal (Zhongni) Highway, 318 National Road, and the Largo Railway. Geological disasters in the study area are serious, mainly including debris flows, rock collapses, and landslides; among them, debris flows are the most common. A large number of debris flows occur in many parts of the study region, directly threatening the safety of the three major transportation lines and residents' lives and property. According to the collected data and previous studies [
31], the debris flows in the study area are mostly caused by heavy rain.
2.1.1. Debris Flow Dataset
Collection and analysis of debris flow event datasets are prerequisites for the DFS assessment. There are 1944 debris flow sites in the study area from 1998 to 2008. Each case includes information obtained from field disaster investigation, such as time, debris flow susceptibility level, and geographic location. The information on debris flows is provided by the Tibet Meteorological Bureau. These events can be viewed through the geological cloud portal [
32].
2.1.2. Debris Flow Triggering Factors
Analyzing the environmental characteristics of debris flow events is essential for DFS estimation. Due to the complexity of the environment and the various development stages of debris flows, the causes of debris flows remain controversial. Researchers have extensively studied the relationship between debris flows and triggering factors such as topography, soil, climate, and human activities. Accordingly, we classified 15 environmental factors into five categories, as shown in
Table 1.
Topographic factors that include elevation, slope, aspect, and curvature are extracted from the Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM) using the ArcGIS platform [
33]. The vegetation coverage is represented by the normalized difference vegetation index (NDVI), calculated from the obtained 2000–2008 MODIS images and averaged to generate the thematic layer of the annual average NDVI. Rainfall data are collected from the Tropical Rainfall Measuring Mission (TRMM) [
34]. We use a rainfall dataset (No: 3B42v7) with a time interval of three hours and a spatial resolution of 0.25 degree during 1998–2008 to construct a thematic layer of annual average precipitation. The 15 types of land use information layers are provided by National Earth System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China (
http://www.geodata.cn) [
35,
36]. In addition, the road vector data provided by OpenStreetMap (OSM) (
https://www.openstreetmap.org/#map=11/22.3349/113.76000) is used to calculate the distance from the road. Soil factors are provided by the Resource and Environmental Science Data Center (RESDC) of the Chinese Academy of Sciences.
Higher resolution is conducive to the topographic analysis of a single debris flow gully, but this work focuses on the use of multiple attribute factors to analyze the disaster susceptibility of the entire study area. Golovko [
2] and Ahmed [
9] argue that a 30 m resolution digital elevation model (DEM) can be used for the analysis of susceptibility to mountain disasters. Therefore, DEM data with a pixel size of 30 m are used (
Figure 2a). The slope angle is a fundamental factor derived from the DEM data, and its statistical range in the study area is wide (0–89°) (
Figure 2b). The slope aspect is another key factor affecting the DFS, because slopes facing different directions are exposed to wind and rain to different degrees. The aspect thematic layer is reclassified into nine categories: flat (−1), north (337.5–360°, 0–22.5°), north-east (22.5–67.5°), east (67.5–112.5°), south-east (112.5–157.5°), south (157.5–202.5°), south-west (202.5–247.5°), west (247.5–292.5°), and north-west (292.5–337.5°) (
Figure 2c). The curvature, the second derivative of the terrain surface, helps us understand the characteristics of basin runoff and erosion processes. In this study, three curvature measures are used to describe the shape of the terrain (Figure 2d–f): the profile curvature, the plan curvature, and the total curvature of the surface, defined as the curvature in the direction of maximum slope, the curvature along the contour, and a combination of the two, respectively.
Human activities affect the geographical environment, which in turn influences the occurrence of debris flows. The land cover thematic map shows how human production can change natural land, and 14 land use types including farmland and forest can be identified (
Figure 2g). The DFS assessment often takes the distance from roads into account, because road construction and maintenance cause changes and damage to the local morphology. This variable is calculated using the Euclidean distance technique in the spatial analysis tools of ArcGIS 10.2 (
Figure 2h).
The vegetation coverage is one of the important parameters to evaluate the DFS. NDVI extracted from remote sensing images is a commonly used vegetation index for inferring the vegetation density. It is very sensitive to the presence of chlorophyll on vegetation surface (
Figure 2i). We calculated the NDVI value using the following formula:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \quad (1)$$
where NIR and RED represent the near-infrared and red bands, respectively, which are the second and first channels of the MODIS image. The NDVI value ranges between −1 and 1. A negative value indicates that the ground cover is highly reflective to visible light, such as clouds, snow, or water; a value near 0 indicates bare land; and a positive value represents vegetation coverage, increasing with vegetation density.
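For readers reproducing this step, a minimal sketch of the per-pixel NDVI calculation of Equation (1) in Python with NumPy is given below; the band arrays and the averaging over the 2000–2008 image stack are assumptions about the workflow, not the exact processing chain used here.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Per-pixel NDVI of Equation (1) from NIR and red reflectance arrays."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    # Guard against division by zero where both bands are zero
    return np.where(denom == 0, 0.0, (nir - red) / denom)

# Hypothetical usage: average NDVI over a stack of (nir, red) image pairs
# annual_ndvi = np.mean([ndvi(n, r) for n, r in modis_band_pairs], axis=0)
```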
Debris flows are usually influenced by changes in humidity caused by rainfall infiltration. Permeability can be expressed by soil type (
Figure 2j), soil texture (
Figure 2k–m) and soil erosion (
Figure 2n). Since the particle distribution determines the shape of the soil water characteristic curve and affects the soil hydraulic characteristics, soil type and texture have a great influence on the DFS. Most of the study area is covered by alpine soil, including grass mat soil, cold soil, and frozen soil. According to statistics, most of the alpine soil is brown and strongly acidic. The alpine soil is mainly composed of silt, sand, and fine clayey sand; it is highly permeable and retains little moisture. Soil erosion is sometimes used as a synonym for soil and water loss, and areas with severe erosion are susceptible to debris flows. The external drivers of soil erosion are mainly water, wind, and freeze-thaw. Clearly, fragile soil conditions accompanied by concentrated rainfall usually result in debris flows.
Rainfall is the main factor leading to debris flows. The study area is affected by a monsoon climate, with sparse precipitation and an annual average precipitation of less than 1300 mm (
Figure 2o). However, statistical analyses of the geological hazard points occurring in the study area show that heavy rain and continuous rainfall are important external factors leading to geological disasters in the Shigatse area.
2.2. Methods
The main purpose of our research is to fit the relationship between the triggering factors and the occurrence of debris flows. The problem can be expressed as a multi-class classification: given a set of input quantities, the classification model attempts to label the DFS level of each pixel in the study area. The inputs to the models are the triggering factors of the debris flow events collected by local Chinese Geological Survey researchers over many years of field investigation. Based on the researchers' investigation of the debris flow sites, we obtain the values of the 15 triggering factors at the corresponding positions through the value extraction function of ArcGIS v10.2 software. That is, the input of the model is a one-dimensional vector [x1, x2, …, x15]. The output value of the model is the DFS level, indicating the occurrence probability of debris flows. The division criteria of regional DFS are based on the detailed survey specification for landslide, collapse, and debris flow hazards issued by the China Geological Survey, as shown in
Table 2.
According to statistics, the number of moderately susceptible units in the study area is much higher than that of the other susceptibility grades (
Figure 3). Therefore, before constructing a debris flow assessment model, the oversampling technique Borderline-SMOTE algorithm should be used to eliminate the classification imbalance in the dataset. The number of each debris flow susceptibility level after oversampling is shown in
Figure 3. The original dataset is divided into training and test sets at 75% and 25%, respectively. The training set of debris flow triggering factors is used to learn to fit the actual DFS classification, and the validation set is used to adjust the model parameters to prevent over-fitting or under-fitting. In this study, five DFS models are established using DT, BPNN, 1D-CNN, RF, and XGBoost. Among them, DT and BPNN have been the most commonly used machine learning models in the past few decades. The one-dimensional convolutional neural network (1D-CNN) has achieved remarkable results in one-dimensional signal processing, such as fault diagnosis and speech recognition. RF is based on the DT and is a typical ensemble algorithm in machine learning. XGBoost is also a tree-based ensemble model, which fits the residuals generated by the previous iteration. To the best of our knowledge, XGBoost and 1D-CNN have not previously been used for the DFS. Based on the above introduction, the research framework for the DFS in Shigatse is shown in
Figure 4.
In addition, we also implemented other traditional machine learning algorithms, such as the support vector machine, logistic regression, and the naive Bayes model, but their results were disappointing; therefore, these methods are not presented here. The following subsections briefly introduce the data sampling generation algorithm and the five classifiers used in this paper.
2.2.1. Borderline-SMOTE
It is well known that when one class accounts for a disproportionately high share of a classification dataset, classifier performance during training is seriously affected. The Synthetic Minority Oversampling Technique (SMOTE) and its improved variants are widely applied to such data imbalance problems [
37]. It is used to artificially generate vector data to balance the categories in a dataset. In this study, most units have moderate susceptibility, and in the classification process, the scarcity of categories with fewer samples (the minority classes) is one of the main causes of over-fitting and inaccuracy. This paper chooses the boundary-based SMOTE algorithm (Borderline-SMOTE) to handle the data imbalance. Specifically, the k-nearest neighbor algorithm is used to find the nearest neighbors of each minority-class sample in the training set, and the boundary sample set is constructed according to whether the majority class dominates among those neighbors. For a sample $T_i$ in the boundary sample set, its k nearest minority-class neighbors are calculated and a neighbor $T_j$ is randomly selected. The SMOTE algorithm then randomly interpolates a new feature vector between the selected neighbor and the initial vector, as shown in Equation (2):
$$T_{\mathrm{new}} = T_i + \mathrm{rand}(0,1) \times (T_j - T_i) \quad (2)$$
where $\mathrm{rand}(0,1)$ is a random number between 0 and 1. Finally, the generated new samples are added to the original sample set.
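As a concrete illustration, the imbalanced-learn library provides a Borderline-SMOTE implementation; the sketch below shows how this oversampling step could look. The paper does not state which implementation was used, and the synthetic dataset standing in for the (n_samples, 15) factor matrix is purely illustrative.

```python
from collections import Counter
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.datasets import make_classification

# Dummy imbalanced stand-in for the (n_samples, 15) triggering-factor matrix
X, y = make_classification(n_samples=600, n_features=15, n_informative=8,
                           n_classes=3, weights=[0.2, 0.7, 0.1],
                           random_state=42)

sampler = BorderlineSMOTE(kind="borderline-1", k_neighbors=5, random_state=42)
X_balanced, y_balanced = sampler.fit_resample(X, y)
print(Counter(y), Counter(y_balanced))  # classes are balanced after resampling
```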
2.2.2. Back Propagation Neural Network
Back propagation neural network (BPNN) is a mathematical model that simulates the information processing of biological nervous systems. The BPNN, as the most classic form of artificial neural network, usually has three or more layers, including an input layer, an output layer, and hidden layers. Its two main operations are the forward propagation of the signal and the backward propagation of the error. Neurons between adjacent layers are fully connected, while neurons within a layer are not connected to each other. The network is trained by gradient descent, using gradient search to minimize the network's cost function. It has strong nonlinear mapping ability and is especially suitable for modeling the intricate relationship between debris flow triggering factors and the DFS. The general operation of the network is as follows: an input sample leaves the input layer and enters the hidden layer; after being activated by the transfer function (e.g., Tanh, ReLU, or Sigmoid; Tanh is used in this article), it passes to the next hidden layer and onward to the output layer. The output of each layer is given by Equation (3):
$$y_j = f\Big(\sum_i w_{ij} x_i + b_j\Big) \quad (3)$$
where $f(\cdot)$ represents the transfer function and $\theta = \{w, b\}$ represents the network parameters, in which $w_{ij}$ is the weight and $b_j$ is the threshold.
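A minimal Keras sketch of such a network is given below. The layer widths, the number of DFS output classes, and the optimizer are illustrative assumptions; only the 15-factor input and the Tanh hidden activation follow the text.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(32, activation="tanh", input_shape=(15,)),  # hidden layer 1
    Dense(16, activation="tanh"),                     # hidden layer 2
    Dense(3, activation="softmax"),                   # one unit per assumed DFS level
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=100, batch_size=32)
```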
2.2.3. One-Dimensional Convolutional Neural Network
As a feedforward neural network, the one-dimensional convolutional neural network (1D-CNN) is inspired by the receptive fields of the mammalian visual cortex. The network perceives local features, integrates them in a high-dimensional space, and finally obtains global information. The basic structure of a convolutional neural network includes an input layer, alternating convolutional and pooling layers, and a fully connected layer. The convolutional layer captures locally connected information in the input features through convolution kernels and reduces the number of model parameters using the weight-sharing principle. The convolution formula is:
$$x_j^{l} = f\Big(\sum_{i=1}^{M} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big) \quad (4)$$
where $f(\cdot)$ represents the transfer function, $x_j^{l}$ represents the j-th feature map of convolutional layer $l$, $M$ represents the number of feature maps, $x_i^{l-1}$ represents the i-th feature map of layer $l-1$, $*$ represents the convolution operation, $k_{ij}^{l}$ represents the trainable convolution kernel, and $b_j^{l}$ represents the bias. The shape and number of one-dimensional convolution kernels largely determine the feature-extraction ability of the overall network: the kernel shape affects the fineness of feature extraction, while the number of kernels corresponds to the size of the feature layer, affecting the variety of extracted features and the computational scale of the network. The pooling layer combines multiple adjacent nodes to merge similar features and down-samples the feature layer extracted by the convolutional layer, thereby reducing the number of training parameters and preventing over-fitting to enhance the generalization ability of the network. The main pooling methods at present are maximum pooling, mean pooling, and L2-norm pooling. After the convolutional and pooling layers, the fully connected layer trains the weights and biases of the convolution kernels in the entire network based on the back-propagation principle. The fully connected part is similar in structure to the BP neural network described in the previous section; it has a hidden layer and uses the Softmax activation function to complete the classification. The structure of the entire network is shown in
Figure 5.
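The following Keras sketch illustrates the architecture just described for the 15-factor input vectors; filter counts, kernel sizes, and the number of output classes are illustrative assumptions rather than the tuned configuration of this study.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential([
    # Treat the 15 factors as a length-15 sequence with one channel
    Conv1D(16, kernel_size=3, activation="relu", input_shape=(15, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(32, kernel_size=3, activation="relu"),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(32, activation="relu"),    # fully connected hidden layer
    Dense(3, activation="softmax"),  # Softmax classification output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Inputs must be reshaped accordingly: X = X.reshape(-1, 15, 1)
```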
2.2.4. Decision Tree
DT is a common machine learning algorithm with a tree structure, often used to find patterns in data. It aims to establish correct decision rules and determine the relationship between feature attributes and target variables without expert experience. A DT usually contains, from top to bottom, a root node, multiple internal nodes, and a set of leaf nodes; the leaf nodes correspond to the prediction results, and the root node contains all samples. The key to DT learning is selecting the best attribute for splitting. At present, the main algorithms for constructing DTs are CART, C4.5, and ID3. This study uses the CART algorithm for its better performance and efficiency [
38]. CART uses the Gini coefficient to select node-splitting attributes, building the tree by choosing at each node the attribute that minimizes the Gini coefficient after the split. The Gini index is shown in Equation (5):
$$\mathrm{Gini}(t) = 1 - \sum_{k} \left[p(k \mid t)\right]^2 \quad (5)$$
where $k$ is the category, $t$ is the node, and $p(k \mid t)$ is the proportion of samples of category $k$ at node $t$. Finally, pruning techniques are used to deal with the over-fitting problem of the model. Upon completing the entire algorithm, we can clearly understand the internal decision-making mechanism and thus gain a more objective understanding of the debris flow triggering factors.
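A small sketch of the Gini computation of Equation (5), together with the corresponding scikit-learn call, makes the criterion concrete; the variable names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini_index(labels: np.ndarray) -> float:
    """Gini index of a node, Equation (5): 1 - sum_k p(k|t)^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A pure node scores 0; impurity grows as classes mix.
print(gini_index(np.array([1, 1, 1, 1])))   # 0.0
print(gini_index(np.array([0, 1, 2, 1])))   # > 0

# scikit-learn's CART implementation uses this criterion by default:
cart = DecisionTreeClassifier(criterion="gini", random_state=42)
# cart.fit(X_train, y_train)
```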
2.2.5. Random Forest
As an ensemble classification algorithm, RF aims to improve the flexibility and accuracy of classification trees. In RF, a large number of trees are generated by constructing a base DT on each of multiple bootstrap samples of the data. Because the candidate feature attributes at each node are randomly selected, the correlation between trees is effectively reduced without increasing the bias. Each feature subset is much smaller than the total number of features in the input space, so the individual DTs are decorrelated. Finally, the output of the classification task is predicted by simple voting. Built from a number of DTs, RF greatly improves on a single DT, yet it is hardly more complex to apply than a single basic DT. Therefore, RF is a fairly effective ensemble learning algorithm.
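A minimal scikit-learn sketch of this procedure is shown below; the number of trees and the feature-subset rule are illustrative values, not the tuned hyper-parameters of this study.

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=200,     # number of bootstrapped base DTs
    max_features="sqrt",  # random feature subset considered at each split
    random_state=42,
)
# rf.fit(X_train, y_train)
# y_pred = rf.predict(X_test)  # prediction by majority voting over the trees
```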
2.2.6. XGBoost
XGBoost, short for eXtreme Gradient Boosting, is a gradient boosting ensemble algorithm based on classification or regression trees. It works in the same way as gradient boosting but adds features similar to AdaBoost. The algorithm combines multiple DT models iteratively to form a classification model with a better structure and higher precision. It has achieved impressive results in many international data mining competitions and has won numerous championships in Kaggle competitions. In the DFS analysis experiment, XGBoost can classify the DFS level according to the environmental characteristics of the Shigatse region and rank the importance of the triggering factors.
The XGBoost uses both the first and second derivatives, performing a second-order Taylor expansion of the loss function, and adds a regular term to it. Therefore, while pursuing model accuracy, it also keeps the model complexity well controlled. Finally, the model is trained by minimizing the total objective function [
39]. The objective function of the model can be expressed as Equation (6):
$$Obj^{(t)} = \sum_{i=1}^{n} L\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + C \quad (6)$$
where $i$ denotes the $i$-th sample, $\hat{y}_i^{(t-1)}$ represents the predicted value of the $(t-1)$-th model for sample $i$, $f_t$ represents the newly added $t$-th model, $\Omega(f_t)$ represents the regular term, $C$ represents constant terms, and the outermost $L(\cdot)$ represents the error.
The optimizer aims to determine the structure and leaf scores of the CART trees. XGBoost accelerates existing boosted tree algorithms through cache-aware prefetching, distributed out-of-core computing, and the AllReduce fault-tolerant tool. It can also be trained on a graphics processing unit for a further substantial speed-up.
In this work, we import the XGBoost toolkit in Python. The training process controls the construction of the DTs by adjusting five hyper-parameters: the number of iterations, the number of trees generated, the learning rate, the maximum depth of each tree, and the L2 regularization. The Gamma hyper-parameter limits the minimum gain required for a split.
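The sketch below maps the hyper-parameters named above onto the scikit-learn style API of the XGBoost toolkit; the numeric values are illustrative, not the values selected by cross-validation in this study.

```python
from xgboost import XGBClassifier

xgb_model = XGBClassifier(
    n_estimators=300,   # number of boosting rounds (trees generated)
    learning_rate=0.1,  # shrinkage applied to each newly added tree
    max_depth=6,        # maximum depth of each tree
    reg_lambda=1.0,     # L2 regularization on leaf weights
    gamma=0.1,          # minimum gain required to make a split
)
# xgb_model.fit(X_train, y_train)
```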
2.3. Evaluation Measures
DFS mapping should have the ability to effectively predict the probability of future debris flows in the study area. In order to evaluate the predictive power of several machine learning algorithms, five common evaluation methods are used to quantify model performances, including Precision, Recall, F1 score, Accuracy and AUC. Finally, 293 debris flow events are applied as a test set.
In the case of the binary classification problem, four elements, i.e., TP, TN, FP and FN, are defined as follows.
TP: True Positive. Samples belonging to the TRUE class are correctly marked as positive by the model.
TN: True Negative. Samples belonging to the FALSE class are correctly marked as negative by the model.
FP: False Positive. Samples belonging to the FALSE class are incorrectly marked as positive by the model.
FN: False Negative. Samples belonging to the TRUE class are incorrectly marked as negative by the model.
2.3.1. Precision
In the binary classification task, Precision represents the ratio of samples correctly labeled as the True class to the total number of samples predicted as True. The formula is as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (7)$$
In a multi-class classification task, Precision is expressed as the weighted average of the per-class precision.
2.3.2. Recall
The Recall rate is the ratio of samples correctly labeled as True to the total number of True samples, expressed as Equation (8) in the binary classification task:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (8)$$
In a multi-class classification task, the Recall rate represents the weighted average of the per-class Recall rates.
2.3.3. F1 Score
The F1 score combines Precision and Recall, with values between 0 and 1 representing the worst and best performance, respectively. The relative contributions of Precision and Recall to the F1 score are equal. The formula is defined as follows:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (9)$$
In a multi-class classification task, the F1 score represents the weighted average of the per-class F1 scores.
2.3.4. Accuracy
In a multi-class classification task, Accuracy represents the ratio of correctly classified samples to the total number of samples.
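The four measures above can be computed in one call with scikit-learn's weighted averaging, as in the sketch below; the dummy label vectors stand in for the true and predicted DFS levels of the test set.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_test = [0, 2, 1, 1, 0]  # dummy true DFS levels, for illustration only
y_pred = [0, 1, 1, 1, 0]  # dummy predicted DFS levels

# Weighted averages over classes, as described in Sections 2.3.1-2.3.4
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0)
accuracy = accuracy_score(y_test, y_pred)
print(precision, recall, f1, accuracy)
```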
2.3.5. Area Under the Curve (AUC)
The AUC is defined as the area under the receiver operating characteristic (ROC) curve and can be used to judge classifier performance under different class distributions. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). TPR represents the ratio of correctly classified positive instances to the total number of positive instances, as given by Equation (10):
$$TPR = \frac{TP}{TP + FN} \quad (10)$$
FPR is the ratio of negative instances misclassified as positive to the total number of negative instances, as given by Equation (11):
$$FPR = \frac{FP}{FP + TN} \quad (11)$$
The AUC is originally designed for binary classification. We therefore convert the multi-class task into multiple binary classifications, separately calculate the AUC value of each category, and obtain the multi-class result by averaging the AUC values [
40].
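This one-vs-rest averaging is available directly in scikit-learn; a minimal sketch follows, with dummy labels and class probabilities standing in for the test-set outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_test = np.array([0, 2, 1, 1, 0])     # dummy DFS levels
y_proba = np.array([[0.7, 0.2, 0.1],   # dummy (n_samples, n_classes) probabilities
                    [0.1, 0.3, 0.6],
                    [0.2, 0.6, 0.2],
                    [0.3, 0.5, 0.2],
                    [0.6, 0.3, 0.1]])

# AUC per class (one-vs-rest), then averaged over the classes
auc = roc_auc_score(y_test, y_proba, multi_class="ovr", average="macro")
print(auc)
```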
2.4. Cross-Validation
In this paper, the cross-validation method is used for parameter optimization. Specifically, based on an error-based validation index, the training set is divided into k mutually exclusive subsets of equal size; k − 1 subsets are used for training and the remaining subset is used for validation. The experiment is repeated k times by rotating the subsets in turn, and the k validation results are averaged. The GridSearchCV module of scikit-learn and the cv function of the XGBoost library are used to optimize the parameters of the decision tree, random forest, and XGBoost models. In the Keras framework, cross-validation based on the GridSearchCV module is likewise used to search the parameter space and obtain the optimal parameter estimates of the models on the dataset.
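A minimal GridSearchCV sketch of this k-fold search is shown below, using XGBoost as the estimator; the parameter grid and k = 5 are illustrative choices, not the grids actually searched.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {"max_depth": [4, 6, 8], "learning_rate": [0.05, 0.1, 0.2]}
search = GridSearchCV(XGBClassifier(), param_grid,
                      scoring="accuracy", cv=5)  # 5-fold cross-validation
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```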
2.5. Statistical Analysis
In order to compare the performance differences between the models, we conducted a statistical analysis. One-way ANOVA can be used to test whether there is a statistically significant difference among two or more unrelated groups [
41]. Model performance is evaluated by the accuracy of the test results during model training; therefore, the groups of accuracies obtained by cross-validating the different algorithms are used for the one-way ANOVA. The null hypothesis H0 tested by the one-way ANOVA is as follows.
H0: The accuracy of all algorithms is not significantly different.
H1: There are significant differences in the accuracy of at least two or more algorithms.
The one-way ANOVA yields a P-value, which represents the risk incurred in rejecting the hypothesis H0.
The ANOVA results can only determine whether at least one group of samples differs significantly from the others; they cannot tell which two groups differ. Therefore, comparisons between specific groups are often performed after the one-way ANOVA. The honestly significant difference (HSD) method, developed by Tukey, is favored by researchers as a simple and practical pairwise comparison technique. The main idea of HSD is to use a statistical distribution, called the q-distribution, to calculate the truly significant difference between two mean values [42]. This distribution gives the exact sampling distribution of the largest difference among a set of means drawn from the same population, and all pairwise differences are evaluated against it. This paper uses HSD as a post-hoc analysis to conduct pairwise comparisons of the performance indicators from the different algorithms.
All statistical analyses were completed by using the Statistical Package for Social Sciences (IBM SPSS Statistics for Windows Version 22.0).
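For readers without SPSS, the same one-way ANOVA and Tukey HSD analysis can be reproduced in Python; the sketch below uses randomly generated stand-ins for the per-algorithm cross-validation accuracies.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Dummy 10-fold cross-validation accuracies per algorithm (illustrative)
acc = {"XGBoost": rng.normal(0.90, 0.01, 10),
       "1D-CNN": rng.normal(0.89, 0.01, 10),
       "RF": rng.normal(0.87, 0.01, 10),
       "BPNN": rng.normal(0.85, 0.01, 10),
       "DT": rng.normal(0.82, 0.01, 10)}

f_stat, p_value = f_oneway(*acc.values())  # H0: all mean accuracies are equal
print(f_stat, p_value)

scores = np.concatenate(list(acc.values()))
groups = np.repeat(list(acc.keys()), 10)
print(pairwise_tukeyhsd(scores, groups))   # pairwise mean comparisons (HSD)
```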
2.6. Feature Importance
The tree-based machine learning approach in this study provides a “feature importance” toolkit for ordering index factors based on the function strength of a particular problem [
43]. In a basic decision tree, feature attributes are selected for node splitting, and how often and how effectively an attribute is used measures its importance. For a single decision tree $T$ with $J - 1$ internal nodes, Equation (12) gives the importance score of each predictor feature $X_\ell$:
$$I_\ell^2(T) = \sum_{t=1}^{J-1} \hat{i}_t^{\,2}\, \mathbb{1}\big(v(t) = \ell\big) \quad (12)$$
where $v(t)$ is the feature used to split internal node $t$, and $\hat{i}_t^{\,2}$ is the maximal estimated improvement in squared error risk over that of a constant fit over the entire region, provided by the selected feature. The following formula represents the importance calculation over the additive $M$ trees:
$$I_\ell^2 = \frac{1}{M} \sum_{m=1}^{M} I_\ell^2(T_m) \quad (13)$$
In practice, a frequently used attribute often has good discriminating ability. In this study, the factors affecting debris flow occurrence are ranked from high to low importance according to their roles in the decision-making process of the DFS models.
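In practice, the per-factor scores of Equations (12) and (13) are exposed by the tree libraries themselves; a minimal sketch follows, assuming a fitted model named xgb_model (as in the earlier sketch) and a hypothetical list factor_names holding the 15 factor labels.

```python
import pandas as pd

# feature_importances_ is provided by scikit-learn trees/forests and XGBoost;
# xgb_model and factor_names are assumed from earlier sketches
ranking = pd.Series(xgb_model.feature_importances_, index=factor_names)
print(ranking.sort_values(ascending=False))  # factors ranked high to low
```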
4. Discussion
This study aims to estimate the regional DFS by using five highly representative machine learning models, i.e., BPNN, 1D-CNN, DT, RF, and XGBoost. According to the literature, such investigations are rare in Shigatse, particularly those based on 1D-CNN and XGBoost.
4.1. Model Performance
First, as seen from
Table 4, category imbalance caused over-fitting in all of the above machine learning models, resulting in poor performance. The data generated by the interpolation method are close to the original data; in future studies, we will adopt better interpolation methods to obtain more reasonable data and use other techniques [44] to alleviate the over-fitting of machine learning models on unbalanced data.
As shown in
Table 7, across the five evaluation measures (Recall, Precision, F1 score, Accuracy, and AUC), the results show a performance gap among the algorithms. Ranked from high to low, the five models are XGBoost, 1D-CNN, RF, BPNN, and DT. In addition, the ANOVA and Tukey HSD test results show that XGBoost differs significantly from the RF, BPNN, and DT models, indicating that XGBoost's generalization performance and predictive ability are significantly better than those of RF, BPNN, and DT.
In the early days, BPNN showed excellent performance in a variety of classification tasks. In this study, however, its accuracy only outperforms that of a single DT. XGBoost is better than BPNN not only in accuracy but also in speed, because BPNN has too many parameters to adjust. In particular, XGBoost can generate a "feature importance" ranking that allows researchers to analyze the data, whereas BPNN is a black-box model, although much research has been devoted to explaining its internal structure. While XGBoost had not previously been used for debris flow susceptibility analysis, scholars in the field of mountain disaster studies have reached similar conclusions, reporting that a boosting model exceeds the accuracy of BPNN by 8% [
18].
RF and XGBoost are ensemble machine learning algorithms based on the DT, and their evaluation scores are higher than those of a single DT. This result shows that ensemble algorithms can compensate for the limited fitting ability of a single DT. Although both are ensemble algorithms, XGBoost's overall performance is better than that of RF. The RF algorithm relies on the final vote of all DTs, which reduces variance, while XGBoost fits the residuals generated by the previous iteration, which reduces both variance and bias. Performance comparisons between XGBoost and RF have been made in many research areas, and XGBoost is usually in the leading position [
39,
45].
Like XGBoost, the 1D-CNN had not been used for debris flow susceptibility, and little literature has addressed it. The cross-validation results show that the accuracies of XGBoost and 1D-CNN are not significantly different, but the average accuracy of XGBoost is better than that of 1D-CNN, and XGBoost also leads in test performance. The main reason is that CNNs capture data such as images, audio, and possibly text quite well by modeling spatial-temporal locality, while tree-based models handle tabular data very well.
Considering the classification performance comprehensively, XGBoost performs best overall, with high classification accuracy, good prediction, and low computation time. Therefore, the XGBoost method deserves more attention in future DFS evaluations.
4.2. Feature Importance
Based on the selected model to construct the DFS map, the following conclusions can be drawn from
Figure 4. The DFS in the study area is mainly medium and low, together accounting for more than 50% of the entire study area. The feature attribute scores provided by the tree-based machine learning methods are important for analyzing the causes of debris flows. The results show that aspect, profile curvature, annual average rainfall, and DEM are the main factors affecting the occurrence of debris flows in the study area. Other triggering factors such as vegetation cover and human activities also have a certain impact, while the contribution of soil factors to the modeling is relatively weak. Guided by the feature attribute evaluation of the models, targeted analysis and investigation of the debris flow triggering factors in the study area can be carried out. The main triggering factors are analyzed below based on historical data statistics.
In the study area, different slope aspects lead to differences in hydrothermal conditions, which in turn affect the distribution of geographical elements such as vegetation, hydrology, soil, and topography. Some Chinese scholars have also examined the relationship between vegetation and debris flow erosion and suggested that the slope aspect largely determines the vegetation and soil types [
46]. Feature selection shows that the slope aspect has the greatest impact on the distribution of debris flows. The actual debris flow event statistics (Figure 10a) give the distribution of events across aspects: the numbers of debris flows on the southwest- and east-facing slopes are relatively large, and events are densest on the southwest-facing slopes, which are also where the highly occurrence-prone events are located.
The curvature of the slope describes its shape, which controls the formation of debris flow events by affecting the gravitational potential energy and water-collecting conditions. The feature selection results show that the profile curvature estimates the DFS better than the plan curvature and total curvature. The shape of a slope is usually linear, concave, or convex, indicating the middle, mature, and violent (young) stages of landscape evolution, respectively. According to the statistics (
Figure 10b), the debris flows are mainly concentrated in the area where the curvature is negative, i.e., the surface of the pixel is convex. This statistical result is consistent with the results of Guo et al. [
47] on mountain debris flow events.
The overall elevation of the Shigatse region is very high and its valleys are deep. Statistics on the distribution of debris flow events at different altitudes indicate that they are mainly distributed between 3600 and 4600 m (
Figure 10c). The high- and medium-susceptibility levels are distributed at altitudes of about 4000 m. The reason is that the region between 3600 and 4600 m is very steep yet densely populated, so human activities have a huge impact on it. In addition, this altitude range is mainly eroded by flowing water, with serious accumulation of loose material, so debris flows are particularly well developed there. Tang et al. [
31] also got similar conclusions for the investigation of the study area.
Rainfall is the main triggering factor for debris flows. It promotes mountain debris flows mainly by increasing soil bulk density and reducing cohesion and internal friction. The study area is mainly a rain-sparse region with an annual average rainfall of less than 1350 mm (
Figure 10d). However, debris flows correlate very significantly with the rainy season in Shigatse. Although the annual precipitation is not high, its temporal distribution is concentrated; that is, the heavy rain season is also the season when debris flows frequently occur. According to statistics, debris flows during the flood season in Shigatse account for more than 70% of the total.
Machine learning algorithms can handle large-scale data. In addition, they are more objective than traditional qualitative evaluation methods and can support decision-making without an expert system. However, they have some inherent problems: data preprocessing is labor-intensive and time-consuming, and the preprocessing results have a great impact on the classifier.