1. Introduction
The soil organic carbon (SOC) pool in farmland is the most active component of the soil carbon pool because it responds quickly to changes in agricultural practices [1,2]. With appropriate agricultural management, farmland soil has great potential to become an important carbon sink, which would help mitigate the greenhouse effect and maintain ecological balance [3,4,5,6,7]. SOC content is an important index for evaluating soil fertility and farmland quality, and its changes reflect the fixation or loss of soil carbon [8,9]. Therefore, accurate mapping of SOC content in farmland is of pivotal importance for soil quality evaluation, agricultural management, and climate change mitigation.
Identifying the dominant factors of SOC is central to accurate SOC mapping [10,11,12]. However, in low-relief farmlands, long-term cultivation and flat terrain weaken the relationship between soil properties and natural factors. Some studies have found that natural factors (e.g., soil type, climate, and terrain) only slightly improve the accuracy of SOC prediction models [13,14,15]. The dominant factors of SOC in low-relief farmlands therefore remain unclear. Several studies have highlighted the importance of agricultural activities and landscape patterns in the spatial variation of farmland SOC [16,17]. Time-series vegetation indices, crop type, crop rotation, crop phenological information, cropping systems, and landscape metrics have been used successfully to improve the accuracy of SOC mapping [1,14,18,19,20,21,22]. Hence, integrating natural factors, agricultural activities, and landscape metrics may better explain SOC variation and improve SOC mapping accuracy in low-relief agricultural areas.
Commonly used data mining models, including stepwise linear regression (SLR), support vector machines, artificial neural networks, gradient boosting regression trees, and random forest (RF), are global models: they assume that the relationship between the dependent variable and the covariates is homogeneous across the whole region [23,24,25,26,27,28]. However, many studies have found that the relationship between soil properties and environmental variables is often moderated by third variables [29,30,31,32,33]. Alidoust, Afyuni, Hajabbasi, and Mosaddeghi [32] found that the factors influencing SOC in western central Iran vary considerably among land uses, and that environmental variables explain more of the SOC variation in forests than in farmland and grassland. Zhou [34] determined that the relationship between SOC and environmental variables depends on soil type in the hilly agricultural areas of Chongqing, China. Both studies reveal stratified heterogeneous relationships that global prediction models cannot delineate. Therefore, methods capable of revealing the stratified heterogeneous relationship between SOC and environmental variables are needed.
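To make the contrast between global and stratified models concrete, the following Python sketch uses purely synthetic data (no values from this study) in which one covariate has opposite effects in two hypothetical strata. A single global least-squares fit averages the two relationships away, whereas a separate fit within each stratum recovers both. The stratum labels, sample size, and coefficients are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)


def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Hypothetical synthetic data: the effect of covariate x flips sign
# between stratum 0 and stratum 1.
n = 200
stratum = rng.integers(0, 2, size=n)
x = rng.uniform(0.0, 1.0, size=n)
true_slope = np.where(stratum == 0, 2.0, -2.0)
y = 1.0 + true_slope * x + rng.normal(0.0, 0.1, size=n)

# Global model: one slope for the whole region.
global_coef, global_r2 = ols_fit(x, y)

# Stratified model: a separate regression inside each stratum.
strat = {s: ols_fit(x[stratum == s], y[stratum == s]) for s in (0, 1)}
pred = np.empty(n)
for s, (coef, _) in strat.items():
    mask = stratum == s
    pred[mask] = coef[0] + coef[1] * x[mask]
strat_r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"global slope = {global_coef[1]:.2f}, global R^2 = {global_r2:.2f}")
for s, (coef, _) in strat.items():
    print(f"stratum {s}: slope = {coef[1]:.2f}")
print(f"stratified R^2 = {strat_r2:.2f}")
```

On such data the global slope is close to zero and explains almost nothing, while the stratified fits recover slopes near +2 and -2.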
The Cubist model is an ensemble, rule-based regression tree model based on Quinlan's M5 algorithm [35,36]. It delineates the stratified heterogeneous relationship between the target variable and the covariates through a stratified linear regression strategy: it derives rules that divide the data into homogeneous strata and fits a linear regression within each stratum. Because the strata are learned from the data rather than defined subjectively in advance, this strategy avoids the effect of subjective stratification on fitting accuracy and makes it easy to identify the main controlling factors of SOC in each stratum. The Cubist model is also notable for its high accuracy; several studies have found that it outperforms the RF model in the spatial estimation of soil properties [26,37,38,39,40]. However, previous studies have focused mainly on the prediction accuracy of the Cubist model, overlooking its value in revealing stratified heterogeneous relationships and leaving the determinants of SOC in different subregions unexplored.
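The rule-plus-regression idea can be sketched as follows. This is a simplified, hypothetical illustration, not the Cubist algorithm: it searches a single splitting variable for the threshold whose two per-stratum linear models give the lowest total squared error, and it omits Cubist's committees, pruning, smoothing, and nearest-neighbor adjustment. M5 itself grows its tree using standard deviation reduction and fits the linear models afterwards; scoring candidate splits directly by linear-model error is a swapped-in simplification. All data, thresholds, and variable names are synthetic.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((y - A @ coef) ** 2))
    return coef, sse


def best_split(z, X, y, min_leaf=20):
    """Try each observed value of z as a threshold and keep the split whose
    two per-stratum linear models give the lowest total SSE."""
    best = None
    for t in np.unique(z)[1:]:
        left = z < t
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue  # skip strata too small to support a regression
        coef_l, sse_l = fit_linear(X[left], y[left])
        coef_r, sse_r = fit_linear(X[~left], y[~left])
        if best is None or sse_l + sse_r < best[0]:
            best = (sse_l + sse_r, t, coef_l, coef_r)
    return best


rng = np.random.default_rng(1)
n = 300
z = rng.uniform(0, 10, n)        # hypothetical stratifying variable
X = rng.uniform(0, 1, (n, 2))    # two hypothetical covariates x0, x1
# True relationship changes at z = 4: x0 controls y below, x1 above.
y = np.where(z < 4, 1 + 3 * X[:, 0], 2 + 3 * X[:, 1]) + rng.normal(0, 0.05, n)

sse_global = fit_linear(X, y)[1]
total, t, coef_l, coef_r = best_split(z, X, y)
print(f"global SSE = {sse_global:.2f}, stratified SSE = {total:.2f}")
print(f"Rule 1: if z < {t:.2f}: y = {coef_l[0]:.2f} + {coef_l[1]:.2f}*x0 + {coef_l[2]:.2f}*x1")
print(f"Rule 2: if z >= {t:.2f}: y = {coef_r[0]:.2f} + {coef_r[1]:.2f}*x0 + {coef_r[2]:.2f}*x1")
```

On these synthetic data the search places the threshold near z = 4 and returns a model dominated by x0 below it and by x1 above it, mirroring how a rule can expose different controlling factors in different strata.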
Using 242 topsoil samples collected from the Jianghan Plain, China, this study aims to explore the stratified heterogeneous relationship between SOC and natural factors, agricultural activities, and landscape metrics; determine the dominant factors of SOC in each stratum; and predict the spatial distribution of SOC in low-relief farmlands using the Cubist model. Ordinary kriging (OK), SLR, and RF were used as reference models.
4. Discussion
4.1. Relationship between SOC and Environmental Variables
The results of the SLR, RF, and Cubist models showed that SOC variation was related to terrain factors, agricultural activities, and landscape patterns, with the latter two playing a more important role in controlling SOC variation than natural factors.
Terrain affects soil moisture, erosion, and deposition by controlling runoff, and affects soil temperature by controlling solar radiation intensity; it therefore influences the spatial distribution of SOC both directly and indirectly [66,67,68,69]. After long-term cultivation in particular, the soil and water conservation capacity of farmland is weakened and soil erosion occurs readily [70,71]. Runoff carries soil downslope from higher ground and deposits it in low-lying areas. As a result, SOC content in low-lying areas is often higher than that at the top of the slope [72,73,74]. Moreover, terrain affects the spatial distribution of SOC by influencing farmers’ land use decisions. In the study area, farmers often use low-lying farmland as paddy fields to facilitate water storage, whereas irrigated land is often located near rural residential areas, where the terrain is relatively high. Given that the mean ln (SOC) of paddy fields is significantly higher than that of irrigated land, this terrain-based land use strategy intensifies the difference in SOC content between high- and low-lying areas. Therefore, slope and elevation were significantly related to ln (SOC), even in this low-relief agricultural area.
Agricultural activities affect the spatial distribution of SOC by controlling soil carbon inputs and the decomposition rate of SOC [13,75]. An appropriate cropping and management system promotes carbon fixation in farmland, whereas an inappropriate one leads to SOC loss [76,77,78,79]. Compared with irrigated land, paddy fields have a higher input of stubble, a higher proportion of large aggregates, and weaker soil respiration owing to the flooded environment [80]. Paddy fields are therefore more conducive to carbon sequestration. In this study, ln (SOC) in paddy fields was significantly higher than in irrigated land, which is consistent with most previous studies [15,81,82,83,84]. MCI reflects tillage intensity and is therefore an important indicator of SOC variation. Under conventional tillage, higher tillage intensity accelerates the breakdown of large soil aggregates, directly exposing SOC to the air [70,71]. This increases the SOC mineralization rate and hinders SOC sequestration in farmland soil. The results of this study confirmed that winter fallow was more conducive to SOC accumulation than rotation with winter wheat.
Straw return has a positive effect on SOC sequestration [85,86,87]. On the one hand, the decomposition of crop residues supplies SOM, N, P, and K to the soil and thus improves soil fertility. On the other hand, microorganisms transform crop residues into humus, which enhances soil cementation and facilitates the formation of soil macro-aggregates. Straw return can also reduce bulk density, increase soil porosity, improve soil physical structure, and enhance soil and water conservation capacity [85,86,88,89]. As a result, the amount of straw returned is often positively correlated with SOC content. In this study, NDI also showed a significant positive relationship with ln (SOC), and its importance was second only to that of land use and MCI, highlighting the significance of straw return for farmland carbon sequestration.
In this study, landscape metrics proved to be effective indicators of SOC. IJI was negatively correlated with ln (SOC), suggesting that farmland fragmentation is not conducive to carbon sequestration. WB and IC were significantly positively related to ln (SOC). This is probably because high proportions of water bodies and irrigation canals ensure that local farmland is well irrigated, which promotes vegetation growth and thus increases carbon input to the soil [90,91,92].
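IJI is not defined in this section. Assuming the landscape-level definition used by FRAGSTATS, IJI is the Shannon entropy of the proportions of edge shared by each pair of patch types, divided by its maximum, ln(m(m-1)/2), and expressed as a percentage: 100 means every pair of patch types is equally adjacent, and 0 means only one pair shares edges. The function below is a minimal sketch of that formula; the patch type names and edge lengths are hypothetical, and the class-level variant (normalized by ln(m-1)) would apply instead if IJI was computed for the farmland class alone.

```python
import math


def landscape_iji(edge_lengths, m):
    """Landscape-level interspersion and juxtaposition index (%), assuming
    the FRAGSTATS definition.

    edge_lengths: mapping from a pair of *different* patch types to the total
        length of edge they share; each unordered pair must appear once.
    m: number of patch types present in the landscape (must be >= 3).
    """
    if m < 3:
        raise ValueError("IJI is undefined for fewer than three patch types")
    total = sum(edge_lengths.values())  # E: total edge between different types
    if total <= 0:
        raise ValueError("no edges between different patch types")
    entropy = 0.0
    for length in edge_lengths.values():
        if length > 0:  # pairs with no shared edge contribute 0 (0 * ln 0 = 0)
            p = length / total
            entropy -= p * math.log(p)
    # Normalize by the maximum possible entropy over m(m-1)/2 type pairs.
    return 100.0 * entropy / math.log(0.5 * m * (m - 1))


# Hypothetical edge lengths (m) between three patch types.
even = {("paddy", "irrigated"): 120.0, ("paddy", "water"): 120.0,
        ("irrigated", "water"): 120.0}
one_pair = {("paddy", "irrigated"): 300.0}
print(landscape_iji(even, 3))      # every pair equally adjacent: maximum IJI
print(landscape_iji(one_pair, 3))  # only one pair shares edges: minimum IJI
```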
These results show that terrain factors still affect the spatial distribution of SOC, even in low-relief areas, whereas human activities, including agricultural activities and landscape patterns, were the global dominant factors of SOC variation.
4.2. Stratified Heterogeneous Relationship between SOC and Environmental Variables
The fitting R² of the Cubist model was higher than that of the global regression models (i.e., SLR and RF), highlighting the benefit of accounting for stratified heterogeneous relationships between SOC and environmental variables. The stratification rules showed that the relationship between SOC and environmental variables varied among cropping systems.
The dominant factors of SOC differed between paddy fields and irrigated land. SOC in irrigated land was mainly affected by Dis_Lake, WB, and IC, all of which are associated with water and irrigation. SOC in paddy fields was affected by a wider range of variables, including elevation, slope, NDI, IJI, WB, and IC. Comparing the absolute values of the coefficients of these variables in the two regressions showed that SOC in irrigated land was more strongly affected by irrigation-related factors. This may be because soil moisture in irrigated land is low, so additional moisture has a more pronounced effect on lowering soil temperature and promoting crop growth [92,93,94]. In paddy fields, by contrast, SOC is insensitive to subtle changes in soil moisture owing to the long-term flooded environment [80,95,96]. Therefore, irrigation-related factors have a greater impact on the SOC of irrigated land. These findings indicate that special attention should be paid to irrigation management on irrigated land.
The relationship between SOC and NDI varied with MCI. Specifically, the coefficient of NDI was much larger when MCI was 1 than when MCI was 2, indicating that straw return played a less important role in SOC accumulation under rotation with winter wheat. This may be because stubble amounts differ considerably among summer crops, producing large differences in returned stubble between fields [97,98]. Under rotation with winter wheat, by contrast, the amount of wheat straw returned differed little between fields. Consequently, straw return had a greater influence on the spatial variation of SOC when MCI = 1. These findings highlight the importance of straw return for carbon sequestration, especially when only summer crops are planted.
4.3. Comparison of Model Performance
The model evaluation results showed that the OK model performed poorly in SOC estimation. This may be because the spatial distribution of SOC is non-stationary under the influence of various natural factors and human activities, which violates the intrinsic stationarity assumption of OK [55,99,100]. Moreover, the average distance between sampling points (687 m) is larger than the range of SOC (180 m), and SOC showed only moderate spatial dependence (nugget-to-sill ratio of 33.46%) [101]. In summary, the spatial non-stationarity and limited spatial dependence of SOC explain the poor prediction accuracy of OK.
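The variogram figures quoted above can be related with a short calculation. The sketch below assumes a spherical semivariogram model (the fitted model type is not stated here) and rescales the sill to 1, so that only the reported range (180 m) and nugget-to-sill ratio (33.46%) are used. Beyond the range, the modeled semivariance equals the sill, so sample pairs about 687 m apart are effectively uncorrelated. The classification thresholds (below 25% strong, 25% to 75% moderate, above 75% weak) are a widely used convention for nugget-to-sill ratios, which we assume matches the one applied here.

```python
import numpy as np


def spherical(h, nugget, sill, a):
    """Spherical semivariogram. `sill` is the total sill (nugget + partial
    sill) and `a` is the range; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)  # lags beyond the range are capped at the sill
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0, 0.0, gamma)


def spatial_dependence(nugget, sill):
    """Nugget-to-sill ratio (%) and its conventional strength class."""
    ratio = 100.0 * nugget / sill
    if ratio < 25:
        label = "strong"
    elif ratio <= 75:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label


# Sill rescaled to 1 (an assumption); nugget chosen to give the reported 33.46%.
nugget, sill, range_m = 0.3346, 1.0, 180.0
ratio, label = spatial_dependence(nugget, sill)
print(f"nugget-to-sill ratio = {ratio:.2f}% ({label} spatial dependence)")
for lag in (90.0, 180.0, 687.0):
    g = float(spherical(lag, nugget, sill, range_m))
    print(f"gamma({lag:.0f} m) = {g:.3f} (fraction of sill: {g / sill:.2f})")
```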
The SLR, RF, and Cubist models outperformed the OK model. Given that the validity of regression models depends heavily on the choice of environmental variables, these results demonstrate the effectiveness of agricultural activities and landscape metrics in SOC mapping. The Cubist model outperformed the SLR and RF models, underscoring the accuracy gained by considering stratified heterogeneous relationships. Several studies have found that the prediction accuracy of regression models can be improved by adding regression residuals interpolated by OK (i.e., regression kriging) [102,103,104,105]. However, we did not adopt this strategy because the regression residuals showed low spatial dependence.
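How the spatial dependence of the residuals was assessed is not detailed here. One common check, sketched below under that assumption, is the classical (Matheron) empirical semivariogram of the residuals: if it is roughly flat at every lag (a pure nugget), kriging the residuals adds little information. The coordinates, lag bins, and residuals are synthetic; a spatially structured series is included only for contrast.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 N(h)) over the
    N(h) point pairs whose separation falls in each lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    idx = np.digitize(dist, bin_edges) - 1     # bin index; out-of-range pairs dropped
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = idx == b
        counts[b] = in_bin.sum()
        if counts[b]:
            gamma[b] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts


rng = np.random.default_rng(42)
coords = rng.uniform(0, 5000, size=(300, 2))   # hypothetical sample locations (m)
edges = np.arange(0, 3001, 500)                # six 500 m lag bins
iid = rng.normal(0, 1, 300)                    # residuals with no spatial structure
structured = np.sin(coords[:, 0] / 1000) + rng.normal(0, 0.1, 300)

g_iid, n_iid = empirical_semivariogram(coords, iid, edges)
g_str, _ = empirical_semivariogram(coords, structured, edges)
print("uncorrelated residuals:", np.round(g_iid, 2))
print("structured residuals:  ", np.round(g_str, 2))
```

The uncorrelated residuals give a semivariogram that stays near their variance at every lag, whereas the structured series rises with distance; only the latter would benefit from residual kriging.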
4.4. Limitations and Future Work
In this study, natural factors, agricultural activities, and landscape metrics explained 37.0% of the variation in ln (SOC), with agricultural activities playing the more important role. Other agricultural activities, such as tillage methods [70,71] and fertilization [106,107], may further enhance the explanatory power of the Cubist model. However, the spatial distribution of tillage methods and fertilization is difficult to obtain with current optical remote sensing technology. Their impact on the spatial variation of farmland SOC and their use in SOC mapping require further exploration.
This study provides a framework for exploring the factors that influence farmland soil properties in plains and for determining the dominant factors of farmland SOC at a regional scale. However, these factors may change as the study area expands. For example, natural factors such as climate and soil type may play more important roles at larger scales, so the dominant factors of farmland SOC at large scales warrant further investigation.
5. Conclusions
This research explored the global and stratified dominant factors of farmland SOC in plains and estimated the spatial distribution of SOC using the SLR, RF, and Cubist models. Land use type, MCI, NDI, and WB were the global dominant factors of SOC, indicating that paddy fields, low cropping intensity, straw return, and sufficient irrigation facilities are conducive to farmland SOC accumulation. The dominant factors of SOC varied among cropping systems: compared with the SOC of paddy fields, the SOC of irrigated land was more affected by irrigation-related factors, and the effect of straw return on SOC varied with cropping intensity. These findings reveal the stratified heterogeneous relationship between SOC and covariates and highlight the importance of zoned farmland management. The Cubist model outperformed the other models, demonstrating its effectiveness in explaining SOC variation and mapping SOC in low-relief farmlands.