This study examined the capability of topographic features and remote sensing data, in combination with auxiliary environmental variables (geology and geomorphology), to predict cation exchange capacity (CEC) using four machine learning models (random forest (RF), k-nearest neighbors (kNN), Cubist (Cu), and support vector machines (SVM)) in the west of Iran. Ninety-seven soil samples were collected from the surface layer (0-20 cm), and CEC, a number of other soil properties, and X-ray analyses were determined in the laboratory. The X-ray analysis showed that the clay type, the dominant factor controlling CEC, varied from illite to smectite. In the training dataset, based on 10-fold cross-validation, RF was the best model for predicting CEC (R² = 0.86; root mean square error, RMSE = 2.76; ratio of performance to deviation, RPD = 2.67), whereas the Cu model performed best in the validation dataset (R² = 0.49; RMSE = 4.51; RPD = 1.43). RF, identified as the best model on the training dataset, was thus used to prepare the CEC map. The results indicate higher CEC in the early Quaternary deposits, coinciding with greater soil development and enrichment in smectite and vermiculite. By contrast, lower CEC was observed in mountainous areas and in coarse-textured soils (silt loam and sandy loam). The variable importance analysis showed that several topographic attributes (valley depth, elevation, slope, and terrain ruggedness index, TRI) and remotely sensed indices (ferric oxides, normalized difference moisture index, NDMI, and salinity index) were the most important variables explaining the variability of CEC in the best model for the study area.
Keywords: clay type; machine learning; mineralogy; remote sensing indices; soil modeling; valley depth.
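For readers unfamiliar with the three accuracy measures reported above, the following minimal sketch shows one common way to compute R², RMSE, and RPD from observed and predicted values. It is an illustration under stated assumptions, not the authors' code: R² is taken here as the coefficient of determination (1 minus the ratio of residual to total sum of squares), RPD as the sample standard deviation of the observations divided by RMSE, and the function name `regression_metrics` and the sample values are hypothetical.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and RPD for paired observed/predicted values.

    Assumed definitions (not confirmed by the source):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(mean squared residual)
      RPD  = sample standard deviation of observed / RMSE
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized sequences with at least 2 values")

    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)

    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    rmse = math.sqrt(ss_res / len(observed))
    r2 = 1.0 - ss_res / ss_tot
    rpd = statistics.stdev(observed) / rmse
    return {"r2": r2, "rmse": rmse, "rpd": rpd}


# Hypothetical values for illustration only, not data from the study.
observed = [10.0, 14.0, 18.0, 22.0, 26.0]
predicted = [12.0, 12.0, 20.0, 20.0, 26.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Under these definitions, RPD is dimensionless and rises as RMSE shrinks relative to the spread of the observations, which is why a model can show a moderate RMSE and still a low RPD when the validation samples vary little.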