Application of Hybrid Sampling Method in the Prediction of Telecom Customer Churn

S Ye, G Xia, X Zhang, X Li - Proceedings of the 2021 4th International Conference on Algorithms …, 2021 - dl.acm.org
In the telecommunications industry, data imbalance is a widespread problem. It seriously degrades prediction results, preventing telecommunications operators from accurately identifying customers who are likely to churn and causing substantial losses. To address the economic loss caused by imbalanced telecommunications customer data degrading model prediction performance, this paper proposes two hybrid algorithms, DB-QCS (DBSCAN Quadrilateral centroid SMOTE) and KM-QCS (K-Means Quadrilateral centroid SMOTE). The hybrid algorithms mainly address two weaknesses of SMOTE when it synthesizes new samples: it further marginalizes the sample distribution, and it introduces noise. The main idea is to first use under-sampling to delete outliers or edge samples from the majority class, which reduces the number of new samples that must be synthesized and thereby limits the noise introduced. The marginalization of the sample distribution is then addressed by restricting the region in which new samples are synthesized during oversampling. Finally, the resampled data set is used for classification training. Extensive experiments on five imbalanced telecom customer data sets show that the hybrid algorithms achieve higher F-measure, G-mean, and AUC values than the SMOTE algorithm.
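The two-stage idea in the abstract (under-sample the majority class, then oversample the minority class inside a restricted region) can be illustrated with a minimal sketch. It is not the paper's DB-QCS or KM-QCS: the abstract does not give the exact clustering settings or the quadrilateral-centroid construction, so the code below makes two labeled assumptions. A plain neighbor-count density filter stands in for DBSCAN's noise detection, and each synthetic sample is taken as the centroid of four randomly chosen minority points, which keeps it inside the region those points span. All function names and parameters (`drop_sparse_majority`, `synthesize_centroids`, `hybrid_sample`, `eps`, `min_neighbors`) are hypothetical.

```python
import math
import random


def drop_sparse_majority(majority, eps, min_neighbors):
    """Keep majority points that have at least `min_neighbors` other
    majority points within distance `eps`.
    Assumption: a simple density filter standing in for DBSCAN noise removal."""
    kept = []
    for i, p in enumerate(majority):
        neighbors = sum(
            1
            for j, q in enumerate(majority)
            if i != j and math.dist(p, q) <= eps
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept


def synthesize_centroids(minority, n_new, rng):
    """Create each synthetic point as the centroid of 4 distinct minority points.
    Assumption: illustrative only, not the paper's exact quadrilateral rule."""
    new_points = []
    for _ in range(n_new):
        group = rng.sample(minority, 4)  # requires at least 4 minority points
        dims = len(group[0])
        new_points.append(
            tuple(sum(p[d] for p in group) / 4 for d in range(dims))
        )
    return new_points


def hybrid_sample(majority, minority, eps=1.0, min_neighbors=2, seed=0):
    """Under-sample the majority class, then oversample the minority class
    until both classes have the same number of samples."""
    rng = random.Random(seed)
    kept = drop_sparse_majority(majority, eps, min_neighbors)
    n_new = max(0, len(kept) - len(minority))
    synthetic = synthesize_centroids(minority, n_new, rng)
    return kept, minority + synthetic


if __name__ == "__main__":
    # Toy data: a dense majority cluster plus one isolated majority point.
    majority = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
                (0.2, 0.3), (0.4, 0.1), (9.0, 9.0)]
    minority = [(3.0, 3.0), (3.5, 3.0), (3.0, 3.5), (3.5, 3.5)]
    kept, balanced_minority = hybrid_sample(majority, minority)
    print(len(kept), len(balanced_minority))
```

Removing isolated majority points first lowers the number of synthetic samples needed to balance the classes, which is the mechanism the abstract credits for limiting introduced noise. Averaging existing minority points keeps every synthetic sample inside the region they span, which is one simple way to restrict the synthesis area.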
ACM Digital Library