Undersampling of approaching the classification boundary for imbalance problem

L Jiang, P Yuan, J Liao, Q Zhang, J Liu, K Li
Concurrency and Computation: Practice and Experience, 2023 - Wiley Online Library
Summary
Training a classifier on imbalanced data degrades its accuracy: if the classifier is built directly on the imbalanced data, its results will show large deviations. A common remedy is to restructure the raw dataset by undersampling. Undersampling methods usually trim the majority class with random or clustering-based selection, because some majority-class samples contribute nothing to the classification model. In this paper, a revised undersampling approach is proposed. First, we compress the space along the direction perpendicular to the separating hyperplane. Then, a hybrid ensemble learning method based on weighted random sampling is applied so that the sampled objects spread more widely near the separating hyperplane. Experiments with 7 undersampling methods on 21 imbalanced datasets show that our method achieves good results.
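The summary describes the method only at a high level. The following minimal sketch illustrates the general idea of boundary-focused weighted random undersampling feeding an ensemble of balanced subsets, under stated assumptions: a linear separating hyperplane w·x + b = 0 is already known (for example, from a linear SVM); majority-class samples are drawn without replacement with probability proportional to exp(−d/τ), where d is a sample's distance to the hyperplane; and each ensemble member receives all minority samples plus an equal number of sampled majority samples. The function names, the exponential weighting and the parameter τ (`tau`) are hypothetical choices for illustration. This is not the authors' algorithm, and the space-compression step is not reproduced because the summary does not specify it.

```python
import numpy as np


def distance_to_hyperplane(X, w, b):
    """Unsigned Euclidean distance of each row of X to the hyperplane w.x + b = 0."""
    w = np.asarray(w, dtype=float)
    return np.abs(X @ w + b) / np.linalg.norm(w)


def boundary_weighted_undersample(X_maj, w, b, n_keep, tau=1.0, rng=None):
    """Draw n_keep distinct majority-class indices, favoring points near the hyperplane.

    Hypothetical weighting: weight = exp(-(d - d_min) / tau). Subtracting d_min
    keeps the largest weight at 1 so the exponentials do not all underflow.
    Smaller tau concentrates the sample more tightly around the boundary.
    """
    rng = np.random.default_rng(rng)  # accepts a seed, None, or an existing Generator
    d = distance_to_hyperplane(X_maj, w, b)
    weights = np.exp(-(d - d.min()) / tau)
    p = weights / weights.sum()
    return rng.choice(len(X_maj), size=n_keep, replace=False, p=p)


def balanced_subsets(X, y, w, b, n_subsets=5, tau=1.0, seed=0):
    """Build n_subsets balanced (X, y) training sets for an ensemble (binary labels assumed).

    Every subset keeps all minority samples and an equal number of
    boundary-weighted majority samples; each subset uses a different random draw.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)
    subsets = []
    for _ in range(n_subsets):
        picked = boundary_weighted_undersample(X[maj_idx], w, b, len(min_idx), tau, rng)
        idx = np.concatenate([min_idx, maj_idx[picked]])
        subsets.append((X[idx], y[idx]))
    return subsets


# Usage on synthetic data: 200 majority points, 20 minority points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=-1.0, size=(200, 2)), rng.normal(loc=1.0, size=(20, 2))])
y = np.array([0] * 200 + [1] * 20)
w, b = np.array([1.0, 1.0]), 0.0  # hypothetical hyperplane x1 + x2 = 0
subsets = balanced_subsets(X, y, w, b, n_subsets=3, tau=0.5)
for Xs, ys in subsets:
    print(Xs.shape, np.bincount(ys))  # (40, 2) [20 20]
```

Each subset would then train one base classifier, and the ensemble's predictions could be combined by voting. As `tau` grows, the weights approach uniform and the scheme reduces to plain random undersampling, which makes `tau` a single knob between boundary-focused and random selection.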