Random forest algorithm using quartile-pattern bootstrapping for a class imbalanced problem

W Jitpakdeebodin, K Sinapiromsaran - Proceedings of the 2023 5th …, 2023 - dl.acm.org
W Jitpakdeebodin, K Sinapiromsaran
Proceedings of the 2023 5th International Conference on Image, Video and …, 2023 - dl.acm.org
Classification is the problem of identifying the category (or class) of an observation whose class is unknown, using historical data. One important issue in classification is the class imbalance problem, which typically arises when the proportion of the target class is significantly smaller than that of the others. A traditional classifier normally misclassifies an instance from this target class, called the minority class, as noise because of the small number of instances. Modifying a classification algorithm to handle class imbalance is a challenging task, especially for a random forest. In the random forest algorithm, the bootstrapping step generates several subsets of the training data by sampling uniformly at random with replacement. Many bootstrap subsets may not contain any instance from the minority class, in which case the decision trees built from them are bound to misclassify minority instances. A random forest algorithm that generates a bootstrap subset for each decision tree must therefore ensure an adequate distribution of minority instances. This paper proposes a random forest algorithm using quartile-pattern bootstrapping, which leverages the mass-ratio-variance outlier factor and the minority condensation decision tree to handle this problem. The mass-ratio-variance outlier factor is a score assigned to each instance; it gives a large value to an outlier and a low value to an instance surrounded by other instances of the same class. To evaluate the performance of the proposed algorithm, two synthesized datasets are used in the experiments. The experimental results show significant improvement when a dataset is imbalanced. On the test data, the proposed algorithm achieves a better F1 score than the traditional random forest algorithm.
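The bootstrapping issue described in the abstract can be illustrated with a minimal sketch. It is not the paper's quartile-pattern bootstrapping, whose details are not given here. It only shows how often a uniform bootstrap misses the minority class entirely, and contrasts it with a generic per-class (stratified) resampling scheme. The toy data, the function names, and the choice of 200 rows with 2 minority rows are all assumptions made for illustration.

```python
import random
from collections import Counter


def uniform_bootstrap(n, rng):
    """Draw n indices uniformly with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def prob_no_minority(n, m):
    """Probability that a uniform bootstrap of size n misses all m minority rows."""
    return (1 - m / n) ** n


def stratified_bootstrap(labels, rng):
    """Resample with replacement inside each class, keeping class counts fixed.

    Assumption: a generic stratified scheme, NOT the paper's quartile-pattern method.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choice(idx) for _ in idx)
    return sample


# Hypothetical toy data: 200 rows, 2 of them in the minority class (label 1).
labels = [0] * 198 + [1] * 2
rng = random.Random(0)

# Count how many of n_trees uniform bootstraps contain no minority row at all.
n_trees = 1000
empty = sum(
    all(labels[i] == 0 for i in uniform_bootstrap(len(labels), rng))
    for _ in range(n_trees)
)
print("analytic  P(no minority):", round(prob_no_minority(200, 2), 3))
print("empirical P(no minority):", empty / n_trees)

# The stratified variant always keeps both minority slots filled.
counts = Counter(labels[i] for i in stratified_bootstrap(labels, rng))
print("stratified class counts:", dict(counts))
```

Under these assumptions, a noticeable fraction of the trees in a uniformly bootstrapped forest would never see a minority instance, while the stratified variant keeps the class counts fixed in every subset. How the proposed method selects instances using the outlier factor is specified in the paper itself, not in this sketch.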
ACM Digital Library