Computer Science (计算机科学), 2015, Vol. 42, Issue Z11: 63-66.
ZHANG Xiao-shan (张枭山) and LUO Qiang (罗强)
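Abstract: Imbalanced data classification problems are widespread in practice. Most traditional classification algorithms assume that the class distribution of the dataset is balanced, so their results are biased toward the majority class and they perform poorly on such data. To address this, an improved AdaBoost classification algorithm based on cluster-ensemble under-sampling is proposed. The algorithm first performs a cluster ensemble; then, according to the sample weights, it draws a certain proportion of the majority-class samples from each cluster and combines them with all of the minority-class samples to form a balanced dataset. Within the AdaBoost framework, it applies different weight adjustments to misclassified majority-class and minority-class samples, and it selectively ensembles the few base classifiers that classify best. Experimental results show that the algorithm has certain advantages in handling imbalanced data classification.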
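The under-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `undersample_by_cluster`, the `ratio` parameter, and the choice that samples with larger weights are more likely to be kept are all assumptions made for illustration. Cluster assignments are taken as given (standing in for the output of the cluster-ensemble step), and the weighted draw uses the Efraimidis-Spirakis key method as one convenient way to sample without replacement.

```python
import random
from collections import defaultdict


def undersample_by_cluster(labels, clusters, weights, ratio, minority=1, seed=0):
    """Return sorted indices of a rebalanced training subset (illustrative only).

    labels   -- class label per sample
    clusters -- cluster id per sample (assumed output of a prior clustering step)
    weights  -- strictly positive sample weights, e.g. current boosting weights
    ratio    -- fraction of majority samples kept from each cluster (assumption)
    """
    rng = random.Random(seed)

    # Keep every minority-class sample.
    keep = [i for i, y in enumerate(labels) if y == minority]

    # Group majority-class samples by cluster.
    by_cluster = defaultdict(list)
    for i, y in enumerate(labels):
        if y != minority:
            by_cluster[clusters[i]].append(i)

    for members in by_cluster.values():
        k = min(len(members), max(1, round(ratio * len(members))))
        # Weighted sampling without replacement: each sample gets the key
        # u ** (1 / w) with u uniform in [0, 1); the k largest keys are kept,
        # so a larger weight makes a sample more likely to be kept (assumption).
        keyed = sorted(
            members,
            key=lambda i: rng.random() ** (1.0 / weights[i]),
            reverse=True,
        )
        keep.extend(keyed[:k])

    return sorted(keep)


# Toy data: 2 minority samples (label 1) and 8 majority samples in 2 clusters.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
clusters = [0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
weights = [0.1] * 10
subset = undersample_by_cluster(labels, clusters, weights, ratio=0.25)
```

With `ratio=0.25`, one majority sample is drawn from each of the two clusters, so the subset holds two minority and two majority samples. In a boosting loop, the subset would presumably be redrawn each round from the updated weights; the abstract does not give that level of detail.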