A Survey on Approaches, Applications, and Problems of Boosting
Abstract
Cluster analysis, or clustering, is the task of grouping a
set of objects so that objects in the same group are more
similar to each other than to objects in other groups.
Boosting is an iterative process that increases the
accuracy of supervised learning algorithms, i.e.,
classifiers. Combining clustering with boosting improves
the quality of the mining process. In boosting, instances
misclassified by the initial classifier are used to train
subsequent classifiers, and the resulting set of classifiers
is used to classify further instances. The use of boosting
in many applications has proved its effectiveness. Despite
its success, however, boosting has certain problems: it
cannot handle noisy data or data with troublesome areas.
This limitation is addressed by cluster-based boosting
(CBB), in which the data is clustered before boosting and
boosting is then performed depending on the cluster. CBB
works well on benchmark data, but real-world data contains
many irrelevant features. In CBB, all features of the data
are used for clustering, so the inclusion of irrelevant
features may yield inaccurate clusters, which in turn can
negatively affect boosting performance. To overcome this
issue, feature selection will be applied to the training
data before clustering.
Keywords: Data Classification, Boosting, Clustering,
Ensemble of Classifiers
1. INTRODUCTION
In data mining, classification is the process of labeling an
unlabeled instance (testing data) using knowledge extracted
by learning on labeled instances (training data). Classifiers
can be categorized by their learning process or by the
representation of the extracted knowledge. Support vector
machines (SVMs), decision trees such as ID3 and C4.5,
k-nearest neighbor classifiers, and probability-based
classifiers such as Naive Bayes are some popular examples of
classifiers proposed in the literature.
Once the learning process is complete, the learned
model/classifier/function is used for
classifying/predicting/labeling the test/unlabeled data.
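As a toy illustration of this train-then-label workflow (not taken from any of the surveyed papers; the data and function names are hypothetical), a minimal 1-nearest-neighbor classifier in Python:

```python
# Toy sketch: label an unseen instance using labeled training data,
# here via 1-nearest-neighbor (hypothetical example data).
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_1nn(train, query):
    """Return the label of the training instance closest to `query`."""
    nearest = min(train, key=lambda item: euclidean(item[0], query))
    return nearest[1]

# Labeled training instances: (features, label)
training_data = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
print(predict_1nn(training_data, (0.9, 1.1)))  # -> "A"
```
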
2. LITERATURE SURVEY
[1] Boosting helps to improve the classification process and
is able to provide better results. As described in this paper
[1], when ensembles are used for better clustering quality,
there is scope for further improvement. To achieve
significant classification accuracy, the partitioning process
must be improved. To improve the quality of partitioning,
this paper proposed a robust multi-clustering solution based
on the general principles of boosting, by boosting a simple
clustering algorithm. The approach iterates over the training
examples, producing multiple clusterings, and obtains a
common partitioning by aggregating the multiple clustering
results across the iterations of the basic clustering
algorithm. The aggregation is performed by weighted voting,
where each partition has a weight indicating its quality.
Experimental results showed that the method is promising and
provides robustness and improved performance.
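The weighted-voting aggregation described above can be sketched roughly as follows. This is a simplified toy version: the partitions, their quality weights, and the best-overlap label alignment are illustrative assumptions, not the authors' exact procedure.

```python
# Toy sketch: combine several clusterings into one consensus partition
# by weighted voting (hypothetical partitions and quality weights).
from collections import Counter, defaultdict

def align(reference, partition):
    """Relabel `partition` so each cluster id maps to the reference id
    it overlaps most (needed because cluster ids are arbitrary)."""
    overlap = defaultdict(Counter)
    for r, p in zip(reference, partition):
        overlap[p][r] += 1
    mapping = {p: counts.most_common(1)[0][0] for p, counts in overlap.items()}
    return [mapping[p] for p in partition]

def weighted_consensus(partitions, weights):
    """Per-point weighted vote over label-aligned partitions."""
    reference = partitions[0]
    aligned = [align(reference, p) for p in partitions]
    consensus = []
    for i in range(len(reference)):
        votes = Counter()
        for labels, w in zip(aligned, weights):
            votes[labels[i]] += w
        consensus.append(votes.most_common(1)[0][0])
    return consensus

partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
weights = [0.5, 0.3, 0.2]  # hypothetical quality measures per partition
print(weighted_consensus(partitions, weights))  # -> [0, 0, 1, 1]
```

Note that the second partition agrees with the first once its labels are aligned; the third partition is outvoted on the second point because its weight is low.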
[2] The boosting methodology uses inaccurately classified
instances to learn subsequent functions. It is known that
boosting usually does not overfit the training data, even
with classifiers of large size [1]. Schapire et al. explained
this using the margins the classifier achieves on training
examples; the margin represents the confidence of the
prediction of the aggregated classifier. This paper studied
Breiman's arc-gv algorithm for maximizing margins; it also
explains why boosting is resistant to overfitting and how it
refines the decision boundary for accurate predictions.
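The normalized margin of a weighted-vote ensemble on an example (x, y) with y in {-1, +1} can be sketched as follows. This is a toy illustration with hypothetical stumps and weights, not Reyzin and Schapire's experimental setup.

```python
# Toy sketch: margin = y * sum(a_t * h_t(x)) / sum(a_t), a number in [-1, 1]
# that is positive when the weighted vote is correct and larger when the
# ensemble is more confident (hypothetical stumps and weights).
def margin(x, y, hypotheses, alphas):
    vote = sum(a * h(x) for h, a in zip(hypotheses, alphas))
    return y * vote / sum(alphas)

# Hypothetical threshold stumps on a scalar feature; each returns -1 or +1.
h1 = lambda x: 1 if x > 0 else -1
h2 = lambda x: 1 if x > 2 else -1
print(margin(3.0, 1, [h1, h2], [0.75, 0.25]))  # both stumps agree -> 1.0
print(margin(1.0, 1, [h1, h2], [0.75, 0.25]))  # stumps disagree -> 0.5
```
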
[3] AdaBoost rarely suffers from overfitting when the data is
not very noisy. The adaptive boosting algorithm known as
AdaBoost has been highly successful and is regarded as one of
the most important developments in classification methodology
described in the literature. In the case of highly noisy
data, however, AdaBoost suffers from overfitting. Focusing on
this, and to improve the robustness of AdaBoost, the paper
proposed two regularization schemes from the viewpoint of
mathematical programming. The two algorithms, AdaBoostKL and
AdaBoostNorm2, are based on different penalty functions and
can be considered extensions of AdaBoostReg; by pursuing a
soft margin they achieve better performance than AdaBoostReg.
The paper showed that among the regularized AdaBoost
algorithms, AdaBoostKL performs best.
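For reference, a minimal sketch of the plain AdaBoost procedure these variants extend, with one-dimensional threshold stumps. This is a textbook-style toy version on hypothetical data, not the regularized AdaBoostKL/AdaBoostNorm2 algorithms.

```python
# Toy sketch of plain AdaBoost on 1-D data with threshold stumps
# (labels in {-1, +1}; data is hypothetical).
import math

def adaboost(points, labels, rounds=3):
    n = len(points)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, direction)
    for _ in range(rounds):
        # Pick the stump (threshold t, direction d) with lowest weighted error.
        best = None
        for t in sorted(set(points)):
            for d in (1, -1):
                err = sum(wi for xi, yi, wi in zip(points, labels, w)
                          if (d if xi > t else -d) != yi)
                if best is None or err < best[0]:
                    best = (err, t, d)
        err, t, d = best
        err = max(err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, d))
        # Re-weight: misclassified instances gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * (d if xi > t else -d))
             for xi, yi, wi in zip(points, labels, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    vote = sum(a * (d if x > t else -d) for a, t, d in ensemble)
    return 1 if vote >= 0 else -1

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys)
print([predict(model, x) for x in xs])  # recovers the training labels
```

The re-weighting step is the core of the method: it is exactly the mechanism that makes AdaBoost concentrate on hard instances, and also what makes it vulnerable to noise, which the regularized variants address.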
[4] Boosting generally faces an overfitting problem on some
datasets while working well on others. The authors of this
paper observed that this problem arises due to the presence
of overlapping classes. To overcome this problem, confusing
samples are identified using a Bayesian classifier and
removed during the boosting phase.
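The idea of filtering out confusing samples before boosting can be sketched as follows. Here a toy one-dimensional Gaussian naive Bayes stands in for the paper's Bayesian classifier, and the data (including one point labeled against its region) is hypothetical.

```python
# Toy sketch: flag training samples a simple Bayesian model disagrees with,
# and drop them before boosting (hypothetical 1-D data).
import math
from collections import defaultdict

def fit_gnb(xs, ys):
    """Per-class mean, variance, and prior for 1-D Gaussian naive Bayes."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    stats = {}
    for y, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
        stats[y] = (mean, var, len(vals) / len(xs))
    return stats

def predict_gnb(stats, x):
    def log_post(y):
        mean, var, prior = stats[y]
        return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))
    return max(stats, key=log_post)

xs = [0.0, 0.2, 0.4, 2.0, 2.2, 2.4, 0.1]  # last point sits in the -1 region
ys = [-1, -1, -1, 1, 1, 1, 1]             # ...but is labeled +1 (confusing)
stats = fit_gnb(xs, ys)
# Keep only the samples the Bayesian model agrees with.
kept = [(x, y) for x, y in zip(xs, ys) if predict_gnb(stats, x) == y]
print(len(kept))  # -> 6: the confusing sample was removed
```
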
References
[1] D. Frossyniotis, A. Likas, and A. Stafylopatis, "A
    clustering method based on boosting," Pattern Recog.
    Lett., vol. 25, pp. 641-654, 2004.
[2] L. Reyzin and R. Schapire, "How boosting the margin
    can also boost classifier complexity," in Proc. Int.
    Conf. Mach. Learn., 2006, pp. 753-760.
[3] Y. Sun, J. Li, and W. Hager, "Two new regularized
    AdaBoost algorithms," in Proc. Int. Conf. Mach. Learn.
    Appl., 2004, pp. 41-48.
[4] A. Vezhnevets and O. Barinova, "Avoiding boosting
    overfitting by removing confusing samples," in Proc.
    Eur. Conf. Mach. Learn., 2007, pp. 430-441.
[5] A. Ganatra and Y. Kosta, "Comprehensive evolution
    and evaluation of boosting," Int. J. Comput. Theory
    Eng., vol. 2, pp. 931-936, 2010.
[6] L. D. Miller and L.-K. Soh, "Cluster-based boosting,"
    IEEE Trans. Knowl. Data Eng.
[7] W. Gao and Z.-H. Zhou, "On the doubt about margin
    explanation of boosting," Artif. Intell., vol. 203, pp.
    1-18, Oct. 2013.
AUTHORS
Rutuja Shirbhate received the B.E. degree in Information
Technology from Sant Gadge Baba Amravati University in 2012.
She is currently pursuing the M.E. degree from Savitribai
Phule Pune University.
Dr. S. D. Babar is a professor at Sinhgad Institute of
Technology, Lonavala, Pune.