

IEEE TRANSACTION ON ELECTRIC ELECTRONICS COMPUTER SCIENCE, BIOMEDICAL ENGINEERINGS YEAR :2018

Classification of a Bank Data Set on Various Data Mining Platforms
Bir Banka Müşteri Verilerinin Farklı Veri Madenciliği Platformlarında Sınıflandırılması
Muhammet Sinan Başarslan
Computer Programming, School of Advanced Vocational Studies
Doğuş University
Istanbul, Turkey
[email protected]

İrem Düzdar Argun
Industry Engineering Department, Engineering Faculty
Düzce University
Düzce, Turkey
[email protected]

Abstract— The process of extracting meaningful rules from big and complex data is called data mining. Data mining is increasingly popular in every field today. Data units are established in customer-oriented industries such as marketing, finance and telecommunication to work on customer churn and acquisition in particular. Among the data mining methods, classification algorithms are used in customer acquisition studies to predict the potential customers of a company in the related industry. In this study, models were created on the bank marketing data set from the UCI Machine Learning Repository using the same classification algorithms in different data mining programs. Accuracy, precision and F-measure were used to test the performance of the classification models. When creating the classification models, the data set was randomly divided into training and test sets by the holdout method to evaluate performance on the data set. The data set was divided into training and test sets at ratios of 60-40%, 75-25% and 80-20%. The data mining programs used for these processes are R, Knime, RapidMiner and WEKA. The classification algorithms available on all of these platforms and used here are k-nearest neighbor (k-nn), Naive Bayes, and the C4.5 decision tree.

Keywords—data mining; banking; customer acquisition; data mining programs.

I. INTRODUCTION

Today, data mining is used to solve problems in many fields such as health, finance and education. Data mining studies are carried out in the field of health for disease diagnosis, and in customer-oriented industries such as telecommunication, insurance and banking to work on customer churn and customer acquisition. In this research, a forecasting study was carried out to see whether a bank's campaign results in new customer acquisition. Another purpose of this study is to compare the results of the same classification algorithms in different data mining programs. The results obtained are shown in tables in the results section.

Many classification algorithms, continuously developed for various applications, appear in the literature on the bank marketing data set. Bach et al. [2] identified customers who responded positively to campaigns by performing customer segmentation with a variety of methods such as artificial neural networks. Sumathi and Sivanandam applied data mining methods to the data of financial and other institutions to discover interrelationships between data [3]. Keramati et al. used decision trees, artificial neural networks, k-nearest neighbors and support vector machines, among the machine learning algorithms, to predict which existing customers would prefer competing banks, using data from a telecommunication company located in Iran. They identified the algorithm that gives the best result by comparing the algorithms used in the study [4].

II. OVERVIEW

This section addresses the data mining methods, classification algorithms and data mining programs used throughout the study.

A. Data Mining

Data mining is the process of extracting meaningful and structured information from complex data sets. During this procedure, data mining methods such as classification, clustering and association rules are used. Data mining methods are used to analyze, categorize and summarize data and to determine the relationships among its different dimensions [5]. These methods are divided into two groups: predictive and descriptive [7].

B. Classification Algorithms Used in the Study

In this study, the bank marketing data set from the UCI Machine Learning Repository [1] was used. Models were created on this data set using classification algorithms. The classification algorithms used in the study are k-nearest neighbor (k-nn), Naive Bayes (NB), and the C4.5 decision tree. These algorithms are addressed in this section.
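
The evaluation procedure described in the abstract (random holdout splits at 60-40%, 75-25% and 80-20%, scored with accuracy, precision and F-measure) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the fixed random seeds, the choice of k = 3, the "yes"/"no" labels and the synthetic two-feature data are all hypothetical stand-ins, and the paper's actual experiments were run in R, Knime, RapidMiner and WEKA on the bank marketing data set.

```python
import random
from collections import Counter


def holdout_split(rows, train_fraction, seed=0):
    """Randomly split rows into (train, test) by the given training fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="yes"):
    """Return (accuracy, precision, F-measure) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall is needed only as an ingredient of the F-measure.
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, f_measure


def make_toy_data(n=200, seed=1):
    """Hypothetical stand-in data: two numeric features and a yes/no label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = "yes" if rng.random() < 0.5 else "no"
        centre = 2.0 if label == "yes" else 0.0
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows


def run_experiment(rows, train_fractions=(0.60, 0.75, 0.80)):
    """Evaluate k-nn once per holdout ratio and collect the three metrics."""
    results = {}
    for fraction in train_fractions:
        train, test = holdout_split(rows, fraction)
        y_true = [label for _, label in test]
        y_pred = [knn_predict(train, features) for features, _ in test]
        results[fraction] = evaluate(y_true, y_pred)
    return results


if __name__ == "__main__":
    for fraction, (acc, prec, f1) in run_experiment(make_toy_data()).items():
        print(f"train={fraction:.0%}  accuracy={acc:.3f}  "
              f"precision={prec:.3f}  f-measure={f1:.3f}")
```

Running the script prints one line of metrics per split ratio. Swapping `knn_predict` for another classifier with the same call shape would let the same loop compare algorithms, which mirrors the comparison the study carries out across platforms.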

978-1-5386-5135-3/18/$31.00 ©2018 IEEE
