Data Mining Models
Second Edition

David L. Olson
Data mining has become the fastest growing topic of interest in business programs in
the past decade. This book is intended to first describe the benefits of data mining in
business, describe the process and typical business applications, describe the workings
of basic data mining models, and demonstrate each with widely available free
software. This second edition updates Chapter 1, and adds more details on Rattle data
mining tools.
The book focuses on demonstrating common business data mining applications. It
provides exposure to the data mining process, to include problem identification, data
management, and available modeling tools. The book takes the approach of
demonstrating typical business data sets with open source software. KNIME is a very
easy-to-use tool, and is used as the primary means of demonstration. R is much more
powerful and is a commercially viable data mining tool. We will demonstrate use of R
through Rattle. We also demonstrate WEKA, which is a highly useful academic
software, although it is difficult to manipulate test sets and new cases, making it
problematic for commercial use. We will demonstrate methods with a small but
typical business dataset. We use a larger (but still small) realistic business dataset for
Chapter 9.

big data, business analytics, clustering, data mining, decision trees, neural network models, regression models
Chapter 1 Data Mining in Business
Chapter 2 Business Data Mining Tools
Chapter 3 Data Mining Processes and Knowledge Discovery
Chapter 4 Overview of Data Mining Techniques
Chapter 5 Data Mining Software
Chapter 6 Regression Algorithms in Data Mining
Chapter 7 Neural Networks in Data Mining
Chapter 8 Decision Tree Algorithms
Chapter 9 Scalability

I wish to recognize some of the many colleagues I have worked and published with,
specifically Yong Shi, Dursun Delen, Desheng Wu, and Ozgur Araz. There are many
others I have learned from in joint efforts as well, both students and colleagues, all of
whom I wish to recognize with hearty thanks.

Data Mining in Business

Data mining refers to the analysis of large quantities of data that are stored in
computers. Bar coding has made checkout very convenient for us and provides retail
establishments with masses of data. Grocery stores and other retail stores are able to
quickly process our purchases and use computers to accurately determine the product
prices. These same computers can help the stores with their inventory management,
by instantaneously determining the quantity of items of each product on hand.
Computers allow the store’s accounting system to more accurately measure costs and
determine the profit that store stockholders are concerned about. All of this
information is available based on the bar coding information attached to each product.
Along with many other sources of information, information gathered through bar
coding can be used for data mining analysis.
The era of big data is here, with many sources pointing out that more data has been
created over the past year or two than was generated throughout all prior human
history. Big data involves datasets so large that traditional data analytic methods no
longer work due to data volume. Davenport1 gave the following features of big data:

Data too big to fit on a single server
Data too unstructured to fit in a row-and-column database
Data flowing too continuously to fit into a static data warehouse
Lack of structure is the most important aspect (even more than the size)
The point is to analyze, converting data into insights, innovation, and business value

Big data has been said to be more about analytics than about the data itself. The era
of big data is expected to emphasize focusing on knowing what (based on correlation)
rather than the traditional obsession for causality. The emphasis will be on
discovering patterns offering novel and useful insights.2 Data will become a raw
material for business, a vital economic input and source of value. Cukier and
Mayer-Schönberger3 cite the following impacts of big data on the statistical body of
theory established in the 20th century: (1) There is so much data available that
sampling is usually not needed (n = all). (2) Precise accuracy of data is, thus, less
important as inevitable errors are compensated for by the mass of data (any one
observation is flooded by others). (3) Correlation is more important than causality—
most data mining applications involving big data are interested in what is going to
happen, and you don’t need to know why. Automatic trading programs need to detect
the trend changes, not figure out that the Greek economy collapsed or the Chinese
government will devalue the Renminbi (RMB). The programs in vehicles need to
detect that an axle bearing is getting hot, that the vehicle is vibrating, and that the wheel
should be replaced, not whether this is due to a bearing failure or a housing rusting.
There are many sources of big data.4 Internal to the corporation, e-mails, blogs,
enterprise systems, and automation lead to structured, unstructured, and
semistructured information within the organization. External data is also widely
available, much of it free over the Internet, but much also available from the
commercial vendors. There also is data obtainable from social media.
Data mining is not limited to business. Both major parties in the U.S. elections
utilize data mining of potential voters.5 Data mining has been heavily used in the
medical field, from diagnosis based on patient records to identification of best practices.6
Business use of data mining is also impressive. Toyota used data mining of its data
warehouse to determine more efficient transportation routes, reducing the time to
deliver cars to their customers by an average 19 days. Data warehouses are very large
scale database systems capable of systematically storing all transactional data
generated by a business organization, such as Walmart. Toyota also was able to
identify the sales trends faster and to identify the best locations for new dealerships.
Data mining is widely used by banking firms in soliciting credit card customers, by
insurance and telecommunication companies in detecting fraud, by manufacturing
firms in quality control, and many other applications. Data mining is being applied to
improve food product safety, criminal detection, and tourism. Micromarketing targets
small groups of highly responsive customers. Consumer and lifestyle data is
widely available, enabling customized individual marketing campaigns. This is
enabled by customer profiling, identifying those subsets of customers most likely to
be profitable to the business, as well as targeting, determining the characteristics of
the most profitable customers.
Data mining involves statistical and artificial intelligence (AI) analysis, usually
applied to large-scale datasets. There are two general types of data mining studies.
Hypothesis testing involves expressing a theory about the relationship between actions
and outcomes. This approach is referred to as supervised. In a simple form, it can be
hypothesized that advertising will yield greater profit. This relationship has long been
studied by retailing firms in the context of their specific operations. Data mining is
applied to identifying relationships based on large quantities of data, which could
include testing the response rates to various types of advertising on the sales and
profitability of specific product lines. However, there is more to data mining than the
technical tools used. The second form of data mining study is knowledge discovery.
Data mining involves a spirit of knowledge discovery (learning new and useful
things). Knowledge discovery is referred to as unsupervised. In this form of analysis,
a preconceived notion may not be present, but rather relationships can be identified by
looking at the data. This may be supported by visualization tools, which display data,
or through fundamental statistical analysis, such as correlation analysis. Much of this
can be accomplished through automatic means, as we will see in decision tree
analysis, for example. But data mining is not limited to automated analysis.
Knowledge discovery by humans can be enhanced by graphical tools and
identification of unexpected patterns through a combination of human and computer

Requirements for Data Mining

Data mining requires identification of a problem, along with the collection of data that
can lead to better understanding, and computer models to provide statistical or other
means of analysis. A variety of analytic computer models have been used in data
mining. In the later sections, we will discuss various types of these models. Also
required is access to data. Quite often, systems including data warehouses and data
marts are used to manage large quantities of data. Other data mining analyses are done
with smaller sets of data, such as can be organized in online analytic processing
Masses of data generated from cash registers, scanning, and topic-specific databases
throughout the company are explored, analyzed, reduced, and reused. Searches are
performed across different models proposed for predicting sales, marketing response,
and profit. The classical statistical approaches are fundamental to data mining.
Automated AI methods are also used. However, a systematic exploration through
classical statistical methods is still the basis of data mining. Some of the tools
developed by the field of statistical analysis are harnessed through automatic control
(with some key human guidance) in dealing with data.
Data mining tools need to be versatile, scalable, capable of accurately predicting the
responses between actions and results, and capable of automatic implementation.
Versatile refers to the ability of the tool to apply a wide variety of models. Scalable
means that if a tool works on a small dataset, it should also work on a larger
dataset. Automation is useful, but its application is relative. Some analytic functions
are often automated, but human setup prior to implementing procedures is required. In
fact, analyst judgment is critical to successful implementation of data mining. Proper
selection of data to include in searches is critical. Data transformation also is often
required. Too many variables produce too much output, while too few can overlook
the key relationships in the data.
Data mining is expanding rapidly, with many benefits to business. Two of the most
profitable application areas have been the use of customer segmentation by marketing
organizations to identify those with marginally greater probabilities of responding to
different forms of marketing media, and banks using data mining to more accurately
predict the likelihood that people will respond to offers of different services.
Many companies are using this technology to identify their blue-chip customers, so
that they can provide them with the service needed to retain them.
The casino business has also adopted data warehousing and data mining.
Historically, casinos have wanted to know everything about their customers. A typical
application for a casino is to issue special cards, which are used whenever the
customer plays at the casino, or eats, or stays, or spends money in other ways. The
points accumulated can be used for complimentary meals and lodging. More points
are awarded for activities that provide Harrah’s more profit. The information obtained
is sent to the firm’s corporate database, where it is retained for several years. Instead
of advertising the loosest slots in town, Bellagio and Mandalay Bay have developed
the strategy of promoting luxury visits. Data mining is used to identify high rollers, so
that these valued customers can be cultivated. Data warehouses enable casinos to
estimate the lifetime value of the players. Incentive travel programs, in-house
promotions, corporate business, and customer follow-up are the tools used to maintain
the most profitable customers. Casino gaming is one of the richest datasets available.
Very specific individual profiles can be developed. Some customers are identified as
those who should be encouraged to play longer. Other customers are identified as
those who should be discouraged from playing.

Business Data Mining

Data mining has been very effective in many business venues. The key is to find
actionable information or information that can be utilized in a concrete way to
improve profitability. Some of the earliest applications were in retailing, especially in
the form of market basket analysis. Table 1.1 shows the general application areas we
will be discussing. Note that they are meant to be representative rather than

Table 1.1 Data mining application areas

Application area           Applications                            Specifics
Retailing                  Affinity positioning                    Position products effectively
                           Cross-selling; develop and maintain     Find more products for customers
                           customer loyalty
Banking                    Customer relationship management (CRM)  Identify customer value
                                                                   Develop programs to maximize the
Credit card management     Lift                                    Identify effective market segments
                           Churn                                   Identify likely customer turnover
Insurance                  Fraud detection                         Identify claims meriting -
Telecommunications         Churn                                   Identify likely customer turnover
Telemarketing              Online information                      Aid telemarketers with easy data access
                           Recommender systems
Human resource management  Churn (Retention)                       Identify potential employee turnover

Data mining offers retailers, in general, and grocery stores, specifically, valuable
predictive information from mountains of data. Affinity positioning is based on the
identification of products that the same customer is likely to want. For instance, if you
are interested in cold medicine, you probably are interested in tissues. Thus, it would
make marketing sense to locate both items within easy reach of the other. Cross-
selling is a related concept. The knowledge of products that go together can be used
by marketing the complementary product. Grocery stores do that through product
shelf positioning. Retail stores relying on advertising can send ads for sales on
shirts and ties to those who have recently purchased suits. These strategies have long
been employed by wise retailers. Recommender systems are effectively used by
Amazon and other online retailers. Data mining provides the ability to identify less
expected product affinities and cross-selling opportunities. These actions develop and
maintain customer loyalty.
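The idea of product affinity can be illustrated with a simple co-occurrence count. The following is a minimal sketch in Python, not a method taken from this book: the baskets are invented for illustration, and the function name pair_support_confidence is hypothetical. For each ordered pair of products it reports how often both appear in the same basket (support) and how often the second appears when the first is present (confidence).

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"cold medicine", "tissues", "orange juice"},
    {"cold medicine", "tissues"},
    {"bread", "milk"},
    {"cold medicine", "orange juice"},
    {"tissues", "milk"},
]


def pair_support_confidence(baskets):
    """Return {(a, b): (support, confidence of b given a)} for each ordered pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # Count every unordered pair once per basket, then store both orders.
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    n = len(baskets)
    return {
        pair: (count / n, count / item_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


stats = pair_support_confidence(baskets)
support, confidence = stats[("cold medicine", "tissues")]
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

With these invented baskets, cold medicine and tissues appear together in two of five baskets, and tissues appear in two of the three baskets that contain cold medicine. Pairs with high confidence are candidates for adjacent shelf placement or cross-selling offers.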
Grocery stores generate mountains of cash register data that require automated tools
for analysis. Software is marketed to service a spectrum of users. In the past, it was
assumed that cash register data was so massive that it couldn’t be quickly analyzed.
However, the current technology enables the grocers to look at customers who have
defected from a store, their purchase history, and characteristics of other potential


Banking

The banking industry was one of the first users of data mining. Banks are turning to
technology to find out what motivates their customers and what will keep their
business (customer relationship management—CRM). CRM involves the application
of technology to monitor customer service, a function that is enhanced through data
mining support. Understanding the value a customer provides the firm makes it
possible to rationally evaluate if extra expenditure is appropriate in order to keep the
customer. There are many opportunities for data mining in banking. Data mining
applications in finance include predicting the prices of equities, which involves a dynamic
environment with surprise information, some of which might be inaccurate and some
of which might be too complex to comprehend and reconcile with intuition.
Data mining provides a way for banks to identify patterns. This is valuable in
assessing loan applications as well as in target marketing. Credit unions use data
mining to track member profitability as well as to monitor the effectiveness of
marketing programs and sales representatives. Data mining is also used in the effort of
member care, seeking to identify what credit union customers want in the way of

Credit Card Management

The credit card industry has proven very profitable. It has attracted many card issuers,
and many customers carry four or five cards. Balance surfing is a common practice,
where the card user pays an old balance with a new card. These are not considered
attractive customers, and one of the uses of data warehousing and data mining is to
identify balance surfers. The profitability of the industry has also attracted those who
wish to push the edge of credit risk, both from the customer and the card issuer
perspective. Bank credit card marketing promotions typically generate about 1,000
responses per 100,000 mailed solicitations, a response rate of about 1 percent. This rate is
improved significantly through data mining analysis.
Data mining tools used by banks include credit scoring. Credit scoring is a
quantified analysis of credit applicants with respect to the prediction of on-time loan
repayment. A key is a consolidated data warehouse, covering all products, including
demand deposits, savings, loans, credit cards, insurance, annuities, retirement
programs, securities underwriting, and every other product banks provide. Credit
scoring provides a number for each applicant by multiplying a set of weights
determined by the data mining analysis by the ratings for that applicant and summing
the products. These credit scores can be used to make accept or reject recommendations,
as well as to establish the size of a credit line. Credit scoring used to be conducted by
bank loan officers, who considered a few tested variables, such as employment,
income, age, assets, debt, and loan history. Data mining makes it possible to include
many more variables, with greater accuracy.
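The weighted scoring idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names, the weights, and the cutoff are invented, and in practice the weights would come from a model fitted to historical repayment data. The function names credit_score and decide are hypothetical.

```python
# Hypothetical weights, invented for illustration; real weights would be
# estimated by a data mining model from past loan outcomes.
weights = {
    "employment_years": 2.0,
    "income_thousands": 0.5,
    "prior_defaults": -15.0,
}


def credit_score(ratings, weights):
    """Sum of weight times rating over every attribute that has a weight."""
    return sum(weights[name] * ratings[name] for name in weights)


def decide(score, cutoff):
    """Turn a score into an accept or reject recommendation."""
    return "accept" if score >= cutoff else "reject"


applicant = {"employment_years": 5, "income_thousands": 60, "prior_defaults": 1}
score = credit_score(applicant, weights)
print(score, decide(score, cutoff=20.0))
```

Here the applicant scores 2.0 × 5 + 0.5 × 60 − 15.0 × 1 = 25, which clears the assumed cutoff of 20. The same score could also be mapped to the size of a credit line instead of a simple accept or reject decision.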
The new wave of technology is broadening the application of database use and
targeted marketing strategies. In the early 1990s, nearly all credit card issuers were
mass-marketing to expand their card-holder bases. However, with so many cards
available, broad-based marketing campaigns have not been as effective as they
initially were. Card issuers are more carefully examining the expected net present
value of each customer. Data warehouses provide the information, giving the issuers
the ability to try to more accurately predict what the customer is interested in, as well
as their potential value to the issuer. Desktop campaign management software is used
by the more advanced credit card issuers, utilizing data mining tools, such as neural
networks, to recognize customer behavior patterns to predict their future relationship
with the bank.
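One common way to express the expected net present value of a customer is to discount each future year's net cash flow and weight it by the chance that the customer is still active. The Python sketch below assumes a constant annual retention rate and a constant discount rate; both simplifications, the sample figures, and the function name customer_npv are our own and are not taken from any particular card issuer.

```python
def customer_npv(cash_flows, discount_rate, retention_rate):
    """Expected net present value of a customer.

    cash_flows: net cash flow expected at the end of year 1, 2, ...
    discount_rate: annual discount rate, e.g. 0.10 for 10 percent
    retention_rate: assumed probability of keeping the customer each year
    """
    total = 0.0
    for year, cash in enumerate(cash_flows, start=1):
        # The customer is certainly active in year 1; later years are less sure.
        survival = retention_rate ** (year - 1)
        total += cash * survival / (1 + discount_rate) ** year
    return total


# Invented example: $100 per year for three years.
npv = customer_npv([100.0, 100.0, 100.0], discount_rate=0.10, retention_rate=0.80)
print(round(npv, 2))
```

Comparing this value with the cost of acquiring or retaining the customer indicates whether a promotion is worth sending.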


Insurance

The insurance industry utilizes data mining for marketing, just as retailing and
banking organizations do. But they also have specialty applications. Farmers
Insurance Group has developed a system for underwriting, which generates millions
of dollars in higher revenues and lower claims. The system allows the firm to better
understand narrow market niches and to predict losses for specific lines of insurance.
One discovery was that it could lower its rates on sports cars, which increased its
market share for this product line significantly.
Unfortunately, our complex society leads to some inappropriate business operations,
including insurance fraud. Specialists in this underground industry often use multiple
personas to bilk insurance companies, especially in the automobile insurance
environment. Fraud detection software uses a similarity search engine, analyzing
information in company claims for similarities. By linking names, telephone numbers,
streets, birthdays, and other information with slight variations, patterns can be
identified, indicating a fraud. The similarity search engine has been found to be able
to identify up to seven times more fraud than the exact-match systems.
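The difference between exact matching and similarity matching can be seen in a small Python sketch. It uses the standard library's difflib.SequenceMatcher as a stand-in for a commercial similarity search engine, which is an assumption on our part; the claims, the 0.85 threshold, and the function names are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Ratio between 0 and 1; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def similar_claims(claims, threshold=0.85):
    """Return index pairs whose name and street both meet the threshold."""
    pairs = []
    for i in range(len(claims)):
        for j in range(i + 1, len(claims)):
            name_sim = similarity(claims[i]["name"], claims[j]["name"])
            street_sim = similarity(claims[i]["street"], claims[j]["street"])
            if name_sim >= threshold and street_sim >= threshold:
                pairs.append((i, j))
    return pairs


# Invented claims: the first two differ only by slight spelling variations.
claims = [
    {"name": "John Smith", "street": "12 Oak Street"},
    {"name": "Jon Smith", "street": "12 Oak Stret"},
    {"name": "Maria Lopez", "street": "400 Pine Avenue"},
]
print(similar_claims(claims))
```

An exact-match comparison would treat the first two claims as unrelated, while the similarity comparison flags them as a pair worth reviewing. Comparing every pair of claims is slow for large files, so a production system would need some form of indexing or blocking.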


Telecommunications and Telemarketing

Deregulation of the telephone industry has led to widespread competition. Telephone
service carriers fight hard for customers. The problem is that once a customer is
obtained, that customer is targeted by competitors, and retention of customers is very difficult.
The phenomenon of a customer switching carriers is referred to as churn, a
fundamental concept in telemarketing as well as in other fields.
A director of product marketing for a communications company considered that
one-third of churn is due to poor call quality and up to one-half is due to poor
equipment. That firm has a wireless telephone performance monitor tracking
telephones with poor performances. This system reduced churn by an estimated 61
percent, amounting to about 3 percent of the firm’s overall subscribers over the course
of a year. When a telephone begins to go bad, the telemarketing personnel are alerted
to contact the customer and suggest bringing in the equipment for service.
Another way to reduce churn is to protect customers from subscription and cloning
fraud. Cloning has been estimated to have cost the wireless industry millions. A
number of fraud prevention systems are marketed. These systems provide verification
that is transparent to the legitimate subscribers. Subscription fraud has been estimated
to have an economic impact of $1.1 billion. Deadbeat accounts and service shutoffs
are used to screen potentially fraudulent applicants.
Churn is a concept that is used by many retail marketing operations. Banks widely
use churn information to drive their promotions. Once data mining identifies
customers by characteristic, direct mailing and telemarketing are used to present the
bank’s promotional program. The mortgage market has seen massive refinancing in a
number of periods. Banks were quick to recognize that they needed to keep their
mortgage customers happy if they wanted to retain their business. This has led to
banks contacting the current customers if those customers hold a mortgage at a rate
significantly above the market rate. While they may cut their own lucrative financial
packages, banks realize that if they don’t offer a better service to borrowers, a
competitor will.

Human Resource Management

Business intelligence is a way to truly understand markets, competitors, and
processes. Software technology such as data warehouses, data marts, online analytical
processing (OLAP), and data mining makes it possible to sift through data in order to
spot trends and patterns that can be used by the firm to improve profitability. In the
human resources field, this analysis can lead to the identification of individuals who
are liable to leave the company unless additional compensation or benefits are
Data mining can be used to expand upon things that are already known. A firm
might know that 20 percent of its employees use 80 percent of services offered, but
may not know which particular individuals are in that 20 percent. Business
intelligence provides a means of identifying segments, so that programs can be
devised to cut costs and increase productivity. Data mining can also be used to
examine the way in which an organization uses its people. The question might be
whether the most talented people are working for those business units with the highest
priority or where they will have the greatest impact on profit.
Companies are seeking to stay in business with fewer people. Sound human
resource management would identify the right people, so that organizations could treat
them well to retain them (reduce churn). This requires tracking key performance
indicators and gathering data on talents, company needs, and competitor requirements.

The era of big data is here, flooding businesses with numbers, text, and often more
complex data forms, such as videos or pictures. Some of this data is generated
internally, through enterprise systems or other software tools to manage a business’s
information. Data mining provides a tool to utilize this data. This chapter reviewed the
basic applications of data mining in business, to include customer profiling, fraud
detection, and churn analysis. These will all be explored in greater depth in Chapter 2.
But, here our intent is to provide an overview of what data mining is useful for in
The process of data mining relies heavily on information technology, in the form of
data storage support (data warehouses, data marts, or OLAP tools) as well as software
to analyze the data (data mining software). However, the process of data mining is far
more than simply applying these data mining software tools to a firm’s data.
Intelligence is required on the part of the analyst in selection of model types, in
selection and transformation of the data relating to the specific problem, and in
interpreting results.

Business Data Mining Tools

Have you ever wondered why your spouse gets all of these strange catalogs for
obscure products in the mail? Have you also wondered at his or her strong interest in
these things, and thought that the spouse was overly responsive to advertising of this
sort? For that matter, have you ever wondered why 90 percent of your telephone calls,
especially during meals, are opportunities to purchase products? (Or for that matter,
why calls assuming you are a certain type of customer occur over and over, even
though you continue to tell them that their database is wrong?)
One of the earliest and most effective business applications of data mining is in
support of customer segmentation. This insidious application utilizes massive
databases (obtained from a variety of sources) to segment the market into categories,
which are studied with data mining tools to predict the response to particular
advertising campaigns. It has proven highly effective. It also represents the
probabilistic nature of data mining, in that it is not perfect. The idea is to send catalogs
to (or call) a group of target customers with a 5 percent probability of purchase rather
than waste these expensive marketing resources on customers with a 0.05 percent
probability of purchase. The same principle has been used in election campaigns by
party organizations—give free rides to the voting booth to those in your party;
minimize giving free rides to voting booths to those likely to vote for your opponents.
Some call this bias. Others call it sound business.
Data mining offers the opportunity to apply technology to improve many aspects of
business. Some standard applications are presented in this chapter. The value of
education is to present you with past applications, so that you can use your
imagination to extend these application ideas to new environments.
Data mining has proven valuable in almost every academic discipline.
Understanding business application of data mining is necessary to expose business
college students to current analytic information technology. Data mining has been
instrumental in customer relationship management,1 credit card management,2
banking,3 insurance,4 telecommunications,5 and many other areas of statistical support
to business. Business data mining is made possible by the generation of masses of
data from computer information systems. Understanding this information generation
system and tools available leading to analysis is fundamental for business students in
the 21st century. There are many highly useful applications in practically every field
of scientific study. Data mining support is required to make sense of the masses of
business data generated by computer technology.
This chapter will describe some of the major applications of data mining. By doing
so, there will also be opportunities to demonstrate some of the different techniques
that have proven useful. Table 2.1 compares the aspects of these applications.

Table 2.1 Common business data mining applications

Application       Function                    Statistical technique   AI tool
Catalog sales     Customer segmentation       Cluster analysis        K-means
                  Mail stream optimization                            Neural network
CRM (telecom)     Customer scoring            Cluster analysis        Neural network
                  Churn analysis
Credit scoring    Loan applications           Cluster analysis        K-means
                                              Pattern search
Banking (loans)   Bankruptcy prediction       Prediction              Decision tree
                                              Discriminant analysis
Investment risk   Risk prediction             Prediction              Neural network
Insurance         Customer retention (churn)  Prediction              Decision tree
                  Pricing                     Logistic regression     Neural network

A wide variety of business functions are supported by data mining. Those
applications listed in Table 2.1 represent only some of these applications. The
underlying statistical techniques are relatively simple—to predict, to identify the case
closest to past instances, or to identify some pattern.

Customer Profiling

We begin with probably the most spectacular example of business data mining.
Fingerhut, Inc. was a pioneer in developing methods to improve business. In this case,
they sought to identify the small subset of the most likely purchasers of their specialty
catalogs. They were so successful that they were purchased by Federated Stores.
Ultimately, Fingerhut operations fell victim to the general malaise in IT business in
2001 and 2002. But they still represent a pioneering development of data mining
application in business.


Lift

This section demonstrates the concept of lift used in customer segmentation models.
We can divide the data into groups as fine as we want (here, we divide them into 10
equal portions of the population, or groups of 10 percent each). These groups have
some identifiable features, such as zip code, income level, and so on (a profile). We
can then sample and identify the portion of sales for each group. The idea behind lift
is to send promotional material (which has a unit cost) to those groups that have the
greatest probability of positive response first. We can visualize lift by plotting the
responses against the proportion of the total population of potential customers, as
shown in Table 2.2. Note that the segments are listed in Table 2.2 sorted by expected
customer response.

Table 2.2 Lift calculation

Ordered   Expected   Proportion   Cumulative   Random average   Lift
segment   customer   (expected    response     proportion
          response   responses)   proportion
Origin    0          0            0            0                0
1         0.20       0.172        0.172        0.10             0.072
2         0.17       0.147        0.319        0.20             0.119
3         0.15       0.129        0.448        0.30             0.148
4         0.13       0.112        0.560        0.40             0.160
5         0.12       0.103        0.664        0.50             0.164
6         0.10       0.086        0.750        0.60             0.150
7         0.09       0.078        0.828        0.70             0.128
8         0.08       0.069        0.897        0.80             0.097
9         0.07       0.060        0.957        0.90             0.057
10        0.05       0.043        1.000        1.00             0.000

Both the cumulative responses and cumulative proportion of the population are
graphed to identify the lift. Lift is the difference between the two lines in Figure 2.1.
Figure 2.1 Lift identified by the mail optimization system
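
The lift column of Table 2.2 can be reproduced with a short calculation. The Python sketch below is an illustration rather than the software demonstrated in this book. It assumes ten equal-sized segments already sorted by expected response, as in the table, and the function name lift_table is our own.

```python
# Expected response rate per segment, most responsive first (Table 2.2).
expected_response = [0.20, 0.17, 0.15, 0.13, 0.12, 0.10, 0.09, 0.08, 0.07, 0.05]


def lift_table(rates):
    """Return (cumulative response proportion, random proportion, lift) per segment."""
    total = sum(rates)
    rows = []
    cumulative = 0.0
    for i, rate in enumerate(rates, start=1):
        cumulative += rate / total        # share of all expected responses so far
        random_share = i / len(rates)     # share expected from mailing at random
        rows.append((cumulative, random_share, cumulative - random_share))
    return rows


rows = lift_table(expected_response)
for i, (cum, rand, lift) in enumerate(rows, start=1):
    print(f"{i:2d}  {cum:.3f}  {rand:.2f}  {lift:.3f}")
```

Each segment's proportion is its expected response divided by the sum of all expected responses (1.16 here), and lift is the cumulative proportion minus the share that random mailing would reach.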

The purpose of lift analysis is to identify the most responsive segments. Here, the
greatest lift is obtained from the first five segments. We are probably more interested
in profit, however. We can identify the most profitable policy. What needs to be done
is to identify the portion of the population to send promotional materials to. For
instance, if an average profit of $200 is expected for each positive response and a cost
of $25 is expected for each set of promotional material sent out, it obviously would be
more profitable to send to the first segment containing an expected 0.2 positive
responses ($200 times 0.2 equals an expected revenue of $40, covering the cost of $25
plus an extra $15 profit). But, it still might be possible to improve the overall profit by
sending to other segments as well (always selecting the segment with the larger
response rates in order). The plot of cumulative profit is shown in Figure 2.2 for this
set of data. The second most responsive segment would also be profitable, collecting
$200 times 0.17 or $34 per $25 mailing for a net profit of $9. It turns out that the
fourth most responsive segment collects 0.13 times $200 ($26) for a net profit of $1,
while the fifth most responsive segment collects $200 times 0.12 ($24) for a net loss
of $1. Table 2.3 shows the calculation of the expected payoff.
Figure 2.2 Profit impact of lift

Table 2.3 Calculation of the expected payoff

Segment   Expected segment     Cumulative         Random cumulative   Expected
          revenue ($200 × P)   expected revenue   cost ($25 × i)      payoff
0         0                    0                  0                   0
1         40                   40                 25                  15
2         34                   74                 50                  24
3         30                   104                75                  29
4         26                   130                100                 30
5         24                   154                125                 29
6         20                   174                150                 24
7         18                   192                175                 17
8         16                   208                200                 8
9         14                   222                225                 –3
10        10                   232                250                 –18

The profit function in Figure 2.2 reaches its maximum with the fourth segment.
It is clear that the maximum profit is found by sending to the four most responsive
segments of the ten in the population. The implication is that in this case, the
promotional materials should be sent to the four segments expected to have the largest
response rates. If there was a promotional budget, it would be applied to as many
segments as the budget would support, in order of the expected response rate, up to
the fourth segment.
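
The payoff calculation can be sketched in Python as follows. This is an illustration under the assumptions stated in the text ($200 expected profit per positive response, $25 per mailing); the function and variable names are hypothetical.

```python
REVENUE_PER_RESPONSE = 200  # average profit per positive response (from the text)
COST_PER_MAILING = 25       # cost of each set of promotional material (from the text)

# Expected response rates, ordered from most to least responsive segment.
expected_response = [0.20, 0.17, 0.15, 0.13, 0.12, 0.10, 0.09, 0.08, 0.07, 0.05]


def cumulative_payoff(rates, revenue, cost):
    """Expected payoff from mailing the first 1, 2, ..., n segments."""
    payoffs = []
    running = 0.0
    for rate in rates:
        running += revenue * rate - cost  # net contribution of this segment
        payoffs.append(round(running, 2))
    return payoffs


payoffs = cumulative_payoff(expected_response, REVENUE_PER_RESPONSE, COST_PER_MAILING)

# Number of segments (taken in order) that maximizes expected payoff.
best_count = max(range(len(payoffs)), key=payoffs.__getitem__) + 1
print(payoffs)
print("Mail to the first", best_count, "segments")
```

Each segment adds value only while its expected revenue exceeds the mailing cost, which is why the cumulative payoff rises through the fourth segment and falls afterward.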
It is possible to focus on the wrong measure. The basic objective of lift analysis in
marketing is to identify those customers whose decisions will be influenced by
marketing in a positive way. In short, the methodology described earlier identifies
those segments of the customer base that would be expected to purchase. This may or
may not have been due to the marketing campaign effort. The same methodology can
be applied, but more detailed data is needed to identify those whose decisions would
have been changed by the marketing campaign, rather than simply those who would
purchase.
Another method that considers multiple factors is Recency, Frequency, and
Monetary (RFM) analysis. As with lift analysis, the purpose of an RFM is to identify
customers who are more likely to respond to new offers. While lift looks at the static
measure of response to a particular campaign, RFM keeps track of customer
transactions by time, by frequency, and by amount. Time is important as some
customers may not have responded to the last campaign, but might now be ready to
purchase the product being marketed. Customers can also be sorted by the frequency
of responses and by the dollar amount of sales. The subjects are coded on each of the
three dimensions (one approach is to have five cells for each of the three measures,
yielding a total of 125 combinations, each of which can be associated with a positive
response to the marketing campaign). The RFM still has limitations, in that there are
usually more than three attributes important to a successful marketing program, such
as product variation, customer age, customer income, customer lifestyle, and so on.6
The approach is the basis for a continuing stream of techniques to improve customer
segmentation marketing.
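
One way to code customers into the five-by-five-by-five RFM grid is to rank them on each dimension and cut the ranking into five equal groups. The sketch below assumes that approach; the customer records are invented for illustration, and the function name rank_scores is hypothetical.

```python
# Hypothetical customer records (illustrative values, not from the book):
# (customer_id, days_since_last_purchase, number_of_purchases, total_spent)
customers = [
    ("C01", 12, 14, 980.0),
    ("C02", 200, 2, 75.0),
    ("C03", 45, 7, 410.0),
    ("C04", 330, 1, 20.0),
    ("C05", 5, 20, 1500.0),
    ("C06", 90, 4, 700.0),
    ("C07", 150, 3, 120.0),
    ("C08", 30, 9, 640.0),
    ("C09", 270, 1, 35.0),
    ("C10", 60, 6, 300.0),
]


def rank_scores(values, higher_is_better=True, cells=5):
    """Score each value from 1 (worst) to `cells` (best) by rank position."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = 1 + (rank * cells) // len(values)
    return scores


# Recency: fewer days since the last purchase is better.
recency = rank_scores([c[1] for c in customers], higher_is_better=False)
frequency = rank_scores([c[2] for c in customers])
monetary = rank_scores([c[3] for c in customers])

# Each customer lands in one of the 5 x 5 x 5 = 125 cells.
rfm = {c[0]: (r, f, m) for c, r, f, m in zip(customers, recency, frequency, monetary)}
for cid, cell in sorted(rfm.items(), key=lambda kv: kv[1], reverse=True):
    print(cid, cell)
```

In practice each cell would then be associated with an observed response rate from past campaigns. Ties here are broken by input order, which is a simplification.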
Understanding lift enables understanding the value of specific types of customers.
This enables more intelligent customer management, which is discussed in the next

Comparisons of Data Mining Methods

Initial analyses focus on discovering patterns in the data. Classical statistical
methods, such as correlation analysis, are a good start, often supplemented with visual
tools to see the distributions and relationships among variables. Clustering and pattern
search are typically the first activities in data analysis, good examples of knowledge
discovery. Then, appropriate models are built. Data mining can then involve model
building (extension of the conventional statistical model building to very large
datasets) and pattern recognition. Pattern recognition aims to identify groups of
interesting observations. Often, experts are used to assist in pattern recognition.
There are two broad categories of models used for data mining. Continuous,
especially time series, data often calls for forecasting. Linear regression provides one
tool, but there are many others. Business data mining has widely been used for
classification or developing models to predict which category a new case will most
likely belong to (such as a customer profile relative to the expected purchases,
whether or not loans will be problematic, or whether insurance claims will turn out to
be fraudulent). The classification modeling tools include statistically based logistic
regression as well as artificial intelligence-based neural networks and decision trees.
Sung et al. compared a number of these methods with respect to their advantages
and disadvantages. Table 2.4 draws upon their analysis and expands it to include the
other techniques covered.

Table 2.4 Comparison of data mining method features7

Method: Cluster analysis
  Advantages: Can generate understandable formula; Can be applied automatically
  Disadvantages: Computation time increases with dataset size; Requires identification of parameters, with results sensitive to choices
  Assumptions: Need to make data numerical

Method: Discriminant analysis
  Advantages: Ability to incorporate multiple financial ratios simultaneously; Coefficients for combining the independent variables; Ability to apply to new data
  Disadvantages: Violates normality and independence assumptions; Reduction of dimensionality issues; Varied interpretation of the relative importance of variables; Difficulty in specifying the classification algorithm; Difficulty in interpreting the time-series prediction tests
  Assumptions: Assume multivariate normality within groups; Assume equal group covariances across all groups; Groups are discrete, nonoverlapping, and identifiable

Method: Regression
  Advantages: Can generate understandable formula; Widely understood; Strong body of theory
  Disadvantages: Computation time increases with dataset size; Not very good with nonlinear data
  Assumptions: Normality of errors; No error autocorrelation, heteroskedasticity, or multicollinearity

Method: Neural network models
  Advantages: Can deal with a wide range of problems; Produce good results in complicated domains (nonlinear); Can deal with both continuous and categorical; Have many software packages available
  Disadvantages: Require inputs in the range of 0 to 1; Do not explain results; May prematurely converge to an inferior solution
  Assumptions: Groups are discrete, nonoverlapping, and identifiable

Method: Decision trees
  Advantages: Can generate understandable rules; Can classify with minimal computation; Use easy calculations; Can deal with continuous and categorical variables; Provide a clear indication of variable importance
  Disadvantages: Some algorithms can only deal with binary-valued target classes; Most algorithms only examine a single field at a time; Can be computationally expensive
  Assumptions: Groups are discrete, nonoverlapping, and identifiable

Knowledge Discovery

Clustering: One unsupervised clustering technique is partitioning, the process of
examining a set of data to define a new categorical variable partitioning the space into
a fixed number of regions. This amounts to dividing the data into clusters. The most
widely known partitioning algorithm is k-means, where k center points are defined,
and each observation is classified to the closest of these center points. The k-means
algorithm attempts to position the centers to minimize the sum of squared distances
from each observation to its assigned center. Centroids are used as centers, and the
most commonly used distance metric is Euclidean. Instead of k-means, k-median can
be used, providing a partitioning method expected to be more stable.
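
The k-means procedure described above alternates two steps: assign each observation to its closest center, then move each center to the centroid of its assigned observations. The following is a minimal sketch in plain Python, using invented two-dimensional points and fixed starting centers so the result is repeatable. The function names are hypothetical, and production tools typically add random restarts and other refinements.

```python
import math


def assign(points, centers):
    """Group points by the index of their nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centers, max_iter=100):
    """Basic k-means: repeat assignment and centroid update until centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                # Centroid = coordinate-wise mean of the cluster members.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Illustrative data: two visibly separate groups of observations.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[(1.0, 1.0), (8.0, 8.0)])
print(centers)
print(clusters)
```

The result depends on the starting centers and on the number of clusters k, both of which the analyst must choose.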
Pattern search: Objects are often grouped to seek patterns. Clusters of customers
might be identified with particularly interesting average outcomes. On the positive
side, you might look for patterns in highly profitable customers. On the negative side,
you might seek patterns unique to those who fail to pay their bills to the firm.
Both clustering and pattern search seek to group the objects. Cluster analysis is
attractive, in that it can be applied automatically (although ample computational time
needs to be available). It can be applied to all types of data, as demonstrated in our
example. Cluster analysis is also easy to apply. However, its use requires selection
from among alternative distance measures, and weights may be needed to reflect
variable importance. The results are sensitive to these measures. Cluster analysis is
appropriate when dealing with large, complex datasets with many variables and
specifically identifiable outcomes. It is often used as an initial form of analysis. Once
different clusters are identified, pattern search methods are often used to discover the
rules and patterns. Discriminant analysis has been the most widely used data mining
technique in bankruptcy prediction. Clustering partitions the entire data sample,
assigning each observation to exactly one group. Pattern search seeks to identify local
clusterings, in that there are more objects with similar characteristics than one would
expect. Pattern search does not partition the entire dataset, but identifies a few groups
exhibiting unusual behavior. In the application on real data, clustering is useful for
describing broad behavioral classes of customers. Pattern search is useful for
identifying groups of people behaving in an anomalous way.

Predictive Models

Regression is probably the most widely used analytical tool historically. A main
benefit of regression is the broad understanding people have about regression models
and tests of their output. Logistic regression is highly appropriate in data mining, due
to the categorical nature of resultant variables that is usually present. While regression
is an excellent tool for statistical analysis, it does require assumptions about
parameters. Errors are assumed to be normally distributed, without autocorrelation
(errors are not related to the prior errors), without heteroskedasticity (errors don’t
grow with time, for instance), and without multicollinearity (independent variables
don’t contain high degrees of overlapping information content). Regression can deal
with nonlinear data, but only if the modeler understands the underlying nonlinearity
and develops appropriate variable transformations. There usually is a tradeoff—if the
data are fit well with a linear model, regression tends to be better than neural network
models. However, if there is nonlinearity or complexity in the data, neural networks
(and often, genetic algorithms) tend to do better than regression. A major
advantage of regression over neural networks is that regression provides an
easily understood formula, while neural network models are very complex.
Neural network algorithms can prove highly accurate, but involve difficulty in the
application to new data or interpretation of the model. Neural networks work well
unless there are many input features. The presence of many features makes it difficult
for the network to find patterns, resulting in long training phases, with lower
probabilities of convergence. Genetic algorithms have also been applied to data
mining, usually to bolster operations of other algorithms.
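
The point about variable transformations can be illustrated with a small sketch. Assume, for illustration only, that the outcome grows exponentially with the input. A straight line fits the raw data poorly, but regressing the logarithm of the outcome on the input recovers the relationship. The data are synthetic and noise-free, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data following y = 3 * exp(0.5 * x) (an assumed form, for illustration).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.0 * math.exp(0.5 * x) for x in xs]

# Linear regression on the raw data.
a_raw, b_raw = fit_line(xs, ys)
r2_raw = r_squared(xs, ys, a_raw, b_raw)

# Linear regression after transforming the outcome: ln(y) = ln(3) + 0.5 * x.
log_ys = [math.log(y) for y in ys]
a_log, b_log = fit_line(xs, log_ys)
r2_log = r_squared(xs, log_ys, a_log, b_log)

print(f"raw fit R^2: {r2_raw:.3f}")
print(f"log fit R^2: {r2_log:.3f}, growth rate {b_log:.3f}, scale {math.exp(a_log):.3f}")
```

The transformation works only because the modeler knows, or guesses, the form of the nonlinearity, which is the limitation the text describes.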
Decision tree analysis requires only the last assumption, that groups are discrete,
nonoverlapping, and identifiable. It provides the ability to generate understandable
rules and can perform classification with minimal computation, using easy
calculations. Decision tree analysis can deal with both continuous and categorical
variables, and provides a clear indication of variable importance in prediction and
classification. Despite the disadvantages of the decision tree method, it is a good
choice when the data mining task is classification of records or prediction of outcomes.
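
To show how a tree produces an understandable rule by examining one field at a time, the sketch below searches for the single best split on a tiny invented loan dataset. It is only the building block that a tree algorithm would apply repeatedly. Gini impurity is an assumed splitting criterion (the text does not name one), and the data, field names, and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Try every field and midpoint threshold; return (impurity, field, threshold)."""
    best = None
    n = len(rows)
    for field in range(len(rows[0])):
        values = sorted(set(r[field] for r in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[field] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[field] > threshold]
            # Weighted average impurity of the two groups after splitting.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, field, threshold)
    return best


# Hypothetical loan applicants: (income in $000, years at current job).
field_names = ["income", "years_at_job"]
rows = [(25, 1), (30, 6), (35, 2), (40, 8), (60, 1), (75, 3), (80, 7), (95, 4)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = loan became problematic

score, field, threshold = best_split(rows, labels)
print(f"IF {field_names[field]} <= {threshold} THEN problem loan (impurity {score:.2f})")
```

The exhaustive search over fields and thresholds also hints at why tree construction can become computationally expensive on large datasets.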

Data mining applications are widespread. This chapter sought to give concrete
examples of some of the major business applications of data mining. We began with a
review of Fingerhut data mining to support catalog sales. That application was an
excellent demonstration of the concept of lift applied to retail business. We also
reviewed five other major business applications, intentionally trying to demonstrate a
variety of different functions, statistical techniques, and data mining methods. Most of
those studies applied multiple algorithms (data mining methods). Software such as
Enterprise Miner has a variety of algorithms available, encouraging data miners to
find the method that works best for a specific set of data.
The second portion of the book seeks to demonstrate these methods with small
demonstration examples. The small examples can be run on Excel or other simple
spreadsheet packages with statistical support. Businesses can often conduct data
mining without purchasing large-scale data mining software. Therefore, our
philosophy is that it is useful to understand what the methods are doing, which also
provides the users with better understanding of what they are doing when applying
data mining.

Data Mining Processes and Knowledge Discovery

In order to conduct data mining analysis, a general process is useful. This chapter
describes an industry standard process, which is often used, and a shorter vendor
process. While each step is not needed in every analysis, this process provides a good
coverage of the steps needed, starting with data exploration, data collection, data
processing, analysis, inferences drawn, and implementation.
There are two standard processes for data mining that have been presented.
CRISP-DM (cross-industry standard process for data mining) is an industry standard, and
SEMMA (sample, explore, modify, model, and assess) was developed by the SAS
Institute Inc., a leading vendor of data mining software (and a premier statistical
software vendor). Table 3.1 gives a brief description of the phases of each process.
You can see that they are basically similar, only with different emphases.

Table 3.1 CRISP-DM and SEMMA

CRISP-DM                 SEMMA
Business understanding   Assumes well-defined questions
Data understanding       Sample
Data preparation         Explore
Modeling                 Modify data
Evaluation               Model
Deployment               Assess

Industry surveys indicate that CRISP-DM is used by over 70 percent of the industry
professionals, while about half of these professionals use their own methodologies.
SEMMA has a lower reported usage, as per the survey.

CRISP-DM is widely used by the industry members. This model consists of six
phases intended as a cyclical process shown in Figure 3.1.

Figure 3.1 CRISP-DM process

This six-phase process is not a rigid, by-the-numbers procedure. There is usually a
great deal of backtracking. Additionally, experienced analysts may not need to apply
each phase for every study. But CRISP-DM provides a useful framework for data
mining.

Business Understanding

The key element of a data mining study is understanding the purpose of the study.
This begins with the managerial need for new knowledge and the expression of the
business objective of the study to be undertaken. Goals are needed, stated in terms
such as which types of customers are interested in each of our products, what the
typical profiles of our customers are, and how much value each of them provides to
us. Then, a plan for finding such knowledge needs to be developed, in terms of
those responsible for collecting data, analyzing data, and reporting. At this stage, a
budget to support the study should be established, at least in preliminary terms.

Data Understanding

Once the business objectives and the project plan are established, data understanding
considers data requirements. This step can include initial data collection, data
