Data Mining: Concepts and Techniques - Chapter 10
Retail industry
Telecommunication industry
sets
Need a multidimensional view in selection
Data types: relational, transactional, text, time sequence,
spatial?
System issues
running on only one or on several operating systems?
a client/server architecture?
Data sources
ASCII text files, multiple relational data sources
Scalability
Row (or database size) scalability
SGI MineSet
Multiple data mining algorithms and advanced statistics
Clementine (SPSS)
An integrated data mining development environment
for end-users and developers
Multiple data mining algorithms and visualization tools
Data visualization
Data in a database or data warehouse can be viewed
at different levels of granularity or abstraction
dimensions
Data can be presented in various visual forms
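Viewing the same data at different levels of granularity can be illustrated with a small roll-up. The following is only a sketch: the sales records, the field layout (year, quarter, city, amount), and the function name `roll_up` are all hypothetical and are not taken from any particular system.

```python
from collections import defaultdict

# Hypothetical sales records: (year, quarter, city, amount)
SALES = [
    (2019, "Q1", "Chicago", 120.0),
    (2019, "Q1", "Boston", 80.0),
    (2019, "Q2", "Chicago", 150.0),
    (2019, "Q2", "Boston", 50.0),
]


def roll_up(records, key_indexes):
    """Sum the amount (last field) grouped by the chosen dimension fields."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[i] for i in key_indexes)
        totals[key] += record[-1]
    return dict(totals)


by_quarter_city = roll_up(SALES, (0, 1, 2))  # finest granularity
by_quarter = roll_up(SALES, (0, 1))          # city dimension rolled up
by_year = roll_up(SALES, (0,))               # quarter and city rolled up
```

Each call keeps fewer dimensions in the grouping key, so the same records are summarized at a coarser level of abstraction.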
Regression trees
Binary trees used for classification and prediction
Factor analysis
determine which variables are combined to generate a given factor
e.g., for much psychiatric data, one can indirectly measure other
quantities (such as test scores) that reflect the factor of interest
Discriminant analysis
predict a categorical response variable; commonly used in the social
sciences
Attempts to determine several discriminant functions (linear
combinations of the independent variables) that discriminate
among the groups defined by the response variable
Time series: many methods such as autoregression, ARIMA
(autoregressive integrated moving-average modeling), long-memory
time-series modeling
Survival analysis
predict the probability that a patient undergoing a medical
treatment would survive at least to time t (life span prediction)
Quality control
display group summary charts
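For the survival-analysis item above, one common way to estimate the probability of surviving past time t from follow-up data is the Kaplan-Meier estimator. The slides do not name a specific method, so this choice is an assumption, and the follow-up data and the function name `kaplan_meier` below are hypothetical. A minimal sketch:

```python
def kaplan_meier(observations):
    """Return [(t, S(t))] at each distinct time with an observed death.

    observations: list of (time, event) pairs, where event is True if the
    death was observed at `time` and False if the patient was censored
    (left the study alive at `time`).
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        deaths = sum(1 for time, event in observations if time == t and event)
        leaving = sum(1 for time, _ in observations if time == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data in months: (time, death observed?)
DATA = [(2, True), (3, False), (4, True), (4, True), (6, False), (7, True)]
CURVE = kaplan_meier(DATA)
```

Censored patients reduce the number at risk without lowering the survival estimate, which is what distinguishes this estimator from a plain fraction of survivors.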
August 9, 2019 Data Mining: Concepts and Techniques 34
Theoretical Foundations of Data Mining (1)
Data reduction
The basis of data mining is to reduce the data
representation
Trades accuracy for speed in response
Data compression
The basis of data mining is to compress the given
data by encoding in terms of bits, association rules,
decision trees, clusters, etc.
Pattern discovery
The basis of data mining is to discover patterns
occurring in the database, such as associations,
classification models, sequential patterns, etc.
Probability theory
The basis of data mining is to discover joint probability
distributions of random variables
Microeconomic view
A view of utility: the task of data mining is finding
patterns that are interesting only to the extent that
they can be used in the decision-making process of
some enterprise
Inductive databases
Data mining is the problem of performing inductive logic
on databases
The task is to query the data and the theory (i.e.,
patterns) of the database
Popular among many researchers in database systems
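The pattern-discovery and probability-theory views above can be connected with a small example: the support of an itemset is an empirical joint probability, and the confidence of an association rule is an empirical conditional probability. The transactions and the function names below are hypothetical, and the sketch scans every transaction, so it is only suitable for toy data.

```python
# Hypothetical market-basket transactions
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


SUPPORT_BREAD_MILK = support({"bread", "milk"}, TRANSACTIONS)
CONF_BREAD_TO_MILK = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

Here the rule bread => milk has support 2/4 and confidence (2/4) / (3/4) = 2/3.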
Data Mining and Intelligent Query Answering
Query answering
Direct query answering: returns exactly what is being
asked
Intelligent (or cooperative) query answering: analyzes
You use your credit card, debit card, supermarket loyalty card, or
frequent flyer card, or apply for any of the above
You surf the Web, reply to an Internet newsgroup, subscribe to a
magazine, rent a video, join a club, fill out a contest entry form,
You pay for prescription drugs, or present your medical care
number when visiting the doctor
Collection of personal data may be beneficial for
companies and consumers, but there is also potential for
misuse
Protect Privacy and Data Security
Fair information practices
International guidelines for data privacy protection
Biometric encryption
Anonymous databases
Application exploration
Development of application-specific data mining
systems
Invisible data mining (mining as built-in function)