Data Mining-CH5
Overview
• There is a huge amount of data available in the
Information Industry. This data is of no use until it is
converted into useful information. It is necessary to
analyze this data and extract useful information from it.
• Extraction of information is not the only process we need
to perform; data mining also involves other processes
such as Data Cleaning, Data Integration, Data
Transformation, Pattern Evaluation and Data
Presentation.
• Once all these processes are over, we would be able to
use this information in many applications such as Fraud
Detection, Market Analysis, Production Control, Science
Exploration, etc.
What is Data Mining?
• Data Mining is defined as extracting information from
huge sets of data. In other words, data mining is the
procedure of mining knowledge from data. The
information or knowledge extracted can be used for any of
the following applications −
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration
Market Analysis and Management
• Listed below are the various areas of marketing in which data mining is used −
• Customer Profiling − Data mining helps determine what kind of people
buy what kind of products.
• Identifying Customer Requirements − Data mining helps in identifying
the best products for different customers. It uses prediction to find the
factors that may attract new customers.
• Cross Market Analysis − Data mining finds associations and
correlations between product sales.
• Target Marketing − Data mining helps to find clusters of model
customers who share the same characteristics such as interests,
spending habits, income, etc.
• Determining Customer Purchasing Patterns − Data mining helps in
determining customer purchasing patterns.
• Providing Summary Information − Data mining provides various
multidimensional summary reports.
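As a rough illustration of cross market analysis, the sketch below computes support and confidence for pairs of products that are bought together. The baskets, the product names, and the helper `pair_statistics` are all hypothetical and chosen only to make the idea concrete; a real system would typically use a dedicated association-rule algorithm over far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def pair_statistics(baskets):
    """Map each item pair (a, b) to (support, confidence a->b, confidence b->a)."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n                # share of baskets with both items
        conf_ab = count / item_counts[a]   # share of a-baskets that also hold b
        conf_ba = count / item_counts[b]   # share of b-baskets that also hold a
        stats[(a, b)] = (support, conf_ab, conf_ba)
    return stats


stats = pair_statistics(transactions)
for (a, b), (support, conf_ab, conf_ba) in sorted(stats.items()):
    print(f"{a} & {b}: support={support:.2f}, "
          f"conf({a}->{b})={conf_ab:.2f}, conf({b}->{a})={conf_ba:.2f}")
```

Pairs with high support and confidence are candidates for joint promotions or shelf placement; the thresholds that count as "high" depend on the business and are not fixed by the method.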
Corporate Analysis and Risk Management
• Characterization
• Discrimination
• Association and Correlation Analysis
• Classification
• Prediction
• Clustering
• Outlier Analysis
• Evolution Analysis
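The slide lists these functions by name only. As one possible illustration of outlier analysis, the sketch below flags values that lie unusually far from the mean, which is one simple way such analysis can support fraud detection. The data, the function name `find_outliers`, and the threshold of 2 standard deviations are assumptions made for this example, not part of the original material.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the (assumed) threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily transaction totals for one account.
daily_totals = [120, 135, 128, 142, 131, 125, 980, 138]
outliers = find_outliers(daily_totals)
print(outliers)
```

With this data only the value 980 is flagged. A z-score test is sensitive to the very outliers it looks for, so robust alternatives (for example, rules based on the median) may be preferable on small samples.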
Data Mining - Issues
• Data mining is not an easy task: the algorithms used
can be very complex, and the data is not always available in
one place. It needs to be integrated from various
heterogeneous data sources. These factors also create
some issues.
• The major issues fall into the following groups −
• Mining Methodology and User Interaction
• Performance Issues
• Diverse Data Types Issues
Mining Methodology and User Interaction Issues
• Mining different kinds of knowledge in databases −
Different users may be interested in different kinds of
knowledge. Therefore it is necessary for data mining to
cover a broad range of knowledge discovery tasks.
Knowledge Discovery
• Some people treat data mining as the same thing as knowledge
discovery, while others view data mining as an
essential step in the process of knowledge discovery.
Here is the list of steps involved in the knowledge
discovery process −
• Data Cleaning
• Data Integration
• Data Selection
• Data Transformation
• Data Mining
• Pattern Evaluation
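The steps above can be sketched as a small pipeline in which each step is one function. Everything here is illustrative: the two sources, the field names, and the "pattern" being mined (total spend per customer) are hypothetical, and real systems would use far richer techniques at each stage.

```python
from collections import Counter

# Two hypothetical sources with slightly different conventions.
store_sales = [
    {"customer": "Ann", "product": "Laptop", "amount": "1200"},
    {"customer": "bob", "product": "Mouse", "amount": "25"},
    {"customer": "Ann", "product": "Mouse", "amount": None},  # missing value
]
web_sales = [
    {"customer": "BOB", "product": "Laptop", "amount": "1150"},
    {"customer": "Cy", "product": "Mouse", "amount": "30"},
]


def clean(records):
    """Data cleaning: drop records with missing amounts, normalize names."""
    return [
        {**r, "customer": r["customer"].strip().title()}
        for r in records
        if r["amount"] is not None
    ]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [record for source in sources for record in source]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: convert amounts from text to numbers."""
    return [{**r, "amount": float(r["amount"])} for r in records]


def mine(records):
    """Data mining: a deliberately simple pattern, total spend per customer."""
    totals = Counter()
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def evaluate(patterns, min_total):
    """Pattern evaluation: keep only patterns that pass a threshold."""
    return {k: v for k, v in patterns.items() if v >= min_total}


integrated = integrate(clean(store_sales), clean(web_sales))
selected = select(integrated, ("customer", "amount"))
patterns = evaluate(mine(transform(selected)), min_total=1000)
print(patterns)
```

Keeping each step as a separate function mirrors the list order and makes it easy to swap in a different mining or evaluation step without touching the preprocessing.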
User interface
The user interface is the module of a data mining system
that enables communication between users and
the data mining system. It supports the
following functionalities −
• Interact with the system by specifying a data mining query task.
• Provide information to help focus the search.
• Mine based on intermediate data mining results.
• Browse database and data warehouse schemas or data structures.
• Evaluate mined patterns.
• Visualize the patterns in different forms.
Data Integration
• Data Integration is a data preprocessing technique that
merges the data from multiple heterogeneous data
sources into a coherent data store. Data integration may
involve inconsistent data and therefore needs data
cleaning.
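A minimal sketch of this idea follows, assuming two hypothetical sources that describe the same customers with different field names and units. Each source is mapped onto one common schema, and records that disagree are reported so they can be cleaned. The source names, fields, and the rule of keeping the first record seen are assumptions made for illustration only.

```python
# Hypothetical source A: a CRM export keyed by "cust_id", income in dollars.
crm_rows = [
    {"cust_id": 1, "name": "Ann Lee", "income_usd": 52000},
    {"cust_id": 2, "name": "Bob Ray", "income_usd": 61000},
]
# Hypothetical source B: a billing system keyed by "id", income in thousands.
billing_rows = [
    {"id": 1, "full_name": "Ann Lee", "income_k": 50},    # disagrees with CRM
    {"id": 2, "full_name": "bob ray ", "income_k": 61},   # same data, untidy name
    {"id": 3, "full_name": "Cy Orr", "income_k": 47},
]


def from_crm(row):
    """Map a CRM row onto the common schema (id, name, income in dollars)."""
    return {
        "id": row["cust_id"],
        "name": row["name"].strip().title(),
        "income": row["income_usd"],
    }


def from_billing(row):
    """Map a billing row onto the common schema, converting units."""
    return {
        "id": row["id"],
        "name": row["full_name"].strip().title(),
        "income": row["income_k"] * 1000,
    }


def integrate_sources(crm, billing):
    """Merge both sources into one store keyed by id and collect conflicts."""
    store, conflicts = {}, []
    unified = [from_crm(r) for r in crm] + [from_billing(r) for r in billing]
    for record in unified:
        existing = store.get(record["id"])
        if existing is None:
            store[record["id"]] = record          # first time this id is seen
        elif existing != record:
            conflicts.append((existing, record))  # inconsistent data to clean
    return store, conflicts


store, conflicts = integrate_sources(crm_rows, billing_rows)
print(sorted(store))
print(conflicts)
```

The conflict list is where integration hands over to cleaning: the two incomes recorded for customer 1 cannot both be right, and deciding which one to keep is a cleaning decision rather than a merging one.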
• Data Cleaning