Data Mining-CH5


DATA MINING

Overview
• There is a huge amount of data available in the
Information Industry. This data is of no use until it is
converted into useful information. It is necessary to
analyze this data and extract useful information from it.
• Extraction of information is not the only process we need
to perform; data mining also involves other processes
such as Data Cleaning, Data Integration, Data
Transformation, Pattern Evaluation and Data
Presentation.
• Once all these processes are complete, the information
can be used in many applications such as Fraud
Detection, Market Analysis, Production Control, Science
Exploration, etc.
What is Data Mining?
• Data Mining is defined as extracting information from
huge sets of data. In other words, data mining is the
procedure of mining knowledge from data. The
information or knowledge extracted can be used for any of
the following applications −
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration
Market Analysis and Management

• Listed below are the various fields of market analysis where data mining is used −

• Customer Profiling − Data mining helps determine what kind of people
buy what kind of products.
• Identifying Customer Requirements − Data mining helps in identifying
the best products for different customers. It uses prediction to find the
factors that may attract new customers.
• Cross Market Analysis − Data mining identifies
associations/correlations between product sales.
• Target Marketing − Data mining helps to find clusters of model
customers who share the same characteristics such as interests,
spending habits, income, etc.
• Determining Customer Purchasing Patterns − Data mining helps
determine the purchasing patterns of customers.
• Providing Summary Information − Data mining provides
various multidimensional summary reports.
Corporate Analysis and Risk Management

• Data mining is used in the following fields of the Corporate Sector −
• Finance Planning and Asset Evaluation − It involves
cash flow analysis and prediction, and contingent claim
analysis to evaluate assets.
• Resource Planning − It involves summarizing and
comparing the resources and spending.
• Competition − It involves monitoring competitors and
market directions.
Fraud Detection

• Data mining is also used in the fields of credit card
services and telecommunication to detect fraud.
• For fraudulent telephone calls, it helps to find the destination of
the call, the duration of the call, the time of day or week, etc.
It also analyzes the patterns that deviate from expected
norms.
Data Mining - Tasks
• Data mining deals with the kind of patterns that can be
mined. On the basis of the kind of data to be mined, there
are two categories of functions involved in Data Mining −
• Descriptive
• Classification and Prediction
Descriptive Function
• The descriptive function deals with the general properties
of data in the database. Here is the list of descriptive
functions −
• Class/Concept Description
• Mining of Frequent Patterns
• Mining of Associations
• Mining of Correlations
• Mining of Clusters
Mining of Association
• Associations are used in retail sales to identify items
that are frequently purchased together. This refers to
the process of uncovering the relationships among
data and determining association rules.
• For example, a retailer generates an association rule that
shows that 70% of milk is sold with bread and only 30% of
biscuits are sold with bread.
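The "70% of milk is sold with bread" figure can be read as the confidence of the rule milk → bread: the fraction of transactions containing milk that also contain bread. The minimal Python sketch below computes support and confidence over a few hypothetical baskets; the `transactions` list and the helper names are illustrative assumptions, and the sketch does not reproduce the retailer's 70%/30% figures.

```python
# Hypothetical transactions: each set is the contents of one basket.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "biscuits"},
    {"biscuits"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(round(confidence({"milk"}, {"bread"}, transactions), 2))      # 0.67
print(round(confidence({"biscuits"}, {"bread"}, transactions), 2))  # 0.5
```

In practice a rule is usually kept only when both its support and its confidence exceed user-chosen minimum thresholds.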
Mining of Correlations
• It is a kind of additional analysis performed to uncover
interesting statistical correlations between associated
attribute-value pairs or between two itemsets, to determine
whether they have a positive, negative, or no effect on each
other.
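One common way to decide whether two itemsets are positively correlated, negatively correlated, or independent is lift: the support of both together divided by the product of their individual supports. The source does not name a specific measure, so this Python sketch assumes lift; the baskets are hypothetical and both itemsets are assumed to occur at least once.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(a, b, transactions):
    """support(A and B) / (support(A) * support(B)); a value of 1 means A and B look independent."""
    return support(set(a) | set(b), transactions) / (
        support(a, transactions) * support(b, transactions)
    )

def correlation(a, b, transactions, tol=1e-9):
    """Label the relationship between itemsets a and b using lift."""
    value = lift(a, b, transactions)
    if value > 1 + tol:
        return "positive"
    if value < 1 - tol:
        return "negative"
    return "none"

baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "biscuits"},
    {"biscuits"},
]
print(correlation({"milk"}, {"bread"}, baskets))     # positive (lift = 0.4 / (0.6 * 0.6), about 1.11)
print(correlation({"milk"}, {"biscuits"}, baskets))  # negative (never bought together, lift = 0)

independent = [{"a", "b"}, {"a"}, {"b"}, set()]
print(correlation({"a"}, {"b"}, independent))        # none (lift = 0.25 / (0.5 * 0.5) = 1)
```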
Mining of Clusters

• Cluster refers to a group of similar kind of objects. Cluster
analysis refers to forming groups of objects that are very
similar to each other but are highly different from the
objects in other clusters.
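k-means is one widely used way to form such clusters: it alternates between assigning each object to the nearest cluster centre and moving each centre to the mean of its members. The sketch below is a plain-Python version for 2-D points; the sample points, k = 2, and seeding with the first k points are illustrative assumptions (practical implementations usually use smarter seeding such as k-means++).

```python
import math

def kmeans(points, k, max_iterations=100):
    """Cluster 2-D points into k groups with Lloyd's k-means algorithm.
    Seeds the centroids with the first k points (a simplifying assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: the assignments will not change again
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (8, 8), (1.5, 2), (8, 9), (0.5, 1.5), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (0.5, 1.5)], [(8, 8), (8, 9), (9, 8)]]
```

Objects inside each resulting cluster are close to each other and far from the objects in the other cluster, which is exactly the property described above.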
Classification and Prediction
• Classification is the process of finding a model that
describes the data classes or concepts. The purpose is to
be able to use this model to predict the class of objects
whose class label is unknown. This derived model is
based on the analysis of sets of training data. The derived
model can be presented in the following forms −
• Classification (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
Classification
• It predicts the class of objects whose class label is
unknown. Its objective is to find a derived model that
describes and distinguishes data classes or concepts.
The derived model is based on the analysis of a set of training
data, i.e., data objects whose class labels are known.
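A one-level decision tree (a decision stump) is about the simplest classifier whose derived model is a single IF-THEN rule. This hypothetical Python sketch learns the rule's threshold from labelled training data and then classifies objects whose labels are unknown; the income figures, the `buys_computer` label, and the assumption that larger values indicate the positive class are all illustrative.

```python
# Hypothetical training data: (income in thousands, buys_computer) pairs with known labels.
training = [(20, False), (25, False), (30, False), (45, True), (50, True), (60, True)]

def learn_stump(samples):
    """Return the threshold t for the rule 'IF value >= t THEN True ELSE False'
    that makes the fewest mistakes on the training samples.
    Only observed values are tried as thresholds (a simplifying assumption)."""
    best_t, best_errors = None, None
    for t, _ in samples:
        errors = sum((value >= t) != label for value, label in samples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def classify(value, threshold):
    """Apply the learned IF-THEN rule to an object whose class label is unknown."""
    return value >= threshold

t = learn_stump(training)
print(f"IF income >= {t} THEN buys_computer = yes ELSE no")  # IF income >= 45 THEN ...
print(classify(40, t), classify(55, t))                      # False True
```

Full decision tree learners repeat this kind of split search recursively on each branch, which yields a tree, or equivalently a set of IF-THEN rules.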
Prediction
• It is used to predict missing or unavailable numerical
data values rather than class labels. Regression Analysis
is generally used for prediction. Prediction can also be
used for identification of distribution trends based on
available data.
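Simple linear regression is the most basic form of the regression analysis mentioned above: it fits a straight line y = slope * x + intercept by least squares and uses that line to predict a numeric value. The Python sketch below uses the closed-form formulas for slope and intercept; the advertising-spend and sales numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys ≈ slope * xs + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: monthly advertising spend (x) and units sold (y).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # 13.0, the predicted sales for a spend of 6
```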
Evolution Analysis
• Evolution analysis refers to the description and modeling of
regularities or trends for objects whose behavior changes
over time.
Kind of knowledge to be mined
It refers to the kind of functions to be performed. These functions
are −

• Characterization
• Discrimination
• Association and Correlation Analysis
• Classification
• Prediction
• Clustering
• Outlier Analysis
• Evolution Analysis
Data Mining - Issues
• Data mining is not an easy task, as the algorithms used
can get very complex and data is not always available in
one place. It needs to be integrated from various
heterogeneous data sources. These factors also create
some issues.
• We will discuss the major issues regarding −
• Mining Methodology and User Interaction
• Performance Issues
• Diverse Data Types Issues
Mining Methodology and User Interaction Issues
• Mining different kinds of knowledge in databases −
Different users may be interested in different kinds of
knowledge. Therefore it is necessary for data mining to
cover a broad range of knowledge discovery tasks.

• Interactive mining of knowledge at multiple levels of abstraction −
The data mining process needs to be interactive because it
allows users to focus the search for patterns, providing and
refining data mining requests based on the returned
results.
Data mining query languages and ad hoc data mining
• A data mining query language, which allows the user to
describe ad hoc mining tasks, should be integrated with a
data warehouse query language and optimized for
efficient and flexible data mining.
Presentation and visualization of data mining results
• Once the patterns are discovered, they need to be
expressed in high-level languages and visual
representations. These representations should be easily
understandable.
Handling noisy or incomplete data
• Data cleaning methods are required to handle
noise and incomplete objects while mining data
regularities. Without data cleaning methods,
the accuracy of the discovered patterns will be poor.
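Two classic cleaning steps are filling in missing values and smoothing noisy values. This Python sketch replaces missing entries (`None`) with the attribute mean and smooths noise by equal-frequency binning, replacing each value with the mean of its bin. The sample ages and prices and the bin size of 3 are hypothetical choices; other strategies (median imputation, smoothing by bin boundaries, regression) are equally common.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size` (the last bin may be smaller),
    and replace every value with the mean of its bin. The result is in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

ages = [20, None, 30, None, 40]
print(fill_missing_with_mean(ages))    # [20, 30.0, 30, 30.0, 40]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```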
Pattern evaluation
• Many of the discovered patterns may not be interesting,
because they either represent common knowledge or lack novelty.
The patterns should therefore be evaluated for interestingness.
Diverse Data Types Issues
• Handling of relational and complex types of data −
The database may contain complex data objects,
multimedia data objects, spatial data, temporal data etc. It
is not possible for one system to mine all these kinds of
data.

• Mining information from heterogeneous databases and global information systems −
The data is available at different data sources on LAN or
WAN. These data sources may be structured, semi-structured,
or unstructured. Therefore, mining knowledge
from them adds challenges to data mining.
From Data Warehousing (OLAP) to Data Mining (OLAM)
• Online Analytical Mining (OLAM) integrates Online Analytical
Processing (OLAP) with data mining and mines knowledge in
multidimensional databases.
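In OLAM, mining runs on the multidimensional aggregates that OLAP provides. As a rough illustration of that OLAP side, the Python sketch below computes every group-by (cuboid) of a tiny data cube, from the most detailed level up to the grand total. The `sales` rows and dimension names are hypothetical, and real OLAP servers precompute and index such aggregates instead of recomputing them like this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: one row per (region, quarter) with a sales amount.
sales = [
    {"region": "North", "quarter": "Q1", "amount": 100},
    {"region": "North", "quarter": "Q2", "amount": 150},
    {"region": "South", "quarter": "Q1", "amount": 80},
    {"region": "South", "quarter": "Q2", "amount": 120},
]

def cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`, i.e. every cuboid of the cube.
    The empty subset () is the grand total (the apex cuboid)."""
    result = {}
    for r in range(len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            result[dims] = dict(totals)
    return result

c = cube(sales, ["region", "quarter"], "amount")
print(c[("region",)])  # {('North',): 250, ('South',): 200}
print(c[()])           # {(): 450}
```

Moving from `("region", "quarter")` to `("region",)` to `()` corresponds to OLAP roll-up; a mining step could then, for example, look for associations or outliers at whichever level is most informative.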
Importance of OLAM
• High quality of data in data warehouses −
The data mining tools are required to work on integrated,
consistent, and cleaned data. These steps are very costly
in the preprocessing of data. The data warehouses
constructed by such preprocessing are valuable sources of
high quality data for OLAP and data mining as well.
• Available information processing infrastructure surrounding data warehouses −
Information processing infrastructure refers to accessing,
integration, consolidation, and transformation of multiple
heterogeneous databases, web-accessing and service facilities,
reporting and OLAP analysis tools.
Data Mining Engine
• The data mining engine is essential to the data mining
system. It consists of a set of functional modules that
perform the following functions −
• Characterization
• Association and Correlation Analysis
• Classification
• Prediction
• Cluster analysis
• Outlier analysis
• Evolution analysis
Knowledge Base
• This is the domain knowledge. This knowledge is used to
guide the search or evaluate the interestingness of the
resulting patterns.

Knowledge Discovery
• Some people treat data mining as the same as knowledge
discovery, while others view data mining as an
essential step in the process of knowledge discovery.
Here is the list of steps involved in the knowledge
discovery process −
• Data Cleaning
• Data Integration
• Data Selection
• Data Transformation
• Data Mining
• Pattern Evaluation
• Knowledge Representation
User interface
User interface is the module of the data mining system
that enables communication between users and
the data mining system. The user interface allows the
following functionalities −
• Interact with the system by specifying a data mining query task.
• Provide information to help focus the search.
• Mine based on the intermediate data mining results.
• Browse database and data warehouse schemas or data structures.
• Evaluate mined patterns.
• Visualize the patterns in different forms.
Data Integration
• Data Integration is a data preprocessing technique that
merges the data from multiple heterogeneous data
sources into a coherent data store. Data integration may
involve inconsistent data and therefore needs data
cleaning.
• Data Cleaning
Data cleaning is a technique that is applied to
remove noisy data and correct
inconsistencies in data. Data cleaning involves
transformations to correct wrong data. Data
cleaning is performed as a data preprocessing
step while preparing the data for a data
warehouse.
• Data Selection
Data Selection is the process where data relevant to the
analysis task are retrieved from the database. Sometimes
data transformation and consolidation are performed before
the data selection process.
• Data Transformation
In this step, data is transformed or
consolidated into forms appropriate for
mining, by performing summary or
aggregation operations.
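The integration, selection, and transformation steps can be sketched end to end in Python. The two sources, their field names (`cust_id` versus `customer`), and the type mismatch between them are hypothetical; the point is that integration has to reconcile such schema differences before records can be combined, and transformation here is a simple sum aggregation.

```python
# Hypothetical sources: a CRM table and a billing table that name and type
# the customer key differently (cust_id as int vs customer as str).
crm = [{"cust_id": 1, "city": "Pune"}, {"cust_id": 2, "city": "Delhi"}]
billing = [
    {"customer": "1", "amount": 200.0},
    {"customer": "2", "amount": 50.0},
    {"customer": "1", "amount": 100.0},
]

def integrate(crm_rows, billing_rows):
    """Data integration: reconcile the key schemas and join billing rows to CRM rows."""
    city_of = {row["cust_id"]: row["city"] for row in crm_rows}
    merged = []
    for bill in billing_rows:
        cust_id = int(bill["customer"])  # resolve the type conflict between the sources
        if cust_id in city_of:           # drop bills with no matching customer
            merged.append({"cust_id": cust_id, "city": city_of[cust_id], "amount": bill["amount"]})
    return merged

def select(rows, attributes):
    """Data selection: keep only the attributes relevant to the analysis task."""
    return [{a: row[a] for a in attributes} for row in rows]

def transform(rows):
    """Data transformation: aggregate spending per city."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["amount"]
    return totals

result = transform(select(integrate(crm, billing), ["city", "amount"]))
print(result)  # {'Pune': 300.0, 'Delhi': 50.0}
```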
• Data Mining, which is also known as Knowledge
Discovery in Databases (KDD), is a process of
discovering patterns in large data sets and data
warehouses.
• Various techniques such as regression analysis,
association, clustering, classification, and outlier
analysis are applied to data to identify useful outcomes.
These techniques use software and backend algorithms
that analyze the data and show patterns.
• The data mining process starts by feeding input
data to data mining tools that use statistics and
algorithms to produce reports and patterns. The results
can be visualized with these tools, understood, and
applied to drive business changes and improvements.
• Data mining is widely used by organizations in building a
marketing strategy, by hospitals for diagnostic tools, by
eCommerce for cross-selling products through websites
and many other ways.
Examples Of Data Mining In Real Life
• #1) Mobile Service Providers
• Mobile service providers use data mining to design their
marketing campaigns and to keep customers from
moving to other vendors.

• From a large amount of data such as billing information,
email, text messages, web data transmissions, and
customer service records, data mining tools can predict
“churn”, i.e., identify the customers who are likely to switch
vendors.
• Using these results, each customer is given a probability score. The mobile
service providers can then provide incentives and
offers to customers who are at higher risk of churning.
This kind of mining is often used by major service
providers such as broadband, phone, and gas providers.
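One common way to produce such a probability score is logistic regression. The Python sketch below trains a one-feature logistic model by batch gradient descent; the feature (number of complaints), the training labels, the learning rate, and the epoch count are hypothetical, and a real churn model would use many features drawn from the billing and usage data described above.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, mapping any score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(churn | x) = sigmoid(w * x + b) by gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical history: complaints filed by each customer and whether they churned (1) or stayed (0).
complaints = [0, 1, 2, 3, 4, 5]
churned = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(complaints, churned)

for c in (0, 5):
    print(c, round(sigmoid(w * c + b), 3))  # churn probability score for each complaint count
```

Customers can then be ranked by score, and those above a cutoff chosen by the business targeted with incentives.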
• #2) Retail Sector
• Data Mining helps the supermarket and retail sector
owners to know the choices of the customers. Looking at
the purchase history of the customers, the data mining
tools show the buying preferences of the customers.

• With the help of these results, supermarkets design
the placement of products on shelves and bring out
offers on items, such as coupons on matching products
and special discounts on some products.
Other Areas
• Ecommerce
• Science And Engineering
• Crime Prevention
Data Mining detects outliers across a vast amount of data.
The criminal data includes all details of the crime that has
happened. Data Mining will study the patterns and trends
and predict future events with better accuracy.
The agencies can find out which area is more prone to
crime, how much police personnel should be deployed,
which age group should be targeted, vehicle numbers to be
scrutinized, etc.
• Detect Financial Crimes
Banking data come from many different sources, various
cities, and different bank locations. Multiple data analysis
tools are deployed to study and to detect unusual trends
like big value transactions. Data visualization tools, outlier
analysis tools, clustering tools, etc., are used to identify the
relationships and patterns of action.
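A simple way to flag unusually large transactions is a robust outlier score. The Python sketch below uses the modified z-score, 0.6745 * |x − median| / MAD, where MAD is the median absolute deviation, with the commonly used cutoff of 3.5; the transaction amounts are hypothetical. A median-based score is used here instead of the ordinary mean and standard deviation z-score because, in a small sample, one huge transaction inflates the standard deviation enough to hide itself.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score (based on the median absolute
    deviation) exceeds `threshold`."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median, so the score is undefined
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical transaction amounts for one account.
amounts = [120, 95, 130, 110, 105, 99, 125, 115, 5000, 102]
print(mad_outliers(amounts))  # [5000]
```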
• Market Basket Analysis is the technique to find the
groups of items that are bought together in stores.
Analysis of the transactions shows patterns such as
which things are bought together often like bread and
butter, or which items have higher sales volume on certain
days such as beer on Fridays.

• This information helps in planning store layouts,
offering special discounts on items that are less in
demand, creating offers such as “buy 2 get 1 free” or “get
50% off the second purchase”, etc.
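Finding the groups of items bought together is usually done with frequent itemset mining. The Python sketch below covers the first two levels of the Apriori approach: it keeps the single items whose support meets a minimum threshold, then counts only pairs built from those items, because a pair can never be more frequent than either of its items. The baskets and the 40% threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {(item1, item2): support} for item pairs bought together in at least
    `min_support` (a fraction) of the baskets."""
    n = len(baskets)
    # Level 1: count single items and keep the frequent ones.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, count in item_counts.items() if count / n >= min_support}
    # Level 2: only pairs of frequent items can be frequent (the Apriori property).
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket) & frequent_items), 2)
    )
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "butter", "jam"},
    {"beer", "chips", "bread"},
]
print(frequent_pairs(baskets, min_support=0.4))  # {('bread', 'butter'): 0.6, ('beer', 'chips'): 0.4}
```

The full Apriori algorithm continues the same way to triples and larger itemsets until no new frequent itemsets appear.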
• Big Companies Using Data Mining
• Some companies using data mining techniques are
given below:

• AMAZON: Amazon uses Text Mining to find the lowest price of the product.
• MCDONALD’S: McDonald's uses big data mining to
enhance its customer experience. It studies the ordering
patterns of customers, waiting times, size of orders, etc.
• NETFLIX: Netflix finds out how to make a movie or a
series popular among the customers using its data mining
insights.
Data Analysis Process
• The knowledge discovery process is a sequence of the
following steps:

• Data Cleaning: This step removes noise and inconsistent data from the input data.
• Data Integration: This step combines multiple sources of data.
Together, the data cleaning and data integration steps form the
preprocessing of data. The preprocessed data is then stored in
the data warehouse.
• Data Selection: This step selects the data relevant to the analysis
task from the database.
• Data Transformation: In this step, various data aggregation
and data summary techniques are applied to transform the data
into a useful form for mining.
• Data Mining: In this step, data patterns are extracted by
applying intelligent methods.
• Pattern Evaluation: The extracted data patterns are
evaluated and recognized according to the
interestingness measures.
• Knowledge Representation: Visualization and knowledge
representation techniques are used to present the mined
knowledge to the users.
• The first four steps (cleaning, integration, selection, and transformation)
come under the data preprocessing stage. Here, data mining is shown as a
single step, although the term is often used to refer to the entire
knowledge discovery process.

• Thus, data analysis is the process of
discovering interesting patterns and knowledge from a
large amount of data. The data sources can include
databases, data warehouses, the World Wide Web, flat files,
and other informative files.
Conclusion
• Data mining is used in diverse applications such as
banking, marketing, healthcare, telecom industries, and
many other areas.

• Data mining techniques help companies gain
knowledgeable information and increase their profitability by
making adjustments in processes and operations. It is a
fast process that helps businesses make decisions
through the analysis of hidden patterns and trends.
