Data Mining Is Defined As The Procedure of Extracting Information From Huge Sets of Data

Data mining is defined as the procedure of extracting information
from huge sets of data. There is a huge amount of data available in
the information industry. This data is of no use until it is converted
into useful information, so it is necessary to analyze it and extract
useful information from it.

Extraction of information is not the only process we need to perform;
data mining also involves other processes such as Data Cleaning, Data
Integration, Data Transformation, Data Mining, Pattern Evaluation and
Data Presentation. Once all these processes are over, we can use this
information in many applications such as Fraud Detection, Market
Analysis, Production Control, Science Exploration, etc.
What is Data Mining?

Data mining is defined as extracting information from huge sets of data.
In other words, data mining is the procedure of mining knowledge from
data. The information or knowledge so extracted can be used for any of
the following applications:
Market Analysis
Fraud Detection
Customer Retention
Production Control
Science Exploration
Data Mining Applications
Data mining is highly useful in the following domains
Market Analysis and Management
Corporate Analysis & Risk Management
Fraud Detection
Apart from these, data mining can also be used in the areas of
production control, customer retention, science exploration, sports,
astrology, and Internet Web Surf-Aid.
Market Analysis and Management

Listed below are the various fields of the market where data mining is
used:
Customer Profiling: Data mining helps determine what kind of people buy
what kind of products.
Identifying Customer Requirements: Data mining helps in identifying the
best products for different customers. It uses prediction to find the
factors that may attract new customers.
Cross Market Analysis: Data mining performs association/correlation
analysis between product sales.
Target Marketing: Data mining helps to find clusters of model customers
who share the same characteristics such as interests, spending habits,
income, etc.
Determining Customer Purchasing Patterns: Data mining helps in
determining customer purchasing patterns.
Providing Summary Information: Data mining provides us with various
multidimensional summary reports.
Corporate Analysis and Risk Management

Data mining is used in the following fields of the corporate sector:
Finance Planning and Asset Evaluation: It involves cash flow analysis
and prediction, and contingent claim analysis to evaluate assets.
Resource Planning: It involves summarizing and comparing resources and
spending.
Competition: It involves monitoring competitors and market directions.
Fraud Detection
Data mining is also used in the fields of credit card services and
telecommunication to detect fraud. For fraudulent telephone calls, it
helps to find the destination of the call, the duration of the call,
the time of the day or week, etc. It also analyzes patterns that
deviate from expected norms.
Data mining deals with the kind of patterns that can be mined. On the
basis of the kind of data to be mined, there are two categories of
functions involved in Data Mining
Descriptive
Classification and Prediction
Descriptive Function
The descriptive function deals with the general properties of data in the
database. Here is the list of descriptive functions
Class/Concept Description
Mining of Frequent Patterns
Mining of Associations
Mining of Correlations
Mining of Clusters
Class/Concept Description
Class/Concept refers to the data to be associated with classes or
concepts. For example, in a company, the classes of items for sale
include computers and printers, and the concepts of customers include
big spenders and budget spenders. Such descriptions of a class or a
concept are called class/concept descriptions. These descriptions can
be derived in the following two ways:
Data Characterization: This refers to summarizing the data of the class
under study. The class under study is called the Target Class.
Data Discrimination: This refers to the mapping or classification of a
class with some predefined group or class.
Mining of Frequent Patterns

Frequent patterns are those patterns that occur frequently in
transactional data. Here is the list of kinds of frequent patterns:
Frequent Item Set: A set of items that frequently appear together, for
example, milk and bread.
Frequent Subsequence: A sequence of patterns that occurs frequently,
such as the purchase of a camera being followed by the purchase of a
memory card.
Frequent Substructure: Substructure refers to different structural
forms, such as graphs, trees, or lattices, which may be combined with
item sets or subsequences.
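As a minimal sketch of how frequent item sets can be found, the Python
snippet below counts how often each pair of items appears together in a
handful of made-up transactions and keeps the pairs that meet a minimum
support count. The transactions, the min_support threshold and the
frequent_pairs helper are assumptions for illustration; only the
milk-and-bread and camera/memory card pairings are taken from the text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "biscuits"},
    {"milk", "bread", "biscuits"},
    {"camera", "memory card"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(transactions, min_support=2))
```

Counting every pair directly is only practical for small data; it is
shown here because it makes the idea of a support threshold easy to see.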
Mining of Associations
Associations are used in retail sales to identify items that are
frequently purchased together. Mining them is the process of uncovering
relationships among data and determining association rules.
For example, a retailer generates an association rule showing that 70%
of the time milk is sold with bread and only 30% of the time biscuits
are sold with bread.
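The percentages in such a rule correspond to what is usually called the
confidence of the rule: among the transactions that contain one item,
the share that also contain the other. The sketch below computes it for
made-up baskets. The data and the confidence helper are assumptions,
and the resulting numbers are not meant to reproduce the 70% and 30%
figures quoted in the example.

```python
def confidence(baskets, antecedent, consequent):
    """Share of baskets containing antecedent that also contain consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)

# Hypothetical baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "biscuits"},
    {"bread", "milk"},
    {"bread", "biscuits"},
    {"milk"},
]

print(confidence(baskets, "bread", "milk"))      # 3 of the 4 bread baskets
print(confidence(baskets, "bread", "biscuits"))  # 2 of the 4 bread baskets
```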
Mining of Correlations
It is a kind of additional analysis performed to uncover interesting
statistical correlations between associated attribute-value pairs or
between two item sets, to determine whether they have a positive,
negative or no effect on each other.
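One common way to quantify such a correlation between two items is
lift: the ratio of how often the items occur together to how often they
would be expected to occur together if they were independent. Lift is
named here as one possible measure, not as something the text
prescribes. A value above 1 suggests a positive effect, a value below 1
a negative effect, and a value close to 1 no effect. The baskets and
the lift helper below are hypothetical.

```python
def lift(baskets, a, b):
    """lift(a, b) = P(a and b) / (P(a) * P(b)), estimated from the baskets."""
    n = len(baskets)
    p_a = sum(1 for t in baskets if a in t) / n
    p_b = sum(1 for t in baskets if b in t) / n
    p_ab = sum(1 for t in baskets if a in t and b in t) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("both items must appear in at least one basket")
    return p_ab / (p_a * p_b)

# Hypothetical baskets for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"biscuits"},
    {"biscuits", "tea"},
]

print(lift(baskets, "milk", "bread"))     # above 1: bought together more than chance
print(lift(baskets, "milk", "biscuits"))  # 0.0: never bought together in this data
```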
Mining of Clusters
A cluster is a group of similar objects. Cluster analysis refers to
forming groups of objects that are very similar to each other but
highly different from the objects in other clusters.
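To illustrate the idea, here is a small sketch of k-means, one
well-known clustering technique, restricted to one-dimensional data.
The choice of k-means, the spending figures and the starting centroids
are all assumptions made for this example; the text does not prescribe
a particular algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means for one-dimensional data with caller-chosen start centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids are stable, so assignments will not change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical monthly spending figures for eight customers.
spending = [12, 15, 14, 11, 95, 102, 98, 110]
centroids, clusters = kmeans_1d(spending, centroids=[0, 50])
print(centroids)
print(clusters)
```

With this data the low and high spenders end up in separate groups,
which matches the description of objects being similar within a cluster
and different across clusters.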
Classification and Prediction

Classification is the process of finding a model that describes the data
classes or concepts. The purpose is to be able to use this model to
predict the class of objects whose class label is unknown. This derived
model is based on the analysis of sets of training data. The derived
model can be presented in the following forms
Classification (IF-THEN) Rules
Decision Trees
Mathematical Formulae
Neural Networks
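To make the first of these forms concrete, the sketch below expresses a
model as an ordered list of IF-THEN rules and uses it to predict the
class of a new object. The rules, the threshold and the attribute name
are invented for illustration and are not learned from training data;
the class labels reuse the big spender and budget spender concepts
mentioned earlier.

```python
# Each rule is (condition, class label); the first matching rule wins.
# The threshold and the attribute name below are hypothetical.
rules = [
    (lambda c: c["monthly_spend"] >= 500, "big spender"),
    (lambda c: c["monthly_spend"] < 500, "budget spender"),
]

def classify(customer, rules, default="unknown"):
    """Return the label of the first rule whose IF part holds."""
    for condition, label in rules:
        if condition(customer):
            return label
    return default

print(classify({"monthly_spend": 720}, rules))  # big spender
print(classify({"monthly_spend": 80}, rules))   # budget spender
```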
The functions involved in these processes are as follows:
Classification: It predicts the class of objects whose class label is
unknown. Its objective is to find a derived model that describes and
distinguishes data classes or concepts. The derived model is based on
the analysis of a set of training data, i.e. data objects whose class
label is known.
Prediction: It is used to predict missing or unavailable numerical data
values rather than class labels. Regression analysis is generally used
for prediction. Prediction can also be used to identify distribution
trends based on available data.
Outlier Analysis: Outliers may be defined as the data objects that do
not comply with the general behavior or model of the data available.
Evolution Analysis: Evolution analysis refers to describing and
modeling regularities or trends for objects whose behavior changes over
time.
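Since regression analysis is the usual tool for numeric prediction,
here is a minimal sketch of simple least-squares linear regression used
to estimate an unavailable numerical value. The data points and the
fit_line helper are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: advertising budget vs. units sold.
budget = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]

slope, intercept = fit_line(budget, units)
predicted = slope * 6 + intercept  # estimate the unknown value for budget = 6
print(slope, intercept, predicted)
```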
Data Mining Task Primitives

We can specify a data mining task in the form of a data mining query.
This query is input to the system.
A data mining query is defined in terms of data mining task primitives.
Note: These primitives allow us to communicate in an interactive manner
with the data mining system. Here is the list of Data Mining Task
Primitives:
Set of task relevant data to be mined.
Kind of knowledge to be mined.
Background knowledge to be used in discovery process.
Interestingness measures and thresholds for pattern evaluation.
Representation for visualizing the discovered patterns.

Set of task relevant data to be mined
This is the portion of database in which the user is interested. This
portion includes the following
Database Attributes
Data Warehouse dimensions of interest

Kind of knowledge to be mined
It refers to the kind of functions to be performed. These functions are
Characterization
Discrimination
Association and Correlation Analysis
Classification
Prediction
Clustering
Outlier Analysis
Evolution Analysis
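
One way to picture how the task primitives fit together is as the
fields of a single query object. The sketch below is purely
illustrative: the MiningQuery class, its field names and the example
values are hypothetical and do not correspond to any particular data
mining system or query language.

```python
from dataclasses import dataclass, field

@dataclass
class MiningQuery:
    """Hypothetical container for the five data mining task primitives."""
    relevant_data: list          # attributes or warehouse dimensions of interest
    knowledge_kind: str          # e.g. "classification", "clustering"
    background_knowledge: dict = field(default_factory=dict)
    interestingness: dict = field(default_factory=dict)  # measures and thresholds
    presentation: str = "table"  # how discovered patterns are shown

# Example values are made up for illustration.
query = MiningQuery(
    relevant_data=["age", "income", "items_purchased"],
    knowledge_kind="association",
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(query)
```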
