DA Notes - Module 3
(R1UC402T)
UNIT-II
Linear Correlation- Regression Modelling- Multivariate Analysis-
Bayesian Modelling- Inference and Bayesian Networks- Support
vector and Kernel Methods- Analysis of time series- Linear System
Analysis- Non Linear Dynamics- Rule Induction- Basic Fuzzy and
Neural Networks
Definition
Let X and Y be two random variables. The linear correlation
coefficient (or Pearson's correlation coefficient) between X and Y,
denoted by Corr[X,Y], is defined as follows:
Corr[X,Y] = Cov[X,Y] / (σ[X] σ[Y])
where Cov[X,Y] is the covariance between X and Y, and σ[X] and σ[Y]
are their standard deviations.
Note that, in principle, the ratio is well-defined only if σ[X] and σ[Y] are
strictly greater than zero. However, it is often assumed that Corr[X,Y] = 0
when one of the two standard deviations is zero. This is equivalent to
assuming that 0/0 = 0, because Cov[X,Y] = 0 when one of the two standard
deviations is zero.
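As a minimal sketch of this definition in Python (the function name and the
sample data are mine, not from the notes):

import math

def pearson_corr(xs, ys):
    # Corr[X,Y] = Cov[X,Y] / (sigma_X * sigma_Y); the 1/n factors cancel
    # in the ratio, so plain sums are used in numerator and denominator.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # the convention above: Corr = 0 when a std. deviation is 0
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(pearson_corr(x, y))  # about 0.85: a strong positive linear dependence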
Interpretation
The interpretation is similar to that of covariance: the correlation between
X and Y provides a measure of how similar their deviations from the
respective means are.
Linear correlation has the property of being bounded between -1 and 1:
-1 ≤ Corr[X,Y] ≤ 1
Thanks to this property, correlation makes it easy to gauge the intensity
of the linear dependence between two random variables: the closer the
correlation is to 1, the stronger the positive linear dependence between
X and Y (and the closer it is to -1, the stronger the negative linear
dependence between X and Y).
Terminology
The following terminology is often used:
1. If Corr[X,Y]>0 then X and Y are said to be positively linearly
correlated (or simply positively correlated).
2. If Corr[X,Y]<0 then X and Y are said to be negatively linearly
correlated (or simply negatively correlated).
3. If Corr[X,Y]≠0 then X and Y are said to be linearly correlated (or
simply correlated).
Symmetry
The linear correlation coefficient is symmetric:
Corr[X,Y]=Corr[Y,X]
Regression Modelling:
It includes many techniques for modeling and analyzing several
variables when the focus is on the relationship between a dependent
variable and one or more independent variables (or 'predictors'). ... In all
cases, a function of the independent variables, called the regression
function, is to be estimated.
REGRESSION:
Data can be smoothed by fitting the data to a function, such as with
regression.
Linear regression involves finding the "best" line to fit two variables or
attributes, so that one attribute can be used to predict the other.
Multiple linear regression: more than two attributes are involved and the
data are fit to a multidimensional surface.
For a training set D of data points (x1, y1), ..., (x|D|, y|D|), the least-squares
slope is
w1 = [ Σ_{i=1}^{|D|} (xi − x̄)(yi − ȳ) ] / [ Σ_{i=1}^{|D|} (xi − x̄)² ]
where x̄ and ȳ are the means of the xi and yi values.
Example data set D of (x, y) pairs:
x    y
3    30
8    57
9    64
13   72
3    36
6    43
11   59
21   90
1    20
16   83
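A short sketch of fitting the least-squares line y = w0 + w1·x to the table
above; the intercept formula w0 = ȳ − w1·x̄ is the standard companion to the
slope formula given in these notes:

# Data taken from the table above (x, y pairs).
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

n = len(xs)
x_bar = sum(xs) / n  # 9.1
y_bar = sum(ys) / n  # 55.4

num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
den = sum((x - x_bar) ** 2 for x in xs)

w1 = num / den           # slope, about 3.54
w0 = y_bar - w1 * x_bar  # intercept, about 23.2

print(f"y = {w0:.2f} + {w1:.2f} * x")
print("prediction for x = 10:", w0 + w1 * 10)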
BAYES THEOREM
For example, suppose a customer is described by the attributes age and
income, and that X is a 35-year-old customer with an income of $40,000.
Suppose also that H is the hypothesis that the customer will buy a computer.
P(H) is the prior probability of H. For our example, this is the probability
that any given customer will buy a computer, regardless of age, income, or
any other information.
P(H|X) is the posterior probability of H conditioned on X: the probability
that the customer will buy a computer given that we know the customer's
age and income.
Similarly, P(X|H) is the posterior probability of X conditioned on H, and
P(X) is the prior probability of X.
These probabilities are related by Bayes' theorem:
P(H|X) = P(X|H) P(H) / P(X)
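A tiny numerical illustration of the theorem; all probability values below
are invented for the example, not taken from the notes:

# Hypothetical numbers for the buys-computer example:
p_h = 0.30    # P(H): prior that a random customer buys a computer
p_x_h = 0.10  # P(X|H): prob. of a 35-year-old, $40,000-income customer among buyers
p_x = 0.05    # P(X): prob. of such a customer overall

# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_x = p_x_h * p_h / p_x
print(p_h_x)  # 0.6: knowing age and income raises the estimate from 0.30 to 0.60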
Given a tuple X, the classifier predicts that X belongs to the class
having the highest posterior probability conditioned on X, i.e.,
the naïve Bayesian classifier predicts that tuple X belongs to the
class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
By Bayes' theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
For a continuous-valued attribute xk, use the Gaussian density
g(x, μ, σ) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))
so that P(xk|Ci) = g(xk, μCi, σCi), where μCi and σCi are the mean and
standard deviation of the attribute values for the tuples of class Ci.
In other words, the predicted class label is the class Ci for which
P(X|Ci) P(Ci) is the maximum.
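A compact sketch of these formulas for a single continuous attribute; the
per-class statistics below are invented for illustration:

import math

def gaussian(x, mu, sigma):
    # g(x, mu, sigma) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical per-class statistics for the attribute "age":
classes = {
    "buys=yes": {"prior": 0.6, "mu": 38.0, "sigma": 8.0},
    "buys=no":  {"prior": 0.4, "mu": 50.0, "sigma": 10.0},
}

x = 35  # the customer's age

# Score each class by P(x|Ci) * P(Ci) and predict the maximum.
scores = {c: s["prior"] * gaussian(x, s["mu"], s["sigma"]) for c, s in classes.items()}
print(max(scores, key=scores.get))  # "buys=yes" for these numbers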
This refers to a CPT (conditional probability table) for the variable
LungCancer in a Bayesian network (the figure is not reproduced here). The
CPT gives the conditional probability for each value of LungCancer for each
possible combination of values of its parents; the upper-leftmost and
bottom-rightmost entries of the table correspond to the two extreme
combinations of parent values.
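As a sketch of how a CPT is used, the snippet below stores a CPT for
LungCancer given two parents, FamilyHistory and Smoker (the parent names
follow the classic textbook example; the probability values are illustrative):

# Hypothetical CPT: (family_history, smoker) -> P(LungCancer = yes | parents)
cpt = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.7,
    (False, False): 0.1,
}

def p_lung_cancer(value, family_history, smoker):
    # Look up P(LungCancer = value | parents) in the CPT.
    p_yes = cpt[(family_history, smoker)]
    return p_yes if value else 1.0 - p_yes

print(p_lung_cancer(True, True, True))     # 0.8
print(p_lung_cancer(False, False, False))  # 0.9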
Multivariate Analysis:
Multivariate analysis is used for tasks such as:
1. Dimension reduction.
2. Discovering cause-effect relationships.
3. Cluster analysis: clusters are homogeneous within themselves but
different from other clusters. Clustering tells us how the individuals are
similar and dissimilar among themselves.
Two common dimensionality reduction techniques are:
1. PCA (the Karhunen-Loève, or K-L, method)
2. Factor analysis
PCA searches for k n-dimensional orthogonal vectors that can best be
used to represent the data, where k ≤ n. It combines the essence of the
attributes by creating an alternative, smaller set of variables. The original
data can then be projected onto this smaller set. PCA often reveals
relationships that were not previously suspected.
The basic procedure is as follows:
1. The input data are normalized, so that each attribute falls within the
same range. This step ensures that attributes with large domains do not
dominate attributes with smaller domains.
2. PCA computes k orthonormal vectors that provide a basis for the
normalized input data. These are unit vectors, each pointing in a direction
perpendicular to the others; they are referred to as the principal
components. The input data are a linear combination of the principal
components.
3. The principal components are sorted in order of decreasing
"significance" or strength. They essentially serve as a new set of axes for
the data, providing important information about variance.
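A minimal numpy sketch of the three steps above (the toy data and variable
names are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D data with one dominant direction.
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

# Step 1: normalize (here, center each attribute; z-scoring also works).
Xc = X - X.mean(axis=0)

# Step 2: orthonormal basis = eigenvectors of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: sort components by decreasing "significance" (variance explained).
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]  # columns are Y1, Y2, ...

# Project the data onto the first k principal components.
k = 1
Z = Xc @ components[:, :k]
print("variance explained:", eigvals[order] / eigvals.sum())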
The figure (not reproduced here) shows the first two principal components,
Y1 and Y2, for a set of data originally mapped to the axes X1 and X2. This
information helps identify groups or patterns within the data.
Rule Induction
A decision tree is a structure that includes a root node, branches, and leaf
nodes. Each internal node denotes a test on an attribute, each branch
denotes the outcome of a test, and each leaf node holds a class label. The
topmost node in the tree is the root node.
The following decision tree is for the concept buys_computer that indicates
whether a customer at a company is likely to buy a computer or not. Each
internal node represents a test on an attribute. Each leaf node represents a
class.
Decision trees are easy to comprehend, and their learning and
classification steps are simple and fast.
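A small sketch of learning such a tree with scikit-learn; the tiny
buys_computer-style data set below is invented for illustration:

from sklearn.tree import DecisionTreeClassifier, export_text

# Invented training data: [age, income in $1000s] -> buys_computer.
X = [[25, 30], [35, 40], [45, 80], [22, 20], [50, 90], [30, 60]]
y = ["no", "yes", "yes", "no", "yes", "yes"]

# Each internal node tests one attribute; each leaf holds a class label.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))

# Classify a new customer: a 35-year-old with a $40,000 income.
print(tree.predict([[35, 40]]))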