Correlation


CORRELATION

Correlation
key concepts:
Types of correlation
Methods of studying correlation
a) Scatter diagram
b) Karl Pearson’s coefficient of correlation
c) Spearman’s Rank correlation coefficient
d) Method of least squares
Correlation
Correlation: The degree of relationship between the
variables under consideration is measured through
correlation analysis.
The measure of correlation is called the correlation coefficient.
The degree of relationship is expressed by a coefficient that
ranges from -1 to +1 ( -1 ≤ r ≤ +1 ).
The direction of change is indicated by the sign.
Correlation analysis gives us an idea of the
degree & direction of the relationship between the two
variables under study.
Correlation
Correlation is a statistical tool that helps
to measure and analyze the degree of
relationship between two variables.
Correlation analysis deals with the
association between two or more
variables.
Correlation & Causation
Causation means a cause & effect relation.
Correlation denotes the interdependence among
variables. For a correlation between two phenomena to be
meaningful, the two phenomena should have a cause-effect
relationship; if no such relationship exists, an observed
correlation may be spurious.
If two variables vary in such a way that movements in one
are accompanied by movements in the other, the variables
are said to be correlated.
Causation implies correlation, but correlation does
not necessarily imply causation.
Types of Correlation
Type I

Correlation: Positive Correlation, Negative Correlation

Types of Correlation Type I
Positive Correlation: The correlation is said to
be positive if the values of the two
variables change in the same direction.
Ex. Pub. Exp. & sales, Height & weight.
Negative Correlation: The correlation is said to
be negative when the values of the
variables change in opposite directions.
Ex. Price & quantity demanded.
Direction of the Correlation
The direction is indicated by the sign: (+) or (-).
Positive relationship – Variables change in the
same direction.
As X is increasing, Y is increasing
As X is decreasing, Y is decreasing
E.g., As height increases, so does weight.
Negative relationship – Variables change in
opposite directions.
As X is increasing, Y is decreasing
As X is decreasing, Y is increasing
E.g., As TV time increases, grades decrease
More examples
Positive relationships:
water consumption and temperature.
study time and grades.
Negative relationships:
alcohol consumption and driving ability.
price & quantity demanded.
Types of Correlation
Type II

Correlation: Simple, Multiple, Partial, Total
Types of Correlation Type II
Simple correlation: Under simple correlation,
only two variables are studied.
Multiple Correlation: Under multiple
correlation, three or more variables
are studied. Ex. Qd = f(P, PC, PS, t, y)
Partial correlation: The analysis recognizes more
than two variables but considers only two of
them, keeping the others constant.
Total correlation: This is based on all the relevant
variables, which is normally not feasible.
Types of Correlation
Type III

Correlation: Linear, Non-Linear

Types of Correlation Type III
Linear correlation: Correlation is said to be linear
when the amount of change in one variable tends to
bear a constant ratio to the amount of change in the
other. The graph of the variables having a linear
relationship will form a straight line.
Ex. X = 1, 2, 3, 4, 5, 6, 7, 8
Y = 5, 7, 9, 11, 13, 15, 17, 19
Y = 3 + 2X
Non Linear correlation: The correlation would be
non linear if the amount of change in one variable
does not bear a constant ratio to the amount of change
in the other variable.
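The constant-ratio test described above can be checked numerically. The sketch below is illustrative only: the helper name `is_linear` and the non-linear series Y = X² are assumptions added for contrast, not part of the slides.

```python
def is_linear(xs, ys, tol=1e-9):
    """Return True if the change in y bears a constant ratio to the change in x."""
    # Ratio of change between each pair of consecutive observations.
    ratios = [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]
    return all(abs(ratio - ratios[0]) <= tol for ratio in ratios)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 7, 9, 11, 13, 15, 17, 19]   # Y = 3 + 2X, the example from the slide
y_curved = [v * v for v in x]        # hypothetical non-linear series, Y = X^2

print(is_linear(x, y))         # True: every step changes Y by 2 per unit of X
print(is_linear(x, y_curved))  # False: the ratio of change keeps growing
```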
Methods of Studying Correlation

Scatter Diagram Method
Graphic Method
Karl Pearson’s Coefficient of Correlation
Method of Least Squares
Scatter Diagram Method

A scatter diagram is a graph of observed
plotted points, where each point
represents the values of X & Y as a
coordinate. It portrays the relationship
between these two variables graphically.
A perfect positive correlation
[Figure: scatter diagram of Weight (vertical axis) against Height (horizontal axis) showing a linear relationship; the heights and weights of A and B are marked on the axes.]
High Degree of positive correlation
Positive relationship, r = +0.80
[Figure: scatter diagram of Weight (vertical axis) against Height (horizontal axis).]
Degree of correlation
Moderate Positive Correlation, r = +0.4
[Figure: scatter diagram of Shoe Size (vertical axis) against Weight (horizontal axis).]
Degree of correlation
Perfect Negative Correlation, r = -1.0
[Figure: scatter diagram of TV watching per week (vertical axis) against Exam score (horizontal axis).]
Degree of correlation
Moderate Negative Correlation, r = -0.80
[Figure: scatter diagram of TV watching per week (vertical axis) against Exam score (horizontal axis).]
Degree of correlation
Weak Negative Correlation, r = -0.2
[Figure: scatter diagram of Shoe Size (vertical axis) against Weight (horizontal axis).]
Degree of correlation
No Correlation (horizontal line), r = 0.0
[Figure: scatter diagram of IQ (vertical axis) against Height (horizontal axis).]
Degree of correlation (r)
[Figure: four scatter diagrams, with r = +0.80, r = +0.60, r = +0.40 and r = +0.20.]
Advantages of Scatter Diagram
Simple & non-mathematical method
Not influenced by the size of extreme
items
First step in investigating the relationship
between two variables
Disadvantage of Scatter Diagram
Cannot establish the exact degree of
correlation
Karl Pearson's
Coefficient of Correlation
Pearson’s ‘r’ is the most common
correlation coefficient.
Karl Pearson’s Coefficient of Correlation is
denoted by ‘r’. The coefficient of
correlation ‘r’ measures the degree of
linear relationship between two variables,
say x & y.
Karl Pearson's
Coefficient of Correlation
Karl Pearson’s Coefficient of
Correlation is denoted by r
-1 ≤ r ≤ +1
The degree of correlation is expressed by the
value of the coefficient
The direction of change is indicated by the sign
( - ve) or ( + ve)
Karl Pearson's
Coefficient of Correlation
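When deviations are taken from the actual means (x and y are the deviations from the respective means):
$$ r(x, y) = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} $$
When deviations are taken from an assumed
mean (dx and dy are the deviations from the assumed means, N is the number of pairs):
$$ r = \frac{N \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N \sum d_x^2 - \left(\sum d_x\right)^2} \, \sqrt{N \sum d_y^2 - \left(\sum d_y\right)^2}} $$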
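As a rough illustration of the assumed-mean formula, the sketch below computes r with two different pairs of assumed means and shows that the choice does not change the result. The function name `pearson_assumed_mean` and the data values are hypothetical, chosen only to keep the arithmetic small.

```python
from math import sqrt


def pearson_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Pearson's r using deviations dx, dy taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    numerator = n * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    denominator = (
        sqrt(n * sum(a * a for a in dx) - sum(dx) ** 2)
        * sqrt(n * sum(b * b for b in dy) - sum(dy) ** 2)
    )
    return numerator / denominator


# Hypothetical paired observations (not from the slides).
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 7, 10]

r_a = pearson_assumed_mean(x, y, assumed_x=5, assumed_y=4)
r_b = pearson_assumed_mean(x, y, assumed_x=0, assumed_y=0)
print(round(r_a, 4), round(r_b, 4))  # both roundings agree
```

Because the correction terms Σdx and Σdy cancel the shift, any convenient assumed mean can be used; picking one close to the data simply keeps hand calculations smaller.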
Procedure for computing the
correlation coefficient
Calculate the means of the two series ‘x’ & ‘y’.
Calculate the deviations ‘x’ & ‘y’ of the two series from their
respective means.
Square each deviation of ‘x’ & ‘y’, then obtain the sums of
the squared deviations, i.e. Σx² & Σy².
Multiply each deviation under x by the corresponding deviation under
y to obtain the products ‘xy’. Then obtain the sum of the
products, i.e. Σxy.
Substitute the values in the formula.
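The procedure above can be sketched step by step in code. This is a minimal illustration, assuming deviations are taken from the actual means; the function name `pearson_r` and the sample data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    # Step 1: means of the two series.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 2: deviations from the respective means.
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    # Step 3: sums of squared deviations.
    sum_x2 = sum(d * d for d in dev_x)
    sum_y2 = sum(d * d for d in dev_y)
    # Step 4: sum of the products of paired deviations.
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    # Step 5: substitute into r = Σxy / sqrt(Σx² Σy²).
    return sum_xy / sqrt(sum_x2 * sum_y2)


# Hypothetical data: Σxy = 44, Σx² = 40, Σy² = 50.
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 7, 10]
print(round(pearson_r(x, y), 4))  # 0.9839
```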
Interpretation of Correlation
Coefficient (r)
The value of correlation coefficient ‘r’ ranges
from -1 to +1
If r = +1, then the correlation between the two
variables is said to be perfect and positive
If r = -1, then the correlation between the two
variables is said to be perfect and negative
If r = 0, then there exists no linear correlation between
the variables
Properties of Correlation coefficient
The correlation coefficient lies between -1 & +1,
symbolically ( -1 ≤ r ≤ +1 )
The correlation coefficient is independent of a
change of origin & scale.
The coefficient of correlation is the geometric mean of
the two regression coefficients.
r = ±√( bxy × byx ), taking the sign of the regression coefficients
If one regression coefficient is (+ve), the other regression
coefficient is also (+ve) and the correlation coefficient is (+ve)
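Two of these properties can be checked numerically. The sketch below is an assumption-laden illustration: the helper names, the data, and the particular shift and scale values are all hypothetical, and the scale factors are kept positive so the sign of r is preserved.

```python
from math import sqrt


def deviations(values):
    """Deviations of each value from the mean of the series."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson_r(xs, ys):
    dx, dy = deviations(xs), deviations(ys)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def regression_coefficients(xs, ys):
    """Return (byx, bxy): the slope of y on x and the slope of x on y."""
    dx, dy = deviations(xs), deviations(ys)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    byx = sum_xy / sum(a * a for a in dx)
    bxy = sum_xy / sum(b * b for b in dy)
    return byx, bxy


x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 7, 10]
r = pearson_r(x, y)

# Change of origin and scale: u = (x - 6) / 2, w = (y - 5) / 10.
u = [(v - 6) / 2 for v in x]
w = [(v - 5) / 10 for v in y]
r_transformed = pearson_r(u, w)

# Geometric mean of the two regression coefficients (both positive here).
byx, bxy = regression_coefficients(x, y)
r_from_regression = sqrt(byx * bxy)

print(round(r, 4), round(r_transformed, 4), round(r_from_regression, 4))
```

The square root only recovers the magnitude of r; when both regression coefficients are negative, r takes the negative sign.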
Assumptions of Pearson’s
Correlation Coefficient
There is a linear relationship between the two
variables, i.e. when the two variables are
plotted on a scatter diagram, the points
cluster around a straight line.
A cause and effect relation exists between the
different forces operating on the items of
the two variable series.
Advantages of Pearson’s Coefficient

It summarizes in one value both the
degree and the direction of correlation.
Limitation of Pearson’s Coefficient

It always assumes a linear relationship.
Interpreting the value of r is difficult.
The value of the correlation coefficient is
affected by extreme values.
It is a time-consuming method.
Coefficient of Determination
A convenient way of interpreting the value of the
correlation coefficient is to use the square of the
coefficient of correlation, which is called the
Coefficient of Determination.
The Coefficient of Determination = r².
Suppose r = 0.9; then r² = 0.81. This would mean that
81% of the variation in the dependent variable
has been explained by the independent variable.
Coefficient of Determination
The maximum value of r² is 1, because it is
possible to explain all of the variation in y but it
is not possible to explain more than all of it.
Coefficient of Determination = Explained
variation / Total variation
Coefficient of Determination: An example
Suppose one correlation has r = 0.60 and another has
r = 0.30. It does not mean that the first
correlation is twice as strong as the second. The
‘r’ can be understood by computing the value of
r².
When r = 0.60, r² = 0.36 -----(1)
When r = 0.30, r² = 0.09 -----(2)
This implies that in the first case 36% of the total
variation is explained, whereas in the second case
9% of the total variation is explained.
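The comparison above can be reproduced in a few lines. The function name `coefficient_of_determination` is an illustrative choice; the values of r are the ones used in the example.

```python
def coefficient_of_determination(r):
    """Share of total variation explained, given the correlation coefficient r."""
    return r ** 2


first = coefficient_of_determination(0.60)
second = coefficient_of_determination(0.30)

print(f"r = 0.60 -> r^2 = {first:.2f}")   # r = 0.60 -> r^2 = 0.36
print(f"r = 0.30 -> r^2 = {second:.2f}")  # r = 0.30 -> r^2 = 0.09
print(f"ratio of explained variation = {first / second:.1f}")  # 4.0
```

So although the first r is twice the second, the first case explains four times as much of the variation.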
Spearman’s Rank Coefficient of
Correlation
When the variables under study are not capable
of quantitative measurement but can be arranged
in serial order, Pearson’s correlation coefficient
cannot be used. In such cases Spearman’s Rank
correlation can be used.
$$ R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)} $$
R = Rank correlation coefficient
D = Difference of ranks between paired items in the two series.
N = Total number of observations.
Interpretation of Rank
Correlation Coefficient (R)
The value of rank correlation coefficient, R
ranges from -1 to +1
If R = +1, then there is complete agreement in
the order of the ranks and the ranks are in the
same direction
If R = -1, then the ranks are in exactly
opposite order, i.e. there is complete
disagreement in the direction of the ranks
If R = 0, then there is no correlation
Rank Correlation Coefficient (R)
a) Problems where actual ranks are given.
1) Calculate the difference ‘D’ of the two ranks,
i.e. (R1 – R2).
2) Square the differences & calculate the sum of
the squared differences, i.e. ΣD².
3) Substitute the values obtained in the
formula.
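The three steps above translate directly into code. This is a minimal sketch for the case where ranks are given and there are no ties; the function name `spearman_from_ranks` and the two rank series are hypothetical.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R = 1 - 6*sum(D^2) / (N*(N^2 - 1)) for untied ranks."""
    n = len(ranks_1)
    # Steps 1 and 2: rank differences D = R1 - R2, squared and summed.
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    # Step 3: substitute into the formula.
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five entries.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

print(round(spearman_from_ranks(judge_a, judge_b), 4))          # 0.8 (sum of D^2 = 4)
print(round(spearman_from_ranks(judge_a, judge_a), 4))          # 1.0, identical ranks
print(round(spearman_from_ranks(judge_a, [5, 4, 3, 2, 1]), 4))  # -1.0, reversed ranks
```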
Rank Correlation Coefficient
b) Problems where ranks are not given: If the
ranks are not given, then we need to assign
ranks to the data series. Either the lowest value in the
series or the highest value in the series
can be assigned rank 1. We
need to follow the same scheme of ranking for
the other series.
Then calculate the rank correlation coefficient in
the same way as when the ranks are given.
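A small sketch of the ranking step follows, assuming no tied values. The helper `assign_ranks`, its `highest_first` flag, and the marks data are hypothetical; the point is that either ranking scheme gives the same R as long as it is applied to both series.

```python
def assign_ranks(values, highest_first=True):
    """Rank each value (1 = highest by default); assumes no tied values."""
    ordered = sorted(values, reverse=highest_first)
    return [ordered.index(v) + 1 for v in values]


def spearman_from_ranks(ranks_1, ranks_2):
    n = len(ranks_1)
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical marks of five students in two subjects.
marks_a = [85, 60, 73, 40, 90]
marks_b = [93, 75, 65, 50, 80]

ranks_a = assign_ranks(marks_a)   # [2, 4, 3, 5, 1]
ranks_b = assign_ranks(marks_b)   # [1, 3, 4, 5, 2]
r_high = spearman_from_ranks(ranks_a, ranks_b)

# Same scheme (lowest value = rank 1) applied to both series.
r_low = spearman_from_ranks(
    assign_ranks(marks_a, highest_first=False),
    assign_ranks(marks_b, highest_first=False),
)
print(round(r_high, 4), round(r_low, 4))  # 0.8 0.8
```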
Rank Correlation Coefficient (R)
Equal ranks or ties in ranks: In such cases
average ranks should be assigned to each
individual, and an adjustment factor AF is added to ΣD²:
$$ R = 1 - \frac{6 \left( \sum D^2 + AF \right)}{N (N^2 - 1)} $$
$$ AF = \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots $$
m = The number of times an item is repeated
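The tie adjustment can be sketched as follows. The helper names and the data are hypothetical; the ranking convention (highest value = rank 1) is an assumption, and one (m³ − m)/12 term is added for each group of tied values in either series.

```python
from collections import Counter


def average_ranks(values):
    """Rank values (1 = highest); tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1
        repeats = ordered.count(v)
        ranks.append(first_position + (repeats - 1) / 2)
    return ranks


def tie_adjustment(values):
    """AF contribution of one series: sum of (m^3 - m) / 12 over tied groups."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    n = len(xs)
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    af = tie_adjustment(xs) + tie_adjustment(ys)
    return 1 - 6 * (sum_d2 + af) / (n * (n ** 2 - 1))


# Hypothetical data: the value 20 is repeated twice in x (m = 2).
x = [10, 20, 20, 30]
y = [1, 2, 3, 4]

print(average_ranks(x))                    # [4.0, 2.5, 2.5, 1.0]
print(tie_adjustment(x))                   # 0.5
print(round(spearman_with_ties(x, y), 4))  # 0.9
```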
Merits Spearman’s Rank Correlation
This method is simpler to understand and easier
to apply than Karl Pearson’s correlation
method.
This method is useful where we can give the
ranks but not the actual data (qualitative terms).
This method should be used where the initial data are in
the form of ranks.
Limitation Spearman’s Correlation
Cannot be used for finding out correlation in a
grouped frequency distribution.
This method should not be applied where N
exceeds 30, unless the ranks are already given, because the calculations become tedious.
Advantages of Correlation studies
Show the amount (strength) of relationship
present
Can be used to make predictions about the
variables under study.
Can be used in many places, including natural
settings, libraries, etc.
Easier to collect correlational data
