Correlation & Regression


Correlation

• Meaning and Definition.

• Types.

• Graphical Measurement of correlation.

• Correlation Coefficient.

• Rank Correlation coefficient.


Meaning of correlation

• There are various methods of measuring the relationship existing between variables. The simplest of them is correlation.

• Correlation is the co-variation or co-relation between two variables: the variables change together.

Defining Correlation

• Correlation is a measure of the strength of the relationship existing between two or more variables.

• The strength of the linear relationship between two variables is called simple correlation, and the strength of the linear relationship among three or more variables is called multiple correlation.

• Correlation does not imply “causation”; it is a measure of association only.

• Both variables may tend to be high or low together (positive relationship), one may tend to be high when the other is low (negative relationship), or there may be no relationship between the variables (zero relationship).

Types of Correlation:

Positive Correlation:
• Both variables move in the same direction.
• Association between variables such that high scores on one variable
tend to go with high scores on the other variable.
Ex: supply and price.

Negative Correlation:
• Both variables move in opposite directions.
• Association between variables such that high scores on one variable
tend to go with low scores on the other variable.
Ex: demand and price.

Uncorrelated: No relationship between the variables.


Measurement of Correlation
• Graphic Method
• Algebraic Method

Scatter Diagram:
• A scatter diagram is a graphical tool for analyzing
correlation between two variables.

• One variable is plotted on the horizontal axis and the other on the vertical axis.

• The pattern of the plotted points shows the correlation pattern graphically.
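As a rough illustration of the scatter-diagram idea, the sketch below draws one in plain text using only the Python standard library (a stand-in for a real plotting library such as matplotlib). The function name `text_scatter` and the grid size are illustrative assumptions; the points are the age (x) and weight (y) data from the children example later in this deck.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Return the rows of a character-grid scatter diagram; '*' marks a point.

    x is plotted on the horizontal axis and y on the vertical axis
    (the top row holds the largest y values).
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Use 1 as the range when all values are equal, to avoid dividing by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"
    # Left border stands in for the y-axis, bottom line for the x-axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


age = [7, 6, 8, 5, 6, 9]          # x: age in years
weight = [12, 8, 12, 10, 11, 13]  # y: weight in kg
for line in text_scatter(age, weight):
    print(line)
```

The points drift from the lower left to the upper right, which is the pattern of a positive correlation.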
Scatter Plots
[Figure: scatter plots of Y against X. Linear relationships: positive linear correlation, negative linear correlation. Curvilinear relationships: positive curvilinear correlation, negative curvilinear correlation.]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

[Figure: scatter plot of Y against X showing no relationship (zero correlation).]
Correlation Coefficient
Need for Correlation Coefficient
• A scatter diagram gives only a rough idea of the relationship between
two variables.
• It cannot give a precise quantitative measure of the correlation
between two variables.
• We therefore use a statistic called the correlation coefficient, a pure
number that quantifies the relationship between two variables.
Karl Pearson’s Correlation Coefficient:
• A statistic that quantifies the linear relationship between two continuous
variables.
• The most widely used correlation coefficient in the literature.
• Symbolized by the italic letter “r” when it is a statistic based on sample data.
• Symbolized by the Greek letter “ρ” (rho) when it is a population parameter.
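Formula for Pearson’s correlation coefficient:

$$r_{xy} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where Cov(X, Y) is the covariance of X and Y, and σX and σY are their standard deviations.

Shortcut method (computed directly from the sums ∑x, ∑y, ∑xy, ∑x², ∑y²):

$$r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$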
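A minimal Python sketch of the definition r = Cov(X, Y) / (σX σY). It assumes population (divide-by-n) moments; dividing by n − 1 instead gives the same r, because the divisor cancels between numerator and denominator. The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed as Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (divide by n).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov_xy / (sd_x * sd_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # exactly linear, rising
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # exactly linear, falling
```

The two calls should print values equal (up to floating-point rounding) to +1 and −1, the perfect positive and perfect negative cases.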
Assumptions of Pearson’s Correlation Coefficient:

• No distinction is made between the independent (x) and dependent (y) variable.

• Both variables are continuous.

• The relationship between x and y is linear.

• Both variables must be normally distributed.

• There are no extreme values (outliers) in either x or y.



Properties of Pearson Correlation Coefficient.

• The sign of r denotes the nature (direction) of the association.

• r can be positive, negative or zero.

• The magnitude of r denotes the strength of the association.

• r lies between −1.00 and +1.00.

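[Figure: scale of r from −1 to +1. r = −1: perfect negative correlation; −1 to −0.75: strong; −0.75 to −0.25: intermediate; −0.25 to 0: weak; r = 0: no relation; 0 to 0.25: weak; 0.25 to 0.75: intermediate; 0.75 to +1: strong; r = +1: perfect positive correlation.]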
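The strength scale can be turned into a small labelling helper. This is only a sketch: the function name `describe_r` is hypothetical, and the slide does not say which band a boundary value such as exactly 0.75 belongs to, so here it is assumed to fall in the stronger band.

```python
def describe_r(r):
    """Label a correlation coefficient using the bands 0.25 and 0.75 of the scale."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no relation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:  # assumption: 0.75 itself counts as strong
        strength = "strong"
    elif size >= 0.25:  # assumption: 0.25 itself counts as intermediate
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


for value in (0.759, -0.5, 0.1, -1.0, 0.0):
    print(value, describe_r(value))
```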
Example:
A sample of 6 children was selected, and their age in years and
weight in kilograms were recorded, as shown in the following table.
Find the correlation between age and weight.

S.No   Age (years)   Weight (kg)
1      7             12
2      6             8
3      8             12
4      5             10
5      6             11
6      9             13
Serial no.   Age (years) (x)   Weight (kg) (y)   xy          x²          y²
1            7                 12                84          49          144
2            6                 8                 48          36          64
3            8                 12                96          64          144
4            5                 10                50          25          100
5            6                 11                66          36          121
6            9                 13                117         81          169
Total        ∑x = 41           ∑y = 66           ∑xy = 461   ∑x² = 291   ∑y² = 742
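Calculation (shortcut method, n = 6):

$$r = \frac{6(461) - (41)(66)}{\sqrt{\left[6(291) - 41^2\right]\left[6(742) - 66^2\right]}} = \frac{2766 - 2706}{\sqrt{(1746 - 1681)(4452 - 4356)}} = \frac{60}{\sqrt{65 \times 96}} \approx 0.760$$

r ≈ 0.76: a strong positive correlation between age and weight.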
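The column totals and the value of r can be checked with a short Python sketch that applies the shortcut formula (n∑xy − ∑x∑y) / √[(n∑x² − (∑x)²)(n∑y² − (∑y)²)] to the raw data. Variable names are illustrative.

```python
from math import sqrt

age = [7, 6, 8, 5, 6, 9]          # x: age in years
weight = [12, 8, 12, 10, 11, 13]  # y: weight in kg
n = len(age)

# Column totals of the worked table.
sum_x = sum(age)
sum_y = sum(weight)
sum_xy = sum(x * y for x, y in zip(age, weight))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in weight)

# Shortcut formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 41 66 461 291 742
print(f"r = {r:.4f}")
```

The numerator is 60 and the denominator is √6240, giving r ≈ 0.76.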


Testing the significance of “r”
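1. Objective: To test the significance of the correlation coefficient.

2. Hypotheses:
H0: ρ = 0
H1: ρ ≠ 0 (two-tailed), or ρ > 0 / ρ < 0 (one-tailed)
where ρ is the population correlation coefficient.

3. Test statistic:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \quad \text{with } (n - 2) \text{ d.f.}$$

where r is the sample correlation coefficient and n is the sample size.

4. Decision: reject H0 if the computed t falls in the rejection region, i.e. for a two-tailed test, if |t| exceeds the tabulated t with (n − 2) d.f. at the chosen significance level.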
Example 1:
Suppose twenty observations on price and quantity sold yielded
a correlation coefficient of 0.62. Can we infer that price and
quantity sold are correlated? Test at the 5% level.
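A worked sketch of this example in Python, following the test procedure above as a two-tailed test. The 5% two-tailed critical value for 18 d.f. (2.101) is hard-coded from standard t tables rather than computed, to keep the sketch dependency-free; the helper name `t_statistic` is illustrative.

```python
from math import sqrt


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), referred to the t distribution with n - 2 d.f."""
    if not -1 < r < 1 or n < 3:
        raise ValueError("need -1 < r < 1 and n >= 3")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r, n = 0.62, 20
t = t_statistic(r, n)

# Two-tailed 5% critical value of t with 18 d.f., taken from standard t tables.
T_CRITICAL = 2.101
reject_h0 = abs(t) > T_CRITICAL
print(f"t = {t:.2f} with {n - 2} d.f.; reject H0: {reject_h0}")
```

Here t ≈ 3.35 > 2.101, so H0: ρ = 0 is rejected at the 5% level: the data suggest that price and quantity sold are correlated.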
Spearman Rank Correlation Coefficient
• It is a non-parametric measure of correlation.

• This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.

• The Spearman rank correlation coefficient can be computed in the following cases:

   • Both variables are quantitative.

   • Both variables are qualitative ordinal.

   • One variable is quantitative and the other is qualitative ordinal.

Why choose the Spearman rank correlation coefficient instead of the Pearson correlation coefficient?

• Both X and Y are measured in terms of ranks.

• The sample size is small.

• X and Y are continuous but are not normally distributed (e.g. they are severely skewed).
Formula of Spearman’s rank correlation coefficient:

$$r_s = 1 - \frac{6\sum_{i=1}^{N} D_i^2}{N(N^2 - 1)}$$

where Di = rank(Xi) − rank(Yi) for i = 1, 2, …, N, and −1 ≤ rs ≤ 1.
1. Rank the values of X and Y in ascending or descending order
(using the same direction for both variables).
2. Compute Di for each pair of observations by subtracting the
rank of Yi from the rank of Xi.
3. Square each Di and compute ∑Di², the sum of the squared values.
4. Compute N(N² − 1), where N is the sample size.
5. Substitute ∑Di² and N(N² − 1) into the formula to obtain rs.
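The procedure above can be sketched in Python. Assumptions: tied values receive the average of their ranks (as in the worked example that follows), and both variables are ranked in the same direction; the function names `average_ranks` and `spearman_rs` are illustrative. When there are ties, this classic formula is only an approximation; the exact alternative is Pearson's r computed on the ranks.

```python
def average_ranks(values, descending=True):
    """Rank values (1 = largest when descending); ties share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rs(xs, ys, descending=True):
    """r_s = 1 - 6 * sum(D_i^2) / (N * (N^2 - 1)), with D_i = rank(X_i) - rank(Y_i)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, with at least 2 pairs")
    rank_x = average_ranks(xs, descending)
    rank_y = average_ranks(ys, descending)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(average_ranks([10, 10, 8]))
print(spearman_rs([1, 2, 3, 4], [10, 20, 30, 40]))
print(spearman_rs([1, 2, 3, 4], [40, 30, 20, 10]))
```

Identical rank orders give rs = 1 and exactly reversed orders give rs = −1, the two ends of the range.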
Example 1
In a study of the relationship between level of education and
income, the following data were obtained. Find the relationship
between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
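Solution
(Rank 1 = highest level of education and highest income; tied values receive the average of their ranks.)

Sample   Level of education (X)   Income (Y)   Rank X   Rank Y   di     di²
A        Preparatory              25           5        3        2      4
B        Primary                  10           6        5.5      0.5    0.25
C        University               8            1.5      7        −5.5   30.25
D        Secondary                10           3.5      5.5      −2     4
E        Secondary                15           3.5      4        −0.5   0.25
F        Illiterate               50           7        2        5      25
G        University               60           1.5      1        0.5    0.25

∑di² = 64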
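A Python sketch that reproduces the ranks and ∑di² = 64 of this solution and completes it with rs. The numeric coding of the education levels (`EDUCATION_LEVEL`, Illiterate = 1 up to University = 5) is a hypothetical choice made only to match the rank order used in the table.

```python
# Hypothetical numeric coding of the ordinal education levels (higher = more education).
EDUCATION_LEVEL = {"Illiterate": 1, "Primary": 2, "Preparatory": 3, "Secondary": 4, "University": 5}


def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


# Samples A to G.
education = ["Preparatory", "Primary", "University", "Secondary", "Secondary", "Illiterate", "University"]
income = [25, 10, 8, 10, 15, 50, 60]

rank_x = average_ranks([EDUCATION_LEVEL[e] for e in education])
rank_y = average_ranks(income)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
n = len(income)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rank_x, rank_y)
print(f"sum of d^2 = {sum_d2}, rs = {rs:.3f}")
```

This gives rs = 1 − 384/336 ≈ −0.143: a weak negative (inverse) relationship between level of education and income in this sample. Because several ranks are tied, the formula gives an approximate value here.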
Problem II
The following are the ranks given by two judges, X and Y, to 12
contestants (A to L) in a singing competition. Find out whether
the judges are in agreement.

Contestant   A   B   C   D    E   F    G   H   I    J   K   L
X            1   9   2   10   3   11   8   4   12   7   5   6
Y            2   9   1   7    4   10   8   3   12   6   5   11
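Because the judges' ranks are given directly and contain no ties, the Spearman formula can be applied without a ranking step. A minimal Python sketch (variable names are illustrative):

```python
# Ranks given by the two judges to contestants A to L (from the table above).
judge_x = [1, 9, 2, 10, 3, 11, 8, 4, 12, 7, 5, 6]
judge_y = [2, 9, 1, 7, 4, 10, 8, 3, 12, 6, 5, 11]

n = len(judge_x)
d = [x - y for x, y in zip(judge_x, judge_y)]  # D_i = rank by X minus rank by Y
sum_d2 = sum(di ** 2 for di in d)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"sum of D^2 = {sum_d2}, rs = {rs:.3f}")
```

Here ∑D² = 40 and rs = 1 − 240/1716 ≈ 0.86, a high positive value, indicating that the two judges largely agree.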
Some of the Many Types of Correlation Coefficients

Name                         X variable       Y variable
Pearson r                    Interval/Ratio   Interval/Ratio
Spearman rho                 Ordinal          Ordinal
Kendall's tau                Ordinal          Ordinal
Phi                          Dichotomous      Dichotomous
Intraclass R (test-retest)   Interval/Ratio   Interval/Ratio
Point-biserial r             Interval/Ratio   Dichotomous
