Topic 3 Regression & Correlation

Probability & Statistical Modelling

AQ077-3-2-PSMOD and Version 1

Regression and Correlation


Topic & Structure of The Lesson

• Introduction
• Regression
• Correlation

AQ077-3-2 Probability and Statistical Modelling Regression and Correlation


Learning Outcomes

At the end of this section, you should be able to:

• Analyse bivariate data using the regression and correlation techniques used to measure the linear relationship between two variables.

Key Terms You Must Be Able To Use
If you have mastered this topic, you should be able to use the following terms correctly in your assignments and exams:
(Prepare your own list.)

• Regression equation
• Least squares
• Slope
• Y-intercept
• Pearson product moment correlation
• Coefficient of determination
• Spearman rank correlation

Introduction

• Correlation and regression are concerned with measuring the linear relationship between two variables.
• A scattergram (scatter plot) is used to illustrate any relationship between two variables.

• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
  ◦ It is concerned only with the strength of the relationship.
  ◦ No causal effect is implied.

Examples of scatter plots

Regression
• Regression is concerned with obtaining a mathematical equation which describes the relationship between two variables.
  ◦ The independent variable is the one that is chosen freely or occurs naturally.
  ◦ The dependent variable occurs as a consequence of the value of the independent variable.
• It is normally used for estimation purposes.

Types of Regression Models

• Three common methods are used to determine a regression line:
  ◦ inspection method
  ◦ semi-average method
  ◦ least squares method

• Least squares method
  ◦ The standard method of obtaining a regression line.
• For any set of bivariate data, two regression lines can be obtained:
  ◦ x on y regression line: used for estimating x given a value of y.
  ◦ y on x regression line: used for estimating y given a value of x.
• Note that for this syllabus, only the y on x regression line is dealt with.

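• If the least squares equation is given by y = a + bx, then

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x}$$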
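The y on x least squares slope and intercept can be computed directly from the column sums Σx, Σy, Σxy and Σx². Below is a minimal Python sketch of that calculation. It is not part of the original course material, and the function name `least_squares` is our own.

```python
def least_squares(xs, ys):
    """Return (a, b) for the y on x least squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y-bar - b * x-bar, written with sums
    a = (sum_y - b * sum_x) / n
    return a, b


# Points lying exactly on y = 2x should give a = 0 and b = 2
print(least_squares([1, 2, 3], [2, 4, 6]))  # (0.0, 2.0)
```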

Example 1
• The following table shows the amount spent on advertising and the corresponding sales of the product for 6 companies.

Company   Advertising cost ($000)   Sales ($000)
A         8                         25
B         12                        35
C         11                        29
D         5                         24
E         14                        38
F         3                         12

(a) Plot a scattergram showing the relationship between advertising cost and sales of the product.
(b) Calculate the equation of the regression line of sales on advertising costs. Draw the regression line on the scattergram.
(c) Use the regression line to forecast sales if advertising costs were
    (i) $10,000
    (ii) $1,000
(d) Justify your answer in part (c)(ii).
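You can check the arithmetic for parts (b) to (d) with the Python sketch below. It applies the least squares formulas to the Example 1 data. Costs and sales are both in $000, so $10,000 corresponds to x = 10 and $1,000 to x = 1. The sketch also reports whether each forecast is an interpolation or an extrapolation. The helper name `fit_line` is our own.

```python
# Example 1 data (both columns in $000)
advertising = [8, 12, 11, 5, 14, 3]
sales = [25, 35, 29, 24, 38, 12]


def fit_line(xs, ys):
    """Least squares y on x line; returns (a, b) for y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


a, b = fit_line(advertising, sales)
print(f"Sales = {a:.3f} + {b:.3f} x Advertising")  # Sales = 8.965 + 2.061 x Advertising

for cost in (10, 1):  # (c)(i) $10,000 and (c)(ii) $1,000
    forecast = a + b * cost
    # A forecast is only an interpolation if x lies within the observed range
    inside = min(advertising) <= cost <= max(advertising)
    note = "interpolation" if inside else "extrapolation, treat with caution"
    print(f"x = {cost}: forecast sales = {forecast:.2f} ($000), {note}")
```

The cost x = 1 lies below the smallest observed cost (x = 3). The forecast in (c)(ii) is therefore an extrapolation, which is presumably the point part (d) asks you to justify.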
Correlation
• Correlation is a technique used to measure the strength of the relationship between two variables by measuring the degree of 'scatter' of the data values.
• The less scattered the data values are, the stronger the correlation.
• Two types of correlation:
  ◦ Positive (direct)
  ◦ Negative (inverse)

• Measures of correlation:
  ◦ Product moment correlation coefficient
  ◦ Coefficient of determination
  ◦ Spearman rank correlation coefficient

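Product moment correlation coefficient, r
• It measures the extent to which two variables move in sympathy with, or in opposition to, one another.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$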
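The product moment coefficient can be computed from the column sums in the usual form: r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²][nΣy² − (Σy)²]). The Python sketch below assumes that form and uses only the standard library. The name `pearson_r` is our own.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient r from the summation formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```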

• The correlation coefficient r lies between −1 and +1, i.e. $-1 \le r \le +1$.
• When r = 0, there is no (linear) correlation present.
• When r = +1, there is perfect positive correlation.
• When r = −1, there is perfect negative correlation.
• The further r is from 0, the stronger the correlation.

Examples of approximate r values

Coefficient of determination, r²
• It indicates the proportion of the variance in the dependent variable that is explained statistically by knowledge of the independent variable (and vice versa).
• Notice that, since $-1 \le r \le +1$, it follows that $0 \le r^2 \le +1$.
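The sketch below illustrates the "proportion of variance explained" reading. It fits a least squares line to a small made-up data set, which is not from the slides. It then checks that r² equals 1 − SSE/SST. Here SSE is the sum of squared residuals about the line, and SST is the total sum of squares about ȳ. The names `sse` and `sst` are our own labels.

```python
import math

# Hypothetical illustrative data (not from the original slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)

# Least squares line y = a + b*x and product moment r
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Unexplained (residual) variation and total variation in y
y_bar = sy / n
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)

print(round(r ** 2, 4), round(1 - sse / sst, 4))  # 0.6 0.6
```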

Example 2
Using the data of Example 1, calculate the
(i) Product moment correlation coefficient
(ii) Coefficient of determination
and interpret the result.
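The Python sketch below is one way to check the arithmetic for Example 2. It applies the summation formula for r to the Example 1 data and then squares the result. The interpretation that follows it is our own reading, not an answer key from the slides.

```python
import math

# Example 1 data (both in $000)
advertising = [8, 12, 11, 5, 14, 3]
sales = [25, 35, 29, 24, 38, 12]

n = len(advertising)
sx, sy = sum(advertising), sum(sales)
sxy = sum(x * y for x, y in zip(advertising, sales))
sxx = sum(x * x for x in advertising)
syy = sum(y * y for y in sales)

r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(f"r = {r:.3f}")         # r = 0.951
print(f"r^2 = {r ** 2:.3f}")  # r^2 = 0.904
```

The sums are Σx = 53, Σy = 163, Σxy = 1627, Σx² = 559 and Σy² = 4855. The resulting r is close to +1, which suggests a strong positive linear relationship. r² ≈ 0.90 would mean that roughly 90% of the variation in sales is explained by variation in advertising cost.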

Spearman rank correlation coefficient, rs
• It can be used:
  ◦ as an approximation to the product moment coefficient
  ◦ with non-numeric data that can be ranked

• Procedure for obtaining rs:
  ◦ Rank the x values, rx.
  ◦ Rank the y values, ry.
  ◦ For each pair of ranks, calculate d² = (rx − ry)².
  ◦ Calculate Σd².
• The value of the rank correlation coefficient can then be calculated as below, where n is the number of pairs:

$$r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$
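The ranking procedure can be sketched in Python using rs = 1 − 6Σd² / (n(n² − 1)). The sketch assumes that tied values receive the average of the ranks they occupy, which is a common convention. The helper names `average_ranks` and `spearman_rs` are hypothetical. Ranking smallest-first or largest-first gives the same rs, provided both variables are ranked the same way.

```python
def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Spearman's rs = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(spearman_rs([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_rs([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```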

Example 3
In a survey of TV viewers in Sabah and KL, the following programmes were ranked in order of preference. Calculate Spearman's rank correlation coefficient for the data. Comment on the result.

TV programme    Sabah   KL
Biggest Loser   1       2
Amazing Race    2       5
TV3 news        4       4
Hero            3       6
24              5       1
CSI             7       7
Ultraman        6       3
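The Example 3 data are already ranks, so d is simply the Sabah rank minus the KL rank. The short Python sketch below evaluates Σd² and rs as a check on hand working.

```python
# Example 3: preference ranks, listed in the same programme order as the table
sabah = [1, 2, 4, 3, 5, 7, 6]
kl = [2, 5, 4, 6, 1, 7, 3]

n = len(sabah)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(sabah, kl))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2)        # 44
print(round(rs, 4))  # 0.2143
```

An rs of about 0.21 would suggest only weak positive agreement between the Sabah and KL preference orders.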

Example 4
• The following table shows the marks of eight pupils in biology and chemistry. Find the value of Spearman's coefficient of rank correlation.

Biology (x)     65  65  70  75  75  80  85  85
Chemistry (y)   50  55  58  55  65  58  61  65
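Example 4 contains tied marks. The Python sketch below assumes a common approach: give tied values the average of the ranks they would occupy, then apply the same rs formula. With ties, this formula is only an approximation, and some texts add a correction term that is not shown here. The helper `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


biology = [65, 65, 70, 75, 75, 80, 85, 85]
chemistry = [50, 55, 58, 55, 65, 58, 61, 65]

rx = average_ranks(biology)    # [1.5, 1.5, 3.0, 4.5, 4.5, 6.0, 7.5, 7.5]
ry = average_ranks(chemistry)  # [1.0, 2.5, 4.5, 2.5, 7.5, 4.5, 6.0, 7.5]
n = len(biology)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, rs)  # 21.0 0.75
```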

Comparison of rank and product moment correlation
(with (+) and (-) signifying whether the feature can be thought of as an advantage or a disadvantage respectively)
• Product moment coefficient
  ◦ The standard measure of correlation. (+)
  ◦ Data must be numeric. (-)
  ◦ The calculations can be awkward. (-)

• Rank coefficient
  ◦ Only an approximation to the product moment coefficient. (-)
  ◦ Easier to use, with less involved calculations. (+)
  ◦ Can be used with non-numeric data. (+)
  ◦ Can be insensitive to small changes in actual values. (-)

Practical difficulties in drawing conclusions from the correlation coefficient
• A high correlation coefficient does not necessarily imply that the variables are related to one another (spurious correlation).
• A low correlation coefficient between two variables does not necessarily mean that there is little relationship between them; additional factors may also be exerting an influence.

EXCEL
=SLOPE(known_y's, known_x's)
=INTERCEPT(known_y's, known_x's)
=PEARSON(array1, array2)

Summary of Main Teaching Points

• Regression
  ◦ It is concerned with producing a mathematical function which describes the relationship between two variables. It is normally used for estimation purposes.

  ◦ There are three common methods used to determine a regression line for a set of bivariate data:
    ▪ Inspection
    ▪ Method of semi-averages
    ▪ Method of least squares. This is the standard technique for obtaining a regression line.

  ◦ In most examination questions the bivariate variables involved will be labelled (usually x and y) and the regression line of y on x will be asked for. Where this is not the case, it is usual to label the independent variable as x and the dependent variable as y, so the y on x regression line will be appropriate.
  ◦ Please take note that other forms of regression are sometimes appropriate and calculated, e.g. curvilinear regression.

  ◦ A dependent variable can be affected by more than one independent variable.
  ◦ Interpolation involves estimating a value of the dependent variable, given a value of the independent variable within the range of the data used to calculate the regression line, and can be carried out with some confidence. Estimation outside this range is known as extrapolation, and the results should be treated with caution.

• Correlation
  ◦ Concerned with describing how well two variables are associated by measuring the degree of 'scatter' of the data values.
  ◦ Two types:
    ▪ Positive (direct): increases in one variable are associated with increases in the other.
    ▪ Negative (inverse): increases in one variable are associated with decreases in the other.

  ◦ A quantitative measure of correlation is given by the (product moment) correlation coefficient, r.
    ▪ $-1 \le r \le 1$
    ▪ r = −1 signifies perfect negative correlation
    ▪ r = 0 signifies no (linear) correlation
    ▪ r = +1 signifies perfect positive correlation
  ◦ The product moment correlation coefficient is the standard measure of correlation for numeric data. It cannot be calculated for non-numeric data.

  ◦ The coefficient of determination, r², is used to indicate the proportion of the total variation in the dependent variable (y) that is explained by variations in the independent variable (x).
  ◦ Spearman's rank correlation coefficient can be used:
    ▪ as an approximation to the product moment coefficient
    ▪ with non-numeric data that can be ranked.
  ◦ Correlation does not necessarily imply causality.

Question and Answer Session

Q&A

What we will cover next

• Probability Distribution
