Chapter 6: Simple Linear Regression

Sub-Topics
Introduction.
Scatter plots.
Simple linear regression model.
The least squares method.
Inference on regression coefficients.
Confidence intervals of the regression line.
Coefficient of determination.
Pearson correlation coefficient.
Chapter Learning Outcome
Solve problems involving simple linear regression.
Learning Objectives
By the end of this chapter, students should be able to:
Draw scatter plots.
Plot the regression line using the least squares method.
Make inferences concerning the regression coefficients.
Find and interpret the coefficient of determination and the correlation coefficient.
Key Terms (English to Bahasa Melayu)
No.   English                     Bahasa Melayu
1.    Independent variable        Pemboleh ubah tidak bersandar
2.    Dependent variable          Pemboleh ubah bersandar
3.    Scatter plot                Plot serakan
4.    Intercept                   Pintasan
5.    Slope                       Kecerunan
6.    Simple linear regression    Regresi linear ringkas
7.    Least squares method        Kaedah kuasa dua terkecil
8.    Correlation                 Hubungan
9.    Confidence interval         Selang keyakinan
6.1 Introduction
A major objective of many statistical investigations is to establish relationships that make it possible to predict one or more variables in terms of others. Thus, studies are made to predict the potential sales of a new product in terms of its price, a patient's weight in terms of the number of weeks he or she has been on a diet, family expenditures on entertainment in terms of family income, the per capita consumption of certain foods in terms of their nutritional values and the amount of money spent advertising them on television, and so forth.
Although it is desirable to be able to predict one quantity exactly in terms of
others, this is seldom possible, and in most instances we have to be satisfied with
predicting averages or expected values. We may not be able to predict exactly how
much money Aida will make 10 years after graduating from college, but if we are
given suitable data, we can predict the average income of a college graduate in terms
of the number of years she has been out of college.
6.2 Scatter Plots
Definition 1
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
Theory 1
In simple correlation and regression studies, the researcher collects data on two numerical (quantitative) variables to see whether a relationship exists between them. The two variables are called the independent variable and the dependent variable. The independent variable, x, is the variable in regression that can be controlled; it is the variable used to predict or model y. The dependent variable, y, is the variable in regression that cannot be controlled; it is the variable to be predicted or modeled.
For example, suppose a researcher wishes to see whether there is a relationship between the number of hours spent studying and the test score in an exam. In this case, the independent variable is the number of hours of study, while the dependent variable is the student's test score. The reason for this choice is that the test score depends on the number of hours studied, and a student can control how many hours he or she studies for the exam.
Example 1
Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class. The data are shown below:
Student                   A    B    C    D    E    F    G
Number of absences, x     6    2    15   9    12   5    8
Final grade, y (%)        82   86   43   74   58   90   78
Answer Example 1
1. Draw and label the x and y axes.
2. Plot each point on the graph.
[Figure: Scatter plot for Example 1, final grade y (%) against number of absences x]
Example 2
Suppose an experiment involving five subjects is conducted to determine the
relationship between the percentage of a certain drug in the bloodstream and the
length of time it takes to react to a stimulus.
Subject                       1    2    3    4    5
Amount of drug, x (%)         1    2    3    4    5
Reaction time, y (seconds)    1    1    2    2    4
Answer Example 2
1. Draw and label the x and y axes.
2. Plot each point on the graph.
[Figure: Scatter plot for Example 2, reaction time y (seconds) against amount of drug x (%)]
Exercise 6.2
1. A researcher wishes to determine whether a person's age is related to the number of hours he or she exercises per week. Draw a scatter plot for the variables. The data for the sample are shown below.
Age, x      18   26   32   38   52    59
Hours, y    10   5    2    3    1.5   1

2. The number of calories and the number of milligrams of cholesterol for a random sample of fast-food chicken sandwiches from six cafés are shown below. Draw a scatter plot for the variables.
Calories, x       390   535   720   300   430   500
Cholesterol, y    43    45    80    50    55    52

3. Various doses of a poisonous substance were given to five mice and the following results were observed. Draw a scatter plot for the variables.
Dose, x (mg)        …   10   12   14   16
No. of deaths, y    …   14   16   20

4. A researcher wants to know whether the typing speed of a secretary (in words per minute) is related to the time (in hours) that it takes the secretary to learn to use a new word processing program. Draw a scatter plot for the variables. The data are shown below.
Speed, x    48   74   52   79    83   56   85    63   88    74    90    92
Time, y     7    4    8    3.5   2    6    2.3   5    2.1   4.5   1.9   1.5

5. The following data pertain to the chlorine residual in a swimming pool at various times after it has been treated with chemicals. Draw a scatter plot for the variables.
No. of hours, x                              2     4     6     8     10    12
Chlorine residual (parts per million), y     1.8   1.5   1.4   1.1   1.1   0.9

6. Mehta and Deopura (1995) studied the mechanical properties of spun PET/LCP blend fibers. They believe that the modulus (the response) depends on the percentage of PET in the blend. The data are given in the table below. Make a scatter plot of the data.
PET %, x      100    97.5   95     90     80     50     0
Modulus, y    2.12   2.26   2.57   3.26   3.46   4.54   8.5

7. The job placement center at State University wants to determine whether students' grade point averages (GPAs) can explain the number of job offers they receive upon graduation. The data seen here are for 10 recent graduates. Draw a scatter plot for the variables.
GPA, x       3.25   2.35   1.02   0.36   3.69   2.65   2.15   1.25   3.88   3.37
Offers, y    …

8. Dr. Ahmad has noticed that many of his students have been absent from class this semester. He feels that he can explain this sluggish attendance by the distance his students live from campus. Eleven students are asked how many miles they must travel to attend class and how many classes they have missed. Draw a scatter plot for the variables given in the table below.
Miles, x     12   16   …
Misses, y    …

9. Ten salespeople were surveyed, and the average number of client contacts per month, x, and the sales volume, y (in thousands), were recorded for each. Draw the scatter plot for the variables.
Client contacts, x    12   14   16   20   23   46   50   48   50    55
Sales volume, y       15   25   30   30   30   80   90   95   110   130

10. The following are loads (grams) put on the ends of like plastic rods, with the resulting deflections (cm). Draw the scatter plot for the variables.
Load, x (g)           25     30     35     40     55     45     50     60
Deflection, y (cm)    1.58   1.39   1.41   1.60   1.81   1.78   1.65   1.94

11. The following are the sample data provided by a moving company on the weights of six shipments and the damage that was incurred. Draw the scatter plot for the variables.
Weight (1000 pounds), x    1.6   1.2   3.4   4.8   …
Damage (dollars), y        160   112   69    90    123   186

12. The following data pertain to the demand for a product (in thousands of units) and its price (in cents) charged in five different market areas. Draw the scatter plot for the variables.
Price, x     20   16   10    11   14
Demand, y    22   41   120   89   56

13. To reduce crime, the president has budgeted more money to put more police on our city streets. Use the data below to draw a scatter plot for the variables.
Police, x                    13   15   23   25   15   10   20   …
No. of reported crimes, y    12   18   10   …

14. Aunt Reeta wants to get a higher yield from her tomato plants this summer by increasing the number of times she uses fertilizer. Based on the data below, draw a scatter plot for the variables.
Use of fertilizer, x    …
Yield (pounds), y       12   20   15   17   …

15. The residents of Taman Seri are worried about a rise in housing costs in the area. The residents' leader thinks that home prices fluctuate with land values. Data on 10 recently sold homes and the cost of the land on which they were built are given here in thousands of ringgit. Draw a scatter plot for the variables.
Land values, x          7.0   6.9   5.5   3.7   5.9   3.8   8.9   9.6   9.9   10.0
Cost of the house, y    67    63    60    54    58    36    76    87    89    92
Answer Exercise 6.2
1. [Scatter plot: hours, y against age, x]
2. [Scatter plot: cholesterol, y against calories, x]
3. [Scatter plot: no. of deaths, y against dose, x]
4. [Scatter plot: time, y against speed, x]
5. [Scatter plot: chlorine residual, y against no. of hours, x]
6. [Scatter plot: modulus, y against PET %, x]
7. [Scatter plot: offers, y against GPA, x]
8. [Scatter plot: misses, y against miles, x]
9. [Scatter plot: sales volume, y against no. of clients, x]
10. [Scatter plot: deflection, y against load, x]
11. [Scatter plot: damage (dollars), y against weight (pounds), x]
12. [Scatter plot: demand, y against price, x]
13. [Scatter plot: no. of reported crimes, y against police, x]
14. [Scatter plot: yield (pounds), y against use of fertilizer, x]
15. [Scatter plot: cost of the house, y against land values, x]
6.3 Simple Linear Regression Model
Definition 2
Simple linear regression is a statistical technique for modeling the linear relationship between an independent variable x and a dependent variable y, for the purpose of predicting values of y. It enables us to see the trend in the data and to make predictions on the basis of the data.
Theory 2
Given a scatter plot, one must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum. The closer the points are to the line, the better the fit and the prediction will be.
[Figure: line of best fit $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, showing an observed point $(x, y)$ and the corresponding predicted point $(x, \hat{y})$ on the line]
From the graph above, the error is approximated by the residual $e = y - \hat{y}$, the difference between the observed value of y and the predicted value of y, $\hat{y}$, at a given value of x.
The model for simple linear regression is $y = \beta_0 + \beta_1 x + \varepsilon$, where
y = dependent or response variable (variable to be modeled)
x = independent or predictor variable (variable used as a predictor of y)
$\beta_0$ = y-intercept of the line (the point at which the line intersects or cuts through the y-axis)
$\beta_1$ = slope of the line (the amount of increase (or decrease) in the deterministic component of y for every 1-unit increase in x)
$\varepsilon$ = statistical error (random variable that accounts for the failure of the model to fit the data exactly)
This regression model is said to be simple, linear in the parameters, and linear in the predictor variable. It is simple because there is only one predictor variable; linear in the parameters because no parameter appears as an exponent or is multiplied or divided by another parameter; and linear in the predictor variable because this variable appears only to the first power.
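The roles of the deterministic component $\beta_0 + \beta_1 x$ and the random error $\varepsilon$ can be illustrated with a small simulation. The sketch below assumes normally distributed errors and uses hypothetical parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$) chosen only for illustration, together with `random.Random.gauss` and `statistics.mean` from the Python standard library. Averaging many simulated responses at a fixed x recovers the deterministic part, and the difference between the averages at x = 5 and x = 4 approximates the slope $\beta_1$.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0


def draw_y(x, rng):
    """Draw one response from y = beta0 + beta1*x + epsilon, epsilon ~ N(0, sigma^2)."""
    deterministic = BETA0 + BETA1 * x    # straight-line (deterministic) component
    epsilon = rng.gauss(0.0, SIGMA)      # random error term
    return deterministic + epsilon


rng = random.Random(42)                  # fixed seed so the run is reproducible
mean_at_4 = statistics.mean(draw_y(4.0, rng) for _ in range(20000))
mean_at_5 = statistics.mean(draw_y(5.0, rng) for _ in range(20000))

print(f"average y at x = 4: {mean_at_4:.3f} (deterministic part: {BETA0 + BETA1 * 4:.1f})")
print(f"change per 1-unit increase in x: {mean_at_5 - mean_at_4:.3f} (beta1 = {BETA1})")
```

The individual responses scatter around the line, but their averages settle close to 4.0 and the per-unit change close to 0.5, which is exactly the interpretation of $\beta_0 + \beta_1 x$ and $\beta_1$ given above.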

6.4 The Least Squares Method
Definition 3
One way to judge how well a straight line fits a set of data is to note the extent to which the data points deviate from the line. The deviations (the differences between the observed and the predicted values of y), or errors of prediction, are the vertical distances between the observed and predicted values. The sum of the errors is not a useful measure on its own, because positive and negative errors cancel; the sum of squares of the errors (SSE) avoids this and gives greater emphasis to large deviations of the points from the line. It is possible to find many lines for which the sum of errors is equal to 0, but it can be shown that there is one (and only one) line for which SSE is a minimum. This line is called the least squares line or the regression line, and the methodology used to obtain it is called the least squares method.
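The claim that many lines give a zero sum of errors while only one minimizes SSE can be checked numerically. The sketch below uses the data of Example 2 and evaluates several lines that all pass through the point $(\bar{x}, \bar{y})$. Each has a sum of errors of zero (up to floating-point rounding), but their SSE values differ, and among the slopes tried the smallest SSE occurs at 0.7, which is $S_{xy}/S_{xx} = 7/10$ for these data. The helper names `errors` and `sse` are our own.

```python
# Example 2 data: amount of drug x (%) and reaction time y (seconds)
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def errors(b0, b1):
    """Vertical deviations e_i = y_i - (b0 + b1*x_i) from the line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(b0, b1):
    """Sum of squares of the errors for the line y-hat = b0 + b1*x."""
    return sum(e ** 2 for e in errors(b0, b1))


# Every line through (x_bar, y_bar) has intercept b0 = y_bar - b1*x_bar.
for b1 in [0.0, 0.5, 0.7, 1.0]:
    b0 = y_bar - b1 * x_bar
    total = sum(errors(b0, b1))
    print(f"slope {b1:.1f}: |sum of errors| = {abs(total):.2f}, SSE = {sse(b0, b1):.2f}")
```

The zero sum of errors cannot distinguish between these lines, while SSE can; this is why the least squares criterion uses squared errors.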
Theory 3
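Given the sample data $(x_i, y_i),\ i = 1, 2, \ldots, n$, the coefficients of the least squares line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ are
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \ \text{(slope)} \qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \ \text{(y-intercept)},$$
where
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right),$$
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2,$$
and n = sample size.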
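The formulas of Theory 3 translate directly into code. The sketch below (plain Python with no libraries; the function name `least_squares` is our own) applies the computational forms of $S_{xy}$ and $S_{xx}$ to the data of Example 3. It keeps full precision throughout, so it reports an intercept of about 0.490 rather than the 0.4911 of Answer Example 3, where the slope is rounded to 0.2724 before $\hat{\beta}_0$ is computed; the slope (0.2724) and the prediction at 38% humidity (about 10.84) agree.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x.

    Uses the computational forms of Sxy and Sxx given in Theory 3.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = sxy / sxx                       # slope
    b0 = sum_y / n - b1 * (sum_x / n)    # intercept: y-bar minus b1 times x-bar
    return b0, b1


# Example 3: relative humidity x (%) and moisture content y (%) on 12 days
humidity = [46, 53, 37, 42, 34, 29, 60, 44, 41, 48, 33, 40]
moisture = [12, 14, 11, 13, 10, 8, 17, 12, 10, 15, 9, 13]

b0, b1 = least_squares(humidity, moisture)
print(f"y-hat = {b0:.4f} + {b1:.4f} x")
print(f"predicted moisture at 38% humidity: {b0 + b1 * 38:.4f}")
```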
Example 3
Raw material used in the production of a synthetic fiber is stored in a place that has no humidity control. Measurements of the relative humidity and the moisture content of samples of the raw material (both in percentage) on 12 days yielded the following results:
Humidity (x)            46   53   37   42   34   29   60   44   41   48   33   40
Moisture content (y)    12   14   11   13   10   8    17   12   10   15   9    13
(a) Fit a least squares line that will enable us to predict the moisture content in terms of the relative humidity. Interpret the result.
(b) Estimate the moisture content when the relative humidity is 38 percent.
Answer Example 3
(a) From the data we get
$\sum x = 507$, $\sum x^2 = 22265$, $\sum y = 144$, $\sum y^2 = 1802$, $\sum xy = 6314$, $n = 12$.
Thus,
$S_{xy} = 6314 - \frac{1}{12}(507)(144) = 230$ and $S_{xx} = 22265 - \frac{1}{12}(507)^2 = 844.25$,
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{230}{844.25} = 0.2724$ and
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{144}{12} - (0.2724)\frac{507}{12} = 0.4911$,
and the equation of the least squares line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is
$\hat{y} = 0.4911 + 0.2724x$.
When the relative humidity increases by one percentage point, the predicted moisture content increases by 0.2724 percentage points.
(b) Substituting x = 38 into the equation obtained in (a), we get
$\hat{y} = 0.4911 + 0.2724(38) = 10.8423$, or $\hat{y} \approx 11$, rounded to the nearest unit.
Example 4
The following are the scores that 12 students obtained on the mid-term and final examinations in a course in statistics.
Mid-term examination, x    71   49   80   73   93   85   58   82   64   32   87   80
Final examination, y       83   62   76   77   89   74   48   78   76   51   73   89
(a) Find the equation of the least squares line that will enable us to predict a student's final examination score in this course on the basis of his or her score in the mid-term examination. Interpret the result.
(b) Predict the final examination score of a student who scores 84 in the mid-term examination.
Answer Example 4
(a) From the data we get
$\sum x = 854$, $\sum x^2 = 64222$, $\sum y = 876$, $\sum y^2 = 65850$, $\sum xy = 64346$, $n = 12$.
Thus,
$S_{xy} = 64346 - \frac{1}{12}(854)(876) = 2004$ and $S_{xx} = 64222 - \frac{1}{12}(854)^2 = 3445.67$,
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{2004}{3445.67} = 0.5816$ and
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{876}{12} - (0.5816)\frac{854}{12} = 31.609$,
and the equation of the least squares line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is
$\hat{y} = 31.609 + 0.5816x$.
When the mid-term examination score increases by one mark, the predicted final examination score increases by 0.5816 marks.
(b) Substituting x = 84 into the equation obtained in (a), we get
$\hat{y} = 31.609 + 0.5816(84) = 80.4634$, or $\hat{y} \approx 80$, rounded to the nearest unit.
Exercise 6.4
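1. From Exercise 6.2(1), find the regression line using the least squares method. Interpret the result. Then, estimate the number of hours a person exercises per week when he or she is 50 years old.
2. From Exercise 6.2(2), find the regression line using the least squares method. Interpret the result. Then, estimate the number of milligrams of cholesterol when the number of calories is 650.
3. From Exercise 6.2(3), find the regression line using the least squares method. Interpret the result. Then, estimate the number of deaths when a 5 mg dose of the poison is given to the mice.
4. From Exercise 6.2(4), find the regression line using the least squares method. Interpret the result. Then, estimate the time it takes the secretary to learn the program when the typing speed is 100 words per minute.
5. From Exercise 6.2(5), find the regression line using the least squares method. Interpret the result. Then, estimate the chlorine residual in the swimming pool 13 hours after it has been treated with chemicals.
6. From Exercise 6.2(6), find the regression line using the least squares method. Interpret the result. Then, estimate the modulus when the blend contains 88% PET.
7. From Exercise 6.2(7), find the regression line using the least squares method. Interpret the result. Then, estimate the number of job offers when a student's GPA is 2.98.
8. From Exercise 6.2(8), find the regression line using the least squares method. Interpret the result. Then, estimate the number of classes a student will miss when he or she lives 15 miles from campus.
9. From Exercise 6.2(9), find the regression line using the least squares method. Interpret the result. Then, estimate the sales volume when the number of client contacts is 60.
10. From Exercise 6.2(10), find the regression line using the least squares method. Interpret the result. Then, estimate the deflection when the load is 65 grams.
11. From Exercise 6.2(11), find the regression line using the least squares method. Interpret the result. Then, estimate the damage incurred when the weight is 5500 pounds.
12. From Exercise 6.2(12), find the regression line using the least squares method. Interpret the result. Then, estimate the demand for the product when the price is 50 cents.
13. From Exercise 6.2(13), find the regression line using the least squares method. Interpret the result. Then, estimate the number of reported crimes when there are 19 police officers.
14. From Exercise 6.2(14), find the regression line using the least squares method. Interpret the result. Then, estimate the yield of her tomato plants when she uses fertilizer 10 times.
15. From Exercise 6.2(15), find the regression line using the least squares method. Interpret the result. Then, estimate the cost of the house when the land value is RM 73000.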
Answer Exercise 6.4
1. (a) $\hat{y} = 10.4989 - 0.17997x$, (b) $\hat{y} = 1.5004$
2. (a) $\hat{y} = 20.2369 + 0.07081x$, (b) $\hat{y} = 66.2634$
3. (a) $\hat{y} = -6.5357 + 1.625x$, (b) $\hat{y} = 1.5893$
4. (a) $\hat{y} = 14.083 - 0.1371x$, (b) $\hat{y} = 0.373$
5. (a) $\hat{y} = 1.8999 - 0.0857x$, (b) $\hat{y} = 0.7858$
6. (a) $\hat{y} = 8.221 - 0.0602x$, (b) $\hat{y} = 2.9234$
7. (a) $\hat{y} = -0.248 + 1.272x$, (b) $\hat{y} = 3.5426$
8. (a) $\hat{y} = 2.647 + 0.06974x$, (b) $\hat{y} = 3.6931$
9. (a) $\hat{y} = -13.4202 + 2.303x$, (b) $\hat{y} = 124.7598$
10. (a) $\hat{y} = 1.086 + 0.01314x$, (b) $\hat{y} = 1.9401$
11. (a) $\hat{y} = 34.146 + 29.729x$, (b) $\hat{y} = 197.6555$
12. (a) $\hat{y} = 196.775 - 9.238x$, (b) $\hat{y} = -265.125$
13. (a) $\hat{y} = -0.930 + 0.642x$, (b) $\hat{y} = 11.268$
14. (a) $\hat{y} = 4.8592 + 1.668x$, (b) $\hat{y} = 21.5392$
15. (a) $\hat{y} = 17.855 + 7.071x$, (b) $\hat{y} = 534.038$
6.5 Inference on Regression Coefficients
Definition 4
Inference on the regression coefficients describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable x and a dependent variable y. The test focuses on the slope of the regression line $y = \beta_0 + \beta_1 x$, where $\beta_0$ is a constant (the intercept), $\beta_1$ is the slope (also called the regression coefficient), x is the value of the independent variable, and y is the value of the dependent variable.
6.5.1 Hypothesis testing on slope, $\beta_1$
Definition 5
Hypothesis testing concerning $\beta_1$ and $\beta_0$ requires the additional assumption that the model errors $\varepsilon_i$ are normally distributed. Thus the complete assumptions are that the errors are normally and independently distributed (NID) with mean 0 and variance $\sigma^2$, written $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$.
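Theory 4
To test the hypothesis that the slope equals a constant, say C, the appropriate hypotheses are
$H_0: \beta_1 = C$
$H_1: \beta_1 \neq C$, or $\beta_1 > C$, or $\beta_1 < C$.
Since $\hat{\beta}_1 \sim N\left(\beta_1, \sigma^2 / S_{xx}\right)$, the statistic
$$Z_{\text{test}} = \frac{\hat{\beta}_1 - C}{\sqrt{\sigma^2 / S_{xx}}}$$
is distributed as $N(0, 1)$ if the null hypothesis $H_0: \beta_1 = C$ is true.
However, $\sigma^2$ is usually unknown. The residual mean square, $\mathrm{MSE} = \mathrm{SSE}/(n-2)$, is an unbiased estimator of $\sigma^2$, and the distribution of $(n-2)\,\mathrm{MSE}/\sigma^2$ is $\chi^2_{n-2}$. Since MSE and $\hat{\beta}_1$ are independent, replacing $\sigma^2$ in $Z_{\text{test}}$ by $\hat{\sigma}^2 = \mathrm{MSE}$ gives the statistic
$$T_{\text{test}} = \frac{\hat{\beta}_1 - C}{\sqrt{\mathrm{MSE} / S_{xx}}},$$
which is distributed as t with $n-2$ degrees of freedom if the null hypothesis $H_0: \beta_1 = C$ is true. For the two-tailed alternative $\beta_1 \neq C$, the observed value of $T_{\text{test}}$ is compared with the upper $\alpha/2$ percentage point of the $t_{n-2}$ distribution, $t_{\alpha/2,\,n-2}$, and the null hypothesis is rejected if $|T_{\text{test}}| > t_{\alpha/2,\,n-2}$. For a one-tailed alternative, $t_{\alpha,\,n-2}$ is used instead: reject $H_0$ if $T_{\text{test}} > t_{\alpha,\,n-2}$ (for $\beta_1 > C$) or if $T_{\text{test}} < -t_{\alpha,\,n-2}$ (for $\beta_1 < C$).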
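As a sketch of Theory 4, the hypothetical helper `slope_t_test` below computes $T_{\text{test}}$ for $H_0: \beta_1 = C$ from raw data, using $\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy}$ and $\mathrm{MSE} = \mathrm{SSE}/(n-2)$. It is run on the data of Example 4 with C = 5, the case worked by hand in Example 6, and gives a value of about $-30.22$. The critical value $t_{0.025,10} = 2.228$ is hard-coded from a t table; if SciPy is installed, `scipy.stats.t.ppf(0.975, 10)` returns the same quantile, but the sketch does not depend on it.

```python
import math


def slope_t_test(xs, ys, c):
    """Return (T_test, degrees of freedom) for H0: beta1 = c."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    sse = syy - b1 * sxy           # SSE = Syy - b1 * Sxy
    mse = sse / (n - 2)            # unbiased estimator of sigma^2
    return (b1 - c) / math.sqrt(mse / sxx), n - 2


# Example 4: mid-term scores x and final examination scores y
midterm = [71, 49, 80, 73, 93, 85, 58, 82, 64, 32, 87, 80]
final = [83, 62, 76, 77, 89, 74, 48, 78, 76, 51, 73, 89]

t_test, df = slope_t_test(midterm, final, c=5)
T_TABLE = 2.228   # t_{0.025,10} from a t table (two-tailed test, alpha = 0.05)

print(f"T_test = {t_test:.3f} with {df} degrees of freedom")
print("reject H0" if abs(t_test) > T_TABLE else "do not reject H0")
```

For a one-tailed test, the comparison in the last line would instead use $t_{\alpha,\,n-2}$ and the sign of $T_{\text{test}}$, as described in Theory 4.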
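Example 5
Based on Example 3, test the hypothesis $H_0: \beta_1 = 1$ against $H_1: \beta_1 > 1$ at the 0.05 level of significance.
Answer Example 5
Step 1: State the hypotheses.
$H_0: \beta_1 = 1$
$H_1: \beta_1 > 1$
Step 2: $\alpha = 0.05$, $v = n - 2 = 12 - 2 = 10$; this is a one-tailed (right-tailed) test.
$T_{\text{table}} = t_{\alpha,v} = t_{0.05,10} = 1.812$; reject $H_0$ when $T_{\text{test}}$ is greater than $T_{\text{table}}$.
Step 3: Compute MSE and $T_{\text{test}}$.
$S_{xy} = 230$, $S_{xx} = 844.25$, $\hat{\beta}_1 = 0.2724$ and $\hat{\beta}_0 = 0.4911$.
$S_{yy} = 1802 - \frac{144^2}{12} = 74$
$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = 74 - 0.2724(230) = 11.348$
$\mathrm{MSE} = \frac{\mathrm{SSE}}{n-2} = \frac{11.348}{10} = 1.1348$
$T_{\text{test}} = \frac{\hat{\beta}_1 - C}{\sqrt{\mathrm{MSE}/S_{xx}}} = \frac{0.2724 - 1}{\sqrt{1.1348/844.25}} = -19.8458$
Step 4: Make a decision.
Do not reject $H_0$, since $T_{\text{test}} = -19.8458$ is less than $T_{\text{table}} = 1.812$.
Step 5: Make a conclusion.
At the 0.05 level of significance, there is not enough evidence to conclude that the slope is greater than one.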
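Example 6
Based on Example 4, test the hypothesis $H_0: \beta_1 = 5$ against $H_1: \beta_1 \neq 5$ at the 0.05 level of significance.
Answer Example 6
Step 1: State the hypotheses.
$H_0: \beta_1 = 5$
$H_1: \beta_1 \neq 5$
Step 2: $\alpha = 0.05$, $\alpha/2 = 0.025$, $v = n - 2 = 12 - 2 = 10$; this is a two-tailed test.
$T_{\text{table}} = t_{\alpha/2,v} = t_{0.025,10} = 2.228$; reject $H_0$ when $T_{\text{test}}$ is greater than 2.228 or less than $-2.228$.
Step 3: Compute MSE and $T_{\text{test}}$.
$S_{xy} = 2004$, $S_{xx} = 3445.67$, $\hat{\beta}_1 = 0.5816$ and $\hat{\beta}_0 = 31.609$.
$S_{yy} = 65850 - \frac{876^2}{12} = 1902$
$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = 1902 - 0.5816(2004) = 736.4736$
$\mathrm{MSE} = \frac{\mathrm{SSE}}{n-2} = \frac{736.4736}{10} = 73.64736$
$T_{\text{test}} = \frac{\hat{\beta}_1 - C}{\sqrt{\mathrm{MSE}/S_{xx}}} = \frac{0.5816 - 5}{\sqrt{73.64736/3445.67}} = -30.222$
Step 4: Make a decision.
Reject $H_0$, since $T_{\text{test}} = -30.222$ is less than $-2.228$.
Step 5: Make a conclusion.
We can conclude that the slope is not equal to five.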
Exercise 6.5.1
1. Based on Exercise 6.2(1), test the hypothesis $H_0: \beta_1 = 1$ against $H_1: \beta_1 > 1$ at the 0.05 level of significance.
2.

3.

4.
Based on the Exercise 6.2(4), test the hypothesis concerning H 0 : 1 0.5

against the H 1 : 1 0.5 at the 0.1 level of significance.
5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

Answer Exercise 6.5.1

1.
Ttest = -19.664, do not reject H0.
2.
Ttest = 35.5618, do not reject H0.
3.
Ttest = -3.7238, reject H0.
4.
5.
6.
7.
8.
9.
10.
Ttest =-281.0637, reject H0.
11.
Ttest =5.6907, reject H0.
12.
Ttest =-5.9512, do not reject H0.
13.
Ttest =-3.3558, reject H0.
14.
Ttest =2.6367, do not reject H0.
15.
Ttest =8.0134, reject H0.
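6.5.2 Hypothesis testing on intercept, $\beta_0$
Theory 5
A similar procedure can be used to test hypotheses about the intercept. To test the hypothesis that the intercept equals a constant, say C, the appropriate hypotheses are
$H_0: \beta_0 = C$
$H_1: \beta_0 \neq C$, or $\beta_0 > C$, or $\beta_0 < C$.
Since $\hat{\beta}_0 \sim N\left(\beta_0, \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right)$, the statistic
$$T_{\text{test}} = \frac{\hat{\beta}_0 - C}{\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}}$$
is distributed as t with $n-2$ degrees of freedom if the null hypothesis is true. The statistic is used to test the null hypothesis by comparing the observed value of $T_{\text{test}}$ with the upper $\alpha/2$ percentage point of the $t_{n-2}$ distribution, $t_{\alpha/2,\,n-2}$, and rejecting the null hypothesis if $|T_{\text{test}}| > t_{\alpha/2,\,n-2}$; for a one-tailed test, $t_{\alpha,\,n-2}$ is used, as for the slope.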
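Theory 5 differs from Theory 4 only in the standard error used in the denominator. The hypothetical helper `intercept_t_test` below follows the same steps and, for the data of Example 4 with C = 5 (the case of Example 8), gives $T_{\text{test}}$ of about 2.488, so $H_0$ is rejected at the 0.05 level against the two-tailed critical value 2.228 taken from a t table.

```python
import math


def intercept_t_test(xs, ys, c):
    """Return (T_test, degrees of freedom) for H0: beta0 = c."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    mse = (syy - b1 * sxy) / (n - 2)                      # MSE = SSE / (n - 2)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))   # standard error of b0
    return (b0 - c) / se_b0, n - 2


# Example 4: mid-term scores x and final examination scores y
midterm = [71, 49, 80, 73, 93, 85, 58, 82, 64, 32, 87, 80]
final = [83, 62, 76, 77, 89, 74, 48, 78, 76, 51, 73, 89]

t_test, df = intercept_t_test(midterm, final, c=5)
T_TABLE = 2.228   # t_{0.025,10} from a t table (two-tailed test, alpha = 0.05)

print(f"T_test = {t_test:.3f} with {df} degrees of freedom")
print("reject H0" if abs(t_test) > T_TABLE else "do not reject H0")
```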
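Example 7
Based on Example 3, test the hypothesis $H_0: \beta_0 = 1$ against $H_1: \beta_0 > 1$ at the 0.05 level of significance.
Answer Example 7
Step 1: State the hypotheses.
$H_0: \beta_0 = 1$
$H_1: \beta_0 > 1$
Step 2: $\alpha = 0.05$, $v = n - 2 = 12 - 2 = 10$; this is a one-tailed (right-tailed) test.
$T_{\text{table}} = t_{\alpha,v} = t_{0.05,10} = 1.812$; reject $H_0$ when $T_{\text{test}}$ is greater than $T_{\text{table}}$.
Step 3: Compute MSE and $T_{\text{test}}$.
$S_{xy} = 230$, $S_{xx} = 844.25$, $\bar{x}^2 = 1785.06$ and $\hat{\beta}_0 = 0.4911$.
$S_{yy} = 1802 - \frac{144^2}{12} = 74$
$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = 74 - 0.2724(230) = 11.348$
$\mathrm{MSE} = \frac{\mathrm{SSE}}{n-2} = \frac{11.348}{10} = 1.1348$
$T_{\text{test}} = \frac{\hat{\beta}_0 - C}{\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{0.4911 - 1}{\sqrt{1.1348\left(\frac{1}{12} + \frac{1785.06}{844.25}\right)}} = -0.3222$
Step 4: Make a decision.
Do not reject $H_0$, since $T_{\text{test}} = -0.3222$ is less than $T_{\text{table}} = 1.812$.
Step 5: Make a conclusion.
At the 0.05 level of significance, there is not enough evidence to conclude that the intercept is greater than one.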
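Example 8
Based on Example 4, test the hypothesis $H_0: \beta_0 = 5$ against $H_1: \beta_0 \neq 5$ at the 0.05 level of significance.
Answer Example 8
Step 1: State the hypotheses.
$H_0: \beta_0 = 5$
$H_1: \beta_0 \neq 5$
Step 2: $\alpha = 0.05$, $\alpha/2 = 0.025$, $v = n - 2 = 12 - 2 = 10$; this is a two-tailed test.
$T_{\text{table}} = t_{\alpha/2,v} = t_{0.025,10} = 2.228$; reject $H_0$ when $T_{\text{test}}$ is greater than 2.228 or less than $-2.228$.
Step 3: Compute MSE and $T_{\text{test}}$.
$S_{xy} = 2004$, $S_{xx} = 3445.67$, $\bar{x}^2 = 5064.69$ and $\hat{\beta}_0 = 31.609$.
$S_{yy} = 65850 - \frac{876^2}{12} = 1902$
$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = 1902 - 0.5816(2004) = 736.4736$
$\mathrm{MSE} = \frac{\mathrm{SSE}}{n-2} = \frac{736.4736}{10} = 73.64736$
$T_{\text{test}} = \frac{\hat{\beta}_0 - C}{\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{31.609 - 5}{\sqrt{73.64736\left(\frac{1}{12} + \frac{5064.69}{3445.67}\right)}} = 2.488$
Step 4: Make a decision.
Reject $H_0$, since $T_{\text{test}} = 2.488$ is greater than 2.228.
Step 5: Make a conclusion.
We can conclude that the intercept is not equal to five.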
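6.6.2 Confidence Interval for the Intercept, $\beta_0$
Theory 7
The intercept $\beta_0$ of the population regression line can be estimated by means of a confidence interval:
$$\hat{\beta}_0 - t_{\alpha/2,v}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2,v}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)},$$
where $v = n - 2$.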
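The interval of Theory 7 is the estimate $\hat{\beta}_0$ plus or minus the table value times the same standard error used in Theory 5. In the sketch below, the function name is our own and the critical value is supplied by hand from a t table. Applied to the data of Example 4 with $t_{0.05,10} = 1.812$, it gives approximately (12.23, 50.99), the 90% interval of Example 12; small differences in the fourth decimal place arise because the hand calculation rounds intermediate values.

```python
import math


def intercept_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) limits of the confidence interval for beta0.

    t_crit is the table value t_{alpha/2, n-2}, supplied by the caller.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    mse = (syy - b1 * sxy) / (n - 2)
    half_width = t_crit * math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return b0 - half_width, b0 + half_width


# Example 4 data; 90% confidence uses t_{0.05,10} = 1.812 from a t table
midterm = [71, 49, 80, 73, 93, 85, 58, 82, 64, 32, 87, 80]
final = [83, 62, 76, 77, 89, 74, 48, 78, 76, 51, 73, 89]

lower, upper = intercept_confidence_interval(midterm, final, t_crit=1.812)
print(f"90% confidence interval for beta0: ({lower:.4f}, {upper:.4f})")
```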
Example 11
Based on Example 3, find the 95% confidence interval for the population intercept, $\beta_0$.
Answer Example 11
Step 1: $n = 12$, $v = n - 2 = 12 - 2 = 10$, $\alpha = 0.05$, $t_{\alpha/2,v} = t_{0.025,10} = 2.228$.
Step 2: $\hat{\beta}_0 = 0.4911$, $\mathrm{MSE} = 1.1348$, $\bar{x}^2 = 1785.06$ and $S_{xx} = 844.25$.
Step 3:
$$0.4911 - 2.228\sqrt{1.1348\left(\frac{1}{12} + \frac{1785.06}{844.25}\right)} \le \beta_0 \le 0.4911 + 2.228\sqrt{1.1348\left(\frac{1}{12} + \frac{1785.06}{844.25}\right)}$$
$$0.4911 - 3.5185 \le \beta_0 \le 0.4911 + 3.5185$$
$$-3.0274 \le \beta_0 \le 4.0096$$
Step 4: We are 95% confident that the population intercept $\beta_0$, the mean moisture content when the relative humidity is zero, lies between $-3.0274$ and $4.0096$ percent.
Example 12
Based on Example 4, find the 90% confidence interval for the population intercept, $\beta_0$.
Answer Example 12
Step 1: $n = 12$, $v = n - 2 = 12 - 2 = 10$, $\alpha = 0.10$, $t_{\alpha/2,v} = t_{0.05,10} = 1.812$.
Step 2: $\hat{\beta}_0 = 31.609$, $\mathrm{MSE} = 73.64736$, $\bar{x}^2 = 5064.69$ and $S_{xx} = 3445.67$.
Step 3:
$$31.609 - 1.812\sqrt{73.64736\left(\frac{1}{12} + \frac{5064.69}{3445.67}\right)} \le \beta_0 \le 31.609 + 1.812\sqrt{73.64736\left(\frac{1}{12} + \frac{5064.69}{3445.67}\right)}$$
$$31.609 - 19.3799 \le \beta_0 \le 31.609 + 19.3799$$
$$12.2291 \le \beta_0 \le 50.9889$$
Step 4: We are 90% confident that the population intercept $\beta_0$, the mean final examination score for a mid-term score of zero, lies between 12.2291 and 50.9889 marks.
Exercise 6.5.2
1.

2.

3.

4.
Based on the Exercise 6.2(4), test the hypothesis concerning H 0 : 0 0.5

against the H 1 : 0 0.5 at the 0.1 level of significance.
5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

Answer Exercise 6.5.2

1.
Ttest = 3.9466, reject H0.
2.
3.
4.
5.
6.
7.
8.
9.
Ttest =-0.0669 , do not reject H0.
10.
11.
Ttest =2.0166 , do not reject H0.
12.
13.
14.
15.
6.7 Coefficient of Determination
Definition 7
The coefficient of determination measures the proportion of the variation in the dependent variable y that is explained by the regression line and the independent variable x. The symbol for the coefficient of determination is $r^2$.
Theory 8
If $(x_i, y_i),\ i = 1, 2, \ldots, n$ are the values of a random sample from a bivariate population, then
$$r^2 = \frac{S_{yy} - \mathrm{SSE}}{S_{yy}} = 1 - \frac{\mathrm{SSE}}{S_{yy}}.$$
Note that $r^2$ is always between 0 and 1, because r (the correlation coefficient) is between $-1$ and $+1$. In simple linear regression, it may also be computed as the square of the correlation coefficient, r.
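Theory 8 gives two equivalent routes to $r^2$: $1 - \mathrm{SSE}/S_{yy}$ and the square of $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$. The short sketch below computes both for the data of Example 3 (used again in Example 13). They agree, at about 0.8467, which is the 0.8466 of Example 13 up to the rounding of SSE in the hand calculation.

```python
# Example 3 data: relative humidity x and moisture content y
xs = [46, 53, 37, 42, 34, 29, 60, 44, 41, 48, 33, 40]
ys = [12, 14, 11, 13, 10, 8, 17, 12, 10, 15, 9, 13]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

sse = syy - (sxy / sxx) * sxy        # SSE = Syy - b1 * Sxy
r2_from_sse = 1 - sse / syy          # definition in Theory 8
r2_from_r = sxy ** 2 / (sxx * syy)   # square of the correlation coefficient

print(f"r^2 = {r2_from_sse:.4f} (via SSE), {r2_from_r:.4f} (via r)")
```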
Example 13
Refer to the data in Example 5. Find and interpret the coefficient of determination.
Answer Example 13
From Example 5, $S_{xy} = 230$, $S_{xx} = 844.25$, $S_{yy} = 74$ and $\mathrm{SSE} = 11.348$; therefore the coefficient of determination is
$$r^2 = 1 - \frac{\mathrm{SSE}}{S_{yy}} = 1 - \frac{11.348}{74} = 0.8466.$$
About 85% of the total variation in moisture content is explained by the regression line using relative humidity as the independent variable.
Example 14
Refer to the data in Example 6. Find and interpret the coefficient of determination.
Answer Example 14
From Example 6, $S_{xy} = 2004$, $S_{xx} = 3445.67$, $S_{yy} = 1902$ and $\mathrm{SSE} = 736.4736$; therefore the coefficient of determination is
$$r^2 = 1 - \frac{\mathrm{SSE}}{S_{yy}} = 1 - \frac{736.4736}{1902} = 0.6128.$$
About 61% of the total variation in final examination scores is explained by the regression line using the mid-term examination score as the independent variable.
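Exercise 6.7
1. From Exercise 6.2(1), find and interpret the coefficient of determination.
2. From Exercise 6.2(2), find and interpret the coefficient of determination.
3. From Exercise 6.2(3), find and interpret the coefficient of determination.
4. From Exercise 6.2(4), find and interpret the coefficient of determination.
5. From Exercise 6.2(5), find and interpret the coefficient of determination.
6. From Exercise 6.2(6), find and interpret the coefficient of determination.
7. From Exercise 6.2(7), find and interpret the coefficient of determination.
8. From Exercise 6.2(8), find and interpret the coefficient of determination.
9. From Exercise 6.2(9), find and interpret the coefficient of determination.
10. From Exercise 6.2(10), find and interpret the coefficient of determination.
11. From Exercise 6.2(11), find and interpret the coefficient of determination.
12. From Exercise 6.2(12), find and interpret the coefficient of determination.
13. From Exercise 6.2(13), find and interpret the coefficient of determination.
14. From Exercise 6.2(14), find and interpret the coefficient of determination.
15. From Exercise 6.2(15), find and interpret the coefficient of determination.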
Answer Exercise 6.7
1. $r^2 = 0.6922$
2. $r^2 = 0.5803$
3. $r^2 = 0.9812$
4. $r^2 = 0.9488$
5. $r^2 = 0.9522$
6. $r^2 = 0.974$
7. $r^2 = 0.712$
8. $r^2 = 0.061$
9. $r^2 = 0.959$
10. $r^2 = 0.700$
11. $r^2 = 0.897$
12. $r^2 = 0.906$
13. $r^2 = 0.858$
14. $r^2 = 0.915$
15. $r^2 = 0.916$
6.8 Coefficient of Pearson Correlation
Definition 8
A correlation exists between two variables when one of them is related to the other in some way. The Pearson correlation coefficient measures the strength and direction of a linear relationship between two variables. The symbol for the sample Pearson correlation coefficient is r; the symbol for the population correlation coefficient is $\rho$.
Theory 9
If $(x_i, y_i),\ i = 1, 2, \ldots, n$ are the values of a random sample from a bivariate population, then
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.$$
When $|r|$ is between 0 and 0.5, the correlation between the variables is weak (positive or negative). When $|r|$ is between 0.5 and 1, the correlation between the variables is strong (positive or negative). There is no linear correlation between the variables if $r = 0$. Because r and $\hat{\beta}_1 = S_{xy}/S_{xx}$ share the numerator $S_{xy}$ and both have positive denominators, r always has the same sign as the slope.
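The formula and the rule of thumb in Theory 9 can be sketched as follows. The function names are our own, and treating $|r| = 0.5$ exactly as "strong" is our own choice, since the two ranges in the text meet at 0.5. For the data of Example 4 the sketch gives r of about 0.7828, matching Example 16; for the absences and final grades of Example 1 it gives about $-0.944$, showing that the sign of r follows the direction of the relationship.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Label r with the rule of thumb of Theory 9 (|r| = 0.5 treated as strong)."""
    if r == 0:
        return "no linear correlation"
    strength = "strong" if abs(r) >= 0.5 else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Example 4: mid-term (x) and final (y) examination scores
r_ex4 = pearson_r([71, 49, 80, 73, 93, 85, 58, 82, 64, 32, 87, 80],
                  [83, 62, 76, 77, 89, 74, 48, 78, 76, 51, 73, 89])
# Example 1: number of absences (x) and final grade (y)
r_ex1 = pearson_r([6, 2, 15, 9, 12, 5, 8], [82, 86, 43, 74, 58, 90, 78])

print(f"Example 4: r = {r_ex4:.4f} ({describe(r_ex4)})")
print(f"Example 1: r = {r_ex1:.4f} ({describe(r_ex1)})")
```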
Example 15
Refer to the data in Example 5. Find and interpret the Pearson correlation coefficient.
Answer Example 15
From Example 5, $S_{xy} = 230$, $S_{xx} = 844.25$ and $S_{yy} = 74$; therefore the Pearson correlation coefficient is
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{230}{\sqrt{(844.25)(74)}} = 0.9202.$$
A Pearson correlation coefficient of 0.9202 indicates a strong positive linear relationship between relative humidity and moisture content.
Example 16
Refer to the data in Example 6. Find and interpret the Pearson correlation coefficient.
Answer Example 16
From Example 6, $S_{xy} = 2004$, $S_{xx} = 3445.67$ and $S_{yy} = 1902$; therefore the Pearson correlation coefficient is
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{2004}{\sqrt{(3445.67)(1902)}} = 0.7828.$$
A Pearson correlation coefficient of 0.7828 indicates a strong positive linear relationship between the mid-term and final examination scores.
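Exercise 6.8
1. From Exercise 6.2(1), find and interpret the Pearson correlation coefficient.
2. From Exercise 6.2(2), find and interpret the Pearson correlation coefficient.
3. From Exercise 6.2(3), find and interpret the Pearson correlation coefficient.
4. From Exercise 6.2(4), find and interpret the Pearson correlation coefficient.
5. From Exercise 6.2(5), find and interpret the Pearson correlation coefficient.
6. From Exercise 6.2(6), find and interpret the Pearson correlation coefficient.
7. From Exercise 6.2(7), find and interpret the Pearson correlation coefficient.
8. From Exercise 6.2(8), find and interpret the Pearson correlation coefficient.
9. From Exercise 6.2(9), find and interpret the Pearson correlation coefficient.
10. From Exercise 6.2(10), find and interpret the Pearson correlation coefficient.
11. From Exercise 6.2(11), find and interpret the Pearson correlation coefficient.
12. From Exercise 6.2(12), find and interpret the Pearson correlation coefficient.
13. From Exercise 6.2(13), find and interpret the Pearson correlation coefficient.
14. From Exercise 6.2(14), find and interpret the Pearson correlation coefficient.
15. From Exercise 6.2(15), find and interpret the Pearson correlation coefficient.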
Answer Exercise 6.8
1. $r = -0.832$
2. $r = 0.7618$
3. $r = 0.9905$
4. $r = -0.9742$
5. $r = -0.9759$
6. $r = -0.987$
7. $r = 0.844$
8. $r = 0.248$
9. $r = 0.979$
10. $r = 0.837$
11. $r = 0.947$
12. $r = -0.952$
13. $r = 0.926$
14. $r = 0.957$
15. $r = 0.957$
EXERCISE CHAPTER 6
1. The table shows the elongation (in thousandths of an inch) of steel rods of nominally the same composition and diameter when subjected to various tensile forces (in thousands of pounds).
Force (x)         1.2    5.3    3.1    2.2    4.1    2.6    6.5    8.3     7.6    4.9
Elongation (y)    15.6   80.3   39.0   34.3   58.2   36.7   88.9   111.5   99.8   65.7
(a) Assuming a linear relationship, use the least squares method to find the regression coefficients $\beta_0$ and $\beta_1$.
(b) Interpret the meaning of the slope $\beta_1$ in this problem.
(c) Predict the elongation of the steel rods when the tensile force is 5000 pounds.
(d) Find the coefficient of determination and the Pearson correlation coefficient. Interpret the results.
2. The owner of MSR Enterprise would like to study the relationship between economic growth per year (in %) and the number of cars sold (in thousands of units), as shown in the table below.
Economy growth, X (%)               1.3   1.8   2.5   3.5   4.8   6.5   7.7
No. of sold cars, Y (1000 units)    1.2   1.5   1.8   2.3   2.2   2.5   2.7
(a) Find the Pearson correlation coefficient between economy growth, X, and the number of sold cars, Y. Interpret your result.
(b) Obtain the linear regression model of the number of sold cars on economy growth.
(c) Predict the number of sold cars when the economy growth is 6%.
3. During the harvest season in Malaysia, paddy is sold in large quantities at farms. A researcher wanted to study the relationship between calcium and the yield of paddy. To investigate this, a sample of 7 plots of paddy was measured for the weight of calcium and the weight of paddy. The results are shown in the table below.
Calcium (mg)      50    55    54    52    37    52    53
Weight (kg/m²)    2.2   3.0   2.5   2.7   1.5   2.0   2.5
(a) Assuming a linear relationship, use the least squares method to find the regression coefficients $\beta_0$ and $\beta_1$.
(b) Interpret the meaning of the slope $\beta_1$ in this problem.
(c) Predict the weight of paddy for a plot that contains 60 mg of calcium.
4. Crickets make a chirping sound with their wing covers. Scientists have recognized that there is a relationship between the frequency of chirps and the temperature. Fifteen observations were recorded in the study, as shown below:
Chirps, x         20     16     19.8   18.4   17.1   15.5   14.7   17.1
Temperature, y    88.6   71.6   93.3   84.3   80.6   75.2   69.7   82
Chirps, x         15.4   16.3   15     17.2   16     17     14.4
Temperature, y    69.4   83.3   79.6   82.6   80.6   83.5   76.3
(a) Sketch a scatter plot for the data above.
(b) Use the method of least squares to estimate the regression line. Interpret the result.
(c) Predict the temperature when x = 15 chirps per second.
(d)
test
the
null
hypothesis
1 3
against
the
alternative
hypothesis 1 3 at the 0.01 level of significance.
5. An engineer conducted a study to determine whether there is a linear relationship between the breaking strength, y, of wooden beams and the specific gravity, x, of the wood. Ten randomly selected beams of the same cross-sectional dimensions were stressed until they broke. The breaking strength and the specific gravity of the wood are shown in the table below for each of the ten beams.
Beam                    1       2       3       4       5       6       7       8       9       10
Breaking strength, y    11.14   12.74   13.13   11.51   12.38   12.60   11.13   11.70   11.02   11.41
Specific gravity, x     0.499   0.558   0.604   0.441   0.550   0.528   0.418   0.480   0.406   0.467
(a) Construct a scatter plot of the data.
(b) Assuming the relationship between the variables is best described by a straight line, $y = \beta_0 + \beta_1 x$, use the method of least squares or maximum likelihood to estimate the y-intercept, $\beta_0$, and the slope of the line, $\beta_1$. Interpret the results.
(c) Estimate the average breaking strength when the specific gravity is 0.455.
(d)
Test the hypothesis H 1 : 1 0 by taking level of significance, =

0.05.
(e) Find the correlation coefficient, r, and the coefficient of determination, $r^2$, and then interpret the results.
6.
An officer wants to study the relationship between the biomass production of orange plants and the cumulative intercepted solar radiation (Wh/m²) over a six-week period following emergence. Biomass production is the mean dry weight, in grams, of independent samples of four plants collected at XY Plantation. The data from this study are shown in the table below.

Solar Radiation, X (Wh/m²)    Plant Biomass, Y (g)
28.8                          15.8
48.5                          48.2
68.3                          71.1
90.5                          95.7
120.2                         150.4
170.5                         210.5
(a)
Sketch the scatter diagram for the data above.
(b)
Compute $\hat{\beta}_0$ and $\hat{\beta}_1$ for the linear regression of plant biomass on intercepted solar radiation. Write the regression equation and interpret the result.
(c)
Calculate the sample correlation coefficient, r, and interpret your result.
(d)
Predict the plant biomass for a solar radiation of 300 Wh/m².
(e)
Test the null hypothesis $H_0: \beta_1 = 0$ against the alternative hypothesis $H_1: \beta_1 \neq 0$ at the 5% level of significance.
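A minimal Python sketch for parts (b) to (d), using the six pairs in the table above. It keeps full precision throughout, whereas the answer key appears to use rounded coefficients, so the last digits of the prediction may differ slightly (about 390.97 here against 391.028).

```python
import math

# Exercise 6: cumulative intercepted solar radiation in Wh/m^2 (x) and plant biomass in grams (y)
x = [28.8, 48.5, 68.3, 90.5, 120.2, 170.5]
y = [15.8, 48.2, 71.1, 95.7, 150.4, 210.5]
n = len(x)

# Computational forms of S_xx, S_yy and S_xy
sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# Part (b): least-squares coefficients
b1 = sxy / sxx
b0 = sum(y) / n - b1 * sum(x) / n

# Part (c): sample correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# Part (d): predicted biomass at 300 Wh/m^2 (well beyond the largest observed x of 170.5)
pred_300 = b0 + b1 * 300

print(f"y_hat = {b0:.3f} + {b1:.4f}x, r = {r:.4f}, prediction at 300: {pred_300:.2f}")
```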
7.
The thermal conductivity of a material is the quantity of heat transmitted through a unit thickness, in a direction normal to a surface of unit area, due to a unit temperature gradient under steady-state conditions. Materials with high thermal conductivity are good conductors of heat, whereas materials with low thermal conductivity are good thermal insulators. A test was conducted to investigate the relationship between the thickness of a material (millimeters) and its thermal conductivity (watts per meter kelvin). Assume that there is a linear relationship between the thermal conductivity of a material and its thickness. Seven materials are chosen at random, with pressure and temperature at normal levels. The thickness of each of the 7 materials is measured and its thermal conductivity is recorded, as shown in the table below.
Thickness, x (mm)                   21   26   28   31   25   19   35
Thermal Conductivity, y (W/m·K)     12   16   19   21   14   11   24

$\sum_{i=1}^{7} x_i = 185$, $\sum_{i=1}^{7} y_i = 117$, $\sum_{i=1}^{7} x_i^2 = 5073$, $\sum_{i=1}^{7} y_i^2 = 2095$, $\sum_{i=1}^{7} x_i y_i = 3250$
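Because all five sums are given, parts (b) to (d) need only the computational formulas for $S_{xx}$, $S_{yy}$ and $S_{xy}$. A minimal Python sketch, with illustrative variable names:

```python
import math

# Exercise 7 summary statistics (n = 7 materials)
n = 7
sum_x, sum_y = 185, 117
sum_x2, sum_y2, sum_xy = 5073, 2095, 3250

# Computational forms of S_xx, S_yy and S_xy
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# Part (b): least-squares estimates
b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n

# Part (c): average thermal conductivity at a thickness of 29 mm
pred_29 = b0 + b1 * 29

# Part (d): correlation and determination coefficients
r = sxy / math.sqrt(sxx * syy)
r2 = r ** 2

print(f"y_hat = {b0:.4f} + {b1:.4f}x, prediction at 29 mm = {pred_29:.4f}, r = {r:.4f}, r^2 = {r2:.4f}")
```

It gives $\hat{y} \approx -5.99 + 0.8593x$, a prediction of about 18.92 at 29 mm, r ≈ 0.9863 and r² ≈ 0.9728.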

(a)
Plot the data on a scatter diagram.
(b)
Estimate the regression line by using the method of least squares. Interpret your result.
(c)
Estimate the average thermal conductivity if the thickness of a material is 29 mm.
(d)
Calculate the coefficient of correlation, r, and r². Interpret their values.
8.
Past experience with a certain type of plastic indicates that a relation exists between the mean hardness (measured in Brinell units) of items molded from the plastic (Y) and the elapsed time (hours) since the termination of the molding process (X). Twelve batches of the plastic were made, and from each batch one test item was molded and its hardness measured at some specific point in time. The results are shown in the following table.
Batch                     1    2    3    4    5    6    7    8    9    10   11   12
Elapsed time, X (hours)   32   48   72   64   48   16   40   48   48   24   80   56
Hardness, Y (Brinell)     230  262  323  298  255  199  248  279  267  214  359  305
(a)
Draw a scatter plot.
(b)
Find the estimated regression line by using the least squares method.
(c)
Estimate the mean hardness when the elapsed time is 48 hours.
(d)
Calculate the coefficient of correlation, r, and the coefficient of determination, r². Interpret these results.
9.
Zaiton wishes to buy a car. She reads a newspaper to find the prices of used local compact cars. The data on the age (in years) and the price (in RM thousand) are shown in the table below.

Age, x (years)            1     2     3     4     5     6     7     8     9     10    11    12
Price, y (RM thousand)    33.4  29.3  29.0  28.1  27.5  26.0  24.2  19.5  14.7  14.0  13.4  13.0
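A minimal Python sketch of parts (b) to (d) below, assuming the twelve cars are aged 1, 2, ..., 12 years (an assumption consistent with the answer key). Variable names are illustrative.

```python
import math

# Exercise 9: car age in years (x) and price in RM thousand (y)
# Assumption: the twelve cars are aged 1, 2, ..., 12 years
x = list(range(1, 13))
y = [33.4, 29.3, 29.0, 28.1, 27.5, 26.0, 24.2, 19.5, 14.7, 14.0, 13.4, 13.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Deviation form of S_xx, S_yy and S_xy
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Part (b): least-squares line
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Part (c): T statistic for H0: beta1 = -1, with n - 2 = 10 degrees of freedom
mse = (syy - b1 * sxy) / (n - 2)
t_stat = (b1 - (-1)) / math.sqrt(mse / sxx)

# Part (d): predicted price at 14 years, outside the observed ages (extrapolation)
pred_14 = b0 + b1 * 14

print(f"y_hat = {b0:.4f} + ({b1:.4f})x, T = {t_stat:.4f}, price at 14 years = {pred_14:.4f}")
```

Taking a two-sided alternative (the exercise does not state one, so this is an assumption), $|T| \approx 6.25$ exceeds $t_{0.025,10} = 2.228$, so $H_0$ is rejected, matching the answer key. The 14-year prediction lies beyond the observed ages, so it should be treated with caution.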
(a)
Sketch a scatter plot for the data.
(b)
Use the method of least squares to estimate the regression line. Interpret the results.
(c)
Test the hypothesis $H_0: \beta_1 = -1$ at the 5% level of significance.
(d)
Estimate the price of a car that is 14 years old.
10.
Consider the following data for 10 such samples.
Soil Sample   Strontium Distribution Coefficient, Y   Total Aluminium, X
1             100                                     200
2             120                                     225
3             300                                     325
4             250                                     310
5             400                                     350
6             500                                     400
7             450                                     375
8             445                                     385
9             310                                     350
10            200                                     290
Let Y represent the strontium distribution coefficient and X represent the total
aluminium.
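Taking x as the total aluminium and y as the strontium distribution coefficient, as defined above, a minimal Python sketch for parts (a) and (b) below. The critical value $t_{0.025,8} = 2.306$ is taken from a standard t table rather than computed; variable names are illustrative.

```python
import math

# Exercise 10: total aluminium (x) and strontium distribution coefficient (y) for 10 soil samples
x = [200, 225, 325, 310, 350, 400, 375, 385, 350, 290]
y = [100, 120, 300, 250, 400, 500, 450, 445, 310, 200]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# Part (a): line of best fit
b1 = sxy / sxx
b0 = sum(y) / n - b1 * sum(x) / n

# Part (b): 95% confidence interval for beta1 with n - 2 = 8 degrees of freedom
mse = (syy - b1 * sxy) / (n - 2)
t_crit = 2.306  # t_{0.025, 8} from a standard t table
half_width = t_crit * math.sqrt(mse / sxx)
ci_lower, ci_upper = b1 - half_width, b1 + half_width

print(f"y_hat = {b0:.4f} + {b1:.4f}x")
print(f"95% CI for beta1: ({ci_lower:.4f}, {ci_upper:.4f})")
```

The interval it prints agrees with the answer key's $1.6063 < \beta_1 < 2.4799$ to four decimal places.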
(a)
Find the equation of the line of best fit.
(b)
Find a 95% confidence interval for $\beta_1$.
11.
Suppose a fire insurance company wants to relate the amount of fire damage in major residential fires to the distance between the burning house and the nearest fire station. The study is to be conducted in a large suburb of a major city. A sample of 10 recent fires in this suburb is selected. The distance between the fire and the nearest fire station, x, and the amount of damage, y, are recorded for each fire. The results are given in the table below.
Distance from Fire Station, x (miles)    Fire Damage, y (thousands of dollars)
3.4                                      26.2
1.8                                      17.8
4.6                                      31.3
2.3                                      23.1
3.1                                      27.5
5.5                                      36.0
0.7                                      14.1
3.0                                      22.3
2.6                                      19.6
4.3                                      31.3
(a)
(b)
Find and interpret the coefficient of determination and the Pearson correlation coefficient.
(c)
Find the regression line using the least squares method. Interpret the result.
(d)
Test the hypothesis $H_0: \beta_1 = 5$ against $H_1: \beta_1 \neq 5$ at the 0.05 level of significance.
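A minimal Python sketch for parts (b) to (d), using the ten fires in the table above. It computes r² as $1 - SSE/S_{yy}$ and r from the summary formula; variable names are illustrative.

```python
import math

# Exercise 11: distance to the nearest fire station in miles (x), damage in thousands of dollars (y)
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3]
y = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# Part (c): least-squares line
b1 = sxy / sxx
b0 = sum(y) / n - b1 * sum(x) / n

# Part (b): coefficient of determination and Pearson r
sse = syy - b1 * sxy
r2 = 1 - sse / syy
r = sxy / math.sqrt(sxx * syy)

# Part (d): T statistic for H0: beta1 = 5, with n - 2 = 8 degrees of freedom
mse = sse / (n - 2)
t_stat = (b1 - 5) / math.sqrt(mse / sxx)

print(f"r^2 = {r2:.4f}, r = {r:.4f}, y_hat = {b0:.3f} + {b1:.4f}x, T = {t_stat:.4f}")
```

The statistic is negative because $\hat{\beta}_1 < 5$. Assuming a two-sided alternative, $|T| \approx 0.74$ is far below $t_{0.025,8} = 2.306$, so $H_0$ is not rejected, in line with the answer key.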
12.
A manager of a car dealership believes that there is a relationship between the number of salespeople on duty and the number of cars sold in a week. The data in the following table are used to develop a simple linear regression model.

Week    Number of Salespeople, x    Number of Cars Sold, y
                                    79
                                    64
                                    49
                                    23
                                    52

$\sum_{i=1}^{5} x_i = 21$, $\sum_{i=1}^{5} y_i = 267$, $\sum_{i=1}^{5} x_i y_i = 1256$, $\sum_{i=1}^{5} x_i^2 = 101$, $\sum_{i=1}^{5} y_i^2 = 15971$
(a)
(b)
Calculate the sample correlation coefficient and interpret the result.
(c)
By using the least squares method, estimate the regression line.
(d)
Estimate the number of salespeople when the number of cars sold is 41. Interpret the result.
(e)
Test whether the slope is greater than ten at the 5% level of significance.
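Parts (b), (c) and (e) need only the summary statistics given above. A minimal Python sketch, with the one-sided critical value $t_{0.05,3} = 2.353$ taken from a standard t table; variable names are illustrative.

```python
import math

# Exercise 12 summary statistics for n = 5 weeks
n = 5
sum_x, sum_y = 21, 267
sum_x2, sum_y2, sum_xy = 101, 15971, 1256

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# Part (b): sample correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# Part (c): least-squares line
b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n

# Part (e): one-sided test of H0: beta1 = 10 against H1: beta1 > 10, n - 2 = 3 degrees of freedom
mse = (syy - b1 * sxy) / (n - 2)
t_stat = (b1 - 10) / math.sqrt(mse / sxx)
t_crit = 2.353  # t_{0.05, 3} from a standard t table
reject_h0 = t_stat > t_crit

print(f"r = {r:.4f}, y_hat = {b0:.6f} + {b1:.6f}x, T = {t_stat:.4f}, reject H0: {reject_h0}")
```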
ANSWER EXERCISE CHAPTER 6
1.
(a) $\hat{y} = 2.1978 + 13.2756x$, (c) 66380.1978, (d) r = 0.9939, r² = 0.9878
2.
(a) 0.9319, (b) $\hat{y} = 1.1841 + 0.2104x$, (c) 2446
3.
(a) $\hat{y} = -1.0913 + 0.0681x$, (c) 2.9947
4.
(b) $\hat{y} = 24.967 + 3.3057x$, (c) 74.5524, (d) T = 0.512, do not reject H0
5.
(b) $\hat{y} = 6.47 + 10.901x$, (c) 11.43, (d) T = 1.824, do not reject H0, (e) r = 0.913, r² = 0.834
6.
(b) $\hat{y} = -22.372 + 1.378x$, (c) r = 0.9977, (d) 391.028, (e) T = 30.144, reject H0
7.
(b) $\hat{y} = -5.9958 + 0.8593x$, (c) 18.9239, (d) r = 0.9863, r² = 0.9728
8.
(b) $\hat{y} = 153.915 + 2.416x$, (c) 269.883, (d) r = 0.97942, r² = 0.95926
9.
(b) $\hat{y} = 35.5225 - 1.9766x$, (c) T = -6.2516, reject H0, (d) 7.8505
10.
(a) $\hat{y} = -348.3351 + 2.0431x$, (b) $1.6063 < \beta_1 < 2.4799$
11.
(b) r² = 0.9380, r = 0.9685, (c) $\hat{y} = 10.250 + 4.6868x$, (d) T = 0.7354, do not reject H0
12.
(b) r = 0.9089, (c) $\hat{y} = 9.234375 + 10.515625x$, (d) 3.2258, (e) T = 0.1852, do not reject H0
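SUMMARY CHAPTER 6
1.
Simple Linear Regression Model
(i)
Least Squares Method
The model: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, with
$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$ (slope) and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ (y-intercept), where
$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \dfrac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$,
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \dfrac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$,
$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \dfrac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$,
and n = sample size.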
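2.
Inference of Regression Coefficients
(i)
Slope
$T_{test} = \dfrac{\hat{\beta}_1 - C}{\sqrt{MSE/S_{xx}}}$, where $SSE = S_{yy} - \hat{\beta}_1 S_{xy}$ and $MSE = \dfrac{SSE}{n-2}$
(ii)
Intercept
$T_{test} = \dfrac{\hat{\beta}_0 - C}{\sqrt{MSE\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)}}$
In both tests, C is the value of the coefficient under $H_0$, and $T_{test}$ is compared with the t distribution with $v = n - 2$ degrees of freedom.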
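3.
Confidence Intervals of the Regression Line
(i)
Slope, $\beta_1$
$\hat{\beta}_1 - t_{\alpha/2,v}\sqrt{MSE/S_{xx}} < \beta_1 < \hat{\beta}_1 + t_{\alpha/2,v}\sqrt{MSE/S_{xx}}$, where v = n - 2
(ii)
Intercept, $\beta_0$
$\hat{\beta}_0 - t_{\alpha/2,v}\sqrt{MSE\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)} < \beta_0 < \hat{\beta}_0 + t_{\alpha/2,v}\sqrt{MSE\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)}$, where v = n - 2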
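The intercept interval in 3(ii) can be sketched as a small Python function. The function name `intercept_ci` is hypothetical, and the caller supplies $t_{\alpha/2,n-2}$ from a t table; the example reuses the Exercise 7 data with $t_{0.025,5} = 2.571$.

```python
import math


def intercept_ci(x, y, t_crit):
    """Confidence interval for beta0 using formula 3(ii).

    t_crit is t_{alpha/2, n-2}, supplied by the caller (for example from a t table).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    mse = (syy - b1 * sxy) / (n - 2)
    half_width = t_crit * math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return b0 - half_width, b0 + half_width


# Example: Exercise 7 thickness (x) and thermal conductivity (y); t_{0.025, 5} = 2.571
thickness = [21, 26, 28, 31, 25, 19, 35]
conductivity = [12, 16, 19, 21, 14, 11, 24]
lower, upper = intercept_ci(thickness, conductivity, 2.571)
print(f"95% CI for beta0: ({lower:.3f}, {upper:.3f})")
```

The interval is wide relative to $\hat{\beta}_0 \approx -6.0$ because $\bar{x}$ (about 26.4 mm) is far from x = 0, so the $\bar{x}^2/S_{xx}$ term dominates the $1/n$ term.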
4.
Coefficient of Determination, $r^2$
$r^2 = \dfrac{S_{yy} - SSE}{S_{yy}} = 1 - \dfrac{SSE}{S_{yy}}$
5.
Coefficient of Pearson Correlation, r
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
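Formulas 4 and 5 are linked: since $SSE = S_{yy} - \hat{\beta}_1 S_{xy} = S_{yy} - S_{xy}^2/S_{xx}$, the coefficient of determination equals the square of r in simple linear regression. A short numerical check in Python, reusing the Exercise 10 data purely as an illustration:

```python
import math

# Exercise 10 data: total aluminium (x) and strontium distribution coefficient (y)
x = [200, 225, 325, 310, 350, 400, 375, 385, 350, 290]
y = [100, 120, 300, 250, 400, 500, 450, 445, 310, 200]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

sse = syy - (sxy / sxx) * sxy        # SSE = S_yy - beta1_hat * S_xy
r2_from_sse = 1 - sse / syy          # formula 4
r = sxy / math.sqrt(sxx * syy)       # formula 5
print(r2_from_sse, r ** 2)           # the two values agree up to floating-point rounding
```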
CORRECTION PAGE CHAPTER 6