Regression Analysis


LESSON-7

REGRESSION ANALYSIS

Ref: Consult your Text Book, Chapter-7: pp-238-271

Simple Linear Regression

Prediction, or estimation, is a central problem in almost every sphere of human activity.
Estimating future production, consumption, prices, investments, sales, profits, income and
similar quantities is of great importance to business professionals.

Regression analysis is a statistical tool for the investigation of relationships between
variables. Regression analysis was explained by M. M. Blair as follows: “Regression analysis is
a mathematical measure of the average relationship between two or more variables in terms
of the original units of the data.”

Regression Model: A mathematical (or theoretical) equation that shows the linear relation
between the explanatory variable and the response variable. The simple linear regression
model is $Y = \beta_0 + \beta_1 X + \varepsilon$,
where
• $\varepsilon$ is called the error term, with mean 0 and variance $\sigma^2$.
• Y is called the response (dependent) variable.
• X is called the predictor (explanatory) variable.
• $\beta_0$ is the intercept of the true regression line.
• $\beta_1$ is the slope of the true regression line.
• Slope is the average amount of change in Y for a one-unit increase in X.
• Intercept is the average value of Y when X = 0.

In simple linear regression a single independent variable is used to predict the value of a
dependent variable. The equation that describes how y is related to x and an error term is
called the regression model. The regression model used in simple linear regression follows.

Simple linear regression model

$y = \beta_0 + \beta_1 x + \varepsilon$ .................(1)

$\beta_0$ and $\beta_1$ are referred to as the parameters of the model, and $\varepsilon$ is a
random variable referred to as the error term.

In practice, the parameter values are not known and must be estimated using sample data.
Sample statistics $b_0$ and $b_1$ are computed as estimates of the population
parameters $\beta_0$ and $\beta_1$. The estimated regression equation for simple linear
regression follows.

Estimated simple linear regression equation

$\hat{y} = b_0 + b_1 x$ ................(3)

where

$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$, $b_0 = \bar{y} - b_1 \bar{x}$,

$\bar{x} = \dfrac{\sum x}{n}$ and $\bar{y} = \dfrac{\sum y}{n}$.

• $b_1$, the regression coefficient of y on x, indicates the change in the value of y
  (dependent variable) for a unit change in x (independent variable).
• $b_1 = 0$: y does not change as x changes.
• $b_1 = 1$: a one-unit change in x results in a one-unit change in the value of y.
• $b_1 = 2$: a one-unit change in x results in a two-unit change in the value of y.
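
The formulas for $b_1$ and $b_0$ can be sketched in Python as below. This is only an illustration: the function name `fit_simple_linear` and the toy data (chosen to lie exactly on y = 1 + 2x) are assumptions made here and do not come from the lesson.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) using the sum-based least-squares formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    s_xx = sum_x2 - sum_x ** 2 / n          # denominator of b1
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = (sum_xy - sum_x * sum_y / n) / s_xx
    b0 = sum_y / n - b1 * sum_x / n         # b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
toy_x = [0, 1, 2, 3, 4]
toy_y = [1, 3, 5, 7, 9]
b0, b1 = fit_simple_linear(toy_x, toy_y)
print(b0, b1)
```

With this toy data the slope comes out as 2, matching the last case in the list: each one-unit increase in x raises y by two units.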

Example 1:

Suppose a businessman named Arman owns one hundred restaurants located across our
country. The most successful locations are near college/university campuses. The managers
believe that quarterly sales for these restaurants (denoted by y) are related positively to the
size of the student population (denoted by x); that is, restaurants near campuses with a large
student population tend to generate more sales than those located near campuses with a small
student population. Data were collected from a sample of 10 of Arman’s restaurants located near
college/university campuses to see how the dependent variable y is related to the independent
variable x.

Restaurant    Student Population    Quarterly Sales
              (1000s), x            ($1000s), y
 1                  2                    58
 2                  6                   105
 3                  8                    88
 4                  8                   118
 5                 12                   117
 6                 16                   137
 7                 20                   157
 8                 20                   169
 9                 22                   149
10                 26                   202

Estimate the regression equation of sales on student population. Also predict quarterly sales
for a restaurant to be located near a campus with 16,000 students.

Solution:

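Estimated simple linear regression equation

$\hat{y} = b_0 + b_1 x$ ................(1)

where

$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$, $b_0 = \bar{y} - b_1 \bar{x}$,

$\bar{x} = \dfrac{\sum x}{n}$ and $\bar{y} = \dfrac{\sum y}{n}$.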

Table: Calculations for the estimated regression equation

Restaurant      x      x²       y       xy
 1              2       4      58      116
 2              6      36     105      630
 3              8      64      88      704
 4              8      64     118      944
 5             12     144     117     1404
 6             16     256     137     2192
 7             20     400     157     3140
 8             20     400     169     3380
 9             22     484     149     3278
10             26     676     202     5252
Totals        140    2528    1300    21040

We have, with n = 10,

$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \dfrac{21040 - \frac{(140)(1300)}{10}}{2528 - \frac{(140)^2}{10}} = \dfrac{21040 - 18200}{2528 - 1960} = \dfrac{2840}{568} = 5$

The calculation of the y intercept ($b_0$) follows:

$b_0 = \bar{y} - b_1 \bar{x} = 130 - 5(14) = 60$

Thus, the estimated regression equation from (1) is given by
$\hat{y} = 60 + 5x$ ........(2)

[Figure: fitted regression line ŷ = 60 + 5x, with R² = 0.902]

The slope of the estimated regression equation ($b_1 = 5$) is positive, implying that as student
population increases, sales increase. In fact, we can conclude that an increase in the student
population of 1000 is associated with an increase of $5000 in expected sales; that is,
quarterly sales are expected to increase by $5 per student. The coefficient of determination
is estimated as $R^2 = 0.902$, which indicates that 90.2% of the variation in sales is explained
by the size of the student population.

If x = 16, then $\hat{y} = 60 + 5(16) = 140$.


Hence, we would predict quarterly sales of $140,000 for this restaurant.
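
As a cross-check of Example 1, the short Python sketch below recomputes the slope, intercept, prediction and $R^2$ from the ten data points. The variable names are chosen here for illustration, and $R^2$ is computed as 1 - SSE/SST, a standard definition that the lesson itself does not spell out. It gives about 0.9027, which the text reports as 0.902.

```python
# Example 1 data: student population (1000s) and quarterly sales ($1000s)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Numerator and denominator of the b1 formula
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Coefficient of determination: share of variation in y explained by the line
fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
r_squared = 1 - sse / sst

prediction = b0 + b1 * 16   # campus with 16,000 students

print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")
print(f"R^2 = {r_squared:.4f}")
print(f"predicted sales at x = 16: {prediction:.1f} ($1000s)")
```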

Example 2: A department store has the following data on last year's annual sales (y) of
10 salespersons who have varying years of experience (x).

Salesperson    Years of       Annual sales
               experience     (Tk. 000s)
 1                  1              80
 2                  3              97
 3                  4              92
 4                  4             102
 5                  6             103
 6                  8             111
 7                 10             119
 8                 10             123
 9                 11             117
10                 13             136

Predict the annual sales volume of persons who have 15 years of sales experience.

Solution:

Salesperson     x       y      x²      xy
 1              1      80       1      80
 2              3      97       9     291
 3              4      92      16     368
 4              4     102      16     408
 5              6     103      36     618
 6              8     111      64     888
 7             10     119     100    1190
 8             10     123     100    1230
 9             11     117     121    1287
10             13     136     169    1768
Totals         70    1080     632    8128

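The estimated regression equation is

$\hat{y} = b_0 + b_1 x$.

We have, with n = 10,

$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \dfrac{8128 - \frac{(70)(1080)}{10}}{632 - \frac{(70)^2}{10}} = \dfrac{568}{142} = 4$ and $b_0 = \bar{y} - b_1 \bar{x} = 108 - 4(7) = 80$.

Thus the estimated regression line is $\hat{y} = 80 + 4x$.

Now the estimated sales for x = 15 is $\hat{y} = 80 + 4(15) = 140$ (Tk. 000s), that is, Tk. 140,000.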

Example 3: The following data relate to advertising expenditure (in lakhs of taka) and their
corresponding sales (in lakhs of taka):

Adv. Exp. : 14 16 18 20 24 30 32
Sales. : 52 62 65 70 76 80 78

Estimate the sales corresponding to advertising expenditure of Tk. 50 lakhs

Solution: Since sales depend on advertisement expenditure, let advertising expenditure be
denoted by x and sales by y.
Calculation for regression equations

            x      x²       y       y²       xy
           14     196      52     2704      728
           16     256      62     3844      992
           18     324      65     4225     1170
           20     400      70     4900     1400
           24     576      76     5776     1824
           30     900      80     6400     2400
           32    1024      78     6084     2496
Total     154    3676     483    33933    11010

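The estimated regression equation is

$\hat{y} = b_0 + b_1 x$.

We have, with n = 7,

$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \dfrac{11010 - \frac{(154)(483)}{7}}{3676 - \frac{(154)^2}{7}} = \dfrac{384}{288} = 1.33$ and $b_0 = \bar{y} - b_1 \bar{x} = 69 - 1.3333(22) = 39.67$.

Thus the estimated regression line is $\hat{y} = 39.67 + 1.33x$.

Now the estimated sales for x = 50 is $\hat{y} = 39.667 + 1.3333(50) = 106.33$ lakhs, carrying the
unrounded coefficients through the calculation.

Thus, the likely sales corresponding to advertising expenditure of Tk. 50 lakhs is
Tk. 106.33 lakhs.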
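
The prediction in Example 3 is sensitive to how the coefficients are rounded. The sketch below is an illustration added here, not part of the original solution. It uses Python's standard `fractions` module to carry exact values, and the variable names are assumptions. It uses the algebraically equivalent form $b_1 = (n\sum xy - \sum x \sum y)/(n\sum x^2 - (\sum x)^2)$ so that all intermediate values stay whole numbers.

```python
from fractions import Fraction

# Example 3 data: advertising expenditure and sales (lakhs of taka)
adv = [14, 16, 18, 20, 24, 30, 32]
sales = [52, 62, 65, 70, 76, 80, 78]
n = len(adv)

sum_x, sum_y = sum(adv), sum(sales)
sum_xy = sum(a * s for a, s in zip(adv, sales))
sum_x2 = sum(a * a for a in adv)

# Exact slope and intercept as fractions
b1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
b0 = Fraction(sum_y, n) - b1 * Fraction(sum_x, n)

exact = b0 + b1 * 50                                            # unrounded coefficients
rounded = round(float(b0), 2) + round(float(b1), 2) * 50        # 2-decimal coefficients

print("b1 =", b1, " b0 =", b0)
print("prediction, exact coefficients:  ", float(exact))
print("prediction, rounded coefficients:", rounded)
```

The exact coefficients are 4/3 and 119/3, giving about 106.33 lakhs, whereas plugging in 39.67 and 1.33 gives about 106.17. Note also that x = 50 lies well outside the observed range of 14 to 32, so the estimate is an extrapolation.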

Multiple Linear Regression

In order to predict the dependent variable as accurately as possible, it is usually necessary
to include multiple independent variables in the model.

The method of estimating the average change in the value of the dependent variable per unit
change in the value of each of two or more independent variables is known as multiple
regression. An equation describing the relationship of a dependent variable with two
or more independent variables is called a multiple regression equation.

For example, a multiple-regression analysis might reveal a positive relationship between
demand for sunglasses and various demographic characteristics (age, income) of the buyers;
that is, demand varies directly with changes in their characteristics. Multiple regression
thereby helps marketers to identify their best prospects.

If a dependent variable Y depends linearly on two independent variables $X_1$ and $X_2$, then the
population multiple regression equation of Y on $X_1$ and $X_2$ is given by

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

where $\beta_0$, $\beta_1$ and $\beta_2$ are unknown parameters, and $\beta_1$ and $\beta_2$ are called the
partial regression coefficients of Y on $X_1$ and Y on $X_2$ respectively.

The sample multiple regression equation of Y on $X_1$ and $X_2$ is given by

$Y = b_0 + b_1 X_1 + b_2 X_2$

where $b_0$, $b_1$ and $b_2$ are estimates of $\beta_0$, $\beta_1$ and $\beta_2$ respectively.

The general form of the linear multiple regression function for k independent variables
$X_1, X_2, \dots, X_k$ is $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$.

Let us try further to understand the necessity of considering multiple regression with an
example. If income (Y) is correlated with a single independent variable, education ($X_1$), and
a correlation of $r_{xy} = 0.5$ is found, this will mean that $r^2 = 0.25$ and that $X_1$ explains
only 25 percent of the total variation in Y. In order to explain the remaining 75 percent of the
variation, we might add additional variables to the analysis. We can add a variable such as age
to the analysis and label this variable as $X_2$. The resulting observed regression function is of
the form:

$Y = b_0 + b_1 X_1 + b_2 X_2 + e$

The function above is the simplest form of a multiple regression function and the analysis
involved here is known as multiple regression analysis.

Example:

Multiple linear regression model,

$\text{wage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + u$,

where exper is years of labor market experience. Thus, wage is determined by the two
explanatory or independent variables, education and experience, and by other unobserved
factors, which are contained in u.

Some multiple linear regression models:

• $\text{wage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{ability} + u$
• $\text{wage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{ability} + u$
• $\text{price} = \beta_0 + \beta_1\,\text{sqrft} + \beta_2\,\text{bdrms} + u$
• $\text{profit} = \beta_0 + \beta_1\,\text{sales} + \beta_2\,\text{adv. exp.} + u$,
  where profit is determined by the two explanatory or independent variables, sales and
  advertising expenditure, and by other unobserved factors, which are contained in u.

Example 4: A marketing manager might attempt to predict profit for a given level of sales
and advertising expenditures. The following data relate to sales, advertising expenditure (in
lakhs of taka) and their corresponding profit (in lakhs of taka):

Profit (y)      :  40   50   50   70   65   65   80
Sales (x1)      : 100  200  300  400  500  600  700
Adv. exp. (x2)  :  10   20   10   30   20   20   30

Compute the estimated profit for $x_1 = 900$ and $x_2 = 50$.

Solution:

If the dependent variable profit (y) depends linearly on two independent variables, sales ($x_1$)
and advertising expenditure ($x_2$), then the sample multiple regression equation of y on $x_1$
and $x_2$ is given by

$y = b_0 + b_1 x_1 + b_2 x_2$

From the given data (n = 7) we get,

$\sum y = 420$, $\sum x_1 = 2800$, $\sum x_2 = 140$

$ss(x_1) = \sum x_1^2 - \dfrac{(\sum x_1)^2}{n} = 280000$, $sp(x_1 y) = \sum x_1 y - \dfrac{(\sum x_1)(\sum y)}{n} = 16500$

$ss(x_2) = \sum x_2^2 - \dfrac{(\sum x_2)^2}{n} = 400$, $sp(x_2 y) = \sum x_2 y - \dfrac{(\sum x_2)(\sum y)}{n} = 600$

$ss(y) = \sum y^2 - \dfrac{(\sum y)^2}{n} = 1150$, $sp(x_1 x_2) = \sum x_1 x_2 - \dfrac{(\sum x_1)(\sum x_2)}{n} = 7000$

Now $b_1 = \dfrac{ss(x_2)\,sp(x_1 y) - sp(x_1 x_2)\,sp(x_2 y)}{ss(x_1)\,ss(x_2) - [sp(x_1 x_2)]^2} = 0.0381$

Again $b_2 = \dfrac{ss(x_1)\,sp(x_2 y) - sp(x_1 x_2)\,sp(x_1 y)}{ss(x_1)\,ss(x_2) - [sp(x_1 x_2)]^2} = 0.833$

and $b_0 = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2 = 28.1$

The estimated regression equation is

$\hat{y} = 28.1 + 0.038 x_1 + 0.833 x_2$

Therefore, for $x_1 = 900$ and $x_2 = 50$, the estimated profit is
$\hat{y} = 28.1 + 0.038 \times 900 + 0.833 \times 50 = 103.95$ lakhs.

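Now, $R^2 = \dfrac{RSS}{ss(y)} = \dfrac{b_1\,sp(x_1 y) + b_2\,sp(x_2 y)}{ss(y)} = 0.98$

and $F = \dfrac{RSS/(k-1)}{SSR/(n-k)} = \dfrac{RSS/(3-1)}{[ss(y) - RSS]/(7-3)} = 104.729^{**}$ with (2, 4) d.f.,

where RSS is the regression sum of squares, SSR = ss(y) - RSS is the residual sum of squares,
and k = 3 is the number of estimated coefficients.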

Now, the tabulated value of F with (2, 4) d.f. at the 1% level of significance is 18.0. As the
calculated value of F is greater than the table value, the $R^2$ value is significant at the
1% level.

Thus, our result says that the combined effect of sales and advertising expenditure
contributes significantly to the variation in profit described by the multiple regression
equation. It has been estimated that $R^2 = 0.98$, which indicates that 98% of the variation in
profit is explained by sales and advertising expenditure.
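
The calculations in Example 4 can be retraced with the Python sketch below. The helper `sp` and the variable names are assumptions introduced for illustration; the formulas are the ss/sp expressions used in the solution. Because the script keeps unrounded coefficients, it gives F of about 105.3 and a predicted profit of about 104.05 lakhs, slightly different from the 104.729 and 103.95 obtained in the text with coefficients rounded to three significant figures.

```python
# Example 4 data (lakhs of taka)
y = [40, 50, 50, 70, 65, 65, 80]            # profit
x1 = [100, 200, 300, 400, 500, 600, 700]    # sales
x2 = [10, 20, 10, 30, 20, 20, 30]           # advertising expenditure
n = len(y)


def sp(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n (ss when u is v)."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


ss_x1, ss_x2, ss_y = sp(x1, x1), sp(x2, x2), sp(y, y)
sp_x1y, sp_x2y, sp_x1x2 = sp(x1, y), sp(x2, y), sp(x1, x2)

# Partial regression coefficients and intercept
denom = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (ss_x2 * sp_x1y - sp_x1x2 * sp_x2y) / denom
b2 = (ss_x1 * sp_x2y - sp_x1x2 * sp_x1y) / denom
b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n

# Regression sum of squares (RSS in the text's notation), R^2 and F
rss = b1 * sp_x1y + b2 * sp_x2y
r_squared = rss / ss_y
k = 3                                        # number of estimated coefficients
f_stat = (rss / (k - 1)) / ((ss_y - rss) / (n - k))

y_hat = b0 + b1 * 900 + b2 * 50              # estimated profit for x1 = 900, x2 = 50

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"R^2 = {r_squared:.3f}, F = {f_stat:.2f}")
print(f"estimated profit = {y_hat:.2f} lakhs")
```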
