Regression Analysis
Prediction or estimation is one of the major problems in almost every sphere of human activity. Estimating or predicting future production, consumption, prices, investment, sales, profits, income, etc. is of great importance to business professionals.
Regression Model: A mathematical (or theoretical) equation that shows the linear relation between the explanatory variable and the response variable. The simple linear regression model is
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,
where $\varepsilon_i$ is called the error term, with mean 0 and variance $\sigma^2$.
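The role of the error term can be made concrete with a small simulation. This is a minimal sketch that assumes illustrative values $\beta_0 = 2$, $\beta_1 = 0.5$ and $\sigma = 1$, which are not taken from the text. It draws data from the model and shows that least-squares estimates from a large sample land close to the true parameters but do not match them exactly. The function names `simulate` and `least_squares` are hypothetical helpers.

```python
import numpy as np

# Illustrative (assumed) population parameters; not taken from the text.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

def simulate(n, seed=0):
    """Draw n observations from Y = beta0 + beta1*X + eps, eps ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 20, size=n)          # explanatory variable
    eps = rng.normal(0.0, SIGMA, size=n)    # error term: mean 0, variance sigma^2
    y = BETA0 + BETA1 * x + eps             # response variable
    return x, y

def least_squares(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

x, y = simulate(10_000)
b0, b1 = least_squares(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # close to 2 and 0.5, but not exact
```

Rerunning with a different `seed` changes the estimates slightly. The sample statistics vary from sample to sample, while the population parameters stay fixed.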
In simple linear regression, a single independent variable is used to predict the value of a dependent variable. The equation that describes how y is related to x and an error term is called the regression model. The regression model used in simple linear regression is
$y = \beta_0 + \beta_1 x + \varepsilon$.
In practice, the parameter values are not known and must be estimated from sample data. Sample statistics $b_0$ and $b_1$ are computed as estimates of the population parameters $\beta_0$ and $\beta_1$. The estimated regression equation for simple linear regression follows.
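Estimated simple linear regression equation:
$\hat{y} = b_0 + b_1 x \qquad (3)$
where
$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}, \qquad b_0 = \bar{y} - b_1 \bar{x},$
and
$\bar{x} = \dfrac{\sum x}{n}, \qquad \bar{y} = \dfrac{\sum y}{n}.$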
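The computational formulas above need only five quantities: $n$, $\sum x$, $\sum y$, $\sum x^2$ and $\sum xy$. The minimal Python sketch below turns these totals into $b_1$ and $b_0$; the function name `fit_from_sums` is a hypothetical choice. The check uses made-up points lying exactly on $y = 1 + 2x$, so the correct answer is known in advance.

```python
def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1) for y-hat = b0 + b1*x from the usual column totals."""
    sxy = sum_xy - sum_x * sum_y / n        # numerator of b1
    sxx = sum_x2 - sum_x ** 2 / n           # denominator of b1
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n         # b0 = y-bar - b1 * x-bar
    return b0, b1

# Made-up points on y = 1 + 2x, used only to check the formulas.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x for x in xs]
b0, b1 = fit_from_sums(len(xs), sum(xs), sum(ys),
                       sum(x * x for x in xs),
                       sum(x * y for x, y in zip(xs, ys)))
print(b0, b1)  # 1.0 2.0
```

Working from totals mirrors the hand-calculation tables in the examples that follow. In code, the deviation form $\sum (x - \bar{x})(y - \bar{y})$ is numerically safer when the values are large.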
Example 1:
Suppose a businessman named Arman owns one hundred food restaurants in our country. The most successful locations are near college/university campuses. The managers believe that quarterly sales for these restaurants (denoted by y, in thousands of dollars) are positively related to the size of the student population (denoted by x, in thousands of students). In other words, restaurants near campuses with a large student population tend to generate more sales than those near campuses with a small student population. Data were collected from a sample of 10 of Arman's restaurants located near college/university campuses to see how the dependent variable y is related to the independent variable x.
Estimate the regression equation of sales on student population. Also predict quarterly sales
for a restaurant to be located near a campus with 16,000 students.
Solution:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{140}{10} = 14$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{1300}{10} = 130$
Restaurant    x     x²      y      xy
1             2      4     58     116
2             6     36    105     630
3             8     64     88     704
4             8     64    118     944
5            12    144    117    1404
6            16    256    137    2192
7            20    400    157    3140
8            20    400    169    3380
9            22    484    149    3278
10           26    676    202    5252
Totals      140   2528   1300   21040
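We have
$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \dfrac{21040 - \frac{(140)(1300)}{10}}{2528 - \frac{(140)^2}{10}} = \dfrac{21040 - 18200}{2528 - 1960} = \dfrac{2840}{568} = 5$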
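Also, $b_0 = \bar{y} - b_1\bar{x} = 130 - 5(14) = 60$. Hence the estimated regression equation is
$\hat{y} = 60 + 5x$
and the predicted quarterly sales for a restaurant near a campus with 16,000 students (x = 16) are $\hat{y} = 60 + 5(16) = 140$, that is, 140 thousand dollars.
The coefficient of determination is
$R^2 = \dfrac{b_1\left[\sum xy - \frac{(\sum x)(\sum y)}{n}\right]}{\sum y^2 - \frac{(\sum y)^2}{n}} = \dfrac{5(2840)}{184730 - 169000} = \dfrac{14200}{15730} \approx 0.9027$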
The slope of the estimated regression equation ($b_1 = 5$) is positive, implying that as the student population increases, sales increase. More precisely, an increase of 1000 students is associated with an increase of \$5000 in expected quarterly sales; that is, quarterly sales are expected to increase by \$5 per student. The coefficient of determination is $R^2 \approx 0.9027$, which indicates that about 90.3% of the variation in sales is explained by the size of the student population.
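The hand calculation for Example 1 can be checked in plain Python. This sketch recomputes the column totals from the raw data, which confirms $\sum x^2 = 2528$ and $\sum xy = 21040$. It then computes the coefficients, the prediction at x = 16 and $R^2 = SSR/SST$. Here $SSR = b_1\left[\sum xy - (\sum x)(\sum y)/n\right]$ and $SST = \sum y^2 - (\sum y)^2/n$.

```python
# Example 1 data: x = student population (1000s), y = quarterly sales ($1000s)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_y2 = sum(v * v for v in y)

sxy = sum_xy - sum_x * sum_y / n   # corrected sum of cross-products
sxx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares of x
syy = sum_y2 - sum_y ** 2 / n      # total sum of squares, SST

b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n
y_hat_16 = b0 + b1 * 16            # campus with 16,000 students
r2 = b1 * sxy / syy                # SSR / SST

print(sum_x2, sum_xy)              # 2528 21040
print(b0, b1, y_hat_16)            # 60.0 5.0 140.0
print(round(r2, 4))                # 0.9027
```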
Example 2: A department store has the following statistics on last year's annual sales (y) of 10 salesmen who have varying years of experience (x).
Predict the annual sales volume of a salesperson with 15 years of sales experience.
Solution:
Salesperson    x      y     x²     xy
1              1     80      1     80
2              3     97      9    291
3              4     92     16    368
4              4    102     16    408
5              6    103     36    618
6              8    111     64    888
7             10    119    100   1190
8             10    123    100   1230
9             11    117    121   1287
10            13    136    169   1768
Totals        70   1080    632   8128
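We have
$b_1 = \dfrac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \dfrac{8128 - \frac{(70)(1080)}{10}}{632 - \frac{(70)^2}{10}} = \dfrac{8128 - 7560}{632 - 490} = \dfrac{568}{142} = 4$
and $b_0 = \bar{y} - b_1\bar{x} = 108 - 4(7) = 80$.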
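Thus the estimated regression line is $\hat{y} = 80 + 4x$. The predicted annual sales volume of a salesperson with 15 years of experience is $\hat{y} = 80 + 4(15) = 140$.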
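As a cross-check that does not rely on the hand-computed totals, NumPy's `polyfit` can fit the same line by least squares. It fits a polynomial of the requested degree, and with degree 1 it returns the coefficients highest degree first, i.e. `[b1, b0]`. This is a sketch for verification only.

```python
import numpy as np

# Example 2 data: x = years of sales experience, y = annual sales
x = np.array([1, 3, 4, 4, 6, 8, 10, 10, 11, 13], dtype=float)
y = np.array([80, 97, 92, 102, 103, 111, 119, 123, 117, 136], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)   # least-squares straight line
prediction = b0 + b1 * 15          # salesperson with 15 years of experience

print(round(b0, 6), round(b1, 6), round(prediction, 6))  # 80.0 4.0 140.0
```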
Example 3: The following data relate to advertising expenditure (in lakhs of taka) and the corresponding sales (in lakhs of taka):
Adv. exp.:  14  16  18  20  24  30  32
Sales:      52  62  65  70  76  80  78
Solution: Since sales depend on advertising expenditure, let advertising expenditure be denoted by x and sales by y.
Calculations for the regression equations
x      x²     y      y²      xy
14 196 52 2704 728
16 256 62 3844 992
18 324 65 4225 1170
20 400 70 4900 1400
24 576 76 5776 1824
30 900 80 6400 2400
32 1024 78 6084 2496
Total 154 3676 483 33933 11010
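The solution above stops at the table of totals, and the exercise statement does not say which regression line is wanted. Assuming both lines are required (the y² column suggests this), the totals give the following worked calculation. It uses the ss/sp notation of Example 4, and the intercepts are computed with unrounded slopes. It is a sketch of the standard method, not the author's own completion.

```latex
n = 7, \quad \bar{x} = \frac{154}{7} = 22, \quad \bar{y} = \frac{483}{7} = 69
sp(xy) = \sum xy - \frac{(\sum x)(\sum y)}{n} = 11010 - \frac{154 \times 483}{7} = 11010 - 10626 = 384
ss(x) = \sum x^2 - \frac{(\sum x)^2}{n} = 3676 - \frac{154^2}{7} = 3676 - 3388 = 288
ss(y) = \sum y^2 - \frac{(\sum y)^2}{n} = 33933 - \frac{483^2}{7} = 33933 - 33327 = 606

% Regression of sales (y) on advertising expenditure (x)
b_{yx} = \frac{sp(xy)}{ss(x)} = \frac{384}{288} \approx 1.333, \qquad
\hat{y} = \bar{y} + b_{yx}(x - \bar{x}) \approx 39.67 + 1.333x

% Regression of advertising expenditure (x) on sales (y)
b_{xy} = \frac{sp(xy)}{ss(y)} = \frac{384}{606} \approx 0.634, \qquad
\hat{x} = \bar{x} + b_{xy}(y - \bar{y}) \approx -21.72 + 0.634y

% Consistency check: r^2 = b_{yx} b_{xy}
r = \frac{sp(xy)}{\sqrt{ss(x)\, ss(y)}} = \frac{384}{\sqrt{288 \times 606}} \approx 0.919
```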
Multiple Linear Regression
The method of estimating the average change in the dependent variable per unit change in each of two or more independent variables is known as multiple regression. An equation describing the relationship of a dependent variable with two or more independent variables is called a multiple regression equation.
If a dependent variable Y depends linearly on two independent variables $X_1$ and $X_2$, then the population multiple regression equation of Y on $X_1$ and $X_2$ is given by
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
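where $\beta_0$, $\beta_1$ and $\beta_2$ are unknown parameters. $\beta_1$ and $\beta_2$ are called the partial regression coefficients of Y on $X_1$ and of Y on $X_2$, respectively. The corresponding estimated (sample) regression equation is
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$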
The general form of the linear multiple regression function for k independent variables $X_1, X_2, \ldots, X_k$ is
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
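For k independent variables, the coefficients $b_0, b_1, \ldots, b_k$ are again chosen by least squares. A minimal NumPy sketch prepends a column of ones to carry $b_0$ and calls `numpy.linalg.lstsq`. The helper name `fit_multiple` is hypothetical. The check uses made-up data lying exactly on $y = 1 + 2X_1 - 3X_2$.

```python
import numpy as np

def fit_multiple(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y-hat = b0 + b1*X1 + ... + bk*Xk.

    X is an (n, k) array whose columns are the k independent variables.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # column of ones carries b0
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Made-up data lying exactly on y = 1 + 2*X1 - 3*X2 (for checking only).
X = np.array([[0, 0], [1, 0], [0, 1], [2, 1], [3, 2]], dtype=float)
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
print(np.round(fit_multiple(X, y), 6))
```

Applied to the sales, advertising and profit data of Example 4, the same function gives $b_0 \approx 28.1$, $b_1 \approx 0.038$ and $b_2 \approx 0.833$, in agreement with the hand calculation there.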
Let us further illustrate the need for multiple regression with an example. Suppose income (Y) is correlated with a single independent variable, education ($X_1$), and a correlation of $r_{xy} = 0.5$ is found. Then $r^2 = 0.25$, meaning that $X_1$ explains only 25 percent of the total variation in Y. To explain the remaining 75 percent of the variation, we might add more variables to the analysis. For example, we can add age to the analysis and label this variable $X_2$. The resulting observed regression function has the form
$Y = b_0 + b_1 X_1 + b_2 X_2 + e$
where e is the residual.
This is the simplest form of a multiple regression function, and the analysis involved is known as multiple regression analysis.
Example: Consider the model
$wage = \beta_0 + \beta_1\, educ + \beta_2\, exper + u$,
where educ is years of education and exper is years of labor market experience. Thus, wage is determined by the two explanatory (independent) variables, education and experience, and by other unobserved factors, which are contained in u.
Example 4: A marketing manager might attempt to predict profit for given levels of sales and advertising expenditure. The following data relate to sales, advertising expenditure (in lakhs of taka) and the corresponding profit (in lakhs of taka):
Profit ($y$):          40   50   50   70   65   65   80
Sales ($x_1$):        100  200  300  400  500  600  700
Adv. exp. ($x_2$):     10   20   10   30   20   20   30
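Solution: Here n = 7, $\bar{x}_1 = 400$, $\bar{x}_2 = 20$ and $\bar{y} = 60$. The corrected sums of squares and products are
$ss(x_1) = \sum x_1^2 - \frac{(\sum x_1)^2}{n} = 280000, \qquad sp(x_1 y) = \sum x_1 y - \frac{(\sum x_1)(\sum y)}{n} = 16500,$
$ss(x_2) = \sum x_2^2 - \frac{(\sum x_2)^2}{n} = 400, \qquad sp(x_2 y) = \sum x_2 y - \frac{(\sum x_2)(\sum y)}{n} = 600,$
$ss(y) = \sum y^2 - \frac{(\sum y)^2}{n} = 1150, \qquad sp(x_1 x_2) = \sum x_1 x_2 - \frac{(\sum x_1)(\sum x_2)}{n} = 7000.$
Solving the normal equations gives
$b_1 = \dfrac{sp(x_1 y)\,ss(x_2) - sp(x_2 y)\,sp(x_1 x_2)}{ss(x_1)\,ss(x_2) - [sp(x_1 x_2)]^2} = \dfrac{16500(400) - 600(7000)}{280000(400) - 7000^2} \approx 0.038,$
$b_2 = \dfrac{sp(x_2 y)\,ss(x_1) - sp(x_1 y)\,sp(x_1 x_2)}{ss(x_1)\,ss(x_2) - [sp(x_1 x_2)]^2} = \dfrac{600(280000) - 16500(7000)}{280000(400) - 7000^2} \approx 0.833,$
and $b_0 = \bar{y} - b_1\bar{x}_1 - b_2\bar{x}_2 \approx 28.1$.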
Therefore, for sales of 900 and advertising expenditure of 50, the estimated profit is $\hat{y} = 28.1 + 0.038(900) + 0.833(50) = 103.95$ lakhs of taka.
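The significance of $R^2$ is tested with
$F = \dfrac{RSS/(k-1)}{[ss(y) - RSS]/(n-k)} = \dfrac{RSS/(3-1)}{[ss(y) - RSS]/(7-3)} = 104.729^{**}$ with (2, 4) d.f.,
where $RSS = b_1\,sp(x_1 y) + b_2\,sp(x_2 y)$ is the regression sum of squares, $ss(y) - RSS$ is the residual sum of squares, and k = 3 is the number of estimated coefficients. This value uses the coefficients rounded to $b_1 = 0.0381$ and $b_2 = 0.833$; the unrounded coefficients give F ≈ 105.3.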
Now, the tabulated value of F with (2, 4) d.f. at the 1% level of significance is 18.0. Since the calculated value of F is greater than the tabulated value, the $R^2$ value is significant at the 1% level.
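The Example 4 arithmetic can be re-run in plain Python. This sketch computes the corrected sums of squares and products using a hypothetical helper `sp`. It then solves the two normal equations in deviation form by Cramer's rule and evaluates $R^2$, the F statistic and the prediction at sales = 900 and advertising = 50. Because it keeps full precision, its F and $\hat{y}$ differ slightly from hand-calculated figures that use rounded coefficients.

```python
# Example 4 data (lakhs of taka): profit y, sales x1, advertising expenditure x2
y  = [40, 50, 50, 70, 65, 65, 80]
x1 = [100, 200, 300, 400, 500, 600, 700]
x2 = [10, 20, 10, 30, 20, 20, 30]
n, k = len(y), 3                     # k = number of estimated coefficients

def mean(v):
    return sum(v) / len(v)

def sp(u, v):
    """Corrected sum of products: sum((u - u_bar) * (v - v_bar))."""
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))

ss1, ss2, ssy = sp(x1, x1), sp(x2, x2), sp(y, y)   # 280000, 400, 1150
s1y, s2y, s12 = sp(x1, y), sp(x2, y), sp(x1, x2)  # 16500, 600, 7000

# Solve the two normal equations in deviation form (Cramer's rule).
det = ss1 * ss2 - s12 ** 2
b1 = (s1y * ss2 - s2y * s12) / det
b2 = (s2y * ss1 - s1y * s12) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

reg_ss = b1 * s1y + b2 * s2y          # regression sum of squares
res_ss = ssy - reg_ss                 # residual sum of squares
r2 = reg_ss / ssy
f_stat = (reg_ss / (k - 1)) / (res_ss / (n - k))
profit_hat = b0 + b1 * 900 + b2 * 50  # sales = 900, advertising = 50

print(round(b0, 3), round(b1, 4), round(b2, 4))             # 28.095 0.0381 0.8333
print(round(r2, 4), round(f_stat, 2), round(profit_hat, 2))  # 0.9814 105.33 104.05
```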
Thus, our result shows that sales and advertising expenditure together contribute significantly to the variation in profit described by the multiple regression equation. It has been estimated that $R^2 = 0.98$, which indicates that 98% of the variation in profit