17 Regression Analysis


UNIT 5: REGRESSION ANALYSIS

INTRODUCTION TO REGRESSION:

Business executives and economists often have to make forecasts about the future.


Regression analysis is one of the most popular techniques of statistics used for
forecasting. In regression analysis we find the functional relationship between two sets
of data.

Regression analysis can be simple or multiple. In simple regression we find the
functional relationship between two variables, say X and Y, where the variable X
influences the value of Y. X is called the independent variable and Y the
dependent variable. Examples of simple regression analysis: an economist wants to find
the relationship between income and consumption; a doctor wants to know the
relationship between salt intake and rise in blood pressure; a municipal official wants
to gauge the impact of more buses on the roads on the city’s pollution level.
In multiple regression analysis the value of Y depends upon a number of
independent variables X1, X2, X3, ….., Xn.

Moreover, regression analysis can be linear or curvilinear. In linear
regression, the relationship between Y and X can be represented by a straight line, so
a unit change in X changes Y by a constant amount. In curvilinear regression the
relationship is represented by a curve, and the change in Y for a unit change in X is
not constant. In this chapter we shall discuss only simple linear regression.
However, it may be noted that the regression relation is irreversible, i.e., the
regression equation used to predict the value of Y from a given value of X cannot be used
to predict the value of X from a given value of Y. In other words, the regression relation
is an average, irreversible, functional relation.

Examples of Irreversible Relation

1. If the income of a family increases, the expenditure will increase, but if the
expenditure increases, there is no guarantee that the income will also increase.
2. If the rainfall is timely and good, the crop will be good, but if the crop is good, there
is no guarantee that the rainfall was timely and good.

REGRESSION EQUATIONS:
DEEPAK SHARMA
5.2 REGRESSION ANALYSIS [C.P.T.]

Regression equations are algebraic expressions of the regression lines. Since
there are two regression lines, there are two regression equations: the regression
equation of X on Y is used to describe the variation in the values of X for given changes
in Y, and the regression equation of Y on X is used to describe the variation in the values
of Y for given changes in X.

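Regression Equation of Y on X:

The regression equation of Y on X is expressed as follows:

Yc = a + bX

Regression Equation of X on Y:

The regression equation of X on Y is expressed as follows:

Xc = a + bY

Here Yc and Xc denote the computed (estimated) values of Y and X. The constants a and b
generally take different values in the two equations.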

METHODS OF REGRESSION ANALYSIS:

Graphic method:

Under this method the two variables are plotted on a graph. It is normal practice to
plot the dependent variable on the y-axis and the independent variable on the x-axis. The
plotted points form a scatter diagram, which is shown below.

Now we have to draw a line, called the line of regression, which passes through these
points with a roughly equal number of points on either side. The line should be drawn in
such a manner that all the points are as close as possible to it. This is achieved by
using the method of least squares, in which the sum of the squares of the differences
between the observed value of Y and the value of Y estimated from the line, taken over
every point, is minimised.

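[Figure: scatter diagram with origin O, X on the horizontal axis and Y on the vertical
axis, showing the plotted points and the regression line Y = a + bX]
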
(1) Least Squares Method

(a) Equation of Regression line of Y on X



Yc = a + bX

To determine the values of a and b, the following normal equations are to be solved
simultaneously:
∑Y = Na + b∑X
∑XY = a∑X + b∑X²
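
The two normal equations follow from the least-squares condition. A short derivation is sketched below; the symbol S for the sum of squared deviations is introduced here only for illustration and is not part of the original notes.

```latex
\begin{aligned}
S &= \sum (Y - a - bX)^2 \\
\frac{\partial S}{\partial a} &= -2\sum (Y - a - bX) = 0
  \;\Longrightarrow\; \sum Y = Na + b\sum X \\
\frac{\partial S}{\partial b} &= -2\sum X\,(Y - a - bX) = 0
  \;\Longrightarrow\; \sum XY = a\sum X + b\sum X^2
\end{aligned}
```

Interchanging the roles of X and Y in the same argument gives the normal equations for the line of X on Y.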

(b) Equation of Regression line of X on Y


Xc = a + bY

To determine the values of a and b, the following normal equations are to be solved
simultaneously:
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²
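
As an illustration, the normal equations can be solved numerically. The sketch below is a minimal Python example, not part of the original notes; the function name `fit_line` is hypothetical, and the data are those of Q.1 in the exercise. It solves the two equations by Cramer's rule, and obtains the line of X on Y simply by swapping the roles of the two variables.

```python
def fit_line(xs, ys):
    """Solve the normal equations for the line ys = a + b * xs.

        sum(y)  = n*a      + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx  # determinant of the 2x2 system
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for a
    b = (n * sxy - sx * sy) / det    # Cramer's rule for b
    return a, b


# Data of Q.1 in the exercise
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

a_yx, b_yx = fit_line(X, Y)  # Y on X: Yc = a_yx + b_yx * X
a_xy, b_xy = fit_line(Y, X)  # X on Y: Xc = a_xy + b_xy * Y
print(f"Y = {a_yx:.2f} + ({b_yx:.2f}) X")
print(f"X = {a_xy:.2f} + ({b_xy:.2f}) Y")
```

The two fitted lines agree with the answer quoted for Q.1 (Y = 11.9 – 0.65X and X = 16.4 – 1.3Y).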

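(2) Deviations Taken from the Arithmetic Means of X and Y

(a) Regression equation of Y on X:
$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$
(b) Regression equation of X on Y:
$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

(3) Regression coefficients in terms of r and the standard deviations:
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}; \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

(4) Short-cut Method:
$$b_{xy} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dy^2 - \left(\sum dy\right)^2}$$

(5) Short-cut Method:
$$b_{yx} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dx^2 - \left(\sum dx\right)^2}$$

(6) Direct Method:
$$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}$$

(7) Direct Method:
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$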
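
The direct and short-cut formulas give the same coefficients, because taking deviations from assumed means is only a change of origin. The Python sketch below checks this on the data of Q.1; the helper names `b_direct` and `b_shortcut` and the assumed means 5 and 9 are illustrative choices, not taken from the notes.

```python
def b_direct(xs, ys):
    """Regression coefficient of ys on xs by the direct formula:
    (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)."""
    n = len(xs)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    den = n * sum(x * x for x in xs) - sum(xs) ** 2
    return num / den


def b_shortcut(xs, ys, assumed_x, assumed_y):
    """Same coefficient from deviations dx = x - assumed_x, dy = y - assumed_y.
    The short-cut formula has the same shape as the direct one, applied to dx, dy."""
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    return b_direct(dx, dy)


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

byx = b_direct(X, Y)                 # formula (7)
bxy = b_direct(Y, X)                 # formula (6): roles swapped
byx_short = b_shortcut(X, Y, 5, 9)   # formula (5), assumed means 5 and 9
bxy_short = b_shortcut(Y, X, 9, 5)   # formula (4), same assumed means
print(byx, byx_short, bxy, bxy_short)
```

Any other choice of assumed means would give the same two coefficients; convenient round values simply keep the hand arithmetic small.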

EXERCISE :

Q.1 From the following data obtain the two regression equations :

X: 6 2 10 4 8
Y: 9 11 5 8 7
(Ans: Y = 11.9 – 0.65X; X = 16.4 – 1.3Y)

Q.2 The following table shows the ages (X) and blood pressure (Y) of 8 persons :
X: 52 63 45 36 72 65 47 25
Y: 62 53 51 25 79 43 60 33
Obtain the regression equation of Y on X and find the expected blood pressure of a
person who is 49 years old.
[Ans. Y = 11.87 + 0.768X; for X = 49, Y = 49.502]

Q.3 (i) Compute the two regression equations on the basis of the following
information:
                      X     Y
Mean                  40    45
Standard Deviation    10    9
Karl Pearson’s correlation coefficient between X and Y = 0.50
(ii) Also estimate the value of Y for X = 48 using the appropriate regression
equation.
[Ans: Y = 0.45X + 27; X = 0.556Y + 15; for X = 48, Y = 48.6]

Q.4 In a partially destroyed laboratory record of an analysis of correlation data, the
following results only are legible:
Variance of X = 9
Regression equations: 8X – 10Y + 66 = 0
                      40X – 18Y = 214
Find on the basis of the above information:
(i) The mean values of X and Y
(ii) Coefficient of correlation between X and Y, and
(iii) Standard deviation of Y.
[Ans. (i) X̄ = 13, Ȳ = 17 (ii) r = 0.6 (iii) σy = 4]

Q.5 You are given the following data:
                      X     Y
Arithmetic Mean       36    85
Standard Deviation    11    8
Correlation coefficient between X and Y = 0.66
(i) Find the two Regression Equations, and
(ii) Estimate the value of X when Y = 75.
[Ans: X = –41.1375 + 0.9075Y; Y = 67.72 + 0.48X; for Y = 75, X = 26.925]
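
Questions such as Q.3 and Q.5 give only the means, the standard deviations and r. A minimal Python sketch of that calculation follows, using formulas (2) and (3); the function name `lines_from_summary` is hypothetical and introduced here only for illustration.

```python
def lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for the lines
    Y = a_yx + b_yx * X   and   X = a_xy + b_xy * Y."""
    b_yx = r * sd_y / sd_x          # regression coefficient of Y on X
    b_xy = r * sd_x / sd_y          # regression coefficient of X on Y
    a_yx = mean_y - b_yx * mean_x   # both lines pass through the means
    a_xy = mean_x - b_xy * mean_y
    return (a_yx, b_yx), (a_xy, b_xy)


# Q.3: means 40 and 45, standard deviations 10 and 9, r = 0.50
(a_yx, b_yx), (a_xy, b_xy) = lines_from_summary(40, 45, 10, 9, 0.50)
y_at_48 = a_yx + b_yx * 48          # Y on X is the appropriate line for estimating Y

# Q.5: means 36 and 85, standard deviations 11 and 8, r = 0.66
(c_yx, d_yx), (c_xy, d_xy) = lines_from_summary(36, 85, 11, 8, 0.66)
x_at_75 = c_xy + d_xy * 75          # X on Y is the appropriate line for estimating X

print(y_at_48, x_at_75)
```

Note that each estimate uses the line whose dependent variable is the quantity being estimated, in keeping with the irreversibility of the regression relation.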

DIFFERENCE BETWEEN CORRELATION AND REGRESSION ANALYSIS:

1. Whereas the correlation coefficient is a measure of the degree of covariability between
X and Y, the objective of regression analysis is to study the ‘nature of relationship’
between the variables so that we may be able to predict the value of one on the basis of
the other. The closer the relationship between the two variables, the greater the
confidence that may be placed in the estimates.
2. Correlation is merely a tool for ascertaining the degree of relationship between two
variables and, therefore, we cannot say that one variable is the cause and the other the
effect. For example, a high degree of correlation between price and demand for a
certain commodity at a particular point of time may not suggest which is the cause
and which is the effect. However, in regression analysis one variable is taken as
dependent and the other as independent, thus making it possible to study the
cause and effect relationship. It should be noted that the presence of association
does not imply causation, but the existence of causation always implies association.
3. In correlation analysis rxy is a measure of the direction and degree of linear
relationship between two variables X and Y. rxy and ryx are symmetric (rxy = ryx), i.e.,
it is immaterial which of X and Y is the dependent variable and which is the independent
variable. In regression analysis the regression coefficients bxy and byx are not
symmetric, i.e., in general bxy ≠ byx, and hence it definitely makes a difference which
variable is dependent and which is independent.

4. Correlation coefficient is independent of change of scale and origin. Regression
coefficients are independent of change of origin but not of scale.
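
Point 4 can be checked numerically. The following Python sketch is illustrative only: the data are those of Q.1, and the transformations (a shift of origin, then u = 2x + 5 and v = 3y + 6) are arbitrary choices made for this example. Positive scale factors are used so that the sign of r is also preserved.

```python
import math


def slope(xs, ys):
    """Regression coefficient of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def corr(xs, ys):
    """Karl Pearson's correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

# Change of origin only: the regression coefficient is unchanged.
X_shift = [x - 100 for x in X]
Y_shift = [y + 7 for y in Y]

# Change of origin and scale: u = 2x + 5, v = 3y + 6.
U = [2 * x + 5 for x in X]
V = [3 * y + 6 for y in Y]

b_yx = slope(X, Y)
b_shift = slope(X_shift, Y_shift)   # equals b_yx
b_vu = slope(U, V)                  # equals (3 / 2) * b_yx
print(b_yx, b_shift, b_vu, corr(X, Y), corr(U, V))
```

The coefficient of v on u is the coefficient of y on x multiplied by the ratio of the two scale factors, while r is the same before and after the transformation.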

IT'S YOUR TURN:


1. Who propounded the theory of regression in statistics?
(a) Sir Francis Galton (b) Prof. M. M. Blair
(c) Karl Pearson (d) F. W. Taylor.

2. Regression analysis is concerned with


(a) Establishing a mathematical relationship between two variables
(b) Measuring the extent of association between two variables
(c) Predicting the value of the dependent variable for a given value of the independent
variable
(d) Both (a) and (c).

3. What are the limits of the two regression coefficients?
(a) No limit (b) Must be positive
(c) One positive and the other negative
(d) Product of the regression coefficients must be numerically less than unity.

4. If one regression coefficient is positive and the other is negative, we can say that
(a) correlation coefficient is positive
(b) correlation coefficient is negative
(c) correlation coefficient is imaginary
(d) given information is defective.

5. Generally, there are two regression lines for two regression equations; there can be only
one when the correlation coefficient is equal to
(a) +1 (b) –1
(c) +1 or –1 (d) zero.

6. If both regression lines coincide completely, the correlation between the X and Y
variables shall be
(a) perfect (b) no correlation
(c) significant (d) insignificant.

7. When r = ±1, the two regression lines
(a) coincide (b) are perpendicular to each other
(c) are parallel to each other (d) none of these.

8. The value of correlation coefficient is equal to _________ of regression coefficients.


(a) sum (b) multiplication
(c) arithmetic mean (d) geometric mean.

9. If one regression coefficient is greater than unity, the other must be



(a) equal to unity (b) less than unity


(c) more than unity (d) none of these

10. If both regression coefficients are 2 and 0.45 respectively, the value of r will be
(a) 0.90 (b) 0.30
(c) 0.95 (d) 0.03.

11. If the value of correlation coefficient is Zero, the regression lines


(a) cover each other (b) make an angle of 45°
(c) are parallel to each other (d) are perpendicular to each other.

12. If the correlation coefficient between X and Y is 0.50, what percentage of the total
variations remains unexplained by the regression equation?
(a) .25 (b) .50
(c) .75 (d) 1

13. The regression coefficients remain unchanged due to a


(a) Shift of origin (b) Shift of scale
(c) Both (a) and (b) (d) (a) or (b).

14. The regression lines cut each other
(a) at the point of X̄ only (b) at the point of Ȳ only
(c) at the point (X̄, Ȳ) (d) at any point.

15. Following are the two normal equations obtained for deriving the regression line of y on
x:
5a + 10b = 40
10a + 25b = 95
The regression line of y on x is given by
(a) 2x + 3y = 5 (b) 2y + 3x = 5 (c) y = 2 + 3x (d) y = 3 + 5x

16. If the regression line of y on x and of x on y are given by 2x + 3y = –1 and 5x + 6y = –1


then the arithmetic means of x and y are given by
(a) (1, –1) (b) (–1, 1) (c) (–1, –1) (d) (2, 3)

17. Given the regression equations as 3x + y = 13 and 2x + 5y = 20, which one is the
regression equation of y on x?
(a) 1st equation (b) 2nd equation
(c) both (a) and (b) (d) none of these

18. If regression coefficients bxy and byx are –0.4 and +1.6 respectively, the coefficient of
correlation should be
(a) 1 (b) –0.8
(c) +0.4 (d) None of the above.

19. If bxy = –0.4 and byx = –1.6, the coefficient of correlation shall be
(a) +0.08 (b) –0.08
(c) –0.8 (d) +0.8.

20. For a bivariate data the mean value of X is 20, and that of Y is 45.The regression coefficient
of Y on X is 4 and that of X on Y is 1/9. From these data find out the coefficient of correlation,
(a) 0.35 (b) 0.47
(c) 0.67 (d) 0.87

21. If bxy = 0.5, r = 0.8 and the variance of Y is 16, then σx is


(a) 2.5 (b) .642
(c) .10 (d) 25.6.

22. If the regression equation of Y on X is 6Y – 2X = 6, the value of byx shall be


(a) 1/2 (b) 1/6
(c) 2 (d) 1/3

23. Given the following equations: 2x – 3y = 10 and 3x + 4y = 15, which one is the regression
equation of x on y ?
(a) 1st equation (b) 2nd equation
(c) both the equations (d) none of these

24. If u = 2x + 5 and v = –3y – 6 and regression coefficient of y on x is 2.4, what is the


regression coefficient of v on u?
(a) 3.6 (b) –3.6 (c) 2.4 (d) –2.4

25. If 4y – 5x = 15 is the regression line of y on x and the coefficient of correlation between x


and y is 0.75, what is the value of the regression coefficient of x on y?
(a) 0.45 (b) 0.9375 (c) 0.6 (d) none of these

26. If the regression line of y on x and that of x on y are given by y = –2x + 3 and 8x = –y + 3
respectively, what is the coefficient of correlation between x and y?
(a) 0.5 (b) –1/√2 (c) –0.5 (d) none of these

27. Regression equations of Y on X and X on Y are 6Y = 5X + 90 and 15X = 8Y + 130
respectively. The mean of Y shall be
(a) 80 (b) 40 (c) 50 (d) 60.

28. The correlation co-efficient between x and y is 0.6 hence the correlation co-efficient
between x + 0.2 and y + 0.2 is
(a) 0.4 (b) 0.6 (c) –0.6 (d) 0.9.

29. If two regression lines are 3x = y and 4y = 3x and Sx = 2, then r =
(a) 0.05 (b) –0.05 (c) 0.5 (d) 0

30. If two regression lines are 3x = y and 4y = 3x and Sx = 2, then Sy =
(a) 2 (b) –2 (c) 3 (d) 5

31. If the regression coefficient of y on x, the coefficient of correlation between x and y and
variance of y are –3/4, –√3/2 and 4 respectively, what is the variance of x?
(a) 2/(√3/2) (b) 16/3 (c) 4/3 (d) 4

32. If y = 3x + 4 is the regression line of y on x and the arithmetic mean of x is –1, what is the
arithmetic mean of y?
(a) 1 (b) –1 (c) 7 (d) none of these

33. The correlation co-efficient between x and y is 0.87, hence the correlation co-efficient
between –x and –y is
(a) –0.87 (b) 0.87 (c) either (a) or (b) (d) none of them.

34. Estimate the loss of production in a week when the number of workers on strike is 1800
from the following information:
Average number of workers on strike = 800; Average loss of daily production in ‘000 Rs. = 35;
Standard deviation of numbers of workers on strike = 100; Standard deviation of loss of daily
production in ‘000 Rs. = 2; Coefficient of correlation between the number of workers and the
daily production loss = 0.8 assuming 6 working days in a week.
(a) 3,00,000 (b) 3,05,000
(c) 3,06,000 (d) 3,07,000

35. Given the following data:


Variable: x y
Mean: 80 98
Variance: 4 9
Coefficient of correlation = 0.6
What is the most likely value of y when x = 90 ?
(a) 90 (b) 103 (c) 104 (d) 107

36. The two lines of regression are given by 8x + 10y = 25 and 16x + 5y = 12 respectively. If
the variance of x is 25, what is the standard deviation of y?
(a) 16 (b) 8 (c) 64 (d) 4

37. Given below the information about the capital employed and profit earned by a company
over the last twenty five years:
Mean SD
Capital employed ( 0000 Rs) 62 5
Profit earned ( 000 Rs) 25 6
Correlation Coefficient between capital and profit = 0.92. The sum of the Regression
coefficients for the above data would be:
(a) 1.871 (b) 2.358 (c) 1.968 (d) 2.346

38. If the two lines of regression in a bivariate distribution are X + 9Y = 7 and Y + 4X = 16,
then Sx : Sy is
(a) 3:2 (b) 2:3 (c) 9:4 (d) 4:9.

39. If the relationship between two variables x and u is u + 3x = 10 and between two other
variables y and v is 2y + 5v = 25, and the regression coefficient of y on x is known to be 0.80,
what would be the regression coefficient of v on u?
(a) 0.1067 (b) 2.3 (c) 0.945 (d) 0.498.

ANSWERS:

1. (a) 2. (d) 3. (d) 4. (d) 5. (c) 6. (a) 7. (a)
8. (d) 9. (b) 10. (c) 11. (d) 12. (c) 13. (a) 14. (c)
15. (c) 16. (a) 17. (b) 18. (d) 19. (c) 20. (c) 21. (a)
22. (d) 23. (d) 24. (b) 25. (a) 26. (c) 27. (b) 28. (b)
29. (c) 30. (c) 31. (b) 32. (a) 33. (b) 34. (c) 35. (d)
36. (b) 37. (a) 38. (a) 39. (a)
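
Several of the questions above (for example 17 and 23, and Q.4 of the exercise) give two line equations without saying which is the line of y on x. One way to decide is to try both assignments and keep the one whose regression coefficients have the same sign and a product not exceeding 1. The Python sketch below illustrates this; the function name `analyse` and the (A, B, C) representation of a line A·x + B·y = C are assumptions made for this example, and all coefficients are assumed to be non-zero.

```python
import math


def analyse(line1, line2):
    """Each line is a tuple (A, B, C) meaning A*x + B*y = C.

    Returns (index of the y-on-x line, b_yx, b_xy, r, (mean_x, mean_y)),
    or None when the lines do not meet in a single point or when neither
    assignment gives admissible regression coefficients.
    """
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel or coincident lines: no unique intersection
    # The regression lines intersect at the point of means (Cramer's rule).
    mean_x = (c1 * b2 - c2 * b1) / det
    mean_y = (a1 * c2 - a2 * c1) / det
    for index, (y_on_x, x_on_y) in enumerate(((line1, line2), (line2, line1)), start=1):
        b_yx = -y_on_x[0] / y_on_x[1]   # solve the line for y: slope in x
        b_xy = -x_on_y[1] / x_on_y[0]   # solve the line for x: slope in y
        product = b_yx * b_xy           # equals r squared when the assignment is right
        if 0 < product <= 1:
            r = math.copysign(math.sqrt(product), b_yx)  # r takes the sign of the coefficients
            return index, b_yx, b_xy, r, (mean_x, mean_y)
    return None


print(analyse((3, 1, 13), (2, 5, 20)))         # question 17
print(analyse((2, -3, 10), (3, 4, 15)))        # question 23
print(analyse((8, -10, -66), (40, -18, 214)))  # Q.4 of the exercise
```

For question 17 the second equation is identified as the line of y on x; for question 23 neither assignment is admissible; and for Q.4 the sketch recovers the means 13 and 17 and r = 0.6.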
