Computer Oriented Statistical Techniques: Time: 2 HRS.) (Marks: 75
IV
Computer Oriented Statistical Techniques
Time : 2½ Hrs.] Prelim Question Paper Solution [Marks : 75
Vidyalankar : S.Y. B.Sc. (IT) COST
To calculate Q1, we need the value $\frac{N}{4}$, i.e. $\frac{466}{4} = 116.5$.
Since this value lies between the cumulative frequencies 96 and 172, we select the 40-50 class interval for the computation of Q1.
Formula to compute,
$$Q_1 = L_1 + \frac{\frac{N}{4} - cf}{f} \times c$$
Where, L1 : lower limit of selected class interval
cf : cumulative frequency of the class preceding the selected class
f : frequency of selected class interval
c : class size
Therefore,
$$Q_1 = 40 + \frac{\frac{466}{4} - 96}{76} \times 10 = 40 + \frac{20.5}{76} \times 10 \approx 42.70$$
To calculate Q3, we need the value $\frac{3N}{4}$, i.e. $\frac{3 \times 466}{4} = 349.5$.
Since this value lies between the cumulative frequencies 263 and 364, we select the 60-70 class interval for the computation of Q3.
Formula to compute,
$$Q_3 = L_1 + \frac{\frac{3N}{4} - cf}{f} \times c$$
Where, L1 : lower limit of selected class interval
cf : cumulative frequency of the class preceding the selected class
f : frequency of selected class interval
c : class size
Therefore,
$$Q_3 = 60 + \frac{349.5 - 263}{101} \times 10 = 68.56$$
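The two quartile computations follow the same interpolation formula, so they can be checked with one small helper. This is only a sketch: the function name `grouped_quantile` and its parameter names are illustrative and not part of the paper.

```python
def grouped_quantile(position, lower_limit, cf_before, freq, class_size):
    """Interpolate inside the selected class: L1 + ((position - cf) / f) * c."""
    return lower_limit + (position - cf_before) / freq * class_size

N = 466
# Q1 falls in the 40-50 class (cf before = 96, f = 76)
q1 = grouped_quantile(N / 4, 40, 96, 76, 10)
# Q3 falls in the 60-70 class (cf before = 263, f = 101)
q3 = grouped_quantile(3 * N / 4, 60, 263, 101, 10)
print(round(q1, 2), round(q3, 2))
```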
Q.1(d) Calculate the standard deviation of the heights of 10 students given as, [5]
Height (in cms): 161 162 160 163 160 163 164 164 170 164
Ans.: First we will find the mean of the given data,
$$\bar{X} = \frac{161 + 162 + 160 + 163 + 160 + 163 + 164 + 164 + 170 + 164}{10} = 163.1$$
Standard deviation,
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{10}} = \sqrt{\frac{(161 - 163.1)^2 + (162 - 163.1)^2 + \dots + (164 - 163.1)^2}{10}} = \sqrt{\frac{74.9}{10}} \approx 2.7368$$
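A short check of this computation, assuming (as the answer does) that the divisor is N = 10, i.e. the population form of the standard deviation, which is what `statistics.pstdev` computes:

```python
from statistics import mean, pstdev

heights = [161, 162, 160, 163, 160, 163, 164, 164, 170, 164]

x_bar = mean(heights)   # arithmetic mean
s = pstdev(heights)     # population standard deviation (divides by N)
print(x_bar, round(s, 4))
```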
Q.1(e) Find the quartile deviation for the following data. [5]
Marks, x 5 10 15 20 25 30
No. of Student, f 2 3 8 7 6 4
Ans.:
Marks (x)   No. of students (f)   Cumulative frequency
5 2 2
10 3 5
15 8 13
20 7 20
25 6 26
30 4 30
To find Q1 we need the $\left(\frac{N + 1}{4}\right)^{\text{th}}$ observation, i.e. the 7.75th observation. The first cumulative frequency greater than 7.75 is 13, therefore the marks corresponding to it give Q1 = 15.
Now to find Q3, we need the $3\left(\frac{N + 1}{4}\right)^{\text{th}}$ observation, i.e. the 3 × 7.75 = 23.25th observation. The first cumulative frequency greater than 23.25 is 26, therefore the marks corresponding to it give Q3 = 25.
Therefore,
$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{25 - 15}{2} = 5$$
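The lookup in the cumulative frequency column can be sketched as follows. The helper name `value_at` is illustrative; it returns the marks of the first class whose cumulative frequency reaches the requested position.

```python
from bisect import bisect_left
from itertools import accumulate

marks = [5, 10, 15, 20, 25, 30]
freqs = [2, 3, 8, 7, 6, 4]
cum = list(accumulate(freqs))   # cumulative frequencies
n = cum[-1]                     # total number of students

def value_at(position):
    """Marks of the first class whose cumulative frequency reaches `position`."""
    return marks[bisect_left(cum, position)]

q1 = value_at((n + 1) / 4)
q3 = value_at(3 * (n + 1) / 4)
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)
```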
Q.1(f) Define Factors and Data Frames in ‘R’. How to create them in ‘R’? [5]
Ans.: Factors are data objects used to categorize data and store it as levels.
They can store both strings and integers. They are useful for columns that have a
limited number of unique values, such as "Male", "Female" or TRUE, FALSE. They are useful in
data analysis for statistical modelling.
Factors are created using the factor() function, which takes a vector as input.
Example : data = c("East", "West", "south", "North")
factdata = factor(data)
print(factdata)
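$$m_r = \frac{\sum_{j=1}^{N} (X_j - \bar{X})^r}{N} = \frac{\sum (X - \bar{X})^r}{N} = \overline{(X - \bar{X})^r}$$
The rth moment about any origin A (raw moments) is defined as
$$m'_r = \frac{\sum_{j=1}^{N} (X_j - A)^r}{N} = \frac{\sum (X - A)^r}{N} = \overline{(X - A)^r}$$
The relation between raw moments and central moments is given by:
$$m_2 = m'_2 - {m'_1}^2$$
$$m_3 = m'_3 - 3m'_1 m'_2 + 2{m'_1}^3$$
$$m_4 = m'_4 - 4m'_1 m'_3 + 6{m'_1}^2 m'_2 - 3{m'_1}^4$$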
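These relations can be checked numerically on any data set. The sketch below uses made-up sample values and an arbitrary origin A = 4; both are assumptions chosen only for illustration, and the helper name `moment` is not from the paper.

```python
def moment(data, r, origin):
    """r-th moment of `data` about `origin`."""
    return sum((x - origin) ** r for x in data) / len(data)

data = [2, 3, 5, 7, 4, 8, 1]   # illustrative values only
A = 4                          # arbitrary origin for the raw moments
x_bar = sum(data) / len(data)

# raw moments m'_1 .. m'_4 about A
m1p, m2p, m3p, m4p = (moment(data, r, A) for r in (1, 2, 3, 4))
# central moments m_2 .. m_4 about the mean
m2, m3, m4 = (moment(data, r, x_bar) for r in (2, 3, 4))

print(abs(m2 - (m2p - m1p**2)) < 1e-9)
print(abs(m3 - (m3p - 3*m1p*m2p + 2*m1p**3)) < 1e-9)
print(abs(m4 - (m4p - 4*m1p*m3p + 6*m1p**2*m2p - 3*m1p**4)) < 1e-9)
```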
Q.2(b) Define Skewness. Compute Coefficient of Skewness for the following observations [5]
2, 3, 5, 7, 4, 8, 1.
Ans.: Skewness is the degree of asymmetry of a distribution. If the frequency curve of a
distribution has a longer tail to the right of the central maximum than to the left, the
distribution is said to be skewed to the right, or to have positive skewness.
If the reverse is true, it is said to be skewed to the left, or to have negative skewness.
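$$\text{Pearson's 2nd coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
For the given observations, Mean = 30/7 ≈ 4.286 and Median = 4 (the middle value of 1, 2, 3, 4, 5, 7, 8).
$$\text{Standard deviation} = \sqrt{\frac{\sum (X - \bar{X})^2}{7}} \approx 2.37$$
$$\text{Coefficient of skewness} = \frac{3(4.286 - 4)}{2.37} \approx 0.36$$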
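A minimal check of the coefficient without intermediate rounding, assuming the population standard deviation (divisor N = 7):

```python
from statistics import mean, median, pstdev

obs = [2, 3, 5, 7, 4, 8, 1]

# Pearson's second coefficient: 3 * (mean - median) / standard deviation
skewness = 3 * (mean(obs) - median(obs)) / pstdev(obs)
print(round(skewness, 4))
```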
Q.2(c) A random variable X has the following probability distribution values of X [5]
X      0   1   2    3    4    5    6     7
P(X)   0   k   2k   2k   3k   k²   2k²   7k² + k
Find (i) k (ii) P(X < 6) (iii) P(X ≥ 6) (iv) P(0 < X < 5)
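Ans.: We know that $\sum P(x) = 1$.
Here, adding all the probabilities, we get
$$10k^2 + 9k = 1 \;\Rightarrow\; (10k - 1)(k + 1) = 0 \;\Rightarrow\; k = \frac{1}{10} \text{ or } k = -1$$
but k = −1 is not possible since a probability cannot be negative, so 0 < k < 1.
(i) $k = \frac{1}{10}$
(ii) P(X < 6) = P(X = 0) + P(X = 1) + ... + P(X = 5)
$$= 0 + \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} + \frac{1}{100} = \frac{81}{100}$$
(iii) P(X ≥ 6) = P(X = 6) + P(X = 7)
$$= \frac{2}{100} + \frac{17}{100} = \frac{19}{100}$$
(iv) P(0 < X < 5) = P(X = 1) + ... + P(X = 4)
$$= \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} = \frac{8}{10}$$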
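The four answers can be verified with exact fractions. This is a small sketch; the list `pmf` simply encodes the probability row of the table for k = 1/10.

```python
from fractions import Fraction

k = Fraction(1, 10)
# P(X = 0), P(X = 1), ..., P(X = 7) from the table
pmf = [0, k, 2*k, 2*k, 3*k, k**2, 2*k**2, 7*k**2 + k]

total = sum(pmf)        # must equal 1 for a valid distribution
p_lt_6 = sum(pmf[:6])   # P(X < 6)
p_ge_6 = sum(pmf[6:])   # P(X >= 6)
p_mid = sum(pmf[1:5])   # P(0 < X < 5)
print(total, p_lt_6, p_ge_6, p_mid)
```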
Q.2(d) The data from a survey of 140 students showed that 37 study Music, 103 play a [5]
sport and 25 do neither. Create a Venn diagram to illustrate the data collected
and then determine the probability that if a student is selected at random : (i)
He or she will study music, (ii) He or she will study music given that he or she
play sport.
Ans.: Let M represent the set of students who study music and S represent the set of students
who play sports. First let’s determine the number of students that study music and play a
sport to fill in the overlapping region in the diagram and then we can find the other values.
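Since 25 students do neither, n(M ∪ S) = 140 − 25 = 115.
n(M) + n(S) − n(M ∩ S) = n(M ∪ S)
37 + 103 − n(M ∩ S) = 115
n(M ∩ S) = 25
So the Venn diagram has 12 students in music only, 25 in both, 78 in sport only and 25 outside both sets.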
(i) The probability that a randomly selected student studies music is the number of
students who study music divided by the total numbers of students surveyed.
$$P(M) = \frac{n(M)}{140} = \frac{37}{140}$$
(ii) The probability that a randomly selected student will study music given that he/she
plays a sport is
$$P(M \mid S) = \frac{n(M \cap S)}{n(S)} = \frac{25}{103}$$
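The counts and both probabilities can be reproduced with a few lines of exact arithmetic; the variable names below are illustrative.

```python
from fractions import Fraction

total, music, sport, neither = 140, 37, 103, 25

union = total - neither        # students in at least one of the two sets
both = music + sport - union   # inclusion-exclusion gives the overlap
music_only = music - both
sport_only = sport - both

p_music = Fraction(music, total)             # P(M)
p_music_given_sport = Fraction(both, sport)  # P(M | S)
print(music_only, both, sport_only, p_music, p_music_given_sport)
```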
Q.2(e) It has been found that 2% of the tools produced by a certain machine are [5]
defective. What is the probability that in a shipment of 400 such tools :
(i) 3% or more, (ii) 2% or less, will prove defective?
Ans.: $\mu_P = p = 0.02$ and $\sigma_P = \sqrt{\frac{pq}{N}} = \sqrt{\frac{(0.02)(0.98)}{400}} = 0.007$
(i) Using the correction for discrete variables,
$$\frac{1}{2N} = \frac{1}{800} = 0.00125$$
We have,
$$(0.03 - 0.00125) \text{ in standard units} = \frac{0.03 - 0.00125 - 0.02}{0.007} = 1.25$$
Required probability = (area under normal curve to right of z = 1.25)
= 0.1056
(ii)
$$(0.02 + 0.00125) \text{ in standard units} = \frac{0.02 + 0.00125 - 0.02}{0.007} \approx 0.18$$
Required probability = (area under normal curve to left of z = 0.18)
= 0.50 + 0.0714 = 0.5714
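The same normal approximation can be sketched with `statistics.NormalDist` from the Python standard library. Because the code does not round z to two decimals before the table lookup, the second probability may differ from the tabulated value in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.02, 400
sigma_p = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
correction = 1 / (2 * n)          # correction for a discrete variable

z_hi = (0.03 - correction - p) / sigma_p
z_lo = (0.02 + correction - p) / sigma_p

std_normal = NormalDist()
p_3pct_or_more = 1 - std_normal.cdf(z_hi)   # area to the right of z_hi
p_2pct_or_less = std_normal.cdf(z_lo)       # area to the left of z_lo
print(round(z_hi, 2), round(z_lo, 2))
```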
Q.2(f) The electric light bulbs of manufacturer A have a mean lifetime of 2800 hours [5]
with a standard deviation of 400 hours. While those of manufacturer B have a
mean lifetime of 2400 hours with standard deviation of 200 hours. If random
samples of 250 bulbs of each brand are tested, what is the probability that the
brand A bulbs will have a mean lifetime that is at least, (i) 320 hours, (ii) 500
hours more than the brand B bulbs?
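Ans.: Let $\bar{X}_A$ and $\bar{X}_B$ denote the mean lifetimes of samples A and B respectively, then
$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_{\bar{X}_A} - \mu_{\bar{X}_B} = 2800 - 2400 = 400 \text{ hr}$$
$$\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma_A^2}{N_A} + \frac{\sigma_B^2}{N_B}} = \sqrt{\frac{400^2}{250} + \frac{200^2}{250}} = 28.28 \text{ hr}$$
$$z = \frac{(\bar{X}_A - \bar{X}_B) - \mu_{\bar{X}_A - \bar{X}_B}}{\sigma_{\bar{X}_A - \bar{X}_B}} = \frac{(\bar{X}_A - \bar{X}_B) - 400}{28.28}$$
(i) The difference 320 hrs in standard units is
$$\frac{320 - 400}{28.28} = -2.8288$$
Thus, Required probability
= (area under normal curve to right of z = −2.8288)
= 0.5 + 0.4977 = 0.9977
(ii) The difference 500 hrs in standard units is
$$\frac{500 - 400}{28.28} = 3.53$$
Thus, Required probability
= (area under normal curve to the right of z = 3.53)
= 0.5 − 0.4998 = 0.0002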
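A sketch of the same calculation, modelling the difference of the two sample means as a normal variable with the mean and standard error derived from the question's data:

```python
from math import sqrt
from statistics import NormalDist

mu_diff = 2800 - 2400                           # mean of (mean A - mean B)
sigma_diff = sqrt(400**2 / 250 + 200**2 / 250)  # standard error of the difference

diff = NormalDist(mu_diff, sigma_diff)
p_at_least_320 = 1 - diff.cdf(320)   # area to the right of 320 hours
p_at_least_500 = 1 - diff.cdf(500)   # area to the right of 500 hours
print(round(sigma_diff, 2))
```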
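(ii) The 99% confidence limits are $\bar{X} \pm 2.58\frac{\sigma}{\sqrt{N}}$, so the error will be less than 0.01
$$\text{if } 2.58\,\frac{0.05}{\sqrt{N}} < 0.01 \;\Rightarrow\; \sqrt{N} > \frac{2.58 \times 0.05}{0.01} = 12.9 \;\Rightarrow\; N > 166.4$$
Thus we can be 99% confident that the error of the estimate will be less than 0.01
seconds if N is 167 or larger.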
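The required sample size can be checked directly. The sketch assumes σ = 0.05 seconds, the value that appears in the inequality above.

```python
from math import ceil

z_99, sigma, max_error = 2.58, 0.05, 0.01

# the error z * sigma / sqrt(N) is below max_error once N exceeds this bound
n_bound = (z_99 * sigma / max_error) ** 2
n_min = ceil(n_bound)
print(n_min)
```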
Q.3(b) A measurement was recorded as 216.480 grams (g) with a probable error of [5]
0.272 g. What are the 95% confidence limits for the measurement?
Ans.: The probable error is 0.272.
Now, $0.272 = 0.6745\,\sigma_{\bar{X}}$
$$\sigma_{\bar{X}} = \frac{0.272}{0.6745} = 0.4033$$
Thus the 95% confidence limits are $\bar{X} \pm 1.96\,\sigma_{\bar{X}}$
= 216.480 ± 1.96 (0.4033)
= 216.480 ± 0.79
The confidence limits are (215.69, 217.27)
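A brief check of the conversion from probable error to standard error and of the resulting limits:

```python
measurement = 216.480
probable_error = 0.272

# probable error = 0.6745 * standard error
std_error = probable_error / 0.6745
half_width = 1.96 * std_error   # 95% half-width

lower, upper = measurement - half_width, measurement + half_width
print(round(lower, 2), round(upper, 2))
```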
Q.3(c) A sample poll of 100 voters chosen at random from all voters in a given district [5]
indicated that 55% of them were in favor of a particular candidate. Find the (a)
95%, (b) 99%, and (c) 99.73% confidence limits for the proportion of all the
voters in favor of this candidate.
Ans.: Here, p = 0.55, q = 0.45
(i) The 95% confidence limits for the population are
$$p \pm 1.96\,\sigma_p = p \pm 1.96\sqrt{\frac{pq}{N}} = 0.55 \pm 1.96\sqrt{\frac{(0.55)(0.45)}{100}} = 0.55 \pm 0.10$$
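The same formula applies to all three confidence levels asked for in the question. The sketch below assumes the usual critical values 1.96, 2.58 and 3 for the 95%, 99% and 99.73% levels.

```python
from math import sqrt

p, n = 0.55, 100
sigma_p = sqrt(p * (1 - p) / n)   # standard error of the proportion

# assumed critical values for each confidence level
z_values = {"95%": 1.96, "99%": 2.58, "99.73%": 3.0}
limits = {level: (p - z * sigma_p, p + z * sigma_p)
          for level, z in z_values.items()}

for level, (lo, hi) in limits.items():
    print(level, round(lo, 2), round(hi, 2))
```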
Q.3(d) Explain Type I and Type II errors and Level of Significance. [5]
Ans.: Type – I error and Type – II error:
When we reject a hypothesis that should be accepted, we say that a Type – I error has
been made. When we accept a hypothesis that should be rejected, we say that a Type – II
error has been made.
In order for decision rules (or tests of hypotheses) to be good, they must be designed so as
to minimize error of decision. This is not a simple matter, because for any given sample size,
an attempt to decrease one type of error is generally accompanied by an increase in the
other type of error. In practice, one type of error may be more serious than the other and
so a compromise should be reached in favour of limiting the more serious error. The only
way to reduce both types of error is to increase the sample size, which may or may not be
possible.
Level of Significance :
The maximum probability with which we are ready to risk a Type – I error is called the level of
significance or significance level.
This probability is denoted by α and is specified before a sample is drawn.
A significance level of 0.05 (5%) or 0.01 (1%) is common.
If the significance level is 5%, there are about 5 chances in 100 that we would reject a
hypothesis when it should actually be accepted. This means we are about 95% confident that the
decision taken is right.
Q.3(e) The breaking strengths of cables produced by a manufacturer have a mean of [5]
1800 pounds (lb) and a standard deviation of 100 lb. By a new technique in the
manufacturing process, it is claimed that the breaking strengths can be
increased. To test this claim, a sample of 50 cables is tested and it is found
that the mean breaking strength is 1850 lb. Can we support the claim at the
0.01 significance level?
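Ans.: Let,
H0 : μ = 1800 lb, i.e. there is no change in breaking strength
H1 : μ > 1800 lb, i.e. the breaking strength has increased
This is a one-tailed test: at the 0.01 level of significance we reject H0 if Z > 2.33.
σ = 100, N = 50, $\bar{X}$ = 1850, L.O.S. = 0.01
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} = \frac{1850 - 1800}{100 / \sqrt{50}} = 3.5355 > 2.33$$
Therefore we reject H0: the claim that the new technique increases the breaking strength is supported at the 0.01 level.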
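The test statistic and the one-tailed critical value can be reproduced as follows. This is a sketch; `NormalDist().inv_cdf` returns the critical value that the tables round to 2.33.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, x_bar, alpha = 1800, 100, 50, 1850, 0.01

z = (x_bar - mu_0) / (sigma / sqrt(n))    # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
reject_h0 = z > z_crit
print(round(z, 4), round(z_crit, 2), reject_h0)
```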
Q.3(f) Two groups, A and B, consist of 100 people each who have a disease. A serum [5]
is given to group A but not to group B (which is called the control); otherwise,
the two groups are treated identically. It is found that in groups A and B, 75
and 65 people, respectively, recover from the disease. At significance levels of
(a) 0.01, (b) 0.05, and (c) 0.10, test the hypothesis that the serum helps cure
the disease. Compute the p-value and show that p-value > 0.01, p-value > 0.05,
but p-value < 0.10.
Ans.: Let, H0 : p1 = p2
H1 : p1 > p2
One tailed test (L.O.S. = 0.01, table value = 2.33)
N1 = 100, P1 = 75/100 = 0.75; N2 = 100, P2 = 65/100 = 0.65
$$p = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2} = \frac{(100)(0.75) + (100)(0.65)}{100 + 100} = 0.7$$
q = 1 – p = 0.3
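The written answer stops at the pooled proportion. A hedged sketch of how the remaining steps would usually go, using the standard two-proportion z test with the pooled estimate (an assumption about the intended method, since the rest of the solution is not shown):

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 100, 100
p1, p2 = 75 / 100, 65 / 100

p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion
q_pool = 1 - p_pool
std_error = sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))

z = (p1 - p2) / std_error
p_value = 1 - NormalDist().cdf(z)   # one-tailed p-value for H1: p1 > p2
print(round(z, 2), round(p_value, 4))
```

Comparing `p_value` with 0.01, 0.05 and 0.10 then gives the decision at each significance level.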
Q.4(b) Pumpkins were grown under two experimental conditions. Two random samples of [5]
11 and 9 pumpkins show the sample standard deviations of their weights as 0.8
and 0.5 respectively. Assuming that the weight distributions are normal, test the
hypothesis that the true variances are equal, against the alternative that they
are not, at the 10% level. [Assume that P(F10,8 ≥ 3.35) = 0.05 and P(F8,10 ≥
3.07) = 0.05.]
Ans.: For the two samples 1 and 2
we have, N1 = 11, N2 = 9
S1 = 0.8, S2 = 0.5
Q.4(c) The standard deviation of the heights of 16 male students chosen at random in a [5]
school of 1000 male students is 2.40 in. Find the (i) 95% and (ii) 99% confidence
limits of the standard deviation for all male students at the school.
Ans.: (i) The 95% confidence limits are given by
$$\frac{S\sqrt{N}}{\chi_{0.975}} \text{ and } \frac{S\sqrt{N}}{\chi_{0.025}}$$
From table, $\chi^2_{0.975} = 27.5$ corresponding to ν = 15, so $\chi_{0.975} = 5.24$
Also, $\chi^2_{0.025} = 6.26$ corresponding to ν = 15, so $\chi_{0.025} = 2.50$
The 95% confidence limits are $\frac{2.40\sqrt{16}}{5.24}$ and $\frac{2.40\sqrt{16}}{2.50}$
The 95% confidence limits are 1.83 and 3.84
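A small check of these limits. The chi-square percentiles 27.5 and 6.26 for ν = 15 are taken from the table values quoted in the answer, not computed.

```python
from math import sqrt

s, n = 2.40, 16
chi2_upper, chi2_lower = 27.5, 6.26   # table values for 15 degrees of freedom

lower = s * sqrt(n) / sqrt(chi2_upper)
upper = s * sqrt(n) / sqrt(chi2_lower)
print(round(lower, 2), round(upper, 2))
```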
Q.4(d) Calculate the chi-square value for the following data. [5]
Colour               Red   Green   Yellow
Observed Frequency   12    16      20
Expected Frequency   16    8       15
Ans.: With observed frequencies o and expected frequencies e taken from the table above,
$$\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} + \frac{(o_3 - e_3)^2}{e_3}$$
$$= \frac{(12 - 16)^2}{16} + \frac{(16 - 8)^2}{8} + \frac{(20 - 15)^2}{15} = 10.667$$
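The statistic is a sum of one term per category, which is easy to sketch in code. The function name `chi_square` is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [12, 16, 20]   # Red, Green, Yellow
expected = [16, 8, 15]

chi2 = chi_square(observed, expected)
print(round(chi2, 3))
```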
Q.4(e) Acme Toy Company prints baseball cards. The company claims that 30% of the [5]
cards are rookies, 60% veterans but not All-Stars, and 10% are veteran All-Stars.
Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-
Stars. Is this consistent with Acme’s claim? Use a 0.05 level of significance.
(Use chi-square goodness of fit).
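Given P(χ² > 19.58) = 0.0001
Ans.: Given data can be tabulated as
                     Rookies   Veterans   All-Stars
Observed Frequency   50        45         5
Expected Frequency   30        60         10
$$\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} + \frac{(o_3 - e_3)^2}{e_3}$$
$$= \frac{(50 - 30)^2}{30} + \frac{(45 - 60)^2}{60} + \frac{(5 - 10)^2}{10} = 19.58$$
With 3 categories there are 3 − 1 = 2 degrees of freedom, and at the 0.05 level of significance $\chi^2_{0.95} = 5.99$.
Since 19.58 > 5.99 (equivalently, the given p-value 0.0001 < 0.05),
Acme's claim should be rejected.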
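A sketch of the goodness-of-fit computation. The expected counts come from Acme's claimed percentages applied to the sample of 100 cards; the decision uses the p-value supplied in the question rather than a computed one.

```python
observed = [50, 45, 5]            # rookies, veterans, All-Stars
claimed_pct = [30, 60, 10]        # Acme's claim, in percent
n = sum(observed)

expected = [n * pct / 100 for pct in claimed_pct]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1   # categories minus one

given_p_value = 0.0001                   # P(chi2 > 19.58), from the question
reject_claim = given_p_value < 0.05
print(round(chi2, 2), degrees_of_freedom, reject_claim)
```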
Q.4(f) A survey of 320 families with 5 children each revealed the following distribution: [5]
Boys 5 4 3 2 1 0
Girls 0 1 2 3 4 5
No. of families 14 56 110 88 40 12
Is this result consistent with the hypothesis that male and female births are
equally probable?
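Ans.: Let p : probability of male birth
q : probability of female birth
$$(p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5$$
If $p = q = \frac{1}{2}$, then
$$P(\text{5 boys and 0 girls}) = \left(\frac{1}{2}\right)^5 = \frac{1}{32}$$
$$P(\text{4 boys and 1 girl}) = 5\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right) = \frac{5}{32}$$
$$P(\text{3 boys and 2 girls}) = 10\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^2 = \frac{10}{32}$$
$$P(\text{2 boys and 3 girls}) = 10\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^3 = \frac{10}{32}$$
$$P(\text{1 boy and 4 girls}) = 5\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)^4 = \frac{5}{32}$$
$$P(\text{0 boys and 5 girls}) = \left(\frac{1}{2}\right)^5 = \frac{1}{32}$$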
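The expected numbers of families with 5, 4, 3, 2, 1 and 0 boys are obtained by multiplying
the above probabilities by 320, and the results are 10, 50, 100, 100, 50 and 10.
Hence, using the observed frequencies 14, 56, 110, 88, 40 and 12 from the question,
$$\chi^2 = \frac{(14 - 10)^2}{10} + \frac{(56 - 50)^2}{50} + \frac{(110 - 100)^2}{100} + \frac{(88 - 100)^2}{100} + \frac{(40 - 50)^2}{50} + \frac{(12 - 10)^2}{10}$$
$$= 1.6 + 0.72 + 1 + 1.44 + 2 + 0.4 = 7.16$$
For ν = 6 − 1 = 5 degrees of freedom, $\chi^2_{0.95} = 11.1$. Since 7.16 < 11.1, we cannot reject the
hypothesis at the 0.05 level: the results are consistent with male and female births being
equally probable.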
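The expected frequencies and the chi-square statistic can be sketched from the binomial model with p = 1/2, taking the observed counts from the table in the question (14, 56, 110, 88, 40, 12). `math.comb` supplies the binomial coefficients.

```python
from math import comb

families = 320
children = 5
observed = [14, 56, 110, 88, 40, 12]   # families with 5, 4, 3, 2, 1, 0 boys

# expected counts under p = q = 1/2: 320 * C(5, boys) / 2^5
expected = [families * comb(children, boys) / 2**children
            for boys in range(children, -1, -1)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1
print(expected, round(chi2, 2), degrees_of_freedom)
```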
x y x² x³ x⁴ xy x²y
5 11 25 125 625 55 275
10 13 100 1000 10000 130 1300
15 16 225 3375 50625 240 3600
20 20 400 8000 160000 400 8000
25 28 625 15625 390625 700 17500
30 36 900 27000 810000 1080 32400
Total 105 124 2275 55125 1421875 2605 63075
Q.5(c) Calculate the Coefficient of Correlation between the Age and Blood pressure of [5]
given people from a colony.
Age in Years 60 65 70 40 45 50 55
Blood Pressure 145 160 160 125 140 140 145
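Ans.: Correlation coefficient,
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$= \frac{7(56575) - (385)(1015)}{\sqrt{\left[7(21875) - 385^2\right]\left[7(148075) - 1015^2\right]}} = \frac{5250}{\sqrt{(4900)(6300)}} = 0.9449$$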
x y x² y² xy
60 145 3600 21025 8700
65 160 4225 25600 10400
70 160 4900 25600 11200
40 125 1600 15625 5000
45 140 2025 19600 6300
50 140 2500 19600 7000
55 145 3025 21025 7975
Total 385 1015 21875 148075 56575
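The column totals and the coefficient can be recomputed from the raw data with the same formula. This is a sketch using only built-in functions.

```python
from math import sqrt

age = [60, 65, 70, 40, 45, 50, 55]
bp = [145, 160, 160, 125, 140, 140, 145]
n = len(age)

sum_x, sum_y = sum(age), sum(bp)
sum_xy = sum(x * y for x, y in zip(age, bp))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in bp)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
print(sum_xy, sum_x2, sum_y2, round(r, 4))
```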
Q.5(d) Given the following data estimate the linear trend equation. Find trend value and [5]
calculate the trend value of
Year 2010 2011 2012 2013 2014
No. of cars (in Thousand) 11 30 38 50 56
Ans.:
Year    X    Y     x = X - Xbar   y = Y - Ybar   x²   xy
2010    1    11    -2             -26            4    52
2011    2    30    -1             -7             1    7
2012    3    38    0              1              0    0
2013    4    50    1              13             1    13
2014    5    56    2              19             4    38
Total   15   185   0              0              10   110
To find the trend line we will use $y = \frac{\sum xy}{\sum x^2}\,x$
Where, $y = Y - \bar{Y}$ and $x = X - \bar{X}$
Here, $\bar{Y} = 37$ and $\bar{X} = 3$
Therefore, $y = \frac{\sum xy}{\sum x^2}\,x = \frac{110}{10}\,x$, i.e. y = 11x
But $y = Y - \bar{Y}$ and $x = X - \bar{X}$
(Y − 37) = 11(X − 3)
Y = 11X + 4 is the required trend line.
The trend values for 2010 to 2014 (X = 1, ..., 5) are 15, 26, 37, 48 and 59 thousand cars.
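The least-squares trend line can be checked by working with deviations from the means, as in the table. This is a sketch; years are coded X = 1 to 5 as in the answer.

```python
X = [1, 2, 3, 4, 5]            # coded years 2010..2014
Y = [11, 30, 38, 50, 56]       # cars, in thousands
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
dx = [x - x_bar for x in X]    # x = X - Xbar
dy = [y - y_bar for y in Y]    # y = Y - Ybar

slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
intercept = y_bar - slope * x_bar
trend = [intercept + slope * x for x in X]
print(slope, intercept, trend)
```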
Q.5(e) Find (i) σx, (ii) σy, (iii) V(x), (iv) V(y) and (v) cov(x, y) for the following data: [5]
x 1 2 3 5 4 3
y 2 4 5 5 3 1
Also verify r = σxy / (σx σy)
Ans.:
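x       y    (x - xbar)   (y - ybar)   xy   (x - xbar)²   (y - ybar)²
1       2    -2           -1.33        2    4             1.7778
2       4    -1           0.67         8    1             0.4444
3       5    0            1.67         15   0             2.7778
5       5    2            1.67         25   4             2.7778
4       3    1            -0.33        12   1             0.1111
3       1    0            -2.33        3    0             5.4444
Total   18   20   0       0            65   10            13.3333
Here $\bar{x} = 18/6 = 3$ and $\bar{y} = 20/6 = 3.3333$.
(i) $\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{10}{6}} = 1.2910$
(ii) $\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{N}} = \sqrt{\frac{13.3333}{6}} = 1.4907$
(iii) $V(x) = \sigma_x^2 = 1.6667$
(iv) $V(y) = \sigma_y^2 = 2.2222$
(v) $\text{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{65}{6} - (3)(3.3333) = 0.8333$
Now, $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{N \sigma_x \sigma_y} = \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{0.8333}{(1.2910)(1.4907)} = 0.4330$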
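The five quantities and the correlation coefficient can be recomputed without rounding the mean of y. The sketch uses divisor N throughout, as the answer does.

```python
from math import sqrt

x = [1, 2, 3, 5, 4, 3]
y = [2, 4, 5, 5, 3, 1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
var_x = sum((a - x_bar) ** 2 for a in x) / n
var_y = sum((b - y_bar) ** 2 for b in y) / n
sigma_x, sigma_y = sqrt(var_x), sqrt(var_y)

cov_xy = sum(a * b for a, b in zip(x, y)) / n - x_bar * y_bar
r = cov_xy / (sigma_x * sigma_y)
print(round(sigma_x, 4), round(sigma_y, 4), round(cov_xy, 4), round(r, 4))
```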
Q.5(f) The two regression lines between x and y are given below. Find $\bar{x}$, $\bar{y}$ and r. [5]
100y − 45x − 1400 = 0, 4y – 5x + 200 = 0
Ans.: Both regression lines pass through the point ($\bar{x}$, $\bar{y}$). Solving both the equations simultaneously,
we get $\bar{x} = 80$, $\bar{y} = 50$
Assume that the 1st equation gives the regression equation of y on x, which can be rewritten as
y = 0.45x + 14, so byx = 0.45
And let the 2nd equation be the regression equation of x on y, which can be rewritten as x = 0.8y + 40,
so bxy = 0.8
Now, $r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{0.45 \times 0.8} = 0.6$
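A short check of the intersection point and of r. The two lines are written as a·x + b·y = c and solved with Cramer's rule; the variable names are illustrative.

```python
from math import sqrt

# 100y - 45x - 1400 = 0  ->  -45x + 100y = 1400
# 4y - 5x + 200 = 0      ->   -5x +   4y = -200
a1, b1, c1 = -45, 100, 1400
a2, b2, c2 = -5, 4, -200

det = a1 * b2 - a2 * b1
x_mean = (c1 * b2 - c2 * b1) / det   # Cramer's rule for x
y_mean = (a1 * c2 - a2 * c1) / det   # Cramer's rule for y

b_yx = -a1 / b1   # slope of y on x from the first line
b_xy = -b2 / a2   # slope of x on y from the second line
r = sqrt(b_yx * b_xy)   # positive root, since both slopes are positive
print(x_mean, y_mean, b_yx, b_xy, round(r, 4))
```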