EM 521 Study Set 1 Solutions
15/03/2012
1. We have randomly selected 100 telephone numbers from METU Phone Book 1970 and
recorded the last digit of each. Suppose we are also given the last digits of 100 randomly
selected phone numbers from METU Phone Book 2000. Below are the box-plots for the two
data sets:
[Figure: box-plots of the two data sets; axis label: Data]
a. Analyzing the box-plots above, what comparison can you make on the mean, median, and
the variance of the two data sets?
Let X denote the 1970 data and Y the 2000 data.
Median of X = 3
Median of Y = 4.5
The distribution of X is right-skewed, so Mean of X > Median of X.
The distribution of Y is approximately symmetric, so Mean of Y ≈ Median of Y.
Std. dev. of X < Std. dev. of Y
b. From the three histograms given below, which two correspond to the data sets of METU
Phone Book 1970 and 2000 respectively?
[Figure 1, Figure 2, Figure 3: frequency histograms; horizontal axis from 0 to 8, vertical axis: Frequency]
c. Below are the descriptive statistics for the two data sets and for the aggregate data obtained
by combining them. What could explain the differences between the descriptive statistics
given below?
Descriptive Statistics:
Variable               Mean    Variance
METU Phone Book 1970   3.640      7.647
METU Phone Book 2000   4.500     10.293
Aggregate Data         4.070      9.111
In 1970, the telephone numbers whose last digit is higher than 6 fall outside the interquartile
range. A possible reason is that, since the population of METU was smaller in 1970 than in 2000,
numbers ending in higher digits were rarely in use. By 2000, as the population of METU grew, the
last digits came to be used more uniformly. Consequently, both the mean and the variance are higher
in 2000 than in 1970. The aggregate data Z combines one data set with a smaller sample mean and
variance with another data set whose sample mean and sample variance are larger. The mean of the
aggregate therefore lies between the two sample means, and here its variance also lies between the
two sample variances.
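The aggregate row of the table can be checked from the two summary rows alone. The sketch below assumes each data set has 100 observations (as stated in the problem) and that the reported variances use the n - 1 divisor; the function name `combine` is introduced here for illustration.

```python
def combine(n1, mean1, var1, n2, mean2, var2):
    """Mean and sample variance of two pooled samples, from their summaries.

    Assumes var1 and var2 are sample variances with an n - 1 divisor.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Total sum of squares = within-sample part + between-sample part.
    within = (n1 - 1) * var1 + (n2 - 1) * var2
    between = n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2
    return mean, (within + between) / (n - 1)


agg_mean, agg_var = combine(100, 3.640, 7.647, 100, 4.500, 10.293)
print(round(agg_mean, 3), round(agg_var, 3))
```

The between-sample term is what pushes the aggregate variance slightly above the simple average of the two variances.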
Stem-and-leaf display (N = 30, leaf unit = 0.01)
Depth  Stem  Leaves
    1    80  5
    1    81
    1    82
    1    83
    1    84
    1    85
    1    86
    3    87  22
    4    88  0
    4    89
    4    90
    4    91
    4    92
    4    93
    4    94
    5    95  5
    5    96
    7    97  03
   13    98  004477
  (4)    99  5788
   13   100  012355
    7   101  255
    4   102  669
    1   103
    1   104
    1   105  5
Mean= 9.8037, Median= 9.9750, Mode(s)= 8.72, 9.8, 9.84, 9.87, 9.98, 10.05, 10.15, 10.26
$\frac{40}{100} \times 30 = 12$
12th observation: 9.87
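The summary values can be reproduced from the stem-and-leaf display. The sketch below transcribes the non-empty stems by hand and assumes a leaf unit of 0.01, which is inferred from the listed modes and not stated in the display itself.

```python
from statistics import mean, median, multimode

# Stem -> leaves, transcribed from the display; leaf unit 0.01 is an assumption.
stem_leaf = {
    80: "5", 87: "22", 88: "0", 95: "5", 97: "03", 98: "004477",
    99: "5788", 100: "012355", 101: "255", 102: "669", 105: "5",
}

# Build the ordered data in integer hundredths, then convert once.
hundredths = sorted(
    10 * stem + int(leaf)
    for stem, leaves in stem_leaf.items()
    for leaf in leaves
)
values = [h / 100 for h in hundredths]

print(len(values))              # number of observations
print(round(mean(values), 4))   # sample mean
print(median(values))           # sample median
print(multimode(values))        # every value tied for highest frequency
print(values[12 - 1])           # 12th ordered observation
```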
d. Draw a box-and-whisker plot. Write the numerical values of the lines on the plot. Are
there any outliers?
[Figure: box-and-whisker plot; vertical axis: Remote Location, from 8.0 to 10.5]
3. The table contains 50 random samples of random digits, y = 0, 1, 2, ..., 9, where the
probabilities corresponding to the values of y are given by the formula p(y) = 1/10. Each
sample contains n = 6 measurements.
b. Calculate $s^2$ for the 300 digits. This should be close to the variance of y, $\sigma^2 = 8.25$.
Descriptive Statistics: C5
Variable     Q3  Maximum    IQR  Mode  N for Mode  Skewness  Kurtosis
C5        7.000    9.000  5.000     6          42     -0.12     -1.17
c. Calculate $\bar{y}$ for each of the 50 samples. Construct a relative frequency distribution for the
sample means to see how close they lie to the mean of $\mu = 4.5$. Calculate the mean and
standard deviation of the 50 means.
The mean of each sample is calculated and is available in the Excel file. The mean of the 50 means is
4.68 and the standard deviation of the 50 means is 1.173.
Variable     Q3  Maximum    IQR     Mode  N for Mode  Skewness  Kurtosis
C6        5.500    7.333  1.667  3.83333           4      0.09     -0.15
[Figure: Histogram of C6; horizontal axis: C6, from 2 to 7; vertical axis: Frequency]
To see the effect of sample size on the standard deviation of the sampling distribution of a
statistic, combine pairs of samples (moving down the columns of the table) to obtain 25
samples of n=12 measurements.
d. Calculate the mean for each sample.
The means and standard deviations of the paired samples are given in the Excel file.
e. Construct a relative frequency distribution for the 25 means. Compare this with the
distribution that is based on samples of n=6 digits.
[Figure: Histogram of C8; horizontal axis: C8, from 3.0 to 6.5; vertical axis: Frequency]
f. Calculate the mean and standard deviation of the 25 means. Compare the standard
deviation of this sampling distribution with the standard deviation of the sampling
distribution in c. What relationship would you expect to exist between the two standard
deviations?
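The mean of the 25 sample means is 4.68 and their standard deviation is 0.827. The mean is the same
as in part c, as expected. The standard deviation is smaller than in the 50-sample case. Since the
standard deviation of a sample mean is $\sigma/\sqrt{n}$, doubling n from 6 to 12 should shrink it by a
factor of about $1/\sqrt{2} \approx 0.707$; the observed ratio is 0.827/1.173, or about 0.705.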
Variable     Q3  Maximum    IQR           Mode  N for Mode  Skewness  Kurtosis
C8        5.083    6.333  0.542  4.58333, 4.75           3     -0.56      0.92
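The observed standard deviations of the sample means can be compared with their theoretical values. The sketch below computes the mean and variance of a uniformly distributed digit directly from p(y) = 1/10 and then the standard deviation of a sample mean for n = 6 and n = 12; the variable names are introduced here for illustration.

```python
from math import sqrt

# Mean and variance of a digit drawn uniformly from 0..9, p(y) = 1/10.
digits = range(10)
mu = sum(digits) / 10
sigma2 = sum((y - mu) ** 2 for y in digits) / 10

# Theoretical standard deviation of the sample mean: sigma / sqrt(n).
se6 = sqrt(sigma2 / 6)
se12 = sqrt(sigma2 / 12)

print(mu, sigma2)
print(round(se6, 3), round(se12, 3), round(se12 / se6, 3))
```

These theoretical values sit close to the sample figures reported for the 50 and 25 means.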
4. An industrial engineer is about to take on two projects, say A and B. He knows that project
A will end with success with probability 0.70. If he also knows that project B will end with
success with probability 0.60 and the probability of both projects ending in success is 0.50,
a) What is the probability that at least one of the projects will end in success?
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.70 + 0.60 - 0.50 = 0.80$
d) What is the probability that exactly one project will end in success?
$P[(A - B) \cup (B - A)] = P(A \cup B) - P(A \cap B) = 0.80 - 0.50 = 0.30$
e) What is the probability that none of the projects will end in success?
$P[(A \cup B)^c] = 1 - P(A \cup B) = 1 - 0.80 = 0.20$
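The three answers follow from the addition rule and the complement rule. A minimal sketch using exact fractions, so that no rounding enters the arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(70, 100)      # P(A)
p_b = Fraction(60, 100)      # P(B)
p_both = Fraction(50, 100)   # P(A and B)

p_at_least_one = p_a + p_b - p_both       # addition rule
p_exactly_one = p_at_least_one - p_both   # union minus intersection
p_none = 1 - p_at_least_one               # complement of the union

print(float(p_at_least_one), float(p_exactly_one), float(p_none))
```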
5. a) Find a formula for the probability distribution of the number of heads when a coin is
tossed four times.
$P(X = i) = C(4,i)\,(1/2)^{i}\,(1/2)^{4-i} = C(4,i)\,(1/2)^{4}$, for i = 0, 1, 2, 3, 4
Then, the probability mass function can be obtained as:
\[
p(x) =
\begin{cases}
1/16, & x = 0 \\
4/16, & x = 1 \\
6/16, & x = 2 \\
4/16, & x = 3 \\
1/16, & x = 4 \\
0, & \text{otherwise}
\end{cases}
\]
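\[
F(x) =
\begin{cases}
0, & x < 0 \\
1/16, & 0 \le x < 1 \\
5/16, & 1 \le x < 2 \\
11/16, & 2 \le x < 3 \\
15/16, & 3 \le x < 4 \\
1, & 4 \le x
\end{cases}
\]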
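The tabulated values can be generated from the binomial coefficients. The sketch below counts, out of the 16 equally likely outcomes of four tosses of a fair coin, how many give i heads, and accumulates the counts to obtain the numerators of the cumulative probabilities.

```python
from itertools import accumulate
from math import comb

# Number of outcomes (out of 2**4 = 16) with exactly i heads, i = 0..4.
counts = [comb(4, i) for i in range(5)]

# Running totals: numerators of P(X <= i) over the same denominator of 16.
cumulative = list(accumulate(counts))

for i, (c, total) in enumerate(zip(counts, cumulative)):
    print(f"x = {i}: p(x) = {c}/16, P(X <= x) = {total}/16")
```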
6. Recent research on concrete structures shows that the Poisson distribution can be used to
represent the occurrence of structural loads over time. Suppose that, on average, the time
between occurrences of loads is 0.5 year.
$\lambda = 1/0.5 = 2$ structural loads per year
b) What is the probability that more than 5 loads occur during a 2-year period?
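\[
P(X > 5) = 1 - P(X = 0) - P(X = 1) - P(X = 2) - P(X = 3) - P(X = 4) - P(X = 5)
\]
\[
= 1 - \frac{e^{-4} 4^{0}}{0!} - \frac{e^{-4} 4^{1}}{1!} - \frac{e^{-4} 4^{2}}{2!} - \frac{e^{-4} 4^{3}}{3!} - \frac{e^{-4} 4^{4}}{4!} - \frac{e^{-4} 4^{5}}{5!} = 0.215
\]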
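The tail probability can be evaluated directly from the Poisson probability mass function. A minimal sketch, assuming the count over two years is Poisson with mean 2 × 2 = 4; `poisson_pmf` is a helper name introduced here.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)


mean_two_years = 2 * 2  # rate of 2 loads per year over a 2-year period
p_more_than_5 = 1 - sum(poisson_pmf(k, mean_two_years) for k in range(6))
print(round(p_more_than_5, 3))
```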
c) How long must a time period be so that the probability of no loads occurring on that period
is at most 0.1?
Let Y: number of structural loads occurring in a t-year period
Y ~ Poisson (2t)
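We want $P(Y = 0) \le 0.1$.
\[
\begin{aligned}
P(Y = 0) = \frac{e^{-2t} (2t)^{0}}{0!} &\le 0.1 \\
e^{-2t} &\le 0.1 \\
-2t &\le \ln 0.1 \\
t &\ge -\frac{\ln 0.1}{2} = 1.151
\end{aligned}
\]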
t2
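The threshold on t can be confirmed numerically. The sketch below solves exp(-2t) = 0.1 for t and checks that the probability of no loads at that t equals 0.1; the variable names are illustrative.

```python
from math import exp, log

rate = 2          # loads per year
target = 0.1      # largest allowed probability of no loads

# P(Y = 0) = exp(-rate * t) <= target  is equivalent to  t >= -ln(target) / rate
t_min = -log(target) / rate
print(round(t_min, 3))

# At t_min the probability of no loads is the target; it shrinks for longer periods.
p_zero_at_t_min = exp(-rate * t_min)
```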
d) If it is known that at least 1 structural load will occur in the coming year, what is the
probability that at least 3 structural loads will occur?
Z: number of occurrences of structural loads in a year
Z ~ Poisson (2)
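\[
P(Z \ge 3 \mid Z \ge 1) = \frac{P(Z \ge 3)}{P(Z \ge 1)} = \frac{1 - P(Z = 0) - P(Z = 1) - P(Z = 2)}{1 - P(Z = 0)}
\]
\[
= \frac{1 - \dfrac{e^{-2} 2^{0}}{0!} - \dfrac{e^{-2} 2^{1}}{1!} - \dfrac{e^{-2} 2^{2}}{2!}}{1 - \dfrac{e^{-2} 2^{0}}{0!}} = \frac{0.323}{0.865} = 0.374
\]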
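The conditional probability is a ratio of two Poisson tail probabilities, since the event "at least 3" is contained in the event "at least 1". A minimal sketch with a helper `poisson_tail` introduced here for illustration:

```python
from math import exp, factorial


def poisson_tail(k_min, mean):
    """P(Z >= k_min) for a Poisson random variable with the given mean."""
    below = sum(exp(-mean) * mean ** k / factorial(k) for k in range(k_min))
    return 1 - below


p_at_least_3 = poisson_tail(3, 2)
p_at_least_1 = poisson_tail(1, 2)
p_conditional = p_at_least_3 / p_at_least_1
print(round(p_at_least_3, 3), round(p_at_least_1, 3), round(p_conditional, 3))
```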
7. The time X (in minutes) for a lab assistant to prepare the equipment for a certain lab
experiment is assumed to have a uniform distribution with A = 25 and B = 35.
[Figure: density of X over the interval from 25 to 35; vertical axis from 0.00 to 0.12]
b) What is the probability that the preparation time exceeds 33 minutes?
\[
P(X > 33) = \int_{33}^{35} \frac{1}{10}\,dx = \frac{2}{10} = \frac{1}{5}
\]
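For a uniform distribution the tail probability is the length of the remaining interval divided by the total length. A minimal sketch; `uniform_tail` is a helper name introduced here.

```python
def uniform_tail(x, a=25, b=35):
    """P(X > x) for X uniformly distributed on [a, b]."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


p_exceeds_33 = uniform_tail(33)
print(p_exceeds_33)
```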
9. The Rockwell hardness of a metal is determined by impressing a hardened point into the
surface of the metal and then measuring the depth of penetration of the point. Suppose the
Rockwell hardness of a particular alloy is normally distributed with mean 70 and standard
deviation 3. (Rockwell hardness is measured on a continuous scale.)
a) If a specimen is acceptable only if its hardness is between 67 and 75, what is the
probability that a randomly chosen specimen has an acceptable hardness?
X: Rockwell hardness of the specimen
X ~ Normal (70, 9)
\[
P(67 \le X \le 75) = P\left(\frac{67 - 70}{3} \le Z \le \frac{75 - 70}{3}\right) = F_Z\left(\frac{5}{3}\right) - F_Z(-1) = 0.95254 - 0.15866 = 0.79388
\]
b) If the acceptable range is as in part (a) and the hardness of each of 10 randomly selected
specimens is independently determined, what is the expected number of acceptable
specimens among the 10?
Y: number of acceptable specimens out of 10
Y ~ Binomial (10, 0.7939)
$E(Y) = n p = 10 \times 0.7939 = 7.939$
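The hardness calculation can be cross-checked numerically. The sketch below assumes a helper `normal_cdf` (a name introduced here) built on `math.erf`; because it evaluates the CDF at exactly 5/3 instead of a table entry rounded to 1.67, its result may differ from a table-based value in the third decimal place.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


# Part a: probability that hardness falls between 67 and 75.
p_acceptable = normal_cdf(75, 70, 3) - normal_cdf(67, 70, 3)

# Part b: expected number of acceptable specimens among 10 (binomial mean n * p).
expected_acceptable = 10 * p_acceptable

print(round(p_acceptable, 4), round(expected_acceptable, 3))
```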
10. Suppose that Bob can decide to go to work by one of three modes of transportation, car,
bus, or commuter train. Because of high traffic, if he decides to go by car, there is a 50%
chance he will be late. If he goes by bus, which has special reserved lanes but is
sometimes overcrowded, the probability of being late is only 20%. The commuter train is
almost never late, with a probability of only 1%, but is more expensive than the bus.
a) Suppose that Bob is late one day, and his boss wishes to estimate the probability that he
drove to work that day by car. Since he does not know which mode of transportation Bob
usually uses, he gives a prior probability of 1/3 to each of the three possibilities. What is
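the boss's estimate of the probability that Bob drove to work?
By Bayes' theorem,
\[
\Pr\{\text{car} \mid \text{late}\} = \frac{\Pr\{\text{late} \mid \text{car}\}\Pr\{\text{car}\}}{\Pr\{\text{late} \mid \text{car}\}\Pr\{\text{car}\} + \Pr\{\text{late} \mid \text{bus}\}\Pr\{\text{bus}\} + \Pr\{\text{late} \mid \text{train}\}\Pr\{\text{train}\}}
\]
\[
= \frac{0.5 \times 1/3}{0.5 \times 1/3 + 0.2 \times 1/3 + 0.01 \times 1/3} = 0.7042
\]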
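b) Suppose that a coworker of Bob's knows that he almost always takes the commuter train to
work, never takes the bus, but sometimes, 10% of the time, takes the car. What is the
coworker's probability that Bob drove to work that day, given that he was late?
By Bayes' theorem, with prior probabilities 0.10 for the car, 0 for the bus, and 0.90 for the train,
\[
\Pr\{\text{car} \mid \text{late}\} = \frac{0.5 \times 0.10}{0.5 \times 0.10 + 0.2 \times 0 + 0.01 \times 0.90} = \frac{0.05}{0.059} = 0.8475
\]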
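Both parts apply the same Bayes update with different priors. The sketch below takes the lateness probabilities from the problem statement and a prior over the three modes; the function name `posterior_car` and the dictionary layout are choices made here for illustration.

```python
# Probability of being late for each mode, from the problem statement.
P_LATE = {"car": 0.50, "bus": 0.20, "train": 0.01}


def posterior_car(prior):
    """P(car | late) for a given prior over the three modes of transportation."""
    p_late = sum(P_LATE[mode] * prior[mode] for mode in P_LATE)
    return P_LATE["car"] * prior["car"] / p_late


boss = posterior_car({"car": 1 / 3, "bus": 1 / 3, "train": 1 / 3})
coworker = posterior_car({"car": 0.10, "bus": 0.0, "train": 0.90})
print(round(boss, 4), round(coworker, 4))
```

The coworker's posterior is higher even though the prior on the car is lower, because the only alternative left, the train, is almost never late.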