Statistics For Economists


CHAPTER ONE

OVERVIEW OF BASIC PROBABILITY THEORY


1.1. Sample Space, Sample Points, Events & Event Space
The sample space (S): the set of all possible outcomes of a random experiment.
The outcomes listed must be mutually exclusive and exhaustive.
E.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}
A sample point: each element of the sample space.
Event (E): a subset of the sample space.
E.g., E = {1, 3, 5}, the event that an odd number is rolled.
1.2. Definitions of Probability
Probability, as a general concept, can be defined as the measure of the chance that something will occur.
The probability of a specific event is a mathematical statement about the likelihood that it will occur.
The sum of the probabilities of all possible outcomes of an experiment is 1.
1.2.1 Axioms and Rules of Probability
The probability of an event always lies between 0 and 1, inclusive. An event that cannot occur has probability zero: for instance, the probability of the empty set is zero, and the probability of hatching a hen from a snake's egg is zero, since that is impossible. On the other hand, events that are certain to occur have a probability of one; in particular, the probability of the entire (whole) sample space (S) is equal to one. Probabilities are therefore values of a set function, the so-called probability measure, which, as we shall see, assigns real numbers to the various subsets of a sample space (S). Let us now formulate the postulates of probability, which apply when the sample space is discrete.
Postulate 1. The probability of an event is a non-negative real number.
i.e., P (A) ≥ 0, for any subset A of S
Postulate 2. The probability of a sample space is equal to one.
i.e, P(S) = 1
Postulate 3. If A1, A2, A3, … is a finite or infinite sequence of mutually exclusive events of S, then

P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + … = ∑(i=1 to ∞) P(Ai)

Example
If we have four mutually exclusive possible outcomes A, B, C and D, are the following probability assignments permissible?
a) P(A) = 0.12; P(B) = 0.63; P(C) = 0.46; P(D) = −0.21
Based on the above postulates:
i) P(D) = −0.21 is not permissible, since it is not a non-negative real number (it violates postulate 1).
b) P(A) = 9/120; P(B) = 45/120; P(C) = 27/120; P(D) = 46/120
This is not permissible, since it violates postulate 2:
P(S) = P(A ∪ B ∪ C ∪ D) = P(A) + P(B) + P(C) + P(D) = 127/120 ≠ 1

Theorem 1.1 If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of the individual outcomes comprising A.
Proof: Let Y1, Y2, Y3, … be the finite or infinite sequence of outcomes that comprise the event A, so that A = Y1 ∪ Y2 ∪ Y3 ∪ …. Since the individual outcomes Yi are mutually exclusive, the third postulate of probability gives:

P(A) = P(Y1) + P(Y2) + P(Y3) + … = ∑(i=1 to ∞) P(Yi), which completes the proof.


Example
If a coin is tossed twice, what is the probability of getting at least one T? (Hint: in statistics, "at least one" means one or more.)
Solution: S = {HH, HT, TH, TT}, so the probability of each sample point is ¼.
Let A be the event that we get at least one T; then A = {HT, TH, TT} and
P(A) = P(HT) + P(TH) + P(TT) = ¼ + ¼ + ¼ = ¾
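The enumeration above can be checked with a short Python sketch (the variable names are illustrative, not part of the text):

```python
from itertools import product
from fractions import Fraction

# Sample space of two tosses of a fair coin: each point has probability 1/4.
S = list(product("HT", repeat=2))
A = [point for point in S if "T" in point]   # at least one tail
p_A = Fraction(len(A), len(S))
print(p_A)  # 3/4
```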
Theorem 1.2 If an experiment can result in any one of M different equally likely outcomes, and if
m of these outcomes together constitute event A, then the probability of event A is P(A) = m/M.
Proof: Let Y1, Y2, …, YM represent the individual outcomes in S, each with probability 1/M.
If A is the union of m of these mutually exclusive outcomes, then
P(A) = P(Y1 ∪ Y2 ∪ … ∪ Ym) = P(Y1) + P(Y2) + … + P(Ym) = 1/M + 1/M + … + 1/M = m/M
Example: see the example of theorem 1.1.

Some rules of probability
Theorem 1.3 If A and A1 are complementary events in a sample space S, then P(A1) = 1 − P(A).
Proof: We know that A ∪ A1 = S, and A and A1 are mutually exclusive, by the properties of the complement.
Then 1 = P(S)  (by postulate 2)
= P(A ∪ A1)
= P(A) + P(A1)  (by postulate 3)
Therefore, P(A1) = 1 − P(A)  (algebra).
Example:
If an event occurs 40% of the time, it does not occur 60% of the time.
Theorem 1.4 The probability of an empty set is zero. i.e., P (ø) = 0 for any sample space S.
Proof: S and ø are mutually exclusive, and S ∪ ø = S by the properties of ø, so
P(S) = P(S ∪ ø)
P(S) = P(S) + P(ø)  (by postulate 3)
P(S) − P(S) = P(ø) = 0  (algebra)
Therefore P(ø) = 0.
Example
i. The probability of getting four heads in a toss of three coins is zero.
ii. The probability of hatching a pigeon from a snake's egg is zero.
Theorem 1.5 If A and B are events in a sample space S, and A ⊂ B, then P(A) ≤ P(B).
Proof: Since A ⊂ B, we can write B as the union of the mutually exclusive events A and A1 ∩ B, and we get:
P(B) = P[A ∪ (A1 ∩ B)]
P(B) = P(A) + P(A1 ∩ B)  (by postulate 3)
P(B) ≥ P(A)  (by postulate 1)
Example
If a coin is tossed twice:
i) the probability of the sample space S is:
P(S) = P(HH ∪ HT ∪ TH ∪ TT) = P(HH) + P(HT) + P(TH) + P(TT) = ¼ + ¼ + ¼ + ¼ = 1
ii) the probability of getting two heads:
Let A be the event that contains two heads, i.e., A = {HH}. Then A ⊂ S and P(A) = P(HH) = ¼, hence P(S) ≥ P(A), i.e., 1 ≥ ¼.

Theorem 1.6 0 ≤ P(A) ≤ 1, for any event A.
Proof: By theorem 1.5 and the fact that ø ⊂ A ⊂ S for any event A in a sample space S, we have:
P(ø) ≤ P(A) ≤ P(S), where P(ø) = 0 and P(S) = 1 by theorem 1.4 and postulate 2, respectively. Therefore 0 ≤ P(A) ≤ 1; P(A) cannot be greater than one or less than zero.
Theorem 1.7 If A and B are any two events in a sample space S, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: With the help of a Venn diagram, let a, b and c be the probabilities of the mutually exclusive events A ∩ B, A ∩ B1 and A1 ∩ B, respectively. Then
P(A ∪ B) = a + b + c
= (a + b) + (c + a) − a
= P(A) + P(B) − P(A ∩ B)
Example:
In a sample of 1000 students, 640 said they had a pen, 350 said they had a pencil, and 200 said they had both. If a student is selected at random, the probability that the student has either a pen or a pencil is:
Solution: Let S be the sample space, E the event of having a pen and R the event of having a pencil. In a Venn diagram, 440 students have a pen only, 200 have both, and 150 have a pencil only.
P(E ∪ R) = P(E) + P(R) − P(E ∩ R)
= 640/1000 + 350/1000 − 200/1000
= 0.64 + 0.35 − 0.20 = 0.79, or 79%
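The inclusion–exclusion step above can be verified with a minimal Python sketch (names chosen here for illustration):

```python
from fractions import Fraction

# Inclusion-exclusion: P(E ∪ R) = P(E) + P(R) − P(E ∩ R), counts out of 1000 students.
n = 1000
pen, pencil, both = 640, 350, 200
p_union = Fraction(pen, n) + Fraction(pencil, n) - Fraction(both, n)
print(p_union)  # 79/100, i.e. 0.79
```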

Theorem 1.8. If A, B and C are any three events in a sample space S, then
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Proof: Writing A ∪ B ∪ C = A ∪ (B ∪ C) and applying theorem 1.7 repeatedly:
P(A ∪ B ∪ C) = P[A ∪ (B ∪ C)]
= P(A) + P(B ∪ C) − P[A ∩ (B ∪ C)]  (theorem 1.7)
= P(A) + P(B) + P(C) − P(B ∩ C) − P[(A ∩ B) ∪ (A ∩ C)]
= P(A) + P(B) + P(C) − P(B ∩ C) − {P(A ∩ B) + P(A ∩ C) − P[(A ∩ B) ∩ (A ∩ C)]}
Since (A ∩ B) ∩ (A ∩ C) = A ∩ B ∩ C, therefore
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(B ∩ C) − P(A ∩ B) − P(A ∩ C) + P(A ∩ B ∩ C)
Example: Based on the given Venn diagram, below, determine P(A ∪ B ∪ C).

[Venn diagram: region probabilities are A only = 0.06, B only = 0.18, C only = 0.06, (A ∩ B) only = 0.24, (A ∩ C) only = 0.06, (B ∩ C) only = 0.14, A ∩ B ∩ C = 0.22, outside = 0.04]
Solution
P(S) = 1
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(B ∩ C) − P(A ∩ B) − P(A ∩ C) + P(A ∩ B ∩ C)
= 0.58 + 0.78 + 0.48 − 0.36 − 0.46 − 0.28 + 0.22
= 0.96
1.3 Conditional probability and the Bayes’ theorem
1.3.1 Conditional probability
Difficulties can easily arise when probabilities are quoted without specification of the sample space. For instance, if we ask for the probability that an economist makes more than birr 240,000 per year, we may get several answers, and they may all be correct: one might apply to all Economics school graduates, another to all persons licensed as economists, a third to all those actively engaged in the practice of the profession.
Since the choice of the sample space is by no means always self-evident, it often helps to use the symbol P(A/S) to denote the conditional probability of event A given S. In other words, the probability of occurrence of something is conditional upon your information set: it is the probability that one composite outcome occurs given that another composite outcome has occurred.
Definition 1.1 If A and B are any two events in a sample space S, and P(A) ≠ 0, the conditional probability of B given A is:
P(B/A) = P(A ∩ B) / P(A)
Example
1) Suppose an experiment is conducted at a fertilizer dealer enterprise in which two items are successively selected at random from a lot containing 20 quintals of defective and 60 quintals of non-defective fertilizer. Let A be the event that the first fertilizer selected is defective and B the event that the second is defective. Then P(A) = 20/80 = ¼. In computing P(B), it is assumed that the experiment is conducted without replacement, that is, the defective fertilizer is not thrown back into the lot before the next fertilizer is selected. In this case P(B) depends on whether A has occurred or not.
Therefore P(B/A), the probability of B given that A has occurred, is P(B/A) = 19/79, because after the first selection was found to be defective, only 19 defective quintals remain among the remaining 79 quintals of fertilizer. Hence, applying the general rule of multiplication, the probability that both fertilizers are defective is given by:
P(A ∩ B) = P(A) · P(B/A) = (20/80)(19/79) = 380/6320 = 19/316 ≈ 0.06
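The multiplication rule for this draw-without-replacement setting can be sketched in Python (illustrative names):

```python
from fractions import Fraction

# Multiplication rule without replacement: P(A ∩ B) = P(A) · P(B/A).
defective, total = 20, 80
p_first = Fraction(defective, total)                       # P(A)   = 20/80
p_second_given_first = Fraction(defective - 1, total - 1)  # P(B/A) = 19/79
p_both = p_first * p_second_given_first
print(p_both)  # 19/316, roughly 0.06
```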
Example
2) Suppose a die is loaded in such a way that each odd number is twice as likely to occur as each
even number.
i. What is the probability that a number greater than 3 occurs on a single roll of the
die?
ii. What is the probability that the number of points rolled is a perfect square?
iii. What is the probability that it is a perfect square given that it is greater than 3?

Solution
Let X be the event that the number of points rolled is greater than 3, and Y the event that it is a perfect square. Then X = {4, 5, 6}, Y = {1, 4} and X ∩ Y = {4}. Let the sample space be S = {1, 2, 3, 4, 5, 6}, and assign probability E to each even number and probability 2E to each odd number.
i. P(S) = 2E + E + 2E + E + 2E + E = 9E = 1, so E = 1/9 (by postulate 2), i.e.,
P(S) = 2/9 + 1/9 + 2/9 + 1/9 + 2/9 + 1/9 = 1
Therefore P(X) = P(4) + P(5) + P(6) = 1/9 + 2/9 + 1/9 = 4/9
ii. P(Y) = P(1) + P(4) = 2/9 + 1/9 = 3/9 = 1/3
iii. P(Y/X) = P(X ∩ Y)/P(X) = (1/9)/(4/9) = ¼
If we multiply the expression on both sides of the formula of definition 1.1 by P (A), we obtain the
following rule.
Theorem 1.9 If A and B are any two events in a sample space S and P(A) ≠ 0, then
P(A ∩ B) = P(A) · P(B/A)
Alternatively, if P(B) ≠ 0, the probability that A and B will both occur is the product of the probability of B and the conditional probability of A given B, i.e., P(B ∩ A) = P(B) · P(A/B).

Example

1. If you randomly pick two sacks of teff in succession from a store of 240 sacks of teff, of which 15 are defective, what is the probability that both will be defective?
Solution: Assume each selection is equally likely, and let A be the event that the 1st sack of teff is defective and B the event that the 2nd sack of teff is defective. The probability that the 1st sack will be defective is P(A) = 15/240, and the probability that the 2nd sack will be defective given that the 1st sack is defective is P(B/A) = 14/239.
Thus the probability that both sacks will be defective is P(A ∩ B) = P(A) · P(B/A) = 15/240 × 14/239 = 7/1912. This is called sampling without replacement, because the first sack of teff is not replaced before the 2nd sack is selected.
2. Based on the above example, what is the probability that both will not be defective?

Solution: If A is the event that the 1st item is defective, then A1 is the event that the 1st is non-defective, and analogously B1 is the event that the 2nd sack of teff is non-defective. Hence P(A1 ∩ B1) = P(A1) · P(B1/A1) = 225/240 × 224/239, because P(A1) = 225/240 and P(B1/A1) = 224/239.
3. If you randomly roll two fair dice, define the events A = {(X, Y) : X + Y = 11} and B = {(X, Y) : X > Y}, where X is the number from the first die and Y the number from the 2nd die. What is the probability of B given A (i.e., the probability of having X > Y given X + Y = 11)?
Solution:
1st, let's find event A = {(5, 6), (6, 5)}
=> P(A) = 2/36
2nd, event B = {(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (3, 2), (4, 2), (5, 2), (6, 2), (4, 3), (5, 3), (6, 3), (5, 4), (6, 4), (6, 5)}
=> P(B) = 15/36
3rd, the event A ∩ B = {(6, 5)}
=> P(A ∩ B) = 1/36
Therefore P(B/A) = P(A ∩ B)/P(A) = (1/36)/(2/36) = ½.
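The three enumeration steps above can be reproduced with a short Python sketch (illustrative variable names):

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice.
S = list(product(range(1, 7), repeat=2))
A = [(x, y) for (x, y) in S if x + y == 11]
A_and_B = [(x, y) for (x, y) in A if x > y]
p_B_given_A = Fraction(len(A_and_B), len(A))   # P(B/A) = P(A ∩ B)/P(A)
print(p_B_given_A)  # 1/2
```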
Theorem 1.10 If A, B, and C are any three events in a sample space S such that
P(A ∩ B) ≠ 0, then P(A ∩ B ∩ C) = P(A) · P(B/A) · P[C/(A ∩ B)].
Proof: Since A ∩ B ∩ C = (A ∩ B) ∩ C, using theorem 1.9 twice we get:
P(A ∩ B ∩ C) = P[(A ∩ B) ∩ C] = P(A ∩ B) · P[C/(A ∩ B)] = P(A) · P(B/A) · P[C/(A ∩ B)]
Example:
1. A wholesale container holds 20 sacks of fertilizer, of which 5 sacks are defective. If 3 of the sacks are selected at random and removed from the container in succession without replacement, what is the probability that all 3 sacks are defective?
Solution: Let A be the event that the 1st sack of fertilizer is defective, B the event that the 2nd sack is defective, and C the event that the 3rd sack is defective.
Then P(A) = 5/20, P(B/A) = 4/19, and P[C/(A ∩ B)] = 3/18; substitution into the formula gives:
P(A ∩ B ∩ C) = P(A) · P(B/A) · P[C/(A ∩ B)] = 5/20 × 4/19 × 3/18 = 1/114.
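The chain-rule product can be checked with a one-line Python sketch:

```python
from fractions import Fraction

# Chain rule: P(A ∩ B ∩ C) = P(A) · P(B/A) · P(C/(A ∩ B)).
p_all_defective = Fraction(5, 20) * Fraction(4, 19) * Fraction(3, 18)
print(p_all_defective)  # 1/114
```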

1.3.2 Independent events
In the real world we often encounter situations in which the occurrence of one event does not affect the probability of the occurrence of another. In other words, two events A and B are independent if the occurrence or nonoccurrence of either one does not affect the probability of the occurrence of the other.

Definition 1.2 Two events A and B are independent if and only if P(A ∩ B) = P(A) · P(B).
We can elaborate the above definition as follows.
Two events A and B are said to be independent of each other if and only if:
i. P(A/B) = P(A), or
ii. P(B/A) = P(B), or
iii. P(A ∩ B) = P(A) · P(B)


Example
1. An air craft has two independent safety systems. The probability that the 1 st will not operate
properly in an emergency P (A) is 0.01, and the probability the 2nd will not operate P(B) in an
emergency is 0.02. What is the probability that in an emergency both of the safety systems
will not operate?
Solution:
The probability that both will not operate is:

P (A ¿ B) = P(A). P (B) = 0.01 x 0.02 = 0.0002

Theorem 1.11 If A and B are independent, then A and B1 are also independent.
I.e., P(A ∩ B) = P(A) · P(B) → P(A ∩ B1) = P(A) · P(B1)
Proof: Since A = (A ∩ B) ∪ (A ∩ B1), where A ∩ B and A ∩ B1 are mutually exclusive,
P(A) = P[(A ∩ B) ∪ (A ∩ B1)]
P(A) = P(A ∩ B) + P(A ∩ B1)  (postulate 3 and theorem 1.1)
P(A) = P(A) · P(B) + P(A ∩ B1)  (given that A and B are independent)
P(A) − P(A) · P(B) = P(A ∩ B1)  (algebra)
P(A)[1 − P(B)] = P(A ∩ B1)  (distributive property)
P(A) · P(B1) = P(A ∩ B1)  (theorem 1.3)
Hence A and B1 are independent.
Definition 1.3. Events A1, A2, …, Ak are independent if and only if the probability of the intersection of any 2, 3, …, or k of these events equals the product of their respective probabilities.
I.e., P(A1 ∩ A2 ∩ … ∩ Ak) = P(A1) · P(A2) · … · P(Ak) = ∏(i=1 to k) P(Ai).
Example:
1) Using the Venn diagram below, show that P(A ∩ B ∩ C) = P(A) · P(B) · P(C).

[Venn diagram: region probabilities are A only = 0.06, B only = 0.18, C only = 0.06, (A ∩ B) only = 0.24, (A ∩ C) only = 0.06, (B ∩ C) only = 0.14, A ∩ B ∩ C = 0.24, outside = 0.02]

Solution: P(A) = 0.06 + 0.06 + 0.24 + 0.24 = 0.6
P(B) = 0.24 + 0.24 + 0.14 + 0.18 = 0.8
P(C) = 0.06 + 0.06 + 0.14 + 0.24 = 0.5
∴ P(A ∩ B ∩ C) = 0.24 and P(A) · P(B) · P(C) = 0.6 × 0.8 × 0.5 = 0.24

1.3.3 Bayes' theorem

Theorem 1.12 (rule of total probability). If the events B1, B2, …, Bk constitute a partition of the sample space S and P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S:

P(A) = ∑(i=1 to k) P(Bi) · P(A/Bi)
Example: The completion of a given job in a textile factory may be delayed due to machinery problems. The probability that the machinery will be defective is 0.55, the probability that the job will be completed on time given good machinery is 0.75, and the probability that the job will be completed on time given defective machinery is 0.35. What is the probability that the job will be completed on time?
Solution: Let A be the event that the job will be completed on time, B the event that the machinery will be defective, and B1 the event that the machinery will be good.
Given: P(B) = 0.55, so P(B1) = 1 − 0.55 = 0.45; P(A/B) = 0.35; P(A/B1) = 0.75. Since A ∩ B and A ∩ B1 are mutually exclusive, by the alternative rules of multiplication:
P(A) = P[(A ∩ B) ∪ (A ∩ B1)] = P(A ∩ B) + P(A ∩ B1) = P(B) · P(A/B) + P(B1) · P(A/B1)
= 0.55 × 0.35 + 0.45 × 0.75
= 0.1925 + 0.3375 = 0.53
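The total-probability computation can be sketched in a few lines of Python (names are illustrative):

```python
# Rule of total probability: P(A) = P(B)·P(A/B) + P(B1)·P(A/B1).
p_defective = 0.55
p_good = 1 - p_defective
p_on_time = p_defective * 0.35 + p_good * 0.75
print(round(p_on_time, 4))  # 0.53
```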
Theorem 1.13. If B1, B2, …, Bk constitute a partition of the sample space S and
P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S such that P(A) ≠ 0:

P(Br/A) = [P(Br) · P(A/Br)] / [∑(i=1 to k) P(Bi) · P(A/Bi)], for r = 1, 2, 3, …, k

It is easy to see that theorem 1.13 is the combination of theorem 1.9 and theorem 1.12, and therefore we can restate the theorem as follows.
If an event A can only occur in conjunction with one of the k mutually exclusive and exhaustive events B1, B2, …, Bk, then the probability that A was preceded by the particular event Br is given by:

P(Br/A) = P(Br ∩ A)/P(A) = [P(Br) · P(A/Br)] / [∑(i=1 to k) P(Bi) · P(A/Bi)]

This is called Bayes' theorem.
Proof: Since A can occur in combination with any of the mutually exclusive and exhaustive events B1, B2, …, Bk, we have A = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ Bk), so
P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ Bk)
= P(B1) · P(A/B1) + P(B2) · P(A/B2) + … + P(Bk) · P(A/Bk) = ∑(i=1 to k) P(Bi) · P(A/Bi)
For any particular event Br, the multiplication rule gives P(Br ∩ A) = P(A) · P(Br/A), so

P(Br/A) = P(Br ∩ A)/P(A) = [P(Br) · P(A/Br)] / [∑(i=1 to k) P(Bi) · P(A/Bi)]

Example:
1. Waliin Tour firm rents cars for its tourists from three rental agencies: 55% from agency Alfa, 35% from agency Gamma and 10% from agency Delta. 12% of the cars from agency Alfa need a tune-up, 16% of the cars from agency Gamma need a tune-up, and 8% of the cars from agency Delta need a tune-up. If a rental car needs a tune-up, what is the probability that it came from rental agency Gamma?
Solution: i) First let us find the probability that a rental car will need a tune-up. Let α denote Alfa, G Gamma and D Delta, and let A be the event that a car needs a tune-up.
P(α) = 0.55; P(G) = 0.35; P(D) = 0.10 and P(A/α) = 0.12; P(A/G) = 0.16; P(A/D) = 0.08
Therefore P(A) = P(α) · P(A/α) + P(G) · P(A/G) + P(D) · P(A/D)
= 0.55 × 0.12 + 0.35 × 0.16 + 0.10 × 0.08
= 0.066 + 0.056 + 0.008 = 0.13
ii) Then the probability that it came from rental agency Gamma is:
P(G/A) = [P(G) · P(A/G)] / [P(α) · P(A/α) + P(G) · P(A/G) + P(D) · P(A/D)] = 0.056/0.13 ≈ 0.431
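The Bayes computation for the three agencies can be written as a minimal Python sketch (dictionary keys are illustrative):

```python
# Bayes' theorem: P(G/A) = P(G)·P(A/G) / Σ P(Bi)·P(A/Bi).
prior = {"alfa": 0.55, "gamma": 0.35, "delta": 0.10}
p_tuneup = {"alfa": 0.12, "gamma": 0.16, "delta": 0.08}
p_A = sum(prior[k] * p_tuneup[k] for k in prior)           # total probability
p_gamma_given_A = prior["gamma"] * p_tuneup["gamma"] / p_A
print(round(p_A, 3), round(p_gamma_given_A, 3))  # 0.13 0.431
```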

CHAPTER TWO

RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

2.1 Discrete random variables and their probability distributions
Probability distribution:
This depicts the possible outcomes of an experiment and the probability of each of these outcomes.

Random variable:
Definition 2.1 If S is a sample space with a probability measure and X is a real-valued function defined over the elements of S, then X is called a random variable.
It is a quantity resulting from an experiment that, by chance, can assume different values.
A random variable may take two forms: discrete and continuous.
Discrete random variable (DRV)
A variable that can assume only certain clearly separated values resulting from a count of some item of interest. If a random variable X can assume only a particular finite or countably infinite set of values, it is said to be a discrete random variable.
Example
1) The number of students earning a grade of A in a 1st-year economics class.
Here, there can be 25 A's, but there cannot be 25.15 A grades.
2) Suppose a coin is tossed 5 times; then the number of heads X from the given experiment can be:
X = 0, 1, 2, 3, 4, 5 (i.e., if we let X be the number of heads, those are its possible values.)
3) For an experiment in which we roll a pair of dice, say die A and die B, let the random variable X take on the value A + B:

X     2     3     4     5     6     7     8     9     10    11    12
P(X)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Each of the 36 possible outcomes has probability 1/36:
S = {(1, 1), (1,2), (2, 1), (1, 3), (3, 1), (2, 2), (1, 4), (4, 1,), (2, 3), (3, 2), (1, 5),
(5, 1), (2, 4), (4, 2), (3, 3), (1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3), (2, 6),
(6, 2), (3, 5), (5, 3), (4, 4), (3, 6), (6, 3), (4, 5), (5, 4), (5, 5), (4, 6), (6, 4),
(5, 6), (6, 5), (6, 6)}
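The distribution of the sum can be built directly from the 36 sample points with a short Python sketch (illustrative names):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Build the pmf of X = sum of two dice from the 36 equally likely sample points.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(c, 36) for total, c in counts.items()}
print(pmf[2], pmf[7], pmf[12])   # 1/36 1/6 1/36
print(sum(pmf.values()))         # 1
```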

Definition 2.2 If X is a discrete random variable, the function given by f(x) = P(X = x) for each x within the range of X is called the probability distribution of X.
Theorem 2.1 A function can serve as the probability distribution of a discrete random variable ⇔ its values f(x) satisfy the conditions:
i) f(x) ≥ 0 for each value within its domain
ii) ∑x f(x) = 1, where the summation extends over all the values within its domain
Example
1) Consider the experiment of rolling two dice. Assume X is the sum of numbers shown on the
two dice. Then X will take all values as shown below: (where X is discrete random variable)

X 2 3 4 5 6 7 8 9 10 11 12
P(X) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1//36

2) Describe the probability distribution of the total number of heads obtained in 4 tosses of a coin.
Solution: Let the number of heads be denoted by the random variable X and its corresponding probabilities by f(x); then

X 0 1 2 3 4
P(X) 1/16 4/16 6/16 4/16 1/16

Distribution function: If X is a discrete random variable, the function given by F(x) = P(X ≤ x) = ∑(t ≤ x) f(t), for −∞ < x < ∞, where f(t) is the value of the probability distribution of X at t, is called the distribution function, or the cumulative distribution, of X.
The values F(x) of the distribution function of a discrete random variable X satisfy the conditions:
i) F(−∞) = 0 and F(∞) = 1
ii) If a < b, then F(a) ≤ F(b), for any real numbers a and b
iii) 0 ≤ F(x) ≤ 1
Example
1) Find the distribution function of the total number of heads obtained in five tosses of a coin.

Solution: We get f(0) = 1/32, f(1) = 5/32, f(2) = 10/32, f(3) = 10/32, f(4) = 5/32, f(5) = 1/32. It follows from the definition of the distribution function that
F(0) = f(0) = 1/32
F(1) = f(0) + f(1) = 1/32 + 5/32 = 6/32 = F(0) + f(1)
F(2) = f(0) + f(1) + f(2) = 1/32 + 5/32 + 10/32 = 16/32 = F(1) + f(2)
F(3) = f(0) + f(1) + f(2) + f(3) = 1/32 + 5/32 + 10/32 + 10/32 = 26/32 = F(2) + f(3)
F(4) = f(0) + f(1) + f(2) + f(3) + f(4) = 1/32 + 5/32 + 10/32 + 10/32 + 5/32 = 31/32 = F(3) + f(4)
F(5) = f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = 1/32 + 5/32 + 10/32 + 10/32 + 5/32 + 1/32 = 1 = F(4) + f(5)

Hence, the distribution function is:

F(x) = 0      for x < 0
     = 1/32   for 0 ≤ x < 1
     = 6/32   for 1 ≤ x < 2
     = 16/32  for 2 ≤ x < 3
     = 26/32  for 3 ≤ x < 4
     = 31/32  for 4 ≤ x < 5
     = 1      for x ≥ 5
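The running-sum construction of F(x) can be sketched in Python; the binomial counts C(5, x)/32 reproduce the f(x) values above:

```python
from fractions import Fraction
from math import comb

# pmf of the number of heads in five tosses, f(x) = C(5, x)/32,
# and the cumulative distribution F(x) as a running sum.
f = {x: Fraction(comb(5, x), 32) for x in range(6)}
F, running = {}, Fraction(0)
for x in range(6):
    running += f[x]
    F[x] = running
print(F[3], F[5])  # 13/16 1
```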
2) Find the distribution function F(x) of the random variable X with the probabilities given:
f(0) = 3/28, f(1) = 15/28, f(2) = 10/28
Solution: F(0) = f(0) = 3/28; F(1) = f(0) + f(1) = 3/28 + 15/28 = 18/28;
F(2) = f(0) + f(1) + f(2) = 3/28 + 15/28 + 10/28 = 1

F(x) = 0      for x < 0
     = 3/28   for 0 ≤ x < 1
     = 18/28  for 1 ≤ x < 2
     = 1      for x ≥ 2
Theorem 2.3. If the range of a random variable X consists of the values x1 < x2 < x3 < … < xn, then f(x1) = F(x1), and f(xi) = F(xi) − F(xi−1), for i = 2, 3, …, n.
Example:
1. If the distribution function of X is the F(x) of the five-coin-toss example above, find the probability distribution f(x).
Solution: Using theorem 2.3,
f(0) = F(0) = 1/32
f(1) = F(1) − F(0) = 6/32 − 1/32 = 5/32
f(2) = F(2) − F(1) = 16/32 − 6/32 = 10/32
f(3) = F(3) − F(2) = 26/32 − 16/32 = 10/32
f(4) = F(4) − F(3) = 31/32 − 26/32 = 5/32
f(5) = F(5) − F(4) = 32/32 − 31/32 = 1/32
This is the distribution of the total number of heads obtained in five tosses of a coin.

2.2. Expected value and variance of a discrete random variable

To summarize the behavior of a probability distribution, the most common measures are measures of central tendency and measures of dispersion. These quantities, the average and the dispersion, are known respectively as the expected value and the variance of the probability distribution.
The mean of a population is the expected value of the population and is denoted by E(X) or μ.
Therefore, the mean:
- Reports the central location of the data
- Is the long-run average value of the random variable in the probability distribution
- Is a weighted average
Definition 2.4. If X is a discrete random variable and f(x) is the value of its probability distribution at x, the expected value of X is:
μ = E(X) = ∑x x · f(x)

Example
1. A lot of 12 TV sets includes 2 with white cords. If 3 of the sets are chosen at random for shipment to a hotel, how many sets with white cords can the shipper expect to send to the hotel?
Solution: x of the two sets with white cords and 3 − x of the 10 other sets can be chosen in C(2, x) · C(10, 3−x) ways, and 3 of the 12 sets can be chosen in C(12, 3) equally probable ways. We find that the probability distribution of X, the number of sets with white cords shipped to the hotel, is given by:

f(x) = C(2, x) · C(10, 3−x) / C(12, 3), for x = 0, 1, 2

x     0     1     2
f(x)  6/11  9/22  1/22

And then, E(X) = ∑(x=0 to 2) x · f(x) = 0 × 6/11 + 1 × 9/22 + 2 × 1/22 = ½
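This hypergeometric computation can be verified with a minimal Python sketch (illustrative names):

```python
from fractions import Fraction
from math import comb

# Hypergeometric distribution: X white-cord sets among 3 drawn from 12 (2 with white cords).
f = {x: Fraction(comb(2, x) * comb(10, 3 - x), comb(12, 3)) for x in range(3)}
mean = sum(x * p for x, p in f.items())
print(f[0], f[1], f[2], mean)  # 6/11 9/22 1/22 1/2
```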
Theorem 2.4. If X is a discrete random variable and f(x) is the value of its probability distribution at x, the expected value of g(X) is given by E[g(X)] = ∑x g(x) · f(x)

Example:
1. If X is the number of points rolled with a balanced die, find the expected value of g(X) = 2X² + 1.
Solution: Since each possible outcome has probability 1/6, we get:
E[g(X)] = ∑(x=1 to 6) (2x² + 1) · 1/6
= (2×1² + 1)·1/6 + (2×2² + 1)·1/6 + (2×3² + 1)·1/6 + (2×4² + 1)·1/6 + (2×5² + 1)·1/6 + (2×6² + 1)·1/6
= 1/6 [3 + 9 + 19 + 33 + 51 + 73] = 188/6 = 94/3


Properties of expectation
1. If a and b are constants, then the expected value of aX + b is given by:
E(aX + b) = a·E(X) + b
Proof: Let X be a random variable which can take the values x1, x2, …, xn with respective probabilities f(x1), f(x2), …, f(xn), where ∑(i=1 to n) f(xi) = f(x1) + f(x2) + … + f(xn) = 1. By definition,
E(X) = ∑(i=1 to n) xi f(xi) = x1 f(x1) + x2 f(x2) + … + xn f(xn)
Therefore E(aX + b) = ∑(i=1 to n) (a xi + b) f(xi)
= (a x1 + b) f(x1) + (a x2 + b) f(x2) + … + (a xn + b) f(xn)
= a x1 f(x1) + a x2 f(x2) + … + a xn f(xn) + b f(x1) + b f(x2) + … + b f(xn)
= a[x1 f(x1) + x2 f(x2) + … + xn f(xn)] + b[f(x1) + f(x2) + … + f(xn)]
= a ∑(i=1 to n) xi f(xi) + b ∑(i=1 to n) f(xi)
= a·E(X) + b × 1 = a·E(X) + b
i i

2. If we set b = 0 or a = 0, it follows from the above that:
i. If a is constant, then E(aX) = a·E(X)
Proof: Let X be a random variable which can take the values x1, x2, …, xn with respective probabilities f(x1), f(x2), …, f(xn). Then by definition:
E(aX) = (a x1) f(x1) + (a x2) f(x2) + … + (a xn) f(xn)
= a[x1 f(x1) + x2 f(x2) + … + xn f(xn)]
= a·E(X)
ii. If b is constant, then E(b) = b
Proof: E(b) = b f(x1) + b f(x2) + … + b f(xn) = b[f(x1) + f(x2) + … + f(xn)] = b × 1 = b
iii. If c1, c2, …, cn are constants, then E[∑(i=1 to n) ci gi(X)] = ∑(i=1 to n) ci E[gi(X)]

Example
a. If X is the total number of heads obtained in two tosses of a coin, find the expected value of g(X) = 3X + 5.
Solution: E(3X + 5) = (3×0 + 5)·¼ + (3×1 + 5)·2/4 + (3×2 + 5)·¼
= 3[0×¼ + 1×2/4 + 2×¼] + 5 = 3×1 + 5 = 8

b. Given
x     0     1      2
f(x)  3/28  15/28  10/28
Find the expected value of g(X) = 4X².
Solution:
E[g(X)] = E(4X²) = 4[0² × 3/28 + 1² × 15/28 + 2² × 10/28]
= 4 × 55/28 = 55/7

c. Given
x     0     1     2     4
f(x)  0.11  0.53  0.16  0.20
Find E(3).
Solution:
E(3) = 3×0.11 + 3×0.53 + 3×0.16 + 3×0.20 = 3[0.11 + 0.53 + 0.16 + 0.20] = 3 × 1 = 3
3. The expectation of the sum of two functions g(X) and h(X) is the sum of the expectations,
i.e., E[g(X) + h(X)] = E[g(X)] + E[h(X)]
4. If X and Y are two random variables, then
E(X + Y) = E(X) + E(Y)
5. If X and Y are two independent random variables, then
E(XY) = E(X) · E(Y)

Variance of the discrete random variable
The variance
- Is the amount of spread/variation of the distribution of a random variable

Definition 2.5. σ² = E[(X − μ)²] = ∑x (x − μ)² · f(x) is called the variance of the distribution of X, or simply the variance of X; it is denoted by σ² or Var(X). σ, the positive square root of the variance, is called the standard deviation.

Variance of X in terms of expectation:
σ² = E(X²) − μ²
Proof: σ² = E[(X − μ)²]
= E[X² − 2μX + μ²]
= E(X²) − 2μ·E(X) + μ²   (since μ is a constant, and E(X²) = ∑x x² f(x))
= E(X²) − 2μ·μ + μ²
= E(X²) − μ²
Some properties of the variance of a discrete random variable

1. If X is a random variable and b is a constant, then the variance of the sum (or difference) of the random variable X and the constant b is given by:
Var(X + b) = Var(X), where b is constant.
Proof: E(X + b) = E(X) + E(b) = E(X) + b
Var(X + b) = E[(X + b) − (E(X) + b)]² = E[X + b − E(X) − b]²
= E[X − E(X)]² = E(X²) − 2E(X)·E(X) + [E(X)]²
= E(X²) − [E(X)]² = Var(X)
∴ The variance is independent of a change of origin.

2. Var(aX) = a² Var(X), where a is constant.
Proof: Var(aX) = E[aX − E(aX)]² = E[aX − aE(X)]² = E[a{X − E(X)}]² = a² E[X − E(X)]² = a² Var(X)
∴ The variance is not independent of a change of scale.

3. If X has variance σ², then Var(aX + b) = a²σ²:
Var(aX + b) = Var(aX) = a² Var(X)  (by property 1)

4. Var(b) = 0, where b is constant.
Proof: Var(b) = E[b − E(b)]² = E[b − b]² = E(0) = 0

5. If X and Y are independent random variables, then the variance of the sum of X and Y is given by:
V(X + Y) = V(X) + V(Y)

6. If a and b are constants and X and Y are independent, then the variance of aX + bY is given by: Var(aX + bY) = a² Var(X) + b² Var(Y). In general, if X and Y are two random variables, then Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) and Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y). Note that Cov(X, Y) = 0 if X and Y are independent.

Examples
1. Calculate the variance of X, representing the number of points rolled with a balanced die.
Solution:
μ = E(X) = ∑(x=1 to 6) x·f(x) = 1×1/6 + 2×1/6 + 3×1/6 + 4×1/6 + 5×1/6 + 6×1/6 = 7/2
E(X²) = 1²×1/6 + 2²×1/6 + 3²×1/6 + 4²×1/6 + 5²×1/6 + 6²×1/6 = 91/6
Hence σ² = E(X²) − μ² = 91/6 − (7/2)² = 35/12
Or, with f(x) = 1/6 for x = 1, 2, 3, 4, 5, 6:
σ² = ∑(x − μ)²·f(x) = (1 − 7/2)²×1/6 + (2 − 7/2)²×1/6 + (3 − 7/2)²×1/6 + (4 − 7/2)²×1/6 + (5 − 7/2)²×1/6 + (6 − 7/2)²×1/6 = 1/6 [25/4 + 9/4 + ¼ + ¼ + 9/4 + 25/4] = 1/6 × 70/4 = 35/12
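The die's mean and variance can be checked with exact arithmetic in Python (illustrative names):

```python
from fractions import Fraction

# Mean and variance of a balanced die via σ² = E(X²) − μ².
faces = range(1, 7)
mu = sum(Fraction(x, 6) for x in faces)        # E(X)  = 7/2
ex2 = sum(Fraction(x * x, 6) for x in faces)   # E(X²) = 91/6
var = ex2 - mu ** 2
print(mu, ex2, var)  # 7/2 91/6 35/12
```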
2. Calculate the variance of g(X) = X² + 1, representing the number of points rolled with a balanced die.
Solution
Var[g(X)] = E[(X² + 1) − E(X² + 1)]² = E[X² − E(X²)]² = Var(X²)
From example 1, E(X²) = 91/6, and
E(X⁴) = 1/6 [1⁴ + 2⁴ + 3⁴ + 4⁴ + 5⁴ + 6⁴] = 1/6 [1 + 16 + 81 + 256 + 625 + 1296] = 2275/6
Hence Var(X² + 1) = Var(X²) = E(X⁴) − [E(X²)]² = 2275/6 − (91/6)² = (13650 − 8281)/36 = 5369/36

3. Calculate the variance of g(X) = 2X² for the number of points rolled with a balanced die.
Solution: Var(2X²) = 4 Var(X²), and from example 2, Var(X²) = 5369/36; hence
Var(2X²) = 4 × 5369/36 = 5369/9

2.3. Continuous random variables (CRV)


A continuous random variable is a variable that can assume any one of an infinitely large number
of values.
Example:-
 Height of a student
 Weight of a student
The probability distribution of a continuous variable X is called the probability density
function and is represented by a curve. The region bounded by the curve and two values on the
abscissa, X = a and X = b, gives the probability that X lies between the values a and b.
I.e., P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Therefore a function of the above expression can serve as a continuous probability density
function of a continuous random variable ⇔ its values f(x) satisfy the conditions:
i) f(x) ≥ 0
ii) ∫_{−∞}^{∞} f(x) dx = 1
iii) P(a ≤ X ≤ b) = ∫_a^b f(x) dx
As in the discrete case, F(x) is called the cumulative probability of X; you should note the
following here too.
i. F(−∞) = 0, F(+∞) = 1
ii. F(b) – F(a) = ∫_a^b f(x) dx = P(a < X < b), for a < b.
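The three conditions can be checked numerically for a concrete density. The sketch below assumes the simple (illustrative, not from the text) density f(x) = 2x on 0 < x < 1 and approximates the integrals with the trapezoidal rule:

```python
# Check conditions (i)-(iii) for an assumed density f(x) = 2x on [0, 1].
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, n=100_000):
    """Trapezoidal approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return s * h

total = integrate(f, 0, 1)       # condition (ii): should be 1
p = integrate(f, 0.25, 0.5)      # P(0.25 <= X <= 0.5) = 0.5^2 - 0.25^2
print(round(total, 4))           # 1.0
print(round(p, 4))               # 0.1875
```

Here f(x) ≥ 0 everywhere, the total area is 1, and the area between two abscissa values gives a probability, exactly as in conditions (i)-(iii).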

2.4 Expected value and variance of a continuous random variable

Definition 2.6. If X is a continuous random variable and f(x) is the value of its probability density at
x, the expected value of X is:
μ = E(X) = ∫_{−∞}^{∞} x f(x) dx
Example
1. A certain coded measurement of the diameter of the threads of a fitting has the probability
density
f(x) = 4/(π(1 + x^2)), for 0 < x < 1, and 0, elsewhere; find the expected value of this random variable.
Solution: Using Definition 2.6,
μ = E(X) = ∫_0^1 x · 4/(π(1 + x^2)) dx = (4/π) ∫_0^1 x/(1 + x^2) dx = (4/π) · (1/2) ln(1 + x^2) |_0^1 = (ln 4)/π ≈ 0.44

Theorem 2.5. If X is a continuous random variable and f(x) is the value of its probability density at
x, the expected value of g(X) is given by:
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.

Example:-
1. If X has the probability density f(x) = e^(−x), for x > 0, and 0, elsewhere,
find the expected value of g(X) = e^(4x/5).
Solution: According to Theorem 2.5 we have:
E[e^(4x/5)] = ∫_0^∞ e^(4x/5) · e^(−x) dx = ∫_0^∞ e^(−x/5) dx = −5 e^(−x/5) |_0^∞ = −5[0 − 1] = 5
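The value E[e^(4X/5)] = 5 can be verified numerically by truncating the improper integral at a large upper limit (a rough sketch; the tail beyond x = 100 is negligible since e^(−100/5) ≈ 2×10⁻⁹):

```python
# Numerical check of E[e^(4x/5)] for f(x) = e^(-x), x > 0.
import math

def integrand(x):
    return math.exp(4 * x / 5) * math.exp(-x)   # g(x) * f(x) = e^(-x/5)

n, upper = 200_000, 100.0
h = upper / n
approx = h * (0.5 * integrand(0) + sum(integrand(i * h) for i in range(1, n))
              + 0.5 * integrand(upper))
print(round(approx, 3))   # 5.0
```

The trapezoidal sum reproduces the closed-form answer of 5 to three decimal places.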

Properties of expectation for continuous random variables

The properties of expectation for discrete random variables work here too.
For instance, if a and b are constants and X is a continuous random variable, then E(aX + b) =
a E(X) + b.
Proof:- Let g(x) = ax + b; then
E(aX + b) = ∫_{−∞}^{∞} (ax + b) f(x) dx
= a ∫_{−∞}^{∞} x f(x) dx + b ∫_{−∞}^{∞} f(x) dx
= a E(X) + b

Variance of a continuous random variable

Definition 2.7. μ_2 = E[(X − μ)^2] is called the variance of the density of X; it is denoted by σ^2 or
var(X).
∴ σ^2 = E[(X − μ)^2] = ∫_{−∞}^{∞} (x − μ)^2 f(x) dx → for a continuous RV.
Example:-
1. If f(x) = 1/(x ln 4), for 1 < x < 3, and 0, elsewhere, find the variance.
Solution: First find E(X) = ∫_1^3 x · 1/(x ln 4) dx = (1/ln 4) ∫_1^3 dx = (1/ln 4) [x]_1^3 = 2/ln 4 ≈ 1.44
Then,
σ^2 = (1/ln 4) ∫_1^3 (x − 1.44)^2 · (1/x) dx
= (1/ln 4) [∫_1^3 x dx − 2.88 ∫_1^3 dx + 2.07 ∫_1^3 (1/x) dx]
= (1/ln 4) [x^2/2 − 2.88x + 2.07 ln x]_1^3
= (1/ln 4) [(9/2 − 1/2) − 2.88(3 − 1) + 2.07 ln 3]
= 0.72 [4 − 5.76 + 2.27] = 0.72 × 0.51 ≈ 0.37
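The same mean and central-moment integral can be checked numerically, following the direct method used in the example (an illustrative sketch with a trapezoidal rule):

```python
# Numerical check of the example: f(x) = 1/(x ln 4) on 1 < x < 3.
import math

ln4 = math.log(4)
f = lambda x: 1 / (x * ln4)

def integrate(g, a, b, n=100_000):
    """Trapezoidal approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

mean = integrate(lambda x: x * f(x), 1, 3)               # 2 / ln 4
var = integrate(lambda x: (x - mean) ** 2 * f(x), 1, 3)  # the central moment
print(round(mean, 4))  # 1.4427
print(round(var, 2))   # 0.37
```

The numeric values agree with the hand computation (≈ 1.44 and ≈ 0.37).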

Moments

Definition 2.8. The r-th moment about the origin of a random variable X, denoted by μ'_r, is
the expected value of X^r; symbolically,
μ'_r = E(X^r) = Σ_x x^r f(x), for r = 0, 1, 2, …, when X is discrete, and
μ'_r = E(X^r) = ∫_{−∞}^{∞} x^r f(x) dx, when X is continuous.
And μ'_1 is called the mean of the distribution of X, or simply the mean of X, and it is denoted by μ.
Whereas the r-th moment about the mean of a random variable X, denoted by μ_r, is the expected
value of (X − μ)^r; symbolically,
μ_r = E[(X − μ)^r] = Σ_x (x − μ)^r f(x), for r = 0, 1, 2, …, when X is discrete, and
μ_r = E[(X − μ)^r] = ∫_{−∞}^{∞} (x − μ)^r f(x) dx, when X is continuous.
Note that μ_0 = 1 and μ_1 = 0 for any random variable for which μ exists.
I) μ_0 = E[(X − μ)^0] = Σ (x − μ)^0 f(x) = Σ f(x) = 1
II) μ_1 = E[(X − μ)^1] = Σ (x − μ)^1 f(x) = E(X) − E(μ) = μ − μ = 0
And μ_2 is called the variance of the distribution of X, or simply the variance of X, and it is
denoted by σ^2 or var(X); therefore from the above:
μ_2 = E[(X − μ)^2] = Σ (x − μ)^2 f(x) → for discrete random variables.
μ_2 = E[(X − μ)^2] = ∫_{−∞}^{∞} (x − μ)^2 f(x) dx → for continuous random variables.
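The definitions above can be made concrete with the fair die used earlier (an illustrative sketch; the die itself is an assumed example):

```python
# Raw moments mu'_r = E(X^r) and central moments mu_r = E[(X - mu)^r]
# for a fair six-sided die, using exact fractions.
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}

def raw_moment(r):
    return sum(x**r * p for x, p in f.items())

mu = raw_moment(1)   # mu'_1 is the mean

def central_moment(r):
    return sum((x - mu)**r * p for x, p in f.items())

print(raw_moment(1))      # 7/2   (the mean)
print(central_moment(0))  # 1     (mu_0 = 1)
print(central_moment(1))  # 0     (mu_1 = 0)
print(central_moment(2))  # 35/12 (mu_2, the variance)
```

The output confirms μ_0 = 1, μ_1 = 0, and that μ_2 equals the variance computed in Chapter 2.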
CHAPTER 3
SPECIAL PROBABILITY DISTRIBUTIONS AND DENSITIES
3.1 Distributions
In this chapter we shall study some of the special probability distributions which figure most
prominently in statistical theory and applications in various disciplines. Here we also study their
parameters, μ and σ^2.
A number of applied problems in experimental situations have led to the formulation of
various discrete probability distribution formulae.
These distributions are probability distributions that directly describe the behavior of
attributes; for instance, the event may be success or failure, good or bad, black or white, etc.
Hence they are discrete distributions.

3.1.1. The discrete uniform distribution


If a random variable can take on k different values with equal probability, we say that it has a
discrete uniform distribution.
Definition 3.1. A random variable X has a discrete uniform distribution and it is referred to as a
discrete uniform random variable ⇔ its probability distribution is given by f(x) = 1/k,for x = x1 , x2
…….. xk, and where xi ≠ xj , when i ≠ j.
The mean and the variance of this distribution are
μ = (1/k) Σ_{i=1}^{k} x_i and σ^2 = (1/k) Σ_{i=1}^{k} (x_i − μ)^2

3.1.2. The Bernoulli distribution


In connection with this type of distribution, a success may be getting heads with a balanced coin,
it may be catching pneumonia, it may be passing (or failing) an examination, it may be gaining (or
losing) a profit in business activities.
This usage is carried over from the days when probability theory was applied only to
games of chance. For this reason we refer to an experiment to which the Bernoulli distribution
applies as a "Bernoulli trial". Therefore, if an experiment has two possible outcomes, "success" and
"failure", and their probabilities are, respectively, p and (1 − p), then the number of successes, 0 or 1,
has a Bernoulli distribution.

Definition 3.2.A random variable X has a Bernoulli distribution and it is referred to


as a Bernoulli random variable ⇔ its probability distribution is given by:
f(x; p) = p^x (1 − p)^(1−x), for x = 0, 1
Example
1. In an industry’s quality control system it was observed that there is a 95% chance of
producing a quality product.
i) What is the probability of obtaining a defective product?
ii) List the probabilities for the values x = 0, 1.
Solution
i) f (0 ; 0.95) = 0.950 (1 – 0.95) 1-0 = 0.05
ii) f (0; 0.95) = 0.05; f(1;0.95) = (0.95)1 (1 – 0.95) 1-1 = 0.95
In general, if p is the probability of success and x = 0, 1, then f(0; p) = p^0 (1 − p)^(1−0) = 1 − p,
and f(1; p) = p^1 (1 − p)^(1−1) = p.

3.1.3 The Binomial distribution


Repeated trials play a vital role in various disciplines in general and in probability and statistics
in particular, especially when the number of trials is fixed, the parameter P (probability of success)
is the same for each trial, and the trials are all independent. This type of distribution has many
applications, for instance, it applies if we want to know the probability of getting 6 heads in 16 flips
of a coin, the probability that 35 of 60 persons will escape from a poverty line, the probability that
18 of 20 persons will recover from HIV. However, this is the case only if each of the 20 person has
the same chance of recovering from the disease and their recoveries are independent (say, they are
treated by different physicians in different hospitals). To derive the formula for the probability here
is the definition.

Definition 3.3. A random variable X has a binomial distribution and it is referred to as a binomial
random variable ⇔ its probability distribution is given by:
b(x; n, p) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, 2, …, n.
→ Where: n is the number of trials; x is the number of observed successes; p is the probability of
success on each trial; and C(n, x) = nCx = n!/(x!(n − x)!)

Characteristics of the binomial distribution


 An outcome of an experiment is classified into one of the two mutually exclusive categories
i.e., success or failure.
 The data collected are the results of counts.
 The probability of success stays the same for each trial.
 The trials are independent.
Examples
1. In a binomial situation n = 4 and p = 0.25; determine the probabilities of the following events.
i) x = 2; ii) x = 3
Solution
To solve the given problems apply the formula b(x; n, p) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, 2, …, n.
i) b(2; 4, 0.25) = [4!/(2!(4 − 2)!)] (0.25)^2 (1 − 0.25)^(4−2) = 6 (1/16)(9/16) = 0.2109
ii) b(3; 4, 0.25) = [4!/(3!(4 − 3)!)] (0.25)^3 (0.75)^(4−3) = 0.0469
2. A telemarketer makes six phone calls per hour and is able to make a sale on 30% of
these contacts. During the next two hours, find:
I. The probability of making exactly four sales.
II. The probability of making no sales.
Solution
I. b(4; 12, 0.3) = [12!/(4!(12 − 4)!)] (0.30)^4 (0.70)^(12−4) = 0.2311
II. b(0; 12, 0.3) = [12!/(0!(12 − 0)!)] (0.30)^0 (0.70)^(12−0) = 0.0138
3. Suppose 50% of the people in the region prefer inferior goods; we select 10 people for
further study.
i. What is the probability that 5 of those surveyed will prefer inferior goods?
Solution: b(5; 10, 0.5) = [10!/(5!(10 − 5)!)] (0.5)^5 (0.5)^(10−5) = C(10, 5)(0.5)^10 = 63/256 ≈ 0.246
4. In a binomial distribution n = 8 and p = 0.30; find the probabilities of the
following events.
I. x ≤ 2
II. x ≥ 3
Solution:
I. b(x ≤ 2; 8, 0.3) = p(0) + p(1) + p(2)
= C(8, 0)(0.3)^0(0.7)^8 + C(8, 1)(0.3)^1(0.7)^7 + C(8, 2)(0.3)^2(0.7)^6
= 0.058 + 0.198 + 0.296 = 0.552
II. P(x ≥ 3; 8, 0.3) = 1 − b(x ≤ 2; 8, 0.3) = 1 − 0.552 = 0.448
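The binomial formula is easy to evaluate directly with `math.comb`; the sketch below re-checks several of the worked examples:

```python
# Binomial probabilities b(x; n, p) = C(n, x) p^x (1-p)^(n-x).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(binom_pmf(2, 4, 0.25), 4))            # 0.2109
print(round(binom_pmf(4, 12, 0.3), 4))            # 0.2311 (four sales in 12 calls)
print(round(binom_pmf(5, 10, 0.5), 4))            # 0.2461 (= 63/256)
p_le_2 = sum(binom_pmf(x, 8, 0.3) for x in range(3))
print(round(p_le_2, 3))                           # 0.552  P(x <= 2)
print(round(1 - p_le_2, 3))                       # 0.448  P(x >= 3)
```

Cumulative probabilities such as P(x ≤ 2) are just sums of the pmf over the relevant values of x.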

Theorem 3.1. The mean and the variance of the binomial distribution are:
μ = np and σ^2 = npq, respectively.

If we let P(X = r) = p(r) = b(r; n, p) and (1 − p) = q, where r = 0, 1, 2, …, n,
then μ = mean = Σ r p(r) = np.
Proof:- μ = Σ r p(r)
= 0·nC0 p^0 q^(n−0) + 1·nC1 p^1 q^(n−1) + 2·nC2 p^2 q^(n−2) + … + n·nCn p^n q^0
= np q^(n−1) + [2n(n−1)/2!] p^2 q^(n−2) + [3n(n−1)(n−2)/3!] p^3 q^(n−3) + … + n p^n q^0
= np [n−1C0 p^0 q^(n−1) + n−1C1 p^1 q^(n−2) + n−1C2 p^2 q^(n−3) + … + n−1Cn−1 p^(n−1) q^0]
= np (q + p)^(n−1) = np, ∵ p + q = 1

Variance = σ^2 = Σ r^2 p(r) − [Σ r p(r)]^2 = Σ r^2 p(r) − μ^2

Examples
1. If the probability of defective goods in a particular factory is 0.1, find the mean and the
variance for the distribution of defective goods in a total of 200.
Solution:-
Given: n = 200; p = 1/10; q = 1 − p = 1 − 1/10 = 0.9; hence μ = np = 200 × 1/10 = 20 and
σ^2 = npq = 200 × 1/10 × 9/10 = 18
2. For a binomial distribution, μ = 4 and var = 2.4; find the parameters p and q.
Solution:-
Given: np = 4; σ^2 = npq = 2.4; then q = (σ^2/μ) = (npq/np) = 2.4/4 = 0.6
p = 1 − q = 1 − 0.6 = 0.4

3.1.4. The hypergeometric distribution

Here we are choosing, without replacement, n of the N elements contained in the set. There are
C(M, x) ways of choosing x of the M successes and C(N − M, n − x) ways of
choosing (n − x) of the (N − M) failures and, hence, C(M, x)·C(N − M, n − x) ways of choosing x
successes and (n − x) failures. Since there are C(N, n) ways of choosing n of the N elements in the
set, assuming that they are all equally likely, therefore, the probability of "x successes in n trials" is
C(M, x)·C(N − M, n − x) / C(N, n).

Definition 3.4. A random variable X has a hypergeometric distribution and it is referred to as a
hypergeometric random variable ⇔ its probability distribution is given by:
h(x; n, N, M) = [MCx · N−MCn−x] / NCn, for x = 0, 1, 2, …, n,
with x ≤ M and n − x ≤ N − M, where:
N – the size of the population
M – the number of successes in the population
n – the size of the sample (number of trials)
x – the number of successes in the sample (n)
C – the symbol for a combination
The hypergeometric distribution should be applied:
. if the sample is selected from a finite population;
. if the size of the sample n is more than or equal to 5% of the population N;
. it is especially appropriate when the size of the population is small.
Example
1. A factory employs 50 people in the assembly department 40 of the employees belong to a
union. 5 employees are selected at random to form a committee to meet with management
regarding shift starting times. What is the probability that 4 of the 5 selected for the
committee belong to a union?
Solution:-
Given: N= 50 the number of employees
M = 40 the number of union employees
X = 4 the number of union employees selected
n = 5 the number of employees selected

h(4; 5, 50, 40) = [40C4 · 50−40C5−4] / 50C5 = [(40!/(4!36!))(10!/(1!9!))] / (50!/(5!45!))
= (91390 × 10)/2118760 ≈ 0.431

Or it is possible to approximate the above problem using the binomial distribution: let
p = M/N = 40/50 = 0.8, n = 5, x = 4; then b(4; 5, 0.8) = 5C4 (0.8)^4 (1 − 0.8)^(5−4)
= 5 × 0.4096 × 0.2 ≈ 0.410
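Both the exact hypergeometric probability and the binomial approximation used above can be reproduced with `math.comb`:

```python
# Hypergeometric probability for the committee example, plus the
# binomial approximation with p = M/N.
from math import comb

def hyper_pmf(x, n, N, M):
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

p_exact = hyper_pmf(4, 5, 50, 40)          # exact, without replacement
print(round(p_exact, 3))                   # 0.431

p_approx = comb(5, 4) * 0.8**4 * 0.2**1    # binomial with p = 40/50 = 0.8
print(round(p_approx, 3))                  # 0.41
```

The two answers are close here because the sample (5) is small relative to the population (50), so sampling without replacement barely changes the success probability from draw to draw.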

Theorem 3.2. The mean and the variance of the hypergeometric distribution are:
μ = nM/N, and σ^2 = nM(N − M)(N − n) / [N^2 (N − 1)]
Exercises:
1) Prove that μ = nM/N.
2) Prove that σ^2 = nM(N − M)(N − n) / [N^2 (N − 1)].
Example:
1. Suppose the Economics Department of Gondar University wants to recruit four people for a
teaching position and for this 16 applicants have applied. Among the 16 applicants for the
teaching position, 10 have a university degree. If 4 of the applicants are randomly chosen for
interviews, what are the mean and the variance?
Solution:
Given: N = 16; M = 10; n = 4. Then
μ = nM/N = (4 × 10)/16 = 10/4 = 2.5, and
σ^2 = nM(N − M)(N − n) / [N^2 (N − 1)] = [4 × 10 (16 − 10)(16 − 4)] / [16^2 (16 − 1)]
= (40 × 6 × 12)/(16 × 16 × 15) = 3/4 = 0.75
2. Based on the example above, what is the probability that two of the four chosen have a
university degree?
Solution:-
h(2; 4, 16, 10) = [10C2 · 6C2] / 16C4 = [(10!/(2!8!))(6!/(2!4!))] / (16!/(4!12!))
= (45 × 15)/1820 = 135/364 ≈ 0.371

3.1.5. The Poisson distribution

We have seen the binomial distribution so far. As the number of trials (n) becomes large, the
calculation of binomial probabilities with the formula of Definition 3.3 will involve a prohibitive
amount of work. For instance, to calculate the probability that 2 of 10,000 persons watching
(celebrating) a ceremony of "Timket" on a very hot day will suffer from heat exhaustion, we first
have to determine C(10000, 2), and if the probability is 0.0005 that any one of the 10,000 persons
celebrating the ceremony will suffer from heat exhaustion, we also have to calculate the value of
[0.0005]^2 (0.9995)^9998. So in this case, since it imposes a burden, we turn to the Poisson
probability distribution.
Therefore the limiting form of the binomial distribution, where the probability of success p is
very small and n is large, is called the Poisson distribution [i.e., p → 0, n → ∞].

Definition 3.5.

A random variable X has a Poisson distribution and it is referred to as a Poisson random
variable ⇔ its probability distribution is given by:
p(x; m) = P(X = x) = e^(−m) m^x / x!, for x = 0, 1, 2, 3, …
Where m = np, and e ≈ 2.71828 is the base of the system of natural logarithms.

The Poisson distribution will provide a good approximation to binomial probabilities when n ≥ 30,
p ≤ 0.05, and np < 5. The Poisson distribution may be obtained as a limiting case of the binomial
probability distribution under the following conditions:
i. n, the number of trials, is indefinitely large (i.e., n → ∞);
ii. p, the constant probability of success for each trial, is indefinitely small (i.e., p → 0);
iii. np = m, say, which is finite.
Example
1. Calculate the probability that 2 of 10,000 persons celebrating a ceremony of "Timket" on a very
hot day will suffer from heat exhaustion, if the probability is 0.0005 that any one of the 10,000
persons celebrating the ceremony will suffer from heat exhaustion.
Solution
m = np = 10000 × (5/10000) = 5; x = 2; p(x; m) = e^(−m) m^x / x!
p(2; 5) = e^(−5) 5^2 / 2! = (0.0067 × 25)/(2 × 1) = 0.08375

2. If the probability that an individual suffers from an allergy from the by-products of a tannery
is 0.001, determine the probability that out of 2000 individuals:
a) x = 3
b) x > 2
Solution:
a) m = np = 2000 × (1/1000) = 2 ⇒ p(3; 2) = e^(−2) 2^3 / 3! = (e^(−2) × 8)/6 = 0.1804
b) p(x > 2; 2) = p(3; 2) + p(4; 2) + p(5; 2) + … + p(n; 2)
= 1 − [e^(−2) 2^0/0! + e^(−2) 2^1/1! + e^(−2) 2^2/2!]
= 1 − 0.1353 − 0.2707 − 0.2707 = 0.3233
3. If 5% of the marbles manufactured by a company are defective, what is the probability that in a
sample of 100 marbles:
a. none is defective
b. 5 marbles will be defective
Solution
m = np = 100 × (5/100) = 5
a) x = 0 ⇒ p(0; 5) = e^(−5) 5^0 / 0! = e^(−5) ≈ 0.0067
b) p(5; 5) = e^(−5) 5^5 / 5! = (0.0067 × 3125)/120 ≈ 0.1755
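These Poisson probabilities follow directly from the formula e^(−m) m^x / x!, which can be evaluated with the standard library:

```python
# Poisson probabilities p(x; m) = e^(-m) m^x / x!, re-checking the examples.
from math import exp, factorial

def poisson_pmf(x, m):
    return exp(-m) * m**x / factorial(x)

print(round(poisson_pmf(2, 5), 4))                 # 0.0842  (Timket example)
print(round(poisson_pmf(3, 2), 4))                 # 0.1804  (allergy example)
p_gt_2 = 1 - sum(poisson_pmf(x, 2) for x in range(3))
print(round(p_gt_2, 4))                            # 0.3233  P(x > 2)
print(round(poisson_pmf(0, 5), 4))                 # 0.0067  (no defectives)
```

Note that full-precision arithmetic gives 0.0842 for p(2; 5); the worked example's 0.08375 comes from rounding e^(−5) to 0.0067 first.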

Theorem 3.3. The mean and the variance of the Poisson distribution are μ = m and σ^2 = m.
Now let's show the proof of the above theorem below.
1. μ = m
Proof: Let p(r; m) = e^(−m) m^r / r!, for r = 0, 1, 2, 3, …,
and recall that e^m = 1 + m + m^2/2! + m^3/3! + m^4/4! + …
Therefore:
μ = Σ_{r=0} r p(r) = 0·e^(−m) m^0/0! + 1·e^(−m) m^1/1! + 2·e^(−m) m^2/2! + 3·e^(−m) m^3/3! + 4·e^(−m) m^4/4! + …
= e^(−m) [m + m^2/1! + m^3/2! + m^4/3! + …]
= e^(−m) m [1 + m + m^2/2! + m^3/3! + …] = e^(−m) m e^m = m

2. Variance = σ^2 = m
Proof: σ^2 = Σ_{r=0} r^2 p(r) − [Σ_{r=0} r p(r)]^2 = Σ_{r=0} r^2 p(r) − m^2
Now: Σ_{r=0} r^2 p(r) =
0^2·e^(−m) m^0/0! + 1^2·e^(−m) m^1/1! + 2^2·e^(−m) m^2/2! + 3^2·e^(−m) m^3/3! + 4^2·e^(−m) m^4/4! + 5^2·e^(−m) m^5/5! + …
= e^(−m) [m + 2m^2/1! + 3m^3/2! + 4m^4/3! + 5m^5/4! + …]
= e^(−m) m [1 + 2m + 3m^2/2! + 4m^3/3! + 5m^4/4! + …]
= e^(−m) m [{1 + m + m^2/2! + m^3/3! + …} + {m + 2m^2/2! + 3m^3/3! + 4m^4/4! + …}]
= e^(−m) m [{e^m} + m{1 + m + m^2/2! + m^3/3! + …}] = e^(−m) m [e^m + m e^m]
= e^(−m) m e^m [1 + m] = m[1 + m] = m + m^2
∴ σ^2 = m + m^2 − m^2 = m
Hence, for the Poisson distribution with parameter m, we have μ = σ^2 = m.

Examples
1. In a Poisson distribution, if p(x) for x = 0 is 10%, find the mean and the variance of the
distribution.
Solution:
P(x = 0) = e^(−m) m^0 / 0! = e^(−m) = 10% ⇒ −m = ln(0.1) ⇒ μ = σ^2 = m = ln 10 ≈ 2.3026
2. If the standard deviation of a Poisson distribution is √2, find the probability that x = 0, and
find μ and σ^2.
Solution:
1. m = (√2)^2 = 2, so p(x = 0) = e^(−2) 2^0 / 0! = e^(−2) = 1/e^2 ≈ 0.135
2. μ = σ^2 = (√2)^2 = 2

3.2 Densities
3.2.1 The uniform density (rectangular distribution)
Definition 3.6. A random variable X has a uniform distribution and it is referred to as a continuous
uniform random variable ⇔ its probability density is given by:
u(x; α, β) = 1/(β − α), for α < x < β, and 0, elsewhere.
The parameters α and β of this probability density are real constants, with α < β.

Example:
1. In a certain experiment, the error made in determining the density of a substance is a random
variable having a uniform density with α = −0.015 and β = 0.015. Find the probability that
such an error will be between −0.002 and 0.003.

Solution:-
u[(−0.002, 0.003); −0.015, 0.015] = ∫_{−0.002}^{0.003} 1/(0.015 − (−0.015)) dx = (1/0.03) ∫_{−0.002}^{0.003} dx
= (1/0.03) [x]_{−0.002}^{0.003} = 0.005/0.03 = 1/6 ≈ 0.167

Theorem 3.4. The mean and the variance of the uniform distribution are given by:
μ = (α + β)/2; σ^2 = (1/12)(β − α)^2

3.2.2. The exponential density

The exponential distribution applies not only to the occurrence of the first success in a Poisson
process, but also to the waiting times between successes.
Definition 3.7. A random variable X has an exponential distribution and it is referred to as an
exponential random variable ⇔ its probability density is given by:
g(x; θ) = (1/θ) e^(−x/θ), for x > 0 and θ > 0, and 0, elsewhere.
Note: If θ = 1/m, then the above formula becomes:
g(x; m) = m e^(−mx), for x > 0 and m > 0, and 0, elsewhere.

Examples:
1. On the Bahir Dar town highway, the number of cars exceeding the speed limit by more than 10 km
per hour in half an hour is a random variable having a Poisson distribution with m = 8.4. What is
the probability of a waiting time of less than 5 minutes between cars exceeding the speed limit
by more than 10 km per hour?
Solution:
Taking half an hour as the unit of time, we have m = 8.4. Therefore the waiting time is a random
variable having an exponential distribution with θ = 1/8.4, and since 5 minutes is 1/6 of the unit
of time, the required probability is:
∫_0^{1/6} 8.4 e^(−8.4x) dx = −e^(−8.4x) |_0^{1/6} = 1 − e^(−1.4) ≈ 0.75
2. A certain kind of appliance requires repairs on the average once every 2 years. Assuming that the
times between repairs are exponentially distributed, what is the probability that such an
appliance will work at least 3 years without requiring repairs?
Solution:
Given: μ = 2 = θ and x – at least 3 years
P(X ≥ 3) = ∫_3^∞ (1/2) e^(−x/2) dx = −e^(−x/2) |_3^∞ = −[e^(−∞) − e^(−3/2)] = 0 + e^(−3/2) = 0.2231

Theorem 3.5. The mean and variance of the exponential distribution are given by:
μ = θ and σ^2 = θ^2
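Both exponential examples reduce to the closed-form CDF P(X ≤ t) = 1 − e^(−t/θ), which a short sketch can confirm:

```python
# Exponential-distribution checks for the two examples above.
from math import exp

def expon_cdf(t, theta):
    """P(X <= t) for an exponential variable with mean theta."""
    return 1 - exp(-t / theta)

# Example 1: theta = 1/8.4 (in half-hour units); 5 minutes = 1/6 of a unit.
print(round(expon_cdf(1/6, 1/8.4), 2))   # 0.75

# Example 2: theta = 2 years; P(X >= 3) = 1 - F(3).
print(round(1 - expon_cdf(3, 2), 4))     # 0.2231
```

The survival probability in example 2 is simply e^(−3/2), as derived above.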
3.2.3. The normal distribution
Definition 3.8. A random variable X has a normal distribution and it is referred to as a normal
random variable ⇔ its probability density is given by:
n(x; μ, σ^2) = [1/(σ√(2π))] e^(−(1/2)((x−μ)/σ)^2), for −∞ < x < ∞, where σ > 0.

Characteristics of a normal probability distribution

i. It is bell shaped and has a single peak at the exact center of the distribution; half the
area (0.5) lies on each side of μ = Mo = Me.
ii. The mean (μ), mode (Mo), and median (Me) of the distribution are equal and located at the
peak.
iii. Half the area under the curve is above the peak, and the other half is below it.
iv. The normal probability distribution is symmetrical about its mean.
v. The normal probability distribution is asymptotic, i.e., the curve gets closer and closer to
the axis but never actually touches it.
The normal distribution with μ = 0 and σ = 1 is referred to as the standard normal distribution:
n(x; 0, 1) = (1/√(2π)) e^(−x^2/2) = f(x), X ~ N(0, 1)
The standard normal value = Z value = (x − μ)/σ; the standard normal variate
Z has mean = 0 and standard deviation (σ) = 1.

Let us show it: we know that Z = (X − μ)/σ; then we have
E(Z) = E[(X − μ)/σ] = (1/σ) E(X − μ) = (1/σ)[E(X) − E(μ)] = (1/σ)[μ − μ] = 0, and
Var(Z) = Var[(X − μ)/σ] = (1/σ^2) Var(X − μ) = (1/σ^2) Var(X); ∴ Var(Z) = (1/σ^2) σ^2 = 1
∵ Var(aX) = a^2 Var(X), and variance is independent of a change of origin.
Hence the probability density function of the standard normal variate
Z is given by: ø(z) = (1/√(2π)) e^(−z^2/2), −∞ < z < ∞

How to read the standard normal probability distribution

This method calls for standardizing the distribution, or changing from the X scale to the Z scale.
To find the area between a value of interest (x) and the mean (μ), we first compute the difference
between the value (x) and the mean (μ); then we express that difference in units of standard
deviation. In other words we compute the value
Z = (x − μ)/σ
Finally, we find the desired area under the curve, or the probability, by referring to a table whose
entry corresponds to the calculated value of Z.
The value of Z follows a normal probability distribution with a mean of zero and a standard
deviation of one unit. This probability distribution is called the standard normal probability
distribution. Thus we can convert any normal distribution to the standard normal distribution.
Examples:
1. The monthly incomes of employees in a textile factory are normally distributed with a
mean of birr 2000 and a standard deviation of birr 200.
a) What is the Z value for an income of birr 2200?
b) What is the Z value for an income of birr 1700?
Solution:
a) Given x = 2200; μ = 2000 and σ = 200;
then for x = 2200: Z = (x − μ)/σ = (2200 − 2000)/200 = 1
b) For x = 1700: Z = (x − μ)/σ = (1700 − 2000)/200 = −1.5
Hence a Z value of 1 indicates that the value 2200 is one standard deviation above the mean of
2000, while a Z value of −1.5 indicates that 1700 is 1.5 standard deviations below the mean of 2000.
Areas under the normal curve

1. About 68% of the area under the normal curve is within one standard deviation of the mean,
i.e., μ ± 1σ.
2. About 95% of the area under the normal curve is within two standard deviations of the mean,
i.e., μ ± 2σ.
3. About 99.74% of the area under the normal curve is within three standard deviations of the
mean, i.e., μ ± 3σ.
Example;
2. The daily water usage per person in Ethiopia is normally distributed with a mean of 20
gallons and a standard deviation of 5 gallons.
i) What is the probability that a person from Ethiopia selected at random will use less than
20 gallons per day?
ii) What percentage use between 20 and 24 gallons?
iii) What percentage of the population uses between 18 and 26 gallons?
Solution:
i) The Z value is Z = (x − μ)/σ = (20 − 20)/5 = 0; thus P(x < 20) = P(Z < 0) = 0.5
ii) The Z value with x = 20 is Z = (20 − 20)/5 = 0, and with x = 24 it is Z = (24 − 20)/5 = 4/5 = 0.8;
then P(20 < x < 24) = P(0 < Z < 0.8) = 0.2881, using the Z table; ∴ it is 28.81%.
iii) The Z value with x = 18 is Z = (18 − 20)/5 = −0.4, and for x = 26, Z = (26 − 20)/5 = 1.2; thus
P(18 < x < 26) = P(−0.4 < Z < 1.2) = 0.1554 + 0.3849 = 0.5403, using the Z table; it is
54.03%.
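Instead of a printed Z table, the standard normal CDF Φ(z) can be computed from the error function in Python's `math` module, re-checking the water-usage example:

```python
# Standard-normal areas via Phi(z) = 0.5 * (1 + erf(z / sqrt(2))),
# for the example with mu = 20 gallons and sigma = 5 gallons.
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(0), 2))                # 0.5    P(X < 20)
print(round(phi(0.8) - phi(0), 4))     # 0.2881 P(20 < X < 24)
print(round(phi(1.2) - phi(-0.4), 4))  # 0.5404 (table, to 4 decimals: 0.5403)
```

The tiny discrepancy in the last value comes from the Z table's four-decimal rounding of each piece before adding.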

The normal approximation to the binomial
Relation between the Poisson and normal distributions

If E(x) = m = mean and var(x) = σ^2 = m, then for m → ∞ the standard Poisson variate
(SPV) becomes: Z = (x − E(x))/σ = (x − m)/√m.
Relation between the binomial and normal distributions

The normal distribution is a limiting case of the binomial probability distribution under the
condition that n, the number of trials, becomes large while neither p nor q is very small; since
E(x) = np and var(x) = npq under the binomial probability distribution, the standard binomial
variate is:
Z = (x − E(x))/σ = (x − np)/√(npq).
The normal distribution is sometimes introduced as a continuous distribution which provides a
close approximation to the binomial, which is a discrete one, when n, the number of trials, is very
large and p, the probability of success on the individual trial, is close to ½.
When to use the normal approximation

The normal probability distribution is a good approximation to the binomial probability
distribution when np and n(1 − p) are both at least 5. However, before we apply the normal
approximation, we must make sure that our distribution of interest is in fact a binomial
distribution. When we approximate a discrete probability distribution by a continuous one, we need
to make an adjustment called the continuity correction. In this adjustment, we add and/or subtract
0.5 to the value at which the binomial distribution is approximated by the normal.
How to apply the correction factor:
- for the probability that at least X occur, use the area above (X – 0.5)
- for the probability that more than X occur, use the area above (X + 0.5)
- for the probability that X or fewer occur, use the area below (X + 0.5)
- for the probability that fewer than X occur, use the area below (X – 0.5)
Examples
1. A restaurant found that 70% of their new customers return for another meal. For a week in
which 80 new customers dined at that restaurant, what is the probability that 60 or more will
return for another meal?
Solution: - Given n = 80; p = 0.70; x ≥ 60;
μ = np = 80 × 0.70 = 56
σ = √(np(1 − p)) = √(80 × (70/100) × (30/100)) = √16.8 = 4.10
x = 60 − 0.5 = 59.5 (continuity correction)
Z = (x − μ)/σ = (59.5 − 56)/4.10 = 0.85; P(0 < Z < 0.85) = 0.3023
P(60 ≤ x ≤ 80) ≈ P(Z ≥ 0.85) = 0.5 − 0.3023 = 0.1977
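The quality of the approximation can be judged by comparing it with the exact binomial sum (a sketch; the normal CDF is computed from `math.erf`):

```python
# Continuity-corrected normal approximation vs. exact binomial for
# n = 80, p = 0.7, P(X >= 60).
from math import erf, sqrt, comb

n, p = 80, 0.7
mu, sigma = n * p, sqrt(n * p * (1 - p))

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

z = (60 - 0.5 - mu) / sigma            # "at least 60": area above X - 0.5
approx = 1 - phi(z)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(60, n + 1))
print(round(approx, 4))                # 0.1966
print(round(exact, 4))
```

The hand computation gives 0.1977 because the table rounds z to 0.85; full precision (z ≈ 0.854) gives 0.1966, close to the exact binomial answer.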

2. Assume a binomial probability distribution with n = 50 and p = 0.25. Compute the following:
a. the mean and the standard deviation of the random variable
b. the probability that x is 15 or more
c. the probability that x is 10 or less
Solution:
a. μ = np = 50 × 0.25 = 12.5
σ = √(np(1 − p)) = √(50 × 0.25 × 0.75) = √9.375 = 3.0619
b. x = 15 − 0.5 = 14.5
Z = (14.5 − 12.5)/3.0619 = 0.65 ⇒ P(0 < Z < 0.65) = 0.2422; then
P(x ≥ 15) ≈ P(Z ≥ 0.65) = 0.5 − 0.2422 = 0.2578 is the area.
c. x = 10 + 0.5 = 10.5
Z = (10.5 − 12.5)/3.0619 = −0.65; the area P(−0.65 < Z < 0) is 0.2422, so
P(x ≤ 10) ≈ P(Z ≤ −0.65) = 0.5 − 0.2422 = 0.2578.

3. If Z is a random variable having the standard normal distribution, find the probabilities that it
will take on a value:
a. greater than 1.14
b. greater than −0.36

Solution:
a. For Z = 1.14, the normal distribution Z table gives P(0 < Z < 1.14) = 0.3729;
therefore P(Z > 1.14) = 0.5 − 0.3729 = 0.1271
b. For |Z| = |−0.36| = 0.36, the table gives P(0 < Z < 0.36) = 0.1406;
therefore P(Z > −0.36) = 0.5 + 0.1406 = 0.6406

CHAPTER፡ 4
JOINT, MARGINAL AND CONDITIONAL PROBABILITY DISTRIBUTIONS
4.1 Joint probability distribution
Definition 4.1. If X and Y are discrete random variables, the function given by f(x, y) = P(X = x,
Y = y) for each pair of values (x, y) within the range of X and Y is called the joint probability
distribution of X and Y.
Example: 1) Assume that the variable x denotes the sex of a person and the variable y
denotes whether the person is a smoker or not. Suppose the probability of a male who is a smoker
is 50%, the probability of a female who is a smoker is 20%, the probability of a male who is a
non-smoker is 20%, and the probability of a female who is a non-smoker is 10%. Then the joint
probability distribution of x and y can be depicted as follows in tabular form.

Joint distribution of x and y

Values of y             Value of x
                        x1 = male    x2 = female
y1 = smoker             0.50         0.20
y2 = non-smoker         0.20         0.10

Theorem 4.1. A bivariate function can serve as the joint probability distribution of a pair of discrete
random variables X and Y ⇔ its values f(x, y) satisfy the conditions:
1) f(x, y) ≥ 0 for each pair of values (x, y) within its domain,
2) Σ_x Σ_y f(x, y) = 1, where the double summation extends over all possible pairs (x, y) within
its domain.
Examples:
1) Check Theorem 4.1 based on the joint distribution from example one above.
Solution:
i) f(x, y) ≥ 0 ⇒ f(x1, y1) = 0.5, f(x1, y2) = 0.2, f(x2, y1) = 0.2, f(x2, y2) = 0.1
ii) Σ_x Σ_y f(x, y) = 1 ⇒ f(x1, y1) + f(x1, y2) + f(x2, y1) + f(x2, y2)
= 0.5 + 0.2 + 0.2 + 0.1 = 1



2) Determine the value of k for which the function given by f(x, y) = k[y(y − x)], for x = 0, 1, 2;
y = 0, 1, 2 can serve as a joint probability distribution.
Solution:
Substituting the various values of x and y, we get
f(0,0) = 0; f(0,1) = k; f(0,2) = 4k;
f(1,0) = 0; f(1,1) = 0; f(1,2) = 2k; f(2,2) = 0;
f(2,0) = 0; f(2,1) = −k → this is not a probability distribution, since it violates the first condition
of Theorem 4.1.
3) Determine the value of k for which the function given by f(x, y) = k(x + y), for x = 1, 2; y = 1, 2
can serve as a joint probability distribution.
Solution:
f(1,1) = 2k; f(1,2) = 3k; f(2,1) = 3k; f(2,2) = 4k
Σ_x Σ_y f(x, y) = 1 = 2k + 3k + 3k + 4k = k(2 + 3 + 3 + 4) = 12k, ∴ k = 1/12

Definition 4.2. If X and Y are discrete random variables, the function given by:
F(x, y) = P(X ≤ x, Y ≤ y) = Σ_{s≤x} Σ_{t≤y} f(s, t), for −∞ < x < ∞, −∞ < y < ∞, where f(s, t) is the
value of the joint probability distribution of X and Y at (s, t), is called the joint distribution
function, or the joint cumulative distribution, of X and Y.
Examples:
1) If the values of the joint probability distribution of x and y are as shown in the table:
X
0 1 2
Y
0 6 12 3
36 36 36
1 8 6
36 36
0
2 1
36
0 0

Solution

i) F (2, 0) = f(0,0) + f(1,0) +f(2,0) = P( ¿ 2 , y ≤0)

50
1 1 1 7
+ + =
= 6 3 12 12

ii) F(1,1) = f(0,0) +f(0,1) f(1,0) +f(1,1) = P( ¿ 1 , y≤1)


6 8 12 6 32 8
+ + + = =
36 36 36 36 36 9
iii) F(3, 4) = f(0,0) + f(0,1) + f(0,2) + f(1,0) + f(1,1) + f(1,2) + f(2,0) + f(2,1) + f(2,2)
= 6/36 + 8/36 + 1/36 + 12/36 + 6/36 + 0 + 3/36 + 0 + 0 = 36/36 = 1
(All terms with x > 2 or y > 2 are zero, since the distribution has no mass there.)
iv) F(−2, 1) = 0, since there is no value of x below 0; therefore P(X ≤ −2, Y ≤ 1) = 0.
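The double summation in Definition 4.2 translates directly into code. A minimal sketch (illustrative, using the pmf table from this example):

```python
from fractions import Fraction

# Joint pmf from the table above (x = 0, 1, 2 across; y = 0, 1, 2 down)
f = {
    (0, 0): Fraction(6, 36),  (1, 0): Fraction(12, 36), (2, 0): Fraction(3, 36),
    (0, 1): Fraction(8, 36),  (1, 1): Fraction(6, 36),  (2, 1): Fraction(0),
    (0, 2): Fraction(1, 36),  (1, 2): Fraction(0),      (2, 2): Fraction(0),
}

def F(x, y):
    """Joint cdf: F(x, y) = sum of f(s, t) over all s <= x and t <= y."""
    return sum(p for (s, t), p in f.items() if s <= x and t <= y)

print(F(2, 0))   # 7/12
print(F(1, 1))   # 8/9
print(F(3, 4))   # 1
print(F(-2, 1))  # 0
```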
Definition 4.3 A bivariate function with values f(x, y), defined over the xy-plane, is called a joint
probability density function of the continuous random variables (CRVs) X and Y if and only if
P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy, for any region A in the xy-plane.


Theorem 4.2 A bivariate function can serve as a joint probability density function of a pair of
continuous random variables X and Y if its values f(x, y) satisfy the conditions:
1) f(x, y) ≥ 0, for −∞ < x < ∞, −∞ < y < ∞
2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
Examples:
1) If the joint probability density of X and Y is given by
f(x, y) = (1/5)x(x + y), for 0 < x < 2, 0 < y < 1, and f(x, y) = 0 elsewhere,
find P[(X, Y) ∈ A], where A is the region {(x, y) | 0 < x < 3/2, 0 < y < 1}.
Solution
P[(X, Y) ∈ A] = P(0 < X < 3/2, 0 < Y < 1) = ∫₀¹ ∫₀^{3/2} (x/5)(x + y) dx dy
= (1/5) ∫₀¹ [x³/3 + x²y/2]₀^{3/2} dy
= (1/5) ∫₀¹ (9/8 + (9/8)y) dy
= (1/5)(9/8) [y + y²/2]₀¹
= (1/5)(9/8)(1 + 1/2) = (1/5)(9/8)(3/2) = 27/80

2) Determine R so that f(x, y) = Rx(x − y), for 0 < x < 1, −1 < y < 1, and f(x, y) = 0 elsewhere,
can serve as a joint probability density.
Solution
Using Theorem 4.2(2), we have:
1 = ∫_{−1}^{1} ∫₀¹ Rx(x − y) dx dy = R ∫_{−1}^{1} ∫₀¹ (x² − xy) dx dy
= R ∫_{−1}^{1} [x³/3 − x²y/2]₀¹ dy = R ∫_{−1}^{1} (1/3 − y/2) dy
= R [y/3 − y²/4]_{−1}^{1} = R[(1/3 − 1/4) − (−1/3 − 1/4)] = (2/3)R
⇒ R = 3/2
Definition 4.4 If X and Y are continuous random variables (CRVs), the function given by
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt, for −∞ < x < ∞, −∞ < y < ∞,
where f(s, t) is the value of the joint probability density at (s, t), is called the joint distribution
function of X and Y.
Example 1) Find the joint probability density of the two random variables X and Y whose joint
distribution function is given by
F(x, y) = (1 − e^{−x³})(1 − e^{−y³}), for x > 0 and y > 0, and F(x, y) = 0 elsewhere.
Also use the joint probability density to determine P(1 < X ≤ 2, 1 < Y ≤ 2).


Solution
First we obtain the joint density by differentiating the joint distribution function:
f(x, y) = ∂²F(x, y)/∂x∂y = ∂/∂y [3x² e^{−x³}(1 − e^{−y³})]
= 3x² e^{−x³} · 3y² e^{−y³} = 9x²y² e^{−(x³+y³)}, for x > 0, y > 0.
Then
P(1 < X ≤ 2, 1 < Y ≤ 2) = ∫₁² ∫₁² 9x²y² e^{−(x³+y³)} dx dy
= ∫₁² 3y² e^{−y³} [−e^{−x³}]₁² dy = (e^{−1} − e^{−8}) ∫₁² 3y² e^{−y³} dy
= (e^{−1} − e^{−8}) [−e^{−y³}]₁² = (e^{−1} − e^{−8})(e^{−1} − e^{−8}) = (e^{−1} − e^{−8})²
= e^{−2} − 2e^{−9} + e^{−16}
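The same probability can also be obtained directly from the joint distribution function, since for a rectangle P(a < X ≤ b, c < Y ≤ d) = F(b, d) − F(a, d) − F(b, c) + F(a, c). A quick numeric sketch of this check:

```python
import math

# Joint cdf F(x, y) = (1 - e^{-x^3})(1 - e^{-y^3}) for x, y > 0
def F(x, y):
    if x <= 0 or y <= 0:
        return 0.0
    return (1 - math.exp(-x**3)) * (1 - math.exp(-y**3))

# Rectangle probability via the cdf:
# P(1 < X <= 2, 1 < Y <= 2) = F(2,2) - F(1,2) - F(2,1) + F(1,1)
p = F(2, 2) - F(1, 2) - F(2, 1) + F(1, 1)

closed_form = math.exp(-2) - 2 * math.exp(-9) + math.exp(-16)
print(round(p, 6))
assert abs(p - closed_form) < 1e-12
```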
4.2 Marginal probability distribution
Definition 4.5 If X and Y are discrete random variables and f(x, y) is the value of their joint
probability distribution at (x, y), the function given by g(x) = ∑_y f(x, y)
for each x within the range of X is called the marginal distribution of X.
Correspondingly, the function given by h(y) = ∑_x f(x, y) for each y within the range of Y is called
the marginal distribution of Y.
Definition 4.6 If X and Y are continuous random variables (CRVs) and f(x, y) is the value of their
joint probability density at (x, y), the function given by g(x) = ∫_{−∞}^{∞} f(x, y) dy, for −∞ < x < ∞, is
called the marginal density of X.
Correspondingly, the function given by h(y) = ∫_{−∞}^{∞} f(x, y) dx, for −∞ < y < ∞, is called the marginal
density of Y.
Examples:
1. The joint probability distribution of X and Y is given by:
f(x, y) = (1/30)(x + y), for x = 0, 1, 2, 3; y = 0, 1, 2. Find:

i) The marginal distribution of X.
ii) The marginal distribution of Y.
Solution
i) The marginal distribution of X:
g(x) = ∑_{y=0}^{2} f(x, y), for x = 0, 1, 2, 3
g(0) = f(0,0) + f(0,1) + f(0,2) = 0 + 1/30 + 2/30 = 3/30
g(1) = f(1,0) + f(1,1) + f(1,2) = 1/30 + 2/30 + 3/30 = 6/30
g(2) = f(2,0) + f(2,1) + f(2,2) = 2/30 + 3/30 + 4/30 = 9/30
g(3) = f(3,0) + f(3,1) + f(3,2) = 3/30 + 4/30 + 5/30 = 12/30
ii) The marginal distribution of Y:
h(y) = ∑_{x=0}^{3} f(x, y), for y = 0, 1, 2
h(0) = f(0,0) + f(1,0) + f(2,0) + f(3,0) = 0 + 1/30 + 2/30 + 3/30 = 6/30
h(1) = f(0,1) + f(1,1) + f(2,1) + f(3,1) = 1/30 + 2/30 + 3/30 + 4/30 = 10/30
h(2) = f(0,2) + f(1,2) + f(2,2) + f(3,2) = 2/30 + 3/30 + 4/30 + 5/30 = 14/30
We can also illustrate this in tabular form as follows:

       x        0      1      2      3    Marginal distribution of Y
  y
  0             0    1/30   2/30   3/30     6/30
  1           1/30   2/30   3/30   4/30    10/30
  2           2/30   3/30   4/30   5/30    14/30
  Marginal
  distribution 3/30  6/30   9/30  12/30      1
  of X
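The marginal sums in Definition 4.5 are one line each in code. An illustrative sketch with f(x, y) = (x + y)/30:

```python
from fractions import Fraction

xs, ys = (0, 1, 2, 3), (0, 1, 2)

def f(x, y):
    return Fraction(x + y, 30)

# Marginal of X: g(x) = sum over y; marginal of Y: h(y) = sum over x
g = {x: sum(f(x, y) for y in ys) for x in xs}
h = {y: sum(f(x, y) for x in xs) for y in ys}

print(g)  # values 3/30, 6/30, 9/30, 12/30 (shown in lowest terms)
print(h)  # values 6/30, 10/30, 14/30 (shown in lowest terms)
assert sum(g.values()) == 1 and sum(h.values()) == 1
```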

2) Given the joint probability density f(x, y) = (2/3)(2x + y), for 0 < x < 1, 0 < y < 1, and
f(x, y) = 0 elsewhere, find the marginal densities of X and Y.

Solution
i) The marginal density of X:
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫₀¹ (2/3)(2x + y) dy = (2/3)[2xy + y²/2]₀¹
= (2/3)(2x + 1/2) = (1/3)(4x + 1), for 0 < x < 1
ii) The marginal density of Y:
h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫₀¹ (2/3)(2x + y) dx = (2/3)[x² + yx]₀¹
= (2/3)(1 + y), for 0 < y < 1

4.3 Conditional Distribution


4.3.1 Conditional distribution
We have already seen conditional probability in chapter one (1.3), and the concept of a conditional
distribution follows the same idea. With this in mind, let us consider the following definitions and
examples.

Definition 4.7 If f(x, y) is the value of the joint probability distribution of the discrete random
variables X and Y at (x, y) and h(y) is the value of the marginal distribution of Y at y, the function
given by f(x/y) = f(x, y)/h(y), h(y) ≠ 0, for each x within the range of X, is called the
conditional distribution of X given Y = y.
Correspondingly, if g(x) is the value of the marginal distribution of X at x, the function given by
w(y/x) = f(x, y)/g(x), g(x) ≠ 0, for each y within the range of Y, is called the conditional distribution
of Y given X = x.
Examples:
1) Given the joint probability distribution f(x, y) = xy/36, for x = 1, 2, 3 and y = 1, 2, 3, find the
conditional distribution of X given Y = 1.
Solution:
The marginal distribution of Y at y = 1 is
h(1) = f(1,1) + f(2,1) + f(3,1) = 1/36 + 2/36 + 3/36 = 6/36
(and similarly h(2) = 12/36, h(3) = 18/36). Hence
f(1/1) = f(1,1)/h(1) = (1/36)/(6/36) = 1/6
f(2/1) = f(2,1)/h(1) = (2/36)/(6/36) = 2/6 = 1/3
f(3/1) = f(3,1)/h(1) = (3/36)/(6/36) = 3/6 = 1/2

2) Suppose a textile factory is interested in the relationship between the work experience of its
employees (X) and their grade (work level) in the factory (Y). If these two random variables have the
joint probability distribution f(x, y), for x = 0, 1, 2 and y = 0, 1, 2, then when an individual employee
is selected at random there are 9 possible outcomes.
These are shown in the table below, which depicts the relationship between employees' work
experience and their level of work in the textile factory.

       x       0      1      2     h(y)
  y
  0          0.44   0.22   0.03   0.69
  1          0.22   0.06   0      0.28
  2          0.03   0      0      0.03
  g(x)       0.69   0.28   0.03   1 = F(x, y)

Where X = years of work experience of an employee
Y = the employee's work level (grade)
g(x) = marginal distribution of X
h(y) = marginal distribution of Y
F(x, y) = joint distribution function of X and Y
i) Calculate the marginal distribution of X.
ii) Calculate the marginal distribution of Y.
iii) What is the probability that X = 0 given that Y = 1? (Find the conditional distribution of
X = 0 given Y = 1.)
iv) What is the probability that Y = 1 given that X = 1?
Solution:
i) g(x) = ∑_y f(x, y), for x = 0, 1, 2
g(0) = f(0,0) + f(0,1) + f(0,2) = 0.44 + 0.22 + 0.03 = 0.69
g(1) = f(1,0) + f(1,1) + f(1,2) = 0.22 + 0.06 + 0 = 0.28
g(2) = f(2,0) + f(2,1) + f(2,2) = 0.03 + 0 + 0 = 0.03
∴ g(x) = 0.69, 0.28, 0.03
ii) h(y) = ∑_x f(x, y), for y = 0, 1, 2
h(0) = f(0,0) + f(1,0) + f(2,0) = 0.44 + 0.22 + 0.03 = 0.69
h(1) = f(0,1) + f(1,1) + f(2,1) = 0.22 + 0.06 + 0 = 0.28
h(2) = f(0,2) + f(1,2) + f(2,2) = 0.03 + 0 + 0 = 0.03
∴ h(y) = 0.69, 0.28, 0.03
and the joint distribution function (joint cumulative distribution) of X and Y satisfies F(x, y) = 1
over the whole range (see Definition 4.2).
iii) f(x = 0 / y = 1) = P(X = 0, Y = 1)/P(Y = 1) = f(0,1)/h(1)
= 0.22/(0.22 + 0.06 + 0) = 0.22/0.28 ≈ 0.79
iv) w(y = 1 / x = 1) = P(X = 1, Y = 1)/P(X = 1) = f(1,1)/g(1)
= 0.06/(0.22 + 0.06 + 0) = 0.06/0.28 ≈ 0.21

Hence, by proceeding similarly for all cases, we can obtain the full conditional distribution of X
given Y and the conditional distribution of Y given X.
Definition 4.7 above covers the discrete case; for continuous random variables the analogous
definition is the following.
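A small sketch computing a conditional distribution from the textile-factory table above (the probabilities are hard-coded from the table; illustrative only):

```python
# Joint distribution of work experience X and work level Y from the table above.
f = {
    (0, 0): 0.44, (1, 0): 0.22, (2, 0): 0.03,
    (0, 1): 0.22, (1, 1): 0.06, (2, 1): 0.00,
    (0, 2): 0.03, (1, 2): 0.00, (2, 2): 0.00,
}

def h(y):
    """Marginal distribution of Y."""
    return sum(p for (xi, yi), p in f.items() if yi == y)

def cond_x_given_y(x, y):
    """Conditional distribution f(x / y) = f(x, y) / h(y)."""
    return f[(x, y)] / h(y)

print(round(cond_x_given_y(0, 1), 2))  # f(x=0 / y=1) = 0.22/0.28 ≈ 0.79
```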
Definition 4.8 If f(x, y) is the value of the joint density of the continuous random variables (CRVs)
X and Y at (x, y) and h(y) is the value of the marginal density of Y at y, the function given by
f(x/y) = f(x, y)/h(y), h(y) ≠ 0, for −∞ < x < ∞, is called the conditional density of X given Y = y.
Correspondingly, if g(x) is the value of the marginal density of X at x, the function given by
w(y/x) = f(x, y)/g(x), g(x) ≠ 0, for −∞ < y < ∞, is called the conditional density of Y given X = x.
Example
1) Given the joint probability density
f(x, y) = (2/3)(2x + y), for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 elsewhere,
i. Find the conditional density of X given Y = y and use it to evaluate P[X ≤ 1/2 / Y = 1/2].
ii. Find the conditional density of Y given X = x and use it to evaluate P[Y ≤ 1/2 / X = 1/2].
Solution
i. First find the marginal density of Y at y:
h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫₀¹ (2/3)(2x + y) dx = (2/3)[x² + yx]₀¹ = (2/3)(y + 1)
Then
f(x/y) = f(x, y)/h(y) = [(2/3)(2x + y)] / [(2/3)(y + 1)] = (2x + y)/(y + 1),
for 0 < x < 1, and f(x/y) = 0 elsewhere.
Hence f(x / 1/2) = (2x + 1/2)/(1/2 + 1) = (4x + 1)/3
Therefore, P[X ≤ 1/2 / Y = 1/2] = ∫₀^{1/2} (1/3)(4x + 1) dx = (1/3)[2x² + x]₀^{1/2}
= (1/3)[(2(1/2)² + 1/2) − 0] = (1/3)(1/2 + 1/2) = 1/3
ii. First find the marginal density of X at x:
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫₀¹ (2/3)(2x + y) dy = (2/3)[2xy + y²/2]₀¹ = (2/3)(2x + 1/2)
Then
w(y/x) = f(x, y)/g(x) = [(2/3)(2x + y)] / [(2/3)(2x + 1/2)] = (2x + y)/(2x + 1/2),
for 0 < y < 1, and w(y/x) = 0 elsewhere.
Hence w(y / 1/2) = (2(1/2) + y)/(2(1/2) + 1/2) = (1 + y)/(3/2) = (2/3)(1 + y)
Therefore, P[Y ≤ 1/2 / X = 1/2] = ∫₀^{1/2} (2/3)(y + 1) dy = (2/3)[y²/2 + y]₀^{1/2}
= (2/3)[(1/2)²/2 + 1/2] = (2/3)(5/8) = 5/12

CHAPTER 5: ESTIMATION

5.1 Basic concepts

In statistical inference, the problem of estimation is part of the decision-making process. In the
problem of estimation we must determine the value of a parameter (or parameters) from a possible
continuum of alternatives.
We use the value of a statistic to estimate a population parameter. We call this point estimation, and
we refer to the value of the statistic as a point estimate of the parameter. For example, we may use a
value of x̄ to estimate the mean of a population, an observed sample proportion to estimate the
parameter p of a binomial population, or a value of S² to estimate a population variance. In each
case we use a point estimate of the parameter.
Correspondingly, as point estimators, x̄ may be used as a point estimator of μ and S² may be used
as a point estimator of σ², in which case x̄ and S² are point estimates of these parameters
respectively. In real situations a single point is somewhat problematic to pin down exactly, and
hence we distinguish between point estimates and interval estimates. The only difference from the
"point" case is that here we consider the lower and the upper boundary between which the estimate
can appropriately be found.
5.2 Desirable properties of estimators
Various statistical properties of estimators can be used to decide which estimator is most
appropriate in a given situation: which carries the smallest risk, which gives the most information at
the lowest cost, etc. The particular properties of estimators discussed here are unbiasedness,
minimum variance, efficiency, consistency, sufficiency and robustness.

A) Unbiased estimators
It is known that no perfect estimator exists: in connection with problems of estimation, there is no
estimator that always gives exactly the right value. Thus an estimator should at least be correct on
the average; i.e. its expected value should equal the parameter that it is supposed to estimate, in
which case it is said to be unbiased.

Definition 6.1 A statistic θ̂ is an unbiased estimator of the parameter θ if and only if E(θ̂) = θ.


Example:
1. If X has the binomial distribution with the parameters n and p, show that the sample
proportion x/n is an unbiased estimator of p.
Solution:
Since E(X) = np, it follows that
E(x/n) = (1/n) × E(X) = (1/n) × np = p, and hence x/n is an unbiased estimator of p.
Theorem 6.1 If S² is the variance of a random sample from an infinite population with the finite
variance σ², then E(S²) = σ².
Proof: By definition 5.2,
E(S²) = E[ (1/(n−1)) · ∑_{i=1}^{n} (xᵢ − x̄)² ]
= (1/(n−1)) · E[ ∑_{i=1}^{n} {(xᵢ − μ) − (x̄ − μ)}² ]
= (1/(n−1)) · [ ∑_{i=1}^{n} E{(xᵢ − μ)²} − n · E{(x̄ − μ)²} ]
Then, since E{(xᵢ − μ)²} = σ² and E{(x̄ − μ)²} = σ²/n, it follows that
E(S²) = (1/(n−1)) · [ ∑_{i=1}^{n} σ² − n · (σ²/n) ] = (1/(n−1))(nσ² − σ²) = σ²

B) Efficiency
When we choose among several unbiased estimators of a given parameter, we usually take the one
whose sampling distribution has the smallest variance. It is more reliable and is called the best
unbiased estimator. If θ̂ is an unbiased estimator of θ, then
var(θ̂) ≥ 1 / ( n × E[ (d ln f(x)/dθ)² ] ),
where f(x) is the value of the population density at x and n is the size of the random sample.
This inequality is called the Cramér–Rao inequality. An unbiased estimator whose variance attains
this lower bound is a minimum variance unbiased estimator of θ.
Example:
1) Show that x̄ is a minimum variance unbiased estimator of the mean μ of a normal population.
Solution:
We know f(x) = (1/(σ√(2π))) · e^{−(1/2)((x−μ)/σ)²}, for −∞ < x < ∞. Applying ln to both sides of
the above function we get:
ln f(x) = −ln(σ√(2π)) − (1/2)((x − μ)/σ)²
Then d ln f(x)/dμ = (1/σ)·((x − μ)/σ), and hence
E[(d ln f(x)/dμ)²] = (1/σ²) × E[((x − μ)/σ)²] = (1/σ²) × 1 = 1/σ²
Thus 1/(n × E[(d ln f(x)/dμ)²]) = 1/(n × (1/σ²)) = σ²/n, and since x̄ is unbiased and
var(x̄) = σ²/n according to theorem 5.2, it follows that x̄ is a minimum variance unbiased
estimator of μ.
It should also be understood that unbiased estimators are not unique, and therefore, to choose the
appropriate one among them, we use their variances to compare them. Hence unbiased estimators of
one and the same parameter are usually compared in terms of the size of their variances. If θ̂₁ and
θ̂₂ are two unbiased estimators of the parameter θ of a given population and the variance of θ̂₁ is
less than the variance of θ̂₂, we say that θ̂₁ is relatively more efficient than θ̂₂; we also use the
ratio var(θ̂₁)/var(θ̂₂) as a measure of the efficiency of θ̂₂ relative to θ̂₁.
Example:
1) If X₁, X₂, …, Xₙ constitute a random sample from a uniform population with α = 0, then
((n + 1)/n) × Yₙ (where Yₙ is the largest sample value) is an unbiased estimator of β. Show that
2x̄ is also an unbiased estimator of β.
Solution:
Since the mean of the population is μ = β/2 according to theorem 3.4, it follows from theorem 5.1
that E(x̄) = β/2, and hence that E(2x̄) = 2E(x̄) = β. Thus 2x̄ is an unbiased estimator of β.


C. Consistency
Definition 6.2: The statistic θ̂ is a consistent estimator of the parameter θ if and only if, for each
c > 0,
lim_{n→∞} P(|θ̂ − θ| < c) = 1.
Consistency is an asymptotic property, that is, a limiting property of an estimator. The above
definition says that when n is sufficiently large, we can be practically certain that the error made
with a consistent estimator will be less than any small preassigned positive constant. The kind of
convergence expressed by the limit in definition 6.2 is generally called convergence in probability.
We have shown in the example under definition 6.1 that x/n is a consistent estimator of the binomial
parameter θ, and in theorem 5.2 that x̄ is a consistent estimator of the mean of a population with a
finite variance. In practice, we can often judge whether an estimator is consistent by using the
following sufficient condition: if θ̂ is an unbiased estimator of the parameter θ and var(θ̂) → 0
as n → ∞, then θ̂ is a consistent estimator of θ.


Example: 1) Show that for a random sample from a normal population, the sample variance S² is a
consistent estimator of σ².
Solution:
Since S² is an unbiased estimator of σ² in accordance with theorem 6.1, it remains to be shown that
var(S²) → 0 as n → ∞. For a random sample from a normal population, var(S²) = 2σ⁴/(n − 1).
It follows that lim_{n→∞} 2σ⁴/(n − 1) = 0, and hence S² is a consistent estimator of σ².
D. Sufficiency
An estimator θ̂ is said to be sufficient if it utilizes all the information in a sample relevant to the
estimation of θ, that is, if all the knowledge about θ that can be gained from the individual sample
values and their order can just as well be gained from the value of θ̂ alone.
The sufficiency property of an estimator θ̂ of the parameter θ can be described by the conditional
probability distribution (density) of the sample values given the value of θ̂, which is expressed as:
f(x₁, x₂, …, xₙ / θ̂) = f(x₁, x₂, …, xₙ) / g(θ̂).
If the above expression does not depend on θ, then the particular sample values X₁, X₂, …, Xₙ
yielding a given θ̂ will be just as likely for any value of θ, and the knowledge of these sample
values will be of no further help in the estimation of θ. That is, the statistic θ̂ is a sufficient
estimator of the parameter θ if and only if, for each value of θ̂, the conditional probability
distribution or density of the random sample X₁, X₂, …, Xₙ given θ̂ is independent of θ.
Accordingly, the statistic θ̂ is a sufficient estimator of the parameter θ if and only if the joint
probability distribution or density of the random sample can be factored so that
f(x₁, x₂, …, xₙ; θ) = g(θ̂, θ) × h(x₁, x₂, …, xₙ),
where g(θ̂, θ) depends on the sample only through θ̂ and θ, and h(x₁, x₂, …, xₙ) does not depend
on θ.
E) Robustness
Robustness indicates the extent to which estimation procedures are adversely affected by violations
of underlying assumptions. In other words, an estimator is said to be robust if its sampling
distribution is not seriously affected by violations of assumptions. Such violations are often due to
outliers caused by outright errors made, say, in reading instruments or recording the data, or by
mistakes in experimental procedures.
5.3 Methods of estimation
There can be many different estimators of one and the same parameter of a population. Therefore it
would seem desirable to have some general methods that yield estimators with as many desirable
properties as possible. Some possible methods of estimation are: the method of moments, the least
squares method, and the method of maximum likelihood estimation.
5.3.1 Method of moments
The method of moments consists of equating the first few moments of a population to the
corresponding moments of a sample, thus getting as many equations as are needed to solve for the
unknown parameters of the population.
Definition 6.3 The kth sample moment of a set of observations X₁, X₂, …, Xₙ is the mean of their
kth powers, and is denoted by m′ₖ; symbolically, m′ₖ = (∑_{i=1}^{n} xᵢᵏ)/n. Thus, if a population has
r parameters, the method of moments consists of solving the system of equations m′ₖ = μ′ₖ,
k = 1, 2, …, r (where μ′ₖ is the kth moment of the population), for the r parameters.

Example: 1) Given a random sample of size n from a uniform population with α = 1, use the method
of moments to obtain a formula for estimating the parameter β.
Solution:
The equation that we shall have to solve is m′₁ = μ′₁, where m′₁ = x̄ and μ′₁ = (α + β)/2 = (1 + β)/2
according to theorem 3.1. Thus
x̄ = (1 + β)/2 ⇒ we can write the estimate of β as β̂ = 2x̄ − 1
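A simulation sketch of this method-of-moments estimate (the true β and the sample size are arbitrary choices for illustration):

```python
import random

# Method-of-moments sketch for a uniform population on (1, beta):
# solving x̄ = (1 + beta)/2 gives beta_hat = 2*x̄ - 1.
random.seed(7)
beta_true = 9.0
sample = [random.uniform(1.0, beta_true) for _ in range(100_000)]

xbar = sum(sample) / len(sample)
beta_hat = 2 * xbar - 1

print(round(beta_hat, 2))  # should be near beta_true = 9
```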

5.3.2 Least squares method

A major objective of many statistical and/or socio-economic investigations is to establish
relationships that make it possible to predict one or more variables in terms of others. If we are
given the joint distribution of two random variables X and Y, and X is known to take on the value x,
the basic problem of bivariate regression is that of determining the conditional mean μ_{Y/x}, that is,
the "average" value of Y for the given value of X. The term "regression", as used in a basic statistics
course, dates back to Francis Galton, who employed it to indicate certain relationships in the theory
of heredity.
In problems involving more than two random variables, that is, in multiple regression, we are
concerned with quantities such as μ_{Z/x,y}, the mean of Z for given values of X and Y, or
μ_{X₄/x₁,x₂,x₃}, the mean of X₄ for given values of X₁, X₂ and X₃, and so on.
If f(x, y) is the value of the joint density of the two random variables X and Y at (x, y), the problem
of bivariate regression is simply that of determining the conditional density of Y given X = x and
then evaluating the integral
μ_{Y/x} = E(Y/x) = ∫_{−∞}^{∞} y·w(y/x) dy.
The resulting equation is called the regression equation of Y on X. Alternatively, we might be
interested in the regression equation
μ_{X/y} = E(X/y) = ∫_{−∞}^{∞} x·f(x/y) dx.
In the discrete case (when dealing with probability distributions), the integrals in the two regression
equations given above are replaced by sums, i.e.
μ_{Y/x} = E(Y/x) = ∑_y y·w(y/x)
μ_{X/y} = E(X/y) = ∑_x x·f(x/y)

Having defined the regression equation, it is possible to speak of a linear regression equation,
which is of the form μ_{Y/x} = α + βx, where α and β are constants called the regression
coefficients. The regression coefficients can be expressed in terms of some of the lower moments of
the joint distribution of X and Y, that is, in terms of E(X) = μ_x, E(Y) = μ_y, var(X) = σ²_x,
var(Y) = σ²_y, and cov(X, Y) = σ_xy. Using also the correlation coefficient ρ = σ_xy/(σ_x σ_y),
where ρ lies in the range −1 ≤ ρ ≤ 1, we can prove the following results:

Theorem 6.2 If the regression of Y on X is linear, then μ_{Y/x} = μ_y + ρ(σ_y/σ_x)(x − μ_x),
and if the regression of X on Y is linear, then μ_{X/y} = μ_x + ρ(σ_x/σ_y)(y − μ_y).

Proof: Since μ_{Y/x} = α + βx, it follows that ∫ y·w(y/x) dy = α + βx.
Multiplying both sides of this equation by g(x), the corresponding value of the marginal density of X,
and integrating with respect to x, we obtain
∬ y·w(y/x) g(x) dy dx = α ∫ g(x) dx + β ∫ x·g(x) dx, or μ_y = α + βμ_x,
since w(y/x)·g(x) = f(x, y). If we had multiplied the equation for μ_{Y/x} on both sides by x·g(x)
before integrating with respect to x, we would have obtained
∬ xy·f(x, y) dy dx = α ∫ x·g(x) dx + β ∫ x²·g(x) dx, or E(XY) = αμ_x + βE(X²).
Solving μ_y = α + βμ_x and E(XY) = αμ_x + βE(X²) for α and β, and making use of the fact that
E(XY) = σ_xy + μ_x μ_y and E(X²) = σ²_x + μ²_x, we find that
β = σ_xy/σ²_x = ρ(σ_y/σ_x), and α = μ_y − (σ_xy/σ²_x)·μ_x = μ_y − ρ(σ_y/σ_x)·μ_x.
This enables us to write the linear regression equation of Y on X as
μ_{Y/x} = μ_y + ρ(σ_y/σ_x)(x − μ_x).
When the regression of X on Y is linear, similar steps lead to the equation
μ_{X/y} = μ_x + ρ(σ_x/σ_y)(y − μ_y).

The method of least squares

If a set of paired data {(xᵢ, yᵢ); i = 1, 2, …, n} gives the indication that the regression is linear, but
we do not know the joint distribution of the random variables under consideration and nevertheless
want to estimate the regression coefficients α and β, this can be handled by the method of least
squares. It is a method of curve fitting. For instance, given the paired data, plotting them on the
number plane may give the impression that a straight line provides a reasonably good fit. Even
though some points do not fall on a straight line, the overall pattern suggests that the average
position of the points may well be related to an equation of the form μ_{Y/x} = α + βx.
Once we have decided that the regression is approximately linear, the problem becomes how to
estimate the coefficients α and β from the sample data. Let the deviation from a point to the line be
eᵢ, which is called the error term (residual). The least squares estimates of the regression
coefficients are the values α̂ and β̂ for which the quantity
Q = ∑_{i=1}^{n} eᵢ² = ∑_{i=1}^{n} [yᵢ − (α̂ + β̂xᵢ)]²
is a minimum.
Differentiating partially with respect to α̂ and β̂ and equating these partial derivatives to zero, we
obtain:
∂Q/∂α̂ = ∑_{i=1}^{n} (−2)[yᵢ − (α̂ + β̂xᵢ)] = 0
∂Q/∂β̂ = ∑_{i=1}^{n} (−2)xᵢ[yᵢ − (α̂ + β̂xᵢ)] = 0
And then the system of normal equations will be:
∑_{i=1}^{n} yᵢ = α̂n + β̂·∑_{i=1}^{n} xᵢ
∑_{i=1}^{n} xᵢyᵢ = α̂·∑_{i=1}^{n} xᵢ + β̂·∑_{i=1}^{n} xᵢ²
Solving the above system, we find the least squares estimates β̂ and α̂, i.e.

β̂ = [n(∑xᵢyᵢ) − (∑xᵢ)(∑yᵢ)] / [n(∑xᵢ²) − (∑xᵢ)²], and α̂ = (∑yᵢ − β̂·∑xᵢ)/n, or α̂ = ȳ − β̂x̄.
Again, if
Sxx = ∑(xᵢ − x̄)² = ∑xᵢ² − (1/n)(∑xᵢ)²,
Syy = ∑(yᵢ − ȳ)² = ∑yᵢ² − (1/n)(∑yᵢ)², and
Sxy = ∑(xᵢ − x̄)(yᵢ − ȳ) = ∑xᵢyᵢ − (1/n)(∑xᵢ)(∑yᵢ),
then β̂ = Sxy/Sxx and α̂ = ȳ − β̂x̄, and the least squares line is ŷ = α̂ + β̂x.

Example 1. Various doses of a poisonous substance were given to groups of 10 mice and the
following results were observed:

Dose (mg), xᵢ:           4    9   10   14    4    7   12   22    1   17
Number of deaths, yᵢ:   31   58   65   73   37   44   60   91   21   84

a) Find the equation of the least squares line fit to these data.
b) Estimate the number of deaths in a group of 10 mice that receive an 8 mg dose of this poison.
Solution:
a) Given n = 10, we find
∑xᵢ = 100, ∑xᵢ² = 1376, ∑yᵢ = 564, and ∑xᵢyᵢ = 6945.
Thus Sxx = 1376 − (1/10)(100)² = 376, and Sxy = 6945 − (1/10)(100)(564) = 1305.
Hence β̂ = Sxy/Sxx = 1305/376 = 3.471, and α̂ = ȳ − β̂x̄ = 564/10 − 3.471 × (100/10) = 21.69,
and the equation of the least squares line is ŷ = 21.69 + 3.471x.

b) Substituting x = 8 into the equation obtained in part (a), we get ŷ = 21.69 + 3.471 × 8 = 49.458,
or ŷ ≈ 49 rounded to the nearest unit. Therefore, an 8 mg dose of the poison is estimated to result
in about 49 deaths.
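The computations in part (a) are easy to reproduce; an illustrative sketch:

```python
# Least squares fit for the mice example: beta_hat = Sxy/Sxx, alpha_hat = ybar - beta_hat*xbar
x = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
y = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

beta_hat = Sxy / Sxx                 # 1305/376 ≈ 3.471
alpha_hat = ybar - beta_hat * xbar   # ≈ 21.69

print(round(alpha_hat, 2), round(beta_hat, 3))
print(round(alpha_hat + beta_hat * 8, 3))  # prediction at an 8 mg dose ≈ 49.458
```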
5.3.3 Maximum likelihood estimation
Another method of estimation is the method of maximum likelihood. The advantage of this method
is that it yields sufficient and asymptotically minimum variance unbiased estimators.
To motivate it, suppose that four newsletters arrive to somebody every morning, but unfortunately
one of them is lost. If, among the remaining three newsletters, two contain a job-vacancy notice and
one does not, what might be a good estimate of K, the total number of job-vacancy notices among
the four newsletters sent? Clearly, K must be two or three, and if we assume that each newsletter had
the same chance of being lost, we find that the probability of the observed data (two of the three
remaining newsletters containing a job-vacancy notice) is:
C(2,2)·C(2,1)/C(4,3) = 1/2, for K = 2, and C(3,2)·C(1,1)/C(4,3) = 3/4, for K = 3.
Therefore, if we choose as our estimate of K the value that maximizes the probability of getting the
observed data, we obtain K = 3. We call this estimate a maximum likelihood estimate, and the
method by which it is obtained is called the method of maximum likelihood.
Thus its essential feature is that we look at the sample values and then choose, as our estimates of
the unknown parameters, the values for which the probability or probability density of getting the
sample values is a maximum.
In the discrete case, if the observed sample values are x₁, x₂, …, xₙ, the probability of getting them
is P(X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ) = f(x₁, x₂, …, xₙ; θ), which is just the value of the joint
probability distribution of the random variables X₁, X₂, …, Xₙ at X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ.
Since the sample values have been observed and are therefore fixed numbers, we regard
f(x₁, x₂, …, xₙ; θ) as a value of a function of θ, and we refer to this function as the likelihood
function.
The case where the random sample comes from a continuous population is analogous. In general, if
x₁, x₂, …, xₙ are the values of a random sample from a population with the parameter θ, the
likelihood function of the sample is given by L(θ) = f(x₁, x₂, …, xₙ; θ) for values of θ within a
given domain. Here f(x₁, x₂, …, xₙ; θ) is the value of the joint probability distribution or the joint
probability density of the random variables X₁, X₂, …, Xₙ at X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ.
Example: 1) If x is the number of "successes" in n trials, find the maximum likelihood estimate of
the parameter θ of the corresponding binomial distribution.
Solution:
We must find the value of θ that maximizes
L(θ) = C(n, x)·θˣ(1 − θ)ⁿ⁻ˣ
The value of θ that maximizes L(θ) will also maximize
ln L(θ) = ln C(n, x) + x·ln θ + (n − x)·ln(1 − θ).
We get:
∂(ln L(θ))/∂θ = x/θ − (n − x)/(1 − θ)
Equating the above result to zero and solving for θ, we find that the likelihood function has a
maximum at θ = x/n. Hence θ̂ = x/n is the maximum likelihood estimate of the binomial parameter
θ, and x/n is the corresponding maximum likelihood estimator.
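A brute-force sketch confirming that the binomial log-likelihood peaks at θ = x/n (the values of n and x are arbitrary choices):

```python
import math

# Grid check that the binomial log-likelihood is maximized at theta = x/n.
n, x = 20, 7

def log_L(theta):
    return (math.log(math.comb(n, x))
            + x * math.log(theta) + (n - x) * math.log(1 - theta))

grid = [i / 1000 for i in range(1, 1000)]  # theta in (0, 1)
theta_best = max(grid, key=log_L)

print(theta_best)  # 0.35 = x/n
```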
5.4 Estimation applications
Introduction
Previously we considered a point estimate θ̂ of θ together with the size of the sample and the value
of var(θ̂), or with some other information about the sampling distribution of θ̂. This enables us to
appraise the possible size of the error. Alternatively, we might use interval estimation. An interval
estimate of θ is an interval of the form θ̂₁ < θ < θ̂₂, where θ̂₁ and θ̂₂ are values of appropriate
random variables. By "appropriate" we mean that P(θ̂₁ < θ < θ̂₂) = 1 − α for some specified
probability 1 − α. For a specified value of 1 − α, we refer to θ̂₁ < θ < θ̂₂ as a (1 − α)100%
confidence interval for θ. Also, 1 − α is called the degree of confidence, and the endpoints of the
interval, θ̂₁ and θ̂₂, are called the lower and upper confidence limits. For instance, when α = 0.01,
the degree of confidence is 0.99 and we get a 99% confidence interval. It should be understood that,
like point estimates, interval estimates of a given parameter are not unique.

5.4.1 Estimation of means

To illustrate how the possible size of errors can be appraised in point estimation, suppose that the
mean of a random sample is to be used to estimate the mean of a normal population with the known
variance σ². The sampling distribution of x̄ for random samples of size n from a normal population
with mean μ and variance σ² is a normal distribution with μ_x̄ = μ and σ²_x̄ = σ²/n.
Thus we can write P(|Z| < z_{α/2}) = 1 − α, where Z = (x̄ − μ)/(σ/√n) and α is the level of
significance, split between the left and right tails.
In words, if x̄, the mean of a random sample of size n from a normal population with known
variance σ², is to be used as an estimator of the mean of the population, the probability is 1 − α
that the error will be less than z_{α/2} × σ/√n. Then P(|x̄ − μ| < z_{α/2} × σ/√n) = 1 − α. It follows
that:
P(x̄ − z_{α/2} × σ/√n < μ < x̄ + z_{α/2} × σ/√n) = 1 − α
In other words, if x̄ is the value of the mean of a random sample of size n from a normal population
with the known variance σ², then
x̄ − z_{α/2} × σ/√n < μ < x̄ + z_{α/2} × σ/√n
is a (1 − α)100% confidence interval for the mean of the population. When σ² is unknown but
n ≥ 30 (large), it is also possible to use the above formula, with the sample standard deviation in
place of σ, for the estimation of the population mean.

Example 1: If a random sample of size n = 40 from a normal population with variance σ² = 225 has the mean x̄ = 64.3, construct a 0.95 (95%) confidence interval for the population mean μ.
Solution:
Given n = 40, x̄ = 64.3, σ² = 225, and 1 − α = 0.95
⇒ σ = √225 = 15
α = 1 − 0.95 = 0.05, so α/2 = 0.025
Since n ≥ 30, we use the Z-table, from which Zα/2 = Z0.025 = 1.96. Applying the interval formula

x̄ − Zα/2 × σ/√n < μ < x̄ + Zα/2 × σ/√n

⇒ 64.3 − 1.96 × 15/√40 < μ < 64.3 + 1.96 × 15/√40
⇒ 59.7 < μ < 68.9
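The interval above can be reproduced with a short Python sketch; this is a minimal check using only the standard library, with the table value Z0.025 = 1.96 hard-coded (reading a Z-table is assumed):

```python
import math

# 95% CI for a population mean with known variance (Example 1)
n, xbar, sigma = 40, 64.3, 15.0
z = 1.96                              # Z at alpha/2 = 0.025, from the Z-table

margin = z * sigma / math.sqrt(n)     # maximum error of estimate
lower, upper = xbar - margin, xbar + margin
print(f"{lower:.1f} < mu < {upper:.1f}")
```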
When we have a random sample from a normal population with n < 30 and σ unknown, we use a random variable having the t-distribution with n − 1 degrees of freedom (discussed in Section 5.3):

T = (x̄ − μ)/(S/√n)

From P(−tα/2,n−1 < T < tα/2,n−1) = 1 − α we get the following confidence interval for μ.

If x̄ and S are the values of the mean and the standard deviation of a random sample of size n from a normal population, then

x̄ − tα/2,n−1 × S/√n < μ < x̄ + tα/2,n−1 × S/√n

is a (1 − α)100% confidence interval for the mean of the population.
Example 2: A fertilizer manufacturer wants to estimate the average number of tons of fertilizer sold per month. One year was monitored, and average monthly sales of 10 tons were recorded. If the sample variance is 4, compute the confidence limits at the 95% level.

Solution:

Given: x̄ = 10 tons, S² = 4 ⇒ S = 2 tons, n = one year → 12 months, confidence level 1 − α = 0.95, so α/2 = 0.025.
Since n < 30, we use the t-distribution, and tα/2,n−1 = t0.025,(12−1) = t0.025,11 = 2.201.

Then the interval estimate is:

x̄ − tα/2,n−1 × S/√n < μ < x̄ + tα/2,n−1 × S/√n
⇒ 10 − 2.201 × 2/√12 < μ < 10 + 2.201 × 2/√12
⇒ 8.73 < μ < 11.27
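The same computation for the t-based interval, again as a sketch with the table value t0.025,11 = 2.201 hard-coded:

```python
import math

# 95% CI for the mean, small sample with unknown variance (fertilizer example)
n, xbar, s = 12, 10.0, 2.0
t = 2.201                             # t at alpha/2 = 0.025 with 11 d.f., from the t-table

margin = t * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"{lower:.2f} < mu < {upper:.2f}")
```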
5.4.2 Estimation of differences between means
If we have two populations with specified means and variances, and n₁ and n₂ are two large samples (each of size at least 30) drawn from the first and the second populations respectively, the sampling distribution of x̄₁ − x̄₂ is normal with mean μ₁ − μ₂ and standard deviation √(σ₁²/n₁ + σ₂²/n₂).

The statistic

Z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)

has the standard normal distribution. If we substitute the above expression for Z into P(−Zα/2 < Z < Zα/2) = 1 − α, we get

P((x̄₁ − x̄₂) − Zα/2 × √(σ₁²/n₁ + σ₂²/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + Zα/2 × √(σ₁²/n₁ + σ₂²/n₂)) = 1 − α

In other words, if x̄₁ and x̄₂ are the values of the means of independent random samples of sizes n₁ and n₂ from normal populations with known variances σ₁² and σ₂², then

(x̄₁ − x̄₂) − Zα/2 × √(σ₁²/n₁ + σ₂²/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + Zα/2 × √(σ₁²/n₁ + σ₂²/n₂)

is a (1 − α)100% confidence interval for the difference between the two population means. On the other hand, when the sample sizes n₁ and n₂ are small and σ₁ and σ₂ are unknown, an interval for the difference between the means of the two normal populations can be constructed by assuming that σ₁ = σ₂.

If σ₁ = σ₂ = σ, then

Z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / (σ × √(1/n₁ + 1/n₂))

is a random variable having the standard normal distribution, and σ² can be estimated by pooling the squared deviations from the means of the two samples:

Sp² = [(n₁ − 1)S₁² + (n₂ − 1)S₂²]/(n₁ + n₂ − 2)

is, indeed, an unbiased estimator of σ². Recalling the chi-square distribution, the independent random variables (n₁ − 1)S₁²/σ² and (n₂ − 1)S₂²/σ² have chi-square distributions with n₁ − 1 and n₂ − 1 degrees of freedom, and their sum

Y = (n₁ − 1)S₁²/σ² + (n₂ − 1)S₂²/σ² = (n₁ + n₂ − 2)Sp²/σ²

has a chi-square distribution with n₁ + n₂ − 2 degrees of freedom. Since the random variables Z and Y are independent,

T = Z / √(Y/(n₁ + n₂ − 2)) = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / (Sp × √(1/n₁ + 1/n₂))

has a t-distribution with n₁ + n₂ − 2 degrees of freedom. Substituting this expression for T into P(−tα/2,n₁+n₂−2 < T < tα/2,n₁+n₂−2) = 1 − α and solving for μ₁ − μ₂: if x̄₁, x̄₂, S₁ and S₂ are the values of the means and the standard deviations of independent random samples of sizes n₁ and n₂ from normal populations with equal variances, then

(x̄₁ − x̄₂) − tα/2,n₁+n₂−2 × Sp × √(1/n₁ + 1/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + tα/2,n₁+n₂−2 × Sp × √(1/n₁ + 1/n₂)

is a (1 − α)100% confidence interval for the difference between the two population means.
Example 1: A study has been made to compare the nicotine contents of two brands of tea leaves. Ten packets of "Wushwush" tea leaves had an average nicotine content of 3 mg with a variance of 0.25, while eight packets of "Ambessa" tea leaves had an average nicotine content of 2.8 mg with a variance of 0.49. Assuming that the two sets of data are independent random samples from normal populations with equal variance, construct a 0.95 confidence interval for the difference between the mean nicotine contents of the two types of tea leaves.

Solution:

Given: x̄₁ = 3 mg, x̄₂ = 2.8 mg, and n₁ = 10, n₂ = 8, S₁² = 0.25 ⇒ S₁ = 0.5, S₂² = 0.49 ⇒ S₂ = 0.7,

then Sp = √[((n₁ − 1)S₁² + (n₂ − 1)S₂²)/(n₁ + n₂ − 2)] = √[((10 − 1)(0.25) + (8 − 1)(0.49))/(10 + 8 − 2)] = √[(9 × 0.25 + 7 × 0.49)/16] = 0.596,

and since n₁ < 30 and n₂ < 30 we use the t-distribution, with tα/2,n₁+n₂−2 = t0.025,16 = 2.120 (from the t-table). Then the 95% confidence interval is:

(x̄₁ − x̄₂) − t0.025,16 × Sp × √(1/n₁ + 1/n₂) < μ₁ − μ₂ < (x̄₁ − x̄₂) + t0.025,16 × Sp × √(1/n₁ + 1/n₂)

= (3 − 2.8) − 2.120 × 0.596 × √(1/10 + 1/8) < μ₁ − μ₂ < (3 − 2.8) + 2.120 × 0.596 × √(1/10 + 1/8)

= 0.2 − 0.599 < μ₁ − μ₂ < 0.2 + 0.599

⇒ −0.4 < μ₁ − μ₂ < 0.8
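A sketch of the pooled-variance interval for the tea example, with t0.025,16 = 2.120 taken from the t-table:

```python
import math

# 95% CI for mu1 - mu2, small samples, equal variances (tea example)
n1, x1, s1_sq = 10, 3.0, 0.25
n2, x2, s2_sq = 8, 2.8, 0.49
t = 2.120                             # t at alpha/2 = 0.025 with n1 + n2 - 2 = 16 d.f.

# pooled standard deviation
sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
margin = t * sp * math.sqrt(1 / n1 + 1 / n2)
lower, upper = (x1 - x2) - margin, (x1 - x2) + margin
print(f"{lower:.1f} < mu1 - mu2 < {upper:.1f}")
```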

5.4.3 Estimation of proportions
In the analysis of qualitative data, where we estimate proportions, probabilities, percentages or rates, it is reasonable to assume that we are sampling a binomial population, so that our task is to estimate the binomial parameter θ. For large n the binomial distribution can be approximated by a normal distribution; i.e.,

Z = (x − nθ)/√(nθ(1 − θ))

has approximately the standard normal distribution. Substituting this expression for Z into P(−Zα/2 < Z < Zα/2) = 1 − α and solving for θ gives the approximate confidence interval

P(θ̂ − Zα/2 × √(θ̂(1 − θ̂)/n) < θ < θ̂ + Zα/2 × √(θ̂(1 − θ̂)/n)) = 1 − α, where θ̂ = x/n

Example: In a random sample, 200 of 500 persons had a garden. Construct a 95% confidence interval for the true proportion of persons who have a garden.
Solution:
Given n = 500, θ̂ = 200/500 = 2/5 = 0.4, and Zα/2 = Z0.025 = 1.96; then the 95% confidence interval for the proportion is
0.4 − 1.96 × √((2/5)(3/5)/500) < θ < 0.4 + 1.96 × √((2/5)(3/5)/500)
= 0.4 − 1.96 × 0.0219 < θ < 0.4 + 1.96 × 0.0219
⇒ 0.357 < θ < 0.443
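The garden example as a short script, using the large-sample normal approximation described in the text:

```python
import math

# 95% CI for a proportion (garden example)
n, x = 500, 200
z = 1.96                              # Z at alpha/2 = 0.025

theta_hat = x / n                     # sample proportion = 0.4
margin = z * math.sqrt(theta_hat * (1 - theta_hat) / n)
lower, upper = theta_hat - margin, theta_hat + margin
print(f"{lower:.3f} < theta < {upper:.3f}")
```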

5.4.4 Estimation of differences between proportions

In many problems we must estimate the difference between the binomial parameters θ₁ and θ₂ on the basis of independent random samples of sizes n₁ and n₂ from two binomial populations, for instance when we want to estimate the difference between the proportions of two categories. If the respective numbers of successes are x₁ and x₂, the corresponding sample proportions are denoted by θ̂₁ = x₁/n₁ and θ̂₂ = x₂/n₂. The sampling distribution of θ̂₁ − θ̂₂ has

E(θ̂₁ − θ̂₂) = θ₁ − θ₂ and var(θ̂₁ − θ̂₂) = θ₁(1 − θ₁)/n₁ + θ₂(1 − θ₂)/n₂,

and for large samples X₁, X₂ and their difference are approximately normally distributed, so that

Z = [(θ̂₁ − θ̂₂) − (θ₁ − θ₂)] / √(θ₁(1 − θ₁)/n₁ + θ₂(1 − θ₂)/n₂)

is a random variable having approximately the standard normal distribution. Substituting this expression for Z into P(−Zα/2 < Z < Zα/2) = 1 − α, an approximate (1 − α)100% confidence interval for θ₁ − θ₂ is

(θ̂₁ − θ̂₂) − Zα/2 × √(θ̂₁(1 − θ̂₁)/n₁ + θ̂₂(1 − θ̂₂)/n₂) < θ₁ − θ₂ < (θ̂₁ − θ̂₂) + Zα/2 × √(θ̂₁(1 − θ̂₁)/n₁ + θ̂₂(1 − θ̂₂)/n₂)

Example: In a random sample of visitors to a famous tourist attraction, 132 of 200 men and 90 of 150 women bought souvenirs. Construct a 99% confidence interval for the difference between the true proportions of men and women who buy souvenirs at this tourist attraction.

Solution:
θ̂₁ = 132/200 = 0.66, θ̂₂ = 90/150 = 0.60, and Zα/2 = Z0.005 = 2.575; then the confidence interval is

(0.66 − 0.60) − 2.575 × √(0.66(1 − 0.66)/200 + 0.60(1 − 0.60)/150) < θ₁ − θ₂ < (0.66 − 0.60) + 2.575 × √(0.66(1 − 0.66)/200 + 0.60(1 − 0.60)/150)

⇒ −0.074 < θ₁ − θ₂ < 0.194
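The souvenir example, sketched in Python with Z0.005 = 2.575 from the Z-table:

```python
import math

# 99% CI for the difference of two proportions (souvenir example)
n1, x1 = 200, 132
n2, x2 = 150, 90
z = 2.575                             # Z at alpha/2 = 0.005

p1, p2 = x1 / n1, x2 / n2             # 0.66 and 0.60
margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lower, upper = (p1 - p2) - margin, (p1 - p2) + margin
print(f"{lower:.3f} < theta1 - theta2 < {upper:.3f}")
```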


5.4.5 Estimation of variances

If S² is the variance of a random sample of size n from a normal population, then

(n − 1)S²/σ²

is a random variable having a chi-square distribution with n − 1 degrees of freedom. Thus,

P(χ²1−α/2,n−1 < (n − 1)S²/σ² < χ²α/2,n−1) = 1 − α

P((n − 1)S²/χ²α/2,n−1 < σ² < (n − 1)S²/χ²1−α/2,n−1) = 1 − α,

where χ²α/2,n−1 and χ²1−α/2,n−1 are the chi-square values cutting off areas of α/2 in the right and left tails of the distribution, with n − 1 degrees of freedom.

Therefore, a (1 − α)100% confidence interval for σ² is

(n − 1)S²/χ²α/2,n−1 < σ² < (n − 1)S²/χ²1−α/2,n−1

Example 1: The lengths of the skulls of 10 fossil skeletons of an extinct species of bird have a mean of 5.68 cm and a standard deviation of 0.29 cm. Assuming that such measurements are normally distributed, construct a 95% confidence interval for the true variance of the skull length of the given species of bird.
Solution:
Given: n = 10, S = 0.29 ⇒ S² = 0.0841
χ²0.025,9 = 19.023; χ²0.975,9 = 2.700
⇒ 9(0.29)²/19.023 < σ² < 9(0.29)²/2.700
⇒ 0.04 < σ² < 0.28, i.e. 0.20 < σ < 0.53
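A sketch of the variance interval; the chi-square table values for 9 degrees of freedom (19.023 for the upper α/2 point, and 2.700 assumed here as the lower point χ²0.975,9) are hard-coded:

```python
# 95% CI for a population variance (skull example)
n, s = 10, 0.29
chi2_upper = 19.023                   # chi-square with right-tail area 0.025, 9 d.f.
chi2_lower = 2.700                    # chi-square with right-tail area 0.975, 9 d.f.

lower = (n - 1) * s**2 / chi2_upper
upper = (n - 1) * s**2 / chi2_lower
print(f"{lower:.2f} < sigma^2 < {upper:.2f}")
```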
5.4.6 Estimation of the ratio of two variances
If two independent random samples of sizes n₁ and n₂ with corresponding variances S₁² and S₂² are drawn from two normal populations (or the same normal population), then (n₁ − 1)S₁²/σ₁² has a chi-square distribution with n₁ − 1 degrees of freedom, and (n₂ − 1)S₂²/σ₂² has a chi-square distribution with n₂ − 1 degrees of freedom. The ratio

F = (σ₂² S₁²)/(σ₁² S₂²)

is a random variable having an F-distribution with n₁ − 1 and n₂ − 1 degrees of freedom. Thus we can write

P(f1−α/2,n₁−1,n₂−1 < (σ₂² S₁²)/(σ₁² S₂²) < fα/2,n₁−1,n₂−1) = 1 − α,

where fα/2,n₁−1,n₂−1 and f1−α/2,n₁−1,n₂−1 are values of the F-distribution as discussed in Section 5.4, and

f1−α/2,n₁−1,n₂−1 = 1/fα/2,n₂−1,n₁−1

Therefore,

(S₁²/S₂²) × (1/fα/2,n₁−1,n₂−1) < σ₁²/σ₂² < (S₁²/S₂²) × fα/2,n₂−1,n₁−1

is a (1 − α)100% confidence interval for σ₁²/σ₂².
Example 1: With reference to Example 1 of Section 5.4.2, find a 98% confidence interval for σ₁²/σ₂².
Solution:
Given n₁ = 10, n₂ = 8, S₁² = 0.25, S₂² = 0.49, and f0.01,9,7 = 6.72 and f0.01,7,9 = 5.61 from the F-table. Then
(0.25/0.49) × (1/6.72) < σ₁²/σ₂² < (0.25/0.49) × 5.61
⇒ 0.076 < σ₁²/σ₂² < 2.862
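The ratio-of-variances interval as a script, with the F-table values f0.01,9,7 = 6.72 and f0.01,7,9 = 5.61 hard-coded:

```python
# 98% CI for sigma1^2 / sigma2^2 (tea example data)
s1_sq, s2_sq = 0.25, 0.49
f_upper = 6.72                        # F with right-tail area 0.01, (9, 7) d.f.
f_swap = 5.61                         # F with right-tail area 0.01, (7, 9) d.f.

ratio = s1_sq / s2_sq
lower = ratio / f_upper
upper = ratio * f_swap
print(f"{lower:.3f} < sigma1^2/sigma2^2 < {upper:.3f}")
```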

CHAPTER 6: HYPOTHESIS TESTING

6.1 Basic principles in the design and evaluation of hypotheses

A hypothesis is an idea put forward by the researcher as a possible answer to a given issue, or an assertion regarding the value of a population parameter based on sample information. Statistical hypothesis testing is used in many fields of study, and in economics in particular. We have two types of statistical hypothesis:
Null hypothesis (H0):
It is an assertion that a population parameter assumes a fixed (specified) value. Usually it is a statement of no effect. It always includes the equality sign and is denoted by H0.
Alternative hypothesis (H1):
It is the alternative available when the null hypothesis has to be rejected; it is the complement of the null hypothesis. It always includes an inequality sign and is denoted by H1.
Types of tests:
a. One-tailed test:
A test of hypothesis in which the alternative hypothesis (H1) is stated in one direction only, with a strict inequality.
Example: if H0: μ = 0
H1: μ > 0 or H1: μ < 0
b. Two-tailed test:
A test in which the alternative hypothesis is given by the symbol ≠.
Example: if H0: μ = 0
H1: μ ≠ 0; since H1 is a compound statement (covering both directions), this type of test is called a two-tailed test.

Hypothesis testing
It is a procedure for examining the validity of a statistical hypothesis.
Critical value:
It is the boundary (demarcation) point between the acceptance and rejection regions.
Test statistic:
It is a value computed from the sample that is used to decide whether the null hypothesis should be accepted or rejected.
Types of errors:
a) Type I error: rejecting H0 when it is actually true. The probability of rejecting H0 while it is actually true equals α, and α is called the level of significance.
b) Type II error: not rejecting H0 while it is false. The probability of a Type II error equals β.

Level of significance (α)
It is the probability of a Type I error; its complement 1 − α reflects the level of confidence in the researcher's conclusion. It is usually understood as the allowable error.
Hypothesis testing procedure (steps)
In hypothesis testing one follows these general steps:
Step one: state the null and alternative hypotheses.
Step two: choose the level of significance, i.e., the probability of making a Type I error (α).
Step three: determine the test statistic.
Step four: define a decision rule (the rejection or critical region) based on the level of significance chosen in step two and the sampling distribution from step three.
Step five: perform the statistical test: calculate the test statistic from the available sample information.
Step six: compare the test statistic with the decision rule and draw the statistical conclusion.
6.2 Tests about the mean, proportion and variances
6.2.1. Tests about the mean
The test regarding a single population parameter (say μ) may take one of the following forms:
I. Two-tailed test:
H0: μ = μ₀
H1: μ ≠ μ₀
II. One-tailed test:
a) H0: μ = μ₀
H1: μ < μ₀
b) H0: μ = μ₀
H1: μ > μ₀

Where μ₀ is a specified (hypothesized) value for the population mean.

Test rule for testing about a population mean (large samples)

If the sample size is large (n ≥ 30) or the population standard deviation σ is known, use the normal (Z) distribution:

i) Reject H0 if Zcal > Zα/2 or Zcal < −Zα/2, for a two-tailed test
ii) Reject H0 if Zcal < −Zα, for the alternative μ < μ₀
iii) Reject H0 if Zcal > Zα, for the alternative μ > μ₀

where Zcal is the value computed from the sample using the formula below, and Zα/2 or Zα is the value obtained from the Z-table:

Zcal = (x̄ − μ₀)/(σ/√n) if the population standard deviation is known (σ given),
or Zcal = (x̄ − μ₀)/(S/√n) if the population standard deviation is not given (σ unknown), but n is large,

where
x̄ is the sample mean,
μ₀ is the hypothesized population mean,
σ is the population standard deviation,
S is the sample standard deviation, and
n is the sample size.
Test rule for testing about a population mean (small samples)
If the sample size is small (n < 30) and the population standard deviation is not given (σ unknown), use the t-distribution (Student's distribution):

i) Reject H0 if tcal > tα/2,n−1 or tcal < −tα/2,n−1
ii) Reject H0 if tcal < −tα,n−1, for the alternative μ < μ₀
iii) Reject H0 if tcal > tα,n−1, for the alternative μ > μ₀

where tcal = (x̄ − μ₀)/(S/√n)

Example 1: The ADLI strategy will continue as a policy if it has an impact on household earnings, and it will be resumed if the mean annual household income reaches at least 24,000 Birr. The government checks the condition frequently; a random sample of 100 households gave a mean income of 23,600 Birr with a standard deviation of 4,000 Birr. At the 5% level of significance, would you conclude that ADLI should continue as a strategy or not?

Solution: Given: μ₀ = 24,000, x̄ = 23,600, S = 4,000, n = 100

H0: μ ≥ 24,000
H1: μ < 24,000
Level of significance α = 0.05
Test statistic:
Zcal = (x̄ − μ₀)/(S/√n) = (23,600 − 24,000)/(4,000/√100) = −400/400 = −1
α = 0.05 ⇒ Zα = 1.64 ⇒ −Zα = −1.64 (from the Z-table)
Since Zcal = −1 > −1.64 = −Zα, we do not reject H0.

Therefore, the strategy may continue as a policy instrument.
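The ADLI example as a one-sample Z-test sketch, with Z0.05 = 1.64 hard-coded from the Z-table:

```python
import math

# one-tailed Z-test: H0: mu >= 24000 vs H1: mu < 24000 (ADLI example)
n, xbar, s, mu0 = 100, 23600.0, 4000.0, 24000.0
z_alpha = 1.64                        # Z at alpha = 0.05

z_cal = (xbar - mu0) / (s / math.sqrt(n))
reject = z_cal < -z_alpha
print(z_cal, "reject H0" if reject else "do not reject H0")
```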


Example 2: The production of garments in the Bahir Dar textile factory is normally distributed with mean 400 and standard deviation 32. A random sample of 100 parts taken from the whole production in a particular period yielded a mean of 406. Test the mean using the level of significance α = 0.05.

Solution: Given μ₀ = 400, n = 100, x̄ = 406, σ = 32
H0: μ = 400
H1: μ ≠ 400
α = 0.05

Zcal = (x̄ − μ₀)/(σ/√n) = (406 − 400)/(32/√100) = 1.875
Zα/2 = Z0.025 = 1.96
Since 1.875 does not fall in the rejection region, the null hypothesis is not rejected: there is no significant evidence from the sample, at α = 0.05, to conclude that the mean differs from 400.
The p-value for the test is
P(|Z| ≥ 1.88) = 2[0.5 − P(0 < Z < 1.88)] = 2[0.5 − 0.4699] = 2(0.0301) = 0.0602
Hence, since the p-value = 0.0602 > α, we do not reject H0.
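The two-sided p-value can also be computed directly with the standard library's NormalDist instead of a Z-table (the table lookup above rounds z to 1.88, so the script gives a slightly more precise value):

```python
from statistics import NormalDist

# two-tailed test and p-value (textile example)
z_cal = (406 - 400) / (32 / 100 ** 0.5)           # = 1.875
p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))  # about 0.061
print(z_cal, p_value)
```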

Tests involving two population means (large samples)

Depending on the socio-economic phenomenon, the researcher may be interested in comparing two populations with respect to a given characteristic (in particular, two population means); for example, the mean labor productivity of a company under different conditions, the mean performances of two groups of farmers, or the mean performance of two groups of students in this course.
Here the six-step hypothesis-testing procedure is the same one used for one-sample tests, and the types of test are the same as before, i.e., one-tailed and two-tailed. However, the formula for the test statistic Zcal is slightly different; note that the difference of two normal random variables is also normally distributed.

Tests concerning two population means (large samples)

When the sample sizes are large and the population standard deviations are known, use

Zcal = (x̄₁ − x̄₂)/√(σ₁²/n₁ + σ₂²/n₂), for large n with σ₁² and σ₂² known.

When σ₁² and σ₂² are unknown (not given) but both samples are large, use

Zcal = (x̄₁ − x̄₂)/√(S₁²/n₁ + S₂²/n₂)

Tests concerning two population means (small sample sizes)

Under this condition, if the samples from the two normal populations have variances S₁² and S₂², and assuming that the two populations have equal variances, we can find the common standard deviation, the so-called pooled standard deviation:

Sp = √[((n₁ − 1)S₁² + (n₂ − 1)S₂²)/(n₁ + n₂ − 2)]

Then use the Student's t-distribution for the test statistic:

t = (x̄₁ − x̄₂)/(Sp × √(1/n₁ + 1/n₂))

which has a t-distribution with n₁ + n₂ − 2 degrees of freedom, for n₁, n₂ < 30 (small).

Example 1) A farmer is interested in the efficiency of teff production. It is hypothesized that there is no difference between the mean teff output on red soil and on black soil. A random sample of 35 hectares of red-soil land and 45 hectares of black-soil land was tested. The mean teff output on red soil was 3.8 tons with a variance of 0.64; the mean output on black soil was 3.5 tons with a variance of 0.25. At the 0.05 level of significance, determine for the farmer whether there is a significant difference in the outputs of the two groups of land.
Solution:

Given: x̄₁ = 3.8, σ₁² = 0.64, n₁ = 35
x̄₂ = 3.5, σ₂² = 0.25, n₂ = 45, α = 0.05

Let μ₁ and μ₂ be the mean teff outputs on red soil and black soil respectively.

H0: μ₁ = μ₂
H1: μ₁ ≠ μ₂, so we use a two-tailed test.

α = 0.05

Since n₁ ≥ 30 and n₂ ≥ 30, we use the normal distribution formula:

Zcal = (x̄₁ − x̄₂)/√(σ₁²/n₁ + σ₂²/n₂) = (3.8 − 3.5)/√(0.64/35 + 0.25/45) ≈ 1.94

Ztab = Zα/2 = Z0.025 = 1.96. Since Zcal < Zα/2, we do not reject H0: at the 5% level there is no significant difference between the mean outputs of the two groups of land.
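Recomputing the test statistic for the teff example:

```python
import math

# two-sample Z-test, large samples, known variances (teff example)
n1, x1, var1 = 35, 3.8, 0.64
n2, x2, var2 = 45, 3.5, 0.25

z_cal = (x1 - x2) / math.sqrt(var1 / n1 + var2 / n2)
print(round(z_cal, 2))                # just below the critical value 1.96
```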
Example 2) Suppose a teacher is interested in comparing the performance of students in two groups, where the 1st group has 5 students and the 2nd group has 6 students. The sample means are 4 and 5, and the sample variances are 8.5 and 4.4, for the 1st and 2nd groups respectively. Test whether the two groups have different population means at a level of significance of 0.05.
Solution:
Given: n₁ = 5, S₁² = 8.5; n₂ = 6, S₂² = 4.4, and α = 0.05.
Since n₁ < 30 and n₂ < 30, we use the t-distribution.
And since the population variances are assumed to be equal, we compute the pooled standard deviation:

Sp = √[((n₁ − 1)S₁² + (n₂ − 1)S₂²)/(n₁ + n₂ − 2)] = √[((5 − 1)(8.5) + (6 − 1)(4.4))/(5 + 6 − 2)] = √6.22 ≈ 2.49

H0: μ₁ = μ₂
H1: μ₁ ≠ μ₂, so we use a two-tailed test.

α = 0.05

tcal = (x̄₁ − x̄₂)/(Sp × √(1/n₁ + 1/n₂)) = (4 − 5)/(2.49 × √(1/5 + 1/6)) = −0.66

ttab = tα/2,n₁+n₂−2 = t0.025,(5+6−2) = t0.025,9 = 2.262
Since |tcal| < ttab, we do not reject H0: there is no significant evidence from the sample to say that the two populations have different means.
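The pooled two-sample t statistic for the students example:

```python
import math

# small-sample t-test with pooled variance (students example)
n1, x1, s1_sq = 5, 4.0, 8.5
n2, x2, s2_sq = 6, 5.0, 4.4

sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
t_cal = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(round(sp, 2), round(t_cal, 2))
```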

6.2.2 Tests about single proportions
Suppose a population consists of two categories, for instance questions with "yes" or "no" answers, items that are defective or non-defective, or persons absent or present. If our objective concerns one of the two categories, the proportion of interest is the proportion of the identified items in the population (sample area).
Therefore, when we have two categories, the proportion of interest P can be taken as the probability of success in a binomial experiment, and its complement, q = 1 − p, can be taken as the probability of failure in the binomial experiment.
Tests about a single proportion
The test regarding a single population proportion parameter P₀ may take one of the following forms:
I. Two-tailed test:
H0: P = P₀
H1: P ≠ P₀
II. One-tailed test:
a) H0: P = P₀
H1: P < P₀
b) H0: P = P₀
H1: P > P₀
Where P₀ is a specified (hypothesized) value for the population proportion.

Test rule for testing about a single proportion, when n is small (n < 30)
In a sample of size n there might be k elements that belong to the category of interest. Then p̂ = k/n is called the sample proportion, and the binomial probability distribution is:
b(x = k) = C(n, k) P₀^k (1 − P₀)^(n−k), for k = 0, 1, 2, …, n

Test rule and statistic

i) H0: P = P₀
H1: P ≠ P₀
Reject H0 if b(x ≥ k) < α/2 or b(x ≤ k) < α/2, where k > nP₀ or k < nP₀ respectively

ii) H0: P = P₀
H1: P < P₀
Reject H0 if b(x ≤ k) ≤ α, where b(x ≤ k) = Σ (x = 0 to k) C(n, x) P₀^x (1 − P₀)^(n−x)

iii) H0: P = P₀
H1: P > P₀
Reject H0 if b(x ≥ k) ≤ α, where b(x ≥ k) = 1 − b(x < k).

Test statistic for a single proportion, when the sample size is large (n ≥ 30)

If n ≥ 30, use the normal distribution with mean p₀ and standard deviation σ = √(p₀(1 − p₀)/n).
The test statistic is then:

Zcal = (p̂ − p₀)/√(p₀(1 − p₀)/n)

Test rule
i) H0: p = p₀
H1: p ≠ p₀; reject H0 if Zcal > Zα/2 or Zcal < −Zα/2
ii) H0: p = p₀
H1: p < p₀; reject H0 if Zcal < −Zα
iii) H0: p = p₀
H1: p > p₀; reject H0 if Zcal > Zα
Tests about two population proportions
In this case we can apply the chi-square test or the Z-test. The chi-square test is used in the case of a two-tailed test, whereas the Z-test can be employed for both one-tailed and two-tailed alternatives. For the Z-test, let p₁ and p₂ be the proportions of elements having the characteristic of interest in population one and population two respectively. The test rules for the difference between p₁ and p₂ are:
i) H0: P₁ − P₂ = 0
H1: P₁ − P₂ ≠ 0
Reject H0 if |Zcal| > Zα/2
ii) H0: P₁ − P₂ = 0
H1: P₁ − P₂ > 0
Reject H0 if Zcal > Zα
iii) H0: P₁ − P₂ = 0
H1: P₁ − P₂ < 0
Reject H0 if Zcal < −Zα
Test statistic for samples of sizes n₁ and n₂:

Zcal = (p̂₁ − p̂₂)/√(p̄(1 − p̄)/n₁ + p̄(1 − p̄)/n₂)

where p̂₁ is the proportion (percentage) of sample one with the desired characteristic,
p̂₂ is the proportion (percentage) of sample two with the desired characteristic,
p̄ = (x₁ + x₂)/(n₁ + n₂), and
x₁ = the number of elements having the desired characteristic in sample one,
x₂ = the number of elements having the desired characteristic in sample two.
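The pooled-proportion formula can be illustrated with the souvenir data from Section 5.4.4 (132 of 200 men, 90 of 150 women); this is a sketch for illustration, not a worked example from the text:

```python
import math

# two-proportion Z-test with a pooled proportion (illustrative data)
n1, x1 = 200, 132
n2, x2 = 150, 90

p1, p2 = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)         # pooled proportion under H0: P1 = P2
z_cal = (p1 - p2) / math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
print(round(z_cal, 2))                # well below 1.96, not significant at 5%
```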
Example 1) If x = 5 of n = 20 patients suffered serious side effects from a given medication, test the null hypothesis P = 0.50 against the alternative hypothesis P ≠ 0.50 at the 5% level of significance. Here P is the true proportion of patients suffering serious side effects from the medication.
Solution
H0: p = 0.5
H1: p ≠ 0.5
α = 0.05
n < 30 (n = 20), and k = 5 < 20 × 0.5 = 10, so we use b(x ≤ 5):

b(x ≤ 5) = b(x = 0) + b(x = 1) + b(x = 2) + b(x = 3) + b(x = 4) + b(x = 5)
= C(20,0)(0.5)^0(0.5)^20 + C(20,1)(0.5)^1(0.5)^19 + C(20,2)(0.5)^2(0.5)^18 + … + C(20,5)(0.5)^5(0.5)^15
= 0.000000953 + 0.000019073 + 0.000181198 + 0.001087188 + 0.004620552 + 0.014785766
= 0.020694730

Since b(x ≤ 5) = 0.0207 < α/2 = 0.025, we reject H0. Equivalently, the p-value is 2(0.02069) = 0.0414 < α = 0.05, and we conclude that p ≠ 0.5.
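The exact binomial computation for this example, using math.comb:

```python
from math import comb

# exact binomial test: H0: p = 0.5 vs H1: p != 0.5 (medication example)
n, k, p0 = 20, 5, 0.5

b_le_k = sum(comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(k + 1))
p_value = 2 * b_le_k
print(round(b_le_k, 4), round(p_value, 4))   # 0.0207 and 0.0414
```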
Example 2) With reference to the above example, show that the critical region includes x ≥ 15, and that, corresponding to this critical region, the level of significance is 0.0414.
Solution:
The procedure is the same as above. Here k = 15 > 20 × 0.5 = 10, so we use b(x ≥ 15). By the symmetry of the binomial distribution with p = 0.5, b(x ≥ 15) = b(x ≤ 5) ≈ 0.02069, and the corresponding level of significance is 2(0.02069) ≈ 0.0414.

Example 3) Suppose a consumer association claims that 50% of consumers are not satisfied with their purchases. Test this claim at the 5% level of significance, if a random check reveals that 150 of 200 consumers were not satisfied with their purchase.
Solution:
H0: p = 0.5
H1: p ≠ 0.5
α = 0.05; n = 200 ≥ 30; k = 150; p̂ = k/n = 150/200 = 3/4

Zcal = (p̂ − p₀)/√(p₀(1 − p₀)/n) = (0.75 − 0.5)/√((0.5)(0.5)/200) = 0.25/0.0354 ≈ 7.07

Ztab = Zα/2 = Z0.025 = 1.96; since Zcal > Zα/2, we reject H0.
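The large-sample version for the consumer example:

```python
import math

# large-sample Z-test for a proportion (consumer example)
n, k, p0 = 200, 150, 0.5

p_hat = k / n                         # 0.75
z_cal = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(round(z_cal, 2))                # far beyond the critical value 1.96
```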
6.2.3 Tests about variances
There are many reasons why it is important to test hypotheses concerning the variances of populations, and one can distinguish between direct and indirect applications. In direct applications, a manufacturer may have to perform tests about the variability of his product, or an instructor may want to know what can be said about the variability to expect in the performance of a student, etc. As far as indirect applications are concerned, tests about variances are usually prerequisites for tests regarding other parameters; for example, the two-sample t-test requires that the two population variances be equal.

The test regarding whether a given population variance σ² equals a specified value σ₀² may take one of the following forms:
I. Two-tailed test:
H0: σ² = σ₀²
H1: σ² ≠ σ₀²
II. One-tailed test
a. H0: σ² = σ₀²
H1: σ² < σ₀²
b. H0: σ² = σ₀²
H1: σ² > σ₀²

Test statistic:
The chi-square distribution is employed for the test statistic:

χ²cal = (n − 1)S²/σ₀², where S² = Σ(xᵢ − x̄)²/(n − 1)
Test rule
For the chi-square distribution with a fixed α and n − 1 degrees of freedom, we have:
i) H0: σ² = σ₀²
H1: σ² ≠ σ₀²
Reject H0 if χ²cal > χ²α/2,n−1 or χ²cal ≤ χ²1−α/2,n−1
ii) H0: σ² = σ₀²
H1: σ² < σ₀²
Reject H0 if χ²cal ≤ χ²1−α,n−1
iii) H0: σ² = σ₀²
H1: σ² > σ₀²
Reject H0 if χ²cal ≥ χ²α,n−1
Testing the equality of two population variances
Given independent random samples of sizes n₁ and n₂ from two normal populations with variances σ₁² and σ₂², the test for equality of the two variances may take one of the following forms:
I. Two-tailed test:
H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²
II. One-tailed test
a. H0: σ₁² = σ₂²
H1: σ₁² < σ₂²
b. H0: σ₁² = σ₂²
H1: σ₁² > σ₂²

Test statistic
The F-distribution is employed for the test statistic: Fcal = S₁²/S₂², where S₁² and S₂² are the sample variances drawn from population one and population two.
Test rule:
For the F-distribution with v₁ = n₁ − 1 and v₂ = n₂ − 1 degrees of freedom, we have:
i) H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²
Reject H0 if Fcal > Fα/2,(n₁−1,n₂−1) or if Fcal < F1−α/2,(n₁−1,n₂−1)
ii) H0: σ₁² = σ₂²
H1: σ₁² < σ₂²
Reject H0 if Fcal < F1−α,(n₁−1,n₂−1)
iii) H0: σ₁² = σ₂²
H1: σ₁² > σ₂²
Reject H0 if Fcal > Fα,(n₁−1,n₂−1)

Example 1: A sample of n = 19 drawn from a normally distributed population has a variance of S² = 2.5. Test the hypothesis that the population variance is σ₀² = 3.5 against the alternative σ² ≠ 3.5 at the 5% level of significance.
Solution
H0: σ² = 3.5
H1: σ² ≠ 3.5
α = 0.05

Reject H0 if χ²cal > χ²α/2,n−1 or χ²cal ≤ χ²1−α/2,n−1

χ²cal = (n − 1)S²/σ₀² = (18 × 2.5)/3.5 ≈ 12.857

χ²tab = χ²0.025,18 = 31.5 (and at the lower end χ²0.975,18 = 8.23). Since 8.23 < χ²cal = 12.9 < 31.5, we do not reject H0; there is no significant evidence from the sample to conclude that σ² is different from 3.5.
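The chi-square statistic for this example:

```python
# chi-square test for a single variance (Example 1)
n, s_sq, sigma0_sq = 19, 2.5, 3.5

chi2_cal = (n - 1) * s_sq / sigma0_sq
print(round(chi2_cal, 3))             # inside the acceptance region
```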
Example 2: In comparing the variability of the taste of two kinds of beer, an experiment gave the following results: n₁ = 13, S₁² = 21 and n₂ = 16, S₂² = 5. Assuming that these two independent samples come from two normal populations, test H0: σ₁² = σ₂² against H1: σ₁² ≠ σ₂² at the 5% level of significance.
Solution
H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²
α = 0.05; n₁ = 13; n₂ = 16
Reject H0 if Fcal > Fα/2,(n₁−1,n₂−1)

Ftab = F0.025,(12,15) = 2.96

Fcal = S₁²/S₂² = 21/5 = 4.2

Since Fcal > Ftab, H0 is rejected, and we conclude that the variability of the taste of the two kinds of beer is not the same (i.e., they differ from one another).
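And the F statistic for the beer example:

```python
# F-test for the equality of two variances (beer example)
n1, s1_sq = 13, 21.0
n2, s2_sq = 16, 5.0

f_cal = s1_sq / s2_sq                 # larger sample variance in the numerator
print(round(f_cal, 1))
```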

