
2. Random Variables and Probability Distributions


2.1 The Concept of a Random Variable
Introduction
In the first chapter, we covered basic probability concepts and the computation of the
probability of an event. In many cases, however, we are not interested in knowing which
particular outcome has occurred; rather, we are interested in the numbers associated
with the outcomes of an experiment. Thus, we associate a real number with each
outcome. In other words, we are considering a function whose domain is the set of all
possible outcomes (the sample space, S) and whose range is a subset of the real numbers.
Such a function is known as a random variable (rv). This implies that a random variable
is not a variable in the ordinary sense but a function that maps the elements of S into
the real numbers. That is, an rv transforms the outcome of an experiment into a real number.
Definition: Let X represent a function that associates a real number with each and every
elementary event in a sample space, S. Then X is known as a random variable.
Alternatively, a random variable is a function from the sample space S to the real
numbers.
NB: a random variable is denoted by a capital letter like X, Y, Z, …, and the specific
value it takes is represented by a small letter like x, y, z, ….
Examples:
1. Consider an experiment of tossing two coins. Let the random variable X denote the
number of heads, and let the random variable Y denote the number of tails. Then S =
{HH, HT, TH, TT}, and X(x) = 2 if x = HH, X(x) = 1 if x = HT or TH, X(x) = 0 if
x=TT. Similarly, Y(y) = 2 if y = TT, Y(y) = 1 if y = HT or TH, Y(y) = 0 if y = HH

Possible outcomes (domain)   X = x (range)   Y   X + Y
HH                           2               0   2
HT                           1               1   2
TH                           1               1   2
TT                           0               2   2
We say that the space of the R. V. X = {0, 1, 2}
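Since an rv is just a mapping from outcomes to numbers, it is easy to express in code. The short Python sketch below (an added illustration, not part of the original notes) enumerates the sample space of the two-coin experiment and applies the mappings X and Y defined above:

```python
from itertools import product

# Sample space of tossing two coins: S = {HH, HT, TH, TT}
S = ["".join(toss) for toss in product("HT", repeat=2)]

# The random variables X (number of heads) and Y (number of tails)
# map each outcome in S to a real number.
X = {outcome: outcome.count("H") for outcome in S}
Y = {outcome: outcome.count("T") for outcome in S}

for outcome in S:
    print(outcome, X[outcome], Y[outcome], X[outcome] + Y[outcome])

print("Space of X:", sorted(set(X.values())))   # [0, 1, 2]
```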
2. Tossing a fair coin three times. The sample space is

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

The random variable can be the number of heads in these three tosses.

Possible outcomes (domain)   X = x (range)
HHH                          3
HHT                          2
HTH                          2
HTT                          1
THH                          2
THT                          1
TTH                          1
TTT                          0
The space of the R. V. X = {0, 1, 2, 3}
3. The sample space for rolling a die once is S = {1, 2, 3, 4, 5, 6}.
Let the rv X denote the number on the face that turns up; then we can write
X(1) = 1, X(2) = 2, X(3) = 3, X(4) = 4, X(5) = 5, X(6) = 6.
Note that in this case X(x) = x (the value of the rv equals the outcome itself): we call
such a function an identity function.
4. Rolling a pair of fair dice. A random variable X is the sum of the two dice. The
sample space is {(1,1), (1,2), …, (6,6)}, with N = 36 sample points.
The space of the rv X = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}; the number of elements is 11.
5. The sample space for tossing a coin until a head turns up is S = {H, TH, TTH, TTTH, ...}.
Let the rv X be the number of trials required to produce the first head; then
X(H) = 1, X(TH) = 2, X(TTH) = 3, …
In this case, the space or range of the rv is X = {1, 2, 3, …}.

We can say X is more tangible than S. It is easier to work with X for two reasons: (i) it is
smaller, having 11 elements (as opposed to 36) in the case of example 4, and (ii) its
elements are numbers (as opposed to outcomes), allowing the use of various
mathematical operations.

Note that a sample space S contains all possible outcomes, while a random variable X
takes real-number values. That is, the elements of S are events, but a specific value of X
is a real number.

Types of Random Variable

a. Discrete Random Variable: if a random variable can assume only a finite or
countably infinite set of values, it is said to be a discrete random variable.
b. Continuous Random Variable: if a random variable can assume an uncountably
infinite set of values, it is known as a continuous random variable.

2.2. The probability distribution/function of a discrete random variable

Once a random variable X is defined, the sample space is no longer important. All
relevant aspects of the experiment can be captured by listing the possible values of X
and their corresponding probabilities. This list is called a probability density function
(pdf), probability mass function, or probability distribution. Formally, the pdf,
denoted by f, is the function defined by f(x) = P(X = x) for −∞ < x < ∞.
Definition: If X is a discrete random variable, the function given by f(x) = P(X = x) for
each x within the range of X is known as the probability distribution of X:

f(x) = \begin{cases} P(X = x_i), & x = x_i,\ i = 1, 2, 3, \dots \\ 0, & x \neq x_i \end{cases}

The probability distribution associates each value of X with its corresponding
probability.

A function can serve as the probability distribution of a discrete random variable X iff
its values, f(x), satisfy the following two conditions:
1. f(x) ≥ 0 for each value within its domain;
2. \sum_x f(x) = 1, where the summation extends over all the values within its domain.


Properties of the Probability Distribution of a Discrete rv
1. f(x) ≥ 0
2. \sum_x f(x) = 1
3. If X is a discrete rv taking integer values, then

P(a < X < b) = \sum_{x=a+1}^{b-1} f(x)
P(a < X \le b) = \sum_{x=a+1}^{b} f(x)
P(a \le X < b) = \sum_{x=a}^{b-1} f(x)
P(a \le X \le b) = \sum_{x=a}^{b} f(x)

A pdf can be presented as a function, a table, or a graph.

Tables of Probability Distributions


Examples
1. Consider an experiment of tossing two coins. Let the random variable X denote the
number of heads, Then S = {HH, HT, TH, TT}, and X(x) = 2 if x = HH, X(x) = 1 if x =
HT or TH, X(x) = 0 if x=TT.

Possible outcomes   X = x   f(x) = P(X = x)
HH                  2       1/4
HT                  1       1/4
TH                  1       1/4
TT                  0       1/4

Alternatively,

X = x   f(x) = P(X = x)
0       1/4
1       1/2
2       1/4

Both conditions hold: f(x) ≥ 0 and \sum_x f(x) = 1.

2. Tossing a fair coin three times, with X the number of heads in these three tosses.
The probability distribution is given by the table:
X = x             0     1     2     3
f(x) = P(X = x)   1/8   3/8   3/8   1/8
Again f(x) ≥ 0 and \sum_x f(x) = 1.

3. Rolling a pair of fair dice, with X the sum of the two dice:
X = x             2      3      4      5      6      7      8      9      10     11     12
f(x) = P(X = x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
Again f(x) ≥ 0 and \sum_x f(x) = 1.

4. In the experiment of flipping a coin until a head appears, where X is the number of
tosses required to produce the first head, the sample space is S = {H, TH, TTH, TTTH, ...}.
Let P(H) = p and P(T) = 1 − p. Then

X = x   f_X(x) = P(X = x)
1       p
2       (1 − p)p
3       (1 − p)²p
4       (1 − p)³p
.       .
.       .

These probabilities must sum to 1.
Proof: The series p, p(1 − p), p(1 − p)², p(1 − p)³, … is geometric (GP) with first term
a₁ = p and common ratio r = 1 − p, so

S_\infty = \frac{a_1}{1-r} = \frac{p}{1-(1-p)} = \frac{p}{p} = 1
In general, if X is a discrete rv with values x₁, x₂, x₃, …, xₙ and associated probabilities
f(x₁), f(x₂), f(x₃), …, f(xₙ), then the set of pairs

X = x             x₁      x₂      x₃      …   xₙ
f(x) = P(X = x)   f(x₁)   f(x₂)   f(x₃)   …   f(xₙ)

is known as the probability function (or probability distribution) of X.

Functions as Probability Distributions of a Discrete Random Variable

A function can serve as the probability distribution of a discrete random variable X iff
its values, f(x), satisfy the two conditions:
1. f(x) ≥ 0 for each value within its domain;
2. \sum_x f(x) = 1, where the summation extends over all the values within its domain.


Examples:
1. Check whether the following function represents a probability distribution:

f(x) = \frac{x}{21}, \quad x = 1, 2, 3, 4, 5, 6

Solution: Check the two conditions:
f(x) ≥ 0 for all the given values of x, and

\sum_{x=1}^{6} f(x) = f(1) + f(2) + f(3) + f(4) + f(5) + f(6) = \frac{1}{21} + \frac{2}{21} + \frac{3}{21} + \frac{4}{21} + \frac{5}{21} + \frac{6}{21} = \frac{21}{21} = 1
2. The probability function of the uniform (die) distribution above is given by:

f_X(x) = \begin{cases} \frac{1}{6}, & x = 1, \dots, 6 \\ 0, & \text{elsewhere} \end{cases}

3. Check whether the following function represents a probability distribution:

f(x) = \frac{x-2}{5}, \quad x = 1, 2, 3, 4, 5

Since f(1) < 0, it does not represent a probability distribution.

4. Check whether the following function represents a probability distribution:

f(x) = \frac{3!}{x!(3-x)!}\left(\frac{1}{2}\right)^3, \quad x = 0, 1, 2, 3

f(0) = \frac{3!}{0!\,3!}\cdot\frac{1}{8} = \frac{1}{8}
f(1) = \frac{3!}{1!\,2!}\cdot\frac{1}{8} = \frac{3}{8}
f(2) = \frac{3!}{2!\,1!}\cdot\frac{1}{8} = \frac{3}{8}
f(3) = \frac{3!}{3!\,0!}\cdot\frac{1}{8} = \frac{1}{8}

\sum_{x=0}^{3} f(x) = f(0) + f(1) + f(2) + f(3) = 1

5. For what value of k can the following function serve as a probability distribution?

f(x) = k\left(\frac{1}{4}\right)^x, \quad x = 1, 2, 3, \dots

f(1) = k/4, f(2) = k/16, f(3) = k/64, …
The series k/4, k/16, k/64, … is in GP with a₁ = k/4 and r = 1/4. Then

S_\infty = \frac{a_1}{1-r} = \frac{k/4}{1 - 1/4} = \frac{k/4}{3/4} = \frac{k}{3}

Setting k/3 = 1 gives k = 3.
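As a quick numerical check (an added sketch, not in the original notes), the partial sums of 3(1/4)^x do converge to 1:

```python
# Numerical check that f(x) = 3 * (1/4)**x, x = 1, 2, 3, ..., sums to 1.
k = 3
total = sum(k * (1 / 4) ** x for x in range(1, 60))  # truncate the infinite sum
print(total)  # ~1.0 (the tail beyond x = 60 is negligible)
```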

Graphs of Probability Distributions for a Discrete Random Variable
The graphs include the (probability) histogram and the bar chart.
Example: Tossing a fair coin three times, with X the number of heads in these three tosses.
The probability distribution is given by (table):
X=x 0 1 2 3
f(x) =P(X=x) 1/8 3/8 3/8 1/8

[Bar chart of the probability distribution: f(x) plotted against the number of heads, with
bars of height 0.125, 0.375, 0.375, 0.125 at x = 0, 1, 2, 3.]

Cumulative Distribution Function (CDF) of a Discrete Random Variable

The CDF answers the question: what is the probability that a discrete rv X will assume a
value less than or equal to a given number?
Definition: A cdf shows the relation between a possible value t of an rv X and the
probability that the value of X is less than or equal to t. That is, the cdf assigns to each
possible value t that the variable X might take the cumulative probability P(X ≤ t),
written F(t). Formally,

F(x) = P(X \le x) = \sum_{t \le x} f(t) \quad \text{for } -\infty < x < \infty

The values of F(x) satisfy the following conditions:

F(\infty) = 1
F(-\infty) = 0
If a ≤ b, then F(a) ≤ F(b) for any real numbers a and b.

Properties of the Cumulative Distribution Function (CDF) or Distribution Function (DF):
The following five properties hold for the CDF:
a) 0 ≤ F(x) ≤ 1
b) F(x) is non-decreasing (i.e., if x₁ < x₂, then F(x₁) ≤ F(x₂))
c) F(x) = 0 for x < x₁, x₁ being the minimum/least of the values of the random variable X
d) F(x) = 1 for x ≥ xₙ, xₙ being the maximum/largest value of X
e) P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F_X(b) − F_X(a)

Example: Suppose the pdf is given by:

f(x) = \begin{cases} f(x_i), & x = x_i,\ i = 1, \dots, k \\ 0, & \text{elsewhere} \end{cases}

Then,
F(x) = 0 for x < x₁
F(x) = f(x₁) for x₁ ≤ x < x₂
F(x) = f(x₁) + f(x₂) for x₂ ≤ x < x₃
F(x) = f(x₁) + f(x₂) + f(x₃) for x₃ ≤ x < x₄
⋮
F(x) = 1 for x ≥ x_k

Example: f(x) = \begin{cases} \frac{1}{6}, & x = x_i,\ i = 1, 2, \dots, 6 \\ 0, & \text{elsewhere} \end{cases}

Then,
F(x) = 0 for x < 1
F(x) = f(1) = 1/6 for 1 ≤ x < 2
F(x) = f(1) + f(2) = 2/6 for 2 ≤ x < 3
F(x) = f(1) + f(2) + f(3) = 3/6 for 3 ≤ x < 4
F(x) = F(3) + f(4) = 4/6 for 4 ≤ x < 5
F(x) = F(4) + f(5) = 5/6 for 5 ≤ x < 6
F(x) = F(5) + f(6) = 1 for x ≥ 6

Thus,

F(x) = \begin{cases} 0, & x < 1 \\ 1/6, & 1 \le x < 2 \\ 2/6, & 2 \le x < 3 \\ 3/6, & 3 \le x < 4 \\ 4/6, & 4 \le x < 5 \\ 5/6, & 5 \le x < 6 \\ 1, & x \ge 6 \end{cases}

[Graph: F(x) is a step function, rising from 0 to 1 in jumps of 1/6 at x = 1, 2, 3, 4, 5, 6.]

Graphically F(x) is a step function with the height of the step at xi equal to f(xi).
Note that:
1. F(x) gives the probability that the rv X will assume a value less than or equal to a
given number, whereas f(x) gives the probability that the rv X will assume a particular
value.
E.g., in the experiment of rolling a die we have the sample space S = {1, 2, 3, 4, 5, 6}.
Here X(x) = x and P(x) = 1/6. Such distributions are known as uniform distributions.

x f(x) =P(X = x) P(X≤x)=F(x)


1 1/6 1/6
2 1/6 2/6
3 1/6 3/6
4 1/6 4/6
5 1/6 5/6
6 1/6 1

2. Given f(x) we can derive F(x), or given F(x) we can derive f(x):

F(x) = \sum_{t \le x} f(t) \quad \text{for } -\infty < x < \infty

E.g., f(x) = 1/6 for x = 1, 2, …, 6 (and 0 elsewhere) yields exactly the step function F(x)
displayed above.
If the range of an rv X consists of the values x₁ < x₂ < x₃ < … < xₙ, then f(x₁) = F(x₁) and

f(x_i) = F(x_i) - F(x_{i-1}) \quad \text{for } i = 2, 3, 4, \dots, n
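The relation f(x_i) = F(x_i) − F(x_{i−1}) is easy to verify in code. A small Python sketch (an added illustration) builds F from f for the die example and then recovers f by differencing:

```python
from fractions import Fraction
from itertools import accumulate

xs = [1, 2, 3, 4, 5, 6]
f = [Fraction(1, 6)] * 6            # pmf of a fair die

F = list(accumulate(f))             # cdf: running sums of the pmf
print(F)                            # [1/6, 1/3, 1/2, 2/3, 5/6, 1]

# Recover the pmf from the cdf: f(x_i) = F(x_i) - F(x_{i-1})
f_back = [F[0]] + [F[i] - F[i - 1] for i in range(1, len(F))]
print(f_back == f)                  # True
```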

2.3 Probability Density Functions (pdf) of a Continuous rv X

Continuous Random Variable: if a random variable can assume an uncountably infinite
set of values, it is known as a continuous random variable (e.g., height, weight, and the
time elapsing between two telephone calls). More technically,
Definition: A random variable X is called continuous if there exists a function f(·) such that

P(X \le x) = F(x) = \int_{-\infty}^{x} f(u)\,du

for every real number x; F(x) is called the distribution function of the random variable X.

Definition: A function with values f(x), defined over the set of all real numbers, is
known as a pdf of the continuous rv X iff

P(a \le X \le b) = \int_{a}^{b} f(x)\,dx

for any real constants a and b with a ≤ b. Equivalently, if X is a continuous random
variable, the function f(·) in F(x) = \int_{-\infty}^{x} f(u)\,du is called the probability
density function of X.

A function can serve as a probability density function of a continuous random variable
X if its values, f(x), satisfy the following conditions:
1. f(x) ≥ 0 for −∞ < x < ∞
2. \int_{-\infty}^{\infty} f(x)\,dx = 1
3. P(a \le X \le b) = \int_{a}^{b} f(x)\,dx

NB: the conditions are not "iff", unlike the discrete case. The reason is that f(x) could be
altered (even made negative) at isolated values of the rv without affecting any of the
probabilities. In practice all densities used are non-negative, and hence these conditions
are satisfied.

Properties of a Probability Density Function (pdf)
1. f(x) ≥ 0 for −∞ < x < ∞
2. \int_{-\infty}^{\infty} f(x)\,dx = 1
3. The probability of a fixed value of a continuous rv X is zero. Hence
P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b), for any real constants
a and b with a ≤ b.
A pdf can be presented using graphs or functions. Since the probability that a continuous
rv X will assume a particular value is zero, the probability distribution of a continuous
rv X cannot be given in tabular form.
The graph of a pdf of an rv X is a continuous curve, which may have any shape.

Functions as pdfs of a Continuous rv X

Examples:

1. Verify that f(x) = 6x(1 − x), for 0 < x < 1, is a pdf.

Solution:

\int_0^1 6x(1-x)\,dx = 6\int_0^1 (x - x^2)\,dx = 6\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_0^1 = 6\left(\frac{1}{2} - \frac{1}{3}\right) = 3 - 2 = 1

Also f(x) ≥ 0 on (0, 1); its graph is a parabola with x-intercepts at 0 and 1.
2. For what value of k can the function f(x) = kx(1 − x), for 0 < x < 1, serve as a
probability density function?
Solution:

\int_0^1 kx(1-x)\,dx = k\int_0^1 (x - x^2)\,dx = k\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_0^1 = k\cdot\frac{3-2}{6} = \frac{k}{6}

Setting k/6 = 1 gives k = 6.

3. The pdf of the continuous rv X is given by:

f(x) = \begin{cases} \frac{c}{\sqrt{x}}, & 0 < x < 4 \\ 0, & \text{otherwise} \end{cases}

a. Find the value of c.
b. Find P(X < 1/4) and P(X > 1).

Solution:
a. \int_0^4 c\,x^{-1/2}\,dx = c\left[2x^{1/2}\right]_0^4 = 4c = 1, so c = 1/4.
b. P(X < 1/4) = \int_0^{1/4} \frac{1}{4\sqrt{x}}\,dx = \left[\frac{\sqrt{x}}{2}\right]_0^{1/4} = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}

P(X > 1) = \int_1^4 \frac{1}{4\sqrt{x}}\,dx = \left[\frac{\sqrt{x}}{2}\right]_1^4 = 1 - \frac{1}{2} = \frac{1}{2}
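These integrals can be sanity-checked numerically. The sketch below (an added illustration) uses a crude midpoint Riemann sum in plain Python, so no external libraries are assumed:

```python
import math

def f(x, c=0.25):
    """pdf f(x) = c / sqrt(x) on (0, 4)."""
    return c / math.sqrt(x)

def midpoint_integral(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(midpoint_integral(f, 0, 4))      # ~1.0   (total probability)
print(midpoint_integral(f, 0, 0.25))   # ~0.25  (P(X < 1/4))
print(midpoint_integral(f, 1, 4))      # ~0.5   (P(X > 1))
```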
4. Given the pdf:

f(x) = \begin{cases} kxe^{-x^2}, & x > 0 \\ 0, & x \le 0 \end{cases}

Find the value of k.

Solution: We require \int_0^\infty kxe^{-x^2}\,dx = 1.
Let v = e^{-x^2}, so dv = -2xe^{-x^2}\,dx, i.e., xe^{-x^2}\,dx = -\tfrac{1}{2}\,dv. Then

\int_0^\infty kxe^{-x^2}\,dx = -\frac{k}{2}\left[e^{-x^2}\right]_0^\infty = -\frac{k}{2}\left(\lim_{x\to\infty} e^{-x^2} - e^{0}\right) = 0 + \frac{k}{2} = \frac{k}{2}

Setting k/2 = 1 gives k = 2.
Cumulative Distribution Function (cdf) of a Continuous rv X
Definition: If X is a continuous rv and the value of its probability density at t is f(t),
then the function given by

F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty

is known as the cumulative distribution function of X. F(·) must be continuous, with
domain the set of all real numbers and range between 0 and 1 (inclusive).

Properties of the cdf of a Continuous rv X
1. F(+\infty) \equiv \lim_{x \to +\infty} F(x) = 1
2. F(-\infty) \equiv \lim_{x \to -\infty} F(x) = 0
3. Properties 1 and 2 imply that 0 \le F_X(x) \le 1
4. If f(x) and F(x) are the values of the pdf and cdf of a continuous rv X, then
P(a \le X \le b) = F(b) - F(a) for any real constants a and b with a ≤ b, and
f(x) = \frac{dF(x)}{dx} wherever the derivative exists.
5. F(x) is non-decreasing, i.e., if a > b then F(a) − F(b) ≥ 0, which is P(X ≤ a) − P(X ≤ b) ≥ 0.
6. If F(x) is continuous and differentiable, then \frac{dF(x)}{dx} = f(x); the function f(x)
is known as the probability density function of the continuous random variable X.
7. \frac{dF_X(x)}{dx} = f(x) \;\Leftrightarrow\; F(x) - F(-\infty) = \int_{-\infty}^{x} f(u)\,du
8. Since F(x) is non-decreasing, it follows that
a) f(x) ≥ 0
b) \int_{-\infty}^{+\infty} f(x)\,dx = 1

Note that f(a) ≠ P(X = a), and f(a) could actually be greater than 1.
Examples
1. Given f(x) = 6x(1 − x), for 0 < x < 1, find its cdf and P(0.5 < X < 1).

Solution:

i. F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^x 6t(1-t)\,dt = 6\left[\frac{t^2}{2} - \frac{t^3}{3}\right]_0^x = 3x^2 - 2x^3

\Rightarrow F(x) = \begin{cases} 0, & x \le 0 \\ 3x^2 - 2x^3, & 0 < x < 1 \\ 1, & x \ge 1 \end{cases}

Its graph is continuous, rising from 0 to 1 between 0 and 1.
ii. P(0.5 < X < 1) = F(1) − F(0.5) = 1 − (3(0.5)² − 2(0.5)³) = 1 − 0.5 = 0.5
2. Given the pdf of the continuous rv X:

f(x) = \begin{cases} \frac{1}{4\sqrt{x}}, & 0 < x < 4 \\ 0, & \text{otherwise} \end{cases}

find its cdf and P(X > 1).

Solution:

i. F(x) = \int_0^x \frac{1}{4\sqrt{t}}\,dt = \frac{1}{4}\left[2t^{1/2}\right]_0^x = \frac{\sqrt{x}}{2}

\Rightarrow F(x) = \begin{cases} 0, & x \le 0 \\ \frac{\sqrt{x}}{2}, & 0 < x < 4 \\ 1, & x \ge 4 \end{cases}

ii. P(X > 1) = F(4) - F(1) = \frac{1}{2}(\sqrt{4} - \sqrt{1}) = \frac{1}{2}
Exercises:
1. Find the cdf of the pdf of the form:

f(x) = \begin{cases} kxe^{-x^2}, & x > 0 \\ 0, & x \le 0 \end{cases}

2. Suppose X is a continuous rv with cdf F(x) = \frac{1}{1 + e^{-x}}. Find its pdf and
compute P(−1 < X < 2) using both the pdf and the cdf.
Note that given F(x) we can find f(x), and vice versa.

2.4 The Expected Value of a Random Variable and Moments

Mathematical Expectation:
1. Let X be a discrete random variable taking values x₁, x₂, x₃, … with f(x_i) as its
probability distribution. Then the expected value of X, denoted by E(X), is defined as

E(X) = x_1 f(x_1) + x_2 f(x_2) + x_3 f(x_3) + \dots = \sum_i x_i f(x_i)

That is, E(X) is the weighted mean of the possible values of X, each value weighted by
its probability.
2. Let X be a continuous random variable with probability density function f(x); then
the expected value of X is defined as

E(X) = \int_{-\infty}^{\infty} x f(x)\,dx

Example: for f(x) = \frac{20{,}000}{x^3}, x > 100, we get
E(X) = \int_{100}^{\infty} \frac{20{,}000}{x^2}\,dx = \left[-\frac{20{,}000}{x}\right]_{100}^{\infty} = 200.
Properties of mathematical expectation
a. If c is a constant, E(c) = c, since E(c) = c · 1 = c.
b. E(aX + b) = aE(X) + b, where a and b are constants in ℝ.
Proof (discrete case):

E(aX + b) = \sum (ax + b) f(x) = a\sum x f(x) + b\sum f(x) = aE(X) + b, \quad \text{since } \sum f(x) = 1

Proof (continuous case):

E(aX + b) = \int_{-\infty}^{\infty} (ax + b) f(x)\,dx = a\int_{-\infty}^{\infty} x f(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx = aE(X) + b

c. Let X and Y be random variables with finite expected values. Then
E(X + Y) = E(X) + E(Y).
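As a small illustration of these properties (an added sketch, not part of the original notes), we can compute E(X) and E(2X + 3) for the three-coin pmf and confirm that E(2X + 3) = 2E(X) + 3:

```python
from fractions import Fraction

# pmf of X = number of heads in three tosses of a fair coin
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expect(g, pmf):
    """E[g(X)] = sum of g(x) * f(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

EX = expect(lambda x: x, pmf)
print(EX)                                 # 3/2
print(expect(lambda x: 2 * x + 3, pmf))   # 6 = 2 * (3/2) + 3
```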
Expectation of a Function of a Random Variable

Let g(x) be a function of an rv X. Then

E(g(X)) = \sum_i g(x_i) f(x_i), \quad \text{if X is a discrete rv}
E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx, \quad \text{if X is a continuous rv}

Examples:
1. Given:
X = x   0     1     2   3
f(x)    1/3   1/2   0   1/6
find the expected value of g(X) = (X − 1)².
Solution:

E(g(X)) = \sum_{x=0}^{3} (x-1)^2 f(x) = (0-1)^2\frac{1}{3} + (1-1)^2\frac{1}{2} + (2-1)^2\cdot 0 + (3-1)^2\frac{1}{6} = \frac{1}{3} + 0 + 0 + \frac{4}{6} = 1
2. Let X be an rv with density function

f(x) = \begin{cases} \frac{x^2}{3}, & -1 < x < 2 \\ 0, & \text{otherwise} \end{cases}

Find the expected value of g(X) = 2X − 1.
Solution:

E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx = \int_{-1}^{2} (2x-1)\frac{x^2}{3}\,dx = \frac{1}{3}\left[\frac{x^4}{2} - \frac{x^3}{3}\right]_{-1}^{2} = \frac{1}{3}\left[\left(8 - \frac{8}{3}\right) - \left(\frac{1}{2} + \frac{1}{3}\right)\right] = \frac{3}{2}

For c any positive constant and g(x) a function of an rv X, E(cg(X)) = cE(g(X)):
1. for the discrete rv:
E(cg(X)) = \sum c\,g(x) f(x) = c\sum g(x) f(x) = cE(g(X))
2. for the continuous rv:
E(cg(X)) = \int c\,g(x) f(x)\,dx = c\int g(x) f(x)\,dx = cE(g(X))
Variance and Standard Deviation of an rv X
The variance of the rv X measures the spread or dispersion of X.
Let X be an rv with the following distribution:
X = x   x₁      x₂      x₃      …
f(x)    f(x₁)   f(x₂)   f(x₃)   …
Then the variance of X, denoted by Var(X), is defined by:

Var(X) = \sum (x - \mu)^2 f(x) = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2

where E(X^2) = \sum x^2 f(x) in the discrete case and E(X^2) = \int x^2 f(x)\,dx in the
continuous case, so that

Var(X) = \sum x^2 f(x) - \mu^2 \quad \text{or} \quad Var(X) = \int x^2 f(x)\,dx - \mu^2

Properties of Var(X)
1. If a is any real constant, then Var(a) = 0.
2. If Var(X) = σ², then the variance of Y = aX + b is given by Var(Y) = a²σ².
Examples (discrete):
1. Given
X = x   0     1     2     3
f(x)    1/8   3/8   3/8   1/8
find: a. E(X); b. Var(X).

Solution:

E(X) = \sum x f(x) = 0\cdot\frac{1}{8} + 1\cdot\frac{3}{8} + 2\cdot\frac{3}{8} + 3\cdot\frac{1}{8} = \frac{12}{8} = 1.5
E(X^2) = 0^2\cdot\frac{1}{8} + 1^2\cdot\frac{3}{8} + 2^2\cdot\frac{3}{8} + 3^2\cdot\frac{1}{8} = \frac{24}{8} = 3
Var(X) = 3 - (1.5)^2 = 0.75

In three tosses of a fair coin we get, on average, 1.5 heads.
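In code, both forms of the variance formula give the same answer, as the short added sketch below shows:

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in pmf.items())
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())    # sum (x - mu)^2 f(x)
var_short = sum(x**2 * p for x, p in pmf.items()) - mu**2   # E(X^2) - mu^2

print(mu, var_def, var_short)   # 3/2, 3/4, 3/4 (both formulas agree)
```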
2. Bernoulli Random Variable: a random variable with only two outcomes (0 and 1)
is known as a Bernoulli rv.
Let X be a random variable with probability p of success and (1 − p) of failure:

          x   f_X(x)
Success   1   p
Failure   0   1 − p

E(X) = 0(1 − p) + 1(p) = p, and Var(X) = p(1 − p), as derived later.
The above tabular expression for the probability of a Bernoulli rv can be written as

f(x) = \begin{cases} p^x (1-p)^{1-x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases}

Let X be the number of trials required to produce the first success, say a head in tosses
of a fair coin. This is described by a geometric random variable:

x   f_X(x)
1   p
2   (1 − p)p
3   (1 − p)²p
4   (1 − p)³p
.   .
.   .

E(X) = (1)p + (2)(1 − p)p + (3)(1 − p)²p + (4)(1 − p)³p + ....
     = p[1 + 2(1 − p) + 3(1 − p)² + 4(1 − p)³ + ....]
Let S = 1 + 2(1 − p) + 3(1 − p)² + 4(1 − p)³ + ...., so E(X) = Sp.
Then (1 − p)S = (1 − p) + 2(1 − p)² + 3(1 − p)³ + ...., and subtracting,
S − (1 − p)S = Sp = 1 + (1 − p) + (1 − p)² + (1 − p)³ + ....
Multiplying this by (1 − p) and subtracting again: Sp − (1 − p)Sp = 1, i.e.,
Sp² = 1, so S = 1/p² and E(X) = Sp = (p)(1/p²) = 1/p.

Alternatively, f(x) = P(X = x) = \begin{cases} p(1-p)^{x-1}, & x = 1, 2, 3, \dots \\ 0, & \text{elsewhere} \end{cases}

E(X) = \sum_{x=1}^{\infty} x f(x) = \sum_{x=1}^{\infty} x p(1-p)^{x-1} = p\sum_{x=1}^{\infty} x(1-p)^{x-1}
     = p\sum_{x=1}^{\infty}\left[-\frac{d}{dp}(1-p)^x\right], \quad \text{since } -\frac{d}{dp}(1-p)^x = x(1-p)^{x-1}
     = p\left[-\frac{d}{dp}\sum_{x=0}^{\infty}(1-p)^x\right] = p\left[-\frac{d}{dp}\left(\frac{1}{p}\right)\right] = p\cdot\frac{1}{p^2} = \frac{1}{p}

Note: if p = 1/2 then E(X) = 2; if p = 1/3 then E(X) = 3; if p = 1/4 then E(X) = 4;
if p = 1/5 then E(X) = 5; ....; if p = 1/10 then E(X) = 10.
As p becomes smaller, the expected number of trials required to get a success increases.
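A quick Monte Carlo check of E(X) = 1/p (an added sketch, not in the original notes):

```python
import random

def trials_to_first_head(p):
    """Simulate independent trials until the first success; return the trial count."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

random.seed(1)
for p in (1/2, 1/3, 1/10):
    sims = [trials_to_first_head(p) for _ in range(100_000)]
    print(p, sum(sims) / len(sims))   # close to 1/p: ~2, ~3, ~10
```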
Examples for a Continuous rv X
1. Given the pdf:

f(x) = \begin{cases} \frac{1}{4}e^{-x/4}, & x > 0 \\ 0, & \text{otherwise} \end{cases}

find: a. E(X) and b. Var(X).

2. Given F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0,\ \lambda > 0 \\ 0, & \text{otherwise} \end{cases}
find: a. E(X) and b. Var(X).
Solution: E(X) = 1/λ, Var(X) = 1/λ².

Moments of a probability distribution
The mean of a distribution is the expected value of the random variable X. A
generalisation of this is to raise X to any power r, for r = 0, 1, 2, ..., and compute
E(X^r). This is known as the moment of order r about the origin, denoted μ′_r:

for r = 0: μ′₀ = E(X⁰) = 1
for r = 1: μ′₁ = E(X¹) = E(X) = mean = \sum x f(x)
for r = 2: μ′₂ = E(X²)
for r = 3: μ′₃ = E(X³)
⋮
μ′_r = E(X^r)

In general,

\mu'_r = E(X^r) = \sum_x x^r f_X(x), \quad r = 0, 1, 2, \dots \text{ (discrete)}
\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx, \quad r = 0, 1, 2, \dots \text{ (continuous)}

Moments can also be generated around the mean; these are known as central moments
or moments about the mean, denoted μ_r:

for r = 0: μ₀ = E(X − μ)⁰ = 1
for r = 1: μ₁ = E(X − μ)¹ = 0
for r = 2: μ₂ = E(X − μ)² = Var(X)
for r = 3: μ₃ = E(X − μ)³
⋮
μ_r = E(X − μ)^r

In general,

\mu_r = E\left[(X - \mu)^r\right] = \sum_x (x - \mu)^r f(x), \quad r = 0, 1, 2, \dots \text{ (discrete), or}
\mu_r = E\left[(X - \mu)^r\right] = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx, \quad r = 0, 1, 2, \dots \text{ (continuous)}

Relationship between μ′_r and μ_r
a) μ′₀ = μ₀ = 1
b) μ′₁ = μ (the mean)
c) μ₂ = E(X − μ)² = σ² is defined as the variance of the random variable, and is also
denoted by Var(X) or V(X):

E(X - \mu)^2 = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2

Thus μ₂ = μ′₂ − (μ′₁)² = E(X²) − [E(X)]²; i.e., the variance of a random variable is the
expected value of the square of the random variable less the square of the expected
value of the random variable.
d) μ₃ = E(X − μ)³:

E(X - \mu)^3 = E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3 = E(X^3) - 3\mu E(X^2) + 2\mu^3

so the third moment about the mean is μ₃ = μ′₃ − 3μ′₂μ′₁ + 2(μ′₁)³.

e) μ₄ = E(X − μ)⁴:

E(X - \mu)^4 = E(X^4) - 4\mu E(X^3) + 6\mu^2 E(X^2) - 4\mu^3 E(X) + \mu^4 = E(X^4) - 4\mu E(X^3) + 6\mu^2 E(X^2) - 3\mu^4

so the fourth moment about the mean is μ₄ = μ′₄ − 4μ′₁μ′₃ + 6(μ′₁)²μ′₂ − 3(μ′₁)⁴.

Interpretations
1. The first moment about the origin is the mean of the distribution. That is, μ′₁ = μ is a
measure of central tendency.
2. The second moment about the mean is the variance of the distribution: μ₂, which is
denoted by σ², Var(X), or V(X), is a measure of dispersion of the random variable. If X
is a random variable given in centimetres, σ²'s dimension is centimetres squared (cm²).
3. σ, the positive root of the variance, is called the standard deviation of the random
variable, and its dimension is given in centimetres. Thus, to convert this measure of
dispersion into a dimensionless measure, we divide it by the mean to get the coefficient
of variation, cv = σ/μ. This is a dimensionless measure of dispersion of the random
variable.
Skewness and Kurtosis
A fundamental task in many statistical analyses is to characterize the location and
variability of a data set. A further characterization of the data includes skewness and
kurtosis.

Skewness is a measure of the lack of symmetry. A distribution, or data set, is symmetric if
it looks the same to the left and right of the centre point.

Kurtosis is a measure of whether the data are peaked or flat relative to a normal
distribution. That is, data sets with high kurtosis tend to have a distinct peak near the
mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to
have a flat top near the mean rather than a sharp peak. A uniform distribution would be
the extreme case.

4. μ₃, the third moment about the mean, is used to calculate the measure of skewness,
given as α₃ = μ₃/σ³ and known as Pearson's measure of skewness. If α₃ = 0 the
distribution is symmetric. If α₃ > 0 the distribution is positively skewed: there is a
spread to the right, and the few observations on the right-hand side of the mean pull the
mean to the right. If α₃ < 0 the distribution is negatively skewed: there is a spread to the
left, and the few observations on the left-hand side of the mean pull the mean to the left.
5. μ₄, the fourth moment about the mean, is used to calculate the measure of peakedness
or flatness (known as kurtosis), given as α₄ = μ₄/σ⁴. For a normal distribution α₄ = 3.
If α₄ > 3 the distribution has a sharper peak and heavier tails than the normal
distribution (it is known as leptokurtic). If α₄ < 3 the distribution is flatter, with thinner
tails, than the normal distribution (it is known as platykurtic).

Example
We have already obtained the expected value of the Bernoulli random variable

f_X(x) = \begin{cases} p^x (1-p)^{1-x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases}

to be E(X) = 0(1 − p) + 1(p) = p.
To obtain the variance of the Bernoulli random variable we first get
E(X²) = 0²(1 − p) + 1²(p) = p.
Thus σ² = E(X²) − (E(X))² = p − p² = p(1 − p),

\sigma = \sqrt{p - p^2} = \sqrt{p(1-p)}, \qquad cv = \frac{\sqrt{p - p^2}}{p} = \frac{\sqrt{p(1-p)}}{p}

Exercise
Let the random variable Y be the number of failures preceding a success in an
experiment of tossing a fair coin, where success is obtaining a head. Find E(Y) and σ²
for this random variable.

Moment Generating Function (mgf)

Moments of most distributions can be determined directly through integration or
summation. An alternative technique for calculating the moments of a distribution is the
moment generating function (mgf).

Definition: The moment generating function of a random variable X, written m(t), is
defined as m(t) = E(e^{tX}), where t is any constant in a neighbourhood of 0 (zero).

The Taylor series expansion about a point x₀ of a continuously differentiable function
f(x) is given by:

f(x) = \frac{f(x_0)(x-x_0)^0}{0!} + \frac{f'(x_0)(x-x_0)^1}{1!} + \frac{f''(x_0)(x-x_0)^2}{2!} + \dots = f(x_0) + \sum_{r=1}^{\infty}\frac{f^{(r)}(x_0)(x-x_0)^r}{r!}

where f^{(r)}(x_0) = \frac{d^r f(x)}{dx^r}\Big|_{x=x_0}.

The Maclaurin series is the Taylor series expansion about the origin (x₀ = 0):

f(x) = f(0) + \sum_{r=1}^{\infty}\frac{f^{(r)}(0)\,x^r}{r!}

Example: Let f(x) = e^x. Then

f(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \dots
Similarly,

e^{tx} = 1 + tx + \frac{(tx)^2}{2!} + \frac{(tx)^3}{3!} + \frac{(tx)^4}{4!} + \frac{(tx)^5}{5!} + \dots

Hence,

m(t) = E(e^{tX}) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \frac{t^4}{4!}E(X^4) + \frac{t^5}{5!}E(X^5) + \dots

\frac{d}{dt}m(t) = E(X) + tE(X^2) + \frac{t^2}{2!}E(X^3) + \frac{t^3}{3!}E(X^4) + \dots
\Rightarrow \frac{d}{dt}m(t)\Big|_{t=0} = E(X), \text{ the first moment about the origin.}

Similarly,

\frac{d^2}{dt^2}m(t) = E(X^2) + tE(X^3) + \frac{t^2}{2!}E(X^4) + \dots
\Rightarrow \frac{d^2}{dt^2}m(t)\Big|_{t=0} = E(X^2), \text{ the second moment about the origin.}

In general,

\frac{d^r m(t)}{dt^r}\Big|_{t=0} = \mu'_r, \text{ the r-th moment about the origin.}
Examples
1. Given the pdf f(x) = 2x, 0 < x < 1, find the mgf and use it to obtain the mean and variance.

Solution:

m(t) = E(e^{tX}) = \int_0^1 e^{tx} f(x)\,dx = 2\int_0^1 x e^{tx}\,dx

Using integration by parts with u = x (du = dx) and v' = e^{tx} (v = e^{tx}/t):

2\int_0^1 x e^{tx}\,dx = 2\left[\frac{x}{t}e^{tx} - \frac{1}{t^2}e^{tx}\right]_0^1 = 2\left(\frac{e^t}{t} - \frac{e^t}{t^2} + \frac{1}{t^2}\right) = \frac{2}{t^2}\left[e^t(t-1) + 1\right]

The Maclaurin series of the mgf is obtained by substituting e^t = 1 + t + t²/2! + t³/3! + t⁴/4! + ...:

m(t) = \frac{2}{t^2}\left[\left(1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} + \dots\right)(t-1) + 1\right] = \frac{2}{t^2}\left[\frac{t^2}{2} + \frac{t^3}{3} + \frac{t^4}{8} + \dots\right] = 1 + \frac{2t}{3} + \frac{t^2}{4} + \dots

Then,

\frac{dm(t)}{dt}\Big|_{t=0} = \frac{2}{3} = \text{mean}, \qquad \frac{d^2 m(t)}{dt^2}\Big|_{t=0} = \frac{1}{2} = E(X^2)
Var(X) = \frac{1}{2} - \frac{4}{9} = \frac{1}{18}
2. Given the probability distribution f(x) = 2\left(\frac{1}{3}\right)^x, x = 1, 2, 3, \dots, find:
a. the moment generating function;
b. the mean and the variance of the distribution using (a).
Solution:

m(t) = \sum_{x=1}^{\infty} e^{tx}\,2\left(\frac{1}{3}\right)^x = 2\sum_{x=1}^{\infty}\left(\frac{e^t}{3}\right)^x = \frac{2e^t}{3 - e^t}

To obtain the mean and variance, take the derivatives of m(t) as it stands (no need for
the Maclaurin series) and evaluate them at zero.
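The differentiation can be delegated to a computer algebra system. A sketch using sympy (an added illustration; sympy is assumed to be available):

```python
import sympy as sp

t = sp.symbols('t')
m = 2 * sp.exp(t) / (3 - sp.exp(t))   # mgf of f(x) = 2*(1/3)**x, x = 1, 2, ...

mean = sp.diff(m, t, 1).subs(t, 0)    # first moment about the origin
EX2 = sp.diff(m, t, 2).subs(t, 0)     # second moment about the origin
var = sp.simplify(EX2 - mean**2)

print(mean, EX2, var)                 # 3/2, 3, 3/4
```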

Note:
1. Taking higher-order derivatives of the moment generating function and evaluating the
resulting function at the origin (t = 0) generates all higher-order moments about the
origin of a random variable.
2. There is a one-to-one correspondence between the probability density function of a
random variable and its moment generating function, provided the latter exists.

3. Special Probability Distributions and Densities
3.1 Some Special Probability Distributions: The Case of a Discrete Random Variable
3.1.1 The Bernoulli distribution
A random variable with only two outcomes (0 = failure and 1 = success) is known as a
Bernoulli rv.
Let P(success) = p, implying P(failure) = 1 − p. Defining the rv X by X(success) = 1 and
X(failure) = 0, then P(X = 1) = p and P(X = 0) = 1 − p. That is,
Definition: An rv X has a Bernoulli distribution, and is referred to as a Bernoulli rv, iff
its probability distribution is given by:

f(x; p) = \begin{cases} p^x(1-p)^{1-x}, & x = 0, 1 \text{ and } 0 \le p \le 1 \\ 0, & \text{elsewhere} \end{cases}

E(X) = \sum x f(x) = 0\,p^0(1-p)^1 + 1\,p(1-p)^0 = p
Var(X) = \sum x^2 f(x) - [E(X)]^2 = 0^2(1-p) + 1^2 p - p^2 = p(1-p)
3.1.2 The Binomial Distribution
The binomial distribution arises out of repeated Bernoulli trials. There are only two
possible outcomes on each trial: success and failure. Such naming does not imply one
outcome is good and the other is bad. Let p be the probability of a success in a given
trial and 1 − p the probability of failure.
Assumptions:
1. the probability of success is the same for each trial;
2. there are n independent trials (i.e., the outcome of one trial does not affect the
outcome of any other trial);
3. the outcomes are mutually exclusive;
4. the random variable of the binomial distribution is the result of counts; that is, we
count the number of successes in the total number of trials.
Definition: An rv X has a binomial distribution, and is known as a binomial rv, iff its
probability distribution is given by:

f(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \dots, n

where n = the number of independent trials and p = the probability of success, and

\binom{n}{x} = \frac{n!}{x!(n-x)!} \quad \text{("n combination x")}

Consider the case of 4 trials (n = 4), denoting success by S and failure by F.
1) A single success can occur in any one of the following four ways:
(SFFF, FSFF, FFSF, FFFS). Each of these has probability p(1 − p)³; therefore

f(1; 4, p) = 4p(1-p)^3 = \binom{4}{1} p(1-p)^3

2) Two successes can occur in six distinct ways:
(SSFF, SFSF, SFFS, FSSF, FSFS, FFSS). Each of these has probability p²(1 − p)²; thus,
using the factorial notation,

f(2; 4, p) = \binom{4}{2} p^2(1-p)^2 = \frac{4!}{(4-2)!\,2!}\,p^2(1-p)^2 = 6p^2(1-p)^2

By a similar argument,

f(3; 4, p) = \binom{4}{3} p^3(1-p) = \frac{4!}{(4-3)!\,3!}\,p^3(1-p) = 4p^3(1-p)

The general form of the probability function of the binomial distribution is given by

f(x; n, p) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x}, & x = 0, 1, 2, \dots, n \\ 0, & \text{otherwise} \end{cases}

Note that \sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = (p + 1 - p)^n = 1; that is, the
probabilities sum to one.
For a random variable having a binomial distribution with parameters n and p:
The moment generating function:

m(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx}\binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1-p)^{n-x} = \left(pe^t + (1-p)\right)^n

Then,

\frac{dm(t)}{dt} = n\left(pe^t + (1-p)\right)^{n-1} pe^t \quad\Rightarrow\quad \frac{dm(t)}{dt}\Big|_{t=0} = np

so E(X) = m'(0) = np, and

\frac{d^2 m(t)}{dt^2} = n(n-1)\left(pe^t + (1-p)\right)^{n-2}(pe^t)^2 + n\left(pe^t + (1-p)\right)^{n-1} pe^t
\frac{d^2 m(t)}{dt^2}\Big|_{t=0} = n(n-1)p^2 + np

Then E(X²) = n(n − 1)p² + np, and

Var(X) = E(X^2) - [E(X)]^2 = n(n-1)p^2 + np - (np)^2 = n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1-p)

Alternatively,

E(X) = \sum_{x=0}^{n} x\binom{n}{x} p^x(1-p)^{n-x} = \sum_{x=1}^{n} x\,\frac{n!}{x!(n-x)!}\,p^x(1-p)^{n-x} = np\sum_{x=1}^{n}\binom{n-1}{x-1} p^{x-1}(1-p)^{n-x} = np

Example
Suppose a manufacturer of TV tubes draws a random sample of 10 tubes. The
production process is such that the probability that a single TV tube, selected at random,
is defective is 10 percent. Calculate the probability of finding
a) exactly 3 defective tubes;
b) no more than 2 defective tubes.
a) Note n = 10, x = 3, p = 0.1, and q = 0.9; therefore

P(X = 3) = \binom{10}{3}(0.1)^3(0.9)^7

b) P(X \le 2) = \binom{10}{0}(0.9)^{10} + \binom{10}{1}(0.1)^1(0.9)^9 + \binom{10}{2}(0.1)^2(0.9)^8

Exercises
1. A company that markets brand A cola drink claims that 65 percent of all residents of
a certain area prefer its brand to brand B. The company that makes brand B employs an
independent market research consultant to test the claim. The consultant takes a random
sample of 25 persons and decides in advance to reject the claim if fewer than 12 people
prefer brand A. What is the probability that the market researcher will make the error of
rejecting the claim even though it is correct?

2. An English teacher in Flen 101 gives a test consisting of 20 multiple choice questions
with four possible answers to each, of which only one is correct. One of the students,
who has not been studying, decides to check off answers at random. What is the
probability that he will get half of the questions right?

3. An owner of a mountain resort has 15 rooms available for rent, and they are rented
independently. The probability that any one of them will be rented for a single night is
0.8. Compute the probability that at least 12 rooms will be rented in a single night.

3.1.3 The hypergeometric distribution

The objective is to obtain a formula analogous to that of the binomial distribution, but
for the case where the trials are not independent (i.e., selection is without replacement).
This implies that if the population size is small, the probability of success is not constant
across trials. Since one of the criteria for using the binomial distribution is that the
probability of success is the same from trial to trial, it is not applicable here; in this case,
we use the hypergeometric distribution. Conditions: sampling must be without
replacement, the population size (N) is finite, and the sample size (n) is greater than 5%
of the total population.

Suppose a set of N elements (a finite population), of which M are successes and N − M
are failures. Here again we are interested in the probability of getting x successes in n
trials (the sample size), but now the choice of n out of N elements is without replacement.
The number of ways of choosing x successes out of the total of M successes is \binom{M}{x},
and the number of ways of choosing n − x failures out of the total of N − M failures is
\binom{N-M}{n-x}. Then the number of ways of choosing x successes and n − x failures is
\binom{M}{x}\binom{N-M}{n-x}. The number of ways of choosing n elements (the sample)
out of the N elements (the population) is \binom{N}{n}.
The probability of getting x successes in n trials is therefore given by:

\frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}

Definition: A random variable X has a hypergeometric distribution, and is known as a
hypergeometric rv, iff its probability distribution is given by:

f(x; n, N, M) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \dots, \text{ with } x \le M \text{ and } n - x \le N - M

The mean and variance of the hypergeometric distribution are:

E(X) = \frac{nM}{N} \qquad \text{and} \qquad Var(X) = \frac{nM(N-M)(N-n)}{N^2(N-1)}
Example 1: Suppose an electronic component factory ships components in lots of 100,
of which 10 are defective. A quality controller draws a sample of 5 to test. What is the
probability that two of the five are defective?

f(2; 5, 100, 10) = \frac{\binom{10}{2}\binom{90}{3}}{\binom{100}{5}} \approx 0.0702
Example 2: An urn contains 5 black and 7 red balls. Two balls are selected at random
without replacement.
Let X = the number of black balls in the selected sample of 2 balls, so X = {0, 1, 2}.
Question: Find P(X = x).
a) The total number of ways of selecting two balls from an urn containing 12 balls is \binom{12}{2}.
b) The number of ways of selecting x black balls is \binom{5}{x}.
c) The number of ways of selecting 2 − x red balls is \binom{7}{2-x}.

\Rightarrow P(X = x) = \frac{\binom{5}{x}\binom{7}{2-x}}{\binom{12}{2}}

P(X = x) for x = 0, 1, 2 is as given below:

P(X = 0) = \frac{\binom{5}{0}\binom{7}{2}}{\binom{12}{2}} = \frac{21}{66} \approx 0.32
P(X = 1) = \frac{\binom{5}{1}\binom{7}{1}}{\binom{12}{2}} = \frac{35}{66} \approx 0.53
P(X = 2) = \frac{\binom{5}{2}\binom{7}{0}}{\binom{12}{2}} = \frac{10}{66} \approx 0.15

3.1.4 The Poisson distribution: probability of rare events

Suppose the number of trials is very large (n → ∞) and the probability of success is
very small (p → 0). In such a case, calculating binomial probabilities directly is very
difficult. Fortunately, we can approximate the binomial distribution using the Poisson
distribution, named after the French mathematician Siméon Poisson.

Consider the binomial distribution f(x; n, p) with the following holding:
(a) n → ∞, and
(b) p → 0, but in such a way that
(c) np remains constant, letting np = λ for λ > 0.
Thus, the probability of success is very small, but the number of trials is very large; for
example, the probability of a disease is small but the number of patients is large. We are
interested in the distribution in this case. Substituting p = λ/n,

f(x; n, p) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\,p^x(1-p)^{n-x}
           = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\cdot\frac{\lambda^x}{n^x}\left(1 - \frac{\lambda}{n}\right)^{n-x}
           = \frac{\lambda^x}{x!}\cdot 1\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{x-1}{n}\right)\left(1-\frac{\lambda}{n}\right)^{n-x}

If we let n → ∞:
(a) \lim_{n\to\infty}\left(1 - \frac{i}{n}\right) = 1 for i = 1, 2, …, x − 1, and
(b) \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n-x} = e^{-\lambda}

Definition: A random variable X has a Poisson distribution, and is known as a Poisson
rv, iff its probability distribution is given by:

f(x; \lambda) = \begin{cases} \frac{\lambda^x}{x!}e^{-\lambda}, & x = 0, 1, 2, \dots,\ \lambda > 0 \\ 0, & \text{elsewhere} \end{cases}

Note that:
1. f(x) > 0, and
2. \sum_{x=0}^{\infty} f(x) = e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = e^{-\lambda}e^{\lambda} = 1,
since \sum_{x=0}^{\infty}\frac{\lambda^x}{x!} is the Maclaurin expansion of e^{\lambda}.

If X is a random variable and X ~ Poi(λ), then its moment generating function is:

m(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} f(x) = \sum_{x=0}^{\infty} e^{tx}\,\frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!}

Since \sum_{x=0}^{\infty}\frac{z^x}{x!} is the Maclaurin expansion of e^{z}, with z = \lambda e^t,

m(t) = e^{-\lambda}\,e^{\lambda e^t} = e^{\lambda(e^t - 1)}

\frac{dm(t)}{dt} = \lambda e^t\,e^{\lambda(e^t-1)} \quad\Rightarrow\quad \frac{dm(t)}{dt}\Big|_{t=0} = \lambda = E(X)

\frac{d^2 m(t)}{dt^2} = (\lambda e^t)^2\,e^{\lambda(e^t-1)} + \lambda e^t\,e^{\lambda(e^t-1)} \quad\Rightarrow\quad \frac{d^2 m(t)}{dt^2}\Big|_{t=0} = \lambda^2 + \lambda = E(X^2)

Var(X) = \lambda^2 + \lambda - \lambda^2 = \lambda

Applications:
- analysis of accidents;
- analysis of waiting at service-giving centres;
- analysis of telephone calls per hour;
- defective parts in outgoing shipments.

Example: A manufacturer of light bulbs knows that 2% of his production is defective.
Find the probability that a box of 100 bulbs contains at most 3 defectives.
Let X = the number of defectives in a sample of 100 bulbs and p = 0.02; find P(X ≤ 3).
Using the binomial distribution,

f(x) = \binom{100}{x}(0.02)^x(0.98)^{100-x}, \qquad P(X \le 3) = \sum_{x=0}^{3} f(x)
= (0.98)^{100} + 100(0.02)(0.98)^{99} + 4950(0.02)^2(0.98)^{98} + 161700(0.02)^3(0.98)^{97}
\approx 0.1326 + 0.2707 + 0.2734 + 0.1823 = 0.8590

We could also use the Poisson distribution to approximate this result as follows:
n = 100, p = 0.02, implying np = λ = 2. Thus

f(x) = \frac{2^x e^{-2}}{x!}
P(X \le 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = e^{-2}\left(\frac{2^0}{0!} + \frac{2^1}{1!} + \frac{2^2}{2!} + \frac{2^3}{3!}\right) = 0.1353 \times 6.3333 \approx 0.8571
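The quality of the approximation is easy to inspect in code (an added sketch):

```python
from math import comb, exp, factorial

n, p = 100, 0.02
lam = n * p  # Poisson parameter lambda = np = 2

binom = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(4))
poisson = sum(lam**x * exp(-lam) / factorial(x) for x in range(4))

print(binom)    # ~0.8590  (exact binomial P(X <= 3))
print(poisson)  # ~0.8571  (Poisson approximation)
```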

3.2 Some Special Probability Densities: The Case of a Continuous rv
3.2.1 The continuous uniform distribution
It is the simplest of the special probability densities.

Definition: A random variable X is said to have a continuous uniform distribution with
parameters α and β (X ~ U(α, β)) on the interval [α, β] iff its probability density
function is given by:

f(x) = \begin{cases} \frac{1}{\beta - \alpha}, & \alpha \le x \le \beta \\ 0, & \text{otherwise} \end{cases}

E(X) = \int_{\alpha}^{\beta} x f(x)\,dx = \int_{\alpha}^{\beta} \frac{x}{\beta-\alpha}\,dx = \frac{x^2}{2(\beta-\alpha)}\Big|_{\alpha}^{\beta} = \frac{\beta^2 - \alpha^2}{2(\beta-\alpha)} = \frac{(\beta-\alpha)(\beta+\alpha)}{2(\beta-\alpha)} = \frac{\alpha + \beta}{2}

Similarly,

E(X^2) = \int_{\alpha}^{\beta} x^2 f(x)\,dx = \int_{\alpha}^{\beta} \frac{x^2}{\beta-\alpha}\,dx = \frac{x^3}{3(\beta-\alpha)}\Big|_{\alpha}^{\beta} = \frac{\beta^3 - \alpha^3}{3(\beta-\alpha)} = \frac{(\beta-\alpha)(\beta^2 + \alpha\beta + \alpha^2)}{3(\beta-\alpha)} = \frac{\alpha^2 + \alpha\beta + \beta^2}{3}

Therefore,

Var(X) = E(X^2) - [E(X)]^2 = \frac{\alpha^2 + \alpha\beta + \beta^2}{3} - \left(\frac{\alpha+\beta}{2}\right)^2 = \frac{4\alpha^2 + 4\alpha\beta + 4\beta^2 - 3\alpha^2 - 6\alpha\beta - 3\beta^2}{12} = \frac{\alpha^2 - 2\alpha\beta + \beta^2}{12} = \frac{(\beta-\alpha)^2}{12}

The moment generating function of a uniformly distributed random variable X is given by:

m(t) = E(e^{tX}) = \int_{\alpha}^{\beta} e^{tx}\,\frac{1}{\beta-\alpha}\,dx = \frac{1}{t(\beta-\alpha)}\,e^{tx}\Big|_{\alpha}^{\beta} = \frac{e^{t\beta} - e^{t\alpha}}{t(\beta-\alpha)}

Exercise: For a uniformly distributed random variable X over the interval [α, β], show,
using the moment generating function, that

E(X) = \frac{\alpha+\beta}{2}, \qquad E(X^2) = \frac{\alpha^2 + \alpha\beta + \beta^2}{3}

and hence deduce its variance.
Note: first find the Maclaurin expansions of e^{t\alpha} and e^{t\beta} separately.

3.2.2 The normal distribution

One of the most important distributions in the study of probability and statistics is the
normal (Gaussian) distribution, because most hypothesis tests in common use assume
that the random variable being considered has an underlying normal distribution.

Definition: A random variable X with parameters μ (μ any real number) and σ (σ > 0)
is said to follow the normal distribution if its probability density function is given by:

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, \quad -\infty < x < \infty

The distribution is usually written as X ~ N(μ, σ²), and the values of the two
parameters are unknown. Though tedious, it is possible to show that for the normal
distribution:
i) f(x) > 0 for all x ∈ ℝ;
ii) \int_{-\infty}^{\infty} f(x)\,dx = 1.

The normal distribution has important characteristics:

(1) the curve has a single peak;

(2) it is bell-shaped;

(3) the mean (average) lies at the center of the distribution and the distribution is
symmetrical around the mean;

(4) the two tails of the distribution extend indefinitely and never touch the
horizontal axis;

(5) the shape of the distribution is determined by its Mean (µ) and Standard
Deviation (σ).

The shape of the normal distribution shows that observations occur mostly in the
neighbourhood of the mean, which is equal to the median and the mode of the
distribution. Their frequency decreases as they move away from the mean.
Approximately 68% of the area under the curve lies in the region [μ − σ, μ + σ], 95% in
[μ − 2σ, μ + 2σ], and 99.7% in [μ − 3σ, μ + 3σ]. That is, P(μ − σ < X < μ + σ) = 0.68,
P(μ − 2σ < X < μ + 2σ) = 0.95, and P(μ − 3σ < X < μ + 3σ) = 0.997.
[Graph: the bell-shaped normal density, symmetric about μ.]
Moment Generating Function of a Normal Random Variable
Let X ~ N(μ, σ²). Then

m(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx

Since (x − μ)² = x² − 2μx + μ², the exponent of the integrand is

-\frac{1}{2\sigma^2}\left(x^2 - 2\mu x + \mu^2 - 2t\sigma^2 x\right)

Completing the square,

x^2 - 2(\mu + t\sigma^2)x + \mu^2 = \left(x - (\mu + t\sigma^2)\right)^2 + \mu^2 - (\mu + t\sigma^2)^2

so, writing μ* = μ + tσ²,

m(t) = e^{-\frac{1}{2\sigma^2}\left[\mu^2 - (\mu + t\sigma^2)^2\right]} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu^*)^2}{2\sigma^2}}\,dx

The integral equals 1, as the integrand is the N(μ + tσ², σ²) density (the total
probability is one). Hence

m(t) = e^{-\frac{1}{2\sigma^2}\left[\mu^2 - \mu^2 - 2\mu t\sigma^2 - t^2\sigma^4\right]} = e^{\mu t + \frac{1}{2}t^2\sigma^2}

So

\frac{dm(t)}{dt} = (\mu + t\sigma^2)\,e^{\mu t + \frac{1}{2}t^2\sigma^2} \quad\Rightarrow\quad \frac{dm(t)}{dt}\Big|_{t=0} = \mu = E(X)

\frac{d^2 m(t)}{dt^2} = \sigma^2\,e^{\mu t + \frac{1}{2}t^2\sigma^2} + (\mu + t\sigma^2)^2\,e^{\mu t + \frac{1}{2}t^2\sigma^2} \quad\Rightarrow\quad \frac{d^2 m(t)}{dt^2}\Big|_{t=0} = \sigma^2 + \mu^2 = E(X^2)

\Rightarrow Var(X) = E(X^2) - (E(X))^2 = \sigma^2

Note: if X ~ N(0, 1), then m(θ) = exp[(1/2)θ²].

3.2.2.1 The standard normal distribution

A special case of the normal distribution arises when μ = 0 and σ² = 1. This is known
as the standard normal distribution, and its density function is free of the parameters:

f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty

Proposition: If X ~ N(μ, σ²) and Z = \frac{X-\mu}{\sigma}, then Z ~ N(0, 1). (Prove it!)
The standard normal is symmetric around the origin and bell-shaped.
Its CDF is given by

\Phi(x) = F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}\,dt

The integral does not have a closed-form solution but requires numerical integration;
the values of the standard normal are tabulated in most statistics books.
Proof (of the proposition): let Φ(z) be the distribution function of Z and F(x) the
distribution function of X. Then

\Phi(z) = P(Z \le z) = P\left(\frac{X-\mu}{\sigma} \le z\right) = P(X \le \mu + z\sigma) = \int_{-\infty}^{\mu + z\sigma} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx

\Rightarrow \phi(z) = \frac{d}{dz}\Phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}

Hence Z ~ N(0, 1).

\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}u^2}\,du is the cumulative
distribution function of a standard normal random variable, tabulated for the positive
values of z in any standard statistics textbook.
We use the fact that P(Z ≤ −z) = P(Z ≥ z) = 1 − P(Z ≤ z) to calculate the probability of
Z lying below the mean value.

Example: Let Y be the marks obtained by students in an examination, and suppose the
following probabilities are given:
P(Y ≥ 60) = 0.2 and P(Y < 40) = 0.3.
Find the mean and the standard deviation of the marks. Assuming Y ~ N(μ, σ²),

P(Y \ge 60) = P\left(\frac{Y-\mu}{\sigma} \ge \frac{60-\mu}{\sigma}\right) = P\left(Z \ge \frac{60-\mu}{\sigma}\right) = 0.2, \quad Z \sim N(0, 1)

From the standard normal table we obtain z = 0.84 as the value associated with a
right-tail probability of 0.2: we first find 0.5 − 0.2 = 0.3; around 0.3 the table shows
0.2995 and 0.3023, and 0.2995 is closer to 0.3; from 0.2995 we read horizontally to the
left and then vertically, getting 0.84.
Similarly, we have

P(Y < 40) = P\left(Z < \frac{40-\mu}{\sigma}\right) = 0.3

From the table we obtain z = −0.52 as the value associated with a left-tail probability
of 0.3 (0.5 − 0.3 = 0.2; the value associated with 0.2 is 0.52, taken negative because it
lies to the left of zero).
Therefore,

\frac{60-\mu}{\sigma} = 0.84 \Rightarrow \mu + 0.84\sigma = 60 \qquad \text{and} \qquad \frac{40-\mu}{\sigma} = -0.52 \Rightarrow \mu - 0.52\sigma = 40

Substituting μ = 60 − 0.84σ from the first equation into the second:
60 − 0.84σ − 0.52σ = 40, so 1.36σ = 20, giving σ = 14.71 and
μ = 60 − 0.84 × 14.71 = 47.64.
Note that the standard normal distribution:
1. is symmetric about z = 0, i.e., f(−z) = f(z);
2. attains its maximum value at z = 0;
3. the maximum value of the function is 1/\sqrt{2\pi};
4. mean = mode = median.

3.2.3 The Normal Approximation to the Binomial Distribution
Let X ~ B(n, p); then E(X) = np and Var(X) = σ² = npq. Let

Y = \frac{X - np}{\sqrt{npq}}

In the limit, i.e., as n gets larger, it can be shown that Y ~ N(0, 1) approximately. Under
such circumstances we say the asymptotic distribution of the binomial distribution is
N(0, 1), which is denoted by \frac{X-np}{\sqrt{npq}} \overset{A}{\sim} N(0, 1). An
equivalent statement is X \overset{A}{\sim} N(np, npq).
Example: Find the probability of getting 6 heads and 10 tails in 16 tosses of a balanced
coin, and also use the normal distribution to approximate this probability.
Let X = the number of heads in n = 16 tosses of a balanced coin. Then

P(X = 6) = \binom{16}{6}(0.5)^6(0.5)^{10} = \frac{16!}{6!\,10!}\left(\frac{1}{2}\right)^{16} = \frac{8008}{65536} \approx 0.1222

P(X = 6) is approximated by the area of the normal distribution between 5.5 and 6.5.
With μ = np = 8 and σ² = np(1 − p) = 4, so σ = 2:

P(X = 6) \approx P(5.5 < X < 6.5) = P\left(\frac{5.5-8}{2} < Z < \frac{6.5-8}{2}\right) = P(-1.25 < Z < -0.75) = \Phi(-0.75) - \Phi(-1.25)

Using Φ(−z) = 1 − Φ(z): Φ(−0.75) = 1 − 0.7734 = 0.2266 and Φ(−1.25) = 1 − 0.8944 = 0.1056, so

P(X = 6) \approx 0.2266 - 0.1056 = 0.121
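The same comparison in Python (an added sketch), reusing the phi helper from above:

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

exact = comb(16, 6) * 0.5**16                                # binomial P(X = 6)
mu, sigma = 16 * 0.5, sqrt(16 * 0.5 * 0.5)                   # np = 8, sqrt(npq) = 2
approx = phi((6.5 - mu) / sigma) - phi((5.5 - mu) / sigma)   # continuity correction

print(exact, approx)   # ~0.1222 vs ~0.1210
```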

Exercise: Let X be a random variable with probability density function given by:

f(x) = \begin{cases} cxe^{-2x}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}

a) Find the value of c that makes f(x) a proper probability density function.
b) Give the mean and variance of the random variable X.

3.2.4 The Gamma Distribution

Definition: The function

\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \quad \alpha > 0

is known as the gamma function.

Note that:
i) Γ(0) is undefined;
ii) for α > 0, Γ(α + 1) = αΓ(α);
iii) Γ(1) = 1;
iv) for integer α ≥ 1, Γ(α + 1) = α!;
v) Γ(1/2) = √π.
Definition: A random variable X has the gamma distribution with parameters α and β
(X ~ G(α, β)) iff its probability density function is given by:

f(x) = \begin{cases} \dfrac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, & 0 < x < \infty,\ \alpha > 0,\ \beta > 0 \\ 0, & \text{elsewhere} \end{cases}

The moment generating function of a gamma random variable is:
m(t ) = E  e tx 
  −1 −x 
x e
= 0 e   ( ) dx
tx

   ( ) 
−1

e
−x 
=  tx
x
 −1
e dx
0

   ( ) 
−1 1
= x
  −1 − x( −t )
e  dx
0

   ( ) 
−1 −x
= x
  −1
e   (1− t  ) dx
0

=    ( ) 

−1
    ( )
1 − t  
1
(1 − t  )
−
= =
(1 − t  )

(1 − t  )
−
 m(t ) =
So E ( X ) = M (1)
X ( 0) = 
E ( X 2 ) = M (2)
X ( 0) =  2 2 +   2

 Var ( X ) = E ( X 2 ) −  E ( X ) 
2
= 2

Special Cases of the Gamma Distribution

i) When α = 1, the gamma distribution reduces to the exponential distribution
(exponential density), given by:

f(x) = \begin{cases} \frac{1}{\beta}e^{-x/\beta}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}

with E(X) = β and Var(X) = β².
m(t) = E(e^{tX}) = \int_0^{\infty} e^{tx}\,\frac{e^{-x/\beta}}{\beta}\,dx = \frac{1}{\beta}\int_0^{\infty} e^{-x\left(\frac{1}{\beta}-t\right)}\,dx = \frac{1}{\beta}\cdot\frac{1}{\frac{1}{\beta}-t} = \frac{1}{1-\beta t} = (1-\beta t)^{-1}, \quad t < 1/\beta

\frac{dm(t)}{dt} = \beta(1-\beta t)^{-2} \quad\Rightarrow\quad \frac{dm(t)}{dt}\Big|_{t=0} = \beta = E(X)
\frac{d^2 m(t)}{dt^2} = 2\beta^2(1-\beta t)^{-3} \quad\Rightarrow\quad \frac{d^2 m(t)}{dt^2}\Big|_{t=0} = 2\beta^2 = E(X^2)
Var(X) = 2\beta^2 - \beta^2 = \beta^2
ii) When α = ν/2 and β = 2, the gamma distribution simplifies to the chi-square
distribution with ν degrees of freedom, given by:

f(x) = \begin{cases} \dfrac{x^{\nu/2-1} e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}

The moment generating function of the chi-square distribution is

m(t) = (1-2t)^{-\nu/2}, \qquad E(X) = \nu, \qquad Var(X) = 2\nu

iii) If 1/β = λ, then the exponential distribution can be rewritten as:

f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0,\ \lambda > 0 \\ 0, & \text{elsewhere} \end{cases}
Exercise: For a random variable X possessing an exponential distribution, show that
E(X) = 1/λ and Var(X) = 1/λ².
The cumulative distribution function of an exponential distribution is F(x) = 1 − e^{−λx}.
Proof: Given f(x) = λe^{−λx} for x ≥ 0 (and f(x) = 0 for x < 0),

F(x) = \int_{-\infty}^{x} f(t)\,dt = \underbrace{\int_{-\infty}^{0} f(t)\,dt}_{=0} + \int_0^{x} f(t)\,dt = \int_0^{x} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = -e^{-\lambda x} - (-1) = 1 - e^{-\lambda x}
Note that the exponential distribution is used to model the waiting time between two
events, say the time period elapsing between two telephone calls. Recall that the
Poisson distribution defines the probability for the number of times that an event takes
place per unit time: the probability distribution of the random variable X used to
forecast such occurrences is given by

f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots

Recall that λ = E(X) is the expected number of occurrences of the event per unit of time.
Let Y = the number of occurrences in t units of time (t > 0); then E(Y) = tλ,
i.e., X ~ Poi(λ) and Y ~ Poi(tλ).
Let Z = the time elapsing before the first occurrence of the event (z > 0).
Then F(z) = P(Z ≤ z) = 1 − P(Z > z), and P(Z > z) = P(no occurrence of the event in a
time interval of length z):

P(Z > z) = e^{-\lambda z} \quad\Rightarrow\quad F_Z(z) = 1 - e^{-\lambda z} \quad\Rightarrow\quad f_Z(z) = \frac{dF_Z(z)}{dz} = \lambda e^{-\lambda z}

Hence,

E(Z) = \int_0^{\infty} z f_Z(z)\,dz = \int_0^{\infty} z\,\lambda e^{-\lambda z}\,dz = \frac{1}{\lambda}

Example: Suppose that the life of a light bulb has an exponential distribution with
λ = 1/400 = 0.0025. What is the probability that 4 out of 5 bulbs chosen at random
have a life in excess of 500 hours?
Let X = the life of a bulb; then p = P(X ≥ 500) is the probability of observing that a
bulb lasts for at least 500 hours. Thus, with f(x) = λe^{−λx} for x ≥ 0 (and 0 otherwise),

p = P(X \ge 500) = \int_{500}^{\infty} 0.0025\,e^{-0.0025x}\,dx = \left[-e^{-0.0025x}\right]_{500}^{\infty} = e^{-1.25} \approx 0.2865

and the probability that 4 out of 5 bulbs have X > 500 is \binom{n}{y} p^y q^{n-y} with
n = 5, y = 4, and p = 0.2865; thus the required probability is 5(0.2865)⁴(0.7135) ≈ 0.024.
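A final numerical check of both steps (an added sketch):

```python
from math import comb, exp

lam = 0.0025
p = exp(-lam * 500)                  # P(X >= 500) for an exponential rv
prob = comb(5, 4) * p**4 * (1 - p)   # binomial: exactly 4 of 5 bulbs survive

print(p, prob)   # ~0.2865, ~0.024
```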
