K. Sam Shanmugan, Arthur M. Breipohl-Random Signals - Detection, Estimation and Data Analysis-Wiley (1988) PDF
K. Sam Shanmugan, Arthur M. Breipohl-Random Signals - Detection, Estimation and Data Analysis-Wiley (1988) PDF
K. Sam Shanmugan, Arthur M. Breipohl-Random Signals - Detection, Estimation and Data Analysis-Wiley (1988) PDF
DETECTION,
ESTIMATION
AND
DATA ANALYSIS
K. Sam Shanmugan
University o f Kansas
Arthur M. Breipohl
University o f Oklahoma
WILEY
10 9 8 7
CONTENTS
CHAPTER 1 Introduction
1.3 References 7
2.1 Introduction 8
2.2 Probability 9
2.2.1 Set Definitions 9
2.2.2 Sample Space 12
2.2.3 Probabilities of Random Events 12
2.2.4 Useful Laws of Probability 14
2.2.5 Joint, Marginal, and Conditional Probabilities 15
2.9 Summary 95
A
2.10 References 96
A
2.11 Problems 97
t
s
CHAPTER 3 Random Processes and Sequences
%
3.1 Introduction 111
i
viii CONTENTS
CHAPTER 8 Statistics
Summary 547
i
xii CONTENTS
APPENDIXES
A. Fourier Transforms 626
B. Discrete Fourier Transforms 628
C. Z Transforms 630
D. Gaussian Probabilities 632
E. Table of Chi-Square Distributions 633
F. Table of Student's f Distribution 637
G. Table of F Distributions 639
H. Percentage Points of Run Distribution 649
I. Critical Values of the Durbin-Watson Statistic 650
Index 651
About The Authors
more general parameters such as the autocorrelation and power spectral density
function directly from data without such a specific model.
There are several possible courses for which this book could be used as a
text:
Introduction
(
Transmitted Transmitted Received Recovered 2
sequence waveform waveform data sequence
Z ~ i a i(M ( “ “ I x xm (— “ I y i U) I " : I bi i k)
Terminal ^ Transmitter Transmission ^ Receiver
n ...1001011... link #1 #1
INTRODUCTION
Another example of a random signal is the “ noise” that one hears from an
A M radio when it is tuned to a point on the dial where no stations are broad
casting. If the speaker is replaced by an oscilloscope so that it records the output
voltage of the audio amplifier, then the trace on the oscilloscope will, in the
course of time, trace an irregular curve that does not repeat itself precisely and
cannot be predicted.
Signals or waveforms such as the two examples presented before are called
random signals. Other examples of random signals are fluctuations in the in
stantaneous load in a power system, the fluctuations in the height o f ocean waves
at a given point, and the output o f a microphone when someone is speaking
into it. Waveforms that exhibit random fluctuations are called either signals or
noise. Random signals are waveforms that contain some information, whereas
noise that is also random is usually unwanted and interferes with our attempt to
extract information.
Random signals and noise are described by random process models, and
electrical engineers use such models to derive signal processing algorithms for
recovering information from related physical observations. Typical examples
include in addition to the recovery o f data coming over a noisy communication
channel, the estimation o f the “ trend” o f a random signal such as the instan
taneous load in a power system, the estimation of the location of an aircraft
from radar data, the estimation o f a state variable in a control system based on
noisy measurements, and the decision as to whether a weak signal is a result of
an incoming missile or is simply noise.
The earliest stimulus for the application o f probabilistic models to the physical
world were provided by physicists who were discovering and describing our
physical world by “ laws.” Most o f the early studies involved experimentation,
and physicists observed that when experiments were repeated under what were
assumed to be identical conditions, the results were not always reproducible.
Even simple experiments to determine the time required for an object to fall
through a fixed distance produced different results on different tries due to slight
changes in air resistance, gravitational anomalies, and other changes even though
4 INTRODUCTION
1.2 OUTLINE OF TH E B O O K
This book introduces the theory of random processes and its application to the
study o f signals and noise and to the analysis o f random data. After a review
of probability and random variables, three important areas are discussed:
In the first part o f the book, Chapters 2, 3, 4, and 5, we develop models for
random signals and noise. These models are used in Chapters 6 and 7 to develop
signal-processing algorithms that extract information from observations. Chap
ters 8 and 9 introduce methods o f identifying the structure of probabilistic mod-
OUTLINE OF THE BOOK 5
els, estimating the parameters o f probabilistic models, and testing the resulting
model with data.
It is assumed that the students have had some exposure to probabilities and,
hence, Chapter 2, which deals with probability and random variables, is written
as a review. Important introductory concepts in probabilities are covered thor
oughly, but briefly. M ore advanced topics that are covered in more detail include
random vectors, sequences o f random variables, convergence and limiting dis
tributions, and bounds and approximations.
In Chapters 3, 4, and 5 we present the basic theory o f random processes,
properties o f random processes, and special classes o f random processes and
their applications. The basic theory o f random processes is developed in Chapter
3. Fundamental properties o f random processes are discussed, and second-order
time domain and frequency domain models are emphasized because o f their
importance in design and analysis. Both discrete-time and continuous-time models
are emphasized in Chapter 3.
The response of systems to random input signals is covered in Chapter 4.
Time domain and frequency domain methods o f computing the response o f
systems are presented with emphasis on linear time invariant systems. The con
cept o f filtering is introduced and some examples of filter design for signal
extraction are presented.
Several useful random process models are presented in Chapter 5. The first
part o f this chapter introduces discrete time models called autoregressive moving
average (A R M A ) models which are becoming more important because of their
use in data analysis. Other types o f models for signals and noise are presented
next, and their use is illustrated through a number o f examples. The models
represent Markov processes, point processes, and Gaussian processes; once
again, these types o f models are chosen because o f their importance to electrical
engineering.
Chapters 6 and 7 make use o f the models developed in Chapter 5 for de
veloping optimum algorithms for signal detection and estimation. Consider the
problem of detecting the presence and estimating the location of an object in
space using a radar that sends out a packet of electromagnetic energy in the
direction of the target and observes the reflected waveform. We have two prob
lems to consider. First we have to decide whether an object is present and then
we have to determine its location. If there is no noise or distortion, then by
observing the peak in the received waveform we can determine the presence of
the object, and by observing the time delay between the transmitted waveform
and the received waveform, we can determine the relative distance between the
radar and the object.
tn the presence o f noise (or interference), the peaks in the received waveform
may be masked by the noise, making it difficult to detect the presence and
estimate the location of the peaks. Noise might also introduce erroneous peaks,
which might lead us to incorrect conclusions. Similar problems arise when we
attempt to determine the sequence o f binary digits transmitted over a commu
nication link. In these kinds o f problems we are interested in two things. First
of all, we might be interested in analyzing how well a particular algorithm for
6 INTRODUCTION
1.3 REFERENCES
[1] Davenport, W. B., and Root, W. L., Introduction to Random Signals and Noise,
McGraw-Hill, New York, 1958.
[2] Gauss, K. G., Theory o f Motion o f the Heavenly Bodies (translated), Dover, New
York, 1963.
[3] Kalman, R. E., “ A New Approach to Linear Filtering and Prediction Problems,”
J. Basic Eng., Vol. 82D, March 1960, pp. 35-45.
2.1 INTRODUCTION
functions and density functions are developed. We then discuss summary meas
ures -(averages or expected values) that frequently prove useful in characterizing
random variables.
Vector-valued random variables (or random vectors, as they are often re
ferred to) and methods of characterizing them are introduced in Section 2.5.
Various multivariate distribution and density functions that form the basis of
probability models for random vectors are presented.
As electrical engineers, we are often interested in calculating the response
of a system for a given input. Procedures for calculating the details of the
probability model for the output of a system driven by a random input are
developed in Section 2.6.
In Section 2.7, we introduce inequalities for computing probabilities, which
are often very useful in many applications because they require less knowledge
about the random variables. A series approximation to a density function based
on some o f its moments is introduced, and an approximation to the distribution
o f a random variable that is a nonlinear function o f other (known) random vari
ables is presented.
Convergence of sequences of random variable is the final topic introduced
in this chapter. Examples of convergence are the law of large numbers and the
central limit theorem.
PROBABILITY
null set are called finite sets. A set that is not countable is called uncountable.
A set that is not finite is called an infinite set.
A C B
or equivalently
B DA
A <1S
l C A
A C A
Set Equality. Two arbitrary sets, A and B, are called equal if and only if they
contain exactly the same elements, or equivalently,
A UB
and is the set o f all elements that belong to A or belong to B (or to both). The
union o f N sets is obtained by repeated application o f the foregoing definition
and is denoted by
N
A , U A 2 U ••■U A n U A,
and is the set o f all elements that belong to both A and B. A n B is also written
A B . The intersection o f N sets is written as
N
A i (~l A 2 (~l ■ 0 * O Aft na1
A n B = AB = l
Aj D A ; = ^ for all i, j, iA j
Commutative Laws.
A UB = B UA
A n B = B n A
Associative Laws.
(i4 U B ) U C = A U (B U C) = A U B U C
(A nB )nc = A n ( B n c ) = A fiB n c
Distributive Laws.
A n (B u C) = (a n b ) u (A n c )
AU(BnC) = (AUB)n(AUC)
12 REVIEW OF PROBABILITY AND RANDOM VARIABLES
DeMorgan’s Laws.
(A U B ) = A D B
(A n B ) = A U B
L P(S) = 1 ( 2 . 1)
(2.3)
if A, n A, = $ for i A j,
and A' may be infinite
(0 is the empty or null set)
P ( A ) = lim — (2.4)
/.-» n
For example, if a coin (fair or not) is tossed n times and heads show up nH
times, then the probability of heads equals the limiting value of nHln.
(2.5)
If we use this definition to find the probability o f a tail when a coin is tossed,
we will obtain an answer o f |. This answer is correct when we have a fair coin.
If the coin is not fair, then the classical definition will lead to incorrect values
for probabilities. We can take this possibility into account and modify the def
14 REVIEW OF PROBABILITY AND RANDOM VARIABLES
Upface 1 2 3 4 5 6 Total
Relative
Frequency .155 .159 .164 .169 .174 .179 1.000
1 1 1 1 1 1
Classical 6 6 6 6 6 6 1.000
They obtained these frequencies by calculating the excess of even over odd in
Longcor’s data and supposing that each side o f the die is favored in proportion
to the extent that is has more drilled pips than the opposite side. The 6, since
it is opposite the 1, is the most favored.*1
then
P ( A ) = P ( A n 5) = P[A fl {Ax U A ; U • • • U A n)\
= p [(a n A i) u (A n a 2) u • • • u (A n a „)]
= F ( A n ^ j ) + P (A n a 2) + • • ■ + P (A n A n) (2.10.e)
The sets A ,, A 2, . . . , A n are said to be mutuallyexclusive and exhaustive
if Equations 2.10.C and 2.10.d are satisfied.
8. = P { A l) + P { A 2A 2) + P { A lA1A 2) + ■ ■ •
( 2 . 11)
S2 o f Ez consists o f outcomes b\, b2, . . . , b„2, then the sample space S of the
combined experiment is the Cartesian product o f Si and S2. That is
S = Si x S2
= {(a „ bj): i — 1,2, , nu j = 1, 2, . . . , n-h
= £ P(A,B,) ( 2 . 12)
1=1
(2.13)
Given that the event A has occurred, we know that the outcome is in A. There
are NA outcomes in A. Now, for B to occur given that A has occurred, the
outcome should belong to A and B. There are NaB outcomes in AB. Thus, the
probability o f occurrence o f B given A has occurred is
P(B\A) - (2.14)
One can show that P(B\A) as defined by Equation 2.14 is a probability measure,
that is, it satisfies Equations 2.1, 2.2, and 2.3.
P { A ) = 2 F(A|B/)P (B /) (2.18)
18 REVIEW OF PROBABILITY AND RANDOM VARIABLES
EXAMPLE 2.2.
Class of Defect
Si — B2 — B3 — 64 = 65 —
Manufacturer none critical serious minor incidental Totals
A4, 124 6 3 1 6 140
m2 145 2 4 0 9 160
m3 115 1 2 1 1 120
M4 101 2 0 5 2 110
Totals 485 11 9 7 18 530
What is the probability o f a component selected at random from the 530 com
ponents (a) being from manufacturer M2 and having no defects, (b) having a
critical defect, (c) being from manufacturer M,, (d) having a critical defect given
the component is from manufacturer M2, (e) being from manufacturer M u given
it has a critical defect?
SOLUTION:
(a) This is a joint probability and is found by assuming that each component
is equally likely to be selected. There are 145 components from M 2
having no defects out of a total of 530 components. Thus
/><«,«,) =i
(b) This calls for a marginal probability.
P ( B 2) = P ( M i B2) + P (M 2B2) + P (M 3B2) + P ( M 4B2)
6 2 1 2 11
~ 530 + 530 + 530 + 530 ~~ 530
Note that P ( B 2) can also be found in the bottom margin o f the table,
that is
PROBABILITY 19
(d) This conditional probability is found by the interpretation that given the
component is from manufacturer M2, there are 160 outcomes in the
space, two o f which have critical defects. Thus
P (B 2\M2) =
160
2
P (B 2M2) 530 2
P (B 2\M2)
P (M 2) 160 160
530
(e)
p {m ,\b 2) = Y\
Bayes’ Rule. Sir Thomas Bayes applied Equations 2.15 and 2.18 to arrive at
the form
P(A|Z?,.)P(fi,)
P(Bj\A) = (2.19)
£ P(A\B^P{B,)
i= i
EXAMPLE 2.3.
one is received as a one is .90. We also assume the probability a zero is transmitted
is .4. Find
SOLUTION: Defining
A — one transmitted
A = zero transmitted
B = one received
B = zero received
P ( B ) = P (B \ A )P (A ) + P {B \ A )P (A )
= .90(.6) + ,05(.4)
= .56.
(b) Using Bayes’ rule, Equation 2.19
P(A\B) = = (-90X-6) _ 27
' P (B ) .56 28
or when
PiA^Bj) = P ( A t) (2.20.b)
RANDOM VARIABLES 21
Equation 2.20.a implies Equation 2.20.b and conversely. Observe that statistical
independence is quite different from mutual exclusiveness. Indeed, if A t and Bj
are mutually exclusive, then P{AiBj) = 0 by definition.
2.3 R A N D O M V A R IA B L E S
x~\T) = {x e S :Z (x ) e t }
P(X = x) = P (X:Z(X) = x)
P (Z < x) = P (X:Z(X) < x\
P{xx < X s x2) = P {X:jc, < Z(X) s x j
22 REVIEW OF PROBABILITY A N D RANDOM VARIABLES
F ig u r e 2 .1 M a p p i n g o f t h e s a m p l e s p a c e b y a r a n d o m v a r ia b le .
EXAMPLE 2.4.
Consider the toss o f one die. Let the random variable X represent the value of
the up face. The mapping performed by X is shown in Figure 2.1. The values
of the random variable are 1, 2, 3, 4, 5, 6.
1. Fx {-* > ) = 0
2. Fx (« ) = 1
3. lim Fx {x + e) = Fx (x)
€'—
*0
€>0
4. Fx {xi) s F x ( x 2) if Xi < x 2
5. P [ x x < X £ x 2] = F x { x 2) - Fx (*,)
EXAMPLE 2.5.
Consider the toss o f a fair die. Plot the distribution function of X where X is a
random variable that equals the number o f dots on the up face.
RANDOM VARIABLES 23
1 1 1 1 1 1 1 1 1
1 T----------------------------------------
1
5/6
r r1 — 1
4/6
h 1
* W 3/6 t— 1
1
2/6 ■ """
1/6
- T---------1
1 i i i I l l_____ 1______1______
00 1 2 3 4 5 6 7 8 9 10
X
Figure 2.2 D is t r ib u t io n f u n c t i o n o f th e r a n d o m v a r ia b le X s h o w n in F ig u r e 2 .1 .
Joint Distribution Function. We now consider the case where two random
variables are defined on a sample space. For example, both the voltage and
current might be of interest in a certain experiment.
The probability o f the joint occurrence o f two events such as A and B was
called the joint probability P (A IT B). If the event A is the event { X < x) and
the event B is the event (Y < y), then the joint probability is called the joint
distribution function o f the random variables X and Y ; that is
f x .A.x , y) = =£ * ) n (Y =£ y)}
EXAMPLE 2.6.
Consider the toss o f a fair die. Plot the probability mass function.
P ( X = x i)
1/6
3. P (X = x,|Y = = P ( X p ( Y = Yyj) ~ ’ P (Y = y ) # °
(2.25)
(2.26)
4. Random variables X and Y are statistically independent if
EXAMPLE 2.7.
Find the joint probability mass function and joint distribution function of X , Y
associated with the experiment o f tossing two fair dice where X represents the
I
26 REVIEW OF PROBABILITY A N D RANDO M VARIABLES
4 number appearing on the up face o f one die and Y represents the number
appearing on the up face of the other die.
i
SOLUTION:
%
i 1
P ( X = i, Y = / ) = — , i = 1 ,2 , . . . ,6 ; j = 1, 2, . . - , 6
i x y i
f x .y ( x , y) = X X * = i >2 , . . . ,6 ; y = 1, 2, . . . , 6
t 1=1 /=i do
i xy
36
i
^ If jc and y are not integers and are between 0 and 6, Fx x (x, y ) = FXiy([^], [y])
^ where [a:] is the greatest integer less than or equal to x. Fx Y(x, y) — 0 for x <
4 1 or y < 1. Fx x {x, y ) = 1 for x s 6 and y > 6. Fx x (x, y ) = Fx {x) for y > 6.
Fx x {x, y ) = F y(y) for x > 6 .
i
I -------------------------------------------------------------------------------------------------------------------
i
jl 2.3.3 Expected Values or Averages
] The probability mass function (or the distribution function) provides as complete
a description as possible for a discrete random variable. For many purposes this
^ description is often too detailed. It is sometimes simpler and more convenient
f to describe a random variable by a few characteristic numbers or summary
measures that are representative of its probability mass function. These numbers
1 are the various expected values (sometimes called statistical averages). The ex-
I pected value or the average o f a function g (X ) o f a discrete random variable X
is defined as
1
, E {g {X ) } = 2 g ( x d P ( X = x,.) (2.28)
i ;=i
j It will be seen in the next section that the expected value of a random variable
is valid for all random variables, not just for discrete random variables. The
I form o f the average simply appears different for continuous random variables.
Two expected values or moments that are most commonly used for characterizing
a random variable X are its mean p.A- and its variance ux . The mean and variance
! are defined as
i
n
r E {X } = p* = ^ x ,P { X = x,) (2.29)
1= 1
RANDOM VARIABLES 27
A useful expected value that gives a measure of dependence between two random
variables X and Y is the correlation coefficient defined as
The numerator of the right-hand side of Equation 2.33 is called the covariance
(<Tsr) o f X and Y. The reader can verify that if X and Y are statistically inde
pendent, then PxY = 0 and that in the case when Wand Y are linearly dependent
(i.e., when Y = (b + kX), then |pxy| = 1. Observe that pXY = 0 does not imply
statistical independence.
Two random variables X and Y are said to be orthogonal if
E {X Y } = 0
where the subscripts denote the distributions with respect to which the expected
values are computed.
One o f the important conditional expected values is the conditional mean:
The conditional mean plays an important role in estimating the value o f one
random variable given the value o f a related random variable, for example, the
estimation of the weight o f an individual given the height.
Gx { z ) = 2 z kP { X = k) (2.35.a)
k = 0
1. Gx { 1) = X P ( X = k) = 1 (2.35.b)
k =0
2. If Gx (z ) is given, p k can be obtained from it either by expanding it in a
power series or from
Cn = E { X ( X - l X * - 2) ■ • • ( X - n + 1)}
-£ lO A z)iU (2.35.d)
RANDOM VARIABLES 29
From the factorial moments, we can obtain ordinary moments, for example, as
M-x —
and
o-x = C2 + Q - Cl
where
The reader can verify that the mean and variance of the binomial random variable
are given by (see Problem 2.13)
l^x = nP (2.38.a)
u\ = tip{l - p ) (2.38.b)
then the number o f events in a time interval o f length T can be shown (see
Chapter 5) to have a Poisson probability mass function o f the form
X*
P ( X = k) * = 0, 1, 2, (2.39.a)
k!
where X — X'T. The mean and variance of the Poisson random variable are
given by
M-x = X. (2.39.b)
oi = X (2.39.c)
P (X i — X 2 — x2, X k — xk)
n\
P\pxi ■ ■ ' PT (2.40)
Xllx2l x k-i\xk\
E X A M PLE 2.8.
(Note that this is similar to Example 2.3. The primary difference is the
use of random variables.)
SOLUTION:
(a) Using Equation 2.24, we have
P (Y = 1) = P { Y = 1|AT = 0)P(AT = 0)
+ P ( Y = 1\X = 1) P ( X = 1)
23
P ( Y = 0) = 1 — P { Y = 1) =
P ( Y = 1\X = 1) P ( X = 1)
P { X = 1| Y = 1) =
P (Y = 1)
2
JL 3
32
EXAMPLE 2.9.
SOLUTION:
(a) Let X be the random variable representing the number o f errors per
block. Then, X has a binomial distribution
E {X } = np = (1 6 )(.l) = 1.6
(b) The variance of X is found from Equation 2.38.b:
cri = rip(l - p) = (16)(.1)(.9) = 1.44
(c) P ( X > 5) = 1 - P { X < 4)
= 1 - 2 d V l ) * ( 0 . 9 ) 16-*
*=o K
= 0.017
EXAMPLE 2.10.
The number N o f defects per plate of sheet metal is Poisson with X = 10. The
inspection process has a constant probability of .9 o f finding each defect and
the successes are independent, that is, if M represents the number of found
defects
Find
(a) The joint probability mass function o f M and N.
(b) The marginal probability mass function o f M.
(c) The condition probability mass function of N given M.
(d) E{M\N}.
(e) E{M } from part (d).
SOLUTION:
n = 0, 1, .
e' 10 n
(a) P (M = i, N = « ) = (10)"(^)(-9)£(.1)"_ i = 0, 1, . •, n
»! ,.9 y u r ,
(b) P(M ' n! i\{n — i)!
g - 10( 9 ) ' ^ 1
i! n=, (n - /)!
i = 0, 1,
Thus
E{M\N} = .9N
This may also be found directly using the results of part (b) if these results are
available.
= dFx {x)
(2.41)
dx
With this definition the probability that the observed value of X falls in a small
interval of length A x containing the point x is approximated by )> (x)A x. With
such a function, we can evaluate probabilities of events by integration. As with
a probability mass function, there are properties that f x ( x ) must have before it
can be used as a density function for a random variable. These properties follow
from Equation 2.41 and the properties of a distribution function.
1. fx (x ) & 0 (2.42.a)
2. J f x ( x ) dx = 1 (2.42.b)
F ig u r e 2 .4 D i s t r ib u t i o n f u n c t i o n a n d d e n s it y fu n c t i o n f o r E x a m p le 2 .1 1 .
CONTINUOUS RANDOM VARIABLES 35
EXAMPLE 2.11.
Resistors are produced that have a nominal value of 10 ohms and are ±10%
resistors. Assume that any possible value of resistance is equally likely. Find the
density and distribution function o f the random variable R, which represents
resistance. Find the probability that a resistor selected at random is between 9.5
and 10.5 ohms.
SOLUTION: The density and distribution functions are shown in Figure 2.4.
Using the distribution function,
10.5 - 9.5 _ 1
9.5 2 2 ~ 2
F x k)
X
0
Figure 2 . 5 E x a m p l e o f a m i x e d d is t r ib u t io n fu n c t io n .
36 REVIEW OF PROBABILITY A N D RANDOM VARIABLES
d2Fx,Y(x, y)
f x A x >y)
dx dy
f x A x >31) - 0
F x A x ’ y) /x,y (F , v) d\x dv
f x A P ; v) d\X dv 1
S!
From the joint probability density function one can obtain marginal proba
bility density functions f x (x), f r(y), and conditional probability density func
tions /Ar|y(^|y) and f y\x(.y\x ) as follows:
f x i x) = J fx,y(x, y ) dy (2.43.a)
f x ,r ( x >y) = f x ( x ) f r ( y ) (2-45)
EXAMPLE 2.12.
F Y( y ) = 0, y< 2
= 1, y > 4
1 P dv
= h l j [ xvdxdv 6L v
= 12 “ 4^’ 2 - ^ - 4
~ J j (x ~ tArXy ~ ^y)fx,y(x, y) dx dy
E {(X - p ^ X Y - p,y)}
Px y = (2.47.d)
(JxCTy
It can be shown that —I s pXY < 1. The Tchebycheff’s inequality for a contin
uous random variable has the same form as given in Equation 2.31.
Conditional expected values involving continuous random variables are de
fined as
E {g (X ) h ( Y )} = E {g (X ) }E { h ( Y )} (2.49)
It should be noted that the concept of the expected value o f a random variable
is equally applicable to discrete and continuous random variables. Also, if gen
eralized derivatives of the distribution function are defined using the Dirac delta
function S(jc), then discrete random variables have generalized density functions.
For example, the generalized density function o f die tossing as given in Example
2.6, is
If this approach is used then, for example, Equations 2.29 and 2.30 are special
cases o f Equations 2.47.a and 2.47.b, respectively.
For a continuous random variable (and using 8 functions also for a discrete
random variable) this definition leads to
(2.50.a)
which is the complex conjugate o f the Fourier transform of the pdf of X . Since
|exp(;W)| < 1,
Using the inverse Fourier transform, we can obtain f x (x) from T x(co) as
Thus, f x (x) and T x (co) form a Fourier transform pair. The characteristic function
of a random variable has the following properties.
E { X k} at co = 0 (2.51.a)
du>k
n . x 2(0, 0) = 1
and
dmdn
E{XTXa = [¥ * „ * (» ,, co2)] at (co,, co2) = (0, 0) (2.51.C)
The real-valued function Mx {t) = £ {exp (fX )} is called the moment generating
function. Unlike the characteristic function, the moment generating function
need not always exist, and even when it exists, it may be defined for only some
values of t within a region o f convergence (similar to the existence of the Laplace
transform). If Mx {t) exists, then M x {t) = tyx (t/j).
We illustrate two uses o f characteristic functions.
CONTINUOUS RANDOM VARIABLES 41
EXAMPLE 2.13.
X j and X 2 are two independent (Gaussian) random variables with means p t and
p 2 and variances a} and cr2. The pdfs of X x and X 2 have the form
1 (x, ~ p ,)2]
fx,(xd exp 1 ,2
V 2 tr cr; 2cr} J’
SOLUTION:
p» 1
(a) = —i==— e x p [ - ( ^ t - Pi)2/2cri]exp(/co^1) dxx
J V 2 tt (Tx
and hence
= exP[/M-i“ + (cri/co)2/,2] •J -
x exp[—( x x — p [)2/2cr2] dxt
where pj = Pi + cr2/co.
The value of the integral in the preceding equation is 1 and hence
¥ * ,(«) = exp[/p!co + (cr1/u ) 2/2]
Similarly
= exp[yp2co + (cr2/co)2/ 2]
= 3cr4
Following the same procedure it can be shown for X a normal random
variable with mean zero and variance cr2 that
n = 2k + 1
s m - {;.3 1K n = 2k, k an integer.
42 REVIEW OF PROBABILITY A N D RA NDO M VARIABLES
Thus
expfC^w )} = ¥ * ( 10)
- 1 + E[X](ju>) + E [ X 2] + ■ • • + E [ X n]
+ • • • (2.52.b)
2- n\
2 nl
CONTINUOUS RANDOM VARIABLES 43
E [X ] = K x
(2.52.c)
E [ X 2] = K 2 + K\
(2.52.d)
E [ X 3] = K 3 + 3K2K, + K\
(2.52.e)
E [ X 4] = K4 + 4K 3K, + 3 K\ + 6 K 2K\ + K\ (2.52.f)
b + a
~ ~Y~ (2.53.b)
2 (b - a f
x ~ 12 (2.53.C)
Gaussian Probability Density Function. One of the most widely used pdfs is
the Gaussian or normal probability density function. This pdf occurs in so many
applications partly because o f a remarkable phenomenon called the central limit
theorem and partly because o f a relatively simple analytical form. The central
limit theorem, to be proved in a later section, implies that a random variable
that is determined by the sum o f a large number o f independent causes tends
to have a Gaussian probability distribution. Several versions o f this theorem
have been proven by statisticians and verified experimentally from data by en
gineers and physicists.
One primary interest in studying the Gaussian pdf is from the viewpoint of
using it to model random electrical noise. Electrical noise in communication
44 REVIEW OF PROBABILITY AND RANDOM VARIABLES
fxb)
1 O - fxx) 2
fx(x) = exp (2.54)
V 2t 2<
j\
The family of Gaussian pdfs is characterized by only two parameters, \lx and
a x2, which are the mean and variance of the random variable X . In many ap
plications we will often be interested in probabilities such as
l (x ~ P x ) 2
P ( X > «) = f exp dx
Ja V 2 ttcrl 2oi
1
P (X > a ) = r exp( - z 2/2) dz
P-x)/ax V 2 tt
CONTINUOUS RANDOM VARIABLES 45
Various tables give any of the areas shown in Figure 2.7, so one must observe
which is being tabulated. However, any of the results can be obtained from the
others by using the following relations for the standard (p = 0, cr = 1) normal
random variable X:
/ > ( * < * ) = 1 - g (x )
P ( - a < X < a) = 2 P ( - a < I < 0 ) = 2F(0 < AT < a)
P ( X < 0) = ^ = g (0 )
EXAMPLE 2.14.
EXAMPLE 2.15.
The velocity V of the wind at a certain location is normal random variable with
|jl = 2 and cr = 5. Determine P( —3 < V < 8).
SOLUTION:
i f («-2)n
P( - 3 < V : exp
V2ir(25) L 2(25) J
( 8 - 2 ) /5 1
-I ( —3 —2)/5 V Z r r
exp
H] dx
Bivariate Gaussian pdf. We often encounter the situation when the instanta
neous amplitude of the input signal to a linear system has a Gaussian pdf and
we might be interested in the joint pdf o f the amplitude o f the input and the
output signals. The bivariate Gaussian pdf is a valid model for describing such
situations. The bivariate Gaussian pdf has the form
1 ■1
f x A x>y) exp
2'irCTA-cryVl — p 2(1 - p*)
2p(* - - |xy)~
ux u Y (2.57)
The reader can verify that the marginal pdfs o f X and Y are Gaussian with
means \lx , Py, and variances cr\, u\, respectively, and
Z = X + jY
RANDOM VECTORS 47
E{ g ( Z) } = | | g ( z ) f X:Y(x, y) dx dy
m — 1 integrals
and
fx,,X2(XU x 2)
= J J ■ ■ ■J fx „x 2....,xJx 1 , x 2>x 3, ■ ■ ■ , x m) dx3 dxA ■ ■ • dxm (2.58)
m — 2 integrals
Note that the marginal pdf of any subset o f the m variables is obtained by
“ integrating out” the variables not in the subset.
The conditional density functions are defined as (using m = 4 as an example),
r r \ \ fx,.X,.X,.xAXl’ X2>x 3 j x 4)
fx„X2.X3\xAXl’ X2,*3l*4) = -.J...*■ , , -------------- (2.59)
and
f x ,. X - , .X ,. x A X l ’ X 2i X 3i * 4 )
/ v 1.A’2|A’3.V4(-':1> -^-2!35 * 4) — (2.60)
f x 3.xXXl’ x*)
E { g ( X u X 2, X 3, X ,) }
(2.61)
= I I g ( * l , x 2, X 3, x J f x ^ . X i X ^ x S X u X^ X 3i X * ) d x \ d x 2 ( 2 -6 2 )
J — CO J — =0
RANDOM VECTORS 49
Important parameters o f the joint distribution are the means and the co-
variances
*x, = E{X,}
and
Note that a XXi is the variance of AT,-. We will use both <JX X , and ux . to denote
the variance o f X r Sometimes the notations EXi, Ex .x ., Ex .x . are used to denote
expected values with respect to the marginal distribution o f X h the joint distri
bution of X t and X h and the conditional distribution o f A", given Xjt respectively.
We will use subscripted notation for the expectation operator only when there
is ambiguity with the use o f unsubscripted notation.
The probability law for random vectors can be specified in a concise form
using the vector notation. Suppose we are dealing with the joint probability law
for m random variables X lt X 2, . ■ ■ , X m. These m variables can be represented
as components of an m x 1 column vector X ,
Xi
x 2
X == or X T = { X „ X 2, . . . , X m)
xm
where T indicates the transpose of a vector (or matrix). The values of X are
points in the m-dimensional space Rm. A specific value o f X is denoted by
X T = (*!, x 2, . . . , X m)
E(X, ) '
E ( X 2)
M-x = £ (X ) =
E { X m)
50 REVIEW OF PROBABILITY AND RANDOM VARIABLES
°> ,* 2 ■" V x ,x „
=
UX X22 <*x2x 2 ••• <JX X m2
J*X „X 2 ® x mx 2 "■ O ’X „ x m_
The covariance matrix describes the second-order relationship between the com
ponents of the random vector X . The components are said to be “ uncorrelated”
when
<rx,x, = <T; = 0, i j
and independent if
where |xx is the mean vector, 2 x is the covariance matrix, 2 X* is its inverse, |2 X|
is the determinant of 2 X, and X is of dimension m.
”x ,
~x; ~xt+1~
x2 x k+2
X = x t = x2=
x 2
_Xk_ _X„ _
and
i_W__
X 12
d. i
M'X = * 1 Xx —
^X2
1 X2i X22
0 0 •■ 0
0 0 •• 0
2X—
0 0 0 ■■
P y — A|xx (2.65.a)
XY = A X xA r (2.65.b)
and
Properties (1), (3), and (4) state that marginals, conditionals, as well as linear
transformations derived from a multivariate Gaussian distribution all have mul
tivariate Gaussian distributions.
52 REVIEW OF PROBABILITY AND RANDOM VARIABLES
EXAMPLE 2.15.
2
1
1
0
and
6 3 2 1
3 4 3 2
2X
2 3 4 3
1 2 3 3
Let
X2 = *3
*4
2Xx
Y = X x + 2X2
X3 + X4
SOLUTION:
(a) X j has a bivariate Gaussian distribution with
2 6 3
M-x, = and ^Xj
1 3 4
2 0 0 0
X2
Y = 1 2 0 0 = AX
X ,3
0 0 1 1
X4
RANDOM VECTORS 53
"2~
"2 0 0 o' ~4~
1
|Ay — A|XX — 1 2 0 0 = 4
1
0 0 1 1 1
0
and
2 y = A 2 XA 7
"2 1 o'
2 0 0 0
0 2 0
1 2 0 0
0 0 1
0 0 11
0 0 1
24 24 6
24 34 13
6 13 13
2 1 4 3 *3 - 1
J-Lx.x, — 3 3
3 2 x4 — 0
*3 “ 3 *4 + 1
3 **
and
-1
'4 3" '2 3"
-X ,| X 2
3 l\ -\ _3 3_ _! 2_
14/3 4/3
4/3 5/3
where <aT = (coj, o)2, . . . , o)„). From the joint characteristic function, the
moments can be obtained by partial differentiation. For example,
To simplify the illustrative calculations, let us assume that all random variables
have zero means. Then,
^ ( “ i , w 2, u>3, co4) = 1 - - w r S x to
+ i (t» T2 x o>)2 + R
where R contains terms o f o) raised to the sixth and higher power. When we
take the partial derivatives and set 0)3 = o>2 = o)3 = co4 = 0, the only nonzero
terms come from terms proportional to co1a>2a>3co4 in
- (w TS x w )2 = - + c r 22c o l + O 33 0)^ + 0 -4 4 ^
When we square the quadradic term, the only terms proportional to (j) 1w2oj3co4
will be
Taking the partial derivative o f the preceding expression and setting to = (0),
we have
= E { X 1X 2} E { X 3X 4} + E { X 2X 3} E { X 1X 4}
+ E {X 2X A} E { X l X 7} (2.69)
The reader can verify that for the zero mean case
In the analysis of electrical systems we are often interested in finding the prop
erties of a signal after it has been “ processed” by the system. Typical processing
operations include integration, weighted averaging, and limiting. These signal
processing operations may be viewed as transformations of a set of input variables
to a set of output variables. If the input is a set of random variables, then the
output will also be a set of random variables. In this section, we develop tech
niques for obtaining the probability law (distribution) for the set o f output
random variables given the transformation and the probability law for the set
o f input random variables.
The general type of problem we address is the following. Assume that X is
a random variable with ensemble Sx and a known probability distribution. Let
g be a scalar function that maps each x G Sx to y = g(x). The expression
^ = g (X )
56 REVIEW OF PROBABILITY AND RANDOM VARIABLES
F ig u r e 2 .8 T r a n s f o r m a t i o n o f a r a n d o m v a r ia b le .
defines a new random variable* as follows (see Figure 2.8). For a given outcome
k, X(k) is a number x, and g[X(X.)] is another number specified by g(x). This
number is the value o f the random variable Y, that is, Y(k) = y = g(x). The
ensemble SY of Y is the set
s Y = {y = sM '■x e sx}
B = {* :g (* ) S C}
P (C ) = P (A ) = P (B )*1
3
2
*For Y to be a random variable, the function g : X —> Y must have the following properties:
2. It must be a Baire function, that is, for every y , the set / , such that g(;t) s y must consist
o f the union and intersection o f a countable number o f intervals in S x . Only then {Y s y}
is an event.
3. The events {X : g(/Y (X )) = ± » } must have zero probability.
TRANSFORMATIONS ( FUNCTIONS) OF RANDOM VARIABLES 57
P(C ) = P ( Y ^ y ) = Fy(y)
= [ f x (x ) dx
Jb
P (Y E l y) - f Y(y ) Ay
= J f x (x ) dx
which shows that we can derive the density o f Y from the density of X.
We will use the principles outlined in the preceding paragraphs to find the
distribution of scalar-valued as well as vector-valued functions of random vari
ables.
P { Y = yi) = £ P { X = x,)
two roots are x® — + V y and x® = —V y ; also see Figure 2.9 for another
example.) We know that
Now if we can find the set of values of x such that y < g(x) < y + Ay, then
we can obtain f Y(y ) Ay from the probability that X belongs to this set. That is
For the example shown in Figure 2.9, this set consists o f the following three
intervals:
x® < x £ x® + Ax®
x® + A x(2) < x s x®
x® < x :£ x® + Ax®
TRANSFORMATIONS ( FUNCTIO NS ) OF RANDO M VARIABLES 59
where Ax® > 0, Ax® > 0 but Ax® < 0. From the foregoing it follows that
We can see from Figure 2.9 that the terms in the right-hand side are given by
A x(1) = A y/g'(x(1))
Ax® = Ay /g '(x ® )
A x® = A y/g'(x® )
Hence we conclude that, when we have three roots for the equation y = g(x),
^ lx (£ ^ (2.71)
fv(y)
h ig'(^(0)l
g'(x) is also called the Jacobian of the transformation and is often denoted by
J{x). Equation 2.71 gives the pdf of the transformed variable Y in terms of the
pdf o f X , which is given. The use of Equation 2.71 is limited by our ability to
find the roots o f the equation y = g(x). If g(x) is highly nonlinear, then the
solutions of y = g(x) can be difficult to find.
EXAMPLE 2.16.
x (1) = + V y — 4
x (2) = —V y — 4
and hence
f x ( x m) f x ( * <2))
fr(y)
|g'(*(1>)l |g'(*(2,)l
/* (* ) = x 2/2),
we obtain
1
e x p ( - ( y - 4)/2), y > 4
V 2 -ir(_y — 4)
fr(y) =
0 y < 4
EXAMPLE 2.17
Using the pdf of X and the transformation shown in Figure 2.10.a and 2.10.b,
find the distribution o f Y.
TRANSFORMATIONS ( FUNCTIONS) OF RANDOM VARIABLES 61
F ig u r e 2 .1 0 T r a n s fo r m a t io n d is c u s s e d in E x a m p le 2 .1 7 .
All the values of jc > 1 map to y = 1. Since x > 1 has a probability o f §, the
probability that Y = 1 is equal to P ( X > 1) = l Similarly P (Y = - 1 ) = i.
Thus, Y has a mixed distribution with a continuum of values in the interval
( - 1 , 1) and a discrete set of values from the set { - 1 , 1}. The continuous
part is characterized by a pdf and the discrete part is characterized by a prob
ability mass function as shown in Figure 2.10.C.
Yi = gi{X u X 2, . . . , X n), 1 , 2,
Let us start with a mapping o f two random variables onto two other random
variables:
Yi = Si(Xi, X2)
Y = g2( x u X 2)
and
There are k such regions as shown in Figure 2.11 (k = 3). Each region consists
o a parallelogram and the area o f each parallelogram is equal to Ay,Ay,/
TRANSFORMATIONS ( FUNCTIONS) OF RANDOM VARIABLES 63
dgi dgi
dxi dx2
J(xi, x 2) = (2.72)
dgi dg2
dx2 dx2
By summing the contribution from all regions, we obtain the joint pdf of Yj and
Y2 as
f Xl,Xl{x f , 4 ° )
y2) = z l7 (4°> 4 '))|
(2.73)
Using the vector notation, we can generalize this result to the n-variate case as
/ x ( x ( ->)
/v (y ) = X |/(x « )| (2.74.a)
i= l
Suppose we have n random variables with known joint pdf, and we are
interested in the joint pdf o f m < n functions o f them, say
yj = gj(x i, x 2, , x„), j = m + 1, . . . , n
in any convenient way so that the Jacobian is nonzero, compute the joint pdf
of Yi, Y2> • • ■ > Yn, and then obtain the marginal pdf o f Yj, Y2, . . . , Ym by
64 REVIEW OF PROBABILITY A N D RANDOM VARIABLES
integrating out Ym+l, . . . , Y„. If the additional functions are carefully chosen,
then the inverse can be easily found and the resulting integration can be handled,
but often with great difficulty.
EXAMPLE 2.18.
* 1*2
(*1 + * 2)
y2 = *2
where
*2 *1
(*1 + *2? (*1 + * 2)2
T ( * i, *2) = 0 1
(*1 + * 2)2
(yi - y 1)2
yl
TRANSFORMATIONS {FUNCTIONS) OF RANDOM VARIABLES 65
We are given
= 0 elsewhere
Thus
1 yi
fyt,r2(yu yi) yi,y 2 = ?
4 (y2 - y 1 ) 2 ’
= 0 elsewhere
We must now find the region in the y u y 2 plane that corresponds to the region
9 < * < 1 1 ; 9 < *2 < 1 1 . Figure 2.12 shows the mapping and the resulting
region in the y lt y 2 plane.
Now to find the marginal density o f Yu we “ integrate out” y 2.
, 19
frjyi) = J99
r■9n'V-r1) ____y|
4 (y 2 - y j
2 dy 2 , 2
v, < 4 —
71 20
yi dy 2)
19
4 — < v, < 5 -
)nv(ii-r.> 4(tt “ y i):
20 71 2
= 0 elsewhere
*2
yi = 9yi/(9-yi)
11
9— ?2 = ll>i/(U -^ 1)
4Y2 4 % 5 V2
yl=xix2Hxi + x2)
(a )
(6)
F ig u r e 2 .1 2 T r a n s f o r m a t io n o f E x a m p l e 2 .1 8 .
66 REVIEW OF PROBABILITY AND RANDOM VARIABLES
yi + y 2 In yi , 1 19
4 ~ — y i — 4
2 + 2(9 - y t) 9 - y / 20
11 - yi yf 11 - yi 19 .1
+ y 1 In yx
2 (H - yO y\ 20
= 0 elsewhere
where the atJ s and £>,•’s are all constants. In matrix notation we can write this
transformation as
Y = AX + B (2.75)
X = A - 'Y - A -1B
TRANSFORMATIONS ( FUNCTIONS) OF RANDOM VARIABLES 67
0/i,l 0/i,2 ■ a n ,n
Substituting the preceding two equations into Equation 2.71, we obtain the pdf
of Y as
/v (y ) = /x (A -* y - A - 1B)||A||-1 (2.76)
/y, — f x 2 * f x 2 (2.77.b)
Thus, the density function of the sum of two independent random variables is
given by the convolution of their densities. This also implies that the charac-
68 REVIEW OF PROBABILITY AND RANDOM VARIABLES
teristic functions are multiplied, and the cumulant generating functions as well
as individual cumulants are summed.
EXAMPLE 2.19.
-l
-l
( 6)
I f x ( 0 . 5 - y 2 ) f X2( , y 2 )
V M /J/A
(0.5)
-2 0 .5 2
Id)
EXAMPLE 2.20.
f y(y) = P exp(-Ac 02 e x p [ - 2 (y - * 0 ] d xx
Jo
EXAMPLE 2.21.
Now if we define S Y = A 2 xA r, then the exponent in the pdf of Y has the form
which corresponds to a multivariate Gaussian pdf with zero means and a co-
variance matrix o f 2 Y. Hence, we conclude that Y, which is a linear transforma
tion of a multivariate Gaussian vector X, also has a Gaussian distribution. (Note:
This cannot be generalized for any arbitrary distribution.)
Order Statistics. Ordering, comparing, and finding the minimum and maximum
are typical statistical or data processing operations. We can use the techniques
outlined in the preceding sections for finding the distribution of minimum and
maximum values within a group of independent random variables.
Let X lt X 2, X 3, . . . , X n be a group of independent random variables having
a common pdf, f x (x ) , defined over the interval (a, b). To find the distribution
of the smallest and largest o f these Xfi, let us define the following transformation:
Y„ = largest of (X u X 2, . . . , X„)
That is Yi < Y2 < ••• < Y„ represent X i , X 2,. . . , X nwhen the latter are arranged
in ascending order o f magnitude. Then Yt is called the ith order statistic of the
TRANSFORMATIONS ( FUNCTIONS) OF RANDOM VARIABLES 71
group. We will now show that the joint pdf of Yu Y2, ■ . . , Yn is given by
We shall prove this for n = 3, but the argument can be entirely general.
With n = 3
/ a:t,x2,x3( x u x 2 , x 3) = f x ( x 1) f x ( x 2) f x ( x 3)
Yj = smallest of ( X 3, X 2, X 3)
Y2 = middle value of ( A j, X 2, X 3)
Y3 = largest o f { X u X 2, X 3)
A given set of values x u x 2, x 3 may fall into one of the following six possibilities:
Xi < X2 < X3 or yi = yi = x 2,
xu y3 = x 3
Xi < X 3< X 2 or yi = yi = x 3,
X u y3 = x 2
x2< Xi < X 3 or yi = X 2, yi = X u y3 = x 3
x 2 < X 3< Xi or yi = X2, yi = x 3, y3 = X i
x3 < X i < X 2 or yi = x 3, yi = X u y3 = X 2
x3< x2< Xi or y i = X 3, y z = x 2, y3 = X 1
0 0 1
J = 1 0 0 1
0 1 0
The reader can verify that, for all six inverses, the Jacobian has a magnitude of
1, and using Equation 2.71, we obtain the joint pdf of Y1; Y2, Y3 as
Equations 2.78.b and 2.78.C can be used to obtain and analyze the distribution
of the largest and smallest among a group o f random variables.
EXAMPLE 2.22.
f x ( x ) = ae “, x > 0
= 0 x <0
TRANSFORMATIONS ( FUNCTIONS) OF RANDOM VARIABLES 73
£ {h (Y )]} = f h ( y ) f Y(y)dy
h
£ Y{h(Y )} = £ x{h(g(X )}
Using the means and covariances, we may be able to approximate the dis
tribution of Y as discussed in the next section.
Y = g ( Z 1; . . . , X n)
74 REVIEW OF PROBABILITY AND RANDOM VARIABLES
F ig u r e 2 .1 5 S im p le M o n t e C a r l o s im u la t io n .
zv
If
O f
6£ '
8£ '
LZ-
92'
S£'
VS'
££'
Z£-
IE'
or
62 '
82 '
LZ'
92'
92'
VZ‘
£2'
22'
12'
02'
61 '
81 '
LV
9Y
S I'
H'
£1'
21'
R e s u lt s o f a M o n t e C a r lo s im u la t io n .
ir
or
60 '
80 '
/O'
90 '
SO'
fO '
£ 0'
20'
10'
0
10-
20-
£ 0' -
VO'-
S O '-
90-
LO'-
F ig u r e 2 .1 6
s3|diues jojsquinN
76 REVIEW OF PROBABILITY AND RANDOM VARIABLES
Fx.(x) = 0 x < 10
= (x - 10 ) / 10 , 10 < x < 20
= 1 x > 20
Notice FxHu) = 10m + 10. Thus, if the value .250 were the random sample
of U, then the corresponding random sample of X would be 12.5.
The reader is asked to show using Equation 2.71 that if X t has a density
function and if X t = F r 1(U) = g(U) where U is uniformly distributed between
zero and one then Fp 1 is unique and
dFj(x)
fx ,(x ) where Ft = (F r T 1
dx
In these cases we use several approximation techniques that yield upper and/or
lower bounds on probabilities.
BOUNDS AND APPROXIMATIONS 77
if |*| s e
Yt =
( if |*1 < e
and thus
However,
(Note that the foregoing inequality does not require the complete distribution
o f X , that is, it is distribution free.)
Now, if we let X = {Y - p,y), and e = k, Equation 2.82.a takes the form
or
Equation 2.82.b gives an upper bound on the probability that a random variable
has a value that deviates from its mean by more than k times its standard
deviation. Equation 2.82.b thus justifies the use o f the standard deviation as a
measure o f variability for any random variable.
X a e
W< e
euY.
and, hence,
or
P ( X a e) < e - KE{e,x}, ta 0
Furthermore,
Equation 2.83 is the Chernoff bound. While the advantage of the Chernoff
bound is that it is tighter than the Tchebycheff bound, the disadvantage of the
Chernoff bound is that it requires the evaluation of E{e'x } and thus requires
more extensive knowledge o f the distribution. The Tchebycheff bound does not
require such knowledge o f the distribution.
BOUNDS A N D APPROXIMATIONS 79
(2.84)
EXAMPLE 2.23.
(a) Find the Tchebycheff and Chernoff bounds on P(X, > 3) and compare
it with the exact value o f P (X x == 3).
(b) Find the union bound on P (X x > 3 o r X 2 > 4 ) and compare it with the
actual value.
SOLUTION:
(a) The Tchebycheff bound on P (X x > 3) is obtained using Equation 2.82.C
as
Hence,
P { X l £ e) < e~*2'2
(b) P(Xi £ 3 or X 2 £ 4)
= P(Xt £ 3) + P (X 2 £ 4) - P ( X t £ 3 and X 2 £ 4)
= P { X l £ 3) + P {X 2 £ 4) - P ( X Y£ 3) P (X 2 £ 4)
since X x and X 2 are independent. The union bound consists o f the sum
o f the first two terms o f the right-hand side o f the preceding equation,
and the union bound is “ off” by the value of the third term. Substituting
the value o f these probabilities, we have
The union bound is usually very tight when the probabilities involved
are small and the random variables are independent.
Y — g (X i, X 2, , X n)
II Y is represented by its first-order Taylor series expansion about the point p.L,
P '2 , ■ ■ . , |A„
Y dg
g( Pl , Pi, ■■■ , Pi.) + s (Pi, p 2, ■ • • , Pi.) [X, - P,]
i=1 dXj
then
where
P; = E[Xi]
vl, = E[X, - P,)2]
_ E\(X, - p )(A - |x,)]
EXAMPLE 2.24.
y = § 2 + X 3X 4 - X I
p*! = 10 = 1
Px2 = 2 (j-2x 2 — -2
Px3 = 3 tri
*3 = 4
-
P*4 = 4 CTx = -
x4 3
Px5 = 1 (Tx = -
*5 5
Find approximately (a) py, (b) a\, and (c) P (Y < 20).
SOLUTION:
(a) py « y + (3)(4) -
00 ^ “ (^)2(l) + (
= 11.2
(c) With only five terms in the approximate linear equation, we assume,
for an approximation, that Y is normal. Thus
where
(2 . 86)
and the basis functions o f the expansion, Hj(y), are the Tchebycheff-Hermite
( T-H) polynomials. The first eight T-H polynomials are
H0(y) = 1
tfi(y) = y
H2(y) = / ~ 1
H3(y) = / - 3y
H ly ) = / - 6/ + 3
Hs(y) = / - 1 0 / + 15y
H6(y) = y 6 - 15y4 + 4 5 / - 15
H-ly) - / - 2 1 / + 105/ - 105y
Hs(y) = y s - 2 8 / + 210 / - 4 2 0 / + 105 (2.87)
d(Hk-i(y )h (y ))
1. Hk(y)h(y) = -
dy
84 REVIEW OF PROBABILITY AND RANDOM VARIABLES
3. J Hm(y )H n(y)h(y) dy = 0 , m ^n
( 2 . 88)
= nl, m - n
The coefficients o f the series expansion are evaluated by multiplying both
sides of Equation 2.85 by H k(y) and integrating from — co to <». By virtue o f the
orthogonality property given in Equation 2.88, we obtain
Ck H k{y)fr(y)dy
h i:
i JfcM m
(2.89.a)
l\ *■ ( 2 ) 1 ! ^ ~ 2 + 2 Z2 ! * k~4
where
= E {Y m}
and
^ = (k - m)\ = k{k ~ 1} [k ~ (m _ ^ k ~ m
The first eight coefficients follow directly from Equations 2.87 and 2.89.a and
are given by
C0 = 1
Cl = P-!
C2 = \ ( p* - 1)
c3 = g (p3 - 3p.i)
c 4 = ~ (p -4 - 6 p.2 + 3)
C5 = 1^0 “ 10^3 + ^ l )
Substituting Equation 2.89 into Equation 2.85 we obtain the series expansion
for the pdf of a random variable in terms o f the moments of the random variable
and the T-H polynomials.
The Gram-Charlier series expansion for the pdf of a random variable X with
mean | jla- and variance crA has the form:
where the coefficients C, are given by Equation 2.89 with |a[ used for |xt where
EXAMPLE 2.25.
SOLUTION:
^ = 1
9|x2 + 27 (jlj — 27
8
12p,3 + 54|x2 — 108(Xj 81
= 3.75
16
86 REVIEW OF PROBABILITY AND RANDOM VARIABLES
Co — 1
Cx = 0
C2 = 0
C3 = 2 ( - . 5 ) = -.0 8 3 3 3
o
c * = h (3 -75 - 6 + 3) = .03125
+ J 1 .03l25h{z)Hi(z) dz
P{Z < 1)
= .8413 + .0833/z(1)//2(1) - .03125/i(l)tf3(l)
which says that (if only the first and second moments of a random variable are
known) the Gaussian pdf is used as an approximation to the underlying pdf. As
BOUNDS A N D APPROXIMATIONS 87
we add more terms, the higher order terms will force the pdf to take a more
proper shape.
A series of the form given in Equation 2.90 is useful only if it converges
rapidly and the terms can be calculated easily. This is true for the Gram-Charlier
series when the underlying pdf is nearly Gaussian or when the random variable
X is the sum of many independent components. Unfortunately, the Gram-
Charlier series is not uniformly convergent, thus adding more terms does not
guarantee increased accuracy. A rule of thumb suggests four to six terms for
many practical applications.
(2.91.a)
where
1
b2 = - .356563782
1 + py
|e(y)| < 7.5 x 10“8 b3 = 1.781477937
p = .2316419 b4 = -1.821255978
b, = .319381530 b5 = 1.330274429
88 REVIEW OF PROBABILITY A N D RANDOM VARIABLES
t „ —* t 0 as n —> “
, converges for every X. e S, then we say that the random sequence converges
everywhere. The limit of each sequence can depend upon X, and if we denote
the limit by X , then X is a random variable.
Now, there may be cases where the sequence does not converge for every
outcome. In such cases if the set o f outcomes for which the limit exists has a
probability of 1 , that is, if
then we say that the sequence converges almost everywhere or almost surely.
This is written as
for all x at which F(x) is continuous, then we say that the sequence X„ converges
in distribution to X .
Zn = 2 (Xi - M.)/VmT2
Then Z„ has a limiting (as ti —* °°) distribution that is Gaussian with mean 0 and
variance 1 .
The central limit theorem can be proved as follows. Suppose we assume that
the moment-generating function M(t) of X^ exists for |f| < h. Then the function
m(t)
exists for - h < t < h. Furthermore, since X k has a finite mean and variance,
the first two derivatives o f M(t) and hence the derivatives of m(t) exist at t =
90 REVIEW OF PROBABILITY AND RANDOM VARIABLES
Next consider
t ) = £ {e x p (T Z „ )}
exp - M. *2 - \l - M -
l t exp I t
= E{ aVn f fV « j ' ' 6XP cr's/rt
- M-'
E'|explT^ f)}
* .r h < ---- 7= < h
c r V /2 / o -V /2
^ i + l ! + [m 'U ) - j 2] t 2
Ct V h / 2/2 2 / 2 (7 2
0< 5<
lim[m"(^) — a2] = 0
and
lim M „( t) = lim j 1 + — }
n—** n~♦* 2/2 J
= e x p (T2/2) (2.94)
SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE 91
(The last step follows from the familiar formula o f calculus l i n v ,4 l + a!n\" =
e“). Since exp(i-2/2) is the moment-generating function o f a Gaussian random
variable with 0 mean and variance 1 , and since the moment-generating function
uniquely determines the underlying pdf at all points o f continuity, Equation
2.94 shows that Z n converges to a Gaussian distribution with 0 mean and vari
ance 1 .
In many engineering applications, the central limit theorem and hence the
Gaussian pdf play an important role. For example, the output of a linear system
is a weighted sum of the input values, and if the input is a sequence o f random
variables, then the output can be approximated by a Gaussian distribution.
Another example is the total noise in a radio link that can be modeled as the
sum o f the contributions from a large number of independent sources. The
central limit theorem permits us to model the total noise by a Gaussian distri
bution.
We had assumed that X -s are independent and identically distributed and
that the moment-generating function exists in order to prove the central limit
theorem. The theorem, however, holds under a variety of weaker conditions
(Reference [6]):
The assumption of finite variances, however, is essential for the central limit
theorem to hold.
Finite Sums. The central limit theorem states that an infinite sum, Y, has a
normal distribution. For a finite sum of independent random variables, that is,
Y = 2 X>
1= 1
then
f y — f x t * fxt * ■ ■ • * fx„
>M“ ) = IT ‘M 03)
i=*l
92 REVIEW OF PROBABILITY AND RANDOM VARIABLES
and
Cy(w) - ^ C^.(w)
K,y = 2 K-lx,
y=i
M-r = 2 Mw,
cry = 2 ar
M = 2 K ^t = 2 (M X - p *)4} - 3 K xx)
For finite sums the normal distribution is often rapidly approached; thus a
Gaussian approximation or aGram-Charlier approximation is often appropriate.
The following example illustrates the rapid approach to a normal distribution.
SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE 93
9.70 9.75 9.80 9.85 9.90 9.95 10.00 10.05 10.10 10.15 10.20 10.25
x
EXAMPLE 2.26.
is, if
P{|X - X n\> e} —» 0 as n —» oo
for any e > 0, then we say that X„ converges to the random variable X in
probability. This is also called stochastic convergence. An important application
of convergence in probability is the law of large numbers.
X» = - S X, (2 .95.a)
n i
The law of large numbers can be proved directly by using Tchebycheff’s ine
quality.
E [ { X n - X ) 2] *0 as n -» (2.96)
If Equation 2.96 holds, then the random variable X is called the mean square
limit of the sequence X„ and we use the notation
l.i.m. X n = X
For random sequences the following version of the Cauchy criterion applies.
E { ( X n - X ) 2} 0 as n —» cc
if and only if
2.9 SUMMARY
The reviews of probability, random variables, distribution function, probabil
ity mass function (for discrete random variables), and probability density
functions (for continuous random variables) were brief, as was the review of
expected value. Four particularly useful expected values were briefly dis
cussed: the characteristic function £ {ex p (/o)X )}; the moment generating func
tion £ {exp (fX )}; the cumulative generating function In £ {e x p (fX )}; and the
probability generating function E {zx } (non-negative integer-valued random
variables).
96 REVIEW OF PROBABILITY AND RANDOM VARIABLES
The review o f random vectors, that is, vector random variables, extended the
ideas of marginal, joint, and conditional density function to n dimensions,
and vector notation was introduced. Multivariate normal random variables
were emphasized.
2.10 REFERENCES
The material presented in this chapter was intended as a review of probability and random
variables. For additional details, the reader may refer to one of the following books.
Reference [2], particularly Vol. 1, has become a classic text for courses in probability
theory. References [8] and the first edition of [7] are widely used for courses in applied
probability taught by electrical engineering departments. References [1], [3], and [10]
also provide an introduction to probability from an electrical engineering perspective.
Reference [4] is a widely used text for statistics and the first five chapters are an excellent
introduction to probability. Reference [5] contains an excellent treatment of series ap
proximations and cumulants. Reference [6] is written at a slightly higher level and presents
the theory of many useful applications. Reference [9] describes a theory of probable
reasoning that is based on a set of axioms that differs from those used in probability.
[1] A. M. Breipohl, Probabilistic Systems Analysis, John Wiley & Sons, New York,
1970.
[2] W. Feller, An Introduction to Probability Theory and Applications, Vols. I, II,
John Wiley & Sons, New York, 1957, 1967.
[3] C. H. Helstrom, Probability and Stochastic Processes for Engineers, Macmillan,
New York, 1977.
[4] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, Macmillan,
New York, 1978.
PROBLEMS 97
[5] M. Kendall and A. Stuart, The Advanced Theory o f Statistics, Vol. 1, 4th ed.,
Macmillan, New York, 1977.
[6] H. L. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences,
Vol. I, John Wiley & Sons, New York, 1979.
[7] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-
Hill, New York, 1984.
[8] P. Z. Peebles, Jr., Probability, Random Variables, and Random Signal Principles,
2nd ed., McGraw-Hill, New York, 1987.
[9] G. Shafer, A Mathematical Theory o f Evidence, Princeton University Press, Prince
ton, N.J., 1976.
[10] J. B. Thomas, An Introduction to Applied Probability and Random Processes, John
Wiley & Sons, New York, 1971.
2.11 PROBLEMS
2.1 Suppose we draw four cards from an ordinary deck of cards. Let
A₁: an ace on the first draw

2.2 A random experiment consists of tossing a die and observing the number of dots showing up. Let
A₁: number of dots showing up = 3

2.3 A box contains three 100-ohm resistors labeled R₁, R₂, and R₃ and two 1000-ohm resistors labeled R₄ and R₅. Two resistors are drawn from this box without replacement.
Work parts (b), (c), and (d) by counting the outcomes that belong to the
appropriate events.
2.4 With reference to the random experiment described in Problem 2.3, define the following events.

a. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(BC) − P(CA) + P(ABC).
b. P(A|B) = P(A) implies P(B|A) = P(B).
c. P(ABC) = P(A)P(B|A)P(C|AB).

2.6 A₁, A₂, A₃ are three mutually exclusive and exhaustive sets of events associated with a random experiment E₁. Events B₁, B₂, and B₃ are mutually
Figure 2.19 Circuit diagram for Problem 2.8.
           B₁      B₂      B₃
  A₁      3/36     *      5/36
  A₂      5/36    4/36    5/36
  A₃       *      6/36     *
  P(Bⱼ)  12/36   14/36     *
2.7 There are two bags containing mixtures of blue and red marbles. The first bag contains 7 red marbles and 3 blue marbles. The second bag contains 4 red marbles and 5 blue marbles. One marble is drawn from bag one and transferred to bag two. Then a marble is taken out of bag two. Given that the marble drawn from the second bag is red, find the probability that the color of the marble transferred from the first bag to the second bag was blue.

2.8 In the diagram shown in Figure 2.19, each switch is in a closed state with probability p, and in the open state with probability 1 − p. Assuming that the state of one switch is independent of the state of another switch, find the probability that a closed path can be maintained between A and B. (Note: There are many closed paths between A and B.)

2.9 The probability that a student passes a certain exam is 0.9, given that he studied. The probability that he passes the exam without studying is 0.2. Assume that the probability that the student studies for an exam is 0.75 (a somewhat lazy student). Given that the student passed the exam, what is the probability that he studied?
2.10 A fair coin is tossed four times and the faces showing up are observed.
a. List all the outcomes of this random experiment.
b. If X is the number of heads in each of the outcomes of this experiment, find the probability mass function of X.

2.11 Two dice are tossed. Let X be the sum of the numbers showing up. Find the probability mass function of X.

2.13 Show that the mean and variance of a binomial random variable X are μ_X = np and σ_X² = npq, where q = 1 − p.

2.14 Show that the mean and variance of a Poisson random variable are μ_X = λ and σ_X² = λ.

2.15 The probability mass function of a geometric random variable has the form

P(X = k) = pq^(k−1), k = 1, 2, 3, …; p, q > 0, p + q = 1.

a. Find the mean and variance of X.
b. Find the probability-generating function of X.
2.16 Suppose that you are trying to market a digital transmission system (modem) that has a bit error probability of 10⁻⁴ and whose bit errors are independent. The buyer will test your modem by sending a known message of 10⁴ digits and checking the received message. If more than two errors occur, your modem will be rejected. Find the probability that the customer will buy your modem.
2.17 X and Y have the joint probability mass function given in the following table.

   Y\X    −1     0     1
   −1    1/4   1/8    0
    0     0    1/4    0
    1    1/8    0    1/4

c. Find ρ_XY.
2.18 Show that the expected value operator has the following properties.
a. E{a + bX} = a + bE{X}
b. E{aX + bY} = aE{X} + bE{Y}
c. Var[aX + bY] = a² Var[X] + b² Var[Y] + 2ab Cov[X, Y]

2.20 A thief has been placed in a prison that has three doors. One of the doors leads him on a one-day trip, after which he is dumped on his head (which destroys his memory as to which door he chose). Another door is similar except he takes a three-day trip before being dumped on his head. The third door leads to freedom. Assume he chooses a door immediately and with probability 1/3 when he has a chance. Find his expected number of days to freedom. (Hint: Use conditional expectation.)
2.21 Consider the circuit shown in Figure 2.20. Let the time at which the ith switch closes be denoted by Xᵢ. Suppose X₁, X₂, X₃, X₄ are independent, identically distributed random variables each with distribution function F. As time increases, switches will close until there is an electrical path from A to C. Let

U = time when circuit is first completed from A to B
V = time when circuit is first completed from B to C
W = time when circuit is first completed from A to C

Find the following:
a. The distribution function of U.

Figure 2.20 Circuit diagram for Problem 2.21.
2.23 Show that the mean and variance of a random variable X having a uniform distribution in the interval [a, b] are μ_X = (a + b)/2 and σ_X² = (b − a)²/12.

2.25 X is a zero mean Gaussian random variable with a variance of σ_X². Show that

2.26 Show that the characteristic function of a random variable can be expanded as

Φ_X(ω) = Σ_{k=0}^{∞} [(jω)ᵏ/k!] E{Xᵏ}

(Note: The series must be terminated by a remainder term just before the first infinite moment, if any exist.)
2.27 a. Show that the characteristic function of the sum of two independent random variables is equal to the product of the characteristic functions of the two variables.
b. Show that the cumulant generating function of the sum of two independent random variables is equal to the sum of the cumulant generating functions of the two variables.
c. Show that Equations 2.52.c through 2.52.f are correct by equating coefficients of like powers of jω in Equation 2.52.b.

X has the Cauchy pdf

f_X(x) = a/[π(x² + a²)], a > 0

a. Find the characteristic function of X.
b. Comment about the first two moments of X.
2.33 X and Y are independent zero mean Gaussian random variables with variances σ_X² and σ_Y². Let

Z = (1/2)(X + Y) and W = (1/2)(X − Y)

a. Find the joint pdf f_{Z,W}(z, w).
b. Find the marginal pdf f_Z(z).
c. Are Z and W independent?

2.34 Show that

Z = (1/n)[X₁ + X₂ + ⋯ + Xₙ]

is a Gaussian random variable with μ_Z = 0 and σ_Z² = σ²/n. (Use the result derived in Problem 2.32.)

2.35 X is a Gaussian random variable with mean 0 and variance σ_X². Find the pdf of Y if:
a. Y = X²
b. Y = |X|
c. Y = (1/2)[X + |X|]
d. Y = 1 if X > σ_X; Y = X if |X| ≤ σ_X; Y = −1 if X < −σ_X
2.38 X₁ and X₂ are two independent random variables with uniform pdfs in the interval [0, 1]. Let

Y₁ = X₁ + X₂ and Y₂ = X₁ − X₂

a. Find the joint pdf f_{Y₁,Y₂}(y₁, y₂) and clearly identify the domain where this joint pdf is nonzero.
b. Find ρ_{Y₁Y₂} and E{Y₁|Y₂ = 0.5}.

2.39 X₁ and X₂ are two independent random variables each with the following density function:

f_{Xᵢ}(x) = e⁻ˣ, x > 0
         = 0,  x < 0

Let Y₁ = X₁ + X₂ and Y₂ = X₁/(X₁ + X₂)

Y = Σ_{i=1}^{n} Xᵢ

2.41 X is uniformly distributed in the interval [−π, π]. Find the pdf of Y = a sin(X).
" i i i ”
"6 " 2 4 3
i O 2
|J-X = 0 2 x — 4 - ^ 3
I 2 1
8 3 3 1
Find the mean vector and the covariance matrix of Y = [F l5 F2, F3]r,
where
Fi = X 1 — X 2
Y2 = Xi + X 2 - 2 X 3
F3 = X x + X 3
Vi
y = VJ
y„_
v r 2 xv s o
(This is the condition for positive semidefiniteness of a matrix.)
Y = AX

where

A = [v₁, v₂, …, vₙ]ᵀ is an n × n matrix

and

Σ_Y = diag(λ₁, λ₂, …, λₙ)

2.48 If U(x) ≥ 0 for all x and U(x) ≥ α > 0 for all x ∈ ξ, where ξ is some interval, show that

P[U(X) ≥ α] ≤ (1/α) E{U(X)}

2.49 Plot the Tchebycheff and Chernoff bounds as well as the exact values for P(X > a), a > 0, if X is

2.50 Compare the Tchebycheff and Chernoff bounds on P(Y > a) with exact values for the Laplacian pdf

f_Y(y) = (1/2) exp(−|y|)
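For this pdf the comparison can be carried out in closed form: P(Y > a) = (1/2)e⁻ᵃ, Var[Y] = 2, and E{e^{tY}} = 1/(1 − t²) for |t| < 1, so the Chernoff bound exp(−ta)/(1 − t²) is minimized at t = (√(1 + a²) − 1)/a. A short Python sketch of the comparison follows; the grid of a values is an arbitrary illustrative choice.

import numpy as np

a = np.linspace(0.5, 5.0, 10)
exact = 0.5 * np.exp(-a)                   # P(Y > a) for the Laplacian pdf
tcheb = np.minimum(1.0, 2.0 / a**2)        # Tchebycheff: Var[Y]/a^2 with Var[Y] = 2
t_opt = (np.sqrt(1.0 + a**2) - 1.0) / a    # minimizer of exp(-t a)/(1 - t^2), 0 < t < 1
chern = np.exp(-t_opt * a) / (1.0 - t_opt**2)
for row in zip(a, exact, tcheb, chern):
    print("a = %4.2f  exact = %.4f  Tchebycheff = %.4f  Chernoff = %.4f" % row)

The output shows that the Chernoff bound tracks the exponential decay of the exact tail, while the Tchebycheff bound decays only as 1/a².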
The received signal in a communication system is

Y = X + N

where X is the "signal" component and N is the noise. X can have one of the eight values shown in Figure 2.21, and N has an uncorrelated bivariate Gaussian distribution with zero means and variances of 9. The signal X and noise N can be assumed to be independent.
The receiver observes Y and determines an estimated value X̂ of X according to the algorithm

if Y ∈ Aᵢ then X̂ = xᵢ

The decision regions Aᵢ for i = 1, 2, 3, …, 8 are illustrated by A₁ in Figure 2.21. Obtain an upper bound on P(X̂ ≠ X) assuming that P(X = xᵢ) = 1/8 for i = 1, 2, …, 8.
Hint:

1. P(X̂ ≠ X) = Σ_{i=1}^{8} P(X̂ ≠ X|X = xᵢ)P(X = xᵢ)
Figure 2.21 Signal values and decision regions for Problem 2.51.
= H_k(y)h(y), k = 1, 2, …
2.53 X has a triangular pdf centered in the interval [−1, 1]. Obtain a Gram-Charlier approximation to the pdf of X that includes the first six moments of X and sketch the approximation for values of X ranging from −2 to 2.

2.54 Let p be the probability of obtaining heads when a coin is tossed. Suppose we toss the coin N times and form an estimate of p as

2.55 X₁, X₂, …, Xₙ are independent, identically distributed random variables:

f_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ) = Π_{i=1}^{n} f_X(xᵢ)

Assume that μ_X = 0 and σ_X² is finite.
a. Find the mean and variance of

2.56 Show that if the Xᵢ's are of continuous type and independent, then for sufficiently large n the density of sin(X₁ + X₂ + ⋯ + Xₙ) is nearly equal to the density of sin(X), where X is a random variable with uniform distribution in the interval (−π, π).

2.57 Using the Cauchy criterion, show that a sequence Xₙ tends to a limit in the MS sense if and only if E{XₘXₙ} exists as m, n → ∞.
2.58 A box has a large number of 1000-ohm resistors with a tolerance of ±100 ohms (assume a uniform distribution in the interval 900 to 1100 ohms). Suppose we draw 10 resistors from this box and connect them in series and let R be the resistive value of the series combination. Using the Gaussian approximation for R, find

P[9000 < R < 11000]
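The Gaussian approximation can be checked against a Monte Carlo simulation; the sketch below (sample size and seed are arbitrary choices) does both. Each resistor has mean 1000 ohms and variance (200)²/12, so R has mean 10,000 ohms and standard deviation of roughly 183 ohms; the interval (9000, 11000) is more than five standard deviations wide and both answers come out essentially 1.

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
R = rng.uniform(900, 1100, size=(100_000, 10)).sum(axis=1)   # 10 resistors in series
mc = np.mean((R > 9000) & (R < 11000))                       # Monte Carlo estimate

mu, var = 10_000.0, 10 * 200**2 / 12.0                       # Gaussian approximation
Phi = lambda x: 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * var)))
print(mc, Phi(11_000) - Phi(9_000))                          # both ~1.0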
2.59 Let
2.60 Y is a Gaussian random variable with zero mean and unit variance and

Xₙ = sin(Y/n) if Y > 0
   = cos(Y/n) if Y ≤ 0

Discuss the convergence of the sequence Xₙ. (Does the sequence converge, and if so, in what sense?)

2.61 Let Y be the number of dots that show up when a die is tossed, and let

Xₙ = exp[−n(Y − 3)]

Discuss the convergence of the sequence Xₙ.

2.62 Y is a Gaussian random variable with zero mean and unit variance and

Xₙ = exp(−Y/n)

Discuss the convergence of the sequence Xₙ.
CHAPTER THREE
Random Processes
and Sequences
3.1 INTRODUCTION
By processing yᵢ(t) the receiver can generate the output sequence bᵢ(k). Thus, by extending the concept of random variables to include time and using the results from deterministic systems analysis, we can model random signals and analyze the response of systems to random inputs.
The validity of the random-process model suggested in the previous paragraph for the signals shown in Figure 3.1 can be decided only by collecting and analyzing sample waveforms. Model building and validation fall into the realm of statistics and will be the subject of coverage in Chapters 8 and 9. For the time being, we will assume that appropriate probabilistic models are given and proceed with the analysis.
We start our study of random process models with an introduction to the notation, terminology, and definitions. Then, we present a number of examples and develop the idea of using certain averages to characterize random processes. Basic signal-processing operations such as differentiation, integration, and limiting will be discussed next. Both time-domain and frequency-domain techniques will be used in the analysis, and the concepts of power spectral distribution and bandwidth will be discussed in detail. Finally, we develop series approximations to random processes that are analogous to Fourier and other series representations for deterministic signals.
Outcome   Waveform
1         x₁(t) = −4
2         x₂(t) = −2
3         x₃(t) = +2
4         x₄(t) = +4
5         x₅(t) = −t/2
6         x₆(t) = t/2

The set of waveforms {x₁(t), x₂(t), …, x₆(t)}, which are shown in Figure 3.2, represents this random process and is called the ensemble.
3.2.2 Notation
A random process, which is a collection or ensemble of waveforms, can be denoted by X(t, λ), where t represents time and λ is a variable that represents an outcome in the sample space S of some underlying random experiment E. Associated with each specific outcome*, say λᵢ, we have a specific member function xᵢ(t) of the ensemble. Each member function, also referred to as a sample function or a realization of the process, is a deterministic function of time even though we may not always be able to express it in closed form.

*If the number of outcomes is countable, then we will use the subscripted notation λᵢ and xᵢ(t) to denote a particular outcome and the corresponding member function. Otherwise, we will use λ and x(t) to denote a specific outcome and the corresponding member function.

While the notation given in the preceding paragraphs is well defined, convention adds an element of confusion for the sake of conformity with the notation for deterministic signals by using X(t) rather than X(t, λ) to denote a random process. X(t) may represent a family of time functions, a single time function, a random variable, or a single number. Fortunately, the specific interpretation of X(t) usually can be understood from the context.
For the random process shown in Figure 3.2, the random experiment E consists of tossing a die and observing the number of dots on the up face.
Note that A₁ is an event associated with E and its probability is derived from the probability structure of the random experiment E. In a similar fashion, we can define joint and conditional probabilities also by first identifying the event that is the inverse image of a given set of values of X(t) and then calculating the probability of this event.

For the random process shown in Figure 3.2, find (a) P[X(4) = −2]; (b) P[X(4) ≤ 0]; (c) P[X(0) = 0, X(4) = −2]; and (d) P[X(4) = −2|X(0) = 0].

SOLUTION:

(a) Let A be the set of outcomes such that for every λᵢ ∈ A, X(4, λᵢ) = −2. It is clear from Figure 3.2 that A = {2, 5}. Hence, P[X(4) = −2] = P(A) = 2/6 = 1/3.
(b) P[X(4) ≤ 0] = P[set of outcomes such that X(4) ≤ 0] = 3/6 = 1/2.
(c) Let B be the set of outcomes that maps to X(0) = 0 and X(4) = −2. Then B = {5}, and hence P[X(0) = 0, X(4) = −2] = P(B) = 1/6.
(d) P[X(4) = −2|X(0) = 0] = P[X(4) = −2, X(0) = 0]/P[X(0) = 0]
                          = (1/6)/(2/6) = 1/2

In terms of relative frequency,

P[X(4) = −2] = lim_{n→∞} n_A/n

where n_A is the number of times X(4) = −2 is observed in n independent repetitions of the experiment.
the noise is stationary, whereas the process shown in Figure 3.2 is nonstationary, that is, X(0) has a different distribution than X(4). More concrete definitions of stationarity and several examples will be presented in Section 3.5 of this chapter.
A random process may be either real-valued or complex-valued. In many applications in communication systems, we deal with real-valued bandpass random processes of the form

Z(t) = A(t)cos[2πf_c t + Θ(t)]

and with the associated complex-valued lowpass process

W(t) = A(t)cos Θ(t) + jA(t)sin Θ(t)
     = X(t) + jY(t)

F_{X(t₁),X(t₂),…,X(tₙ)}(x₁, x₂, …, xₙ)
   = P[X(t₁) ≤ x₁, X(t₂) ≤ x₂, …, X(tₙ) ≤ xₙ]
          for all n and t₁, …, tₙ ∈ T   (3.1)

*It is necessary only to assume that X(t) is measurable on S for every t ∈ T. A random process is sometimes also defined as a family of indexed random variables, denoted by [X(t, ·); t ∈ T], where the index set T represents the set of observation times.
of the form given in Equation 3.1. This leads to a formidable description of the process because at least one n-variate distribution function is required for each value of n. However, the first-order distribution function(s) P[X(t₁) ≤ a₁] and the second-order distribution function(s) P[X(t₁) ≤ a₁, X(t₂) ≤ a₂] are primarily used. The first-order distribution function describes the instantaneous amplitude distribution of the process, and the second-order distribution function tells us something about the structure of the signal in the time domain and thus the spectral content of the signal. The higher-order distribution functions describe the process in much finer detail.
While the joint distribution functions of a process can be derived from a description of the random experiment and the mapping, there is no technique for constructing member functions from joint distribution functions. Two different processes may have the same nth order distribution but the member functions need not have a one-to-one correspondence.
EXAMPLE 3.3.
For the random process shown in Figure 3.2, obtain the joint probabilities P[X(0) and X(6)] and the marginal probabilities P[X(0)] and P[X(6)].

SOLUTION: We know that X(0) and X(6) are discrete random variables, and hence we can obtain the distribution functions from probability mass functions, which can be obtained by inspection from Table 3.2 (which lists the joint probabilities along with the marginal probabilities of X(6)).

Y(t) = A cos(10⁸t + Θ)

where A and Θ are random variables representing the amplitude and phase of the received waveforms. It might be reasonable to assume uniform distributions for A and Θ.
Representation of a random process in terms of one or more random variables whose probability law is known is used in a variety of applications in communication systems.
μ_X(t) = E{X(t)}   (3.2)

Autocorrelation. The autocorrelation of X(t), denoted by R_XX(t₁, t₂), is the expected value of the product X*(t₁)X(t₂).

The mean of the random process is the "ensemble" average of the values of all the member functions at time t, and the autocovariance function C_XX(t₁, t₁) is the variance of the random variable X(t₁). For t₁ ≠ t₂, the second moments R_XX(t₁, t₂), C_XX(t₁, t₂), and r_XX(t₁, t₂) partially describe the time-domain structure of the random process. We will see later that we can use these functions to derive the spectral properties of X(t).
For random sequences the argument n is substituted for t, and n₁ and n₂ are substituted for t₁ and t₂, respectively. In this case the four functions defined above are also discrete time functions.
EXAMPLE 3.4.
Find μ_X(t), R_XX(t₁, t₂), C_XX(t₁, t₂), and r_XX(t₁, t₂) for the random process shown in Figure 3.2.

SOLUTION:

μ_X(t) = E{X(t)} = (1/6) Σ_{i=1}^{6} xᵢ(t) = 0

Since the mean is zero,

C_XX(t₁, t₂) = R_XX(t₁, t₂)

and
EXAMPLE 3.5.
X(t) = A cos(100t + Θ)

SOLUTION:

μ_X(t) = E{A} E{cos(100t + Θ)} = 0

Three averages or expected values that are used to describe the relationship between X(t) and Y(t) are

Cross-correlation Function.

R_XY(t₁, t₂) = E{X*(t₁)Y(t₂)}   (3.6)

Cross-covariance Function.

Equality. Equality of two random processes will mean that their respective member functions are identical for each outcome λ ∈ S. Note that equality also implies that the processes are defined on the same random experiment.

Uncorrelated. Two processes X(t) and Y(t) are uncorrelated when
EXAMPLE 3.6.
Outcome of
Experiment E₁   X(t)    Y(t)        Outcome of Experiment E₂   Z(t)
2               −2      −4          2 (tail)                   sin t
3                2       4
4                4      −2
5              −t/2      0
6               t/2      0

Random processes X(t) and Y(t) are defined on the same random experiment E₁. However, X(t) ≠ Y(t) since xᵢ(t) ≠ yᵢ(t) for every outcome λᵢ. These two processes are orthogonal to each other since

They are also uncorrelated because C_XY(t₁, t₂) = 0. However, X(t) and Y(t) are clearly not independent. On the other hand, X(t) and Z(t) are independent processes since these processes are defined on two unrelated random experiments E₁ and E₂, and hence for any pair of outcomes λᵢ ∈ S₁ and ηⱼ ∈ S₂,
properties are also used in a similar fashion in random signal analysis. In this section, we introduce examples of a few specific processes. These processes and their applications will be studied in detail in Chapter 5, and they are presented here only as examples to illustrate some of the important and general properties of random processes.

P[X(tₖ) ≤ xₖ|X(tₖ₋₁), …, X(t₁)] = P[X(tₖ) ≤ xₖ|X(tₖ₋₁)]

Gaussian random processes are widely used to model signals that result from the sum of a large number of independent sources, for example, the noise in a low-frequency communication channel caused by a large number of independent sources such as automobiles, power lines, lightning, and other atmospheric phenomena. Since a k-variate Gaussian density is specified by a set of means and a covariance matrix, knowledge of the mean μ_X(t), t ∈ T, and the autocorrelation function R_XX(t₁, t₂), t₁, t₂ ∈ T, is sufficient to completely specify the probability distribution of a Gaussian process.
If a Gaussian process is also a Markov process, then it is called a Gauss-
Markov process.
Random Walk. A discrete version of the Wiener process used to model the random motion of a particle can be constructed as follows: Assume that a particle is moving along a horizontal line until it collides with another molecule, and that each collision causes the particle to move "up" or "down" from its previous path by a distance "d." Furthermore, assume that the collisions take place once every T seconds and that the movement after a collision is independent of all previous jumps and hence independent of its position. This model, which is analogous to tossing a coin once every T seconds and taking a step "up" if heads show and "down" if tails show, is called a random walk. The position of the particle at t = nT is a random sequence X(n) where, in this notation for a sequence, X(n) corresponds with the process X(nT); one member function of the sequence is shown in Figure 3.4. We will assume that we start observing the particle at t = 0, that its initial location is X(0) = 0, and that the jump of ±d appears instantly after each toss.
If k heads show up in the first n tosses, then the position of the particle at t = nT is given by

X(n) = kd + (n − k)(−d)
     = (2k − n)d   (3.14)

Figure 3.4 Sample function of the random walk process.
and X(n) is a discrete random variable having values md, where m equals −n, −n + 2, …, n − 2, n. If we denote the sequence of jumps by a sequence of random variables {Jᵢ}, then we can express X(n) as

X(n) = J₁ + J₂ + ⋯ + Jₙ

P(Jᵢ = d) = P(Jᵢ = −d) = 1/2

E{Jᵢ} = 0,  E{Jᵢ²} = d²

P[X(n) = md] = P[k heads in n tosses],  k = (m + n)/2

k = 0, 1, 2, …, n; m = 2k − n

and

E{X(n)} = 0

E{X(n)²} = E{[J₁ + J₂ + ⋯ + Jₙ]²} = nd²

R_XX(n₁, n₂) = E{X(n₁)X(n₂)}
            = E{X(n₁)[X(n₁) + X(n₂) − X(n₁)]}
            = E{X(n₁)²} + E{X(n₁)[X(n₂) − X(n₁)]}
            = n₁d² for n₂ ≥ n₁

since the increment X(n₂) − X(n₁) has zero mean and is independent of X(n₁). If n₁ > n₂, then R_XX(n₁, n₂) = n₂d², and in general we can express R_XX(n₁, n₂) as

R_XX(n₁, n₂) = d² min(n₁, n₂)
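These moments are easy to verify by simulation. The Python sketch below (the values of d, n, and the number of trials are arbitrary) generates an ensemble of random walks and checks E{X(n)} = 0, E{X(n)²} = nd², and R_XX(n₁, n₂) = d² min(n₁, n₂).

import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 1.0, 50, 20_000
jumps = rng.choice([d, -d], size=(trials, n))   # J_i = +/- d with probability 1/2
X = jumps.cumsum(axis=1)                        # X(n) = J_1 + J_2 + ... + J_n

print(X[:, -1].mean())                          # ~0, the ensemble mean E{X(n)}
print(X[:, -1].var(), n * d**2)                 # ~n d^2, the variance at n = 50
n1, n2 = 20, 40
print((X[:, n1 - 1] * X[:, n2 - 1]).mean(), d**2 * min(n1, n2))   # ~R_XX(n1, n2)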
A sample function of Y(t) is shown as a broken line in Figure 3.4. The mean and variance of Y(t) at t = nT are given by

The Wiener process is obtained from Y(t) by letting both the time (T) between jumps and the step size (d) approach zero with the constraint d² = αT to assure that the variance will remain finite and nonzero for finite values of t. As a result of the limiting, we have the Wiener process W(t) with the following properties:

A sample function of the Wiener process, which is also referred to as the Wiener–Levy process, is shown in Figure 3.5. The reader can verify that the Wiener process is a (nonstationary) Markov process and a Martingale.

Figure 3.5 Sample function of the Wiener–Levy process.
Let Q(t) denote the number of events that have occurred up to time t. Then:

1. For any times t₁, t₂ ∈ T with t₂ > t₁, the number of events Q(t₂) − Q(t₁) that occur in the interval t₁ to t₂ is Poisson distributed according to the probability law

P[Q(t₂) − Q(t₁) = k] = [λ(t₂ − t₁)]ᵏ exp[−λ(t₂ − t₁)]/k!,  k = 0, 1, 2, …   (3.18)

In particular,

P[Q(t) = k] = (λt)ᵏ exp(−λt)/k!,  k = 0, 1, 2, …

R_QQ(t₁, t₂) = E{Q(t₁)Q(t₂)}
            = E{Q(t₁)[Q(t₁) + Q(t₂) − Q(t₁)]} for t₂ ≥ t₁
            = E{Q²(t₁)} + E{Q(t₁)}E{Q(t₂) − Q(t₁)}
            = [λt₁ + λ²t₁²] + λt₁[λt₂ − λt₁]
            = λt₁[1 + λt₂] for t₂ ≥ t₁

and therefore

R_QQ(t₁, t₂) = λ²t₁t₂ + λ·min(t₁, t₂) for all t₁, t₂ ∈ T   (3.20)

The reader can verify that the Poisson process is a Markov process and is nonstationary. Unlike the Wiener–Levy process, the Poisson process is not a Martingale since its mean is time varying. Additional properties of the Poisson process and its applications are discussed in Chapter 5.
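Equation 3.20 can be checked numerically by exploiting the independent-increments property directly; the sketch below (parameter values are arbitrary) builds Q(t₂) from Q(t₁) plus an independent Poisson increment.

import numpy as np

rng = np.random.default_rng(0)
lam, t1, t2, trials = 2.0, 1.5, 4.0, 100_000
Q1 = rng.poisson(lam * t1, trials)              # Q(t1)
Q2 = Q1 + rng.poisson(lam * (t2 - t1), trials)  # Q(t2) = Q(t1) + independent increment

print(Q1.mean(), lam * t1)                      # E{Q(t1)} = lambda t1
print((Q1 * Q2).mean(), lam**2 * t1 * t2 + lam * min(t1, t2))   # Equation 3.20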
The random sequence of pulses shown in Figure 3.7 is called a random binary waveform, and it can be expressed as

X(t) = Σ_{k=−∞}^{∞} Aₖ p(t − kT − D)
where p(t) is a unit-amplitude pulse of duration T, the amplitudes Aₖ are independent and equal to +1 or −1 with equal probability, and D is a random delay. Figure 3.8 illustrates the calculation of R_XX(t₁, t₂) for the three possible positions of the delay D:

If 0 ≤ D ≤ t₁, then t₁ and t₂ belong to the same pulse interval and X(t₁)X(t₂) = 1.
If t₂ < D ≤ T, then t₁ and t₂ belong to the same pulse interval and X(t₁)X(t₂) = 1.
If t₁ < D < t₂, then t₁ and t₂ belong to different pulse intervals and X(t₁)X(t₂) = ±1.

Figure 3.8 Calculation of R_XX(t₁, t₂).
The random variable D has a uniform distribution in the interval [0, T] and hence P[0 < D < t₁ or t₂ < D < T] = 1 − (t₂ − t₁)/T, and P(t₁ < D < t₂) = (t₂ − t₁)/T. Using these probabilities and conditional expectations, we obtain

R_XX(t₁, t₂) = E{X(t₁)X(t₂)}
  = E{X(t₁)X(t₂)|0 < D < t₁ or t₂ < D < T} · P[0 < D < t₁ or t₂ < D < T]
    + E{X(t₁)X(t₂)|t₁ ≤ D < t₂} · P(t₁ < D < t₂)
  = 1 · [1 − (t₂ − t₁)/T] + 0 · (t₂ − t₁)/T
  = 1 − (t₂ − t₁)/T

and in general

R_XX(t₁, t₂) = 1 − |t₂ − t₁|/T,  |t₂ − t₁| < T
            = 0                  elsewhere   (3.22)
The reader can verify that the random binary waveform is not an independent increment process and is not a Martingale.
A general version of the random binary waveform with multiple and correlated amplitude levels is widely used as a model for digitized speech and other signals. We will discuss this generalized model and its application in Chapters 5 and 6.
3.5 STATIONARITY
A random process X(t) is called time stationary or stationary in the strict sense (abbreviated as SSS) if all of the distribution functions describing the process are invariant under a translation of time. That is, for all t₁, t₂, …, tₖ, t₁ + τ, t₂ + τ, …, tₖ + τ ∈ T and all k = 1, 2, …,

P[X(t₁) ≤ x₁, X(t₂) ≤ x₂, …, X(tₖ) ≤ xₖ]
   = P[X(t₁ + τ) ≤ x₁, X(t₂ + τ) ≤ x₂, …, X(tₖ + τ) ≤ xₖ]   (3.23)

If Equation 3.23 holds for k = 1, …, N but not necessarily for k > N, then the process is said to be Nth order stationary.
From Equation 3.23 it follows that for a SSS process the first-order distribution is independent of t and the second-order distribution is strictly a function of the time difference t₂ − t₁. As a consequence of Equations 3.24 and 3.25, we conclude that for a SSS process the mean is constant and the autocorrelation depends only on t₂ − t₁. It should be noted here that a random process with a constant mean and an autocorrelation function that depends only on the time difference t₂ − t₁ need not even be first-order stationary.
Two real-valued processes X(t) and Y(t) are jointly stationary in the strict sense if the joint distributions of X(t) and Y(t) are invariant under a translation of time, and a complex process Z(t) = X(t) + jY(t) is SSS if the processes X(t) and Y(t) are jointly stationary in the strict sense.
A less restrictive form of stationarity is based on the mean and the autocorrelation function. A process X(t) is said to be stationary in the wide sense (WSS or weakly stationary) if its mean is a constant and the autocorrelation function depends only on the time difference:

E{X(t)} = μ_X   (3.28.a)

Two processes X(t) and Y(t) are jointly WSS if each process satisfies Equation 3.28 and their cross-correlation depends only on the time difference for all t ∈ T. For random sequences the corresponding conditions are

E{X(k)} = μ_X   (3.30.a)

and
It is easy to show that SSS implies WSS; however, the converse is not true in
general.
3.5.3 Examples
EXAMPLE 3.7.
Two random processes X(t) and Y(t) are shown in Figures 3.9 and 3.10. Find the mean and autocorrelation functions of X(t) and Y(t) and discuss their stationarity properties. The member functions of X(t) (Figure 3.9) are

x₁(t) = 5
x₂(t) = 3
x₃(t) = 1
x₄(t) = −1
x₅(t) = −3
x₆(t) = −5
SOLUTION:
Furthermore, a translation of the time axis does not result in any change in any member function, and hence, Equation 3.23 is satisfied and X(t) is stationary in the strict sense.
For the random process Y(t), E{Y(t)} = 0, and

Since the mean of the random process Y(t) is constant and the autocorrelation function depends only on the time difference t₂ − t₁, Y(t) is stationary in the wide sense. However, Y(t) is not strict-sense stationary since the values that Y(t) can have at t = 0 and t = π/4 are different, and hence even the first-order distribution is not time invariant.
EXAMPLE 3.8.
P[X(n) = 0, X(n + 1) = 0] = 0.2,  P[X(n) = 0, X(n + 1) = 1] = 0.2
P[X(n) = 1, X(n + 1) = 0] = 0.2,  P[X(n) = 1, X(n + 1) = 1] = 0.4
SOLUTION:
P[X(n) = 0] = P[X(n) = 0, X(n + 1) = 0] + P[X(n) = 0, X(n + 1) = 1]
            = 0.4
P[X(n) = 1] = 0.6

E{X(n)X(n + 1)} = 1 · P[X(n) = 1, X(n + 1) = 1] = 0.4

E{X(n)X(n + 2)} = 1 · P[X(n) = 1, X(n + 2) = 1]
  = 1 · P[X(n) = 1, X(n + 1) = 1, X(n + 2) = 1]
    + 1 · P[X(n) = 1, X(n + 1) = 0, X(n + 2) = 1]
  = 1 · P[X(n) = 1] P[X(n + 1) = 1|X(n) = 1] P[X(n + 2) = 1|X(n) = 1, X(n + 1) = 1]
    + 1 · P[X(n) = 1] P[X(n + 1) = 0|X(n) = 1] P[X(n + 2) = 1|X(n) = 1, X(n + 1) = 0]

P[X(n + 2) = 1|X(n) = 1, X(n + 1) = 1] = P[X(n + 2) = 1|X(n + 1) = 1]
P[X(n + 2) = 1|X(n) = 1, X(n + 1) = 0] = P[X(n + 2) = 1|X(n + 1) = 0]

and hence

E{X(n)X(n + 2)} = (0.6)(0.4/0.6)(0.4/0.6) + (0.6)(0.2/0.6)(0.2/0.4)
               ≈ 0.367

Thus we have

E{X(n)} = 0.6
R_XX(n, n + 1) = 0.4
R_XX(n, n + 2) ≈ 0.367
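Since only the one-step conditional probabilities matter, the same numbers can be obtained from the transition matrix implied by the joint pmf. A minimal Python sketch:

import numpy as np

# Transition probabilities from the joint pmf: P(0->0) = P(0->1) = 0.2/0.4,
# P(1->0) = 0.2/0.6, P(1->1) = 0.4/0.6.
P = np.array([[0.5, 0.5],
              [1.0 / 3.0, 2.0 / 3.0]])
p1 = 0.6                                        # stationary marginal P[X(n) = 1]

# R_XX(n, n + 2) = P[X(n) = 1, X(n + 2) = 1] = p1 * [P^2]_{1,1}
print(p1 * np.linalg.matrix_power(P, 2)[1, 1])  # 11/30 ~ 0.367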
EXAMPLE 3.9.
SOLUTION:
Since E{AᵢAⱼ}, E{AᵢBᵢ}, E{AᵢBⱼ}, and E{BᵢBⱼ}, i ≠ j, are all zero, we have
A Gaussian random process provides one of the few examples where WSS
implies SSS.
R_XX(τ) = E{X(t)X(t + τ)}

There are some general properties that are common to all autocorrelation functions of stationary random processes, and we discuss these properties briefly before proceeding to the development of power spectral densities.

|R_XX(τ)| ≤ R_XX(0)

This can be verified by starting from the inequalities

E{[X(t + τ) ± X(t)]²} ≥ 0

which lead to

2R_XX(0) ± 2R_XX(τ) ≥ 0

[R_XX(τ + τ₀) − R_XX(τ)]² ≤ 2[R_XX(0) − R_XX(τ₀)]R_XX(0)

for every τ and τ₀. If R_XX(τ₀) = R_XX(0), then R_XX(τ + τ₀) = R_XX(τ) for every τ and R_XX(τ) is periodic with period τ₀.

7. If R_XX(0) < ∞ and R_XX(τ) is continuous at τ = 0, then it is continuous for every τ.

If the deterministic signal is periodic with period T₀, then we can define a time-averaged autocorrelation function (R_XX(τ))_{T₀} as*

P_x = ∫_{−∞}^{∞} S_xx(f) df   (3.38)

In Equation 3.38, the left-hand side represents the total average power in the signal, f is the frequency variable expressed usually in Hertz (Hz), and S_xx(f) has the units of power (watts) per Hertz. The function S_xx(f) thus describes the power distribution in the frequency domain, and it is called the power spectral density function of the deterministic signal x(t).
The concept of a power spectral density function also applies to stationary random processes, and the power spectral density function of a WSS random process X(t) is defined as the Fourier transform of the autocorrelation function.
Equation 3.39 is called the Wiener-Khinchine relation. Given the power spectral density function, the autocorrelation function is obtained as

*The notation ( )_{T₀} denotes integration or averaging in the time domain for a duration of T₀ seconds, whereas E{ } denotes ensemble averaging.

E{X²(t)} = R_XX(0) = ∫_{−∞}^{∞} S_XX(f) df   (3.41)

Note that if X(t) is a current or voltage waveform, then E{X²(t)} is the average power delivered to a one-ohm load. Thus, the left-hand side of the equation represents power, and the integrand S_XX(f) on the right-hand side has the units of power per Hertz. S_XX(f) gives the distribution of power as a function of frequency and hence is called the power spectral density function of the stationary random process X(t).
The psd function has several useful interpretations. As stated in Equation 3.41, the area under the psd function gives the total power in X(t). The power in a finite band of frequencies, f₁ to f₂, 0 < f₁ < f₂, is the area under the psd from −f₂ to −f₁ plus the area between f₁ and f₂; for real X(t),

P[f₁, f₂] = 2 ∫_{f₁}^{f₂} S_XX(f) df   (3.43)

The proof of this equation is given in the next chapter; Figure 3.11.c makes it seem reasonable. The factor 2 appears in Equation 3.43 since we are using a two-sided psd and S_XX(f) is an even function (see Figure 3.11.c).
Some processes may have psd functions with nonzero values for all finite values of f; for example, S_XX(f) = exp(−f²/2). For such processes, several indicators are used as measures of the spread of the psd in the frequency domain. One popular measure is the effective (or equivalent) bandwidth B_eff. For zero mean random processes with continuous psd, B_eff is defined as

B_eff = ∫_{−∞}^{∞} S_XX(f) df / (2 max[S_XX(f)])   (3.44)

(See Figure 3.12.) The effective bandwidth is related to a measure of the spread of the autocorrelation function called the correlation time τ_c, where

τ_c = (1/R_XX(0)) ∫_{0}^{∞} R_XX(τ) dτ   (3.45)

and

B_eff = 1/(4τ_c)   (3.46)

Other measures of spectral spread include the rms bandwidth, defined as the standard deviation of the psd, and the half-power bandwidth (see Problems 3.23 and 3.24).
and

Unlike the psd, which is a real-valued function of f, the cpsd will, in general, be a complex-valued function. Some of the properties of the cpsd are as follows:

1. S_XY(f) = S*_YX(f)
2. The real part of S_XY(f) is an even function of f, and the imaginary part of S_XY(f) is an odd function of f.
3. S_XY(f) = 0 if X(t) and Y(t) are orthogonal, and S_XY(f) = μ_X μ_Y δ(f) if X(t) and Y(t) are independent.

ρ_XY(f) = S_XY(f)/√(S_XX(f)S_YY(f))   (3.49)

The definition implies that S_XX(f) is periodic in f with period 1. We will only consider the principal part, −1/2 < f ≤ 1/2. Then it follows that

It is important to observe that if the uniform sampling time is not one second but is T_s, then the actual frequency range is −1/(2T_s) < f ≤ 1/(2T_s) instead of −1/2 to 1/2.
If X(n) is real, then R_XX(n) will be even, and

which implies that S_XX(f) is real and even. It is also nonnegative. In fact, S_XX(f) of a sequence has the same properties as S_XX(f) of a continuous process except that, as defined, S_XX(f) of a sequence is periodic.
Although the psd of a random sequence can be defined as the Fourier transform of the autocorrelation function R_XX(n) as in Equation 3.50.a, we present a slightly modified version here that will prove quite useful later on. To simplify the derivation, let us assume that E{X(n)} = 0.
We start with the assumption that the observation times of the random sequence are uniformly spaced in the time domain and that the index n denotes

X_p(t) = Σ_{n=−∞}^{∞} X(n)p(t − nT − D)
Figure 3.14 shows the pulses of X_p(t) located near t₁ = nT + D and t₂ = (n + k)T + D + τ′, with amplitudes determined by X(n) and X(n + k), so that

R_{X_pX_p}(t₂ − t₁) = R_{X_pX_p}(kT + τ′)
From Figure 3.14, we see that the value of the product X_p(t₁)X_p(t₂) will depend on the value of D according to

X_p(t₁)X_p(t₂) = X(n)X(n + k) when both t₁ and t₂ fall within pulses
              = 0 otherwise

When τ′ > ε, then irrespective of the value of D, t₂ will fall outside the pulse at t = kT, and hence X_p(t₂) and the product X_p(t₁)X_p(t₂) will be zero. Since X_p(t) is stationary, we can generalize the result to arbitrary values of τ′ and k and write R_{X_pX_p} as

R_{X_pX_p}(kT + τ′) = R_XX(k)(ε − |τ′|)/(ε²T),  |τ′| < ε
                   = 0,                         ε < |τ′| < T − ε

or

R_{X_pX_p}(τ) = R_XX(k)(ε − |τ − kT|)/(ε²T),  |τ − kT| < ε
             = 0                               elsewhere
and
S_{X_pX_p}(f) = F{R_{X_pX_p}(τ)}
            = (1/T)[R_XX(0) + 2 Σ_{k=1}^{∞} R_XX(k) cos 2πkfT]   (3.53)

Figure 3.15 Autocorrelation function of X_p(t); the triangle located at τ = kT has peak height R_XX(k)/(εT).
If the random sequence X(n) has a nonzero mean, then S_XX(f) will have discrete frequency components at multiples of 1/T (see Problem 3.35). Otherwise, S_XX(f) will be continuous in f.
The derivation leading to Equation 3.53 seems to be a convoluted way of obtaining the psd of a random sequence. The advantage of this formulation will be explained in the next chapter.
EXAMPLE 3.10.
Find the power spectral density function of the random process X(t) = 10 cos(2000πt + Θ), where Θ is a random variable with a uniform pdf in the interval [−π, π].

SOLUTION:

R_XX(τ) = 50 cos(2000πτ)

and hence

S_XX(f) = 25δ(f + 1000) + 25δ(f − 1000)

The psd S_XX(f), shown in Figure 3.16, has two discrete components in the frequency domain at f = ±1000 Hz. Note that

E{X²(t)} = R_XX(0) = 50 = ∫_{−∞}^{∞} S_XX(f) df

Figure 3.16 Psd of 10 cos(2000πt + Θ) and 10 sin(2000πt + Θ).

Also, the reader can verify that Y(t) = 10 sin(2000πt + Θ) has the same psd as X(t), which illustrates that the psd does not contain any phase information.
EXAMPLE 3.11.
R_{X_pX_p}(τ) = R_{Z_pZ_p}(τ) + R_{Y_pY_p}(τ)

(see Figure 3.17). Taking the Fourier transform of each component, we obtain the psd's. The psd of X_p(t) has a continuous part S_{Z_pZ_p}(f) and a discrete sequence of impulses at multiples of 1/T.
The psd of X(n) is the Fourier transform of R_ZZ(k) plus the Fourier transform of R_YY(k).
Note the similarities and the differences between S_{X_pX_p} and S_XX. Essentially, S_XX(f) is the principal part of S_{X_pX_p} (i.e., the value of S_{X_pX_p}(f) for −1/2 < f ≤ 1/2), and it assumes that T is 1.
EXAMPLE 3.12.
Find the psd of the random binary waveform discussed in Section 3.4.4.
R_XX(τ) = 1 − |τ|/T,  |τ| < T
        = 0            elsewhere

The psd of X(t) is obtained (see the table of Fourier transform pairs in Appendix A) as

S_XX(f) = T [sin(πfT)/(πfT)]²

A sketch of S_XX(f) is shown in Figure 3.18b. The main "lobe" of the psd extends from −1/T to 1/T Hz, and 90% of the signal power is contained in the main lobe. For many applications, the "bandwidth" of the random binary waveform is defined to be 1/T.

Figure 3.18b Power spectral density function of the random binary waveform.
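The 90% figure can be confirmed by numerical integration of the sinc² psd; in the Python sketch below (with T = 1 for convenience) the total power equals R_XX(0) = 1 and the main-lobe power comes out near 0.90.

import numpy as np

T = 1.0
f = np.linspace(-50.0 / T, 50.0 / T, 400_001)
S = T * np.sinc(f * T) ** 2                       # np.sinc(x) = sin(pi x)/(pi x)
total = np.trapz(S, f)                            # ~R_XX(0) = 1
main = np.trapz(S[np.abs(f) <= 1 / T], f[np.abs(f) <= 1 / T])
print(main / total)                               # ~0.903: power in the main lobe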
EXAMPLE 3.13.
SOLUTION: For the autocorrelation function R_XX(τ) = exp(−α|τ|), the psd is

S_XX(f) = 2α/[α² + (2πf)²]

and

B_eff = ∫_{−∞}^{∞} S_XX(f) df / (2 max[S_XX(f)]) = R_XX(0)/(2S_XX(0)) = (1/2)(α/2) = α/4 Hz
EXAMPLE 3.14.
The power spectral density function of a zero mean Gaussian random process is given by (Figure 3.19)

Find R_XX(τ) and show that X(t) and X(t + 1 ms) are uncorrelated and, hence, independent.
SOLUTION:
R_XX(τ) = ∫_{−500}^{500} exp(j2πfτ) df = [exp(j2πfτ)/(j2πτ)]_{f=−500}^{f=500}

        = (2B) sin(2πBτ)/(2πBτ),  B = 500 Hz

To show that X(t) and X(t + 1 ms) are uncorrelated, we need to show that E{X(t)X(t + 1 ms)} = 0. With τ = 10⁻³,

R_XX(10⁻³) = 1000 sin(π)/π = 0

Figure 3.19a Psd of a lowpass random process X(t): S_XX(f) = 1 for |f| < 500 Hz.
EXAMPLE 3.15.
S_XX(f) = 1,  |f| < B
        = 0   elsewhere

SOLUTION:

R_YY(τ) = (A²/2) cos(2πf_cτ)

and

R_ZZ(τ) = E{X(t)Y(t)X(t + τ)Y(t + τ)}
       = R_XX(τ) · (A²/2) cos(2πf_cτ)
       = R_XX(τ) · (A²/4)[exp(j2πf_cτ) + exp(−j2πf_cτ)]
Figure 3.20 Spectra of the lowpass signal X(t), the carrier Y(t) = A cos(2πf_c t + Θ), and the modulated signal Z(t). S_YY(f) consists of the impulses (A²/4)δ(f + f_c) and (A²/4)δ(f − f_c), and S_ZZ(f) consists of two bands of height A²/4 centered at f = ±f_c.
S_ZZ(f) = F{R_ZZ(τ)}
  = (A²/4) ∫ R_XX(τ) exp(j2πf_cτ) exp(−j2πfτ) dτ
    + (A²/4) ∫ R_XX(τ) exp(−j2πf_cτ) exp(−j2πfτ) dτ
  = (A²/4) ∫ R_XX(τ) exp[−j2π(f − f_c)τ] dτ
    + (A²/4) ∫ R_XX(τ) exp[−j2π(f + f_c)τ] dτ
  = (A²/4)[S_XX(f − f_c) + S_XX(f + f_c)]

The preceding equation shows that the spectrum of Z(t) is a translated version of the spectrum of X(t) (Figure 3.20). The operation of multiplying a "message" signal X(t) by a "carrier" Y(t) is called "modulation," and it is a fundamental operation in communication systems. Modulation is used primarily to alter the frequency content of a message signal so that it is suitable for transmission over a given communication channel.
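The translation property is easy to visualize numerically. A minimal sketch, assuming the flat lowpass psd of this example with illustrative values A = 2, B = 10 Hz, and f_c = 100 Hz:

import numpy as np

A, B, fc = 2.0, 10.0, 100.0
f = np.linspace(-150.0, 150.0, 3001)
Sxx = lambda g: np.where(np.abs(g) < B, 1.0, 0.0)     # S_XX(f) = 1 for |f| < B
Szz = (A**2 / 4.0) * (Sxx(f - fc) + Sxx(f + fc))      # translated spectrum

print(Szz.max())                                      # A^2/4: height of each band
print(np.trapz(Szz, f), (A**2 / 2.0) * 2.0 * B)       # total power = (A^2/2) R_XX(0)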
3.7.1 Continuity
A stationary, finite variance real random process X(t), t ∈ T, is said to be continuous in a mean square sense at t₀ ∈ T if

lim_{t→t₀} E{[X(t) − X(t₀)]²} = 0

and hence

l.i.m._{t→t₀} X(t) = X(t₀)

where l.i.m. denotes the mean square (MS) limit. MS continuity also implies that

lim_{t→t₀} E{g(X(t))} = E{g(X(t₀))}

3.7.2 Differentiation
The derivative of a finite variance stationary process X(t) is said to exist in a mean square sense if there exists a random process X′(t) such that

X′(t) = l.i.m._{ε→0} [X(t + ε) − X(t)]/ε   (3.56)

Note that the definition does not explicitly define the derivative random process X′(t). To establish a sufficient condition for the existence of the MS derivative, we make use of the Cauchy criterion (see Equation 2.97) for MS convergence, which when applied to Equation 3.56 requires that
Completing the square and taking expected values, we have for the first term

E{([X(t + ε₁) − X(t)]/ε₁)²}

Now, suppose that the first two derivatives of R_XX(τ) exist at τ = 0. Then, since R_XX(τ) is even in τ, we must have

R′_XX(0) = 0

and

R″_XX(0) = lim_{ε→0} 2[R_XX(ε) − R_XX(0)]/ε²

Hence

lim_{ε₁→0} E{([X(t + ε₁) − X(t)]/ε₁)²} = −R″_XX(0)

Proceeding along similar lines, we can show that the cross-product term in Equation 3.57 is equal to 2R″_XX(0), and the last term is equal to −R″_XX(0). Thus, the expression in Equation 3.57 approaches zero if the first two derivatives of R_XX(τ) exist at τ = 0, which guarantees the existence of the MS derivative of X(t). This development is summarized by:

A finite variance stationary real random process X(t) has a MS derivative, X′(t), if R_XX(τ) has derivatives of order up to two at τ = 0.
E{X′(t)} = lim_{ε→0} E{[X(t + ε) − X(t)]/ε}
         = lim_{ε→0} [E{X(t + ε)} − E{X(t)}]/ε
         = μ′_X(t)

For a stationary process μ_X(t) is constant, and hence

E{X′(t)} = 0

which yields

The functions on the right-hand side of the preceding equation are deterministic, and the limiting operation yields the partial derivative of R_XX(t₁, t₂) with respect to t₂. Thus,

R_XX′(t₁, t₂) = ∂R_XX(t₁, t₂)/∂t₂

R_X′X′(t₁, t₂) = ∂R_XX′(t₁, t₂)/∂t₁ = ∂²R_XX(t₁, t₂)/∂t₁∂t₂

For a stationary process these reduce to

E{X′(t)} = 0   (3.58)

R_XX′(τ) = dR_XX(τ)/dτ   (3.59)

R_X′X′(τ) = −d²R_XX(τ)/dτ²   (3.60)
3.7.3 Integration
The Riemann integral of an ordinary function is defined as the limit of a summing operation

∫_{t₀}^{t} x(τ) dτ = lim_{n→∞} Σ_{i=0}^{n−1} x(τᵢ) Δtᵢ

where t₀ < t₁ < t₂ < ⋯ < tₙ = t is an equally spaced partition of the interval [t₀, t], Δtᵢ = tᵢ₊₁ − tᵢ, and τᵢ is a point in the ith interval [tᵢ, tᵢ₊₁]. For a random process X(t), the MS integral is defined as the process Y(t)

It can be shown that a sufficient condition for the existence of the MS integral Y(t) of a stationary finite variance process X(t) is the existence of the integral

Note that finite variance implies that R_XX(0) < ∞, and MS continuity implies continuity of R_XX(τ) at τ = 0, which also implies continuity for all values of τ. These two conditions guarantee the existence of the preceding integral and, hence, the existence of the MS integral.
When the MS integral exists, we can show that

E{Y(t)} = (t − t₀)μ_X   (3.62)
and
(3.63)
EXAMPLE 3.16.
SOLUTION: For the random binary waveform X(t), the autocorrelation function is

R_XX(τ) = 1 − |τ|/T,  |τ| < T
        = 0            elsewhere

(a) Since R_XX(τ) is continuous at τ = 0, X(t) is MS continuous for all t.
(b) The derivative of X(t) does not exist on a sample-function-by-sample-function basis, and R′_XX(0) and R″_XX(0) do not exist. However, since their existence is only a sufficient condition for the existence of the MS derivative of X(t), we cannot conclude whether or not X(t) has a MS derivative.
(c) Finite variance plus MS continuity guarantees the existence of the MS integral over any finite interval [t₀, t].
(X(t))_T is also referred to as the time average of X(t) and has many important applications. Properties of (X(t))_T and its applications are discussed in the following section.

errors." If the value of the variable being measured is constant, and errors are due to "noise" or due to the instability of the measuring instrument, then averaging is indeed a valid and useful technique. Time averaging is an extension of this concept and is used to reduce the variance associated with the estimation of the value of a random signal or the parameters of a random process.
As an example, let us consider the problem of estimating the amplitudes of the pulses in a random binary waveform that is corrupted by additive noise. That is, we observe Y(t) = X(t) + N(t), where X(t) is a random binary waveform, N(t) is the independent noise, and we want to estimate the pulse amplitudes by processing Y(t). A sample function of Y(t) is shown in Figure 3.21. Suppose we observe a sample function y(t) with D = 0 over the time interval (0, T), or from (k − 1)T to kT in general, and estimate the amplitude of x(t) in the interval (0, T). A simple way to estimate the amplitude of the pulse is
to take one sample of y(t) at some point in time, say t₁ ∈ (0, T), and estimate the value of x(t) as x̂(t) = +1 if y(t₁) > 0 and −1 otherwise. A better estimate is obtained by averaging m samples:

x̂(t) = +1 for 0 < t < T if (1/m) Σ_{i=1}^{m} y(tᵢ) > 0;  tᵢ ∈ (0, T)

     = −1 for 0 < t < T if (1/m) Σ_{i=1}^{m} y(tᵢ) ≤ 0;  tᵢ ∈ (0, T)
The decision rule given above, which is based on time averaging, is extensively
used in communication systems. The relationship between the duration of the
integration and the variance of the estimator is a fundamental one in the design
of communication and control systems. Derivation of this relationship is one of
the topics covered in this section.
We have used ensemble averages such as the mean and autocorrelation function for characterizing random processes. To estimate ensemble averages one has to perform a weighted average over all the member functions of the random process. An alternate practical approach, which is often misused, involves estimation via time averaging over a single member function of the process. Laboratory instruments such as spectrum analyzers and integrating voltmeters routinely use time-averaging techniques. The relationship between integration time and estimation accuracy, and whether time averages will converge to ensemble averages (i.e., the concept of ergodicity), are important issues addressed in this section.

Time-averaged Mean.

(μ_X)_T = (1/T) ∫_{−T/2}^{T/2} X(t) dt

Time-averaged Autocorrelation.

(X(t)X(t + τ))_T = (R_XX(τ))_T = (1/T) ∫_{−T/2}^{T/2} X(t)X(t + τ) dt   (3.67)

Time-averaged Power Spectral Density.

(S_XX(f))_T = (1/T) |∫_{−T/2}^{T/2} X(t) exp(−j2πft) dt|²   (3.68)
Notice that in this example, none of the values of (μ_X)_T equals the true ensemble mean of X(t), which is zero.
The determination of the probability distribution function of the random variable (g[X(t)])_T is in general very complicated. For this reason, we will focus our attention only on the mean and variance of (g[X(t)])_T and use them to analyze the asymptotic distribution of (g[X(t)])_T as T → ∞. In the following derivation, we will assume the process to be stationary so that the ensemble averages do not depend on time. Finite variance and MS continuity will also be assumed so that the existence of the time averages is guaranteed.

σ²_Y = (1/m²) Σ_{i=1}^{m} Σ_{j=1}^{m} C_XX(|i − j|Δ)   (3.71)

If the samples are uncorrelated, this reduces to

E{Y} = μ_X and σ²_Y = σ²_X/m   (3.72)
Z(t) = g[X(t)]

and

Y = (1/T) ∫_{−T/2}^{T/2} Z(t) dt

Then

E{Y} = (1/T) ∫_{−T/2}^{T/2} E{Z(t)} dt
     = (1/T) ∫_{−T/2}^{T/2} μ_Z dt = μ_Z   (3.73)

E{Y²} = E{(1/T²) ∫_{−T/2}^{T/2} Z(t₁) dt₁ ∫_{−T/2}^{T/2} Z(t₂) dt₂}

and

σ²_Y = (1/T²) ∫_{−T/2}^{T/2} ∫_{−T/2}^{T/2} C_ZZ(t₁ − t₂) dt₁ dt₂   (3.74)

With reference to Figure 3.22, if we evaluate the integral over the shaded strip centered on the line t₁ − t₂ = τ, the integrand C_ZZ(t₁ − t₂) is constant and equal to C_ZZ(τ), and the area of the shaded strip is [T − |τ|] dτ. Hence, we can write the double integral in Equation 3.74 as

σ²_Y = (1/T) ∫_{−T}^{T} [1 − |τ|/T] C_ZZ(τ) dτ   (3.75.a)
Figure 3.22 Evaluation of the double integral given in Equation 3.74.
(3.75.b)
where
The advantages of time averaging and the use of Equations 3.71, 3.75.a, and
3.75.b to compute the variances of time averages are illustrated in the following
examples.
EXAMPLE 3.17.
Figure 3.23 Psd of X(t) for Example 3.17: S_XX(f) = 10⁻⁶ for |f| < 500 kHz and zero elsewhere.
SOLUTION:

E{Y} = (1/10) E{X(Δ) + X(2Δ) + ⋯ + X(10Δ)}

E{Y²} = (1/100) Σ_{i=1}^{10} Σ_{j=1}^{10} E{X(iΔ)X(jΔ)}
      = (1/100) Σ_{i} Σ_{j} R_XX((i − j)Δ)

Since the samples are uncorrelated,

E{Y²} = (1/100) Σ_{i=1}^{10} R_XX(0) = R_XX(0)/10

or

σ²_Y = σ²_X/10
EXAMPLE 3.18.

X(t) is a zero mean lowpass random process with the power spectral density shown in Figure 3.24. Let

Y = (1/T) ∫_{−T/2}^{T/2} X(t) dt

SOLUTION:

σ²_X = E{X²} = ∫_{−∞}^{∞} S_XX(f) df = 2AB

E{Y} = (1/T) ∫_{−T/2}^{T/2} E{X(t)} dt = 0

σ²_Y = E{Y²} = ∫_{−∞}^{∞} S_XX(f) [sin(πfT)/(πfT)]² df

From Figure 3.24, we see that the bandwidth or the duration of (sin πfT/πfT)² is very small compared to the bandwidth of S_XX(f), and hence the integral of the product can be approximated as

σ²_Y ≈ S_XX(0) ∫_{−∞}^{∞} [sin(πfT)/(πfT)]² df = A/T

and

σ²_X/σ²_Y = 2AB/(A/T) = 2BT,  BT ≫ 1
The result derived in this example is important and states that time averaging of a lowpass random process over a long interval results in a reduction in variance by a factor of 2BT (when BT ≫ 1). Since this is equivalent to the reduction in variance that results from averaging 2BT uncorrelated samples of a random sequence, it is often stated that there are 2BT uncorrelated samples in a T second interval, or 2B uncorrelated samples per second, in a lowpass random process with a bandwidth B.
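This interpretation suggests a direct numerical check: for a flat lowpass psd, samples spaced 1/(2B) apart are uncorrelated, so averaging over (0, T) behaves like averaging 2BT uncorrelated values. A minimal Monte Carlo sketch with arbitrary B and T:

import numpy as np

rng = np.random.default_rng(3)
B, T = 5.0, 20.0                       # bandwidth (Hz) and averaging time (s)
n = int(2 * B * T)                     # 2BT samples at the spacing 1/(2B)

# Unit-variance, uncorrelated samples stand in for the Nyquist-rate samples
Y = rng.normal(size=(20_000, n)).mean(axis=1)
print(np.var(Y), 1.0 / (2 * B * T))    # ~1/(2BT): variance reduced by 2BT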
EXAMPLE 3.19.
(a) S₁ = y(t₁), t₁ ∈ (0, T)

SOLUTION:

E{S₁} = 1

and

var(S₁) = R_NN(0) = 1

E{S₂} = (1/T) ∫_{0}^{T} E{1 + N(t)} dt = 1

and

var{S₂} = (1/T) ∫_{−T}^{T} (1 − |τ|/T) C_NN(τ) dτ
       = (1/T) ∫_{−T}^{T} (1 − |τ|/T) exp(−|τ|/α) dτ
       = (2α/T) − (2α²/T²)[1 − exp(−T/α)]

For T ≫ α,

var{S₂} ≈ 2α/T = 1/500
3.8.2 Ergodicity
In the analysis and design of systems that process random signals, we often assume that we have prior knowledge of such quantities as the means, autocorrelation functions, and power spectral densities of the random processes involved. In many applications, such prior knowledge will not be available, and we compute a time average from a single observed member function and attempt to use the time average as an estimate of the ensemble average, μ_X(t).
Whereas the time average (x(t))_T is a constant for a particular member function, the set of values taken over all member functions is a random variable. That is, (X(t))_T is a random variable and (x(t))_T is a particular value of this random variable. Now, if μ_X(t) is a constant (i.e., independent of t), then the "quality" of the time-averaged estimator will depend on whether E{(X(t))_T} → μ_X and the variance of (X(t))_T → 0 as T → ∞. If

lim_{T→∞} E{(X(t))_T} = μ_X

and

lim_{T→∞} var{(X(t))_T} = 0

then we can conclude that the time-averaged mean converges to the ensemble mean and that they are equal. In general, ensemble averages and time averages are not equal except for a very special class of random processes called ergodic processes. The concept of ergodicity deals with the equality of time averages and ensemble averages.
The problem of determining the properties of a random process by time averaging over a single member function of finite duration belongs to statistics and is covered in detail in Chapter 9. In the following sections, we will derive the conditions for time averages to be equal to ensemble averages. We will focus our attention on the mean, autocorrelation, and power spectral density functions of stationary random processes.

l.i.m._{T→∞} (μ_X)_T = μ_X

where l.i.m. stands for equality in the mean square sense, which requires

lim_{T→∞} E{(μ_X)_T} = μ_X

and

lim_{T→∞} var{(μ_X)_T} = 0

Using Equation 3.75.a, the variance of the time-averaged mean is

var{(μ_X)_T} = (1/T) ∫_{−T}^{T} (1 − |τ|/T) C_XX(τ) dτ
Although Equation 3.77 states the condition for ergodicity of the mean of X(t), it does not have much use in applications involving testing for ergodicity of the mean. In order to use Equation 3.77 to justify time averaging, we need prior knowledge of C_XX(τ). However, Equation 3.77 might be of use in some situations if only partial knowledge of C_XX(τ) is available. For example, if we know that |C_XX(τ)| decreases exponentially for large values of |τ|, then we can show that Equation 3.77 is satisfied and hence the process is ergodic in the mean.
The reader can show using Equations 3.73 and 3.75.a that

and

l.i.m._{T→∞} (R_XX(α))_T = R_XX(α)

if

which is also called the periodogram of the process. Note that the integral represents the finite Fourier transform; the magnitude of the Fourier transform squared is the energy spectral density function (Parseval's theorem); and 1/T is the conversion factor for going from energy spectrum to power spectrum.
Unfortunately, the time average (S_XX(f))_T does not converge to the ensemble average S_XX(f) as T → ∞. We will show in Chapter 9 that while
EXAMPLE 3.20.
For the stationary random process shown in Figure 3.9, find E{(μ_X)_T} and var{(μ_X)_T}. Is the process ergodic in the mean?

E{(μ_X)_T} = (1/6){5 + 3 + 1 − 1 − 3 − 5} = 0

Note that the variance of (μ_X)_T does not depend on T and it does not decrease as we increase T. Thus, the condition stated in Equation 3.77 is not met and the process is not ergodic in the mean. This is to be expected since a single member function of this process has only one amplitude, and it does not contain any of the other five amplitudes that X(t) can have.
EXAMPLE 3.21.
X(t) = 10 cos(100t + Θ)

SOLUTION:

(R_XX(τ))_T = (1/T) ∫_{−T/2}^{T/2} 100 cos(100t + Θ) cos(100t + 100τ + Θ) dt

           = (1/T) ∫_{−T/2}^{T/2} 50 cos(100τ) dt
             + (1/T) ∫_{−T/2}^{T/2} 50 cos(200t + 100τ + 2Θ) dt

As T → ∞ the second integral vanishes, and

(R_XX(τ))_T = 50 cos(100τ)
           = R_XX(τ)
Other Forms of Ergodicity. There are several other forms of ergodicity and
some of the important ones include the following:
Tests for Ergodicity. Conditions for ergodicity derived in the preceding sections
are in general of limited use in practical applications since they require prior
knowledge of parameters that are often not available. Except for certain simple
cases, it is usually very difficult to establish whether a random process meets
the conditions for the ergodicity of a particular parameter. In practice, we are
usually forced to consider the physical origin of the random process to make an
intuitive judgment of ergodicity.
For a process to be ergodic, each member function should "look" random, even though we view each member function to be an ordinary time signal. For example, if we consider the member functions of a random binary waveform, randomness is evident in each member function and it might be reasonable to expect the process to be at least weakly ergodic. On the other hand, each of the member functions of the random process shown in Figure 3.9 is a constant, and by observing one member function we learn nothing about other member functions of the process. Hence, for this process, time averaging will tell us nothing about the ensemble averages. Thus, intuitive justification of ergodicity boils down to deciding whether a single member function is a "truly random signal" whose variations along the time axis can be assumed to represent typical variations over the ensemble.
The comments given in the previous paragraph may seem somewhat circular,
and the reader may feel that the concept of ergodicity is on shaky ground.
However, we would like to point out that in many practical situations we are
forced to use models that are often hard to justify under rigorous examination.
Fortunately, for Gaussian random processes, which are extensively used in
a variety of applications, the test for ergodicity is very simple and is given below.
EXAMPLE 3.22.
∫_{−∞}^{∞} |C_XX(τ)| dτ < ∞

Hence, the process is ergodic in the mean, since

∫_{−∞}^{∞} |C_XX(τ)| dτ < ∞

With

V = (1/T) ∫_{−T}^{T} (1 − |τ|/T) C_ZZ(τ) dτ;  Z(t) = X(t)X(t + α)

where

C_ZZ(τ) = E{X(t)X(t + α)X(t + τ)X(t + α + τ)} − R²_XX(α)

we have

Since ∫_{−∞}^{∞} |R_XX(τ)| dτ < ∞, the upper bound approaches 0 as T → ∞, and hence the variance (V) of the time-averaged autocorrelation function → 0 as T → ∞. Thus, if the autocorrelation function is absolutely integrable, then the stationary Gaussian process is ergodic. Note that this is a sufficient (but not a necessary) condition for ergodicity. Also note that ∫_{−∞}^{∞} |C_XX(τ)| dτ < ∞ requires that
We have seen that a stationary random process can be described in the frequency domain by its power spectral density function, which is defined as the Fourier transform of the autocorrelation function of the process. In the case of deterministic signals, the expansion of a signal as a superposition of complex exponentials plays an important role in the study of linear systems. In the following discussion, we will examine the possibility of expressing a random process X(t) by a sum of exponentials or other orthogonal functions. Before we start our discussion, we would like to point out that each member function of a stationary random process has infinite energy and hence its ordinary Fourier transform does not exist.
We present three forms for expressing random processes in a series form, starting with the simple Fourier series expansion.

X̂(t) = Σ_{n=−N}^{N} C_X(nf₀) exp(j2πnf₀t)   (3.83.a)

where

C_X(nf₀) = (1/T₀) ∫_{−T₀/2}^{T₀/2} X(t) exp(−j2πnf₀t) dt   (3.83.b)
1. E{C_X(nf₀)C*_X(mf₀)} = 0, n ≠ m;

The rate of convergence, or how many terms should be included in the series expansion in order to provide an "accurate" representation of X(t), can be determined as follows: The MS difference between X(t) and the series X̂(t) is given by

E{|X(t) − X̂(t)|²} = E{|X(t) − Σ_{n=−N}^{N} C_X(nf₀) exp(j2πnf₀t)|²}
                  = E{|X(t)|²} − Σ_{n=−N}^{N} E{|C_X(nf₀)|²}

η_N = E{|X(t) − X̂(t)|²} / E{|X(t)|²}   (3.84)

can be used to measure the rate of convergence and the accuracy of the series representation as a function of N. As a rule of thumb, η_N is chosen to have a value less than 0.05, which implies that the series representation accounts for 95% of the normalized MS variation of X(t).
SPECTRAL DECOMPOSITION AND SERIES EXPANSION 187
N
1
X(t) = X C x ( n f 0)exp( J2Trnf 0t), (3.85)
n = - N /
where
1. E { C x ( nf 0) C $ ( m f 0)} = 0, m ^ n
2. E{\X(t)\2} = £{|AT(0|2} a s n- >=o
f ( n + l /2)f0
3. E{\Cx ( nf 0)\2} = Sxx( f ) d f
J(«- I/2)/0
4. Sa(f) = £ E{ICx ( n f 0) m f - n f 0)
„ f {n +■1/2)/0
= 2 X ^ (/)[1 - cos 2-nt(f - nfo)] d f
n=-°° *'(n-l/2)/0
6. For any finite value o f N, the MS error is the shaded area shown in
Figure 3.25.
The proofs of some of these statements are rather lengthy and the reader is
referred to Section 13-2 o f the first edition o f [9] for details.
'772
72 T
Rxx(t ~ t )4 ( t ) dr = A<j>(r), |r| < - (3.87)
772 2
The solution yields a set o f eigenvalues A.] > A2 •> A3, . . . , and eigenfunctions
4i(0> 4*2( 0 * 4*3( 7), . . . , and the K-L expansion is written in terms of the
eigenfunctions as
m = 2 A nu t ) , k! < ? (3.88)
n =1
where
(3.89)
SAMPLING AND QUANTIZATION OF RANDOM SIGNALS 189
1. l.i.m. X ( t ) = X(t)
CTI 2
m = n
2. 4>„(0<t>£(0 dt =
J—772 ■ f r
m — n
3. E{A„A*J = jj" ’
m ¥= n
<0
4. E{X\t)} = Rx x (0) = s
n= 1
£ K
n= N + 1
5. Normalized MSE =
The main difficulty in the use of Karhunen-Loeve expansion lies in finding the
eigenfunctions of the random process. While much progress has been made in
developing computational algorithms for solving integral equations of the type
given in Equation 3.87, the computational burden is still a limiting factor in the
application of the K-L series expansion.
Bxx(f)
where
jj 1
27” B
and
1 rb
Cx {nTs) = — j ^ Sx x ( f ) e x p ( - j 2 T t n f T s) d f (3.91)
Rxx(‘) — F 1 | 2 Cx (nTs)exp(j2-unfTs)\
i.n= j
- 2 ( [ ° Sx x ( f 2) e x p ( - j 2 - n n f 2Ts) d f 2
n= - * J - B ’ £ & 0 J ~ B 0
x 'exp[/2'ir/I(T + nTs)\ d f r
= Y r 1 - n T '*—L sin 2tt5 '( t + nTs)
Rxx{ t ) = 2 BTS ^ R x x (n T s)
It is convenient to state two other versions o f Equation 3.92 for use in deriving
the sampling theorem for lowpass random signals. With a an arbitrary constant,
the transform of R Xx {T ~ n) is equal to Sx x ( f ) e x p ( —j 2n f a) . This function is
also lowpass, and hence Equation 3.92 can be applied to R xx{ t - a) as
where
Sin 7TX
sine x
We now state and prove the sampling theorem for band-limited random pro
cesses.
L
192 RANDOM PROCESSES A N D SEQUENCES
£ {[ -^ ( 0 - -^ v (0 ]2} = 0 as N c o (3.96)
Now
E { [ X ( t ) - Z ( 0 ] 2} = E { [ X ( t ) - * ( 0 ] * ( t ) }
- £ { [ * ( 0 - * ( /)]* (* )} (3.97)
The first term on the right-hand side o f the previous equation may be written
as
E{[X{t) - X (t)]X(t)}
and hence
E{[X(t) - X (t) ]X (t )}
Now
E { [ * ( 0 - X ( t ) ] X ( m T , )}
co
= Rxx(t - mTs) - Y, 2BTsRxx{nTs - mTs)sinc[2B(t - nTs)\
n = —«
Hence
Substitution of Equations 3.98 and 3.99 in Equation 3.97 completes the proof
of the uniform sampling theorem.
The sampling theorem permits us to store, transmit, and process the sequence
X {n Ts) rather than the continuous time signal X( t ) , as long as the samples are
taken at intervals less than 1/(2B ). The minimum sampling rate is 2B and is
called the Nyquist rate. If X ( t ) is sampled at rates lower than 2 B samples/second,
then we cannot reconstruct X (t ) from X ( n Ts) due to “ aliasing,” which is ex
plained next.
Aliasing Effect. To examine the aliasing effect, let us define the sampling
operation as
where D is a random variable with a uniform distribution in the interval [0, 7)],
and D is independent o f X(t). The product 3^.(0 = X(t) ■ S(t) as shown in
194 RANDOM PROCESSES AND SEQUENCES
(a)
---------------- X U )
/
\ *
(6) 1
S it)
" ■ r r r n i i f i n T i -
D Z)
(c)
X s(*) = XU)SU)
- x f f f T K
X > N u T, ‘
\ ,
__ z
(4)
-B B f
(e)
A I T ,2
:t s : -f,
v
-B C B
7 V
A
. > 2B
'
v>
A /T s2 A <2B
... X , X X X , X , X ...
X \ I X \ [ X \ \ / \ X \
-2 7 , -f. 0 / A, If,
Aliasing
Figure 3.27 Sampling operation.
Following the derivation in Section 3.6.5, the reader can show that the auto
correlation function o f X s(t) is given by
= i 7 R*x(kTs) 8( t - kTs)
k =-* l s
The last step results from one of the properties of delta functions. Taking the
Fourier transform of Rx,x,(t) we obtain
sx,xXf) = y, S x A f ) * 8 (t ~
*■ { £ S(t - * r , ) } = ^ £ 8( / - k fs)
The preceding equation shows that the psd of the sampled version X s(t) of X(t)
consists of replicates of the original spectrum Sxx( f ) with a replication rate of
f s. For a band-limited process X(t), the psd of X(t) and X s(t) are shown in
Figure 3.27 for two sampling rates f s > 2B and f, < 2B.
When f s > 2B or Ts < 1 /(2 S ), SXlx,{f) contains the original spectrum of
X{t ) intact and recovery of X(t') from X s(t) is possible. But when f s < 2B,
replicates of Sxx( f) overlap and the psd of X j t ) does not bear much resemblance
to the psd of X { t ) . This is called the aliasing effect, and it often prevents us from
reconstructing X(t) from X s(t) with the required accuracy.
When / , > (2J5), we have shown that X(t) can be reconstructed in the time
domain from samples of X(t) according to Equation 3.95. Examination of Figure
3.27e shows that if we select only that portion of Sx x f f ) that lies in the interval
[ —13, 13], we can recover the psd of X(t). This selection can be accomplished
in the frequency domain by an operation known as “ lowpass filtering,” which
196 RANDOM PROCESSES AND SEQUENCES
will be discussed in Chapter 4. Indeed, Equation 3.95 is the time domain equiv
alent of lowpass filtering in the frequency domain.
3.10.2 Quantization
The instantaneous value of a continuous amplitude (analog) random process
X{t) is a continuous random variable. If the instantaneous values are to be
processed digitally, then the continuous random variable X , which can have an
uncountably infinite number of possible values, has to be represented by a
discrete random variable with a finite number of values. For example, if the
instantaneous value is sampled by a 4-bit analog-to-digital converter, then X is
approximated at the output by a discrete random variable with one of 24 possible
values. We now develop procedures for quantizing or approximating a contin
uous random variable X by a discrete random variable X q. The device that
performs this operation is referred to as a quantizer or analog-to-digital con
verter.
An example of the quantizing operation is shown in Figure 3.28. The input
to the quantizer is a random process X(t), and we will assume that the random
signal X(t) is sampled at an appropriate rate and the sample values X(kTs) are
converted to one of Q allowable levels, mb m2, . . . , mQ, according to some
predetermined rule:
We see from Figure 3.28 that the quantized signal is an approximation to the
original signal. The quality o f the approximation can be improved by increasing
the number o f quantizing levels Q and for fixed Q by a careful choice o f x-’s
and m^s such that some measure of performance is optimized. The measure of
performance that is most commonly used for evaluating the performance of a
quantizing scheme is the normalized MS error,
2 = E m m - x a{kTs)f\
62 E{X\{kTs)}
mean, stationary random process with a pdf We will use the abbreviated
notation, X to denote X{kTs) and X q to denote X q{kTs). The problem of quan
tizing consists of approximating the continuous random variable ATby a discrete
random variable X q such that E{ ( X - X q)2} is minimized.
(b ~ a)
A = (3.102. a)
Q
198 RANDOM PROCESSES AND SEQUENCES
where
Xi = a + iA (3.102.c)
and
x ,_ , + X i
m = ---------------- ;
2 (3.102.d)
NQ = E { { x - x qy }
= [ (x ~ xq)2fx(x) dx
Ja
= 2 [ (x - mi)2f x( x) dx (3.103.a)
;=i V i
Sq = E{{ Xqf }
S, rxi
= 2 (w <)2 / * 0 ) dx (3.103.b)
i= I Jx, .
SAMPLING AND QUANTIZATION OF RANDOM SIGNALS 199
The ratio NQ/SQ is e| and it gives us a measure of the MS error o f the uniform
quantizer. This ratio can be computed if the pdf o f A'is known.
EXAMPLE 3.23.
The input to a Q-step uniform quantizer has a('uniform pdf over the interval
[ —a, a]. Calculate the normalized MS error as a function o f the number o f
quantizer levels.
N,
-i i~ 1 J - a + 0-1
(i )4 \
x + a -
aV i
iA + - I ~ d x
2 / 2a
4 m 12/
QA3 A2
since QA = 2a
( 2 a) 12 12
Now, the output signal power SQ can be obtained using Equation 3.103.b 2as
SQ = 4 , ( " 0
,= 1 \2a
Q2 - l
(A)2
12
No _ 1 1
(3.104)
sq Q2 - i Q2 w hen ^ > ;> 1
Equation 3.104 can be used to determine the number of quantizer levels needed
for a given application. In quantizing audio and video signals the ratio NQ/SQ
is kept lower than 10“ 4, which requires that Q be greater than 100. It is a common
practice to use 7-bit A /D converters (128 levels) to quantize voice and video
signals.
200 RANDOM PROCESSES AND SEQUENCES
X q = m, if < X :£ x h i = 1, 2, . . . , Q
X 0 = X q = °o (3.105)
The step size A, = x, — is variable. The quantizer end points x.-’s and the
output levels m^s are chosen to minimize NQ/SQ.
The design of an optimum nonuniform quantizer can be approached as fol
lows. We are given a continuous random variable X with a pdf f x (x). We want
to approximate X b y a discrete random variable X q according to Equation 3.105.
The quantizing intervals and the levels are to be chosen such that NQ is mini
mized. This minimizing can be done as follows. We start with
‘ After finding all the x ’ s and m ’s that satisfy the necessary conditions, we may evaluate N a at these
points to find a set o f j:,’s and m ’ s that yield the absolute minimum value o f N Q . In most practical
cases we will get a unique solution for Equations 3.106.a and 3.106.b.
SAMPLING AND QUANTIZATION OF RANDOM SIGNALS 201
equal to zero:
3Nq
= (xj - mj)2f x ( x - (Xj - m/+l)2f x (Xj) = 0
dXj
j = 1, 2 , . . . , Q - 1 (3.106.a)
Xj = ^ (m; + mj+i)
which implies that is the centroid (or mean) of the /th quantizer interval.
The foregoing set of simultaneous equations cannot be solved in closed form
for an arbitrary f x (x). For a specific f x(x), a method of solving Equations 3.107.a
and 3.107.b is to pick m t and calculate the succeeding jc. ’ s and m ’s using Equa
tions 3.107.a and 3.107.b. If m, is chosen correctly, then at the end of the
iteration, me will be the mean o f the interval °°]. If mQ is not the centroid
or the mean of the Qth interval, then a different choice of mx is made and the
procedure is repeated until a suitable set of x' s and m- s is reached. A computer
program to solve for the quantizing intervals and the means by this iterative
method can be written.
Quantizer for a Gaussian Random Variable. The end points o f the quantizer
intervals and the output levels for a Gaussian random variable have been com
puted by J. Max [15]. Attempts have also been made to determine the functional
dependence of N q on the number of levels Q. For a Gaussian random variable
with a variance o f 1, Max has found that N q is related to Q by
202 RANDOM PROCESSES AND SEQUENCES
N q ~ ( 2 . 2 M Q - 1-96 (3.108)
-1 .9 6
(3.109)
Equation 3.109 can be used to determine the number of quantizer levels needed
to achieve a given normalized mean-squared error for a zero-mean Gaussian
random process.
3.11 SUMMARY
In this chapter, we introduced the concept o f random processes, which may
be viewed as an extension of the concept o f random variables. A random
process maps outcomes o f a random experiment to functions of time and is a
useful model for both signals and noise. For many engineering applications, a
random process can be characterized by first-order and second-order proba
bility distribution functions, or perhaps just the mean and variance and auto
correlation function. For stationary random processes, the mean and
autocorrelation functions are often used to describe the time domain structure
of the process in an average or ensemble sense. The Fourier transform of the
autocorrelation function, called the power spectral density function, provides
a frequency domain description of the random processes.
3.12 REFERENCES
[3] W. B. Davenport, Jr. and W. L. Root, Introduction to Random Signals and Noise,
McGraw-Hill, New York, 1958.
[4] W. A. Gardner, Introduction to Random Processes: With Applications to Signals
and Systems, Macmillan, New York, 1986.
[8] N. Mohanty, Random Signals, Estimation and Identification, Van Nostrand, New
York, 1986.
[9] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-
Hill, New York, 1965, 1984.
[10] A. Papoulis, Signal Analysis, McGraw-Hill, New York, 1977.
[11] P. J. Peebles, Probability, Random Variables and Random Signal Principles, 2nd
ed., McGraw-Hill, New York, 1986.
[12] M. O’Flynn, Probabilities, Random Variables and Random Processes, Harper and
Row, New York, 1982.
[13] M. Schwartz and L. Shaw, Signal Processing: Discrete Spectral Analysis, Detection
and Estimation, McGraw-Hill, New York, 1975.
3.14 PROBLEMS
-2 k = 1
-1 k = 2
1 k = 3
X(t) =
2 k = 4
t k = 5
-t k = 6
3.2 The member functions of two random processes X(t ) and Y(t) are shown
in Figure 3.31. Assume that the member functions have equal probabilities
of occurrence.
3.3 X{t) is a Gaussian random process with mean p.jr(f) and autocorrelation
function Rxxi k, k)- Find £ {A '(f 2)|A'(fi)}, tx < t2.
3.5 For a Markov process X(t) show that, for t > tx > f0>
/x(/)|v(/,l)('t|'to) = /tr(i)|A-(!1)('t|'':i)/A-(/1)|tr(r„)(a:i|^o)
J dx\
a. P[ X( 2) = 0]
b. P [ X( 8) = 0|*(6) = 2]
c. £ { * ( 10 )}
d. £ {*(1 0 )| * (4 ) = 4}
206 RANDOM PROCESSES AND SEQUENCES
P [ X ( n ) = 1] = P [ X ( n ) = - 1 ] = |
Y{n) = 2 X(k), n = 1, 2, 3, . . .
k= 1
3.10 Let N(t), f > 0 be the Poisson process with parameter X, and define
x{t) = f 1 1 !? M 0 isod d
v’ [- 1 if N{t) is even
3.11 X{t) is a real WSS random process with an autocorrelation function ft.o-(T)-
Prove the following:
a. If X(t ) has periodic components, then R xx (j ) will also have pe
riodic components.
b. If Rxx(0) < oo, and if R xx{t) is continuous at t = 0, then it is
continuous for every t.
3.12 X(t) and Y(f) are real random processes that are jointly WSS. Prove the
following:
3.13 X(t ) and Y{t) are independent WSS random processes with zero means.
Find the autocorrelation function of Z(t) when
a. Z{t) = a + bX{t) + cY(t)
b. Z(t) = aX(t)Y(t)
3.14 X(t ) is a WSS process and let Y(t ) = X ( t + a) - X ( t - a). Show that
3.16 Determine whether the following functions can be power spectral density
functions of real-valued WSS random processes.
a. (1 + 10/2)" 1/2
sin 1000/-
b' 1000/
c. 50 + 205(/ - 1000)
d. 1 0 8 (f) + 55(/ + 500) + 55(/ - 500)
e. exp( - 200-rr/2)
f f ....-
i f 2 + 100)
3.17 For each of the autocorrelation functions below, find the power spectral
density function.
a. exp( —<i |t|), a > 0
sin 1000 t
’ 1000 t
3.18 For each o f the power spectral density functions given below, find the
autocorrelation function.
b. 1 /( 1 + 4it2/ 2)2
c. 1008(/) + 2ci/(ci2 + 4it2/ 2)
3.20 The psd o f a WSS random process X(t ) is shown in Figure 3.32.
a. Find the power in the DC term.
b. Find £ { Z 2(t)}.
c. Find the power in the frequency range [0, 100 Hz].
1006(f)
f
-1 0 0 0 0 1000
Figure 3.32 Psd of X( c) for Problem 3.20.
PROBLEMS 209
3.23 For the random process X(t) with the psd’s shown in Figure 3.33, deter
mine
a. The effective bandwidth, and
b. The rms bandwidth which is defined as
f f S x x ( f ) df
BL
f Sxx(f) df
[Note: The rms bandwidth exists only if SXX(J) decays faster than 1I f ]
4 f (/ —f o ) 2S x x ( f ) df
Jo
BL
f~ S x x ( f ) d f
Jo
r / s XX(d df
Jo
r s xx( f) df
Jo
Find the rms bandwidth of
A
Sxx(f) A, B , f a > 0
210 RANDOM PROCESSES AND SEQUENCES
Sxx^P Syyif)
//
//
Area = o Y2 //
By « By
~Bx 0 &x
■f JL
-By 0 By
3.26 X{t) and Y(t) are two independent WSS random processes with the power
spectral density functions shown in Figure 3.34. Let Z(t) = A '(f)F (t).
Sketch the psd of Z (f), and find Szz(0).
Milliseconds
Figure 3.35 Autocorrelation function for Problem 3.27.
PROBLEMS 211
3.27 A WSS random process X{t) has a mean of 2 volts, a periodic component
^ ( f ) , and a random component A r( /) ‘, that is, A (r) = 2 4- X p(t) 4- X r(t).
The autocorrelation function o f X(t) is given in Figure 3.35.
3.30 Show that for a lowpass process with a bandwidth B , the amount of change
from t to t 4- t is bounded by
a. EUA'ft 4- t) - A'UlR =£ (2 »S f)2K + p.2vl
b. Rxx{0) - tf.w(T) < (2TTBr):Rxxm/2
3.31 X (r) and Y(t) are two independent WSS processes that are MS continuous.
a. Show that the sum A(r) 4- Y(t) is MS continuous.
b. Show that the product X(t)Y(t) is also MS continuous.
3.32 Show that both MS differentiation and integration obey the following rules
of calculus:
a. Differentiating and integrating linear combinations,
b. Differentiating and integrating products of independent random
processes.
3.33 Show that the sufficient condition for the existence of the MS integral of
a stationary finite variance process X(t) is the existence of the integral
3.35 Let Z (t) = x(f) + Y (f) where x(f) is a deterministic, periodic power signal
with a period T and Y(t) is a zero mean ergodic random process. Find the
autocorrelation function and also the psd function of Z{t) using time av
erages.
3.36 X(t) is a random binary waveform with a bit rate of IIT, and let
Y(t) = X { t ) X { t - 772)
a. Show that Y(r) can be written as Y{t) = v(t) ■+■ W(t) where v(t)
is a periodic deterministic signal and W(t) is a random binary waveform
of the form
b. Find the psd o f Y(r) and show that it has discrete frequency spectral
components.
3.37 Consider the problem o f estimating the unknown value of a constant signal
by observing and processing a noisy version of the signal for T seconds.
Let X ( t ) = c + N(t) where c is the unknown signal value (which is assumed
to remain constant), and N(t) is a zero-mean stationary Gaussian random
process with a psd SNN(f) - N0 for |/| < B and zero elsewhere ( B »
1IT). The estimate of c is the time-averaged value
b. Find the value of T such that P{\c — c| < 0.1c} > 0.999. (Express
T in terms of c, B, and Na.)
3.38 Give an example of a random process that is WSS but not ergodic in mean.
Rxxi'f) = 1 0e xp (-|T |)
Show that X ( t ) is ergodic in the mean and autocorrelation function.
b. Show that
E{(Sxx( f ))T} = Sxx(f), as T —» 00
and
V a r {(5 ^ (/))r} > [E{(Sxx( f ) ) T}]2
= Jj i * (0
and
( Rxx(k))N = ^ £ X ( i ) X ( i + * )
3.42 Prove the properties of the Fourier series expansion given in section 3.9.1
and 3.9.2.
a. Show that the basis vectors V[, v2, . . . , v „ are the eigenvectors of
Xx corresponding to X.b \ 2, . . . , \m, respectively.
b. Show that the coefficients A t are random variables and that A t =
X Tv,.
c. Find the mean squared error.
3.44 Suppose we want to sample the random processes whose power spectral
densities are shown in Figure 3.33. Find a suitable sampling rate using
the constraint that the ratio of Sxx(0) to the aliased spectral component
at / = 0 has to be greater than 100.
3.45 Show that a WSS bandpass random process can also be represented by
sampled values. Establish a relationship between the bandwidth B and the
minimum sampling rate.
In many cases, physical systems are modeled as lumped, linear, time invariant
(LLTIV), and causal, and their dynamic behavior is described by linear differ
ential or difference, equations with constant .coefficients. The response (i.e., the
output) of a LTIV (lumped is not a requirement if the impulse response is known)
system driven by a deterministic input signal can be computed in the time domain
via the convolution integral or in the transform domain via Fourier, Laplace,
or Z transforms. Although the analysis of LTIV systems follows a rather standard
and unified approach, such is not the case when the system is nonlinear or time
varying. Here, a variety of numerical techniques are used and the specific ap
proach used will be highly problem dependent.
In this chapter, we develop techniques for calculating the response of linear
systems driven by random input signals. Regardless of whether or not the system
is linear, for each member function x(t) o f the input process W(r), the system
produces an output y(t) and the ensemble of output functions form a random
process Y(i), which is the response of the system to the random input signal
X(t). Given a description of the input process X(t ) and a description of the
system, we want to obtain the properties of T(r) such as the mean, autocorre
lation function, and at least some of the lower order probability distribution
functions of Y(t). In most cases we will obtain just the mean and autocorrelation
function. Only in some special cases will we want to (and be able to) determine
the probability distribution functions.
We will show that the determination of the response of a LTIV system
responding to a random input is rather straightforward. However, the problem
of determining the output of a nonlinear system responding to a random input
signal is very difficult except in some special cases. No general tractable analytical
216 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
H (0 = / M O ; - 30 < 1 < ” ]
and
/ M i (0 + a2x 2(t)] = a , / M O ] + ar / M 0 ]
(i.e., superposition applies).
3. Time Invariant. I f y ( 0 = /W O L then
(i.e., a time shift in the input results in a corresponding time shift in the
output).
4. Causal. The value of the output at t = t0 depends only on the past
values of the input x(t), t < r0, that is
y(t0) = f [x ( t) ; t < t0], -o c < t, to < «=
Almost all o f the systems analyzed in this chapter will be linear, time invariant,
and causal (LTIVC). An exception is the memoryless systems discussed in the
next subsection.
y( 0 = 2 M ( 0 ,
where the a,’ s are -known constants. Such systems can be analyzed using the
techniques o f Section 2.6, as illustrated by the following example.
EXAMPLE 4.1.
MO = o
M M = exp( - |TI)
Y(t) = A 0 0
SO L U T IO N :
218 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
Because E { X 2(t)}
* = Rxx(0)
M O = M -(O ) = 1
W i)r (0 ]
2'rrV'l — exp{ —2\tl — f3|}
E{X\X\] = E{X\}E{X\} + 2E 2{ X lX 2]
Thus
where t = t2 - r,.
N N
2 amy[(m + n)Ts] = 2 bmx[(m + n)Ts] (4.2)
m=0 ;n=0
where x(kTs) and y( kTs) are the input and output sequences, Ts is the time
between samples, and the a,’s and £>,’s are known constants. We will assume x,
y, a,’s, and £>,•’s to be real-valued and set Ts = 1. The last assumption is equivalent
RESPONSE OF LTIVC DISCRETE TIME SYSTEMS 219
= Yj h(m)x(n - m) (4.3.b)
m =-»
2 \h(k)\ < °°
fc=0
The Fourier transform of the unit pulse response is called the transfer func
tion, H ( f ) , and is
where / i s the frequency variable. The unit pulse response can be obtained from
H ( f ) by taking the inverse transform, which is defined as
h(n) = F ~ l { / / ( / ) } = f H( f) ex p (j 2i mf ) df (4.4.b)
J- 1/2
If we assume that the Fourier transforms of x(n) and y(n) exist and are called
X F( f ) and YF( f ) , respectively, then the input-output relationship can be ex-
220 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
If the system is stable, then the order of summation can be interchanged, and
we have
2 h{m) X Fif)Qxp{-j2Tsmf)
YFi f ) = X Fi f ) H i f ) (4.5.a)
x z(z) = 2 xin)z~"
*=0
Note that there are two significant differences between the Z transform and the
Fourier transform. The Z transform is applicable when the sequence is defined
on the nonnegative integers, and with this restriction the two transforms are
RESPONSE OF LTIVC DISCRETE TIME SYSTEMS 221
Note that Y(n) represents a random sequence, where each member function is
subject to Equation 4.3.
The mean and the autocorrelation of the output can be calculated by taking
the expected values
and
Pr = £ { K ( « ) } = ^ = ^ X h{m)
m=--»
= PxH{Q) (4.8.a)
and
R YY{nu «2) - X 2 h ( m x) h { m 2)
m\~ -» m2-
(4 8 .b)
x Rxx[{n2 - n ,) - (m2 - m x)]
The ripht 1 sh? wsf^ t the mean of Y does not depend on the time index
I " / I6 Equation 4.8.b depends only on the difference of and
of a TTIVT 6 Rt y(n-’ » l c lU be 3 functi0n of "2 ~ n t. Thus, the output Y[n)
of a LTIVC system is WSS when the input X(n) is WSS.
stationary
stationary (SSS), then the ^output ^will inpUt
also bel° SSS.
a L T IV C system is strict-sense
The assumption that we made earlier about zero initial conditions has an
londTi beaflnn ° n the Stationarity of the outPut- K we have nonzero initial
Rvx(k) = E{ Y{ n) X{ n + k)}
= E Yj h(m)X(n - m) X( n + k)
= X h( m) E{ X( n - m) X( n + k)}
= X h(m)Rxx(k + m) = £ h( —n)RXx{k - n)
n = - X
or
and hence
we have
Syy(f) = F{RyY(k)}
= F{Rxx(k) * h(k) * h(- k)}
224 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
Since the Fourier transform of the convolution o f two time sequences is the
product of their transforms, we have
Syy(f) = F{Rx x ( k ) } F { h ( k ) } F { h ( - k ) }
= Sxx( f ) H ( f ) H ( - f )
= Sxx( f ) H( f ) H*( f ) _ ----- ,
= Sx x ( f) \ H ( f ) \ 2 C (4 . 1 1 .b)
Equation 4.11.b is the basis o f frequency domain techniques for the design of
LTIVC systems. It shows that the spectral properties of a signal can be modified
by passing it through a LTIVC system with the appropriate transfer function.
By carefully choosing H ( f ) we can remove or filter out certain spectral com
ponents in the input. For example, suppose we have X {n ) = S(n) + N(n),
where S(n) is a signal of interest and N(n) is an unwanted noise process. Then,
if the psd of S(n) and N(n) are nonoverlapping in the frequency domain, the
noise N(n) can be removed by passing X (n ) through a filter H ( f ) that has a
response of 1 for the range of frequencies occupied by the signal and a response
of 0 for the range of frequencies occupied by the noise. Unfortunately, in most
practical situations there is spectral overlap and the design o f optimum filters
to separate signal and noise is somewhat difficult. We will discuss this problem
in some detail in Chapter 7. Also note that if X(n) is a zero-mean white noise
sequence, then SYY( f ) = cr2\H(f)\2, and RXY{k) = a 2h(k). Thus, white noise
might be used to determine h(k) for a linear time-invariant system.
From the definition of the Z transform, it follows that
7U[exp(/2rr/)] = H ( f )
Defining
= S U z ) H ( z ) H ( z ~') (4.12.b)
RESPONSE OF LTIVC DISCRETE TIME SYSTEMS 225
EXAMPLE 4.2.
and
for k = 0
for k 5^ 0
for k = 0 , 1
for k > 1
Find the mean, the autocorrelation function, and the power spectral density
function of the output Y(n).
SOLUTION:
To find R YY(k), let us first find SYY( f ) from Equation 4.1 l.b. We are given that
H(f) = 2 h(k)exp( —j l v k f )
= 1 + e x p (-/2 -it/ )
and
Sx x ( f ) = F{Rx x (k)}
T < \
Hence
flry(O) = 2
f ly y (± l) = 1
^yy(^) = 0 , \k\ > 1
EXAMPLE 4.3.
The input X(n) to a certain digital filter is a zero-mean white noise sequence
with variance a 2. The digital filter is described by
a0 + QiZ 1* + a2z ~2
Hz(z)
1 + b lZ~l
If the filter output is Y(n), find S f y(z), and the power spectral density of
Y in the normalized frequency domain.
a~, n = 0
P.v — 0 ; Rxx(n)
0 elsewhere
P-y = 0( / / ( 0)) = 0
l/l<
= J h ( j ) x ( t — t ) (iT (4.13.b)
where /i(t) is the impulse response of the system and we assume zero initial
conditions. For a stable causal system
f | h (t ) | dr < =o
and
h (r ) = 0, t < 0
YF( f ) = H ( f ) X F( f ) (4.14)
228 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
and y(t) is obtained by taking the inverse Fourier transform of YF( f). The
forward and inverse transforms are defined as
Y fU ) = J y ( 0 exp ( - / 2^ f 0 dt
Note that the frequency variable /ranges from —oo to <» in the continuous time
case.
When the input to the system is a random process X(t), the resulting output
process Y(t) is given by
X ( j ) h{t — t ) dj (4.15.b)
Note that Equation 4.15 implies that each member function o f X(t) produces a
member function of Y(t) according to Equation 4.13.
As with discrete time inputs, distribution functions of the process Y{t) are
very difficult to obtain except for the Gaussian case in which Y(t) is Gaussian
when A (r) is Gaussian. Rather than attempting to obtain a complete description
of Y(t), we settle for a less complete description of the output than we have for
deterministic problems. In most cases with random inputs, we find the mean,
autocorrelation function, spectral density function, and mean-square value of
the output process.
E { Y ( t )} = E ( X(t - t ) / i ( t ) dj
= E{X(t - t )}/ z (t ) di
and
Y(t) = X{t - t ) / i (t ) dx
and
T (f + e) = * ( f + e - t ) / i (t ) dx
Now, if the processes X{t ) and X(t + e) have the same distributions [i.e., X(t)
is strict-sense stationary] then the same is true for Y(t) and Y(t + e) and hence
F(r) is strict-sense stationary.
If X(t) is WSS, then p ^ f ) does not depend on t and we have from Equation
4.16
E{Y{t ) } = J \xx h{ x) dx
= Ex f h ( j ) dx = \xx H ( 0 ) (4.18)
Thus, the mean of the output does not depend on time. The autocorrelation
function of the output given in Equation 4.17 becomes
Since the integral depends only on the time difference t2 - tu R YY(t,, t2) will
also be a function of the difference t2 - f,. This coupled with the fact that p y
230 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
is a constant establishes that the output process Y(t) is WSS if the input process
X(t) is WSS.
R y x ( t) = R x x { t) * h ( - t) (4.20 .a)
and
R y y (t ) = R y x (t) * h ( t) (4.21)
= Rxx( t) * /i ( t) * h ( - T ) (4.22)
Equation 4.23, which is o f the same form as Equation 4.1 l.b, is a very important
relationship in the frequency domain analysis of systems that are driven by
random input signals. This equation shows that an input spectral component at
frequency / is modified according to \H(f)\2, which is sometimes referred to
as the power transfer function. By choosing H ( f ) appropriately, we can em
phasize or reject selected spectral components of the input signal. Such oper
ations are referred to as “ filtering.”
Note that in the sinusoidal steady-state analysis of electrical circuits we use
an input voltage (or current) of the form
x(t) = ztsin(2Tr/V)
as the input to the system and write the output voltage (or current) as
Power Spectra] Density Function. The definition of psd given in Equation 3.43
can now be justified using Equation 4.23. If we have an ideal bandpass filter
which is defined by
E [ Y \ t ) } = 2 ff: Sx x ( f ) df
Because the average power of the output Y(t) of the ideal bandpass filter is the
integral of the power spectral density between —/ , and —f x and between / ,
and f 2, we say that the power of X(t) between the frequencies/! and / 2 is given
by Equation 3.43. Thus, we naturally call Sxx( f ) the power spectral density
function.
The foregoing development also shows, because £ [K J(f)] a 0, that
Sxxif) s 0 for all/.
EXAMPLE 4.4.
X ( t ) is the input voltage to the system shown in Figure 4.1, and Y ( t ) is the
output voltage. X ( t ) is a stationary random process with p.-V = 0 and R xx { t) =
exp( —a j t |). Find p.r , SYY{ f ) , and RYY{t).
F ig u r e 4 .1 C ir c u it f o r E x a m p le 4 .1 .
232 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
R
H(f) = R + j2-nfL
Also
a 2 + ( 2 tr/ ) 2
M-v = 0
2a R2
S yyU ) = a 2 + (2 tt / ) 2 R 2 + (2tr /L )2
Lj
R yy( t) exp( - a| t |) + exp
EXAMPLE 4.5.
HU) = i ^ f
SOLUTION:
Sx'x'U) = (2trf)2Sxx(f)
RESPONSE OF LTIVC CONTINUOUS-TIME SYSTEMS 233
and
which agrees with Equation 3.60. Note that differentiation results in the mul
tiplication of the spectral power density b y /2. If X(t) is noise, then differentiation
greatly magnifies the noise at higher frequencies and provides a theoretical
explanation for the practical result that it is impossible to build a differentiator
that is not “ noisy.”
= 0 elsewhere
K (0 = AT(f) * h(t)
Y ( t ) = j . I ' ^ X ( j ) ch
SOLUTION:
exp( ~ / 2 t: / t ) d- = exp ( - / tt / T )
thus
This result demonstrates, if X{t) is noise, how the higher frequency noise is
reduced by integration.
The mean square value of the output, which is a measure o f the average value
of the output “ power,” is o f interest in many applications. The mean-square
value is given by
E { Y -( t ) } — R yy(0) — f SYY( f ) df
J —■x.
= df (4.24)
Except in some simple cases, the evaluation o f the preceding integral is somewhat
difficult.
If we make the assumption that Rxx( t) can be expressed as a sum of complex
exponentials [i.e., Sxx(f) is a rational function o f / ] , then the evaluation of the
integral can be simplified. Since Sxx( f ) is an even function, we can make a
transformation s = 2irjf and factor S.xx(f) as
^ , (_ L ) =
V - v j/ b ( s ) b ( ~ s )
where a(s)/b(s) has all of its poles and zeros (roots) in the left-half o f the s--
plane and a ( - s ) / b ( - s ) has all of its roots in the right half plane. No roots of
o(s) are permitted on the imaginary axis. We can factor [ / / ( / ) 12 in a similar
fashion and write Equation 4.24 as
J _ P° c 0 ) c ( - ^ )
2 ny J_y„ d ( s ) d ( - s ) (4.25)
where
Sx x if ) \f^ i = TFTJpX
d(s)d(-s)
RESPONSE OF LTiVC CONTINUOUS-TIME SYSTEMS 235
2 d 0 d,
^ _ c ,d 0 + C qd 2
2 2d0d,d2
c l( - d 2
ad3 + d0d,d2) + ( c l - 2 c , c 3) d 0d , d 4
/„ = + ~ 2c0c2)dad2dt + c j{ - d , d j + d2d2dd
*____ 2 dad A ~ d 0d 2 - d 2,d t + d ,d 2d3)
and c ( » and if 0 ) contain the left-half plane roots of 5 KK. Values of integrals of
the form given in Equation 4.25 have been tabulated in many books and an
abbreviated table is given in Table 4.1.
We now present an example on the use o f these tabulated values.
EXAMPLE 4.7.
1
H{f)
1 + / ( / / 1000)
$ x x ( f ) = 10 12 watt/Hz
SOLUTION:
1 1
Syy( f ) = 10-
1 + /(//1 0 0 0 ) ' 1 - /(//1 0 0 0 )
Transforming to the s-domain with s = 2irjf, we can write the integral for
E {Y 1{/)} using Equation 4.25 as
io- io-
E { Y 2{t)} = J - f - ds
2 -tt/ J _/=0 1 + (s/2000-ir) 1 - (s/2000it)
EXAMPLE 4.8.
F (0 = x A ( k ) p ( t - k Ts - D)
k= -x
where p{t) is an arbitrary deterministic pulse of known shape and duration less
than Ts, D is a random variable with a uniform distribution in the interval
[0, Ts], and A { k ) is a stationary random sequence (see Figure 4.2 for an example
where A (k ) is binary). Find the psd of Y(t) in terms of the autocorrelation (and
psd) function of A (A:) and Pf (f ) .
X(t) = X A ( k ) H t - kTs - D)
k= -x
The only difference between Y(t) and X(t ) is the “ pulse” shape and it is easy
to see that if X{t) is passed through a linear time invariant system that converts
each impulse §(r) into a pulse p(t), the resulting output will be Y(l). Such a
system will have an impulse response o f p(t) and we can write
n o = x { t ) * P{t)
RESPONSE OF LTIVC CONTINUOUS-TIME SYSTEMS 237
t
-Ts Ts
m
KTCy
, X( t)
... r n ^ n ...
i 1 , 1 1 1 i i i
i 1 1 1 1 1
J
'1
(c)
Note: *
£
0
LTIV system
XU)
h( t )=p( t)
yu)
(U )
F ig u r e 4 .2 R e la t io n s h ip b e t w e e n - Y ( 0 a n d Y(t) fo r E x a m p le 4 .8 .
Syy(f) = Sxx(f)\PF(f)\2
and hence
Syy(f ) ~ + 22 R AA(k)cos 2 ~ k f T s^
N °te that the preceding equation, which gives the psd of an arbitrary pulse
process, has two parts. The first part \ P F( f ) \ 2 shows the influence of the pulse
shape on the shape of the spectral density and the second part, in brackets,
shows the effect of the correlation properties of the amplitude sequence. The
factor 1/7; converts energy distribution (or density) to power distribution.
Occasionally we will have to analyze systems with multiple inputs and outputs.
Analysis of these systems can be reduced to the study of several single input-
single output systems (see Figure 4.3). Consider two such linear systems with
two inputs X t (t) and X 2{t) and impulse responses h ^ t ) and h , ( t ) as shown in
Figure 4.3.
Assuming the systems to be LTIVC, and the inputs to be jointly stationary,
we have
Y M ~ j ■X' i U ~ a) k i ( a ) da
X t (t) Y-, U)
X 2 U) Y 2 U)
F ig u r e 4 .3 M u lt ip le i n p u t - o u t p u t s y s te m s .
RESPONSE OF LTIVC CONTINUOUS-TIME SYSTEMS 239
R r,r2( t ) = r x , y 2( t ) * / ii ( - t) (4.26.a)
s y 1y 2 (/) = (4.27.a)
Hence
Equations 4.26 and 4.27 describe the input-output relationship for multiple
input-output systems in terms o f the joint properties o f the input signals and
the system impulse responses (or transfer functions).
4.3.6 Filters
1
\H(f)\2
1 +
where n is the order of the filter and B is a parameter that determines the
bandwidth o f the filter. For a detailed discussion o f filters, see Reference [1],
240 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
1
\mn\
-B 0 \ B
la) ^ B ID
Lowpass filter
itfini
1
--- ^-J'_______
_______
-B 0 B
lb)
Highpass filter ^ o(f)
F ig u r e 4 .4 Ideal filters.
Ideal filter \ H ( f ) \ z
Consider an ideal and actual lowpass filter whose input is white noise, that
is, a noise process whose power spectral density has a constant value, say i\/2,
for all frequencies. The average output powers of the two filters are given by
E{Y-(C)} = \ H {f )V - df
E { Y \ t ) } = ( j ) \H{G)V-2Bn
\H(f)\2 df
Bn (4.28)
2\H(0)\2
242 RESPONSE OF LINEAR SYSTEMS TO RANDOM INPUTS
EXAMPLE 4.9.
IH ( f ) \ 2 = 1/[1 +
= B{ ti/2)
The reader can verify (Problem 4.18) that the noise equivalent bandwidth of an
«th order Butterworth filter is
B n = B[(-rr/2n)/sin (ir/2n)]
4.4 SUMMARY
After reviewing deterministic system analysis for linear-time invariant causal
systems, we considered these systems when the input is a random process.
It was shown that when the input to a LTIVC system is a SSS (or WSS) ran
dom process, then the output is a SSS (or WSS) random process. While the
distribution functions of the output process are difficult to find except in the
Gaussian case, simple relations were developed for the mean, autocorrelation
REFERENCES 243
function, and power spectral density functions of the output process. These
are as follows:
Syy( f ) = \H(f)V-SXX{ f )
where W(r) is the input process, Y(t) is the output process, h(t) is the im
pulse response, and H ( f ) is the transfer function o f the system. The average
power in the output Y(t) can be obtained by integrating SYY( f ) using the ta
ble of integrals provided in this chapter.
In the case of random sequences, the relation for power spectral density func
tion was found using the Fourier transform and the Z transform and an appli
cation to digital filters was shown. The relations for the mean, correlation
functions, and power spectral density functions for continuous random
processes were found to be of the same form as those for sequences.
4.5 REFERENCES
4.6 PROBLEMS
4.1 X ( t ) is a zero-mean stationary Gaussian random process with a power
spectral density function SXx( f) - Find the power spectral density function
of
no = x \ t )
4.2 Show that the output o f the LTIVC system is SSS if the input is SSS.
Y(n) = \ £ X( n - i)
K i= i
a. Find the transfer function of the system.
b. If the input X{n) is stationary with
E{X( n)} = 0
for k = 0
Rxx(k)
for k A 0
E {X { n) X(n + j )} = | j ’
a. Find p.*(n).
b. Find n), Rx*(1, 1), ^ aw( 1> 2), and Rxx(3, 1)-
4.9 With reference to the model defined in Problem 4.8, find SYY( f ) for the
following two special cases:
a. X(t) is Gaussian with SXx { f ) — V 2 for a ll/, and
b2 = b} = ■■ ■ = b„ = 0
a{ = a2= •• • = am= 0
= b 2= •■ • — bn = 0
4.10 Establish the modified versions of Equations 4.20.a and 4.20,b when both
X(t ) and h(t) are complex-valued functions o f time.
4.11 Repeat Problem 4.10 for Equations 4.26.a, 4.26.b, and 4.27.c.
Y(t) = ^ [ A'(a) da
* Jt-T
Sxx(f) = V 2
find E { Y 2(t)}.
■rpr—
4.13 Using the spectral factorization method, find E { Y 2(t)} where Y(t) is a
stationary random process with
( 2 rr/ ) 2 + 1
Syy(f) =
( 2 it/ ) 4 + 8(2 rr/ ) 2 + 16
Sxxif) = V2
and that the impulse response of the system is
_ f exp(
ex —t), t> 0
HO
~ to elsewhere
X{t) = 2 A ( k ) p ( t - kTs - D)
k=
b. Find the psd of Y(t) and compare it with the psd of X(t).
Y(t) = 2 M k ) p ( t - k Ts - D)
k= -»
PROBLEMS 247
E{A(k)} = l E { A { k ) 2} = |
E { A { k ) A { k + /')} = for a l l 1
16
4.17 Find the transfer function of a shaping filter that will produce an output
spectrum
( W )2 + 1
Syy(f) =
(2tt/ ) 4 + 13(2tt/ ) 2 + 36
SxxU) = -n/2
4.18 a. Find the noise bandwidth of the nth order Butterworth filter with
the magnitude response
l « ( / ) l 2 = 1/[1 + (f/B)21
for n = 1, 2, 3, 4, and 8 .
b. From a noise-rejection point of view, is there much to be gained
by using anything higher than a third-order Butterworth filter?
4.19 Find the noise bandwidth of the filters shown in Figure 4.6.
R
--------- W W \-------
+
input .Y(f) R
o— ------------------
(a)
°----- W AV- o
+
11 'f L C * - 1
Input Output
XU) YU)
R 'JCVL*-1
o o
(A)
F ig u r e 4 .6 C ir c u it s f o r P r o b l e m 4 .1 9 .
a. Find the power spectral density function of the output signal and
noise.
b. Find the ratio of the average output signal power to output noise
power.
Special Classes of
Random Processes
5.1 INTRODUCTION
for modeling and analyzing queues, and for describing “ shot noise” in com
munication systems. In Section 5.4, we develop several point-process models
and illustrate their usefulness in several interesting applications.
By virtue of the central limit theorem, many random phenomena are well
approximated by Gaussian random processes. One of the most important uses
of the Gaussian process is to model and analyze the effects o f “ thermal” noise
in electronic circuits. Properties of the Gaussian process are derived and the use
of Gaussian process models to analyze the effects o f noise in communication
systems is illustrated in Section 5.5.
In this section, we introduce two stationary linear models that are often used
to model random sequences. These models can be “ derived” from data as is
shown in Chapter 9. Combinations o f these two models describe the output of
a LLTIVC system, and they are the most used empirical models o f random
sequences.
E{e{n)} = 0
' V2
The sequence e(n) is called white Gaussian noise. (See Section 5.5.2.) Thus,
an autoregressive process is simply another name for a linear difference equation
model when the input or forcing function is white Gaussian noise. Further, if
the difference equation is of order p (i.e., § p,p ¥= 0), then the sequence is called
DISCRETE LINEAR MODELS 251
a pth order autoregressive model. We now study such models in some detail
because of their importance in applications, primarily due to their use in creating
models of random processes from data. Autoregressive models are also called
state models, recursive digital filters, and all-pole models as explained later.
Equation 5.1 can be easily reduced to a state model (see Problem 5.1) of
the form
X ( « ) = <E>X(« - 1) + E ( h) (5.2)
In addition, models of the form of Equation 5.1 are often called recursive digital
filters. In this case, the <J>p I’s are usually called /i/s, which are terms o f the unit
pulse response, and Equation 5.1 is usually written as
and the typical block diagram for this model is shown in Figure 5.1.
Using the results derived in Chapter 4, we can show that the transfer function
of the system represented in Equation 5.1 and Figure 5.1 is
1
H{f) = p
(5.3)
1 - 2 4>f,.,exp(-/2TT//)
and the autoregressive process X(n) is the output of this system when the input
is e(n). Note that there are no zeros in the transfer function given in Equation
5.3.
where e(n) is zero-mean stationary white Gaussian noise. Note that Equation
5.4 defines X(n) as a Markov process. Also note that Equation 5.4 is a first-
order regression equation with X(n — 1) as the “ controlled” variable. (Regres
sion is discussed in Chapter 8.) A sample function for X(n) when <|>u = .48 is
shown in Figure 5.2.
We now find the mean, variance, autocorrelation function, which is also the
autocovariance, correlation coefficient, and the power spectral density of this
process.
Since we wish the model to be stationary, this requirement imposes certain
conditions on the parameters of the model. The mean of X(n) can be obtained
as
M-at = 0
£ { * ( 0)} = 0 (5.5)
o i = E {X ( n) 2}
= E{$\tlX(n - l )2 + e{n)2 + 2<j>u X { n - 1)e{n)} (5.6)
2 I2 1 i 2
&x = <PT.icrjr + cr-N
and hence
(5.7)
1.1
Rxxjm)
rXx(m) 4>u m a 0 (5.10)
Rxx( 0)
Sxx( f ) = m f W - s M )
0.50
0.40
0.30
rXxM
0.20
0,10
0.00
1 2 3 4 5 6 7 8 9 10
F ig u r e 5 .3 C o r r e l a t i o n c o e f f i c i e n t o f t h e fir s t -o r d e r a u t o r e g r e s s iv e p r o c e s s : X(n) =
A8X(n - 1) + e(n).
DISCRETE LINEAR MODELS 255
1
H(f) = <
1 - <J>u e x p ( - / 2 T f/)’
and hence
i
l/l<
1 — 2 <t>i,iCos 2 ttf -P (J)]j’
Thus
o i ( i - 4>i.i )
Sxx(f) l/l < } (5.12)
1 — 2<)>u cos 2-Tr/ -P <t>pi
Equation 5.12 also can be found by taking the Fourier transform of Equation
5.9 (see Problem 5.5).
If we define z ^1 to be the backshift or the delay operator, that is,
or
e(n)
X(n) (5.14)
1 - 1
(See Problem 5.20.) Thus, this first-order autoregressive model can be viewed
as a weighted infinite sum of white noise terms.
We now seek \xx , <rx , Rx x , rx x , and Sxx and sufficient conditions on 4>2,i
and <t>2,2 in order to ensure stationarity. Taking the expected value of Equation
5.16
and hence
M-x = 0
if 4>2,i + <J>2,2 ^ 1, a required condition, as will be seen later. The variance can
be calculated as
= E{X( n)X(n) ] } = E{ ^u X { n ) X ( n - 1)
+ <t>2,2X ( n ) X ( n — 2) + X(n)e(n)}
= <|52,1^Aa (1) + $2.2Rxx (2) + <Jn
2
Substituting Rxx{k) — crx rxx(k) into the previous equation and solving for a-x ,
we have
1 4>2,IrA'A'(l) 4>2,2rxx(2)
or
where kr and X2 are the roots of the characteristic equation obtained by assuming
Rxx(m) = km, for m > 1, in Equation 5.19. This produces
X2 = + ^ 2,2
or
Thus, Rx x{m) can be a linear combination of geometric decays (X[ and X2 real)
or decaying sinusoids (X[ and X2 complex conjugates) or o f the form, B{km +
B2mkm, where X] = X2 = X.
The coefficients A t and A 2 (or Bi and B2) must satisfy the initial conditions
^ aw(O) = <r2
x
and
Rxx{l) = : .~ ~ 4 (5-22)
1 9 2 ,2
^ ( 0 ) = <Tl = A , + A 2 (5.23.a)
Rxx(m)
rxx(m) a,Xr + a2\? (5.24.a)
<*i
where
A,
«/ = 2 f i = 1 ,2 (5.24.b)
<?x
<(>2,1
rxxO-) (5.25)
1 - 4*2,2
4 * li
rXx(2) + 4*2,2 (5.26)
1 ~ 4*2,2
_____________________ o ~ v ( l ~ 4>2.2)______________________
<*i = (5.27)
(1 + 4>2,2)(1 - 4*2,1 - <(>2 .2 )(1 + 4>2 ,1 - 4>2.2>
i
4*2.2 A - 1
4*2,1 + 4*2.2 A 1
4*2.2 — 4*2.1 A 1
260 SPECIAL CLASSES OF RANDOM PROCESSES
where
___________________1___________________ 1
HU) = l/l<
1 - 4*2,,e x p (-/2 it/ ) - 4>2,2e x p ( - / 4i r / ) ’ 2
Thus
Sxx(f)
|1 - 4*2,lexp( —j l l t f ) - 4>2,2exp( —y4'ir/)|2’
which can also be found by taking the Fourier transform of Rxx(m) as given by
Equation 5.20.a. In this case it can be seen that
Using Equations 5.21 and 5.23 one can show that the two expressions for
SxxU) are equivalent (see Problem 5.17).
V-x = 0 (5.29.a)
and
rxx(p)
Jxxip ~ 1) rxx(p - 2) 1
_<tw_
(5.31.a)
262 SPECIAL CLASSES OF RANDOM PROCESSES
or
Equation 5.32 can be used to estimate the parameters , o f the model from
the estimated values o f the correlation coefficient rxx{k), and this is of consid
erable importance in data analysis.
The power spectral density of X ( n ) can be shown to be
Sxx(f) = See(f)\H(f)\2
cr2
N
i/l < ^ (5.33)
1 - 21 <t>P./ex p (—/ 2 tt/ i')
It is clear that this is a Markov process, that is given X(n - 1 ), the previous
A s that is, X(n - 2), X(n - 3 ) , . . . , are of no use in determining or predicting
X(n). But as we see from Equation 5.10, the correlation between X(n) and
X(n - . 2 ) is not zero, indeed:
DISCRETE LINEAR MODELS 263
1
rx x ( 3 ) —
8
We now suggest that the partial autocorrelation between X(n) and X(n - 2)
after the effect of X(n - 1 ) has been eliminated might be of some interest. In
fact, it turns out to be o f considerable interest when estimating models from
data.
In order to define the partial autocorrelation function in general, we return
to the Yule-Walker equation, Equation 5.31. When p = 1 , Equation 5.31 re
duces to
rx x ( l ) — 4 >1,1
and in general
rxx(l) l rx x ( l ) rx x ( 2 ) • rxx(p - 1)
r x x ( 2)
=
rxx(l) 1 rXx ( l ) • rxx(p - 2)
1--------
"T
-
|W
Jxx(p ~ 1) rx x ( p 3) • I
(5.34)
-1
4*2,1 1 rxx(l) rx^r(l)
1
or
rxx(2) ~ r\x ( 1 )
4*2,2 (5.35)
1 - r%x (l)
rxx(m) = 4*u
4>u - 4* 1.1
4*2,2
1 - 4*[,i
showing that for a first-order autoregressive process the partial correlation be
tween X(n) and X(n — 2) is zero.
The partial autocorrelation function of a second-order autoregressive process
can be obtained as follows. Using Equations 5.25 and 5.34 with k = 1, the first
partial correlation coefficient is
4*2,1
4*u = rxx(l) =
1 — 4)2.2
Also using Equations 5.35, 5.25, and 5.26, we find the second partial correlation
coefficient for a second-order autoregressive model as
4*2,1 , \ 4*2,1
1 - 4*2.2 ^ 7 (1 - 4*2,2 )2
4*2,2
4*2,1
1
(1 ~ 4 *2 .2 )-
DISCRETE LINEAR MODELS 265
= 4*2,2
This justifies the notation for the partial correlation coefficient agreeing with
the parameter in the autoregressive model. It can be shown that for a second-
order autoregressive process (see Problem 5.18)
<t*M = 0 , k > 2
4>w = 0 , k > p
In Chapter 9, this fact will be used to estimate the order of the model from
data.
Note that if 2 0, = 1 and 0, a 0, then this is the usual moving average of the
inputs e(n). We change the parameter limits slightly, and rewrite the preceding
equation as
where 0,iO = 1 a n d - 0 ,¥=0. The model given in Equation 5.36 can be represented
in block diagram form as shown in Figure 5.5.
The reader can show that the transfer function of the system shown in Figure
5 .5 is
266 SPECIAL CLASSES OF RANDOM PROCESSES
Note that this transfer function does not have any poles and hence is called an
all-zero model.
A sample sequence is plotted in Figure 5.6. A different form of this model can
be obtained using the backshift operator
X(n) = + 1)e(n)
or
And if
- 1 < 0U < 1
then
= 2 (-eu)'*(« - 0
DISCRETE LINEAR MODELS 267
0 20 40 60 80 100
n
Figure 5.6 S a m p le fu n c t io n o f th e fi r s t -o r d e r m o v in g a v e r a g e m o d e l : X (n ) =
ASe(n — 1) + e(n).
Thus, the first-order moving average model can be inverted to an infinite au
toregressive model. In order to be invertible, it is required that - 1 < 0,, < 1.
Returning to Equation 5.37, we find p*, cr2
x , Rx x , rx x , the partial correlation
coefficients, and Sxx( f ) as
Rxx(k) = E { X ( n ) X (n - *)}
= £ { [ 0 u e(n - 1 ) + e(n)]
x [0 ,Ae(n - k — 1) + e{n - A:)]}
= f 9i.icr^> k = 1
10 k > 1 (5.40.a)
268 SPECIAL CLASSES OF RANDOM PROCESSES
and hence
8U
rXx ( 1 )
1 + 0 i,i
and
Note the important result that the autocorrelation function is zero for k greater
than one for the first-order moving average sequence.
The partial autocorrelation coefficients can be obtained from Equation 5.34
as
8 |,i
<t>u = rx x ( 1 ) (5.41.a)
1 + 9i,i
- pi ,i
rxx(2) — r2x x{\) (1 + 9i,i)2
<t>2.2 =
1 - r\x { 1 ) _ 9i.i
(1 + 0 i,i)2
= ~ 9U (5.41.b)
1 + 01.1 + 0 u
Thus, the partial autocorrelation coefficients do not become zero as the corre
lation coefficients do for this moving average process.
The spectral density function Sxx( f ) is
X ( n ) = Q2,ie (n — 1) + — 2) + e(n)
has a mean
Thus
____________ -.2
^ ( 2 ) - ! + 02 , + (5.45.c)
The last result, that is, Equation 5.45.d, is particularly important in identifying
the order of models, as discussed in Chapter 9.
The power spectral density function of the second-order moving average
process is given by
General Moving Average Model. We now find the mean, autocorrelation func
tion, and spectral density function of a gth-order moving average process which
is modeled as
M-a - = £ {* (« )} = 0 (5.47.a)
and
r q
o\Y E {X ( n ) X ( n ) } = E 2 0 2 Oq.jS(n
J L/=o
= 2 e?.'crv
= vl 1 + 2 01,
i= X (5.47.b)
R x x( m ) = E 2 - /) 2 0, ./<?(« - " ! - / )
L/ =0
a2
N 1 + 2 e?./ > m = 0
/=i
0?.i + 2 1 /n = 1
i=2
0<,.2 + 2 0i/.;0</./-2 = 9
/ =3
In general
“
R\x{m) a~N 0,.™ + 2 e,./ q.j-m m < q
/ = rn + 1
3
ii
o-T/0 '/•<7 »
we obtain
1 + 2 6,.rexp(-;2TT/T)
SxxU) l/l< (5.51)
1 - 2 <fv.-exp(-;'2 tt/ 0
272 SPECIAL CLASSES OF RANDOM PROCESSES
F ig u r e 5 .7 A n a u t o r e g r e s s iv e m o v in g a v e r a g e A R M A ( p , q) filt e r .
Note that the transfer function H( f ) and the power spectral density Sxx( f ) have
both poles and zeros. The autocorrelation function Rxx(m) of the A R M A proc
ess is
Rxx(m) = E{ X( n - m)X{n)}
= E |[AT(« - m)] E 4 y ;^ 0 t - 0
+ E %.ke(n - k) + e{n)
Because
E { X ( n — m)e(n)} = 0, m > 1
<7
The expansion of the middle term in an infinite series shows that X(n) is an
infinite series in z ~ l. Thus, X(n) depends upon the infinite past and the partial
autocorrelation function will be nonzero for an infinite number of values.
P a- = 4> u P a - + 0
p.* = 0 (5.54.a)
F ig u r e 5 .8 S a m p le fu n c t i o n o f th e A R M A ( 1 , 1 ) m o d e l : X(n) = .5X(n - 1) +
.5 e{n - 1) + e(n).
which leads to
, _ [1 + 8ij + 2<j3|-!91i]crjv
(5.54.b)
" (1 -
and
The least complicated model of a random process is the trivial one in which the
value of the process at any given time is independent o f the values at all other
times. In this case, a random process model is not needed; a single random
variable model will suffice with no loss o f generality.
A more complicated model is one in which the value of the random process
depends only upon the one most recent previous value and given that value the
random process is independent o f all values in the more distant past. Such a
model is called a Markov model and is often described by saying that a Markov
process is one in which the future value is independent of the past values given
the present value.
Models in which the future depends only upon the present are common
among electrical engineering models. Indeed a first-order linear differential
equation or a first-order linear difference equation is such a model. For example,
the solution for i(t) that satisfies
di
— + a0i(t) = f (t )
for f > f0 requires only r(f0) and the solution cannot use knowledge of i(t) for
t < t0 when /(t0) is given. Even if f (t ) is random, values of f (t ) or i(t) for t <
t0 are of no use in predicting /(f) for t > t0 given
Higher order difference equations require more past values (an nth order
equation requires the present and n — 1 past values) for a solution. Similarly,
an nth order differential equation requires an initial value and n — 1 derivatives
at the initial time. An nth order difference equation can be transformed to n
first-order difference equations (a state variable formulation) and thus the de
pendence on initial conditions at n different times is transformed to n values at
one time. Such models are analogous to an nth order Markov processes.
We have argued that Markov processes are simple and analogous to familiar
models. We present later several examples that have proved to be useful. Before
presenting these examples and discussing methods for analyzing Markov proc-
Continuous Discrete
esses, we classify Markov processes and present a diagram called a state diagram
which will be useful for describing Markov processes!
The classification of Markov process is given in Table 5.1. Note that if the
values of X(t) are discrete, then Markov processes are called Markov chains.
In this section only Markov chains, including both sequences and continuous
time processes, are discussed.
Markov chains, that is, Markov processes with discrete X(t), are usually
described by referring to their states. There are a finite or at most a countable
number of such states, and X(t) maps each state to a discrete value or number.
With the Markov concept, the next state is dependent only upon the present
state. Thus, a diagram like Figure 5.9 is often used to describe a Markov chain
that is a sequence and a similar diagram is used to describe a Markov chain in
which time is continuous. In Figure 5.9, each number adjacent to an arrow
represents the conditional probability o f the Markov chain making the state
transition in the direction o f the arrow, given that it is in the state from which
the arrow emanates. For example, given that the Markov chain of Figure 5.9 is
in state 1, the probability is .4 that its next transition will be to state 2.
least the preceding letter (e.g., the probability of a "u" is 1 given the preceding
letter is a "q"). If the probability of a letter depended only upon the preceding
letter, then the sequence of letters could be modeled as a Markov chain. However,
the dependence in English text usually extends considerably beyond simply the
previous letter. Thus, a Markov model with a state space of the 26 letters plus
a space would not be adequate. However, if instead of a single symbol the
states were to represent blocks of, say, 5 consecutive symbols, the resulting Markov
model might be adequate. In this case there are approximately (27)^5 states,
but this complexity is often compensated by the fact that a Markov chain model
may be used. With the expanded state model the message
"This-book-is-easy-for . . ."
would be segmented into the states
"This-", "book-", "is-ea", "sy-fo", . . .
and we model each state as being dependent only upon the previous state.
The next example differs from the preceding one in the sense that time is continuous,
while the preceding example consisted of a sequence. A piece of equipment,
for example, a communication receiver, can have two states, operable and nonoperable.
The transitions from the operable to the nonoperable state occur at
a prescribed rate called the failure rate. The transitions from the nonoperable
to the operable state occur at the repair rate. If the rates of transition depend
only on the present state and not on the repair history, then a Markov model
can be used.
P[X(m) = x_m | X(m - 1) = x_{m-1}, X(m - 2) = x_{m-2}, . . . , X(0) = x_0]
    = P[X(m) = x_m | X(m - 1) = x_{m-1}]    (5.59)
In this section, we will develop, in matrix notation, a method for finding the
probability that a finite Markov chain is in a specified state at a specified time.
That is, we want to find the state probabilities
p_j(n) = P[X(n) = j]    (5.60)
in terms of the one-step transition probabilities
P_{i,j}(n - 1, n) = P[X(n) = j | X(n - 1) = i]    (5.61)
Now from Chapter 2, the joint probability is given by the product of the
marginal and the conditional probability, that is,
P[X(n - 1) = i, X(n) = j] = p_i(n - 1) P_{i,j}(n - 1, n)
Using the notation of Equations 5.60 and 5.61 in the preceding equation and
summing over all states i, we have
p_j(n) = Σ_{all i} p_i(n - 1) P_{i,j}(n - 1, n)    (5.64)
Figure 5.10 State diagram for Example 5.3.
EXAMPLE 5.3.
A source generates one of three messages during each time interval, and the
message sequence is modeled as a homogeneous Markov chain with the one-step
transition probability matrix
P(m - 1, m) = [ .5  .1  .4 ]
              [ .1  .6  .3 ]
              [ .1  .2  .7 ]
Note that the sum across each row is one, as it must be. If we assume that A
corresponds with message one, B corresponds with message two, and C corresponds
with message three, then the conditional probability in row i and column
j is P_{i,j}(m - 1, m), i = 1, 2, 3, j = 1, 2, 3, for all m. For instance, P_{2,3}(m -
1, m) = .3. This example is displayed in the state diagram of Figure 5.10.
Assume that the probabilities of the three starting states are given as
p_1(0) = .5,  p_2(0) = .3,  p_3(0) = .2
We now want the probabilities of the next message. These are found using
Equation 5.64 as follows:
p_1(1) = p_1(0)P_{1,1}(0, 1) + p_2(0)P_{2,1}(0, 1) + p_3(0)P_{3,1}(0, 1)
       = (.5)(.5) + (.3)(.1) + (.2)(.1) = .30
EXAMPLE 5.4.
In matrix notation, the state probabilities for the chain of Example 5.3 are
p^T(1) = [.5  .3  .2] [ .5  .1  .4 ]
                      [ .1  .6  .3 ]  =  [.30  .27  .43]
                      [ .1  .2  .7 ]
whose first component agrees with the value found earlier.
p^T(1) = p^T(0)P(0, 1)
p^T(2) = p^T(1)P(1, 2) = p^T(0)P(0, 1)P(1, 2)
p^T(3) = p^T(0)P(0, 1)P(1, 2)P(2, 3)
For a homogeneous chain, the transition matrix does not depend on the time
index, so that
p^T(n) = p^T(0)P(1)^n
where P(1)^n is an n-stage transition matrix; that is, the (i, j)th element of P(1)^n represents
the probability of transferring, in n time intervals, from state i to state j.
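These matrix relations are easy to check numerically. A short sketch using the three-state chain of Examples 5.3-5.5 (NumPy is assumed to be available):

    import numpy as np

    P = np.array([[.5, .1, .4],
                  [.1, .6, .3],
                  [.1, .2, .7]])
    p0 = np.array([.5, .3, .2])

    print(p0 @ P)                             # p(1) = [.30 .27 .43]
    print(np.linalg.matrix_power(P, 10))      # rows approach [.1667 .3056 .5278]
    print(p0 @ np.linalg.matrix_power(P, 4))  # p(4)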
EXAMPLE 5.5.
Find P(2), P(3), . . . , P(10) for the homogeneous Markov chain represented
by the matrix
P(1) = [ .5  .1  .4 ]
       [ .1  .6  .3 ]
       [ .1  .2  .7 ]
P(2) = P(1)^2 = [ .30  .19  .51 ]
                [ .14  .43  .43 ]
                [ .14  .27  .59 ]
P(3) = P(1)^3 = [ .220  .246  .534 ]
                [ .156  .358  .486 ]
                [ .156  .294  .550 ]
P(4) = P(1)^4 = [ .1880  .2764  .5356 ]
                [ .1624  .3276  .5100 ]
                [ .1624  .3020  .5356 ]
P(5) = P(1)^5 = [ .17520  .29176  .53304 ]
                [ .16496  .31480  .52024 ]
                [ .16496  .30456  .53048 ]
P(10) = P(1)^10 = [ .1668  .3053  .5279 ]
                  [ .1666  .3057  .5277 ]
                  [ .1666  .3056  .5278 ]
Note that the elements in each row seem to be approaching a constant value.
Thus P(n) appears to approach a limit as n increases. To relate multistep and
single-step transition probabilities, note that
P[(X(n_1) = i), (X(n_3) = j)] = Σ_{all k} P[(X(n_1) = i), (X(n_2) = k), (X(n_3) = j)]    (5.73)
and, using the Markov property for a homogeneous chain,
P[(X(n_1) = i), (X(n_2) = k), (X(n_3) = j)] = p_i(n_1) P_{i,k}(n_2 - n_1) P_{k,j}(n_3 - n_2)    (5.75)
Dividing both sides by p_i(n_1) produces the desired result. Equation 5.72 is called
the Chapman-Kolmogorov equation and can be rewritten in matrix form for
finite chains as
P(n_3 - n_1) = P(n_2 - n_1)P(n_3 - n_2)
If the state probabilities p_j(n) approach fixed values as n → ∞, these limits
π_j = lim_{n→∞} p_j(n)
are called the steady-state probabilities, and the limiting form of p^T(n) = p^T(n - 1)P(1) is
π^T = π^T P(1)    (5.79)
Equation 5.79 can be used to find the steady-state probabilities if they exist.
The solution to Equation 5.79 is not unique because the coefficient matrix
I - P(1) is singular. However, a unique solution can be obtained by using
Σ_{all j} π_j = 1    (5.80)
EXAMPLE 5.6.
Find the steady-state probabilities for the chain of Example 5.5,
P(1) = [ .5  .1  .4 ]
       [ .1  .6  .3 ]
       [ .1  .2  .7 ]
Equation 5.79 gives
π_1 = .5π_1 + .1π_2 + .1π_3
π_2 = .1π_1 + .6π_2 + .2π_3
π_3 = .4π_1 + .3π_2 + .7π_3
These equations are linearly dependent (the sum of the first two equations is
equivalent to the last equation). However, any two of them plus Equation 5.80
yield the unique solution
π_1 = 1/6 ≈ .1667,  π_2 = 11/36 ≈ .3056,  π_3 = 19/36 ≈ .5278
which agrees with the rows of P(1)^10 found in Example 5.5.
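The same steady-state probabilities can be obtained numerically by replacing one of the dependent balance equations with the normalization of Equation 5.80. A sketch:

    import numpy as np

    P = np.array([[.5, .1, .4],
                  [.1, .6, .3],
                  [.1, .2, .7]])

    # Solve pi^T (P - I) = 0 together with sum(pi) = 1:
    A = P.T - np.eye(3)
    A[-1, :] = 1.0                # normalization replaces one dependent equation
    b = np.array([0.0, 0.0, 1.0])
    pi = np.linalg.solve(A, b)
    print(pi)                     # [0.1667 0.3056 0.5278]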
As a second example, consider a two-state homogeneous Markov chain with
P(1) = [ 1-a   a  ]
       [  b   1-b ]
where 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1. We show by induction that
P(n) = P(1)^n = (1/(a + b)) [ b + a(1 - a - b)^n   a - a(1 - a - b)^n ]
                            [ b - b(1 - a - b)^n   a + b(1 - a - b)^n ]    (5.81)
First the root of the induction follows by letting n = 1 in Equation 5.81; this
shows that
P(1) = (1/(a + b)) [ b + a - a^2 - ab   a - a + a^2 + ab ]
                   [ b - b + ab + b^2   a + b - ab - b^2 ]
     = [ 1-a   a  ]
       [  b   1-b ]
Next, assuming that Equation 5.81 holds for n, we form
P(n + 1) = P(n)P(1)
         = (1/(a + b)) [ b + a(1 - a - b)^n   a - a(1 - a - b)^n ] [ 1-a   a  ]
                       [ b - b(1 - a - b)^n   a + b(1 - a - b)^n ] [  b   1-b ]
Letting r = (1 - a - b) and carrying out the multiplication shows that each
element has the form required by Equation 5.81 with n replaced by n + 1.
This completes the inductive proof of Equation 5.81. Note that if 0 < a < 1 and
0 < b < 1, then |r| < 1. Thus
lim_{n→∞} P(n) = (1/(a + b)) [ b  a ]
                             [ b  a ]
Note that
π^T = lim_{n→∞} p^T(n) = p^T(0) lim_{n→∞} P(n) = [ b/(a + b)   a/(a + b) ]
is independent of p(0).
Once again, steady-state or stationary-state probabilities have been achieved.
Note that if a = b = 0, then
P(1) = [ 1  0 ]
       [ 0  1 ]
and p^T(n) = p^T(0) for all n. Also if a = b = 1, then
P(1) = [ 0  1 ]
       [ 1  0 ]
and
p^T(n) = p^T(0) if n is even;  p^T(n) = [p_2(0)  p_1(0)] if n is odd
Note that in this last case, that is, a = b = 1, true limiting-state probabilities
do not exist.
We have observed in both Examples 5.5 and 5.6 that as n → ∞, the n-step
transition probabilities P_{i,j}(n) approach a limit that is independent of i (all
elements in a given column of the P matrix are the same). Many, but not all,
Markov chains have such a limiting steady-state or long-term behavior. The
investigation and analysis of the limiting conditions o f different types o f Markov
chains is beyond the scope of this book. See the references listed in Section 5.7.
This is proved by the same steps used in proving Equation 5.72. (See Problem
5.36.)
For continuous-time Markov chains, the transition intensities
λ_{i,j} = (∂/∂τ)[P_{i,j}(τ)]|_{τ=0},  i ≠ j    (5.86)
play the role of the one-step transition probabilities. Differentiating
Σ_{all j} P_{i,j}(τ) = 1    (5.87)
shows that
Σ_{all j} λ_{i,j} = 0    (5.88)
or
λ_{i,i} = -Σ_{j≠i} λ_{i,j}    (5.89)
and, for small ε,
P_{i,i}(ε) ≈ 1 - Σ_{j≠i} λ_{i,j}ε = 1 + λ_{i,i}ε    (5.90)
Note that λ_{i,j}, i ≠ j, will be positive or zero, and thus by Equation 5.89, λ_{i,i} will
be nonpositive.
Now we use the transition intensities to find the state probabilities. Using
the basic Markov property, that is, conditioning on the state at time t,
p_i(t + ε) ≈ p_i(t) + Σ_{j≠i} λ_{j,i}ε p_j(t) - Σ_{k≠i} λ_{i,k}ε p_i(t)    (5.92)
because the first sum is the probability of changing from another state to state
i in time ε whereas the second sum is the probability of changing from state i
to a different state in time ε.
Dividing both sides of Equation 5.92 by ε and then taking the limit as ε
approaches zero, we have, using Equation 5.86,
dp_i(t)/dt = Σ_{j≠i} λ_{j,i} p_j(t) - Σ_{k≠i} λ_{i,k} p_i(t)    (5.93)
Equation 5.93 is true for all i; thus using Equation 5.89 we have for finite chains
[dp_1(t)/dt  · · ·  dp_m(t)/dt] = [p_1(t)  · · ·  p_m(t)] Λ    (5.94)
where Λ is the m × m matrix whose (i, j)th element is λ_{i,j}.
EXAMPLE 5.7.
A communication receiver can have two states: operable (state 1) and inoperable
(state 2). The intensity of transition from 1 to 2, λ_{1,2}, is called the failure rate,
and the intensity of transition from 2 to 1, λ_{2,1}, is called the repair rate. To
simplify and to correspond with the notation commonly used in this reliability
model, we will call λ_{1,2} = λ and λ_{2,1} = μ. Find p_1(t) and p_2(t).
SOLUTION: With the notation introduced in this example, Equation 5.94 becomes
[p'_1(t)  p'_2(t)] = [p_1(t)  p_2(t)] [ -λ   λ ]
                                      [  μ  -μ ]
or
p'_1(t) = -λp_1(t) + μp_2(t)
        = -λp_1(t) + μ[1 - p_1(t)]
Solving this first-order equation,
p_1(t) = μ/(λ + μ) + exp[-(λ + μ)t] [λp_1(0) - μp_2(0)]/(λ + μ)
p_2(t) = 1 - p_1(t) = λ/(λ + μ) + exp[-(λ + μ)t] [μp_2(0) - λp_1(0)]/(λ + μ)
so that
lim_{t→∞} p_1(t) = μ/(λ + μ)
lim_{t→∞} p_2(t) = λ/(λ + μ)
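The closed-form solution of Example 5.7 is easy to sanity-check numerically. A sketch with hypothetical rates λ = 0.01 failures/hour and μ = 0.5 repairs/hour (these numbers are illustrative, not from the text):

    import numpy as np

    lam, mu = 0.01, 0.5          # hypothetical failure and repair rates (per hour)
    p1_0, p2_0 = 1.0, 0.0        # start in the operable state

    def p1(t):
        """p_1(t) from the closed-form solution of Example 5.7."""
        return (mu / (lam + mu)
                + np.exp(-(lam + mu) * t) * (lam * p1_0 - mu * p2_0) / (lam + mu))

    for t in [0.0, 1.0, 10.0, 100.0]:
        print(t, p1(t))          # approaches mu/(lam + mu) = 0.9804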
Birth and Death Process. Suppose that a continuous-time Markov chain takes
on the values 0, 1, 2, . . . and that its changes equal +1 or -1. We say that
such a process is a birth and death process if its transition intensities satisfy
λ_{i,j} = 0,  j ≠ i - 1, i, i + 1
that is, transitions occur only between neighboring states. Denote the birth rate
in state n by λ_{b,n} and the death rate by λ_{d,n}. (The state diagram for this
process is shown in Figure 5.14c (page 305) for the case when the rates are
independent of i.) Then for small ε
Then for n ≥ 1
p'_n(t) = lim_{ε→0} [p_n(t + ε) - p_n(t)]/ε
        = λ_{b,n-1} p_{n-1}(t) - (λ_{b,n} + λ_{d,n}) p_n(t) + λ_{d,n+1} p_{n+1}(t)    (5.96.a)
and for n = 0
p'_0(t) = -λ_{b,0} p_0(t) + λ_{d,1} p_1(t)    (5.96.b)
Setting p'_n(t) = 0 for the steady state yields Equation 5.97, called the balance
equation, which we now solve. From Equation 5.97.b we have
p_1 = (λ_{b,0}/λ_{d,1}) p_0
and, proceeding recursively,
p_n = (λ_{b,n-1}/λ_{d,n}) p_{n-1} = [λ_{b,0} λ_{b,1} · · · λ_{b,n-1} / (λ_{d,1} λ_{d,2} · · · λ_{d,n})] p_0    (5.99)
Since Σ_k p_k = 1,
p_0 = 1 / [1 + Σ_{n=1}^∞ Π_{i=0}^{n-1} (λ_{b,i}/λ_{d,i+1})]    (5.100)
Note that i denotes a particular member function and the argument 1 denotes
the first occurrence time after t = 0. Now, if T = (0, ∞), then we specify the
ith member function by its sequence of occurrence times t^i(n), n = 1, 2, . . . .
For example, T(5) is a random variable that represents the time of the fifth
occurrence after t = 0. If the point process has an uncountably infinite number
of sample paths, then we will use t(n) to denote a particular sample path. If
there are only a countable number of sample paths, then we will use t^i(n) to
denote the ith sample path.
A point process may also be described by a waiting (or interarrival) time
sequence W(n), where W(n) is defined as
W(n) = T(n + 1) - T(n),  n = . . . , -1, 0, 1, . . .  if T = (-∞, ∞)    (5.104)
or n = 1, 2, . . . if T = (0, ∞).
The random variable W(n) represents the waiting time between the nth and
(n + 1)st occurrences.
A third method of characterizing a point process is through the use of a
counting procedure. For example, we can define a counting process X(t), t ∈ T,
such that X(t) equals the number of occurrences in the interval (0, t].
(Figure: a sample path of a point process, showing (a) the times of occurrence T(n) along the time axis.)
Although models that do not meet all the foregoing requirements have been
developed, they are beyond the scope of this text, and we will focus our attention
only on models that satisfy all four assumptions.
In the following sections we develop two point-process models and illustrate
their use in several interesting applications.
If λ(t) does not depend on t, then the process is called a homogeneous process.
The counting process associated with this point process is called the Poisson
process, and we now show that for a homogeneous Poisson (counting) process,
the number o f occurrences in the interval (0, t) is governed by a Poisson dis
tribution.
Let us denote the probability that there are k occurrences in the time interval
(0, t) by P{k, t). Then the probability o f no occurrences in the time interval
(0, t) is P(0, t), and P( 0, t + A/) is related to P ( 0, t) by
P(0, t + Δt) = P(0, t)[1 - λΔt]
which leads to
dP(0, t)/dt = -λP(0, t)
Solving this first-order differential equation with the initial condition P(0, 0) =
1, we have
P(0, t) = exp(-λt)
For k ≥ 1, a similar argument leads to
dP(k, t)/dt + λP(k, t) = λP(k - 1, t)    (5.109)
Starting with P(0, t) = exp(-λt), we can recursively solve Equation 5.109 for
P(1, t), P(2, t), and so on, as
P(1, t) = λt exp(-λt)
P(2, t) = ((λt)^2/2!) exp(-λt)
P(k, t) = ((λt)^k/k!) exp(-λt)
    (5.111)
Since we define the counting process to have X(0) = 0, Equation 5.111 also
gives the probability that X(t) - X(0) = k. That is,
P[X(t) = k] = ((λt)^k/k!) exp(-λt),  k = 0, 1, . . .    (5.115)
P[X(t_1) = k_1, X(t_2) = k_2]
    = P[X(t_2) = k_2 | X(t_1) = k_1] P[X(t_1) = k_1],  t_2 > t_1    (5.116)
Because the increments are independent,
P[X(t_2) = k_2 | X(t_1) = k_1]
    = P[X(t_2) - X(t_1) = k_2 - k_1]
    = ((λ[t_2 - t_1])^{k_2 - k_1}/(k_2 - k_1)!) exp(-λ[t_2 - t_1]),  k_2 ≥ k_1
    = 0,  k_2 < k_1    (5.117)
Combining Equations 5.115, 5.116, and 5.117, we obtain the second-order probability
function as
P[X(t_1) = k_1, X(t_2) = k_2]
    = [(λt_1)^{k_1} (λ[t_2 - t_1])^{k_2 - k_1} / (k_1! (k_2 - k_1)!)] exp(-λt_2),  k_2 ≥ k_1
    = 0,  k_2 < k_1    (5.118)
Proceeding in a similar fashion we can obtain, for example, the third-order joint
probability function as
and thus show that the Poisson process has the Markov property. Recall also
that the Poisson process is an independent increments process.
Note that the Poisson process is nonstationary. The distribution of the waiting
or interarrival time of a Poisson process is o f interest in the analysis of queueing
problems. The probability that the interarrival time W is greater than w is
equivalent to the probability that no events occur in the time interval (0, w).
That is,
P[W > w] = P(0, w) = exp(-λw),  w ≥ 0
or
f_W(w) = λ exp(-λw),  w ≥ 0
       = 0  elsewhere    (5.121)
Note that the expected value of the waiting time is
E{W} = 1/λ    (5.122)
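Because the interarrival times are independent exponential random variables, a homogeneous Poisson process is straightforward to simulate. A sketch:

    import numpy as np

    rng = np.random.default_rng(7)
    lam = 2.0                                    # rate (occurrences per unit time)

    W = rng.exponential(1 / lam, size=100_000)   # waiting times W(n)
    T = np.cumsum(W)                             # occurrence times T(n)

    print(W.mean())                              # ~ 1/lam = 0.5 (Equation 5.122)
    t = 10.0
    print(np.searchsorted(T, t))                 # X(t): roughly Poisson, mean lam*t = 20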
Superposition of Poisson Processes. The Poisson process has the very interesting
property that the sum of n independent Poisson processes with rate functions
λ_1(t), λ_2(t), . . . , λ_n(t) is a Poisson process with rate function
λ(t) = λ_1(t) + λ_2(t) + · · · + λ_n(t)
Note that the average service time for a customer (after the customer reaches
the server) is
E{S} = 1/λ_d    (5.125)
and the service times for different customers are assumed to be independent
random variables.
In the analysis of queues, one of the important quantities of interest is the
average number of customers waiting for service, that is, the average queue
length. This will be a function of the arrival (or “ birth” ) rate and the departure
"Queueing models are designated by three letters: A /R /S where the first letter denotes arrival
model (M for M arkov), the second letter denotes the service time distribution (M for M arkov), and
the last letter represents the number o f servers.
Figure 5.14a M/M/1 queue: Poisson arrivals (rate λ_a) feed a single server with exponentially distributed service times (departure rate λ_d).
Figure 5.14b Queue occupancy as a function of time.
For small Δt, the queue-occupancy probabilities satisfy
p_n(t + Δt) = p_n(t)[1 - λ_aΔt - λ_dΔt] + p_{n-1}(t)λ_aΔt + p_{n+1}(t)λ_dΔt    (5.126)
Subtracting p_n(t) from both sides, dividing by Δt, and taking the limit as Δt →
0 produces the derivative p'_n(t). Now if we focus on the stationary behavior of
the queue after the queue has been operating for a while, we can assume that
p'_n(t) = 0. Furthermore, if we assume that λ_aΔt and λ_dΔt are both ≪ 1, then
as Δt → 0 we can ignore all terms involving Δt^2 and rewrite Equation 5.126 as
Equation 5.126 is shown diagrammed in Figure 5.14c. Note that this defines a
continuous-time Markov chain, and that in the steady state
0 = λ_a p_{n-1} - (λ_a + λ_d) p_n + λ_d p_{n+1},  n ≥ 1    (5.127)
with
Σ_{n=0}^∞ p_n = 1    (5.128)
Solving recursively, with ρ = λ_a/λ_d,
p_1 = ρp_0
p_2 = (ρ + 1)p_1 - ρp_0 = ρ^2 p_0
p_n = ρ^n p_0
Finally, substituting p_n in Equation 5.128 we can solve for p_0 and show that
p_n = (1 - ρ)ρ^n    (5.130)
The average number of customers in the system is then
E{N} = Σ_{n=0}^∞ n p_n = ρ/(1 - ρ)    (5.131)
Using the result given in Equation 5.131, it can be shown [11] that the average
waiting time W is
E{W} = E{N}/λ_a = 1/(λ_d - λ_a)    (5.132)
Equations 5.131 and 5.132 are used extensively in the design of queueing systems.
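Equations 5.130-5.132 are easy to evaluate directly; the sketch below does so, interpreting Equation 5.132 via Little's formula as the total delay. The numerical values shown match Problem 5.48 (50 arrivals/hour, 1-minute mean service time), used here only as a convenient test case.

    import numpy as np

    def mm1_metrics(lam_a, lam_d):
        """Steady-state M/M/1 quantities from Equations 5.130-5.132 (needs rho < 1)."""
        rho = lam_a / lam_d
        assert rho < 1, "queue is unstable"
        n = np.arange(50)
        p_n = (1 - rho) * rho**n          # Equation 5.130
        EN = rho / (1 - rho)              # Equation 5.131
        EW = EN / lam_a                   # Equation 5.132 (Little's formula)
        return p_n, EN, EW

    p_n, EN, EW = mm1_metrics(lam_a=50/60, lam_d=1.0)   # rates per minute
    print(EN, EW)                                       # EN = 5 customers, EW = 6 minutes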
EXAMPLE 5.8.
SOLUTION:
(b) E{W} ≤ 3 requires
Figure 5.15 Shot-noise process. (a) Single pulse. (b) Sample function of X(t).
X(t) = Σ_k h(t - τ_k)    (5.133)
and
Y(t) = Σ_k A_k h(t - τ_k)    (5.134)
where h(t) is the shape of the reflected pulse, τ_k is the time at which the kth
pulse reaches the receiver, and A_k is its amplitude, which is independent of τ_k. A_k
and A_j, for j ≠ k, can be assumed to be independent and identically distributed
random variables with the probability density function f_A(a).
We now derive the properties of X(t) and Y(t). The derivation of the probability
density functions of X(t) and Y(t) is in general a very difficult problem,
and we refer the interested reader to References [7] and [13] of Section 5.7.
Only the mean and autocorrelation function are derived here.
Mean and Autocorrelation of Shot Noise. Suppose we divide the time axis into
nonoverlapping intervals of length Δt, where Δt is so short that λΔt ≪ 1. Let
I_m be a random variable such that, for all integer values of m, I_m = 1 if a pulse
starts in the mth interval and I_m = 0 otherwise.
Since λΔt ≪ 1, we can neglect the probability of more than one new pulse in
any interval of length Δt. Then, because of the Poisson assumption,
P(I_m = 0) = exp(-λΔt) ≈ 1 - λΔt
P(I_m = 1) = λΔt exp(-λΔt) ≈ λΔt
and
E{I_m} = λΔt
E{I_m^2} = λΔt
Now define
X̃(t) = Σ_{m=-∞}^∞ I_m h(t - mΔt)
This new process X̃(t), in which at most one new pulse can start in each interval
Δt, approaches X(t) as Δt → 0, and we use X̃(t) to derive the mean and autocorrelation
of X(t).
The mean of X(t) is obtained as
E{X(t)} ≈ E{X̃(t)}
        = Σ_{m=-∞}^∞ E{I_m} h(t - mΔt)
        = Σ_{m=-∞}^∞ λΔt h(t - mΔt)
        = λ Σ_{m=-∞}^∞ h(t - mΔt) Δt
As Δt → 0, the summation in the previous equation [which is the area under
h(t)] becomes an integral and we have
E{X(t)} = λ ∫_{-∞}^∞ h(u) du    (5.135.a)
The autocorrelation is obtained from
R_XX(t_1, t_2) = E{X(t_1)X(t_2)} ≈ E{X̃(t_1)X̃(t_2)} = R_X̃X̃(t_1, t_2)
using
E{I_{m_1} I_{m_2}} = E{I_m^2} = λΔt,  m_1 = m_2 = m
E{I_{m_1} I_{m_2}} = E{I_{m_1}}E{I_{m_2}} = [λΔt]^2,  m_1 ≠ m_2
Thus, carrying out the double sum and letting Δt → 0,
E{X(t)} = λH(0)    (5.136.a)
and
R_XX(τ) = λ^2 H^2(0) + λ ∫_{-∞}^∞ h(u)h(u + τ) du    (5.136.b)
where H(0) = ∫_{-∞}^∞ h(u) du.
The reader can show that the process Y(t) given in Equation 5.134 has
E{Y(t)} = λE{A} ∫_{-∞}^∞ h(u) du    (5.137.a)
and
R_YY(τ) = λ^2 E^2{A} H^2(0) + λE{A^2} ∫_{-∞}^∞ h(u)h(u + τ) du    (5.137.b)
EXAMPLE 5.9.
The pulses in a shot-noise process are rectangles of height A_k and duration T.
Assume that the A_k's are independent and identically distributed with P[A_k = 1] =
1/2 and P[A_k = -1] = 1/2, and that the rate λ = 1/T. Find the mean, autocorrelation,
and the power spectral density of the process.
SOLUTION: With
Y(t) = Σ_k A_k h(t - τ_k)
we have
E{Y(t)} = λE{A_k} ∫_{-∞}^∞ h(u) du = (1/T)(0)(T) = 0
R_YY(τ) = λE{A^2} ∫_{-∞}^∞ h(u)h(u + τ) du = 1 - |τ|/T,  |τ| ≤ T;  0 elsewhere
S_YY(f) = λE{A^2} |H(f)|^2 = (1/T)|H(f)|^2
where
H(f) = sin(πfT)/(πf)
The reader should note that both the autocorrelation function and the spectral
density function of this shot-noise process are the same as those of the random
binary waveform discussed in Section 3.4.4. These two examples illustrate that
even though the two processes have entirely different models in the time domain,
their frequency domain descriptions are identical. This is because the spectral
density function, which is the frequency domain description, and autocorrelation
function are average (second moment) descriptors and they do not uniquely
describe a random process.
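Results of the Campbell's-theorem type, such as Equations 5.136 and 5.137, can be checked by direct simulation. The sketch below simulates the rectangular-pulse process of Example 5.9 (the span and grid spacing are arbitrary simulation parameters, not values from the text):

    import numpy as np

    rng = np.random.default_rng(3)
    T, lam, span = 1.0, 1.0, 2000.0          # pulse width, rate = 1/T, observation span

    n = rng.poisson(lam * span)              # number of pulses
    tau = rng.uniform(0, span, n)            # Poisson arrival times
    A = rng.choice([-1.0, 1.0], n)           # amplitudes, P[A = +-1] = 1/2

    t = np.arange(0, span, 0.05)
    Y = np.zeros_like(t)
    for tk, ak in zip(tau, A):               # Y(t) = sum A_k h(t - tau_k), h rectangular
        Y += ak * ((t >= tk) & (t < tk + T))

    print(Y.mean())                          # ~ 0, as in Example 5.9
    print((Y * Y).mean())                    # ~ R_YY(0) = 1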
In this section we introduce the most common models of noise and of signals
used in the analysis of communication systems. Many random phenomena in physical
problems, including noise, are well approximated by Gaussian random processes.
By virtue of the central limit theorem, a number of processes, such as the Wiener
process as well as the shot-noise process, can be approximated by a Gaussian
process. A random process X(t) is a Gaussian process if, for every n and every
set of sampling times t_1, . . . , t_n, the random vector
X = [X(t_1), X(t_2), . . . , X(t_n)]^T
has an n-variate Gaussian density with mean vector
μ_X = [E{X(t_1)}, E{X(t_2)}, . . . , E{X(t_n)}]^T = [μ_X(t_1), . . . , μ_X(t_n)]^T
and covariance matrix Σ_X whose (i, j)th element is
[Σ_X]_{ij} = C_XX(t_i, t_j) = R_XX(t_i, t_j) - μ_X(t_i)μ_X(t_j)
A zero-mean stationary Gaussian process N(t) is said to be white if its power
spectral density is constant for all frequencies, that is,
S_NN(f) = η/2,  -∞ < f < ∞    (5.139)
The corresponding autocorrelation function is
R_NN(τ) = (η/2)δ(τ)    (5.140)
which implies that N(t) and N(t + τ) are independent for any value of τ ≠ 0.
The spectral density given in Equation 5.139 is not physically realizable since
it implies infinite average power, that is,
∫_{-∞}^∞ S_NN(f) df = ∞
However, since the bandwidths of real systems are always finite, and since the
power observed over any finite bandwidth B is finite, the spectral density given
in Equation 5.139 can be used over finite bandwidths.
Noise having a nonzero and constant spectral density over a finite frequency
band and zero elsewhere is called band-limited white noise. Figure 5.16 shows
such a spectrum where
S_NN(f) = η/2,  |f| < B
        = 0  elsewhere
The reader can verify that this process has the following properties:
1. E{N^2(t)} = ηB
2. R_NN(τ) = ηB [sin(2πBτ)/(2πBτ)]
3. N(t) and N(t + k/(2B)), k = ±1, ±2, . . . , are uncorrelated and, being
jointly Gaussian, are independent.
It should be pointed out here that the terms "white" and "band-limited white"
refer to the spectral shape of the process. These terms by themselves do not
imply that the distributions associated with X(t ) are Gaussian. A process that
is not a Gaussian process may also have a flat, that is, white, power spectral
density.
Figure 5.16 Spectral density S_NN(f), |f| ≤ B, and the corresponding autocorrelation R_NN(τ) of band-limited white noise.
For random sequences, white noise N(k) is a stationary sequence that has
a mean of zero and
E{N(n)N(m)} = 0,  n ≠ m
            = σ_N^2,  n = m
Also
R_NN(n) = σ_N^2,  n = 0
        = 0,  n ≠ 0
and thus
S_NN(f) = σ_N^2
When white noise N(t) is applied to a linear time-invariant system, the output
Y(t) has
1. E{Y(t)} = μ_N H(0) = 0    (5.140.a)
2. R_YY(τ) = R_NN(τ) * h(τ) * h(-τ) = (η/2)δ(τ) * h(τ) * h(-τ)    (5.140.b)
3. S_YY(f) = S_NN(f)|H(f)|^2 = (η/2)|H(f)|^2    (5.140.c)
In Equations 5.140.a-c, h(t) and H(f) are the impulse response and transfer
function of the system, respectively.
Since the convolution integral given by
Y(t) = ∫_{-∞}^∞ h(t - α) N(α) dα
is a linear operation on the Gaussian process N(t), the output Y(t) will have
Gaussian densities and hence is a Gaussian process.
X(t) = R_X(t) cos[2πf_0t + Θ_X(t)]
where X(t) is a bandpass signal; R_X(t) and Θ_X(t) are lowpass signals. R_X(t) is
called the envelope of the bandpass signal X(t), Θ_X(t) is the phase, and f_0 is
the carrier or center frequency. X(t) can also be expressed as
X(t) = X_c(t) cos 2πf_0t - X_s(t) sin 2πf_0t
where X_c(t) = R_X(t) cos Θ_X(t) and X_s(t) = R_X(t) sin Θ_X(t) are called the quadrature
components of X(t).
If the noise in the communication system is additive, then the receiver observes
and processes X(t) + N(t) and attempts to extract X(t). In order to
analyze the performance of the receiver, it is useful to derive a time-domain
representation of the bandpass noise of the form
N(t) = N_c(t) cos 2πf_0t - N_s(t) sin 2πf_0t
where N_c(t) and N_s(t) are two jointly stationary random processes. We now
attempt to find the properties of N_c(t) and N_s(t) such that the quadrature representation
yields zero mean of N(t) and the correct autocorrelation function
R_NN(τ) (and hence the correct psd).
Expanding E{N(t)N(t + τ)} using the quadrature representation produces terms
in cos 2πf_0τ, sin 2πf_0τ, and sin 2πf_0(2t + τ).
Since N(t) is stationary, the left-hand side of the preceding equation does not
depend on t. For the right-hand side to be independent of t, it is required that
R_NcNc(τ) = R_NsNs(τ)    (5.144)
and
R_NcNs(τ) = -R_NsNc(τ)
Hence
R_NN(τ) = R_NcNc(τ) cos 2πf_0τ - R_NcNs(τ) sin 2πf_0τ
Now, in order to find the relationship between R_NcNc(τ), R_NsNs(τ), and R_NN(τ),
we introduce the Hilbert transform N̂(t) of N(t) (the response of a -90° phase
shifter to N(t)) and form the analytic signal
N_A(t) = N(t) + jN̂(t)    (5.148)
Using N(t) and N̂(t), we can form the processes N_c(t) and N_s(t) as
N_c(t) = Re{N_A(t) exp(-j2πf_0t)}
       = N(t) cos 2πf_0t + N̂(t) sin 2πf_0t    (5.150.a)
and
N_s(t) = Im{N_A(t) exp(-j2πf_0t)}
       = N̂(t) cos 2πf_0t - N(t) sin 2πf_0t    (5.150.b)
In the frequency domain, the power spectral density of the analytic signal
N_A(t) can be obtained by taking the Fourier transform of Equation 5.149.c as
S_NANA(f) = 4 S_NN(f) U(f)    (5.154)
where
U(f) = 1,  f > 0
     = 0,  f ≤ 0
(Figure: S_NN(f), |f| ≤ B, and the translated spectrum S_NN(-f + f_0)U(-f + f_0) used in deriving the spectra of the quadrature components.)
assumed to be known, and the power spectral densities of the quadrature components
N_c(t) and N_s(t) follow as
S_NcNc(f) = S_NsNs(f) = S_NN(f - f_0) + S_NN(f + f_0),  |f| ≤ B
and zero elsewhere.
The envelope and phase of the bandpass noise are defined as
R_N(t) = √(N_c^2(t) + N_s^2(t))    (5.156)
and
Θ_N(t) = tan^{-1}[N_s(t)/N_c(t)]    (5.157)
It is left as an exercise for the reader to show that the envelope R_N of a bandpass
Gaussian process has a Rayleigh pdf and the phase Θ_N has a uniform pdf.
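The Rayleigh/uniform claim is easy to check by simulation, treating N_c and N_s as independent zero-mean Gaussian samples of equal variance (the variance value below is an arbitrary choice for illustration):

    import numpy as np

    rng = np.random.default_rng(5)
    sigma = 1.0
    Nc = rng.normal(0, sigma, 100_000)
    Ns = rng.normal(0, sigma, 100_000)

    R = np.hypot(Nc, Ns)                          # envelope, Equation 5.156
    theta = np.arctan2(Ns, Nc)                    # phase, Equation 5.157

    print(R.mean(), sigma * np.sqrt(np.pi / 2))   # matches the Rayleigh mean
    print(theta.min(), theta.max())               # spread uniformly over (-pi, pi]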
extraction at the receiver might be possible in the absence of “ noise” and other
contaminations. But thermal noise is always present in electrical systems, and
it usually corrupts the desired signal in an additive fashion. While thermal noise
is present in all parts of a communication system, its effects are most damaging
at the input to the receiver, because it is at this point in the system that the
information-bearing signal is the weakest. Any attempt to increase signal power
by amplification will increase the noise power also. Thus, the additive noise at
the receiver input will have a considerable amount of influence on the quality
of the output signal.
Figure 5.18b Spectral density of the quadrature components.
Using the relationship shown in the phasor diagram (Figure 5.18c), we can
express s(t) + N(t) as a single sinusoid with a perturbed amplitude and phase.
For high signal-to-noise ratios,
E{N_c^2(t)} = E{N_s^2(t)} ≪ A^2
Hence we can use the approximations (see Figure 5.18c) A + N_c(t) ≈ A and
tan^{-1}θ ≈ θ when θ ≪ 1 and rewrite s(t) + N(t) as
s(t) + N(t) ≈ [A + A_N(t)] cos[2πf_0t + θ_N(t)]
where
A_N(t) = N_c(t)
and
θ_N(t) = N_s(t)/A
are the noise-induced perturbations in the amplitude and phase of the signal
s(t). The mean and variance of these perturbations can be computed as
E{A_N(t)} = E{N_c(t)} = 0
var{A_N(t)} = var{N_c(t)}
and
E{θ_N(t)} = 0
var{θ_N(t)} = var{N_s(t)}/A^2
Figure 5.19a Model of an analog communication system.
Ŝ(t) = [S(t) + N(t)] * h_R(t)
where h_R(t) is the impulse response of the receiver and * denotes convolution.
The receiver transfer function is chosen such that Ŝ(t) is a "good" estimate of
S(t).
The mean squared error
E{[S(t) - Ŝ(t)]^2}
can be used to judge the effectiveness of the receiver. The receiver output can
be written as
Ŝ(t) = S(t) * h_R(t) + N(t) * h_R(t)
EXAMPLE 5.11.
Given that S(t) and N(t) are two independent stationary random processes with
S_SS(f) = 2/[(2πf)^2 + 1]  and  S_NN(f) = 1
(as shown in Figure 5.21a), find the best first-order lowpass filter that produces
a minimum MSE estimate of S(t) from S(t) + N(t); that is, find the time constant
T of the filter H_R(f) = 1/(1 + j2πfT).
SOLUTION: Using Figure 5.21b, we can show that the error S(t) - Ŝ(t) has the
power spectral density
S_ee(f) = [1 - H_R(f)][1 - H*_R(f)]S_SS(f) + H_R(f)H*_R(f)S_NN(f)
and
MSE = ∫_{-∞}^∞ S_ee(f) df
Substituting S_SS(f), S_NN(f), and H_R(f) and carrying out the integration,
MSE = T/(1 + T) + 1/(2T)
Setting
d(MSE)/dT = 1/(1 + T)^2 - 1/(2T^2) = 0
yields the optimum time constant
T = 1/(√2 - 1) = √2 + 1
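The optimum time constant of Example 5.11 can be confirmed by evaluating the MSE expression numerically over a grid of candidate values:

    import numpy as np

    def mse(T):
        # MSE(T) = T/(1 + T) + 1/(2T), from Example 5.11
        return T / (1 + T) + 1 / (2 * T)

    Ts = np.linspace(0.5, 5, 10_000)
    T_opt = Ts[np.argmin(mse(Ts))]
    print(T_opt, 1 / (np.sqrt(2) - 1))    # both ~ 2.414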
By inserting a pre-emphasis filter at the transmitter and a de-emphasis
filter at the receiver, both signal distortion and output noise can be minimized. Indeed,
we can choose H_p(f) and H_d(f) subject to the constraint
H_p(f)H_c(f)H_d(f) = 1    (5.158)
so that the signal component is passed without distortion, while the output signal-to-noise ratio
E{S^2(t)} / E{N_0^2(t)}    (5.159)
is maximized.
In a digital communication system, the receiver produces an output sequence
{D̂_k} that is an estimate of the transmitted symbol sequence {D_k}. The performance
measure here is the probability of error
P_e = P(D̂_k ≠ D_k)
and the receiver structure and parameters are chosen to minimize P_e. Analysis
of the effects of noise on the performance of digital communication systems and
the optimal designs for the receivers are treated in detail in the following chapter.
This section introduced models of noise and signals that are commonly used in
communication systems. First, Gaussian processes were described because of
their general use and their use to model thermal noise. Next, white and band-limited
white noise were defined for continuous processes, and white noise was
defined for sequences. The response of LTIV systems to white noise was also
given. Narrowband signals were modeled using the quadrature representation
by introducing Hilbert transforms. Such a representation was used to analyze
the effect of noise on the amplitude and phase of a narrowband signal plus noise.
The design of a receiver, that is, a filter, to "separate" the signal from the
noise was introduced for the case in which the form of the filter is fixed. This idea is expanded
in Chapter 7. Similarly, receiver design for digital systems was introduced and
will be expanded in Chapter 6.
5.6 SUMMARY
This chapter presented four special classes of random processes. First, auto
regressive moving average models were presented. Such models of random
processes are very useful when a model o f a random process is to be esti
mated from data. The primary purpose o f this section was to familiarize the
reader with such models and their characteristics.
Second, Markov models were presented. Such models are often used in ana
lyzing a variety of electrical engineering problems including queueing, filter
ing, reliability, and communication channels. The emphasis in this chapter
was placed on finding multistage transition probabilities, probabilities of each
state being occupied at various points in time, and on steady-state probabili
ties for Markov chains.
Third, point processes were introduced, and the Poisson process was emphasized.
Finally, models of noise and signals that are usually employed in the analysis of
communication systems were introduced. In particular, Gaussian white noise
and band-limited white noise models were introduced. The quadrature model
of narrowband noise was also introduced. Analog communication system design
was illustrated using an example of minimizing the mean squared error.
5.7 REFERENCES
References are listed according to the topics discussed in this chapter.
Gaussian Processes
[13] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1984.
5.8 PROBLEMS
5.2 Find the Z transform of Equation 5.1, and the Z transform H_Z(z) of the
digital filter shown in Figure 5.1. Assume that e(n) and X(n) are nonzero
only for n > 0, and that their Z transforms exist.
5.4 Refer to Problem 5.3, and assume stationarity. Find, for the sequence
X(n), (a) μ_X, (b) σ_X^2, (c) R_XX(k), (d) r_XX(k), (e) S_XX(f).
5.5 Find the spectral density function of a random sequence that has the
autocorrelation function R_XX(m) = σ_X^2 φ_{1,1}^{|m|}, by using the Fourier
transform.
5.6 With σ_X^2 = 1, plot S_XX(f) for (a) φ_{1,1} = .8; (b) φ_{1,1} = -.9.
5.7 Show that Equations 5.20.a and 5.20.b are possible solutions to Equation
5.19.
5.10 Show that if the inequalities following Equation 5.27 are true, then σ_X^2 as
defined by Equation 5.27 will be positive. Is the converse true?
5.14 For a second-order autoregressive model with one repeated root of the
characteristic equation, find (a) the relationship between φ_{2,1} and φ_{2,2}; (b)
b_1 and b_2 in the equation r_XX(m) = b_1 λ^m + b_2 m λ^m.
5.15 Given that rxx{ 1) = f, rxx(2) = i, and rxx{3) = i, find the third-order
autoregressive model for X(n).
5.17 For φ_{2,1} = .5, φ_{2,2} = .06, find S_XX(f) by the two methods given for second-order
autoregressive models.
5.19 For a third-order autoregressive process, find φ_{1,1}, φ_{2,2}, φ_{3,3}, and φ_{4,4}.
5.21 X(n) = 0.8e(n - 1) + e(n), where e(n) is white noise with σ_e^2 = 1 and
zero mean. Find (a) μ_X, (b) σ_X^2, (c) r_XX(m), (d) S_XX(f), (e) φ_{1,1}.
S_XX(f) = |H(f)|^2 S_ee(f)
and
H(f) = θ_{1,1} exp(-j2πf) + 1
5.24 For the qth-order moving-average model, find (a) r_XX(m) and (b) the Z
transform of R_XX(m).
5.28 Find the digital filter diagram and the state model of an ARMA(3, 1)
model.
5.29 Describe the random walk introduced in Section 3.4.2 by a Markov proc
ess. Is it a chain? Is it homogeneous? What is p(0)? What is P(1)?
X(n + 1) = X(n) - 1
Current          Next Message
Message        1      2      3
   1          .5     .3     .2
   2          .4     .2     .4
   3          .3     .3     .4
and that p^T(0) = [.3  .3  .4]. Draw the state diagram. Find p(4).
P(n - 1, n)
5.34 Assume that the weather in a certain location can be modeled as the
homogeneous Markov chain whose transition probability matrix is
shown:
Today's Tomorrow's
Weather Weather
Fair Cloudy Rain
Fair .8 .15 .05
Cloudy .5 .3 .2
Rain .6 .3 .1
5.35 A certain communication system is used once a day. At the time o f use
it is either operative G or inoperative B. If it is found to be inoperative,
it is repaired. Assume that the operational state o f the system can be
described by a homogeneous Markov chain that has the following transition
probability matrix:
[ 1-q   q  ]
[  r   1-r ]
Find P(2), P(4), lim_{n→∞} P(n). If p^T(0) = [a, 1 - a], find p(n).
and P_{k,j}(ε) ≈ 1 + λ_{j,j}ε for k = j.
Show that as ε → 0,
p'(t) = p(t)Λ
which has the solution
p^T(t) = p^T(0) exp(Λt)
5.38 For the two-state continuous-time Markov chain using the notation of
Example 5.7, find the probability o f being in each state, given that the
system started at time = 0 in the inoperative state.
5.39 For Example 5.7, find the average time in each state if the process is in
steady state and operates for T seconds.
5.40 For Example 5.7, what should pi(0) be in order for the process to be
stationary?
5.41 The clock in a digital system emits a regular stream of pulses at the rate
of one pulse per second. The clock is turned on at t = 0, and the first
pulse appears after a random delay D whose density is
f_D(d) = 2(1 - d),  0 ≤ d ≤ 1
5.44 (a) Show that the sum of two Poisson processes results in a Poisson process,
(b) Extend the proof to n processes by induction.
5.46 Assume that an office switchboard has five telephone lines and that, starting
at 8 A.M. on Monday, the time that a call arrives on each line is an
exponential random variable with parameter λ. Also assume that calls
arrive independently on the lines and show that the time of arrival of the
first call (irrespective of which line it arrives on) is exponential with parameter
5λ.
5.47 Consider a point process on T = (-∞, ∞) such that for any t > 0
P[X(t) = k] = exp(-αt)[1 - exp(-αt)]^k,  k = 0, 1, 2, . . .
where α > 0 is a constant. Find μ_X(t) and the "intensity function" λ(t) of the
process, where λ(t) is defined as
λ(t) = lim_{Δt→0} P[one occurrence in (t, t + Δt)]/Δt
5.48 Jobs arrive at a computing facility at an average rate of 50 jobs per hour.
The arrival distribution is Poisson. The time it takes to process a job is a
random variable with an exponential distribution with a mean processing
time o f 1 minute per job.
a. Find the mean delay between the time a job arrives and the time
it is finished.
b. If the processing capacity is doubled, what is the mean delay in
processing a job?
c. If both the arrival rate and the processing capacity increase by a
factor of 2, what is the mean delay in processing a job?
5.49 Passengers arrive at a terminal for boarding the next bus. The times of
their arrival are Poisson with an average arrival rate of two per minute.
The times of departure of each bus are Poisson with an average departure
rate of four per hour. Assume that the capacity of the bus is large.
are rectangular:
h(t) = 1,  0 < t < T
     = 0  elsewhere
Find P[X(t) = k].
5.52 Verify the properties of band-limited white Gaussian noise (Page 315).
5.54 N(t) is a zero-mean stationary Gaussian random process with the power
spectral density
5.55 Let R_N(t) and Θ_N(t) be the envelope and phase of N(t) described in Problem
5.54.
a. Find the joint pdf of R_N and Θ_N.
b. Find the marginal pdfs of R_N and Θ_N.
c. Show that R_N and Θ_N are independent.
5.56 Let Z(t) = A cos(2πf_ct) + N(t), where A and f_c are constants, and N(t)
is a zero-mean stationary Gaussian random process with a bandpass psd
S_NN(f) centered at f_c. Rewrite Z(t) as
Z(t) = R(t) cos[2πf_ct + Θ(t)]
and find and sketch the power spectral density function of Y(t). [Assume that
N(t) and Θ are independent.]
(which implies that the envelope varies slowly with a time constant of the
order of 1/B).
R_XX(τ) = exp(-α|τ|)
Show that X(t) is Markov.
CHAPTER SIX
Signal Detection
6.1 INTRODUCTION
In Chapters 3 and 5 we developed random process models for signals and noise.
We are now ready to use these models to derive optimum signal processing
algorithms for extracting information contained in signals that are mixed with
noise. These algorithms may be used, for example, to determine the sequence
of binary digits transmitted over a noisy communication channel, to detect the
presence and estimate the location of objects in space using radars, or to filter
a noisy audio waveform.
There are two classes of information-extraction algorithms that are considered
in this book: (1) signal detection and (2) signal estimation. Examples of
signal detection and signal estimation are shown in Figures 6.1a and 6.1b. In
signal detection (Figure 6.1a), the “ receiver” observes a waveform for T seconds
and decides on the basis of this observed waveform the symbol that was trans
mitted during the time interval. The transmitter and the receiver know a priori
the set of symbols and the waveform associated with each of the symbols. In
the case of a binary communication system, the receiver knows that either a
“ 1” or “ 0” is transmitted every T seconds and that a “ 1” is represented by a
positive pulse and a “ 0” is represented by a negative pulse. What the receiver
does not know in advance is which one o f the two symbols is transmitted during
a given interval. The receiver makes this “ decision” by processing the received
waveform. Since the received waveform is usually distorted and masked by noise,
the receiver will occasionally make errors in determining which symbol was
present in an observation interval. In this chapter we will examine this “ decision”
(or detection) problem and derive algorithms that can be used to determine
Figure 6.1a Signal detection: a transmitted sequence, the corresponding transmitted waveform, the received waveform Y(t), and the detected sequence d̂_k.
of correct and incorrect decisions. The decision rule is then applied to the
detection problem when decisions are based on single or multiple observations.
Finally, we extend the results to the case of M-ary detection (choosing between
one o f M possibilities) and continuous observations.
The MAP decision rule for a single observation y chooses the hypothesis with the
larger a posteriori probability:
P(H_1|y)/P(H_0|y) ≷_{H_0}^{H_1} 1    (6.1)
which means choose H_1 when the ratio is > 1, and choose H_0 when the ratio is
< 1.
Using Bayes' rule,
P(H_i|y) = f_{Y|H_i}(y|H_i)P(H_i)/f_Y(y)
the decision rule becomes
P(H_1)f_{Y|H_1}(y|H_1) ≷_{H_0}^{H_1} P(H_0)f_{Y|H_0}(y|H_0)
or
f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0) ≷_{H_0}^{H_1} P(H_0)/P(H_1)    (6.2)
(Note that the equal sign may be included in favor of either H_1 or H_0.) The
ratio
L(y) = f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0)    (6.3)
is called a likelihood ratio, and the MAP decision rule consists of comparing this
ratio with the constant P(H_0)/P(H_1), which is called the decision threshold.
L(Y) is called the likelihood statistic, and L(Y) is a random variable.
In classical hypothesis testing, decisions are based on the likelihood function
L (y ), whereas in the decision theoretic approach, the a priori probabilities and
costs associated with various decisions will also be included in the decision rule.
In signal detection we usually take into account costs and a priori probabilities.
If we have multiple observations, say Y_1, Y_2, . . . , Y_m, on which to base
our decision, then the MAP decision rule will be based on the likelihood ratio
L(y_1, . . . , y_m) = f_{Y|H_1}(y_1, . . . , y_m|H_1) / f_{Y|H_0}(y_1, . . . , y_m|H_0)
In a binary decision problem there are four possible outcomes.
Let D_i represent the decision in favor of H_i. Then, the first two conditional
probabilities, denoted by P(D_0|H_0) and P(D_1|H_1), correspond to correct choices,
and the last two, P(D_1|H_0) and P(D_0|H_1), represent probabilities of incorrect
decisions. Error 3 is called a "type-I" error and error 4 is called a "type-II" error. In
radar terminology, P(D_1|H_0) is called a "false alarm probability" (P_F) and
P(D_0|H_1) is called the "probability of a miss" (P_M). The average probability
of error P_e is often chosen as the performance measure to be minimized, where
P_e = P(H_0)P(D_1|H_0) + P(H_1)P(D_0|H_1)
EXAMPLE 6.1.
Y = X + N
where X is the transmitted symbol (X = 0 or X = 1) and N is zero-mean Gaussian
noise with variance 1/9, independent of X. Assume P(H_0) = 3/4 and
P(H_1) = 1/4. (a) Find the MAP decision rule. (b) Find the average probability
of error.
SOLUTION:
(a) Let
H_0: 0 transmitted
H_1: 1 transmitted
We are given that P(H_0) = 3/4, P(H_1) = 1/4,
f_{Y|H_0}(y|H_0) = f_{Y|X}(y|0) = (3/√(2π)) exp(-9y^2/2)
and
f_{Y|H_1}(y|H_1) = f_{Y|X}(y|1) = (3/√(2π)) exp(-9(y - 1)^2/2)
The likelihood ratio test is
L(y) = exp(-9(y - 1)^2/2) / exp(-9y^2/2) ≷_{H_0}^{H_1} P(H_0)/P(H_1) = 3
Taking logarithms and simplifying, the MAP decision rule reduces to
y ≷_{H_0}^{H_1} 1/2 + (1/9) ln(3) ≈ 0.622
(b) The conditional error probabilities are
P(D_0|H_1) = P(Y ∈ R_0|H_1) = ∫_{-∞}^{0.622} f_{Y|H_1}(y|H_1) dy
           = ∫_{-∞}^{0.622} (3/√(2π)) exp(-9(y - 1)^2/2) dy
           = Q(1.134) ≈ 0.128
and P(D_1|H_0) = Q(1.866) ≈ 0.031, so
P_e = P(H_0)P(D_1|H_0) + P(H_1)P(D_0|H_1)
    = (3/4)Q(1.866) + (1/4)Q(1.134)
    ≈ .055
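The threshold and error probability of Example 6.1 follow from a few lines of arithmetic, using Q(x) = 0.5 erfc(x/√2):

    import math

    p0, p1 = 0.75, 0.25
    sigma = 1.0 / 3.0                      # noise standard deviation

    def Q(x):
        return 0.5 * math.erfc(x / math.sqrt(2))

    gamma = 0.5 + sigma**2 * math.log(p0 / p1)          # MAP threshold = 0.622
    Pe = p0 * Q(gamma / sigma) + p1 * Q((1.0 - gamma) / sigma)
    print(gamma, Pe)                                    # 0.622, ~0.055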
In many problems the cost of a miss is quite different from the cost of a false
alarm, and an optimum decision rule has to take into account the relative costs
and minimize the average cost. We now derive a decision rule that minimizes
the average cost.
If we denote the decisions made in the binary hypothesis problem by D_i,
i = 0, 1, where D_0 and D_1 denote the decisions in favor of H_0 and H_1, respectively,
then we have the following four possibilities: (D_i, H_j), i = 0, 1, and
j = 0, 1. The pair (D_i, H_j) denotes H_j being the true hypothesis and a decision
of D_i. Pairs (D_0, H_0) and (D_1, H_1) denote correct decisions, and (D_1, H_0) and
(D_0, H_1) denote incorrect decisions. If we associate a cost C_{ij} with each pair
(D_i, H_j), then the average cost can be written as
C̄ = Σ_{i=0,1} Σ_{j=0,1} C_{ij} P(D_i, H_j)
  = Σ_{i=0,1} Σ_{j=0,1} C_{ij} P(H_j) P(D_i|H_j)    (6.5)
where P(D_i|H_j) is the probability of deciding in favor of H_i when H_j is the true
hypothesis.
A decision rule that minimizes the average cost C̄, called the Bayes' decision
rule, can be derived as follows. If R_0 and R_1 are the partitions of the observation
space, and decisions are made in favor of H_0 if y ∈ R_0 and H_1 if y ∈ R_1, then
the average cost can be expressed as
C̄ = C_00 P(H_0) ∫_{R_0} f_{Y|H_0}(y|H_0) dy + C_10 P(H_0) ∫_{R_1} f_{Y|H_0}(y|H_0) dy
  + C_11 P(H_1) ∫_{R_1} f_{Y|H_1}(y|H_1) dy + C_01 P(H_1) ∫_{R_0} f_{Y|H_1}(y|H_1) dy
If we make use of the fact that R_1 ∪ R_0 = (-∞, ∞) and R_1 ∩ R_0 = ∅, then we
have
C̄ = C_10 P(H_0) + C_11 P(H_1)
  + ∫_{R_0} {[C_01 - C_11]P(H_1)f_{Y|H_1}(y|H_1) - [C_10 - C_00]P(H_0)f_{Y|H_0}(y|H_0)} dy    (6.6)
Since the decision rule is specified in terms of R_0 (and R_1), the decision rule
that minimizes the average cost is derived by choosing the R_0 that minimizes the
integral on the right-hand side of Equation 6.6. Note that the first two terms in
Equation 6.6 do not depend on R_0.
The smallest value of C̄ is achieved when the integrand in Equation 6.6 is
negative for all values of y ∈ R_0. Since C_01 > C_11 and C_10 > C_00, the integrand
will be negative and C̄ will be minimized if R_0 is chosen such that for every
value of y ∈ R_0,
[C_01 - C_11]P(H_1)f_{Y|H_1}(y|H_1) < [C_10 - C_00]P(H_0)f_{Y|H_0}(y|H_0)
That is, choose H_0 if the preceding inequality holds; equivalently, the Bayes'
decision rule is
f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0) ≷_{H_0}^{H_1} [C_10 - C_00]P(H_0) / ([C_01 - C_11]P(H_1))
Note that, with the exception of the decision threshold, the form of the Bayes'
decision rule given before is the same as the form of the MAP decision rule
given in Equation 6.2. It is left as an exercise for the reader to show that the
two decision rules are identical when C_10 - C_00 = C_01 - C_11, and that the Bayes'
decision rule minimizes the probability of making incorrect decisions when
C_00 = C_11 = 0 and C_10 = C_01 = 1.
f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0) ≷_{H_0}^{H_1} γ    (6.8)
where γ is the decision threshold. Thus, only the value of the threshold with
which the likelihood ratio is compared varies with the criterion that is optimized.
In many applications, including radar systems, the performance of
decision rules is displayed in terms of a graph of the detection probability
P_D = 1 - P_M versus the false alarm probability P_F.
Figure 6.3a Conditional pdfs (assumed to be Gaussian with variance = σ^2); P_F and P_D.
In the binary detection problem shown in Figure 6.1a, the receiver makes its
decision by observing the received waveform for the duration of T seconds. In
the T-second interval, the receiver can extract many samples of Y (f), process
the samples, and make a decision based on the information contained in all o f
the observations taken during the symbol interval. We now develop an algorithm
for processing multiple observations and making decisions. We will first address
the problem of deciding between one of two known waveforms that are corrupted
by additive noise. That is, we will assume that the receiver observes
where
- and s0(t) and s t(t) are the waveforms used by the transmitter to represent 0 and
1, respectively. In a simple case s0(t) can be a rectangular pulse of duration T
and amplitude - 1 , and Sj(f) can be a rectangular pulse of duration T and
amplitude 1. In any case, s0(f) and SiO) are known deterministic waveforms;
the only thing the receiver does not know in advance is which one of the two
waveforms is transmitted during a given interval.
Suppose the receiver takes m samples of Y(t), denoted by Y_1, Y_2, . . . , Y_m.
Note that Y_k = Y(t_k), t_k ∈ (0, T), is a random variable and y_k is a particular
value of Y_k. These samples under the two hypotheses are given by
y_k = s_{0,k} + n_k under H_0
y_k = s_{1,k} + n_k under H_1
where s_{0,k} = s_0(t_k) and s_{1,k} = s_1(t_k) are known values of the deterministic
waveforms s_0(t) and s_1(t).
Let
Y = [Y_1, Y_2, . . . , Y_m]^T
and assume that the distributions of Y under H_0 and H_1 are given (the distribution
of Y_1, Y_2, . . . , Y_m will be the same as the joint distribution of N_1, N_2, . . . ,
N_m except for a translation of the means by s_{0,k} or s_{1,k}, k = 1, 2, . . . , m).
A direct extension of the MAP decision rule discussed in the preceding section
leads to a decision algorithm of the form
L(y) = f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0) ≷_{H_0}^{H_1} P(H_0)/P(H_1)    (6.9)
S_NN(f) = N_0/2,  |f| < B
        = 0  elsewhere    (6.10)
where N_0/2 is the two-sided noise power spectral density and B is the receiver
bandwidth, which is usually the same as the signal bandwidth. N(t) with the
foregoing properties is called band-limited white Gaussian noise. It was shown
in Section 5.5 that
E{N(t)} = 0    (6.11)
and
R_NN(τ) = N_0B [sin(2πBτ)/(2πBτ)]    (6.12)
If the samples are taken 1/(2B) apart they are uncorrelated, and
E{Y|H_0} = [s_{0,1}, s_{0,2}, . . . , s_{0,m}]^T;  covar{Y_k, Y_j|H_0} = δ_{kj}(N_0B)
E{Y|H_1} = [s_{1,1}, s_{1,2}, . . . , s_{1,m}]^T;  covar{Y_k, Y_j|H_1} = δ_{kj}(N_0B)
Hence
f_{Y|H_1}(y|H_1) = Π_{k=1}^m (1/(√(2π)σ)) exp[-(y_k - s_{1,k})^2/(2σ^2)]    (6.13)
where σ^2 = N_0B, and similarly for f_{Y|H_0}(y|H_0).
Substituting these densities into the likelihood ratio and taking logarithms, the
decision rule reduces to
Σ_{k=1}^m y_k s_{1,k} - Σ_{k=1}^m y_k s_{0,k} ≷_{H_0}^{H_1}
    σ^2 ln[P(H_0)/P(H_1)] + (1/2) Σ_{k=1}^m (s_{1,k}^2 - s_{0,k}^2)    (6.14)
or
y^T[s_1 - s_0] ≷_{H_0}^{H_1} γ'    (6.15)
where
γ' = σ^2 ln[P(H_0)/P(H_1)] + (1/2)[s_1^T s_1 - s_0^T s_0]
The weighted averaging scheme given in Equation 6.14 is also called a matched
filtering algorithm. The "dot" products, y^T s_1 and y^T s_0, indicate how well the
received vector y matches the two signals s_1 and s_0.
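Once the samples are collected, Equation 6.15 is a one-line computation. A sketch with hypothetical antipodal signal vectors (the pulse shapes and noise level below are illustrative assumptions):

    import numpy as np

    def decide(y, s0, s1, sigma2, p0=0.5, p1=0.5):
        """Matched-filter decision rule of Equation 6.15."""
        gamma = sigma2 * np.log(p0 / p1) + 0.5 * (s1 @ s1 - s0 @ s0)
        return 1 if y @ (s1 - s0) > gamma else 0

    rng = np.random.default_rng(11)
    s0 = -np.ones(10)                     # hypothetical pulse for "0"
    s1 = np.ones(10)                      # hypothetical pulse for "1"
    y = s1 + rng.normal(0, 1.0, 10)       # "1" transmitted plus noise
    print(decide(y, s0, s1, sigma2=1.0))  # usually prints 1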
(6.16)
(6.17)
and the receiver can obtain and process an uncountably infinite number of
independent samples observed in the interval (0, T).
As m → ∞, the samples become dense, and the left-hand side of the decision
rule stated in Equation 6.14 becomes
This is the logarithm of the likelihood ratio, and the receiver has to make its
decision by comparing this quantity against the decision threshold γ. So the
optimum decision rule based on continuous observations has the form
∫_0^T y(t)s_1(t) dt - ∫_0^T y(t)s_0(t) dt ≷_{H_0}^{H_1} γ    (6.18)
A block diagram of the processing operations that take place in the receiver
is shown in Figure 6.4. The receiver consists of two "cross-correlators" (two
multiplier and integrator combinations) whose outputs are sampled once every
T seconds. When the output of the correlator for signal s_1(t) exceeds the output
of the correlator for s_0(t) by γ, a decision in favor of s_1(t) is reached by the
receiver. Once a decision is reached, the integration is started again with zero
initial condition and continued for another T seconds, when another decision is
made. The entire procedure is repeated once every T seconds, with the receiver
outputting a sequence of 1s and 0s, which are the estimates of the transmitted
sequence.
The value of the decision threshold γ can be determined as follows. Let
Z = ∫_0^T Y(t)[s_1(t) - s_0(t)] dt
Then, the conditional distribution of the decision variable Z under the two
hypotheses can be determined from
Z = ∫_0^T [s_1(t) + N(t)][s_1(t) - s_0(t)] dt under H_1
and
Z = ∫_0^T [s_0(t) + N(t)][s_1(t) - s_0(t)] dt under H_0
Now, Z has a Gaussian distribution and the binary detection problem is equivalent
to deciding on H_0 or H_1 based upon the single observation Z. The MAP
decision rule for deciding between H_0 and H_1 based on Z reduces to
f_{Z|H_1}(z|H_1)/f_{Z|H_0}(z|H_0) ≷_{H_0}^{H_1} P(H_0)/P(H_1)
If f_{Z|H_1} is Gaussian with mean μ_1 and variance σ^2 (which can be computed using
the results given in Section 3.8), and f_{Z|H_0} is Gaussian with mean μ_0 and variance
σ^2, the decision rule reduces to
z ≷_{H_0}^{H_1} (μ_1 + μ_0)/2 + [σ^2/(μ_1 - μ_0)] ln[P(H_0)/P(H_1)]    (6.19)
where
z = ∫_0^T y(t)s_1(t) dt - ∫_0^T y(t)s_0(t) dt
Comparing the right-hand side of Equation 6.19 with Equation 6.18, we see that
the decision threshold γ is given by
γ = (μ_0 + μ_1)/2 + [σ^2/(μ_1 - μ_0)] ln[P(H_0)/P(H_1)]    (6.20)
EXAMPLE 6.2.
Figure 6.5a Signaling waveforms in a binary communication system: s_1(t) has amplitude 2 and s_0(t) has amplitude 1 over (0, 1 ms); P(1 sent) = P(H_1) = 1/2, P(0 sent) = P(H_0) = 1/2.
Figure 6.5b Density functions of the decision variable.
(a) Suppose the receiver samples the received waveform once every 0.1
millisecond and makes a decision once every millisecond based on the
10 samples taken during each bit interval. What is the optimum decision
rule and the average probability of error?
(b) Develop a receiver structure and a decision rule based on processing
the analog signal continuously during each signaling interval. Compare
the probability of error with (a).
SOLUTION:
(a) We are given that Y(t) = X(t) + N(t). With the S_NN(f) given, we can
show that samples of N(t) taken 0.1 milliseconds apart will be independent,
and hence, the decision rule becomes (Equation 6.15)
y^T[s_1 - s_0] ≷_{H_0}^{H_1} (1/2)[s_1^T s_1 - s_0^T s_0]
with
s_1^T = [2, 2, . . . , 2]
s_0^T = [1, 1, . . . , 1]
or
Σ_{k=1}^m y_k ≷_{H_0}^{H_1} 3m/2
Dividing both sides by the number of samples, we can rewrite the decision
rule as
(1/m) Σ_{k=1}^m y_k ≷_{H_0}^{H_1} 1.5,  m = 10
which implies that the receiver should average all the samples taken
during a bit interval, compare the average against a threshold value of 1.5, and
make its decision based on whether the average value is greater than
or less than 1.5. The decision variable Z,
Z = (1/m) Σ_{k=1}^m Y_k
has a Gaussian distribution with a mean of 2 under H_1 and 1 under H_0.
The variance of Z can be computed as
σ_Z^2 = variance of [(1/m) Σ_{k=1}^m N_k] = σ_N^2/m
where N_k = N(t_k) and σ_N^2 is the variance of the noise samples. The probability
of error is
P_e = P(H_0) ∫_{1.5}^∞ f_{Z|H_0}(z|H_0) dz + P(H_1) ∫_{-∞}^{1.5} f_{Z|H_1}(z|H_1) dz
    = Q(0.5/σ_Z)
With m = 10 the variance of Z is one-tenth of the single-sample value, so
averaging considerably reduces the probability of error compared with m = 1.
(b) With continuous processing, the receiver forms the decision variable
Z' = (1/T) ∫_0^T Y(t) dt
and compares it with a threshold. The variance of the time-averaged noise can
be obtained from Example 3.18 as
σ_Z'^2 = σ_N^2/(2BT)
where the BT product is equal to (10^6)(10^-3). Hence
σ_Z'^2 = σ_N^2/2000
The reader can show that γ' = 1.5 and
P_e = Q(0.5/σ_Z') ≈ 0
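The benefit of averaging in Example 6.2 can be tabulated directly from Pe = Q(0.5/σ_Z) with σ_Z^2 = σ_N^2/m; the value of σ_N^2 below is a stand-in for the value given in the original problem statement:

    import math

    def Q(x):
        return 0.5 * math.erfc(x / math.sqrt(2))

    sigma_N2 = 1.0                        # assumed noise-sample variance
    for m in (1, 10, 2000):               # 2000 ~ the 2BT figure from part (b)
        sigma_Z = math.sqrt(sigma_N2 / m)
        print(m, Q(0.5 / sigma_Z))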
L(y) = f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0) ≷_{H_0}^{H_1} P(H_0)/P(H_1)    (6.21)
with
E{Y|H_0} = s_0;  E{Y|H_1} = s_1
where Σ_N is the covariance (autocorrelation) matrix of the noise samples N(t_i),
i = 1, 2, . . . , m. That is, the (i, j)th entry in Σ_N is
[Σ_N]_{ij} = E{N(t_i)N(t_j)}
(In the case of uncorrelated samples, Σ_N will be a diagonal matrix with identical
entries of σ_N^2.)
y^T Σ_N^{-1} s_1 - y^T Σ_N^{-1} s_0 + (1/2) s_0^T Σ_N^{-1} s_0 - (1/2) s_1^T Σ_N^{-1} s_1 ≷_{H_0}^{H_1} ln[P(H_0)/P(H_1)]    (6.23)
Now, suppose that λ_1, λ_2, . . . , λ_m are distinct eigenvalues of Σ_N and V_1,
V_2, . . . , V_m are the corresponding eigenvectors of Σ_N. Note that the m eigenvalues
are scalars, and the normalized eigenvectors are m × 1 column vectors
with the following properties:
Σ_N V_i = λ_i V_i  and  V_i^T V_j = δ_{ij}    (6.24)
where
V = [V_1, V_2, . . . , V_m]  is the m × m modal matrix    (6.25)
and
Λ = [ λ_1  0  · · ·  0 ]
    [ 0  λ_2  · · ·  0 ]
    [ 0   0  · · ·  λ_m ]    (6.26)
In terms of the transformed quantities, the decision rule of Equation 6.23 can be
written as
Σ_{k=1}^m (y'_k s'_{1,k} - y'_k s'_{0,k})/λ_k ≷_{H_0}^{H_1}
    ln[P(H_0)/P(H_1)] + (1/2) Σ_{k=1}^m (s'^2_{1,k} - s'^2_{0,k})/λ_k    (6.27)
where
s'_0 = [V_1^T s_0, V_2^T s_0, . . . , V_m^T s_0]^T,  s'_{0,k} = V_k^T s_0
s'_1 = [V_1^T s_1, V_2^T s_1, . . . , V_m^T s_1]^T,  s'_{1,k} = V_k^T s_1
y' = [V_1^T y, V_2^T y, . . . , V_m^T y]^T    (6.28)
are the transformed signal vectors and the observations.
The decision rule given in Equation 6.27 is similar to the decision rule given
in Equations 6.14 and 6.15. The transformations given in Equation 6.28 reduce
the problem of binary detection with correlated observations to an equivalent
problem of binary detection with uncorrelated observations discussed in the
preceding section. With colored noise at the receiver input, the components of
the transformed noise vector are uncorrelated, and hence the decision rule is
similar to the matched-filter scheme described earlier, with one exception. In the
white noise case, and in the case of uncorrelated noise samples at the receiver
input, the noise components were assumed to have equal variances. Now, after
transformation, the variances of the transformed noise components are unequal,
and they appear as normalizing factors in the decision rule.
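The eigenvector transformation of Equations 6.24-6.28 is exactly a whitening step, which can be verified numerically. A sketch with a hypothetical 2 × 2 noise covariance matrix:

    import numpy as np

    rng = np.random.default_rng(2)
    Sigma_N = np.array([[2.0, 0.8],
                        [0.8, 1.0]])      # hypothetical noise covariance

    lam, V = np.linalg.eigh(Sigma_N)      # eigenvalues, orthonormal eigenvectors

    n = rng.multivariate_normal([0, 0], Sigma_N, 50_000)
    n_prime = n @ V                       # transformed noise components
    print(np.cov(n_prime.T))              # ~ diag(lam): uncorrelated, unequal variances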
The extension of the decision rule given in Equation 6.27 to the case of
continuous observations is somewhat complicated even though the principle is
simple. We now make use o f the continuous version o f the Karhunen-Loeve
expansion. Suppose g_1(t), g_2(t), . . . , and λ_1, λ_2, . . . satisfy
∫_0^T R_NN(t, u) g_i(u) du = λ_i g_i(t),  0 ≤ t ≤ T
so that the noise can be expanded as
N(t) = l.i.m._{m→∞} Σ_{i=1}^m N_i g_i(t)
where
N_i = ∫_0^T N(t) g_i(t) dt
We can then expand Y(t) in terms of g_1(t), g_2(t), . . . and represent each one by
a set of coefficients Y'_i, i = 1, 2, . . . , where
Y'_i = ∫_0^T Y(t) g_i(t) dt    (6.32)
Equation 6.32 transforms the continuous observation Y(t), t ∈ [0, T], into a set
of uncorrelated random variables, Y'_1, Y'_2, . . . . In terms of the transformed
variables, the decision rule is similar to the one given in Equation 6.27 and has
the form
The only difference between Equation 6.27 and Equation 6.33 is the number of
transformed observations that are used in arriving at a decision. While Equation
6.33 implies an infinite number of observations, in practice only a finite number
of transformed samples are used. The actual number N is determined on the
basis of the value of N that yields a negligible contribution from the remaining
terms. That is, we truncate the sum when the eigenvalues become very small.
formulate a decision rule. Techniques for estimating parameter values from data
are discussed in Chapter 8 .
Another approach involves modeling the parameter as a random variable
with a known probability density function. In this case the solution is rather
straightforward. We compute the (conditional) likelihood ratio as a function of
the unknown parameter and then average it with respect to the known distri
bution o f the unknown parameter. For example, if there is one unknown pa
rameter θ, we compute the likelihood ratio as
L(y) = ∫_θ L(y|θ) f_Θ(θ) dθ
and then derive the decision rule. We illustrate this approach with an example.
Suppose we have a binary detection problem in which
Y(t) = s_i(t) + N(t),  0 ≤ t ≤ T
with
s_i(t) = s_1(t) = A under H_1;  s_i(t) = s_0(t) = 0 under H_0
N(t) is additive zero-mean white Gaussian noise with a power spectral density
S_NN(f) = N_0/2, and A is the unknown signal amplitude with a pdf
f_A(a) = (2a/R) exp(-a^2/R),  a ≥ 0;  R > 0
       = 0,  a < 0
With the decision statistic
Z = ∫_0^T [s_i(t) + N(t)] dt
we have
Z = aT + W under H_1 (given A = a)
Z = 0 + W under H_0
where W = ∫_0^T N(t) dt is Gaussian with
μ_W = 0
and
σ_W^2 = N_0T/2
The conditional likelihood ratio is
L(z|A = a) = f_{Z|H_1,a}(z)/f_{Z|H_0,a}(z)
           = exp[-(a^2T^2 - 2aTz)/(2σ_W^2)]
and
L(z) = ∫_a L(z|A = a) f_A(a) da
Completing the integration, taking the logarithm, and rearranging the terms, it
can be shown that the decision rule reduces to
(6.35)
6.5 M-ARY DETECTION
Thus far we have considered the problem of deciding between one of two alternatives.
In many digital communication systems the receiver has to make a
choice among M possible alternatives. The MAP decision rule chooses the
hypothesis H_i for which the a posteriori probability
P(H_i|y) = f_{Y|H_i}(y|H_i)P(H_i)/f_Y(y)    (6.36)
is maximum.
The decision rule given in Equation 6.36 can be easily extended to the case
of multiple and continuous observations. We illustrate the M -ary detection prob
lem with a simple example.
EXAMPLE 6.3.
Y(t) = X(t) + N(t)
where X(t) is a multilevel waveform that takes one of the values -3, -1, 1, 3
during each signaling interval, and N(t) is a zero-mean Gaussian random process
with a given power spectral density function.
The receiver takes five equally spaced samples in each signaling interval and
makes a decision based on these five samples. Derive the decision rule and
calculate the probability of an incorrect decision.
SOLUTION: During the first "symbol" interval [0, 10 μs], X(t) is constant,
and the samples of Y(t) taken 2 μs apart are independent (why?). The MAP
decision rule will be based on the conditional densities of the samples, each of
the form
exp[-(y_k - A_i)^2/(2σ^2)]
where A_i is the level transmitted, or, combining the five samples, on
ȳ = (1/5) Σ_{k=1}^5 y_k
Thus, the decision rule reduces to averaging the five observations and choosing
the level "closest" to ȳ as the output. The conditional pdfs of the decision variable
Ȳ and the decision boundaries are shown in Figure 6.7b. Each shaded
area in Figure 6.7b represents the probability of an incorrect decision, and the
average probability of incorrect decision is
P_e = Σ_i P(incorrect decision|H_i)P(H_i)
The factor 6 in the following expression represents the six shaded areas, and
σ^2/5 is the variance of Ȳ in the conditional pdfs:
P_e = (3/2) Q(√12.3) = .00025
6.6 SUMMARY
In this chapter, we developed techniques for detecting the presence of known
signals that are corrupted by noise. The detection problem was formulated as
a hypothesis testing problem, and decision rules that optimize different per
formance criteria were developed. The MAP decision rule, which maximizes the
a posteriori probability, has the form
L(y) = f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0) ≷_{H_0}^{H_1} γ
In the case of detecting known signals in the presence of additive white Gaussian
noise the decision rule reduces to
y^T[s_1 - s_0] ≷_{H_0}^{H_1} γ'
This form o f the decision rule is called a matched filter in which the received
waveform is correlated with the known signals. The correlator output is com
pared with a threshold, and a decision is made on the basis of the compari
son. Extension to M -ary detection and detecting signals mixed with colored
noise were also discussed briefly.
6.7 REFERENCES
S e v e r a l g o o d r e fe r e n c e s th a t d e a l w ith d e t e c t io n t h e o r y in d e ta il a r e a v a ila b le t o th e
in t e r e s t e d r e a d e r . T e x t s o n m a th e m a tic a l sta tis tic s ( R e f e r e n c e [1 ] f o r e x a m p l e ) p r o v i d e
g o o d c o v e r a g e o f t h e t h e o r y o f h y p o t h e s is t e s t in g a n d s ta tis tica l i n f e r e n c e , w h ic h is th e
b a s is o f th e m a t e r ia l d i s c u s s e d in this c h a p t e r . A p p l i c a t i o n s t o s ig n a l d e t e c t i o n a r e c o v e r e d
in g r e a t d e t a il in R e f e r e n c e s [ 2 ] - [ 4 ] . R e f e r e n c e [2 ] p r o v id e s c o v e r a g e at t h e in t e r m e d ia t e
l e v e l a n d R e f e r e n c e s [3 ] a n d [4 ] p r o v i d e a m o r e t h e o r e t ic a l c o v e r a g e at t h e g r a d u a t e
le v e l.
6.8 PROBLEMS
6.1 Given the following conditional probability density functions o f an obser
vation Y
f_{Y|H_0}(y|H_0) = 1,  0 ≤ y ≤ 1
and
6.2 Suppose that we want to decide whether or not a coin is fair by tossing it
eight times and observing the number o f heads showing up. Assume that
we have to decide in favor of one of the following two hypotheses:
= y>°
and
r>0
where A is the “ signal amplitude” when a target is present and N0 is the
variance o f the noise.
Derive the MAP decision rule assuming P( H0) = 0.999.
6.4 Figure 6.8 shows the conditional pdfs of an observation based on which
binary decisions are made. The cost function for this decision problem is
and compare it with the average cost of the decision rule derived in
part a.
6.5 Assume that under hypothesis Ff, we observe a signal o f amplitude 2 volts
corrupted by additive noise N. Under hypothesis H0 we observe only the
noise N. The noise is zero-mean Gaussian with a variance o f 9. Thus, we
observe
with
f_N(n) = (1/(3√(2π))) exp(-n^2/18)
6.6 With reference to Problem 6.5, find the decision rule that minimizes the
average cost with respect to the cost function
b. Show that the value of P(H_1) that maximizes C̄ (worst case) requires
6.9 With reference to Problem 6.8, assume that the a priori probabilities are
not known. Derive the minmax decision rule and calculate C. (Refer to
Problem 6.7 for the derivation of the minmax decision rule..)
J = P_M + λ(P_F - α)
and show that minimizing J leads to a likelihood-ratio test of the form
f_{Y|H_1}(y|H_1)/f_{Y|H_0}(y|H_0) ≷_{H_0}^{H_1} λ
6.11 Suppose we want to construct a N-P decision rule with PF = 0.001 for a
detection application in which
P(H_0) = P(H_1) = 1/2
0 < y
s_1(t) = 4 sin(2πf_0t),  0 ≤ t ≤ T,  T = 1 ms
P[s_1(t)] = P[s_0(t)] = 1/2
S_NN(f) = 10^-3 W/Hz
s_1(t) = √8,  0 ≤ t ≤ T,  T = 1 ms
6.16 Show that the eigenvector transformation (or projection given in Equation
6.24) transforms the correlated components o f the observation vector Y
to a set o f uncorrelated random variables.
Figure 6.9 Signal constellation for Problem 6.19 (the points include (0, 1) and (0, -1)).
6.17 Given that / Y|Ho and / Y|h,, Y — [Yt, Y2]T, have bivariate Gaussian distri
butions with
r
covar{Y|//0} = covar{Y|//,} =
2
P(H_0) = P(H_1) = 1/2
Show that the optimum decision criterion (that minimizes the average cost)
is to choose the hypothesis for which the conditional cost C(H,\y) is small
est, where
C(H_i|y) = Σ_j C_{ij} P(H_j|y)
Figure 6.10 Signal constellation for Problem 6.20.
6.20 Repeat Problem 6.19 for the "signal constellation" shown in Figure 6.10.
CHAPTER SEVEN

Linear Minimum Mean Squared Error Filtering

7.1 INTRODUCTION
The basic idea of this chapter is that a noisy version of a signal is observed and the "true" value of the signal is to be estimated. This is often viewed as the problem of separating the signal from the noise. We will follow the standard approach whereby the theory is presented in terms of developing an estimator that minimizes the expected squared error between the estimator and the true but unknown signal. Historically, this signal estimation problem was viewed as filtering narrow-band signals from wide-band noise; hence the name "filtering" for signal estimation. Figure 7.1 shows an example of filtering X(t) = S(t) + N(t) to produce an estimator Ŝ(t) of S(t).
In the first section, we consider estimating the signal S(t), based on a finite number of observations of a related random process X(t). This is important in its own right, and it also forms the basis for both Kalman and Wiener filters. We then introduce innovations as estimation residuals or the unpredictable part of the latest sample. In the third section we consider digital Wiener filters, and then in the fourth section, the filter is extended to an infinite sequence of observations where we emphasize a discrete recursive filter called the Kalman filter. Finally, we consider continuous observations and discuss the Wiener filter. Both Kalman and Wiener filters are developed by using orthogonality and innovations.
In all cases in this chapter, we seek to minimize the expected (or mean)
squared error (MSE) between the estimator and the value of the random variable
Figure 7.1 Filtering. (a) The two possible values of S(t). (b) One sample function of N(t). (c) Two examples of processing X(t) to produce Ŝ(t).
(or signal) being estimated. In addition, with the exception of some theoretical development, we restrict our attention to linear estimators or linear filters.

One other important assumption needs to be mentioned. Throughout this chapter, we assume that the necessary means and correlation functions are known. The case where these moments must be estimated from data is discussed in Chapters 8 and 9.
7.2 LINEAR MINIMUM MEAN SQUARED ERROR ESTIMATORS
μ_S = E{S}   (7.1)

and

σ_SS = E{(S − μ_S)²}   (7.2)

Indeed

E{(S − a)²} = σ_SS + (μ_S − a)²   (7.3)

showing that the minimum MSE is the variance, and if the constant a is not the mean, then the squared error is the variance plus the squared difference between the estimator a and the "best" estimator μ_S.
*This is actually an affine transformation of X because of the constant a, but we follow standard practice and call the affine transformation "linear." As we shall see it is linear with respect to a and h.
selection of the constants a and h. Note that our estimator Ŝ is a linear (affine) function of X and that the criterion is to minimize the MSE; hence the name, linear minimum mean squared estimation. We assume that all necessary moments of X and S are known.
We now show that the best estimator is

h = σ_XS/σ_XX   (7.4)

a = μ_S − hμ_X = μ_S − (σ_XS/σ_XX)μ_X   (7.5)

where σ_XS = E{(X − μ_X)(S − μ_S)} and σ_XX = E{(X − μ_X)²}.
Taking derivatives of the MSE and setting them equal to zero we have, because the linear operations of expectation and differentiation can be interchanged,

∂E{(S − a − hX)²}/∂a = −2E{S − a − hX} = 0   (7.7.a)

∂E{(S − a − hX)²}/∂h = −2E{(S − a − hX)X} = 0   (7.7.b)
The coefficients a and h are now the values that satisfy Equation 7.7 and should be distinguished by, say, a_0 and h_0, but we stay with the simpler notation. Equation 7.7.a can be written

μ_S − a − hμ_X = 0   (7.8)

Recognizing that

E{X²} = σ_XX + μ_X²
and

E{XS} = σ_XS + μ_Xμ_S

Equation 7.7.b can be written

σ_XS + μ_Xμ_S = aμ_X + hσ_XX + hμ_X²

or, using Equation 7.8,

h = σ_XS/σ_XX

Thus, the solutions given do provide at least a stationary point. We now show that it is indeed a minimum.
Assume Ŝ = c + dX, where c and d are arbitrary constants. Then

E{(S − c − dX)²} = E{[S − a − hX + (a − c) + (h − d)X]²}
= E{(S − a − hX)²} + E{[(a − c) + (h − d)X]²} + 2(a − c)E{S − a − hX} + 2(h − d)E{(S − a − hX)X}

The last two terms are zero if a and h are the constants chosen as shown in Equation 7.7. Thus

E{(S − c − dX)²} = E{(S − a − hX)²} + E{[(a − c) + (h − d)X]²} ≥ E{(S − a − hX)²}

which proves that a and h from Equations 7.4 and 7.5 minimize the MSE.
We now obtain a simpler expression for the minimum MSE. Since (see Equation 7.7)

E{(S − a − hX)²} = σ_SS − σ_XS²/σ_XX   (7.10)

or

E{(S − a − hX)²} = (σ_SSσ_XX − σ_XS²)/σ_XX = σ_SS(1 − ρ_XS²)   (7.11)

where

ρ_XS = σ_XS/(σ_X σ_S)  and  σ_X = √σ_XX,  σ_S = √σ_SS

From Equation 7.11 it follows that if the correlation coefficient is near ±1, then the expected squared error is nearly zero. If ρ_XS is zero, then the variance of S is not reduced by using the observation; thus observing X and using it in a linear estimator is of no value in estimating S.
EXAMPLE 7.1.

The observation X is made up of the true signal S and noise. It is known that E{X} = 10, E{S} = 11, σ_XX = 1, σ_SS = 4, ρ_XS = .9. Find the minimum linear MSE estimator of S in terms of X. What is the expected squared error with this estimator? Find the estimate of the true signal if we observe X = 12.
SOLUTION:

Using Equations 7.4 and 7.5 we obtain

h = (1)(2)(.9)/1 = 1.8

a = 11 − (1.8)(10) = −7

Thus Ŝ = −7 + 1.8X; the expected squared error is σ_SS(1 − ρ_XS²) = 4(1 − .81) = .76; and with X = 12, the estimate is Ŝ = −7 + (1.8)(12) = 14.6.
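A minimal numerical sketch of this computation (our own Python check of Equations 7.4, 7.5, and 7.11 using the moments of Example 7.1; the variable names are ours):

```python
import math

# Moments given in Example 7.1
mu_x, mu_s = 10.0, 11.0
var_xx, var_ss = 1.0, 4.0
rho_xs = 0.9

# Equations 7.4 and 7.5: slope and intercept of the LMMSE estimator
sigma_xs = rho_xs * math.sqrt(var_xx) * math.sqrt(var_ss)  # covariance of X and S
h = sigma_xs / var_xx            # h = sigma_XS / sigma_XX = 1.8
a = mu_s - h * mu_x              # a = mu_S - h mu_X = -7

# Equation 7.11: residual MSE = sigma_SS (1 - rho^2)
mse = var_ss * (1.0 - rho_xs**2)

x = 12.0
s_hat = a + h * x                # estimate for the observation X = 12
print(h, a, mse, s_hat)          # 1.8 -7.0 0.76 14.6
```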
where

Then with these definitions d(X, Y) = 0 implies the vectors X and Y are orthogonal. The results of Section 7.2.2 can be visualized in a way that will be
Figure 7.2 Orthogonality principle.
useful in future developments by referring to Figure 7.2. In this figure, all means are assumed to equal 0.

Note that:

In more complicated linear filtering problems that follow, we shall see that these two conditions continue to be fundamental descriptors of the optimum linear filters.
Ŝ = h_0 + Σ_{i=1}^{n} h_i X(i)

E{S − h_0 − Σ_{i=1}^{n} h_i X(i)} = 0   (7.12.a)

E{[S − h_0 − Σ_{i=1}^{n} h_i X(i)]X(j)} = 0,  j = 1, . . . , n   (7.12.b)
Note that Equation 7.12.b can be visualized as in the previous section as stating that the error is orthogonal to each observation. In addition, Equation 7.12.a can be visualized as stating that the error is orthogonal to a constant. These two equations are called the orthogonality conditions.

Equation 7.12.a can be converted to

h_0 = μ_S − Σ_{i=1}^{n} h_i μ_X(i)   (7.13)

where μ_X(i) = E{X(i)}, and Equation 7.12.b becomes

Σ_{j=1}^{n} h_j C_XX(i, j) = σ_SX(i),  i = 1, . . . , n   (7.14)
Equations 7.13 and 7.14 can be shown to result in a minimum (not just a saddle point) using the technique used in Section 7.2.2. (See Problem 7.8.)

The n equations of Equation 7.14 will now be written in matrix form. Defining

Xᵀ = [X(1), X(2), . . . , X(n)]

hᵀ = [h_1, h_2, . . . , h_n]

Σ_XX = [C_XX(i, j)]

Σ_SXᵀ = [σ_SX(1), σ_SX(2), . . . , σ_SX(n)]

Equation 7.14 becomes

Σ_XX h = Σ_SX   (7.15)
h = Σ_XX⁻¹ Σ_SX   (7.16)
EXAMPLE 7.2.

The signal S is related to the observed sequence X(1), X(2). Find the linear MSE estimator, Ŝ = h_0 + h_1 X(1) + h_2 X(2). The needed moments are:

μ_S = 1
σ_S = .01
μ_X(1) = .02
C_XX(1, 1) = (.005)²
μ_X(2) = .006
C_XX(2, 2) = (.0009)²
C_XX(1, 2) = 0
σ_SX(1) = .00003
σ_SX(2) = .000004
SOLUTION:

Using Equation 7.16

[h_1; h_2] = [[(.005)², 0], [0, (.0009)²]]⁻¹ [.00003; .000004]

h_1 = .00003/.000025 = 6/5

h_2 = .000004/.00000081 = 400/81

and, from Equation 7.13, h_0 = 1 − (6/5)(.02) − (400/81)(.006) ≈ .946. Thus

Ŝ = .946 + (6/5)X(1) + (400/81)X(2)
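A short numerical sketch (ours) of Equation 7.16 applied to Example 7.2; numpy solves the 2 × 2 system directly:

```python
import numpy as np

# Moments from Example 7.2
mu_s = 1.0
mu_x = np.array([0.02, 0.006])
Sigma_xx = np.array([[0.005**2, 0.0],
                     [0.0, 0.0009**2]])   # C_XX(i, j)
sigma_sx = np.array([0.00003, 0.000004])  # sigma_SX(i)

# Equation 7.16: h = Sigma_XX^{-1} Sigma_SX
h = np.linalg.solve(Sigma_xx, sigma_sx)   # [1.2, 4.938...] = [6/5, 400/81]

# Equation 7.13: the constant term
h0 = mu_s - h @ mu_x                      # ~0.946
print(h, h0)
```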
Now consider the case where S(n) is the variable to be estimated by Ŝ(n), and the N observations are S(n − 1), S(n − 2), S(n − 3), . . . , S(n − N), that is, X(1) = S(n − 1), . . . , X(N) = S(n − N). Also assume S(n) is a stationary random sequence with 0 mean. Then h_0 = 0, and Equation 7.14 becomes

Σ_{j=1}^{N} h_j R_SS(i − j) = R_SS(i),  i = 1, . . . , N   (7.18)
EXAMPLE 7.3.

If R_SS(n) = a^|n|, find the best linear estimator of S(n) in terms of the previous N observations, that is, find

Ŝ(n) = Σ_{i=1}^{N} h_i S(n − i)

SOLUTION:

Equation 7.18 becomes

[[1, a, a², . . . , a^(N−1)],
 [a, 1, a, . . . , a^(N−2)],
 . . .
 [a^(N−1), a^(N−2), . . . , 1]] [h_1, h_2, . . . , h_N]ᵀ = [a, a², . . . , a^N]ᵀ

It is obvious that

hᵀ = [a, 0, . . . , 0]

Ŝ(n) = aS(n − 1)

is the minimum MSE estimator. Because Ŝ(n) is the "best" estimator, and because it ignores all but the most recent value of S, the geometric autocorrelation function results in a Markov type of estimator.
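A quick numerical check of Example 7.3 (our own illustration): with R_SS(n) = a^|n| the Toeplitz system of Equation 7.18 is solved by h = [a, 0, . . . , 0], so only the most recent sample is used.

```python
import numpy as np

a, N = 0.7, 5
R = np.array([[a**abs(i - j) for j in range(N)] for i in range(N)])  # R_SS(i-j)
r = np.array([a**(i + 1) for i in range(N)])                          # R_SS(i), i = 1..N

h = np.linalg.solve(R, r)
print(np.round(h, 12))   # [0.7, 0, 0, 0, 0]
```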
E{(S − Ŝ)²} = E{[S − h_0 − Σ_{i=1}^{n} h_i X(i)][S − h_0 − Σ_{j=1}^{n} h_j X(j)]}
= E{[S − h_0 − Σ_{i=1}^{n} h_i X(i)]S} − h_0 E{S − h_0 − Σ_{i=1}^{n} h_i X(i)}
− Σ_{j=1}^{n} h_j E{[S − h_0 − Σ_{i=1}^{n} h_i X(i)]X(j)}
The last two terms are zero by the orthogonality condition, Equation 7.12. Thus

E{(S − Ŝ)²} = σ_SS − Σ_{i=1}^{n} h_i σ_SX(i)   (7.19)

Since the error is orthogonal to each observation and to any constant,

E{(S − Ŝ)X(i)} = 0,  i = 1, . . . , n

and

E{(S − Ŝ)} = 0

Because

Ŝ = h_0 + Σ_{i=1}^{n} h_i X(i)

is a linear combination of the observations and a constant, E{(S − Ŝ)Ŝ} = 0, and hence

E{(S − Ŝ)²} = E{(S − Ŝ)S}   (7.21)
One other equivalent form of the residual error is important. From Equation 7.20

E{SŜ} = E{Ŝ²}   (7.22)

This equation can be visualized using Figure 7.2. That is, if Ŝ = hX then the mean square of the error, S − Ŝ, is equal to E{S²} − E{Ŝ²}. If Ŝ is the projection of S on a hyperplane, then the same conceptual picture remains valid.
We again consider S(n) and note that Equations 7.18 define the weights in a finite linear digital filter when S(n) is a zero-mean wide-sense stationary process and X(i) = S(n − i), i = 1, 2, . . . , N. In this case Equation 7.19 becomes

P(N) = E{[S(n) − Ŝ(n)]²} = R_SS(0) − Σ_{i=1}^{N} h_i R_SS(i)   (7.24)
EXAMPLE 7.4.

Find the residual MSE for Example 7.2 with the constants chosen in Example 7.2.

SOLUTION:

Using Equation 7.19

P = σ_SS − h_1 σ_SX(1) − h_2 σ_SX(2)

EXAMPLE 7.5.

Find the residual MSE for Example 7.3.

SOLUTION:

Using Equation 7.24

P(N) = R_SS(0) − aR_SS(1) = 1 − a²
EXAMPLE 7.6.

S = X²

and

f_X(x) = 1/2,  −1 ≤ x ≤ 1
       = 0  elsewhere

SOLUTION:

In this case from the assumed density of X and the relation S = X², the moments are

E{X} = 0

E{S} = E{X²} = 1/3

E{XS} = E{X³} = ∫_{−1}^{1} (x³/2) dx = 0
h_1 = 0

P(1) = E{(S − Ŝ)²} = σ_SS

Thus the best fit is a constant, and the MSE is simply the variance. The best linear estimator is obviously a very poor estimator in this case.

Ŝ = h_0 + h_1 X + h_2 X²

This estimator would result in a perfect fit for Example 7.6. (See Problem 7.10.) However, is this a linear estimator? It appears to be nonlinear, but suppose we define X_0 = 1, X_1 = X, and X_2 = X²; then

Ŝ = h_0 X_0 + h_1 X_1 + h_2 X_2

can be used. This estimator is linear in the h_i's and thus is within the theory developed.
S* = E{S|X(1), . . . , X(n)}   (7.25)

Thus in order to minimize the multiple integral, because f_{X(1) . . . X(n)} > 0, it is sufficient to minimize the conditional expected value of (S − Ŝ)². We show that S* as given by Equation 7.25 does minimize the conditional expected value. Let Ŝ be another estimator (perhaps a linear estimator).

Note that given X(1) = x_1, . . . , X(n) = x_n, the second term in the last equation is a nonnegative constant and the third term is zero. Thus

E{(S − Ŝ)²|x_1, . . . , x_n} ≥ E{(S − S*)²|x_1, . . . , x_n}   (7.26)

Equation 7.26 shows that the conditional expectation has an expected mean squared error that is less than or equal to that of any estimator, including the best linear estimator.
EXAMPLE 7.7.

Find (a) the minimum MSE estimator and (b) the linear minimum MSE estimator of S based on observing X, and the MSEs, if

f_X(x) = e^{−x},  x > 0
       = 0,  x < 0

SOLUTION:

(a) By identification of the mean of a Gaussian random variable

S* = E{S|X} = X + X²/2

(b)

E{S} = E_X{X + X²/2} = 2

E{SX} = E_X{X[X + X²/2]} = 5

h = σ_SX/σ_XX = (5 − 2·1)/1 = 3

a = 2 − 3 = −1

Thus

Ŝ = −1 + 3X

and the reader can show that E{(S − Ŝ)²} = 2.
The reader might wonder why linear mean squared estimators are stressed, when we have just shown that the conditional expectation is a better estimator. The answer is that the conditional expectation is not known unless the joint densities are known.
E{X(1)|X(2)} = μ_1 + (ρσ_1/σ_2)[X(2) − μ_2]   (7.27)
EXAMPLE 7.8.

Find:

(a) The marginal density functions of X and of S.
(b) f_{S|X}.
(c) Simplify the expression found in (b) and identify the conditional mean of S given X.
(d) Identify the conditional variance, that is,

E_{S|X}{[S − E_{S|X}(S|x)]²|x}

(e) Find E_X{E_{S|X}{[S − E_{S|X}(S|x)]²|x}}

SOLUTION:

(a)

Similarly
and

(d) The conditional variance is 3/4 and in this case is a constant (i.e., independent of x).

(e) E_X{3/4} = 3/4
7.3 INNOVATIONS

In this section we assume that

E{S} = E{X(i)} = 0,  i = 1, . . . , n

and that all variables are jointly Gaussian. If the variables are not jointly Gaussian but the estimators are restricted to linear estimators, then the same results would be obtained as we develop here.
Now we can find the optimal estimator Ŝ_1 in terms of X(1) by the usual method. (Note that subscripts are used to differentiate the linear estimators when innovations are used.)

Ŝ_1 = h_1 X(1)   (7.28)

or

h_1 = E{SX(1)}/E{X²(1)}   (7.30)
Because X(2) and X(1) both have zero means and because they are jointly Gaussian we know that

a_1 = ρ(σ_2/σ_1) = C_X(1)X(2)/σ_1²   (7.33)

and ρ is the correlation coefficient between X(1) and X(2) and σ_i = √C_X(i)X(i). Thus
Note that

2. E{V_X(2)} = E{X(2) − [C_X(1)X(2)/σ_1²]X(1)} = 0   (7.35)

We now seek the estimator, Ŝ_2, of S based on X(1) and the innovation of X(2), that is

Ŝ_2 = h_1 X(1) + h_2 V_X(2)

where X(1) is the "old" observation and V_X(2) is the innovation or unpredictable [from X(1)] part of the new observation X(2).
We know that V_X(2) and X(1) are orthogonal (Equation 7.36). Thus Equation 7.15 becomes

[[C_X(1)X(1), 0], [0, σ²_V_X(2)]] [h_1; h_2] = [σ_SX(1); σ_SV_X(2)]

so that

h_2 = E{SV_X(2)}/E{V_X(2)²}

Note that these expected values can be found using Equation 7.34. Furthermore, E{SV_X(2)} is called the partial covariance of S and X(2), and the partial covariance properly normalized is called the partial correlation coefficient. The partial correlation coefficient was defined in Chapter 5.
We now proceed to include the effect of X(3) by including its innovation, V_X(3). By definition

V_X(3) = X(3) − E{X(3)|X(1), X(2)}

Now we note from Equation 7.37 that X(2) is a linear combination of X(1) and V_X(2). Indeed

X(2) = V_X(2) + [C_X(1)X(2)/σ_1²]X(1)

Thus, knowing X(1) and X(2) is equivalent to knowing X(1) and V_X(2). So

E{X(3)|X(1), V_X(2)} = b_1 X(1) + b_2 V_X(2)   (7.43)
where

b_1 = E{X(3)X(1)}/E{X(1)²}   (7.44)

b_2 = E{X(3)V_X(2)}/E{V_X(2)²}   (7.45)

and

Note that

h_3 = E{SV_X(3)}/E{V_X(3)²}   (7.49)

Ŝ_n = Σ_{j=1}^{n} h_j V_X(j)   (7.50)

where

V_X(1) = X(1)   (7.51)
P = σ_S² − Σ_{j=1}^{n} h_j E{SV_X(j)}   (7.54)

  = σ_S² − Σ_{j=1}^{n} h_j² σ_V_X(j)V_X(j)   (7.55)

V_X(1) = X(1)   (7.56)
and we further restrict our attention to the zero-mean Gaussian case such that E{X(j)|X(1), . . . , X(j − 1)} is a linear function, that is

V_X(j) = X(j) − E{X(j)|X(1), . . . , X(j − 1)}   (7.57)

In this case, V_X(i) is a linear function of X(i), X(i − 1), . . . , X(1). Equation 7.57 can be rewritten as

X(i) = E{X(i)|X(1), . . . , X(i − 1)} + V_X(i)   (7.59)

Note that V_X(i) is the unpredictable part of X(i). Equation 7.59 thus serves as a decomposition of X(i) into a predictable part, which is the conditional expectation, and an unpredictable part, V_X(i).

Thus, by Equations 7.57 and 7.58, the innovation V_X(i) is a linear function of X(1), . . . , X(i). We argue later that X(n) is also a linear function of V_X(1), . . . , V_X(n). Indeed:

V_X(1) = X(1)

X(1) = V_X(1)   (7.62)

and the l(i, j) are constants which can be derived from the set of γ(i, j).

We now summarize the properties of V_X(n) if X(n) is a zero-mean Gaussian random sequence.
ΓX = V_X   (7.67)

where

Xᵀ = [X(1), . . . , X(n)]

Γ is an n × n matrix

V_Xᵀ = [V_X(1), . . . , V_X(n)]

Note that in order to be a realizable (i.e., causal; see Section 4.1.1) filter, Γ must be a lower triangular matrix because V_X(1) must rely only on X(1), V_X(2) must rely only on X(1) and X(2), and so on. That is

[[1, 0, . . . , 0],
 [γ(2, 1), 1, . . . , 0],
 . . .
 [γ(n, 1), γ(n, 2), γ(n, 3), . . . , 1]] [X(1), . . . , X(n)]ᵀ = [V_X(1), . . . , V_X(n)]ᵀ   (7.68)

Note that Equations 7.67 and 7.68 are identical equations that generalize Equations 7.59-7.61.
We now recall that

E{X(i)} = 0,  i = 1, . . . , n

and we will call R_XX(i, j) the covariance between X(i) and X(j), that is

E{X(i)X(j)} = R_XX(i, j)

γ(1, 1) = 1   (7.69)

E{V_X(1)V_X(2)} = 0 = E{X(1)[γ(2, 1)X(1) + X(2)]}
                = γ(2, 1)R_XX(1, 1) + R_XX(1, 2) = 0

γ(2, 1) = −R_XX(1, 2)/R_XX(1, 1)   (7.70)
E{V_X(1)V_X(3)} = 0

E{V_X(2)V_X(3)} = 0
EXAMPLE 7.9.

Find the transformation Γ when E{X(n)} = 0 and the covariance matrix of X(1), X(2), X(3) is

R_XX = [[1, 1/2, 1/4], [1/2, 1, 1/2], [1/4, 1/2, 1]]
SOLUTION:

We know

Γ = [[1, 0, 0], [γ(2, 1), 1, 0], [γ(3, 1), γ(3, 2), 1]]

γ(2, 1) = −R_XX(1, 2)/R_XX(1, 1) = −1/2

E{V_X(1)V_X(3)} = 0

E{X(1)[γ(3, 1)X(1) + γ(3, 2)X(2) + X(3)]} = 0

γ(3, 1) + (1/2)γ(3, 2) + 1/4 = 0   (7.71)

and

E{V_X(2)V_X(3)} = 0

or

γ(3, 2) = −1/2

γ(3, 1) = 0

Thus

Γ = [[1, 0, 0], [−1/2, 1, 0], [0, −1/2, 1]]

Γ⁻¹ = L   (7.72)

Then

X = L V_X
EXAMPLE 7.10.

Find L = Γ⁻¹ for Example 7.9.

SOLUTION:

L = [[1, 0, 0], [1/2, 1, 0], [1/4, 1/2, 1]]

Thus

X(1) = V_X(1)

X(2) = (1/2)V_X(1) + V_X(2)

X(3) = (1/4)V_X(1) + (1/2)V_X(2) + V_X(3)
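A small numerical sketch (ours, in Python) of the whitening transformation of Examples 7.9 and 7.10: Γ can be obtained row by row from the orthogonality conditions or, equivalently, from the Cholesky factor of R_XX, and Γ R_XX Γᵀ comes out diagonal, confirming that the innovations are orthogonal.

```python
import numpy as np

# Covariance of X(1), X(2), X(3) from Example 7.9
R = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])

# Cholesky: R = C C^T with C lower triangular.
C = np.linalg.cholesky(R)
# L has unit diagonal (X = L V_X), so normalize each column of C.
L = C / np.diag(C)
Gamma = np.linalg.inv(L)                 # V_X = Gamma X

print(np.round(Gamma, 6))                # [[1,0,0],[-0.5,1,0],[0,-0.5,1]]
print(np.round(L, 6))                    # [[1,0,0],[0.5,1,0],[0.25,0.5,1]]
print(np.round(Gamma @ R @ Gamma.T, 6))  # diagonal matrix of innovation variances
```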
7.4 REVIEW

In the preceding sections we have shown how to find the "best" linear estimator of a random variable S or a random sequence S(n) at time index n in terms of another correlated sequence X(n). The "best" estimator is one that minimizes the MSE between S(n) and the estimator Ŝ(n). However, the "best" estimator is actually "best" only under one of two limitations; either it is "best" among all linear estimators, or it is "best" among all estimators when the joint densities are normal or Gaussian.

The best linear estimators are based on projecting the quantity to be estimated, S, onto the space defined by the observations X(1), X(2), . . . , X(n) in such a fashion that the error S − Ŝ is orthogonal to the observations. The equations for the weights are given in Equation 7.16 and the constant, if needed, is given in Equation 7.13. It is also important to find how good the best estimator is, and Equation 7.19 describes the residual MSE.

Finally, the concept of innovations was introduced in order to solve this relatively simple problem in the same fashion that will be used to derive both the Kalman and the Wiener filters. The innovation V_X(n) of a Gaussian sequence X(1), X(2), . . . , X(n) is the unpredictable part of X(n). That is, the best estimator of X(n) given X(1), X(2), . . . , X(n − 1) is E{X(n)|X(1), . . . , X(n − 1)} and the innovation of X(n) is the difference between X(n) and E{X(n)|X(1), . . . , X(n − 1)}. The innovations of a sequence are mutually orthogonal. Thus, from a vector space picture, V_X(n) is the part of X(n) that is orthogonal to the space defined by X(1), . . . , X(n − 1).

The weights of the best linear filter can be described in terms of innovations and Equation 7.53 gives these weights. Equation 7.55 gives the error in terms of innovations.
7.5 DIGITAL WIENER FILTERS

X(n) = S(n) + ν(n)
where S(n) is the zero-mean signal random sequence, ν(n) is the zero-mean white noise sequence, and S(n) and ν(n) are uncorrelated. Thus, since they have zero means, they are also orthogonal.

We now assume that the data are stored and X(n) is available for all n, that is, the filter is no longer finite. We seek the best, that is, minimum linear mean squared error estimator Ŝ, where

Ŝ(n) = Σ_{k=−∞}^{∞} h(k)X(n − k)   (7.74)

Now for practical cases, because memories are finite, the sum in Equation 7.74 will be finite, and the method of Section 7.2 can be used as illustrated in Example 7.10 except that in some cases the required matrix inversion of Equation 7.16 would be either impossible or too slow. In this section, we explore other possibilities.
The MSE given by

E{[S(n) − Ŝ(n)]²}

is minimized by the orthogonality condition

E{[S(n) − Σ_{k=−∞}^{∞} h(k)X(n − k)]X(n − i)} = 0  for all i   (7.75)

or

R_SX(i) = Σ_{k=−∞}^{∞} h(k)R_XX(i − k)  for all i

or, taking transforms,

H(f) = S_SX(f)/S_XX(f)   (7.78)

P = E{[S(n) − Ŝ(n)]²} = E{[S(n) − Σ_{k=−∞}^{∞} h(k)X(n − k)]S(n)}

Defining
P(m) = R_SS(m) − Σ_{k=−∞}^{∞} h(k)R_SX(m − k)   (7.80)

Then P(0) = P. Taking the Fourier transforms of both sides of Equation 7.80 results in

P(m) = ∫_{−1/2}^{+1/2} P_F(f) exp(+j2πfm) df

and

P = P(0) = ∫_{−1/2}^{+1/2} [S_SS(f) − H(f)S_XS(f)] df   (7.82)
EXAMPLE 7.11.

SOLUTION:

From Equation 7.78 we obtain

H(f) = S_SS(f)/S_XX(f)

and

P = ∫_{−1/2}^{+1/2} [S_SS(f) − S_SS(f)] df = 0
EXAMPLE 7.12.

R_SS(n) = a^|n| σ_S², and X(n) = S(n) + ν(n), where ν(n) is unit-variance white noise uncorrelated with S(n).

SOLUTION:

(a)

R_SX(m) = E{S(n)X(n + m)}
        = E{S(n)[S(n + m) + ν(n + m)]}
        = R_SS(m) = a^|m| σ_S²

S_SX(f) = S_SS(f) = σ_S² Σ_{n=−∞}^{∞} a^|n| exp(−j2πnf)

R_XX(n) = R_SS(n) + R_νν(n) = a^|n| σ_S² + δ(n)

S_SS(f) = σ_S²(1 − a²)/(1 − 2a cos 2πf + a²),  |f| < 1/2

S_XX(f) = [1 + a² + σ_S²(1 − a²) − 2a cos 2πf]/(1 − 2a cos 2πf + a²),  |f| < 1/2

(b) If σ_S² = 1, a = 1/2

H(f) = (3/4)/(2 − cos 2πf),  |f| < 1/2
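A numerical check of part (b) (our own sketch): with a = 1/2 and σ_S² = 1, H(f) = S_SS(f)/[S_SS(f) + 1] should reduce to the closed form found above.

```python
import numpy as np

a, var_s = 0.5, 1.0
f = np.linspace(-0.5, 0.5, 7)

S_ss = var_s * (1 - a**2) / (1 - 2*a*np.cos(2*np.pi*f) + a**2)
H = S_ss / (S_ss + 1.0)                      # S_SX / S_XX with unit white noise
H_closed = 0.75 / (2.0 - np.cos(2*np.pi*f))  # (3/4) / (2 - cos 2*pi*f)

print(np.allclose(H, H_closed))              # True
```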
The filter defined by Equation 7.83.a is a realizable filter. That is, only present and past values of X are used in estimating S(n). In this case, the filter can be conceptually realized as shown in Figure 7.3. If, in fact, the correlation between X and S is zero after a delay of more than M, that is

Ŝ(n) = Σ_{k=0}^{n} h_k X(n − k)
Figure 7.3 Realization of a digital filter.
and the solution is of the same form given in Equation 7.16, that is

h = R_XX⁻¹ R_XS   (7.83.b)

where

hᵀ = [h_0, h_1, . . . , h_n]

R_XX = [[R_XX(0), R_XX(1), . . . , R_XX(n)],
        [R_XX(1), R_XX(0), . . . , R_XX(n − 1)],
        . . .
        [R_XX(n), R_XX(n − 1), . . . , R_XX(0)]]

In this case, Equation 7.16 is said to define a finite discrete Wiener filter. The MSE is given in Equation 7.24.
EXAMPLE 7.13.

The signal S(n) = S is constant in time, taking one of the two values ±a with P(S_1) = P(S_2) = 1/2 (so μ_S = 0 and σ_S² = a²), and is observed in additive stationary white noise of variance σ_ν². Find the finite discrete Wiener filter based on X(1), . . . , X(n).

SOLUTION:

R_XS(k) = R_SS(k) = σ_S²

R_XX(k) = σ_S² + δ(k)σ_ν²

where

δ(k) = 1,  k = 0
     = 0,  k ≠ 0

The ith equation of Equation 7.83.b is

σ_S² Σ_{j=1}^{n} h_j + h_i σ_ν² = σ_S²

Thus all the h_i are equal, and

h_i(nσ_S² + σ_ν²) = σ_S²

or

h_i = σ_S²/(nσ_S² + σ_ν²) = 1/(n + γ),  γ = σ_ν²/σ_S²

Note that all h_i are equal, as should be expected because S is a constant and the noise is stationary. Also note that the weights h_i change (decrease) as n increases.
Figure 7.4 Different cases for Example 7.13. (a) The two possible values of S(n). (b) X(n) given S(n) = S_1(n) and σ_ν² ≪ a². (c) X(n) given S(n) = S_2(n) and σ_ν² ≪ a².
Ŝ(n) = [1/(n + γ)] Σ_{i=1}^{n} X(i) = [n/(n + γ)](1/n) Σ_{i=1}^{n} X(i) + [γ/(n + γ)]·0

The last equation shows Ŝ(n) to be the weighted average of the mean of the observations and the a priori estimate of 0. If the noise to signal ratio γ is small compared with n, then Ŝ is the simple average of the observations. Figure 7.4 shows different cases for this example.

The MSE P(n) is given by Equation 7.24 as

P(n) = σ_S² − Σ_{i=1}^{n} σ_S²/(n + γ)
     = σ_S²[1 − n/(n + γ)] = σ_S² γ/(n + γ)
EXAMPLE 7.14.

R_SS(0) = 3/4
R_SS(1) = 1/2
R_SS(2) = 1/4
R_SS(n) = 0,  n > 2

X(n) = S(n) + ν(n),  S and ν are uncorrelated

R_νν(n) = 1/4,  n = 0
        = 0,  elsewhere

SOLUTION:

R_XS(n) = R_SS(n)

R_XX(0) = 1
R_XX(1) = 1/2
R_XX(2) = 1/4
R_XX(n) = 0,  n > 2
There are only two nonzero weights, and these weights (h's) are the solution to (see Equation 7.83.b)

[[1, 1/2, 1/4],
 [1/2, 1, 1/2],
 [1/4, 1/2, 1]] [h_0, h_1, h_2]ᵀ = [3/4, 1/2, 1/4]ᵀ

with solution

h_0 = 2/3,  h_1 = 1/6,  h_2 = 0
This is represented by the filter shown in Figure 7.5, and is usually called a finite discrete Wiener filter.
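A numerical check of Example 7.14 (a sketch; the fractions are as reconstructed above), solving the Toeplitz system R_XX h = R_XS:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

r_xx = np.array([1.0, 0.5, 0.25])   # first column of the Toeplitz matrix R_XX
r_xs = np.array([0.75, 0.5, 0.25])  # R_XS(0), R_XS(1), R_XS(2)

h = solve_toeplitz(r_xx, r_xs)
print(np.round(h, 4))   # [0.6667, 0.1667, 0.0] -> h0 = 2/3, h1 = 1/6, h2 = 0
```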
results in

Ŝ(n) = Σ_{k=−∞}^{∞} h(k)X(n − k)

Ŝ(n) = Σ_{k=−∞}^{−1} h(k)X(n − k) + Σ_{k=0}^{∞} h(k)X(n − k)   (7.86)

where the first sum requires future values of the observations, X(i), which are not available in real time. The second sum is the realizable part.

Temporarily assume that X(n) is zero-mean white noise; then the conditional expected value of X(n − k) for negative k (i.e., values of X that occur after time n) would be zero, that is

E{X(n + 1)|X(n), X(n − 1), . . .} = 0
Figure 7.6a X(n) → H(f) = S_XS(f)/S_XX(f) → Ŝ(n) (the optimum unrealizable filter).

Suppose X(n) is first passed through a filter H_1(f), so that

S_VxVx(f) = S_XX(f)|H_1(f)|²

If H_1(f) is chosen such that

1. it is realizable, and
2. |H_1(f)|² = [S_XX(f)]⁻¹

then

Figure 7.6b X(n) → H_1(f) → V_X(n) → H_2(f) → Ŝ(n).
Here we assume that such a filter can be found and we call it S⁺_XX(f). Finding it in this discrete case is not discussed in this book. However, a similar continuous filter is described in Section 7.7.

Also note in Figures 7.6a and 7.6b that

H_1(f) · H_2(f) = H(f)

Thus, the filters of Figure 7.6a and Figure 7.6b are the same and each is that given by Equation 7.78.

In the final step, shown in Figure 7.6c, the optimum realizable (causal or real-time) filter is H_3, which is the real-time part of H_2; because the input to H_2 is white noise, that is, the innovations of X, Equation 7.86 was used to justify taking only the real-time part in the case of white-noise input. This H_3 is multiplied by H_1, assuming H_1 meets the conditions specified previously.

As stated, we do not emphasize the solution described for digital filters. A solution is outlined in Problems 7.17-7.20. We emphasize recursive digital filters, which are described in the next section.
7.6 KALMAN FILTERS

Kalman filters have been effective in a number of practical problems. They allow nonstationarity, and their recursive form is easy to implement. We describe the general model, discuss the advantages of recursive filters, and then find the optimum filter in the scalar case. Finally, we find the optimum Kalman filter in the more general vector case.
We adopt the following notation for Kalman filtering: S(n), the signal, is a zero-mean Gaussian sequence described by an autoregressive or Markov model of the form

S(n) = a(n)S(n − 1) + W(n)   (7.87)

In the latter case, the higher order difference equation will be converted to a state model of the form

S(n) = A(n)S(n − 1) + W(n)   (7.88)

We observe

X(n) = S(n) + ν(n)   (7.89)

where ν(n) is white noise that is independent of both W(n) and S(n), and ν(n) is called observation noise. Equation 7.89 will be called the observation model.
We know that the optimal (minimum MSE) estimator of S(n + 1) using observations X(1), . . . , X(n) is

Ŝ(n + 1) = E{S(n + 1)|X(1), . . . , X(n)}

If we assume that S and W are Gaussian, then this estimator will be linear. If we delete the Gaussian assumption and seek the minimum MSE linear predictor, the result will be unchanged. We want to find this optimum filter in a recursive form so that it can be easily implemented on a computer using a minimum amount of storage.
Ŝ(n) = [X(1) + X(2) + ⋯ + X(n − 1)]/(n − 1)

Ŝ(n + 1) = [X(1) + ⋯ + X(n)]/n

The problem with this form of estimator is that the memory required to store the observations grows with n, leading to a very large (eventually too large a) memory requirement. However, if we simply express Ŝ(n + 1) in recursive form, then

Ŝ(n + 1) = [(n − 1)/n]Ŝ(n) + [1/n]X(n)

This implementation of the filter requires that only the previous estimate and the new observation be stored. Further, note that the estimator is a convex linear combination of the new observation and the old estimate. We will find this is also the form of the Kalman filter.
5. We observe

X(n) = S(n) + ν(n)   (7.89)

where ν(n) is also zero-mean white Gaussian noise with variance σ_ν²(n), which is independent of both the sequence W(n) and S(1).
Our goal is to find the minimum MSE linear predictor of S(n + 1) given observations X(1), . . . , X(n). Because of the Gaussian assumption, we know that

Ŝ(n + 1) = E{S(n + 1)|X(1), . . . , X(n)}   (7.90)

will be linear and will be the optimum estimator. We want to find this conditional expected value in a recursive form.

We know that the innovations process V_X and the observations X are linearly equivalent. Thus,

Ŝ(n + 1) = Σ_{j=1}^{n} c(n, j)V_X(j)   (7.92)

E{V_X(j)V_X(k)} = 0,  j ≠ k
               = σ_V²(j),  j = k   (7.93)
We now seek the values of c(n, j) in this scalar case. A similar treatment of the vector case is presented in the next section. In both cases we will:

E{S(n + 1)V_X(j)} = c(n, j)σ_V²(j)   (7.94)

because

E{S(n + 1)V_X(j)} = E{Ŝ(n + 1)V_X(j)}

This follows because the error, S(n + 1) − Ŝ(n + 1), is orthogonal to the observations X(j), j ≤ n, due to the orthogonality condition. Because the innovation V_X(j) is a linear combination of X(i), i ≤ j, the error is orthogonal to V_X(j), j ≤ n, or

E{[S(n + 1) − Ŝ(n + 1)]V_X(j)} = 0,  j ≤ n

This produces the stated result and when this result is used in Equation 7.94 we have

c(n, j) = E{S(n + 1)V_X(j)}/σ_V²(j)   (7.95)

Ŝ(n + 1) = Σ_{j=1}^{n} [E{S(n + 1)V_X(j)}/σ_V²(j)]V_X(j)   (7.96)
Now we use the state model, Equation 7.87, in order to produce a recursive form

E{S(n + 1)V_X(j)} = a(n + 1)E{S(n)V_X(j)} + E{W(n + 1)V_X(j)}

But the last term is zero. To see this, note that W(n + 1) is independent of S(k) for k ≤ n and E[W(n + 1)] = 0. That is

E{W(n + 1)S(k)} = 0,  k ≤ n

E{W(n + 1)[X(k) − ν(k)]} = 0,  k ≤ n

E{W(n + 1)X(k)} = 0,  k ≤ n

E{W(n + 1)V_X(j)} = 0,  j ≤ n
E{S(n + 1)V_X(j)} = a(n + 1)E{S(n)V_X(j)},  j ≤ n   (7.98)

c(n, j) = a(n + 1)E{S(n)V_X(j)}/σ_V²(j)   (7.99)

Ŝ(n + 1) = a(n + 1) Σ_{j=1}^{n} [E{S(n)V_X(j)}/σ_V²(j)]V_X(j)

Since

Ŝ(n) = Σ_{j=1}^{n−1} [E{S(n)V_X(j)}/σ_V²(j)]V_X(j)

we have

Ŝ(n + 1) = a(n + 1)[Ŝ(n) + (E{S(n)V_X(n)}/σ_V²(n))V_X(n)]   (7.100)

         = Ŝ_1(n + 1) + b(n)V_X(n)   (7.101)
where

Ŝ_1(n + 1) = a(n + 1)Ŝ(n)   (7.101.a)

b(n) = a(n + 1)E{S(n)V_X(n)}/σ_V²(n)  is yet to be determined   (7.101.b)

Finally we define

k(n) = b(n)/a(n + 1)   (7.103)
This form shows that the initial estimate Ŝ(n) at stage n is improved by the innovation of the observation X(n) at stage n. The term within the brackets is the best (after observation) estimator at stage n. Then a(n + 1) is used to project the best estimator at stage n to the estimator at stage n + 1. This estimator would be used if there were no observation at stage n + 1 and can be called the prediction form of the Kalman filter.

This equation can also be written as

Ŝ(n + 1) = a(n + 1){[1 − k(n)]Ŝ(n) + k(n)X(n)}

showing that the best estimator at stage n, that is, the term in brackets, is a convex linear combination of the estimator Ŝ(n) and the observation X(n).

At this stage we have shown that the optimum filter is a linear combination of the previous best estimator Ŝ(n) and the innovation V_X(n) = X(n) − Ŝ(n) of X(n). However, we do not yet know k(n). We will find k(n) by minimizing the MSE. First we find a recursive expression for the error.
Now from the definition of Equation 7.101.c and using the observation model, Equation 7.89,

V_X(n) = X(n) − Ŝ(n) = S(n) + ν(n) − Ŝ(n)

Thus, with S̃(n) = S(n) − Ŝ(n) and P(n) = E{S̃²(n)},

E{S(n)V_X(n)} = E{S(n)S̃(n)} + E{S(n)ν(n)}
             = E{S(n)S̃(n)}
             = E{[Ŝ(n) + S̃(n)]S̃(n)}

and, because the error S̃(n) is orthogonal to Ŝ(n),

E{S(n)V_X(n)} = P(n)   (7.110)
Now

P(n + 1) = E{[S(n + 1) − Ŝ(n + 1)]²}

We use Equation 7.104 for Ŝ(n + 1) and the state model, Equation 7.87, for S(n + 1), resulting in Equation 7.112.

Now, the fourth term of Equation 7.112 is zero because of the definition of innovations and the fact that Ŝ(n) is a linear combination of V_X(1), . . . , V_X(n − 1). Similarly, the last two terms are zero due to the fact that W(n + 1) is orthogonal to S(n), Ŝ(n), and V_X(n). Using Equation 7.110 in Equation 7.112 results in
Now

E{V_X²(n)} = E{[S(n) + ν(n) − Ŝ(n)]²} = P(n) + σ_ν²(n)

Thus

P(n + 1) = a²(n + 1)[k²(n)(P(n) + σ_ν²(n)) − 2k(n)P(n) + C]   (7.114)

where

C = P(n) + σ_w²(n + 1)/a²(n + 1)   (7.116)
P(n + 1) = a²(n + 1)[k(n)D − B]² + C_1   (7.117)

where C_1 is some term not involving k(n). Comparing Equations 7.114 and 7.117 we find that

k(n)BD = P(n)k(n)

or

B = P(n)/D   (7.118)

k(n) = B/D   (7.119)

k(n) = P(n)/[P(n) + σ_ν²(n)]   (7.120)
This third and last of the defining equations of the Kalman filter can be viewed, using Equation 7.106, as stating that the optimal estimator weights X(n) directly proportional to the variance of S̃(n), while Ŝ(n) has a weight

1 − k(n) = σ_ν²(n)/[P(n) + σ_ν²(n)]

P(n + 1) = a²(n + 1)[σ_ν²(n)P(n)/(P(n) + σ_ν²(n))] + σ_w²(n + 1)   (7.121)

P(n + 1) = a²(n + 1)[P(n) − P²(n)/(P(n) + σ_ν²(n))] + σ_w²(n + 1)   (7.122)
Initialize

n = 1
P(1) = σ²  (assumed value, usually larger than σ_ν² and σ_w²)
Ŝ(1) = S̄  (assumed value, usually zero)

1. Start Loop
   Get data: σ_ν²(n), σ_w²(n), a(n + 1); X(n)
   k(n) = P(n)/[P(n) + σ_ν²(n)]
   Ŝ(n + 1) = a(n + 1){Ŝ(n) + k(n)[X(n) − Ŝ(n)]}
   P(n + 1) = a²(n + 1)[1 − k(n)]P(n) + σ_w²(n + 1)
   n = n + 1
   Go to 1.

Figure 7.7(a) Kalman filtering algorithm (scalar).
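The loop of Figure 7.7(a) translates directly into code. Below is a minimal Python sketch (our own); run with the data of Examples 7.15 and 7.16 (a = .6, σ_w² = 1/4, σ_ν² = 1/2, P(1) = 1, Ŝ(1) = 0) it reproduces the numbers computed there.

```python
def scalar_kalman(x_obs, a, var_w, var_v, P=1.0, s_hat=0.0):
    """One-step-ahead scalar Kalman predictor of Figure 7.7(a).

    x_obs : observations X(1), X(2), ...
    a     : state coefficient, S(n) = a S(n-1) + W(n)
    var_w : variance of the state noise W(n)
    var_v : variance of the observation noise v(n)
    P     : initial MSE P(1); s_hat : initial estimate S_hat(1)
    """
    for x in x_obs:
        k = P / (P + var_v)                    # gain, Equation 7.120
        s_hat = a * (s_hat + k * (x - s_hat))  # estimate update (Figure 7.7(a))
        P = a**2 * (1 - k) * P + var_w         # MSE update (Figure 7.7(a))
        yield s_hat, P, k

# Example 7.15: a = .6, var_w = 1/4, var_v = 1/2, P(1) = 1, S_hat(1) = 0
for n, (s, P, k) in enumerate(scalar_kalman([1.0, 1.0], 0.6, 0.25, 0.5), 1):
    print(n, round(k, 3), round(P, 3))   # k(1)=0.667, P(2)=0.37; k(2)=0.425, ...
```

Iterating the loop many times with constant parameters drives k(n) and P(n) to the steady-state values .390 and .32 found in Example 7.16.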
The term in the large brackets of Equation 7.122 shows that the revision of Ŝ(n) using X(n) produces a linear combination of their variances, which is less than either variance. This reduced variance is then projected to the next stage as would be expected from the state model, Equation 7.87, where the variance is multiplied by the square of a(n + 1) and the variance of W(n + 1) is added, as would be expected. Finally, the term in brackets of Equation 7.121 can be expressed in terms of k(n) to find

P(n + 1) = a²(n + 1)[1 − k(n)]P(n) + σ_w²(n + 1)   (7.123)
EXAMPLE 7.15.

S(n) = .6S(n − 1) + W(n)

X(n) = S(n) + ν(n)

σ_w²(n) = 1/4,  σ_ν²(n) = 1/2

The first observation is X(1) and we start with the assumed values, Ŝ(1) = 0, P(1) = 1, and find the scalar Kalman filter.

k(1) = P(1)/[P(1) + σ_ν²] = 1/(1 + 1/2) = 2/3

P(2) = (.6)²[1 − 2/3](1) + 1/4 = .37

Ŝ(2) = .6{Ŝ(1) + k(1)[X(1) − Ŝ(1)]} = .4X(1)

k(2) = P(2)/[P(2) + σ_ν²] = .37/.87 = .425

Ŝ(3) = .6{Ŝ(2) + k(2)[X(2) − Ŝ(2)]} ≈ .138X(1) + .255X(2)
If a(n) does not vary with n, and W(n) and ν(n) are both stationary, that is, σ_w²(n) and σ_ν²(n) are both constants, then both k(n) and P(n) will approach limits as n approaches infinity. These limits can be found by assuming that P(n + 1) = P(n) = P in Equation 7.121 and using Equation 7.120.

P = a²(1 − k)P + σ_w²

Thus

P = a²σ_ν²P/(P + σ_ν²) + σ_w²

Solving the resulting quadratic equation for P and taking the positive value results in

P = (1/2){(a² − 1)σ_ν² + σ_w² + √([(a² − 1)σ_ν² + σ_w²]² + 4σ_w²σ_ν²)}

Using this value in Equation 7.120 produces the steady-state Kalman gain, k.
EXAMPLE 7.16.

Using the data from Example 7.15, find the limits P and k.

SOLUTION:

P = .32

k = .32/(.32 + .5) = .390
S(n + 1) = A(n + 1)S(n) + W(n + 1)   (7.124)

X(n) = H(n)S(n) + v(n)   (7.125)

where

E[W(k)Wᵀ(i)] = Q(k),  k = i
             = 0,  k ≠ i   (7.126)

E[v(k)vᵀ(i)] = R(k),  k = i
             = 0,  k ≠ i   (7.127)
We now proceed as in the scalar case and the interested reader can compare the steps in this vector case with those of the earlier scalar case.

The optimal vector estimator will be

Ŝ(n + 1) = E{S(n + 1)|X(1), . . . , X(n)}

we have

Ŝ(n + 1) = Σ_{k=1}^{n} C(n, k)V_X(k)   (7.130)
First we find

Ŝ(n + 1) = Ŝ_1(n + 1) + B(n)V_X(n)

where the subscript 1 indicates the optimal estimator that does not include V_X(n). Using the state Equation 7.124 in Equation 7.132,

E{W(n + 1)|V_X(1), . . . , V_X(n)} = E{W(n + 1)} = 0

We now find

E{X(n)|V_X(1), . . . , V_X(n − 1)}
= E{H(n)S(n) + v(n)|V_X(1), . . . , V_X(n − 1)}
= H(n)E{S(n)|V_X(1), . . . , V_X(n − 1)}
= H(n)Ŝ(n)

V_X(n) = X(n) − H(n)Ŝ(n)   (7.136)

B(n) = A(n + 1)K(n)

or

K(n) = A⁻¹(n + 1)B(n)   (7.138)
This is the first of the Kalman vector equations, which corresponds with the scalar Equation 7.104. It can be rewritten as

Ŝ(n + 1) = A(n + 1){Ŝ(n) + K(n)[X(n) − H(n)Ŝ(n)]}   (7.140)

or as

S̃(n) = S(n) − Ŝ(n)   (7.142)

P(n) = E[S̃(n)S̃ᵀ(n)]   (7.143)
Now, using the definition of V_X(n) from Equation 7.136 and Equation 7.142.

This quadratic form is now expanded using the definitions in Equations 7.143 and 7.126.

Now the fourth and sixth terms are zero because Ŝ(n) is a linear combination of V_X(1), . . . , V_X(n − 1), which are orthogonal to V_X(n). The eighth and ninth terms are zero because W(n + 1) is orthogonal to S(n) and Ŝ(n). The tenth and eleventh terms are zero because W(n + 1) is orthogonal to V_X(n). We now evaluate the fifth and seventh terms. Using Equation 7.136 and the observation Equation 7.125,

V_X(n) = H(n)S(n) + v(n) − H(n)Ŝ(n)
       = H(n)[S(n) − Ŝ(n)] + v(n)

V_X(n) = H(n)S̃(n) + v(n)   (7.146)

Thus

E{V_X(n)V_Xᵀ(n)} = H(n)P(n)Hᵀ(n) + R(n)   (7.147)

Similarly

E{V_X(n)S̃ᵀ(n)} = H(n)P(n)   (7.148)
P(n + 1) = A(n + 1)P(n)Aᵀ(n + 1)
         + A(n + 1)K(n)E{V_X(n)V_Xᵀ(n)}Kᵀ(n)Aᵀ(n + 1)
         + Q(n + 1) − A(n + 1)P(n)Hᵀ(n)Kᵀ(n)Aᵀ(n + 1)
         − A(n + 1)K(n)H(n)P(n)Aᵀ(n + 1)   (7.149)

Equation 7.150 is the second of the updating equations; it updates the covariance matrix. We now wish to minimize by the choice of K(n) the sum of the expected squared errors, which is the trace of P(n + 1). We proceed as follows. For simplification, Equation 7.150 will be rewritten dropping the argument n and rearranging

P(n + 1) = APAᵀ + Q + AK(HPHᵀ + R)KᵀAᵀ − APHᵀKᵀAᵀ − AKHPAᵀ   (7.151)
Now we use the fact that HPHᵀ + R is a covariance matrix and thus a matrix D can be found such that

HPHᵀ + R = DDᵀ   (7.152)

Then

P(n + 1) = APAᵀ + Q − APHᵀKᵀAᵀ − AKHPAᵀ + AKDDᵀKᵀAᵀ   (7.153)

BDᵀKᵀ + KDBᵀ = PHᵀKᵀ + KHP

or

B = PHᵀ(Dᵀ)⁻¹   (7.155)

KD − B = 0

or

K = BD⁻¹

K = PHᵀ(Dᵀ)⁻¹D⁻¹
Initialize

n = 1
P(1) = σ²I  (assumed value)
Ŝ(1) = S̄  (assumed value)

1. Start Loop
   Get data: R(n), H(n), Q(n); A(n + 1); X(n)
   K(n) = P(n)Hᵀ(n)[H(n)P(n)Hᵀ(n) + R(n)]⁻¹
   Ŝ(n + 1) = A(n + 1){Ŝ(n) + K(n)[X(n) − H(n)Ŝ(n)]}
   P(n + 1) = A(n + 1){[I − K(n)H(n)]P(n)}Aᵀ(n + 1) + Q(n + 1)
   n = n + 1
   Go to 1.

Figure 7.8 Kalman filtering algorithm (vector).
or

K = PHᵀ(DDᵀ)⁻¹

K = PHᵀ(HPHᵀ + R)⁻¹   (7.156)

Equation 7.156 defines the Kalman gain and corresponds with the scalar Equation 7.120. It, along with Equations 7.150 and 7.140, defines the Kalman filter. If Equation 7.156 is used in Equation 7.150, after some matrix algebra we arrive at

P(n + 1) = A(n + 1){[I − K(n)H(n)]P(n)}Aᵀ(n + 1) + Q(n + 1)   (7.157)

As in the scalar case, the term inside the brackets will be interpreted as the revised MSE, and the remainder of the equation will be viewed as updating. The vector Kalman filter is summarized in Figure 7.8.

Figure 7.8 is completely analogous with Figure 7.7 with matrix operations replacing the scalar operations. In addition, in the vector case, the observation X is a linear transformation H of the signal, while h was assumed to be 1 in the scalar case. (See Problem 7.23.)
EXAMPLE 7.17.

Assume that S_1(t) represents the position of a particle, and S_2(t) represents its velocity. Then we can write

S_1(t) = ∫₀ᵗ S_2(τ) dτ + S_1(0)

or, with a sampling time of one second, this can be converted to the approximate difference equation

S(n) = [[1, 1], [0, 1]]S(n − 1) + W(n)

where

Q(n) = [[0, 0], [0, 1]]

X(n) = H(n)S(n) + v(n)

with H(n) = [1  0] and v(n) white noise of variance 1. Given

P(1) = [[1, 0], [0, 1]]

and Ŝ(1) = [0; 0], find K(n) and P(n) for n = 1, 2, 3, 4. Also find Ŝ(2), Ŝ(3), and Ŝ(4).
SOLUTION:

K(1) = P(1)Hᵀ[HP(1)Hᵀ + R]⁻¹ = [1; 0]{1 + 1}⁻¹ = [1/2; 0]

Ŝ(2) = [[1, 1], [0, 1]]{Ŝ(1) + K(1)[X(1) − 0]} = [X(1)/2; 0]

P(2) = [[1, 1], [0, 1]]{[I − K(1)H]P(1)}[[1, 0], [1, 1]] + Q(2)
     = [[1.5, 1], [1, 2]]

K(2) = [[1.5, 1], [1, 2]][1; 0]{1.5 + 1}⁻¹ = [.6; .4]

P(3) = [[1, 1], [0, 1]]{[I − K(2)H]P(2)}[[1, 0], [1, 1]] + Q(3)
     = [[3, 2], [2, 2.6]]

K(3) = [3; 2]{3 + 1}⁻¹ = [.75; .5]

and hence

Ŝ(3) = [X(2); .4X(2) − .2X(1)]
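A compact Python version of Figure 7.8 for the position/velocity model of Example 7.17 (our own sketch; the observation noise variance R = 1 is as reconstructed above). Running it with dummy observations reproduces the gain and covariance sequence of the example.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
Q = np.array([[0.0, 0.0], [0.0, 1.0]])   # state noise covariance
H = np.array([[1.0, 0.0]])               # position is observed
R = np.array([[1.0]])                    # observation noise variance

P = np.eye(2)                            # P(1)
s_hat = np.zeros((2, 1))                 # S_hat(1)

for n, x in enumerate([0.0, 0.0, 0.0], 1):   # dummy observations X(n)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    print(f"K({n}) =", K.ravel())        # [0.5 0], [0.6 0.4], [0.75 0.5]
    s_hat = A @ (s_hat + K @ (np.atleast_2d(x) - H @ s_hat))
    P = A @ ((np.eye(2) - K @ H) @ P) @ A.T + Q
    print(f"P({n+1}) =\n", P)            # [[1.5,1],[1,2]], then [[3,2],[2,2.6]], ...
```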
7.7 WIENER FILTERS

Ŝ(t + α) = ∫_a^b h(t − τ)X(τ) dτ   (7.159)

Note that this is the continuous analogue of the linear estimator of Section 7.5. If the lower limit, a, in the integral of Equation 7.159 is −∞ and b = t, that is, we can observe the infinite past up to the present, then the estimator is usually given the following names:

α > 0  prediction
α = 0  filtering
α < 0  smoothing
The most common measurement model and the only one considered in this book is

X(t) = S(t) + N(t)

The first step in finding the optimum solution is obtained quite simply from the orthogonality condition

where h now indicates the optimum filter, that is, the h that satisfies the orthogonality condition. Now, using the standard notation for the expected values,

R_XS(t + α − ξ) = ∫_a^b R_XX(λ − ξ)h(t − λ) dλ,  a ≤ ξ ≤ b   (7.162)
The MSE is

P(t) = E{[S(t + α) − Ŝ(t + α)]²}

Because both S(t) and Ŝ(t) are stationary, P(t) will not vary with time. Recalling that the observations and the error are orthogonal,

P = E{[S(t + α) − ∫ h(λ)X(t − λ) dλ]S(t + α)}

When the infinite past and future can be observed, the orthogonality condition becomes

R_XS(τ + α) = ∫_{−∞}^{∞} R_XX(τ − λ)h(λ) dλ  for all τ

This integral equation is easily solved for the optimum filter via Fourier transforms. Indeed

S_XS(f)exp(j2παf) = H(f)S_XX(f)

or

H(f) = S_XS(f)exp(j2παf)/S_XX(f)   (7.168)

If

P(τ) = R_SS(τ) − ∫ R_SX(τ − α − λ)h(λ) dλ   (7.170)

then P(0) = P.
Taking transforms of Equation 7.170 produces

P_F(f) = S_SS(f) − [S_XS(f)/S_XX(f)]S_SX(f)
       = S_SS(f) − |S_XS(f)|²/S_XX(f)   (7.171)

Finally

P = P(0) = ∫_{−∞}^{∞} [S_SS(f) − |S_XS(f)|²/S_XX(f)] df   (7.172)
EXAMPLE 7.18.

X(t) = S(t) + N(t), with S_NN(f) = 1 + 6/[(2πf)² + 1], α = 0, and

R_SS(τ) = exp(−|τ|)

R_SN(τ) = 0

SOLUTION:

S_XX(f) = 2/[(2πf)² + 1] + 1 + 6/[(2πf)² + 1] = [(2πf)² + 9]/[(2πf)² + 1]
H(f) = S_XS(f)/S_XX(f) = 2/[(2πf)² + 9]

H(f) = 2/[(2πf)² + 9] = (1/3)/(j2πf + 3) + (1/3)/(−j2πf + 3)

Thus

h(t) = (1/3)exp(−3t),  t ≥ 0
     = (1/3)exp(+3t),  t < 0
P = ∫_{−∞}^{∞} S_SS(f)S_NN(f)/[S_SS(f) + S_NN(f)] df
  = ∫_{−∞}^{∞} 2[(2πf)² + 7]/{[(2πf)² + 1][(2πf)² + 9]} df = .833

where the integral is evaluated using I_2 from Table 4.1 (page 235).
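A numerical check of this MSE (our own sketch, with the spectra as reconstructed above):

```python
import numpy as np
from scipy.integrate import quad

def S_ss(f):
    w2 = (2 * np.pi * f) ** 2
    return 2.0 / (w2 + 1.0)

def S_nn(f):
    w2 = (2 * np.pi * f) ** 2
    return 1.0 + 6.0 / (w2 + 1.0)

# P = integral of S_SS S_NN / (S_SS + S_NN) over all f
integrand = lambda f: S_ss(f) * S_nn(f) / (S_ss(f) + S_nn(f))
P, _ = quad(integrand, -np.inf, np.inf)
print(round(P, 3))   # 0.833
```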
EXAMPLE 7.19.

Assume the signal and noise spectra do not overlap, that is, S_SS(f)S_NN(f) = 0, with X(t) = S(t) + N(t); S(t) and N(t) are uncorrelated. Figure 7.9 shows such spectra. Find the optimum filter and the MSE.

SOLUTION:

In this case, using Equation 7.168,

H(f) = S_SS(f)/[S_SS(f) + S_NN(f)]

H(f) = 1,  S_SS(f) > 0, S_NN(f) = 0
     = 0,  S_NN(f) > 0, S_SS(f) = 0
     = undetermined,  S_NN(f) = 0 = S_SS(f)

and

P = ∫_{−∞}^{∞} S_SS(f)S_NN(f)/[S_SS(f) + S_NN(f)] df = 0
where the subscript R denotes a realizable (causal) filter. Then the orthogonality principle implies (see Equation 7.163)

Letting t − ξ = τ

Now letting β = τ − λ

Equation 7.176 is called the Wiener-Hopf equation. This integral equation appears similar to Equation 7.166. However, it is more difficult to solve because of the range of integration, and the restriction τ ≥ 0 precludes the use of Fourier transforms to directly find the solution. We will solve it using innovations, and then we will prove that the solution does satisfy Equation 7.176. The steps that follow in order to solve Equation 7.176 are

1. Assume the input is white noise and find the optimum realizable filter.
2. Discuss spectrum factorization in order to find a realizable innovations filter.
3. Find the optimum filter with a more general input.
Optimum Realizable (Real-time) Filters with White Noise Input. If X(t), the input to the filter, were white noise, S(t + α) could, if all data were available, be written as in Equation 7.177,

where the first integral represents the past and the second integral represents the future.

In this case, the optimum filter is defined by Equation 7.167. We will call this filter the "optimum unrealizable filter."

Now assume that X(λ) cannot be observed for λ > t, that is, future X(λ) are not available now. Then it seems reasonable that if X(λ) cannot be observed, we should substitute for X(λ) the best estimate of X(λ), which for zero-mean white noise is zero. Thus, referring to Equation 7.177, the last integral that represents the future or unrealizable part of the filter should be set equal to zero.

We have argued in this case where the input is white noise that the optimum realizable filter is the positive time portion of the optimum unrealizable filter, h(t), with the negative time portion of h(t) set equal to zero. In equation form the optimum realizable filter h_RW with white noise input is

h_RW(t) = h(t),  t ≥ 0
        = 0,  t < 0   (7.180)
EXAMPLE 7.20.

Find the Fourier transform Q(f) and the two-sided Laplace transform L(s) of

l(t) = exp(−|t|)

SOLUTION:

Q(f) = ∫_{−∞}^{∞} exp(−|t|)exp(−j2πft) dt

     = ∫_{−∞}^{0} exp(t)exp(−j2πft) dt + ∫_{0}^{∞} exp(−t)exp(−j2πft) dt

     = [1/(1 − j2πf)]exp[(1 − j2πf)t] |_{−∞}^{0} + [−1/(1 + j2πf)]exp[−(1 + j2πf)t] |_{0}^{∞}

     = 1/(1 − j2πf) + 1/(1 + j2πf)

     = 2/[1 + (2πf)²]

Note that the 1/(1 − j2πf) came from the negative time part of l(t), which might be called l⁻(t), and 1/(1 + j2πf) came from the positive time part, l⁺(t), of l(t). Similarly

L(s) = 2/(1 − s²) = 2/[(1 + s)(1 − s)]

Now, because Q(f) is even, L(s) will also be even. That is, both the numerator and denominator of L(s) will be polynomials in −s². Thus, if s_1 is a singularity
(pole or zero) of L(s), then −s_1 is also a singularity. Then L(s) can be factored into

L(s) = L⁺(s)L⁻(s)

where L⁺(s) contains all of the poles and zeros in the left-half s-plane that correspond with the positive time portion of l(t), and L⁻(s) contains all of the poles and zeros in the right-half s-plane that correspond with the negative time portion of l(t). Also
EXAMPLE 7.21.

Refer to Example 7.20. Find L⁺(s) and L⁻(s) and l⁺(t) and l⁻(t).

SOLUTION:

L(s) = 2/[(1 + s)(1 − s)] = [√2/(1 + s)][√2/(1 − s)]

L⁺(s) = √2/(1 + s),  L⁻(s) = √2/(1 − s)

l⁺(t) = √2 exp(−t),  t ≥ 0
      = 0,  t < 0

then

L⁺(s) = ∫₀^∞ √2 exp(−t)exp(−st) dt = √2/(1 + s)

and this integral will converge whenever Re(s) > −1. Also

l⁻(t) = √2 exp(t),  t < 0

L⁻(s) = ∫_{−∞}^{0} √2 exp(t)exp(−st) dt = √2/(1 − s)

and this integral will converge whenever Re(s) < +1. Thus
EXAMPLE 7.22.

Factor

Q(f) = [49 + 25(2πf)²]/{(2πf)²[(2πf)² + 169]}

SOLUTION:

L(s) = (49 − 25s²)/[−s²(−s² + 169)]

L⁺(s) = (7 + 5s)/[(s)(s + 13)]

L⁻(s) = (7 − 5s)/[(−s)(−s + 13)]

Q(f) = Q⁺(f)Q⁻(f)
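A small sketch (ours) of the factorization in Example 7.22. With s = j2πf, (2πf)² = −s², so Q(f) becomes L(s) = (49 − 25s²)/[−s²(−s² + 169)], and L⁺(s) collects the left-half-plane poles and zeros:

```python
import numpy as np

num_s = np.poly1d([-25, 0, 49])        # 49 - 25 s^2
den_s = np.poly1d([-1, 0, 169, 0, 0])  # -s^2 (-s^2 + 169) = -s^4 + 169 s^2

zeros = num_s.roots                    # +-7/5
poles = den_s.roots                    # 0, 0, +-13

lhp = lambda r: r[np.real(r) <= 0]     # keep left-half-plane (and s = 0) roots
print(lhp(zeros))   # [-1.4]      -> the factor (7 + 5s)
print(lhp(poles))   # [-13, 0, 0] -> one s = 0 root goes to L+, the other to L-
```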
Figure 7.10a Optimum unrealizable filter: X(t) → H(f) = S_XS(f)exp(j2πfα)/S_XX(f) → Ŝ(t).
Note that H_1 · H_2 in Figure 7.10b equal H in Figure 7.10a using the factorization discussed in the preceding section, that is

S_XX(f) = S⁺_XX(f)S⁻_XX(f)   (7.182)

where S⁺_XX(s/j2π) contains the poles and zeros in the left-half of the s-plane and its inverse transform is nonzero for positive time. Similarly S⁻_XX(s/j2π) corresponds with the Laplace transform that has roots in the right-half plane and whose inverse transform is nonzero for negative time. We can easily show that V_X of Figure 7.10b is white noise. Indeed

S_VxVx(f) = [1/S_XX(f)]S_XX(f) = 1,  S_XX(f) > 0

H(f) = H_1(f)H_2(f)   (7.183)

where

H_1(f) = 1/S⁺_XX(f)   (7.184)
Figure 7.10b Another version of optimum unrealizable filter.
H_2(f) = S_XS(f)exp(j2πfα)/S⁻_XX(f)   (7.185)

We now turn our attention toward the optimum realizable filter. V_X(t) is white noise and the best estimate (conditional expected value) of future values of V_X(t) is zero. This justifies taking the positive time portion of H_2(f) and implies that the equations that describe H_R, the optimum realizable filter, are

H_R(f) = H_1(f)H_3(f)

where

h_3(t) = h_2(t),  t ≥ 0
       = 0,  t < 0   (7.188)

H_3(f) = ∫₀^∞ h_2(t)exp(−j2πft) dt   (7.189)

so that

H_R(f) = [1/S⁺_XX(f)] ∫₀^∞ [∫_{−∞}^{∞} (S_XS(λ)exp(j2πλα)/S⁻_XX(λ))exp(j2πλt) dλ] exp(−j2πft) dt   (7.190)
We now find the mean squared error with the optimum realizable filter:

P = R_SS(0) − ∫₀^∞ [∫_{−∞}^{∞} (S_XS(f′)/S⁻_XX(f′))exp(j2πf′α)exp(j2πf′t) df′] h_2(t) dt

The term inside the brackets is the inverse transform of H_2(f′) by Equation 7.185; thus

P = R_SS(0) − ∫₀^∞ h_2²(t) dt   (7.192.a)
P = ∫_{−∞}^{∞} {[1 − H_R(f)][1 − H*_R(f)]S_SS(f) + |H_R(f)|²S_NN(f)} df   (7.192.b)
SOLUTION:

S_SS(f) = k ∫_{−∞}^{∞} exp(−c|τ|)exp(−j2πfτ) dτ
        = k/(c − j2πf) + k/(c + j2πf) = 2ck/[c² + (2πf)²]

S_SS(s/j2π) = [√(2ck)/(c − s)][√(2ck)/(c + s)]

H_1(f) = (c + j2πf)/√(2ck)

H_2(f) = {2ck/[c² − (j2πf)²]}[(c − j2πf)/√(2ck)]exp(j2πfα)
       = √(2ck) exp(j2πfα)/(c + j2πf)

h_2(t) = √(2ck) exp[−c(t + α)],  t ≥ −α

and

h_3(t) = h_2(t),  t ≥ 0
       = 0,  t < 0

H_3(f) = √(2ck) exp(−cα)/(c + j2πf)

H_R(f) = H_1(f)H_3(f) = exp(−cα)
This constant filter can be realized as the voltage divider of Figure 7.14 with

R_1/(R_1 + R_2) = exp(−cα)

Figure 7.14 Voltage transfer function for Example 7.23, h = v_out/v_in.
EXAMPLE 7.24.

Given

S_NN(f) = 1

S_SS(f) = 3600/{(2πf)²[(2πf)² + 169]}

Find the optimum unrealizable and the optimum realizable filters for estimating S(t).

SOLUTION:

S_XS(f) = S_SS(f)

S_XX(f) = S_SS(f) + S_NN(f) = [(2πf)⁴ + 169(2πf)² + 3600]/{(2πf)²[(2πf)² + 169]}

H(f) = 3600/[(2πf)⁴ + 169(2πf)² + 3600]
Now, we seek the optimum realizable filter beginning with Equation 7.184:

H_1(f) = (j2πf)(j2πf + 13)/[(j2πf)² + 17(j2πf) + 60]

H_2(f) = A_1/(j2πf) + A_2/(j2πf + 13) + Y(f)/[(j2πf)² − 17(j2πf) + 60]

where

A_1 = 60/13,  A_2 = −8/13

and Y(f) is not evaluated because these poles correspond with right-half plane poles of the corresponding Laplace transform.

h_3(t) comes from the poles of the corresponding Laplace transform that are in the left-half plane. Thus

H_3(f) = (60/13)/(j2πf) − (8/13)/(j2πf + 13) = [4(j2πf) + 60]/[(j2πf)(j2πf + 13)]

and

H_R(f) = H_1(f)H_3(f) = [4(j2πf) + 60]/[(j2πf)² + 17(j2πf) + 60]

It can be shown using Equation 7.192.b that the MSE is approximately 4.0.
Although synthesis is not directly a part of this book, it is interesting to note that H_R(f) can be implemented as a driving point impedance as shown in Figure 7.15a.

Figure 7.15a H_R of Example 7.24 as a driving point impedance.

Figure 7.15b H_R of Example 7.24 as voltage transfer function.
EXAMPLE 7.25.

This is the same problem as Example 7.18, except now a realizable (real-time) filter is required. The error with this optimum realizable filter is to be compared with the error obtained in Example 7.18 with the unrealizable filter.

SOLUTION:

As in Example 7.18

S_XX(f) = [(2πf)² + 9]/[(2πf)² + 1]

H_1(f) = 1/S⁺_XX(f) = (j2πf + 1)/(j2πf + 3)

H_2(f) = 2(−j2πf + 1)/{[(2πf)² + 1](−j2πf + 3)}
       = 2/[(j2πf + 1)(−j2πf + 3)] = A/(j2πf + 1) + B/(−j2πf + 3)

A = 2/(1 + 3) = 1/2

H_3(f) = (1/2)/(j2πf + 1)

H_R(f) = H_1(f)H_3(f) = (1/2)/(j2πf + 3)

P = R_SS(0) − ∫₀^∞ [(1/2)exp(−t)]² dt = 1 − (1/4) ∫₀^∞ exp(−2t) dt = 1 − 1/8 = .875

This error is somewhat greater than the .833 found for the unrealizable filter that used stored data.
Proof that the Solution Satisfies the Wiener-Hopf Equation. In the previous section we argued that the solution given in Equation 7.190 solved the Wiener-Hopf Equation 7.176. Here we formally show that it does.
Theorem. If h_R(t) is given by Equation 7.190, then for τ ≥ 0

∫₀^∞ R_XX(τ − β)h_R(β) dβ = R_XS(τ + α)

that is, h_R satisfies the Wiener-Hopf Equation 7.176.

Substituting Equation 7.190 for h_R, interchanging the order of integration, and recognizing that

∫_{−∞}^{∞} R_XX(y)exp(−j2πλy) dy = S_XX(λ)

the left side reduces, step by step, to

∫₀^∞ R_XX(τ − β)h_R(β) dβ = ∫_{−∞}^{∞} S_XS(λ)exp[j2πλ(α + τ)] dλ
                          = R_XS(α + τ)   Q.E.D.
7.8 SUMMARY

This chapter introduced the problem of how to estimate the value of a random signal S(t) when observations of a related random process X(t) are available. The usual case is that the observation is the sum of signal and noise. We assumed that the objective of the estimation was to minimize the mean squared error (MSE) between S(t) and the estimator Ŝ(t).
Optimal Wiener digital filters are exactly the same as those already discussed when the number of observations is finite. When there are more than a finite number of observations, the optimal digital filter was suggested, but practical solutions were not developed.

Instead, recursive digital filters in the form of Kalman filters were suggested. The scalar case was considered first, and the minimal MSE recursive filter was shown to be a weighted average of (1) the old estimate projected forward via the state equation and (2) the innovation of the new observation. The weights are inversely proportional to the mean squared errors of these two estimators.
The common theme of this chapter was the estimation of an unknown signal in the presence of noise such that the MSE is minimized. The estimators were limited to linear estimators, which are optimum in the Gaussian case. Innovations and orthogonality were used in developing both Kalman and Wiener estimators.
7.9 REFERENCES

Optimum filtering or estimation began with Reference [11]. Reference [7] contains a very readable exposition of this early work, and Reference [2] explains this work on "Wiener" filtering in a manner that forms a basis for the presentation in this text. Reference [5] introduced "Kalman" filtering. Reference [8] emphasizes the orthogonality principle, and Reference [6] emphasizes innovations. Reference [1] is a concise summary of "Kalman" theory. References [3], [9], and [10] present alternate treatments of filtering, and Reference [4] is a very readable account of the material contained in this chapter; it also contains some interesting practical examples and problems.
[2] H. W. Bode and C. E. Shannon, "A Simplified Derivation of Linear Least Squares Smoothing and Prediction Theory," Proceedings of the IRE, Vol. 38, April 1950, pp. 417-424.

[5] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Transactions of the ASME—Journal of Basic Engineering, Ser. D, Vol. 82, March 1960, pp. 35-45.
7.10 PROBLEMS
7.1 The median m of X is defined by F_X(m) = .5. Show that m minimizes E{|X − a|} by showing for a < m

E{|X − a|} = E{|X − m|} + 2 ∫_a^m (x − a)f_X(x) dx
7.2 Assume that X is an observation of a constant signal plus noise, and that

f_S(s) = [1/√(2π)] exp[−(s − 1)²/2]

Hint: First find f_{X|S} and use Bayes' rule to find f_{S|X}. Then use the mean as the estimate.
7.3 We observe a signal S(1) at time 1 and we want the best linear estimator of the signal S(2) at a later time. Assume E{S(1)} = E{S(2)} = 0. Find the best linear estimator in terms of moments.

7.4 Assume the same situation as in Problem 7.3, but now the derivative, S′(1), of the signal at time 1 can also be observed. E{S′(1)} = 0. Find the best estimator of the form
7.5 We wish to observe a state variable X of a control system. But the observation Y is actually

Y = mX + N

X̂ = a + bY

E{Y|X = x} = a + bx

and

E{[Y − (a + bX)]²} = σ_Y²(1 − ρ_XY²)

Note that a and b are as prescribed by Equations 7.4 and 7.5 for the best linear estimator.
7.8 Show that Equations 7.13 and 7.14 do minimize the mean-square error by showing that

E{[S − b_0 − Σ_i b_i X(i)]²} ≥ E{[S − h_0 − Σ_i h_i X(i)]²}
7.10 S = X², and

f_X(x) = 1/2,  −1 ≤ x ≤ 1
       = 0  elsewhere

Find h_0, h_1, h_2 for the estimator Ŝ = h_0 + h_1 X + h_2 X² such that the mean-square error is minimized. Find the resulting minimum mean-square error.
7.15 Given the covariance matrix R_XX of X(1), X(2), X(3), find Γ and L.
7.16 Find the optimum realizable filter for the signal given in Example 7.12 with σ_S² = 1 and a = 1/2.
7.17 Show that if

S(f) = (5 − 4 cos 2πf)/(10 − 6 cos 2πf)

then

S_z(z) = [5 − 2(z + z⁻¹)]/[10 − 3(z + z⁻¹)]

where

z = exp(j2πf)

and

S_z[exp(j2πf)] = S(f)

7.18 Show that

S_z(z) = [5 − 2(z + z⁻¹)]/[10 − 3(z + z⁻¹)]

has two zeros, 1/2 and 2, and two poles, 1/3 and 3.

7.19 Show that the realizable (causal) spectral factor of S_z(z) is

H(z) = (2z − 1)/(3z − 1)

7.20 Show that if the signal corresponding to the psd S(f) of Problem 7.17 is put into a digital filter

H(z) = (3z − 1)/(2z − 1)

then the output has the spectrum of white noise, that is, the output is the innovation of the signal corresponding to the psd S(f).
7.21 Assume that S(0) is a Gaussian random variable with mean 0 and variance σ². W(n), n = 0, 1, . . . , is a stationary uncorrelated zero-mean Gaussian random sequence that is also uncorrelated with S(0).

S(n + 1) = a(n)S(n) + W(n)
7.23 If X(n) = h(n)S(n) + ν(n), where the h(n) are known constants, is used in place of Equation 7.89 and other assumptions remain unchanged, show that Equation 7.102 becomes

Ŝ(n + 1) = b′(n)[X(n)/h(n)] + d(n)Ŝ(n)

where b′(n) and d(n) are constants.
7.24 Equation 7.105 gives the best estimator of S(n + 1) using all observations through X(n). Show that the best recursive estimator Ŝ′(n) using all observations through X(n) is

Ŝ′(n) = Ŝ(n) + k(n)[X(n) − Ŝ(n)],  with  Ŝ(n + 1) = a(n + 1)Ŝ′(n)

This is an alternate form of the Kalman filter; that is, Ŝ′(n) is often used.

P′(n) = [1 − k(n)]P(n)

Also, show that
7.27 Let Z = k_1 X + k_2 Y, where X and Y are independent with means μ_X and μ_Y and variances σ_X² and σ_Y². k_1 and k_2 are constants.

a. Find μ_Z and σ_Z².
7.28 Refer to Example 7.15. Find Ŝ(4), P(4), Ŝ(5), and P(5).
a. Find Ŝ′(1), Ŝ′(2), Ŝ′(3), and Ŝ′(4) as defined in Problem 7.24 in terms of X(1), X(2), X(3), and X(4). Show that the weight on Ŝ(1) decreases to zero. Thus, a bad guess of Ŝ(1) has less effect.

b. How would you adjust P(1) in order to decrease the weight attached to Ŝ(1)?
7.30 X(n) = S(n) + ν(n)

Ŝ(1) = 0,  P(1) = 10
7.31 What is the steady-state gain, i.e., lim_{n→∞} k(n), for Problem 7.30?
7.32 Show in the vector case that Equation 7.139 is true following the steps
used from Equations 7.94-7.100 in the scalar case.
7.33 Explain why the cross-product terms are zero in the equation following
Equation 7.149.
7.34 Use Equation 7.150 and Equation 7.156 to produce Equation 7.157.
7.35 S_SS(f) = 1/[(2πf)² + 1]

S_NN(f) = 4/[(2πf)² + 4]

Signal and noise are uncorrelated. Find the optimum unrealizable Wiener filter and the MSE, when α = 0.

7.36 S_NN(f) = 1

Signal and noise are uncorrelated. Find the minimum MSE unrealizable filter, H(f), for α = 0 and show that P = .577.
7.37 Assume that the signal and noise are as specified in Problem 7.36. However, the filter form is specified as

H(f) = 1/(1 + j2πfRC) = 1/(1 + j2πfT)

Find T for minimum mean squared error and show that the MSE is .914 if the T is correctly chosen.
7.38  Find the optimum realizable filter for the situation described in
      Problems 7.36 and 7.37, and show that the MSE = .732. Discuss the three
      filters and their errors.
7.41 Contrast the assumptions of discrete Wiener and discrete Kalman filters.
Contrast their implementation. Derive the Kalman filter for Problem 7.36
and compare this with the results of Problem 7.38.
7.42  An alternate form of the vector Kalman filtering algorithm uses the
      observations at stage n in addition to the previous observations. Show
      that it can be described by the following algorithm.

          K(n) = P(n)Hᵀ(n)[H(n)P(n)Hᵀ(n) + R(n)]⁻¹

          P′(n) = [I - K(n)H(n)]P(n)

          Ŝ(n + 1) = A(n + 1)Ŝ′(n)

          P(n + 1) = A(n + 1)P′(n)Aᵀ(n + 1) + Q(n + 1)

      This form is often used (see page 200 of Reference [4]). In this form
      P′(n) is the covariance matrix reduced from P(n) by the use of the
      observation at time n, and Ŝ′ is the revised estimator based on this
      observation.
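      One cycle of this algorithm is sketched below in Python. The function
      name is an invention for illustration, and the measurement-update line
      producing the revised estimate Ŝ′(n) uses the standard Kalman form,
      which is assumed here because that equation is not reproduced in the
      problem statement.

          import numpy as np

          def kalman_cycle(S_pred, P_pred, x, A_next, H, R, Q_next):
              # One cycle of the "alternate form" in Problem 7.42 (a sketch).
              # The update for the revised estimate S' is the standard Kalman
              # measurement update, assumed here.
              K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
              S_rev = S_pred + K @ (x - H @ S_pred)            # S'(n)
              P_rev = (np.eye(len(S_pred)) - K @ H) @ P_pred   # P'(n)
              S_next = A_next @ S_rev                          # S(n+1)
              P_next = A_next @ P_rev @ A_next.T + Q_next      # P(n+1)
              return S_next, P_next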
CHAPTER EIGHT
Statistics
8.1 INTRODUCTION
Statistics deals with methods for making decisions based on measurements (or
observations) collected from the results of experiments. The two types of
decisions emphasized in this chapter are the decision as to which value to use
as the estimate of an unknown parameter and the decision as to whether or not
to accept a certain hypothesis.
Typical estimation decisions involve estimating the mean and variance of a
specified random variable or estimating the autocorrelation function and power
spectral density function of a random process. In such problems of estimating
unknown parameters, there are two important questions. What is a “ good”
method of using the data to estimate the unknown parameter, and how “ good”
is the resulting estimate?
Typical hypothesis acceptance or rejection decisions are as follows: Is a signal
present? Is the noise white? Is the random variable normal? That is, decide
whether or not to accept the hypothesis that the random variable is normal. In
such problems we need a “ good” method of testing a hypothesis, that is, a
method that makes a “ true” hypothesis likely to be accepted and a “ false”
hypothesis likely to be rejected. In addition, we would like to know the prob
ability of making a mistake of either type. The methods of hypothesis testing
are similar to the methods o f decision making introduced in Chapter 6. However,
the methods introduced in this chapter are classical in the sense that they do
not use a priori probabilities and they also do not explicitly use loss functions.
Also, composite alternative hypotheses (e.g., μ ≠ 0) will be considered in
addition to simple alternative hypotheses (e.g., μ = 1).
In this chapter, we discuss the estimators of those parameters that often are
needed in electrical engineering applications and particularly those used in
Chapter 9 to estimate parameters of random processes. We also emphasize those
statistical (i.e., hypothesis) tests that are used in Chapter 9.
After a characterization of a collection o f observations or measurements
from a probabilistic or statistical viewpoint, some example estimators are intro
duced and then measures for evaluating the quality o f estimators are defined.
The method o f maximum likelihood estimation is introduced as a general method
of determining estimators. The distribution o f three estimators is studied in order
to portray how estimators may vary from one try or sample set to the next.
These distributions are also useful in certain hypothesis tests, which are described
next. This chapter concludes with a discussion o f linear regression, which is the
most widely used statistical technique o f curve fitting.
Note that in the first seven chapters o f the book, we had assumed that the
probability distributions associated with the problem at hand were known. Prob
abilities, autocorrelation functions, and power spectral densities were either
derived from a set o f assumptions about the underlying random processes or
assumed to be given. In many practical applications, this may not be the case
and the properties o f the random variables (and random processes) have to be
obtained by collecting and analyzing data. In this and the following chapter, we
focus our attention on data analysis or statistics.
    X = m + N    (8.1)
Note that in this model, m is an unknown constant whereas X and N are random
variables.
A very important special case o f this model occurs when the expected value
of N is zero. In this case, the mean o f X is m, and we say that we have an
unbiased measurement.
    x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ    (8.2)

    m̂ = X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ
The basic premise of estimation is to determine the value of an unknown
quantity using a statistic, that is, using a function of measurements. The
estimator g(X₁, X₂, . . . , Xₙ) is a random variable. A specific set of
measurements will result in Xᵢ = xᵢ and the resulting value
g(x₁, x₂, . . . , xₙ) will be called an estimate or an estimated value.

    X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ

is a statistic but

    σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ - μ)²

is not a statistic because it depends upon the unknown parameter, μ.
In this section we describe three simple and widely used procedures for non
parametric estimation o f probability distribution and density functions: (1) the
empirical distribution function (or the cumulative polygon); (2) the histogram
(or the bar chart, also called a frequency table); and (3) Parzen’s estimator for
a pdf. The first two are used extensively for graphically displaying measurements,
whereas Parzen’s estimator is used to obtain a smooth, closed-form expression
for the estimated value of the pdf of a random variable.
    F̂_X|X₁,...,Xₙ(-∞ | ·) = 0

and

    F̂_X|X₁,...,Xₙ(∞ | ·) = 1
A probability mass function can be easily derived either from the empirical
distribution function or directly from the data. The empirical distribution func
tion and the empirical probability mass function for the resistance data are shown
in Figures 8.1 and 8.2. Problems at the end of this chapter call for construction
of empirical distribution functions based on the definition in Equation 8.3.
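The empirical distribution function is easy to evaluate on a computer. The
Python sketch below assumes the usual definition (the fraction of samples not
exceeding x), since Equation 8.3 itself is not reproduced here; the data are
placeholders.

    import numpy as np

    def empirical_cdf(samples, x):
        # Fraction of samples <= x (the usual empirical d.f. definition).
        samples = np.sort(np.asarray(samples))
        return np.searchsorted(samples, x, side="right") / len(samples)

    print(empirical_cdf([3.1, 2.9, 3.0, 3.2], 3.05))   # 0.5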
8.3.3 Histograms
W h en m any m easurem ents are available, in ord er to sim plify both data handling
and visual presentation, the data are often grou ped into cells. That is, the range
o f data is divided into a num ber o f cells o f equal size and the num ber o f data
poin ts within each cell is tabulated. This approxim ation o r grouping o f the data
results in som e loss o f in form ation. H ow ev er, this loss is usually m ore than
com pen sa ted for by the ease in data handling and interpretation when the goal
is visual display.
W h en the grou ped data are plotted as an approxim ate distribution fu n ction ,
the plot is usually called a cum ulative frequ ency p oly g on . A graph o f the g rou p ed
data p lotted as an approxim ate probability mass fu n ction in the form o f a bar
graph is called a histogram.
EXAMPLE 8.1.
The values o f a certain random sequence are recorded below. Plot a histogram
of the values o f the random sequence.
SOLUTION:
The idea is to split the data into groups. The range of the data is 3.35 to 3.65.
If each interval is chosen to be .030 units wide, there would be 10 cells and the
resulting figure would be reasonable. The center cell of the histogram is chosen
to be 3.485 to 3.515 and this, with the interval size chosen, determines all cells.
For ease of tabulation it is common practice to choose the ends of a cell to one
more significant figure than is recorded in the data so that there is no ambiguity
concerning into which cell a reading should be placed. As with most pictorial
representations, some o f the choices as to how to display the data (e.g., cell
size) are arbitrary; however, the choices do influence the expected errors, as
explained in Section 8.5. The histogram is shown in Figure 8.3.
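The cell-splitting step can be mechanized as in the Python sketch below. The
bin edges are one possible set consistent with the scheme described above
(width .030, center cell 3.485 to 3.515), and the data vector is a placeholder
rather than the example's actual readings.

    import numpy as np

    edges = 3.335 + 0.030*np.arange(11)      # 3.335, 3.365, ..., 3.635
    data = np.array([3.40, 3.47, 3.51, 3.52, 3.49,
                     3.55, 3.61, 3.44, 3.50, 3.38])   # placeholder readings
    counts, _ = np.histogram(data, bins=edges)
    print(list(zip(edges[:-1], edges[1:], counts)))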
EXAMPLE 8.2.

    f̂_X(x) = f̂_X|X₁,...,Xₙ(x | x₁, . . . , xₙ)
           = [1/(n h(n))] Σᵢ₌₁ⁿ g((x - Xᵢ)/h(n))    (8.4)

and

    ∫ g(y) dy = 1    (8.5.c)
While the choices of h(n) and g(y) are somewhat arbitrary, they do influence
the accuracy of the estimator as explained in Section 8.5. Recommended choices
for g(y) and h(n) are

    g(y) = (1/√(2π)) exp(-y²/2)    (8.6.a)

    h(n) = 1/√n    (8.6.b)
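A minimal Python sketch of Parzen's estimator, Equation 8.4, with a Gaussian
kernel for g(y) and h(n) = 1/√n follows; the data are placeholders.

    import numpy as np

    def parzen_pdf(x, samples,
                   g=lambda y: np.exp(-y**2/2)/np.sqrt(2*np.pi)):
        # Parzen's estimator with the Gaussian kernel and h(n) = 1/sqrt(n).
        samples = np.asarray(samples)
        n = len(samples)
        h = 1.0/np.sqrt(n)
        return g((x - samples)/h).sum()/(n*h)

    data = np.array([3.40, 3.47, 3.51, 3.52, 3.49])
    print(parzen_pdf(3.50, data))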
    θ̂ = g(X₁, . . . , Xₙ)

    μ̂_X = (1/n) Σᵢ₌₁ⁿ Xᵢ    (8.7)

    1. (1/2)(X_max + X_min)

    2. x such that F̂_X|X₁,...,Xₙ(x | x₁, . . . , xₙ) = 1/2

    σ̂² = (1/n) Σᵢ₌₁ⁿ (xᵢ - x̄)²    (8.8.a)

    S² = [1/(n - 1)] Σᵢ₌₁ⁿ (xᵢ - x̄)²    (8.8.b)
where NA is the random variable that represents the number o f times that event
A occurs in n independent trials.
Note that p̂ is similar to the relative frequency definition of probability.
However, there is an important difference. The relative frequency definition
takes the limit as n goes to infinity. This causes NA/n to become a number
whereas for finite n, NA/n is a random variable with a nonzero variance and a
well-defined distribution function.
    Ĉ_XY = (1/n) Σᵢ₌₁ⁿ XᵢYᵢ - X̄Ȳ    (8.9.a)

or by

    Ĉ_XY = (1/n) Σᵢ₌₁ⁿ (Xᵢ - X̄)(Yᵢ - Ȳ)    (8.9.b)
    F_X(x; θ)

Then

    f(x; μ) = (1/√(2π)) exp{-(x - μ)²/2}
The only change is that our present emphasis on estimating the unknown
parameter θ causes us to change the notation in order to reflect the fact that
we now enlarge the model to include a family of distributions. Each value of θ
corresponds with one member of the family. The purpose of the experiment, the
concomitant measurements, and the resulting estimator is to select one member
of the family as being the "best." One general way of determining estimators
is described in the next subsection. Following that is a discussion of
criteria for evaluating estimators.
    f(x₁, . . . , xₙ; θ) = Πᵢ₌₁ⁿ f(xᵢ; θ)
EXAMPLE 8.3.

    f(x; θ) = (1/θ) exp(-x/θ),    x ≥ 0
            = 0,    x < 0

If a sample consists of five i.i.d. measurements of X that are 10, 11, 8, 12,
and 9, find the likelihood function of θ.

SOLUTION:

    L(θ) = Πᵢ₌₁⁵ f(xᵢ; θ) = Πᵢ₌₁⁵ (1/θ) exp(-xᵢ/θ)
         = θ⁻⁵ exp(-50/θ),    θ > 0
EXAMPLE 8.4.

Find the maximum likelihood estimate of θ in Example 8.3.

SOLUTION:

    L(θ) = θ⁻⁵ exp(-50/θ),    θ > 0

Setting the derivative equal to zero and solving for θ, the value of θ that
causes the derivative to equal zero yields the estimate

    θ̂ = x̄ = 10
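Numerically, one can verify that this likelihood indeed peaks at the sample
mean; in the Python sketch below the grid limits are arbitrary choices.

    import numpy as np

    data = np.array([10.0, 11.0, 8.0, 12.0, 9.0])   # the five measurements
    theta = np.linspace(1.0, 40.0, 4000)
    L = theta**(-5) * np.exp(-data.sum()/theta)     # L(theta) of Example 8.3
    print(theta[np.argmax(L)], data.mean())         # both approximately 10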
EXAMPLE 8.5.

Find the maximum likelihood estimator μ̂ of the mean μ.

SOLUTION:

Finding the value that maximizes ln[L(μ)] is equivalent to finding the value
of μ that maximizes L(μ). Thus

    nμ̂ = Σᵢ₌₁ⁿ xᵢ

or

    μ̂ = x̄
EXAMPLE 8.6.

Find the maximum likelihood estimator of θ if X is uniformly distributed on
(0, θ).

SOLUTION:

With

    f(x; θ) = 1/θ,    0 ≤ x ≤ θ
            = 0 elsewhere

we have

    L(θ) = 1/θⁿ,    0 ≤ xᵢ ≤ θ,    i = 1, 2, . . . , n

    θ̂ = max(Xᵢ)

    Πᵢ₌₁ⁿ f_Xᵢ(xᵢ; θ) f_Θ(θ)
Figure 8.7 Bayes' estimator.
EXAMPLE 8.7.
A thumbtack when tossed has a probability P of landing with the head on the
surface and the point up; it has a probability 1 - P of landing with both the
head and the point touching the surface. Figure 8.7 shows the results of
applying Bayes' rule with f_P assigned to be uniform (Figure 8.7a). This
figure shows f_{P|k;n}, that is, the conditional density function of P when k
times out of n tosses the thumbtack has landed with the point up.

A study of Figure 8.7b will reveal that, after one experiment in which the
point was up, the a posteriori probability density function of P is zero at
P = 0 and increases to its maximum value at P = 1. This is certainly a
reasonable result. Figure 8.7c illustrates, with the ratio of k/n = 1/2, that
the a posteriori density function of P has a peak at 1/2, and, moreover, that
the peak becomes more pronounced as more data are used to obtain the a
posteriori density function (Figures 8.7d, 8.7e, and 8.7f).
As demonstrated in Section 8.4 there can be more than one (point) estimator
for an unknown parameter. In this section, we define measures for determining
“ good” estimators.
What properties do we wish θ̂ to possess? It seems natural to wish that
θ̂ = θ, but θ̂ is a random variable (because it is a function of random
variables). Thus, we must adopt some probabilistic criteria for measuring how
close θ̂ is to θ.
8.5.1 Bias
An estimator θ̂ of θ is called unbiased if

    E{θ̂} = θ
EXAMPLE 8.8.

    X̄ = μ̂ = g(X₁, . . . , Xₙ) = (X₁ + X₂ + · · · + Xₙ)/n

    E{X̄} = E{μ̂} = E{X₁ + X₂ + · · · + Xₙ}/n = μ
EXAMPLE 8.9.

Show that

    S² = [1/(n - 1)] Σᵢ₌₁ⁿ (Xᵢ - X̄)²

is an unbiased estimator of σ². (Note that the subscripts are omitted on both
μ and σ².)

SOLUTION:

First we note, because Xᵢ and Xⱼ are independent, that

    E{(Xᵢ - μ) Σⱼ₌₁ⁿ (Xⱼ - μ)} = E{(Xᵢ - μ)²} + Σ_{j≠i} E{Xᵢ - μ}E{Xⱼ - μ} = σ²
    E{(Xᵢ - X̄)²} = E{[Xᵢ - (1/n) Σⱼ₌₁ⁿ Xⱼ]²}

                 = E{[Xᵢ - μ + μ - (1/n) Σⱼ₌₁ⁿ Xⱼ]²}

                 = E{[(Xᵢ - μ) - (1/n) Σⱼ₌₁ⁿ (Xⱼ - μ)]²}

                 = E{(Xᵢ - μ)²} - (2/n) E{(Xᵢ - μ) Σⱼ₌₁ⁿ (Xⱼ - μ)}
                   + (1/n²) E{[Σⱼ₌₁ⁿ (Xⱼ - μ)]²}

    E{(Xᵢ - X̄)²} = σ² - (2/n)σ² + (1/n)σ²

But

    E{σ̂²} = (1/n) Σᵢ₌₁ⁿ E{(Xᵢ - X̄)²} = E{(Xᵢ - X̄)²}

or

    E{σ̂²} = σ²(1 - 1/n)    (8.13)

Thus

    S² = [1/(n - 1)] Σᵢ₌₁ⁿ (Xᵢ - X̄)²

is an unbiased estimator of σ².
Notice that for large n, σ̂² and S² are nearly the same. The intuitive idea is
that one degree of freedom is used in determining X̄. That is, X̄, X₁, . . . ,
Xₙ₋₁ determine Xₙ. Thus, the sum should be divided by n - 1 rather than by n.
(See Problem 8.27.)
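A short Monte Carlo experiment illustrates the bias. In the Python sketch
below (sample size and trial count are arbitrary choices), σ̂² averages near
σ²(n - 1)/n while S² averages near σ².

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 5, 200000
    x = rng.normal(0.0, 1.0, size=(trials, n))     # sigma^2 = 1
    xbar = x.mean(axis=1, keepdims=True)
    ss = ((x - xbar)**2).sum(axis=1)
    print((ss/n).mean())        # approximately (n-1)/n = 0.8  (biased)
    print((ss/(n-1)).mean())    # approximately 1.0            (unbiased)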
We have seen that X is an unbiased estimator of |x but there are many other
unbiased estimators o f p. For instance, the first measurement is also unbiased.
Thus, some other measure(s) o f estimators is needed to separate good estimators
from those not so good. We now define another measure.
or

    MSE = b² + Var[θ̂]

    Var[X̄] = Var[(1/n) Σᵢ₌₁ⁿ Xᵢ] = σ²/n    (8.15)

The average X̄ has a lower variance and by the criterion of minimum variance
or of minimum MSE, X̄ is a better estimator than a single measurement.
The positive square root of the variance of θ̂ is often called the standard
error of θ̂ and sometimes also called the random error. The positive square
root of the MSE is called the root mean square error or the RMS error.
For a given sample size n, θ̂ₙ = g(X₁, . . . , Xₙ) will be called an unbiased
minimum variance estimator of the parameter θ if θ̂ₙ is unbiased and if the
variance of θ̂ₙ is less than or equal to the variance of every other unbiased
estimator of θ that uses a sample of size n.
If θ ≠ 0, the normalized bias, normalized standard deviation (also called the
coefficient of variation), and the normalized RMS error are defined as
follows:

    b_N = b/θ,    ε_r = √(Var[θ̂])/θ,    ε = √(MSE)/θ

    (Area)ᵢ = [Nᵢ/(NW)]W = Nᵢ/N
Because

    Σᵢ₌₁ⁿ Nᵢ = N,    then    Σᵢ₌₁ⁿ (Area)ᵢ = 1

where x_iL and x_iU are respectively the lower and upper limits of the ith
cell. Note that N and W are constants, whereas Nᵢ is a random variable.

where

    P(i) = ∫_{x_iL}^{x_iU} f(x) dx    (8.22)

    E[f̂ᵢ(x)] = (1/W) ∫_{x_iL}^{x_iU} f(x) dx,    x_iL ≤ x < x_iU    (8.23)
In general, Equation 8.23 will not equal f(x) for all values of x within the
ith cell. Although there will be an x within the ith cell for which this is an
unbiased estimator, for most f(x) and most x, Equation 8.19 describes a biased
estimator. In order to illustrate the bias, we will choose x_ci to be in the
center of the ith cell and expand f(x) in a second-order Taylor series about
this point, that is

    f(x) ≈ f(x_ci) + f′(x_ci)(x - x_ci) + f″(x_ci)(x - x_ci)²/2,
        x_iL ≤ x < x_iU

    E[f̂ᵢ(x)] = (1/W) ∫_{x_ci-W/2}^{x_ci+W/2} [f(x_ci) + f′(x_ci)(x - x_ci)
               + f″(x_ci)(x - x_ci)²/2] dx

or

    E[f̂ᵢ(x)] ≈ f(x_ci) + f″(x_ci)W²/24,    x_iL ≤ x < x_iU    (8.24)

    b[f̂ᵢ(x)] ≈ f″(x_ci) W²/24    (8.25)

    b_N ≈ [f″(x_ci)/f(x_ci)] (W²/24)    (8.26)
Note that the bias and the normalized bias increase with cell width.
    √(Var[f̂ᵢ(x)]) / f(x_ci) = √(1 - W f(x_ci)) / √(N W f(x_ci))    (8.30)
The variance and normalized standard error decrease with W for N fixed. Thus,
the cell width has opposite effects on bias and variance. However, increasing
the number N of samples reduces the variance and can also reduce the bias if
the cell size is correspondingly reduced.
    MSE = b² + Var[f̂ᵢ(x)]

    ε² = [f″(x_ci)/f(x_ci)]² (W⁴/576) + [1 - W f(x_ci)]/(N W f(x_ci))    (8.32)

The normalized RMS error, ε, is the positive square root of this expression.
Note that W f(x_ci) is less than one; thus increasing W clearly decreases the
variance or random error. However, increasing W tends to increase the bias. If
N → ∞ and W → 0 in such a way that NW → ∞ (e.g., W = 1/√N), then the MSE will
approach zero as N → ∞. Note that if f″(x_ci) and higher derivatives are
small, that is, there are no abrupt peaks in the density function, then the
bias will also approach zero.
Usually, when we attempt to estimate the pdf using this nonparametric
method, the estimators will be biased. Any attempt to reduce the bias will result
in an increase in the variance and attempts to reduce the variance by smoothing
(i.e., increasing W) will increase the bias. Thus, there is a trade-off to be made
between bias and variance. By a careful choice of the smoothing factor we can
make the bias and variance approach zero as the sample size increases (i.e., the
estimator can be made to be asymptotically unbiased).
    y g(y) → 0    as    |y| → ∞

and

    ∫ y² g(y) dy < ∞

    f̂_X(x) = [1/(n h(n))] Σᵢ₌₁ⁿ g((x - Xᵢ)/h(n))

    Variance[f̂_X(x)] ≈ [f_X(x)/(n h(n))] ∫ g²(y) dy    (8.34)

If

    h(n) → 0    as    n → ∞

and

    n h(n) → ∞    as    n → ∞

then the estimator is asymptotically unbiased and the variance of the
estimator approaches zero. h(n) = 1/√n is a reasonable choice.
    X̄ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ

has mean μ and variance σ²/n. Thus, as n → ∞, X̄ₙ has mean μ and a variance
that approaches zero. Thus, X̄ₙ converges in probability to μ (by
Tchebycheff's inequality) and X̄ₙ is a consistent estimator of μ. (See
Problem 8.15.)
Note also that both

    σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ - X̄)²

and

    S² = [1/(n - 1)] Σᵢ₌₁ⁿ (Xᵢ - X̄)²

are consistent estimators of σ². The relative efficiency of two unbiased
estimators is measured by the ratio

    Var(θ̂₂)/Var(θ̂₁)
In some cases it is possible to find among the unbiased estimators one that
has the minimum variance, V. In such a case, the absolute efficiency of an
unbiased estimator θ̂₁ is
EXAMPLE 8.10.

SOLUTION:

It is easy to show that X̄ is normal with mean μ and variance σ²/n = 1/10 (see
Problem 8.22). Thus, (X̄ - μ)/√(1/10) is a standard normal random variable,
and from Appendix D we can conclude that

    P{-1.96 ≤ (X̄ - μ)/√(1/10) ≤ 1.96} = .95
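In code, the interval estimate is a one-liner; the sample mean below is a
placeholder value, not data from the text.

    import numpy as np

    xbar, var_xbar = 2.37, 1.0/10            # placeholder mean; Var(Xbar) = 1/10
    half = 1.96*np.sqrt(var_xbar)
    print((xbar - half, xbar + half))        # interval containing mu with prob .95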
In Section 8.5, we discussed the mean and variance o f estimators. In some cases,
for example in interval estimation, we may be interested in a more complete
description o f the distribution o f the estimators. This description is particularly
needed in tests o f hypotheses that will be considered in the next section. In this
section we find the distribution o f four o f the most used estimators.
is also normal with mean μ and variance σ²/n (see Problem 8.22). Note also
that the central limit theorem implies that, as n → ∞, X̄ will be normal,
almost regardless of the distribution of X.

Note that (X̄ - μ)/(σ/√n) is a standard (μ = 0, σ² = 1) normal random
variable.
EXAMPLE 8.11.

Xᵢ is normal with mean 1000 and variance 100. Find the distribution of X̄ₙ
when n = 10, 50, 100.

SOLUTION:

From the previous results, X̄ₙ is normal with mean 1000, and the variances of
X̄ₙ are

    Var[X̄₁₀] = 10

    Var[X̄₅₀] = 2

    Var[X̄₁₀₀] = 1

Note that with a sample of size 100, X̄₁₀₀ has a probability of .997 of being
within 1000 ± 3.
    P(Yᵢ ≤ y) = P(-√y < Zᵢ ≤ √y)

we have

    f_Yᵢ(y) = dF_Y(y)/dy = (1/√(2π)) y^(-1/2) exp(-y/2),    y > 0
            = 0,    y < 0    (8.35)
The probability density function given by Equation 8.35 is called chi-square with
one degree o f freedom. (Equation 8.35 may also be found by using the change-
of-variable technique introduced in Chapter 2.)
From the characteristic function of a standard normal density function, we
have (see Example 2.13)

    E{Zᵢ²} = 1

and

    E{Zᵢ⁴} = 3

Thus

    E{Yᵢ} = E{Zᵢ²} = 1    (8.36)

and

    Var[Yᵢ] = E{Yᵢ²} - (E{Yᵢ})² = E{Zᵢ⁴} - 1 = 2    (8.37)
    Ψ_Yᵢ(ω) = (1 - j2ω)^(-1/2)    (8.38)

    χ²ₘ = Σᵢ₌₁ᵐ Yᵢ = Σᵢ₌₁ᵐ Zᵢ²
    f_χ²ₘ(y) = [y^(m/2 - 1)/(2^(m/2) Γ(m/2))] exp(-y/2),    y ≥ 0
             = 0,    y < 0    (8.40)

where

    Γ(1/2) = √π

    Γ(1) = 1

    Γ(n) = (n - 1)Γ(n - 1),    n > 1

Equation 8.40 defines the chi-square pdf; the parameter m is called the
degrees of freedom.
Note that if R = √(χ²₂), then R is called a Rayleigh random variable.
Furthermore, M = √(χ²₃) is called a Maxwell random variable. The density
functions for various degrees of freedom are shown in Figure 8.8. Appendix E
is a table of the chi-square distribution.
Using Equations 8.36 and 8.37, it is apparent that the mean and variance of a
chi-square random variable are

    E{χ²ₘ} = m E{Zᵢ²} = m    (8.41)

    Var[χ²ₘ] = 2m

Figure 8.8 Graphs of probability density functions of χ²ₘ for m = 1, 4, 10,
and 20.
    Tₘ = Z/√(χ²ₘ/m)    and    U = χ²ₘ    (8.44)

and then finding the marginal density function of Tₘ. The joint density of Tₘ
and U is given by

    f_Tₘ,U(t, u) = f_Z,χ²ₘ(t√(u/m), u) √(u/m)
                 = (1/√(2π)) [u^(m/2 - 1)/(2^(m/2) Γ(m/2))] √(u/m)
                   × exp{-(u/2)(1 + t²/m)},    u > 0

    f_Tₘ(t) = ∫₀^∞ (1/√(2π)) [u^(m/2 - 1)/(2^(m/2) Γ(m/2))] √(u/m)
              × exp{-(u/2)(1 + t²/m)} du

Let

    w = (u/2)(1 + t²/m)

Then

    f_Tₘ(t) = [1/(√(2π) 2^(m/2) Γ(m/2) √m)]
              × ∫₀^∞ [2w/(1 + t²/m)]^((m-1)/2) exp(-w) [2/(1 + t²/m)] dw

or

    f_Tₘ(t) = Γ((m + 1)/2) / [√(πm) Γ(m/2) (1 + t²/m)^((m+1)/2)]    (8.45)
Figure 8.9 Two examples of the Tₘ density.
The probability density function given in Equation 8.45 is usually called the
(Student's) tₘ density. Note that the only parameter in Equation 8.45 is m,
the number of independent squared normal random variables in the chi-square. A
sketch of f_Tₘ is shown in Figure 8.9 for two values of m and Appendix F is a
table of the tₘ distribution.
It is shown in Appendix 8-A that if X₁, . . . , Xₙ are i.i.d. normal random
variables, then X̄ and S² are independent random variables, and that
(n - 1)S²/σ² has the standard chi-square distribution with m = n - 1 degrees
of freedom. Also √n(X̄ - μ)/S has the (Student's) tₘ distribution with
m = n - 1.
EXAMPLE 8.12.
If there are 31 measurements from a normal population with cr2 = 10, what is
the probability that S2 exceeds 6.17?
SOLUTION:
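The computation can be carried out as follows (a sketch using scipy, which of
course postdates the text): (n - 1)S²/σ² = 30(6.17)/10 = 18.51 is compared
with the chi-square distribution having 30 degrees of freedom.

    from scipy.stats import chi2

    n, sigma2, s2 = 31, 10.0, 6.17
    x = (n - 1)*s2/sigma2                  # (n-1)S^2/sigma^2 = 18.51
    print(chi2.sf(x, df=n - 1))            # P(S^2 > 6.17), approximately 0.95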
EXAMPLE 8.13.
SOLUTION:
8.7.5 F Distribution

We seek the probability density function of the ratio of two independent
chi-square random variables. To this end consider two independent variables, U
and V, each of which has a density given by Equation 8.40, where U has the
parameter m₁ degrees of freedom, and V has m₂ degrees of freedom. The joint
density of these two independent RV's is

    f_U,V(u, v) = [u^(m₁/2 - 1) v^(m₂/2 - 1) / (2^((m₁+m₂)/2) Γ(m₁/2)Γ(m₂/2))]
                  × exp{-(u + v)/2},    u > 0, v > 0    (8.46)

    F = (U/m₁)/(V/m₂)    and    Z = V
We now find the joint density of F and Z and then integrate with respect to z
to find the marginal density of F. From Equation 8.46 and using the defined
transformation

    f_F,Z(x, z) = [(m₁xz/m₂)^(m₁/2 - 1) z^(m₂/2 - 1)
                   / (2^((m₁+m₂)/2) Γ(m₁/2)Γ(m₂/2))] (m₁z/m₂)
                  × exp{-(z/2)[(m₁/m₂)x + 1]},    x, z > 0

    f_F(x) = [(m₁/m₂)^(m₁/2) x^(m₁/2 - 1) / (Γ(m₁/2)Γ(m₂/2) 2^((m₁+m₂)/2))]
             × ∫₀^∞ z^((m₁+m₂)/2 - 1) exp{-(z/2)[(m₁/m₂)x + 1]} dz

Let

    y = (z/2)[(m₁x/m₂) + 1]

then

    f_F(x) = [(m₁/m₂)^(m₁/2) x^(m₁/2 - 1) / (Γ(m₁/2)Γ(m₂/2) 2^((m₁+m₂)/2))]
             × ∫₀^∞ [2y/((m₁x/m₂) + 1)]^((m₁+m₂)/2 - 1) exp(-y)
               [2/((m₁x/m₂) + 1)] dy

or

    f_F(x) = [Γ((m₁ + m₂)/2)/(Γ(m₁/2)Γ(m₂/2))] (m₁/m₂)^(m₁/2) x^(m₁/2 - 1)
             × [(m₁x/m₂) + 1]^(-(m₁+m₂)/2),    x > 0

           = 0    elsewhere    (8.47)
EXAMPLE 8.14.

    S₁² = Σᵢ₌₁¹¹ (Xᵢ - X̄₁)²/10

and

    S₂² = Σᵢ₌₁₂³² (Xᵢ - X̄₂)²/20

where X̄₁ is the sample mean of the first 11 samples and X̄₂ is the mean of
the next 21 samples. If these 32 i.i.d. samples were from the same normal
random variable, find the probability that F₁₀,₂₀ exceeds 1.94.

SOLUTION:

From Appendix G
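The table value can also be checked numerically; the scipy sketch below (an
added illustration) gives the right-tail probability directly.

    from scipy.stats import f

    print(f.sf(1.94, dfn=10, dfd=20))      # approximately 0.10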
Aside from parameter estimation, the most used area of statistics is statistical
tests or tests of hypotheses. Examples of hypotheses tested are (1) The addition
o f fertilizer causes no change in the yield o f corn. (2) There is no signal present
in a specified time interval. (3) A certain medical treatment does not affect
cancer. (4) The random process is stationary. (5) A random variable has a normal
distribution.
We will introduce the subject o f hypothesis testing by considering an example
introduced in Chapter 6 and as we discuss this example, we will define the
general characteristics of hypothesis testing. Then we will introduce the concept
o f a composite alternative hypothesis. This will be followed by specific examples
o f hypothesis tests.
    H₀: Y = N    (8.48)

and

    H₁: Y = 1 + N    (8.49)
For reasons that will be more apparent when composite alternatives are
considered, the first hypothesis stated above, that is, that a 0 was transmitted,
will be called the null hypothesis, H0. The alternative that a 1 was transmitted
will be called the alternative hypothesis, H {. Obviously, these definitions could
be reversed, but in many examples one hypothesis is more attractive for des
ignation as the null hypothesis.
We must decide which alternative to accept and which to reject. We now assume
that we will accept H₀ if the received signal Y is below the value γ and we
will accept H₁ if the received signal is equal to or above γ. Note that the
exact value of γ has not yet been determined. However, we hope that it is
intuitively clear that small values of Y suggest H₀, whereas large values of Y
suggest H₁ (see Chapter 6 and Figure 8.10).

The set of values for which we reject H₀, that is, Y ≥ γ, is often called the
critical region of the test. We can make two kinds of errors:
The probability o f a type I error, often called the significance level of the
test, is (in the example)
(8.50)
    P(Type II error) = ∫_{-∞}^{γ} (1/√(2π)) exp[-(λ - 1)²/2] dλ    (8.51)

    .05 = ∫_{γ}^{∞} (1/√(2π)) exp(-λ²/2) dλ = Q(γ)    (8.52)

    γ ≈ 1.65

With γ set at 1.65, then the probability of a type II error is, using Equation
8.51,

    P(Type II error) = ∫_{-∞}^{1.65} (1/√(2π)) exp[-(λ - 1)²/2] dλ
                     = 1 - Q(.65) = .7422
Thus, in this example, the probability o f a type II error seems excessively high.
This means that when a 1 is sent, we are quite likely to call it a zero.
Can we reduce this probability of a type II error? Assuming that more than one
sample of the received signal can be taken and that the samples are
independent, then we could test the same hypothesis with

    Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ

and expect that with the same probability of a type I error, the probability
of a type II error would be reduced to an acceptable level. If, for example,
n = 10, then the variance of Ȳ would be 1/10 and Equation 8.52 would become

    .05 = ∫_{√10 γ}^{∞} (1/√(2π)) exp(-z²/2) dz

Thus

    Q(√10 γ) = .05    ⇒    γ = 1.65/√10 ≈ .52

    P(type II error) = ∫_{-∞}^{.52} (1/√(.2π)) exp[-(λ - 1)²/.2] dλ

                     = ∫_{-∞}^{-.48√10} (1/√(2π)) exp(-z²/2) dz

                     ≈ .065
Note that not only is the type II error reduced to a more acceptable level,
but also the dividing line γ for decision is now approximately halfway between
0 and 1, the intuitively correct setting of the threshold.
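These threshold and error computations are easily reproduced with a few lines
of Python (a sketch; Q(·) is the standard normal tail, evaluated here with
scipy):

    from scipy.stats import norm

    alpha = 0.05
    # n = 1: threshold and type II error for means 0 and 1, variance 1
    g1 = norm.isf(alpha)                     # approximately 1.65
    beta1 = norm.cdf(g1 - 1.0)               # approximately .74
    # n = 10: Ybar has variance 1/10
    s10 = (1.0/10)**0.5
    g10 = s10*norm.isf(alpha)                # approximately .52
    beta10 = norm.cdf((g10 - 1.0)/s10)       # approximately .065
    print(g1, beta1, g10, beta10)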
The previous example could have been formulated from a decision theory
point of view (see Chapter 6). The significant difference is that here we follow
    H₀: μ_Y = 0

However, under the composite alternative hypothesis the exact level or mean of
the received signal is now unknown, except we know that the transmitted signal
or mean of Y is now greater than zero, that is

    H₁: μ_Y = δ > 0

In this case the probability of a type I error and the critical region remain
unchanged. However, the probability of a type II error becomes

    1 - Q(γ - δ)    (8.55)
If the significance level o f the test were set at .05 as in the previous section,
Figure 8.11 Probability of type II error (power function).
then γ = 1.65 as before. In this case, the probability of a type II error is
plotted versus δ in Figure 8.11.

A plot like Figure 8.11, which shows the probability of a type II error versus
the value of a parameter in a composite alternative hypothesis, is called the
power of the test or the power function.
    H₀: μ = 0

    H₁: μ = δ ≠ 0
Assume 10 i.i.d. measurements are to be taken and that we accept H₀ if
|X̄| < γ. The variance of X̄ is

    σ²_X̄ = σ²_X/10 = 1/10

Now the probability, α, of a type I error, that is, rejecting H₀ when it is
true, is the probability that X̄ is too far removed from zero.

This does not completely define the critical region because X̄ could be too
large or too small. However, it is reasonable to assign half of the error
probability to X̄ being too large and half of the error probability to X̄
being too small. Thus

    P(X̄ > γ | H₀) = α/2    (8.56)

    P(X̄ < -γ | H₀) = α/2    (8.57)

    P(|X̄| ≤ γ) = 1 - α    (8.58)

If α = .01, then from Appendix D, γ/σ_X̄ = 2.58, and the critical region R is
    T = X̄/(S/√n)

to set up the hypothesis test. In Section 8.7.3 it was shown that this random
variable has the tₘ distribution, and at the α level of significance,

    P(|T| > t_{n-1; α/2}) = α
For instance, if n = 10 and α = .01, then t₉;.₀₀₅ = 3.25 (from Appendix F) and
the critical region R is

Thus we would compute X̄ and S from the sample data, and then if the ratio

    X̄/(S/√10)

falls within the critical region, the null hypothesis that μ = 0 would be
rejected.
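A sketch of the complete test in Python follows; the data vector is a
placeholder, not measurements from the text.

    import numpy as np
    from scipy.stats import t

    x = np.array([0.3, -1.2, 0.8, 2.1, -0.4,
                  1.5, 0.9, -0.2, 1.1, 0.6])   # placeholder sample
    n = len(x)
    T = x.mean()/(x.std(ddof=1)/np.sqrt(n))
    crit = t.ppf(1 - 0.01/2, df=n - 1)         # 3.25 for n = 10, alpha = .01
    print(T, crit, abs(T) > crit)              # True means reject H0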
versus

The random variable W = X̄₁ - X̄₂ will have mean μ₁ - μ₂, and if X̄₁ and X̄₂
are independent, it will have variance σ₁²/n₁ + σ₂²/n₂. Further, if σ₁ = σ₂,
then the variance is [(1/n₁) + (1/n₂)]σ², and if X₁ and X₂ are normal and we
assume that they have the same variance, then the difference in means will be
estimated by X̄₁ - X̄₂ and σ² will be estimated by

    S² = [Σᵢ₌₁^{n₁} (Xᵢ - X̄₁)² + Σⱼ₌₁^{n₂} (X_{n₁+j} - X̄₂)²] / (n₁ + n₂ - 2)
EXAMPLE 8.15.

We wish to test whether the two sets of samples, each of size four, are
consistent with the hypothesis that μ₁ = μ₂ at the 5% significance level. We
shall assume that the underlying random variables are normal, that
σ₁ = σ₂ = σ, X̄₁ and X̄₂ are independent, and the sample statistics are

    X̄₁ = 3.0    X̄₂ = 3.1

    S₁² = .0032    S₂² = .0028

SOLUTION:

From Appendix F we find that t₆;.₀₂₅ = 2.447. Thus, the computed value of T
must be outside ±2.447 in order to reject the null hypothesis. In this
example, using Equation 8.62

    T = (3.0 - 3.1) / √{(1/4 + 1/4)[3(.0032) + 3(.0028)]/6}
    (S₁²/σ₁²)/(S₂²/σ₂²)

will have an F distribution. Furthermore, if both samples are from the same
distribution, then σ₁² = σ₂². If this is the case, then the ratio of the
sample variances will follow the F distribution. Usually, it is possible to
also hypothesize that if the variances are not equal, then S₁² will be larger
than S₂². Thus, we usually test the null hypothesis

    H₀: σ₁² = σ₂²

versus

    H₁: σ₁² > σ₂²

by using the statistic S₁²/S₂². If the sample value of S₁²/S₂² is too large,
then the null hypothesis is rejected. That is, the critical region is the set
of S₁²/S₂² such that S₁²/S₂² > F_{m₁,m₂}, where F_{m₁,m₂} is a number
determined by the significance level, the degrees of freedom of S₁², and the
degrees of freedom of S₂².
EXAMPLE 8.16.

    H₀: σ₁² = σ₂²

versus

    H₁: σ₁² > σ₂²

SOLUTION:

The critical region is determined by the value of F_{m₁,m₂;α}. This value is
found in Appendix G to be 10.456.

Thus, if the ratio S₁²/S₂² > 10.456, the null hypothesis should be rejected.
However, if S₁²/S₂² ≤ 10.456, then the null hypothesis should not be rejected.
It was shown in Section 8.7.2 that the sum of the squares of zero-mean i.i.d.
normal random variables has a chi-square distribution. It can be shown that
the sum of squares of certain dependent normal random variables is also
chi-square. We now argue that the sum of squares of binomial random variables
is also chi-square in the limit as n → ∞.

That is, if X₁ is binomial, n, p₁, that is

    P(X₁ = i) = C(n, i) p₁ⁱ (1 - p₁)^(n-i),    i = 0, 1, . . . , n

and

    Y = (X₁ - np₁)/√(np₁(1 - p₁))    (8.63)

then Y will have a distribution which is standard normal as shown by the
central limit theorem.

If we now define

    Z₁,ₙ = Y² = (X₁ - np₁)²/[np₁(1 - p₁)]    (8.64)

and note, with X₂ = n - X₁ and p₂ = 1 - p₁, that

    X₁ - np₁ = n - X₂ - n(1 - p₂) = np₂ - X₂

then

    Z₁,ₙ = (X₁ - np₁)²/(np₁) + (X₂ - np₂)²/(np₂)    (8.65)

where

    Σᵢ₌₁ᵐ pᵢ = 1    and    Σᵢ₌₁ᵐ Xᵢ = n    (8.67)

    Z_{m-1,n} = Σᵢ₌₁ᵐ (Xᵢ - npᵢ)²/(npᵢ)    (8.68)

    H₀: p₁ = p₁,₀, p₂ = p₂,₀, . . . , pₘ = pₘ,₀    (8.69)
    H₁: pᵢ ≠ pᵢ,₀ for at least one i

If the null hypothesis is true, then Z_{m-1,n} will have an approximate
chi-square distribution with m - 1 degrees of freedom. If p₁,₀, p₂,₀, . . . ,
pₘ,₀ are the true values, then we expect that Z_{m-1,n} will be small. Under
the alternative hypotheses, Z_{m-1,n} will be larger. Thus, using a table of
the chi-square distribution, given the significance level α, we can find a
number χ²_{m-1;α} such that

    P(Z_{m-1,n} ≥ χ²_{m-1;α}) = α

and values of Z_{m-1,n} larger than χ²_{m-1;α} constitute the critical region.
EXAMPLE 8.17.
A die is to be tossed 60 times and the number of times each of the six faces
occurs is noted. Test the null hypothesis that the die is fair at the 5% significance
level given the following observed frequencies
Up face 1 2 3 4 5 6
No. of observances 7 9 8 11 12 13
SOLUTION:

Under the null hypothesis the expected number of observations of each face is
10 (npᵢ), and hence

    Z₅,₆₀ = [(7-10)² + (9-10)² + (8-10)² + (11-10)² + (12-10)² + (13-10)²]/10
          = 2.8

Note that this variable has 6 - 1 = 5 degrees of freedom because the number of
observations in the last category is fixed given the sample size and the
observed frequencies in the first five categories.

Since from Appendix E, χ²₅;.₀₅ = 11.07 > 2.8, we decide not to reject the null
hypothesis that the die is fair.
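The computation is a few lines in Python (a sketch using scipy for the table
value):

    from scipy.stats import chi2

    observed = [7, 9, 8, 11, 12, 13]
    expected = 10.0
    z = sum((o - expected)**2/expected for o in observed)   # 2.8
    print(z, chi2.isf(0.05, df=5))     # 2.8 < 11.07: do not reject H0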
    d = m - 1    (8.71)
EXAMPLE 8.18.

Test the hypothesis that the following test scores come from a normal
distribution. Use the 1% significance level.

    Score    Number of Observations
    99       1
    98       1
    94       1
    92       1
    91       1
    90       2
    89       2
    88       3
    85       1
    84       4
    83       3
    82       2
    81       2
    80       3
    79       2
    78       2
    77       1
    76       4
    75       5
    74       1
    73       2
SOLUTION:

    X̄ = 75.74

and

    S² ≈ [426873 - 73(75.74)²]/72 ≈ 112.57

    S ≈ 10.61

The observation intervals, expected values (each is 73 times the probability
that a Gaussian random variable with mean 75.74 and variance 112.57 is within
the observation interval), and observed frequencies are

    Observation Interval    Expected    Observed
    95.5-100                2.3         2
    90.5-95.5               3.7         3
    85.5-90.5               7.1         7
    80.5-85.5               10.8        12
    75.5-80.5               13.3        12
    70.5-75.5               13.1        14
    65.5-70.5               10.5        15
    60.5-65.5               6.7         3
    55.5-60.5               3.7         2
    0-55.5                  1.8         3

Because two parameters, that is, μ and σ², of the normal distribution were
estimated, there are 10 - 1 - 2 = 7 degrees of freedom, and

    Σ (observed - expected)²/expected = 6.045

The value of χ²₇;.₀₁ is 18.47; thus, 6.045 is not within the critical region
(>18.47). Therefore, the null hypothesis of normality is not rejected.
For testing against a one-sided (e.g., μ > 0) alternate hypothesis, we use a
one-sided test. When the alternate hypothesis is two-sided (μ ≠ 0) a two-sided
test is used.
In this section we seek an equation that "best" characterizes the relation
between X, a "controlled" variable, and Y, a "dependent" random variable. It
is said that we regress Y on X, and the equation is called regression. In many
applications we believe that a change in X causes Y to vary. However, we do
not assume that X causes Y; we only assume that we can measure X without error
and then observe Y. The equation is to be determined by the experimental data
such as shown in Figure 8.12.
Examples are the following:
    X                   Y
    Height of father    Height of son
    Time in years       Electrical energy consumed
    Voltage             Current
There are three parts of the problem to be solved: (1) What is the definition
of the "best” characterization of the relation? (2) What is the “ best” equation?
and (3) How good is the “ best” relation?
The accompanying model is Y = Ŷ + ε, where ε is the error. Our initial
approach to this problem will restrict consideration to a "best" fit of the
form Ŷ = b₀ + b₁X, that is, a straight line or linear regression. Thus, errors
between this "best" fit and the observed values will occur as shown in Figure
8.13.
The errors are defined by

    ε₁ = Y₁ - Ŷ₁ = Y₁ - (b₀ + b₁X₁)
     .
     .
    εₙ = Yₙ - Ŷₙ = Yₙ - (b₀ + b₁Xₙ)    (8.72)

    Yᵢ = b₀ + b₁Xᵢ + εᵢ,    i = 1, . . . , n    (8.73)

    e² = Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (Yᵢ - b₀ - b₁Xᵢ)²    (8.74)

    b̂₁ = Σ (XᵢYᵢ - X̄Ȳ) / Σ (Xᵢ² - X̄²)    (8.75)

    b̂₀ = Ȳ - b̂₁X̄    (8.76)

where

    X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ    (8.77)

    Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ    (8.78)

    ∂e²/∂b₀ = -2 Σ (Yᵢ - b₀ - b₁Xᵢ)    (8.79)

    ∂e²/∂b₁ = -2 Σ (Xᵢ)(Yᵢ - b₀ - b₁Xᵢ)    (8.80)
Setting these two equations equal to zero and assuming the solution produces
the values b̂₀ and b̂₁ which minimize e². The value of e² when b̂₀ and b̂₁ are
used will be called ê². Thus

    Σ (Yᵢ - b̂₀ - b̂₁Xᵢ) = 0

    Σ (Xᵢ)(Yᵢ - b̂₀ - b̂₁Xᵢ) = 0
We then rearrange these two equations into what is usually called the normal
form:

    Σ Yᵢ = nb̂₀ + b̂₁ Σ Xᵢ    (8.81)

    Σ XᵢYᵢ = b̂₀ Σ Xᵢ + b̂₁ Σ Xᵢ²    (8.82)

    b̂₀ = Ȳ - b̂₁X̄

    Σ (XᵢYᵢ) = (Ȳ - b̂₁X̄) Σ Xᵢ + b̂₁ Σ Xᵢ²

or

    b̂₁ = [Σ XᵢYᵢ - nX̄Ȳ] / [Σ Xᵢ² - nX̄²]    (8.83.a)

To show that Equation 8.83.a is equivalent to Equation 8.75, consider the
numerator of Equation 8.75. It can also be shown in the same fashion that the
denominators are equal (see Problem 8.39).
We have shown that differentiating Equation 8.74 and setting the result equal
to zero results in the solution given by Equations 8.75 and 8.76. We now prove
that it is indeed a minimum. Define

    Yᵢ* = a₀ + a₁Xᵢ

    e*² = Σ (Yᵢ - Yᵢ*)² ≥ Σ [Yᵢ - (b̂₀ + b̂₁Xᵢ)]² = ê²
thus showing that b̂₀ and b̂₁ as defined by Equations 8.75 and 8.76 do indeed
minimize the mean squared error:

    e*² = Σ [Yᵢ - (a₀ + a₁Xᵢ)]²

        = Σ [(Yᵢ - b̂₀ - b̂₁Xᵢ) + (b̂₀ - a₀) + (b̂₁ - a₁)Xᵢ]²

        = Σ (Yᵢ - b̂₀ - b̂₁Xᵢ)² + Σ [(b̂₀ - a₀) + (b̂₁ - a₁)Xᵢ]²
          + 2(b̂₀ - a₀) Σ (Yᵢ - b̂₀ - b̂₁Xᵢ)
          + 2(b̂₁ - a₁) Σ Xᵢ(Yᵢ - b̂₀ - b̂₁Xᵢ)

The last two terms are equal to 0 by Equation 8.81 and Equation 8.82. Thus

    b̂₁ = Σ (Xᵢ - X̄)(Yᵢ - Ȳ) / Σ (Xᵢ - X̄)²    (8.83.b)

where the equivalence of the denominators of Equation 8.75 and Equation 8.83.b
is shown similarly.
To summarize, the minimum sum of squared errors occurs when the linear
regression is

    Ŷ = b̂₀ + b̂₁X    (8.84.a)

where b̂₀ is given by Equation 8.76 and b̂₁ is given by Equation 8.75. The
errors using this equation are

    ε̂ᵢ = Yᵢ - Ŷᵢ    (8.84.b)

and are called residual errors. The minimum sum of residual squared errors is

    ê² = Σᵢ₌₁ⁿ ε̂ᵢ²    (8.84.c)
EXAMPLE 8.19.

    X     Y
    -2    -4
    -1    -1
     0     0
    +1    +1
    +2    +4

SOLUTION:

    X̄ = 0    Ȳ = 0

    Σ XᵢYᵢ = 8 + 1 + 0 + 1 + 8 = 18

    Σ Xᵢ² = 4 + 1 + 1 + 4 = 10

    b̂₁ = 18/10 = 1.8,    b̂₀ = 0

    Ŷᵢ = 1.8 Xᵢ

    Xᵢ    Yᵢ    Ŷᵢ      ε̂ᵢ     ε̂ᵢ²
    -2    -4    -3.6    -.4    .16
    -1    -1    -1.8    +.8    .64
     0     0     0       0     0
     1     1     1.8    -.8    .64
     2     4     3.6    +.4    .16
We have found a best fit and shown the errors that result from this fit. Are
the errors too large or are they small enough? The next subsection considers
this important question.
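The calculation in Example 8.19 can be reproduced directly from Equations 8.75
and 8.76, as in the Python sketch below.

    import numpy as np

    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
    n = len(x)
    b1 = ((x*y).sum() - n*x.mean()*y.mean()) / ((x**2).sum() - n*x.mean()**2)
    b0 = y.mean() - b1*x.mean()
    resid = y - (b0 + b1*x)
    print(b0, b1, (resid**2).sum())    # 0.0, 1.8, 1.6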
    Ŷᵢ = b̂₀ + b̂₁Xᵢ

    Ŷᵢ - Ȳ = b̂₁(Xᵢ - X̄)    (8.85)

Continuing to look at the residual errors, we show the sum of the residual
errors is zero. Using Equation 8.85, we have

    Ŷᵢ = Ȳ + b̂₁(Xᵢ - X̄)

    Σ (Yᵢ - Ȳ)² = Σ (Yᵢ - Ŷᵢ)² + Σ (Ŷᵢ - Ȳ)²    (8.86)

The left-hand term is the sum of the squared error about the mean; the first
right-hand term is the sum of the squared error about the linear regression;
and the last term is the sum of the squared difference between the regression
line and the mean.

Equation 8.86 can be derived as follows:

    Yᵢ - Ŷᵢ = Yᵢ - Ȳ + Ȳ - Ŷᵢ = (Yᵢ - Ȳ) - (Ŷᵢ - Ȳ)

Using Equation 8.85, the third (last) term in the foregoing equation may be
rewritten as

    -2 Σ (Yᵢ - Ȳ)(Ŷᵢ - Ȳ) = -2 b̂₁² Σ (Xᵢ - X̄)²

    -2 Σ (Yᵢ - Ȳ)(Ŷᵢ - Ȳ) = -2 Σ (Ŷᵢ - Ȳ)²
    R² ≥ 0

Further, from Equation 8.86 and the fact that sums of squares are nonnegative,

    Σ (Ŷᵢ - Ȳ)² ≤ Σ (Yᵢ - Ȳ)²

    0 ≤ R² ≤ 1

Furthermore, a large R² implies a good fit and a small R² implies a fit that
is of little value. A very rough rule of thumb is that R² > .8 implies a good
fit. For a better measure of fit, the concept of degrees of freedom is needed.
    S = S₁ + S₂    (8.88)

where

    S = Σᵢ₌₁ⁿ (Yᵢ - Ȳ)²    (8.89)

    S₁ = Σᵢ₌₁ⁿ (Yᵢ - Ŷᵢ)²    (8.90)

    S₂ = Σᵢ₌₁ⁿ (Ŷᵢ - Ȳ)²    (8.91)

The degrees of freedom associated with a sum of squares indicates how many
independent pieces of information are contained in the sum of squares. For
example, Σᵢ₌₁ⁿ Yᵢ has n degrees of freedom, whereas Σ (Yᵢ - Ȳ)² has n - 1
degrees of freedom because

    Σᵢ₌₁ⁿ (Yᵢ - Ȳ) = 0

If we form the ratio

    F = (S₂/1)/[S₁/(n - 2)] = Σ (Ŷᵢ - Ȳ)² / [Σ (Yᵢ - Ŷᵢ)²/(n - 2)]    (8.92)
then F will have a larger value when the fit is good and a smaller value when
the fit is poor.

If b₁ = 0, then the ratio given in Equation 8.92 has an F distribution
(assuming that the underlying variables are normal) with degrees of freedom
n₁ = 1 and n₂ = n - 2, and we can use an F test for testing if the fit is good
or not. Specifically, if H₀ is the null hypothesis that b₁ = 0 (indicating a
bad fit), then H₀ is rejected when F > F_{1,n-2;α}, where F_{1,n-2;α} is
obtained from the table of F distribution and α is the significance level.
EXAMPLE 8.20.

Test whether the fit obtained in Example 8.19 is good at a significance level
of .10.

SOLUTION:

From the data given in Example 8.19, we calculate S₂ = 32.4 with one degree of
freedom and S₁ = 1.6 with three degrees of freedom. Hence

    F = (32.4/1)/(1.6/3) = 60.75    with n₁ = 1 and n₂ = 3.

From Appendix G we obtain F₁,₃;.₁₀ = 5.54 and since F = 60.75 > 5.54 we judge
the fit to be good.
Statisticians use an analysis of variance table (ANOVA) to analyze the sums of
squared errors (S, S₁, and S₂) and judge the goodness of fit. The interested
reader may find details of ANOVA in Reference [3]. Commercially available
software packages, in addition to finding the best linear fit, also produce
ANOVA tables.
    X₁                  X₂                      Y
    Height of father    Height of mother        Height of child
    Time in years       Heating degree days     Electric energy consumed
    Voltage             Resistance              Current
    Impurity density    Cross-sectional area    Strength
    Height              Age                     Weight
We first assume that there are two “ controlled” variables and minimize the
mean-square error just as was done in the case o f simple linear regression. This
results in a set o f three linear equations in three unknowns that must be solved
for b0, bx, and b2. Next we solve the simple linear regression problem again,
this time by using matrix notation so that the technique can be used to solve
the general case o f n “ controlled” variables.
    Ŷ = b₀ + b₁X₁ + b₂X₂    (8.93)

    Yᵢ = b₀ + b₁X₁,ᵢ + b₂X₂,ᵢ + εᵢ    (8.94)

    εᵢ = Yᵢ - Ŷᵢ = Yᵢ - b₀ - b₁X₁,ᵢ - b₂X₂,ᵢ    (8.95)

    e² = Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (Yᵢ - b₀ - b₁X₁,ᵢ - b₂X₂,ᵢ)²    (8.96)
Just as in the last section, the sum of the squared errors is differentiated
with respect to the unknowns and the results are set equal to zero. The b's
are replaced by the b̂'s, which represent the values of the b's that minimize
e².

    ∂e²/∂b₀ = -2 Σ (Yᵢ - b̂₀ - b̂₁X₁,ᵢ - b̂₂X₂,ᵢ) = 0

    ∂e²/∂b₁ = -2 Σ (X₁,ᵢ)(Yᵢ - b̂₀ - b̂₁X₁,ᵢ - b̂₂X₂,ᵢ) = 0

    ∂e²/∂b₂ = -2 Σ (X₂,ᵢ)(Yᵢ - b̂₀ - b̂₁X₁,ᵢ - b̂₂X₂,ᵢ) = 0    (8.97)
    b̂₀ + b̂₁X̄₁ + b̂₂X̄₂ = Ȳ    (8.98)

    b̂₀ Σ X₁,ᵢ + b̂₁ Σ X₁,ᵢ² + b̂₂ Σ X₁,ᵢX₂,ᵢ = Σ X₁,ᵢYᵢ    (8.99)

    b̂₀ Σ X₂,ᵢ + b̂₁ Σ X₁,ᵢX₂,ᵢ + b̂₂ Σ X₂,ᵢ² = Σ X₂,ᵢYᵢ    (8.100)
These three linear equations with three unknown coefficients define the best
fitting plane. They are solved in Problem 8.43. We now introduce matrix notation
in order to solve the general n controlled variable regression problem. In order
to introduce the notation, the simple (one “ controlled” variable) linear regres
sion problem is solved again.
    Y = [Y₁, . . . , Y₁₀]ᵀ

    X = the 10 × 2 matrix whose ith row is [1  Xᵢ]

    b = [b₀, b₁]ᵀ,    b̂ = [b̂₀, b̂₁]ᵀ

    e = [ε₁, . . . , ε₁₀]ᵀ

Y is a 10 × 1 vector, X is a 10 × 2 matrix, b and b̂ are 2 × 1 vectors, and e
is a 10 × 1 vector.
The equations that correspond with Equations 8.73 and 8.74 are

    Y = Xb + e    (8.101)

    eᵀe = Σ εᵢ²    (8.102)

The resulting normal equations that correspond with Equations 8.81 and 8.82
are

    XᵀY = XᵀXb̂    (8.103)

The solutions for b̂₀ and b̂₁ as given in Equations 8.75 and 8.76 are

    b̂ = (XᵀX)⁻¹XᵀY    (8.104)
In this case, the inverse (XᵀX)⁻¹ will exist if there is more than one value
of X. In the case of more than one controlled variable, the inverse may not
exist. We will discuss this difficulty in the next subsection.

The resulting simple linear regression is

    Ŷ = Xb̂    (8.105.a)

    ê = [ε̂₁, . . . , ε̂₁₀]ᵀ    (8.105.b)

where

    ε̂ᵢ = Yᵢ - Ŷᵢ    (8.105.c)

and

    ê² = Σ ε̂ᵢ²    (8.105.d)

In checking the matrix forms versus the earlier scalar results, observe that

    XᵀX = [ n       Σ Xᵢ  ]
          [ Σ Xᵢ    Σ Xᵢ² ]

    XᵀY = [ Σ Yᵢ   ]
          [ Σ XᵢYᵢ ]

Note that the solution is given in Equations 8.104 and 8.105 and the reader
can verify this result is the same as Equations 8.75, 8.76, and 8.84.
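The same numbers fall out of the matrix formulation. The numpy sketch below
solves the normal equations, Equation 8.110, for the data of Example 8.19.

    import numpy as np

    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
    X = np.column_stack([np.ones_like(x), x])   # rows are [1, x_i]
    b = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
    print(b)                                    # [0.0, 1.8], as in Example 8.19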
linear regression is

    Y = Xb + e    (8.106)

    Ŷ = Xb̂    (8.107)

where

    eᵀe = (Y - Xb)ᵀ(Y - Xb)
        = YᵀY - bᵀXᵀY - YᵀXb + bᵀXᵀXb
        = YᵀY - 2bᵀXᵀY + bᵀXᵀXb    (8.108)

Note that Equation 8.108 is a scalar equation. The normal equations are found
by partially differentiating Equation 8.108 and setting the results equal to
zero, that is

    ∂(eᵀe)/∂b |_{b = b̂} = 0

This results in the normal equations (see Equations 8.98, 8.99, and 8.100):

    XᵀXb̂ = XᵀY    (8.109)

    b̂ = (XᵀX)⁻¹XᵀY    (8.110)
In this case the inverse of the matrix (XᵀX) may not exist if some of the
"controlled" variables Xᵢ are linearly related. In some practical situations
the dependence between Xᵢ and Xⱼ cannot be avoided, and a colinear effect may
cause (XᵀX) to be singular, which precludes a solution, or to be nearly
singular, which can cause significant problems in estimating the components of
b̂ and in interpreting the results. In this case it is usually best to delete
one of the linearly dependent variables from the set of controlled variables.
See Reference [3].
    Σᵢ₌₁ⁿ (Yᵢ - Ȳ)²  =  Σᵢ₌₁ⁿ (Yᵢ - Ŷᵢ)²  +  Σᵢ₌₁ⁿ (Ŷᵢ - Ȳ)²
    (n - 1) degrees     (n - p) degrees      (p - 1) degrees
    of freedom          of freedom           of freedom

Proceeding along the lines of discussion given in Section 8.9.2, we can form
the test statistic

    F = [Σ (Ŷᵢ - Ȳ)²/(p - 1)] / [Σ (Yᵢ - Ŷᵢ)²/(n - p)]
    X     Y
    -2    4
    -1    1
     0    0
     1    1
     2    4

One would find that b̂₁ = 0 and that the fit is useless (check this). It seems
obvious that we should try to fit the model Y = b₀ + b₁X + b₂X². But is this
linear regression that we have studied? The answer is "yes" because the model
is linear in the parameters: b₀, b₁, and b₂. We can define X₁ = X and X₂ = X²
and this is a multiple linear regression model. There is no requirement that
the controlled variables Xᵢ be statistically independent; however, colinear
controlled variables tend to cause problems, as previously mentioned.

Thus, standard computer programs which perform multiple linear regression
include the possibility of fitting curves of the form
Y = b₀ + b₁X + b₂X² + b₃X³ or
Y = b₀ + b₁X₁ + b₂X₂ + b₃X₁² + b₄X₁X₂ + b₅X₂², and so on. These models are
called linear regression because they are linear in the parameters. We present
an example.
EXAMPLE 8.21.

    X     Y
    -3    19
    -2    11
    -1     4
    -1     5
     0     1
     1     8
     2    15
     2    13
     3    28
     4    41
     4    40
     5    63
     5    57

(a) Find Σ(Yᵢ - Ȳ)².

(b) Fit Y = b₀ + b₁X and find Σ(Yᵢ - Ŷᵢ)² and compare it with Σ(Yᵢ - Ȳ)².
    Sketch the regression line.

(c) Fit Y = b₀′ + b₁′X + b₂′X² and find Σ(Yᵢ - Ŷᵢ)² and compare it with
    Σ(Yᵢ - Ȳ)². Sketch the regression line.

(d) Fit Y = b₀″ + b₁″X + b₂″X² + b₃″X³. Find Σ(Yᵢ - Ŷᵢ)² and compare with
    Σ(Yᵢ - Ȳ)². Sketch the regression line.

(e) Comment on which is the best fit.

SOLUTION:

(a)    Σ (Yᵢ - Ȳ)² ≈ 5862.8

(b)    Ŷ = 22.848 + 2.843X

       Σ (Yᵢ - Ŷᵢ)² = 4725
(e) The data, along with the fits, are shown in Figure 8.16. It is clear from
the plot and from the squared errors given above that (c) is the best fit.
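The fits in this example are conveniently computed with a least-squares
polynomial routine. The Python sketch below compares the residual sums of
squares for the three models (values computed this way may differ in the last
digit from those quoted in the text).

    import numpy as np

    x = np.array([-3., -2, -1, -1, 0, 1, 2, 2, 3, 4, 4, 5, 5])
    y = np.array([19., 11, 4, 5, 1, 8, 15, 13, 28, 41, 40, 63, 57])
    for deg in (1, 2, 3):
        c = np.polyfit(x, y, deg)          # least-squares polynomial fit
        r = y - np.polyval(c, x)
        print(deg, (r**2).sum())           # residual sum of squares per model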
8.11 SUMMARY
After measurements and statistics were discussed, this chapter presented
theory and applications of three basic statistical problems. The first was how
to estimate an unknown parameter from a sample of i.i.d. observations or
measurements. We emphasized point (rather than interval) estimators, and we
also emphasized classical (rather than Bayesian) estimators. The maximum
likelihood criterion was used to derive the form of the point estimators, and
the bias and variance or MSE were used to judge the “ quality” of the esti
mators.
The second statistical problem discussed was how to test a hypothesis using
observations or measurements. Specific tests discussed were tests of means,
difference between means, variances, and chi-square tests of goodness of fit.
All of these tests involved the following steps: (1) finding the statistic
Y = g(X₁, X₂, . . . , Xₙ) on which to base the test, (2) finding f_{Y|H₀} and
f_{Y|H₁}, and (3) given the significance level, finding the critical region
for the test and evaluating the power of the test. This differed from the
hypothesis testing of Chapter 6 primarily because the alternative hypotheses
are composite.
Finally, the problem of fitting curves to data was discussed using simple linear
regression and multiple linear regression. This is another case of finding lin
ear (affine) estimators. It differs from Chapter 7 because here we assume that
the required means and covariances are unknown; thus they must be esti
mated from data. This estimation, that is, curve fitting, differs from the pa
rameter estimation discussed at the beginning of the chapter because in curve
fitting the value of the dependent random variable is to be estimated based
upon observation(s) of one or more related (controlled) random variables.
Parameters of a distribution are estimated from a sample (usually more than
one observation) from that distribution.
8.12 REFERENCES
    Y₁ = X̄,    Y₂ = X₂, . . . , Yₙ = Xₙ    (A8.1)

then

    X₁ = nX̄ - Σᵢ₌₂ⁿ Xᵢ = nY₁ - Σᵢ₌₂ⁿ Yᵢ

    f_{X₁,...,Xₙ}(x₁, . . . , xₙ)
        = (2πσ²)^(-n/2) exp{-Σᵢ₌₁ⁿ (xᵢ - μ)²/(2σ²)}    (A8.2)

Now

    Σᵢ₌₁ⁿ (xᵢ - μ)² = Σᵢ₌₁ⁿ (xᵢ - x̄ + x̄ - μ)²

                    = Σᵢ₌₁ⁿ (xᵢ - x̄)² + n(x̄ - μ)²    (A8.3)

The transformation A8.1 has a Jacobian of n; thus, using Equations A8.1 and
A8.3

    f_{Y₁,...,Yₙ}(y₁, . . . , yₙ) = n(2πσ²)^(-n/2)
        × exp{-[(ny₁ - y₂ - · · · - yₙ - y₁)² + Σᵢ₌₂ⁿ (yᵢ - y₁)²
                + n(y₁ - μ)²]/(2σ²)}    (A8.4)

    f_{Y₁}(y₁) = f_X̄(y₁) = [1/(√(2π) σ/√n)] exp{-(y₁ - μ)²/(2σ²/n)}

Thus

    f_{Y₂,...,Yₙ|Y₁}(y₂, . . . , yₙ|y₁) = f_{X₂,...,Xₙ|X̄}(y₂, . . . , yₙ|y₁)
        = √n (2πσ²)^(-(n-1)/2)
        × exp{-[(ny₁ - y₂ - · · · - yₙ - y₁)² + Σᵢ₌₂ⁿ (yᵢ - y₁)²]/(2σ²)}  (A8.5)
    E{exp[jω(n - 1)S²/σ²] | Y₁ = y₁}
        = ∫ · · · ∫ exp{jωq/σ²} √n (2πσ²)^(-(n-1)/2)
          × exp{-q/(2σ²)} dy₂ · · · dyₙ

where

    q = (ny₁ - y₂ - · · · - yₙ - y₁)² + Σᵢ₌₂ⁿ (yᵢ - y₁)²

      = Σᵢ₌₁ⁿ (xᵢ - x̄)² = (n - 1)s²

Thus

    E{exp[jω(n - 1)S²/σ²] | Y₁ = y₁}
        = [1/(1 - j2ω)]^((n-1)/2) ∫ · · · ∫ √n [2πσ²/(1 - j2ω)]^(-(n-1)/2)
          × exp{-q(1 - j2ω)/(2σ²)} dy₂ · · · dyₙ

and, because the remaining integral is one,

    E{exp[jω(n - 1)S²/σ²] | Y₁ = y₁} = [1/(1 - j2ω)]^((n-1)/2)    (A8.6)

Note that Equation A8.6 is the same as Equation 8.39 with n - 1 = m. Thus
(n - 1)S²/σ² has a conditional standard chi-square distribution with (n - 1)
degrees of freedom.

Furthermore, because the conditional characteristic function of (n - 1)S²/σ²,
given Y₁ = y₁ or X̄ = y₁, does not depend upon y₁, we have shown that X̄ and
(n - 1)S²/σ² are independent. Thus, (n - 1)S²/σ² is also unconditionally
chi-square. We have thus proved assertions (1) and (2).
Finally, because X̄ and (n - 1)S²/σ² are independent and (n - 1)S²/σ² has a
standard chi-square distribution with (n - 1) degrees of freedom and because
√n(X̄ - μ)/σ will be a standard normal random variable, then for n i.i.d.
measurements from a Gaussian random variable

    Tₙ₋₁ = [√n(X̄ - μ)/σ] / √{[(n - 1)S²/σ²]/(n - 1)}    (A8.7)

by Equation 8.44. Thus

    Tₙ₋₁ = √n(X̄ - μ)/S    (A8.8)

will have the density defined in Equation 8.45, thus proving assertion (3).
8.14 PROBLEMS
8.1  The diameters in inches of 50 rivet heads are given in the table.
     Construct (a) an empirical distribution function, (b) a histogram, and
     (c) Parzen's estimator using g(y) and h(n) as suggested in Equation 8.6.
.338 .342
.341 .350
.337 .346
.353 .354
.351 .348
.330 .349
.340 .335
.340 .336
.328 .360
.343 .335
.346 .344
.354 .334
.355 .333
.329 .325
.324 .328
.334 .349
.355 .333
.366 .326
.343 .326
.341 .355
.338 .354
.334 .337
.357 .331
.326 .337
.333 .333
8.2 Generate a sample o f size 100 from a uniform (0, 1) random variable by
using a random number table. Plot the histogram. Generate a second
sample of size 100 and compare the resulting histogram with the first.
8.3 Generate a sample o f size 1000 from a normal (2, 9) random variable and
plot the empirical distribution function and histogram. (A computer pro
gram is essential.)
8.4  Using the data from Example 8.1, compute X̄ and S² from
a. The data in column 1
b. The data in column 2
c. The data in column 3
d. The data in column 4
e. The data in column 5
f. The data in column 6
g. The data in column 7
Compare the results. Is there more variation in the estimates of the mean
or the estimates of the variance?
8.5 When estimating the mean in Problem 8.4, how many estimators were
used? How many estimates resulted?
8.6  Let p represent the probability that an integrated circuit is good. Show
     that the maximum likelihood estimator of p is N_c/n, where N_c is the
     number of good circuits in n independent trials (tests).
8.7 Find the maximum likelihood estimators of p. and a 2 from n i.i.d. mea
surements from a normal distribution.
8.8  Find the maximum likelihood estimator of λ based on n i.i.d. measurements
     of a Poisson random variable:

         P(X = x) = (λˣ/x!) exp(-λ),    x = 0, 1, . . . ;    λ > 0.
8.9 Assume that the probability that a thumbtack when tossed comes to rest
with the point up is P, a random variable. Assuming that a thumbtack is
selected from a population of thumbtacks that are equally likely to have
P be any value between zero and one, find the a posteriori probability
density of P for the selected thumbtack, which has been tossed 100 times
with point up 90 times.
8.10  The number of defects in 100 yards of metal is Poisson distributed; that
      is

          P(X = k | Λ = λ) = (λᵏ/k!) exp(-λ)
8.13  Does ε = ε_b + ε_r? If not, find an equation that relates these
      quantities.
8.17 Find the bias of the maximum likelihood estimator of 0 that was found in
Example 8.6.
          F_X(m) = .5

      Find m̂, the estimator of the median derived from the empirical
      distribution function.
8.20 a. If X is uniform (0, 10) and 20 cells are used in a histogram with
200 samples, find the bias, MSE, and normalized RMS error in the his
togram.
      is normal with

          E[X̄] = μ,    Var[X̄] = σ²/n
8.24  Let

          Z = (1/n) Σᵢ₌₁ⁿ Xᵢ²

      where the Xᵢ's are normal and independent with mean zero and variance
      σ². Show that

          Var[Z] = 2σ⁴/n
8.25 Show that the product of f Yl as given in the equation following Equation
A8.4 and as given in Equation A8.5 produce Equation A8.4.
8.28  With five i.i.d. samples from a normal R.V., find the probability that

      a. S²/σ² ≤ .839
      b. S²/σ² ≥ 1.945
      c. .839 < S²/σ² < 1.52
8.30 If a 2 is known rather than being estimated by S2, repeat Problem 8.29 and
compare results.
8.31  Refer to Example 8.16. Find α such that P[F > 4.876] = α.
8.32  If one suspected that the two samples in Example 8.16 might not have
      come from the same random variable, and then one computed the ratio
      S₁²/S₂² = 7, what is the conclusion?
8.34  If X is normal and the variance is unknown, find the critical region for
      testing the null hypothesis μ = 10 versus the alternative hypothesis
      μ ≠ 10, if the significance level of the test is 1%. Find the answer if
      the number of observations is 1, 5, 10, 20, 50.
8.36  Test at the 10%, 5%, and 1% levels of significance the hypothesis that
      σ₁² = σ₂² if both S₁² and S₂² have 20 degrees of freedom, that is, find
      the three critical regions for F.
8.37 Test the hypothesis that the following data came from an exponential
distribution. Use 1% significance level.
    Σ Xᵢ² - nX̄² = Σ (Xᵢ - X̄)²

    Σ (Ŷᵢ - Ȳ)² = b̂₁ [Σ XᵢYᵢ - nX̄Ȳ]
8.41 Compare Equations 8.75 and 8.76 with Equations 7.4 and 7.5.
8.42  Show that

          Var[b̂₀] = σ² Σ xᵢ² / [n Σ (xᵢ - x̄)²]

      where σ² is the variance of Y.
8.43  Solve Equations 8.98, 8.99, and 8.100 for b̂₀, b̂₁, and b̂₂.

8.44

          X₁:   1    4    9   11    3    8    5   10    2    7    6
          X₂:   8    2   -8  -10    6   -6    0  -12    4   -2   -4
          Y:    6    8    1    0    5    3    2   -4   10   -3    5

      a. Find the best fit of the form

             Y = b₀ + b₁X₁

      b. Find the best fit of the form Y = b₀ + b₁X₁ + b₂X₂ and compute the
         normalized RMS error of the fit and compare it with (a).

      c. At a significance level of 0.01, test if the fits obtained in (a)
         and (b) are good.
    1982
    Q1    2885.00    36983.72    3200.00      11.00
    Q2    3762.00    37006.05    3231.00     414.00
    Q3    4218.00    36563.06    3251.00    1421.00
    Q4    3004.00    36611.00    3271.00     105.00
    1983
    Q1    2739.00    35574.63    3291.00       1.00
    Q2    3618.00    35677.61    3311.00     378.00
    Q3    4464.00    35267.69    3311.75    1440.00
    Q4    3292.00    36453.88    3305.50      64.00
    1984
    Q1    3164.00    36145.13    3310.25       0.00
    Q2    4047.00    36151.49    3310.00     585.00
    Q3    4651.00    36082.95    3307.75    1354.00
    Q4    2813.00    36453.88    3305.50      71.00
    1985
    Q1    3400.00    36353.26    3303.25      11.00
    Q2    3813.00    36152.60    3301.00     548.00
    Q3    4439.00    35944.74    3305.25    1334.00
    Q4    3137.00    35965.61    3305.00      39.00
    1986
    Q1    3023.00    35910.29    3303.36      26.00
    Q2    4033.00    36017.64    3303.95     673.00
    Q3    4698.00    35428.08    3305.29    1432.50
    Q4    2960.00    35551.60    3306.01     105.00
CHAPTER NINE

Estimating the Parameters of Random Processes
9.1 INTRODUCTION
In the first part of the book (Chapters 2 through 7), we developed probabilistic
models for random signals and noise and used these models to derive signal
extraction algorithms. We assumed that the models as well as the parameters
o f the models such as means, variances, and autocorrelation functions are known.
Although this may be the case in some applications, in a majority of practical
applications we have to estimate (or identify) the model structure as well as
estimate parameter values from data. In Chapter 8 we developed procedures
for estimating unknown values of parameters such as means, variances, and
probability density functions. In this chapter we focus on the subject of estimating
autocorrelation and power spectral density functions.
Autocorrelation and power spectral density functions describe the (second-
order) time domain and frequency domain structure o f stationary random proc
esses. In any design using the MSE criteria, the design will be specified in terms
of the correlation functions or equivalently the power spectral density functions
of the random processes involved. When these functions are not known, we
estimate them from data and use the estimated values to specify the design.
We emphasize both model-based (or parametric) and model-free (or nonparametric) estimators. Model-based estimators assume definite forms for autocorrelation and power spectral density functions. Parameters of the assumed model are estimated from data, and the model structure is also tested using the data. If the assumed model is correct, then accurate estimates can be obtained using relatively few data. Model-free estimators are based on relatively general assumptions and require more data. Thus, models can be used to reduce the amount of data required to obtain "satisfactory" estimates. However, a model that substitutes assumptions for data will produce "satisfactory" estimates only if the assumptions are correct.
In the first part of this chapter we emphasize model-free estimators for autocorrelation and power spectral density functions. The remainder of the chapter is devoted to model-based estimators using ARIMA (autoregressive integrated moving average) models. We use this Box-Jenkins type of parametric estimator because we feel that autoregressive and moving-average models are convenient and useful parametric models for random sequences and because Box-Jenkins algorithms provide well-developed and systematic procedures for identifying the model, estimating the parameters, and examining the adequacy of the fit. Software packages for processing data using the Box-Jenkins procedure are commercially available.
Throughout this chapter we will emphasize digital processing techniques for
estimation.
9.2 TESTS FOR STATIONARITY AND ERGODICITY

then the process is ergodic. Thus, if we can assume that the process is Gaussian and that this condition is met, then testing for stationarity is equivalent to testing for ergodicity.
Various tests for stationarity are possible. For instance, one can test the
hypothesis that the mean is stationary by a test o f hypothesis introduced in
Section 8.8.4. In addition, the ARIMA model estimation procedure suggested later in this chapter provides some automatic tests related to stationarity, and in fact suggests ways of creating stationary functions from nonstationary ones.
Thus, stationarity can and should be tested when estimating the parameters of
a random process. However, practical considerations militate against completely
stationary processes. Rather, random sequences encountered in practice may
be classified into three categories:
1. Those that are stationary over long periods of time: The underlying process seems stationary, for example, as with thermal-noise and white-noise generators, and the resulting data do not fail standard stationarity tests such as those mentioned later. No series will be stationary indefinitely; however, conditions may be stationary for the purpose of the situation being analyzed.
2. Those that may be considered stationary for short periods of time: For example, the height of ocean waves may be stationary for short periods when the wind is relatively constant and the tide does not change significantly. In this case, the data and their subsequent interpretation and use must be limited to these periods.
3. Sequences that are obviously nonstationary: Such series possibly may be
transformed into (quasi-)stationary series as suggested in a later section.
Figure 9.1 Sample function of a random sequence.
or for nonstationary trends in the data. First we will describe the test and then
we will suggest how it can be applied to test for stationarity. In Section 9.4.7,
we use this test for “ whiteness” of a noise or error sequence.
Assume that we have 2N samples. N of these samples will be above (a) the
median and N of these samples will be below (b) the median. (Ties and an odd
number of samples can be easily coped with.)
The total number of possible distinct arrangements of the N a's and the N b's is the binomial coefficient (2N choose N). If the sample is random and i.i.d., then each of the possible arrangements is equally likely.
is equally likely. A possible sequence with (N = 8) is
a a b b b a b b b b a a a b a a
The total number of clusters of a’s and b’s will be called R for the number
of runs. R = 7 in the foregoing sequence.
It can be shown [12] that the probability mass function for R is
E{R} = N + 1   (9.2)

Var{R} = N(N − 1)/(2N − 1)   (9.3)
If the observed value of R is close to the mean, then randomness seems reasonable. On the other hand, if R is either too small or too large, then the null hypothesis of randomness should be rejected. Too large or too small is decided on the basis of the distribution of R. This distribution is tabulated in Appendix H.
EXAMPLE 9.1.

Test the given sequence for randomness at the 10% significance level.

SOLUTION:

In this sequence N = 8,

E[R] = 9

and

Var[R] = 8(7)/15 ≈ 3.73

Using R as the statistic for the run test, the limits found at .95 and .05 from Appendix H with N = 8 are 5 and 12. Because the observed value of 7 falls between 5 and 12, the hypothesis of randomness is not rejected at the 10% significance level.
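The run count and the moments of Equations 9.2 and 9.3 are easy to compute numerically. The following is a minimal sketch (ours, not from the text; Python with numpy is assumed, and the helper name run_count is hypothetical):

```python
import numpy as np

def run_count(labels):
    """Number of runs R in a binary sequence of a's (True) and b's (False).

    A new run starts wherever the label changes, so R is one more than
    the number of adjacent unequal pairs."""
    labels = np.asarray(labels, dtype=bool)
    return 1 + int(np.sum(labels[1:] != labels[:-1]))

# The 16-sample sequence of Example 9.1: a -> True, b -> False.
seq = np.array([1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1], dtype=bool)
N = int(seq.sum())                    # 8 a's (and 8 b's)
R = run_count(seq)                    # 7 runs, as in the text
mean_R = N + 1                        # Equation 9.2
var_R = N * (N - 1) / (2 * N - 1)     # Equation 9.3
print(R, mean_R, round(var_R, 2))     # 7 9 3.73
```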
In order to test stationarity of the mean square values for either a random sequence or a random process, the time-average mean square values are computed for each of 2N nonoverlapping equal time intervals by (continuous data)

X̄ᵢ² = [1/(Tᵢ − Tᵢ₋₁)] ∫ from Tᵢ₋₁ to Tᵢ x²(t) dt

or by (sample data)

X̄ᵢ² = (1/Nᵢ) Σ_{tⱼ ∈ (Tᵢ₋₁, Tᵢ]} x²(tⱼ)

where Tᵢ₋₁ and Tᵢ are respectively the lower and the upper time limits of the ith interval and Nᵢ is the number of samples in the ith interval. The resulting sample mean square values can be the basis of a run test. That is, the sequence X̄ᵢ², i = 1, 2, . . . , 2N, is classified as above (a) or below (b) the median and the foregoing procedure is followed. It is important that the interval length (Tᵢ − Tᵢ₋₁) should be significantly longer than the correlation time of the random process. That is, R_XX(τ) ≈ 0 for |τ| greater than (Tᵢ − Tᵢ₋₁) is a requirement for using the run test as a test for stationarity of mean square values as described.
9.3 MODEL-FREE ESTIMATION

In this section we address the problem of estimating the mean, variance, autocorrelation, and the power spectral density functions of an ergodic (and hence stationary) random process X(t), given the random variables X(0), X(1), . . . , X(N − 1), which are sampled values of X(t). A normalized sampling rate of one sample per second is assumed. Although the estimators of this section require some assumptions (or model), they are relatively model-free as compared with the estimators introduced in the next section.
We will first specify an estimator for each of the unknown parameters or functions and then examine certain characteristics of the estimators including the bias and mean squared error (MSE). The estimators that we describe in this section will be similar in form to the "time averages" described in Chapter 3, except we will use discrete-time versions of the "time averages" to obtain these estimators. Problems at the end of this chapter cover some continuous versions.

μ̂ = (1/N) Σ_{i=0}^{N−1} X(i)   (9.4)
Var[μ̂] = (1/N²) Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} C_XX(i − j)   (9.5.a)

where C_XX(·) is the autocovariance function of the sequence. In the stationary case this reduces to

Var[μ̂] = N C_XX(0)/N² + [2(N − 1)/N²] C_XX(1) + [2(N − 2)/N²] C_XX(2) + · · · + (2/N²) C_XX(N − 1)   (9.5.b)

If the sequence is uncorrelated, then

Var[μ̂] = C_XX(0)/N
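Equation 9.5.b is straightforward to evaluate for any assumed covariance sequence. A minimal sketch (ours; Python with numpy assumed, var_sample_mean is a hypothetical name):

```python
import numpy as np

def var_sample_mean(cxx):
    """Variance of the sample mean from Equation 9.5.b.

    cxx[l] = C_XX(l) for lags l = 0, 1, ..., N-1 of a stationary sequence:
    Var[mu_hat] = (1/N^2) [ N C(0) + 2 sum_{l=1}^{N-1} (N - l) C(l) ]."""
    cxx = np.asarray(cxx, dtype=float)
    N = len(cxx)
    lags = np.arange(1, N)
    return (N * cxx[0] + 2.0 * np.sum((N - lags) * cxx[lags])) / N**2

# White sequence: C(0) = sigma^2, C(l) = 0 for l > 0, so Var = sigma^2 / N.
N, sigma2 = 100, 4.0
c_white = np.zeros(N); c_white[0] = sigma2
print(var_sample_mean(c_white))            # 0.04 = sigma^2 / N
# A correlated sequence C(l) = sigma^2 * 0.9^l gives a much larger variance.
print(var_sample_mean(sigma2 * 0.9 ** np.arange(N)))
```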
An estimator of the autocorrelation function is

R̂_XX(k) = [1/(N − k)] Σ_{i=0}^{N−k−1} X(i) X(i + k)   (9.6)

R̂_XX(−k) = R̂_XX(k)
Note that because of finite sample size (N), we have the following effects (see
Figures 9.2 and 9.3).
Figure 9.2 Estimation of R_XX(k) from N data points.
1. With N data points we can only estimate R_XX(k) for values of k less than N, that is,

R̂_XX(k) = [1/(N − k)] Σ_{i=0}^{N−k−1} X(i) X(i + k),  k < N
         = 0,  k ≥ N   (9.7)

R̂_XX(−k) = R̂_XX(k)
In the case of Gaussian processes, these moments can be evaluated (see Section
2.5.3) and the variance of Rxx(k) can be computed. (Problem 9.8.)
EXAMPLE 9.2.

X(n) is a zero-mean white Gaussian sequence with

R_XX(k) = σ_X²,  k = 0
        = 0,   k ≠ 0

Find the mean and the variance of R̂_XX(0).

SOLUTION:

For k = 0,

R̂_XX(0) = (1/N) Σ_{i=0}^{N−1} [X(i)]²

E{R̂_XX(0)} = (1/N) Σ_{i=0}^{N−1} E{[X(i)]²} = σ_X²

Var{R̂_XX(0)} = E{[R̂_XX(0)]²} − [E{R̂_XX(0)}]²

E{[R̂_XX(0)]²} = (1/N²) [ Σ_{i=0}^{N−1} E{[X(i)]⁴} + N(N − 1) σ_X⁴ ]
             = (1/N²) [ N(3σ_X⁴) + N(N − 1) σ_X⁴ ]
             = [(N + 2)/N] σ_X⁴
and hence

Var{R̂_XX(0)} = (2/N) σ_X⁴
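A quick Monte Carlo check of this result (our sketch, not from the text; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2, trials = 200, 1.0, 5000
# R_hat(0) = (1/N) sum X(i)^2 for a zero-mean white Gaussian sequence.
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
r0 = np.mean(x**2, axis=1)
print("sample variance of R_hat(0):", np.var(r0))
print("theoretical 2*sigma^4/N:   ", 2 * sigma2**2 / N)   # 0.01
```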
Based on the foregoing equation, we can define an estimator for the psd as

S̃_XX(f) = Σ_{k=−(N−1)}^{N−1} R̃_XX(k) exp(−j2πkf)   (9.9)

where

R̃_XX(k) = (1/N) Σ_{i=0}^{N−k−1} X(i) X(i + k),  k = 0, 1, 2, . . . , N − 1
X_F(k) = Σ_{n=0}^{N−1} x(n) exp(−j2πkn/N),  k = 0, . . . , N − 1   (9.10.a)

x(n) = (1/N) Σ_{k=0}^{N−1} X_F(k) exp(+j2πkn/N),  n = 0, . . . , N − 1   (9.10.b)
Equations 9.10.a and 9.10.b define the discrete Fourier transform (DFT). It can be shown that a time-limited signal cannot be perfectly band-limited; however, for practical purposes, we can record a signal from 0 to T_M that is approximately limited to no frequency component at or above B. Algorithms for computing discrete Fourier transforms (DFTs) are readily available (see Appendix B). If N is chosen to be a power of 2, then fast Fourier transform (FFT) algorithms, which are computationally efficient, can be used to implement Equations 9.10.a and 9.10.b.
R̃_XX(k) = (1/N) Σ_{i=0}^{N−k−1} X(i) X(i + k),  k = 0, 1, . . . , N − 1   (9.11.a)
This estimator differs from R̂_XX(k) given in Equation 9.6 by the factor (N − k)/N, that is,

R̃_XX(k) = [(N − k)/N] R̂_XX(k)
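A short sketch (ours; numpy assumed, acorr_estimates is a hypothetical name) showing both estimators and the (N − k)/N factor relating them:

```python
import numpy as np

def acorr_estimates(x, max_lag):
    """Unbiased (Eq. 9.6) and biased (Eq. 9.11.a) autocorrelation estimates."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r_unbiased = np.empty(max_lag + 1)
    r_biased = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        s = np.dot(x[: N - k], x[k:])   # sum_{i=0}^{N-k-1} X(i) X(i+k)
        r_unbiased[k] = s / (N - k)     # divisor N - k  (Equation 9.6)
        r_biased[k] = s / N             # divisor N      (Equation 9.11.a)
    return r_unbiased, r_biased

rng = np.random.default_rng(1)
x = rng.normal(size=256)
ru, rb = acorr_estimates(x, 5)
# The two differ exactly by the factor (N - k)/N:
print(np.allclose(rb, ru * (len(x) - np.arange(6)) / len(x)))   # True
```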
Now if we define

d(n) = 1,  n = 0, 1, . . . , N − 1
     = 0,  elsewhere   (9.12)
then the Fourier transform of R̃_XX(k), denoted by S̃_XX(f), is

S̃_XX(f) = Σ_{k=−∞}^{∞} [ (1/N) Σ_{n=−∞}^{∞} d(n)X(n) d(n + k)X(n + k) ] exp(−j2πkf)

        = (1/N) | Σ_{n=0}^{N−1} X(n) exp(−j2πnf) |²   (9.13)
E{S̃_XX(f)} = E{ Σ_k R̃_XX(k) exp(−j2πkf) }
           = Σ_k E{R̃_XX(k)} exp(−j2πkf)
           = Σ_{k=−∞}^{∞} q_N(k) R_XX(k) exp(−j2πkf)   (9.14)

where

q_N(k) = (1/N) Σ_{n=−∞}^{∞} d(n) d(n + k)   (9.15)
The sum on the right-hand side of Equation 9.14 is the Fourier transform of the product of q_N(k) and R_XX(k); thus

E{S̃_XX(f)} = ∫_{−1/2}^{1/2} S_XX(α) Q_N(f − α) dα   (9.16.a)

where

Q_N(f) = (1/N) [ sin(πfN) / sin(πf) ]²   (9.16.b)
Equations 9.14 and 9.15 show that S̃_XX(f) is a biased estimator; the bias results from the truncation and the triangular window function [a result of using 1/N instead of 1/(N − |k|) in the definition of R̃_XX(k)]. Equation 9.16 shows that S_XX is convolved with a (sinc)² function in the frequency domain. Plots of q_N(k), its transform Q_N(f), and the windowing effect are shown in Figure 9.4.
The convolution given in Equation 9.16.a has an "averaging" effect and it produces a "smeared" estimate of S_XX(f). The effect of this smearing is known as spectral leakage. If S_XX(f) has two closely spaced spectral peaks, the periodogram estimate will smooth these peaks together. This reduces the spectral resolution as shown in Figure 9.4.b.
Figure 9.4 (a) q_N(k) and its transform Q_N(f); (b) the windowing (smearing) effect of Q_N(f) on S_XX(f).

As N → ∞, Q_N(f) approaches an impulse, and

E{S̃_XX(f)} = ∫_{−1/2}^{1/2} S_XX(α) δ(f − α) dα = S_XX(f)

so the periodogram is asymptotically unbiased.
j2"nnp\
Z X («)ex p - p = 0, . . . , [N/2]
N N ~N~I
=Al+B] (9.17)
where
( iTxnp
(9.18.a)
a ’ ~ V n x ( " )c“ V N
and
B,
‘ vs
1 V x(")sm
^ l~
w • ( 2™ P (9.18.b)
Both A_p and B_p are linear combinations of Gaussian random variables and hence are Gaussian. Also, because X(n) is zero-mean,

E{A_p} = E{B_p} = 0
and

Var{A_p} = E{A_p²} = (σ²/N) Σ_{n=0}^{N−1} cos²(2πnp/N)
         = σ²/2,  p ≠ 0, N/2
         = σ²,   p = 0, N/2   (9.19)

Var{B_p} = (σ²/N) Σ_{n=0}^{N−1} sin²(2πnp/N)
         = σ²/2,  p ≠ 0, N/2
         = 0,    p = 0, N/2   (9.20)

Hence

A_p ~ N(0, σ²/2),  p ≠ 0, N/2;   A_p ~ N(0, σ²),  p = 0, N/2   (9.21)

and

B_p ~ N(0, σ²/2),  p ≠ 0, N/2;   B_p = 0,  p = 0, N/2   (9.22)

Since S̃_XX(p/N) = A_p² + B_p² is, for p ≠ 0, N/2, the sum of the squares of two independent zero-mean Gaussian variables,

Var{S̃_XX(p/N)} = σ⁴,   p ≠ 0, N/2
               = 2σ⁴,  p = 0, N/2   (9.23)
Equation 9.23 shows that, for most values of f, S̃_XX(f) has a variance of σ⁴. Since we have assumed that S_XX(f) = σ², the normalized standard error ε of the estimate is

ε = √(Var{S̃_XX(f)}) / S_XX(f) = σ²/σ² = 100%
EXAMPLE 9.3.

If X(n) is a zero-mean white Gaussian sequence with unity variance, find the expected value and the variance of the periodogram estimator of the psd.

SOLUTION:

R_XX(k) = 1,  k = 0
        = 0,  elsewhere

E{S̃_XX(f)} = q_N(0) · 1 = 1

Var{S̃_XX(p/N)} = 1,  p ≠ 0, N/2
               = 2,  p = 0, N/2

Note that because of the white Gaussian assumption, both the mean and the variance are essentially (except at the end points) constant with p.
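A quick numerical check of Example 9.3 (our sketch; numpy assumed). Note that the variance of the raw periodogram does not decrease as N grows:

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 256, 2000
x = rng.normal(size=(trials, N))            # unit-variance white Gaussian
# Periodogram S_tilde(p/N) = |DFT|^2 / N  (Equation 9.13).
periodograms = np.abs(np.fft.fft(x, axis=1))**2 / N
p = np.arange(1, N // 2)                    # interior frequencies (p != 0, N/2)
print("mean     :", periodograms[:, p].mean())   # ~1 = S_XX(f)
print("variance :", periodograms[:, p].var())    # ~1, independent of N
```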
Figure 9.5 Periodogram of a random binary waveform plus two tones: (a) N = 4096; (b) N = 1024.
where S̃_XX(f)_k is the spectral estimate obtained from the kth segment of the data. If we assume that the estimators S̃_XX(f)_k are independent (which is not completely true if the data segments are adjacent), the variance of the averaged estimator will be reduced by the factor n. However, since fewer points are used to obtain the estimator S̃_XX(f)_k, the function Q_N(f) (Equation 9.16.b) will be wider in the frequency domain, and from Equation 9.16.a it can be seen that the bias will be larger.
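A sketch of segment averaging (ours, not from the text; numpy assumed, averaged_periodogram is a hypothetical name):

```python
import numpy as np

def averaged_periodogram(x, n_segments):
    """Average periodograms over contiguous, non-overlapping segments.

    Variance drops roughly by n_segments; resolution drops because each
    segment is shorter (the Q_N(f) main lobe of Eq. 9.16.b widens)."""
    x = np.asarray(x, dtype=float)
    L = len(x) // n_segments
    segs = x[: L * n_segments].reshape(n_segments, L)
    return np.mean(np.abs(np.fft.fft(segs, axis=1))**2 / L, axis=0)

rng = np.random.default_rng(3)
x = rng.normal(size=4096)
s1 = np.abs(np.fft.fft(x))**2 / len(x)     # raw periodogram
s8 = averaged_periodogram(x, 8)            # 8-segment average
print(s1[1:2048].var(), s8[1:256].var())   # second is roughly 1/8 of the first
```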
A similar form of averaging can also be done by averaging spectral estimates in the frequency domain. This averaging can be done simply as

S̄_XX(p/N) = [1/(2m + 1)] Σ_{k=p−m}^{p+m} S̃_XX(k/N)   (9.25)
R̃_XX(k) = (1/2N) Σ_{p=0}^{2N−1} S̃_XX(p/2N) exp(j2πkp/2N),  k = 0, 1, 2, . . . , N − 1   (9.26.a)

S̆_XX(p/N) = Σ_{k=−M}^{M} λ(k) R̃_XX(k) exp(−j2πkp/N),  |p| = 0, 1, . . . , N/2   (9.26.d)
Steps 1.b and 1.c, if completed via FFT algorithms, produce a computationally efficient method of calculating an estimate of R_XX(k). Truncating in Step 2, with M ≪ N, is equivalent to deleting the "end points" of R̃_XX(k), which are estimates produced with fewer data points and hence are subject to large variations. Deleting them will reduce the variance, but also will reduce the resolution, of the spectral estimator. Multiplying by λ(k) and taking the Fourier transform (Equation 9.26.c) has the effect
E{S̆_XX(f)} = ∫_{−1/2}^{1/2} S_XX(α) W_M(f − α) dα
where W_M(f) is the Fourier transform of the window function λ(k). In order to reduce the bias (and spectral leakage), λ(k) should be chosen such that W_M(f) has most of its energy in a narrow "main lobe" and has smaller "side lobes." This reduces the amount of "leakage." Several window functions have been proposed, and we briefly summarize their properties. Derivations of these properties may be found in References [10] and [11]. It should be noted that most of these windows introduce a scale factor in the estimator of the power spectral density.
Rectangular Window.

λ(k) = 1,  |k| ≤ M, M < N
     = 0,  otherwise   (9.27.a)

W_M(f) = sin[(M + 1/2)2πf] / sin(πf)   (9.27.b)

Since W_M(f) is negative for some f, this window might produce a negative estimate for the psd. This window is seldom used in practice.
Bartlett Window.

λ(k) = 1 − |k|/M,  |k| ≤ M, M < N
     = 0,  otherwise   (9.28.a)

W_M(f) = (1/M) [ sin(πfM) / sin(πf) ]²   (9.28.b)

Since W_M(f) is always positive, the estimated value is always positive. When M = N − 1, this window produces the unsmoothed periodogram estimator given in Equation 9.11.
Blackman-Tukey Window.

λ(k) = (1/2) [ 1 + cos(πk/M) ],  |k| ≤ M, M < N
     = 0,  otherwise   (9.29.a)

W_M(f) = (1/4) D_M(2πf − π/M) + (1/2) D_M(2πf) + (1/4) D_M(2πf + π/M)   (9.29.b)

where

D_M(2πf) = sin[(M + 1/2)2πf] / sin(πf)   (9.29.c)
Parzen Window.

λ(k) = 1 − 6(|k|/M)² + 6(|k|/M)³,  |k| ≤ M/2
     = 2(1 − |k|/M)³,  M/2 < |k| ≤ M, M < N
     = 0,  otherwise   (9.30.a)

W_M(f) = [3/(4M³)] [ sin(πMf/2) / sin(πf) ]⁴ [ 1 − (2/3) sin²(πf) ]   (9.30.b)
Note that W_M(f) is nonnegative and hence Parzen's window produces a nonnegative estimate of the psd.
For large N and M, the smoothed estimators satisfy

Bias{S̆_XX(f)} ≈ C₁ S″_XX(f)/M²   (9.31)

and

Var{S̆_XX(f)} ≈ C₂ (M/N) S²_XX(f)   (9.32)

where C₁ and C₂ are constants that have different values for the different windows and S″_XX(f) is the second derivative of S_XX(f).
If we choose M = 2√N as suggested in the literature on spectral estimation, then as the sample size N → ∞ we can see from Equations 9.31 and 9.32 that both the bias and variance of the smoothed estimators approach zero. Thus, we have a family of asymptotically unbiased and consistent estimators for the psd.
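As an illustration of the smoothed estimators, here is a sketch (ours, not from the text; numpy assumed, bt_spectrum is a hypothetical name, and the Bartlett lag window stands in for any of the windows above):

```python
import numpy as np

def bt_spectrum(x, M, n_freq=512):
    """Smoothed psd estimate: window the biased autocorrelation estimate
    (Eq. 9.11.a) with a Bartlett lag window of width M, then transform.
    M ~ 2*sqrt(N) is the choice suggested in the text."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    k = np.arange(M + 1)
    r = np.array([np.dot(x[: N - kk], x[kk:]) / N for kk in k])
    lam = 1.0 - k / M                        # Bartlett lag window (Eq. 9.28.a)
    f = np.arange(n_freq) / (2.0 * n_freq)   # 0 <= f < 1/2
    # S(f) = r(0) + 2 * sum_{k=1}^{M} lambda(k) r(k) cos(2 pi k f)
    S = r[0] + 2.0 * (lam[1:] * r[1:]) @ np.cos(2 * np.pi * np.outer(k[1:], f))
    return f, S

rng = np.random.default_rng(4)
x = rng.normal(size=1024)
f, S = bt_spectrum(x, M=int(2 * np.sqrt(1024)))
print(S.mean())   # ~1 for unit-variance white noise
```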
9.4 MODEL-BASED ESTIMATION

X(n) = Σ_{i=1}^{p} φ_{p,i} X(n − i) + Σ_{k=0}^{q} θ_{q,k} e(n − k)   (9.33)

The transfer function of this ARMA(p, q) model is

H(f) = Θ(f)/Φ(f)   (9.34)

where

Θ(f) = Σ_{k=0}^{q} θ_{q,k} exp(−j2πfk)   (Moving average part)   (9.35)

Φ(f) = 1 − Σ_{l=1}^{p} φ_{p,l} exp(−j2πfl)   (Autoregressive part)   (9.36)
Then

S_XX(f) = σ² |Θ(f)|² / |Φ(f)|²

where S_XX(f) is the power spectral density function of the random sequence X(n) and σ² is the variance of the white noise sequence e(n).
Differencing to Remove Linear Mean Trends. Assume W(n) has a linear trend superimposed upon a white noise sequence. A plot of a sample sequence is shown in Figure 9.8. W(n) is clearly a nonstationary sequence. However, its first difference Y(n), where

Y(n) = W(n) − W(n − 1)

will be stationary.
EXAMPLE 9.4.

Let W(n) be a sequence whose first difference is

Y(n) = W(n) − W(n − 1) = .2 + X(n)   (9.43)

where X(n) is a stationary sequence. This sequence is stationary because the difference of two jointly stationary sequences is stationary. Furthermore, the nonzero mean, .2 in the example, can be removed by subtracting it: Y(n) − .2 = X(n).
That is, the original series is a summation of the differenced series plus perhaps a constant as in Example 9.4. Because of the analogy with continuous processes, W(n) is usually called the integrated version of Y(n), and if Y(n) (or X(n) in the example) is represented by an ARMA model, then W(n) is said to have an ARIMA model, where the I stands for "integrated."
A general procedure for trend removal calls for first or second differencing of nonstationary sequences in order to attempt to produce stationary series. The resulting stationary ARMA models are then integrated (summed) the corresponding number of times to produce a model for the original sequence. The total model is then called ARIMA(p, d, q), where p is the order of the autoregressive (AR) part of the model, d is the order of differencing (I), and q is the order of the moving average (MA) part of the model.
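A short sketch of first differencing to remove a linear trend (ours; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n = np.arange(500)
w = 0.2 * n + rng.normal(size=500)     # linear trend plus white noise
y = np.diff(w)                         # first difference Y(n) = W(n) - W(n-1)
print(w[:250].mean(), w[250:].mean())  # means drift: W(n) is nonstationary
print(y[:250].mean(), y[250:].mean())  # both near 0.2: Y(n) looks stationary
```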
EXAMPLE 9.5.
Let
SOLUTION:
and all of the roots λᵢ must lie within the unit circle. If |λᵢ| = 1, then the autocorrelation coefficient will not decay rapidly with m. Thus, failure of the autocorrelation to decay rapidly suggests that a root with magnitude near 1 exists. This indicates a nonstationarity of a form that simple differencing might transform into a stationary model, because differencing removes a root of magnitude one. That is, if r_XX(m) has a root of 1, then it can be shown that W(n), where

W(n) = (1 − z⁻¹) X(n)   (9.49)

(z⁻¹ is the backshift operator), has the unit root of r_XX(m) factored out, and thus may be stationary. Thus, it is usually assumed that the order of differencing necessary to produce stationarity has been reached when the differenced series has a sample autocorrelation coefficient that decays fairly rapidly.
Figure 9.10 Sample partial autocorrelation coefficients φ̂_kk versus lag number k.

Judgement and experience are required in order to decide when a.c.c.'s or p.a.c.c.'s are approximately equal to zero.
SOLUTION:
Figure 9.11 Sample autocorrelation coefficients and partial autocorrelation coefficients for Example 9.6, cases (a) through (e).
Thus, the order of both the autoregressive and the moving average parts of the model can be guessed as described. In practice, usually both p and q are small, say p + q < 10. Commercial computer algorithms that plot p.a.c.c.'s and a.c.c.'s for various trial models are readily available, and some packages automatically "identify" p, d, and q.
E{e(n)} = 0

E{e(n)e(j)} = σ²,  n = j
            = 0,   n ≠ j

and
and then find the values of φ_{p,i} and σ² that maximize L. Although this approach is conceptually simple, the resulting estimators are somewhat difficult to obtain.
and hence by using L_c to find the estimators, we are essentially ignoring the effects of the first p data points.

If p, the order of the autoregressive model, is small compared with N, the number of observations, then this "starting transient" can be neglected and Equation 9.51 can be used to approximate the unconditional density and thus the likelihood function.
From Equation 9.50 with X(i) = x(i), i = 1, 2, . . . , p, we obtain

e(p + 1) = x(p + 1) − Σ_{i=1}^{p} φ_{p,i} x(p + 1 − i)

e(p + 2) = x(p + 2) − Σ_{i=1}^{p} φ_{p,i} x(p + 2 − i)

⋮

e(N) = x(N) − Σ_{i=1}^{p} φ_{p,i} x(N − i),  N ≫ p
L_c = Π_{j=p+1}^{N} f_e( x(j) − Σ_{i=1}^{p} φ_{p,i} x(j − i) )   (9.53)

L_c = [1/(√(2π) σ)]^{N−p} exp{ −(1/2σ²) Σ_{j=p+1}^{N} [ x(j) − Σ_{i=1}^{p} φ_{p,i} x(j − i) ]² }   (9.54)
Equation 9.54 can be written as the logarithm l_c of the conditional likelihood function:

l_c(φ_{p,1}, φ_{p,2}, . . . , φ_{p,p}, σ²) = −[(N − p)/2] ln(2π) − [(N − p)/2] ln σ²
    − (1/2σ²) Σ_{j=p+1}^{N} [ x(j) − Σ_{i=1}^{p} φ_{p,i} x(j − i) ]²   (9.55)
For the first-order model (p = 1), because φ_{1,1} appears in only the last, or sum of squares, term, the maximum likelihood estimator of φ_{1,1} is the one that minimizes

Σ_{j=2}^{N} [ x(j) − φ_{1,1} x(j − 1) ]²

This results in the estimator (the observations are now considered to be random variables)

φ̂_{1,1} = [ Σ_{j=2}^{N} X(j) X(j − 1) ] / [ Σ_{j=2}^{N} X²(j − 1) ]   (9.59)

Note that this is simply the sample correlation coefficient at lag 1. In order to find an estimator for σ², we differentiate Equation 9.57 with respect to σ² and set the result equal to zero. This produces an estimator for σ² based on the residual sum of squares:

σ̂² = [1/(N − 1)] Σ_{n=2}^{N} ê²(n)   (9.61)

That is, ê(n) is the observed value of e(n) with the estimated value φ̂_{1,1} used for φ_{1,1}.
Thus, Equations 9.59 and 9.61 can be used to estimate the parameters in a first-order autoregressive model. The estimated values of σ̂² and φ̂_{1,1} can be used to obtain an estimate of the psd of the process (according to Equation 5.11) as

Ŝ_XX(f) = σ̂² / [ 1 − 2φ̂_{1,1} cos(2πf) + φ̂²_{1,1} ],  |f| ≤ 1/2   (9.62)
An example is given in Problem 9.28. Note that Ŝ_XX(f) will be biased even if φ̂_{1,1} and σ̂² are unbiased. This will be a good estimator only if the variances of φ̂_{1,1} and σ̂² are small. Similarly, the autocorrelation function can be estimated by replacing the parameter in Equation 5.10 by its estimate.
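A minimal sketch of the first-order fit and the psd of Equation 9.62 (ours, not the book's software; numpy assumed, fit_ar1 and ar1_psd are hypothetical names):

```python
import numpy as np

def fit_ar1(x):
    """Conditional ML estimates for X(n) = phi X(n-1) + e(n):
    phi_hat is the lag-1 sample correlation coefficient, and sigma2_hat
    is the residual sum of squares divided by N - 1."""
    x = np.asarray(x, dtype=float)
    phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    e = x[1:] - phi * x[:-1]
    return phi, np.dot(e, e) / (len(x) - 1)

def ar1_psd(phi, sigma2, f):
    """Equation 9.62: S(f) = sigma^2 / (1 - 2 phi cos(2 pi f) + phi^2)."""
    return sigma2 / (1.0 - 2.0 * phi * np.cos(2 * np.pi * f) + phi**2)

rng = np.random.default_rng(6)
x = np.zeros(2000)
for n in range(1, 2000):               # simulate an AR(1) with phi = 0.7
    x[n] = 0.7 * x[n - 1] + rng.normal()
phi_hat, s2_hat = fit_ar1(x)
print(phi_hat, s2_hat)                 # ~0.7 and ~1.0
print(ar1_psd(phi_hat, s2_hat, 0.1))   # psd estimate at f = 0.1
```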
The sample autocorrelation functions can be estimated from data by any of the standard methods; here we use

R̂_XX(n) = [1/(N − n)] Σ_{i=1}^{N−n} X(i) X(i + n)
Equations 9.65 and 9.66 can be solved for φ̂_{2,1} and φ̂_{2,2} as follows:

φ̂_{2,1} = R̂_XX(1) [R̂_XX(0) − R̂_XX(2)] / { [R̂_XX(0)]² − [R̂_XX(1)]² }   (9.67)

φ̂_{2,2} = { R̂_XX(0) R̂_XX(2) − [R̂_XX(1)]² } / { [R̂_XX(0)]² − [R̂_XX(1)]² }   (9.68)
X(n) = Σ_{i=1}^{p} φ_{p,i} X(n − i) + e(n)

where
The set of equations (9.70) can be solved recursively for the φ_{p,i} using the Levinson recursive algorithm. Levinson recursion starts with a first-order AR process and uses Gram-Schmidt orthogonalization to find the solutions through a pth-order AR process. (See Reference [8].) One valuable feature of the Levinson recursive solution algorithm is that the residual sum of squared errors at each iteration can be monitored and used to decide when the order of the model is sufficiently large.
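The recursion itself is not spelled out in the text; the following is a standard Levinson-Durbin sketch (ours; numpy assumed) that also exposes the residual power used for order selection:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations recursively.

    r[k] = autocorrelation estimate at lag k, k = 0..p.  Returns the AR(p)
    coefficients and the residual power at each order, which can be
    monitored to choose the model order."""
    phi = np.zeros(p + 1)
    resid = [r[0]]
    for m in range(1, p + 1):
        acc = r[m] - np.dot(phi[1:m], r[m - 1:0:-1])
        k = acc / resid[-1]                      # reflection coefficient
        phi_new = phi.copy()
        phi_new[m] = k
        phi_new[1:m] = phi[1:m] - k * phi[m - 1:0:-1]
        phi = phi_new
        resid.append(resid[-1] * (1.0 - k * k))  # residual power shrinks
    return phi[1:], resid

# AR(1) with phi = 0.7 and unit noise variance: r(k) = 0.7^k / (1 - 0.49).
r = 0.7 ** np.arange(4) / (1 - 0.49)
coef, resid = levinson_durbin(r, 3)
print(coef)    # ~[0.7, 0, 0]: higher-order terms add nothing
print(resid)   # residual power stops decreasing after order 1
```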
e(n) = −θ e(n − 1) + X(n)   (9.71.a)

or, using the z⁻¹ operator and assuming that this model is invertible (see Equation 5.38),

e(n) = X(n)/(1 + θz⁻¹) = Σ_{k=0}^{∞} (−θ)^k X(n − k)   (9.71.b)
Equation 9.71 shows that not only is the error a sum of the observations, but it is a nonlinear function of θ. Thus, the sum of squares of the e(i) is nonlinear in θ. Differentiating this function and setting it equal to zero will also lead to a set of nonlinear equations that will be difficult to solve. Higher-order moving average processes result in the same kind of difficulties. Thus, the usual regression techniques are not applicable for minimizing the sum of the squared errors. We will resort to an empirical technique for finding the best estimators for moving average processes.
sum of the squared errors. Then, with the same starting value ê(0), we assume another value, say θ₂, and calculate the sum of the squared errors. This is done for a number of values of θ, and the θᵢ that minimizes the sum of the squared errors is the estimate θ̂. The observed value of the error e(n) will be denoted by ê(n), where ê(n) = X(n) − θ̂ê(n − 1).
EXAMPLE 9.7.

Estimate θ in the model X(n) = θe(n − 1) + e(n) from the data below (the third column shows the errors ê(n) computed with θ = .5):

n     X(n)     ê(n) for θ = .5
0              0
1     .50      .50
2     .99      .74
3     −.48     −.85
4     −.20     +.22
5     −1.31    −1.42
6     .81      +1.52
7     1.82     1.06
8     2.46     1.93
9     1.07     .10
10    −1.29    −1.34
SOLUTION:

For illustration we start with an initial guess of θ = .5 and calculate the sum of the squared errors as shown in Table 9.3. The value ê(0) of e(0) is assumed to be zero. This reduces the likelihood function to the conditional [on e(0) = 0] likelihood function.

Table 9.4 gives the sum of squared errors for various values of θ. From this table it is clear that the sum of squared errors is minimum when θ = .5. Thus, the best or (conditional) maximum likelihood estimate is θ̂ = .5, if only one significant digit is used.
Note that the estimate in Example 9.7 is based on, or conditional on, e(0) = 0. A different assumption for e(0) might result in a slightly different estimate. However, when N ≫ 1, the assumed value of e(0) does not significantly affect the estimate of θ.

Problem 9.33 asks for an estimate based on the procedure described, and Problem 9.34 asks for an estimate when the initial value e(0) is estimated by a backcasting procedure.
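A sketch of the grid-search procedure of Example 9.7, conditional on e(0) = 0 (ours; numpy assumed, ma1_conditional_ss is a hypothetical name):

```python
import numpy as np

def ma1_conditional_ss(x, theta, e0=0.0):
    """Conditional sum of squared errors for X(n) = theta e(n-1) + e(n),
    computed recursively as e(n) = X(n) - theta e(n-1) with e(0) assumed."""
    e_prev, ss = e0, 0.0
    for xn in x:
        e = xn - theta * e_prev
        ss += e * e
        e_prev = e
    return ss

rng = np.random.default_rng(7)
e = rng.normal(size=1001)
x = e[1:] + 0.5 * e[:-1]                # simulated MA(1) with theta = 0.5
grid = np.arange(-0.95, 0.96, 0.01)     # search inside invertibility limits
ss = np.array([ma1_conditional_ss(x, th) for th in grid])
print("theta_hat =", grid[np.argmin(ss)])   # close to 0.5
```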
Estimation for the second-order moving average model

X(n) = θ_{2,1} e(n − 1) + θ_{2,2} e(n − 2) + e(n)

is carried out in basically the same manner as for the first-order moving average process. Once again, the likelihood function is a nonlinear function of the parameters. Thus, the same trial calculation of the sums of squares is used. There are two modifications required.

First, two values of e, e(0) and e(−1), must be assumed, and they are usually assumed to be zero. Second, the plot of the sum of squares versus values of θ_{2,1} and θ_{2,2} is now two-dimensional.
EXAMPLE 9.8.

Using the data from Table 9.5, estimate θ_{2,1}, θ_{2,2}, σ_N², and σ_X² in a second-order moving average model.

SOLUTION:

The conditional sum of squared errors is shown in Table 9.6 for selected values of θ_{2,1} and θ_{2,2}. From this table, the best estimators of θ_{2,1} and θ_{2,2} are
The variance of e, σ_N², can be estimated from

σ̂_N² = Σ_n ê²(n) / (N − 2) = 97.481/98 ≈ .995

where the summation is the minimum sum of squares (97.481 in the example) and N − 2 is the number of observations minus two, the degrees of freedom used in estimating θ_{2,1} and θ_{2,2}.
TABLE 9.5 X(n), n = 1, . . . , 100 (READ ACROSS) FOR EXAMPLE 9.8
Procedure for General Moving Average Models. The procedure for estimating the parameters of the general moving average model

X(n) = Σ_{i=1}^{q} θ_{q,i} e(n − i) + e(n)

is the same as that given previously. However, when q is 3 or larger, the optimization procedure is much more difficult and time consuming. We do not consider this case in this introductory text. We should note that the optimization is aided if reasonable first guesses of the parameters are available (see Section 9.4.6).
One can obtain a slightly better estimate of the sum of the squared errors by a reverse estimation of e(0), e(−1), . . . , e(−q + 1) (see Problem 9.34), but for reasonably long data sequences this extra effort is usually unnecessary because the initial values have little significant effect on the final estimates.

Note that again in this case the maximum likelihood estimator is closely approximated by the minimum MSE estimator based on the conditional sum of squares, and the deviation is significant only for small N. However, one cannot minimize the sum of the squares by differentiating and setting the derivatives equal to zero as was done in the autoregressive case.
TABLE 9.8 SUM OF SQUARED ERRORS FOR EXAMPLE 9.9

θ_{1,1} \ φ_{1,1}   −0.40    −0.36    −0.32    −0.31    −0.30    −0.29    −0.25
0.40    98.998   97.647   96.745   96.584   96.448   96.336   96.122
0.43    97.936   96.916   96.317   96.228   96.163   96.121   96.176
0.44    97.636   96.724   96.223   96.158   96.116   96.097   96.240
0.45    97.364   96.558   96.155   96.113   96.094   96.097   96.327
0.46    97.120   96.418   96.111   96.092   96.095   96.121   96.437
0.47    96.902   96.304   96.092   96.096   96.121   96.169   96.570
0.48    96.712   96.215   96.097   96.124   96.171   96.240   96.725
0.49    96.549   96.153   96.128   96.176   96.246   96.336   96.904
0.55    96.147   96.323   96.829   97.004   97.199   97.413   98.461
0.60    96.563   97.178   98.093   98.367   98.659   98.969   100.391
EXAMPLE 9.9.

Consider the data in Table 9.7. Find the estimates of φ_{1,1}, θ_{1,1}, and σ_N².
SOLUTION:

The sums of squares for various values of φ_{1,1} and θ_{1,1} are shown in Table 9.8. The resulting estimates are θ̂_{1,1} = .46 and φ̂_{1,1} = −.31. (Note that θ̂_{1,1} = .47 and φ̂_{1,1} = −.32 are also possible choices.) The variance σ_N² of e is estimated from

σ̂_N² = (1/97) Σ_n ê²(n) = 96.092/97 ≈ .99   (9.73)
Briefly consider the first-order moving average model. For zero-mean random variables the sample autocorrelation coefficient at lag 1 can be estimated by

r̂_XX(1) = Σ_n X(n) X(n − 1) / Σ_n X²(n)

For the first-order moving average model,

r_XX(1) = θ_{1,1} / (1 + θ²_{1,1})   (9.74)

Solving this quadratic for θ_{1,1} in terms of r_XX(1) yields

θ_{1,1} = [ 1 ± √(1 − 4 r²_XX(1)) ] / [ 2 r_XX(1) ]
It can be shown that only one of these values will fall within the invertibility limits, −1 < θ_{1,1} < +1, if the correlation coefficients are true values. If the correlation coefficients are estimated, then usually Equation 9.74 will produce only one solution within the allowable constraints.
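A sketch (ours; numpy assumed, ma1_initial_estimate is a hypothetical name) of the initial-estimate computation from Equation 9.74:

```python
import numpy as np

def ma1_initial_estimate(r1):
    """Solve r(1) = theta / (1 + theta^2) (Equation 9.74) for theta,
    returning the root inside the invertibility limits -1 < theta < 1."""
    if r1 == 0.0:
        return 0.0
    if abs(r1) >= 0.5:
        raise ValueError("no real MA(1) solution for |r(1)| >= 0.5")
    disc = np.sqrt(1.0 - 4.0 * r1 * r1)
    roots = ((1.0 + disc) / (2.0 * r1), (1.0 - disc) / (2.0 * r1))
    return next(t for t in roots if abs(t) < 1.0)

print(ma1_initial_estimate(0.4))   # 0.5, since 0.5/(1 + 0.25) = 0.4
```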
In a second-order moving average model (see Equation 5.45)

r_XX(1) = (θ_{2,1} + θ_{2,1}θ_{2,2}) / (1 + θ²_{2,1} + θ²_{2,2})   (9.75)

and

r_XX(2) = θ_{2,2} / (1 + θ²_{2,1} + θ²_{2,2})   (9.76)
These two equations can be solved for the initial estimates θ̂_{2,1} and θ̂_{2,2} by using the estimated values for r_XX(1) and r_XX(2).

Initial estimates for a mixed autoregressive moving average process are found using the same general techniques. The autocorrelation coefficients for the first p + q values of lag are estimated from the data, and equations that relate these
r_XX(1) = (1 + φ_{1,1}θ_{1,1})(φ_{1,1} + θ_{1,1}) / (1 + θ²_{1,1} + 2φ_{1,1}θ_{1,1})   (9.77)

and

r_XX(2) = φ_{1,1} r_XX(1)   (9.78)

These two equations can be solved in order to find initial estimates of φ_{1,1} and θ_{1,1} in terms of the estimates r̂_XX(1) and r̂_XX(2).
In all cases the allowable regions for the unknown parameters must be observed. These initial estimates should then be used in the more efficient (i.e., smaller variance) estimating procedures described in previous sections. (If N is very large, then the initial estimates may be sufficiently accurate.)
C(k) = (N − d) Σ_{m=1}^{k} r̂²_ee(m)   (9.80)

is approximately distributed as

χ²_{k−p−q}

where r̂_ee is the sample correlation coefficient of the observed errors ê(m), for example, ê(m) = X(m) − X̂(m); N − d is the number of samples (after differencing) used to fit the model; and p and q are the numbers of autoregressive and moving average terms, respectively, in the model.

Thus, a table of chi-square with (k − p − q) degrees of freedom can be used to compare the observed value of C with the value χ²_{k−p−q;α} found in a table such as Appendix E.
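A sketch of the portmanteau statistic of Equation 9.80 (ours; numpy assumed, box_pierce is a hypothetical name); the chi-square comparison itself would use Appendix E:

```python
import numpy as np

def box_pierce(e, k, p, q):
    """C = N * sum_{m=1}^{k} r_ee(m)^2 for residuals e(m), compared against
    a chi-square with k - p - q degrees of freedom (N plays the role
    of N - d here)."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    N = len(e)
    denom = np.dot(e, e)
    r = np.array([np.dot(e[m:], e[:-m]) / denom for m in range(1, k + 1)])
    return N * np.sum(r**2), k - p - q

rng = np.random.default_rng(8)
e = rng.normal(size=200)                 # white residuals
C, dof = box_pierce(e, k=20, p=1, q=2)
print(C, dof)   # C should be near dof = 17 for white residuals
```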
EXAMPLE 9.10.

Using the data of Table 9.9, which are the observed errors ê(n) after an assumed ARIMA(1, 0, 2) model has been fit, test for randomness using a chi-square test with the first 20 sample autocorrelation coefficients, at the 5% significance level.
SOLUTION:
Using Equation 9.80
Durbin-Watson Test. If we wish to examine the residual errors ê(n) after fitting an ARIMA model, the Durbin-Watson test is often used. The observed errors ê(n) are assumed to follow a first-order autoregressive model, that is,

e(n) = ρ e(n − 1) + a(n)

where a(n) is white noise. The test statistic is

D = Σ_{n=2}^{N} [ê(n) − ê(n − 1)]² / Σ_{j=1}^{N} ê²(j)   (9.82)
To test the null hypothesis ρ = 0 versus the alternative ρ < 0, the procedure is
EXAMPLE 9.11.

Use the data from Example 9.10 and test the hypothesis ρ = 0 at the 5% significance level versus the alternative ρ > 0.
SOLUTION:

D = Σ_{n=2}^{100} [ê(n) − ê(n − 1)]² / Σ_{j=1}^{100} ê²(j) ≈ .246
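A sketch of the statistic of Equation 9.82 (ours; numpy assumed, durbin_watson is a hypothetical name):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic of Equation 9.82; ~2 for white residuals."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

rng = np.random.default_rng(9)
print(durbin_watson(rng.normal(size=100)))   # near 2 for white noise
rho, e = 0.8, np.zeros(100)
for n in range(1, 100):                      # positively correlated errors
    e[n] = rho * e[n - 1] + rng.normal()
print(durbin_watson(e))                      # well below 2
```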
It has been shown by Bartlett* that the cumulative spectral density function C(f) of the residuals provides an efficient test for whiteness, where C(f) is the cumulative sum of the residual periodogram ordinates at frequencies up to f, normalized by σ̂²(N − d − p − q). For white residuals, the plot of C(f) versus f is approximately a straight line; systematic departures from the line indicate nonwhite residuals.

Figure 9.12 Cumulative spectral density functions of residuals: (a) approximately white residuals; (b) nonwhite residuals.
9.5 SUMMARY

The primary purpose of this chapter was to introduce the subject of estimating the unknown parameters of a random sequence from data. The basic ideas of statistics that were introduced in Chapter 8 were applied to estimating the mean value, the autocorrelation function, and the spectral density function. Throughout this chapter, the random sequence is assumed to be stationary, and if only one sample function is available, which is often the case, then ergodicity is also assumed. Tests for stationarity were discussed in Section 9.2, and simple transformations to induce stationarity were discussed in Section 9.4.1.
This model-based approach has the advantage of needing fewer data for estimators with acceptably small variance; that is, the model is a substitute for data. If the assumed model is adequate, this is a definite savings; if not, the use of the estimated parameters in the assumed model is likely to result in poor designs and analysis based on the model. Another advantage of the ARMA parametric models is that the form of the resulting model is very convenient for both analysis and design of communication and control systems.
9.6 REFERENCES

The literature on spectral estimation is vast and is expanding at a rapid rate. References [8] and [9] present comprehensive coverage of recent developments in parameter estimation, most of which emphasize autoregressive models. In certain important applications, these recently developed techniques result in more efficient estimators than the general methods introduced here. References [7], [10] and [11] cover both parametric and nonparametric methods of spectral estimation. References [8], [9] and [10] are recent textbooks that have excellent treatments of spectral estimation methods. Reference [7] compares a variety of techniques that are not covered in this chapter and contains several interesting examples.

The book by Box and Jenkins [3] contains an excellent coverage of fitting ARIMA models to data. Much of the material in the second half of this chapter is derived from [3]. Reference [6] covers similar material.

Reference [2] is the classic reference on nonparametric estimation of power spectra, and [1] has been a widely used text on this material.

Current articles dealing with spectral estimation may be found in the recent issues of the IEEE Transactions on Acoustics, Speech and Signal Processing.
[1] J. S. Bendat and A. G. Piersol, Random Data: Analysis and Measurement Procedures, Second Edition, Wiley Interscience, New York, 1986.

[5] J. Durbin, "Testing for Serial Correlation in Least-Squares Regression when Some of the Regressors are Lagged Dependent Variables," Econometrica, Vol. 38, 1970, p. 410.

[7] S. M. Kay and S. L. Marple, Jr., "Spectrum Analysis - A Modern Perspective," Proceedings of the IEEE, Vol. 69, No. 11, Nov. 1981.
9.7 PROBLEMS
by comparing the sample mean of the first half with the sample mean of the second half and testing, at the 1% significance level, the null hypothesis that they are the same.
R_XX(k) = exp(−0.2k²)

Show that X(n) is ergodic.
9.4 If X₁ and X₂ are jointly normal with zero means, then show that

a. E{X₁⁴} = 3σ₁⁴

Hint: See Example 2.13.
R̂_XX(10) = (1/90) Σ_{i=1}^{90} X(i) X(i + 10)
9.6 Show that Equation 9.5.b follows from Equation 9.5.a in the stationary
case.
9.8 Show that the variance of R̂_XX(k) as given in Equation 9.7 for a Gaussian random sequence is, for 0 ≤ k < N,

Var{R̂_XX(k)} = [1/(N − k)²] Σ_{m=−(N−k−1)}^{N−k−1} (N − k − |m|) [ R²_XX(m) + R_XX(m + k) R_XX(m − k) ]
9.10 Find the mean, variance, and covariance of A_p and B_p defined in Equation 9.18; that is, find E{A_p}, E{B_p}, Var{A_p}, Var{B_p}, Covar{A_p, A_q} and Covar{A_p, B_q}.
X(n) = [W(n) + W(n − 1)],  n = 1, 2, 3, . . . , 1000
9.13 With reference to Problem 9.12, divide the data into 10 contiguous segments of 100 points each. Obtain S̃_XX(f) for each segment and average the estimates. Plot the average of the estimates S̄_XX(f) and S_XX(f).

9.14 Smooth the spectral estimate obtained in Problem 9.12 in the frequency domain using a rectangular window of size 2m + 1 = 9. Compare the smoothed estimate S̄_XX(f) with S_XX(f).
9.16 With reference to Problem 9.12, apply the Bartlett, Blackman-Tukey, and Parzen windows with M = 25 and plot the smoothed estimates of S_XX(f).
Show that

a. μ̂ is unbiased.

Show that

a. X̄² is unbiased.

a. R̃_XX(k) = (1/N) Σ_i d(i) X(i) d(i + k) X(i + k),  k = 0, . . . , N − 1

a. Y(n) = ln X(n)
9.24 Examine the data set given in Table 9.9 for stationarity by

c. Plotting S² versus n.
d. Plotting r̂_XX versus n.
e. Plotting X(n) × X(n − 1) versus n.
9.27 The data W(n) in Table 9.10 (read across) have a periodic component. Plot the data and find a reasonable guess for m in the model

X(n) = (1 − z^{−m}) W(n) = W(n) − W(n − m)
9.28 Estimate the parameters φ_{1,1} and σ² of a first-order autoregressive model from the data set in Table 9.11 (read across).
9.29 Differentiate Equation 9.57 partially with respect to σ² and set the resulting derivative equal to zero in order to find the maximum likelihood estimator of σ².
9.30 Why are "Equations" 9.65 and 9.66 only approximations? Show the exact form that follows from Equation 9.64.
9.32 Give the matrix form of the estimators for an order-p autoregressive model. Hint: See Section 8.10.3.
9.33 Given the data in Table 9.13, estimate θ in the model X(n) = θe(n − 1) + e(n).
9.34 In the previous problem, estimate the unconditional sum of squares using the following procedure:

c. Assume that a(0) is equal to zero, that is, its unconditional expectation, because it is independent of X(n), n ≥ 1.
d. At the end of the backcast, X(0) can be found.
e. Using this X(0), as opposed to assuming it is equal to zero, find e(1) from X(n) = θe(n − 1) + e(n). That is, e(0) = X(0) − θe(−1), and because e(−1) = 0, e(0) = X(0). Then e(1) = X(1) − θe(0), and subsequently all e(n) and the sum of squares can be calculated.
9.35 Compare the estimated unconditional sum of squares in Problem 9.34 with the conditional sum of squares from Problem 9.33.
9.38 Solve Equations 9.75 and 9.76 for θ_{2,1} and θ_{2,2} if r_XX(1) = .4 and r_XX(2) = .1.

9.39 Solve Equations 9.77 and 9.78 for φ_{1,1} and θ_{1,1} if r_XX(1) = .4 and r_XX(2) = .1.
9.41 Show that if e(n) is a zero-mean white noise sequence with variance σ², then the expected value of D as given by Equation 9.82 is approximately equal to 2.
9.42 Test the error sequence shown in Table 9.15 for whiteness using the run
test.
9.43 Test the error sequence shown in Table 9.15 for whiteness, that is, p =
0, using the Durbin-Watson test.
9.44 Test the error sequence shown in Table 9.15 for whiteness using a cumulative spectral density plot.
9.45 Test the error sequence shown in Table 9.15 for whiteness using a chi-square test on the sum of the sample autocorrelation coefficients.
APPENDIX A
Fourier Transforms
x(t) = ∫_{−∞}^{∞} X(f) exp(j2πft) df,   X(f) = ∫_{−∞}^{∞} x(t) exp(−j2πft) dt

∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X(f)|² df
(1)  A, |t| < τ/2 (rectangular pulse)  ↔  Aτ sin(πfτ)/(πfτ) = Aτ sinc(fτ)

(2)  A(1 − |t|/τ), |t| < τ (triangular pulse)  ↔  Aτ sin²(πfτ)/(πfτ)² = Aτ sinc²(fτ)

(3)  e^{−at} u(t)  ↔  1/(a + j2πf)

(4)  exp(−|t|/τ)  ↔  2τ/[1 + (2πfτ)²]

(6)  sin(2πWt)/(2πWt) = sinc(2Wt)  ↔  1/(2W), |f| < W

(11) sgn t = +1, t > 0; −1, t < 0  ↔  1/(jπf)

(12) u(t) = 1, t > 0; 0, t < 0  ↔  (1/2)δ(f) + 1/(j2πf)
Source: K. Sam Shanmugan, Digital and Analog Communication Systems, John Wiley & Sons, New York, 1979, p. 582.
APPENDIX B

Discrete Fourier Transforms

X_F(k/N) = Σ_{n=0}^{N−1} x(n) exp(−j2πkn/N),  k = 0, 1, 2, . . . , N − 1

If x(n) is even,

X_F(k/N) = X_F*(−k/N),  k = 1, 2, . . . , N/2

x(n) = 1 for n = n₀; 0 otherwise  ↔  X_F(k/N) = exp(−j2πkn₀/N)
APPENDIX C

Z Transforms

f(k)  ↔  F(z)

1.  δ(k)  ↔  1, all z
4.  k, k ≥ 0  ↔  z⁻¹(1 − z⁻¹)⁻², 0 < |z|
6.  (n choose k), 0 ≤ k ≤ n  ↔  (1 + z⁻¹)ⁿ, 0 < |z|
8.  aᵏ, k ≥ 0  ↔  (1 − az⁻¹)⁻¹, |a| < |z|
11. a^{|k|}, all k  ↔  (1 − a²)[(1 − az)(1 − az⁻¹)]⁻¹, |a| < |z| < 1/|a|
17. cosh ak, k ≥ 0  ↔  (1 − z⁻¹ cosh a)(1 − 2z⁻¹ cosh a + z⁻²)⁻¹
18. sinh ak, k ≥ 0  ↔  (z⁻¹ sinh a)(1 − 2z⁻¹ cosh a + z⁻²)⁻¹
Gaussian Probabilities

(1) P{X > μ_X + yσ_X} = Q(y) = ∫_y^∞ [1/√(2π)] exp(−z²/2) dz

(4) erfc(y) = (2/√π) ∫_y^∞ exp(−z²) dz = 2Q(√2 y),  y > 0.
APPENDIX E

Percentage Points of the χ²_m Distribution
m     .995       .990       .975      .950      .050     .025     .010     .005
2     .0100251   .0201007   .0506356  .102587   5.99147  7.37776  9.21034  10.5966
3     .0717212   .114832    .215795   .351846   7.81473  9.34840  11.3449  12.8381
4     .206990    .297110    .484419   .710721   9.48773  11.1433  13.2767  14.8602
5     .411740    .554300    .831211   1.145476  11.0705  12.8325  15.0863  16.7496
6     .675727    .872085    1.237347  1.63539   12.5916  14.4494  16.8119  18.5476
7     .989265    1.239043   1.68987   2.16735   14.0671  16.0128  18.4753  20.2777
8     1.344419   1.646482   2.17973   2.73264   15.5073  17.5346  20.0902  21.9550
9     1.734926   2.087912   2.70039   3.32511   16.9190  19.0228  21.6660  23.5893
10    2.15585    2.55821    3.24697   3.94030   18.3070  20.4831  23.2093  25.1882
11    2.60321    3.05347    3.81575   4.57481   19.6751  21.9200  24.7250  26.7569
12    3.07382    3.57056    4.40379   5.22603   21.0261  23.3367  26.2170  28.2995
13    3.56503    4.10691    5.00874   5.89186   22.3621  24.7356  27.6883  29.8194
14    4.07468    4.66043    5.62872   6.57063   23.6848  26.1190  29.1413  31.3193
15    4.60094    5.22935    6.26214   7.26094   24.9958  27.4884  30.5779  32.8013
16    5.14224    5.81221    6.90766   7.96164   26.2962  28.8454  31.9999  34.2672
17    5.69724    6.40776    7.56418   8.67176   27.5871  30.1910  33.4087  35.7185
18    6.26481    7.01491    8.23075   9.39046   28.8693  31.5264  34.8053  37.1564
Percentage Points of the χ²_m Distribution (Continued)

That is, values of χ²_{m;α}, where m represents degrees of freedom and

∫₀^{χ²_{m;α}} [ y^{(m/2)−1} exp(−y/2) / (2^{m/2} Γ(m/2)) ] dy = 1 − α.

For m < 100, linear interpolation is adequate. For m > 100, √(2χ²_m) is approximately normally distributed with mean √(2m − 1) and unit variance, so that percentage points may be obtained from Appendix D.
.995 .990 .975 .950 .050 .025 .010 .005
19 6.84398 7.63273 8.90655 10.1170 30.1435 32.8523 36.1908 38.5822
20 7.43386 8.26040 9.59083 10.8508 31.4104 34.1696 37.5662 39.9968
21 8.03366 8.89720 10.28293 11.5913 32.6705 35.4789 38.9321 41.4010
22 8.64272 9.54249 10.9823 12.3380 33.9244 36.7807 40.2894 42.7956
23 9.26042 10.19567 11.6885 13.0905 35.1725 38.0757 41.6384 44.1813
24 9.88623 10.8564 12.4011 13.8484 36.4151 39.3641 42.9798 45.5585
Table of t_m Distributions

TABLE F.1
Values of t_{m;α}, where m equals degrees of freedom and

∫_{−∞}^{t_{m;α}} [ Γ((m + 1)/2) / (√(πm) Γ(m/2)) ] (1 + t²/m)^{−(m+1)/2} dt = 1 − α
APPENDIX G

Table of F Distributions

Values of F_{m₁,m₂;α}, where (m₁, m₂) is the pair of degrees of freedom in F_{m₁,m₂} and

∫₀^{F_{m₁,m₂;α}} [ Γ((m₁ + m₂)/2) / (Γ(m₁/2) Γ(m₂/2)) ] (m₁/m₂)^{m₁/2} x^{(m₁/2)−1} (1 + m₁x/m₂)^{−(m₁+m₂)/2} dx = 1 − α

α = .10

m₂ \ m₁   1       2       3       4       5       6       7       8       9
1    39.864  49.500  53.593  55.833  57.241  58.204  58.906  59.439  59.858
2    8.5263  9.0000  9.1618  9.2434  9.2926  9.3255  9.3491  9.3668  9.3805
3    5.5383  5.4624  5.3908  5.3427  5.3092  5.2847  5.2662  5.2517  5.2400
4    4.5448  4.3246  4.1908  4.1073  4.0506  4.0098  3.9790  3.9549  3.9357
5    4.0604  3.7797  3.6195  3.5202  3.4530  3.4045  3.3679  3.3393  3.3163
6    3.7760  3.4633  3.2888  3.1808  3.1075  3.0546  3.0145  2.9830  2.9577
7    3.5894  3.2574  3.0741  2.9605  2.8833  2.8274  2.7849  2.7516  2.7247
8    3.4579  3.1131  2.9238  2.8064  2.7265  2.6683  2.6241  2.5893  2.5612
9    3.3603  3.0065  2.8129  2.6927  2.6106  2.5509  2.5053  2.4694  2.4403
10   3.2850  2.9245  2.7277  2.6053  2.5216  2.4606  2.4140  2.3772  2.3473
11   3.2252  2.8595  2.6602  2.5362  2.4512  2.3891  2.3416  2.3040  2.2735
12   3.1765  2.8068  2.6055  2.4801  2.3940  2.3310  2.2828  2.2446  2.2135
13   3.1362  2.7632  2.5603  2.4337  2.3467  2.2830  2.2341  2.1953  2.1638
14   3.1022  2.7265  2.5222  2.3947  2.3069  2.2426  2.1931  2.1539  2.1220
15   3.0732  2.6952  2.4898  2.3614  2.2730  2.2081  2.1582  2.1185  2.0862
16   3.0481  2.6682  2.4618  2.3327  2.2438  2.1783  2.1280  2.0880  2.0553
17   3.0262  2.6446  2.4374  2.3077  2.2183  2.1524  2.1017  2.0613  2.0284
18   3.0070  2.6239  2.4160  2.2858  2.1958  2.1296  2.0785  2.0379  2.0047
19   2.9899  2.6056  2.3970  2.2663  2.1760  2.1094  2.0580  2.0171  1.9836
20   2.9747  2.5893  2.3801  2.2489  2.1582  2.0913  2.0397  1.9985  1.9649
21   2.9609  2.5746  2.3649  2.2333  2.1423  2.0751  2.0232  1.9819  1.9480
22   2.9486  2.5613  2.3512  2.2193  2.1279  2.0605  2.0084  1.9668  1.9327
23   2.9374  2.5493  2.3387  2.2065  2.1149  2.0472  1.9949  1.9531  1.9189
24   2.9271  2.5383  2.3274  2.1949  2.1030  2.0351  1.9826  1.9407  1.9063
25   2.9177  2.5283  2.3170  2.1843  2.0922  2.0241  1.9714  1.9292  1.8947
26   2.9091  2.5191  2.3075  2.1745  2.0822  2.0139  1.9610  1.9188  1.8841
27   2.9012  2.5106  2.2987  2.1655  2.0730  2.0045  1.9515  1.9091  1.8743
28   2.8939  2.5028  2.2906  2.1571  2.0645  1.9959  1.9427  1.9001  1.8662
29   2.8871  2.4955  2.2831  2.1494  2.0566  1.9878  1.9345  1.8918  1.8560
30   2.8807  2.4887  2.2761  2.1422  2.0492  1.9803  1.9269  1.8841  1.8498
40   2.8354  2.4404  2.2261  2.0909  1.9968  1.9269  1.8725  1.8289  1.7929
60   2.7914  2.3932  2.1774  2.0410  1.9457  1.8747  1.8194  1.7748  1.7380
120  2.7478  2.3473  2.1300  1.9923  1.8959  1.8238  1.7675  1.7220  1.6843
∞    2.7055  2.3026  2.0838  1.9449  1.8473  1.7741  1.7167  1.6702  1.6315
APPENDIX G (Continued)

α = .10

m₂ \ m₁   10      12      15      20      24      30      40      60      120     ∞
1    60.195  60.705  61.220  61.740  62.002  62.265  62.529  62.794  63.061  63.328
2    9.3916  9.4081  9.4247  9.4413  9.4496  9.4579  9.4663  9.4746  9.4829  9.4913
3    5.2304  5.2156  5.2003  5.1845  5.1764  5.1681  5.1597  5.1512  5.1425  5.1337
4    3.9199  3.8955  3.8689  3.8443  3.8310  3.8174  3.8036  3.7896  3.7753  3.7607
5    3.2974  3.2682  3.2380  3.2067  3.1905  3.1741  3.1573  3.1402  3.1228  3.1050
6    2.9369  2.9047  2.8712  2.8363  2.8183  2.8000  2.7812  2.7620  2.7423  2.7222
7    2.7025  2.6681  2.6322  2.5947  2.5753  2.5555  2.5351  2.5142  2.4928  2.4708
8    2.5380  2.5020  2.4642  2.4246  2.4041  2.3830  2.3614  2.3391  2.3162  2.2926
9    2.4163  2.3789  2.3396  2.2983  2.2768  2.2547  2.2320  2.2085  2.1843  2.1592
10   2.3226  2.2841  2.2435  2.2007  2.1784  2.1554  2.1317  2.1072  2.0818  2.0554
12   2.1878  2.1474  2.1049  2.0597  2.0360  2.0115  1.9861  1.9597  1.9323  1.9036
13   2.1376  2.0966  2.0532  2.0070  1.9827  1.9576  1.9315  1.9043  1.8759  1.8462
14   2.0954  2.0537  2.0095  1.9625  1.9377  1.9119  1.8852  1.8572  1.8280  1.7973
15   2.0593  2.0171  1.9722  1.9243  1.8990  1.8728  1.8454  1.8168  1.7867  1.7551
16   2.0281  1.9854  1.9399  1.8913  1.8656  1.8388  1.8108  1.7816  1.7507  1.7182
17   2.0009  1.9577  1.9117  1.8624  1.8362  1.8090  1.7805  1.7506  1.7191  1.6856
18   1.9770  1.9333  1.8868  1.8368  1.8103  1.7827  1.7537  1.7232  1.6910  1.6567
19   1.9557  1.9117  1.8647  1.8142  1.7873  1.7592  1.7298  1.6988  1.6659  1.6308
20   1.9367  1.8924  1.8449  1.7938  1.7667  1.7382  1.7083  1.6768  1.6433  1.6074
21   1.9197  1.8750  1.8272  1.7756  1.7481  1.7193  1.6890  1.6569  1.6228  1.5862
22   1.9043  1.8593  1.8111  1.7590  1.7312  1.7021  1.6714  1.6389  1.6042  1.5668
23   1.8903  1.8450  1.7964  1.7439  1.7159  1.6864  1.6554  1.6224  1.5871  1.5490
24   1.8775  1.8319  1.7831  1.7302  1.7019  1.6721  1.6407  1.6073  1.5715  1.5327
25   1.8658  1.8200  1.7708  1.7175  1.6890  1.6589  1.6272  1.5934  1.5570  1.5176
26   1.8550  1.8090  1.7596  1.7059  1.6771  1.6468  1.6147  1.5805  1.5437  1.5036
27   1.8451  1.7989  1.7492  1.6951  1.6662  1.6356  1.6032  1.5686  1.5313  1.4906
28   1.8359  1.7895  1.7395  1.6852  1.6560  1.6252  1.5925  1.5575  1.5198  1.4784
29   1.8274  1.7808  1.7306  1.6759  1.6465  1.6155  1.5825  1.5472  1.5090  1.4670
30   1.8195  1.7727  1.7223  1.6673  1.6377  1.6065  1.5732  1.5376  1.4989  1.4564
40   1.7627  1.7146  1.6624  1.6052  1.5741  1.5411  1.5056  1.4672  1.4248  1.3769
60   1.7070  1.6574  1.6034  1.5435  1.5107  1.4755  1.4373  1.3952  1.3476  1.2915
120  1.6524  1.6012  1.5450  1.4821  1.4472  1.4094  1.3676  1.3203  1.2646  1.1926
APPENDIX G (Continued)

α = .05

m₂ \ m₁   1       2       3       4       5       6       7       8       9
1    161.45  199.50  215.71  224.58  230.16  233.99  236.77  238.88  240.54
2    18.513  19.000  19.164  19.247  19.296  19.330  19.353  19.371  19.385
3    10.128  9.5521  9.2766  9.1172  9.0135  8.9406  8.8868  8.8452  8.8123
4    7.7086  6.9443  6.5914  6.3883  6.2560  6.1631  6.0942  6.0410  5.9988
5    6.6079  5.7861  5.4095  5.1922  5.0503  4.9503  4.8759  4.8183  4.7725
6    5.9874  5.1433  4.7571  4.5337  4.3874  4.2839  4.2066  4.1468  4.0990
7    5.5914  4.7374  4.3468  4.1203  3.9715  3.8660  3.7870  3.7257  3.6767
8    5.3177  4.4590  4.0662  3.8378  3.6875  3.5806  3.5005  3.4381  3.3881
9    5.1174  4.2565  3.8626  3.6331  3.4817  3.3738  3.2927  3.2296  3.1789
10   4.9646  4.1028  3.7083  3.4780  3.3258  3.2172  3.1355  3.0717  3.0204
11   4.8443  3.9823  3.5874  3.3567  3.2039  3.0946  3.0123  2.9480  2.8962
12   4.7472  3.8853  3.4903  3.2592  3.1059  2.9961  2.9134  2.8486  2.7964
13   4.6672  3.8056  3.4105  3.1791  3.0254  2.9153  2.8321  2.7669  2.7144
14   4.6001  3.7389  3.3439  3.1122  2.9582  2.8477  2.7642  2.6987  2.6458
15   4.5431  3.6823  3.2874  3.0556  2.9013  2.7905  2.7066  2.6408  2.5876
16   4.4940  3.6337  3.2389  3.0069  2.8524  2.7413  2.6572  2.5911  2.5377
17   4.4513  3.5915  3.1968  2.9647  2.8100  2.6987  2.6143  2.5480  2.4943
18   4.4139  3.5546  3.1599  2.9277  2.7729  2.6613  2.5767  2.5102  2.4563
19   4.3808  3.5219  3.1274  2.8951  2.7401  2.6283  2.5435  2.4768  2.4227
20   4.3513  3.4928  3.0984  2.8661  2.7109  2.5990  2.5140  2.4471  2.3928
21   4.3246  3.4668  3.0725  2.8401  2.6848  2.5727  2.4876  2.4205  2.3661
22   4.3009  3.4434  3.0491  2.8167  2.6613  2.5491  2.4638  2.3965  2.3419
23   4.2793  3.4221  3.0280  2.7955  2.6400  2.5277  2.4422  2.3748  2.3201
24   4.2597  3.4028  3.0088  2.7763  2.6207  2.5082  2.4226  2.3551  2.3002
25   4.2417  3.3852  2.9912  2.7587  2.6030  2.4904  2.4047  2.3371  2.2821
26   4.2252  3.3690  2.9751  2.7426  2.5868  2.4741  2.3883  2.3205  2.2655
27   4.2100  3.3541  2.9604  2.7278  2.5719  2.4591  2.3732  2.3053  2.2501
28   4.1960  3.3404  2.9467  2.7141  2.5581  2.4453  2.3593  2.2913  2.2360
29   4.1830  3.3277  2.9340  2.7014  2.5454  2.4324  2.3463  2.2782  2.2229
30   4.1709  3.3158  2.9223  2.6896  2.5336  2.4205  2.3343  2.2662  2.2107
40   4.0848  3.2317  2.8387  2.6060  2.4495  2.3359  2.2490  2.1802  2.1240
60   4.0012  3.1504  2.7581  2.5252  2.3683  2.2540  2.1665  2.0970  2.0401
120  3.9201  3.0718  2.6802  2.4472  2.2900  2.1750  2.0867  2.0164  1.9588
∞    3.8415  2.9957  2.6049  2.3719  2.2141  2.0986  2.0096  1.9384  1.8799
APPENDIX G (Continued)

α = .05

m₂ \ m₁   10      12      15      20      24      30      40      60      120     ∞
1 2 4 1 .8 8 2 4 3 .91 2 4 5 .9 5 2 4 8 .01 2 4 9 .0 5 2 5 0 .0 9 2 5 1 .1 4 2 5 2 .2 0 2 5 3 .2 5 2 5 4 .3 2
2 1 9 .3 9 6 1 9 .4 1 3 1 9 .4 2 9 1 9 .4 4 6 1 9 .4 5 4 1 9 .4 6 2 1 9 .471 1 9 .4 7 9 1 9 .4 8 7 1 9 .4 9 6 ;
3 8 .7 8 5 5 8 .7 4 4 6 8 .7 0 2 9 8 .6 6 0 2 8 .6 3 8 5 8 .6 1 6 6 8 .5 9 4 4 8 .5 7 2 0 8 .5 4 9 4 8 .5 2 6 5 |
4 5 .9 6 4 4 5 .9 1 1 7 5 .8 5 7 8 5 .8 0 2 5 5 .7 7 4 4 5 .7 4 5 9 5 .7 1 7 0 5 .6 8 7 8 5 .6 5 8 1 5 .6 2 8 1 1
5 4 .7 3 5 1 4 .6 7 7 7 4 .6 1 8 8 4 .5 5 8 1 4 .5 2 7 2 4 .4 9 5 7 4 .4 6 3 8 4 .4 3 1 4 4 .3 9 8 4 4 .3 6 5 0 ;
6 4 .0 6 0 0 3 .9 9 9 9 3 .9 3 8 1 3 .8 7 4 2 3 .8 4 1 5 3 .8 0 8 2 3 .7 7 4 3 3 .7 3 9 8 3 .7 0 4 7 3 .6 6 8 8
7 3 .6 3 6 5 3 .5 7 4 7 3 .5 1 0 8 3 .4 4 4 5 3 .4 1 0 5 3 .3 7 5 8 3 .3 4 0 4 3 .3 0 4 3 3 .2 6 7 4 3 .2 2 9 8
8 3 .3 4 7 2 3 .2 8 4 0 3 .2 1 8 4 3 .1 5 0 3 3 .1 1 5 2 3 .0 7 9 4 3 .0 4 2 8 3 .0 0 5 3 2 .9 6 6 9 2 .9 2 7 6
9 3 .1 3 7 3 3 .0 7 2 9 3 .0 0 6 1 2 .9 3 6 5 2 .9 0 0 5 2 .8 6 3 7 2 .8 2 5 9 2 .7 8 7 2 2 .7 4 7 5 2 .7 0 6 7 : 1
10 2 .9 7 8 2 2 .9 1 3 0 2 .8 4 5 0 2 .7 7 4 0 2 .7 3 7 2 2 .6 9 9 6 2 .6 6 0 9 2 ,6 2 1 1 2 .5 8 0 1 2 .5 3 7 9
11 2 .8 5 3 6 2 .7 8 7 6 2 .7 1 8 6 2 .6 4 6 4 2 .6 0 9 0 2 .5 7 0 5 2 .5 3 0 9 2 .4 9 0 1 2 .4 4 8 0 2 .4 0 4 5
12 2 .7 5 3 4 2.6866 2 .6 1 6 9 2 .5 4 3 6 2 .5 0 5 5 2 .4 6 6 3 2 .4 2 5 9 2 .3 8 4 2 2 .3 4 1 0 2 .2 9 6 2 ;
13 2 .6 7 1 0 2 .6 0 3 7 2 .5331 2 .4 5 8 9 2 .4 2 0 2 2 .3 8 0 3 2 .3 3 9 2 2 .2 9 6 6 2 .2 5 2 4 2 .2 0 6 4
14 2 .6021 2 .5 3 4 2 2 .4 6 3 0 2 .3 8 7 9 2 .3 4 8 7 2 .3 0 8 2 2 .2 6 6 4 2 .2 2 3 0 2 .1 7 7 8 2 .1 3 0 7 :
15 2 .5 4 3 7 2 .4 7 5 3 2 .4 0 3 5 2 .3 2 7 5 2 .2 8 7 8 2 .2 4 6 8 2 .2 0 4 3 2 .1 6 0 1 2 .1 1 4 1 2 .0 6 5 8 .
16 2 .4 9 3 5 2 .4 2 4 7 2 .3 5 2 2 2 .2 7 5 6 2 .2 3 5 4 2 .1 9 3 8 2 .1 5 0 7 2 .1 0 5 8 2 .0 5 8 9 2 .0 0 9 6
17 2 .4 4 9 9 2 .3 8 0 7 2 .3 0 7 7 2 .2 3 0 4 2 .1 8 9 8 2 .1 4 7 7 2 .1 0 4 0 2 .0 5 8 4 2 .0 1 0 7 1 .9 6 0 4 |
19 2 .3 7 7 9 2 .3 0 8 0 2.2341 2 .1 5 5 5 2 .1141 2 .0 7 1 2 2 .0 2 6 4 1 .9 7 9 6 1 .9 3 0 2 1 .8 7 8 0 ;
20 2 .3 4 7 9 2 .2 7 7 6 2 .2 0 3 3 2 .1 2 4 2 2 .0 8 2 5 2 .0391 1 .9 9 3 8 1 .9 4 6 4 1 .8 9 6 3 1 .8 4 3 2
21 2 .3 2 1 0 2 .2 5 0 4 2 .1 7 5 7 2 .0 9 6 0 2 .0 5 4 0 2.0102 1 .9 6 4 5 1 .9 1 6 5 1 .8 6 5 7 1 .8 1 1 7 i
22 2 .2 9 6 7 2 .2 2 5 8 2 .1 5 0 8 2 .0 7 0 7 2 .0 2 8 3 1 .9 8 4 2 1 .9 3 8 0 1 .8 8 9 5 1 .8 3 8 0 1 .7831
23 2 .2 7 4 7 2 .2 0 3 6 2 .1 2 8 2 2 .0 4 7 6 2 .0 0 5 0 1 .9 6 0 5 1 .9 1 3 9 1 .8 6 4 9 1 .8 1 2 8 1 .7 5 7 0 i
24 2 .2 5 4 7 2 .1 8 3 4 2 .1 0 7 7 2 .0 2 6 7 1 .9 8 3 8 1 .9 3 9 0 1 .8 9 2 0 1 .8 4 2 4 1 .7 8 9 7 1 .7 3 3 1 ;
25 2 .2 3 6 5 2 .1 6 4 9 2 .0 8 8 9 2 .0 0 7 5 1 .9 6 4 3 1 .9 1 9 2 1 .8 7 1 8 1 .8 2 1 7 1 .7 6 8 4 1 .7 1 1 0
26 2 .2 1 9 7 2 .1 4 7 9 2 .0 7 1 6 1 .9 8 9 8 1 .9 4 6 4 1 .9 0 1 0 1 .8 5 3 3 1 .8 0 2 7 1 .7 4 8 8 1 .6 9 0 6 |
27 2 .2 0 4 3 2 .1 3 2 3 2 .0 5 5 8 1 .9 7 3 6 1 .9 2 9 9 1 .8 8 4 2 1 .8 3 6 1 1 .7 8 5 1 1 .7 3 0 7 1 .6 7 1 7
28 2 .1 9 0 0 2 .1 1 7 9 2 .0411 1 .9 5 8 6 1 .9 1 4 7 1 .8 6 8 7 1 .8 2 0 3 1 .7 6 8 9 1 .7 1 3 8 1 .6 5 4 1 |
29 2 .1 7 6 8 2 .1 0 4 5 2 .0 2 7 5 1 .9 4 4 6 1 .9 0 0 5 1 .8 5 4 3 1 .8 0 5 5 1 .7 5 3 7 1.6981 1 .6 3 7 7
30 2 .1 6 4 6 2 .0921 2 .0 1 4 8 1 .9 3 1 7 1 .8 8 7 4 1 .8 4 0 9 1 .7 9 1 8 1 .7 3 9 6 1 .6 8 3 5 1 .6 2 2 3 1
40 2 .0 7 7 2 2 .0 0 3 5 1 .9 2 4 5 1 .8 3 8 9 1 .7 9 2 9 1 .7 4 4 4 1 .6 9 2 8 1 .6 3 7 3 1 .5 7 6 6 1 .5 0 8 9
60 1 .9 9 2 6 1 .9 1 7 4 1 .8 3 6 4 1 .7 4 8 0 1.7001 1.6491 1 .5 9 4 3 1 .5 3 4 3 1 .4 6 7 3 1 .3 8 9 3 i
120 1 .9 1 0 5 1 .8 3 3 7 1 .7 5 0 5 1 .6 5 8 7 1 .6 0 8 4 1 .5 5 4 3 1 .4 9 5 2 1 .4 2 9 0 1 .3 5 1 9 1 .2 5 3 9 |
00 1 .8 3 0 7 1 .7 5 2 2 1 .6 6 6 4 1 .5 7 0 5 1 .5 1 7 3 1.4591 1 .3 9 4 0 1 .3 1 8 0 1 .2 2 1 4 1.0000
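Each entry above is an upper percentage point: for α = .05, the value in row m2 and column m1 is the point exceeded with probability .05 by an F-distributed variable with (m1, m2) degrees of freedom. The following cross-check is an editorial addition, not part of the original appendix; it assumes the modern scipy library, whose f.ppf routine evaluates the inverse F distribution function (percent point function).

    # Cross-check a few tabulated alpha = .05 critical values against the
    # inverse F distribution function; scipy is assumed, not used in the text.
    from scipy.stats import f

    alpha = 0.05
    # (m1, m2, value printed in the table above)
    for m1, m2, tabulated in [(10, 30, 2.1646), (24, 12, 2.5055), (60, 120, 1.4290)]:
        value = f.ppf(1 - alpha, dfn=m1, dfd=m2)   # upper alpha point of F(m1, m2)
        print(f"F_.05({m1}, {m2}) = {value:.4f}   table: {tabulated:.4f}")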
APPENDIX G (Continued)

α = .025

m2 \ m1   1       2       3       4       5       6       7       8       9
 1    647.79  799.50  864.16  899.58  921.85  937.11  948.22  956.66  963.28
 2    38.506  39.000  39.165  39.248  39.298  39.331  39.355  39.373  39.387
 3    17.443  16.044  15.439  15.101  14.885  14.735  14.624  14.540  14.473
 4    12.218  10.649  9.9792  9.6045  9.3645  9.1973  9.0741  8.9796  8.9047
 5    10.007  8.4336  7.7636  7.3879  7.1464  6.9777  6.8531  6.7572  6.6810
 6    8.8131  7.2598  6.5988  6.2272  5.9876  5.8198  5.6955  5.5996  5.5234
 7    8.0727  6.5415  5.8898  5.5226  5.2852  5.1186  4.9949  4.8994  4.8232
 8    7.5709  6.0595  5.4160  5.0526  4.8173  4.6517  4.5286  4.4332  4.3572
 9    7.2093  5.7147  5.0781  4.7181  4.4844  4.3197  4.1971  4.1020  4.0260
10    6.9367  5.4564  4.8256  4.4683  4.2361  4.0721  3.9498  3.8549  3.7790
11    6.7241  5.2559  4.6300  4.2751  4.0440  3.8807  3.7586  3.6638  3.5879
12    6.5538  5.0959  4.4742  4.1212  3.8911  3.7283  3.6065  3.5118  3.4358
13    6.4143  4.9653  4.3472  3.9959  3.7667  3.6043  3.4827  3.3880  3.3120
14    6.2979  4.8567  4.2417  3.8919  3.6634  3.5014  3.3799  3.2853  3.2093
15    6.1995  4.7650  4.1528  3.8043  3.5764  3.4147  3.2934  3.1987  3.1227
16    6.1151  4.6867  4.0768  3.7294  3.5021  3.3406  3.2194  3.1248  3.0488
17    6.0420  4.6189  4.0112  3.6648  3.4379  3.2767  3.1556  3.0610  2.9849
18    5.9781  4.5597  3.9539  3.6083  3.3820  3.2209  3.0999  3.0053  2.9291
19    5.9216  4.5075  3.9034  3.5587  3.3327  3.1718  3.0509  2.9563  2.8800
20    5.8715  4.4613  3.8587  3.5147  3.2891  3.1283  3.0074  2.9128  2.8365
21    5.8266  4.4199  3.8188  3.4754  3.2501  3.0895  2.9686  2.8740  2.7977
22    5.7863  4.3828  3.7829  3.4401  3.2151  3.0546  2.9338  2.8392  2.7628
23    5.7498  4.3492  3.7505  3.4083  3.1835  3.0232  2.9024  2.8077  2.7313
24    5.7167  4.3187  3.7211  3.3794  3.1548  2.9946  2.8738  2.7791  2.7027
25    5.6864  4.2909  3.6943  3.3530  3.1287  2.9685  2.8478  2.7531  2.6766
26    5.6586  4.2655  3.6697  3.3289  3.1048  2.9447  2.8240  2.7293  2.6528
27    5.6331  4.2421  3.6472  3.3067  3.0828  2.9228  2.8021  2.7074  2.6309
28    5.6096  4.2205  3.6264  3.2863  3.0625  2.9027  2.7820  2.6872  2.6106
29    5.5878  4.2006  3.6072  3.2674  3.0438  2.8840  2.7633  2.6686  2.5919
30    5.5675  4.1821  3.5894  3.2499  3.0265  2.8667  2.7460  2.6513  2.5746
40    5.4239  4.0510  3.4633  3.1261  2.9037  2.7444  2.6238  2.5289  2.4519
60    5.2857  3.9253  3.3425  3.0077  2.7863  2.6274  2.5068  2.4117  2.3344
120   5.1524  3.8046  3.2270  2.8943  2.6740  2.5154  2.3948  2.2994  2.2217
∞     5.0239  3.6889  3.1161  2.7858  2.5665  2.4082  2.2875  2.1918  2.1136
APPENDIX G (Continued)

α = .025

m2 \ m1   10      12      15      20      24      30      40      60      120     ∞
 1    968.63  976.71  984.87  993.10  997.25  1001.4  1005.6  1009.8  1014.0  1018.3
 2    39.398  39.415  39.431  39.448  39.456  39.465  39.473  39.481  39.490  39.498
 3    14.419  14.337  14.253  14.167  14.124  14.081  14.037  13.992  13.947  13.902
 4    8.8439  8.7512  8.6565  8.5599  8.5109  8.4613  8.4111  8.3604  8.3092  8.2573
 5    6.6192  6.5246  6.4277  6.3285  6.2780  6.2269  6.1751  6.1225  6.0693  6.0153
 6    5.4613  5.3662  5.2687  5.1684  5.1172  5.0652  5.0125  4.9589  4.9045  4.8491
 7    4.7611  4.6658  4.5678  4.4667  4.4150  4.3624  4.3089  4.2544  4.1989  4.1423
 8    4.2951  4.1997  4.1012  3.9995  3.9472  3.8940  3.8398  3.7844  3.7279  3.6702
 9    3.9639  3.8682  3.7694  3.6669  3.6142  3.5604  3.5055  3.4493  3.3918  3.3329
10    3.7168  3.6209  3.5217  3.4186  3.3654  3.3110  3.2554  3.1984  3.1399  3.0798
11    3.5257  3.4296  3.3299  3.2261  3.1725  3.1176  3.0613  3.0035  2.9441  2.8828
12    3.3736  3.2773  3.1772  3.0728  3.0187  2.9633  2.9063  2.8478  2.7874  2.7249
13    3.2497  3.1532  3.0527  2.9477  2.8932  2.8373  2.7797  2.7204  2.6590  2.5955
14    3.1469  3.0501  2.9493  2.8437  2.7888  2.7324  2.6742  2.6142  2.5519  2.4872
15    3.0602  2.9633  2.8621  2.7559  2.7006  2.6437  2.5850  2.5242  2.4611  2.3953
16    2.9862  2.8890  2.7875  2.6808  2.6252  2.5678  2.5085  2.4471  2.3831  2.3163
17    2.9222  2.8249  2.7230  2.6158  2.5598  2.5021  2.4422  2.3801  2.3153  2.2474
18    2.8664  2.7689  2.6667  2.5590  2.5027  2.4445  2.3842  2.3214  2.2558  2.1869
19    2.8173  2.7196  2.6171  2.5089  2.4523  2.3937  2.3329  2.2695  2.2032  2.1333
20    2.7737  2.6758  2.5731  2.4645  2.4076  2.3486  2.2873  2.2234  2.1562  2.0853
21    2.7348  2.6368  2.5338  2.4247  2.3675  2.3082  2.2465  2.1819  2.1141  2.0422
22    2.6998  2.6017  2.4984  2.3890  2.3315  2.2718  2.2097  2.1446  2.0760  2.0032
23    2.6682  2.5699  2.4665  2.3567  2.2989  2.2389  2.1763  2.1107  2.0415  1.9677
24    2.6396  2.5412  2.4374  2.3273  2.2693  2.2090  2.1460  2.0799  2.0099  1.9353
25    2.6135  2.5149  2.4110  2.3005  2.2422  2.1816  2.1183  2.0517  1.9811  1.9055
26    2.5895  2.4909  2.3867  2.2759  2.2174  2.1565  2.0928  2.0257  1.9545  1.8781
27    2.5676  2.4688  2.3644  2.2533  2.1946  2.1334  2.0693  2.0018  1.9299  1.8527
28    2.5473  2.4484  2.3438  2.2324  2.1735  2.1121  2.0477  1.9796  1.9072  1.8291
29    2.5286  2.4295  2.3248  2.2131  2.1540  2.0923  2.0276  1.9591  1.8861  1.8072
30    2.5112  2.4120  2.3072  2.1952  2.1359  2.0739  2.0089  1.9400  1.8664  1.7867
40    2.3882  2.2882  2.1819  2.0677  2.0069  1.9429  1.8752  1.8028  1.7242  1.6371
60    2.2702  2.1692  2.0613  1.9445  1.8817  1.8152  1.7440  1.6668  1.5810  1.4822
120   2.1570  2.0548  1.9450  1.8249  1.7597  1.6899  1.6141  1.5299  1.4327  1.3104
∞     2.0483  1.9447  1.8326  1.7085  1.6402  1.5660  1.4835  1.3883  1.2684  1.0000
APPENDIX G (Continued)

α = .01

m2 \ m1   1       2       3       4       5       6       7       8       9
 1    4052.2  4999.5  5403.3  5624.6  5763.7  5859.0  5928.3  5981.6  6022.5
 2    98.503  99.000  99.166  99.249  99.299  99.332  99.356  99.374  99.388
 3    34.116  30.817  29.457  28.710  28.237  27.911  27.672  27.489  27.345
 4    21.198  18.000  16.694  15.977  15.522  15.207  14.976  14.799  14.659
 5    16.258  13.274  12.060  11.392  10.967  10.672  10.456  10.289  10.158
 6    13.745  10.925  9.7795  9.1483  8.7459  8.4661  8.2600  8.1016  7.9761
 7    12.246  9.5466  8.4513  7.8467  7.4604  7.1914  6.9928  6.8401  6.7188
 8    11.259  8.6491  7.5910  7.0060  6.6318  6.3707  6.1776  6.0289  5.9106
 9    10.561  8.0215  6.9919  6.4221  6.0569  5.8018  5.6129  5.4671  5.3511
10    10.044  7.5594  6.5523  5.9943  5.6363  5.3858  5.2001  5.0567  4.9424
11    9.6460  7.2057  6.2167  5.6683  5.3160  5.0692  4.8861  4.7445  4.6315
12    9.3302  6.9266  5.9526  5.4119  5.0643  4.8206  4.6395  4.4994  4.3875
13    9.0738  6.7010  5.7394  5.2053  4.8616  4.6204  4.4410  4.3021  4.1911
14    8.8616  6.5149  5.5639  5.0354  4.6950  4.4558  4.2779  4.1399  4.0297
15    8.6831  6.3589  5.4170  4.8932  4.5556  4.3183  4.1415  4.0045  3.8948
16    8.5310  6.2262  5.2922  4.7726  4.4374  4.2016  4.0259  3.8896  3.7804
17    8.3997  6.1121  5.1850  4.6690  4.3359  4.1015  3.9267  3.7910  3.6822
18    8.2854  6.0129  5.0919  4.5790  4.2479  4.0146  3.8406  3.7054  3.5971
19    8.1850  5.9259  5.0103  4.5003  4.1708  3.9386  3.7653  3.6305  3.5225
20    8.0960  5.8489  4.9382  4.4307  4.1027  3.8714  3.6987  3.5644  3.4567
21    8.0166  5.7804  4.8740  4.3688  4.0421  3.8117  3.6396  3.5056  3.3981
22    7.9454  5.7190  4.8166  4.3134  3.9880  3.7583  3.5867  3.4530  3.3458
23    7.8811  5.6637  4.7649  4.2635  3.9392  3.7102  3.5390  3.4057  3.2986
24    7.8229  5.6136  4.7181  4.2184  3.8951  3.6667  3.4959  3.3629  3.2560
25    7.7698  5.5680  4.6755  4.1774  3.8550  3.6272  3.4568  3.3239  3.2172
26    7.7213  5.5263  4.6366  4.1400  3.8183  3.5911  3.4210  3.2884  3.1818
27    7.6767  5.4881  4.6009  4.1056  3.7848  3.5580  3.3882  3.2558  3.1494
28    7.6356  5.4529  4.5681  4.0740  3.7539  3.5276  3.3581  3.2259  3.1195
29    7.5976  5.4205  4.5378  4.0449  3.7254  3.4995  3.3302  3.1982  3.0920
30    7.5625  5.3904  4.5097  4.0179  3.6990  3.4735  3.3045  3.1726  3.0665
40    7.3141  5.1785  4.3126  3.8283  3.5138  3.2910  3.1238  2.9930  2.8876
60    7.0771  4.9774  4.1259  3.6491  3.3389  3.1187  2.9530  2.8233  2.7185
120   6.8510  4.7865  3.9493  3.4796  3.1735  2.9559  2.7918  2.6629  2.5586
∞     6.6349  4.6052  3.7816  3.3192  3.0173  2.8020  2.6393  2.5113  2.4073
APPENDIX G (Continued)

α = .01

m2 \ m1   10      12      15      20      24      30      40      60      120     ∞
 1    6055.8  6106.3  6157.3  6208.7  6234.6  6260.7  6286.8  6313.0  6339.4  6366.0
 2    99.399  99.416  99.432  99.449  99.458  99.466  99.474  99.483  99.491  99.501
 3    27.229  27.052  26.872  26.690  26.598  26.505  26.411  26.316  26.221  26.125
 4    14.546  14.374  14.198  14.020  13.929  13.838  13.745  13.652  13.558  13.463
 5    10.051  9.8883  9.7222  9.5527  9.4665  9.3793  9.2912  9.2020  9.1118  9.0204
 6    7.8741  7.7183  7.5590  7.3958  7.3127  7.2285  7.1432  7.0568  6.9690  6.8801
 7    6.6201  6.4691  6.3143  6.1554  6.0743  5.9921  5.9084  5.8236  5.7372  5.6495
 8    5.8143  5.6668  5.5151  5.3591  5.2793  5.1981  5.1156  5.0316  4.9460  4.8588
 9    5.2565  5.1114  4.9621  4.8080  4.7290  4.6486  4.5667  4.4831  4.3978  4.3105
10    4.8492  4.7059  4.5582  4.4054  4.3269  4.2469  4.1653  4.0819  3.9965  3.9090
11    4.5393  4.3974  4.2509  4.0990  4.0209  3.9411  3.8596  3.7761  3.6904  3.6025
12    4.2961  4.1553  4.0096  3.8584  3.7805  3.7008  3.6192  3.5355  3.4494  3.3608
13    4.1003  3.9603  3.8154  3.6646  3.5868  3.5070  3.4253  3.3413  3.2548  3.1654
14    3.9394  3.8001  3.6557  3.5052  3.4274  3.3476  3.2656  3.1813  3.0942  3.0040
15    3.8049  3.6662  3.5222  3.3719  3.2940  3.2141  3.1319  3.0471  2.9595  2.8684
16    3.6909  3.5527  3.4089  3.2588  3.1808  3.1007  3.0182  2.9330  2.8447  2.7528
17    3.5931  3.4552  3.3117  3.1615  3.0835  3.0032  2.9205  2.8348  2.7459  2.6530
18    3.5082  3.3706  3.2273  3.0771  2.9990  2.9185  2.8354  2.7493  2.6597  2.5660
19    3.4338  3.2965  3.1533  3.0031  2.9249  2.8442  2.7608  2.6742  2.5839  2.4893
20    3.3682  3.2311  3.0880  2.9377  2.8594  2.7785  2.6947  2.6077  2.5168  2.4212
21    3.3098  3.1729  3.0299  2.8796  2.8011  2.7200  2.6359  2.5484  2.4568  2.3603
22    3.2576  3.1209  2.9780  2.8274  2.7488  2.6675  2.5831  2.4951  2.4029  2.3055
23    3.2106  3.0740  2.9311  2.7805  2.7017  2.6202  2.5355  2.4471  2.3542  2.2559
24    3.1681  3.0316  2.8887  2.7380  2.6591  2.5773  2.4923  2.4035  2.3099  2.2107
25    3.1294  2.9931  2.8502  2.6993  2.6203  2.5383  2.4530  2.3637  2.2695  2.1694
26    3.0941  2.9579  2.8150  2.6640  2.5848  2.5026  2.4170  2.3273  2.2325  2.1315
27    3.0618  2.9256  2.7827  2.6316  2.5522  2.4699  2.3840  2.2938  2.1984  2.0965
28    3.0320  2.8959  2.7530  2.6017  2.5223  2.4397  2.3535  2.2629  2.1670  2.0642
29    3.0045  2.8685  2.7256  2.5742  2.4946  2.4118  2.3253  2.2344  2.1378  2.0342
30    2.9791  2.8431  2.7002  2.5487  2.4689  2.3860  2.2992  2.2079  2.1107  2.0062
40    2.8005  2.6648  2.5216  2.3689  2.2880  2.2034  2.1142  2.0194  1.9172  1.8047
60    2.6318  2.4961  2.3523  2.1978  2.1154  2.0285  1.9360  1.8363  1.7263  1.6006
120   2.4721  2.3363  2.1915  2.0346  1.9500  1.8600  1.7628  1.6557  1.5330  1.3805
∞     2.3209  2.1848  2.0385  1.8783  1.7908  1.6964  1.5923  1.4730  1.3246  1.0000
APPENDIX G (Continued)

α = .005

m2 \ m1   1       2       3       4       5       6       7       8       9
 5    22.785  18.314  16.530  15.556  14.940  14.513  14.200  13.961  13.772
 6    18.635  14.544  12.917  12.028  11.464  11.073  10.786  10.566  10.391
 7    16.236  12.404  10.882  10.050  9.5221  9.1554  8.8854  8.6781  8.5138
 8    14.688  11.042  9.5965  8.8051  8.3018  7.9520  7.6942  7.4960  7.3386
 9    13.614  10.107  8.7171  7.9559  7.4711  7.1338  6.8849  6.6933  6.5411
10    12.826  9.4270  8.0807  7.3428  6.8723  6.5446  6.3025  6.1159  5.9676
11    12.226  8.9122  7.6004  6.8809  6.4217  6.1015  5.8648  5.6821  5.5368
12    11.754  8.5096  7.2258  6.5211  6.0711  5.7570  5.5245  5.3451  5.2021
13    11.374  8.1865  6.9257  6.2335  5.7910  5.4819  5.2529  5.0761  4.9351
14    11.060  7.9217  6.6803  5.9984  5.5623  5.2574  5.0313  4.8566  4.7173
15    10.798  7.7008  6.4760  5.8029  5.3721  5.0708  4.8473  4.6743  4.5364
16    10.575  7.5138  6.3034  5.6378  5.2117  4.9134  4.6920  4.5207  4.3838
17    10.384  7.3536  6.1556  5.4967  5.0746  4.7789  4.5594  4.3893  4.2535
18    10.218  7.2148  6.0277  5.3746  4.9560  4.6628  4.4448  4.2759  4.1410
19    10.073  7.0935  5.9161  5.2681  4.8526  4.5614  4.3448  4.1770  4.0428
20    9.9439  6.9865  5.8177  5.1743  4.7616  4.4721  4.2569  4.0900  3.9564
21    9.8295  6.8914  5.7304  5.0911  4.6808  4.3931  4.1789  4.0128  3.8799
22    9.7271  6.8064  5.6524  5.0168  4.6088  4.3225  4.1094  3.9440  3.8116
23    9.6348  6.7300  5.5823  4.9500  4.5441  4.2591  4.0469  3.8822  3.7502
24    9.5513  6.6610  5.5190  4.8898  4.4857  4.2019  3.9905  3.8264  3.6949
25    9.4753  6.5982  5.4615  4.8351  4.4327  4.1500  3.9394  3.7758  3.6447
26    9.4059  6.5409  5.4091  4.7852  4.3844  4.1027  3.8928  3.7297  3.5989
27    9.3423  6.4885  5.3611  4.7396  4.3402  4.0594  3.8501  3.6875  3.5571
28    9.2838  6.4403  5.3170  4.6977  4.2996  4.0197  3.8110  3.6487  3.5186
29    9.2297  6.3958  5.2764  4.6591  4.2622  3.9830  3.7749  3.6130  3.4832
30    9.1797  6.3547  5.2388  4.6233  4.2276  3.9492  3.7416  3.5801  3.4505
40    8.8278  6.0664  4.9759  4.3738  3.9860  3.7129  3.5088  3.3498  3.2220
60    8.4946  5.7950  4.7290  4.1399  3.7600  3.4918  3.2911  3.1344  3.0083
120   8.1790  5.5393  4.4973  3.9207  3.5482  3.2849  3.0874  2.9330  2.8083
∞     7.8794  5.2983  4.2794  3.7151  3.3499  3.0913  2.8968  2.7444  2.6210
APPENDIX G (Continued)

α = .005

m2 \ m1   10      12      15      20      24      30      40      60      120     ∞
 1    24224   24426   24630   24836   24940   25044   25148   25253   25359   25465
 2    199.40  199.42  199.43  199.45  199.46  199.47  199.47  199.48  199.49  199.51
 3    43.686  43.387  43.085  42.778  42.622  42.466  42.308  42.149  41.989  41.829
 4    20.967  20.705  20.438  20.167  20.030  19.892  19.752  19.611  19.468  19.325
 5    13.618  13.384  13.146  12.903  12.780  12.656  12.530  12.402  12.274  12.144
 6    10.250  10.034  9.8140  9.5888  9.4741  9.3583  9.2408  9.1219  9.0015  8.8793
 7    8.3803  8.1764  7.9678  7.7540  7.6450  7.5345  7.4225  7.3088  7.1933  7.0760
 8    7.2107  7.0149  6.8143  6.6082  6.5029  6.3961  6.2875  6.1772  6.0649  5.9505
 9    6.4171  6.2274  6.0325  5.8318  5.7292  5.6248  5.5186  5.4104  5.3001  5.1875
10    5.8467  5.6613  5.4707  5.2740  5.1732  5.0705  4.9659  4.8592  4.7501  4.6385
11    5.4182  5.2363  5.0489  4.8552  4.7557  4.6543  4.5508  4.4450  4.3367  4.2256
12    5.0855  4.9063  4.7214  4.5299  4.4315  4.3309  4.2282  4.1229  4.0149  3.9039
13    4.8199  4.6429  4.4600  4.2703  4.1726  4.0727  3.9704  3.8655  3.7577  3.6465
14    4.6034  4.4281  4.2468  4.0585  3.9614  3.8619  3.7600  3.6553  3.5473  3.4359
15    4.4236  4.2498  4.0698  3.8826  3.7859  3.6867  3.5850  3.4803  3.3722  3.2602
16    4.2719  4.0994  3.9205  3.7342  3.6378  3.5388  3.4372  3.3324  3.2240  3.1115
17    4.1423  3.9709  3.7929  3.6073  3.5112  3.4124  3.3107  3.2058  3.0971  2.9839
18    4.0305  3.8599  3.6827  3.4977  3.4017  3.3030  3.2014  3.0962  2.9871  2.8732
19    3.9329  3.7631  3.5866  3.4020  3.3062  3.2075  3.1058  3.0004  2.8908  2.7762
20    3.8470  3.6779  3.5020  3.3178  3.2220  3.1234  3.0215  2.9159  2.8058  2.6904
21    3.7709  3.6024  3.4270  3.2431  3.1474  3.0488  2.9467  2.8408  2.7302  2.6140
22    3.7030  3.5350  3.3600  3.1764  3.0807  2.9821  2.8799  2.7736  2.6625  2.5455
23    3.6420  3.4745  3.2999  3.1165  3.0208  2.9221  2.8198  2.7132  2.6016  2.4837
24    3.5870  3.4199  3.2456  3.0624  2.9667  2.8679  2.7654  2.6585  2.5463  2.4276
25    3.5370  3.3704  3.1963  3.0133  2.9176  2.8187  2.7160  2.6088  2.4960  2.3765
26    3.4916  3.3252  3.1515  2.9695  2.8728  2.7738  2.6709  2.5633  2.4501  2.3297
27    3.4499  3.2839  3.1104  2.9275  2.8318  2.7327  2.6296  2.5217  2.4078  2.2867
28    3.4117  3.2460  3.0727  2.8899  2.7941  2.6949  2.5916  2.4834  2.3689  2.2469
29    3.3765  3.2111  3.0379  2.8551  2.7594  2.6601  2.5565  2.4479  2.3330  2.2102
30    3.3440  3.1787  3.0057  2.8230  2.7272  2.6278  2.5241  2.4151  2.2997  2.1760
40    3.1167  2.9531  2.7811  2.5984  2.5020  2.4015  2.2958  2.1838  2.0635  1.9318
60    2.9042  2.7419  2.5705  2.3872  2.2898  2.1874  2.0789  1.9622  1.8341  1.6885
120   2.7052  2.5439  2.3727  2.1881  2.0890  1.9839  1.8709  1.7469  1.6055  1.4311
∞     2.5188  2.3583  2.1868  1.9998  1.8983  1.7891  1.6691  1.5325  1.3637  1.0000
Source: I. Guttman and S. S. Wilks, Introduction to Engineering Statistics, John Wiley & Sons, New York, 1965, pp. 320-29.
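The tables in this appendix give only upper percentage points. Lower-tail points, which the tables omit, follow from the standard reciprocal property of the F distribution: if X has an F distribution with (m1, m2) degrees of freedom, then 1/X has an F distribution with (m2, m1) degrees of freedom, so the lower-α point of F(m1, m2) equals 1/F_α(m2, m1). A minimal sketch of this relation, again an editorial addition assuming scipy rather than anything used in the text:

    # Recover a lower-tail F point from a tabulated upper-tail point via
    # the reciprocal property; scipy is assumed, not part of the 1988 text.
    from scipy.stats import f

    alpha, m1, m2 = 0.025, 5, 10
    lower_direct = f.ppf(alpha, dfn=m1, dfd=m2)        # lower 2.5% point of F(5, 10)
    upper_swapped = f.ppf(1 - alpha, dfn=m2, dfd=m1)   # tabulated F_.025(10, 5) = 6.6192
    print(lower_direct, 1.0 / upper_swapped)           # both are approximately 0.1511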
APPENDIX H
Percentage Points of
Run Distribution
Laplace transform (two-sided), 451
Lattice-type random variable, 28
Law of large numbers, 93, 94
Leakage, spectral, 573, 582
Levinson recursive algorithm, 599
Likelihood function, 488, 595
  conditional, 595
Likelihood ratio, 346, 350
Limitations of linear estimation, 391
Limiting state probabilities, 285
Limit in mean square, 161
Linear filter, 377
Linear MS estimation, 377-474
Linear regression, 529
Linear system, 215, 216
  causal, 215, 217
  impulse response, 219
  lumped parameter, 215, 216
Linear time invariant system:
  causal, 216
  continuous time, 227
  discrete time, 218
  output autocorrelation function, 221, 223, 229, 230
  output mean-squared value, 234
  output mean value, 221, 222, 228, 229
  output power spectrum, 224, 228, 230
  stable, 219, 220, 227
  stationarity of the output, 222, 229
  transfer function, 219, 230
Linear transformation, 402
  of Gaussian variables, 70
  general, 66
  random processes, see Linear system
  of random variable, 66
Linear trend, correction for, 587
Lowpass filter, see Filters
Low-pass processes, 146, 173
  sampling theorem, 190, 191
Marginal probability, 16, 25
Marginal probability density function, 36, 37
Markov chain, 276, 295
  continuous time, 276, 289
  homogeneous, 282
  limiting state probabilities, 285
  long-run behavior, 284
  state diagram, 277
  two-state, 286, 291
Markov processes, 126, 249, 276
  birth death process, 292
  Chapman-Kolmogorov equations, 289
  homogeneous, 282
  transition intensities, 289
Markov property, 126, 205
Markov sequences, 139, 249, 276, 295
  asymptotic behavior, 284
  chains, 276
  Chapman-Kolmogorov equations, 284
  homogeneous, 282
  state diagram, 277
  state probabilities, 279
  transition matrix, 281
  transition probabilities, 279
Martingale, 126, 129, 205, 206
M-ary detection, 366
Matched filter, 355
  for colored noise, 362
  for white noise, 355
Matrix, covariance, 50, 313
Maximum a posteriori (MAP) rule, 345, 350, 368
Maximum likelihood estimator, 488
  of probability, 553
  of σ² in normal, 553
  of λ in Poisson, 553
  of μ in normal, 553
Maxwell density, 57
Maxwell random variable, 507
Mean square (MS):
  continuity, 161, 162
  continuous process, 162, 211
  convergence, 94
  derivative, 162, 163
  differentiability, 162, 211
  integral, 165, 211
  limit, 94
  value, 234, 564
Mean squared error (MSE), 327, 379, 469, 496, 560
  definition, 327, 496
  for histograms, 500
  for Kalman filters, 429, 437
  recursive expression for, 427
  for Wiener filters, 456
Mean squared estimation, 377-474
  filtering, 407-466
  recursive filtering, 420
Mean value, 26
  of complex random variables, 47
  conditional, 27
  estimate of, 486
  of function of several random variables, 81
  normalized RMS error for, 497
  of random variable, 26
  of system response, 221, 228
  of time averages, 170
Mean vector, 49
Measurements, 476
  unbiased, 476
Median of a random variable, 467, 555
Variance:
  of autocorrelation function estimator, 569
  of complex random variables, 47
  definition, 27
  estimate of, 486
  of histograms, 498
  of mean estimator, 566
  of periodogram, 574
  of smoothed estimators, 584
  of time averages, 170
Vector measurement, 432
Vector-valued random variable, 47

Z transform, 220
  table, 630