
Problems in Detection and Estimation Theory

Joseph A. O’Sullivan
Electronic Systems and Signals Research Laboratory
Department of Electrical and Systems Engineering
Washington University in St. Louis
St. Louis, MO 63130
[email protected]
May 4, 2006

Introduction
In this document, problems in detection and estimation theory are collected. These problems are primarily
written by Professor Joseph A. O’Sullivan. Most have been written for examinations in ESE 524 or its
predecessor EE 552A at Washington University in St. Louis, and are thereby copyrighted. Some come from
qualifying examinations and others are simply problems from homework assignments in one of these classes.
Use of these problems should include a citation to this document.
In order to give some organization to these problems, they are roughly grouped into six categories:
1. basic detection theory;
2. basic estimation theory;
3. detection theory;
4. estimation theory;
5. expectation-maximization;
6. recursive detection and estimation.
The separation into these categories is rather rough. Basic detection and estimation theory deal with finite
dimensional observations and test knowledge of introductory, fundamental ideas. Detection and estimation
theory problems are more advanced, touching on random processes, joint detection and estimation, and other
important extensions of the basic theory. The use of the expectation-maximization algorithm has played an
important role in research at Washington University since the early 1980s, motivating the inclusion of problems
that test fundamental understanding of it. Recursive estimation theory is primarily based on the Kalman
filter. The recursive computation of a loglikelihood function leads to results in recursive detection.
The problems are separated by theoretical areas rather than applications based on the view that theory
is more fundamental. Many applications touched on here are explored in significantly more depth elsewhere.

1 Basic Detection Theory


1.1 Analytically Computable ROC
Suppose that under hypothesis H1 , the random variable X has probability density function
pX(x) = (3/2) x²,  for −1 ≤ x ≤ 1.  (1)

Under hypothesis H0, the random variable X is uniformly distributed on [−1, 1].

a. Use the Neyman-Pearson lemma to determine the decision rule to maximize the probability of detection
subject to the constraint that the false alarm probability is less than or equal to 0.1. Find the resulting
probability of detection.
b. Plot the receiver operating characteristic for this problem. Make your plot as good as possible.

1.2 Correlation Test of Two Gaussian Random Variables


Suppose that X1 and X2 are jointly distributed Gaussian random variables. There are two hypotheses for
their joint distribution. Under either hypothesis they are both zero mean. Under hypothesis H1 , they are
independent with variances 20/9 and 5, respectively. Under hypothesis H2 ,
    
E{ [X1; X2] [X1 X2] } = [ 4  4 ; 4  9 ].  (2)

Determine the optimal Neyman-Pearson test. Sketch the form of the corresponding decision region.

1.3 Discrete-Time Exponentially Decaying Signal in AWGN


Suppose that two data models are

Hypothesis 1: R(n) = (1/4)^n + W(n)  (3)
Hypothesis 2: R(n) = c (1/3)^n + W(n),  (4)

where under either hypothesis W (n) is a sequence of independent and identically distributed (i.i.d.) Gaussian
random variables with zero mean and variance σ 2 ; under each hypothesis, the noise W (n) is independent of
the signal. The variable c and the variance σ 2 are known.
Assume that measurements are available for n = 0, 1, . . . , N − 1.
a. Find the loglikelihood ratio test.
b. What single quantity parameterizes performance?
c. What is the limiting signal to noise ratio as N goes to infinity?
d. What is the value of the variable c that minimizes performance?

1.4 Two Zero Mean Gaussian


Two random variables (R1 , R2 ) are measured. Assume that under Hypothesis 0, the two random variables
are independent Gaussian random variables with mean zero and variance 2. Under Hypothesis 1, the two
random variables are independent Gaussian random variables with mean zero and variance 3.
a. Find the decision rule that maximizes the probability of detection subject to a constraint on the
probability of false alarm, PF ≤ α.
b. Derive an equation for the probability of detection as a function of α.

1.5 Exponential Random Variables in Queuing


In queuing systems, packets or messages are processed by blocks in the system. These processing blocks
are often called queues. A common model for a queue is that the time it takes to process a message is an
exponential random variable. There may be an additional model for the times at which messages enter the
queue, a common model of which is a Poisson process.

Recall that if X is exponentially distributed with mean μ, then

pX(x) = (1/μ) exp(−x/μ),  x ≥ 0.  (5)
Suppose that one queue is being monitored. A message enters at time t = 0 and exits at time t = T .
Under Hypothesis H1 , T is an exponentially distributed random variable with mean μ1 ; under H0 , T is
exponentially distributed with mean μ0 . Assume that μ1 > μ0 .
a. Prove that the likelihood ratio test is equivalent to comparing T to a threshold γ.
b. For an optimum Bayes test, find γ as a function of the costs and the a priori probabilities.
c. Now assume that a Neyman-Pearson test is used. Find γ as a function of the bound on the false alarm
probability PF , where PF = P (say H1 |H0 is true).
d. Plot the ROC for this problem for μ0 = 1 and μ1 = 5.
e. Now consider N independent and identically distributed measurements of T denoted T1 , T2 , . . . , TN . Show
that the likelihood ratio test may be reduced to comparing

l(T) = (1/N) Σ_{i=1}^{N} Ti  (6)

to a threshold. Find the probability density function for l(T) under each hypothesis.

1.6 Gaussian Variance Test


In many problems in radar, the reflectivity is a complex Gaussian random variable. Sequential measurements
of a given target that fluctuates rapidly may yield independent realizations of these random variables. It
may then be of interest to decide between two models for the variance.
Assume that N independent measurements are made, with resulting i.i.d. random variables Ri , i =
1, 2, . . . , N . The models are

H1 : Ri is N (0, σ12 ), i = 1, 2, . . . , N, (7)


H0 : Ri is N (0, σ02 ), i = 1, 2, . . ., N, (8)

where σ1 > σ0 .
a. Find the likelihood ratio test.
b. Show that the likelihood ratio test may be simplified to comparing the sufficient statistic

l(R) = (1/N) Σ_{i=1}^{N} Ri²  (9)

to a threshold.
c. Find an expression for the probability of false alarm, PF , and the probability of miss, PM .
d. Plot the ROC for σ02 = 1, σ12 = 2, and N = 2.

1.7 Binary Observations: Test for Bias in Coin Flips


Suppose that there are only two possible outcomes of an experiment; call the outcomes heads and tails. The
problem here is to decide whether the process used to generate the outcomes is fair. The hypotheses are

H1 : P (Ri = heads) = p, i = 1, 2, . . . , N (10)


H0 : P (Ri = heads) = 0.5, i = 1, 2, . . . , N. (11)

Under each hypothesis, the random variables Ri are i.i.d.


a. Determine the optimal likelihood ratio test. Show that the number of heads is a sufficient statistic.

b. Note that the sufficient statistic does not depend on p, but the threshold does. For a finite number N , if
only nonrandomized tests are considered, then the ROC has N +1 points on it. For N = 10 and p = 0.7, plot
the ROC for this problem. You may want to do this using a computer because you will need the cumulative
distribution function for a binomial.
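For part b, the following is a minimal Matlab sketch (assuming the Statistics Toolbox function binocdf is available; the variable names are illustrative only) that computes the nonrandomized ROC points from the binomial cumulative distribution function:

N = 10; p = 0.7;
k  = 0:N+1;                      % thresholds: decide H1 when the number of heads is >= k
PF = 1 - binocdf(k-1, N, 0.5);   % P(number of heads >= k | H0, fair coin)
PD = 1 - binocdf(k-1, N, p);     % P(number of heads >= k | H1, biased coin)
plot(PF, PD, 'o-'), xlabel('P_F'), ylabel('P_D')
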
c. Now consider a randomized test. In a randomized test, for each value of the sufficient statistic, the decision
is random. Hypothesis H1 is chosen with probability φ(l) and H0 is chosen with probability 1−φ(l). Consider
the Neyman-Pearson criterion with probability of false alarm PF = α. Show that the optimal randomized
strategy is a probabilistic mixture of two ordinary likelihood ratio tests; the first likelihood ratio test achieves
the next greater probability of false alarm than α, while the second achieves the next lower probability of
false alarm than α. Find φ as a function of α. Finally, note that the resulting ROC is obtained from the
original ROC by connecting the achievable points with straight lines.

1.8 Likelihood Ratio as a Random Variable


The problem follows Problem 2.2.13 from H. L. Van Trees, Vol. 1, closely.
The likelihood ratio Λ(R) is a random variable

Λ(R) = p(R|H1) / p(R|H0).  (12)

Prove the following properties of the random variable Λ.


a. E(Λ^n | H1) = E(Λ^(n+1) | H0)
b. E(Λ|H0 ) = 1.
c. E(Λ|H1 ) − E(Λ|H0 ) = var(Λ|H0 ).

1.9 Matlab Problem


Write Matlab subroutines that allow you to determine detection performance experimentally. The case
studied here is the signal in additive Gaussian noise problem, for which the experimental performance
should be close to optimal.
Let Wk be i.i.d. N (0, σ 2 ), independent of the signal. In the general case, there can be a signal under
either hypothesis,

H0 : Rk = s0k + Wk , k = 1, 2, . . . , K, (13)
H1 : Rk = s1k + Wk , k = 1, 2, . . ., K. (14)

For this problem, we look at the special case of deciding whether there is one or there are two exponentials
in noise, as a function of the noise level and the two exponentials.

H0 : Rk = exp(−k/4) + Wk , k = 1, 2, . . . , K, (15)
H1 : Rk = exp(−k/4) + exp(−k/(4T )) + Wk , k = 1, 2, . . . , K. (16)

Write Matlab routines to generate the data under either hypothesis. Hint:
kk=1:K;            % sample indices
w=randn(1,K);      % white Gaussian noise samples (scale to variance sigma^2 as needed)
r0=exp(-kk/4)+w;   % data under hypothesis H0

Write a separate Matlab function to compute the optimal test statistic.


Write Matlab routines to generate N independent vectors, and compute the resulting N test statistics, under
each hypothesis.
Write a Matlab routine to compute an empirical receiver operating characteristic from these data. Given
any value of the threshold, the fraction of test statistic values greater than the threshold, given the null
hypothesis H0 , is the empirical probability of false alarm. Similarly, given any value of the threshold, the
fraction of test statistic values greater than the threshold, given hypothesis H1 , is the empirical probability
of detection.
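A minimal sketch of that computation is given below (variable names are assumptions: l0 and l1 are taken to be vectors holding the N test statistics computed under H0 and H1, respectively):

thr = sort([l0(:); l1(:)]);          % candidate thresholds taken from the pooled test statistics
PF  = zeros(size(thr)); PD = PF;
for i = 1:length(thr)
    PF(i) = mean(l0 > thr(i));       % empirical probability of false alarm
    PD(i) = mean(l1 > thr(i));       % empirical probability of detection
end
plot(PF, PD), xlabel('P_F'), ylabel('P_D')
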
a. Plot the empirical receiver operating characteristic for fixed σ 2 and K as T varies (T > 1).
b. Plot the empirical receiver operating characteristic for fixed T and K as σ 2 varies.
c. Plot the empirical receiver operating characteristic for fixed σ 2 and T as K varies.
Evaluate and summarize your results in a concise manner.

1.10 Test for Poisson Means


An experiment is performed to determine whether one or two radioactive substances are in a beaker. A
sensor is set up near the beaker to measure the radioactive events. Assuming the measurements are made
over a time interval which is short compared to the half-lives of the radioactive substances, the rate of decay
is constant. The number of events for each substance is a Poisson counting process. Thus, we have two
hypotheses:
H0: m(T) = n1(T), where P{n1(T) = n} = ((λ1 T)^n / n!) e^{−λ1 T}  (17)

H1: m(T) = n1(T) + n2(T), where P{n2(T) = n} = ((λ2 T)^n / n!) e^{−λ2 T}  (18)
Here, T is the length of time over which the measurements are made. Assume the cost of a correct decision
is 0 (C00 = 0, and C11 = 0).
a. For fixed T , P0 , P1 , C10 and C01 , determine the optimum Bayes test.
b. Let λ1 T = 1 and λ2 T = 3. Plot the receiver operating characteristic. Note that the plot consists
of a discrete set of points. Locate the point on your curve where P0 C10 /P1 C01 is approximately equal to 4.
c. How does the ROC curve vary as T is increased? State your answer in general, but to quantify your
answer, examine the specific case where the observation interval is doubled (T is doubled from its value in
part b). In particular, you will want to plot the ROC for this new observation interval.

1.11 Basic Detection Theory


Suppose that under each of two hypotheses the three random variables X1 , X2 , and X3 are independent
Poisson random variables. Under hypothesis H0 , they all have mean 2. Under hypothesis H1 , they have
means 1, 2, and 3, respectively. Assume that the prior probabilities on the two hypotheses are each 0.5.
a. Find the minimum probability of error decision rule.
b. Draw a picture of the decision region.

1.12 Cauchy Distributions


Suppose that under hypothesis H1, the random variable X has a Cauchy probability density function with location parameter (median) 1,

pX(x) = 1 / (π [1 + (x − 1)²]).  (19)

Under hypothesis H0, the random variable X has a Cauchy probability density function with location parameter (median) 0,

pX(x) = 1 / (π [1 + x²]).  (20)
a. Given a single measurement find the likelihood ratio test.
b. For a single measurement, sketch the receiver operating characteristic.
In your calculations, you may need the following indefinite integral:

∫ 1/(1 + x²) dx = tan^{-1}(x) + constant.  (21)

1.13 Maxwell versus Rayleigh Distribution
Problem motivation. In some problems of a particle moving in space, it is not clear whether the motion is
random in three dimensions, random but restricted to a two-dimensional surface, or somewhere in between
(fractal motion of smoke, for example). In order to decide between these hypotheses, the positions of
particles are measured and candidate density functions can be compared. While this can be turned into a
dimensionality estimation problem, here we treat the simpler case of deciding between the two extreme cases
of random motion in two or three dimensions.
It is assumed that only the distance of the particles relative to an origin is measurable, not the positions
in three dimensions. In three dimensions, the distance at any time is Maxwell distributed. In two dimensions,
the distance is Rayleigh distributed. The particles are assumed to be independent and identically distributed
under either hypothesis.
Suppose that under hypothesis H1 , each random variable Xi has a Maxwell probability density function
 
pX(x) = √(2/π) (x²/σ³) exp(−x²/(2σ²)),  x ≥ 0.  (22)

Under hypothesis H0 , each random variable Xi has a Rayleigh probability density function
 
pX(x) = (x/σ²) exp(−x²/(2σ²)),  x ≥ 0.  (23)

a. Given N independent and identically distributed measurements, determine the optimal Bayes test.
b. Determine the probability of false alarm for N = 1 measurement.
c. Determine the threshold used in the Neyman-Pearson test for N = 1.

1.14 3-ary Detection Theory


Suppose that there are three hypotheses, H1 , H2 , and H3 . The prior probabilities of the hypotheses are
each 1/3. There are three observed random variables, X1 , X2 , and X3 . Under each hypothesis, the random
variables are independent and equal
[X1; X2; X3] = [S1; S2; S3] + [W1; W2; W3].  (24)

The random variables W1 , W2 , and W3 are independent noise random variables. They are Poisson distributed
with means equal to 1. The noise random variables Wi are all independent of the signal random variables
Sk . Under each hypothesis, the signal random variables S1 , S2 , and S3 are independent Poisson random
variables. The means of the signal random variables depend on the hypotheses:
H1: E[S1; S2; S3] = [1; 1; 0]        H2: E[S1; S2; S3] = [1; 0; 1]  (25)

H3: E[S1; S2; S3] = [0; 1; 1].  (26)

a. Find the optimal decision rule.


b. Find an expression for the probability of error.

1.15 N Pairs of Gaussian Random Variables in AWGN
Consider the hypothesis testing problem:

H0 : r = n (27)

H1 : r = s + n (28)

where n is a 2N × 1 real-valued Gaussian random vector with zero mean and covariance matrix σn²I (n is
N(0, σn²I)); the signal vector s is also a 2N × 1 real-valued Gaussian random vector, N(0, Ks). The covariance
matrix Ks is a block diagonal matrix with N blocks of size 2×2 each:
Ks = [ σs²   ρσs²  0     0     ···  0     0
       ρσs²  σs²   0     0     ···  0     0
       0     0     σs²   ρσs²  ···  0     0
       0     0     ρσs²  σs²   ···  0     0
       ···   ···   ···   ···   ···  ···   ···
       0     0     0     0     ···  σs²   ρσs²
       0     0     0     0     ···  ρσs²  σs²  ]   (29)

a. Determine the optimum Bayes test as a function of the costs and the a priori probabilities (assume the
cost of correct decisions is 0). Determine a sufficient statistic for the problem. Simplify the expression for
the sufficient statistic as much as possible. (You may want to use the fact that the inverse of a block diagonal
matrix is also block diagonal, with the inverses of the blocks from the original matrix along its diagonal.)
b. Determine an equation the threshold must satisfy for the Neyman-Pearson criterion. Is this threshold
easy to compute?
c. Plot the ROC for the particular case of N = 2, ρ = .25, and σn2 /σs2 = .25. Hint: This is probably
most easily done after an appropriate rotation of the received data. It may also be done without a rotation.
Remember, r is a 2N × 1 vector, so it is 4×1 for this part.

1.16 Information Rate Functions: Gaussian Densities with Different Means


Define the two hypotheses:

H1 : Ri = mi + Wi , i = 1, 2, . . . , n
H0 : Ri = Wi , i = 1, 2, . . . , n

where the random variables Wi are i.i.d. N (0, σ 2 ). The means mi under hypothesis H1 are known.
a. Derive the log-likelihood ratio test for this problem. Show that when written in vector notation, the
loglikelihood ratio equals
l(r) = (1/σ²) (r − m/2)^T m,  (30)
where m is the n × 1 vector of the means mi .
b. Show that the performance is completely determined by a signal-to-noise ratio

d² = Σ_{i=1}^{n} mi² / (2σ²)  (31)

c. Compute the moment generating function Φ0 (s) for the log-likelihood ratio given hypothesis H0 .
d. Compute the information rate function in two ways. First, directly from the log-moment generating
function φ0 (s) = ln Φ0 (s), compute
I0(γ) = max_s [ sγ − φ0(s) ].  (32)

Second, use the relative entropy formula and the tilted density function. Recall that the tilted density
function is defined by
p(r : s) = pR(r|H0) e^{s l(r)} / Φ0(s).  (33)
The second formula for I0 (γ) is then

I0 (γ) = D(p(r : s)||pR(r|H0 )). (34)

Show that the two expressions are equal. This requires substituting into the relative entropy formula the
value of s such that the desired mean of l(R) is achieved (the desired mean is the threshold γ).

1.17 Information Rate Functions: Gaussian Densities with Different Variances
Consider the hypothesis testing problem with hypotheses:
H1: Ri is N(0, σ²_{1i}),  i = 1, 2, . . . , n
H0: Ri is N(0, σ²_{0i}),  i = 1, 2, . . . , n,

and under either hypothesis, the random variables Ri are independent.


a. Derive the log-likelihood ratio test for this problem. Show that the test statistic is

l(r) = Σ_{k=1}^{n} [ −(1/2) ln(σ²_{1k}/σ²_{0k}) − (rk²/2)(1/σ²_{1k} − 1/σ²_{0k}) ].  (35)

b. Find the tilted probability density function for this problem. Show that this tilted density function
corresponds to independent Gaussian random variables with zero mean and variances σ²_{sk}, where

1/σ²_{sk} = (1 − s)/σ²_{0k} + s/σ²_{1k}.  (36)

c. Find the log-moment generating function φ(s). Recall (as in the last problem) that Φ(s) is the normalizing
factor for the tilted density function and φ(s) = ln Φ(s).
d. For this problem, it may be difficult to find s explicitly as a function of the threshold γ. In order to
circumvent this difficulty, the standard approach is to represent the curves parametrically as a function of s.
Find γ as a function of s by using the property that the mean of the log-likelihood function using the tilted
density equals γ.
e. Using the representation of γ from part d, the information rate function may be found as a function of s
using
I0 (γ(s)) = sγ(s) − φ(s), (37)
and the result from part c.
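The parametric representation in parts d and e lends itself to a direct numerical evaluation. The following Matlab sketch illustrates how γ(s) and I0(γ(s)) can be traced out as s varies; the log-moment generating function used here is only a placeholder, taken from the standard Gaussian mean-shift case (problem 1.16) with Σ mi²/σ² = 1, and should be replaced by the expression derived in part c:

phi0 = @(s) (s.^2 - s)/2;                % placeholder log-MGF (Gaussian mean-shift case, unit signal-to-noise ratio)
s   = 0.05:0.05:0.95;                    % tilting parameter values strictly between 0 and 1
ds  = 1e-6;
gam = (phi0(s+ds) - phi0(s-ds))/(2*ds);  % gamma(s) = phi0'(s), the tilted-density mean of l (part d), by central differences
I0  = s.*gam - phi0(s);                  % equation (37)
plot(gam, I0), xlabel('\gamma'), ylabel('I_0(\gamma)')
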

1.18 Information Rate Functions: Poisson Distribution Functions
Consider the hypothesis testing problem with hypotheses:

H1 : Ri is Poisson with means λ1i , i = 1, 2, . . . , n


H0 : Ri is Poisson with means λ0i , i = 1, 2, . . . , n,

and under either hypothesis, the random variables Ri are independent.
a. Derive the log-likelihood ratio test for this problem. Show that the test statistic is

l(r) = Σ_{k=1}^{n} [ rk ln(λ1k/λ0k) − λ1k + λ0k ].  (38)

b. Find the tilted probability density function for this problem. Show that this tilted density function
corresponds to independent Poisson random variables with means λsk , where

ln[λsk ] = (1 − s) ln[λ0k ] + s ln[λ1k ]. (39)

c-e. Repeat parts c, d, and e from the last problem here. This is needed again since there is no straightforward
expression for s in terms of γ.

1.19 Information Rate Functions: Exponential Densities with Different Means
Consider the hypothesis testing problem with hypotheses:
H0: Ri ~ p0(r) = (1/λ0) e^{−r/λ0},  r ≥ 0,  i = 1, 2, . . . , n
H1: Ri ~ p1(r) = (1/λ1) e^{−r/λ1},  r ≥ 0,  i = 1, 2, . . . , n,

and under either hypothesis, the random variables Ri are independent.


a. Derive the log-likelihood ratio test for this problem and denote the loglikelihood ratio by l(r).
b. Find the log-moment generating function for the loglikelihood ratio φ0 (s). Recall that Φ0 (s) is the
moment generating function for the loglikelihood ratio (and is the normalizing factor for the tilted density
function) and φ0 (s) = ln Φ0 (s). Let l(r) be the loglikelihood function derived in part a; then
 
Φ0(s) = E[ e^{s l(R)} | H0 ].  (40)

c. Consider a threshold in the loglikelihood ratio test that gets smaller with n. In particular, consider the
test that compares l(r) to a threshold γ/n. Find, as a function of γ, an upper bound on the probability of
false alarm

PF ≤ e−nI(γ) . (41)

In particular, I(γ) is the information rate function. In fact, for all problems like this,
I0(γ) = lim_{n→∞} −(1/n) ln P( l(R) > γ/n | H0 ).  (42)
d. Find the tilted probability density function for this problem. Show that this tilted density function
corresponds to independent exponentially distributed random variables with mean λs , where
1/λs = (1 − s)/λ0 + s/λ1.  (43)
e. Let s correspond to γ as in the computation of the information rate function in part c. Show that the
relative entropy between the tilted density function using s and the density function under hypothesis H0
equals I(γ). That is,

D(ps ||p0) = I(γ(s)). (44)

2 Basic Estimation Theory
2.1 Basic Minimum Mean-Square Estimation Theory
Suppose that X, Y , and Z are jointly distributed Gaussian random variables. To establish notation, suppose
that
E [X Y Z] = [μx μy μz ] , (45)
and that

E{ [X; Y; Z] [X Y Z] } = [ Rxx  Rxy  Rxz ; Rxy  Ryy  Ryz ; Rxz  Ryz  Rzz ]  (46)
a. Show that X and Y − E[Y |X] are independent Gaussian random variables.
b. Show that X, Y −E[Y |X], and Z−E [Z|(X, Y − E[Y |X])] are independent Gaussian random variables.
Hint. This problem is easy.

2.2 Three Jointly Gaussian Random Variables


Suppose there are three random variables X, Y , and Z. X is Gaussian distributed with zero mean and
variance 4. Given X = x, the pair of random variables [Y, Z]T is jointly Gaussian with mean [x/2 x/6]T ,
and covariance matrix

K = [ 2  2/3 ; 2/3  17/9 ].  (47)
a. In terms of a realization of the pair of random variables [Y, Z] = [y, z], find the conditional mean of
X: E[X|Y = y, Z = z].
b. Comment on the special form of the result in part a.
c. What is the variance of X given [Y, Z] = [y, z]?

2.3 Estimation of Variance and Correlation of Pairs of Gaussian Random Variables
Suppose a 2N × 1 real valued Gaussian random vector s is observed, where s is N (0, Ks ), and where Ks is
given by the expression in problem 1.15. The two parameters in this covariance matrix are σs2 and ρ.
a. Find the maximum likelihood estimates for σs2 and ρ.
b. Find the Cramer-Rao lower bound for the variance in estimating σs2 .

2.4 Uniform Plus Laplacian


Suppose that the random variable S is uniformly distributed between -3 and +3, denoted U [−3, +3]. The
data R is a noisy measurement of S,
R = S + N, (48)
where N is independent of S and is Laplacian distributed with probability density function
pN(n) = (1/2) e^{−|n|}.  (49)
a. Find the minimum mean-square error estimate of S given R. Show your work.
b. Find the maximum a posteriori estimate of S given R.

2.5 Poisson Mean Estimation
Suppose that Xi , i = 1, 2, . . . , N are i.i.d. Poisson distributed random variables with mean λ.
a. Find the maximum likelihood estimate for λ.
b. Find the Cramer-Rao lower bound on the variance of any unbiased estimator.
c. Compute the bias of the maximum likelihood estimator.
d. Compute the variance of the maximum likelihood estimator. Compare this result to the Cramer-Rao
lower bound from part b, and comment on the result.

2.6 Covariance Function Structure


Suppose that X1 , X2 , . . . , Xn is a set of zero mean Gaussian random variables. The covariance between any
two random variables is
E{Xi Xj} = 0.9^{|i−j|}.  (50)
a. Derive an expression for E{X2 |X1 }.
b. Derive an expression for E{X3 |X2 , X1 }.
c. Find the conditional mean squared error

E{(X3 − E{X3 |X2 , X1 })2 }. (51)

d. Can you generalize this to a conclusion about the form of E{Xn |X1 , X2 , . . . , Xn−1 }? Given previous
values, what is a minimal sufficient statistic for estimating Xn ?
Comment: Note that any covariance function of the form α^{|i−j|} has the property discussed in this problem.
Relate this covariance function to the first-order autoregressive model

Xn = αXn−1 + Wn , (52)

where Wn is a sequence of independent and identically distributed random variables, each with zero mean
and variance
σ² = 1/(1 − α²).  (53)
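As a quick numerical check of this comment (a sketch only; the sample size and the use of the base Matlab functions filter and corrcoef are choices made here, not part of the problem), one can simulate the autoregression and verify that the correlation coefficient between samples falls off as α^{|i−j|}:

alpha = 0.9; M = 1e5;
W = randn(1, M);                         % i.i.d. driving noise (its variance does not affect the correlation coefficient)
X = filter(1, [1 -alpha], W);            % X(n) = alpha*X(n-1) + W(n)
for lag = 0:4
    c = corrcoef(X(1:end-lag), X(1+lag:end));
    fprintf('lag %d: sample correlation %.3f, alpha^lag = %.3f\n', lag, c(1,2), alpha^lag);
end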

2.7 Independent and Identically Distributed Pairs of Gaussian Random Variables
Suppose that (X1 , Y1 ), (X2 , Y2 ), . . . , (Xn , Yn ) are pairwise independent and identically distributed jointly
Gaussian random variables. The mean of Xi is 3. The mean of Yi is 5. The variance of Xi is 17. The
variance of Yi is 11. The covariance between Xi and Yi is 7.
a. Find the expression for E[Xi |Yi ] in terms of Yi .
b. Find the expression for

E[ (1/n) Σ_{i=1}^{n} Xi | Y1, Y2, . . . , Yn ].  (54)

Given the random variables {Y1, Y2, . . . , Yn}, what is the sufficient statistic for estimating the mean (1/n) Σ_{i=1}^{n} Xi?
c. Now define two new random variables

T = (1/n) Σ_{i=1}^{n} Xi,        S = (1/n) Σ_{i=1}^{n} Yi.  (55)

Find the joint distribution for the random variables T and S.


d. Find the expression for E[T |S] in terms of S. Compare this to your answer in part b. Comment on
this result.

2.8 Basic Estimation Theory
Suppose that

R1 = 5 cos θ + W1 , (56)
R2 = 5 sin θ + W2 , (57)

where θ is a deterministic parameter to be estimated and W1 and W2 are independent and identically
distributed Gaussian random variables, with zero mean and variance 3.
a. Find the maximum likelihood estimate for θ.
b. Find the Cramer-Rao lower bound for the estimate of θ. Does this bound depend on the true value
of θ? Comment on this. The variable θ has units of radians.

2.9 Compound Model: Simple Discrete Case


Suppose that an experiment has three possible outcomes, denoted 1, 2, and 3. Each run of the experiment
consists of two parts. In the first part, a biased coin is flipped, and heads occurs with unknown probability p.
If a head occurs, then the three outcomes have probabilities [1/2 1/3 1/6]. Otherwise, the three outcomes
have probabilities [1/6 1/3 1/2]. In the second part of the experiment, one of the three outcomes is drawn
from the distribution determined in the first part.
This experiment is run n independent times.
a. Find the probabilities for outcomes 1, 2, and 3 in any run of the experiment in terms of the probability
p. You must do this part correctly to solve this problem.
b. Find the probability distribution for the n runs of the experiment.
c. From the distribution you determined in part b, determine a sufficient statistic for this problem.
d. Find the maximum likelihood estimate for p.
e. Is the estimate that you found in part d biased?

2.10 Cramer-Rao Bound for Gamma Density:
Problem 12, p. 83, Alfred O. Hero, Statistical Methods for Signal Processing, University of Michigan, unpublished notes, January 2003.
Let X1 , X2 , . . . , Xn be i.i.d. drawn from the Gamma density
p(x|θ) = (1/Γ(θ)) x^{θ−1} e^{−x},  x ≥ 0,  (58)

where θ is an unknown nonnegative parameter and Γ(θ) is the Gamma function. Note that Γ(θ) is the
normalizing constant,

Γ(θ) = ∫_0^∞ x^{θ−1} e^{−x} dx.  (59)
The Gamma function satisfies a recurrence formula Γ(θ + 1) = θΓ(θ).
a. Find the Cramer-Rao lower bound on unbiased estimators of θ using X1 , X2 , . . . , Xn . You may leave
your answer in terms of the first and second derivatives of the Gamma function.

3 Detection Theory
3.1 M-ary Detection and Chernoff Bounds
There are many different ways to explore the performance of M-ary detection problems.
Suppose that there are M random vectors Sm , m = 1, 2, . . . , M , each vector having N components
(dimension N × 1). These vectors are independent and identically distributed (i.i.d.). Furthermore, the
components of the vectors are i.i.d. Gaussian with zero mean and variance σs2 . Thus, the probability density
function for a vector Sm is
p(sm) = Π_{k=1}^{N} (1/√(2πσs²)) exp(−s²_{mk}/(2σs²)).  (60)
The model for the measured data in terms of the signal Sm is

R = Sm + W, (61)
where W is a noise vector with i.i.d. zero mean, Gaussian distributed components whose variance is σw², and
W is independent of Sm .
a. Let’s first assume that the signal vectors are not known at the receiver. Suppose that given R it is
desired to estimate the random vector Sm . Find the minimum mean-squared error estimate; denote this
estimator by Ŝ(r). Find the mean-squared error of the estimate.
b. Compute the average mean-square error for our MMSE estimate Ŝ(r),

ξ²_ave = (1/N) E{ (Sm − Ŝ(R))^T (Sm − Ŝ(R)) }.  (62)

Show that for any ε,

P( | (1/N) (Sm − Ŝ(R))^T (Sm − Ŝ(R)) − ξ²_ave | > ε )  (63)
goes to zero exponentially fast as N gets large. HINT: Use a Chernoff bound.
c. Assume that we have a channel, N uses of which may be modeled by (61) when the transmitted signal
is sm. Assume the receiver knows all of the possible transmitted signals S = {sm, m = 1, 2, . . . , M}, and that
the receiver structure is of the following form. First, the receiver computes ŝ(r). Second, the receiver looks
through the vectors in S and finds all m such that sm satisfies
| (1/N) (sm − ŝ(r))^T (sm − ŝ(r)) − ξ²_ave | < ε.  (64)
If there is only one, then it is decided to be the transmitted signal. If there is more than one, the receiver
randomly chooses any signal. To evaluate the probability of error, we do the following. Suppose that sm
was the signal sent and let sl be any other signal. Show that the probability
P( | (1/N) (sl − ŝ(r))^T (sl − ŝ(r)) − ξ²_ave | < ε )  (65)
goes to zero exponentially fast as N gets large; find the exponent. HINT: Use a Chernoff bound.
d. To finish the performance analysis, note that the probability that sl1 satisfies (64) is independent
of the probability that sl2 satisfies (64), for l1 ≠ l2. Use the union bound to find an expression for the
probability that no l other than l = m satisfies (64). Then, find an exponential bound on the following
expression in terms of K = log2 M

P(error|m) < P(sm does not satisfy (64)) + (1 − P(all l ≠ m do not satisfy (64)))  (66)

If you get this done, then you have an exponential rate on the error in terms of K, ε, σs², and σw². Note that
Shannon derived the capacity of this channel to be
 
C = (1/2) log( 1 + σs²/σw² ).  (67)

Is C related to your exponent at all? It would relate to the largest possible K/N at which exponential error
rates begin. Do not spend a lot of time trying to relate this K/N to C if it does not drop out, since it
probably will not. However, you surely made a mistake if your largest rate is greater than C.

3.2 Symmetric 3-ary Detection with Sinusoids
Suppose that three hypotheses are equally likely. The hypotheses are H1 , H2 , and H3 with corresponding
data models

H1 : r(t) = a cos(100t) + w(t), 0≤t≤π (68)


H2 : r(t) = a cos(100t + 2π/3) + w(t), 0≤t≤π (69)
H3 : r(t) = a cos(100t − 2π/3) + w(t), 0 ≤ t ≤ π, (70)

where w(t) is white Gaussian noise with mean zero and intensity N0 /2,

E[w(t)w(τ)] = (N0/2) δ(t − τ).  (71)
a. Assume that a > 0 is known and fixed. Determine the optimal receiver for the minimum probability
of error decision rule. Draw a block diagram of this receiver. Does your receiver depend on a? Comment on
the receiver design.
b. Derive an expression for the probability of error as a function of a. Simplify the expression if possible.

3.3 Detection for Two Measured Gaussian Random Processes


Suppose that two random processes, r1 (t), and r2 (t), are measured. Under hypothesis H1 , there is a signal
present in both, while under hypothesis H0 (the null hypothesis), there is no signal present. More specifically,
under H1 ,

r1(t) = X e^{−αt} cos ωt + w1(t),  0 ≤ t ≤ T  (72)

r2(t) = Y e^{−αt} sin ωt + w2(t),  0 ≤ t ≤ T,  (73)

where
• X and Y are independent zero mean Gaussian random variables with variances σ 2
• α and ω are known
• w1 (t) and w2 (t) are independent realizations of white Gaussian noise, both with intensity N0 /2
• w1 (t) and w2 (t) are independent of X and Y .
Under H0 ,

r1 (t) = w1 (t), 0 ≤ t ≤ T (74)


r2 (t) = w2 (t), 0 ≤ t ≤ T, (75)

where w1 (t) and w2 (t) are as above.


a. Find the decision rule that maximizes the probability of detection given an upper bound on the
probability of false alarm.
b. Find expressions for the probability of false alarm and the probability of detection using the optimal
decision rule. In order to simplify this, you may need to assume that the two exponentially damped sinusoids
are orthogonal over the interval of length T .
c. Analyze the expressions obtained in part b. How does the performance depend on α, ω, T , σ 2 , and
N0 /2?

3.4 Deterministic Signals in WGN:
Comparison of On-Off, Orthogonal, and Antipodal Signaling
In this problem, you will compare the performance of three detection problems and determine which of the
three performs the best. The following assumptions are the same for each problem:
• The prior probabilities are equal: P1 = P2 = 0.5.
• The performance is measured by the probability of error.
• The models are known signal plus additive white Gaussian noise.
• As shown below, for each problem the average signal energy (averaged over the two hypotheses) is E.
• The signals are given in terms of s1 (t) and s2 (t) plotted in the figure below.
• The noise w(t) is white Gaussian noise with intensity N0 /2, and is independent of the signal under
each hypothesis.
Problem 1:

H1: r(t) = √(2E) s1(t) + w(t),  0 ≤ t ≤ 4,
H2: r(t) = w(t),  0 ≤ t ≤ 4.

Problem 2:

H1: r(t) = √E s1(t) + w(t),  0 ≤ t ≤ 4,
H2: r(t) = √E s2(t) + w(t),  0 ≤ t ≤ 4.

Problem 3:

H1: r(t) = √E s1(t) + w(t),  0 ≤ t ≤ 4,
H2: r(t) = −√E s1(t) + w(t),  0 ≤ t ≤ 4.

Determine the optimal performance for each of the three problems. Compare these performances.


Figure 1: Signals s1 (t) and s2 (t).

3.5 Random Signal Amplitude:
Comparison of Antipodal and Orthogonal Signaling
In this problem, you will compare the performance of two problems, only this time, the signals are random.
Let A be a Gaussian random variable with mean 0 and variance σ 2 . The random variable A is independent
of the noise w(t). The two signals s1 (t) and s2 (t) are the same as in detection problem 3.4, as shown in
Figure 1. As before, the hypotheses are equally likely, the goal is to minimize the probability of error, and
the independent additive white Gaussian noise w(t) has intensity N0 /2.
Problem A:

H1: r(t) = A√E s1(t) + w(t),  0 ≤ t ≤ 4,
H2: r(t) = A√E s2(t) + w(t),  0 ≤ t ≤ 4.

Problem B:

H1: r(t) = A√E s1(t) + w(t),  0 ≤ t ≤ 4,
H2: r(t) = −A√E s1(t) + w(t),  0 ≤ t ≤ 4.
a. Find the probability of error for the minimum probability of error decision rule for Problem B.
b. Find an expression (given as an integral, but without solving the integral) for the probability of error
for Problem A. Show that the performance is determined by the ratio SN R = 2Eσ 2 /N0 .
c. Which of these two problems has better performance and why?
d. This part is difficult, so you may want to save it for last. Solve the integral for the minimum probability
of error for Problem A. Show that the performance is determined by the fraction of an ellipse that is in a
specific quadrant. Show that this fraction equals
(1/π) tan^{-1}( 1/√(SNR + 1) ),  (76)

and therefore that the probability of error is

P(error) = (2/π) tan^{-1}( 1/√(SNR + 1) ).  (77)

3.6 Degenerate Detection Problem


Consider the binary hypothesis testing problem over the interval −1 ≤ t ≤ 1,
H0 : r(t) = n(t), −1 ≤ t ≤ 1, (78)
H1 : r(t) = s(t) + n(t), −1 ≤ t ≤ 1, (79)
where s(t) is a known (real-valued) signal and n(t) is a (real-valued) zero mean Gaussian random process
with covariance function
E [n(t)n(u)] = 1 + tu, −1 ≤ t ≤ 1, −1 ≤ u ≤ 1, (80)
= Kn (t, u). (81)
The noise n(t) is assumed to be independent of the signal s(t).
a. Find the eigenfunctions and eigenvalues for Kn (t, u) over the interval −1 ≤ t ≤ 1.
b. Suppose that s(t) = 2 − 3t, for −1 ≤ t ≤ 1. Find the likelihood ratio test. What is the signal to noise
ratio?
c. Note that there are only a finite number of nonzero eigenvalues. Comment on the implications
for the performance if the signal s(t) is not in the subspace of signal space spanned by the corresponding
eigenfunctions. That is, what can be said about the performance if s(t) is not a linear combination of the
eigenfunctions corresponding to the nonzero eigenvalues.

3.7 Random Phase in Sinusoid
In a binary hypothesis testing problem, the two hypotheses are
H0: r(t) = w(t),  for 0 ≤ t ≤ T,  (82)
H1: r(t) = √(2E/T) cos(ωc t + θ) + w(t),  for 0 ≤ t ≤ T.  (83)
Here, w(t) is WGN with intensity N0 /2, ωc is a known frequency (a multiple of 2π/T ), and θ is a random
variable with probability density function pθ (Θ).
a. Determine the likelihood ratio. Interpret the result as an expected value of the likelihood if θ is known.
Show that a sufficient statistic consists of the pair (Lc , Ls ) where
Lc = ∫_0^T r(t) √(2/T) cos(ωc t) dt    and    Ls = ∫_0^T r(t) √(2/T) sin(ωc t) dt.  (84)
b. Assume that θ is uniformly distributed between 0 and 2π. Derive the optimum test
Lc² + Ls² > γ.  (85)
Hints: For this distribution, show that the likelihood reduces to

I0( (2√E/N0) √(Lc² + Ls²) ) e^{−E/N0}  (86)
where I0 (·) is a modified Bessel function of the first kind defined by
I0(x) = (1/π) ∫_0^π e^{x cos θ} dθ = (1/(2π)) ∫_0^{2π} e^{x cos(θ+φ)} dθ  (87)

for all φ. Then use the fact that I0 (·) is a monotonic function (so it is invertible) to get the result. A plot
of I0 is given on page 340 of the text. Incidentally, you will want to read some of the nearby pages to solve
this problem (including pg. 344).
c. Find the probabilities of false alarm and detection. Show pertinent details.
d. Read the section in the book for pθ given by (364), pg. 338.

3.8 Detection of Signal in Simple Colored Noise


In this problem, we look at detection in the presence of colored noise. Under hypothesis H0 ,
r(t) = n(t), 0 ≤ t ≤ 3, (88)
while under hypothesis H1 ,
r(t) = s(t) + n(t), 0 ≤ t ≤ 3, (89)
where the noise n(t) is independent of the signal s(t). Suppose that n(t), 0 ≤ t ≤ 3 is a Gaussian random
process with covariance function
Kn (t, τ ) = δ(t − τ ) + 2s1 (t)s1 (τ ) + 3s2 (t)s2 (τ ) + 5s3 (t)s3 (τ ) + 7s4 (t)s4 (τ ), 0 ≤ t ≤ 3, 0 ≤ τ ≤ 3,
where
"
2
s1 (t) = cos(2πt/3), 0 ≤ t ≤ 3,
3
"
2
s2 (t) = sin(2πt/3), 0 ≤ t ≤ 3,
3
"
2
s3 (t) = cos(4πt/3), 0 ≤ t ≤ 3,
3
"
2
s4 (t) = sin(4πt/3), 0 ≤ t ≤ 3.
3

These signals that parameterize the covariance function for the noise are shown in the top plots in Figure 2.
The signal s(t) is shown in the lower plot in Figure 2 and is equal to

s(t) = 11 cos(4πt/3 + π/3), 0 ≤ t ≤ 3. (90)

a. Derive the optimal Bayes decision rule, assuming prior probabilities P0 and P1 , and costs of false alarm
and miss CF and CM , respectively.
b. What are the sufficient statistics? Find the joint probability density function for the sufficient statistics.
c. Derive an expression for the probability of detection for a fixed threshold.

Figure 2: Upper Plot: Signals s1 (t), s2 (t), s3 (t), and s4 (t). Lower Plot: Signal s(t) for hypothesis H1 .

4 Estimation Theory
4.1 Curve Fitting
In this problem you will solve a curve fitting problem. Assume that it is known that x(t) has the parame-
terized form

x(t) = at2 + bt + c . (91)

The data available is the function r(t) where

r(t) = x(t) + n(t) , for − T /2 < t < T /2 , (92)

where n(t) is a Gaussian random process with covariance function Kn (t, u).
a. In this first part of the problem, assume that the parameters a, b, and c are nonrandom. Find the
maximum likelihood estimates for these parameters. This solution should be in a general form involving
integrals. Note that this solution involves the inverse of a matrix. In order to give an example of when this
matrix is invertible, assume that Kn (t, u) = (N0 /2)δ(t − u), and find the inverse of this matrix.
b. Now assume that the parameters are independent, zero mean, Gaussian random variables with variances
σa2 , σb2 , and σc2 . Find the equation for the general MAP estimates. For the specific case of each of these
variances being equal to 1 and n(t) being white noise with intensity N0 /2, find the inverse of the matrix
needed for the solution.


Figure 3: Signals s1 (t), s2 (t), and s3 (t).

4.2 Estimation of Linear Combination of Signals


Suppose that

r(t) = as1 (t) + bs2 (t) + cs3 (t) + w(t), 0 ≤ t ≤ 6, (93)

where
• w(t) is white Gaussian noise, independent of the signals, with intensity N0/2,
• the signals s1 (t), s2 (t), and s3 (t) are shown in Figure 3, and
• the scale factors a, b, and c are independent and identically distributed Gaussian random variables
with zero mean and variance 9.
a. Find the maximum a posteriori (MAP) estimates for a, b, and c given r(t), 0 ≤ t ≤ 6.
b. Find the Fisher information for estimating a, b, and c given r(t), 0 ≤ t ≤ 6 (take into account that
they are random variables).
c. Do the MAP estimates achieve the Cramer-Rao lower bound consistent with the Fisher information
from part b? Comment.

4.3 Time-Scale Factor Estimation


4.3.1 Background Motivation
In ultrasonic imaging, the substance through which the sound waves pass determines the speed of sound.
Local variations in the speed of sound, if detectable, may be used to infer properties of the medium. On a
grand scale, whales are able to communicate in the ocean over long distances because variations in the speed
of sound with depth create effective waveguides near the surface, thereby enabling sound to propagate over
long distances. On a smaller scale, variations in the speed of sound in human tissue may confound inferences
about other structures. The speed of sound in a uniform medium causes an ultrasound signal to be scaled
in time. Thus, an estimate of a time-scale factor, as discussed below, may be used to derive an estimate of
the speed of sound.
A second example in which time-scale estimation plays a role is in estimating the speed of an emitter or
a reflector whose velocity is such that the usual narrowband assumption does not hold. This is the case if

19
the velocity is significant relative to the speed of propagation. This is also the case if the bandwidth of the
signal is comparable to its largest frequency, so the notion of a carrier frequency does not make sense.

4.3.2 Problem Statement


Suppose that a real-valued signal with an unknown time-scale factor is observed in white Gaussian noise:

r(t) = √(aE) s(at) + w(t),  −T/2 ≤ t ≤ T/2,  (94)
where w(t) is white Gaussian noise with intensity N0 /2, w(t) is independent of the signal and the unknown
time-scale factor a, s(t) is a signal of unit energy, and E is the energy of the transmitted signal. This problem
relates to finding the maximum likelihood estimate for the time-scale factor a subject to the constraint that
a > 0.
Assume that T is large enough so that, whatever value of a is considered,
∫_{−T/2}^{T/2} |√a s(at)|² dt = ∫_{−aT/2}^{aT/2} |s(τ)|² dτ = ∫_{−T/2}^{T/2} |s(t)|² dt = ∫_{−∞}^{∞} |s(t)|² dt = 1.  (95)

Essentially, this assumption allows us to make some of the integrals that are relevant for the problem go
over an infinite time interval. Do not use this assumption in part a below.
a. Find the log-likelihood functional for estimating a.
b. Derive an equation that the maximum likelihood estimate for a must satisfy. What role does the
constraint a > 0 play?
c. For a general estimation problem, state in words what the Cramer-Rao bound is.
d. Assume that the function s(t) is twice continuously differentiable. Find the Cramer-Rao lower bound
for any estimate of a. Hint: For a > 0 and α > 0,
∫_{−T/2}^{T/2} √a s(at) √α s(αt) dt = ∫_{−∞}^{∞} √a s(at) √α s(αt) dt = √(α/a) ∫_{−∞}^{∞} s(τ) s((α/a) τ) dτ = C(α/a).  (96)
The function C is a time-scale correlation function. This correlation between two time-scaled signals depends
only on the ratio of the time scales. This correlation is symmetric in a and α, so C(ρ) = C(1/ρ). Note that
the maximum of C is obtained at C(1) = 1. C(ρ) is differentiable at ρ = 1 if s(t) is differentiable. If in
addition to being differentiable, t s²(t) goes to zero as t gets large, then dC/dρ(1) = 0. Derive the Cramer-Rao lower
bound in terms of C(ρ).

4.4 Finding a Needle in a Haystack


In order to understand the difficulty of finding a needle in a haystack, one first must understand the statistics
of the haystack. In this problem, you will derive a model for a haystack and then derive an algorithm for
estimating how many pieces of hay are in the stack.
Suppose that a piece of hay is measured using hay units, abbreviated hu. On average a piece of hay has
length 1 hu and width 0.02 hu. The standard deviation of each of these dimensions is 20%. Assume a fill
factor of f = 50% of a volume; that is, when the hay is stacked, on average 50% of the volume is occupied
by hay.
A stack of hay is typically much higher in the middle than around the edges. For simplicity, assume the
stack is approximately circularly symmetric.
a. Write down a reasonable model for the shape of a haystack. Give the model in hay units (hu). Assume
there is an overall scale factor A, given in hu, that determines the size, so that the volume of the haystack
scales as A³ (doubling A increases the volume by a factor of eight).
b. Write down a reasonable model for N pieces of hay stacked up. That is, assume a distribution on
hay shapes consistent with the statistics above. Determine the distribution of the volume occupied by N
pieces of hay.
c. For your model of the shape of a haystack, derive a reasonable estimator on the number of pieces of
hay. Justify your model based on an optimality criterion and your statistical models above.
d. Evaluate the performance of your estimator as a function of A.
e. As A gets very large, what is the form of your estimator?
f. Comment on the difficulty of finding a needle in a haystack. You may assume that the length of the
needle is much smaller than 1 hu.

4.5 Two Dimensional Random Walk, Observation at Fixed Time


Suppose that a random walk takes place in two dimensions. At time t = 0, a particle starts at (x, y) = (0, 0).
The random walk begins at t = 0. At any time t > 0, the probability density function for the location of the
particle is

p(x, y; t, D) = (1/(2πDt)) exp( −(x² + y²)/(2Dt) ),  (97)
where D is the diffusion constant. If x and y have units of length, and t has units of time, then D has units
of length squared per unit time.

Parts a and b
Suppose that the random walk is started at time t = 0 and then at some fixed time t = T , the position is
measured exactly. Let this be repeated N independent times, yielding data {(Xi , Yi ), i = 1, 2, . . . , N }.
a. Find the maximum likelihood estimate for the diffusion constant D from these N measurements.
b. Find the Cramer-Rao lower bound on estimating D. Is the maximum-likelihood estimator efficient?

Parts c, d, e, and f
Now let us consider a more practical situation. The device making the measurement of position is of finite
size. Thus, for

√(x² + y²) ≤ r,  (98)

the measurement is perfect (where r is the radius of the device). For

√(x² + y²) > r,  (99)
there is no measurement; that is, no particle is detected.
Suppose that this experiment is run N independent times. Out of those N times, only M , where M ≤ N ,
runs yield particle measurements.
c. Find the probability for any value of M . That is, find P (M = m), for each 0 ≤ m ≤ N .
d. Find the conditional distribution on the particles measured given M . In order to fix notation, the
particles that are measured are relabeled from 1 to M . The set of measurements, given M , is {(Xi , Yi ), i =
1, 2, . . . , M }.
e. Find the maximum likelihood estimate for D.
f. Find the Cramer-Rao bound for estimating D given these data. Compare this bound to the original
bound. Under what conditions is the maximum likelihood estimate for D approximately efficient?

4.6 Gaussian Estimation from Covariance Functions
Two jointly Gaussian random processes a(t) and r(t) have zero mean and are stationary. The covariance
functions are

E[r(t)r(τ)] = Krr(t − τ) = 7e^{−3|t−τ|} + 6δ(t − τ)  (100)
E[a(t)a(τ)] = Kaa(t − τ) = 7e^{−3|t−τ|}  (101)
E[a(t)r(τ)] = Kar(t − τ) = 7e^{−3|t−τ|}  (102)

a. Find the probability density function for a(1) given all values of r(t) for −∞ < t < ∞.
b. Interpret the optimal minimum mean square error (MMSE) estimator for a(t) given r(u) for −∞ <
u < ∞ in terms of power spectra.
c. Now consider a measurement over a finite time interval, r(u), 0 ≤ u ≤ 2. Find the form of the optimal
MMSE estimator for a(t) for 0 ≤ t ≤ 2. Do not work through all of the details on this part–just convince
me that you could if you had enough time.

4.7 Gaussian Estimation: Covariance Functions and Dimension One Kalman Filter
In this problem, we use the same data model as in problem 4.6. Two jointly Gaussian random processes a(t)
and r(t) have zero mean and are stationary. The covariance functions are

E[r(t)r(τ)] = Krr(t − τ) = 7e^{−3|t−τ|} + 6δ(t − τ)  (104)
E[a(t)a(τ)] = Kaa(t − τ) = 7e^{−3|t−τ|}  (105)
E[a(t)r(τ)] = Kar(t − τ) = 7e^{−3|t−τ|}  (106)

Suppose that a causal estimator is desired. Design an optimal MMSE estimator of the form
dâ/dt = −λ â(t) + g r(t).  (108)
Note that this estimator model has two parameters, λ and g.
HINT: Consider the state space system
da/dt = −3a(t) + u(t)  (109)
r(t) = a(t) + w(t),  (110)

where u(t) and w(t) are appropriately chosen white noise processes. Do you know an optimal causal MMSE
estimator for a(t)?

4.8 Linear Estimation Theory: Estimation of Colored Noise in AWGN
Suppose that n(t), −∞ < t < ∞ is a stationary Gaussian random process with covariance function
E[n(t)n(t − τ)] = δ(τ) + (5/4) e^{−2|τ|} = Kn(τ).  (111)

a. Assume that n(t) = nc (t) + w(t), where w(t) and nc (t) are independent stationary Gaussian random
processes, w(t) is white Gaussian noise, and nc(t) has finite mean energy over any finite interval. Find the
covariance functions for w(t) and nc (t). Denote the covariance function for nc (t) by Kc (τ ).
b. Find an equation for the optimal estimate of nc (t) given n(u), −∞ < u ≤ t, in terms of Kn and Kc .
Be as specific as you can, that is, make sure that the equations define the unique solution to the problem.
Make sure that you account for the causality, that is, the estimate of nc (t) at time t depends only on current
and previous values of n(t).
c. Examine the equations from part b carefully and argue that the unique solution for the estimator is
a linear, causal, time-invariant filter. Find the Fourier transform of the impulse response of this filter and
find the impulse response.

4.9 Point Process Parameter Estimation


Many radioactive decay problems can be modeled as Poisson processes with intensity (or rate) functions that
decay exponentially over time. That is, all radioactive decay events are independent and each such event
decreases the total amount of the material. In this problem, you will estimate both the total amount of the
material and the decay rate using a simplified, two-parameter model.
Assume that 0 < X1 < X2 < X3 < . . . < Xn < . . . is a set of points (a realization) drawn from a Poisson
process with intensity function
λ(t) = ace−ct , t ≥ 0. (112)
Thus the two parameters are a and c. Denote such a realization by X. We know that the number of points in
any interval [T0 , T1 ) is Poisson distributed with mean μ equal to
μ = ∫_{T0}^{T1} λ(t) dt.  (113)

The numbers of points in two nonoverlapping intervals are independent.


The derivation of the loglikelihood function for a Poisson process is technically involved. A simplified
derivation starts by placing intervals of width ε around each point Xi. The probability of getting exactly one point
in the interval around Xi is

ε λ(Xi) e^{−ε λ(Xi)},  (114)

which for small ε is close to ε λ(Xi). The probability of getting two or more points in an interval is negligible
for small ε. The probability of getting no points between Xi−1 and Xi is

exp( −∫_{Xi−1}^{Xi} λ(t) dt ).  (115)

The likelihood function for the first N points is then proportional to (taking X0 = 0)
L(X) = Π_{i=1}^{N} exp( −∫_{Xi−1}^{Xi} λ(t) dt ) λ(Xi)  (116)
     = exp( −∫_0^{XN} λ(t) dt ) Π_{i=1}^{N} λ(Xi),  (117)

and the loglikelihood function is the natural logarithm of L(X).
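As a numerical illustration of this likelihood (a sketch only; the parameter values are assumptions, and poissrnd requires the Statistics Toolbox), a realization can be simulated by drawing the total number of points and then their locations, after which the logarithm of (117) can be evaluated for trial values of a and c:

a = 50; c = 2;                                % assumed true parameter values
N = poissrnd(a);                              % the total number of points is Poisson with mean a = integral of lambda
X = sort(-log(rand(N,1))/c);                  % given N, the points are i.i.d. with density lambda(t)/a = c*exp(-c*t)
logL = @(ah,ch) -ah*(1 - exp(-ch*X(end))) ... % log of (117): minus the integral of lambda over [0, X_N]
       + N*log(ah*ch) - ch*sum(X);            %   plus the sum of log intensities at the points
logL(a, c)                                    % loglikelihood evaluated at the true parameters
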


a. What is the probability distribution on the total number of points? Argue that the total number is
finite with probability one. For a finite number of points, the likelihood function is modified by multiplying
by the term

exp( −∫_{XN}^{∞} λ(t) dt )  (118)

that corresponds to getting zero points after the last point XN . Thus the likelihood function for N total
points becomes
L(X) = exp( −∫_0^{∞} λ(t) dt ) Π_{i=1}^{N} λ(Xi).  (119)

b. Argue that a corresponds to the total amount of material. Find the maximum likelihood estimate for
a.
c. The parameter c determines the rate of decay. Find the maximum likelihood estimate for c. Argue
that 1/ĉM L is the maximum likelihood estimate of the time constant of the decay.

4.10 Variance Estimation in AWGN


Suppose that the random variable A is known to be Gaussian distributed with mean 0, but it has unknown
variance σ 2 . The problem here is to find the maximum likelihood estimate for σ 2 given a realization of a
random process r(t), where

r(t) = A√E s(t) + w(t),  0 ≤ t ≤ T.  (120)
In this equation, s(t) is a known function that has unit energy in the interval [0, T ], the energy E is known,
and the additive white Gaussian noise w(t) is independent of A and the signal and has intensity N0 /2.
a. Find the maximum likelihood estimate for σ 2 .
b. Find the Cramer-Rao lower bound for estimating σ 2 . State what the Cramer-Rao lower bound
signifies.
c. Find the bias of the maximum-likelihood estimator (if you cannot find it in closed form, it suffices to
give an integral).
d. Is the variance of the maximum-likelihood estimator greater than the Cramer-Rao lower bound, equal
to it, or less than it? Explain your answer.

5 Expectation-Maximization Algorithm
5.1 Exponentially Distributed Random Variables
Suppose that the random variable R is a sum of two exponentially distributed random variables, X and Y ,

R = X + Y, (121)

where pX (x) = α exp(−αx), x ≥ 0, and pY (y) = β exp(−βy), y ≥ 0. The value of β is known, but α is not
known. The goal of the problem is to derive an algorithm to estimate α.
a. Write down the loglikelihood function for the data R (this is the incomplete data loglikelihood function).
Find a first order necessary condition for α to be a maximum likelihood estimate. Is this equation easy to
solve for α?
b. Define the complete data to be the pair (X, R), and write down the complete data loglikelihood function.
c. Determine the conditional probability density function on X given R, as a function of a nominal value α̃.
Denote this probability density function (pdf) p(x|r, α̃).
d. Using the pdf p(x|r, α̃), determine the conditional mean of X given R and α̃.
e. Determine the function Q(α|α̃), the expected value of the complete data loglikelihood function given the
incomplete data and α̃.
f. Derive the expectation-maximization algorithm for estimating α given R.

5.2 Parameterized Signals in White Gaussian Noise
Suppose that an observed, continuous-time signal consists of a sum of signals of a given parameterized
form, plus noise. The problem is to estimate the parameters in each of the signals using the expectation
maximization algorithm.
To be more specific, assume that the real-valued signal is


r(t) = Σ_{k=1}^{N} s(t; θk) + w(t),  0 ≤ t ≤ T,  (122)

where s(t; θ) is a signal of a given type, and θk , k = 1, 2, . . . , N are the parameters; w(t) is white Gaussian
noise with intensity N0 /2. Example signals (with Matlab code fragments) include:
1. Sinusoidal signals: θ = (f,phase,amplitudes) and s=amplitudes'*cos(2*pi*f*t+phase*ones(size(t)));
2. Exponential signals: θ = (exponents,amplitudes) and s=amplitudes'*exp(exponents*t);
3. Exponentially decaying sinusoids: θ = (exponents,f,phase,amplitudes) and
s=amplitudes'*(exp(exponents*t).*cos(2*pi*f*t+phase*ones(size(t))));
Define the vector of all parameters to be estimated by

Θ = (θ1 , θ2 , . . . , θN ) . (123)

a. Derive the loglikelihood ratio functional for Θ. This is the incomplete-data loglikelihood ratio func-
tional, where we refer to r(t) as the incomplete data. The ratio is obtained as in class relative to a null
hypothesis of white noise only.
b. For the expectation-maximization algorithm, define the complete-data signals

rk (t) = s(t; θk ) + wk (t), 0 ≤ t ≤ T, k = 1, 2, . . . , N, (124)

where w_k(t) is white Gaussian noise with intensity σ_k², where
    Σ_{k=1}^{N} σ_k² = N0/2.     (125)

Using these complete-data signals, we have


    r(t) = Σ_{k=1}^{N} r_k(t).     (126)

The standard choice for the intensities assigned to the components is σk2 = N0 /(2N ), but other choices may
yield better convergence. Derive the complete-data log-likelihood ratio functional; notice that it is written as
a sum of log-likelihood ratio functionals for the rk (which are again relative to the noise only case). Denote
the log-likelihood ratio functional for rk given θk by l(rk |θk ). Note that this log-likelihood ratio functional
is linear in rk .
c. Compute the expected value of the complete-data log-likelihood function given the incomplete data and
a previous estimate for the parameters; denote this by Q. Suppose the previous estimate of the parameters
is Θ̂(m) , where m denotes the iteration number in the EM algorithm. So Q(Θ|Θ̂(m) ) is a function of Θ and
of the previous estimate. Note that Q can be decomposed as a sum


    Q(Θ|Θ̂^(m)) = Σ_{k=1}^{N} Q_k(θ_k|Θ̂^(m)).     (127)

d. Conclude from the derivation in parts b and c that only the conditional mean of rk given r and Θ̂(m)
is needed to find Qk . Explicitly compute this expected value,
r̂k (t) = E{rk (t)|r, Θ̂(m) }. (128)
e. The maximization step of the EM algorithm consists of maximizing Q(Θ|Θ̂^(m)) over Θ. The EM algo-
rithm effectively decouples the problem at every iteration into a set of independent maximization problems.
Show that to maximize Q over Θ it suffices to maximize each Qk over θk . Derive the necessary equations
for θ̂_k^(m+1) by taking the gradient of Q_k(θ_k|Θ̂^(m)) with respect to θ_k. Write the result in terms of r̂_k. Note
that this equation depends on θk through two terms, s(t; θk ) and the gradient of s(t; θk ) with respect to θk .
f. In this part, you will develop Matlab code for the general problem described above and apply it to
two of the cases listed above. From the derivation of the EM algorithm, it is clear that there are two critical
components. The first is the maximization of Qk and the second is the conditional expected value of rk . For
the maximization step, we only need to analyze a simpler problem. The simpler problem has data
ρ(t) = s(t; θ) + w(t), 0 ≤ t ≤ T, (129)
where s is of the same form as above, and w(t) is white Gaussian noise with intensity σ 2 . Write a Matlab
program to simulate (129), taking into account the following guidelines. Note that the code developed earlier
in the semester should help significantly here.
f.i. To simulate the continuous-time case using discrete-time data, some notion of sampling must be
used. We know from class that white Gaussian noise cannot be sampled in the usual sense. The discrete-
time processing must be accomplished so that the resulting implementations converge to the solution of the
continuous-time problem in a mean-square sense as the sampling interval converges to zero. To accomplish
this, we model the data as being an integral of ρ(t) over a small time interval,
    ρ_i = ∫_{iΔ}^{(i+1)Δ} ρ(t) dt     (130)
        = s(iΔ; θ)Δ + w_i + g_i,     (131)
where Δ is the sampling interval, and the term gi at the end is small and can be ignored (it is order Δ2 ).
Show that the discrete-time noise terms wi are i.i.d. zero mean Gaussian random variables with variance
σ 2 Δ. In order to avoid having both the signal part (that is multiplied by Δ which is small) and the noise
going to zero, it is equivalent to assume that the measured data are ηi = ρi /Δ, so
ηi = s(iΔ; θ) + ni , (132)
where ni are i.i.d. zero mean Gaussian with variance σ 2 /Δ. Given the signal, σ 2 , and Δ, the following
Matlab code fragment implements this model.
size_s = size(signal);            % dimensions of the sampled signal vector
noise = randn(size_s);            % unit-variance Gaussian samples
variance = sigma2/Delta;          % discrete-time noise variance sigma^2/Delta
noise = noise*sqrt(variance);     % scale to the required variance
eta = signal+noise;               % sampled measurements eta_i
Note that the noise is multiplied by σ and divided by the square root of Δ. Using this concept, the simulation
is flexible in the number of samples. Only in the limit as Δ goes to zero does this truly approximate the
continuous-time signal.
f.ii. In this part, you will write Matlab code to find the maximum likelihood estimates for parameters
for one sinusoid in noise. Assume that
s(t; θ) = a cos(2πf0 t + φ), (133)

and that the three parameters (a, f0, φ) are to be estimated. For f0 much larger than 1/T, show that
    ∫_0^T cos²(2πf0 t + φ) dt     (134)

is approximately equal to T /2. For this situation, show that the maximum likelihood estimates are found
by the following algorithm:
Step 1: Compute the Fourier transform of the data
    R(f) = ∫_0^T ρ(t) e^{−j2πf t} dt.     (135)

Step 2: Find the maximum over all f of |R(f)|; set f̂_0 equal to that frequency.
Step 3: Find â > 0 and φ̂ so that
    â cos φ̂ + j â sin φ̂ = R(f̂_0).     (136)
Implement this algorithm. Perform some experiments that demonstrate your algorithm working. Derive the
Fisher Information Matrix for estimating the three parameters (a, f0 , φ).
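Comment: The following sketch implements Steps 1–3 exactly as stated, on a discrete grid. The true parameter values, the noise level, and the frequency grid are made up, and the read-off of (â, φ̂) follows (136) literally, so any scaling implied by your own derivation should be applied to those lines.
% Hypothetical sketch of Steps 1-3 for one sinusoid in noise; parameter values are made up.
T = 1; Delta = 1e-3; t = (0:Delta:T-Delta)';
a0 = 1; f0 = 50; phi0 = pi/4; sigma2 = 0.5;
eta = a0*cos(2*pi*f0*t + phi0) + sqrt(sigma2/Delta)*randn(size(t));
f = (0:0.5:200)';                               % frequency grid for the search
R = (exp(-1j*2*pi*f*t') * eta) * Delta;         % Step 1: R(f), computed as a Riemann sum
[~, idx] = max(abs(R));                         % Step 2: peak of |R(f)|
f0_hat = f(idx);
a_hat = abs(R(idx));                            % Step 3, read off from (136); rescale per your derivation
phi_hat = angle(R(idx));
fprintf('f0_hat = %g, a_hat = %g, phi_hat = %g\n', f0_hat, a_hat, phi_hat);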
f.iii. Implement Matlab code for estimating two parameters for one decaying exponential. That is, assume
that the signal is
s(t; θ) = Ae−αt . (137)
Write Matlab code for estimating (A, α). Perform some experiments that demonstrate that your algorithm
is working. Derive the Fisher Information Matrix for estimating the two parameters (A, α).
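Comment: One way to prototype this estimator is to minimize the discretized least-squares cost, which coincides with maximum likelihood for white Gaussian noise in the sampled model (132). The sketch below does so with fminsearch; the true parameter values, noise level, and initial guess are made up.
% Hypothetical least-squares fit of A*exp(-alpha*t) using fminsearch; values are made up.
T = 1; Delta = 1e-3; t = (0:Delta:T-Delta)';
A0 = 2; alpha0 = 3; sigma2 = 0.5;
eta = A0*exp(-alpha0*t) + sqrt(sigma2/Delta)*randn(size(t));
cost = @(p) sum((eta - p(1)*exp(-p(2)*t)).^2);  % LS cost; matches ML in white Gaussian noise
p_hat = fminsearch(cost, [1; 1]);               % crude initial guess [A; alpha]
fprintf('A_hat = %g, alpha_hat = %g\n', p_hat(1), p_hat(2));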
f.iv. Implement general Matlab code to compute the estimate of rk (t) given r(t) and previous guesses for
the parameters.
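Comment: The sketch below (saved as estep_rk.m) assumes that the conditional mean of r_k given r and the current parameter estimates takes the standard Gaussian-conditioning form r̂_k = s(t; θ̂_k) + (σ_k²/(N0/2))[r(t) − Σ_l s(t; θ̂_l)]. That form is an assumption to be checked against your answer to part d, and the function handle s_of and its interface are hypothetical.
function rk_hat = estep_rk(r, t, s_of, theta_hat, sigk2, N0)
% Hypothetical E-step helper; assumes the Gaussian-conditioning form described above.
% r         : sampled incomplete data (column vector)
% t         : time grid (column vector)
% s_of      : function handle, s_of(t, theta) returns the sampled signal for one component
% theta_hat : cell array {theta_1, ..., theta_N} of current parameter estimates
% sigk2     : 1-by-N vector of component noise intensities, summing to N0/2
N = numel(theta_hat);
s_sum = zeros(size(r));
for l = 1:N
    s_sum = s_sum + s_of(t, theta_hat{l});
end
resid = r - s_sum;                              % residual shared out among the components
rk_hat = zeros(numel(r), N);
for k = 1:N
    rk_hat(:,k) = s_of(t, theta_hat{k}) + (sigk2(k)/(N0/2))*resid;
end
end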
f.v. To demonstrate that the code in f.iv. works, implement it on a sum of two sinusoids and, separately,
on a sum of two decaying exponentials. Show that it works.
f.vi. Implement the full-blown EM algorithm for both a sum of two sinusoids and a sum of two decaying
exponentials. Run the algorithm many times for one set of parameters and compute the sample variance of
the estimates. Compare the sample variances to the entries in the inverse of the Fisher Information Matrices
computed above. Run your algorithm for selected choices of signal-to-noise ratio and parameters. Briefly
comment on your conclusions.

5.3 Gamma Density Parameter Estimation


In this problem, you will solve a maximum likelihood estimation problem in two ways. First, you will solve
it directly, obtaining a closed form solution. Given a closed form solution, the derivation of an iterative
algorithm for this problem is not fundamental. However, in the second part of this problem you will derive
an expecation-maximization (EM) algorithm for it. This algorithm may be extended to more complicated
scenarios where it would be useful.
Assume that λ is a random variable drawn from a gamma density function

    p(λ|θ) = (θ^M / Γ(M)) λ^{M−1} e^{−λθ},   λ ≥ 0.     (138)

Here θ is an unknown nonnegative parameter and Γ(M ) is the Gamma function. Note that Γ(M ) is the
normalizing constant,

    Γ(M) = ∫_0^∞ x^{M−1} e^{−x} dx.     (139)

The Gamma function satisfies a recurrence formula Γ(M +1) = M Γ(M ). For M an integer, Γ(M ) = (M −1)!.
Note that M is known. This implies that the mean of this gamma density function is E[λ|θ] = M/θ.
The random variable λ is not directly observable. Given the random variable λ, the observations
X1 , X2 , . . . , XN , are i.i.d. with probability density functions

pXi |λ (xi |λ) = λe−λxi , xi ≥ 0. (140)

Note that there is one random variable λ and there are N random variables Xi that are i.i.d. given λ.
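Comment: Before working the parts below, it may help to simulate the hierarchy. The sketch uses only base Matlab by assuming M is a positive integer, so the gamma draw can be built as a sum of M exponentials; the numerical values are made up.
% Hypothetical data-generation sketch for this model (base Matlab only; M assumed integer).
theta = 2; M = 3; N = 100;                     % made-up values
lambda = -sum(log(rand(M,1)))/theta;           % gamma(M, rate theta) as a sum of M exponentials
X = -log(rand(N,1))/lambda;                    % X_i i.i.d. exponential with rate lambda, given lambda
fprintf('lambda = %g, sample mean of X = %g (compare with 1/lambda = %g)\n', lambda, mean(X), 1/lambda);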

a. Find the joint probability density function for X1 , X2 , . . . , XN conditioned on θ.
b. Directly from the probability density function from part a, find the maximum likelihood estimate of
θ. Note that this estimate does not depend on λ.
c. To start the derivation of the EM algorithm, write down the complete data loglikelihood function,
keeping only terms that depend on θ. Denote this function lcd (λ|θ).
d. Compute the expected value of the complete data loglikelihood function given the observations
X1 , X2 , . . . , XN , and the previous estimate for θ denoted θ̂(k) . Denote this function by Q(θ|θ̂(k) ). Hint: This
step requires a little thought. The posterior density function on λ given the measurements is in a familiar
form.
e. Maximize the Q function over θ to obtain θ̂(k+1) . Write down the resulting recursion with θ̂(k+1) as a
function of θ̂(k) .
f. Verify that the maximum likelihood estimate derived in part b is a fixed point of the iterations derived
in part e.

5.4 Mixture of Beta Distributions


The probability density function for the beta distribution is of the form

    f(x : a, b) = [Γ(a+b) / (Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1},   0 ≤ x ≤ 1,     (141)

where Γ(t) is the Gamma function



    Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx.     (142)

a. Suppose that N independent and identically distributed (i.i.d.) realizations Xi are drawn from
the probability density function f(x : a, b). Find equations for the maximum likelihood estimates of the
parameters a and b in terms of the observations. DO NOT SOLVE the equations, but represent the solution
in terms of the Gamma function and the derivative of the Gamma function, Γ′(t).
b. For the estimation problem in part a, what are the sufficient statistics?
c. Now assume that the true distribution is a mixture of two beta distributions
    f(x : π, a, b, α, β) = π [Γ(a+b)/(Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1} + (1 − π) [Γ(α+β)/(Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1},   0 ≤ x ≤ 1.     (143)

The generation of a realization from this probability density function can be viewed as a two-step process. In
the first step, the first density is selected with probability π and the second is selected with probability 1 − π.
In the second step, a realization of the appropriate density function is selected. Define the complete data
for a realization as the pairs (ji , Xi ), i = 1, 2, . . . , N , where ji ∈ {0, 1} indicates which density is selected in
the first step, and Xi is the realization from that density. For this complete data, write down the complete
data loglikelihood function.
d. Find the expected value of the complete data loglikelihood function given the incomplete data
{X1 , X2 , . . . XN }; call this function Q.
e. Maximize Q over the variables π, a, b, α, β. Write down the equations that the maximum likelihood
estimates satisfy in terms of the functions Γ(t) and Γ′(t).
f. If the maximum likelihood estimator defined in part a is given by (â_ML, b̂_ML) = h(x1, x2, . . ., xN),
express the result of the maximization step in part e in terms of h.
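Comment: The two-step generation process described in part c can be simulated directly; the sketch below assumes the Statistics Toolbox function betarnd is available and uses made-up parameter values.
% Hypothetical sketch of the two-step generation process in part c (requires betarnd).
N = 500; p = 0.3; a = 2; b = 5; alpha = 6; beta = 2;    % made-up parameter values
j = (rand(N,1) < p);                                    % step 1: pick a component for each draw
X = zeros(N,1);
X(j)  = betarnd(a, b, sum(j), 1);                       % step 2: draw from the first beta density
X(~j) = betarnd(alpha, beta, sum(~j), 1);               % ... or from the second
hist(X, 30);                                            % rough look at the mixture shape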

5.5 Mixture Density Estimation


In this problem, you will explore some classical issues in density estimation. These are the same types
of issues which arise when one tries to estimate a continuous function from discrete data. The difficulty
arises because the estimates tend to be concentrated when in fact some prior knowledge leads you to believe

that the functions are not “peaked.” To be specific, suppose that you observe N independent identically
distributed random variables, and you were supposed to guess what the density was which gave rise to this
data. One description of a maximum likelihood solution is


    (1/N) Σ_{k=1}^{N} δ(x − X_k),     (144)

where the Xk are the observations. If you have some prior knowledge that the original density was in fact
smooth, you would immediately reject this solution as infeasible. One approach is to represent your set of
possible solutions as a sum of smooth functions (or equivalently as the result of a convolution with a smooth
function).
Suppose our set of admissible density functions are

    p(x) = (1/N) Σ_{k=1}^{N} f(x − m_k)     (145)

where f(x) = (1/√(2πσ²)) exp(−x²/2σ²). This set of functions makes sense in that it is the result of a concentrated
density being smoothed with a Gaussian. The problem of interest is then, given M independent observations
of the random variable x, to find your best estimate for N and for the mk , k = 1, 2, . . . , N .
a. Assume that M = 2 and that N = 2. Thus there are two random variables observed, X1 and X2 ,
and we wish to estimate m1 and m2 . Use the method of maximum likelihood to find two equations which the
maximum likelihood estimates must satisfy. These equations are difficult to solve. They may be reduced to
determining one parameter, however, by noting the symmetry involved in the equations. In particular, if we
let m1 = X1 + Δ and m2 = X2 − Δ, then the equations to be solved reduce to one (still difficult) equation
in Δ. You don’t need to solve these equations. Is Δ = 0 a valid solution? Comment. Is Δ = (X2 − X1 )/2
a valid solution? Comment. If X2 > X1 , can Δ ever be negative? Comment.
As a side comment, it is worth noting that one way of finding the solution is to center a Gaussian on
each of the observed data points, sum them, then look for the peaks of the sum.
b. In this part of the problem, you will find an EM algorithm for solving for the maximum likelihood
solution. A model for the data must be determined which can be put in the usual form of a complete data
space and an incomplete data space. In the general problem, one models the observed random variable as
having come from one of N equally likely experiments the kth one of which has probability density function
f(x − mk ). The complete data consists of pairs (Xi , ni ), where ni specifies which of the N p.d.f.’s Xi came
from. The mapping from the complete data to the incomplete data just selects the Xi .
b.1. Assume M = N = 2. Determine the complete data loglikelihood. This step is crucial as it
determines the function we are going to maximize. Define the indicator functions
    I_k(n) = { 1, if k = n;  0, otherwise }     (146)

(A hint for finding the complete data loglikelihood: the contribution of Xi to the loglikelihood when ni = k can be written Σ_k I_k(ni) ln[f(Xi − m_k)];
since the Xi's are independent, the complete data loglikelihood is the sum of terms like this.)
b.2. Find the expected value of the complete data loglikelihood given X1, X2, m̂_1^r, m̂_2^r. This involves
finding the expected value of the indicator functions Ik (ni ). Be careful with these. (I use r to indicate the
iteration number here.)
b.3. Maximize the result of the last step to get the updates for m̂_1^{r+1} and m̂_2^{r+1}.
b.4. Pick a good initial value for your estimates. Justify why this selection is good.
c. This part is almost trivial. Let M = 2 and N = 1. There is only one parameter to determine here,
the mean of the Gaussian density. Find the maximum likelihood estimate for this mean.
d. In this part, you will set up a detection problem to determine how many Gaussians should be included
in the sum. Suppose M = 2. Under hypothesis H1 , there is one Gaussian in the density (N =1). Under

hypothesis H2 there are two Gaussians in the density (N =2). Determine the likelihood ratio test for this
problem. Assume the prior probabilities on the hypotheses are equal and that the costs of errors are equal.
Since there are unwanted parameters in the ratio (namely the means of the Gaussians), and these parameters
are nonrandom, substitute for them their appropriate estimates. Usually in hypothesis testing problems the
condition that the threshold is exactly equal to the likelihood ratio is not important. Is it important here?
For the case of the ratio equalling the threshold, choose H1 (the choice with fewer parameters).
e. Let X1 = 1 and X2 = 2. First, suppose σ 2 = 0.04. Find the maximum likelihood estimates for
m1 and m2 (this is the N = 2 case). Now, let σ 2 get larger. At what point does the hypothesis test in d
yield the decision that N =1? What are the maximum likelihood estimates for m1 and m2 at this point? If
you cannot determine the point exactly, find a few values nearby.
f. Write and test a Matlab routine to run the EM algorithm derived in part b. Show the performance of
the algorithm by running it many times for one choice of the means. Compute the means and covariances of
the estimates. Run for different choices of means. Note that it is sufficient to write the code assuming that
σ 2 = 1 and to scale the means by 1/σ.
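Comment: As a reference pattern for part f (not a verified solution to part b), the sketch below runs a two-component EM with known σ², equal weights, and unknown means, using the data points from part e; compare its update equations with the ones you derive in b.3.
% Hypothetical two-component EM pattern with known sigma^2 and equal weights.
X = [1; 2]; sigma2 = 0.25; m = [0.5; 2.5];      % data from part e; made-up sigma^2 and initial means
for r = 1:100
    d1 = exp(-(X - m(1)).^2/(2*sigma2));        % unnormalized responsibilities for component 1
    d2 = exp(-(X - m(2)).^2/(2*sigma2));
    g1 = d1./(d1 + d2); g2 = 1 - g1;            % E-step: expected indicator functions
    m(1) = sum(g1.*X)/sum(g1);                  % M-step: responsibility-weighted means
    m(2) = sum(g2.*X)/sum(g2);
end
disp(m');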

6 Recursive Detection and Estimation


6.1 Background and Understanding of Autoregressive Models
Suppose that R1 , R2 , . . . is a stationary sequence of Gaussian random variables with zero mean. The co-
variance function is determined by an autoregressive model which the random variables satisfy. The autore-
gressive model is an mth order Markov model, meaning that the probability density function of Rn given
Rn−1 , Rn−2, . . . , R1 equals the probability density function of Rn given Rn−1 , Rn−2, . . . , Rn−m.
More specifically, suppose that

Rn = −a1 Rn−1 − a2 Rn−2 − . . . − am Rn−m + Wn , (147)

where Wn are independent and identically distributed Gaussian random variables with zero mean and variance
σ 2 . Let the covariance function for the random process be Ck , so

Ck = E{Rn Rn−k }. (148)

Comment: In order for this equation to model a stationary random process and to be viewed as a
generative model for the data, the corresponding discrete time system must be stable. That is, if one were to
compute the transfer function in the Z-transform domain, then all of the poles of the transfer function must
be inside of the unit disk in the complex plane. These poles are obviously the roots of the characteristic
equation with coefficients aj .
a. Using the autoregressive model in equation (147), show that the covariance function satisfies the
equations

C 0 + a1 C 1 + a2 C 2 + . . . + am C m = σ2 (149)
Ck + a1 Ck−1 + a2 Ck−2 + . . . + am Ck−m = 0, (150)

where the second equation holds for all k > 0. Hint: Multiply both sides of (147) by a value of the random
sequence and take expected values. Use the symmetry property of covariance functions for the first equality.
b. Derive a recursive structure for computing the logarithm of the probability density function of
Rn−1 , Rn−2, . . . , R1 . More specifically, let

vn = ln p(r1 , r2 , . . . , rn ). (151)

Derive an expression for vn in terms of vn−1 and an update. Focus on the case where n > m.
Hint: This is a key part of the problem, so make sure you do it correctly. It obviously relates to the Markov
property expressed through the autoregressive model in (147).

c. Consider the special case of m = 1. Suppose that C0 = 1. Find a relationship between a1 and σ 2
(essentially you must solve (150) in this general case).
Comment: Note that the stability requirement implies that |a1 | < 1.
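Comment: A quick simulation can be used to sanity-check part c. The relation σ² = 1 − a1² used below is an assumption to be confirmed from (149)–(150), and the coefficient value is made up.
% Hypothetical numerical check for part c (m = 1, C0 = 1).
a1 = -0.8; sigma2 = 1 - a1^2;                   % made-up coefficient; assumed variance relation
n = 1e5; W = sqrt(sigma2)*randn(n,1);
R = filter(1, [1 a1], W);                       % implements R_n = -a1 R_{n-1} + W_n
R = R(1000:end);                                % discard the transient
fprintf('sample variance of R_n: %g (should be close to C0 = 1)\n', var(R));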

6.2 Recursive Detection for Autoregressive Models


Suppose that one has to decide whether data arise from an autoregressive model or from white noise. In this
problem, the loglikelihood ratio is computed recursively.
Under hypothesis H1 , the data arise from the autoregressive model (147). Under hypothesis H0 , the
data Rn are i.i.d. Gaussian with zero mean and variance C0 . That is, under either hypothesis the marginal
distribution on any sample Rn is the same. The only difference between the two models is in the covariance
structure.
a. Find the loglikelihood ratio for n samples. Call this loglikelihood ratio ln . Derive a recursive expression
for ln in terms of ln−1 and an update. Focus on the case n > m.
b. Consider the special case of m = 1. Write down the recursive structure for this case.
c. The performance increases as n grows. This can be quantified in various ways. One way is to compute
the information rate functions for each n. In this problem, you will compute a special case.
Consider again m = 1. Find the log-moment generating function for the difference between ln and
ln−1 conditioned on each hypothesis, and conditioned on previous measurements; call these two log-moment
generating functions m0 (s) and m1 (s):
    m_1(s) = ln E{ e^{s(l_n − l_{n−1})} | H1, r1, r2, . . ., r_{n−1} }.     (152)

Compute and plot the information rate functions I0 (x) and I1 (x) for these two log-moment generating
functions.
Comment: These two functions quantify the increase in information for detection provided by the new
measurement.

6.3 Recursive Estimation for Autoregressive Models


In this problem, you will estimate the parameters in an autoregressive model given observations of the data
Rn , Rn−1, . . . , R1 .
a. First, assume that the maximum likelihood estimate for the parameters given data Rn−1 , Rn−2, . . . , R1
satisfies
Bn−1 ân−1 = dn−1, (153)
where the vector ân−1 is the maximum likelihood estimate of the parameter vector
a = [a1 a2 . . . am ]T . (154)

Find the update equations for Bn and dn . These may be obtained by writing down the likelihood
equation using the recursive update for the log-likelihood function, and taking the derivative with respect
to the parameter vector.
b. The computation for ân may also be written in recursive form. This is accomplished using the matrix
inversion lemma. The matrix inversion lemma states that a rank one update to a matrix yields a rank one
update to its inverse. More specifically, if A is an m × m symmetric, invertible matrix and f is an m × 1
vector, then
    (A + f f^T)^{−1} = A^{−1} − [1 / (1 + f^T A^{−1} f)] A^{−1} f f^T A^{−1}.     (155)

Use this equation to derive an equation for the estimate ân in terms of ân−1 . Hint: The final form should
look like
    â_n = â_{n−1} + g_n [r_n + â_{n−1}^T (r_{n−1} r_{n−2} . . . r_{n−m})^T],     (156)
where an auxiliary equation defines the vector g_n in terms of B_{n−1}^{−1} and the appropriate definition of f.
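Comment: Equation (155) is easy to spot-check numerically before using it in the recursion; the sketch below does so on a random symmetric positive definite matrix.
% Numerical spot-check of the matrix inversion lemma (155) on a random example.
m = 4;
A = randn(m); A = A*A' + eye(m);                % a symmetric, invertible matrix
f = randn(m,1);
lhs = inv(A + f*f');
rhs = inv(A) - (inv(A)*f)*(f'*inv(A))/(1 + f'*inv(A)*f);
fprintf('max abs difference: %g\n', max(abs(lhs(:) - rhs(:))));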

6.4 Recursive Detection:
Order 1 Versus Order 2 Autoregressive Model
A decision must be made between two models for a sequence of Gaussian distributed random variables.
Each model is an autoregressive model. The first model is autoregressive of order one, while the second is
autoregressive of order two. There are two goals here as outlined below. First, the optimal test statistic
for a Neyman-Pearson test must be computed for a fixed number N of consecutive samples of a realization.
Second, an efficient update of this test statistic to the case with N + 1 samples must be derived.
Consider the following two hypotheses. Under H1 , the model for the measurements is
Yi = 0.75Yi−1 + Wi , (157)
where Wi are independent and identically distributed Gaussian random variables with zero mean and variance
equal to 7/4=1.75; Wi are independent of Y0 for all i; and Y0 is Gaussian distributed with zero mean and
variance 4.
Under H2 , the model for the measurements is
Yi = 0.75Yi−1 + 0.2Yi−2 + Wi , (158)
where Wi are independent and identically distributed Gaussian random variables with zero mean and variance
equal to 1.75; Wi are independent of Y0 for all i; and Y0 is Gaussian distributed with zero mean and variance
4. Y1 = 0.75Y0 + W1, where W1 is a Gaussian random variable with zero mean and variance 1.75.
a. Given Y0 , Y1 , . . . , YN , find the optimal test statistic for a Neyman-Pearson test. Simplify the expression
as much as possible. Interpret your answer.
b. Denote the test statistic computed in part a by lN . The optimal test statistic for N + 1 measurements
is lN+1 . Find an efficient update rule for computing lN+1 from lN .
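Comment: Synthetic data for testing the statistic from parts a and b can be generated directly from the stated models; the record length below is arbitrary.
% Hypothetical simulation of the two models (157)-(158) for testing the recursive statistic.
N = 200; sigW = sqrt(1.75);
Y1 = zeros(N+1,1); Y2 = zeros(N+1,1);
Y1(1) = 2*randn; Y2(1) = 2*randn;               % Y_0 ~ N(0,4) under either hypothesis
Y1(2) = 0.75*Y1(1) + sigW*randn;                % Y_1 under H1
Y2(2) = 0.75*Y2(1) + sigW*randn;                % Y_1 under H2, as specified above
for i = 3:N+1
    Y1(i) = 0.75*Y1(i-1) + sigW*randn;                  % H1: AR(1)
    Y2(i) = 0.75*Y2(i-1) + 0.2*Y2(i-2) + sigW*randn;    % H2: AR(2)
end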

6.5 Sequential Estimation Problem


Suppose that X and Y are independent Gaussian random variables with means 0 and variances 3 and 5,
respectively. Define the random variables
R1 = 5X + 3Y + W1 (159)
R2 = 3X + Y + W2 (160)
R3 = X − Y + W3 , (161)
where W1 , W2 , and W3 are identically distributed Gaussian random variables with zero mean and variance
1. The random variables X, Y , W1 , W2 , and W3 are all independent.
a. Find
    [X̂(1); Ŷ(1)] = E{ [X; Y] | R1 = r1 }.     (162)

b. Derive an expression for


    [X̂(2); Ŷ(2)] = E{ [X; Y] | R1 = r1, R2 = r2 }     (163)

that only depends on X̂(1), Ŷ (1), and r2 .


c. Derive an expression for
    [X̂(3); Ŷ(3)] = E{ [X; Y] | R1 = r1, R2 = r2, R3 = r3 }     (164)

that only depends on X̂(2), Ŷ(2), and r3.
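Comment: Since all of the variables are jointly Gaussian, the recursive answers in parts b and c can be checked against a batch computation, E{[X; Y] | R(1:k)} = Σ_{sR} Σ_{RR}^{-1} r(1:k); the realization below is made up.
% Batch check using standard Gaussian conditioning; r1, r2, r3 are made-up numbers.
H = [5 3; 3 1; 1 -1];                           % measurement matrix from (159)-(161)
Ps = diag([3 5]);                               % covariance of [X; Y]
SigRR = H*Ps*H' + eye(3);                       % covariance of [R1; R2; R3]
SigsR = Ps*H';                                  % cross covariance of [X; Y] with the R's
r = [1.0; -0.5; 2.0];
for k = 1:3
    est = SigsR(:,1:k) * (SigRR(1:k,1:k) \ r(1:k));
    fprintf('k = %d: Xhat = %g, Yhat = %g\n', k, est(1), est(2));
end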

6.6 One-Dimensional Kalman Filtering Problem
Assume that

ẍ(t) = u(t) (165)


r(t) = ax(t) + w(t) (166)

where u(t) is white Gaussian noise with intensity Q and w(t) is white Gaussian noise with intensity N0 /2.
Assume that the observation of r(t) is available from 0 to t. Assume that x(0) is N (0, Λ). What is the
variance of the estimate for x(T )? What is the steady state variance in the estimate for x(t) as t gets large?
Are there any similarities between this problem for 0 < t < T and problem 4.1? Notice that if a series of
solutions to problem 3 are pieced together properly, then the result could closely approximate the solution
of this problem. Comment on this further.
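Comment: The variance asked for can be explored numerically by propagating the Kalman–Bucy error covariance for the state [x; ẋ]; in the sketch below the numerical values and the initial variance of ẋ(0) are assumptions (the problem only pins down Var{x(0)} = Λ).
% Hypothetical forward-Euler integration of the Riccati equation for the state [x; xdot].
a = 1; Q = 1; N0 = 0.5; Lambda = 2; T = 10; dt = 1e-3;   % made-up values
F = [0 1; 0 0]; G = [0; 1]; H = [a 0];
P = diag([Lambda, 1]);                          % assumed initial covariance (xdot variance assumed)
for k = 1:round(T/dt)
    Pdot = F*P + P*F' + G*Q*G' - P*(H'*(2/N0)*H)*P;      % Riccati differential equation
    P = P + dt*Pdot;                                      % forward Euler step
end
fprintf('variance of the estimate of x(T): %g\n', P(1,1));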

Appendix
Potentially useful information:

    ∫_0^∞ e^{−αt} e^{−j2πf t} dt = 1 / (α + j2πf),   Re{α} > 0,     (167)
    ∫_{−∞}^0 e^{αt} e^{−j2πf t} dt = 1 / (α − j2πf),   Re{α} > 0.     (168)

    Σ_{k=0}^∞ α^k = 1 / (1 − α),   |α| < 1     (169)
    Σ_{k=0}^∞ k α^{k−1} = 1 / (1 − α)²,   |α| < 1     (170)

A Poisson distribution for a random variable X with mean λ > 0 has probabilities

    P(X = k) = (λ^k / k!) e^{−λ},   k = 0, 1, 2, . . .     (171)
