Random Processes

8.1 Basic Concepts
In this chapter we study random processes and their statistical properties.
Statistical View: Fix time t. We can look at the 2-dimensional function X(t, ξ) "vertically" as the collection
\[
X(t, \xi_1),\; X(t, \xi_2),\; \ldots,\; X(t, \xi_N),
\]
which is a random variable: one value for each realization ξ.
Temporal View: Fix the random index ξ. We can look at X(t, ξ) "horizontally" as a function of t. Each fixed ξ gives one realization, i.e., one sample path of the process (see Figure 8.1).
Figure 8.1: The sample space of a random process X(t, ξ) contains many functions. Therefore, each random realization is a function.
In this example, X(t, ξ) = A(ξ) cos(2πt), where the magnitude A(ξ) is a random quantity depending on the random index ξ. For example, if we draw ξ1, perhaps we will land on the value A(ξ1) = 0.5. Then, X(t, ξ1) = 0.5 cos(2πt). In another instance, if we draw ξ2, we may land on A(ξ2) = 1. Then, X(t, ξ2) = 1 · cos(2πt).
We can look at X(t, ξ) from the statistical and the temporal views (see the sketch below).
• Statistical View: Fix t. In this case, X(t, ξ) = A(ξ) cos(2πt) is a random variable because A(ξ) is random.
• Temporal View: Fix ξ (for example A(ξ) = 0.7). In this case, we have X(t, ξ) = 0.7 cos(2πt), which is a deterministic function of t.
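To make the two views concrete, here is a minimal NumPy sketch (assuming, purely for illustration, that A ∼ Uniform[0, 1]) that generates a few realizations of X(t, ξ) = A(ξ) cos(2πt) and slices them "horizontally" (fixed ξ) and "vertically" (fixed t).

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(0, 2, 200)                          # time grid
A = rng.uniform(0, 1, size=5)                       # one amplitude per realization (assumed Uniform[0,1])
X = A[:, None] * np.cos(2 * np.pi * t)[None, :]     # X[i, j] = A(xi_i) * cos(2*pi*t_j)

# Temporal view: fix a realization xi (a row) -> a deterministic function of t
x_path = X[0, :]

# Statistical view: fix a time t (a column) -> a random variable across realizations
x_at_t = X[:, 50]

print("fixed xi, first samples of the path:", np.round(x_path[:4], 3))
print("fixed t = %.2f, values across realizations:" % t[50], np.round(x_at_t, 3))
```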
Figure 8.2: Left: Realizations of the random process X(t, ξ). Right: Realizations of the
random process X(n, ξ).
In this example, A can only take two states. If A = +1, then X(n, ξ) = (−1)n . If A = −1,
then X(n, ξ) = (−1)n+1 . Again, we can look at X(n, ξ) from two views.
• Statistical View: Fix n, say n = 10. Then,
\[
X(n, \xi) = X(10, \xi) = \begin{cases} (-1)^{10} = 1, & \text{with probability } 1/2,\\ (-1)^{11} = -1, & \text{with probability } 1/2, \end{cases}
\]
which is a random variable.
• Temporal View: Fix ξ. Then,
\[
X(n, \xi) = \begin{cases} (-1)^{n}, & \text{if } A = +1,\\ (-1)^{n+1}, & \text{if } A = -1, \end{cases}
\]
which is a time series.
In this example, we see that the sample space of X(n, ξ) consists of only two functions with
probabilities
\[
\mathbb{P}\big(X[n] = (-1)^{n}\big) = \frac{1}{2}, \qquad \mathbb{P}\big(X[n] = (-1)^{n+1}\big) = \frac{1}{2}.
\]
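As a quick numerical check of these probabilities, the following sketch simulates the process X[n] = A(−1)^n with A = ±1 equally likely and confirms that only the two alternating sequences occur, each about half the time.

```python
import numpy as np

rng = np.random.default_rng(1)
n = np.arange(8)
num_trials = 10_000

count_plus = 0
for _ in range(num_trials):
    A = rng.choice([+1, -1])                 # the two equally likely states of A
    x = A * (-1) ** n                        # realization: (-1)^n if A=+1, (-1)^(n+1) if A=-1
    count_plus += np.array_equal(x, (-1) ** n)

print("P(X[n] = (-1)^n)     ≈", count_plus / num_trials)        # about 1/2
print("P(X[n] = (-1)^(n+1)) ≈", 1 - count_plus / num_trials)    # about 1/2
```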
Therefore, if there is a sequence outside the sample space, e.g., 1, 1, 1, −1, 1, −1, . . ., then the probability of obtaining that sequence is 0:
\[
\mathbb{P}\big(X[n] = (1, 1, 1, -1, 1, -1, \ldots)\big) = 0.
\]
Fortunately, to characterize a random process X(t), we only need to compute the joint PDF of X(t1), X(t2), . . . , X(tN) for finite collections of time instants t1, t2, . . . , tN. That is, as long as we know
\[
f_{X(t_1), \ldots, X(t_N)}(x_1, \ldots, x_N)
\]
for every such collection, we will be able to characterize the entire random process X(t). But this is still not easy because the joint PDF is an N-dimensional function.
Our Goal: Since it is practically very difficult to study all random processes, we turn our goal to studying a subset of random processes that can be characterized by their means and variances (to be defined, because random processes are functions). This class of random processes is called the stationary random processes, with a broader class called the wide sense stationary processes. We will discuss these two classes of random processes shortly. For the time being, we first define the "mean" and "variance" of a random process.
Mean function

Definition 2. The mean function µX(t) of a random process X(t) is
\[
\mu_X(t) = \mathbb{E}[X(t)].
\]

Example: Let X(t) = A cos(2πt), where A is a random variable with E[A] = 1/2. Then
\[
\mu_X(0) = \mathbb{E}[X(0)] = \mathbb{E}[A\cos(0)] = \mathbb{E}[A] = \frac{1}{2},
\]
and, more generally,
\[
\mu_X(t) = \mathbb{E}[X(t)] = \mathbb{E}[A\cos(2\pi t)] = \cos(2\pi t)\,\mathbb{E}[A] = \frac{1}{2}\cos(2\pi t).
\]
Example: Let Θ ∼ Uniform[−π, π], and let X(t) = cos(ωt + Θ). Find µX(t).
\[
\mu_X(t) = \mathbb{E}[\cos(\omega t + \Theta)] = \int_{-\pi}^{\pi} \cos(\omega t + \theta)\,\frac{1}{2\pi}\,d\theta = 0.
\]
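A minimal Monte Carlo check of this calculation (with ω = 2π chosen arbitrarily for the demo): averaging many realizations of cos(ωt + Θ) at each t should give approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
omega = 2 * np.pi                                   # any omega works; 2*pi is an arbitrary choice
t = np.array([0.0, 0.25, 0.6, 1.0])

Theta = rng.uniform(-np.pi, np.pi, size=200_000)
X = np.cos(omega * t[:, None] + Theta[None, :])     # rows: time instants, columns: realizations

print(np.round(X.mean(axis=1), 3))                  # each entry should be close to mu_X(t) = 0
```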
Example: Let X[n] = S^n, where S ∼ Uniform[0, 1]. Find µX[n].
\[
\mu_X[n] = \mathbb{E}[S^n] = \int_0^1 s^n\,ds = \frac{1}{n+1}.
\]
Variance function

Definition 3. The variance function of a random process X(t) is
\[
\mathrm{Var}[X(t)] = \mathbb{E}\big[(X(t) - \mu_X(t))^2\big].
\]

The autocorrelation function takes two time instants t1 and t2. Since X(t1) and X(t2) are two random variables, RX(t1, t2) = E[X(t1)X(t2)] measures the correlation of these two random variables.

The autocovariance function CX(t1, t2) = E[(X(t1) − µX(t1))(X(t2) − µX(t2))] satisfies two useful properties. First,
\[
C_X(t_1, t_2) = \mathbb{E}\big[X(t_1)X(t_2) - X(t_1)\mu_X(t_2) - X(t_2)\mu_X(t_1) + \mu_X(t_1)\mu_X(t_2)\big]
= R_X(t_1, t_2) - \mu_X(t_1)\mu_X(t_2) - \mu_X(t_1)\mu_X(t_2) + \mu_X(t_1)\mu_X(t_2)
= R_X(t_1, t_2) - \mu_X(t_1)\mu_X(t_2).
\]
Second, setting t1 = t2 = t gives CX(t, t) = E[(X(t) − µX(t))^2] = Var[X(t)].
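The relation CX(t1, t2) = RX(t1, t2) − µX(t1)µX(t2) is easy to confirm numerically. Below is a small sketch using the earlier example X(t) = A cos(2πt), where A is assumed Uniform[0, 1] purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0, 1, size=500_000)      # assumed amplitude distribution (Uniform[0,1]) for the demo
t1, t2 = 0.1, 0.3

X1 = A * np.cos(2 * np.pi * t1)          # samples of X(t1)
X2 = A * np.cos(2 * np.pi * t2)          # samples of X(t2)

R = np.mean(X1 * X2)                                  # estimate of R_X(t1, t2)
C = np.mean((X1 - X1.mean()) * (X2 - X2.mean()))      # estimate of C_X(t1, t2)

print(np.isclose(C, R - X1.mean() * X2.mean(), atol=1e-3))   # True: C_X = R_X - mu_X(t1) mu_X(t2)
```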
Definition 7. The cross-covariance function of X(t) and Y(t) is
\[
C_{X,Y}(t_1, t_2) = \mathbb{E}\big[(X(t_1) - \mu_X(t_1))(Y(t_2) - \mu_Y(t_2))\big] = R_{X,Y}(t_1, t_2) - \mu_X(t_1)\mu_Y(t_2).
\]
Example: Let Y(t) = X(t) + N(t), where X(t) and N(t) are independent. Find RX,Y(t1, t2).
\[
R_{X,Y}(t_1, t_2) = \mathbb{E}[X(t_1)Y(t_2)] = \mathbb{E}\big[X(t_1)\big(X(t_2) + N(t_2)\big)\big] = R_X(t_1, t_2) + \mu_X(t_1)\mu_N(t_2).
\]
We say that two random processes X(t) and Y(t) are uncorrelated if CX,Y(t1, t2) = 0 for any t1 and t2. If X = Y, we can say that the random process itself is uncorrelated. This leads to RX(t1, t2) = µX(t1)µX(t2), or CX(t1, t2) = 0.

Note: If two random variables are independent, then they must be uncorrelated. On the other hand, if they are uncorrelated, they are not necessarily independent. In short,

independent X and Y ⇒ uncorrelated X and Y, but uncorrelated X and Y ⇏ independent X and Y.
A random process X(t) is strictly stationary if the joint distribution of any collection of samples is invariant to time shifts. In particular:

1. The first-order CDF must be independent of time, i.e., FX(t)(x) = FX(t+τ)(x) = FX(x).

2. The second-order CDF can depend only on the time difference, i.e., FX(t1),X(t2)(x1, x2) = FX(t1+τ),X(t2+τ)(x1, x2) for any τ.

Therefore, we have
\[
R_X(t_1, t_2) = R_X(t_2 - t_1), \qquad C_X(t_1, t_2) = C_X(t_2 - t_1).
\]
To check whether a random process is strictly stationary, we have to check its definition. The following two examples illustrate how.

Example. Let Z be a random variable, and let X(t) = Z for all t. Then, X(t) is a strictly stationary process because for any τ, we can write
\[
\big(X(t_1 + \tau), \ldots, X(t_N + \tau)\big) = (Z, \ldots, Z) = \big(X(t_1), \ldots, X(t_N)\big),
\]
so the joint CDF of the samples does not depend on the shift τ.

Example. Let X[n] be a sequence of i.i.d. random variables. Then, X[n] is a strictly stationary process because the CDF factorizes into identical marginals,
\[
F_{X[n_1 + k], \ldots, X[n_N + k]}(x_1, \ldots, x_N) = F_X(x_1)\cdots F_X(x_N),
\]
which does not depend on the shift k.
In order to relax the criteria of stationarity, we consider a weaker class of stationary processes called wide sense stationary (W.S.S.) processes: a random process X(t) is W.S.S. if its mean function is constant, µX(t) = µX, and its autocorrelation function depends only on the time difference, RX(t1, t2) = RX(t1 − t2).
Remark: If a random process is strictly stationary, then it must be W.S.S. On the other hand, if a process is W.S.S., it is not necessarily strictly stationary. In short, we have

strictly stationary ⇒ W.S.S., but W.S.S. ⇏ strictly stationary.
When X(t) is W.S.S., we write, with τ = t1 − t2,
\[
C_X(t_1, t_2) = C_X(\tau), \qquad R_X(t_1, t_2) = R_X(\tau).
\]
The reason for introducing this notation is that if X(t) is W.S.S., then the autocovariance function CX(t1, t2) and the autocorrelation function RX(t1, t2) are functions of t1 − t2 only. Therefore, it becomes unnecessary to consider a two-dimensional function.
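A quick empirical sanity check of wide-sense stationarity for the earlier example X(t) = cos(ωt + Θ), Θ ∼ Uniform[−π, π]: the sample mean should be (approximately) constant in t, and the sample correlation E[X(t1)X(t2)] should depend only on t1 − t2. The value ω = 2π below is an arbitrary choice for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
omega = 2 * np.pi                      # assumed value of omega for the demo
Theta = rng.uniform(-np.pi, np.pi, size=300_000)

def X(t):
    return np.cos(omega * t + Theta)   # one sample of X(t) per realization of Theta

# Mean function: should be approximately 0 for every t
print(np.round([X(t).mean() for t in (0.0, 0.37, 1.2)], 3))

# Correlation for two time pairs with the same lag tau = 0.1: values should agree
print(round(np.mean(X(0.35) * X(0.25)), 3),
      round(np.mean(X(1.60) * X(1.50)), 3))     # both approx 0.5*cos(omega*0.1) ~ 0.405
```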
Properties of RX (τ )
When X(t) is W.S.S., the autocorrelation function RX (τ ) has several important properties.
Corollary 1. RX(0) = E[X(t)^2] is the average power of X(t).

Proof. Since RX(0) = E[X(t + 0)X(t)] = E[X(t)^2], and since E[X(t)^2] is the average power, we have that RX(0) is the average power of X(t).

Corollary 2. RX(τ) = RX(−τ), i.e., RX(τ) is an even function.

Proof. Note that RX(τ) = E[X(t + τ)X(t)]. By switching the order of multiplication in the expectation, we have E[X(t + τ)X(t)] = E[X(t)X(t + τ)] = RX(−τ).
Corollary 3. For any ε > 0,
\[
\mathbb{P}\big(|X(t + \tau) - X(t)| > \epsilon\big) \le \frac{2\big(R_X(0) - R_X(\tau)\big)}{\epsilon^2}.
\]
This result says that if RX(τ) is slowly decaying from RX(0), then the probability of having a large deviation |X(t + τ) − X(t)| is small.
Proof. By Markov's inequality applied to (X(t + τ) − X(t))^2,
\[
\mathbb{P}\big(|X(t + \tau) - X(t)| > \epsilon\big) \le \frac{\mathbb{E}\big[(X(t + \tau) - X(t))^2\big]}{\epsilon^2}
= \frac{2\,\mathbb{E}[X(t)^2] - 2\,\mathbb{E}[X(t + \tau)X(t)]}{\epsilon^2}
= \frac{2\big(R_X(0) - R_X(\tau)\big)}{\epsilon^2}.
\]

Corollary 4. |RX(τ)| ≤ RX(0) for all τ.

Proof. By the Cauchy–Schwarz inequality,
\[
R_X(\tau)^2 = \mathbb{E}[X(t)X(t + \tau)]^2 \le \mathbb{E}[X(t)^2]\,\mathbb{E}[X(t + \tau)^2] = \mathbb{E}[X(t)^2]^2 = R_X(0)^2.
\]
Physical Interpretation of RX (τ )
How do we understand the autocorrelation function RX(τ)? We can answer this question from a computational perspective.

In practice, we are not given the autocorrelation function, just as we are not given the PDF of a Gaussian random variable. (If we were given the PDF of a Gaussian random variable, then there would be nothing to estimate.) Therefore, given a set of observed signals, we have to estimate the autocorrelation function. If we want to estimate the autocorrelation function RX(τ), we can first come up with a guess and check whether our guess is correct. A natural guess is the time average
\[
\widehat{R}_X(\tau) = \frac{1}{2T}\int_{-T}^{T} X(t + \tau)X(t)\,dt.
\]
Is this a good choice? Let us take the expectation on both sides to check if E[R̂X(τ)] = RX(τ). If this result does not hold, then our guess is problematic because even on average R̂X(τ) is not RX(τ). Fortunately, the following result holds.
Lemma 1. Let
\[
\widehat{R}_X(\tau) \overset{\mathrm{def}}{=} \frac{1}{2T}\int_{-T}^{T} X(t + \tau)X(t)\,dt.
\]
Then,
\[
\mathbb{E}\big[\widehat{R}_X(\tau)\big] = R_X(\tau). \tag{8.11}
\]
Proof.
\[
\mathbb{E}\big[\widehat{R}_X(\tau)\big] = \frac{1}{2T}\int_{-T}^{T} \mathbb{E}[X(t + \tau)X(t)]\,dt
= \frac{1}{2T}\int_{-T}^{T} R_X(\tau)\,dt
= R_X(\tau)\,\frac{1}{2T}\int_{-T}^{T} dt
= R_X(\tau).
\]
The implication of this lemma is that if the signal X(t) is long enough, we can estimate
RX (τ ) by computing
\[
\widehat{R}_X(\tau) = \frac{1}{2T}\int_{-T}^{T} X(t + \tau)X(t)\,dt.
\]
Now, we need to ask the question: what is R̂X(τ)? It turns out that R̂X(τ) can be interpreted as follows.
Proposition 1. R̂X(τ) is the "un-flipped convolution", or correlation, of X(t) and X(t + τ).
There is nothing to prove in this proposition. All we need to know is that the convolution of X with itself is
\[
Y(\tau) = \int_{-T}^{T} X(\tau - t)X(t)\,dt, \tag{8.12}
\]
whereas the correlation is
\[
Y(\tau) = \int_{-T}^{T} X(t + \tau)X(t)\,dt. \tag{8.13}
\]
Clearly, R̂X(τ) is, up to the normalization 1/2T, the latter.
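The distinction between convolution (one signal flipped) and correlation (no flip) is easy to see with NumPy, which provides both operations. The sketch below shows that correlating a signal with itself equals convolving it with a time-reversed copy of itself, which is exactly the "un-flipped" relationship described above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 0.0, -1.0])

corr = np.correlate(x, x, mode="full")         # "un-flipped": sum_n x[n + k] * x[n] for each lag k
conv = np.convolve(x, x[::-1], mode="full")    # ordinary convolution, but with one copy time-reversed

print(np.allclose(corr, conv))                 # True: correlation = convolution with a flipped copy
print(corr)                                    # lags run from -(N-1) to (N-1)
```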
Theorem 1 (Mean-Ergodic Theorem). Let Y(t) be a W.S.S. process, with mean E[Y(t)] = µ and autocovariance function CY(τ). Define
\[
M_T \overset{\mathrm{def}}{=} \frac{1}{2T}\int_{-T}^{T} Y(t)\,dt. \tag{8.14}
\]
Then, E[(M_T − µ)^2] → 0 as T → ∞.
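A small simulation illustrating the theorem (a sketch only: the time integral is approximated by sampling on a grid, and Y(t) = cos(ωt + Θ) with ω = 2π and µ = 0 is an assumed example): the mean-square error E[(M_T − µ)^2] should shrink as T grows.

```python
import numpy as np

rng = np.random.default_rng(0)
omega, mu = 2 * np.pi, 0.0                 # for Y(t) = cos(omega*t + Theta), the ensemble mean is mu = 0
num_real = 2000

for T in (0.3, 3.3, 33.3):
    t = np.linspace(-T, T, 4000)                        # grid approximating the time integral on [-T, T]
    Theta = rng.uniform(-np.pi, np.pi, size=(num_real, 1))
    M_T = np.cos(omega * t + Theta).mean(axis=1)        # M_T ~ (1/2T) * integral of Y(t) dt, one per realization
    print(T, np.mean((M_T - mu) ** 2))                  # mean-square error shrinks as T grows
```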
The above analysis also works for discrete cases. If the random process X[n] is discrete, we can define
\[
\widehat{R}_X[k] = \frac{1}{2N + 1}\sum_{n=-N}^{N} X[n + k]X[n], \tag{8.15}
\]
and show that E[R̂X[k]] = RX[k].
By the Mean-Ergodic Theorem, we have R̂X[k] → RX[k] as N → ∞.
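Here is a sketch of the discrete estimator (8.15) applied to one long realization of X[n] = cos(ω0 n + Θ). The frequency ω0 = 0.5 and the one-sided indexing over the available samples are choices made for illustration (the definition sums over n = −N to N); the time-averaged estimate should be close to the ensemble autocorrelation RX[k] = (1/2)cos(ω0 k).

```python
import numpy as np

rng = np.random.default_rng(0)
omega0 = 0.5                                         # assumed frequency (radians per sample)
N = 50_000
n = np.arange(N)
x = np.cos(omega0 * n + rng.uniform(0, 2 * np.pi))   # one long realization of X[n] = cos(omega0*n + Theta)

def R_hat(x, k):
    """Time-average estimate of R_X[k] from a single realization."""
    return np.mean(x[k:] * x[:len(x) - k])

for k in (0, 1, 5, 10):
    print(k, round(R_hat(x, k), 4), round(0.5 * np.cos(omega0 * k), 4))   # estimate vs. theory
```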
8.4 Power Spectral Density
The autocorrelation function RX (τ ) can be analyzed in the frequency domain. In this section
we will discuss how this is done.
Assumption: X(t) is a W.S.S. random process with mean µX and autocorrelation RX (τ ).
Definition 10. The power spectral density (PSD) of a W.S.S. process is defined as
\[
S_X(\omega) = \lim_{T\to\infty} \frac{\mathbb{E}\big[|\widetilde{X}_T(\omega)|^2\big]}{2T}, \tag{8.16}
\]
where
\[
\widetilde{X}_T(\omega) = \int_{-T}^{T} X(t)e^{-j\omega t}\,dt. \tag{8.17}
\]
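In discrete time, definition (8.16) suggests a direct way to estimate the PSD: take the Fourier transform of a finite window of the signal, square its magnitude, normalize by the window length, and average over realizations (an averaged periodogram). A minimal sketch for i.i.d. noise, whose PSD should be flat at the noise variance (the values N, num_real, and sigma2 below are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
N, num_real = 1024, 500
sigma2 = 2.0                                    # variance of the i.i.d. noise samples (arbitrary choice)

X = rng.normal(0.0, np.sqrt(sigma2), size=(num_real, N))
Xf = np.fft.rfft(X, axis=1)                     # finite-window Fourier transform of each realization
S_hat = np.mean(np.abs(Xf) ** 2, axis=0) / N    # average of |X_T(omega)|^2 / (window length)

print(np.round(S_hat[:5], 2), "... expected level:", sigma2)   # roughly flat at sigma^2
```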
Before we move on and discuss how SX (ω) can be used in practice, we first explain why
SX (ω) is called the power spectral density.
Lemma 2. Define
\[
P_X \overset{\mathrm{def}}{=} \mathbb{E}\left[\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} |X(t)|^2\,dt\right].
\]
Then
\[
P_X = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\,d\omega.
\]
If we can prove this result, then clearly SX(ω) is the density that integrates to yield the power PX.
Proof. First of all, we recall that PX is the expected average power of X(t). Let
\[
X_T(t) = \begin{cases} X(t), & -T \le t \le T,\\ 0, & \text{otherwise.} \end{cases}
\]
Therefore, we can show that PX satisfies
\[
P_X = \mathbb{E}\left[\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} |X(t)|^2\,dt\right]
= \mathbb{E}\left[\lim_{T\to\infty} \frac{1}{2T}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty} |\widetilde{X}_T(\omega)|^2\,d\omega\right]
= \frac{1}{2\pi}\int_{-\infty}^{\infty} \underbrace{\lim_{T\to\infty} \frac{1}{2T}\,\mathbb{E}\big[|\widetilde{X}_T(\omega)|^2\big]}_{\overset{\mathrm{def}}{=}\, S_X(\omega)}\,d\omega,
\]
where the second equality follows from Parseval's theorem applied to the truncated signal XT(t).
Physically, the presence of the Fourier transform makes sense because RX(τ) is the "self-convolution" of the random process: RX(τ) = E[X(t)X(t + τ)]. Since convolution in time is equivalent to multiplication in frequency, the Fourier transform of a self-convolved signal is equivalent to the magnitude square in the Fourier domain, which is the power.
Properties of SX (ω)
Corollary 5.
\[
S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)e^{-j\omega\tau}\,d\tau = 2\int_{0}^{\infty} R_X(\tau)\cos(\omega\tau)\,d\tau, \tag{8.19}
\]
where the second equality uses the fact that RX(τ) is even (Corollary 2).
Corollary 6.
SX (ω) ≥ 0, ∀ω. (8.20)
Proof. This is because SX(ω) = lim_{T→∞} E[|X̃T(ω)|^2]/(2T) is a limit of non-negative quantities, and so it must be non-negative.
Corollary 7.
\[
R_X(\tau) = \mathcal{F}^{-1}\{S_X(\omega)\}. \tag{8.21}
\]
Corollary 8.
\[
\mathbb{E}[X(t)^2] = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\,d\omega. \tag{8.22}
\]
Proof. E[X(t)^2] = RX(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\,d\omega.
Example. Let X(t) = a cos(ω0 t + Θ), Θ ∼ Uniform[0, 2π]. Then we can show that
\[
R_X(\tau) = \frac{a^2}{2}\cos(\omega_0\tau) = \frac{a^2}{2}\cdot\frac{e^{j\omega_0\tau} + e^{-j\omega_0\tau}}{2}.
\]
Then, by taking the Fourier transform of both sides, we have
\[
S_X(\omega) = \frac{a^2}{2}\cdot\frac{2\pi\delta(\omega - \omega_0) + 2\pi\delta(\omega + \omega_0)}{2}
= \frac{\pi a^2}{2}\big[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\big].
\]
Example. Given that S_X(\omega) = \frac{N_0}{2}\,\mathrm{rect}\!\left(\frac{\omega}{2W}\right), then
\[
R_X(\tau) = \frac{N_0}{2}\cdot\frac{W}{\pi}\,\mathrm{sinc}(W\tau).
\]
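This Fourier pair can be sanity-checked numerically. The sketch below (with N0 = 2 and W = 5 as arbitrary assumed values) evaluates R_X(τ) = (1/2π)∫ S_X(ω)e^{jωτ}dω by direct numerical summation and compares it with (N0/2)(W/π) sinc(Wτ).

```python
import numpy as np

N0, W = 2.0, 5.0                                   # assumed values for N_0 and W
omega = np.linspace(-W, W, 20_001)                 # support of the rect: |omega| <= W
S = np.full_like(omega, N0 / 2)
d_omega = omega[1] - omega[0]

def R_numeric(tau):
    # R_X(tau) = (1/2pi) * integral of S_X(omega) * exp(j*omega*tau) d(omega); imaginary part vanishes
    return np.sum(S * np.cos(omega * tau)) * d_omega / (2 * np.pi)

for tau in (0.0, 0.2, 0.7):
    theory = (N0 / 2) * (W / np.pi) * np.sinc(W * tau / np.pi)   # np.sinc(x) = sin(pi x)/(pi x)
    print(round(R_numeric(tau), 4), round(theory, 4))            # the two values should nearly match
```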
Finally, we define cross power spectral densities, which will be useful when we discuss optimal
filters.
Definition 11.
\[
S_{X,Y}(\omega) = \mathcal{F}\{R_{X,Y}(\tau)\}, \quad\text{where } R_{X,Y}(\tau) = \mathbb{E}[X(t + \tau)Y(t)],
\]
\[
S_{Y,X}(\omega) = \mathcal{F}\{R_{Y,X}(\tau)\}, \quad\text{where } R_{Y,X}(\tau) = \mathbb{E}[Y(t + \tau)X(t)]. \tag{8.23}
\]
Remark: In general, S_{X,Y}(ω) ≠ S_{Y,X}(ω). Rather, since R_{X,Y}(τ) = R_{Y,X}(−τ), we have S_{X,Y}(ω) = S_{Y,X}(ω)^*.