
Chapter 8

Random Processes

In this chapter we study random processes and their statistical properties.

8.1 Basic Concepts


A random process is a family of random variables arranged in a time sequence. In principle,
the sequence can be finite, e.g., {A1, A2, A3}. In practice, however, a random process often
refers to an infinite family {A1, A2, A3, . . .}. The need to work with an infinite family of
random variables arises when we have an undetermined amount of data, for example, when
sending bits without knowing in advance how many will be sent.

The definition of a random process is as follows.

Definition 1 (Random Process). A random process X(t, ξ) is a function of t indexed by a
random index ξ.

According to this definition, X(t, ξ) is a two-dimensional function. One dimension is t, and
the other dimension is ξ; the randomness enters through ξ. To understand what X(t, ξ)
actually means, we can take two different perspectives:

Statistical View: Fix time t. We can look at the two-dimensional function X(t, ξ) “vertically”
as the collection

X(t, ξ1), X(t, ξ2), . . . , X(t, ξN).

This is a sequence of random variables because ξ1, . . . , ξN are realizations of the random
variable ξ.

Temporal View: Fix the random index ξ. We can look at X(t, ξ) “horizontally” as

X(t1 , ξ), X(t2 , ξ), . . . , X(tK , ξ).

This is a deterministic time series evaluated at time points t1 , . . . , tK .

Figure 8.1: The sample space of a random process X(t, ξ) contains many functions. Therefore,
each random realization is a function.

Example. Let A ∼Uniform[0, 1]. Define X(t, ξ) = A(ξ) cos(2πt).

In this example, the magnitude A(ξ) is a random quantity depending on the random index
ξ. For example if we draw ξ1 , perhaps we will land on a value A(ξ1 ) = 0.5. Then, X(t, ξ1 ) =
0.5 cos(2πt). In another instance, for example, if we draw ξ2 , we may land on A(ξ2 ) = 1.
Then, X(t, ξ2 ) = 1 cos(2πt).

We can look at X(t, ξ) from the statistical and the temporal views.

• Statistical View: Fix t (for example t = 10). In this case, we have

X(t, ξ) = A(ξ) cos(2π(10)) = A(ξ) cos(20π),

which is a random variable: cos(20π) is just a constant, so the randomness comes entirely
from the fact that A(ξ) ∼ Uniform[0, 1].

• Temporal View: Fix ξ (for example A(ξ) = 0.7). In this case, we have

X(t, ξ) = 0.7 cos(2πt),

which is a deterministic function in t.
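To make the two views concrete, the following short simulation (a sketch in NumPy; the variable names and the grid of time points are our own choices, not from the text) draws a few realizations of X(t, ξ) = A(ξ) cos(2πt) and slices the resulting array both horizontally and vertically.

import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(0, 2, 201)          # time grid
A = rng.uniform(0, 1, size=5)       # five draws of A(xi), one per random index
X = A[:, None] * np.cos(2 * np.pi * t)[None, :]   # rows: realizations, columns: time

# Temporal view: fix the realization (row) and read off a deterministic function of t.
print("X(t, xi_0) at t = 0, 0.25, 0.5:", X[0, [0, 25, 50]])

# Statistical view: fix t (column) and look across realizations -> a random variable.
t_idx = 0                            # t = 0, where X(0, xi) = A(xi)
print("samples of X(0, xi):", X[:, t_idx])
print("sample mean of X(0, xi):", X[:, t_idx].mean(), "(compare with E[A] = 0.5)")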


Figure 8.2: Left: Realizations of the random process X(t, ξ). Right: Realizations of the
random process X(n, ξ).

Example. Let A be a discrete random variable with PMF

P(A = +1) = 1/2 and P(A = −1) = 1/2.

Define X(n, ξ) = A(ξ)(−1)^n.

In this example, A can take only two states. If A = +1, then X(n, ξ) = (−1)^n. If A = −1,
then X(n, ξ) = (−1)^(n+1). Again, we can look at X(n, ξ) from the two views.
• Statistical View: Fix n, say n = 10. Then

X(n, ξ) = X(10, ξ) = (−1)^10 = 1 with probability 1/2, or (−1)^11 = −1 with probability 1/2,

which is a random variable.
• Temporal View: Fix ξ. Then

X(n, ξ) = (−1)^n if A = +1, and X(n, ξ) = (−1)^(n+1) if A = −1,

which is a time series.
In this example, we see that the sample space of X(n, ξ) consists of only two functions, with
probabilities

P(X[n] = (−1)^n) = 1/2 and P(X[n] = (−1)^(n+1)) = 1/2.


Therefore, if a sequence lies outside the sample space, e.g., (1, 1, 1, −1, 1, −1, . . .), then the
probability of obtaining that sequence is zero:

P(X[n] = (1, 1, 1, −1, 1, −1, . . .)) = 0.
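A quick simulation of this example (again a NumPy sketch with our own variable names) confirms that every realization coincides with one of the two sample functions, so any other sequence occurs with probability zero.

import numpy as np

rng = np.random.default_rng(1)
n = np.arange(8)

# Draw many realizations of X[n] = A * (-1)^n with P(A = +1) = P(A = -1) = 1/2.
A = rng.choice([-1, 1], size=1000)
X = A[:, None] * (-1.0) ** n[None, :]

# Every row equals either (-1)^n or (-1)^(n+1); count how often each occurs.
plus = np.all(X == (-1.0) ** n, axis=1).mean()
minus = np.all(X == (-1.0) ** (n + 1), axis=1).mean()
print("fraction equal to (-1)^n:    ", plus)    # roughly 0.5
print("fraction equal to (-1)^(n+1):", minus)   # roughly 0.5; the two fractions sum to 1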

8.2 Characterization of Random Process


To characterize a single random variable X, we need the PDF fX(x). To characterize a pair
of random variables (X, Y), we need the joint PDF fX,Y(x, y). How about a random process?
The difficulty is that a random process is a collection of infinitely many random variables.

Fortunately, to characterize a random process X(t), we only need the joint PDF of
X(t1), X(t2), . . . , X(tN) for every finite collection of time instants t1, t2, . . . , tN. That is, as long
as we know
fX(t1),...,X(tN)(x1, . . . , xN),
we will be able to characterize the entire random process X(t). But this is still not easy
because the joint PDF is an N-dimensional function.

Our Goal: Since it is practically very difficult to study all random processes, we restrict our
attention to a subset of random processes that can be characterized by their means and variances
(to be defined below, because for random processes these are functions). This class of random
processes is called the stationary random processes, with a broader class called the wide sense
stationary processes. We will discuss these two classes shortly. For the time being, we first
define the “mean” and “variance” of a random process.

Mean function
Definition 2. The mean function µX (t) of a random process X(t) is

µX (t) = E [X(t)] . (8.1)


Example: Let A ∼ Uniform[0, 1], and let X(t) = A cos(2πt). Find µX(0) and µX(t).

µX(0) = E[X(0)] = E[A cos(0)] = E[A] = 1/2
µX(t) = E[X(t)] = E[A cos(2πt)] = cos(2πt) E[A] = (1/2) cos(2πt).
Example: Let Θ ∼ Uniform[−π, π], and let X(t) = cos(ωt + Θ). Find µX(t).

µX(t) = E[cos(ωt + Θ)] = ∫_{−π}^{π} cos(ωt + θ) (1/2π) dθ = 0.


Example: Let X[n] = S^n, where S ∼ Uniform[0, 1]. Find µX[n].

µX[n] = E[S^n] = ∫_0^1 s^n ds = 1/(n + 1).
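As a sanity check on this calculation, a small Monte Carlo sketch (NumPy; the sample sizes and names are our own) compares the sample mean of S^n with 1/(n + 1).

import numpy as np

rng = np.random.default_rng(2)
S = rng.uniform(0, 1, size=200_000)

for n in [1, 2, 5]:
    empirical = (S ** n).mean()          # Monte Carlo estimate of E[S^n]
    print(f"n={n}: sample mean = {empirical:.4f}, 1/(n+1) = {1/(n+1):.4f}")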

Variance function
Definition 3. The variance function of a random process X(t) is

Var[X(t)] = E[(X(t) − µX(t))^2]. (8.2)

Note: Both µX(t) and Var[X(t)] are functions of t.

Definition 4. The autocorrelation function of a random process X(t) is

RX (t1 , t2 ) = E [X(t1 )X(t2 )] . (8.3)

The autocorrelation function takes two time instants t1 and t2. Since X(t1) and X(t2) are two
random variables, RX(t1, t2) = E[X(t1)X(t2)] measures the correlation between these two
random variables.

Definition 5. The autocovariance function of a random process X(t) is

CX (t1 , t2 ) = E [(X(t1 ) − µX (t1 )) (X(t2 ) − µX (t2 ))] . (8.4)

There are two interesting properties of the autocovariance function:


1. CX (t1 , t2 ) = RX (t1 , t2 ) − µX (t1 )µX (t2 )
2. CX (t, t) = Var [X(t)]
Proof. For the first property,

CX(t1, t2) = E[X(t1)X(t2) − X(t1)µX(t2) − X(t2)µX(t1) + µX(t1)µX(t2)]
= RX(t1, t2) − µX(t1)µX(t2) − µX(t1)µX(t2) + µX(t1)µX(t2)
= RX(t1, t2) − µX(t1)µX(t2),

and the second property follows since CX(t, t) = E[(X(t) − µX(t))^2] = Var[X(t)].

Definition 6. The cross-correlation function of X(t) and Y (t) is

RX,Y (t1 , t2 ) = E [X(t1 )Y (t2 )] . (8.5)


Definition 7. The cross-covariance function of X(t) and Y (t) is

CX,Y (t1 , t2 ) = E [(X(t1 ) − µX (t1 )) (Y (t2 ) − µY (t2 ))] (8.6)

Remark: CX,Y (t1 , t2 ) = RX,Y (t1 , t2 ) = E[X(t1 )Y (t2 )] if µX (t1 ) = µY (t2 ) = 0.

Example: Let A ∼ Uniform[0, 1] and X(t) = A cos(2πt). Then

µX(t) = E[A cos(2πt)] = (1/2) cos(2πt)
RX(t1, t2) = E[A cos(2πt1) · A cos(2πt2)] = E[A^2] cos(2πt1) cos(2πt2) = (1/3) cos(2πt1) cos(2πt2)
CX(t1, t2) = RX(t1, t2) − µX(t1)µX(t2) = (1/12) cos(2πt1) cos(2πt2).

Example: Let Θ ∼ Uniform[−π, π] and X(t) = cos(ωt + Θ). Then

µX(t) = E[cos(ωt + Θ)] = ∫_{−π}^{π} cos(ωt + θ) (1/2π) dθ = 0
RX(t1, t2) = E[cos(ωt1 + Θ) cos(ωt2 + Θ)] = (1/2) cos(ω(t1 − t2))
CX(t1, t2) = RX(t1, t2) − µX(t1)µX(t2) = (1/2) cos(ω(t1 − t2)).
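The result RX(t1, t2) = (1/2) cos(ω(t1 − t2)) can be verified numerically. The sketch below (NumPy; the value ω = 2π and the particular time pairs are arbitrary choices for illustration) averages X(t1)X(t2) over many random phases.

import numpy as np

rng = np.random.default_rng(3)
omega = 2 * np.pi
Theta = rng.uniform(-np.pi, np.pi, size=500_000)

for t1, t2 in [(0.0, 0.0), (0.3, 0.1), (1.0, 0.25)]:
    # Average X(t1)X(t2) over the random phase Theta
    emp = np.mean(np.cos(omega * t1 + Theta) * np.cos(omega * t2 + Theta))
    theory = 0.5 * np.cos(omega * (t1 - t2))
    print(f"t1={t1}, t2={t2}: empirical {emp:.4f}, theory {theory:.4f}")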

Example: Let Y (t) = X(t)+N (t), where X(t) and N (t) are independent. Find RX,Y (t1 , t2 ).

RX,Y (t1 , t2 ) = E [X(t1 )Y (t2 )] = E [X(t1 ) (X(t2 ) + N (t2 ))] = RX (t1 , t2 ) + µX (t1 )µN (t2 )

Independent and Uncorrelated


Independent: Two random processes are independent if

FX(t1 ),...,X(tN ),Y (t1 ),...,Y (tN ) (x1 , . . . , xN , y1 , . . . , yN )


= FX(t1 ),...,X(tN ) (x1 , . . . , xN ) × FY (t1 ),...,Y (tN ) (y1 , . . . , yN ).

Uncorrelated: Two random processes are uncorrelated if

E[X(t1)Y(t2)] = E[X(t1)] E[Y(t2)], (8.7)


for any t1 and t2. If X = Y, we say that the random process itself is uncorrelated. This
leads to RX(t1, t2) = µX(t1)µX(t2), i.e., CX(t1, t2) = 0, for t1 ≠ t2.

Note: If two random processes are independent, then they must be uncorrelated. On the
other hand, if they are uncorrelated, they are not necessarily independent.


independent X and Y ⇒ uncorrelated X and Y, but not conversely.

8.3 Stationary Processes


Among all random processes, we are interested in studying a subset which allows us to say
something concrete about the statistics. This subset of random processes is called stationary
processes.

Strictly Stationary Processes


Definition 8. A random process X(t) is strictly stationary if the joint distribution of
any set of samples does not depend on the placement of the time origin. That is, for any
τ,
FX(t1 ),...,X(tN ) (x1 , . . . , xN ) = FX(t1 +τ ),...,X(tN +τ ) (x1 , . . . , xN ) (8.8)

Implications of Strictly Stationary Processes

If X(t) is strictly stationary, then the CDF must be independent of the starting time and can
depend only on time differences. In particular, we must have:

1. The first-order CDF must be independent of time, i.e., FX(t)(x) = FX(t+τ)(x) = FX(x).
Therefore, we have

µX(t) = µX for all t
Var[X(t)] = σX^2 for all t.

2. The second-order CDF depends only on the time difference, i.e.,

FX(t1 ),X(t2 ) (x1 , x2 ) = FX(0),X(t2 −t1 ) (x1 , x2 ). (8.9)

If this is the case, then we have

RX (t1 , t2 ) = RX (t2 − t1 )
CX (t1 , t2 ) = CX (t2 − t1 ).


To check whether a random process is strictly stationary, we have to verify the definition.
The following two examples illustrate how.

Example. Let Z be a random variable, and let X(t) = Z for all t. Then, X(t) is a strictly
stationary process because for any τ , we can write

FX(t1+τ),...,X(tN+τ)(x1, . . . , xN) = FZ,...,Z(x1, . . . , xN) = FX(t1),...,X(tN)(x1, . . . , xN).

Example. Let X[n] be a sequence of i.i.d. random variables. Then, X[n] is a strictly
stationary process because the CDF

FX[n1 +k],...,X[nN +k] (x1 , . . . , xN ) = FX[n1 ],...,X[nN ] (x1 , . . . , xN ).

Wide Sense Stationary Processes


As illustrated in the examples above, checking strict stationarity requires verifying the
definition, which in turn requires checking the entire CDF. This is not always feasible in
practice because the CDF can be complicated.

In order to relax the criteria of stationarity, we consider a weaker class of stationary processes
called wide sense stationary (W.S.S.), defined as follows.

Definition 9. A random process X(t) is wide sense stationary (W.S.S.) if:

1. µX (t) = µX for all t,

2. CX (t1 , t2 ) = CX (t1 − t2 ) for all t1 , t2 .

Remark: If a random process is strictly stationary, then it must be W.S.S. On the other
hand, a W.S.S. process is not necessarily strictly stationary. In short, we have

strictly stationary ⇒ W.S.S., but not conversely.

Notation: When X(t) is W.S.S., we define τ = t2 − t1 and write

CX (t1 , t2 ) = CX (τ )
RX (t1 , t2 ) = RX (τ ).

The reason for introducing this notation is that if X(t) is W.S.S., then the autocovariance
function CX(t1, t2) and the autocorrelation function RX(t1, t2) depend only on the difference
t1 − t2. Therefore, it is unnecessary to keep track of a two-dimensional function.


Properties of RX (τ )
When X(t) is W.S.S., the autocorrelation function RX (τ ) has several important properties.

Corollary 1. RX (0) = average power of X(t)

Proof. Since RX (0) = E[X(t + 0)X(t)] = E[X(t)2 ], and since E[X(t)2 ] is the average power,
we have that RX (0) is the average power of X(t).

Corollary 2. RX (τ ) is symmetric. That is, RX (τ ) = RX (−τ ).

Proof. Note that RX(τ) = E[X(t + τ)X(t)] = E[X(t)X(t + τ)]. Substituting s = t + τ, this
equals E[X(s − τ)X(s)] = RX(−τ).

Corollary 3. For any ε > 0,

P(|X(t + τ) − X(t)| > ε) ≤ 2(RX(0) − RX(τ)) / ε^2.

This result says that if RX(τ) decays slowly from RX(0), then the probability of a large
deviation |X(t + τ) − X(t)| is small.
Proof. By Markov's inequality applied to (X(t + τ) − X(t))^2,

P(|X(t + τ) − X(t)| > ε) ≤ E[(X(t + τ) − X(t))^2] / ε^2
= (E[X(t + τ)^2] − 2E[X(t + τ)X(t)] + E[X(t)^2]) / ε^2
= (2E[X(t)^2] − 2E[X(t + τ)X(t)]) / ε^2
= 2(RX(0) − RX(τ)) / ε^2.

Corollary 4. |RX (τ )| ≤ RX (0), for all τ .

Proof. By the Cauchy-Schwarz inequality E[XY]^2 ≤ E[X^2]E[Y^2], we can show that

RX(τ)^2 = E[X(t)X(t + τ)]^2 ≤ E[X(t)^2] E[X(t + τ)^2] = E[X(t)^2]^2 = RX(0)^2.

Taking square roots gives |RX(τ)| ≤ RX(0).


Physical Interpretation of RX (τ )
How do we understand the autocorrelation function RX(τ)? We can answer this question
from a computational perspective.

In practice, we are not given the autocorrelation function, just as we are not given the PDF
of a Gaussian random variable. (If we were given the PDF, there would be nothing to
estimate.) Therefore, given a set of observed signals, we have to estimate the autocorrelation
function. To estimate RX(τ), we can first propose a guess and then check whether the guess
is correct.

To this end, let us consider a W.S.S. process X(t) and the function

R̂X(τ) := (1/2T) ∫_{−T}^{T} X(t + τ)X(t) dt. (8.10)

Is this a good choice? Let us take the expectation of both sides to check whether
E[R̂X(τ)] = RX(τ). If this does not hold, then our guess is problematic, because even on
average R̂X(τ) is not RX(τ). Fortunately, the following result holds.

Lemma 1. Let R̂X(τ) := (1/2T) ∫_{−T}^{T} X(t + τ)X(t) dt. Then

E[R̂X(τ)] = RX(τ). (8.11)

Proof.

E[R̂X(τ)] = (1/2T) ∫_{−T}^{T} E[X(t + τ)X(t)] dt
= (1/2T) ∫_{−T}^{T} RX(τ) dt
= RX(τ) · (1/2T) ∫_{−T}^{T} dt
= RX(τ).

The implication of this lemma is that if the signal X(t) is long enough, we can estimate
RX(τ) by computing

R̂X(τ) = (1/2T) ∫_{−T}^{T} X(t + τ)X(t) dt.

Now we need to ask: what is R̂X(τ)? It turns out that R̂X(τ) can be interpreted as follows.


Proposition 1. R̂X(τ) is the “un-flipped convolution”, or correlation, of X(t) and X(t + τ).

There is nothing to prove in this proposition. All we need to know is that the convolution is
defined as

Y(τ) = ∫_{−T}^{T} X(t − τ)X(t) dt, (8.12)

whereas the correlation is defined as

Y(τ) = ∫_{−T}^{T} X(t + τ)X(t) dt. (8.13)

Clearly, R̂X(τ) is the latter (up to the normalization factor 1/2T).
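The difference between the flipped product (convolution) and the un-flipped product (correlation) can be seen directly in the discrete setting. The sketch below uses np.convolve and np.correlate as discrete analogues of (8.12) and (8.13); the test signal is an arbitrary choice of ours.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

conv = np.convolve(x, x, mode="full")        # flipped product: discrete analogue of (8.12)
corr = np.correlate(x, x, mode="full")       # un-flipped product: discrete analogue of (8.13)

# Correlation is convolution with a time-reversed copy of the second signal:
print(np.allclose(corr, np.convolve(x, x[::-1], mode="full")))   # True

print("convolution :", conv)
print("correlation :", corr)   # symmetric and peaked at zero lag, like an autocorrelation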

How soon will R̂X(τ) → RX(τ) as T → ∞? The answer to this question is addressed by a
very important theorem called the Mean-Ergodic Theorem, which can be considered the
random-process version of the weak law of large numbers.

Theorem 1 (Mean-Ergodic Theorem). Let Y(t) be a W.S.S. process with mean E[Y(t)] = µ
and autocovariance function CY(τ). Define

MT := (1/2T) ∫_{−T}^{T} Y(t) dt. (8.14)

Then E[(MT − µ)^2] → 0 as T → ∞.

We will explain and discuss this theorem later.
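As a preview of what the theorem says, here is a minimal simulation (NumPy; the process, an i.i.d. Gaussian sequence with mean µ = 1, is our own choice) showing the time average MT concentrating around µ as the window grows.

import numpy as np

rng = np.random.default_rng(4)
mu = 1.0

for T in [10, 100, 10_000]:
    # 200 independent realizations of a W.S.S. process observed on a window of length 2T
    Y = mu + rng.standard_normal((200, 2 * T))
    M_T = Y.mean(axis=1)                       # discrete analogue of (1/2T) * integral of Y(t)
    print(f"T={T:6d}: E[(M_T - mu)^2] is about {np.mean((M_T - mu) ** 2):.2e}")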

The above analysis also works in the discrete case. If the random process X[n] is discrete, we
can define

R̂X[k] = (1/(2N + 1)) Σ_{n=−N}^{N} X[n + k]X[n], (8.15)

and show that E[R̂X[k]] = RX[k]. By the Mean-Ergodic Theorem, R̂X[k] → RX[k] as N → ∞.
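For a concrete discrete example, the sketch below reuses the random-phase cosine X[n] = cos(ω0 n + Θ), for which RX[k] = (1/2) cos(ω0 k), and computes R̂X[k] from a single long realization (NumPy; the value of ω0 and the chosen lags are our own illustrative choices).

import numpy as np

rng = np.random.default_rng(5)
omega0 = 0.3
N = 50_000
n = np.arange(-N, N + 1)
Theta = rng.uniform(-np.pi, np.pi)
X = np.cos(omega0 * n + Theta)          # one realization observed on n = -N, ..., N

for k in [0, 5, 20]:
    # hat R_X[k] = (1/(2N+1)) * sum_n X[n+k] X[n], truncated at the data boundary
    Rhat = np.mean(X[k:] * X[:len(X) - k])
    print(f"k={k:2d}: Rhat = {Rhat:+.4f}, R_X[k] = {0.5 * np.cos(omega0 * k):+.4f}")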


8.4 Power Spectral Density
The autocorrelation function RX (τ ) can be analyzed in the frequency domain. In this section
we will discuss how this is done.
Assumption: X(t) is a W.S.S. random process with mean µX and autocorrelation RX (τ ).

Definition 10. The power spectral density (PSD) of a W.S.S. process is defined as

SX(ω) = lim_{T→∞} E[|X̃T(ω)|^2] / (2T), (8.16)

where

X̃T(ω) = ∫_{−T}^{T} X(t) e^{−jωt} dt (8.17)

is the Fourier transform of X(t) limited to [−T, T].
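Definition 10 also suggests a direct numerical recipe for estimating the PSD: truncate each realization to [−T, T], take its Fourier transform, square the magnitude, divide by 2T, and average over realizations. The sketch below (NumPy; the sampling interval, window length, and the random-phase cosine test process are our own choices) follows this recipe.

import numpy as np

rng = np.random.default_rng(6)
dt = 0.01                      # sampling interval used to approximate the integral
T = 50.0                       # observation window is [-T, T]
t = np.arange(-T, T, dt)
n_real = 200

# Test process: X(t) = cos(omega0 t + Theta) with random phase, RX(tau) = 0.5 cos(omega0 tau)
omega0 = 2 * np.pi * 1.0
Theta = rng.uniform(-np.pi, np.pi, size=(n_real, 1))
X = np.cos(omega0 * t + Theta)

# Windowed Fourier transform X_T(omega), approximated by an FFT times dt
XT = np.fft.fft(X, axis=1) * dt
omega = 2 * np.pi * np.fft.fftfreq(t.size, d=dt)

# S_X(omega) estimate: E[|X_T(omega)|^2] / (2T), with E[.] replaced by an average over realizations.
# For this process S_X has delta functions at +/- omega0, so the estimate peaks sharply there.
SX = np.mean(np.abs(XT) ** 2, axis=0) / (2 * T)
print("peak located near |omega| =", abs(omega[np.argmax(SX)]), "vs omega0 =", omega0)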

Before we move on and discuss how SX (ω) can be used in practice, we first explain why
SX (ω) is called the power spectral density.

Lemma 2. Define

PX := E[ lim_{T→∞} (1/2T) ∫_{−T}^{T} |X(t)|^2 dt ].

Then it holds that

PX = (1/2π) ∫_{−∞}^{∞} SX(ω) dω. (8.18)

If we can prove this result, then clearly SX (ω) is the density that integrates to yield the
power PX .

Proof. First of all, recall that PX is the expected average power of X(t). Let

XT(t) = X(t) for −T ≤ t ≤ T, and XT(t) = 0 otherwise.

Then integrating over (−∞, ∞) is equivalent to integrating over [−T, T]:

∫_{−∞}^{∞} |XT(t)|^2 dt = ∫_{−T}^{T} |X(t)|^2 dt.

By Parseval's theorem, the energy is preserved in both the time and the frequency domain:

∫_{−∞}^{∞} |XT(t)|^2 dt = (1/2π) ∫_{−∞}^{∞} |X̃T(ω)|^2 dω.


Therefore, we can show that PX satisfies

PX = E[ lim_{T→∞} (1/2T) ∫_{−T}^{T} |X(t)|^2 dt ]
= E[ lim_{T→∞} (1/2π)(1/2T) ∫_{−∞}^{∞} |X̃T(ω)|^2 dω ]
= (1/2π) ∫_{−∞}^{∞} lim_{T→∞} E[|X̃T(ω)|^2] / (2T) dω
= (1/2π) ∫_{−∞}^{∞} SX(ω) dω,

where the inner limit is SX(ω) by definition.

How is SX(ω) related to the autocorrelation function RX(τ)? This is addressed by the
Einstein-Wiener-Khinchin theorem.

Theorem 2 (Einstein-Wiener-Khinchin Theorem). The power spectral density SX(ω) of a
W.S.S. process is

SX(ω) = ∫_{−∞}^{∞} RX(τ) e^{−jωτ} dτ = F{RX(τ)}.

That is, SX(ω) is the Fourier transform of the autocorrelation function.

Physically, the appearance of the Fourier transform makes sense because RX(τ) = E[X(t)X(t + τ)]
is a “self-correlation” of the random process. Since convolution (or correlation) in time
corresponds to multiplication in frequency, the Fourier transform of the self-correlated signal
corresponds to the magnitude squared in the Fourier domain, which is the power.
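The theorem can be illustrated numerically. The discrete-time sketch below (NumPy; the i.i.d. ±1 test sequence, for which RX[k] = δ[k] and hence SX(ω) = 1 for all ω, is our own choice) estimates RX[k] from data and sums RX[k] e^{−jωk} to recover an approximately flat spectrum.

import numpy as np

rng = np.random.default_rng(7)
N = 4096
X = rng.choice([-1.0, 1.0], size=N)       # i.i.d. +/-1 sequence: R_X[k] = delta[k], S_X(omega) = 1

# Estimate R_X[k] for a handful of lags from the single realization.
lags = np.arange(-20, 21)
Rhat = np.array([np.mean(X[abs(k):] * X[:N - abs(k)]) for k in lags])

# Discrete-time Wiener-Khinchin: S_X(omega) = sum_k R_X[k] e^{-j omega k}
for omega in [0.5, 1.5, 2.5]:
    S = np.real(np.sum(Rhat * np.exp(-1j * omega * lags)))
    print(f"omega={omega}: S_X estimate = {S:.2f} (theory: 1.00)")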

Properties of SX (ω)
Corollary 5.

SX(ω) = 2 ∫_0^∞ RX(τ) cos(ωτ) dτ. (8.19)

Proof. Since RX(τ) = RX(−τ), we have

SX(ω) = ∫_{−∞}^{∞} RX(τ) e^{−jωτ} dτ = ∫_0^∞ RX(τ) (e^{−jωτ} + e^{jωτ}) dτ = 2 ∫_0^∞ RX(τ) cos(ωτ) dτ.

Corollary 6.
SX (ω) ≥ 0, ∀ω. (8.20)


Proof. This is because SX(ω) is the limit of E[|X̃T(ω)|^2]/(2T) ≥ 0; it is a power density, and so it must be non-negative.

Corollary 7.
RX (τ ) = F −1 {SX (ω)} (8.21)

Proof. Since SX (ω) is the Fourier transform of RX (τ ), the inverse holds.

Corollary 8.

E[X(t)^2] = (1/2π) ∫_{−∞}^{∞} SX(ω) dω. (8.22)

Proof. E[X(t)^2] = RX(0) = (1/2π) ∫_{−∞}^{∞} SX(ω) dω.

Example. Let RX(τ) = e^{−2α|τ|}. Then

SX(ω) = F{RX(τ)} = 4α / (4α^2 + ω^2).
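Filling in the intermediate steps of this transform pair (assuming α > 0 so that the integrals converge):

SX(ω) = ∫_{−∞}^{∞} e^{−2α|τ|} e^{−jωτ} dτ
= ∫_{−∞}^{0} e^{(2α − jω)τ} dτ + ∫_0^{∞} e^{−(2α + jω)τ} dτ
= 1/(2α − jω) + 1/(2α + jω)
= 4α / (4α^2 + ω^2).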

Example. Let X(t) = a cos(ω0 t + Θ), with Θ ∼ Uniform[0, 2π]. Then we can show that

RX(τ) = (a^2/2) cos(ω0 τ) = (a^2/2) · (e^{jω0 τ} + e^{−jω0 τ})/2.

Then, by taking the Fourier transform of both sides, we have

SX(ω) = (a^2/2) · (2πδ(ω − ω0) + 2πδ(ω + ω0))/2 = (πa^2/2) [δ(ω − ω0) + δ(ω + ω0)].
Example. Given that SX(ω) = (N0/2) rect(ω/2W), then

RX(τ) = (N0 W / 2π) sinc(Wτ).

Finally, we define cross power spectral densities, which will be useful when we discuss optimal
filters.
Definition 11.

SX,Y(ω) = F{RX,Y(τ)}, where RX,Y(τ) = E[X(t + τ)Y(t)]
SY,X(ω) = F{RY,X(τ)}, where RY,X(τ) = E[Y(t + τ)X(t)]. (8.23)

Remark: In general, SX,Y(ω) ≠ SY,X(ω). Rather, since RX,Y(τ) = RY,X(−τ), we have
SX,Y(ω) = SY,X(ω)*.


© 2017 Stanley Chan. All Rights Reserved.
