2021 - Week - 3 - Ch.2 Random Process


2. Random Variables and Stochastic Processes

2.4 Probabilistic Concepts Applied to Random Variables

 Joint Probability Distribution

$F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$

 Marginal Probability

$F(x_1, x_2, \ldots, x_k) = F(x_1, \ldots, x_k, \infty, \ldots, \infty) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k)$

 Joint Probability Density


$f(x_1, x_2, \ldots, x_n) = \dfrac{\partial^n F}{\partial x_1\, \partial x_2 \cdots \partial x_n}\bigg|_{X_1 = x_1, \ldots, X_n = x_n}$

 Marginal Probability Density


$f(x_1, x_2, \ldots, x_k) = \dfrac{\partial^k F(x_1, \ldots, x_k, \infty, \ldots, \infty)}{\partial x_1\, \partial x_2 \cdots \partial x_k}\bigg|_{X_1 = x_1, \ldots, X_k = x_k}$

 The marginal probability distribution


$F(x_1, x_2, \ldots, x_k) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(s_1, s_2, \ldots, s_n)\, ds_1 \cdots ds_n$

%%% Kim’s Example

Given (X, Y) with joint PDF f(x, y), find the marginals f(x), f(y):

$f(x) = \int_{-\infty}^{\infty} f(x, s)\, ds, \qquad f(y) = \int_{-\infty}^{\infty} f(s, y)\, ds$
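As a side note, the same marginalization can be sketched numerically. The snippet below is a minimal sketch assuming a hypothetical joint density f(x, y) = x + y on the unit square (not from the notes), approximating the integrals with a midpoint sum on a grid:

```python
import numpy as np

# Hypothetical joint density on the unit square: f(x, y) = x + y (integrates to 1)
def joint_pdf(x, y):
    inside = (0.0 <= x) & (x <= 1.0) & (0.0 <= y) & (y <= 1.0)
    return np.where(inside, x + y, 0.0)

# Grid over the support (a finite stand-in for the infinite limits)
n = 1000
xs = np.linspace(0.0, 1.0, n, endpoint=False) + 0.5 / n   # midpoints
ys = xs.copy()
dx = dy = 1.0 / n
X, Y = np.meshgrid(xs, ys, indexing="ij")
F = joint_pdf(X, Y)                 # F[i, j] = f(xs[i], ys[j])

# Marginals: integrate out the other variable (midpoint rule)
f_x = F.sum(axis=1) * dy            # f(x) = integral of f(x, s) ds
f_y = F.sum(axis=0) * dx            # f(y) = integral of f(s, y) ds

# Each marginal should itself integrate to 1
print(f_x.sum() * dx, f_y.sum() * dy)   # both approximately 1.0
```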

The above is the continuous case, directly from the definition. In the discrete case, consider the following example.

Given a group of people, two measurements are taken: body temperature and blood pulse rate. Let ω denote one person, and define the events

$T_H = \{\omega : \text{high temperature}\}, \quad T_L = \{\omega : \text{low temperature}\}$

$P_H = \{\omega : \text{high pulse rate}\}, \quad P_L = \{\omega : \text{low pulse rate}\}$

The probabilities of the outcomes are

$P(T_H \cap P_H) = 0.4, \quad P(T_L \cap P_H) = 0.2$

$P(T_H \cap P_L) = 0.3, \quad P(T_L \cap P_L) = 0.1$
Then the marginal probabilities are

$P(T_H) = P(T_H \cap P_H) + P(T_H \cap P_L) = 0.4 + 0.3 = 0.7$

and

$P(T_L) = P(T_L \cap P_H) + P(T_L \cap P_L) = 0.2 + 0.1 = 0.3$
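A tiny sketch of this computation in code, storing the joint probabilities of the example in a 2×2 array (the row/column layout is my own choice):

```python
import numpy as np

# Joint probabilities from the example:
# rows = temperature (T_H, T_L), columns = pulse rate (P_H, P_L)
joint = np.array([[0.4, 0.3],    # P(T_H and P_H), P(T_H and P_L)
                  [0.2, 0.1]])   # P(T_L and P_H), P(T_L and P_L)

# Marginals: sum out the other variable
p_temp = joint.sum(axis=1)       # [P(T_H), P(T_L)] = [0.7, 0.3]
p_pulse = joint.sum(axis=0)      # [P(P_H), P(P_L)] = [0.6, 0.4]

print(p_temp, p_pulse, joint.sum())   # marginals and total probability 1.0
```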

Compare this with the continuous case. %%%

%%% Kim’s Example for continuous case


Given the joint probability as

$f(x, y) = \begin{cases} 2, & 0 < x,\ 0 < y,\ x + y < 1 \\ 0, & \text{otherwise} \end{cases}$
1) Is it a PDF (or a CDF)?

One necessary condition is that the density integrates to one:

$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dx\, dy = \int_{0}^{1}\int_{0}^{1-y} 2\, dx\, dy = \int_{0}^{1} 2(1 - y)\, dy = 1$

[Figure: the support of f(x, y) is the triangle with vertices (0, 0), (1, 0), (0, 1).]

2) Find the marginal densities f(x), f(y)

$f(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_{0}^{1-y} 2\, dx = 2(1 - y), \quad 0 < y < 1$

$f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_{0}^{1-x} 2\, dy = 2(1 - x), \quad 0 < x < 1$

3) Is (X, Y) independent? No, since on the support

$2 = f(x, y) \ne f(x)\, f(y) = 4(1 - x)(1 - y)$
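The worked example can also be checked symbolically; the following is a sketch with sympy (assuming it is available), reproducing the normalization, the two marginals, and the failed factorization test:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f_xy = sp.Integer(2)                      # joint density on the triangle 0<x, 0<y, x+y<1

# 1) Normalization over the triangular support
total = sp.integrate(sp.integrate(f_xy, (x, 0, 1 - y)), (y, 0, 1))
print(total)                              # 1

# 2) Marginals
f_y = sp.integrate(f_xy, (x, 0, 1 - y))   # 2*(1 - y) for 0 < y < 1
f_x = sp.integrate(f_xy, (y, 0, 1 - x))   # 2*(1 - x) for 0 < x < 1
print(f_x, f_y)

# 3) Independence would require f(x, y) == f(x) * f(y) on the support
print(sp.expand(f_x * f_y))               # 4(1 - x)(1 - y) != 2, so not independent
```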
 HA_2_1:

f(x, y) = 2 x + y = 1, 0 < x < 2, 0 < y < 1 (0 otherwise)
%%%

[Figure: sketch of f(x, y) for 0 < x < 2.]

1) Is this a PDF?

2) Find the marginal densities f(x) and f(y)

%%%

Def 2.16. Two random variables X and Y are called independent if any event of the form X(ω) ∈ A is independent of any event of the form Y(ω) ∈ B, where A and B are sets in $\mathbb{R}^n$.

 Fact

$P(X \in A,\ Y \in B) = P(X \in A)\, P(Y \in B)$
 The joint probability distribution

$F(x, y) = P(X \le x, Y \le y) = P(X \le x)\, P(Y \le y) = F_X(x)\, F_Y(y)$
 The joint probability density function

$f_{XY}(x, y) = \dfrac{\partial^2 F}{\partial x\, \partial y}\bigg|_{X = x,\, Y = y} = \dfrac{\partial F_X}{\partial x}\bigg|_{X = x}\, \dfrac{\partial F_Y}{\partial y}\bigg|_{Y = y} = f_X(x)\, f_Y(y)$

2.5 Functions of a Random Variable -skip

Y(ω) = g(X(ω)) has the density function

$f_Y(y) = f_X\big(g^{-1}(y)\big)\, |J(y)|$

where |J(y)| stands for the absolute value of the determinant of the matrix

$J(y) = \begin{bmatrix} \dfrac{\partial g_1^{-1}}{\partial y_1} & \cdots & \dfrac{\partial g_n^{-1}}{\partial y_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_1^{-1}}{\partial y_n} & \cdots & \dfrac{\partial g_n^{-1}}{\partial y_n} \end{bmatrix}_{Y = y}$

2.6 Expectations and Moments of a Random Variable

Def.
 The mean

$E[X] = \int_{-\infty}^{\infty} x f(x)\, dx \qquad \text{(a)}$

 The sample mean


$m_n = \dfrac{1}{n} \sum_{k=1}^{n} X_k \qquad \text{(1)}$
%% The sample mean is a random variable! It is an estimator of the mean of a random variable X. If the $X_k$ are independent, identically distributed (iid) random variables, i.e.,

$E(X_k) = m \quad \forall k,$

then the mean of the sample mean is

$E(m_n) = E\!\left(\dfrac{1}{n}\sum_{k=1}^{n} X_k\right) = \dfrac{1}{n}\sum_{k=1}^{n} E(X_k) = \dfrac{1}{n}(nm) = m \qquad \text{(b)}$

%%%Kim’s Comment

What is the difference between (a) and (b)? To use (a), one needs to know the probability density function, whereas (b) does not require it.
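A small Monte Carlo sketch of this point: the sample mean in (1) is computed from data alone, yet its expectation equals m as in (b). The exponential distribution with mean 2 below is an arbitrary illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
m_true = 2.0          # mean of the assumed distribution (exponential with mean 2)
n = 50                # samples per experiment
trials = 10_000       # number of independent sample means

# Each row is one experiment of n iid draws; each row average is one realization of m_n
samples = rng.exponential(scale=m_true, size=(trials, n))
sample_means = samples.mean(axis=1)       # 10_000 realizations of the RV m_n

print(sample_means.mean())   # about 2.0: E(m_n) = m, as in (b)
print(sample_means.var())    # about var(X)/n = 4/50, showing m_n is still random
```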

Example 2.19. X is uniformly distributed from 0 to 1, i.e.,

$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$

Then $E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \int_{0}^{1} x\, dx = \dfrac{1}{2}$.

Example 2.22. What is the expected value of one roll of one die? For a fair die, $E(X) = \frac{1}{6}(1 + 2 + \cdots + 6) = 3.5$.

Properties

1) The operator of expectation is linear

E ( aX +bY )=aE ( X )+ bE(Y )


2) The square mean / second moment

$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\, dx$

3) The higher order moment



$E(X^n) = \int_{-\infty}^{\infty} x^n f(x)\, dx$

4) The variance

$\mathrm{var}(X) = E[(X - E(X))^2] = E(X^2) - E(X)^2$

5) The standard deviation

σ X = √ var (X )

6) The sample variance


$\sigma_n^2 = \dfrac{1}{n-1} \sum_{k=1}^{n} (X_k - m_n)^2$

This is a random variable, and it is an unbiased estimator of $\sigma_X^2$.
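A quick sketch of why the 1/(n−1) factor gives an unbiased estimator, comparing it with the naive 1/n version over many repeated experiments (standard normal samples are an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 100_000
x = rng.standard_normal(size=(trials, n))     # true variance is 1

m_n = x.mean(axis=1, keepdims=True)           # sample mean of each trial
ss = ((x - m_n) ** 2).sum(axis=1)             # sum of squared deviations

var_unbiased = ss / (n - 1)                   # the estimator defined above
var_biased = ss / n                           # naive 1/n version

print(var_unbiased.mean())   # about 1.00  (unbiased)
print(var_biased.mean())     # about 0.90  (biased low by factor (n-1)/n)
```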

%%% Kim’s comment: biased and un-biased estimator

What is an estimator? Let X be a RV. We want to find a constant C that represents the RV X in some sense.

We may call C an estimator of the RV X. So there can be as many estimators as you like.

We may classify the estimator as

1) Unbiased estimator / biased estimator

If C = E(X), then C is an unbiased estimator; otherwise it is a biased estimator.

2) The minimum variance estimator / the least squares error estimator

$C = \arg\min_a E\big[(X - a)^2\big]$

3) The mean of X is the minimum variance / least squares error estimator:

$\arg\min_a E\big[(X - a)^2\big] = E(X) \qquad \text{(c)}$

Proof:

$\dfrac{d}{da} E\big[(X - a)^2\big] = \dfrac{d}{da}\left( E[X^2] + a^2 - 2aE(X) \right) = 2a - 2E(X) = 0$

$\Rightarrow a = E(X)$, which is the minimizer in (c).
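A numerical illustration of (c): estimate E[(X − a)²] on a grid of candidate values a and locate the minimizer, which lands at the mean (uniform samples on [0, 1] are an assumed choice, matching Example 2.24 below):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=20_000)       # X ~ U[0, 1], so E(X) = 0.5

a_grid = np.linspace(0.0, 1.0, 201)
# Approximate E[(X - a)^2] for every candidate a
mse = ((x[:, None] - a_grid[None, :]) ** 2).mean(axis=0)

a_star = a_grid[np.argmin(mse)]
print(a_star, x.mean())       # both about 0.5: the minimizer is the mean
print(mse.min(), x.var())     # minimum value about var(X) = 1/12
```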

Example 2.24. The uniformly distributed random variable X on [0, 1]:

$E(X^2) = \int_{0}^{1} x^2\, dx = \dfrac{1}{3}$

The variance is

$\mathrm{var}(X) = E(X^2) - E(X)^2 = \dfrac{1}{3} - \dfrac{1}{4} = \dfrac{1}{12}$
2.7 Characteristic Functions -skip

Lemma 2.27

$E[X^n] = \dfrac{1}{j^n}\, \dfrac{d^n \phi_X(\upsilon)}{d\upsilon^n}\bigg|_{\upsilon = 0}$
Prop.2.28 If X is a Gaussian random vector with mean, m, and covariance matrix P,
then its characteristic function is

$\phi_X(\upsilon) = \exp\!\left( j\upsilon^T m - \dfrac{1}{2}\upsilon^T P \upsilon \right).$
%%% Kim’s comment: correlation

Def: Two R.V. are uncorrelated if

$\mathrm{Cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))\big] = E(XY) - E(X)E(Y) = 0$

%%%

Fact: Two Gaussian Random Vectors are uncorrelated if Cov ( X , Y ) is a diagonal matrix

 Prop. 2.29. Uncorrelated Gaussian random variables are independent

Theorem 2.30. If X is a Gaussian random vector with mean $m_X$ and covariance $P_X$, and if $Y = CX + V$, where V is a Gaussian random vector with zero mean and covariance $P_V$, then Y is a Gaussian random vector with mean $C m_X$ and covariance $C P_X C^T + P_V$.

 Theorem 2.30 (restated): A R.V. $X \sim N(m_x, P_X)$ and another R.V. $V \sim N(0, P_V)$ are independent. Find the mean and covariance of $Y = CX + V$.

%%% Kim’s comment: The characteristic function is difficult to remember. The textbook uses the characteristic-function method; in this case we may apply the basic theory instead.
Sol: Let’s apply the basic definition.

$m_y = E(Y) = E(CX + V) = E(CX) + E(V) = CE(X) + E(V) = C m_x$

$P_Y = E\big[(Y - m_y)(Y - m_y)^T\big] = E[YY^T] - m_y m_y^T = E\big[(CX + V)(CX + V)^T\big] - C m_x m_x^T C^T$

$E\big[(CX + V)(CX + V)^T\big] = E\big[CXX^TC^T + CXV^T + VX^TC^T + VV^T\big]$

$= C\,E[XX^T]\,C^T + C\,E[XV^T] + E[VX^T]\,C^T + E[VV^T]$

$= C\,E[XX^T]\,C^T + E[VV^T]$ (since X and V are independent and V has zero mean, $E[XV^T] = E[X]E[V]^T = 0$)

$= C\,(P_X + m_x m_x^T)\,C^T + P_V$

Hence

$P_Y = E\big[(Y - m_y)(Y - m_y)^T\big] = C(P_X + m_x m_x^T)C^T + P_V - C m_x m_x^T C^T = C P_X C^T + P_V$
- In general, independence implies uncorrelatedness, but not vice versa.

- For Gaussian random variables, however, the converse also holds. %%%
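A sketch verifying the result numerically: propagate the mean and covariance with the formulas above and compare with sample statistics of Y = CX + V (the particular C, m_x, P_X, P_V below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative (assumed) parameters
C = np.array([[1.0, 0.5],
              [0.0, 2.0]])
m_x = np.array([1.0, -1.0])
P_x = np.array([[2.0, 0.3],
                [0.3, 1.0]])
P_v = np.diag([0.5, 0.2])

N = 200_000
X = rng.multivariate_normal(m_x, P_x, size=N)            # X ~ N(m_x, P_x)
V = rng.multivariate_normal(np.zeros(2), P_v, size=N)    # V ~ N(0, P_v), independent of X
Y = X @ C.T + V                                          # Y = C X + V, row-wise

# Theorem 2.30: analytic mean and covariance
m_y = C @ m_x
P_y = C @ P_x @ C.T + P_v

print(Y.mean(axis=0), m_y)                 # sample mean is about C m_x
print(np.cov(Y.T), P_y)                    # sample covariance is about C P_x C^T + P_v
```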

%%% Kim’s comment: covariance matrix

Sometimes, but in most cases in this course, we deal with a random vector whose components are random variables, i.e., X = (x, y, z) is a random vector whose components (x, y, z) are random variables. The covariance of the random vector X is then defined as

$\mathrm{Cov}(X) = \begin{bmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(x,y) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(x,z) & \mathrm{cov}(y,z) & \mathrm{cov}(z,z) \end{bmatrix}$

where

$\mathrm{cov}(x, y) = E\big[(x - E(x))(y - E(y))\big],$

hence by definition $\mathrm{cov}(x, y) = \mathrm{cov}(y, x)$. Therefore, the matrix Cov(X) is a symmetric matrix, i.e.,

$\mathrm{Cov}(X) = [\mathrm{Cov}(X)]^T$

The diagonal terms of the covariance matrix are the variances of the individual random variables.

%%%

 The covariance of an uncorrelated (hence independent) Gaussian random vector is a diagonal matrix,

$P_X = \begin{bmatrix} \sigma_x^2 & 0 & 0 \\ 0 & \sigma_y^2 & 0 \\ 0 & 0 & \sigma_z^2 \end{bmatrix}$

%%% Kim’s Comment: Linear matrix theory: similar transform


For any positive semidefinite symmetric matrix M, there is a similarity transform matrix S (orthogonal, so $S^{-1} = S^T$) such that

$\mathrm{diag}(\Lambda) = S M S^{-1} = S M S^T$


Hence, for the covariance $P_X$ of any (possibly correlated) Gaussian random vector, there exists an S such that

$\mathrm{diag}(\Lambda_X) = S P_X S^T$

 For any Gaussian random vector, we can find a transformed random vector whose components are uncorrelated (and hence independent).

 Independence is important for calculating probabilities. You know the Gaussian probability table, but it applies to a scalar. So if you want to calculate a joint probability for correlated components, first find a similarity transform that produces a diagonal covariance matrix. Then you can calculate the joint probability as a product of separate scalar probabilities (a sketch follows below).

%%%
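A minimal sketch of the decorrelation idea above: diagonalize an assumed correlated covariance with a symmetric eigendecomposition (taking S as the transposed eigenvector matrix), and check that the transformed samples have a diagonal sample covariance:

```python
import numpy as np

rng = np.random.default_rng(4)

# Correlated Gaussian covariance (assumed for illustration)
P_x = np.array([[4.0, 1.5],
                [1.5, 1.0]])

# Symmetric eigendecomposition: P_x = Q diag(lam) Q^T, so S = Q^T gives S P_x S^T diagonal
lam, Q = np.linalg.eigh(P_x)
S = Q.T

print(S @ P_x @ S.T)          # about diag(lam): the similarity transform diagonalizes P_x

# Transforming samples z = S x yields uncorrelated (hence independent) Gaussian components
x = rng.multivariate_normal(np.zeros(2), P_x, size=100_000)
z = x @ S.T
print(np.cov(z.T))            # off-diagonal terms about 0, diagonal about lam
```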

 The central limit theorem

Theorem 2.31. Let $X_1, \ldots, X_n$ be i.i.d. random variables with finite mean and variance, $E[X_k] = m < \infty$, $E[(X_k - m)^2] = \sigma^2 < \infty$, and denote their sum as $Y_n := \sum_{k=1}^{n} X_k$. Then the distribution of the normalized sum

$Z_n := \dfrac{Y_n - E[Y_n]}{\sqrt{\mathrm{var}(Y_n)}} = \dfrac{Y_n - nm}{\sigma\sqrt{n}}$

is a Gaussian distribution with mean 0 and variance 1 in the limit as $n \to \infty$.

- Proof : textbook P.52

- Remarks:

1) Note the condition $E[X_k] = m < \infty$, $E[(X_k - m)^2] = \sigma^2 < \infty$: the mean and the variance are constants (the same for every trial), while the experiment is repeated many times. For example,

a) A die, fair or not: you roll the same die many times. Then the normalized sum of the outcomes, equivalently the sample mean $\frac{Y_n}{n} = \frac{1}{n}\sum_{k=1}^{n} X_k$, becomes approximately Gaussian as $n \to \infty$ (see the sketch below).

2) Some RVs have no mean (e.g., the Cauchy distribution); the theorem is then not applicable.
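A quick sketch of the theorem using the die of remark 1a): sum n rolls, normalize by nm and σ√n, and compare with the standard normal (the fair-die probabilities and the sample sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# One fair die: m = 3.5, sigma^2 = E(X^2) - m^2 = 91/6 - 12.25 = 35/12
m = 3.5
sigma = np.sqrt(91.0 / 6.0 - m**2)

n, trials = 100, 200_000
rolls = rng.integers(1, 7, size=(trials, n))     # n rolls per trial
Y_n = rolls.sum(axis=1)
Z_n = (Y_n - n * m) / (sigma * np.sqrt(n))       # normalized sum

print(Z_n.mean(), Z_n.var())                     # about 0 and about 1
print((Z_n > 1.96).mean())                       # about 0.025, matching the N(0,1) tail
```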

2.8 Conditional Expectations and Conditional Probabilities

 The conditional expectation



$E[X|Y] = \int_{-\infty}^{\infty} x\, f(x|y)\, dx, \qquad f(x|y) = \dfrac{f(x, y)}{f(y)}$
 Remarks

- $E[X]$ is a constant, i.e., it is not a random variable.

- $E[X \mid y = 2]$: if y is fixed to a constant, then $E[X \mid y = 2]$ is a constant.

- $E[X \mid Y]$: if Y is a RV, then $E[X \mid Y]$ is a random variable (a function of Y).
 Iterated expectation (See the proof at p.57 and remember)

E [ X ] =E [ E [ X|Y ] ]
%%% Kim’s comment

$E[X] = E_X[X] \longrightarrow$ requires $f_X(x)$

$E_X[X] = E_Y\big[E_X[X \mid Y]\big] \longrightarrow$ requires $f_Y(y)$ and $E_X[X \mid Y]$ (i.e., $f_{X|Y}(x|y)$),

even if we do not know $f_X(x)$.

I should say, this formula cannot be emphasized too much! This very simple fact has diverse applications: big data, machine learning, and dynamic system analysis. We should remember it.

%%%
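A small sketch of the tower property in a setting where f_X(x) is never written down: Y selects one of two groups and X is drawn with a group-dependent mean (all numbers below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Y in {0, 1} with P(Y=0) = 0.3; given Y, X is Gaussian with mean mu[Y]
p_y = np.array([0.3, 0.7])
mu = np.array([1.0, 5.0])        # E[X | Y = y]

N = 500_000
Y = rng.choice([0, 1], size=N, p=p_y)
X = rng.normal(loc=mu[Y], scale=1.0)

lhs = X.mean()                   # direct estimate of E[X]
rhs = (p_y * mu).sum()           # E_Y[ E[X | Y] ] = 0.3*1 + 0.7*5 = 3.8

print(lhs, rhs)                  # both about 3.8, without ever forming f_X(x)
```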

 Lemma 2.34.

1) $E[X \mid Y] = E[X]$ if X and Y are independent

2) $E[g(Y)X \mid Y] = g(Y)\, E[X \mid Y]$

2.9 Stochastic Process

 Def. 2.36. A stochastic process is a family of random variables, X ( ω , t), indexed by a real
parameter t ∈ T and defined on a common probability space ( Ω, A , P).

%%% Kim’s comment

A stochastic process (or random process) is a time varying random variable, i.e., for any fixed t ,
the process is a random variable.

%%%

 Ex. 2.37

$X(\omega, t) = A(\omega)\sin t, \quad A(\omega) \sim U[-1, 1]$
 Def. 2.38.
1) A stochastic process X (ω , t) is said to be continuous in probability at t if

$\lim_{s \to t} P\big(\{\omega : |X(\omega, t) - X(\omega, s)| \ge \epsilon\}\big) = 0$

for all $\epsilon > 0$.

2) Skip: A stochastic process X(ω, t) is said to be separable if there exists a countable, dense set $S \subset T$ such that for any closed set $K \subset [-\infty, \infty]$ the two sets

$A_1 = \{\omega : X(\omega, t) \in K\ \forall t \in T\}, \qquad A_2 = \{\omega : X(\omega, t) \in K\ \forall t \in S\},$

differ by a set $A_0$ such that $P(A_0) = 0$.

 Skip: Theorem 2.40. The rational numbers in T provide a separating set S.

 Def. 2.42. Let X be a random process defined on the time interval, T. Let

$t_0 < t_1 < \cdots < t_n$ be a partition of the time interval, T. If the increments, $X(t_k) - X(t_{k-1})$, are mutually independent for any partition of T, then X is said to be a process with independent increments.

 Def. 2.43 We say that a random process, X, is a Gaussian process if for every
finite collection, $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$, the corresponding density function,

$f(x_1, x_2, \ldots, x_n)$

is a Gaussian density function.

 Def. 2.44 We say that a random process X is a Gaussian process if every finite
linear combination of the form
$Y = \sum_{j=1}^{N} \alpha_j X(t_j)$

is a Gaussian random variable

 Def 2.45. A random process $\{X_t, t \in T\}$, where T is a subset of the real line, is said to be a Markov process if for any increasing collection $t_1 < t_2 < \cdots < t_n \in T$

$P(X_{t_n} \le x_n \mid X_{t_{n-1}} = x_{n-1}, \ldots, X_{t_1} = x_1) = P(X_{t_n} \le x_n \mid X_{t_{n-1}} = x_{n-1})$

or, equivalently,

$F_{X_{t_n} \mid X_{t_1}, \ldots, X_{t_{n-1}}}(x_n \mid x_1, \ldots, x_{n-1}) = F_{X_{t_n} \mid X_{t_{n-1}}}(x_n \mid x_{n-1}).$

2.10 Gauss-Markov Processes – The fundamental

1) Dynamics

$x_{k+1} = \Phi_k x_k + w_k \qquad (2.36)$

- State $x_k$; $\Phi_k$ is a known matrix; $w_k$ is a Gaussian random sequence.


2) Given Conditions

a) Noise

$E[w_k] = \bar{w}_k$

$E\big[(w_k - \bar{w}_k)(w_l - \bar{w}_l)^T\big] = W_k \delta_{kl}$

where

$\delta_{kl} = \begin{cases} 1, & k = l \\ 0, & k \ne l \end{cases}$

b) The states

$E[x_0] = \bar{x}_0$

$E\big[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T\big] = P_0$

c) The correlation

$E\big[(x_0 - \bar{x}_0)(w_k - \bar{w}_k)^T\big] = 0 \quad \forall k \qquad (2.37)$

which implies

$E\big[(x_k - \bar{x}_k)(w_j - \bar{w}_j)^T\big] = 0 \quad \forall j \ge k$

3) The mean and covariance

 The mean

$\bar{x}_{k+1} = \Phi_k \bar{x}_k + \bar{w}_k \qquad (2.38)$

 The covariance

$P_{k+1} = \Phi_k P_k \Phi_k^T + W_k$
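A sketch of these propagation equations for a scalar Gauss-Markov sequence, comparing the mean recursion (2.38) and the covariance recursion with Monte Carlo statistics (the values of Φ, W, the initial mean and P₀ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

Phi, W = 0.9, 0.04        # scalar Phi_k and W_k (held constant here for simplicity)
x0_bar, P0 = 1.0, 0.25    # initial mean and covariance
w_bar = 0.0               # zero-mean noise
steps, trials = 50, 100_000

# Monte Carlo: simulate x_{k+1} = Phi x_k + w_k for many sample paths
x = x0_bar + np.sqrt(P0) * rng.standard_normal(trials)
for _ in range(steps):
    x = Phi * x + w_bar + np.sqrt(W) * rng.standard_normal(trials)

# Analytic propagation: mean_{k+1} = Phi mean_k + w_bar,  P_{k+1} = Phi P_k Phi^T + W_k
x_bar, P = x0_bar, P0
for _ in range(steps):
    x_bar = Phi * x_bar + w_bar
    P = Phi * P * Phi + W

print(x.mean(), x_bar)    # sample mean is about the propagated mean
print(x.var(), P)         # sample variance is about the propagated covariance
```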

2.11 Non-linear Stochastic Difference Equations – skip
