2021 - Week - 3 - Ch.2 Random Process


2. Random Variables and Stochastic Processes

2.4 Probabilistic Concepts Applied to Random Variables

 Joint Probability Distribution

$F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$

 Marginal Probability

$F(x_1, x_2, \ldots, x_k) = F(x_1, \ldots, x_k, \infty, \ldots, \infty) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k)$

 Joint Probability Density


$f(x_1, x_2, \ldots, x_n) = \dfrac{\partial^n F}{\partial x_1\, \partial x_2 \cdots \partial x_n}\bigg|_{X_1 = x_1, \ldots, X_n = x_n}$

 Marginal Probability Density


$f(x_1, x_2, \ldots, x_k) = \dfrac{\partial^k F(x_1, \ldots, x_k, \infty, \ldots, \infty)}{\partial x_1\, \partial x_2 \cdots \partial x_k}\bigg|_{X_1 = x_1, \ldots, X_k = x_k}$

 The marginal probability distribution


$F(x_1, x_2, \ldots, x_k) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(s_1, s_2, \ldots, s_n)\, ds_1 \cdots ds_n$

%%% Kim’s Example

Given (X, Y) with joint PDF f(x, y), find the marginals f(x), f(y):

$f(x) = \int_{-\infty}^{\infty} f(x, s)\, ds, \qquad f(y) = \int_{-\infty}^{\infty} f(s, y)\, ds$
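As a side note, the same marginalization can be sketched numerically. The snippet below is a minimal sketch assuming a hypothetical joint density f(x, y) = x + y on the unit square (not from the notes), approximating the integrals with a midpoint sum on a grid:

```python
import numpy as np

# Hypothetical joint density on the unit square: f(x, y) = x + y (integrates to 1)
def joint_pdf(x, y):
    inside = (0.0 <= x) & (x <= 1.0) & (0.0 <= y) & (y <= 1.0)
    return np.where(inside, x + y, 0.0)

# Grid over the support (a finite stand-in for the infinite limits)
n = 1000
xs = np.linspace(0.0, 1.0, n, endpoint=False) + 0.5 / n   # midpoints
ys = xs.copy()
dx = dy = 1.0 / n
X, Y = np.meshgrid(xs, ys, indexing="ij")
F = joint_pdf(X, Y)                 # F[i, j] = f(xs[i], ys[j])

# Marginals: integrate out the other variable (midpoint rule)
f_x = F.sum(axis=1) * dy            # f(x) = integral of f(x, s) ds
f_y = F.sum(axis=0) * dx            # f(y) = integral of f(s, y) ds

# Each marginal should itself integrate to 1
print(f_x.sum() * dx, f_y.sum() * dy)   # both approximately 1.0
```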

The above is the continuous case, directly from the definition. In the discrete case, consider the following example.

Given a group of people, two measurements are taken: body temperature and blood pulse rate. Let ω denote one person, and define the events

$T_H = \{\omega : \text{high temperature}\}, \quad T_L = \{\omega : \text{low temperature}\}$

$P_H = \{\omega : \text{high pulse rate}\}, \quad P_L = \{\omega : \text{low pulse rate}\}$

The probabilities of the outcomes are

$P(T_H \cap P_H) = 0.4, \quad P(T_L \cap P_H) = 0.2$

$P(T_H \cap P_L) = 0.3, \quad P(T_L \cap P_L) = 0.1$
Then the marginal probabilities are

$P(T_H) = P(T_H \cap P_H) + P(T_H \cap P_L) = 0.4 + 0.3 = 0.7$

and

$P(T_L) = P(T_L \cap P_H) + P(T_L \cap P_L) = 0.2 + 0.1 = 0.3$
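A tiny sketch of this computation in code, storing the joint probabilities of the example in a 2×2 array (the row/column layout is my own choice):

```python
import numpy as np

# Joint probabilities from the example:
# rows = temperature (T_H, T_L), columns = pulse rate (P_H, P_L)
joint = np.array([[0.4, 0.3],    # P(T_H and P_H), P(T_H and P_L)
                  [0.2, 0.1]])   # P(T_L and P_H), P(T_L and P_L)

# Marginals: sum out the other variable
p_temp = joint.sum(axis=1)       # [P(T_H), P(T_L)] = [0.7, 0.3]
p_pulse = joint.sum(axis=0)      # [P(P_H), P(P_L)] = [0.6, 0.4]

print(p_temp, p_pulse, joint.sum())   # marginals and total probability 1.0
```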

Compare this with the continuous case. %%%

%%% Kim’s Example for continuous case


Given the joint probability as

$f(x, y) = \begin{cases} 2, & 0 < x,\ 0 < y,\ x + y < 1 \\ 0, & \text{otherwise} \end{cases}$
1) Is it a PDF (or a CDF)?

One necessary condition is that the density integrates to one:

$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dx\, dy = \int_{0}^{1}\int_{0}^{1-y} 2\, dx\, dy = \int_{0}^{1} 2(1 - y)\, dy = 1$

[Figure: the support of f(x, y) is the triangle with vertices (0, 0), (1, 0), (0, 1).]

2) Find the marginal densities f(x), f(y)

$f(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_{0}^{1-y} 2\, dx = 2(1 - y), \quad 0 < y < 1$

$f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_{0}^{1-x} 2\, dy = 2(1 - x), \quad 0 < x < 1$

3) Is (X, Y) independent? No, since on the support

$2 = f(x, y) \ne f(x)\, f(y) = 4(1 - x)(1 - y)$
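The worked example can also be checked symbolically; the following is a sketch with sympy (assuming it is available), reproducing the normalization, the two marginals, and the failed factorization test:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f_xy = sp.Integer(2)                      # joint density on the triangle 0<x, 0<y, x+y<1

# 1) Normalization over the triangular support
total = sp.integrate(sp.integrate(f_xy, (x, 0, 1 - y)), (y, 0, 1))
print(total)                              # 1

# 2) Marginals
f_y = sp.integrate(f_xy, (x, 0, 1 - y))   # 2*(1 - y) for 0 < y < 1
f_x = sp.integrate(f_xy, (y, 0, 1 - x))   # 2*(1 - x) for 0 < x < 1
print(f_x, f_y)

# 3) Independence would require f(x, y) == f(x) * f(y) on the support
print(sp.expand(f_x * f_y))               # 4(1 - x)(1 - y) != 2, so not independent
```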
 HA_2_1:

f(x, y) = 2 x + y = 1, 0 < x < 2, 0 < y < 1 (0 otherwise)
%%%

[Figure: sketch of f(x, y) for 0 < x < 2.]

1) Is this a PDF?

2) Find the marginal densities f(x) and f(y)

%%%

Def 2.16. Two random variables X and Y are called independent if any event of the form X(ω) ∈ A is independent of any event of the form Y(ω) ∈ B, where A and B are sets in $\mathbb{R}^n$.

 Fact

$P(X \in A,\ Y \in B) = P(X \in A)\, P(Y \in B)$
 The joint probability distribution

$F(x, y) = P(X \le x, Y \le y) = P(X \le x)\, P(Y \le y) = F_X(x)\, F_Y(y)$
 The joint probability density function

$f_{XY}(x, y) = \dfrac{\partial^2 F}{\partial x\, \partial y}\bigg|_{X = x,\, Y = y} = \dfrac{\partial F_X}{\partial x}\bigg|_{X = x}\, \dfrac{\partial F_Y}{\partial y}\bigg|_{Y = y} = f_X(x)\, f_Y(y)$

2.5 Functions of a Random Variable -skip

Y(ω) = g(X(ω)) has the density function

$f_Y(y) = f_X\big(g^{-1}(y)\big)\, |J(y)|$

where |J(y)| stands for the absolute value of the determinant of the matrix

$J(y) = \begin{bmatrix} \dfrac{\partial g_1^{-1}}{\partial y_1} & \cdots & \dfrac{\partial g_n^{-1}}{\partial y_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_1^{-1}}{\partial y_n} & \cdots & \dfrac{\partial g_n^{-1}}{\partial y_n} \end{bmatrix}_{Y = y}$

2.6 Expectations and Moments of a Random Variable

Def.
 The mean

$E[X] = \int_{-\infty}^{\infty} x f(x)\, dx \qquad \text{(a)}$

 The sample mean


$m_n = \dfrac{1}{n} \sum_{k=1}^{n} X_k \qquad \text{(1)}$
%% The sample mean is a random variable! It is an estimator of the mean of a random variable X. If the $X_k$ are independent, identically distributed (iid) random variables, i.e.,

$E(X_k) = m \quad \forall k,$

then the mean of the sample mean is

$E(m_n) = E\!\left(\dfrac{1}{n}\sum_{k=1}^{n} X_k\right) = \dfrac{1}{n}\sum_{k=1}^{n} E(X_k) = \dfrac{1}{n}(nm) = m \qquad \text{(b)}$

%%%Kim’s Comment

What is the difference between (a) and (b)? To use (a), one needs to know the probability density function, whereas (b) does not require it.
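A small Monte Carlo sketch of this point: the sample mean in (1) is computed from data alone, yet its expectation equals m as in (b). The exponential distribution with mean 2 below is an arbitrary illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
m_true = 2.0          # mean of the assumed distribution (exponential with mean 2)
n = 50                # samples per experiment
trials = 10_000       # number of independent sample means

# Each row is one experiment of n iid draws; each row average is one realization of m_n
samples = rng.exponential(scale=m_true, size=(trials, n))
sample_means = samples.mean(axis=1)       # 10_000 realizations of the RV m_n

print(sample_means.mean())   # about 2.0: E(m_n) = m, as in (b)
print(sample_means.var())    # about var(X)/n = 4/50, showing m_n is still random
```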

Example 2.19. X is uniformly distributed from 0 to 1, i.e.,

$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$

Then $E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \int_{0}^{1} x\, dx = \dfrac{1}{2}$.

Example 2.22. What is the expected value of one roll of one die? For a fair die, $E(X) = \frac{1}{6}(1 + 2 + \cdots + 6) = 3.5$.

Properties

1) The operator of expectation is linear

E ( aX +bY )=aE ( X )+ bE(Y )


2) The square mean / second moment

$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\, dx$

3) The higher order moment



$E(X^n) = \int_{-\infty}^{\infty} x^n f(x)\, dx$

4) The variance

$\mathrm{var}(X) = E[(X - E(X))^2] = E(X^2) - E(X)^2$

5) The standard deviation

σ X = √ var (X )

6) The sample variance


$\sigma_n^2 = \dfrac{1}{n-1} \sum_{k=1}^{n} (X_k - m_n)^2$

This is a random variable, and it is an unbiased estimator of $\sigma_X^2$.
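A quick sketch of why the 1/(n−1) factor gives an unbiased estimator, comparing it with the naive 1/n version over many repeated experiments (standard normal samples are an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 100_000
x = rng.standard_normal(size=(trials, n))     # true variance is 1

m_n = x.mean(axis=1, keepdims=True)           # sample mean of each trial
ss = ((x - m_n) ** 2).sum(axis=1)             # sum of squared deviations

var_unbiased = ss / (n - 1)                   # the estimator defined above
var_biased = ss / n                           # naive 1/n version

print(var_unbiased.mean())   # about 1.00  (unbiased)
print(var_biased.mean())     # about 0.90  (biased low by factor (n-1)/n)
```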

%%% Kim’s comment: biased and un-biased estimator

What is an estimator? Let X be a RV. We want to find a constant C that represents the RV X in some sense.

We may call C an estimator of the RV X. So there can be as many estimators as you like.

We may classify the estimator as

1) Unbiased estimator / biased estimator

If C = E(X), then C is an unbiased estimator; otherwise it is a biased estimator.

2) The minimum variance estimator / the least squares error estimator

$C = \arg\min_a E\big[(X - a)^2\big]$

3) The mean of X is the minimum variance / least squares error estimator:

$\arg\min_a E\big[(X - a)^2\big] = E(X) \qquad \text{(c)}$

Proof:

$\dfrac{d}{da} E\big[(X - a)^2\big] = \dfrac{d}{da}\left( E[X^2] + a^2 - 2aE(X) \right) = 2a - 2E(X) = 0$

$\Rightarrow a = E(X)$, which is the minimizer in (c).
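A numerical illustration of (c): estimate E[(X − a)²] on a grid of candidate values a and locate the minimizer, which lands at the mean (uniform samples on [0, 1] are an assumed choice, matching Example 2.24 below):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=20_000)       # X ~ U[0, 1], so E(X) = 0.5

a_grid = np.linspace(0.0, 1.0, 201)
# Approximate E[(X - a)^2] for every candidate a
mse = ((x[:, None] - a_grid[None, :]) ** 2).mean(axis=0)

a_star = a_grid[np.argmin(mse)]
print(a_star, x.mean())       # both about 0.5: the minimizer is the mean
print(mse.min(), x.var())     # minimum value about var(X) = 1/12
```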

Example 2.24. The uniformly distributed random variable X on [0, 1]:

$E(X^2) = \int_{0}^{1} x^2\, dx = \dfrac{1}{3}$

The variance is

$\mathrm{var}(X) = E(X^2) - E(X)^2 = \dfrac{1}{3} - \dfrac{1}{4} = \dfrac{1}{12}$
2.7 Characteristic Functions -skip

Lemma 2.27

$E[X^n] = \dfrac{1}{j^n}\, \dfrac{d^n \phi_X(\upsilon)}{d\upsilon^n}\bigg|_{\upsilon = 0}$
Prop.2.28 If X is a Gaussian random vector with mean, m, and covariance matrix P,
then its characteristic function is

$\phi_X(\upsilon) = \exp\!\left( j\upsilon^T m - \dfrac{1}{2}\upsilon^T P \upsilon \right).$
%%% Kim’s comment: correlation

Def: Two R.V. are uncorrelated if

$\mathrm{Cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))\big] = E(XY) - E(X)E(Y) = 0$

%%%

Fact: Two Gaussian Random Vectors are uncorrelated if Cov ( X , Y ) is a diagonal matrix

 Prop. 2.29. Uncorrelated Gaussian random variables are independent

Theorem 2.30. If X is a Gaussian random vector with mean $m_X$ and covariance $P_X$, and if $Y = CX + V$, where V is a Gaussian random vector with zero mean and covariance $P_V$, then Y is a Gaussian random vector with mean $C m_X$ and covariance $C P_X C^T + P_V$.

 Theorem 2.30 (restated): A R.V. $X \sim N(m_x, P_X)$ and another R.V. $V \sim N(0, P_V)$ are independent. Find the mean and covariance of $Y = CX + V$.

%%% Kim’s comment: The characteristic function is difficult to remember. The textbook uses the characteristic-function method; in this case we may apply the basic theory instead.
Sol: Let’s apply the basic definition.

$m_y = E(Y) = E(CX + V) = E(CX) + E(V) = CE(X) + E(V) = C m_x$

$P_Y = E\big[(Y - m_y)(Y - m_y)^T\big] = E[YY^T] - m_y m_y^T = E\big[(CX + V)(CX + V)^T\big] - C m_x m_x^T C^T$

$E\big[(CX + V)(CX + V)^T\big] = E\big[CXX^TC^T + CXV^T + VX^TC^T + VV^T\big]$

$= C\,E[XX^T]\,C^T + C\,E[XV^T] + E[VX^T]\,C^T + E[VV^T]$

$= C\,E[XX^T]\,C^T + E[VV^T]$ (since X and V are independent and V has zero mean, $E[XV^T] = E[X]E[V]^T = 0$)

$= C\,(P_X + m_x m_x^T)\,C^T + P_V$

Hence

$P_Y = E\big[(Y - m_y)(Y - m_y)^T\big] = C(P_X + m_x m_x^T)C^T + P_V - C m_x m_x^T C^T = C P_X C^T + P_V$
- In general, independence implies uncorrelatedness, but not vice versa.

- For Gaussian random variables, however, the converse also holds. %%%
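A sketch verifying the result numerically: propagate the mean and covariance with the formulas above and compare with sample statistics of Y = CX + V (the particular C, m_x, P_X, P_V below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative (assumed) parameters
C = np.array([[1.0, 0.5],
              [0.0, 2.0]])
m_x = np.array([1.0, -1.0])
P_x = np.array([[2.0, 0.3],
                [0.3, 1.0]])
P_v = np.diag([0.5, 0.2])

N = 200_000
X = rng.multivariate_normal(m_x, P_x, size=N)            # X ~ N(m_x, P_x)
V = rng.multivariate_normal(np.zeros(2), P_v, size=N)    # V ~ N(0, P_v), independent of X
Y = X @ C.T + V                                          # Y = C X + V, row-wise

# Theorem 2.30: analytic mean and covariance
m_y = C @ m_x
P_y = C @ P_x @ C.T + P_v

print(Y.mean(axis=0), m_y)                 # sample mean is about C m_x
print(np.cov(Y.T), P_y)                    # sample covariance is about C P_x C^T + P_v
```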

%%% Kim’s comment: covariance matrix

Sometimes, but in most cases in this course, we deal with a random vector whose components are random variables, i.e., X = (x, y, z) is a random vector whose components (x, y, z) are random variables. The covariance of the random vector X is then defined as

$\mathrm{Cov}(X) = \begin{bmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(x,y) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(x,z) & \mathrm{cov}(y,z) & \mathrm{cov}(z,z) \end{bmatrix}$

where

$\mathrm{cov}(x, y) = E\big[(x - E(x))(y - E(y))\big],$

hence by definition $\mathrm{cov}(x, y) = \mathrm{cov}(y, x)$. Therefore, the matrix Cov(X) is a symmetric matrix, i.e.,

$\mathrm{Cov}(X) = [\mathrm{Cov}(X)]^T$

The diagonal terms of the covariance matrix are the variances of the individual random variables.

%%%

 The covariance of an uncorrelated (hence independent) Gaussian random vector is a diagonal matrix,

$P_X = \begin{bmatrix} \sigma_x^2 & 0 & 0 \\ 0 & \sigma_y^2 & 0 \\ 0 & 0 & \sigma_z^2 \end{bmatrix}$

%%% Kim’s Comment: Linear matrix theory: similar transform


For any positive semidefinite symmetric matrix M, there is a similarity transform matrix S (orthogonal, so $S^{-1} = S^T$) such that

$\mathrm{diag}(\Lambda) = S M S^{-1} = S M S^T$


Hence, for the covariance $P_X$ of any (possibly correlated) Gaussian random vector, there exists an S such that

$\mathrm{diag}(\Lambda_X) = S P_X S^T$

 For any Gaussian random vector, we can find a transformed random vector whose components are uncorrelated (and hence independent).

 Independence is important for calculating probabilities. You know the Gaussian probability table, but it applies to a scalar. So if you want to calculate a joint probability for correlated components, first find a similarity transform that produces a diagonal covariance matrix. Then you can calculate the joint probability as a product of separate scalar probabilities (a sketch follows below).

%%%
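A minimal sketch of the decorrelation idea above: diagonalize an assumed correlated covariance with a symmetric eigendecomposition (taking S as the transposed eigenvector matrix), and check that the transformed samples have a diagonal sample covariance:

```python
import numpy as np

rng = np.random.default_rng(4)

# Correlated Gaussian covariance (assumed for illustration)
P_x = np.array([[4.0, 1.5],
                [1.5, 1.0]])

# Symmetric eigendecomposition: P_x = Q diag(lam) Q^T, so S = Q^T gives S P_x S^T diagonal
lam, Q = np.linalg.eigh(P_x)
S = Q.T

print(S @ P_x @ S.T)          # about diag(lam): the similarity transform diagonalizes P_x

# Transforming samples z = S x yields uncorrelated (hence independent) Gaussian components
x = rng.multivariate_normal(np.zeros(2), P_x, size=100_000)
z = x @ S.T
print(np.cov(z.T))            # off-diagonal terms about 0, diagonal about lam
```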

 The central limit theorem

Theorem 2.31. Let $X_1, \ldots, X_n$ be i.i.d. random variables with finite mean and variance, $E[X_k] = m < \infty$, $E[(X_k - m)^2] = \sigma^2 < \infty$, and denote their sum as $Y_n := \sum_{k=1}^{n} X_k$. Then the distribution of the normalized sum

$Z_n := \dfrac{Y_n - E[Y_n]}{\sqrt{\mathrm{var}(Y_n)}} = \dfrac{Y_n - nm}{\sigma\sqrt{n}}$

is a Gaussian distribution with mean 0 and variance 1 in the limit as $n \to \infty$.

- Proof : textbook P.52

- Remarks:

1) Note the condition $E[X_k] = m < \infty$, $E[(X_k - m)^2] = \sigma^2 < \infty$: the mean and the variance are constants (the same for every trial), while the experiment is repeated many times. For example,

a) A die, fair or not: you roll the same die many times. Then the normalized sum of the outcomes, equivalently the sample mean $\frac{Y_n}{n} = \frac{1}{n}\sum_{k=1}^{n} X_k$, becomes approximately Gaussian as $n \to \infty$ (see the sketch below).

2) Some RVs have no mean (e.g., the Cauchy distribution); the theorem is then not applicable.
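A quick sketch of the theorem using the die of remark 1a): sum n rolls, normalize by nm and σ√n, and compare with the standard normal (the fair-die probabilities and the sample sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# One fair die: m = 3.5, sigma^2 = E(X^2) - m^2 = 91/6 - 12.25 = 35/12
m = 3.5
sigma = np.sqrt(91.0 / 6.0 - m**2)

n, trials = 100, 200_000
rolls = rng.integers(1, 7, size=(trials, n))     # n rolls per trial
Y_n = rolls.sum(axis=1)
Z_n = (Y_n - n * m) / (sigma * np.sqrt(n))       # normalized sum

print(Z_n.mean(), Z_n.var())                     # about 0 and about 1
print((Z_n > 1.96).mean())                       # about 0.025, matching the N(0,1) tail
```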

2.8 Conditional Expectations and Conditional Probabilities

 The conditional expectation



$E[X|Y] = \int_{-\infty}^{\infty} x\, f(x|y)\, dx, \qquad f(x|y) = \dfrac{f(x, y)}{f(y)}$
 Remarks

- $E[X]$ is a constant, i.e., it is not a random variable.

- $E[X \mid y = 2]$: if y is fixed to a constant, then $E[X \mid y = 2]$ is a constant.

- $E[X \mid Y]$: if Y is a RV, then $E[X \mid Y]$ is a random variable (a function of Y).
 Iterated expectation (See the proof at p.57 and remember)

E [ X ] =E [ E [ X|Y ] ]
%%% Kim’s comment

$E[X] = E_X[X] \longrightarrow$ requires $f_X(x)$

$E_X[X] = E_Y\big[E_X[X \mid Y]\big] \longrightarrow$ requires $f_Y(y)$ and $E_X[X \mid Y]$ (i.e., $f_{X|Y}(x|y)$),

even if we do not know $f_X(x)$.

I should say, this formula cannot be emphasized too much! This very simple fact has diverse applications: big data, machine learning, and dynamic system analysis. We should remember it.

%%%
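A small sketch of the tower property in a setting where f_X(x) is never written down: Y selects one of two groups and X is drawn with a group-dependent mean (all numbers below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Y in {0, 1} with P(Y=0) = 0.3; given Y, X is Gaussian with mean mu[Y]
p_y = np.array([0.3, 0.7])
mu = np.array([1.0, 5.0])        # E[X | Y = y]

N = 500_000
Y = rng.choice([0, 1], size=N, p=p_y)
X = rng.normal(loc=mu[Y], scale=1.0)

lhs = X.mean()                   # direct estimate of E[X]
rhs = (p_y * mu).sum()           # E_Y[ E[X | Y] ] = 0.3*1 + 0.7*5 = 3.8

print(lhs, rhs)                  # both about 3.8, without ever forming f_X(x)
```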

 Lemma 2.34.

1) $E[X \mid Y] = E[X]$ if X and Y are independent

2) $E[g(Y)X \mid Y] = g(Y)\, E[X \mid Y]$

2.9 Stochastic Process

 Def. 2.36. A stochastic process is a family of random variables, X ( ω , t), indexed by a real
parameter t ∈ T and defined on a common probability space ( Ω, A , P).

%%% Kim’s comment

A stochastic process (or random process) is a time varying random variable, i.e., for any fixed t ,
the process is a random variable.

%%%

 Ex. 2.37

$X(\omega, t) = A(\omega)\sin t, \quad A(\omega) \sim U[-1, 1]$
 Def. 2.38.
1) A stochastic process X (ω , t) is said to be continuous in probability at t if

$\lim_{s \to t} P\big(\{\omega : |X(\omega, t) - X(\omega, s)| \ge \epsilon\}\big) = 0$

for all $\epsilon > 0$.

2) Skip: A stochastic process X(ω, t) is said to be separable if there exists a countable, dense set $S \subset T$ such that for any closed set $K \subset [-\infty, \infty]$ the two sets

$A_1 = \{\omega : X(\omega, t) \in K\ \forall t \in T\}, \qquad A_2 = \{\omega : X(\omega, t) \in K\ \forall t \in S\},$

differ by a set $A_0$ such that $P(A_0) = 0$.

 Skip: Theorem 2.40. The rational numbers in T provide a separating set S.

 Def. 2.42. Let X be a random process defined on the time interval, T. Let

$t_0 < t_1 < \cdots < t_n$ be a partition of the time interval, T. If the increments, $X(t_k) - X(t_{k-1})$, are mutually independent for any partition of T, then X is said to be a process with independent increments.

 Def. 2.43 We say that a random process, X, is a Gaussian process if for every
finite collection, $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$, the corresponding density function,

$f(x_1, x_2, \ldots, x_n)$

is a Gaussian density function.

 Def. 2.44 We say that a random process X is a Gaussian process if every finite
linear combination of the form
$Y = \sum_{j=1}^{N} \alpha_j X(t_j)$

is a Gaussian random variable

 Def 2.45. A random process $\{X_t, t \in T\}$, where T is a subset of the real line, is said to be a Markov process if for any increasing collection $t_1 < t_2 < \cdots < t_n \in T$

$P(X_{t_n} \le x_n \mid X_{t_{n-1}} = x_{n-1}, \ldots, X_{t_1} = x_1) = P(X_{t_n} \le x_n \mid X_{t_{n-1}} = x_{n-1})$

or, equivalently,

$F_{X_{t_n} \mid X_{t_1}, \ldots, X_{t_{n-1}}}(x_n \mid x_1, \ldots, x_{n-1}) = F_{X_{t_n} \mid X_{t_{n-1}}}(x_n \mid x_{n-1}).$

2.10 Gauss-Markov Processes – The fundamental

1) Dynamics

$x_{k+1} = \Phi_k x_k + w_k \qquad (2.36)$

- State $x_k$; $\Phi_k$ is a known matrix; $w_k$ is a Gaussian random sequence.


2) Given Conditions

a) Noise

$E[w_k] = \bar{w}_k$

$E\big[(w_k - \bar{w}_k)(w_l - \bar{w}_l)^T\big] = W_k \delta_{kl}$

where

$\delta_{kl} = \begin{cases} 1, & k = l \\ 0, & k \ne l \end{cases}$

b) The states

$E[x_0] = \bar{x}_0$

$E\big[(x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T\big] = P_0$

c) The correlation

$E\big[(x_0 - \bar{x}_0)(w_k - \bar{w}_k)^T\big] = 0 \quad \forall k \qquad (2.37)$

which implies

$E\big[(x_k - \bar{x}_k)(w_j - \bar{w}_j)^T\big] = 0 \quad \forall j \ge k$

3) The mean and covariance

 The mean

$\bar{x}_{k+1} = \Phi_k \bar{x}_k + \bar{w}_k \qquad (2.38)$

 The covariance

$P_{k+1} = \Phi_k P_k \Phi_k^T + W_k$
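A sketch of these propagation equations for a scalar Gauss-Markov sequence, comparing the mean recursion (2.38) and the covariance recursion with Monte Carlo statistics (the values of Φ, W, the initial mean and P₀ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

Phi, W = 0.9, 0.04        # scalar Phi_k and W_k (held constant here for simplicity)
x0_bar, P0 = 1.0, 0.25    # initial mean and covariance
w_bar = 0.0               # zero-mean noise
steps, trials = 50, 100_000

# Monte Carlo: simulate x_{k+1} = Phi x_k + w_k for many sample paths
x = x0_bar + np.sqrt(P0) * rng.standard_normal(trials)
for _ in range(steps):
    x = Phi * x + w_bar + np.sqrt(W) * rng.standard_normal(trials)

# Analytic propagation: mean_{k+1} = Phi mean_k + w_bar,  P_{k+1} = Phi P_k Phi^T + W_k
x_bar, P = x0_bar, P0
for _ in range(steps):
    x_bar = Phi * x_bar + w_bar
    P = Phi * P * Phi + W

print(x.mean(), x_bar)    # sample mean is about the propagated mean
print(x.var(), P)         # sample variance is about the propagated covariance
```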

2.11 Non-linear Stochastic Difference Equations – skip
