Joint Probability Distribution


JOINT PROBABILITY DISTRIBUTION:

An experiment may have two sets of random outcomes,

x1, x2, x3, ...  and  y1, y2, y3, ....

Example: in target practice, the hit position is described by two coordinates (random variables) X & Y.
Let X & Y be two discrete RVs defined on the sample space of an experiment.
The joint PMF W(x, y), defined for each pair of values (x_i, y_j) assumed by X & Y, is given by

W(x_i, y_j) = P(X = x_i, Y = y_j)
Note that, as usual, the comma means "and."
W(x, y) ≥ 0;

∑_{y_j = −∞}^{+∞} ∑_{x_i = −∞}^{+∞} P(X = x_i, Y = y_j) = 1.

The cdf is

F(x, y) = P(X ≤ x, Y ≤ y) = ∑_{y_j = −∞}^{y} ∑_{x_i = −∞}^{x} P(X = x_i, Y = y_j)
The marginal PMF of X, denoted by W(x), is given by

W(x) = ∑_y W(x, y), for each possible x.

Similarly, the marginal probability mass function of Y is

W(y) = ∑_x W(x, y), for each possible y.
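As a quick illustration of the marginal-PMF sums above, the following Python sketch computes W(x) and W(y) from a small, hypothetical joint PMF table (the numbers are made up, chosen only so that the entries sum to 1).

# Hypothetical joint PMF W(x, y) for a discrete pair (X, Y); entries sum to 1.
joint_pmf = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}

# Marginal of X: W(x) = sum over y of W(x, y).
marginal_x = {}
for (x, y), p in joint_pmf.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p

# Marginal of Y: W(y) = sum over x of W(x, y).
marginal_y = {}
for (x, y), p in joint_pmf.items():
    marginal_y[y] = marginal_y.get(y, 0.0) + p

print(marginal_x)               # {0: 0.30, 1: 0.40, 2: 0.30}, up to rounding
print(marginal_y)               # {0: 0.40, 1: 0.60}, up to rounding
print(sum(joint_pmf.values()))  # 1.0 (normalization check)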
 Say X can take on a continuum of values, −∞ < x < +∞, and Y can take on a continuum of values, −∞ < y < +∞.
Event A = {X ≤ x}; Event B = {Y ≤ y}.

P(A, B) = P(X ≤ x, Y ≤ y) = F(x, y) is the joint probability distribution function.

1. 0 ≤ F(x, y) ≤ 1; −∞ < x < +∞; −∞ < y < +∞
2. F(−∞, y) = F(x, −∞) = F(−∞, −∞) = 0
3. F(+∞, +∞) = 1
4. F(x, y) is a non-decreasing function as either x or y, or both, increase.

The joint probability density function is

f(x, y) = ∂²F(x, y) / ∂x∂y

1. f(x, y) ≥ 0; −∞ < x < +∞; −∞ < y < +∞
2. ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dx dy = 1.
   This implies that the volume beneath any joint pdf must be unity.
3. F(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(u, v) du dv
4. f(x) = ∫_{−∞}^{+∞} f(x, y) dy;   f(y) = ∫_{−∞}^{+∞} f(x, y) dx.
5. P(x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2) = ∫_{x1}^{x2} ∫_{y1}^{y2} f(x, y) dy dx
6. F(x) = ∫_{−∞}^{+∞} ∫_{−∞}^{x} f(u, v) du dv;   F(y) = ∫_{−∞}^{+∞} ∫_{−∞}^{y} f(u, v) dv du
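A numerical check of properties 2 and 4 for a simple, hypothetical joint pdf, here f(x, y) = e^(−x−y) for x, y > 0 (two independent unit-rate exponentials), might look like the following sketch using scipy.

import numpy as np
from scipy import integrate

# Hypothetical joint pdf: f(x, y) = exp(-x - y) for x > 0, y > 0 (zero elsewhere).
def f(x, y):
    return np.exp(-x - y) if (x > 0 and y > 0) else 0.0

# Property 2: the volume beneath the joint pdf should be 1.
total, _ = integrate.dblquad(lambda y, x: f(x, y), 0, np.inf, 0, np.inf)
print(total)  # ~1.0

# Property 4: marginal f(x) = integral of f(x, y) over y; here it should equal exp(-x).
x0 = 1.3
fx, _ = integrate.quad(lambda y: f(x0, y), 0, np.inf)
print(fx, np.exp(-x0))  # both ~0.2725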
Let Z = Φ(X, Y).
Suppose that if x ≤ X ≤ x + dx and y ≤ Y ≤ y + dy, then z ≤ Z ≤ z + dz.
Then P(z ≤ Z ≤ z + dz) = P(x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy), or

f(z) dz = f(x, y) dx dy

E(Z) = ∫_{−∞}^{+∞} z f(z) dz = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} Φ(x, y) f(x, y) dx dy
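A Monte Carlo estimate of E(Z) = E[Φ(X, Y)] for a hypothetical choice Φ(x, y) = max(x, y), with X and Y independent Uniform(0, 1), is sketched below; the exact value in that case is 2/3.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical example: X, Y independent Uniform(0, 1), Z = max(X, Y).
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, 1.0, n)
z = np.maximum(x, y)

# The sample mean approximates E(Z) = double integral of max(x, y) f(x, y) dx dy = 2/3.
print(z.mean())  # ~0.667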

If Z = Φ(X, Y) = X^j Y^l, where j & l may assume any positive integer values, then

E(Z) = E(X^j Y^l) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x^j y^l f(x, y) dx dy

is the joint moment.

If j = l = 1, i.e. if Z = XY,

E(XY) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x y f(x, y) dx dy

is the correlation (covariance moment) of X & Y.


E[X] = ∫_{−∞}^{+∞} x f(x) dx = ∫_{−∞}^{+∞} x [ ∫_{−∞}^{+∞} f(x, y) dy ] dx

 &

E[Y] = ∫_{−∞}^{+∞} y f(y) dy = ∫_{−∞}^{+∞} y [ ∫_{−∞}^{+∞} f(x, y) dx ] dy

X & Y are s-independent if the occurrence of any outcome X = x does not influence the occurrence of any outcome Y = y, and vice versa.
Necessary & sufficient condition:

P(X ≤ x, Y ≤ y) = P({X ≤ x} ∩ {Y ≤ y})

 If {X ≤ x} & {Y ≤ y} are s-independent,

P({X ≤ x} ∩ {Y ≤ y}) = P({X ≤ x}) · P({Y ≤ y})
P(X ≤ x, Y ≤ y) = P(X ≤ x) · P(Y ≤ y)
F(x, y) = F(x) · F(y)

∫_{−∞}^{y} ∫_{−∞}^{x} f(λ, s) dλ ds = ∫_{−∞}^{x} f(λ) dλ · ∫_{−∞}^{y} f(s) ds

Similarly,
W(x, y) = W(x) · W(y) for discrete RVs.

Therefore, the necessary & sufficient condition for s-independence is

f(x, y) = f(x) · f(y)

If X & Y are s-independent, then

E(XY) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x y f(x) f(y) dx dy
      = ∫_{−∞}^{+∞} x f(x) dx × ∫_{−∞}^{+∞} y f(y) dy = E(X) × E(Y)

Under this condition, X & Y are also said to be uncorrelated.

If E(XY) = 0, then X & Y are said to be orthogonal.
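The factorization E(XY) = E(X)·E(Y) for s-independent X & Y can be checked by simulation; the sketch below uses a hypothetical independent pair and compares the two sides.

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Hypothetical independent pair: X ~ Exponential(mean 2), Y ~ Normal(3, 1).
x = rng.exponential(scale=2.0, size=n)
y = rng.normal(loc=3.0, scale=1.0, size=n)

print(np.mean(x * y))           # ~6.0, estimate of E(XY)
print(np.mean(x) * np.mean(y))  # ~6.0, E(X)·E(Y) = 2 × 3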


Centred RVs: X − µ_X & Y − µ_Y.
The correlation of these centred RVs is known as the covariance of X & Y. Thus,

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
          = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} (x − µ_X)(y − µ_Y) f(x, y) dx dy

Cov(X, Y) = E[XY] − µ_X E[Y] − µ_Y E[X] + µ_X µ_Y = E[XY] − µ_X µ_Y

The correlation coefficient for X & Y is given by

ρ = Cov(X, Y) / (σ_X σ_Y) = (E[XY] − µ_X µ_Y) / (σ_X σ_Y)

−1 ≤ ρ ≤ +1
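The covariance and correlation-coefficient formulas can be estimated directly from samples; a minimal numpy sketch with a hypothetical correlated pair (Y built as 0.8X plus noise, so ρ ≈ 0.8) is shown below.

import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Hypothetical correlated pair: Y = 0.8 X + noise, so Cov(X, Y) = 0.8 and rho = 0.8.
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)  # Cov(X, Y) = E[XY] - mu_X mu_Y
rho = cov_xy / (np.std(x) * np.std(y))             # rho = Cov(X, Y) / (sigma_X sigma_Y)

print(cov_xy)                   # ~0.8
print(rho)                      # ~0.8
print(np.corrcoef(x, y)[0, 1])  # library estimate, should agree with rho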

 If two RVs X & Y are s-independent, they are also uncorrelated.
 However, if X & Y are uncorrelated random variables, they are not necessarily s-independent (see the counterexample sketched below).
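The classic counterexample is X ~ N(0, 1) with Y = X²: Cov(X, Y) = E[X³] = 0, so the pair is uncorrelated, yet Y is completely determined by X. A short sketch:

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)
y = x ** 2  # Y is a deterministic function of X, hence not s-independent of X

# Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0 for a zero-mean symmetric X.
print(np.mean(x * y) - np.mean(x) * np.mean(y))       # ~0, uncorrelated
# The dependence shows up elsewhere, e.g. E[X^2 Y] differs from E[X^2] E[Y]:
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))  # ~3 vs ~1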
Example:
Let the RV X denote the time until a computer server connects to your machine (in milliseconds), and let Y denote the time until the server authorizes you as a valid user (in milliseconds). Each of these random variables measures the wait from a common starting time, and X < Y.
Assume that the joint pdf for X & Y is

f(x, y) = 6 × 10⁻⁶ exp(−0.001x − 0.002y),  for 0 < x < y
Reasonable assumptions can be used to develop such a distribution, but for now our focus is only on the joint probability density function. The region with nonzero probability is shaded in Fig. 5-8.
The property that this joint pdf integrates to 1 can be verified by integrating f(x, y) over this region as follows:

∫_{0}^{+∞} ∫_{x}^{+∞} 6 × 10⁻⁶ e^(−0.001x − 0.002y) dy dx = ∫_{0}^{+∞} 6 × 10⁻⁶ e^(−0.001x) (e^(−0.002x) / 0.002) dx
= 0.003 ∫_{0}^{+∞} e^(−0.003x) dx = 0.003 / 0.003 = 1

The probability that X < 1000 & Y < 2000 is determined as the integral of f(x, y) over the darkly shaded region in Fig. 5-9.
The probability that Y exceeds 2000 milliseconds is determined as the integral of f(x, y) over the darkly shaded region in Fig. 5-10. Here the region is partitioned into two parts & different limits of integration are determined for each part. A numerical check of these integrals is sketched below.
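Since the figures are not reproduced here, the three integrals in this example can instead be checked numerically. The following sketch (scipy, with the parameter values taken from the pdf above) verifies the normalization and evaluates P(X < 1000, Y < 2000) and P(Y > 2000).

import numpy as np
from scipy import integrate

# Joint pdf from the example: f(x, y) = 6e-6 exp(-0.001x - 0.002y) for 0 < x < y.
def f(x, y):
    return 6e-6 * np.exp(-0.001 * x - 0.002 * y) if 0 < x < y else 0.0

# Normalization: y runs from x to infinity, then x from 0 to infinity.
total, _ = integrate.dblquad(lambda y, x: f(x, y), 0, np.inf, lambda x: x, np.inf)
print(total)  # ~1.0

# P(X < 1000, Y < 2000): region 0 < x < 1000, x < y < 2000.
p1, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1000, lambda x: x, 2000)
print(p1)     # ~0.915

# P(Y > 2000): region y > 2000, 0 < x < y (inner integral over x).
p2, _ = integrate.dblquad(lambda x, y: f(x, y), 2000, np.inf, 0, lambda y: y)
print(p2)     # ~0.050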
Jointly Gaussian Random Variables:
As with single RVs, the most common & important example of a
two-dimensional probability distribution is that of a joint Gaussian
distribution.
DEFINITION: A pair of RVs X & Y are said to be jointly Gaussian if their joint pdf is of the general form

f(x, y) = [1 / (2π σ_X σ_Y √(1 − ρ²_XY))] exp{ −[1 / (2(1 − ρ²_XY))] [ (x − μ_X)²/σ_X² − 2ρ_XY (x − μ_X)(y − μ_Y)/(σ_X σ_Y) + (y − μ_Y)²/σ_Y² ] }

where μ_X & μ_Y are the expectations of X & Y, respectively; σ_X and σ_Y are the standard deviations of X & Y, respectively; and ρ_XY is the correlation coefficient of X & Y.
It is left as an exercise for the reader to verify that this joint pdf results in marginal pdfs that are Gaussian. That is,

f(x) = [1 / (√(2π) σ_X)] exp[ −(x − μ_X)² / (2σ_X²) ]   &   f(y) = [1 / (√(2π) σ_Y)] exp[ −(y − μ_Y)² / (2σ_Y²) ]
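The sketch below evaluates the jointly Gaussian pdf written above for hypothetical parameter values and confirms numerically that integrating out y leaves the Gaussian marginal of X.

import numpy as np
from scipy import integrate
from scipy.stats import norm

# Hypothetical parameters for the jointly Gaussian pair.
mu_x, mu_y, sig_x, sig_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7

def joint_gauss(x, y):
    # Bivariate Gaussian pdf with the parameters above.
    zx = (x - mu_x) / sig_x
    zy = (y - mu_y) / sig_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * sig_x * sig_y * np.sqrt(1 - rho**2))

# Integrating out y at a fixed x should reproduce the Gaussian marginal f(x).
x0 = 0.4
fx, _ = integrate.quad(lambda y: joint_gauss(x0, y), -np.inf, np.inf)
print(fx, norm.pdf(x0, loc=mu_x, scale=sig_x))  # both ~0.1907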

The concepts of PMF, CDF, and PDF are easily extended to an arbitrary number of RVs. Their definitions follow.

 For a set of N RVs X1, X2, ..., XN, the Nth-order joint PMF, CDF, and PDF are given, respectively, by

W(x_{k1}, x_{k2}, x_{k3}, ..., x_{kN}) = P(X1 = x_{k1}, X2 = x_{k2}, X3 = x_{k3}, ..., XN = x_{kN})

F(x1, x2, x3, ..., xN) = P(X1 ≤ x1, X2 ≤ x2, X3 ≤ x3, ..., XN ≤ xN)

and

f(x1, x2, x3, ..., xN) = ∂^N F(x1, x2, x3, ..., xN) / (∂x1 ∂x2 ∂x3 ... ∂xN)

Marginal CDFs can be found for a subset of the variables by evaluating the joint CDF at infinity for the unwanted variables. For example,

F(x1, x2, x3, ..., xM) = F(x1, x2, x3, ..., xM, ∞, ∞, ..., ∞)

Marginal PDFs are found from the joint PDF by integrating out the unwanted variables. Similarly, marginal PMFs are obtained from the joint PMF by summing out the unwanted variables:

f(x1, x2, x3, ..., xM) = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} f(x1, x2, x3, ..., xN) dx_{M+1} dx_{M+2} ... dx_N

W(x_{k1}, x_{k2}, x_{k3}, ..., x_{kM}) = ∑_{k_{M+1}} ∑_{k_{M+2}} ... ∑_{k_N} W(x_{k1}, x_{k2}, x_{k3}, ..., x_{kN})

For s-independence of three RVs X1, X2, and X3 we should have

f(x1, x2, x3) = f(x1) f(x2) f(x3)

implying

f(x1, x2) = f(x1) f(x2);   f(x2, x3) = f(x2) f(x3);   & f(x1, x3) = f(x1) f(x3)

For X1, X2, ..., XN to be s-independent, we should have

f(x1, x2, x3, ..., xN) = ∏_{i=1}^{N} f(xi)

Conditional CDF, PDF and PMF:

For a discrete RV,

P(X = x_K | Y = y_K) = P(X = x_K, Y = y_K) / P(Y = y_K)

i.e.   W(x_K | y_K) = W(x_K, y_K) / W(y_K)

We refer to this as the conditional PMF of X given Y.
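Continuing the discrete case, a conditional PMF can be formed from a joint PMF table by dividing by the marginal of the conditioning variable. A small sketch with hypothetical numbers:

# Hypothetical joint PMF W(x, y); entries sum to 1.
joint_pmf = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}

def conditional_pmf_of_x(y_given):
    # W(x | y) = W(x, y) / W(y) for the fixed value y_given.
    w_y = sum(p for (x, y), p in joint_pmf.items() if y == y_given)  # marginal W(y)
    return {x: p / w_y for (x, y), p in joint_pmf.items() if y == y_given}

print(conditional_pmf_of_x(0))                # {0: 0.25, 1: 0.625, 2: 0.125}
print(sum(conditional_pmf_of_x(0).values()))  # 1.0, a valid PMF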

The conditional cdf of a continuous RV X, given that an event M (related to Y) has occurred, is

F(x | M) = P(X ≤ x | M) = P(X ≤ x, M) / P(M)

There are several different ways in which the given event M can be defined in terms of Y. For example, M might be the event Y ≤ y and, hence, P(M) would be just the marginal distribution function of Y, that is, F(y). From the basic definition of the conditional distribution function, it would follow that

F(x | Y ≤ y) = P(X ≤ x | Y ≤ y) = F(x, y) / F(y)

Another possible definition of M is that it is the event y1 < Y < y2. The definition of conditional distribution now leads to

F(x | y1 < Y < y2) = P(X ≤ x | y1 < Y < y2) = [F(x, y2) − F(x, y1)] / [F(y2) − F(y1)]

In both of the above situations, P(M) > 0, i.e. P(M) is non-zero.

However, the most common form of conditional probability is one in which M is the event that Y = y; in almost all these cases P(M) = 0, since Y is continuously distributed. Since the conditional distribution function is defined as a ratio, it usually still exists even in these cases.
It can be obtained by letting y1 = y and y2 = y + Δy and by taking the limit as Δy → 0. Thus,

F(x | y) = F(x | Y = y) = P(X ≤ x | Y = y) = lim_{Δy → 0} [F(x, y + Δy) − F(x, y)] / [F(y + Δy) − F(y)]
= [∂F(x, y)/∂y] / [dF(y)/dy]

∴ F(x | y) = [ ∫_{−∞}^{x} f(u, y) du ] / f(y)

Hence,

f(x | y) = f(x | Y = y) = ∂F(x | y)/∂x = [∂²F(x, y)/∂x∂y] / f(y) = f(x, y) / f(y)

Similarly,

f(y | x) = f(x, y) / f(x), and consequently we have Bayes' theorem:

f(y | x) = f(x | y) f(y) / f(x)   &   f(x | y) = f(y | x) f(x) / f(y)

We also have

f(x) = ∫_{−∞}^{+∞} f(x, y) dy = ∫_{−∞}^{+∞} f(x | y) f(y) dy

 &

f(y) = ∫_{−∞}^{+∞} f(x, y) dx = ∫_{−∞}^{+∞} f(y | x) f(x) dx

The conditional expectation of a RV X is defined as

E[X | Y] ≡ E[X | Y = y] = ∫_{−∞}^{+∞} x f(x | y) dx

Similarly,

E[Y | X] ≡ E[Y | X = x] = ∫_{−∞}^{+∞} y f(y | x) dy
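As a continuous illustration, E[X | Y = y] can be evaluated numerically from f(x | y) = f(x, y)/f(y). The sketch below reuses the server-connection pdf from the earlier example, f(x, y) = 6 × 10⁻⁶ e^(−0.001x − 0.002y) for 0 < x < y, with a hypothetical conditioning value y = 2000.

import numpy as np
from scipy import integrate

# Joint pdf from the earlier example: nonzero only for 0 < x < y.
def f_joint(x, y):
    return 6e-6 * np.exp(-0.001 * x - 0.002 * y) if 0 < x < y else 0.0

y0 = 2000.0  # hypothetical observed value of Y

# Marginal f(y0): integrate f(x, y0) over x (which runs from 0 to y0).
f_y0, _ = integrate.quad(lambda x: f_joint(x, y0), 0, y0)

# E[X | Y = y0] = integral of x f(x | y0) dx, with f(x | y0) = f(x, y0) / f(y0).
num, _ = integrate.quad(lambda x: x * f_joint(x, y0), 0, y0)
print(num / f_y0)  # conditional mean of X given Y = 2000, ~687 ms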

One of the most common (and probably the simplest) uses of conditional density functions is that in which some observed quantity is the sum of two quantities, one of which is usually considered to be a signal while the other is considered to be a noise. Suppose, for example, that a signal X(t) is perturbed by additive noise N(t), and that the sum of these two, Y(t), is the only quantity that can be observed. Hence, at some time instant, there are three random variables related by

Y = X + N

and it is desired to find the conditional probability density function of X given the observed value of Y, that is, f(x | y).
The reason for being interested in this is that the most probable value of X, given the observed value Y, may be a reasonable guess, or estimate, of the true value of X when X can be observed only in the presence of noise. From Bayes' theorem this conditional probability is

f(x | y) = f(y | x) f(x) / f(y)

But if X is given, as implied by f(y | x), then the only randomness about Y is the noise N, and it is assumed that its density function f_N(n) is known. Thus, since N = Y − X, and X is given,

f(y | x) = f_N(n = y − x) = f_N(y − x)
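A sketch of this signal-plus-noise idea with hypothetical Gaussian choices for both the signal density f(x) and the noise density f_N(n): the posterior f(x | y) ∝ f_N(y − x) f(x) is evaluated on a grid and its peak is taken as the estimate of X.

import numpy as np
from scipy.stats import norm

# Hypothetical densities: signal X ~ N(0, 2^2), noise N ~ N(0, 1), Y = X + N.
sig_x, sig_n = 2.0, 1.0
y_obs = 3.0  # hypothetical observed value of Y

xs = np.linspace(-10.0, 10.0, 4001)  # grid over possible signal values
posterior = norm.pdf(y_obs - xs, scale=sig_n) * norm.pdf(xs, scale=sig_x)  # f_N(y - x) f(x)
posterior /= posterior.sum() * (xs[1] - xs[0])  # normalize, i.e. divide by f(y)

x_map = xs[np.argmax(posterior)]  # most probable X given Y = y_obs
print(x_map)  # ~2.4; for Gaussians this equals y_obs * sig_x^2 / (sig_x^2 + sig_n^2)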

 Another consequence of statistical independence is that conditional probability density functions become marginal density functions. That is,

f(x | y) = f(x, y) / f(y) = f(x) f(y) / f(y) = f(x)

 &

f(y | x) = f(x, y) / f(x) = f(x) f(y) / f(x) = f(y)

For a set of N RVs X1, X2, ..., XN, the conditional PMF and PDF are given by

P(X1 = x_{K1}, X2 = x_{K2}, ..., XM = x_{KM} | X_{M+1} = x_{K_{M+1}}, X_{M+2} = x_{K_{M+2}}, ..., XN = x_{KN})
= P(X1 = x_{K1}, X2 = x_{K2}, ..., XN = x_{KN}) / P(X_{M+1} = x_{K_{M+1}}, X_{M+2} = x_{K_{M+2}}, ..., XN = x_{KN})

i.e.   W(x_{K1}, x_{K2}, ..., x_{KM} | x_{K_{M+1}}, x_{K_{M+2}}, ..., x_{KN}) = W(x_{K1}, x_{K2}, ..., x_{KN}) / W(x_{K_{M+1}}, x_{K_{M+2}}, ..., x_{KN})

and

f(x1, x2, ..., xM | x_{M+1}, x_{M+2}, ..., xN) = f(x1, x2, ..., xN) / f(x_{M+1}, x_{M+2}, ..., xN)
