Expected Values
Alberto Suárez
E[g(X)] = \int dx \; \mathrm{pdf}(x)\, g(x)
The quantity of interest is E[g(X)]. For a discrete state space S = \{s_1, \ldots, s_{|S|}\}, represent the state at time t by indicator variables:

X(t) = \{X_i(t)\}_{i=1}^{|S|}; \qquad X_i(t) = \begin{cases} 1 & \text{if } X(t) = s_i, \; s_i \in S \\ 0 & \text{otherwise} \end{cases}
Markov property: P(X(t+1) \mid X(t), X(t-1), \ldots, X(0)) = P(X(t+1) \mid X(t))
E[g(X)] = \sum_{i=1}^{|S|} P(X(t) = s_i)\, g(s_i) \;\approx\; \frac{1}{T - T_{\mathrm{transient}}} \sum_{t = T_{\mathrm{transient}}+1}^{T} g(X(t))

The distribution of the chain at time t is P(t): \; P_i(t) = P(X(t) = s_i); \qquad \sum_{i=1}^{|S|} P_i(t) = 1
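As a concrete illustration of this estimator (not from the original slides), here is a minimal Python sketch: it simulates a small Markov chain with an assumed transition matrix W and function g, discards a transient, and compares the time average with the exact expectation under the stationary distribution. All numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state chain; W[j, i] = P(X(t+1) = s_j | X(t) = s_i), columns sum to 1.
W = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.4]])
g = np.array([1.0, 2.0, 5.0])            # assumed values g(s_i)
T, T_transient = 20000, 1000             # assumed chain length and burn-in

x = 0                                    # arbitrary initial state
values = []
for t in range(1, T + 1):
    x = rng.choice(3, p=W[:, x])         # one Markov transition
    if t > T_transient:
        values.append(g[x])

print("time-average estimate of E[g(X)]:", np.mean(values))

# Exact value: stationary distribution = eigenvector of W with eigenvalue 1.
vals, vecs = np.linalg.eig(W)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()
print("exact E[g(X)] =", pi @ g)
```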
Irreducible: \forall\, s_i, s_j \in S \;\; \exists\, n \geq 1 \;/\; [W^n]_{ji} > 0

Aperiodic: \forall\, s_i \in S: \; D_i \equiv \{\, n \geq 1 \;/\; [W^n]_{ii} > 0 \,\}; \quad \mathrm{g.c.d.}(D_i) = 1

If \exists\, m \geq 1 \;/\; [W^m]_{ij} > 0 and \exists\, n \geq 1 \;/\; [W^n]_{ji} > 0, then

[W^{m+n}]_{ii} = \sum_k [W^m]_{ik}\,[W^n]_{ki} \;\geq\; [W^m]_{ij}\,[W^n]_{ji} > 0

[W^{m+n+1}]_{ii} = \sum_{k,l} [W^m]_{ik}\, W_{kl}\, [W^n]_{li} \;\geq\; [W^m]_{ij}\, W_{jj}\, [W^n]_{ji} > 0

so that (provided W_{jj} > 0) both m+n and m+n+1 belong to D_i, and \mathrm{g.c.d.}(D_i) = 1.
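The two conditions can be checked numerically for a small chain. The sketch below (an illustrative aid, not part of the slides) computes powers of an assumed column-stochastic W, tests whether every entry becomes positive for some power (irreducibility), and computes the g.c.d. of the return times of each state (aperiodicity).

```python
import numpy as np
from math import gcd
from functools import reduce

def is_irreducible_and_aperiodic(W, n_max=None):
    """Check the two conditions numerically for a column-stochastic W.

    Irreducible: for every pair (i, j) some power n has [W^n]_{ji} > 0.
    Aperiodic:   for every i, g.c.d.{ n : [W^n]_{ii} > 0 } = 1.
    """
    S = W.shape[0]
    n_max = n_max or 2 * S * S          # enough powers for a small finite chain
    powers = [np.linalg.matrix_power(W, n) for n in range(1, n_max + 1)]

    reachable = np.zeros_like(W, dtype=bool)
    for Wn in powers:
        reachable |= Wn > 1e-12
    irreducible = reachable.all()

    aperiodic = True
    for i in range(S):
        D_i = [n + 1 for n, Wn in enumerate(powers) if Wn[i, i] > 1e-12]
        aperiodic &= bool(D_i) and reduce(gcd, D_i) == 1
    return irreducible, aperiodic

# Illustrative example: a two-state chain of period 2 is irreducible but not aperiodic.
W_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])
print(is_irreducible_and_aperiodic(W_periodic))   # (True, False)
```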
W\pi = \pi: the stationary distribution \pi is an eigenvector of W, with eigenvalue 1.
An arbitrary initial distribution converges to the stationary distribution
provided that all other eigenvalues of W are smaller than 1 in absolute value.
W\pi = \pi; \qquad W v^{(n)} = \lambda_n v^{(n)}, \quad n = 2, 3, \ldots, |S|; \qquad 1 > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_{|S|}|;

P(t) = W^t P(0)
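A short numerical check of these statements, with an assumed 3-state transition matrix (an illustrative sketch, not part of the slides): the stationary distribution is extracted as the eigenvector with eigenvalue 1, and iterating P(t) = W P(t-1) from an arbitrary P(0) approaches it at a rate governed by the second-largest eigenvalue.

```python
import numpy as np

# Illustrative column-stochastic transition matrix (W[j, i] = P(j | i)).
W = np.array([[0.7, 0.1, 0.2],
              [0.2, 0.8, 0.3],
              [0.1, 0.1, 0.5]])

# Stationary distribution: eigenvector of W with eigenvalue 1, normalized to sum to 1.
vals, vecs = np.linalg.eig(W)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

# Convergence of an arbitrary initial distribution: P(t) = W^t P(0).
P = np.array([1.0, 0.0, 0.0])
for t in range(50):
    P = W @ P

print("pi      =", np.round(pi, 4))
print("W^50 P0 =", np.round(P, 4))
print("second-largest |eigenvalue| =", np.sort(np.abs(vals))[-2])
```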
Detailed balance
Theorem (Feller, 1950). Let W be the transition matrix of a finite Markov chain which is irreducible and aperiodic. The equation W\pi = \pi (global balance) has a unique solution \pi, which is the stationary distribution.
A sufficient condition for convergence to the stationary distribution
for a homogeneous Markov Chain is that its transition matrix W
satisfies local detailed balance (reversibility)
W_{ji}\,\pi_i = W_{ij}\,\pi_j, \qquad i, j = 1, \ldots, |S|

Summing over i shows that detailed balance implies global balance:

W_{ji}\,\pi_i = W_{ij}\,\pi_j \;\Rightarrow\; \sum_i W_{ji}\,\pi_i = \pi_j \sum_i W_{ij} = \pi_j

Expanding the initial distribution in the eigenvectors of W,

P(0) = \pi + \sum_{n=2}^{|S|} \alpha_n(0)\, v^{(n)}; \qquad P(t) = W^t P(0) = \pi + \sum_{n=2}^{|S|} \alpha_n(0)\, \lambda_n^t\, v^{(n)}; \qquad \lim_{t \to \infty} P(t) = \pi
Metropolis--Hastings algorithm
\alpha(X_t, Y) = \min\left\{ 1, \; \frac{\pi(Y)\, q(X_t \mid Y)}{\pi(X_t)\, q(Y \mid X_t)} \right\}
Metropolis--Hastings algorithm (convergence)
Pseudocode:
  Initialize X_0; t := 0;
  Repeat
    Generate a proposal Y ~ q(. | X_t) and u ~ U(0, 1);
    If u < \alpha(X_t, Y) Then X_{t+1} := Y Else X_{t+1} := X_t;
    t := t + 1;
  Until t = T;
End

Transition kernel (to be used in the Chapman-Kolmogorov eqn.):

P(X_{t+1} \mid X_t) = q(X_{t+1} \mid X_t)\, \alpha(X_t, X_{t+1}) + \delta(X_{t+1} - X_t) \left[ 1 - \int dy\; q(y \mid X_t)\, \alpha(X_t, y) \right]
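A minimal runnable version of the pseudocode above, in Python. The target density, the autoregressive proposal q(y|x) = N(y; a x, 1), and the chain length are illustrative assumptions, not part of the slides; the acceptance step uses the full ratio with both \pi and q.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.9                                   # autoregressive proposal coefficient (assumed)

def log_pi(x):
    # Illustrative unnormalized target: standard normal.
    return -0.5 * x**2

def log_q(y, x):
    # Assumed asymmetric proposal: Y ~ N(a * x, 1), so q(y|x) != q(x|y).
    return -0.5 * (y - a * x)**2

def metropolis_hastings(T=20000, x0=0.0):
    x, chain, accepted = x0, np.empty(T), 0
    for t in range(T):
        y = a * x + rng.normal()                     # draw proposal Y ~ q(.|X_t)
        # log alpha = log[ pi(Y) q(X_t|Y) / ( pi(X_t) q(Y|X_t) ) ]
        log_alpha = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
        if np.log(rng.uniform()) < log_alpha:        # accept with probability alpha
            x, accepted = y, accepted + 1
        chain[t] = x                                 # on rejection X_{t+1} = X_t
    return chain, accepted / T

chain, rate = metropolis_hastings()
print("acceptance rate:", round(rate, 2))
print("sample mean, variance:", chain[2000:].mean(), chain[2000:].var())
```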
Metropolis (symmetric proposal): \quad q(y \mid x) = q(x \mid y) \;\Rightarrow\; \alpha(x, y) = \min\left\{ 1, \; \frac{\pi(y)}{\pi(x)} \right\}
Random-walk Metropolis: \quad q(y \mid x) = q(x \mid y) = q(|x - y|)
If steps |Y-Xt| generated by q(|Y-Xt|) are either too small or too
large, the chain may have poor mixing properties.
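The following illustrative experiment (not from the slides) makes the step-size trade-off concrete: random-walk Metropolis on a standard normal target with three assumed proposal widths, reporting the acceptance rate and the lag-1 autocorrelation as a crude mixing diagnostic.

```python
import numpy as np

rng = np.random.default_rng(0)

def rw_metropolis(step, T=20000):
    """Random-walk Metropolis on a standard normal target."""
    x, chain, accepted = 0.0, np.empty(T), 0
    for t in range(T):
        y = x + step * rng.normal()
        if np.log(rng.uniform()) < 0.5 * (x**2 - y**2):   # min{1, pi(y)/pi(x)}
            x, accepted = y, accepted + 1
        chain[t] = x
    return chain, accepted / T

for step in (0.05, 2.5, 50.0):            # too small / reasonable / too large
    chain, rate = rw_metropolis(step)
    c = chain[1000:]                      # discard burn-in
    rho1 = np.corrcoef(c[:-1], c[1:])[0, 1]
    print(f"step={step:5.2f}  acceptance={rate:.2f}  lag-1 autocorr={rho1:.2f}")
```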
Independence Sampler:
q(y \mid x) = q(y); \qquad \alpha(x, y) = \min\left\{ 1, \; \frac{w(y)}{w(x)} \right\}; \qquad w(x) = \frac{\pi(x)}{q(x)}
A convenient choice is a proposal q(y) centered at x_0 = mode of [\pi(x)], with a spread matched to the curvature -\partial^2 \log \pi(x) / \partial x\, \partial x^T at the mode.
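A hedged Python sketch of an independence sampler, with an assumed Gaussian target and a Gaussian proposal centered near the target's mode and made somewhat wider than the target; the acceptance ratio is computed through the importance weights w(x) = \pi(x)/q(x).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pi(x):
    # Illustrative unnormalized target: normal centered at 3.
    return -0.5 * (x - 3.0)**2

def log_q(x, mu=3.0, s=1.5):
    # Independence proposal q(y): Gaussian centered at the mode of pi,
    # slightly wider than the target (heavier-than-target tails help).
    return -0.5 * ((x - mu) / s)**2 - np.log(s)

def independence_sampler(T=20000, x0=0.0):
    log_w = lambda z: log_pi(z) - log_q(z)          # log importance weight w(x)
    x, chain = x0, np.empty(T)
    for t in range(T):
        y = 3.0 + 1.5 * rng.normal()                # Y ~ q(.), independent of X_t
        if np.log(rng.uniform()) < log_w(y) - log_w(x):   # min{1, w(y)/w(x)}
            x = y
        chain[t] = x
    return chain

chain = independence_sampler()
print("sample mean ~ 3:", chain[1000:].mean())
```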
Metropolis--Hastings algorithm (variants)
Variations:
Update blocks of components, or single components in turn (see the sketch below).
Random updating order: if one component is modified, then update with larger probability the components that are highly correlated with it.
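As an illustration of component-wise updating (the target, step size, and block size of one are assumptions, not from the slides), the sketch below applies a random-walk Metropolis update to one coordinate at a time of a correlated 2-D Gaussian target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlated 2-D Gaussian target (unnormalized log-density).
prec = np.linalg.inv(np.array([[1.0, 0.8],
                               [0.8, 1.0]]))

def log_pi(x):
    return -0.5 * x @ prec @ x

def componentwise_metropolis(T=20000, step=1.0):
    x = np.zeros(2)
    chain = np.empty((T, 2))
    for t in range(T):
        for i in range(2):                      # update one component at a time
            y = x.copy()
            y[i] += step * rng.normal()
            if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
                x = y
        chain[t] = x
    return chain

chain = componentwise_metropolis()
print("empirical covariance:\n", np.cov(chain[2000:].T).round(2))
```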
Metropolis--Hastings algorithm (implementation)
Length of burn-in
Stopping time:
Estimate the variance of the expected value that is being calculated
Variance estimates are easiest if multiple chains are run.
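One simple way to obtain such a variance estimate from multiple chains (an illustrative choice; the slides do not prescribe a particular estimator) is to compare the per-chain means of the quantity being averaged:

```python
import numpy as np

def run_chain(seed, T=5000, burn_in=500):
    """Random-walk Metropolis chain on a standard normal target (illustrative)."""
    r = np.random.default_rng(seed)
    x, out = 0.0, np.empty(T)
    for t in range(T):
        y = x + r.normal()
        if np.log(r.uniform()) < 0.5 * (x**2 - y**2):
            x = y
        out[t] = x
    return out[burn_in:]

chains = [run_chain(seed) for seed in range(8)]          # 8 independent chains
means = np.array([c.mean() for c in chains])

# The spread of the per-chain means estimates the Monte Carlo error of the
# expected value being computed (here E[X] = 0 for the illustrative target).
print("estimate of E[X]      :", means.mean())
print("std. error (8 chains) :", means.std(ddof=1) / np.sqrt(len(chains)))
```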
Simulated annealing
Global optimization in many dimensions.
Physical annealing: Minimize free energy
Heat up a solid until it melts.
Cool down slowly until a crystal is formed.
Goal: sample from the uniform distribution over the set S^* of global minima of the cost function,

\pi(s_i) = \frac{1}{|S^*|} \sum_{l} \delta(s_i, s_l^*)
Pseudocode:
Begin
  Initialize(i, T_0, L_0); k := 0;
  Repeat
    For l := 1 to L_k Do
      Generate j, a neighbor of i, and u ~ U(0, 1);
      If exp( (E_i - E_j) / T_k ) > u Then i := j;
    End;
    k := k + 1;
    calculateLength(Lk); calculateTemperature(Tk);
  Until stopCriterion;
End;
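A runnable Python version of the pseudocode above for an illustrative one-dimensional discrete problem; the energy function, neighborhood, initial temperature, chain lengths, and geometric cooling factor are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discrete problem: minimize a rugged 1-D energy over 100 integer states.
states = np.arange(100)
E = 0.05 * (states - 70)**2 + 5.0 * np.sin(states / 3.0)

def neighbors(i):
    # Assumed neighborhood: the adjacent states (clipped at the ends).
    return [max(i - 1, 0), min(i + 1, len(states) - 1)]

def simulated_annealing(T0=10.0, alpha=0.95, L=200, n_stages=200):
    i = int(rng.integers(len(states)))
    T = T0
    for k in range(n_stages):
        for _ in range(L):                          # L_k trials at temperature T_k
            j = int(rng.choice(neighbors(i)))       # generate a neighboring state
            # Accept downhill moves always, uphill moves with Boltzmann probability.
            if E[j] <= E[i] or np.exp((E[i] - E[j]) / T) > rng.uniform():
                i = j
        T *= alpha                                  # geometric cooling schedule
    return i

best = simulated_annealing()
print("found state:", best, " energy:", round(E[best], 3),
      " global minimum at:", int(np.argmin(E)))
```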
Transition matrix at temperature T:

W_{ji}(T) = \begin{cases} G_{ji}\, A_{ji}(T), & j \neq i \\ 1 - \sum_{l \neq i} G_{li}\, A_{li}(T), & j = i \end{cases}

Generation (proposal) probabilities, uniform over neighborhoods:

G_{ji} = \frac{1}{\Theta_i}\, \chi_{ji}; \qquad \Theta_i = \sum_j \chi_{ji}; \qquad \chi_{ji} = \begin{cases} 1 & \text{if } s_j \text{ is in the neighborhood of } s_i \\ 0 & \text{otherwise.} \end{cases}

Acceptance probabilities and Boltzmann distribution:

A_{ji}(T) = \exp\left( -\frac{(E_j - E_i)^+}{T} \right); \qquad q_i(T) \propto \exp\left( -\frac{E_i}{T} \right)

Detailed balance at fixed T:

A_{ji}(T)\, q_i(T) = A_{ij}(T)\, q_j(T) \;\Rightarrow\; W_{ji}(T)\, q_i(T) = W_{ij}(T)\, q_j(T)
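These relations can be verified numerically. The sketch below (illustrative, not from the slides) builds W(T) for a small ring of states, where every state has the same number of neighbors so that G_{ji} = G_{ij}, and checks detailed balance against the Boltzmann distribution q(T) as well as stationarity.

```python
import numpy as np

# Illustrative: 5 states on a ring (each state has exactly 2 neighbors, so G is symmetric).
E = np.array([0.0, 1.5, 0.7, 2.0, 0.3])
n = len(E)
chi = np.zeros((n, n))
for i in range(n):
    chi[(i - 1) % n, i] = chi[(i + 1) % n, i] = 1.0   # ring neighborhood
G = chi / chi.sum(axis=0)                              # G_{ji} = chi_{ji} / Theta_i

def W_of_T(T):
    A = np.exp(-np.maximum(E[:, None] - E[None, :], 0.0) / T)   # A_{ji}(T)
    W = G * A
    np.fill_diagonal(W, 0.0)
    np.fill_diagonal(W, 1.0 - W.sum(axis=0))                    # W_{ii} from column sums
    return W

T = 0.8
W = W_of_T(T)
q = np.exp(-E / T); q /= q.sum()                                # Boltzmann distribution
# Detailed balance: W_{ji} q_i == W_{ij} q_j for all i, j.
M = W * q[None, :]
print("detailed balance holds:", np.allclose(M, M.T))
print("q is stationary       :", np.allclose(W @ q, q))
```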
With a logarithmic annealing schedule

T_k \geq \frac{\Gamma\,(L + 1)}{\log(k + 2)}

(\Gamma a problem-dependent constant), the distribution of the chain converges to the uniform distribution over the set S^* of global minima:

\pi(s_i) = \frac{1}{|S^*|} \sum_{l} \delta(s_i, s_l^*)
Irreducibility: consider s_i, s_j \in S. There exist p \geq 1 and a path s_{l_0}, s_{l_1}, \ldots, s_{l_p} \in S with l_0 = i, l_p = j, in which consecutive states are neighbors. Then

[W^p(T)]_{ji} = \sum_{k_1, k_2, \ldots, k_{p-1}} W_{j k_{p-1}}(T)\, W_{k_{p-1} k_{p-2}}(T) \cdots W_{k_2 k_1}(T)\, W_{k_1 i}(T) \;\geq\; W_{j l_{p-1}}(T) \cdots W_{l_1 i}(T) > 0

Aperiodicity: let s_j be a neighbor of s_i with E_j > E_i (assuming s_i is not a maximum of E over its neighborhood), so that A_{ji}(T) < 1 while A_{li}(T) \leq 1 for l \neq j. Then

W_{ii}(T) = 1 - \sum_{k \neq i} G_{ki}\, A_{ki}(T) = 1 - \sum_{k \neq i, j} G_{ki}\, A_{ki}(T) - G_{ji}\, A_{ji}(T) > 1 - \sum_{k \neq i, j} G_{ki} - G_{ji} = 1 - \sum_{k \neq i} G_{ki} = G_{ii} \geq 0

so W_{ii}(T) > 0.
Annealing schedules
Kirkpatrick, Gelatt & Vecchi (1982,1983)
Choose T0 large enough so that most transitions are accepted.
Start with a small value of T0
k := 0; choose a factor c > 1;
Repeat
  T_0 := c \cdot T_0;
Until the acceptance ratio is sufficiently close to 1.
Cooling: T_{k+1} = \alpha\, T_k, \quad \alpha \in [0.8, 0.99].
L_k is sufficiently large so that for each value of T_k quasi-equilibrium obtains (keep L_k < L_max so that long chains are avoided at low T).
Stop criterion: Value of cost function does not change in a specified
number of epochs.
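A small sketch of this schedule under the stated rules (the growth factor, acceptance threshold, and test energy function are assumptions): the initial temperature is raised until the acceptance ratio is close to 1, and then cooled geometrically.

```python
import numpy as np

rng = np.random.default_rng(0)

E = 0.05 * (np.arange(100) - 70)**2 + 5.0 * np.sin(np.arange(100) / 3.0)

def acceptance_ratio(T, trials=2000):
    """Fraction of random neighbor moves accepted at temperature T."""
    i = rng.integers(len(E), size=trials)
    j = np.clip(i + rng.choice([-1, 1], size=trials), 0, len(E) - 1)
    uphill = (E[j] - E[i]).clip(min=0)
    return np.mean((E[j] <= E[i]) | (np.exp(-uphill / T) > rng.uniform(size=trials)))

# Initial temperature: increase T0 until almost every move is accepted.
T0, c = 0.1, 1.5                      # small start, growth factor c > 1 (assumed values)
while acceptance_ratio(T0) < 0.95:
    T0 *= c

# Geometric cooling schedule T_{k+1} = alpha * T_k.
alpha, schedule = 0.9, [T0]
for k in range(30):
    schedule.append(alpha * schedule[-1])

print("chosen T0:", round(T0, 2))
print("first temperatures:", np.round(schedule[:5], 2))
```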
Genetic algorithms: important points
For sufficiently large numbers of individuals the algorithm improves the
average fitness of the population.
Convergence is not guaranteed (not even in principle).
The hardest part is the coding scheme.
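For concreteness, a minimal genetic-algorithm sketch in Python on the classic one-max problem (maximize the number of ones in a bit string); the population size, fitness-proportional selection, single-point crossover, and mutation rate are illustrative assumptions, loosely following the standard scheme described in Goldberg's book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal genetic algorithm on bit strings; fitness = number of ones ("one-max").
n_bits, pop_size, generations = 30, 60, 40
pop = rng.integers(0, 2, size=(pop_size, n_bits))

def fitness(pop):
    return pop.sum(axis=1)

for g in range(generations):
    f = fitness(pop)
    # Fitness-proportional (roulette-wheel) selection of parents.
    p = f / f.sum()
    parents = pop[rng.choice(pop_size, size=pop_size, p=p)]
    # Single-point crossover between consecutive parents.
    children = parents.copy()
    for a in range(0, pop_size - 1, 2):
        cut = rng.integers(1, n_bits)
        children[a, cut:], children[a + 1, cut:] = parents[a + 1, cut:], parents[a, cut:]
    # Bit-flip mutation with a small assumed rate.
    mask = rng.random(children.shape) < 0.01
    pop = np.where(mask, 1 - children, children)

print("average fitness after evolution:", fitness(pop).mean(), "of", n_bits)
```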
Bibliography
Markov Chain Monte Carlo in Practice
W. R. Gilks, S. Richardson and D. J. Spiegelhalter
Chapman & Hall, London, 1996.

Simulated Annealing and Boltzmann Machines
E. Aarts and J. Korst
Wiley-Interscience, New York, 1990.

Genetic Algorithms in Search, Optimization, and Machine Learning
David E. Goldberg
Addison-Wesley, Reading, Mass., 1989.