
Gibbs sampling algorithm from scratch using R programming language

By: Mehdi Hamedi, MD

Psychiatrist

https://www.linkedin.com/in/mehdihamedi/
The Gibbs sampling algorithm is a Markov Chain Monte Carlo method for approximating a posterior distribution when direct sampling is difficult or impossible.

So, in this article, we are going to approximate a posterior distribution by implementing the Gibbs sampling algorithm from scratch in the R programming language.

Bayes theorem

Let’s restate Bayes’ theorem:

P(θ | y) = ( P(y | θ) × P(θ) ) / P(y)    (1)

where P(θ | y) is the posterior, P(y | θ) the likelihood, P(θ) the prior, and P(y) the marginal likelihood.

In Bayesian inference, the last term, the marginal likelihood P(y), does not depend on θ and is therefore treated as a constant and dropped. The equation is then usually stated as:

P(θ | y) ∝ P(y | θ) × P(θ)
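To make the role of the normalising constant P(y) concrete, here is a minimal sketch (a toy example of my own, not from the article; the Beta(2, 2) prior and the binomial data are arbitrary assumptions) that approximates a posterior on a grid:

#toy grid approximation of Bayes' theorem (illustrative only)
theta  <- seq(0, 1, length.out=1000)          #grid of parameter values
prior  <- dbeta(theta, 2, 2)                  #P(theta): an assumed Beta(2, 2) prior
lik    <- dbinom(7, size=10, prob=theta)      #P(y | theta): 7 successes in 10 trials
unnorm <- lik * prior                         #numerator: likelihood x prior
post   <- unnorm / sum(unnorm)                #dividing by a grid estimate of P(y)
plot(theta, post, type="l")                   #normalised posterior over the grid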


Markov Chain Monte Carlo methods

Sometimes it is not possible to determine the posterior distribution p(θ|y) analytically. In these situations, Markov Chain Monte Carlo (MCMC) methods can give us a good approximation of p(θ|y).
MCMC methods are based on sampling from the posterior distribution: the more samples we draw, the better our estimate of the posterior becomes.
A Markov Chain is formed by a random process that undergoes transitions between states, where the state at time t + 1 depends only on the state at time t. A Markov Chain is therefore "memory-less": given the current state, whatever happened more than one step in the past is irrelevant to the next transition.
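As a minimal sketch of this memory-less property (a toy example of my own, not from the article; the transition matrix is an arbitrary assumption), the following simulates a two-state Markov Chain in which each transition depends only on the current state:

#two-state Markov Chain: the next state depends only on the current state
P <- matrix(c(.9, .1,
              .2, .8), nrow=2, byrow=TRUE)    #assumed transition probabilities
state <- numeric(1000)
state[1] <- 1
for (t in 1:999) {
state[t+1] <- sample(1:2, 1, prob=P[state[t], ])   #depends only on the state at time t
}
table(state) / 1000   #long-run proportions approach the stationary distribution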
MCMC methods are primarily used for calculating numerical
approximations of multi-dimensional integrals, for example in Bayesian
statistics, computational physics, computational biology and
computational linguistics.
In Bayesian statistics, the recent development of MCMC methods has
made it possible to compute large hierarchical models that require
integrations over hundreds to thousands of unknown parameters.
Metropolis–Hastings algorithm and Gibbs sampling are two well-known
Markov Chain Monte Carlo methods.
As noted above, MCMC methods are based on sampling from the posterior distribution, so as the number of parameters increases, more samples are needed to estimate the posterior well, which makes the process computationally expensive.

Gibbs sampling algorithm

The Gibbs sampling algorithm is a Markov Chain Monte Carlo method for approximating an unknown distribution when direct sampling is difficult or impossible.

This algorithm was invented by the brothers Stuart and Donald Geman in 1984 and named after the late 19th-century American physicist Josiah Willard Gibbs.

Like the Metropolis–Hastings algorithm, this algorithm generates a Markov Chain of samples and converges on the target multivariate probability distribution when given a sufficiently large number of samples.

The key property of the Gibbs sampler is that it samples from conditional distributions, which are often known even in situations in which the joint density is not available for integration – as is required for computation of the marginal likelihood (or "evidence").

This method can be used to approximate (a short sketch follows this list):

1. the joint distribution (e.g., to generate a histogram of the distribution);
2. the marginal distribution of one of the variables, or some subset of the variables (for example, the unknown parameters or latent variables); or
3. an integral (such as the expected value of one of the variables).
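As a quick illustration of these three uses (a toy sketch of my own, not from the article; here the draws come directly from an assumed bivariate distribution rather than from a Gibbs sampler), each use reduces to a simple summary of the samples:

#toy example: 5000 draws from an assumed correlated bivariate distribution
set.seed(42)
x1 <- rnorm(5000)
x2 <- .8*x1 + rnorm(5000, sd=.6)
plot(x1, x2)    #1. the cloud of samples approximates the joint distribution
hist(x2)        #2. a histogram approximates the marginal distribution of x2
mean(x1 * x2)   #3. Monte Carlo estimate of the integral E[x1 * x2]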

Under some necessary but plausible assumptions, this algorithm will converge on the desired target distribution when sufficient samples are obtained.

Suppose we have a multivariate distribution with k parameters (x1, x2, …, xk). One sweep of the sampling process is:

x1(i+1) ~ f(x1 | x2(i), x3(i), …, xk(i))

x2(i+1) ~ f(x2 | x1(i+1), x3(i), …, xk(i))

⋮

xj(i+1) ~ f(xj | x1(i+1), …, xj−1(i+1), xj+1(i), …, xk(i))

⋮

xk(i+1) ~ f(xk | x1(i+1), x2(i+1), …, xk−1(i+1))

Note the regular pattern: each draw of xj is conditioned on the current samples (iteration i+1) of the variables x1, …, xj−1 that have already been updated in this sweep, and on the previous samples (iteration i) of the variables xj+1, …, xk that have not yet been updated.
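To make this update scheme concrete before turning to the bivariate normal example, here is a minimal sketch of a two-block Gibbs sampler for the mean and variance of normal data (an illustrative example of my own, not from the article; it assumes a flat prior on the mean and a 1/σ² prior on the variance so that both full conditionals are standard distributions):

#two-block Gibbs sampler for mu and sig2 of normal data (assumed priors as noted above)
set.seed(1)
y <- rnorm(50, mean=3, sd=2)            #simulated data with known truth
n <- length(y)
niter <- 5000
mu <- sig2 <- numeric(niter)
mu[1] <- mean(y)                        #starting values
sig2[1] <- var(y)
for (i in 1:(niter-1)) {
mu[i+1]   <- rnorm(1, mean(y), sqrt(sig2[i]/n))                 #mu | sig2, y
sig2[i+1] <- 1/rgamma(1, n/2, rate=sum((y - mu[i+1])^2)/2)      #sig2 | mu, y
}
c(mean(mu), mean(sqrt(sig2)))           #posterior means, close to the true 3 and 2

Note that the second draw of each iteration conditions on the value of mu that was just updated, exactly as in the scheme above.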

Gibbs sampling algorithm from scratch


Now let’s implement the Gibbs sampling algorithm from scratch; the mvtnorm package is used only to evaluate and check the known target distribution, not for the sampler itself.

In this example, the target is a bivariate normal distribution, for which we pretend that direct sampling is difficult.

We first set up our bivariate structure by specifying the means (μ) and
standard deviations (σ) of two normal distributions, their correlation (ρ),
and their variance-covariance matrix (Σ).
#gibbs sampler
require(mvtnorm)   #for dmvnorm() and rmvnorm()
require(MASS)

nsamples <- 1000   #number of Gibbs samples to draw

rho <- .8          #correlation between x and y
mux <- muy <- 0    #means
sigx <- 1          #standard deviation of x
sigy <- .5         #standard deviation of y
sigma <- matrix(c(sigx^2,rho*sigx*sigy,rho*sigy*sigx,sigy^2), nrow=2)   #variance-covariance matrix

We then draw a contour plot of the bivariate distribution:


#draw contour plot of known distribution
fiftyticks <- seq(from=-3, to=3, length.out=50)             #50 grid points per axis
y <- rep(fiftyticks,50)                                     #y coordinates of the grid
x <- rep(fiftyticks,each=50)                                #x coordinates of the grid
z <- matrix(dmvnorm(cbind(y,x),c(mux,muy),sigma),50,50)     #target density at each grid point
contour(list(x=fiftyticks,y=fiftyticks,z=z),
ylim=c(-3,3),xlim=c(-3,3),drawlabels=FALSE)
The following lines perform the Gibbs sampling to estimate the parameters of the same bivariate structure via MCMC. The core of the sampling process is the for loop: each iteration draws another pair of conditional samples. For a bivariate normal distribution with variables x and y, the conditional distributions are:

x(i+1) ~ f(x | y(i)) = N( μx + ρ (σx/σy) (y(i) − μy), √(σx² (1 − ρ²)) )

y(i+1) ~ f(y | x(i+1)) = N( μy + ρ (σy/σx) (x(i+1) − μx), √(σy² (1 − ρ²)) )

In this case μx = μy = 0, so we have:

x(i+1) ~ f(x | y(i)) = N( ρ (σx/σy) y(i), √(σx² (1 − ρ²)) )

y(i+1) ~ f(y | x(i+1)) = N( ρ (σy/σx) x(i+1), √(σy² (1 − ρ²)) )

#gibbs sampling
sxt1mr <- sqrt(sigx^2*(1-rho^2))   #conditional sd of x given y
syt1mr <- sqrt(sigy^2*(1-rho^2))   #conditional sd of y given x
rxy <- rho*(sigx/sigy)             #regression coefficient of x on y
ryx <- rho*(sigy/sigx)             #regression coefficient of y on x
xsamp <- ysamp <- rep(0,nsamples)
xsamp[1] <- -2                     #starting values, deliberately far from the mean
ysamp[1] <- 2
for (i in c(1:(nsamples-1))) {
xsamp[i+1] <- rnorm(1, mean=rxy*ysamp[i], sd=sxt1mr)    #draw x given the current y
ysamp[i+1] <- rnorm(1, mean=ryx*xsamp[i+1], sd=syt1mr)  #draw y given the new x
}

Then, the last few lines of code plot the final 500 sampled x–y pairs on the contour plot. We discard the first 500 samples as the usual burn-in, although, to illustrate the path of the sampler from its starting value, we also plot and label the first five samples. Finally, we check the standard deviations and the correlation of the Gibbs samples and compare them with draws generated directly by rmvnorm:
points(xsamp[-c(1:500)],ysamp[-c(1:500)],pch=21,bg="red")   #last 500 samples
for (j in c(1:5)){                                          #label the first 5 samples
points(xsamp[j],ysamp[j]-.005,pch=21,cex=3.5,bg="white")
text(xsamp[j],ysamp[j],as.character(j))
}
cor.test(xsamp,ysamp)   #sample correlation (true rho = .8)
sd(xsamp)               #true sigx = 1
sd(ysamp)               #true sigy = .5
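A further diagnostic that is not part of the original code, but is common practice, is a trace plot of the two chains; it makes the burn-in and the mixing of the sampler visible. Because it opens a new figure, run it only after the contour plot above is finished:

#optional check: trace plots of the chains (not in the original code)
par(mfrow=c(2,1))
plot(xsamp, type="l", ylab="x")   #chain for x
plot(ysamp, type="l", ylab="y")   #chain for y
par(mfrow=c(1,1))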

bivn <- rmvnorm(1000,rep(0,2),sigma)   #1000 draws taken directly from the target
#some checks
apply(bivn,2,mean)           #sample means (true means are 0)
apply(bivn,2,sd)             #sample sds (true values: 1 and .5)
cor.test(bivn[,1],bivn[,2])  #sample correlation (true rho = .8)
And the output of the checks:
> cor.test(xsamp,ysamp)

Pearson's product-moment correlation

data: xsamp and ysamp


t = 41.469, df = 998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7715231 0.8171653
sample estimates:
cor
0.7954696

> sd(xsamp)
[1] 1.028386
> sd(ysamp)
[1] 0.5167615
>
> bivn<-rmvnorm(1000,rep(0,2),sigma)
> #some checks
> apply(bivn,2,mean)
[1] -0.003818762 0.006641190
> apply(bivn,2,sd)
[1] 1.0365947 0.4919069
> cor.test(bivn[,1],bivn[,2])

Pearson's product-moment correlation

data: bivn[, 1] and bivn[, 2]


t = 42.18, df = 998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7769553 0.8216216
sample estimates:
cor
0.8003966

The sampled standard deviations (about 1.03 and 0.52) and correlation (about 0.80) are close to the true values σx = 1, σy = 0.5 and ρ = 0.8, and to the summaries of the direct rmvnorm draws. The complete code is collected below for convenience:
#gibbs sampler

require(mvtnorm)
require(MASS)

nsamples <- 1000


rho <- .8
mux <- muy <- 0
sigx <- 1
sigy <- .5
sigma <- matrix(c(sigx^2,rho*sigx*sigy,rho*sigy*sigx,sigy^2), nrow=2)

#draw contour plot of known distribution


fiftyticks <- seq(from=-3, to =3, length.out=50)
y<-rep(fiftyticks,50)
x<-rep(fiftyticks,each=50)
z<-matrix( dmvnorm(cbind(y,x),c(mux,muy),sigma),50,50)
contour(list(x=fiftyticks,y=fiftyticks,z=z),
ylim=c(-3,3),xlim=c(-3,3),drawlabels=FALSE)

#gibbs sampling
sxt1mr <- sqrt(sigx^2*(1-rho^2))
syt1mr <- sqrt(sigy^2*(1-rho^2))
rxy <- rho*(sigx/sigy)
ryx <- rho*(sigy/sigx)
xsamp <- ysamp <- rep(0,nsamples)
xsamp[1] <- -2
ysamp[1] <- 2
for (i in c(1:(nsamples-1))) {
xsamp[i+1] <- rnorm(1, mean=rxy*ysamp[i], sd=sxt1mr)
ysamp[i+1] <- rnorm(1, mean=ryx*xsamp[i+1], sd=syt1mr)
}
points(xsamp[-c(1:500)],ysamp[-c(1:500)],pch=21,bg="red")
for (j in c(1:5)){
points(xsamp[j],ysamp[j]-.005,pch=21,cex=3.5,bg="white")
text(xsamp[j],ysamp[j],as.character(j))
}
cor.test(xsamp,ysamp)
sd(xsamp)
sd(ysamp)

bivn<-rmvnorm(1000,rep(0,2),sigma)
#some checks
apply(bivn,2,mean)
apply(bivn,2,sd)
cor.test(bivn[,1],bivn[,2])

Source:

Computational Modeling of Cognition and Behavior, by Simon Farrell and Stephan Lewandowsky (Cambridge University Press).
