
Gibbs sampling algorithm from scratch using R programming language

By: Mehdi Hamedi, MD

Psychiatrist

https://www.linkedin.com/in/mehdihamedi/
The Gibbs sampling algorithm is a Markov Chain Monte Carlo method for approximating a posterior distribution when direct sampling is difficult or impossible.

So, in this article, we are going to approximate a posterior distribution by implementing the Gibbs sampling algorithm from scratch in the R programming language.

Bayes theorem

Let’s restate Bayes’ theorem:

P(θ | y) = ( P(y | θ) × P(θ) ) / P(y)    (1)

where P(θ | y) is the posterior, P(y | θ) the likelihood, P(θ) the prior, and P(y) the marginal likelihood.

In Bayesian inference, the last term, the marginal likelihood P(y), does not depend on θ and is therefore treated as a constant and dropped. The equation is then usually stated as:

P(θ | y) ∝ P(y | θ) × P(θ)
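To make the role of the normalising constant P(y) concrete, here is a minimal sketch (a toy example of my own, not from the article; the Beta(2, 2) prior and the binomial data are arbitrary assumptions) that approximates a posterior on a grid:

#toy grid approximation of Bayes' theorem (illustrative only)
theta  <- seq(0, 1, length.out=1000)          #grid of parameter values
prior  <- dbeta(theta, 2, 2)                  #P(theta): an assumed Beta(2, 2) prior
lik    <- dbinom(7, size=10, prob=theta)      #P(y | theta): 7 successes in 10 trials
unnorm <- lik * prior                         #numerator: likelihood x prior
post   <- unnorm / sum(unnorm)                #dividing by a grid estimate of P(y)
plot(theta, post, type="l")                   #normalised posterior over the grid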


Markov Chain Monte Carlo methods

Sometimes it is not possible to determine the posterior distribution p(θ|y) analytically. In these situations, Markov Chain Monte Carlo (MCMC) methods can give us a good approximation of p(θ|y).
MCMC methods are based on sampling from the posterior distribution: the more samples we draw, the better our estimate of the posterior becomes.
A Markov Chain is formed by a random process that undergoes transitions between states, where the state at time t + 1 depends only on the state at time t. A Markov Chain is therefore "memory-less": given the current state, whatever happened more than one step in the past is irrelevant to the next transition.
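As a minimal sketch of this memory-less property (a toy example of my own, not from the article; the transition matrix is an arbitrary assumption), the following simulates a two-state Markov Chain in which each transition depends only on the current state:

#two-state Markov Chain: the next state depends only on the current state
P <- matrix(c(.9, .1,
              .2, .8), nrow=2, byrow=TRUE)    #assumed transition probabilities
state <- numeric(1000)
state[1] <- 1
for (t in 1:999) {
state[t+1] <- sample(1:2, 1, prob=P[state[t], ])   #depends only on the state at time t
}
table(state) / 1000   #long-run proportions approach the stationary distribution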
MCMC methods are primarily used for calculating numerical
approximations of multi-dimensional integrals, for example in Bayesian
statistics, computational physics, computational biology and
computational linguistics.
In Bayesian statistics, the recent development of MCMC methods has
made it possible to compute large hierarchical models that require
integrations over hundreds to thousands of unknown parameters.
Metropolis–Hastings algorithm and Gibbs sampling are two well-known
Markov Chain Monte Carlo methods.
As noted above, MCMC methods are based on sampling from the posterior distribution, so as the number of parameters increases, more samples are needed to estimate the posterior well, which makes the process computationally expensive.

Gibbs sampling algorithm

The Gibbs sampling algorithm is a Markov Chain Monte Carlo method for approximating an unknown distribution when direct sampling is difficult or impossible.

This algorithm was invented by the brothers Stuart and Donald Geman in 1984 and named after the late 19th-century American physicist Josiah Willard Gibbs.

Like the Metropolis–Hastings algorithm, this algorithm generates a Markov Chain of samples and converges on the target multivariate probability distribution when given a sufficiently large number of samples.

The key property of the Gibbs sampler is that it samples from conditional distributions, which are often known even in situations in which the joint density is not available for integration – as is required for computation of the marginal likelihood (or "evidence").

This method can be used to approximate (a short sketch follows this list):

1. the joint distribution (e.g., to generate a histogram of the distribution);
2. the marginal distribution of one of the variables, or some subset of the variables (for example, the unknown parameters or latent variables); or
3. an integral (such as the expected value of one of the variables).
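As a quick illustration of these three uses (a toy sketch of my own, not from the article; here the draws come directly from an assumed bivariate distribution rather than from a Gibbs sampler), each use reduces to a simple summary of the samples:

#toy example: 5000 draws from an assumed correlated bivariate distribution
set.seed(42)
x1 <- rnorm(5000)
x2 <- .8*x1 + rnorm(5000, sd=.6)
plot(x1, x2)    #1. the cloud of samples approximates the joint distribution
hist(x2)        #2. a histogram approximates the marginal distribution of x2
mean(x1 * x2)   #3. Monte Carlo estimate of the integral E[x1 * x2]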

Under some necessary but plausible assumptions, this algorithm will converge on the desired target distribution when sufficient samples are obtained.

Suppose we have a multivariate distribution with k parameters (x1, x2, …, xk). One sweep of the sampling process is:

x1(i+1) ~ f(x1 | x2(i), x3(i), …, xk(i))

x2(i+1) ~ f(x2 | x1(i+1), x3(i), …, xk(i))

⋮

xj(i+1) ~ f(xj | x1(i+1), …, xj−1(i+1), xj+1(i), …, xk(i))

⋮

xk(i+1) ~ f(xk | x1(i+1), x2(i+1), …, xk−1(i+1))

Note the regular pattern: each draw of xj is conditioned on the current samples (iteration i+1) of the variables x1, …, xj−1 that have already been updated in this sweep, and on the previous samples (iteration i) of the variables xj+1, …, xk that have not yet been updated.
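To make this update scheme concrete before turning to the bivariate normal example, here is a minimal sketch of a two-block Gibbs sampler for the mean and variance of normal data (an illustrative example of my own, not from the article; it assumes a flat prior on the mean and a 1/σ² prior on the variance so that both full conditionals are standard distributions):

#two-block Gibbs sampler for mu and sig2 of normal data (assumed priors as noted above)
set.seed(1)
y <- rnorm(50, mean=3, sd=2)            #simulated data with known truth
n <- length(y)
niter <- 5000
mu <- sig2 <- numeric(niter)
mu[1] <- mean(y)                        #starting values
sig2[1] <- var(y)
for (i in 1:(niter-1)) {
mu[i+1]   <- rnorm(1, mean(y), sqrt(sig2[i]/n))                 #mu | sig2, y
sig2[i+1] <- 1/rgamma(1, n/2, rate=sum((y - mu[i+1])^2)/2)      #sig2 | mu, y
}
c(mean(mu), mean(sqrt(sig2)))           #posterior means, close to the true 3 and 2

Note that the second draw of each iteration conditions on the value of mu that was just updated, exactly as in the scheme above.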

Gibbs sampling algorithm from scratch


Now let’s implement the Gibbs sampling algorithm from scratch; the mvtnorm package is used only to evaluate and check the known target distribution, not for the sampler itself.

In this example, the target is a bivariate normal distribution, for which we pretend that direct sampling is difficult.

We first set up our bivariate structure by specifying the means (μ) and
standard deviations (σ) of two normal distributions, their correlation (ρ),
and their variance-covariance matrix (Σ).
#gibbs sampler
require(mvtnorm)   #for dmvnorm() and rmvnorm()
require(MASS)

nsamples <- 1000   #number of Gibbs samples to draw

rho <- .8          #correlation between x and y
mux <- muy <- 0    #means
sigx <- 1          #standard deviation of x
sigy <- .5         #standard deviation of y
sigma <- matrix(c(sigx^2,rho*sigx*sigy,rho*sigy*sigx,sigy^2), nrow=2)   #variance-covariance matrix

We then draw a contour plot of the bivariate distribution:


#draw contour plot of known distribution
fiftyticks <- seq(from=-3, to=3, length.out=50)             #50 grid points per axis
y <- rep(fiftyticks,50)                                     #y coordinates of the grid
x <- rep(fiftyticks,each=50)                                #x coordinates of the grid
z <- matrix(dmvnorm(cbind(y,x),c(mux,muy),sigma),50,50)     #target density at each grid point
contour(list(x=fiftyticks,y=fiftyticks,z=z),
ylim=c(-3,3),xlim=c(-3,3),drawlabels=FALSE)
The following lines perform the Gibbs sampling to estimate the parameters of the same bivariate structure via MCMC. The core of the sampling process is the for loop: each iteration draws another pair of conditional samples. For a bivariate normal distribution with variables x and y, the conditional distributions are:

x(i+1) ~ f(x | y(i)) = N( μx + ρ (σx/σy) (y(i) − μy), √(σx² (1 − ρ²)) )

y(i+1) ~ f(y | x(i+1)) = N( μy + ρ (σy/σx) (x(i+1) − μx), √(σy² (1 − ρ²)) )

In this case μx = μy = 0, so we have:

x(i+1) ~ f(x | y(i)) = N( ρ (σx/σy) y(i), √(σx² (1 − ρ²)) )

y(i+1) ~ f(y | x(i+1)) = N( ρ (σy/σx) x(i+1), √(σy² (1 − ρ²)) )

#gibbs sampling
sxt1mr <- sqrt(sigx^2*(1-rho^2))   #conditional sd of x given y
syt1mr <- sqrt(sigy^2*(1-rho^2))   #conditional sd of y given x
rxy <- rho*(sigx/sigy)             #regression coefficient of x on y
ryx <- rho*(sigy/sigx)             #regression coefficient of y on x
xsamp <- ysamp <- rep(0,nsamples)
xsamp[1] <- -2                     #starting values, deliberately far from the mean
ysamp[1] <- 2
for (i in c(1:(nsamples-1))) {
xsamp[i+1] <- rnorm(1, mean=rxy*ysamp[i], sd=sxt1mr)    #draw x given the current y
ysamp[i+1] <- rnorm(1, mean=ryx*xsamp[i+1], sd=syt1mr)  #draw y given the new x
}

Then, the last few lines of code plot the final 500 sampled x–y pairs on the contour plot. We discard the first 500 samples as the usual burn-in, although, to illustrate the path of the sampler from its starting value, we also plot and label the first five samples. Finally, we check the standard deviations and the correlation of the Gibbs samples and compare them with draws generated directly by rmvnorm:
points(xsamp[-c(1:500)],ysamp[-c(1:500)],pch=21,bg="red")   #last 500 samples
for (j in c(1:5)){                                          #label the first 5 samples
points(xsamp[j],ysamp[j]-.005,pch=21,cex=3.5,bg="white")
text(xsamp[j],ysamp[j],as.character(j))
}
cor.test(xsamp,ysamp)   #sample correlation (true rho = .8)
sd(xsamp)               #true sigx = 1
sd(ysamp)               #true sigy = .5
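A further diagnostic that is not part of the original code, but is common practice, is a trace plot of the two chains; it makes the burn-in and the mixing of the sampler visible. Because it opens a new figure, run it only after the contour plot above is finished:

#optional check: trace plots of the chains (not in the original code)
par(mfrow=c(2,1))
plot(xsamp, type="l", ylab="x")   #chain for x
plot(ysamp, type="l", ylab="y")   #chain for y
par(mfrow=c(1,1))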

bivn <- rmvnorm(1000,rep(0,2),sigma)   #1000 draws taken directly from the target
#some checks
apply(bivn,2,mean)           #sample means (true means are 0)
apply(bivn,2,sd)             #sample sds (true values: 1 and .5)
cor.test(bivn[,1],bivn[,2])  #sample correlation (true rho = .8)
And the output of the checks:
> cor.test(xsamp,ysamp)

Pearson's product-moment correlation

data: xsamp and ysamp


t = 41.469, df = 998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7715231 0.8171653
sample estimates:
cor
0.7954696

> sd(xsamp)
[1] 1.028386
> sd(ysamp)
[1] 0.5167615
>
> bivn<-rmvnorm(1000,rep(0,2),sigma)
> #some checks
> apply(bivn,2,mean)
[1] -0.003818762 0.006641190
> apply(bivn,2,sd)
[1] 1.0365947 0.4919069
> cor.test(bivn[,1],bivn[,2])

Pearson's product-moment correlation

data: bivn[, 1] and bivn[, 2]


t = 42.18, df = 998, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7769553 0.8216216
sample estimates:
cor
0.8003966

The sampled standard deviations (about 1.03 and 0.52) and correlation (about 0.80) are close to the true values σx = 1, σy = 0.5 and ρ = 0.8, and to the summaries of the direct rmvnorm draws. The complete code is collected below for convenience:
#gibbs sampler

require(mvtnorm)
require(MASS)

nsamples <- 1000


rho <- .8
mux <- muy <- 0
sigx <- 1
sigy <- .5
sigma <- matrix(c(sigx^2,rho*sigx*sigy,rho*sigy*sigx,sigy^2), nrow=2)

#draw contour plot of known distribution


fiftyticks <- seq(from=-3, to =3, length.out=50)
y<-rep(fiftyticks,50)
x<-rep(fiftyticks,each=50)
z<-matrix( dmvnorm(cbind(y,x),c(mux,muy),sigma),50,50)
contour(list(x=fiftyticks,y=fiftyticks,z=z),
ylim=c(-3,3),xlim=c(-3,3),drawlabels=FALSE)

#gibbs sampling
sxt1mr <- sqrt(sigx^2*(1-rho^2))
syt1mr <- sqrt(sigy^2*(1-rho^2))
rxy <- rho*(sigx/sigy)
ryx <- rho*(sigy/sigx)
xsamp <- ysamp <- rep(0,nsamples)
xsamp[1] <- -2
ysamp[1] <- 2
for (i in c(1:(nsamples-1))) {
xsamp[i+1] <- rnorm(1, mean=rxy*ysamp[i], sd=sxt1mr)
ysamp[i+1] <- rnorm(1, mean=ryx*xsamp[i+1], sd=syt1mr)
}
points(xsamp[-c(1:500)],ysamp[-c(1:500)],pch=21,bg="red")
for (j in c(1:5)){
points(xsamp[j],ysamp[j]-.005,pch=21,cex=3.5,bg="white")
text(xsamp[j],ysamp[j],as.character(j))
}
cor.test(xsamp,ysamp)
sd(xsamp)
sd(ysamp)

bivn<-rmvnorm(1000,rep(0,2),sigma)
#some checks
apply(bivn,2,mean)
apply(bivn,2,sd)
cor.test(bivn[,1],bivn[,2])

Source:

Computational Modeling of Cognition and Behavior, by Simon Farrell and Stephan Lewandowsky (Cambridge University Press).
