Hi All

Last time I promised some Bayesian neural network post; but as happens I lied. Basically the post I wanted would be too long so I have broken it into several parts that will drop over the next little while.This part is on a Bayesian MCMC system called Parallel Tempering or Replica Exchange Method (REM).

REM works by running many Markov chains in parallel and allowing them to exchange information. Each chain is run at a different temperature (sometimes you might run many at the same temperature but lets not get ahead of ourselves). Temperature (denoted ) in this context is a modifier to the acceptance probability () of the MCMC steps; i.e., min ( exp ,1). If is large the MCMC process will tend to accept bad (low probability) models. If is small only high liklyhood models will be accepted.

At random points in the MCMC walk the chains have the option to swap temperatures with each other. The idea is that high temp. chains are good at exploring the space but bad at finding local maxima, low temp. chains are good at fined maxima but bad at exploring. So we use both. The probability of two randomly selected chains swapping their temperatures is: min ( exp ,1). IN general i use a strategy of doing about 100-1000 random swap attempts every 4 or so MCMC steps. These swaps are basically free as there is no need to evaluate any likelihood functions.

So why is this useful? Consider running conventional MCMC with the target distribution shown in Fig. 1. Because there are four disjointed probability masses it would be extremely difficult to pick perturbation sizes *a prioi*. Also many adaptive methods will just optimize for one of the modes making sampling in the other difficult as they have different correlation structures.

Figure 2 shows the posterior samples from the distribution without the use of REM (Note that no attempt to optimize the proposal distribution was made). 51,000 samples are taken; these where down sampled to 10,000. As can be seen only two of the four modes are sampled by the process. Figure 3 shows samples using REM. Thirteen chains are used with temperatures 1.00 1.19, 1.41, 1.68, 2.00, 2.38, 2.83, 3.36, 4.00, 4.76, 5.66, 6.73, and 8.00. 10,000 MCMC steps are taken; this means that the process has made about 2.5 as many likelihood calculations. All four modes are sampled from!

Code is appended g.LL <- function(m) { m = as.numeric(m) mu = abs(m[2]) - abs(m[1]) nu = sqrt(m[1]^2 + m[2]^2 ) #print(mu) #print(nu) #print(is.numeric(mu)) P1 = dnorm(mu[1], mean = 0, sd = 0.25, log=T) P2 = dnorm(nu[1], mean = 3.5, sd = 1, log=T) #print(P1) #print(P2) P = P1 + P2 return(P) } g.REM.mcmc <- function(N= 1000, DREM = 1 ) { UB = c( 5, 5) LB = c(-5, -5) N.temps = 13 EX = seq(0, 3, length.out = N.temps ) t.star = 2^EX M = matrix( nrow = N.temps, ncol= 4) for ( i in 1:N.temps) { m.old = c(runif(1, min = -5, max = 5), runif(1, min = -5, max = 5)) ll.old = g.LL(m = m.old) M[i,1] = ll.old M[i,2:3] = m.old M[i,4] = t.star[i] } M.list = c(ll.old, m.old,0) alpha = 1 for ( i in 1:N) { # the mcmc steps for each chain for (j in 1:N.temps) { m.old = M[j,2:3] ll.old = M[j,1] t.star = M[j,4] jump = rnorm(2, mean=0, sd = 0.07) # this is not best but the point is to highlight REM m.new = m.old + jump if ( (m.new[1] < UB[1]) & (m.new[2] < UB[2]) & (m.new[1] > LB[1]) & (m.new[2] > LB[2]) ) { ll.new = g.LL(m = m.new) # standard mcmc acceptance rule A = exp((ll.new - ll.old)/t.star) r = runif(1) if (r <= A) { M[j,1:3] = c(ll.new, m.new) } } } # Replica exchange method steps r = runif(1) if ( (r<= 0.25) & (DREM == 1)) # clearly this (the DREM on switch) should also be in the liklihood step but i am lazy and this fast anyway { for (j in 1:100) { k = sample(x = 1:N.temps, size= 2, replace =T) A = exp((M[k[1],1] - M[k[2],1])*( (1/ M[k[1],4] ) - (1/M[k[2],4] ) )) r = runif(1) if ( r <= A) { T1 = M[k[1],4] T2 = M[k[2],4] M[k[1],4] = T2 M[k[2],4] = T1 } } } if ( i <= 9999) { M.list = rbind(M.list, c(M[M[,4]==1,1:3 ],i)) rownames(M.list) =NULL } # fixed length thinning if ( i > 9999) { alpha = alpha*10000/(alpha+10000) r = runif(1) if ( r <= alpha) { row = sample(1:10000,1) M.list[row,] = c(M[M[,4]==1,1:3 ],i) } } } N.s = nrow(M.list) return(M.list[floor(N.s/2):N.s,] ) } samples = g.REM.mcmc(N=10999, DREM = 1 ) png(file = "target3.png") image(y= seq(-5, 5, length.out =101), x= seq(-5, 5, length.out =101), LL.mat, col = c("white", rev(heat.colors(51))), xlab="m1", ylab ="m2") points(samples[,2], samples[,3], pch=".") dev.off()