# Parallel-Tempering Ensemble MCMC¶

*Added in version 1.2.0*

When your posterior is multi-modal or otherwise hard to sample with a standard MCMC, a good option to try is parallel-tempered MCMC (PTMCMC). PTMCMC runs multiple MCMC’s at different temperatures, \(T\). Each MCMC samples from a modified posterior, given by

As \(T \to \infty\), the posterior becomes the prior, which is hopefully easy to sample. If the likelihood is a Gaussian with standard deviation \(\sigma\), then the tempered likelihood is proportional to a Gaussian with standard deviation \(\sigma \sqrt{T}\).

Periodically during the run, the different temperatures swap members
of their ensemble in a way that preserves detailed balance. The hot
chains can more easily explore parameter space because the likelihood
is flatter and broader, while the cold chains do a good job of
exploring the peaks of the likelihood. This can **dramatically**
improve convergence if your likelihood function has many
well-separated modes.

## How To Sample a Multi-Modal Gaussian¶

Suppose we want to sample from the posterior given by

If the modes \(\mu_{1,2}\) are well-separated with respect to the
scale of \(\Sigma_{1,2}\), then this distribution will be hard to
sample with the `EnsembleSampler`

. Here is how we would sample
from it using the `PTSampler`

.

First, some preliminaries:

```
import numpy as np
from emcee import PTSampler
```

Define the means and standard deviations of our multi-modal likelihood:

```
# mu1 = [1, 1], mu2 = [-1, -1]
mu1 = np.ones(2)
mu2 = -np.ones(2)
# Width of 0.1 in each dimension
sigma1inv = np.diag([100.0, 100.0])
sigma2inv = np.diag([100.0, 100.0])
def logl(x):
dx1 = x - mu1
dx2 = x - mu2
return np.logaddexp(-np.dot(dx1, np.dot(sigma1inv, dx1))/2.0,
-np.dot(dx2, np.dot(sigma2inv, dx2))/2.0)
# Use a flat prior
def logp(x):
return 0.0
```

Now we can construct a sampler object that will drive the PTMCMC; arbitrarily, we choose to use 20 temperatures (the default is for each temperature to increase by a factor of \(\sqrt{2}\), so the highest temperature will be \(T = 1024\), resulting in an effective \(\sigma_T = 32 \sigma = 3.2\), which is about the separation of our modes). Let’s use 100 walkers in the ensemble at each temperature:

```
ntemps = 20
nwalkers = 100
ndim = 2
sampler=PTSampler(ntemps, nwalkers, ndim, logl, logp)
```

Making the sampling multi-threaded is as simple as adding the
`threads=Nthreads`

argument to `PTSampler`

. We could have
modified the temperature ladder using the `betas`

optional argument
(which should be an array of \(\beta \equiv 1/T\) values). The
`pool`

argument also allows to specify our own pool of worker
threads if we wanted fine-grained control over the parallelism.

First, we run the sampler for 1000 burn-in iterations:

```
p0 = np.random.uniform(low=-1.0, high=1.0, size=(ntemps, nwalkers, ndim))
for p, lnprob, lnlike in sampler.sample(p0, iterations=1000):
pass
sampler.reset()
```

Now we sample for 10000 iterations, recording every 10th sample:

```
for p, lnprob, lnlike in sampler.sample(p, lnprob0=lnprob,
lnlike0=lnlike,
iterations=10000, thin=10):
pass
```

The resulting samples (1000 of them) are stored as the
`sampler.chain`

property:

```
assert sampler.chain.shape == (ntemps, nwalkers, 1000, ndim)
# Chain has shape (ntemps, nwalkers, nsteps, ndim)
# Zero temperature mean:
mu0 = np.mean(np.mean(sampler.chain[0,...], axis=0), axis=0)
# Longest autocorrelation length (over any temperature)
max_acl = np.max(sampler.acor)
# etc
```

## Implementation Notes¶

For a description of the parallel-tempering algorithm, see, e.g. Earl & Deem (2010), Phys Chem Chem Phys, 7, 23, 3910. The algorithm has some tunable parameters:

- Temperature Ladder
- The choice of temperature for the chains will strongly influence
the rate of convergence of the sampling. By default, the
`PTSampler`

class uses an exponential ladder, with each temperature increasing by a factor of \(\sqrt{2}\). The user can supply their own ladder using the`beta`

optional argument in the constructor. - Rate of Temperature Swaps
- The rate at which temperature swaps are proposed can, to a lesser extent, also influence the rate of convergence of the sampling. The goal is to make sure that good positions found by the high temperatures can propogate to the lower temperatures, but still ensure that the high-temperatures do not lose all memory of good locations. Here we choose to implement one temperature swap proposal per walker per rung on the temperature ladder after each ensemble update. This is not user-tunable, but seems to work well in practice.

The `args`

optional argument is not available in the
`PTSampler`

constructor; use a custom class with a `__call__`

method if you need to pass arguments to the `lnlike`

or `lnprior`

functions and do not want to use a global variable.

The `thermodynamic_integration_log_evidence`

uses thermodynamic
integration (see, e.g., Goggans & Chi (2004), AIP Conf Proc, 707, 59) to estimate the evidence
integral. Define the evidence as a function of inverse temperature:

We want to compute \(Z(1)\). \(Z\) satisfies the following differential equation

where \(\left\langle \ldots \right\rangle_\beta\) is the average of a quantity over the posterior at temperature \(T = 1/\beta\). Integrating (note that \(Z(0) = 1\) because the prior is normalized), we have

This quantity can be estimated from a PTMCMC by computing the average \(\ln l\) within each chain and applying a quadrature formula to estimate the integral.