Note: This tutorial was generated from an IPython notebook that can be downloaded here.

Quickstart¶

This notebook was made with the following version of emcee:

import emcee
emcee.__version__

'0.3.0.dev0'

The easiest way to get started with using emcee is to use it for a project. To get you started, here’s an annotated, fully-functional example that demonstrates a standard usage pattern.

How to sample a multi-dimensional Gaussian¶

We’re going to demonstrate how you might draw samples from the multivariate Gaussian density given by:

\[p(\vec{x}) \propto \exp \left [ - \frac{1}{2} (\vec{x} - \vec{\mu})^\mathrm{T} \, \Sigma ^{-1} \, (\vec{x} - \vec{\mu}) \right ]\]

where \(\vec{\mu}\) is an \(N\)-dimensional vector position of the mean of the density and \(\Sigma\) is the square N-by-N covariance matrix.

The first thing that we need to do is import the necessary modules:

import numpy as np

Then, we’ll code up a Python function that returns the density \(p(\vec{x})\) for specific values of \(\vec{x}\), \(\vec{\mu}\) and \(\Sigma^{-1}\). In fact, emcee actually requires the logarithm of \(p\). We’ll call it log_prob:

def log_prob(x, mu, icov):
    diff = x - mu
    return -0.5*np.dot(diff,np.dot(icov,diff))

It is important that the first argument of the probability function is the position of a single “walker” (a N dimensional numpy array). The following arguments are going to be constant every time the function is called and the values come from the args parameter of our EnsembleSampler that we’ll see soon.

Now, we’ll set up the specific values of those “hyperparameters” in 5 dimensions:

ndim = 5

np.random.seed(42)
means = np.random.rand(ndim)

cov = 0.5 - np.random.rand(ndim ** 2).reshape((ndim, ndim))
cov = np.triu(cov)
cov += cov.T - np.diag(cov.diagonal())
cov = np.dot(cov,cov)

and where cov is \(\Sigma\). Before going on, let’s compute the inverse of cov because that’s what we need in our probability function:

icov = np.linalg.inv(cov)

It’s probably overkill this time but how about we use 250 walkers? Before we go on, we need to guess a starting point for each of the 250 walkers. This position will be a 50-dimensional vector so the initial guess should be a 250-by-50 array—or a list of 250 arrays that each have 50 elements. It’s not a very good guess but we’ll just guess a random number between 0 and 1 for each component:

nwalkers = 250
p0 = np.random.rand(nwalkers, ndim)

Now that we’ve gotten past all the bookkeeping stuff, we can move on to the fun stuff. The main interface provided by emcee is the EnsembleSampler object so let’s get ourselves one of those:

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=[means, icov])

Remember how our function log_prob required two extra arguments when it was called? By setting up our sampler with the args argument, we’re saying that the probability function should be called as:

log_prob(p0[0], means, icov)

-2.5960945890854434

If we didn’t provide any args parameter, the calling sequence would be log_prob(p0[0]) instead.

It’s generally a good idea to run a few “burn-in” steps in your MCMC chain to let the walkers explore the parameter space a bit and get settled into the maximum of the density. We’ll run a burn-in of 100 steps (yep, I just made that number up… it’s hard to really know how many steps of burn-in you’ll need before you start) starting from our initial guess p0:

pos, prob, state = sampler.run_mcmc(p0, 100)
sampler.reset()

100%|██████████| 100/100 [00:00<00:00, 305.62it/s]

You’ll notice that I saved the final position of the walkers (after the 100 steps) to a variable called pos. You can check out what will be contained in the other output variables by looking at the documentation for the EnsembleSampler.run_mcmc() function. The call to the EnsembleSampler.reset() method clears all of the important bookkeeping parameters in the sampler so that we get a fresh start. It also clears the current positions of the walkers so it’s a good thing that we saved them first.

Now, we can do our production run of 10000 steps:

sampler.run_mcmc(pos, 10000);

100%|██████████| 10000/10000 [00:24<00:00, 407.28it/s]

The sampler now has a property EnsembleSampler.chain that is a numpy array with the shape (1000, 250, 50). Take note of that shape and make sure that you know where each of those numbers come from. Another useful object is the EnsembleSampler.flatchain which has the shape (250000, 50) and contains all the samples reshaped into a flat list. You can see now that we now have 250 000 unbiased samples of the density \(p(\vec{x})\). You can make histograms of these samples to get an estimate of the density that you were sampling:

Autocorrelation analysis & convergence

sampler.get_autocorr_time()

array([ 54.61620237,  53.72668829,  54.67029465,  54.80001017,  53.99357549])

import matplotlib.pyplot as plt

for i in range(3):
    plt.figure()
    plt.hist(sampler.flatchain[:,i], 100, color="k", histtype="step")
    plt.title("Dimension {0:d}".format(i))

Another good test of whether or not the sampling went well is to check the mean acceptance fraction of the ensemble using the EnsembleSampler.acceptance_fraction() property:

print("Mean acceptance fraction: {0:.3f}"
      .format(np.mean(sampler.acceptance_fraction)))

Mean acceptance fraction: 0.551

This number should be between approximately 0.25 and 0.5 if everything went as planned.

plt.plot(sampler.chain[:, :, 0]);

plt.plot(sampler.chain[:, :, -1]);