API

This page details the methods and classes provided by the emcee module. The main entry point is through the EnsembleSampler object.

The Affine-Invariant Ensemble Sampler

Standard usage of emcee involves instantiating an EnsembleSampler.

class emcee.EnsembleSampler(nwalkers, dim, lnpostfn, a=2.0, args=[], kwargs={}, postargs=None, threads=1, pool=None, live_dangerously=False, runtime_sortingfn=None)[source]

A generalized Ensemble sampler that uses 2 ensembles for parallelization. The __init__ function will raise an AssertionError if k < 2 * dim (and you haven’t set the live_dangerously parameter) or if k is odd.

Warning: The chain member of this object has the shape: (nwalkers, nlinks, dim) where nlinks is the number of steps taken by the chain and k is the number of walkers. Use the flatchain property to get the chain flattened to (nlinks, dim). For users of pre-1.0 versions, this shape is different so be careful!

Parameters:
  • nwalkers – The number of Goodman & Weare “walkers”.
  • dim – Number of dimensions in the parameter space.
  • lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
  • a – (optional) The proposal scale parameter. (default: 2.0)
  • args – (optional) A list of extra positional arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args, **kwargs).
  • kwargs – (optional) A list of extra keyword arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args, **kwargs).
  • postargs – (optional) Alias of args for backwards compatibility.
  • threads – (optional) The number of threads to use for parallelization. If threads == 1, then the multiprocessing module is not used but if threads > 1, then a Pool object is created and calls to lnpostfn are run in parallel.
  • pool – (optional) An alternative method of using the parallelized algorithm. If provided, the value of threads is ignored and the object provided by pool is used for all parallelization. It can be any object with a map method that follows the same calling sequence as the built-in map function.
  • runtime_sortingfn – (optional) A function implementing custom runtime load-balancing. See Loadbalancing in parallel runs for more information.
acceptance_fraction

An array (length: k) of the fraction of steps accepted for each walker.

acor

An estimate of the autocorrelation time for each parameter (length: dim).

blobs

Get the list of “blobs” produced by sampling. The result is a list (of length iterations) of list s (of length nwalkers) of arbitrary objects. Note: this will actually be an empty list if your lnpostfn doesn’t return any metadata.

chain

A pointer to the Markov chain itself. The shape of this array is (k, iterations, dim).

clear_blobs()[source]

Clear the blobs list.

clear_chain()

An alias for reset() kept for backwards compatibility.

flatchain

A shortcut for accessing chain flattened along the zeroth (walker) axis.

flatlnprobability

A shortcut to return the equivalent of lnprobability but aligned to flatchain rather than chain.

get_autocorr_time(low=10, high=None, step=1, c=10, fast=False)[source]

Compute an estimate of the autocorrelation time for each parameter (length: dim).

Parameters:
  • low – (Optional[int]) The minimum window size to test. (default: 10)
  • high – (Optional[int]) The maximum window size to test. (default: x.shape[axis] / (2*c))
  • step – (Optional[int]) The step size for the window search. (default: 1)
  • c – (Optional[float]) The minimum number of autocorrelation times needed to trust the estimate. (default: 10)
  • fast – (Optional[bool]) If True, only use the first 2^n (for the largest power) entries for efficiency. (default: False)
get_lnprob(p)

Return the log-probability at the given position.

lnprobability

A pointer to the matrix of the value of lnprobfn produced at each step for each walker. The shape is (k, iterations).

random_state

The state of the internal random number generator. In practice, it’s the result of calling get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.

reset()[source]

Clear the chain and lnprobability array. Also reset the bookkeeping parameters.

run_mcmc(pos0, N, rstate0=None, lnprob0=None, **kwargs)

Iterate sample() for N iterations and return the result.

Parameters:
  • pos0 – The initial position vector. Can also be None to resume from where :func:run_mcmc left off the last time it executed.
  • N – The number of steps to run.
  • lnprob0 – (optional) The log posterior probability at position p0. If lnprob is not provided, the initial value is calculated.
  • rstate0 – (optional) The state of the random number generator. See the random_state() property for details.
  • kwargs – (optional) Other parameters that are directly passed to sample().

This returns the results of the final sample in whatever form sample() yields. Usually, that’s: pos, lnprob, rstate, blobs (blobs optional)

sample(p0, lnprob0=None, rstate0=None, blobs0=None, iterations=1, thin=1, storechain=True, mh_proposal=None)[source]

Advance the chain iterations steps as a generator.

Parameters:
  • p0 – A list of the initial positions of the walkers in the parameter space. It should have the shape (nwalkers, dim).
  • lnprob0 – (optional) The list of log posterior probabilities for the walkers at positions given by p0. If lnprob is None, the initial values are calculated. It should have the shape (k, dim).
  • rstate0 – (optional) The state of the random number generator. See the Sampler.random_state property for details.
  • iterations – (optional) The number of steps to run.
  • thin – (optional) If you only want to store and yield every thin samples in the chain, set thin to an integer greater than 1.
  • storechain – (optional) By default, the sampler stores (in memory) the positions and log-probabilities of the samples in the chain. If you are using another method to store the samples to a file or if you don’t need to analyse the samples after the fact (for burn-in for example) set storechain to False.
  • mh_proposal – (optional) A function that returns a list of positions for nwalkers walkers given a current list of positions of the same size. See utils.MH_proposal_axisaligned for an example.

At each iteration, this generator yields:

  • pos - A list of the current positions of the walkers in the parameter space. The shape of this object will be (nwalkers, dim).
  • lnprob - The list of log posterior probabilities for the walkers at positions given by pos . The shape of this object is (nwalkers, dim).
  • rstate - The current state of the random number generator.
  • blobs - (optional) The metadata “blobs” associated with the current position. The value is only returned if lnpostfn returns blobs too.

The Parallel-Tempered Ensemble Sampler

The PTSampler class performs a parallel-tempered ensemble sampling using EnsembleSampler to sample within each temperature. This sort of sampling is useful if you expect your distribution to be multi-modal. Take a look at the documentation to see how you might use this class.

class emcee.PTSampler(ntemps, nwalkers, dim, logl, logp, threads=1, pool=None, betas=None, a=2.0, Tmax=None, loglargs=[], logpargs=[], loglkwargs={}, logpkwargs={})[source]

A parallel-tempered ensemble sampler, using EnsembleSampler for sampling within each parallel chain.

Parameters:
  • ntemps – The number of temperatures. Can be None, in which case the Tmax argument sets the maximum temperature.
  • nwalkers – The number of ensemble walkers at each temperature.
  • dim – The dimension of parameter space.
  • logl – The log-likelihood function.
  • logp – The log-prior function.
  • threads – (optional) The number of parallel threads to use in sampling.
  • pool – (optional) Alternative to threads. Any object that implements a map method compatible with the built-in map will do here. For example, multi.Pool will do.
  • betas – (optional) Array giving the inverse temperatures, \(\beta=1/T\), used in the ladder. The default is chosen so that a Gaussian posterior in the given number of dimensions will have a 0.25 tswap acceptance rate.
  • a – (optional) Proposal scale factor.
  • Tmax – (optional) Maximum temperature for the ladder. If ntemps is None, this argument is used to set the temperature ladder.
  • loglargs – (optional) Positional arguments for the log-likelihood function.
  • logpargs – (optional) Positional arguments for the log-prior function.
  • loglkwargs – (optional) Keyword arguments for the log-likelihood function.
  • logpkwargs – (optional) Keyword arguments for the log-prior function.
acceptance_fraction

Matrix of shape (Ntemps, Nwalkers) detailing the acceptance fraction for each walker.

acor

Returns a matrix of autocorrelation lengths for each parameter in each temperature of shape (Ntemps, Ndim).

betas

Returns the sequence of inverse temperatures in the ladder.

chain

Returns the stored chain of samples; shape (Ntemps, Nwalkers, Nsteps, Ndim).

clear_chain()

An alias for reset() kept for backwards compatibility.

flatchain

Returns the stored chain, but flattened along the walker axis, so of shape (Ntemps, Nwalkers*Nsteps, Ndim).

get_autocorr_time(**kwargs)[source]

Returns a matrix of autocorrelation lengths for each parameter in each temperature of shape (Ntemps, Ndim).

Any arguments will be passed to autocorr.integrate_time().

get_lnprob(p)

Return the log-probability at the given position.

lnlikelihood

Matrix of ln-likelihood values; shape (Ntemps, Nwalkers, Nsteps).

lnprobability

Matrix of lnprobability values; shape (Ntemps, Nwalkers, Nsteps).

random_state

The state of the internal random number generator. In practice, it’s the result of calling get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.

reset()[source]

Clear the chain, lnprobability, lnlikelihood, acceptance_fraction, tswap_acceptance_fraction stored properties.

run_mcmc(pos0, N, rstate0=None, lnprob0=None, **kwargs)

Iterate sample() for N iterations and return the result.

Parameters:
  • pos0 – The initial position vector. Can also be None to resume from where :func:run_mcmc left off the last time it executed.
  • N – The number of steps to run.
  • lnprob0 – (optional) The log posterior probability at position p0. If lnprob is not provided, the initial value is calculated.
  • rstate0 – (optional) The state of the random number generator. See the random_state() property for details.
  • kwargs – (optional) Other parameters that are directly passed to sample().

This returns the results of the final sample in whatever form sample() yields. Usually, that’s: pos, lnprob, rstate, blobs (blobs optional)

sample(p0, lnprob0=None, lnlike0=None, iterations=1, thin=1, storechain=True)[source]

Advance the chains iterations steps as a generator.

Parameters:
  • p0 – The initial positions of the walkers. Shape should be (ntemps, nwalkers, dim).
  • lnprob0 – (optional) The initial posterior values for the ensembles. Shape (ntemps, nwalkers).
  • lnlike0 – (optional) The initial likelihood values for the ensembles. Shape (ntemps, nwalkers).
  • iterations – (optional) The number of iterations to preform.
  • thin – (optional) The number of iterations to perform between saving the state to the internal chain.
  • storechain – (optional) If True store the iterations in the chain property.

At each iteration, this generator yields

  • p, the current position of the walkers.
  • lnprob the current posterior values for the walkers.
  • lnlike the current likelihood values for the walkers.
thermodynamic_integration_log_evidence(logls=None, fburnin=0.1)[source]

Thermodynamic integration estimate of the evidence.

Parameters:
  • logls – (optional) The log-likelihoods to use for computing the thermodynamic evidence. If None (the default), use the stored log-likelihoods in the sampler. Should be of shape (Ntemps, Nwalkers, Nsamples).
  • fburnin – (optional) The fraction of the chain to discard as burnin samples; only the final 1-fburnin fraction of the samples will be used to compute the evidence; the default is fburnin = 0.1.
Return (lnZ, dlnZ)(lnZ, dlnZ):
 

Returns an estimate of the log-evidence and the error associated with the finite number of temperatures at which the posterior has been sampled.

The evidence is the integral of the un-normalized posterior over all of parameter space:

\[Z \equiv \int d\theta \, l(\theta) p(\theta)\]

Thermodymanic integration is a technique for estimating the evidence integral using information from the chains at various temperatures. Let

\[Z(\beta) = \int d\theta \, l^\beta(\theta) p(\theta)\]

Then

\[\frac{d \ln Z}{d \beta} = \frac{1}{Z(\beta)} \int d\theta l^\beta p \ln l = \left \langle \ln l \right \rangle_\beta\]

so

\[\ln Z(\beta = 1) = \int_0^1 d\beta \left \langle \ln l \right\rangle_\beta\]

By computing the average of the log-likelihood at the difference temperatures, the sampler can approximate the above integral.

tswap_acceptance_fraction

Returns an array of accepted temperature swap fractions for each temperature; shape (ntemps, ).

Standard Metropolis-Hastings Sampler

The Metropolis-Hastings sampler included in this module is far from fine-tuned and optimized. It is, however, stable and it has a consistent API so it can be useful for testing and comparison.

class emcee.MHSampler(cov, *args, **kwargs)[source]

The most basic possible Metropolis-Hastings style MCMC sampler

Parameters:
  • cov – The covariance matrix to use for the proposal distribution.
  • dim – Number of dimensions in the parameter space.
  • lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
  • args – (optional) A list of extra positional arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args, **kwargs).
  • kwargs – (optional) A list of extra keyword arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args, **kwargs).
acceptance_fraction

The fraction of proposed steps that were accepted.

acor

An estimate of the autocorrelation time for each parameter (length: dim).

chain

A pointer to the Markov chain.

clear_chain()

An alias for reset() kept for backwards compatibility.

flatchain

Alias of chain provided for compatibility.

get_autocorr_time(low=10, high=None, step=1, c=10, fast=False)[source]

Compute an estimate of the autocorrelation time for each parameter (length: dim).

Parameters:
  • low – (Optional[int]) The minimum window size to test. (default: 10)
  • high – (Optional[int]) The maximum window size to test. (default: x.shape[axis] / (2*c))
  • step – (Optional[int]) The step size for the window search. (default: 1)
  • c – (Optional[float]) The minimum number of autocorrelation times needed to trust the estimate. (default: 10)
  • fast – (Optional[bool]) If True, only use the first 2^n (for the largest power) entries for efficiency. (default: False)
get_lnprob(p)

Return the log-probability at the given position.

lnprobability

A list of the log-probability values associated with each step in the chain.

random_state

The state of the internal random number generator. In practice, it’s the result of calling get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.

run_mcmc(pos0, N, rstate0=None, lnprob0=None, **kwargs)

Iterate sample() for N iterations and return the result.

Parameters:
  • pos0 – The initial position vector. Can also be None to resume from where :func:run_mcmc left off the last time it executed.
  • N – The number of steps to run.
  • lnprob0 – (optional) The log posterior probability at position p0. If lnprob is not provided, the initial value is calculated.
  • rstate0 – (optional) The state of the random number generator. See the random_state() property for details.
  • kwargs – (optional) Other parameters that are directly passed to sample().

This returns the results of the final sample in whatever form sample() yields. Usually, that’s: pos, lnprob, rstate, blobs (blobs optional)

sample(p0, lnprob=None, randomstate=None, thin=1, storechain=True, iterations=1)[source]

Advances the chain iterations steps as an iterator

Parameters:
  • p0 – The initial position vector.
  • lnprob0 – (optional) The log posterior probability at position p0. If lnprob is not provided, the initial value is calculated.
  • rstate0 – (optional) The state of the random number generator. See the random_state() property for details.
  • iterations – (optional) The number of steps to run.
  • thin – (optional) If you only want to store and yield every thin samples in the chain, set thin to an integer greater than 1.
  • storechain – (optional) By default, the sampler stores (in memory) the positions and log-probabilities of the samples in the chain. If you are using another method to store the samples to a file or if you don’t need to analyse the samples after the fact (for burn-in for example) set storechain to False.

At each iteration, this generator yields:

  • pos - The current positions of the chain in the parameter space.
  • lnprob - The value of the log posterior at pos .
  • rstate - The current state of the random number generator.

Abstract Sampler Object

This section is mostly for developers who would be interested in implementing a new sampler for inclusion in emcee. A good starting point would be to subclass the sampler object and override the Sampler.sample() method.

class emcee.Sampler(dim, lnprobfn, args=[], kwargs={})[source]

An abstract sampler object that implements various helper functions

Parameters:
  • dim – The number of dimensions in the parameter space.
  • lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
  • args – (optional) A list of extra positional arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args, **kwargs).
  • kwargs – (optional) A list of extra keyword arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args, **kwargs).
acceptance_fraction

The fraction of proposed steps that were accepted.

chain

A pointer to the Markov chain.

clear_chain()[source]

An alias for reset() kept for backwards compatibility.

flatchain

Alias of chain provided for compatibility.

get_lnprob(p)[source]

Return the log-probability at the given position.

lnprobability

A list of the log-probability values associated with each step in the chain.

random_state

The state of the internal random number generator. In practice, it’s the result of calling get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.

reset()[source]

Clear chain, lnprobability and the bookkeeping parameters.

run_mcmc(pos0, N, rstate0=None, lnprob0=None, **kwargs)[source]

Iterate sample() for N iterations and return the result.

Parameters:
  • pos0 – The initial position vector. Can also be None to resume from where :func:run_mcmc left off the last time it executed.
  • N – The number of steps to run.
  • lnprob0 – (optional) The log posterior probability at position p0. If lnprob is not provided, the initial value is calculated.
  • rstate0 – (optional) The state of the random number generator. See the random_state() property for details.
  • kwargs – (optional) Other parameters that are directly passed to sample().

This returns the results of the final sample in whatever form sample() yields. Usually, that’s: pos, lnprob, rstate, blobs (blobs optional)

Autocorrelation Analysis

A good heuristic for assessing convergence of samplings is the integrated autocorrelation time. emcee includes (as of version 2.1.0) tools for computing this and the autocorrelation function itself.

emcee.autocorr.integrated_time(x, low=10, high=None, step=1, c=10, full_output=False, axis=0, fast=False)[source]

Estimate the integrated autocorrelation time of a time series.

This estimate uses the iterative procedure described on page 16 of Sokal’s notes to determine a reasonable window size.

Args:
x: The time series. If multidimensional, set the time axis using the
axis keyword argument and the function will be computed for every other axis.

low (Optional[int]): The minimum window size to test. (default: 10) high (Optional[int]): The maximum window size to test. (default:

x.shape[axis] / (2*c))
step (Optional[int]): The step size for the window search. (default:
1)
c (Optional[float]): The minimum number of autocorrelation times
needed to trust the estimate. (default: 10)
full_output (Optional[bool]): Return the final window size as well as
the autocorrelation time. (default: False)
axis (Optional[int]): The time axis of x. Assumed to be the first
axis if not specified.
fast (Optional[bool]): If True, only use the first 2^n (for
the largest power) entries for efficiency. (default: False)
Returns:
float or array: An estimate of the integrated autocorrelation time of
the time series x computed along the axis axis.
Optional[int]: The final window size that was used. Only returned if
full_output is True.
Raises
AutocorrError: If the autocorrelation time can’t be reliably estimated
from the chain. This normally means that the chain is too short.
emcee.autocorr.function(x, axis=0, fast=False)[source]

Estimate the autocorrelation function of a time series using the FFT.

Args:
x: The time series. If multidimensional, set the time axis using the
axis keyword argument and the function will be computed for every other axis.
axis (Optional[int]): The time axis of x. Assumed to be the first
axis if not specified.
fast (Optional[bool]): If True, only use the first 2^n (for
the largest power) entries for efficiency. (default: False)
Returns:
array: The autocorrelation function of the time series.

Utilities

emcee.utils.sample_ball(p0, std, size=1)[source]

Produce a ball of walkers around an initial parameter value.

Parameters:
  • p0 – The initial parameter value.
  • std – The axis-aligned standard deviation.
  • size – The number of samples to produce.
class emcee.utils.MH_proposal_axisaligned(stdev)[source]

A Metropolis-Hastings proposal, with axis-aligned Gaussian steps, for convenient use as the mh_proposal option to EnsembleSampler.sample() .

Pools

These are some helper classes for using the built-in parallel version of the algorithm. These objects can be initialized and then passed into the constructor for the EnsembleSampler object using the pool keyword argument.

Interruptible Pool

Python’s multiprocessing.Pool class doesn’t interact well with KeyboardInterrupt signals, as documented in places such as:

Various workarounds have been shared. Here, we adapt the one proposed in the last link above, by John Reese, and shared as

Our version is a drop-in replacement for multiprocessing.Pool … as long as the map() method is the only one that needs to be interrupt-friendly.

Contributed by Peter K. G. Williams <peter@newton.cx>.

Added in version 2.1.0

class emcee.interruptible_pool.InterruptiblePool(processes=None, initializer=None, initargs=(), **kwargs)[source]

A modified version of multiprocessing.pool.Pool that has better behavior with regard to KeyboardInterrupts in the map() method.

Parameters:
  • processes – (optional) The number of worker processes to use; defaults to the number of CPUs.
  • initializer – (optional) Either None, or a callable that will be invoked by each worker process when it starts.
  • initargs – (optional) Arguments for initializer; it will be called as initializer(*initargs).
  • kwargs – (optional) Extra arguments. Python 2.7 supports a maxtasksperchild parameter.
map(func, iterable, chunksize=None)[source]

Equivalent of map() built-in, without swallowing KeyboardInterrupt.

Parameters:
  • func – The function to apply to the items.
  • iterable – An iterable of items that will have func applied to them.

MPI Pool

Built-in support for MPI distributed systems. See the documentation: Using MPI to distribute the computations.

class emcee.utils.MPIPool(comm=None, debug=False, loadbalance=False)[source]

A pool that distributes tasks over a set of MPI processes. MPI is an API for distributed memory parallelism. This pool will let you run emcee without shared memory, letting you use much larger machines with emcee.

The pool only support the map() method at the moment because this is the only functionality that emcee needs. That being said, this pool is fairly general and it could be used for other purposes.

Contributed by Joe Zuntz.

Parameters:
  • comm – (optional) The mpi4py communicator.
  • debug – (optional) If True, print out a lot of status updates at each step.
  • loadbalance – (optional) if True and ntask > Ncpus, tries to loadbalance by sending out one task to each cpu first and then sending out the rest as the cpus get done.
bcast(*args, **kwargs)[source]

Equivalent to mpi4py bcast() collective operation.

close()[source]

Just send a message off to all the pool members which contains the special _close_pool_message sentinel.

is_master()[source]

Is the current process the master?

map(function, tasks)[source]

Like the built-in map() function, apply a function to all of the values in a list and return the list of results.

Parameters:
  • function – The function to apply to the list.
  • tasks – The list of elements.
wait()[source]

If this isn’t the master process, wait for instructions.