API¶
This page details the methods and classes provided by the emcee
module.
The main entry point is through the EnsembleSampler
object.
The Affine-Invariant Ensemble Sampler¶
Standard usage of emcee
involves instantiating an
EnsembleSampler
.
-
class
emcee.
EnsembleSampler
(nwalkers, dim, lnpostfn, a=2.0, args=[], kwargs={}, postargs=None, threads=1, pool=None, live_dangerously=False, runtime_sortingfn=None)[source]¶ A generalized Ensemble sampler that uses 2 ensembles for parallelization. The
__init__
function will raise anAssertionError
ifk < 2 * dim
(and you haven’t set thelive_dangerously
parameter) or ifk
is odd.Warning: The
chain
member of this object has the shape:(nwalkers, nlinks, dim)
wherenlinks
is the number of steps taken by the chain andk
is the number of walkers. Use theflatchain
property to get the chain flattened to(nlinks, dim)
. For users of pre-1.0 versions, this shape is different so be careful!Parameters: - nwalkers – The number of Goodman & Weare “walkers”.
- dim – Number of dimensions in the parameter space.
- lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
- a – (optional)
The proposal scale parameter. (default:
2.0
) - args – (optional)
A list of extra positional arguments for
lnpostfn
.lnpostfn
will be called with the sequencelnpostfn(p, *args, **kwargs)
. - kwargs – (optional)
A list of extra keyword arguments for
lnpostfn
.lnpostfn
will be called with the sequencelnpostfn(p, *args, **kwargs)
. - postargs – (optional)
Alias of
args
for backwards compatibility. - threads – (optional)
The number of threads to use for parallelization. If
threads == 1
, then themultiprocessing
module is not used but ifthreads > 1
, then aPool
object is created and calls tolnpostfn
are run in parallel. - pool – (optional)
An alternative method of using the parallelized algorithm. If
provided, the value of
threads
is ignored and the object provided bypool
is used for all parallelization. It can be any object with amap
method that follows the same calling sequence as the built-inmap
function. - runtime_sortingfn – (optional) A function implementing custom runtime load-balancing. See Loadbalancing in parallel runs for more information.
-
acceptance_fraction
¶ An array (length:
k
) of the fraction of steps accepted for each walker.
-
acor
¶ An estimate of the autocorrelation time for each parameter (length:
dim
).
-
blobs
¶ Get the list of “blobs” produced by sampling. The result is a list (of length
iterations
) oflist
s (of lengthnwalkers
) of arbitrary objects. Note: this will actually be an empty list if yourlnpostfn
doesn’t return any metadata.
-
chain
¶ A pointer to the Markov chain itself. The shape of this array is
(k, iterations, dim)
.
-
flatchain
¶ A shortcut for accessing chain flattened along the zeroth (walker) axis.
-
flatlnprobability
¶ A shortcut to return the equivalent of
lnprobability
but aligned toflatchain
rather thanchain
.
-
get_autocorr_time
(low=10, high=None, step=1, c=10, fast=False)[source]¶ Compute an estimate of the autocorrelation time for each parameter (length:
dim
).Parameters: - low – (Optional[int])
The minimum window size to test.
(default:
10
) - high – (Optional[int])
The maximum window size to test.
(default:
x.shape[axis] / (2*c)
) - step – (Optional[int])
The step size for the window search.
(default:
1
) - c – (Optional[float])
The minimum number of autocorrelation times needed to trust the
estimate.
(default:
10
) - fast – (Optional[bool])
If
True
, only use the first2^n
(for the largest power) entries for efficiency. (default: False)
- low – (Optional[int])
The minimum window size to test.
(default:
-
get_lnprob
(p)¶ Return the log-probability at the given position.
-
lnprobability
¶ A pointer to the matrix of the value of
lnprobfn
produced at each step for each walker. The shape is(k, iterations)
.
-
random_state
¶ The state of the internal random number generator. In practice, it’s the result of calling
get_state()
on anumpy.random.mtrand.RandomState
object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.
-
run_mcmc
(pos0, N, rstate0=None, lnprob0=None, **kwargs)¶ Iterate
sample()
forN
iterations and return the result.Parameters: - pos0 – The initial position vector. Can also be None to resume from
where :func:
run_mcmc
left off the last time it executed. - N – The number of steps to run.
- lnprob0 – (optional)
The log posterior probability at position
p0
. Iflnprob
is not provided, the initial value is calculated. - rstate0 – (optional)
The state of the random number generator. See the
random_state()
property for details. - kwargs – (optional)
Other parameters that are directly passed to
sample()
.
This returns the results of the final sample in whatever form
sample()
yields. Usually, that’s:pos
,lnprob
,rstate
,blobs
(blobs optional)- pos0 – The initial position vector. Can also be None to resume from
where :func:
-
sample
(p0, lnprob0=None, rstate0=None, blobs0=None, iterations=1, thin=1, storechain=True, mh_proposal=None)[source]¶ Advance the chain
iterations
steps as a generator.Parameters: - p0 – A list of the initial positions of the walkers in the
parameter space. It should have the shape
(nwalkers, dim)
. - lnprob0 – (optional)
The list of log posterior probabilities for the walkers at
positions given by
p0
. Iflnprob is None
, the initial values are calculated. It should have the shape(k, dim)
. - rstate0 – (optional)
The state of the random number generator.
See the
Sampler.random_state
property for details. - iterations – (optional) The number of steps to run.
- thin – (optional)
If you only want to store and yield every
thin
samples in the chain, set thin to an integer greater than 1. - storechain – (optional)
By default, the sampler stores (in memory) the positions and
log-probabilities of the samples in the chain. If you are
using another method to store the samples to a file or if you
don’t need to analyse the samples after the fact (for burn-in
for example) set
storechain
toFalse
. - mh_proposal – (optional)
A function that returns a list of positions for
nwalkers
walkers given a current list of positions of the same size. Seeutils.MH_proposal_axisaligned
for an example.
At each iteration, this generator yields:
pos
- A list of the current positions of the walkers in the parameter space. The shape of this object will be(nwalkers, dim)
.lnprob
- The list of log posterior probabilities for the walkers at positions given bypos
. The shape of this object is(nwalkers, dim)
.rstate
- The current state of the random number generator.blobs
- (optional) The metadata “blobs” associated with the current position. The value is only returned iflnpostfn
returns blobs too.
- p0 – A list of the initial positions of the walkers in the
parameter space. It should have the shape
The Parallel-Tempered Ensemble Sampler¶
The PTSampler
class performs a parallel-tempered ensemble
sampling using EnsembleSampler
to sample within each
temperature. This sort of sampling is useful if you expect your
distribution to be multi-modal. Take a look at the documentation to see how you might use this class.
-
class
emcee.
PTSampler
(ntemps, nwalkers, dim, logl, logp, threads=1, pool=None, betas=None, a=2.0, Tmax=None, loglargs=[], logpargs=[], loglkwargs={}, logpkwargs={})[source]¶ A parallel-tempered ensemble sampler, using
EnsembleSampler
for sampling within each parallel chain.Parameters: - ntemps – The number of temperatures. Can be
None
, in which case theTmax
argument sets the maximum temperature. - nwalkers – The number of ensemble walkers at each temperature.
- dim – The dimension of parameter space.
- logl – The log-likelihood function.
- logp – The log-prior function.
- threads – (optional) The number of parallel threads to use in sampling.
- pool – (optional)
Alternative to
threads
. Any object that implements amap
method compatible with the built-inmap
will do here. For example,multi.Pool
will do. - betas – (optional) Array giving the inverse temperatures, \(\beta=1/T\), used in the ladder. The default is chosen so that a Gaussian posterior in the given number of dimensions will have a 0.25 tswap acceptance rate.
- a – (optional) Proposal scale factor.
- Tmax – (optional)
Maximum temperature for the ladder. If
ntemps
isNone
, this argument is used to set the temperature ladder. - loglargs – (optional) Positional arguments for the log-likelihood function.
- logpargs – (optional) Positional arguments for the log-prior function.
- loglkwargs – (optional) Keyword arguments for the log-likelihood function.
- logpkwargs – (optional) Keyword arguments for the log-prior function.
-
acceptance_fraction
¶ Matrix of shape
(Ntemps, Nwalkers)
detailing the acceptance fraction for each walker.
-
acor
¶ Returns a matrix of autocorrelation lengths for each parameter in each temperature of shape
(Ntemps, Ndim)
.
-
betas
¶ Returns the sequence of inverse temperatures in the ladder.
-
chain
¶ Returns the stored chain of samples; shape
(Ntemps, Nwalkers, Nsteps, Ndim)
.
-
flatchain
¶ Returns the stored chain, but flattened along the walker axis, so of shape
(Ntemps, Nwalkers*Nsteps, Ndim)
.
-
get_autocorr_time
(**kwargs)[source]¶ Returns a matrix of autocorrelation lengths for each parameter in each temperature of shape
(Ntemps, Ndim)
.Any arguments will be passed to
autocorr.integrate_time()
.
-
get_lnprob
(p)¶ Return the log-probability at the given position.
-
lnlikelihood
¶ Matrix of ln-likelihood values; shape
(Ntemps, Nwalkers, Nsteps)
.
-
lnprobability
¶ Matrix of lnprobability values; shape
(Ntemps, Nwalkers, Nsteps)
.
-
random_state
¶ The state of the internal random number generator. In practice, it’s the result of calling
get_state()
on anumpy.random.mtrand.RandomState
object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.
-
reset
()[source]¶ Clear the
chain
,lnprobability
,lnlikelihood
,acceptance_fraction
,tswap_acceptance_fraction
stored properties.
-
run_mcmc
(pos0, N, rstate0=None, lnprob0=None, **kwargs)¶ Iterate
sample()
forN
iterations and return the result.Parameters: - pos0 – The initial position vector. Can also be None to resume from
where :func:
run_mcmc
left off the last time it executed. - N – The number of steps to run.
- lnprob0 – (optional)
The log posterior probability at position
p0
. Iflnprob
is not provided, the initial value is calculated. - rstate0 – (optional)
The state of the random number generator. See the
random_state()
property for details. - kwargs – (optional)
Other parameters that are directly passed to
sample()
.
This returns the results of the final sample in whatever form
sample()
yields. Usually, that’s:pos
,lnprob
,rstate
,blobs
(blobs optional)- pos0 – The initial position vector. Can also be None to resume from
where :func:
-
sample
(p0, lnprob0=None, lnlike0=None, iterations=1, thin=1, storechain=True)[source]¶ Advance the chains
iterations
steps as a generator.Parameters: - p0 – The initial positions of the walkers. Shape should be
(ntemps, nwalkers, dim)
. - lnprob0 – (optional)
The initial posterior values for the ensembles. Shape
(ntemps, nwalkers)
. - lnlike0 – (optional)
The initial likelihood values for the ensembles. Shape
(ntemps, nwalkers)
. - iterations – (optional) The number of iterations to preform.
- thin – (optional) The number of iterations to perform between saving the state to the internal chain.
- storechain – (optional)
If
True
store the iterations in thechain
property.
At each iteration, this generator yields
p
, the current position of the walkers.lnprob
the current posterior values for the walkers.lnlike
the current likelihood values for the walkers.
- p0 – The initial positions of the walkers. Shape should be
-
thermodynamic_integration_log_evidence
(logls=None, fburnin=0.1)[source]¶ Thermodynamic integration estimate of the evidence.
Parameters: - logls – (optional) The log-likelihoods to use for
computing the thermodynamic evidence. If
None
(the default), use the stored log-likelihoods in the sampler. Should be of shape(Ntemps, Nwalkers, Nsamples)
. - fburnin – (optional)
The fraction of the chain to discard as burnin samples; only the
final
1-fburnin
fraction of the samples will be used to compute the evidence; the default isfburnin = 0.1
.
Return (lnZ, dlnZ) (lnZ, dlnZ)
:Returns an estimate of the log-evidence and the error associated with the finite number of temperatures at which the posterior has been sampled.
The evidence is the integral of the un-normalized posterior over all of parameter space:
\[Z \equiv \int d\theta \, l(\theta) p(\theta)\]Thermodymanic integration is a technique for estimating the evidence integral using information from the chains at various temperatures. Let
\[Z(\beta) = \int d\theta \, l^\beta(\theta) p(\theta)\]Then
\[\frac{d \ln Z}{d \beta} = \frac{1}{Z(\beta)} \int d\theta l^\beta p \ln l = \left \langle \ln l \right \rangle_\beta\]so
\[\ln Z(\beta = 1) = \int_0^1 d\beta \left \langle \ln l \right\rangle_\beta\]By computing the average of the log-likelihood at the difference temperatures, the sampler can approximate the above integral.
- logls – (optional) The log-likelihoods to use for
computing the thermodynamic evidence. If
-
tswap_acceptance_fraction
¶ Returns an array of accepted temperature swap fractions for each temperature; shape
(ntemps, )
.
- ntemps – The number of temperatures. Can be
Standard Metropolis-Hastings Sampler¶
The Metropolis-Hastings sampler included in this module is far from fine-tuned and optimized. It is, however, stable and it has a consistent API so it can be useful for testing and comparison.
-
class
emcee.
MHSampler
(cov, *args, **kwargs)[source]¶ The most basic possible Metropolis-Hastings style MCMC sampler
Parameters: - cov – The covariance matrix to use for the proposal distribution.
- dim – Number of dimensions in the parameter space.
- lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
- args – (optional)
A list of extra positional arguments for
lnpostfn
.lnpostfn
will be called with the sequencelnpostfn(p, *args, **kwargs)
. - kwargs – (optional)
A list of extra keyword arguments for
lnpostfn
.lnpostfn
will be called with the sequencelnpostfn(p, *args, **kwargs)
.
-
acceptance_fraction
¶ The fraction of proposed steps that were accepted.
-
acor
¶ An estimate of the autocorrelation time for each parameter (length:
dim
).
-
chain
¶ A pointer to the Markov chain.
-
clear_chain
()¶ An alias for
reset()
kept for backwards compatibility.
-
flatchain
¶ Alias of
chain
provided for compatibility.
-
get_autocorr_time
(low=10, high=None, step=1, c=10, fast=False)[source]¶ Compute an estimate of the autocorrelation time for each parameter (length:
dim
).Parameters: - low – (Optional[int])
The minimum window size to test.
(default:
10
) - high – (Optional[int])
The maximum window size to test.
(default:
x.shape[axis] / (2*c)
) - step – (Optional[int])
The step size for the window search.
(default:
1
) - c – (Optional[float])
The minimum number of autocorrelation times needed to trust the
estimate.
(default:
10
) - fast – (Optional[bool])
If
True
, only use the first2^n
(for the largest power) entries for efficiency. (default: False)
- low – (Optional[int])
The minimum window size to test.
(default:
-
get_lnprob
(p)¶ Return the log-probability at the given position.
-
lnprobability
¶ A list of the log-probability values associated with each step in the chain.
-
random_state
¶ The state of the internal random number generator. In practice, it’s the result of calling
get_state()
on anumpy.random.mtrand.RandomState
object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.
-
run_mcmc
(pos0, N, rstate0=None, lnprob0=None, **kwargs)¶ Iterate
sample()
forN
iterations and return the result.Parameters: - pos0 – The initial position vector. Can also be None to resume from
where :func:
run_mcmc
left off the last time it executed. - N – The number of steps to run.
- lnprob0 – (optional)
The log posterior probability at position
p0
. Iflnprob
is not provided, the initial value is calculated. - rstate0 – (optional)
The state of the random number generator. See the
random_state()
property for details. - kwargs – (optional)
Other parameters that are directly passed to
sample()
.
This returns the results of the final sample in whatever form
sample()
yields. Usually, that’s:pos
,lnprob
,rstate
,blobs
(blobs optional)- pos0 – The initial position vector. Can also be None to resume from
where :func:
-
sample
(p0, lnprob=None, randomstate=None, thin=1, storechain=True, iterations=1)[source]¶ Advances the chain
iterations
steps as an iteratorParameters: - p0 – The initial position vector.
- lnprob0 – (optional)
The log posterior probability at position
p0
. Iflnprob
is not provided, the initial value is calculated. - rstate0 – (optional)
The state of the random number generator. See the
random_state()
property for details. - iterations – (optional) The number of steps to run.
- thin – (optional)
If you only want to store and yield every
thin
samples in the chain, set thin to an integer greater than 1. - storechain – (optional)
By default, the sampler stores (in memory) the positions and
log-probabilities of the samples in the chain. If you are
using another method to store the samples to a file or if you
don’t need to analyse the samples after the fact (for burn-in
for example) set
storechain
toFalse
.
At each iteration, this generator yields:
pos
- The current positions of the chain in the parameter space.lnprob
- The value of the log posterior atpos
.rstate
- The current state of the random number generator.
Abstract Sampler Object¶
This section is mostly for developers who would be interested in implementing
a new sampler for inclusion in emcee
. A good starting point would be
to subclass the sampler object and override the Sampler.sample()
method.
-
class
emcee.
Sampler
(dim, lnprobfn, args=[], kwargs={})[source]¶ An abstract sampler object that implements various helper functions
Parameters: - dim – The number of dimensions in the parameter space.
- lnpostfn – A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.
- args – (optional)
A list of extra positional arguments for
lnpostfn
.lnpostfn
will be called with the sequencelnpostfn(p, *args, **kwargs)
. - kwargs – (optional)
A list of extra keyword arguments for
lnpostfn
.lnpostfn
will be called with the sequencelnpostfn(p, *args, **kwargs)
.
-
acceptance_fraction
¶ The fraction of proposed steps that were accepted.
-
chain
¶ A pointer to the Markov chain.
-
flatchain
¶ Alias of
chain
provided for compatibility.
-
lnprobability
¶ A list of the log-probability values associated with each step in the chain.
-
random_state
¶ The state of the internal random number generator. In practice, it’s the result of calling
get_state()
on anumpy.random.mtrand.RandomState
object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.
-
run_mcmc
(pos0, N, rstate0=None, lnprob0=None, **kwargs)[source]¶ Iterate
sample()
forN
iterations and return the result.Parameters: - pos0 – The initial position vector. Can also be None to resume from
where :func:
run_mcmc
left off the last time it executed. - N – The number of steps to run.
- lnprob0 – (optional)
The log posterior probability at position
p0
. Iflnprob
is not provided, the initial value is calculated. - rstate0 – (optional)
The state of the random number generator. See the
random_state()
property for details. - kwargs – (optional)
Other parameters that are directly passed to
sample()
.
This returns the results of the final sample in whatever form
sample()
yields. Usually, that’s:pos
,lnprob
,rstate
,blobs
(blobs optional)- pos0 – The initial position vector. Can also be None to resume from
where :func:
Autocorrelation Analysis¶
A good heuristic for assessing convergence of samplings is the integrated
autocorrelation time. emcee
includes (as of version 2.1.0) tools for
computing this and the autocorrelation function itself.
-
emcee.autocorr.
integrated_time
(x, low=10, high=None, step=1, c=10, full_output=False, axis=0, fast=False)[source]¶ Estimate the integrated autocorrelation time of a time series.
This estimate uses the iterative procedure described on page 16 of Sokal’s notes to determine a reasonable window size.
- Args:
- x: The time series. If multidimensional, set the time axis using the
axis
keyword argument and the function will be computed for every other axis.
low (Optional[int]): The minimum window size to test. (default:
10
) high (Optional[int]): The maximum window size to test. (default:x.shape[axis] / (2*c)
)- step (Optional[int]): The step size for the window search. (default:
1
)- c (Optional[float]): The minimum number of autocorrelation times
- needed to trust the estimate. (default:
10
) - full_output (Optional[bool]): Return the final window size as well as
- the autocorrelation time. (default:
False
) - axis (Optional[int]): The time axis of
x
. Assumed to be the first - axis if not specified.
- fast (Optional[bool]): If
True
, only use the first2^n
(for - the largest power) entries for efficiency. (default: False)
- Returns:
- float or array: An estimate of the integrated autocorrelation time of
- the time series
x
computed along the axisaxis
. - Optional[int]: The final window size that was used. Only returned if
full_output
isTrue
.
- Raises
- AutocorrError: If the autocorrelation time can’t be reliably estimated
- from the chain. This normally means that the chain is too short.
-
emcee.autocorr.
function
(x, axis=0, fast=False)[source]¶ Estimate the autocorrelation function of a time series using the FFT.
- Args:
- x: The time series. If multidimensional, set the time axis using the
axis
keyword argument and the function will be computed for every other axis.- axis (Optional[int]): The time axis of
x
. Assumed to be the first - axis if not specified.
- fast (Optional[bool]): If
True
, only use the first2^n
(for - the largest power) entries for efficiency. (default: False)
- Returns:
- array: The autocorrelation function of the time series.
Utilities¶
Pools¶
These are some helper classes for using the built-in parallel version of the
algorithm. These objects can be initialized and then passed into the
constructor for the EnsembleSampler
object using the pool
keyword
argument.
Interruptible Pool¶
Python’s multiprocessing.Pool class doesn’t interact well with
KeyboardInterrupt
signals, as documented in places such as:
- http://stackoverflow.com/questions/1408356/
- http://stackoverflow.com/questions/11312525/
- http://noswap.com/blog/python-multiprocessing-keyboardinterrupt
Various workarounds have been shared. Here, we adapt the one proposed in the last link above, by John Reese, and shared as
Our version is a drop-in replacement for multiprocessing.Pool … as long as the map() method is the only one that needs to be interrupt-friendly.
Contributed by Peter K. G. Williams <peter@newton.cx>.
Added in version 2.1.0
-
class
emcee.interruptible_pool.
InterruptiblePool
(processes=None, initializer=None, initargs=(), **kwargs)[source]¶ A modified version of
multiprocessing.pool.Pool
that has better behavior with regard toKeyboardInterrupts
in themap()
method.Parameters: - processes – (optional) The number of worker processes to use; defaults to the number of CPUs.
- initializer – (optional)
Either
None
, or a callable that will be invoked by each worker process when it starts. - initargs – (optional)
Arguments for initializer; it will be called as
initializer(*initargs)
. - kwargs – (optional)
Extra arguments. Python 2.7 supports a
maxtasksperchild
parameter.
MPI Pool¶
Built-in support for MPI distributed systems. See the documentation: Using MPI to distribute the computations.
-
class
emcee.utils.
MPIPool
(comm=None, debug=False, loadbalance=False)[source]¶ A pool that distributes tasks over a set of MPI processes. MPI is an API for distributed memory parallelism. This pool will let you run emcee without shared memory, letting you use much larger machines with emcee.
The pool only support the
map()
method at the moment because this is the only functionality that emcee needs. That being said, this pool is fairly general and it could be used for other purposes.Contributed by Joe Zuntz.
Parameters: - comm – (optional)
The
mpi4py
communicator. - debug – (optional)
If
True
, print out a lot of status updates at each step. - loadbalance – (optional)
if
True
and ntask > Ncpus, tries to loadbalance by sending out one task to each cpu first and then sending out the rest as the cpus get done.
-
close
()[source]¶ Just send a message off to all the pool members which contains the special
_close_pool_message
sentinel.
- comm – (optional)
The