Blobs#
Way back in version 1.1 of emcee, the concept of blobs was introduced. This allows a user to track arbitrary metadata associated with every sample in the chain. The interface to access these blobs was previously a little clunky because it was stored as a list of lists of blobs. In version 3, this interface has been updated to use NumPy arrays instead and the sampler will do type inference to save the simplest possible representation of the blobs.
Anything that your log_prob
function returns in addition to the log
probability is assumed to be a blob and is tracked as part of blobs.
Put another way, if log_prob
returns more than one thing, all the things
after the first (which is assumed to be the log probability) are assumed to be
blobs.
If log_prob
returns -np.inf
for the log probability, the blobs are not
inspected or tracked so can be anything (but the correct number of arguments
must still be returned).
Using blobs to track the value of the prior#
A common pattern is to save the value of the log prior at every step in the
chain.
To do this, your log_prob
function should return your blobs (in this case log prior) as well as the log probability when called.
This can be implemented something like:
import emcee
import numpy as np
def log_prior(params):
return -0.5 * np.sum(params**2)
def log_like(params):
return -0.5 * np.sum((params / 0.1)**2)
def log_prob(params):
lp = log_prior(params)
if not np.isfinite(lp):
# log prior is not finite, return -np.inf for log probability
# and None for log prior as it won't be used anyway (but we
# must use the correct number of return values)
return -np.inf, None
ll = log_like(params)
if not np.isfinite(ll):
# log likelihood is not finite, return -np.inf for log
# probability and None for log prior (again, this value isn't
# used but we have to have the correct number of return values)
return -np.inf, None
# return log probability (sum of log prior and log likelihood)
# and log prior. Log prior will be saved as part of the blobs.
return lp + ll, lp
coords = np.random.randn(32, 3)
nwalkers, ndim = coords.shape
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(coords, 100)
log_prior_samps = sampler.get_blobs()
flat_log_prior_samps = sampler.get_blobs(flat=True)
print(log_prior_samps.shape) # (100, 32)
print(flat_log_prior_samps.shape) # (3200,)
As shown above, after running this, the “blobs” stored by the sampler will be
a (nsteps, nwalkers)
NumPy array with the value of the log prior at every
sample.
Named blobs & custom dtypes#
If you want to save multiple pieces of metadata, it can be useful to name
them.
To implement this, we use the blobs_dtype
argument in
EnsembleSampler
.
Using this is also helpful to specify types.
If you don’t provide blobs_dtype
, the dtype of the extra args is automatically guessed the first time log_prob
is called.
For example, let’s say that, for some reason, we wanted to save the mean of the parameters as well as the log prior. To do this, we would update the above example as follows:
def log_prob(params):
lp = log_prior(params)
if not np.isfinite(lp):
# As above, log prior is not finite, so return -np.inf for log
# probability and None for everything else (these values aren't
# used, but the number of return values must be correct)
return -np.inf, None, None
ll = log_like(params)
if not np.isfinite(ll):
# Log likelihood is not finite so return -np.inf for log
# probabilitiy and None for everything else (maintaining the
# correct number of return values)
return -np.inf, None, None
# All values are finite, so return desired blobs (in this case: log
# probability, log prior and mean of parameters)
return lp + ll, lp, np.mean(params)
coords = np.random.randn(32, 3)
nwalkers, ndim = coords.shape
# Here are the important lines for defining the blobs_dtype
dtype = [("log_prior", float), ("mean", float)]
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob,
blobs_dtype=dtype)
sampler.run_mcmc(coords, 100)
blobs = sampler.get_blobs()
log_prior_samps = blobs["log_prior"]
mean_samps = blobs["mean"]
print(log_prior_samps.shape) # (100, 32)
print(mean_samps.shape) # (100, 32)
flat_blobs = sampler.get_blobs(flat=True)
flat_log_prior_samps = flat_blobs["log_prior"]
flat_mean_samps = flat_blobs["mean"]
print(flat_log_prior_samps.shape) # (3200,)
print(flat_mean_samps.shape) # (3200,)
This will print
(100, 32)
(100, 32)
(3200,)
(3200,)
and the blobs
object will be a structured NumPy array with two columns
called log_prior
and mean
.