| Title: | Simulation Based Calibration for Bayesian models |
|---|---|
| Description: | SBC helps perform Simulation Based Calibration on Bayesian models. SBC lets you check for bugs in your model code and/or algorithm that fits the model. SBC focuses on models built with 'Stan' <https://mc-stan.org>, but can support other modelling languages as well. |
| Authors: | Martin Modrák [aut] (ORCID: <https://orcid.org/0000-0002-8886-7797>), Shinyoung Kim [aut], Angie.H Moon [aut, cre], Teemu Säilynoja [aut], Luna Fazio [ctb], Timo Stenz [ctb] |
| Maintainer: | Angie.H Moon <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.5.0.0 |
| Built: | 2026-03-10 15:50:33 UTC |
| Source: | https://github.com/hyunjimoon/SBC |
SBC_datasets object.Subset an SBC_datasets object.
## S3 method for class 'SBC_datasets' x[indices]## S3 method for class 'SBC_datasets' x[indices]
Subset the results.
## S3 method for class 'SBC_results' x[indices]## S3 method for class 'SBC_results' x[indices]
indices |
integer indices or a binary vector of the same length as the number fits, selecting which fits to keep. |
Get attribute presence for a vector of variable names.
attribute_present(attribute, variable_names, var_attr)attribute_present(attribute, variable_names, var_attr)
attribute |
the attribute to test |
variable_names |
names of variable to test for the attribute |
var_attr |
an object defining the attributes, see variable-attributes. |
a logical vector the same length as variable_names
$stats element of the results
for presence of an attribute.Check an attribute in the $stats element of the results
for presence of an attribute.
attribute_present_stats(attribute, x)attribute_present_stats(attribute, x)
Obtain estimate of binary prediction calibration and the associated uncertainty interval for further processing or plot it directly.
binary_calibration_from_bp( bp, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL ) plot_binary_calibration_diff( x, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL, prob_histogram = TRUE ) ## S3 method for class 'SBC_results' plot_binary_calibration_diff(res, ...) ## S3 method for class 'data.frame' plot_binary_calibration_diff( bp, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL, prob_histogram = TRUE ) plot_binary_calibration( x, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL, prob_histogram = TRUE ) ## S3 method for class 'SBC_results' plot_binary_calibration(res, ...) ## S3 method for class 'data.frame' plot_binary_calibration( bp, type = c("reliabilitydiag", "calibrationband"), ..., region.position = NULL, prob_histogram = TRUE )binary_calibration_from_bp( bp, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL ) plot_binary_calibration_diff( x, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL, prob_histogram = TRUE ) ## S3 method for class 'SBC_results' plot_binary_calibration_diff(res, ...) ## S3 method for class 'data.frame' plot_binary_calibration_diff( bp, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL, prob_histogram = TRUE ) plot_binary_calibration( x, type = c("reliabilitydiag", "calibrationband"), alpha = 0.05, ..., region.position = NULL, prob_histogram = TRUE ) ## S3 method for class 'SBC_results' plot_binary_calibration(res, ...) ## S3 method for class 'data.frame' plot_binary_calibration( bp, type = c("reliabilitydiag", "calibrationband"), ..., region.position = NULL, prob_histogram = TRUE )
bp |
the binary probabilities — typically obtained with
|
type |
the type of calibration uncertainty bands to compute, see details. |
alpha |
the level associated with the confidence intervals reports |
... |
additional arguments passed to
|
region.position |
for |
prob_histogram |
Whether a histogram of the observed probabilities should be overlaid with the calibration curve. |
res |
An SBC_results object |
When type = "reliabilitydiag", the intervals are based on
reliabilitydiag::reliabilitydiag() and depending on region.position
can be centered on the null distribution of perfect calibration or on the
estimated calibration.
When type = "calibrationband" the intervals
are around the estimated calibration using calibrationband::calibration_bands()
— in our experience the calibrationband
method has less sensitivity to detect miscalibration, but they require
somewhat weaker assumptions.
binary_calibration_from_bp returns a data.frame with columns variable, prob, estimate, low and high,
for each variable, it contains an estimate + confidence interval across a range
of probabilities (in equal steps). The plot_ methods return a ggplot2
object showing either the calibration curve or the difference between
the calibration curve and perfect calibration (the diagonal)
Takes all variables marked with binary_var_attribute() in the
statistics (the $stats field of an SBC_results object)
and transforms the reported CDF values into probabilities
that the underlying binary variable is 1.
binary_probabilities_from_stats(stats)binary_probabilities_from_stats(stats)
To support exact binary probabilities, a backend must implement SBC_posterior_cdf(),
as exact probabilities cannot be derived from samples.
A data.frame with at least the columns variable and prob.
Combine multiple datasets together.
bind_datasets(...)bind_datasets(...)
... |
datasets to bind |
Combine two lists of derived quantities
bind_derived_quantities(dq1, dq2)bind_derived_quantities(dq1, dq2)
Combine two sets globals for use in derived quantities or backend
bind_globals(globals1, globals2)bind_globals(globals1, globals2)
compute_SBC(), derived_quantities()
Primarily useful for iteratively adding more simulations to your SBC check.
bind_results(...)bind_results(...)
... |
objects of type |
An example usage can be found in the small_model_workflow vignette.
Dimitriadis et al. propose several tests based on comparing actual predictions to predictions when the probabilities are calibrated. This yields several possible tests of correctly calibrated predictions (i.e. that the expected proportion of true values matches the predicted probability).
brier_resampling_test(x, y, alpha = 0.05, B = 10000) brier_resampling_p(x, y, B = 10000) binary_miscalibration(x, y) miscalibration_resampling_p(x, y, B = 10000) miscalibration_resampling_test(x, y, alpha = 0.05, B = 10000)brier_resampling_test(x, y, alpha = 0.05, B = 10000) brier_resampling_p(x, y, B = 10000) binary_miscalibration(x, y) miscalibration_resampling_p(x, y, B = 10000) miscalibration_resampling_test(x, y, alpha = 0.05, B = 10000)
x |
the predicted success probabilities |
y |
the actual observed outcomes (just 0 or 1) |
alpha |
the type I error rate for the test |
B |
number of boostrap samples for the null distribution |
The brier_ functions represent a test based on brier score, while
the miscalibration_ functions represent a test based on miscalibration.
In both cases we evaluate the null distribution via bootstrapping.
brier_resampling_test and miscalibration_resampling_test return
an object of class htest, brier_resampling_p and miscalibration_resampling_p
return just the p-value (for easier use with automated workflows).
binary_miscalibration computes just the miscalibration component using
the PAV (pool adjacent violators) algorithm.
T. Dimitriadis, T. Gneiting, & A.I. Jordan,
Stable reliability diagrams for probabilistic classifiers, Proc. Natl. Acad. Sci. U.S.A. 118 (8) e2016191118, https://doi.org/10.1073/pnas.2016191118 (2021).
This is to allow you to directly access the cache, should you need to.
For normal fits, the cache file will always be a list readable by readRDS and have
element $fit for the actual fit. Other elements will contain the
text output, messages and warnings emitted by the fit.
cached_fit_filename(cache_dir, backend, generated, suffix = "")cached_fit_filename(cache_dir, backend, generated, suffix = "")
This is also used internally to cache some extra data (notably bridgesampling
results) using a non-empty suffix string.
Calculate prior standard deviation of a dataset
calculate_prior_sd(datasets)calculate_prior_sd(datasets)
a named vector of prior SDs
When there are ties (e.g. for discrete variables), the rank is currently drawn stochastically
among the ties. Depending on na_lowest,
NA is assumed to be equal to any value.
calculate_ranks_draws_matrix(variables, dm, params = NULL)calculate_ranks_draws_matrix(variables, dm, params = NULL)
variables |
a vector of values to check |
dm |
draws_matrix of the fit (assumed to be already thinned if that was necessary) |
params |
DEPRECATED. Use |
Produces a message for each failed check.
check_all_SBC_diagnostics(results)check_all_SBC_diagnostics(results)
TRUE if all checks are OK, FALSE otherwise.
combine_all_variables will merge all variables in a single plot, while
combine_array_elements will merge all elements of any array into a single
panel of the plot
combine_all_variables(x) combine_array_elements(x)combine_all_variables(x) combine_array_elements(x)
x |
parameter names |
Combine two named lists and overwrite elements with the same name using the value from args2
combine_args(args1, args2)combine_args(args1, args2)
Combine multiple sets of variable attributes.
combine_var_attributes(...)combine_var_attributes(...)
... |
the individual |
It is currently by design that multiple copies of an attribute are kept
Compute derived quantities based on given data and posterior draws.
compute_dquants(draws, generated, dquants, gen_quants = NULL)compute_dquants(draws, generated, dquants, gen_quants = NULL)
gen_quants |
Deprecated, use |
Performs the main SBC routine given datasets and a backend.
compute_SBC( datasets, backend, cores_per_fit = default_cores_per_fit(length(datasets)), keep_fits = TRUE, thin_ranks = SBC_backend_default_thin_ranks(backend), ensure_num_ranks_divisor = 2, chunk_size = default_chunk_size(length(datasets)), dquants = NULL, cache_mode = "none", cache_location = NULL, globals = list(), gen_quants = NULL )compute_SBC( datasets, backend, cores_per_fit = default_cores_per_fit(length(datasets)), keep_fits = TRUE, thin_ranks = SBC_backend_default_thin_ranks(backend), ensure_num_ranks_divisor = 2, chunk_size = default_chunk_size(length(datasets)), dquants = NULL, cache_mode = "none", cache_location = NULL, globals = list(), gen_quants = NULL )
datasets |
an object of class |
backend |
the model + sampling algorithm. The built-in backends can be constructed
using |
cores_per_fit |
how many cores should the backend be allowed to use for a single fit?
Defaults to the maximum number that does not produce more parallel chains
than you have cores. See |
keep_fits |
boolean, when |
thin_ranks |
how much thinning should be applied to posterior draws before computing ranks for SBC. Should be large enough to avoid any noticeable autocorrelation of the thinned draws See details below. |
ensure_num_ranks_divisor |
Potentially drop some posterior samples to ensure that this number divides the total number of SBC ranks (see Details). |
chunk_size |
How many simulations within the |
dquants |
Derived quantities to include in SBC. Use |
cache_mode |
Type of caching of results, currently the only supported modes are
|
cache_location |
The filesystem location of cache. For |
globals |
A list of names of objects that are defined
in the global environment and need to present for the backend to work (
if they are not already available in package).
It is added to the |
gen_quants |
Deprecated, use dquants instead |
An object of class SBC_results().
Parallel processing is supported via the future package, for most uses, it is most sensible
to just call plan(multisession) once in your R session and all
cores your computer will be used. For more details refer to the documentation
of the future package.
On some Linux systems, using future.mirai::mirai_multisession may give
performance boosts.
This function supports progress reporting with the
progressr package.
Run progressr::handlers(global = TRUE) to enable progress reporting
globally, see the progressr docs for more options.
When using backends based on MCMC, there are two possible moments when
draws may need to be thinned. They can be thinned directly within the backend
and they may be thinned only to compute the ranks for SBC as specified by the
thin_ranks argument. The main reason those are separate is that computing the
ranks requires no or negligible autocorrelation while some autocorrelation
may be easily tolerated for summarising the fit results or assessing convergence.
In fact, thinning too aggressively in the backend may lead to overly noisy
estimates of posterior means, quantiles and the posterior::rhat() and
posterior::ess_tail() diagnostics. So for well-adapted Hamiltonian Monte-Carlo
chains (e.g. Stan-based backends), we recommend no thinning in the backend and
even value of thin_ranks between 6 and 10 is usually sufficient to remove
the residual autocorrelation. For a backend based on Metropolis-Hastings,
it might be sensible to thin quite aggressively already in the backend and
then have some additional thinning via thin_ranks.
Backends that don't require thining should implement SBC_backend_iid_draws()
or SBC_backend_default_thin_ranks() to avoid thinning by default.
Some of the visualizations and post processing steps
we use in the SBC package (e.g. plot_rank_hist(), empirical_coverage())
work best if the total number of possible SBC ranks is a "nice" number
(lots of divisors).
However, the number of ranks is one plus the number of posterior samples
after thinning - therefore as long as the number of samples is a "nice"
number, the number of ranks usually will not be. To remedy this, you can
specify ensure_num_ranks_divisor - the method will drop at most
ensure_num_ranks_divisor - 1 samples to make the number of ranks divisible
by ensure_num_ranks_divisor. The default 2 prevents the most annoying
pathologies while discarding at most a single sample.
Maybe not export in the end? Useful for debugging
data_for_ecdf_plots(x, ..., prob = 0.95, gamma = NULL, K = NULL)data_for_ecdf_plots(x, ..., prob = 0.95, gamma = NULL, K = NULL)
By default will make every worker process a single chunk.
You can set the options(SBC.min_chunk_size = value) to enforce a minimum
chunk size globally (chunk size can still be larger if you have substantially more
fits to run than workers.
default_chunk_size(n_fits, n_workers = future::nbrOfWorkers())default_chunk_size(n_fits, n_workers = future::nbrOfWorkers())
When parallel processing is disabled, this just returns the number of available
cores. Otherwise, it chooses the largest integer that keeps
cores_per_fit * (n_fits / chunk_size) <= total_cores, i.e. it avoids
running so many chains in parallel that there will be more chains than cores.
default_cores_per_fit( n_fits, total_cores = future::availableCores(), chunk_size = default_chunk_size(n_fits) )default_cores_per_fit( n_fits, total_cores = future::availableCores(), chunk_size = default_chunk_size(n_fits) )
When the expression contains non-library functions/objects, and parallel processing
is enabled, those must be
named in the .globals parameter (hopefully we'll be able to detect those
automatically in the future). Note that recompute_SBC_statistics() currently
does not use parallel processing, so .globals don't need to be set.
derived_quantities(..., .var_attributes = NULL, .globals = list())derived_quantities(..., .var_attributes = NULL, .globals = list())
... |
named expressions representing the quantitites |
.var_attributes |
a |
.globals |
A list of names of objects that are defined
in the global environment and need to present for the gen. quants. to evaluate.
It is added to the |
# Derived quantity computing the total log likelihood of a normal distribution # with known sd = 1 normal_lpdf <- function(y, mu, sigma) { sum(dnorm(y, mean = mu, sd = sigma, log = TRUE)) } # Note the use of .globals to make the normal_lpdf function available # within the expression log_lik_dq <- derived_quantities(log_lik = normal_lpdf(y, mu, 1), .globals = "normal_lpdf" )# Derived quantity computing the total log likelihood of a normal distribution # with known sd = 1 normal_lpdf <- function(y, mu, sigma) { sum(dnorm(y, mean = mu, sd = sigma, log = TRUE)) } # Note the use of .globals to make the normal_lpdf function available # within the expression log_lik_dq <- derived_quantities(log_lik = normal_lpdf(y, mu, 1), .globals = "normal_lpdf" )
var_attributes() object associated with the derived quantitiesGet the var_attributes() object associated with the derived quantities
dquants_var_attributes(dquants)dquants_var_attributes(dquants)
Uses ranks to compute coverage and surrounding uncertainty of posterior credible intervals.
The uncertainty is only approximate (treating coverage for each interval as a set of independent
Bernoulli trials, while in fact they are not independent), so for making claims on presence/
absence of detectable discrepancies we strongly recommend using plot_ecdf() or plot_ecdf_diff().
The uncertainty about the coverage can however be useful for guiding decisions on whether
more SBC steps should be performed (i.e. whether we can rule out that the coverage of
the given backend differs too much for our purposes from the optimal value).
empirical_coverage(stats, width, prob = 0.95, interval_type = "central")empirical_coverage(stats, width, prob = 0.95, interval_type = "central")
stats |
a data.frame of rank statistics (e.g. as returned in the |
width |
a vector of values between 0 and 1 representing widths of credible intervals for which we compute coverage. |
prob |
determines width of the uncertainty interval around the observed coverage |
inteval_type |
|
Note that while coverage of central posterior intervals (with the default type = "central")
is often of the biggest practical interest, perfect calibration of central intervals
still leaves space for substantial problems with the model (e.g. if the posterior 25% - 50% intervals
contain 50% of the true values and the posterior 50% - 75% interval never contains the true value,
the central 50% interval still has the ideal 50% coverage), so investigating central
intervals should always be accompanied by checks with plot_ecdf() or plot_ecdf_diff()
or by using type = "leftmost", because if all leftmost credible intervals are well calibrated,
then all intervals are well calibrated.
A data.frame with columns variable, width (width of the interval as given
in the width parameter), width_represented the closest width that can be represented by
the ranks in the input (any discrepancy needs to be judged against this rather than width),
estimate - observed coverage for the interval, ci_low, ci_high the uncertainty
interval around estimate (width of the interval is given by the prob argument).
$stats element of the resultsExtract arguments of a given attribute from the $stats element of the results
extract_attribute_arguments_stats(attribute_name, x)extract_attribute_arguments_stats(attribute_name, x)
Test a null hypothesis about the mean of i.i.d. samples. The test is based on Gaffke 2005, though a more detailed analysis and exposition can be found in Learned-Miller & Thomas 2020.
gaffke_ci(probs, B = 10000, alpha = 0.05) gaffke_p( probs, mu = 0.5, alpha = 0.05, B = 10000, alternative = c("two.sided", "less", "greater") ) gaffke_test( x, mu = 0.5, alpha = 0.05, lb = 0, ub = 1, B = 10000, alternative = c("two.sided", "less", "greater") )gaffke_ci(probs, B = 10000, alpha = 0.05) gaffke_p( probs, mu = 0.5, alpha = 0.05, B = 10000, alternative = c("two.sided", "less", "greater") ) gaffke_test( x, mu = 0.5, alpha = 0.05, lb = 0, ub = 1, B = 10000, alternative = c("two.sided", "less", "greater") )
B |
number of bootstrap samples for the null distribution |
alpha |
the level of the test |
mu |
the mean under null hypothesis |
alternative |
the alternative for the test. |
x |
a vector of observed values |
lb |
the lower bound for |
ub |
the upper bound for |
The test is expected to be valid for any bounded distribution without further assumptions. The test has been proven valid only for special cases but no counterexample is known despite some efforts in the literature to find some. The test also provides way more power (and tighter confidence intervals) than other non-parametric tests for bounded means - in fact its power converges quite quickly to that of the t-test, which is known to be optimal in the large data limit.
In SBC the test is useful for testing the data-averaged posterior for binary variables.
gaffke_test returns an object of class htest, gaffke_p and
gaffke_ci return just the p-value / CI as numeric for easier use in batch
workflows.
Gaffke, N. (2005). “Three test statistics for a nonparametric one-sided hypothesis on the mean of a nonnegative variable.” Mathematical Methods of Statistics, 14(4): 451–467.
Learned-Miller, E. and Thomas, P. S. (2020). “A New Confidence Interval for the Mean of a Bounded Random Variable.” https://arxiv.org/abs/1905.06208
Generate datasets.
generate_datasets(generator, n_sims, n_datasets = NULL)generate_datasets(generator, n_sims, n_datasets = NULL)
generator |
a generator object - build e.g. via |
n_sims |
the number of simulated datasets to use |
n_datasets |
DEPRECATED, use |
object of class SBC_datasets
Extract stats for a submodel from a Bayes factor results
get_stats_for_submodel(stats, submodel_id)get_stats_for_submodel(stats, submodel_id)
plot_rank_hist().Guess the number of bins for plot_rank_hist().
guess_rank_hist_bins(max_rank, N)guess_rank_hist_bins(max_rank, N)
max_rank |
the maximum rank observed |
N |
the number of ranks observed |
The rationale for this plot and its interpretation is explained in Mike Betancourt's Towards A Principled Bayesian Workflow.
plot_contraction( x, prior_sd, variables = NULL, scale = "sd", alpha = 0.8, ..., parameters = NULL ) ## S3 method for class 'data.frame' plot_contraction( x, prior_sd, variables = NULL, scale = "sd", alpha = 0.8, show_hidden = FALSE, parameters = NULL )plot_contraction( x, prior_sd, variables = NULL, scale = "sd", alpha = 0.8, ..., parameters = NULL ) ## S3 method for class 'data.frame' plot_contraction( x, prior_sd, variables = NULL, scale = "sd", alpha = 0.8, show_hidden = FALSE, parameters = NULL )
x |
object containing results (a data.frame or |
prior_sd |
a named vector of prior standard deviations for your variables.
Either pass in analytically obtained values or use |
variables |
variables to show in the plot or |
scale |
which scale of variability you want to see - either |
alpha |
the alpha for the points |
parameters |
DEPRECATED, use |
|
Show variables marked with |
a ggplot2 plot object
plot_coverage will plot the observed coverage,
while plot_coverage_diff will show the difference between observed
and expected coverage.
Please refer to empirical_coverage() for details on computation
and limitations of this plot as well as details on the arguments.
See vignette("rank_visualizations") for
more details.
plot_coverage( x, variables = NULL, prob = 0.95, interval_type = "central", ..., parameters = NULL, max_points = NULL ) plot_coverage_diff( x, variables = NULL, prob = 0.95, interval_type = "central", ..., parameters = NULL, max_points = NULL )plot_coverage( x, variables = NULL, prob = 0.95, interval_type = "central", ..., parameters = NULL, max_points = NULL ) plot_coverage_diff( x, variables = NULL, prob = 0.95, interval_type = "central", ..., parameters = NULL, max_points = NULL )
x |
object containing results (a data.frame or |
variables |
variables to show in the plot or |
prob |
the with of the uncertainty interval to be shown |
parameters |
DEPRECATED. Use |
max_points |
maximum number of points where to evaluate the coverage.
If set to |
a ggplot2 plot object
empirical_coverage
See vignette("rank_visualizations") for
more details.
See the methods for data_for_ecdf_plots() for available data formats.
plot_ecdf( x, variables = NULL, K = NULL, gamma = NULL, prob = 0.95, size = 1, alpha = 0.33, combine_variables = NULL, ecdf_alpha = NULL, show_hidden = FALSE, facet_args = list(), ..., parameters = NULL ) plot_ecdf_diff( x, variables = NULL, K = NULL, gamma = NULL, prob = 0.95, size = 1, alpha = 0.33, combine_variables = NULL, ecdf_alpha = NULL, show_hidden = FALSE, facet_args = list(), ..., parameters = NULL )plot_ecdf( x, variables = NULL, K = NULL, gamma = NULL, prob = 0.95, size = 1, alpha = 0.33, combine_variables = NULL, ecdf_alpha = NULL, show_hidden = FALSE, facet_args = list(), ..., parameters = NULL ) plot_ecdf_diff( x, variables = NULL, K = NULL, gamma = NULL, prob = 0.95, size = 1, alpha = 0.33, combine_variables = NULL, ecdf_alpha = NULL, show_hidden = FALSE, facet_args = list(), ..., parameters = NULL )
x |
object supporting the |
variables |
optional subset of variables to show in the plot |
K |
number of uniformly spaced evaluation points for the ECDF or ECDFs. Affects
the granularity of the plot and can significantly speed up the computation
of the simultaneous confidence bands. Default value is chosen heuristically.
You can also use |
gamma |
TODO |
prob |
the width of the plotted confidence interval for the ECDF. |
size |
size (linewidth) passed to |
alpha |
alpha level of the confidence band |
combine_variables |
optionally specify a named list where each entry is a character vectors which specifies a group of variables that will be displayed in the same panel. Panel title will be the name of the list element. A function that takes a character vector as an input and produces a list can also be specified (see combine-functions). |
ecdf_alpha |
the alpha level of the empirical CDF. Can be either a single number or
a function taking the number of variables that were combined (when |
|
Show variables marked with |
|
facet_args |
extra arguments for the call to |
... |
additional arguments passed to |
parameters |
DEPRECATED, use |
arxiv::1903.08008 by A. Vehtari et al.
The expected uniform distribution and an approximate confidence interval
is also shown. The confidence interval cannot be taken too seriously
as it is derived assuming the bins are independent (which they are not).
The plot_ecdf() and plot_ecdf_diff() plots provide better confidence interval
but are somewhat less interpretable. See vignette("rank_visualizations") for
more details.
plot_rank_hist( x, variables = NULL, bins = NULL, prob = 0.95, ..., parameters = NULL ) ## S3 method for class 'data.frame' plot_rank_hist( x, variables = NULL, bins = NULL, prob = 0.95, max_rank = x$max_rank, show_hidden = FALSE, parameters = NULL, facet_args = list() )plot_rank_hist( x, variables = NULL, bins = NULL, prob = 0.95, ..., parameters = NULL ) ## S3 method for class 'data.frame' plot_rank_hist( x, variables = NULL, bins = NULL, prob = 0.95, max_rank = x$max_rank, show_hidden = FALSE, parameters = NULL, facet_args = list() )
x |
Object supporting the plotting method. |
variables |
Names of variables to show |
bins |
number of bins to be used in the histogram, if left unspecified,
it is determined by |
prob |
The width of the approximate confidence interval shown. |
|
Show variables marked with |
|
facet_args |
extra arguments for the call to |
By default the support is for SBC_results objects and data frames in the same
format as the $stats element of SBC_results.
Plot the simulated "true" values versus posterior estimates
plot_sim_estimated( x, variables = NULL, estimate = "mean", uncertainty = c("q5", "q95"), alpha = NULL, ..., parameters = NULL ) ## S3 method for class 'data.frame' plot_sim_estimated( x, variables = NULL, estimate = "mean", uncertainty = c("q5", "q95"), alpha = NULL, show_hidden = FALSE, parameters = NULL )plot_sim_estimated( x, variables = NULL, estimate = "mean", uncertainty = c("q5", "q95"), alpha = NULL, ..., parameters = NULL ) ## S3 method for class 'data.frame' plot_sim_estimated( x, variables = NULL, estimate = "mean", uncertainty = c("q5", "q95"), alpha = NULL, show_hidden = FALSE, parameters = NULL )
x |
object containing results (a data.frame or |
variables |
variables to show in the plot or |
estimate |
which estimate to use for the central tendency,
must correspond a field already computed in the results (most likely |
uncertainty |
which estimates to use for uncertainty (a character vector of length 2)
must correspond a field already computed in the results. Pass |
alpha |
the alpha for the points and uncertainty intervals |
parameters |
DEPRECATED, use |
|
Show variables marked with |
a ggplot2 plot object
Custom rbind implementation maintainig information about submodels
## S3 method for class 'SBC_bridgesampling_diagnostics' rbind(...)## S3 method for class 'SBC_bridgesampling_diagnostics' rbind(...)
Useful for example to recompute SBC ranks with a different choice of thin_ranks
or added derived quantities.
recompute_SBC_statistics( old_results, datasets, backend, thin_ranks = SBC_backend_default_thin_ranks(backend), ensure_num_ranks_divisor = 2, dquants = NULL, gen_quants = NULL )recompute_SBC_statistics( old_results, datasets, backend, thin_ranks = SBC_backend_default_thin_ranks(backend), ensure_num_ranks_divisor = 2, dquants = NULL, gen_quants = NULL )
datasets |
an object of class |
backend |
backend used to fit the results. Used to pull various defaults and other setting influencing the computation of statistics. |
thin_ranks |
how much thinning should be applied to posterior draws before computing ranks for SBC. Should be large enough to avoid any noticeable autocorrelation of the thinned draws See details below. |
ensure_num_ranks_divisor |
Potentially drop some posterior samples to ensure that this number divides the total number of SBC ranks (see Details). |
dquants |
Derived quantities to include in SBC. Use |
gen_quants |
Deprecated, use dquants instead |
An S3 object of class SBC_results with updated $stats and $default_diagnostics fields.
data.frame column with attributes.Remove an attribute from a data.frame column with attributes.
remove_attribute_from_stats(attribute, x)remove_attribute_from_stats(attribute, x)
By design, only one copy of an attribute is removed, if there are multiple copies
Backend using bridgesampling to do Bayes factor calculation (and Bayesian Model Averaging) over two candidate models.
SBC_backend_bridgesampling( ..., model_var = "model", prior_probs = NULL, prior_prob1 = NULL, bridgesampling_args = list() )SBC_backend_bridgesampling( ..., model_var = "model", prior_probs = NULL, prior_prob1 = NULL, bridgesampling_args = list() )
... |
passed to |
brms package.Build a backend based on the brms package.
SBC_backend_brms( ..., template_data, out_stan_file = NULL, template_dataset = NULL )SBC_backend_brms( ..., template_data, out_stan_file = NULL, template_dataset = NULL )
... |
arguments passed to |
template_data |
a representative value for the |
out_stan_file |
A filename for the generated Stan code. Useful for debugging and for avoiding unnecessary recompilations. |
template_dataset |
DEPRECATED. Use |
SBC_generator_brms
object.Build a brms backend, reusing the compiled model from a previously created SBC_generator_brms
object.
SBC_backend_brms_from_generator(generator, ...)SBC_backend_brms_from_generator(generator, ...)
The cache is stored in files named based on dataset and underlying
backend hashes, within the given directory, you can use
cached_fit_filename() to manually access the cache.
SBC_backend_cached(cache_dir, backend)SBC_backend_cached(cache_dir, backend)
cache_dir |
directory where the cache files (one file per fit) will be stored |
backend |
any other backend that does the actual computation |
The cached fit will try to re-emit all output, warnings and messages. The ordering within each category is preserved, but the relative ordering of e.g. outputs vs. messages (and all other combinations) may not be.
cmdstanr.Backend based on sampling via cmdstanr.
SBC_backend_cmdstan_sample(model, ..., init_factory = NULL)SBC_backend_cmdstan_sample(model, ..., init_factory = NULL)
model |
an object of class |
... |
other arguments passed to the |
init_factory |
an optional function that takes in a dataset and returns a value that
can be passed to the |
cmdstanr.Backend based on variational approximation via cmdstanr.
SBC_backend_cmdstan_variational(model, ..., n_retries_init = 1)SBC_backend_cmdstan_variational(model, ..., n_retries_init = 1)
model |
an object of class |
... |
other arguments passed to the |
n_retries_init |
number of times to retry the variational fit if the algorithm has trouble initializing (e.g. too many dropped evaluations (see https://discourse.mc-stan.org/t/advi-too-many-dropped-evaluations-even-for-well-behaved-models/24338), or "cannot compute ELBO using the initial variational distribution") |
The default implementation plays it relatively safe and returns 10, unless
SBC_backend_iid_draws() returns TRUE in which case it returns 1.
SBC_backend_default_thin_ranks(backend) ## Default S3 method: SBC_backend_default_thin_ranks(backend)SBC_backend_default_thin_ranks(backend) ## Default S3 method: SBC_backend_default_thin_ranks(backend)
A backend for Bayes factor workflow with BayesFactor::extractBF.
SBC_backend_extractBF_comparison(backend_H0, backend_H1, model_var = "model")SBC_backend_extractBF_comparison(backend_H0, backend_H1, model_var = "model")
If the function returns a draws_matrix object, no other
work is necessary to make it work with SBC. Useful for quick tests.
SBC_backend_function( func, generated_arg = "generated", cores_arg = NULL, args = list(), iid_draws = FALSE, default_thin_ranks = 10 )SBC_backend_function( func, generated_arg = "generated", cores_arg = NULL, args = list(), iid_draws = FALSE, default_thin_ranks = 10 )
func |
the function that will be called in |
generated_arg |
name of the argument of |
cores_arg |
name of the argument of |
args |
a (named) list of additional arguments to the function |
iid_draws |
does the result of the backend have independent identically
distribute draws (will be returned by |
default_thin_ranks |
suggested thinning if user does not specify any (will be returned by |
# Generate t-distributed variables as a ratio of standard normal (z) # and transformed chi-squared (v) variables. # What is the conditional distribution of z if t is observed? # See https://math.stackexchange.com/a/5085538/423833 for derivation that it # is generalized gamma distribution. Here we test this is correct. library(ggplot2) theme_set(theme_minimal()) N_sims <- 100 df <- 5 z <- rnorm(N_sims) v <- rchisq(N_sims, df = df) t <- z / sqrt(v/df) # Bundle in a dataset with extra quantities my_data <- SBC_datasets( variables = posterior::draws_matrix (z = z, v = v, lik = abs(z) * dchisq(df * z^2 / t^2, df = df)), generated = purrr::map(t, \(t) list(t = t, df = df))) # Main workhorse function my_post_func <- function(generated) { df <- generated$df t <- generated$t gg_d <- df + 1 gg_p <- 2 gg_a <- 1/sqrt(df/(2*t^2) + 0.5) # Transform to parametrization used by ggamma gg_b <- gg_p gg_k <- gg_d / gg_p abs_z <- ggamma::rggamma(1000, gg_a, gg_b, gg_k) v <- df * abs_z^2 / t^2 lik = abs_z * dchisq(df * abs_z^2 / t^2, df = df) posterior::draws_matrix(z = abs_z * sign(generated$t), v = v, lik = lik) } backend <- SBC_backend_function(my_post_func, iid_draws = TRUE) res <- compute_SBC(my_data, backend, keep_fits = FALSE) plot_ecdf_diff(res) plot_sim_estimated(res)# Generate t-distributed variables as a ratio of standard normal (z) # and transformed chi-squared (v) variables. # What is the conditional distribution of z if t is observed? # See https://math.stackexchange.com/a/5085538/423833 for derivation that it # is generalized gamma distribution. Here we test this is correct. library(ggplot2) theme_set(theme_minimal()) N_sims <- 100 df <- 5 z <- rnorm(N_sims) v <- rchisq(N_sims, df = df) t <- z / sqrt(v/df) # Bundle in a dataset with extra quantities my_data <- SBC_datasets( variables = posterior::draws_matrix (z = z, v = v, lik = abs(z) * dchisq(df * z^2 / t^2, df = df)), generated = purrr::map(t, \(t) list(t = t, df = df))) # Main workhorse function my_post_func <- function(generated) { df <- generated$df t <- generated$t gg_d <- df + 1 gg_p <- 2 gg_a <- 1/sqrt(df/(2*t^2) + 0.5) # Transform to parametrization used by ggamma gg_b <- gg_p gg_k <- gg_d / gg_p abs_z <- ggamma::rggamma(1000, gg_a, gg_b, gg_k) v <- df * abs_z^2 / t^2 lik = abs_z * dchisq(df * abs_z^2 / t^2, df = df) posterior::draws_matrix(z = abs_z * sign(generated$t), v = v, lik = lik) } backend <- SBC_backend_function(my_post_func, iid_draws = TRUE) res <- compute_SBC(my_data, backend, keep_fits = FALSE) plot_ecdf_diff(res) plot_sim_estimated(res)
S3 generic that allows backends to override how a hash is computed. By default rlang::hash()
is used.
SBC_backend_hash_for_cache(backend)SBC_backend_hash_for_cache(backend)
Most backends (e.g. those based on variatns of MCMC) don't produce
independent draws and thus diagnostics like Rhat and ESS are important
and draws may need thinning. Backends that already produce independent
draws (e.g. ADVI/optimizing) can implement this method to return TRUE
to signal this is the case. If this method returns TRUE, ESS and Rhat will
always attain their best possible values and SBC_backend_default_thin_ranks()
will return 1.
The default implementation returns FALSE.
SBC_backend_iid_draws(backend) ## Default S3 method: SBC_backend_iid_draws(backend)SBC_backend_iid_draws(backend) ## Default S3 method: SBC_backend_iid_draws(backend)
backend |
to check |
BayesFactor::lmBF()
A backend using BayesFactor::lmBF()
SBC_backend_lmBF(..., iterations = 2000)SBC_backend_lmBF(..., iterations = 2000)
... |
all parameters are passed directly to |
Mock backend is useful for testing the package.
It will ignore all data passed to it and always
provide result as the draws generated by the backend.
SBC_backend_mock( result = posterior::draws_matrix(a = rnorm(100)), output = NULL, message = NULL, warning = NULL, error = NULL, bridgesampler = NULL )SBC_backend_mock( result = posterior::draws_matrix(a = rnorm(100)), output = NULL, message = NULL, warning = NULL, error = NULL, bridgesampler = NULL )
result |
a |
This is useful e.g. to provide a live stanmodel instance to an rstan fit,
as the one loaded from cache won't work for some cases.
SBC_backend_postprocess_cached_fit(backend, generated, fit) SBC_backend_preprocess_fit_for_cache(backend, generated, fit)SBC_backend_postprocess_cached_fit(backend, generated, fit) SBC_backend_preprocess_fit_for_cache(backend, generated, fit)
The deafult implementation does nothing.
rjags
Create a JAGS backend using rjags
SBC_backend_rjags( file, n.iter, n.burnin, variable.names, thin = 1, na.rm = TRUE, ... )SBC_backend_rjags( file, n.iter, n.burnin, variable.names, thin = 1, na.rm = TRUE, ... )
file |
model file or connection to model code (passed to |
n.iter |
number of iterations for sampling (passed to [rjags::coda.samples()) |
n.burnin |
number of iterations used for burnin |
variable.names |
names of variables to monitor (passed to |
thin |
thinning (passed to |
na.rm |
whether to omit variables containing NA (passed to |
... |
additional optional arguments passed to
|
optimizing method from rstan.SBC backend using the optimizing method from rstan.
SBC_backend_rstan_optimizing(model, ..., n_retries_hessian = 1)SBC_backend_rstan_optimizing(model, ..., n_retries_hessian = 1)
model |
a |
... |
other arguments passed to |
n_retries_hessian |
the number of times the backend is allow to retry optimization (with different seeed) to produce a usable Hessian that can produce draws. In some cases, the Hessian may be numerically unstable and not be positive definite. |
sampling method from rstan.SBC backend using the sampling method from rstan.
SBC_backend_rstan_sample(model, ...)SBC_backend_rstan_sample(model, ...)
model |
a |
... |
other arguments passed to |
SBC_datasets object.In most cases, you may want to use generate_datasets to build the object, but
for full control, you can also create datasets directly via this function.
SBC_datasets(variables, generated, var_attributes = NULL, parameters = NULL)SBC_datasets(variables, generated, var_attributes = NULL, parameters = NULL)
variables |
draws of "true" values of unobserved parameters or other derived variables.
An object of class |
generated |
a list of objects that can be passed as data to the backend you plan to use.
(e.g. list of values for Stan-based backends, a data frame for |
parameters |
DEPRECATED. Use variables instead. |
Merge two or more datasets of the same length, each generated by a different model. The "true" model will be chosen randomly for each dataset.
SBC_datasets_for_bf(..., probs = NULL, model_var = "model", prob_H1 = NULL)SBC_datasets_for_bf(..., probs = NULL, model_var = "model", prob_H1 = NULL)
The merged dataset will keep track of all parameters and later allow
splitting results with split_SBC_results_for_bf(), to this, a number of
submodel-specific variables will be stored, but marked as
hidden_var_attribute() to not clutter default visualisations.
Note that this will involve compiling a Stan model and may take a while.
SBC_example_backend( example = c("normal_sd", "normal_bad"), interface = c("rstan", "cmdstanr", "rjags") )SBC_example_backend( example = c("normal_sd", "normal_bad"), interface = c("rstan", "cmdstanr", "rjags") )
example |
name of the example model. |
interface |
name of the interface to be used to fit the model |
Construct a generator used in the examples.
SBC_example_generator(example = c("normal"), N = 100)SBC_example_generator(example = c("normal"), N = 100)
example |
name of example |
N |
size of the dataset the generator should simulate |
an object that can be passed to generate_datasets()
Except for example = "visualizations", all examples will actually
compile and fit Stan models and thus may take a while to complete.
SBC_example_results( example = c("normal_ok", "normal_bad", "visualizations"), interface = c("rstan", "cmdstanr", "rjags"), N = 100, n_sims = 50 )SBC_example_results( example = c("normal_ok", "normal_bad", "visualizations"), interface = c("rstan", "cmdstanr", "rjags"), N = 100, n_sims = 50 )
example |
|
interface |
name of the interface to be used for the backend |
N |
number of datapoints to simulate from the generator for each simulation |
n_sims |
number of simulations to perform |
Needs to be implemented by all backends.
All implementations have to return an object for which you can safely
call SBC_fit_to_draws_matrix() and get some draws.
If that's not possible an error should be raised.
SBC_fit(backend, generated, cores)SBC_fit(backend, generated, cores)
SBC_backend_extractBF_comparison() backend.A fits need to implement this S3 generic to be useful for the
SBC_backend_extractBF_comparison() backend.
SBC_fit_to_BFBayesFactor(fit)SBC_fit_to_BFBayesFactor(fit)
Users should typically not call this method and instead let
SBC_backend_bridgesampling() to call it for them.
SBC_fit_to_bridge_sampler(backend, fit, generated, ...)SBC_fit_to_bridge_sampler(backend, fit, generated, ...)
backend |
the underlying backend |
fit |
corresponding to the backend |
generated |
the generated data used to fit the underlying model |
... |
passed to |
an object of class bridge or bridge_list.
bridgesampling::bridge_sampler()
The diagnostics object has to be a data.frame but may
inherit additional classes - in particular it may be useful
for the returning object to implement diagnostic_types().
SBC_fit_to_diagnostics(fit, fit_output, fit_messages, fit_warnings)SBC_fit_to_diagnostics(fit, fit_output, fit_messages, fit_warnings)
fit |
The fit returned by |
fit_output |
a character string capturing what the backend wrote to stdout |
fit_messages |
a character vector of messages the backend raised |
fit_warnings |
a character vector of warnings the backend raised |
an single row data.frame that includes diagnostics or NULL, if no diagnostics available.
draws_matrix object.Needs to be implemented for all types of objects the backend can
return from SBC_fit(). Default implementation just calls,
posterior::as_draws_matrix(), so if the fit object already supports
this, it will work out of the box.
SBC_fit_to_draws_matrix(fit) ## Default S3 method: SBC_fit_to_draws_matrix(fit)SBC_fit_to_draws_matrix(fit) ## Default S3 method: SBC_fit_to_draws_matrix(fit)
Brms generator uses a brms model with sample_prior = "only" to generate
new datasets.
SBC_generator_brms( formula, data, ..., generate_lp = TRUE, generate_lp_chunksize = 5000/nrow(data), out_stan_file = NULL )SBC_generator_brms( formula, data, ..., generate_lp = TRUE, generate_lp_chunksize = 5000/nrow(data), out_stan_file = NULL )
formula, data, ...
|
arguments passed to |
generate_lp |
whether to compute the overall log-likelihood of the model as an additional variable. This can be somewhat computationally expensive, but improves sensitivity of the SBC process. |
generate_lp_chunksize |
Will determine whether
the chunk size used with |
out_stan_file |
A filename for the generated Stan code. Useful for debugging and for avoiding unnecessary recompilations. |
This creates a very thin wrapper around the function and can store additional arguments, but does not do anything more..
SBC_generator_custom(f, ...)SBC_generator_custom(f, ...)
f |
function accepting at least an |
... |
Additional arguments passed to |
Running:
gen <- SBC_generator_custom(f, <<some other args>>) datasets <- generate_datasets(gen, n_sims = my_n_sims)
is equivalent to just running
datasets <- f(<<some other args>>, n_sims = my_n_sims)
So whenever you control the code calling generate_datasets,
it usually makes more sense to just create an SBC_datasets
object directly and avoid using SBC_generator_custom and generate_datasets at all.
SBC_generator_custom can however be useful, when a code you
do not control calls generate_datasets for you and the
built-in generators do not provide you with enough flexibility.
The generator supports progress reporting with the
progressr package.
Run progressr::handlers(global = TRUE) to enable progress reporting
globally, see the progressr docs for more options.
SBC_generator_function( f, future.chunk.size = getOption("SBC.generator_chunk_size", +Inf), future.globals = TRUE, ... )SBC_generator_function( f, future.chunk.size = getOption("SBC.generator_chunk_size", +Inf), future.globals = TRUE, ... )
f |
function returning a list with elements |
future.chunk.size |
if finite, datasets will be generated in parallel using
the |
... |
Additional arguments passed to |
Backends that represent explicit posterior distributions (i.e. not just samples) can implement this S3 generic to evaluate the CDF of that distribution at the simulated values.
SBC_posterior_cdf(fit, variables) ## Default S3 method: SBC_posterior_cdf(fit, variables)SBC_posterior_cdf(fit, variables) ## Default S3 method: SBC_posterior_cdf(fit, variables)
fit |
an object returned by the |
variables |
a named vector of values at which to evaluate the CDF |
A backend may choose to return explicit CDF only for some model parameters.
return from SBC_fit().
either NULL (the default implementation) or a data.frame
with a row for each variable. The columns must include:
variable (variable name), and either cdf (if there are no ties) or the
pair cdf_low and cdf_high indicate the possible tied values for the CDF.
Print the Stan code of a model used in the examples.
SBC_print_example_model( example = c("normal_sd", "normal_bad"), interface = c("rstan", "cmdstanr", "rjags") )SBC_print_example_model( example = c("normal_sd", "normal_bad"), interface = c("rstan", "cmdstanr", "rjags") )
example |
name of the example model. |
SBC_results objectThis will build and validate an SBC_results object from its constituents.
SBC_results( stats, fits, backend_diagnostics, default_diagnostics, outputs, messages, warnings, errors )SBC_results( stats, fits, backend_diagnostics, default_diagnostics, outputs, messages, warnings, errors )
The SBC_results contains the following fields:
$stats statistics for all variables and fits (one row per variable-fit combination)
$fits the raw fits (unless keep_fits = FALSE) or NULL if the fit failed
$errors error messages that caused fit failures
$outputs, $messages, $warnings the outputs/messages/warnings written by fits
$default_diagnostics a data frame of default convergence/correctness diagnostics (one row per fit)
$backend_diagnostics a data frame of backend-specific diagnostics (one row per fit)
Potentially useful for doing some advanced stuff, but should not
be used in regular workflow. Use recompute_SBC_statistics() to update
an [SBC_results] objects with different thin_ranks or other settings.
SBC_statistics_from_single_fit( fit, variables, generated, thin_ranks, ensure_num_ranks_divisor, dquants, backend, var_attributes = NULL, gen_quants = NULL )SBC_statistics_from_single_fit( fit, variables, generated, thin_ranks, ensure_num_ranks_divisor, dquants, backend, var_attributes = NULL, gen_quants = NULL )
thin_ranks |
how much thinning should be applied to posterior draws before computing ranks for SBC. Should be large enough to avoid any noticeable autocorrelation of the thinned draws See details below. |
ensure_num_ranks_divisor |
Potentially drop some posterior samples to ensure that this number divides the total number of SBC ranks (see Details). |
dquants |
Derived quantities to include in SBC. Use |
backend |
the model + sampling algorithm. The built-in backends can be constructed
using |
gen_quants |
Deprecated, use dquants instead |
Custom select implementation maintaining information about submodels
## S3 method for class 'SBC_bridgesampling_diagnostics' select(diags, ...)## S3 method for class 'SBC_bridgesampling_diagnostics' select(diags, ...)
Split stats from SBC of Bayes factor into stats for the individual submodels.
split_SBC_results_for_bf(results_bf)split_SBC_results_for_bf(results_bf)
list of two stats data.frames, each for simulations where the
corresponding model was chosen.
Validate a definition of derived quantities evaluated in R.
validate_derived_quantities(x)validate_derived_quantities(x)
Attributes give additional information useful for presenting SBC results concerning the variables.
var_attributes(...) validate_var_attributes(var_attr) possibly_constant_var_attribute() binary_var_attribute() hidden_var_attribute() na_valid_var_attribute() inf_valid_var_attribute() submodel_var_attribute(sub_id)var_attributes(...) validate_var_attributes(var_attr) possibly_constant_var_attribute() binary_var_attribute() hidden_var_attribute() na_valid_var_attribute() inf_valid_var_attribute() submodel_var_attribute(sub_id)
Should be passed via an instance of the var_attributes class to
SBC_datasets() (e.g. by returning a $var_attributes element from a
function passed to SBC_generator_function()).
possibly_constant_var_attribute attribute signals that having all
posterior draws identical is possible and thus no warnings should be
made for the resulting NAs in rhat and ESS checks.
binary_var_attribute marks the attribute as a binary variable (0 or 1)
and thus eligible for some special handling and visualisations
(e.g., binary_probabilites_from_stats(), binary_calibration_from_bp()).
hidden_var_attribute will hide the variable in default visualisations,
unless the variable is explicitly mentioned.
na_valid_var_attribute will treat NAs as potentially equal to any
other value in rank ordering. This gives the expected results when NA
represents rare problems in computation that
should be ignored (see calculate_ranks_draws_matrix()). Setting this
attribute also changes the ESS/Rhat computation to ignore NAs.
inf_valid_var_attribute means infinity values may appear in the samples
(this is useful e.g. to note that the parameter is actually not present for the
given draw). Setting this
attribute also changes the ESS/Rhat computation to ignore infinities. It
is also assume that it is OK if all of the draws are infinity.
submodel_var_attribute signals that the parameter belongs to a submodel
which can be extracted individually
In SBC results, the attributes of a variable are summarised in the
attributes column of the $stats data.frame. Use attribute_present_stats()
to check for presence of an attribute there.