--- title: "Working with Posteriors" output: rmarkdown::html_vignette: toc: true toc_depth: 3 params: EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true") vignette: > %\VignetteIndexEntry{Working with Posteriors} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r child="children/_settings-knitr.Rmd"} ``` ```{r include=FALSE} # Needed temporarily to avoiding weird rendering of posterior's tibbles # in pkgdown sites print.tbl_df <- function(x, ...) { print.data.frame(x) } ``` ## Summary statistics We can easily customize the summary statistics reported by `$summary()` and `$print()`. ```{r} fit <- cmdstanr::cmdstanr_example("schools", method = "sample") fit$summary() ``` By default all variables are summaries with the follow functions: ```{r} posterior::default_summary_measures() ``` To change the variables summarized, we use the variables argument ```{r} fit$summary(variables = c("mu", "tau")) ``` We can additionally change which functions are used ```{r} fit$summary(variables = c("mu", "tau"), mean, sd) ``` To summarize all variables with non-default functions, it is necessary to set explicitly set the variables argument, either to `NULL` or the full vector of variable names. ```{r} fit$metadata()$model_params fit$summary(variables = NULL, "mean", "median") ``` Summary functions can be specified by character string, function, or using a formula (or anything else supported by `rlang::as_function()`). If these arguments are named, those names will be used in the tibble output. If the summary results are named they will take precedence. ```{r} my_sd <- function(x) c(My_SD = sd(x)) fit$summary( c("mu", "tau"), MEAN = mean, "median", my_sd, ~quantile(.x, probs = c(0.1, 0.9)), Minimum = function(x) min(x) ) ``` Arguments to all summary functions can also be specified with `.args`. ```{r} fit$summary(c("mu", "tau"), quantile, .args = list(probs = c(0.025, .05, .95, .975))) ``` The summary functions are applied to the array of sample values, with dimension `iter_sampling`x`chains`. ```{r} fit$summary(variables = NULL, dim, colMeans) ``` For this reason users may have unexpected results if they use `stats::var()` directly, as it will return a covariance matrix. An alternative is the `distributional::variance()` function, which can also be accessed via `posterior::variance()`. ```{r} fit$summary(c("mu", "tau"), posterior::variance, ~var(as.vector(.x))) ``` Summary functions need not be numeric, but these won't work with `$print()`. ```{r} strict_pos <- function(x) if (all(x > 0)) "yes" else "no" fit$summary(variables = NULL, "Strictly Positive" = strict_pos) # fit$print(variables = NULL, "Strictly Positive" = strict_pos) ``` For more information, see `posterior::summarise_draws()`, which is called by `$summary()`. ## Extracting posterior draws/samples The [`$draws()`](https://mc-stan.org/cmdstanr/reference/fit-method-draws.html) method can be used to extract the posterior draws in formats provided by the [**posterior**](https://mc-stan.org/posterior/) package. Here we demonstrate only the `draws_array` and `draws_df` formats, but the **posterior** package supports other useful formats as well. ```{r draws, message=FALSE} # default is a 3-D draws_array object from the posterior package # iterations x chains x variables draws_arr <- fit$draws() # or format="array" str(draws_arr) # draws x variables data frame draws_df <- fit$draws(format = "df") str(draws_df) print(draws_df) ``` To convert an existing draws object to a different format use the `posterior::as_draws_*()` functions. To manipulate the `draws` objects use the various methods described in the **posterior** package [vignettes](https://mc-stan.org/posterior/articles/index.html) and [documentation](https://mc-stan.org/posterior/reference/index.html). ### Structured draws similar to `rstan::extract()` The **posterior** package's `rvar` format provides a multidimensional, sample-based representation of random variables. See https://mc-stan.org/posterior/articles/rvar.html for details. In addition to being useful in its own right, this format also allows CmdStanR users to obtain draws in a similar format to `rstan::extract()`. Suppose we have a parameter `matrix[2,3] x`. The `rvar` format lets you interact with `x` as if it's a `2 x 3` matrix and automatically applies operations over the many posterior draws of `x`. To instead directly access the draws of `x` while maintaining the structure of the matrix use `posterior::draws_of()`. For example: ```{r structured-draws, eval = FALSE} draws <- posterior::as_draws_rvars(fit$draws()) x_rvar <- draws$x x_array <- posterior::draws_of(draws$x) ``` The object `x_rvar` will be an `rvar` that can be used like a `2 x 3` matrix, with the draws handled behind the scenes. The object `x_array` will be a `4000 x 2 x 3` array (assuming `4000` posterior draws), which is the same as it would be after being extracted from the list returned by `rstan::extract()`.