---
title: "Working with Posteriors"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
params:
  EVAL: !r identical(Sys.getenv("NOT_CRAN"), "true")
vignette: >
  %\VignetteIndexEntry{Working with Posteriors}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r child="children/_settings-knitr.Rmd"}
```

```{r include=FALSE}
# Needed temporarily to avoiding weird rendering of posterior's tibbles
# in pkgdown sites
print.tbl_df <- function(x, ...) {
  print.data.frame(x)
}
```

## Summary statistics

We can easily customize the summary statistics reported by `$summary()` and `$print()`.

```{r}
fit <- cmdstanr::cmdstanr_example("schools", method = "sample")
fit$summary()
```

By default all variables are summaries with the follow functions:
```{r}
posterior::default_summary_measures()
```

To change the variables summarized, we use the variables argument
```{r}
fit$summary(variables = c("mu", "tau"))
```

We can additionally change which functions are used
```{r}
fit$summary(variables = c("mu", "tau"), mean, sd)
```

To summarize all variables with non-default functions, it is necessary to set explicitly set the variables argument, either to `NULL` or the full vector of variable names.
```{r}
fit$metadata()$model_params
fit$summary(variables = NULL, "mean", "median")
```

Summary functions can be specified by character string, function, or using a
formula (or anything else supported by `rlang::as_function()`). If these
arguments are named, those names will be used in the tibble output. If the
summary results are named they will take precedence.
```{r}
my_sd <- function(x) c(My_SD = sd(x))
fit$summary(
  c("mu", "tau"), 
  MEAN = mean, 
  "median",
  my_sd,
  ~quantile(.x, probs = c(0.1, 0.9)),
  Minimum = function(x) min(x)
)        
```

Arguments to all summary functions can also be specified with `.args`. 
```{r}
fit$summary(c("mu", "tau"), quantile, .args = list(probs = c(0.025, .05, .95, .975)))
```

The summary functions are applied to the array of sample values, with dimension `iter_sampling`x`chains`.
```{r}
fit$summary(variables = NULL, dim, colMeans)
```


For this reason users may have unexpected results if they use `stats::var()` directly, as it will return a covariance matrix. An alternative is the `distributional::variance()` function, 
which can also be accessed via `posterior::variance()`.
```{r}
fit$summary(c("mu", "tau"), posterior::variance, ~var(as.vector(.x)))
```

Summary functions need not be numeric, but these won't work with `$print()`.

```{r}
strict_pos <- function(x) if (all(x > 0)) "yes" else "no"
fit$summary(variables = NULL, "Strictly Positive" = strict_pos)
# fit$print(variables = NULL, "Strictly Positive" = strict_pos)
```

For more information, see `posterior::summarise_draws()`, which is called by `$summary()`.


## Extracting posterior draws/samples

The [`$draws()`](https://mc-stan.org/cmdstanr/reference/fit-method-draws.html)
method can be used to extract the posterior draws in formats provided by the
[**posterior**](https://mc-stan.org/posterior/) package. Here we demonstrate
only the `draws_array` and `draws_df` formats, but the **posterior** package
supports other useful formats as well.

```{r draws, message=FALSE}
# default is a 3-D draws_array object from the posterior package
# iterations x chains x variables
draws_arr <- fit$draws() # or format="array"
str(draws_arr)

# draws x variables data frame
draws_df <- fit$draws(format = "df")
str(draws_df)
print(draws_df)
```

To convert an existing draws object to a different format use the
`posterior::as_draws_*()` functions.

To manipulate the `draws` objects use the various methods described in the
**posterior** package [vignettes](https://mc-stan.org/posterior/articles/index.html)
and [documentation](https://mc-stan.org/posterior/reference/index.html).

### Structured draws similar to `rstan::extract()`

The **posterior** package's `rvar` format provides a multidimensional,
sample-based representation of random variables. See
https://mc-stan.org/posterior/articles/rvar.html for details. 
In addition to being useful in its own right, this format also allows CmdStanR
users to obtain draws in a similar format to `rstan::extract()`.

Suppose we have a parameter `matrix[2,3] x`. The `rvar` format lets you 
interact with `x` as if it's a `2 x 3` matrix and automatically applies operations
over the many posterior draws of `x`. To instead directly access the draws of `x`
while maintaining the structure of the matrix use `posterior::draws_of()`. 
For example: 

```{r structured-draws, eval = FALSE}
draws <- posterior::as_draws_rvars(fit$draws())
x_rvar <- draws$x
x_array <- posterior::draws_of(draws$x)
```

The object `x_rvar` will be an `rvar` that can be used like a `2 x 3` matrix,
with the draws handled behind the scenes. The object `x_array` will be a 
`4000 x 2 x 3` array (assuming `4000` posterior draws), which is the same as it
would be after being extracted from the list returned by `rstan::extract()`.