Title: | Tools for Outbreak Investigation/Infectious Disease Surveillance |
---|---|
Description: | Create epicurves, epigantt charts, and diverging bar charts using 'ggplot2'. Prepare data for visualisation or other reporting for infectious disease surveillance and outbreak investigation (time series data). Includes tidy functions to solve date based transformations for common reporting tasks, like (A) seasonal date alignment for respiratory disease surveillance, (B) date-based case binning based on specified time intervals like isoweek, epiweek, month and more, (C) automated detection and marking of the new year based on the date/datetime axis of the 'ggplot2', (D) labelling of the last value of a time-series. An introduction on how to use epicurves can be found on the US CDC website (2012, <https://www.cdc.gov/training/quicklearns/epimode/index.html>). |
Authors: | Alexander Bartel [aut, cre] (ORCID: <https://orcid.org/0000-0002-1280-6138>) |
Maintainer: | Alexander Bartel <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.5.1 |
Built: | 2025-09-24 16:39:46 UTC |
Source: | https://github.com/biostats-dev/ggsurveillance |
align_dates_seasonal() standardizes dates from multiple years to enable comparison of epidemic curves and visualization of seasonal patterns in infectious disease surveillance data. Commonly used for creating periodicity plots of respiratory diseases like influenza, RSV, or COVID-19.
align_and_bin_dates_seasonal() is a convenience wrapper that first aligns the dates and then bins the data to calculate counts and incidence.
align_dates_seasonal(
  x,
  dates_from = NULL,
  date_resolution = c("week", "isoweek", "epiweek", "day", "month"),
  start = NULL,
  target_year = NULL,
  drop_leap_week = TRUE
)

align_and_bin_dates_seasonal(
  x,
  dates_from,
  n = 1,
  population = 1,
  fill_gaps = FALSE,
  date_resolution = c("week", "isoweek", "epiweek", "day", "month"),
  start = NULL,
  target_year = NULL,
  drop_leap_week = TRUE,
  .groups = "drop"
)
x |
Either a data frame with a date column, or a date vector.
|
dates_from |
Column name containing the dates to align and bin. Used when x is a data.frame. |
date_resolution |
Character string specifying the temporal resolution. One of:
|
start |
Numeric value indicating epidemic season start, i.e. the start and end of the new year interval:
|
target_year |
Numeric value for the reference year to align dates to. The default target year is the start of the most recent season in the data. This way the most recent dates stay unchanged. |
drop_leap_week |
If |
n |
Numeric column with case counts (or weights). Supports quoted and unquoted column names. |
population |
A number or a numeric column with the population size. Used to calculate the incidence. |
fill_gaps |
Logical; If |
.groups |
See |
This function helps create standardized epidemic curves by aligning surveillance data from different years. This enables:
Comparison of disease patterns across multiple seasons
Identification of typical seasonal trends
Detection of unusual disease activity
Assessment of current season against historical patterns
The alignment can be done at different temporal resolutions (daily, weekly, monthly) with customizable season start points to match different disease patterns or surveillance protocols.
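The alignment idea can be sketched in base R. This is a simplified illustration and not the package's implementation: the helper `align_to_target_season()` and its `start_month` argument are hypothetical, it works at day resolution only, and it ignores week-based season starts and leap weeks.

```r
# Hypothetical helper: shift dates from any season onto one target season.
# A season is assumed to start on the 1st of `start_month`.
align_to_target_season <- function(dates, target_year, start_month = 7) {
  year <- as.integer(format(dates, "%Y"))
  month <- as.integer(format(dates, "%m"))
  # Dates before the season start belong to the season that began the year before
  season_start_year <- ifelse(month >= start_month, year, year - 1L)
  # Shift every date by the number of years between its season and the target season
  aligned_year <- year + (target_year - season_start_year)
  date_aligned <- as.Date(
    paste0(aligned_year, format(dates, "-%m-%d")),
    format = "%Y-%m-%d" # 29 February becomes NA if the aligned year is not a leap year
  )
  data.frame(
    date = dates,
    season = sprintf("%d/%02d", season_start_year, (season_start_year + 1L) %% 100L),
    date_aligned = date_aligned
  )
}

dates <- as.Date(c("2022-10-03", "2023-01-15", "2023-11-20"))
aligned <- align_to_target_season(dates, target_year = 2023)
print(aligned)
```

In this sketch the dates of the most recent season stay unchanged, while earlier seasons are moved forward by whole years so that all curves share one date axis.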
A data frame with standardized date columns:
year: Calendar year from original date
week/month/day: Time unit based on chosen resolution
date_aligned: Date standardized to target year
season: Epidemic season identifier (e.g., "2023/24"); if start = 1, this is the year only (e.g. 2023).
current_season: Logical flag for most recent season
Binning also creates the columns:
n: Sum of cases in bin
incidence: Incidence calculated using n/population
# Seasonal Visualization of Germany Influenza Surveillance Data
library(ggplot2)

influenza_germany |>
  align_dates_seasonal(
    dates_from = ReportingWeek, date_resolution = "epiweek", start = 28
  ) -> df_flu_aligned

ggplot(df_flu_aligned, aes(x = date_aligned, y = Incidence, color = season)) +
  geom_line() +
  facet_wrap(~AgeGroup) +
  theme_bw() +
  theme_mod_rotate_x_axis_labels_45()
Aggregates data by specified time periods (e.g., weeks, months) and calculates (weighted)
counts. Incidence rates are also calculated using the provided population numbers.
This function is the core date binning engine used by geom_epicurve() and stat_bin_date() for creating epidemiological time series visualizations.
bin_by_date(
  x,
  dates_from,
  n = 1,
  population = 1,
  fill_gaps = FALSE,
  date_resolution = "week",
  week_start = 1,
  .groups = "drop"
)
x |
Either a data frame with a date column, or a date vector.
|
dates_from |
Column name containing the dates to bin. Used when x is a data.frame. |
n |
Numeric column with case counts (or weights). Supports quoted and unquoted column names. |
population |
A number or a numeric column with the population size. Used to calculate the incidence. |
fill_gaps |
Logical; If |
date_resolution |
Character string specifying the time unit for date aggregation.
Possible values include:
|
week_start |
Integer specifying the start of the week (1 = Monday, 7 = Sunday).
Only used when |
.groups |
See |
The function performs several key operations:
Date coercion: Converts the date column to proper Date format
Gap filling (optional): Generates complete temporal sequences to fill missing time periods with zeros
Date binning: Rounds dates to the specified resolution using lubridate::floor_date()
Weight and population handling: Processes count weights and population denominators
Aggregation: Groups by binned dates and sums weights to get counts and incidence
Grouping behaviour: The function respects existing grouping in the input data frame.
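The operations listed above can be approximated in base R for the weekly case. This is only a sketch under stated assumptions: `bin_weekly()` is a hypothetical helper, it always uses Monday as the start of the week, and it does not reproduce the package's handling of groups or other resolutions.

```r
# Hypothetical helper: weekly binning with optional gap filling (base R only)
bin_weekly <- function(dates, n = 1, population = 1, fill_gaps = FALSE) {
  # Floor each date to the Monday of its week ("%u": 1 = Monday, ..., 7 = Sunday)
  week_start <- dates - (as.integer(format(dates, "%u")) - 1L)
  # Sum the weights within each week
  binned <- aggregate(
    list(n = rep_len(n, length(dates))),
    by = list(date = week_start),
    FUN = sum
  )
  if (fill_gaps) {
    # Add weeks without observations as zero counts
    all_weeks <- data.frame(date = seq(min(week_start), max(week_start), by = "week"))
    binned <- merge(all_weeks, binned, by = "date", all.x = TRUE)
    binned$n[is.na(binned$n)] <- 0
  }
  binned$incidence <- binned$n / population
  binned
}

weekly <- bin_weekly(
  dates = as.Date(c("2024-12-10", "2024-12-12", "2024-12-25")),
  n = c(2, 3, 1),
  population = 10000,
  fill_gaps = TRUE
)
print(weekly)
```

With `fill_gaps = TRUE` the week starting 2024-12-16 appears with a count of zero, which keeps a line chart from connecting the neighbouring weeks directly.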
A data frame with the following columns:
A date column with the same name as dates_from, where values are binned to the start of the specified time period.
n: Count of observations (sum of weights) for each time period
incidence: Incidence rate calculated as n / population for each time period
Any existing grouping variables are preserved
library(dplyr)

# Create sample data
outbreak_data <- data.frame(
  onset_date = as.Date("2024-12-10") + sample(0:100, 50, replace = TRUE),
  cases = sample(1:5, 50, replace = TRUE)
)

# Basic weekly binning
bin_by_date(outbreak_data, dates_from = onset_date)

# Weekly binning with case weights
bin_by_date(outbreak_data, onset_date, n = cases)

# Monthly binning
bin_by_date(outbreak_data, onset_date,
  date_resolution = "month"
)

# ISO week binning (Monday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "isoweek"
) |>
  mutate(date_formatted = strftime(onset_date, "%G-W%V")) # Add correct date labels

# US CDC epiweek binning (Sunday start)
bin_by_date(outbreak_data, onset_date,
  date_resolution = "epiweek"
)

# With population data for incidence calculation
outbreak_data$population <- 10000
bin_by_date(outbreak_data, onset_date,
  n = cases,
  population = population
)
Creates age groups from numeric values using customizable break points and formatting options. The function allows for flexible formatting and customization of age group labels.
If a factor is returned, this factor includes factor levels of unobserved age groups. This allows for reproducible age groups, which can be used for joining data (e.g. adding age grouped population numbers for incidence calculation).
create_agegroups(
  values,
  age_breaks = c(5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90),
  breaks_as_lower_bound = TRUE,
  first_group_format = "0-{x}",
  interval_format = "{x}-{y}",
  last_group_format = "{x}+",
  pad_numbers = FALSE,
  pad_with = "0",
  collapse_single_year_groups = FALSE,
  na_label = NA,
  return_factor = FALSE
)
values |
Numeric vector of ages to be grouped |
age_breaks |
Numeric vector of break points for age groups. |
breaks_as_lower_bound |
Logical; if |
first_group_format |
Character string template for the first age group. Uses glue::glue syntax. |
interval_format |
Character string template for intermediate age groups. Uses glue::glue syntax. |
last_group_format |
Character string template for the last age group. Uses glue::glue syntax. |
pad_numbers |
Logical or numeric; if numeric, pad numbers up to the specified length (Tip: use |
pad_with |
Character to use for padding numbers. Default: |
collapse_single_year_groups |
Logical; if |
na_label |
Label for |
return_factor |
Logical; if |
Vector of age group labels (character or factor depending on return_factor)
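The grouping logic with breaks as lower bounds can be illustrated with base R's cut(). This is a rough sketch, not the package's code: `group_ages()` is a hypothetical helper with hard-coded label formats, and it always returns a factor.

```r
# Hypothetical helper: group ages, treating each break as the lower bound of a group
group_ages <- function(values, age_breaks = c(5, 10, 90)) {
  lower <- c(0, age_breaks)
  upper <- c(age_breaks - 1, Inf)
  # Closed groups are labelled "lower-upper", the open last group "lower+"
  labels <- ifelse(is.finite(upper), paste0(lower, "-", upper), paste0(lower, "+"))
  # right = FALSE makes the intervals [lower, next break), so age 5 falls into "5-9"
  cut(values, breaks = c(lower, Inf), right = FALSE, labels = labels)
}

ages <- c(3, 12, 95, NA)
groups <- group_ages(ages)
print(groups)
# Unobserved groups such as "5-9" are kept as factor levels with a count of zero
table(groups, useNA = "ifany")
```

Keeping the unobserved levels is what makes the result suitable for joining with population tables that contain every age group.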
# Basic usage
create_agegroups(1:100)

# Custom formatting with upper bounds
create_agegroups(1:100,
  breaks_as_lower_bound = FALSE,
  interval_format = "{x} to {y}",
  first_group_format = "0 to {x}"
)

# Ages 1 to 5 are kept as numbers by collapsing single year groups
create_agegroups(1:10,
  age_breaks = c(1, 2, 3, 4, 5, 10),
  collapse_single_year_groups = TRUE
)
geom_bar_diverging() creates a diverging bar chart, i.e. stacked bars which are centred at 0.
This is useful for visualizing contrasting categories like:
case counts by contrasting categories like vaccination status or autochthonous (local) vs imported infections
population pyramids
Likert scales for e.g. agreement (sentiment analysis)
or any data with natural opposing groups.
stat_diverging() calculates the required statistics for diverging charts and can be used with different geoms. Used for easy labelling of diverging charts.
geom_area_diverging() creates a diverging area chart, for continuous data of opposing categories. x (or y) has to be continuous for this geom.
See scale_x_continuous_diverging(), scale_y_continuous_diverging() for the corresponding ggplot2 scales.
geom_bar_diverging(
  mapping = NULL,
  data = NULL,
  position = "identity",
  proportion = FALSE,
  neutral_cat = c("odd", "never", "NA", "force"),
  break_pos = NULL,
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

geom_area_diverging(
  mapping = NULL,
  data = NULL,
  position = "identity",
  proportion = FALSE,
  neutral_cat = c("odd", "never", "NA", "force"),
  break_pos = NULL,
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

stat_diverging(
  mapping = NULL,
  data = NULL,
  geom = "text",
  position = "identity",
  stacked = TRUE,
  proportion = FALSE,
  neutral_cat = c("odd", "never", "NA", "force"),
  break_pos = NULL,
  totals_by_direction = FALSE,
  nudge_label_outward = 0,
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. |
position |
Position adjustment. For the geoms, categories will be stacked by default. Don't use |
proportion |
Logical. If |
neutral_cat |
How to handle the middle category for an odd number of factor levels.
|
break_pos |
Only used for
|
... |
Other arguments passed on to |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
geom |
|
stacked |
Logical. If |
totals_by_direction |
Logical. If |
nudge_label_outward |
Numeric. Relative value to nudge labels outward from |
A ggplot2 geom layer that can be added to a plot.
Diverging bar charts split categories into positive and negative directions based on factor level order. Categories in the first half of factor levels go in the negative direction, while categories in the second half go in the positive direction.
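The split by factor level order can be sketched in a few lines of base R. `diverging_sign()` is a hypothetical helper for illustration only; it assumes that the middle level is neutral whenever the number of levels is odd, whereas the geoms let you control this through neutral_cat and break_pos.

```r
# Hypothetical helper: direction of each category from its factor level position
diverging_sign <- function(f) {
  k <- nlevels(f)
  midpoint <- (k + 1) / 2
  # Levels before the midpoint are negative, levels after it positive,
  # and a level exactly on the midpoint (odd number of levels) is neutral
  sign(as.integer(f) - midpoint)
}

# Six levels: the first three go in the negative, the last three in the positive direction
likert6 <- factor(c("+++", "-", "---"), levels = c("+++", "++", "+", "-", "--", "---"))
sign6 <- diverging_sign(likert6)

# Five levels: the third level sits in the middle
status5 <- factor(
  c("low", "mid", "high"),
  levels = c("very low", "low", "mid", "high", "very high")
)
sign5 <- diverging_sign(status5)

print(sign6)
print(sign5)
```

Because the direction depends on all defined levels and not only on the observed ones, adding an empty dummy level shifts the midpoint and therefore the split.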
Required aesthetics:
x or y
diverging_groups: Will default to fill if missing. A factor should be used for this aesthetic for best results. All factor levels defined will be used to determine positive, negative and neutral categories. Behaviour of the diverging bar charts can therefore be controlled by creating empty dummy factor levels.
Optional aesthetics:
weight: Adjust the weight of observations. Can be used to pass case counts or incidences.
The following calculated stats can be used further in aes:
after_stat(count)
after_stat(prop): Proportion of the category within the stacked bar.
after_stat(sign): Direction of the category. Either -1, 0 or +1
scale_x_continuous_diverging(), scale_y_continuous_diverging()
# Basic example with geom_bar_diverging
library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(123)
df_6cat <- data.frame(matrix(sample(1:6, 600, replace = TRUE), ncol = 6)) |>
  mutate_all(~ ordered(., labels = c("+++", "++", "+", "-", "--", "---"))) |>
  pivot_longer(cols = everything())

ggplot(df_6cat, aes(y = name, fill = value)) +
  geom_bar_diverging() + # Bars
  stat_diverging() + # Labels
  scale_x_continuous_diverging() + # Scale
  theme_classic()

ggplot(df_6cat, aes(y = name, fill = value)) +
  geom_bar_diverging() + # Bars
  stat_diverging(totals_by_direction = TRUE, nudge_label_outward = 0.05) + # Totals as Label
  scale_x_continuous_diverging() + # Scale
  theme_classic()

# Population pyramid
population_german_states |>
  filter(state %in% c("Berlin", "Mecklenburg-Vorpommern"), age < 90) |>
  ggplot(aes(y = age, fill = sex, weight = n)) +
  geom_bar_diverging(width = 1) +
  geom_vline(xintercept = 0) +
  scale_x_continuous_diverging(n.breaks = 10) +
  facet_wrap(~state, scales = "free_x") +
  theme_bw()

# Vaccination status: set neutral category
set.seed(456)
cases_vacc <- data.frame(year = 2017:2025) |>
  rowwise() |>
  mutate(vacc = list(sample(1:4, 100, prob = (4:1)^(1 - 0.2 * (year - 2017)), replace = TRUE))) |>
  unnest(vacc) |>
  mutate(
    year = as.factor(year),
    "Vaccination Status" = ordered(vacc,
      labels = c("Fully Vaccinated", "Partially Vaccinated", "Unknown", "Unvaccinated")
    )
  )

ggplot(cases_vacc, aes(y = year, fill = `Vaccination Status`)) +
  geom_vline(xintercept = 0) +
  geom_bar_diverging(proportion = TRUE, neutral_cat = "force", break_pos = "Unknown") +
  stat_diverging(
    size = 3, proportion = TRUE, neutral_cat = "force", break_pos = "Unknown",
    totals_by_direction = TRUE, nudge_label_outward = 0.05
  ) +
  scale_x_continuous_diverging(labels = scales::label_percent(), n.breaks = 10) +
  scale_y_discrete_reverse() +
  ggtitle("Proportion of vaccinated cases by year") +
  theme_classic() +
  theme_mod_legend_bottom()
Creates a bar chart with explicitly defined ranges.
geom_col_range(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
Defaults to "identity". |
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
... |
Other arguments passed on to
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
A ggplot2 geom layer that can be added to a plot.
Required aesthetics:
Either x or y
Either xmin and xmax or ymin and ymax
# Basic example
library(ggplot2)

df <- data.frame(x = 1:3, ymin = -1:-3, ymax = 1:3)

ggplot(df, aes(x = x, ymin = ymin, ymax = ymax)) +
  geom_col_range()
Creates an epicurve plot for visualizing epidemic case counts in outbreaks (epidemiological curves). An epicurve is a bar plot, where every case is outlined. geom_epicurve additionally provides date-based aggregation of cases (e.g. per week or month and many more) using bin_by_date. For week aggregation both isoweek (World + ECDC) and epiweek (US CDC) are supported.
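The practical difference between the two week definitions is the day on which a week starts. The base R lines below are an illustrative sketch of that difference (Monday for ISO weeks, Sunday for US CDC epiweeks) and do not call the package.

```r
d <- as.Date("2024-12-10") # a Tuesday

# ISO week (Monday start): "%u" is 1 for Monday through 7 for Sunday
iso_week_start <- d - (as.integer(format(d, "%u")) - 1L)

# US CDC epiweek (Sunday start): "%w" is 0 for Sunday through 6 for Saturday
epi_week_start <- d - as.integer(format(d, "%w"))

print(iso_week_start)
print(epi_week_start)
```

The same case is therefore assigned to a bin that starts one day earlier under the epiweek convention, which matters when comparing counts between reporting systems.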
stat_bin_date and its alias stat_date_count provide date-based binning only. After binning by date with bin_by_date, these stats behave like ggplot2::stat_count.
geom_epicurve_text adds text labels to cases on epicurve plots.
geom_epicurve_point adds points/shapes to cases on epicurve plots.
geom_epicurve(
  mapping = NULL,
  data = NULL,
  stat = "epicurve",
  position = "stack",
  date_resolution = NULL,
  week_start = getOption("lubridate.week.start", 1),
  width = NULL,
  relative.width = 1,
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

stat_bin_date(
  mapping = NULL,
  data = NULL,
  geom = "line",
  position = "identity",
  date_resolution = NULL,
  week_start = getOption("lubridate.week.start", 1),
  fill_gaps = FALSE,
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

stat_date_count(
  mapping = NULL,
  data = NULL,
  geom = "line",
  position = "identity",
  date_resolution = NULL,
  week_start = getOption("lubridate.week.start", 1),
  fill_gaps = FALSE,
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

geom_epicurve_text(
  mapping = NULL,
  data = NULL,
  stat = "epicurve",
  vjust = 0.5,
  date_resolution = NULL,
  week_start = getOption("lubridate.week.start", 1),
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

geom_epicurve_point(
  mapping = NULL,
  data = NULL,
  stat = "epicurve",
  vjust = 0.5,
  date_resolution = NULL,
  week_start = getOption("lubridate.week.start", 1),
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
mapping |
Set of aesthetic mappings created by
|
data |
The data frame containing the variables for the plot |
stat |
For the geoms, use " |
position |
Position adjustment. Currently supports " |
date_resolution |
Character string specifying the time unit for date aggregation. If
|
week_start |
Integer specifying the start of the week (1 = Monday, 7 = Sunday).
Only used when |
width |
Numeric value specifying the width of the bars. If |
relative.width |
Numeric value between 0 and 1 adjusting the relative width of bars. Defaults to 1 |
... |
Other arguments passed to
For
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
geom |
The geometric object to use to display the data for this layer.
When using a |
fill_gaps |
Logical; If |
vjust |
Vertical justification of the text or shape. Value between 0 and 1.
Used by |
Epi Curves are a public health tool for outbreak investigation. For more details see the references.
A ggplot2 geom layer that can be added to a plot
Centers for Disease Control and Prevention. Quick-Learn Lesson: Using an Epi Curve to Determine Mode of Spread. USA. https://www.cdc.gov/training/quicklearns/epimode/
Dicker, Richard C., Fátima Coronado, Denise Koo, and R. Gibson Parrish. 2006. Principles of Epidemiology in Public Health Practice; an Introduction to Applied Epidemiology and Biostatistics. 3rd ed. USA. https://stacks.cdc.gov/view/cdc/6914
scale_y_cases_5er(), geom_vline_year()
# Basic epicurve with dates
library(ggplot2)
set.seed(1)

plot_data_epicurve_imp <- data.frame(
  date = rep(as.Date("2023-12-01") + ((0:300) * 1), times = rpois(301, 0.5))
)

ggplot(plot_data_epicurve_imp, aes(x = date, weight = 2)) +
  geom_vline_year(break_type = "week") +
  geom_epicurve(date_resolution = "week") +
  labs(title = "Epicurve Example") +
  scale_y_cases_5er() +
  # Correct ISOWeek labels for week-year
  scale_x_date(date_breaks = "4 weeks", date_labels = "W%V'%g") +
  coord_equal(ratio = 7) + # Use coord_equal for square boxes. 'ratio' are the days per week.
  theme_bw()

# Categorical epicurve
library(tidyr)
library(outbreaks)

sars_canada_2003 |> # SARS dataset from outbreaks
  pivot_longer(starts_with("cases"), names_prefix = "cases_", names_to = "origin") |>
  ggplot(aes(x = date, weight = value, fill = origin)) +
  geom_epicurve(date_resolution = "week") +
  scale_x_date(date_labels = "W%V'%g", date_breaks = "2 weeks") +
  scale_y_cases_5er() +
  theme_classic()
Creates Epi Gantt charts, which are specialized timeline visualizations used in outbreak investigations to track potential exposure periods and identify transmission patterns. They are particularly useful for:
Hospital outbreak investigations to visualize patient movements between wards
Identifying potential transmission events by showing when cases were in the same location
Visualizing common exposure times using overlapping exposure time intervals
The chart displays time intervals as horizontal bars, typically with one row per case/patient. Different colours can be used to represent different locations (e.g., hospital wards) or exposure types. Additional points or markers can show important events like symptom onset or test dates.
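What the chart shows visually, namely stays that overlap in time on the same ward, can also be computed directly. The base R sketch below uses made-up data and hypothetical column names (`patient`, `ward`, `start`, `end`); it is an illustration of the interval-overlap idea and not part of the package.

```r
# Made-up stays: one row per patient and ward stay
stays <- data.frame(
  patient = c("A", "B", "C"),
  ward = c("W1", "W1", "W2"),
  start = as.Date(c("2024-01-01", "2024-01-05", "2024-01-03")),
  end = as.Date(c("2024-01-07", "2024-01-10", "2024-01-08")),
  stringsAsFactors = FALSE
)

# Pair up all stays on the same ward, keeping each pair of patients once
pairs <- merge(stays, stays, by = "ward", suffixes = c("_1", "_2"))
pairs <- pairs[pairs$patient_1 < pairs$patient_2, ]

# Two intervals overlap when each one starts before the other one ends
overlaps <- pairs[pairs$start_1 <= pairs$end_2 & pairs$start_2 <= pairs$end_1, ]
overlaps$overlap_start <- pmax(overlaps$start_1, overlaps$start_2)
overlaps$overlap_end <- pmin(overlaps$end_1, overlaps$end_2)

print(overlaps[, c("ward", "patient_1", "patient_2", "overlap_start", "overlap_end")])
```

Here patients A and B share ward W1 from 2024-01-05 to 2024-01-07, which is the kind of overlap that stands out as stacked bars of the same colour in an Epi Gantt chart.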
geom_epigantt() will adjust the linewidth depending on the number of cases.
geom_epigantt(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
mapping |
Set of aesthetic mappings. Must include:
|
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
A |
position |
A |
... |
Other arguments passed to
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
A ggplot2 geom layer that can be added to a plot
library(dplyr)
library(tidyr)
library(ggplot2)

# Transform hospital outbreak line list to long format
linelist_hospital_outbreak |>
  pivot_longer(
    cols = starts_with("ward"),
    names_to = c(".value", "num"),
    names_pattern = "ward_(name|start_of_stay|end_of_stay)_([0-9]+)",
    values_drop_na = TRUE
  ) -> df_stays_long

linelist_hospital_outbreak |>
  pivot_longer(cols = starts_with("pathogen"), values_to = "date") -> df_detections_long

# Create Epi Gantt chart showing ward stays and test dates
ggplot(df_stays_long) +
  geom_epigantt(aes(y = Patient, xmin = start_of_stay, xmax = end_of_stay, color = name)) +
  geom_point(aes(y = Patient, x = date, shape = "Date of pathogen detection"),
    data = df_detections_long
  ) +
  scale_y_discrete_reverse() +
  theme_bw() +
  theme_mod_legend_bottom()
Creates a label, point or any geom at the last point of a line (highest x value). This is useful for line charts where you want to identify each line at its endpoint, write the last value of a time series at the endpoint or just add a point at the end of a geom_line. These functions also nudge the last value relative to the length of the x-axis. The label is automatically positioned slightly to the right of the last point.
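The two steps described here, finding the observation with the highest x value and nudging the label relative to the x range, can be sketched in base R. `last_value_position()` is a hypothetical helper for illustration; it borrows the argument names nudge_rel and nudge_add from the functions on this page but is not the package's implementation.

```r
# Hypothetical helper: find the last point of a series and a nudged label position
last_value_position <- function(x, y, nudge_rel = 0.015, nudge_add = 0) {
  i <- which.max(x) # observation with the highest x value
  list(
    x0 = x[i],
    y = y[i],
    # Nudge to the right by a fraction of the x range plus an absolute amount
    x_label = x[i] + nudge_rel * diff(range(x)) + nudge_add
  )
}

pos <- last_value_position(x = c(0, 50, 100), y = c(3, 8, 5))
str(pos)
```

A relative nudge scales with the plotted range, so the gap between line end and label looks the same whether the x-axis spans days or decades.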
There are 5 functions:
stat_last_value(): The core statistical transformation that identifies the last point of a line (e.g. last date of the time series).
geom_label_last_value(): Adds the last y value or a custom label after the last observation using geom_label.
geom_text_last_value(): Adds the last y value or a custom text after the last observation using geom_text.
geom_label_last_value_repel(): Adds non-overlapping labels with geom_label_repel.
geom_text_last_value_repel(): Adds non-overlapping text with geom_text_repel.
stat_last_value(
  mapping = NULL,
  data = NULL,
  geom = "point",
  position = "identity",
  nudge_rel = 0,
  nudge_add = 0,
  expand_rel = 0,
  expand_add = 0,
  labeller = NULL,
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

geom_label_last_value(
  mapping = NULL,
  data = NULL,
  stat = "last_value",
  position = "identity",
  nudge_rel = 0.015,
  nudge_add = 0,
  expand_rel = 0.05,
  expand_add = 0,
  labeller = NULL,
  hjust = 0,
  ...,
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE
)

geom_text_last_value(
  mapping = NULL,
  data = NULL,
  stat = "last_value",
  position = "identity",
  nudge_rel = 0.015,
  nudge_add = 0,
  expand_rel = 0.035,
  expand_add = 0,
  labeller = NULL,
  hjust = 0,
  ...,
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE
)

geom_label_last_value_repel(
  mapping = NULL,
  data = NULL,
  stat = "last_value_repel",
  position = "identity",
  nudge_rel = 0.03,
  nudge_add = 0,
  expand_rel = 0.05,
  expand_add = 0,
  labeller = NULL,
  hjust = 0,
  direction = "y",
  min.segment.length = 0.5,
  ...,
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE
)

geom_text_last_value_repel(
  mapping = NULL,
  data = NULL,
  stat = "last_value_repel",
  position = "identity",
  nudge_rel = 0.015,
  nudge_add = 0,
  expand_rel = 0.035,
  expand_add = 0,
  labeller = NULL,
  hjust = 0,
  direction = "y",
  min.segment.length = 0.5,
  ...,
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE
)
mapping |
Set of aesthetic mappings created by
|
data |
The data frame containing the variables for the plot |
geom |
The geometric object to use to display the data for this layer.
When using a |
position |
Position adjustment. Defaults to "identity" |
nudge_rel |
Numeric value specifying how far to nudge the label to the right, relative to the range of the x-values of the data. Defaults to 0.015 (1.5% of axis width) for labels. |
nudge_add |
Numeric value specifying an absolute amount to nudge the label (in units of the x-axis). |
expand_rel |
Numeric value specifying how far to expand the axis limits, relative to the range of the x-values of the data. This can be used to create room for longer text/labels. For repel functions this has to be large enough to place the text to achieve good results. |
expand_add |
Numeric value specifying an absolute amount to expand the axis limits (in units of the x-axis). |
labeller |
Label function to format the last value.
E.g. |
... |
Other arguments passed to |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
stat |
The statistical transformation to use on the data. Defaults to "last_value" |
hjust |
Horizontal text alignment. Defaults to left aligned (0). |
direction |
Direction in which to repel the labels. See |
min.segment.length |
Minimum length of the leader line segments. See |
The following calculated stats can be used further in aes:
after_stat(x0): the highest x value
after_stat(y): the y value of the observation with the highest x value
after_stat(label_formatted): the formatted y value using the labeller
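As a rough illustration of these computed values, the base R sketch below picks the observation with the highest x value in each group and derives a nudged label position. The toy data frame, the variable names, and the reading of nudge_rel as a fraction of the x range are assumptions made for illustration; this is not the package's internal code.

```r
# Toy data: two series observed at different x values (hypothetical)
df <- data.frame(
  x = c(1, 2, 3, 1, 2),
  y = c(10, 20, 30, 5, 7),
  group = c("a", "a", "a", "b", "b")
)

# For each group, keep the row with the highest x value
last_rows <- do.call(
  rbind,
  lapply(split(df, df$group), function(g) g[which.max(g$x), ])
)

# Assumed reading of nudge_rel: shift the label by a fraction of the x range
nudge_rel <- 0.015
last_rows$x_label <- last_rows$x + nudge_rel * diff(range(df$x))

last_rows
```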
A ggplot2 layer that can be added to a plot.
# Basic example with last value labels
library(ggplot2)

ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  geom_text_last_value()

# Percentages
ggplot(economics, aes(x = date, y = unemploy / pop)) +
  geom_line() +
  geom_label_last_value(labeller = scales::label_percent(accuracy = 0.1))

# Multiple lines with custom labels
ggplot(economics_long, aes(x = date, y = value, color = variable)) +
  geom_line() +
  stat_last_value() + # Add a point at the end
  geom_label_last_value_repel(aes(label = variable),
    expand_rel = 0.1, nudge_rel = 0.05
  ) +
  scale_y_log10() +
  theme_mod_disable_legend()
Determines turn of year dates based on the range of either the x or y axis of the ggplot.
geom_vline_year() draws vertical lines at the turn of each year.
geom_hline_year() draws horizontal lines at the turn of each year.
geom_vline_year(
  mapping = NULL, year_break = "01-01",
  break_type = c("day", "week", "isoweek", "epiweek"),
  just = NULL, ..., show.legend = NA
)

geom_hline_year(
  mapping = NULL, year_break = "01-01",
  break_type = c("day", "week", "isoweek", "epiweek"),
  just = NULL, ..., show.legend = NA
)
mapping |
Mapping created using |
year_break |
String specifying the month and day ("MM-DD") or week ("W01") of the year break.
Defaults to: |
break_type |
String specifying the type of break to use. Options are:
|
just |
Numeric offset in days (justification). Shifts the lines from the year break date.
Defaults to |
... |
Other arguments passed to
|
show.legend |
logical. Should this layer be included in the legends? |
A ggplot2 layer that can be added to a plot.
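The idea of deriving year breaks from an axis range can be sketched in base R as follows. This is a simplified illustration for break_type = "day" only; the helper name year_breaks_in_range() is hypothetical and the package's actual detection logic may differ.

```r
# Hypothetical helper: find all "MM-DD" year breaks inside a date range
year_breaks_in_range <- function(dates, year_break = "01-01") {
  years <- as.integer(format(range(dates), "%Y"))
  # One candidate break per calendar year covered by the data
  candidates <- as.Date(paste0(seq(years[1], years[2]), "-", year_break))
  # Keep only candidates that fall within the observed range
  candidates[candidates >= min(dates) & candidates <= max(dates)]
}

dates <- as.Date("2023-12-01") + 0:300
breaks <- year_breaks_in_range(dates)
breaks
```

The resulting dates could then be passed to ggplot2::geom_vline(xintercept = ...) by hand, which is the step geom_vline_year() automates.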
geom_epicurve(), ggplot2::geom_vline()
library(ggplot2)
set.seed(1)

plot_data_epicurve_imp <- data.frame(
  date = rep(as.Date("2023-12-01") + ((0:300) * 1), times = rpois(301, 0.5))
)

# Break type day
ggplot(plot_data_epicurve_imp, aes(x = date, weight = 2)) +
  geom_epicurve(date_resolution = "week") +
  geom_vline_year() +
  labs(title = "Epicurve Example") +
  scale_y_cases_5er() +
  scale_x_date(date_breaks = "4 weeks", date_labels = "W%V'%g") + # Correct ISOWeek labels week'year
  theme_bw()

# Break type week
ggplot(plot_data_epicurve_imp, aes(x = date, weight = 2)) +
  geom_epicurve(date_resolution = "week") +
  geom_vline_year(break_type = "week") +
  labs(title = "Epicurve Example") +
  scale_y_cases_5er() +
  scale_x_date(date_breaks = "4 weeks", date_labels = "W%V'%g") + # Correct ISOWeek labels week'year
  theme_bw()
The geometric mean is typically defined for strictly positive values. This function computes the geometric mean of a numeric vector, with the option to replace certain values (e.g., zeros, non-positive values, or values below a user-specified threshold) before computation.
geometric_mean(
  x,
  na.rm = FALSE,
  replace_value = NULL,
  replace = c("all", "non-positive", "zero"),
  warning = TRUE
)
x |
A numeric or complex vector of values. |
na.rm |
Logical. If |
replace_value |
Numeric or |
replace |
Character string indicating which values to replace:
|
warning |
Disable warnings by setting it to |
Replacement Considerations:
The geometric mean is only defined for strictly positive numbers (x > 0).
Despite this, the geometric mean can be useful for laboratory measurements which can contain 0 or negative values.
If these values are treated as NA and are removed, this results in an upward bias due to missingness.
To reduce this, values below the limit of detection (LOD) or limit of quantification (LOQ)
are often replaced with the chosen limit, making this limit the practical lower limit of the measurement scale.
This is therefore an often recommended approach.
There are also alternative approaches, where values are replaced by either LOD/2 or LOD/√2 (or LOQ). These approaches create a gap in the distribution of values (e.g. no values for LOD/2 < x < LOD) and should therefore be used with caution.
If the replacement approach for values below LOD or LOQ has a material effect on the interpretation of the results, the values should be treated as statistically censored. In this case, proper statistical methods to handle (left) censored data should be used.
When replace_value is provided, the function will first perform the specified replacements, then proceed with the geometric mean calculation. If no replacements are requested but zero or negative values remain and na.rm = FALSE, an NA will be returned with a warning.
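The replacement approaches discussed above can be compared with a few lines of base R. The helper gm() and the LOD of 0.5 are assumptions for illustration; this is a sketch of the arithmetic, not the implementation of geometric_mean().

```r
# Hypothetical helper: geometric mean via the mean of logs
gm <- function(x) exp(mean(log(x)))

x <- c(0.1, 0.2, 0.4, 1, 5)
lod <- 0.5 # assumed limit of detection

# Replace values below the LOD with the LOD itself
gm_lod <- gm(pmax(x, lod))

# Alternative: replace values below the LOD with LOD / 2
gm_half <- gm(ifelse(x < lod, lod / 2, x))

# Dropping the values below the LOD instead biases the result upwards
gm_dropped <- gm(x[x >= lod])

c(lod = gm_lod, half = gm_half, dropped = gm_dropped)
```

Comparing the three numbers shows how strongly the result depends on the handling of values below the LOD, which is why censored-data methods are preferable when the choice matters.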
A single numeric value representing the geometric mean of the processed vector x, or NA if the resulting vector is empty (e.g., if na.rm = TRUE removes all positive values) or if non-positive values exist when na.rm = FALSE.
# Basic usage with no replacements:
x <- c(1, 2, 3, 4, 5)
geometric_mean(x)

# Replace all values < 0.5 with 0.5 (common in LOD scenarios):
x3 <- c(0.1, 0.2, 0.4, 1, 5)
geometric_mean(x3, replace_value = 0.5, replace = "all")

# Remove zero or negative values, since log(0) = -Inf and log(-1) = NaN
x4 <- c(-1, 0, 1, 2, 3)
geometric_mean(x4, na.rm = TRUE)
A specialized axis guide for date scales that creates nested axis labels by automatically detecting hierarchical patterns in date labels (e.g., separating day-month from year components). This guide is particularly useful for time series data, where the axis can get crowded when showing the full dates. This is similar to the date scale from Excel.
guide_axis_nested_date(
  sep = "[^[:alnum:]]+",
  regular_key = "auto",
  type = "bracket",
  mode = "simple",
  pad_date = NULL,
  oob = "none",
  ...
)
sep |
A regular expression pattern used to split axis labels into
hierarchical components. Default is |
regular_key |
Default is |
type |
The visual type of nested axis guide to create. Options include:
|
mode |
Processing mode for the guide. Default is |
pad_date |
Numeric value controlling the padding around date levels,
i.e. extending the length of the bracket or box or for correctly positioning the fences.
If |
oob |
How to handle out-of-bounds values of the scale labels. Default is |
... |
Additional arguments passed to |
A nested axis guide object that can be used with ggplot2::scale_x_date() etc. or ggplot2::guides().
library(ggplot2)

# Create sample epidemic curve data
epi_data <- data.frame(
  date = rep(as.Date("2023-12-15") + 0:100, times = rpois(101, 2))
)

ggplot(epi_data, aes(x = date)) +
  geom_epicurve(date_resolution = "week") +
  scale_x_date(
    date_breaks = "2 weeks", date_labels = "%d-%b-%Y",
    guide = guide_axis_nested_date()
  )

# Using fence type with ISO week labels
ggplot(epi_data, aes(x = date)) +
  geom_epicurve(date_resolution = "week") +
  scale_x_date(
    date_breaks = "2 weeks", date_labels = "W%V.%G",
    guide = guide_axis_nested_date(type = "fence")
  )

# Using box type with custom padding
ggplot(epi_data, aes(x = date)) +
  geom_epicurve(date_resolution = "month") +
  scale_x_date(
    date_breaks = "1 month", date_labels = "%b.%Y",
    guide = guide_axis_nested_date(type = "box", pad_date = 0.3)
  )

# Custom separator for different label formats
ggplot(epi_data, aes(x = date)) +
  geom_epicurve(date_resolution = "week") +
  scale_x_date(
    date_breaks = "1 week", date_labels = "%d-%b-%Y",
    guide = guide_axis_nested_date(type = "bracket", sep = "-")
  )

# Datetime example (uses the default bracket type)
datetime_data <- data.frame(
  datetime = rep(as.POSIXct("2024-02-05 01:00:00") + 0:50 * 3600,
    times = rpois(51, 3)
  )
)

ggplot(datetime_data, aes(x = datetime)) +
  geom_epicurve(date_resolution = "2 hours") +
  scale_x_datetime(
    date_breaks = "6 hours", date_labels = "%Hh %e.%b",
    limits = c(as.POSIXct("2024-02-04 22:00:00"), NA),
    guide = guide_axis_nested_date()
  )
A subset of the weekly German influenza surveillance data from January 2020 to January 2025.
influenza_germany
A data frame with 1,037 rows and 4 columns:
Reporting Week in "2024-W03" format
Age groups: 00+ for all and 00-14, 15-59 and 60+ for age stratified cases.
Weekly case count
Calculated weekly incidence
License CC-BY 4.0: Robert Koch-Institut (2025): Laborbestätigte Influenzafälle in Deutschland. Dataset. Zenodo. DOI:10.5281/zenodo.14619502. https://github.com/robert-koch-institut/Influenzafaelle_in_Deutschland
library(ggplot2)

influenza_germany |>
  align_dates_seasonal(
    dates_from = ReportingWeek, date_resolution = "isoweek", start = 28
  ) -> df_flu_aligned

ggplot(df_flu_aligned, aes(x = date_aligned, y = Incidence, color = season)) +
  geom_line() +
  facet_wrap(~AgeGroup) +
  theme_bw() +
  theme_mod_rotate_x_axis_labels_45()
Re-export from the scales package.
Can be used to overwrite the default locale of date labels.
label_date_short() only labels the parts of a date that change, i.e. the year is only labelled when the year changes.
See scales::label_date() and scales::label_date_short() for more details.
label_date(format = "%Y-%m-%d", tz = "UTC", locale = NULL)

label_date_short(
  format = c("%Y", "%b", "%d", "%H:%M"),
  sep = "\n",
  leading = "0",
  tz = "UTC",
  locale = NULL
)
format |
For For |
tz |
a time zone name, see |
locale |
Locale to use for day and month names. The default
uses the current locale. Setting this argument requires stringi, and you
can see a complete list of supported locales with
|
sep |
Separator to use when combining date formats into a single string. |
leading |
A string to replace leading zeroes with. Can be |
A character vector of formatted dates.
library(tidyr)
library(outbreaks)
library(ggplot2)

# Change locale of date labels to Italian
sars_canada_2003 |> # SARS dataset from outbreaks
  pivot_longer(starts_with("cases"), names_prefix = "cases_", names_to = "origin") |>
  ggplot(aes(x = date, weight = value, fill = origin)) +
  geom_epicurve(date_resolution = "week") +
  scale_x_date(labels = label_date("%B %Y", locale = "it"), date_breaks = "1 month") +
  scale_y_cases_5er() +
  theme_classic()

# label_date_short()
sars_canada_2003 |> # SARS dataset from outbreaks
  pivot_longer(starts_with("cases"), names_prefix = "cases_", names_to = "origin") |>
  ggplot(aes(x = date, weight = value, fill = origin)) +
  geom_epicurve(date_resolution = "week") +
  scale_x_date(labels = label_date_short(), date_breaks = "1 week") +
  scale_y_cases_5er() +
  theme_classic()
Creates a labeller function that formats numbers in scientific notation using power-of-10 R expressions (i.e., mantissa × 10^exponent). Useful for axis labels in ggplot2 when dealing with large numbers or when you want to emphasize the order of magnitude.
label_power10(
  decimal.mark = NULL,
  digits = 3,
  scale = 1,
  prefix = "",
  suffix = "",
  magnitude_only = FALSE,
  ...
)
decimal.mark |
Character used as decimal separator. If |
digits |
Number of significant digits to show in the mantissa. |
scale |
Scaling factor multiplied to the input values. Default is |
prefix |
Character string to prepend to each label. Default is |
suffix |
Character string to append to each label. Default is |
magnitude_only |
Logical. If |
... |
Additional arguments passed to |
The function converts numbers to scientific notation and then formats them as mathematical expressions using the R expression syntax:
For exponent 0: returns the mantissa as-is (e.g., 5.5)
For exponent 1: it omits the exponent (e.g., 1.5 × 10)
For other exponents: everything is shown (e.g., 1.5 × 10^3)
When magnitude_only = TRUE:
For exponent 0: returns 1
For exponent 1: returns 10
For other exponents (positive or negative): returns 10^exponent
The function handles negative numbers by preserving the sign and supports custom decimal marks, prefixes, and suffixes.
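The decomposition into mantissa and exponent described above can be sketched in base R. The helper split_power10() is hypothetical and only illustrates the arithmetic; it does not build the plotmath expressions that label_power10() returns, and it ignores edge cases such as a mantissa that rounds up to 10.

```r
# Hypothetical helper: split numbers into mantissa and power-of-10 exponent
split_power10 <- function(x, digits = 3) {
  # Exponent is the order of magnitude; treat zero as exponent 0
  exponent <- ifelse(x == 0, 0, floor(log10(abs(x))))
  # Mantissa keeps the sign and is rounded to `digits` significant digits
  mantissa <- signif(x / 10^exponent, digits)
  data.frame(x = x, mantissa = mantissa, exponent = exponent)
}

res <- split_power10(c(1000, 50000, -1000, 5.5, 15))
res
```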
A label function that takes a numeric vector and returns an expression vector suitable for use as axis labels in ggplot2.
library(ggplot2)

# Basic usage with default settings
label_power10()(c(1000, 10000, 100000, -1000))

# Use in ggplot2
ggplot(
  data.frame(x = 1:5, y = c(1, 50000, 75000, 100000, 200000)),
  aes(x, y)
) +
  geom_point() +
  scale_y_continuous(labels = label_power10())

# Use in ggplot2 with options
ggplot(
  data.frame(x = 1:5, y = c(1, 50000, 75000, 100000, 200000)),
  aes(x, y)
) +
  geom_point() +
  scale_y_continuous(labels = label_power10(decimal.mark = ",", digits = 2, suffix = " CFU"))

# Magnitude only for cleaner labels with log scales
ggplot(
  data.frame(x = 1:5, y = c(1000, 10000, 100000, 1000000, 10000000)),
  aes(x, y)
) +
  geom_point() +
  scale_y_log10(labels = label_power10(magnitude_only = TRUE))
Creates a labeller function that removes every n-th label on a ggplot2 axis.
Useful for reducing overlapping labels while keeping the major ticks.
label_skip(n = 2, start = c("left", "right"), labeller = NULL)
n |
Integer. Display every nth label. Default is |
start |
Where to start the pattern. Either |
labeller |
Optional function to transform labels before applying skip pattern.
For example |
A function that takes a vector of labels and returns a vector with skipped labels replaced by empty strings.
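A labeller of this kind can be sketched as a small closure in base R. The function skip_labels() below is a hypothetical stand-in, and its assumption that every n-th label is kept, counted from the chosen side, is one plausible reading of the arguments rather than the package's confirmed behaviour.

```r
# Hypothetical stand-in for a skip labeller
skip_labels <- function(n = 2, start = c("left", "right")) {
  start <- match.arg(start)
  function(labels) {
    labels <- as.character(labels)
    idx <- seq_along(labels)
    # Count from the right-hand side if requested
    if (start == "right") idx <- rev(idx)
    # Keep every n-th label, blank out the rest
    keep <- (idx - 1) %% n == 0
    labels[!keep] <- ""
    labels
  }
}

from_left <- skip_labels()(1:5)
from_right <- skip_labels(start = "right")(1:4)
from_left
from_right
```

Because the skipped labels become empty strings rather than being dropped, the tick marks themselves stay in place.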
library(ggplot2)

# Default skip labels
ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  scale_x_continuous(labels = label_skip())

# Skip date labels, while keeping the ticks
ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  scale_x_date(
    date_breaks = "2 years",
    labels = label_skip(start = "right", labeller = label_date(format = "%Y"))
  ) +
  theme_bw()
This hospital outbreak is inspired by typical hospital outbreaks with resistant 4MRGN bacterial pathogens. These outbreaks start silently, since they are not initially apparent from the patients' symptoms.
linelist_hospital_outbreak
A data frame with 8 rows and 9 columns:
Patient - Patient ID (0-7)
ward_name_1 - Name of first ward where patient stayed
ward_start_of_stay_1 - Start date of stay in first ward
ward_end_of_stay_1 - End date of stay in first ward
ward_name_2 - Name of second ward where patient stayed (if applicable)
ward_start_of_stay_2 - Start date of stay in second ward (if applicable)
ward_end_of_stay_2 - End date of stay in second ward (if applicable)
pathogen_detection_1 - Date of first positive pathogen test
pathogen_detection_2 - Date of second positive pathogen test (if applicable)
Patient details:
Patient 0: Index case (ICU), infected early on but detected June 30, 2024
Patient 1-2: ICU patients, found during initial screening
Patient 3: Case who moved from ICU to general ward prior to the detection of patient 0, potentially linking both outbreak clusters. Detected during extended case search
Patient 4-6: General ward cases, found after Patient 3's detection
Patient 7: General ward case, detected post-discharge by GP, who notified the hospital
library(dplyr)
library(tidyr)
library(ggplot2)

# Transform hospital outbreak line list to long format
linelist_hospital_outbreak |>
  pivot_longer(
    cols = starts_with("ward"),
    names_to = c(".value", "num"),
    names_pattern = "ward_(name|start_of_stay|end_of_stay)_([0-9]+)",
    values_drop_na = TRUE
  ) -> df_stays_long

linelist_hospital_outbreak |>
  pivot_longer(cols = starts_with("pathogen"), values_to = "date") -> df_detections_long

# Create Epi Gantt chart showing ward stays and test dates
ggplot(df_stays_long) +
  geom_epigantt(aes(y = Patient, xmin = start_of_stay, xmax = end_of_stay, color = name)) +
  geom_point(aes(y = Patient, x = date, shape = "Date of pathogen detection"),
    data = df_detections_long
  ) +
  scale_y_discrete_reverse() +
  theme_bw() +
  theme(legend.position = "bottom")
German Population data by state in 2023
population_german_states
A data frame with 2912 rows and 5 columns:
Date: Always "2023-12-31"
Character: Name of the German state
Numeric: Age from 0 to 89. Age 90 includes "90 and above"
Factor: "female" or "male"
Numeric: Population size
© Statistisches Bundesamt (Destatis), Genesis-Online, 2025: Bevölkerung: Bundesländer, Stichtag, Geschlecht, Altersjahre (12411-0013). Data licence Germany (dl-de/by-2-0) https://www-genesis.destatis.de/datenbank/online/statistic/12411/table/12411-0013
# Population pyramid
library(ggplot2)
library(dplyr)

population_german_states |>
  filter(age < 90) |>
  ggplot(aes(y = age, fill = sex, weight = n)) +
  geom_bar_diverging(width = 1) +
  geom_vline(xintercept = 0) +
  scale_x_continuous_diverging() +
  facet_wrap(~state, scales = "free_x") +
  theme_bw(base_size = 8) +
  theme_mod_legend_top()
These scales automatically create symmetrical limits around a centre point (zero by default).
They're useful for diverging continuous variables where the visual encoding should be balanced around the centre point, such as positive and negative values.
They are intended to be used with geom_bar_diverging(), geom_area_diverging() and stat_diverging().
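The notion of symmetrical limits can be made concrete with a short base R sketch. The helper symmetric_limits() and its centre argument are hypothetical names for illustration and are not part of the package.

```r
# Hypothetical helper: limits that are symmetrical around a centre point
symmetric_limits <- function(x, centre = 0) {
  # Largest distance from the centre on either side of the data
  half_width <- max(abs(range(x, na.rm = TRUE) - centre))
  c(centre - half_width, centre + half_width)
}

lims <- symmetric_limits(c(-5, -2, 0, 3, 7))
lims
```

With limits like these, zero sits in the middle of the axis even when the data extend further in one direction than the other.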
scale_x_continuous_diverging(
  name = waiver(), limits = waiver(), labels = NULL, transform = "identity",
  ..., breaks = waiver(), n.breaks = NULL, expand = waiver(),
  position = "bottom"
)

scale_y_continuous_diverging(
  name = waiver(), limits = NULL, labels = NULL, transform = "identity",
  ..., breaks = waiver(), n.breaks = NULL, expand = waiver(),
  position = "left"
)
name |
The name of the scale. Used as the axis or legend title. If
|
limits |
Numeric vector of length two providing limits of the scale.
If |
labels |
Either |
transform |
Defaults to "identity". Use "reverse" to invert the scale. Especially useful to flip the direction of diverging bar charts. |
... |
Other arguments passed on to |
breaks |
One of:
|
n.breaks |
An integer guiding the number of major breaks. The algorithm
may choose a slightly different number to ensure nice break labels. Will
only have an effect if |
expand |
For position scales, a vector of range expansion constants used to add some
padding around the data to ensure that they are placed some distance
away from the axes. Use the convenience function |
position |
For position scales, the position of the axis.
|
A ggplot2 scale object that can be added to a plot.
geom_bar_diverging(), geom_area_diverging(), stat_diverging()
library(ggplot2)

# Create sample data with positive and negative values
df <- data.frame(
  x = c(-5, -2, 0, 3, 7),
  y = c(2, -1, 0, -3, 5)
)

# Basic usage
ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_continuous_diverging() +
  scale_y_continuous_diverging()
A continuous ggplot scale for count data with sane defaults for breaks.
It uses base::pretty() to increase the default number of breaks and prefers breaks at multiples of 5 ("5er" breaks).
Additionally, the first tick (i.e. zero) is aligned to the lower left corner.
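The effect of these settings can be explored by calling base::pretty() directly with the same argument values that appear as defaults in the usage of this scale. The case counts below are made up for illustration, and how the scale applies the result internally is not shown here.

```r
# Made-up weekly case counts
cases <- c(0, 3, 12, 37)

# Breaks with the plain defaults of pretty()
default_breaks <- pretty(cases)

# More breaks with a stronger preference for steps of 5
more_breaks <- pretty(cases, n = 8, min.n = 5, u5.bias = 4)

default_breaks
more_breaks
```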
scale_y_cases_5er(
  name = waiver(), n = 8, min.n = 5, u5.bias = 4, expand = NULL,
  limits = c(0, NA), labels = waiver(), oob = scales::censor,
  na.value = NA_real_, transform = "identity", position = "left",
  sec.axis = waiver(), guide = waiver(), ...
)

scale_x_cases_5er(
  name = waiver(), n = 8, min.n = 5, u5.bias = 4, expand = NULL,
  limits = c(0, NA), labels = waiver(), oob = scales::censor,
  na.value = NA_real_, transform = "identity", position = "bottom",
  sec.axis = waiver(), guide = waiver(), ...
)
name |
The name of the scale. Used as the axis or legend title. If
|
n |
Target number of breaks passed to |
min.n |
Minimum number of breaks passed to |
u5.bias |
The "5-bias" parameter passed to |
expand |
Uses own expansion logic. Use |
limits |
The lower limit defaults to 0 and the upper limit is chosen based on the data.
This is the recommended approach for visualizing case numbers and incidences,
i.e. the scale starts at 0 and is only positive.
To use the default |
labels |
One of:
|
oob |
One of:
|
na.value |
Missing values will be replaced with this value. |
transform |
For continuous scales, the name of a transformation object or the object itself. Built-in transformations include "asn", "atanh", "boxcox", "date", "exp", "hms", "identity", "log", "log10", "log1p", "log2", "logit", "modulus", "probability", "probit", "pseudo_log", "reciprocal", "reverse", "sqrt" and "time". A transformation object bundles together a transform, its inverse,
and methods for generating breaks and labels. Transformation objects
are defined in the scales package, and are called |
position |
For position scales, the position of the axis.
|
sec.axis |
|
guide |
A function used to create a guide or its name. See
|
... |
Additional arguments passed on to |
A ggplot2 scale object that can be added to a plot.
geom_epicurve(), ggplot2::scale_y_continuous(), base::pretty(), theme_mod_remove_minor_grid_y()
library(ggplot2)

data <- data.frame(date = as.Date("2024-01-01") + 0:30)

ggplot(data, aes(x = date)) +
  geom_epicurve(date_resolution = "week") +
  scale_y_cases_5er() +
  theme_mod_remove_minor_grid_y()
scale_y_discrete_reverse() and scale_x_discrete_reverse() are standard discrete 'ggplot2' scales with a reversed order of values. Since the ggplot2 coordinate system starts with 0 in the lower left corner, factors on the y-axis are sorted in descending order by default (i.e. alphabetically from Z to A). With this scale, the y-axis starts with the first factor level at the top, or with the values in correct alphabetical order.
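The reversal itself amounts to reversing the order of the discrete limits, which can be shown in base R. The variable names are illustrative, and the comment about plain ggplot2 describes a comparable manual approach rather than how the package implements these scales.

```r
# Factor levels in their natural order
category <- factor(c("A", "B", "C", "D"))
levels(category)

# Reversed limits: drawn bottom-to-top on a y-axis, "A" ends up at the top
reversed_limits <- rev(levels(category))
reversed_limits

# In plain ggplot2, a comparable effect can be had by passing such a vector
# (or the function `rev`) to the `limits` argument of scale_y_discrete().
```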
scale_y_discrete_reverse(
  name = waiver(), limits = NULL, ..., expand = waiver(), position = "left"
)

scale_x_discrete_reverse(
  name = waiver(), limits = NULL, ..., expand = waiver(), position = "bottom"
)
name |
The name of the scale. Used as the axis or legend title. If
|
limits |
Can be either NULL which uses the default reversed scale values or a character vector which will be reversed. |
... |
Arguments passed on to |
expand |
For position scales, a vector of range expansion constants used to add some
padding around the data to ensure that they are placed some distance
away from the axes. Use the convenience function |
position |
For position scales, the position of the axis.
|
A ggplot2 scale object that can be added to a plot.
geom_epigantt(), ggplot2::scale_y_discrete()
library(ggplot2)

# Create sample data
df <- data.frame(
  category = factor(c("A", "B", "C", "D")),
  value = c(10, 5, 8, 3)
)

# Basic plot with reversed y-axis
ggplot(df, aes(x = value, y = category)) +
  geom_col() +
  scale_y_discrete_reverse()
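For comparison, a similar top-to-bottom ordering can be obtained in plain ggplot2 by passing a function to the `limits` argument of `scale_y_discrete()`. This is only a sketch for illustration and not necessarily how ggsurveillance implements the reversed scales; `df` and `p_rev` are hypothetical names.

```r
library(ggplot2)

# Hypothetical sample data, mirroring the example above
df <- data.frame(
  category = factor(c("A", "B", "C", "D")),
  value = c(10, 5, 8, 3)
)

# limits = rev reverses the default order of the discrete values,
# so "A" is drawn at the top of the y-axis instead of at the bottom.
p_rev <- ggplot(df, aes(x = value, y = category)) +
  geom_col() +
  scale_y_discrete(limits = rev)

p_rev
```

The dedicated reverse scales save this extra argument and keep the intent visible in the plot code.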
Convenience functions to control the legend position for ggplot2. They have to be called after setting the theme.
    theme_mod_disable_legend()

    theme_mod_legend_position(
      position = c("top", "bottom", "left", "right", "none", "inside"),
      position.inside = NULL
    )

    theme_mod_legend_top()

    theme_mod_legend_bottom()

    theme_mod_legend_left()

    theme_mod_legend_right()

    theme_mod_remove_legend_title()
position | Position of the ggplot2 legend. Options are |
position.inside | Coordinates for the legend inside the plot. If set overwrites |
Changes the legend.position of the ggplot2::theme().
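The ordering requirement can be illustrated with a short sketch. A complete theme such as `theme_minimal()` replaces all theme settings, so a legend modification added before it would be lost. The data frame `df` and the object name `p_legend` are hypothetical.

```r
library(ggplot2)
library(ggsurveillance)

# Hypothetical example data: two groups, so that a legend is drawn
df <- data.frame(group = rep(c("A", "B"), times = c(10, 5)))

p_legend <- ggplot(df, aes(x = group, fill = group)) +
  geom_bar() +
  theme_minimal() +          # set the complete theme first ...
  theme_mod_legend_bottom()  # ... then modify the legend position

p_legend
```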
theme_mod_remove_minor_grid(), theme_mod_remove_minor_grid_x() and theme_mod_remove_minor_grid_y() are convenience functions that remove the minor lines of the panel grid. They have to be called after setting the theme.
    theme_mod_remove_minor_grid()

    theme_mod_remove_minor_grid_y()

    theme_mod_remove_minor_grid_x()

    theme_mod_remove_panel_grid()
Changes the panel.grid.minor of the ggplot2::theme().
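Since these functions are described as changing `panel.grid.minor`, the effect of `theme_mod_remove_minor_grid_y()` is presumably close to blanking the corresponding theme element by hand. The sketch below uses plain ggplot2 only and is an assumption about the behaviour, not a copy of the package's implementation; `df` and `p_grid` are hypothetical names.

```r
library(ggplot2)

# Hypothetical example data: one count per day
df <- data.frame(
  date = as.Date("2024-01-01") + 0:30,
  cases = 1:31
)

p_grid <- ggplot(df, aes(x = date, y = cases)) +
  geom_col() +
  theme_bw() +                                    # complete theme first
  theme(panel.grid.minor.y = element_blank())     # then drop minor y grid lines

p_grid
```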
Rotate axis labels by 90°, 45° or any angle. Has to be called after setting the theme.
    theme_mod_rotate_x_axis_labels(
      angle = 90,
      margin_top = 2,
      vjust = 0.4,
      hjust = 0,
      ...
    )

    theme_mod_rotate_x_axis_labels_90(angle = 90, ...)

    theme_mod_rotate_x_axis_labels_45(angle = 45, ...)

    theme_mod_rotate_x_axis_labels_30(angle = 30, ...)

    theme_mod_rotate_x_axis_labels_60(angle = 60, ...)

    theme_mod_rotate_y_axis_labels(angle = 90, hjust = 0.5, vjust = 0, ...)
angle | Angle of rotation. Should be between 10 and 90 degrees. |
margin_top | Used to move the tick labels downwards to prevent text intersecting the x-axis. Increase for angled multiline text (e.g. 5 for two lines at 45°). |
hjust, vjust | Text justification within the rotated text element. Can usually be left at the defaults. |
... | Arguments passed to |
Changes the rotation of the axis labels by modifying the axis.text of the ggplot2::theme().
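A minimal usage sketch, assuming long category labels that would otherwise overlap on the x-axis. The data frame `df` and the object name `p_rotated` are hypothetical; the rotation helper is added after the complete theme, as required above.

```r
library(ggplot2)
library(ggsurveillance)

# Hypothetical example data with long category labels
df <- data.frame(
  region = c("Northern district", "Southern district", "Eastern district"),
  cases = c(12, 7, 9)
)

p_rotated <- ggplot(df, aes(x = region, y = cases)) +
  geom_col() +
  theme_bw() +                          # set the complete theme first ...
  theme_mod_rotate_x_axis_labels_45()   # ... then rotate the x-axis labels

p_rotated
```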
uncount() is provided by the tidyr package, and re-exported by ggsurveillance. See tidyr::uncount() for more details.
uncount() and its alias expand_counts() are complements of dplyr::count(): they take a data.frame with a column of frequencies and duplicate each row according to those frequencies.
    uncount(data, weights, ..., .remove = TRUE, .id = NULL)

    expand_counts(data, weights, ..., .remove = TRUE, .id = NULL)
data | A data frame, tibble, or grouped tibble. |
weights | A vector of weights. Evaluated in the context of |
... | Additional arguments passed on to methods. |
.remove | If |
.id | Supply a string to create a new variable which gives a unique identifier for each created row. |
A data.frame with rows duplicated according to weights.
    library(ggsurveillance)

    df <- data.frame(x = c("a", "b"), n = c(2, 3))

    df |> uncount(n)

    # Or equivalently:
    df |> expand_counts(n)
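The `.remove` and `.id` arguments can be sketched as follows. The example calls `tidyr::uncount()` directly, on the assumption that the re-exported function behaves identically; `expanded` and the column name `copy` are hypothetical.

```r
library(tidyr)

df <- data.frame(x = c("a", "b"), n = c(2, 3))

# Keep the weights column (.remove = FALSE) and number the copies
# created from each original row in a new column called "copy".
expanded <- uncount(df, n, .remove = FALSE, .id = "copy")

expanded
# Five rows in total: two for "a" (copy 1, 2) and three for "b" (copy 1, 2, 3)
```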