| Title: | Lagged Cross-Correlation Analysis of Environmental Time Series |
|---|---|
| Description: | Tools for analysing lagged relationships between environmental variables and ecological or epidemiological time series. The package implements a workflow to aggregate meteorological data over multiple lagged intervals, fit regression models for each lag window, and visualise effect strength and direction using cross-correlation maps (CCM). |
| Authors: | Nicolas Moiroux [aut, cre] (ORCID: <https://orcid.org/0000-0001-6755-6167>), Colombine Bartholomée [aut] (ORCID: <https://orcid.org/0000-0001-7291-5195>), Paul Taconet [aut] (ORCID: <https://orcid.org/0000-0001-7429-7204>) |
| Maintainer: | Nicolas Moiroux <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.8 |
| Built: | 2026-02-12 14:15:43 UTC |
| Source: | https://github.com/Nmoiroux/ecoXCorr |
This function computes aggregated values of one or several meteorological time series over all possible lagged time intervals defined relative to one or more reference dates.
aggregate_lagged_intervals( data, date_col, value_cols, ref_date, interval = 1, max_lag, shift = 0, funs = list(mean = mean, min = min, max = max, sum = sum), na.rm = TRUE )aggregate_lagged_intervals( data, date_col, value_cols, ref_date, interval = 1, max_lag, shift = 0, funs = list(mean = mean, min = min, max = max, sum = sum), na.rm = TRUE )
data |
A data.frame containing the meteorological time series. |
date_col |
Character string giving the name of the date column in
|
value_cols |
Character vector giving the names of numeric variables to be aggregated (e.g. rainfall, temperature). |
ref_date |
Vector of reference dates |
interval |
Integer giving the length of the base time interval |
max_lag |
Integer giving the maximum lag (number of intervals) |
shift |
Integer specifying how many days When This parameter is useful when the predictor measured at or immediately before the sampling date should not contribute to the lagged summary. |
funs |
Named list of aggregation functions to apply to each variable.
Each function must accept a numeric vector as first argument. The names
of the list are used to construct output column names
(e.g. |
na.rm |
Logical indicating whether missing values should be removed
before aggregation. Passed to the aggregation functions (default: |
For each reference date d, lag windows are
constructed backward in time as
,
where x shifts the reference date d back in time
(excluding d when x >= 1), i is the base interval length (in days),
m is the maximum number of intervals considered,
and k, l range from 1 to m with k >= l.
The function supports multiple reference dates, multiple variables, and multiple aggregation functions, and returns all combinations as additional columns in the output data frame.
Reference dates for which at least one interval contains no observations are reported in the console as having missing data. Reference dates for which at least one interval partially lies outside the temporal bounds of the input time series are reported as having truncated intervals.
A data.frame with one row per reference date and lag window, containing:
date: reference date
start, end: start (inclusive) and end (exclusive) of
the aggregation interval
lag_start, lag_end: lag indices defining the interval
One column per combination of variable and aggregation function
(e.g. rain_mean, temperature_sum)
sampling_dates <- unique(albopictusMPL2023$date) met_agg <- aggregate_lagged_intervals( data = meteoMPL2023, date_col = "date", value_cols = c("rain_sum", "temp_mean"), ref_date = sampling_dates, interval = 7, max_lag = 8 ) head(met_agg)sampling_dates <- unique(albopictusMPL2023$date) met_agg <- aggregate_lagged_intervals( data = meteoMPL2023, date_col = "date", value_cols = c("rain_sum", "temp_mean"), ref_date = sampling_dates, interval = 7, max_lag = 8 ) head(met_agg)
This dataset contains entomological sampling records of female *Aedes albopictus* collected in Montpellier (France) during 2023. The data originate from GBIF and correspond to a zero-truncated subset (i.e. only positive captures) of adult female mosquitoes collected using fixed traps across different areas of the city.
albopictusMPL2023albopictusMPL2023
A data.frame with the following variables:
Scientific name of the species (*Aedes albopictus*).
Number of female individuals captured during the sampling event.
Original event date as provided by GBIF.
Identifier of the trap used for sampling.
Identifier of the area where the trap is located.
Sampling date (class Date).
Unique sampling event identifier.
The dataset was derived from a GBIF annotated archive (DOI: 10.15468/4qafbu) and processed to extract sampling dates, trap identifiers, and spatial grouping variables.
Only records corresponding to adult females of *Aedes albopictus* were retained. Observations with zero counts were removed, resulting in a zero-truncated dataset suitable for abundance modelling.
The dataset can be used to illustrate lagged associations between
environmental variables and mosquito abundance, for example in
conjunction with the functions aggregate_lagged_intervals(),
fit_models_by_lag(), and plotCCM().
GBIF occurrence data: doi:10.15468/4qafbu
This wrapper function combines aggregate_lagged_intervals
and fit_models_by_lag into a single workflow. It aggregates
environmental predictors over multiple lag windows relative to sampling
dates, merges them with a response dataset, and fits regression models
separately for each lag window.
ecoXCorr( meteo_data, response_data, date_col_meteo = "date", date_col_resp = "date", value_cols, agg_fun = "mean", interval = 1, max_lag, shift = 0, response, covariates = character(0), random = character(0), family = "gaussian", na.rm = TRUE, track = FALSE, ... )ecoXCorr( meteo_data, response_data, date_col_meteo = "date", date_col_resp = "date", value_cols, agg_fun = "mean", interval = 1, max_lag, shift = 0, response, covariates = character(0), random = character(0), family = "gaussian", na.rm = TRUE, track = FALSE, ... )
meteo_data |
Data frame containing meteorological time series. |
response_data |
Data frame containing the response variable and sampling dates. |
date_col_meteo |
Name of the date column in |
date_col_resp |
Name of the date column in |
value_cols |
Name of one meteorological variables to aggregate. |
agg_fun |
Name (character string) of the aggregation function. Function must accept a numeric vector as first argument. Default to |
interval |
Length of the base lag interval (in days). |
max_lag |
Maximum number of lag intervals. |
shift |
Integer specifying how many days the response date should be shifted back in time when constructing lag windows. |
response |
Name of the response variable. |
covariates |
Optional fixed-effect covariates (adjustment covariates). |
random |
Optional random-effects structure (passed to
|
family |
Model family (GLM or glmmTMB). |
na.rm |
Logical indicating whether NA values are removed before aggregation. |
track |
Logical; if TRUE, prints lag windows during model fitting. |
... |
Additional arguments passed to the model fitting function. |
Both fixed-effect and mixed-effect models are supported.
The modelling function used depends on the random and family arguments:
random is not specified (default): glm
random is note an empty string OR family is a valid glmmTMB family: glmmTMB
For mixed-effects models, marginal (Nakagawa) is returned. For fixed-effects
models, appropriate is used (see ?performance::r2()).
Depending on model specification (depending on random and/or family), for glmmTMB models may not be computed: the returned error or warning is printed in the console.
For each unique combination of lag_start and lag_end, the
function:
Subsets the data to the corresponding lag window,
Removes rows with missing values in the response or predictor,
Fits the specified model,
Extracts beta parameter of the linear predictor,
Extracts the p-value of the predictor effect,
Computes AIC for the specified and null models (i.e. excluding 'predictors' in the fixed part),
Computes the appropriate model (marginal Nakagawa for mixed models),
Records the sign of the estimated effect and the sample size,
Computes delta-AIC and Akaike weight of each model (van de Pol et al. 2016),
Computes adjusted p-value using the False Discovery Rate method (Benjamini and Yekutieli 2001)
The returned table is suitable for lag-window screening, heatmap visualisation, or sensitivity analyses in epidemiological or ecological studies.
A data frame with one row per lag window, containing:
Start lag index of the aggregation window.
End lag index of the aggregation window.
Name of the response variable.
Name of the predictor variable.
Model's coefficient of determination (marginal for mixed models).
Estimated beta parameter of the linear predictor.
Sign of the estimated predictor effect (-1 or +1).
AIC reduction compared to the null model.
Number of observations used to fit the model.
P-value associated with the predictor effect.
Akaike weight.
P-value of the estimated predictor effect, adjusted for multiple testing (false discovery rate method.
Benjamini Y, Yekutieli D (2001).
“The Control of the False Discovery Rate in Multiple Testing under Dependency.”
The Annals of Statistics, 29(4), 1165–1188.
ISSN 0090-5364, 2168-8966, doi:10.1214/aos/1013699998, 2025-01-29.
van de Pol M, Bailey LD, McLean N, Rijsdijk L, Lawson CR, Brouwer L (2016).
“Identifying the Best Climatic Predictors in Ecology and Evolution.”
Methods in Ecology and Evolution, 7(10), 1246–1257.
doi:10.1111/2041-210X.12590, 2026-02-03.
res_glmm <- ecoXCorr( meteo_data = meteoMPL2023, response_data = albopictusMPL2023, date_col_meteo = "date", date_col_resp = "date", value_cols = "rain_sum", agg_fun = "sum", response = "individualCount", interval = 7, max_lag = 8, random = "(1|area/trap)", family = "nbinom2" ) head(res_glmm)res_glmm <- ecoXCorr( meteo_data = meteoMPL2023, response_data = albopictusMPL2023, date_col_meteo = "date", date_col_resp = "date", value_cols = "rain_sum", agg_fun = "sum", response = "individualCount", interval = 7, max_lag = 8, random = "(1|area/trap)", family = "nbinom2" ) head(res_glmm)
This function launches an interactive Shiny application allowing users to run a complete ecoXCorr workflow (aggregation, lagged modelling and visualisation) using either example datasets included in the package or user-provided data.
ecoXCorrApp()ecoXCorrApp()
This function fits a regression model separately for each lag window
defined by the lag_start and lag_end columns of the input
data frame. For each lag window, the model is fitted using
observations corresponding to different reference dates (date),
and summary statistics (betas, sign of effect, , delta AIC, Akaike weight, sample size, p-value, p-value adjusted for multiple testing) are returned
for the specified predictor.
fit_models_by_lag( data, response, predictors, covariates = character(0), random = character(0), family = "gaussian", min_n = 10, track = F, ... )fit_models_by_lag( data, response, predictors, covariates = character(0), random = character(0), family = "gaussian", min_n = 10, track = F, ... )
data |
A data frame containing, at minimum, the columns
|
response |
Character string giving the name of the response variable. |
predictors |
Character vector of predictor names. Currently, only a single predictor is supported; providing more than one predictor will result in an error. |
covariates |
Character vector of predictor names to be included in the fixed part of the model to control for their effect. |
random |
Optional character string specifying random-effects terms or
covariance structure to be added to the model formula (without a leading |
family |
Character string. The name of a family function
to be used in GLM or GLMM models. Default to "gaussian" (Linear model).
see |
min_n |
Minimum number of observations required to fit a model. (Currently not enforced; retained for future extensions.) |
track |
If TRUE, lag window is printed in the console before model fitting. |
... |
Additional arguments passed to the underlying modelling
function ( |
Both fixed-effect and mixed-effect models are supported.
The modelling function used depends on the random and family arguments:
random is not specified (default): glm
random is note an empty string OR family is a valid glmmTMB family: glmmTMB
For mixed-effects models, marginal (Nakagawa) is returned. For fixed-effects
models, appropriate is used (see ?performance::r2()).
Depending on model specification (depending on random and/or family), for glmmTMB models may not be computed: the returned error or warning is printed in the console.
For each unique combination of lag_start and lag_end, the
function:
Subsets the data to the corresponding lag window,
Removes rows with missing values in the response or predictor,
Fits the specified model,
Extracts beta parameter of the linear predictor,
Extracts the p-value of the predictor effect,
Computes AIC for the specified and null models (i.e. excluding 'predictors' in the fixed part),
Computes the appropriate model (marginal Nakagawa for mixed models),
Records the sign of the estimated effect and the sample size,
Computes delta-AIC and Akaike weight of each model (van de Pol et al. 2016),
Computes adjusted p-value using the False Discovery Rate method (Benjamini and Yekutieli 2001)
The returned table is suitable for lag-window screening, heatmap visualisation, or sensitivity analyses in epidemiological or ecological studies.
A data frame with one row per lag window, containing:
Start lag index of the aggregation window.
End lag index of the aggregation window.
Name of the response variable.
Name of the predictor variable.
Model's coefficient of determination (marginal for mixed models).
Estimated beta parameter of the linear predictor.
Sign of the estimated predictor effect (-1 or +1).
AIC reduction compared to the null model.
Number of observations used to fit the model.
P-value associated with the predictor effect.
Akaike weight.
P-value of the estimated predictor effect, adjusted for multiple testing (false discovery rate method.
Benjamini Y, Yekutieli D (2001).
“The Control of the False Discovery Rate in Multiple Testing under Dependency.”
The Annals of Statistics, 29(4), 1165–1188.
ISSN 0090-5364, 2168-8966, doi:10.1214/aos/1013699998, 2025-01-29.
van de Pol M, Bailey LD, McLean N, Rijsdijk L, Lawson CR, Brouwer L (2016).
“Identifying the Best Climatic Predictors in Ecology and Evolution.”
Methods in Ecology and Evolution, 7(10), 1246–1257.
doi:10.1111/2041-210X.12590, 2026-02-03.
sampling_dates <- unique(albopictusMPL2023$date) met_agg <- aggregate_lagged_intervals( data = meteoMPL2023, date_col = "date", value_cols = c("rain_sum", "temp_mean"), ref_date = sampling_dates, interval = 7, max_lag = 8 ) albo_lag <- merge(met_agg, albopictusMPL2023, by = "date", all = TRUE) res_glm <- fit_models_by_lag( data = albo_lag, response = "individualCount", predictors = "rain_sum_sum", random = "", family = "poisson" ) head(res_glm)sampling_dates <- unique(albopictusMPL2023$date) met_agg <- aggregate_lagged_intervals( data = meteoMPL2023, date_col = "date", value_cols = c("rain_sum", "temp_mean"), ref_date = sampling_dates, interval = 7, max_lag = 8 ) albo_lag <- merge(met_agg, albopictusMPL2023, by = "date", all = TRUE) res_glm <- fit_models_by_lag( data = albo_lag, response = "individualCount", predictors = "rain_sum_sum", random = "", family = "poisson" ) head(res_glm)
This dataset contains daily meteorological variables for Montpellier (Fréjorgues Airport, WMO station 07643) during the year 2023. The data were derived from 3-hourly SYNOP observations provided by Météo-France and aggregated to daily summaries.
meteoMPL2023meteoMPL2023
A data.frame with one row per day and the following variables:
Date of observation (class Date).
Daily mean wind speed (m·s⁻¹).
Daily minimum wind speed (m·s⁻¹).
Daily maximum wind speed (m·s⁻¹).
Daily mean air temperature (°C).
Daily minimum air temperature (°C).
Daily maximum air temperature (°C).
Daily mean dew point temperature (°C).
Daily minimum dew point temperature (°C).
Daily maximum dew point temperature (°C).
Daily mean relative humidity (%).
Daily minimum relative humidity (%).
Daily maximum relative humidity (%).
Daily mean atmospheric pressure (Pa).
Daily minimum atmospheric pressure (Pa).
Daily maximum atmospheric pressure (Pa).
Daily mean precipitation (mm).
Daily minimum precipitation (mm).
Daily maximum precipitation (mm).
Daily total precipitation (mm).
The dataset includes temperature, humidity, wind speed, atmospheric pressure and precipitation, and is intended for use in studies of lagged associations between environmental conditions and ecological or epidemiological time series.
Original 3-hourly SYNOP observations were filtered to retain data from Montpellier–Fréjorgues Airport (WIGOS station 0-20000-0-07643). Air temperature and dew point temperature were converted from Kelvin to degrees Celsius. Negative precipitation values were set to zero prior to aggregation.
Daily statistics (mean, minimum and maximum) were computed for all variables. For precipitation, daily totals are also provided.
This dataset is designed to be used in combination with
aggregate_lagged_intervals() to generate lagged environmental
predictors for ecological or epidemiological modelling.
Météo-France SYNOP data, distributed via data.gouv.fr: https://meteo.data.gouv.fr/datasets/686f8595b351c06a3a790867
This function visualises the strength and direction of associations between
a response variable and a lagged predictor across multiple lag windows (Curriero et al. 2005), using
the output of fit_models_by_lag. The resulting plot is a
two-dimensional "cross-correlation map", where each tile represents a
lag window defined by lag_start and lag_end.
plotCCM( data, model_outcome = c("d_aic", "R2sign", "R2", "betas", "weight"), threshold_p = 1 )plotCCM( data, model_outcome = c("d_aic", "R2sign", "R2", "betas", "weight"), threshold_p = 1 )
data |
A data.frame produced by |
model_outcome |
Character string specifying the model's outcomes to plot. Either:
|
threshold_p |
Numeric value giving the p-value threshold above which
associations are masked (set to |
The colour of each tile corresponds to model_outcome. Positive associations are shown in red, negative
associations in blue, and non-significant or filtered values (using threshold_p) are shown in grey.
The lag window yielding the maximum absolute value of model_outcome is highlighted
with a coloured border.
The x-axis corresponds to lag_start (displayed in reverse order), and
the y-axis corresponds to lag_end. Tiles are coloured using a diverging
colour scale centred on zero. Lag windows with p_value >= threshold_p
are not displayed and appear in grey.
This function does not perform any modelling itself; it is intended solely
for visualising results obtained from fit_models_by_lag.
A ggplot2 object representing the cross-correlation map.
Curriero FC, Shone SM, Glass GE (2005). “Cross Correlation Maps: A Tool for Visualizing and Modeling Time Lagged Associations.” Vector Borne and Zoonotic Diseases (Larchmont, N.Y.), 5(3), 267–275. ISSN 1530-3667, doi:10.1089/vbz.2005.5.267.
res_glm <- ecoXCorr( meteo_data = meteoMPL2023, response_data = albopictusMPL2023, date_col_meteo = "date", date_col_resp = "date", value_cols = "rain_sum", agg_fun = "sum", response = "individualCount", interval = 7, max_lag = 8, family = "poisson" ) plotCCM(res_glm, model_outcome = "R2sign", threshold_p = 0.05) plotCCM(res_glm, model_outcome = "R2") plotCCM(res_glm, model_outcome = "d_aic") plotCCM(res_glm, model_outcome = "betas", threshold_p = 0.05)res_glm <- ecoXCorr( meteo_data = meteoMPL2023, response_data = albopictusMPL2023, date_col_meteo = "date", date_col_resp = "date", value_cols = "rain_sum", agg_fun = "sum", response = "individualCount", interval = 7, max_lag = 8, family = "poisson" ) plotCCM(res_glm, model_outcome = "R2sign", threshold_p = 0.05) plotCCM(res_glm, model_outcome = "R2") plotCCM(res_glm, model_outcome = "d_aic") plotCCM(res_glm, model_outcome = "betas", threshold_p = 0.05)