
Prepare Person-Time and Event Data for SIR/IRR Calculations
Source: R/poisson_analysis.R
pirr_data.Rd
This function merges exposure, response, and population-level data to produce an aggregated dataset for person-time and event rate analysis (e.g., Standardized Incidence Ratios [SIR] or Incidence Rate Ratios [IRR]). It compares disease occurrence across exposure time windows using registry-style longitudinal data.
Arguments
- exposure_diagnoses
A `data.frame` containing exposure diagnoses. Must include columns `ID`, `DATE`, and `DG`. Typically created with `search_diagnoses()`.
- response_diagnoses
A `data.frame` containing response diagnoses. Must include columns `ID`, `DATE`, and `DG`. Typically created with `search_diagnoses()`.
- pop_dates
A `data.frame` with population registry information, including `ID`, `DATE_BIRTH`, `DATE_DEATH`, and `DATE_MIGRATION`. Usually from `classify_population()`.
- all_cases
Logical; if `TRUE`, follow-up continues after the first response case. If `FALSE`, follow-up stops at the first response diagnosis.
- censoring_age
Numeric vector (length 2) specifying the lower and upper ages for follow-up inclusion (e.g., `c(50, 90)`).
- censoring_date
A `Date` vector (length 2) defining the administrative start and end of follow-up (e.g., `c(as.Date("1960-01-01"), as.Date("2022-12-31"))`).
- custom_responses
Optional named list defining custom response diagnosis groupings. For example: `list(Any_fracture = "ankle+forearm+hip+humerus+vertebral", Osteoporotic = "forearm+hip+humerus+vertebral", Hip = "hip")`.
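The `custom_responses` format can be illustrated with a short sketch. It assumes that each `+`-separated component (e.g. `hip`) corresponds to a response label in the data; how `pirr_data()` parses these strings internally is not documented here, and `response_groups` and `groups_for()` are hypothetical names used only for illustration.

```r
# Grouping definitions copied from the argument description above
custom_responses <- list(
  Any_fracture = "ankle+forearm+hip+humerus+vertebral",
  Osteoporotic = "forearm+hip+humerus+vertebral",
  Hip          = "hip"
)

# Split each definition into its components.
# fixed = TRUE treats "+" as a literal character, not a regex quantifier.
response_groups <- lapply(
  custom_responses,
  function(x) strsplit(x, "+", fixed = TRUE)[[1]]
)

# Hypothetical helper: names of the custom groups that contain a given label
groups_for <- function(dg) {
  in_group <- vapply(response_groups, function(g) dg %in% g, logical(1))
  names(response_groups)[in_group]
}

groups_for("ankle")
groups_for("hip")
```

Under this reading, a single response label can contribute to several custom columns at once, because the groupings overlap.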
Value
A `data.frame` summarizing:
- pyrs
Person-years within each exposure and age stratum.
- Death
Count of deaths within each stratum.
- Diagnosis counts
Optional columns for each response diagnosis or custom grouping defined in `custom_responses`.
- caika
Exposure time window (`<1y`, `1–4y`, `5–9y`, `10–14y`, or `15+y`).
- Age
Age group at risk.
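As a rough illustration of how an aggregated table of this shape might be used downstream, the sketch below computes crude rates and fits a Poisson model with a person-year offset. The data frame `agg` is a toy stand-in with invented numbers, the `Age` labels and the `Hip` count column are assumptions, and the choice of reference levels is arbitrary; this is not the package's own analysis code.

```r
# Toy stand-in for the aggregated output; all values are invented
agg <- data.frame(
  caika = rep(c("<1y", "1–4y", "5–9y"), each = 2),
  Age   = rep(c("50-59", "60-69"), times = 3),
  pyrs  = c(1200, 900, 4100, 3300, 3800, 2900),
  Hip   = c(6, 9, 12, 20, 10, 16)
)

# Crude incidence rate per 1,000 person-years in each stratum
agg$rate_per_1000 <- 1000 * agg$Hip / agg$pyrs

# Fix the order of the exposure windows so "<1y" is the reference level
agg$caika <- factor(agg$caika, levels = c("<1y", "1–4y", "5–9y"))

# Poisson regression with log person-years as offset
fit <- glm(
  Hip ~ caika + Age + offset(log(pyrs)),
  family = poisson(),
  data   = agg
)

# Exponentiated coefficients: rate ratios relative to the reference levels
irr <- exp(coef(fit))
irr
```

The offset term is what turns a model of counts into a model of rates, which is why the output keeps `pyrs` alongside the event counts.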
Details
The function requires the **heaven** package for Lexis splitting utilities. See: [https://github.com/tagteam/heaven](https://github.com/tagteam/heaven)
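Because **heaven** is hosted on GitHub, it may be worth checking for it before calling the function. The sketch below only tests for the package and prints a hint; `remotes::install_github()` is one common way to install from GitHub and is suggested here as an assumption, not as the package's documented installation route.

```r
# Check for the heaven package without attempting to install it
heaven_available <- requireNamespace("heaven", quietly = TRUE)

if (!heaven_available) {
  message(
    "Package 'heaven' is not installed. One common way to install a GitHub ",
    "package is remotes::install_github(\"tagteam/heaven\")."
  )
}
```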
The function performs the following steps:
1. Extracts and merges first exposure and response diagnoses per individual.
2. Computes time differences between exposure and response dates.
3. Splits follow-up time using `heaven::lexisSeq()` and `heaven::lexisTwo()`.
4. Categorizes person-time into exposure windows (`<1y`, `1–4y`, `5–9y`, `10–14y`, `15+y`).
5. Optionally aggregates diagnoses using custom groupings from `custom_responses`.
6. Restricts follow-up to the age range specified in `censoring_age`.
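The first-diagnosis, time-difference, and window-categorization steps can be sketched in base R. This is a simplified illustration on invented toy data, not the function's implementation: the actual splitting of person-time is done with the **heaven** Lexis utilities, and the window boundaries used here (for example, `1–4y` meaning at least 1 and under 5 years) are an assumption. `first_per_id()` is a hypothetical helper.

```r
# Toy inputs with the documented columns ID, DATE, DG (values invented)
exposure <- data.frame(
  ID   = c(1, 1, 2, 3),
  DATE = as.Date(c("2001-03-01", "2004-06-15", "2010-01-10", "1995-09-30")),
  DG   = "exposure"
)
response <- data.frame(
  ID   = c(1, 2, 2, 3),
  DATE = as.Date(c("2003-03-01", "2010-07-01", "2012-02-01", "2012-09-30")),
  DG   = c("hip", "forearm", "hip", "ankle")
)

# Hypothetical helper: keep the earliest row per individual
first_per_id <- function(d) {
  d <- d[order(d$ID, d$DATE), ]
  d[!duplicated(d$ID), ]
}

first_exp  <- first_per_id(exposure)[, c("ID", "DATE")]
first_resp <- first_per_id(response)[, c("ID", "DATE", "DG")]
names(first_exp)[2]  <- "EXP_DATE"
names(first_resp)[2] <- "RESP_DATE"

# One row per individual with both dates
merged <- merge(first_exp, first_resp, by = "ID")

# Time from first exposure to first response, in years
days <- as.numeric(difftime(merged$RESP_DATE, merged$EXP_DATE, units = "days"))
merged$years_since_exposure <- days / 365.25

# Assign each interval to an exposure window (left-closed intervals)
merged$caika <- cut(
  merged$years_since_exposure,
  breaks = c(0, 1, 5, 10, 15, Inf),
  labels = c("<1y", "1–4y", "5–9y", "10–14y", "15+y"),
  right  = FALSE
)

merged
```

In this sketch a response dated before the exposure yields a negative interval and therefore `NA` from `cut()`; how the function itself treats such cases is not stated in this documentation.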
Examples
if (FALSE) { # \dontrun{
  result <- pirr_data(
    exposure_diagnoses = exposure_data,
    response_diagnoses = response_data,
    pop_dates = population_data,
    censoring_age = c(50, 90),
    censoring_date = c(as.Date("1960-01-01"), as.Date("2023-12-31")),
    custom_responses = list(
      Any_fracture = "ankle+forearm+hip+humerus+vertebral",
      Osteoporotic = "forearm+hip+humerus+vertebral",
      Hip = "hip"
    )
  )
} # }
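The example above takes `population_data` as given. The sketch below shows what a table with the documented columns might look like and how `censoring_age` and `censoring_date` could jointly bound each person's follow-up. All values are invented, `date_at_age()` is a hypothetical helper that approximates age with 365.25-day years, and ending follow-up at death or migration is an assumption about the function's behavior rather than something this page confirms.

```r
# Hypothetical population table with the documented columns (values invented)
population_data <- data.frame(
  ID             = 1:3,
  DATE_BIRTH     = as.Date(c("1940-05-01", "1955-11-20", "1930-02-14")),
  DATE_DEATH     = as.Date(c(NA, NA, "2005-08-09")),
  DATE_MIGRATION = as.Date(c(NA, "2015-03-01", NA))
)

censoring_age  <- c(50, 90)
censoring_date <- c(as.Date("1960-01-01"), as.Date("2023-12-31"))

# Hypothetical helper: approximate date at which a person reaches a given age
date_at_age <- function(birth, age) birth + round(age * 365.25)

# Follow-up starts at the later of the lower age limit and the
# administrative start date
fu_start <- pmax(
  date_at_age(population_data$DATE_BIRTH, censoring_age[1]),
  censoring_date[1]
)

# Follow-up ends at the earliest of death, migration, the upper age limit,
# and the administrative end date (missing dates are ignored)
fu_end <- pmin(
  population_data$DATE_DEATH,
  population_data$DATE_MIGRATION,
  date_at_age(population_data$DATE_BIRTH, censoring_age[2]),
  censoring_date[2],
  na.rm = TRUE
)

data.frame(ID = population_data$ID, fu_start, fu_end)
```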