Calculate quantile-quantile data comparing the distribution of a variable between treatment groups. This function computes the quantiles for both groups and returns a tidy data frame suitable for plotting or further analysis.
Usage
qq(
  .data,
  .var,
  .group,
  .wts = NULL,
  quantiles = seq(0.01, 0.99, 0.01),
  include_observed = TRUE,
  treatment_level = NULL,
  na.rm = FALSE
)
Arguments
- .data
A data frame containing the variables.
- .var
Variable to compute quantiles for. Supports tidyselect syntax.
- .group
Column name of treatment/group variable. Supports tidyselect syntax.
- .wts
Optional weighting variable(s). Can be unquoted variable names (supports tidyselect syntax), a character vector, or NULL. Multiple weights can be provided to compare different weighting schemes. Default is NULL (unweighted).
- quantiles
Numeric vector of quantiles to compute. Default is seq(0.01, 0.99, 0.01) for 99 quantiles.
- include_observed
Logical. If using .wts, also compute observed (unweighted) quantiles? Defaults to TRUE.
- treatment_level
The reference treatment level to use for comparisons. If NULL (default), uses the last level for factors or the maximum value for numeric variables.
- na.rm
Logical; if TRUE, drop NA values before computation.
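The default rule for treatment_level can be illustrated with a minimal sketch. The helper name resolve_treatment_level is hypothetical and is not part of the package; it only mirrors the documented behaviour (last level for factors, maximum value for numeric variables) and may differ from the internal implementation.

```r
# Hypothetical helper, not a package function: mirrors the documented default
# for `treatment_level`.
resolve_treatment_level <- function(group, treatment_level = NULL) {
  # An explicit choice always wins
  if (!is.null(treatment_level)) {
    return(treatment_level)
  }
  if (is.factor(group)) {
    # Factors: use the last level
    lv <- levels(group)
    return(lv[length(lv)])
  }
  # Numeric variables: use the maximum observed value
  max(group, na.rm = TRUE)
}

resolve_treatment_level(factor(c("no", "yes", "no")))  # "yes"
resolve_treatment_level(c(0, 1, 1, 0))                 # 1
```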
Value
A tibble with columns:
- method
Factor (printed as <fct> in the examples). The weighting method ("observed" or the weight variable name).
- quantile
Numeric. The quantile probability (0-1).
- treated_quantiles
Numeric. The quantile value for the treatment group.
- untreated_quantiles
Numeric. The quantile value for the control group.
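As a rough illustration of how these columns map onto a QQ plot, the sketch below uses base R graphics. The data frame qq_data is an illustrative stand-in that copies the first five rows of the observed output shown under Examples; in practice it would be the tibble returned by qq().

```r
# Illustrative stand-in for the returned tibble (first five "observed" rows
# from the Examples output); `qq_data` is an assumed name.
qq_data <- data.frame(
  method = factor("observed"),
  quantile = c(0.01, 0.02, 0.03, 0.04, 0.05),
  treated_quantiles = c(25, 25, 26, 26, 27),
  untreated_quantiles = c(25, 25, 25, 25.5, 26)
)

# Untreated quantiles on the x-axis, treated quantiles on the y-axis
plot(
  qq_data$untreated_quantiles,
  qq_data$treated_quantiles,
  xlab = "Untreated quantiles",
  ylab = "Treated quantiles"
)
# Points on the dashed identity line indicate matching distributions
abline(a = 0, b = 1, lty = 2)
```

Which group goes on which axis is a presentation choice; the identity line is what makes departures between the two distributions visible.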
Details
This function computes the data needed for quantile-quantile plots by calculating corresponding quantiles from two distributions. The computation uses the inverse of the empirical cumulative distribution function (ECDF). For weighted data, it first computes the weighted ECDF and then inverts it to obtain quantiles.
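The weighted ECDF inversion described above can be sketched as follows. The helper weighted_quantile is hypothetical and uses a simple step-function inverse; the package may handle ties or interpolation differently, so values need not match the output of qq() exactly.

```r
# Hypothetical helper, not a package function: weighted quantiles obtained by
# inverting a weighted ECDF (step-function inverse, no interpolation).
weighted_quantile <- function(x, probs, w = rep(1, length(x))) {
  ord <- order(x)
  x <- as.numeric(x[ord])
  w <- w[ord]
  # Weighted ECDF evaluated at each sorted value
  cdf <- cumsum(w) / sum(w)
  # Inverse ECDF: smallest x whose cumulative weight share reaches p
  vapply(probs, function(p) x[which(cdf >= p)[1]], numeric(1))
}

x <- c(1, 2, 3, 4)
weighted_quantile(x, c(0.25, 0.5, 0.75))      # 1 2 3
weighted_quantile(x, 0.5, w = c(1, 1, 1, 5))  # 4
```

With equal weights this reduces to the ordinary empirical quantile; placing more weight on the largest value pulls the median up to 4 in the second call.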
Examples
# Basic QQ data (observed only)
qq(nhefs_weights, age, qsmk)
#> # A tibble: 99 × 4
#>    method   quantile treated_quantiles untreated_quantiles
#>    <fct>       <dbl>             <dbl>               <dbl>
#>  1 observed     0.01                25                25
#>  2 observed     0.02                25                25
#>  3 observed     0.03                26                25
#>  4 observed     0.04                26                25.5
#>  5 observed     0.05                27                26
#>  6 observed     0.06                27                26
#>  7 observed     0.07                28                26
#>  8 observed     0.08                28                27
#>  9 observed     0.09                29                27
#> 10 observed     0.1                 29                28
#> # ℹ 89 more rows
# With weighting
qq(nhefs_weights, age, qsmk, .wts = w_ate)
#> # A tibble: 198 × 4
#>    method   quantile treated_quantiles untreated_quantiles
#>    <fct>       <dbl>             <dbl>               <dbl>
#>  1 observed     0.01                25                25
#>  2 observed     0.02                25                25
#>  3 observed     0.03                26                25
#>  4 observed     0.04                26                25.5
#>  5 observed     0.05                27                26
#>  6 observed     0.06                27                26
#>  7 observed     0.07                28                26
#>  8 observed     0.08                28                27
#>  9 observed     0.09                29                27
#> 10 observed     0.1                 29                28
#> # ℹ 188 more rows
# Compare multiple weighting schemes
qq(nhefs_weights, age, qsmk, .wts = c(w_ate, w_att))
#> # A tibble: 297 × 4
#>    method   quantile treated_quantiles untreated_quantiles
#>    <fct>       <dbl>             <dbl>               <dbl>
#>  1 observed     0.01                25                25
#>  2 observed     0.02                25                25
#>  3 observed     0.03                26                25
#>  4 observed     0.04                26                25.5
#>  5 observed     0.05                27                26
#>  6 observed     0.06                27                26
#>  7 observed     0.07                28                26
#>  8 observed     0.08                28                27
#>  9 observed     0.09                29                27
#> 10 observed     0.1                 29                28
#> # ℹ 287 more rows