Compute QQ plot data for weighted and unweighted samples

Calculate quantile-quantile data comparing the distribution of a variable between treatment groups. This function computes quantiles at the same probabilities for both groups and returns a tidy data frame, with one row per quantile and weighting method, suitable for plotting or further analysis.

Usage

qq(
  .data,
  .var,
  .group,
  .wts = NULL,
  quantiles = seq(0.01, 0.99, 0.01),
  include_observed = TRUE,
  treatment_level = NULL,
  na.rm = FALSE
)

Arguments

.data: A data frame containing the variables.
.var: Variable to compute quantiles for. Supports tidyselect syntax.
.group: Column name of treatment/group variable. Supports tidyselect syntax.
.wts: Optional weighting variable(s). Can be unquoted variable names (supports tidyselect syntax), a character vector, or NULL. Multiple weights can be provided to compare different weighting schemes. Default is NULL (unweighted).
quantiles: Numeric vector of quantiles to compute. Default is seq(0.01, 0.99, 0.01) for 99 quantiles.
include_observed: Logical. When .wts is supplied, should the observed (unweighted) quantiles also be computed? Default is TRUE.
treatment_level: The reference treatment level to use for comparisons. If NULL (default), uses the last level for factors or the maximum value for numeric variables.
na.rm: Logical; if TRUE, drop NA values before computation. Default is FALSE.

Value

A tibble with columns:

method: Factor (printed as <fct> in the examples below). The weighting method ("observed" or weight variable name).
quantile: Numeric. The quantile probability (0-1).
treated_quantiles: Numeric. The value of .var at that quantile in the treatment group (the level given by treatment_level).
untreated_quantiles: Numeric. The value of .var at that quantile in the control (untreated) group.

Details

This function computes the data needed for quantile-quantile plots by calculating corresponding quantiles from two distributions. The computation uses the inverse of the empirical cumulative distribution function (ECDF). For weighted data, it first computes the weighted ECDF and then inverts it to obtain quantiles.
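
The ECDF inversion described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: weighted_quantile_sketch() and the toy data are hypothetical. It uses a plain step-function inverse (the smallest value whose cumulative weight share reaches the requested probability), so its results may differ from qq() wherever the package averages or interpolates between adjacent values.

```r
# Hypothetical helper for illustration; not part of the package.
weighted_quantile_sketch <- function(x, probs, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))  # unweighted case: equal weights
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)

  # Sort values and carry the weights along
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]

  # Weighted ECDF evaluated at each sorted value of x
  ecdf_vals <- cumsum(w) / sum(w)

  # Invert the ECDF: index of the smallest x whose ECDF value reaches p
  # (the small tolerance guards against floating-point rounding)
  idx <- vapply(
    probs,
    function(p) which(ecdf_vals >= p - 1e-12)[1],
    integer(1)
  )
  x[idx]
}

# Toy data (made up): pair the quantiles of two groups at the same probabilities
probs <- c(0.25, 0.5, 0.75)
treated <- c(3, 1, 4, 2)
untreated <- c(10, 20, 30)
untreated_wts <- c(1, 1, 2)

qq_sketch <- data.frame(
  quantile = probs,
  treated_quantiles = weighted_quantile_sketch(treated, probs),
  untreated_quantiles = weighted_quantile_sketch(untreated, probs, untreated_wts)
)
qq_sketch
```

Passing w = NULL reduces the weighted ECDF to the ordinary ECDF, which is why a single helper can cover both the observed and the weighted case in this sketch.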

Examples

# Basic QQ data (observed only)
qq(nhefs_weights, age, qsmk)
#> # A tibble: 99 × 4
#>    method   quantile treated_quantiles untreated_quantiles
#>    <fct>       <dbl>             <dbl>               <dbl>
#>  1 observed     0.01                25                25  
#>  2 observed     0.02                25                25  
#>  3 observed     0.03                26                25  
#>  4 observed     0.04                26                25.5
#>  5 observed     0.05                27                26  
#>  6 observed     0.06                27                26  
#>  7 observed     0.07                28                26  
#>  8 observed     0.08                28                27  
#>  9 observed     0.09                29                27  
#> 10 observed     0.1                 29                28  
#> # ℹ 89 more rows

# With weighting
qq(nhefs_weights, age, qsmk, .wts = w_ate)
#> # A tibble: 198 × 4
#>    method   quantile treated_quantiles untreated_quantiles
#>    <fct>       <dbl>             <dbl>               <dbl>
#>  1 observed     0.01                25                25  
#>  2 observed     0.02                25                25  
#>  3 observed     0.03                26                25  
#>  4 observed     0.04                26                25.5
#>  5 observed     0.05                27                26  
#>  6 observed     0.06                27                26  
#>  7 observed     0.07                28                26  
#>  8 observed     0.08                28                27  
#>  9 observed     0.09                29                27  
#> 10 observed     0.1                 29                28  
#> # ℹ 188 more rows

# Compare multiple weighting schemes
qq(nhefs_weights, age, qsmk, .wts = c(w_ate, w_att))
#> # A tibble: 297 × 4
#>    method   quantile treated_quantiles untreated_quantiles
#>    <fct>       <dbl>             <dbl>               <dbl>
#>  1 observed     0.01                25                25  
#>  2 observed     0.02                25                25  
#>  3 observed     0.03                26                25  
#>  4 observed     0.04                26                25.5
#>  5 observed     0.05                27                26  
#>  6 observed     0.06                27                26  
#>  7 observed     0.07                28                26  
#>  8 observed     0.08                28                27  
#>  9 observed     0.09                29                27  
#> 10 observed     0.1                 29                28  
#> # ℹ 287 more rows
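
Because the result has one row per quantile, a QQ plot is a scatter of one group's quantiles against the other's, with a 45-degree reference line. The following is a minimal base R sketch that uses only the ten observed rows printed above. The names qq_head and plot_qq_sketch() are hypothetical, and which group goes on which axis is an arbitrary choice made for this illustration.

```r
# First ten observed rows, copied from the printed output above
qq_head <- data.frame(
  quantile = (1:10) / 100,
  treated_quantiles = c(25, 25, 26, 26, 27, 27, 28, 28, 29, 29),
  untreated_quantiles = c(25, 25, 25, 25.5, 26, 26, 26, 27, 27, 28)
)

# Hypothetical plotting helper for illustration; not part of the package.
plot_qq_sketch <- function(d) {
  # Use the same limits on both axes so the reference line is the diagonal
  lims <- range(d$treated_quantiles, d$untreated_quantiles)
  plot(
    d$untreated_quantiles, d$treated_quantiles,
    xlim = lims, ylim = lims,
    xlab = "Untreated quantiles", ylab = "Treated quantiles"
  )
  # Points on this line have equal quantiles in both groups
  abline(0, 1, lty = 2)
  invisible(d)
}

# Only draw when run in an interactive session
if (interactive()) plot_qq_sketch(qq_head)
```

Points that fall off the dashed line mark quantiles where the two groups differ. In these ten rows the treated quantiles are never below the untreated ones.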