Computes balance statistics for multiple variables across groups, optionally under one or more weighting schemes. This function generalizes balance checking by supporting multiple metrics (SMD, variance ratio, Kolmogorov-Smirnov, weighted correlation, energy distance) and returns results in a tidy format.

Usage

check_balance(
  .data,
  .vars,
  .group,
  .wts = NULL,
  .metrics = c("smd", "vr", "ks", "energy"),
  include_observed = TRUE,
  reference_group = 1L,
  na.rm = FALSE,
  make_dummy_vars = TRUE,
  squares = FALSE,
  cubes = FALSE,
  interactions = FALSE
)

Arguments

.data

A data frame containing the variables to analyze.

.vars

Variables for which to calculate metrics. Can be unquoted variable names, a character vector, or a tidyselect expression.

.group

Grouping variable, e.g., treatment or exposure group.

.wts

Optional weighting variables. Can be unquoted variable names, a character vector, or NULL. Multiple weights can be provided to compare different weighting schemes.

.metrics

Character vector specifying which metrics to compute. Available options: "smd" (standardized mean difference), "vr" (variance ratio), "ks" (Kolmogorov-Smirnov), "correlation" (for continuous exposures), "energy" (multivariate energy distance). Defaults to c("smd", "vr", "ks", "energy").

include_observed

Logical. If using .wts, also calculate observed (unweighted) metrics? Defaults to TRUE.

reference_group

The reference group level to use for comparisons. Defaults to 1 (first level).

na.rm

A logical value indicating whether to remove missing values before computation. If FALSE (default), missing values in the input will produce NA in the output.

make_dummy_vars

Logical. Transform categorical variables to dummy variables using model.matrix()? Defaults to TRUE. When TRUE, categorical variables are expanded into separate binary indicators for each level.

squares

Logical. Include squared terms for continuous variables? Defaults to FALSE. When TRUE, adds squared versions of numeric variables.

cubes

Logical. Include cubed terms for continuous variables? Defaults to FALSE. When TRUE, adds cubed versions of numeric variables.

interactions

Logical. Include all pairwise interactions between variables? Defaults to FALSE. When TRUE, creates interaction terms for all variable pairs, excluding interactions between levels of the same categorical variable and between squared/cubed terms.
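
The transformation arguments above (make_dummy_vars, squares, cubes, interactions) can be pictured with a small base R sketch. This only illustrates the kind of columns such options produce; the toy data frame and the column names (age_squared, age_cubed, age_x_wt) are hypothetical, and the names check_balance() actually generates may differ.

```r
# Hypothetical toy data: two numeric covariates and one factor
df <- data.frame(
  age = c(30, 45, 52, 61),
  wt = c(70, 82, 65, 90),
  sex = factor(c("f", "m", "m", "f"))
)

# Dummy-code the factor: one indicator column per level (no intercept)
dummies <- model.matrix(~ sex - 1, data = df)

# Squared and cubed versions of a numeric covariate
df$age_squared <- df$age^2
df$age_cubed <- df$age^3

# A pairwise interaction between two covariates
df$age_x_wt <- df$age * df$wt

# Combine the original, polynomial, interaction, and indicator columns
expanded <- cbind(df, dummies)
expanded
```

Each added column is then treated as its own variable when balance is computed, which is why enabling these options increases the number of rows in the result.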

Value

A tibble with columns:

variable

Character. The variable name being analyzed. NA for the multivariate "energy" metric, which is not reported per variable.

group_level

Character. The non-reference group level. For "correlation", the name of the exposure variable; NA for "energy".

method

Character. The weighting method ("observed" or weight variable name).

metric

Character. The balance metric computed ("smd", "vr", "ks", "correlation", or "energy").

estimate

Numeric. The computed balance statistic.
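
Because the result is one row per variable, method, and metric, it can be filtered like any other data frame. The sketch below rebuilds a few SMD rows by hand, using estimates copied from the example output on this page, and keeps the rows whose absolute SMD exceeds 0.1. The object names balance and imbalanced are illustrative, and base R subset() is only one of several ways to do this.

```r
# A hand-built stand-in for check_balance() output (SMD rows only);
# the estimates are copied from the example output on this page
balance <- data.frame(
  variable = c("age", "age", "age", "wt71", "wt71"),
  group_level = "0",
  method = c("observed", "w_ate", "w_att", "observed", "w_ate"),
  metric = "smd",
  estimate = c(0.282, 0.00585, 0.0122, 0.133, -0.00903)
)

# Keep SMD rows whose absolute value exceeds the conventional 0.1 threshold
imbalanced <- subset(balance, metric == "smd" & abs(estimate) > 0.1)
imbalanced
```

Here only the unweighted ("observed") rows remain, since both weighted estimates for age and the weighted estimate for wt71 fall well below 0.1 in absolute value.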

Details

This function serves as a comprehensive balance assessment tool by computing multiple balance metrics simultaneously. It automatically handles different variable types and can optionally transform variables (dummy coding, polynomial terms, interactions) before computing balance statistics.

The function supports several balance metrics:

  • SMD (Standardized Mean Difference): Measures the difference in group means in standard-deviation units, with absolute values around 0.1 or smaller generally indicating good balance

  • Variance Ratio: Compares group variances, with values near 1.0 indicating similar variability between groups

  • Kolmogorov-Smirnov: Measures the maximum difference between the groups' empirical cumulative distribution functions, with smaller values indicating better balance

  • Correlation: For continuous exposures, measures linear association between covariate and exposure

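  • Energy Distance: A multivariate measure of the distance between the groups' joint covariate distributions, reported once per weighting method rather than once per variable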
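
To make the first three metrics concrete, here is a minimal unweighted sketch in base R for a binary group and a single simulated covariate. It assumes a pooled-SD denominator for the SMD and puts group 1 in the numerator of both the SMD and the variance ratio; these conventions are assumptions for illustration, and the package's own denominators, signs, and reference-group handling may differ.

```r
# Simulated toy data: a binary group and one continuous covariate
set.seed(1)
group <- rep(c(0, 1), each = 100)
x <- rnorm(200, mean = ifelse(group == 1, 0.3, 0))

x1 <- x[group == 1]
x0 <- x[group == 0]

# Standardized mean difference, assuming a pooled-SD denominator
smd <- (mean(x1) - mean(x0)) / sqrt((var(x1) + var(x0)) / 2)

# Variance ratio: values near 1 suggest similar spread
vr <- var(x1) / var(x0)

# Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs,
# evaluated at every observed value
grid <- sort(unique(x))
ks <- max(abs(ecdf(x1)(grid) - ecdf(x0)(grid)))

c(smd = smd, vr = vr, ks = ks)
```

The KS value computed this way matches the statistic returned by stats::ks.test(x1, x0) for the unweighted two-sample case.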

When multiple weighting schemes are provided, the function computes balance for each method, enabling comparison of different approaches (e.g., ATE vs ATT weights). The include_observed parameter controls whether unweighted ("observed") balance is included in the results.

See also

bal_smd(), bal_vr(), bal_ks(), bal_corr(), bal_energy() for individual metric functions

Other balance functions: bal_corr(), bal_ks(), bal_smd(), bal_vr(), check_auc()

Examples

# Basic usage with the default metrics
check_balance(nhefs_weights, c(age, wt71), qsmk, .wts = c(w_ate, w_att))
#> # A tibble: 21 × 5
#>    variable group_level method   metric estimate
#>    <chr>    <chr>       <chr>    <chr>     <dbl>
#>  1 age      0           observed ks      0.130  
#>  2 age      0           w_ate    ks      0.0293 
#>  3 age      0           w_att    ks      0.0362 
#>  4 age      0           observed smd     0.282  
#>  5 age      0           w_ate    smd     0.00585
#>  6 age      0           w_att    smd     0.0122 
#>  7 age      0           observed vr      1.07   
#>  8 age      0           w_ate    vr      1.01   
#>  9 age      0           w_att    vr      1.01   
#> 10 wt71     0           observed ks      0.0700 
#> # ℹ 11 more rows

# With specific metrics only
check_balance(nhefs_weights, c(age, wt71), qsmk, .metrics = c("smd", "energy"))
#> # A tibble: 3 × 5
#>   variable group_level method   metric estimate
#>   <chr>    <chr>       <chr>    <chr>     <dbl>
#> 1 age      0           observed smd      0.282 
#> 2 wt71     0           observed smd      0.133 
#> 3 NA       NA          observed energy   0.0503

# Exclude observed results
check_balance(nhefs_weights, c(age, wt71), qsmk, .wts = w_ate, include_observed = FALSE)
#> # A tibble: 7 × 5
#>   variable group_level method metric estimate
#>   <chr>    <chr>       <chr>  <chr>     <dbl>
#> 1 age      0           w_ate  ks      0.0293 
#> 2 age      0           w_ate  smd     0.00585
#> 3 age      0           w_ate  vr      1.01   
#> 4 wt71     0           w_ate  ks      0.0358 
#> 5 wt71     0           w_ate  smd    -0.00903
#> 6 wt71     0           w_ate  vr      1.00   
#> 7 NA       NA          w_ate  energy  0.00217

# Use correlation for continuous exposure
check_balance(mtcars, c(mpg, hp), disp, .metrics = c("correlation", "energy"))
#> # A tibble: 3 × 5
#>   variable group_level method   metric      estimate
#>   <chr>    <chr>       <chr>    <chr>          <dbl>
#> 1 hp       disp        observed correlation    0.791
#> 2 mpg      disp        observed correlation   -0.848
#> 3 NA       NA          observed energy         0.882

# With dummy variables for categorical variables (default behavior)
check_balance(nhefs_weights, c(age, sex, race), qsmk)
#> # A tibble: 10 × 5
#>    variable group_level method   metric estimate
#>    <chr>    <chr>       <chr>    <chr>     <dbl>
#>  1 age      0           observed ks       0.130 
#>  2 age      0           observed smd      0.282 
#>  3 age      0           observed vr       1.07  
#>  4 race     0           observed ks       0.0568
#>  5 race     0           observed smd     -0.177 
#>  6 race     0           observed vr       0.652 
#>  7 sex      0           observed ks       0.0799
#>  8 sex      0           observed smd     -0.160 
#>  9 sex      0           observed vr       0.996 
#> 10 NA       NA          observed energy   0.0641

# Without dummy variables for categorical variables
check_balance(nhefs_weights, c(age, sex, race), qsmk, make_dummy_vars = FALSE)
#> # A tibble: 10 × 5
#>    variable group_level method   metric estimate
#>    <chr>    <chr>       <chr>    <chr>     <dbl>
#>  1 age      0           observed ks       0.130 
#>  2 age      0           observed smd      0.282 
#>  3 age      0           observed vr       1.07  
#>  4 race     0           observed ks      NA     
#>  5 race     0           observed smd     NA     
#>  6 race     0           observed vr      NA     
#>  7 sex      0           observed ks      NA     
#>  8 sex      0           observed smd     NA     
#>  9 sex      0           observed vr      NA     
#> 10 NA       NA          observed energy   0.0641